Abstract

OBJECTIVE:

To assess the ability of regression tree boosting to risk-adjust health care cost predictions, using diagnostic groups and demographic variables as inputs. Risk-adjustment systems for health care cost described in the literature have consistently used deterministic models to account for interactions among diagnostic groups, which simplifies their statistical representation but sacrifices potentially useful information. An alternative is a statistical learning algorithm such as regression tree boosting, which systematically searches the data for consequential interactions and automatically incorporates them into a risk-adjustment model customized to the population under study.

DATA SOURCE:

STUDY DESIGN:

The Agency for Healthcare Research and Quality's Clinical Classification Software (CCS) was used to sort 2001 diagnoses into 260 diagnosis categories (DCs). For each plan type (indemnity, PPO, and POS), boosted regression trees and main-effects linear models were fitted to predict concurrent (2001) and prospective (2002) total health care cost per patient, given the DCs and demographic variables.
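To make the boosting step concrete, the following is a minimal sketch of regression tree boosting on 0/1 diagnosis-category indicators, written in plain Python. It is not the study's implementation: the squared-error loss, tree depth of 2, shrinkage of 0.1, number of trees, the function names, and the synthetic cost data (including the size of the interaction between the two hypothetical categories) are all assumptions chosen for illustration.

```python
def fit_tree(X, r, rows, depth):
    """Greedy regression tree on 0/1 features.

    Returns a leaf value (float) or a tuple (feature, subtree_if_0, subtree_if_1).
    """
    node_mean = sum(r[i] for i in rows) / len(rows)
    if depth == 0:
        return node_mean
    best = None
    for j in range(len(X[0])):
        left = [i for i in rows if X[i][j] == 0]
        right = [i for i in rows if X[i][j] == 1]
        if not left or not right:
            continue  # feature is constant within this node
        ml = sum(r[i] for i in left) / len(left)
        mr = sum(r[i] for i in right) / len(right)
        sse = (sum((r[i] - ml) ** 2 for i in left)
               + sum((r[i] - mr) ** 2 for i in right))
        if best is None or sse < best[0]:
            best = (sse, j, left, right)
    if best is None:
        return node_mean
    _, j, left, right = best
    return (j, fit_tree(X, r, left, depth - 1), fit_tree(X, r, right, depth - 1))


def predict_tree(tree, x):
    """Walk the nested tuples until a leaf value is reached."""
    while isinstance(tree, tuple):
        j, if_zero, if_one = tree
        tree = if_one if x[j] == 1 else if_zero
    return tree


def boost(X, y, n_trees=200, depth=2, shrinkage=0.1):
    """Boosting for squared error: each tree is fitted to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    rows = list(range(len(y)))
    trees = []
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_tree(X, resid, rows, depth)
        trees.append(tree)
        pred = [p + shrinkage * predict_tree(tree, x) for p, x in zip(pred, X)]
    return base, shrinkage, trees


def predict(model, x):
    base, shrinkage, trees = model
    return base + shrinkage * sum(predict_tree(t, x) for t in trees)


# Synthetic data (hypothetical): two diagnosis categories a and b whose joint
# presence raises cost beyond the sum of their separate effects, plus an
# irrelevant indicator c.
X, y = [], []
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            for _ in range(5):
                X.append([a, b, c])
                y.append(1000.0 + 2000.0 * a + 1500.0 * b + 5000.0 * a * b)

model = boost(X, y)
print(round(predict(model, [1, 1, 0]), 2))
```

Because each tree here has depth 2, a single tree can split on one category and then on another, which is how the fitted model can pick up the joint effect of a and b without that interaction being specified in advance. A main-effects linear model fitted to the same data would have no term for it.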

CONCLUSIONS:

The combination of regression tree boosting and a diagnostic grouping scheme, such as CCS, represents a competitive alternative to risk-adjustment systems that use complex deterministic models to account for interactions among diagnostic groups.

Relative Importance of Independent Variables for Predicting 2001 and 2002 Total Health Care Cost for Preferred Provider Organization Enrollees Based on Final Boosted Regression Trees Models. Independent variables are listed in order of importance for predicting 2001 cost. Only the 44 variables for which 2001 relative importance is largest are shown. Region of residence is for the corresponding year. (A figure displaying relative importance of all 264 independent variables can be found in “Supplementary Material.”)