Approximating the risk score for disease diagnosis using MARS

Abstract:
In disease screening and diagnosis, often multiple markers are measured and combined to improve the accuracy of diagnosis. McIntosh and Pepe [Combining several screening tests: optimality of the risk score, Biometrics 58 (2002), pp. 657-664] showed that the risk score, defined as the probability of disease conditional on multiple markers, is the optimal function for classification based on the Neyman-Pearson lemma. They proposed a two-step procedure to approximate the risk score. However, the resulting receiver operating characteristic (ROC) curve is only defined in a subrange (L, h) of false-positive rates in (0,1) and the determination of the lower limit L needs extra prior information. In practice, most diagnostic tests are not perfect, and it is usually rare that a single marker is uniformly better than the other tests. Using simulation, I show that multivariate adaptive regression spline is a useful tool to approximate the risk score when combining multiple markers, especially when ROC curves from multiple tests cross. The resulting ROC is defined in the whole range of (0,1) and is easy to implement and has intuitive interpretation. The sample code of the application is shown in the appendix.