This article describes the development of a predictive model to help estimate disease prognosis, using a number of clinical variables. An accompanying "Perspective" piece gives a good overview of how such models can be used in clinical practice.

Summary

Background

Survival with primary cutaneous melanoma is routinely predicted from tumor thickness, but this method is imperfect. The authors previously described a six-variable model that predicted survival better but relied on several pathologic variables frequently unavailable to the clinician. In this paper, they describe the development of a model that uses only routinely available data and thus has wider clinical applicability.

Methods

The outcome of interest was 10-year survival after surgical treatment of primary melanoma. Patients were classified as either alive at 10 years (with or without melanoma) or dead before 10 years from melanoma (patients dying before 10 years follow-up from non-melanoma causes were excluded).

Subjects. 624 patients with primary melanoma were evaluated by the Pigmented Lesion Group at the University of Pennsylvania between September 1, 1972 and December 31, 1979. Of these, 136 were excluded, most commonly because of death from a non-melanoma cause before 10 years of follow-up (44), lack of follow-up (32), or metastatic disease at the outset (29). The remaining 488 patients were followed for at least 10 years or until death from melanoma.

Variables. A review of 100 pathology reports was performed to determine which pathologic variables were most commonly available. This information was combined with prior knowledge of prognostic factors to determine the variables to be studied. These variables were age, sex, site of the primary lesion (divided into axial and extremity), tumor thickness, histologic subtype and Clark level of invasion.

Analysis. The six variables selected were first analyzed with univariate regression to assess their association with survival. Those showing an association were then included in a multivariate regression analysis. A model was developed from the results of that analysis to predict the probability of ten-year survival. Finally, this model was evaluated against the data used to generate it and then against a new sample of 142 patients (gathered between 1980 and 1981).
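A multivariate model of this kind takes the standard logistic form: the predicted probability of survival is 1 / (1 + e^-z), where z is a linear combination of the predictors. The following is a minimal sketch of that idea; the coefficients, variable names, and predictor encodings are invented for illustration and are not the paper's fitted values.

```python
import math

# Hypothetical coefficients, for illustration only -- the paper's actual
# fitted coefficients are not reproduced here.
INTERCEPT = 2.0
COEFFS = {"thickness_mm": -0.5, "age_decades": -0.2, "male": -0.4, "axial_site": -0.3}

def predicted_survival(patient):
    """Probability of 10-year survival from a logistic model:
    P = 1 / (1 + e^-z), where z is a linear combination of predictors."""
    z = INTERCEPT + sum(coef * patient[name] for name, coef in COEFFS.items())
    return 1.0 / (1.0 + math.exp(-z))

# With these made-up coefficients, a thin extremity lesion in a young
# woman yields a high predicted probability of survival, and a thick
# axial lesion in an older man yields a low one.
p_good = predicted_survival(
    {"thickness_mm": 0.5, "age_decades": 3, "male": 0, "axial_site": 0})
p_poor = predicted_survival(
    {"thickness_mm": 8.0, "age_decades": 7, "male": 1, "axial_site": 1})
```

The logistic form guarantees that the output is always a valid probability between 0 and 1, whatever the predictor values.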

Evaluation of the model. After the model was developed, it was evaluated by several criteria.

The four-variable model was compared to the thickness-only model using the area under the receiver operating characteristic (ROC) curve (see below for a description of this concept). For the four-variable model, the area under the curve was 0.8742, better than the 0.8225 obtained for the thickness-only model.
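The area under the ROC curve has a convenient interpretation: it equals the probability that a randomly chosen survivor receives a higher predicted survival probability than a randomly chosen non-survivor. A small illustrative computation, using toy scores rather than the study's data:

```python
def auc(scores_survivors, scores_nonsurvivors):
    """Area under the ROC curve via its rank interpretation: the chance
    that a randomly chosen survivor receives a higher score than a
    randomly chosen non-survivor (ties count half)."""
    pairs = 0
    wins = 0.0
    for s in scores_survivors:
        for d in scores_nonsurvivors:
            pairs += 1
            if s > d:
                wins += 1.0
            elif s == d:
                wins += 0.5
    return wins / pairs

# Toy predicted survival probabilities, not the study's data.
survived = [0.9, 0.8, 0.75, 0.6]
died = [0.7, 0.4, 0.3]
```

A model that ranks every survivor above every non-survivor scores 1.0; a model no better than chance scores 0.5.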

The model was also compared to the thickness-alone model by calculating the percentage of patients whose outcome each model would have predicted correctly. For each model, a survival probability over 70% was considered to predict survival, a probability under 30% was considered to predict death, and anything in between was considered indeterminate. Using this approach, the four-variable model correctly predicted outcome in 74% of cases (vs. 68% for the thickness-alone model), incorrectly predicted outcome in 8% (vs. 9%), and was in the gray zone in 18% (vs. 22%).
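The three-zone rule described above is straightforward to express directly; the function name and return labels here are illustrative:

```python
def classify(prob_survival, survive_cutoff=0.70, death_cutoff=0.30):
    """Three-zone rule used in the paper's comparison: probabilities
    above 70% predict survival, below 30% predict death, and anything
    in between is indeterminate."""
    if prob_survival > survive_cutoff:
        return "predicted survival"
    if prob_survival < death_cutoff:
        return "predicted death"
    return "indeterminate"
```

Widening the indeterminate zone trades fewer incorrect predictions for more patients left unclassified, which is exactly the tension visible in the 74%/8%/18% versus 68%/9%/22% comparison.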

The model was tested against its original data to see how well it fit (using the Hosmer-Lemeshow test, a so-called goodness-of-fit test), and it yielded a good fit.
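The idea behind the Hosmer-Lemeshow test is to sort patients by predicted probability of the event, divide them into groups, and compare the observed number of events in each group with the number the model expects. A simplified sketch of that computation follows; the grouping scheme and denominator are one common formulation (the textbook version uses deciles and compares the statistic to a chi-square distribution to obtain a p-value), and the toy data are invented.

```python
def hosmer_lemeshow(probs, outcomes, groups=4):
    """Simplified Hosmer-Lemeshow statistic: sort patients by predicted
    probability of the event, split them into groups, and accumulate
    (observed - expected)^2 / variance over the groups.  Smaller values
    indicate a better fit."""
    paired = sorted(zip(probs, outcomes))
    n = len(paired)
    stat = 0.0
    for g in range(groups):
        chunk = paired[g * n // groups:(g + 1) * n // groups]
        observed = sum(y for _, y in chunk)              # events seen
        expected = sum(p for p, _ in chunk)              # events predicted
        variance = sum(p * (1.0 - p) for p, _ in chunk)
        if variance > 0:
            stat += (observed - expected) ** 2 / variance
    return stat

# Toy example: predicted death probabilities and actual outcomes
# (1 = died of melanoma) that roughly agree, so the statistic is small.
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
stat = hosmer_lemeshow(probs, outcomes)
```

A model whose predicted probabilities track the observed event rates group by group produces a small statistic; gross miscalibration inflates it.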

Finally, the model was applied to a set of data that had not been used to develop it: data obtained between 1980 and 1981, gathered in a fashion similar to the original data. The model again outperformed the thickness-alone model.

Perspective -- Predicting clinical states in individual patients

Leonard E. Braitman, PhD, and Frank Davidoff, MD

This article reviews the basic concepts behind probability modelling and its application to clinical medicine. Before such a model can be applied to a given patient, two basic issues must be addressed. First, how good is the model in general? Second, how applicable is the model to the specific patient?

The first issue involves the quality of the model. How well does the model "fit" the patients that were used to develop it? How does it compare to other models? Was it tested on patients other than those used to develop it?

The second issue requires knowledge of the specific patient's clinical situation. Is the patient sufficiently similar to those for whom the model was developed? Does the model provide the type of information we are looking for?

In the Perspective piece, the authors deal with these issues in detail, adding much insight to the process. I would strongly recommend reading this piece.

Comment

Although the model presented here appears to be more precise than the tumor-thickness-alone model, the advantage is surprisingly small. This may be a case where statistically significant does not mean clinically important.

The authors state that their model adds the most information for patients for whom the thickness-alone model predicts an intermediate probability of survival and their model yields either a higher or lower estimate (the thickness-alone model yields a survival probability of 0.59; their model yields probabilities ranging from 0.24 to 0.89). They do not indicate, however, how accurate their model is for these intermediate cases. Without knowing this, we cannot judge whether the four-variable model is really more precise in this subgroup of patients.

Prognostic models such as the one described here can help clinicians and patients make more informed choices about approaches to therapy. In addition, such models can assist in assessing the results of newer therapies, by comparing predicted survival to achieved survival. In randomized trials, they can help ensure that groups are prognostically "comparable". With the increasing emphasis on hard data in health care, prognostic models will be used to generate numbers for various purposes (risk-adjustment in capitated systems, for example). When applying probabilistic models to individual patients, the various considerations detailed in the "Perspective" article should be kept in mind.

A note on receiver operating characteristic (ROC) curves

Most diagnostic tests involve criteria that can be varied. As these criteria are varied, there is usually a tradeoff between sensitivity and specificity. An example will make this clearer. If fasting blood sugar is used as a test for diabetes and the cutoff for abnormal is set at 200 mg/dl, the test will be highly specific (very few false positives), but sensitivity will be very poor (many diabetics will be missed; there will be many false negatives). As the cutoff is gradually lowered, specificity decreases while sensitivity increases. At a cutoff of 100 mg/dl, the test will be extremely sensitive (picking up nearly all diabetics), but specificity will be terrible. For each possible cutoff, we can determine the test's sensitivity and specificity and then plot one against the other (by convention, sensitivity is plotted against 1 - specificity). The resulting curve is a receiver operating characteristic (ROC) curve. The less of a tradeoff there is between sensitivity and specificity, the better the test; equivalently, the closer the area under the ROC curve is to one, the better the test.
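The blood-sugar example can be made concrete with a small computation. The glucose values below are made up purely to illustrate the tradeoff; they are not real clinical data.

```python
def sens_spec(values_diseased, values_healthy, cutoff):
    """Sensitivity and specificity of the rule 'value >= cutoff is
    abnormal' applied to labelled measurements."""
    true_pos = sum(v >= cutoff for v in values_diseased)
    true_neg = sum(v < cutoff for v in values_healthy)
    return true_pos / len(values_diseased), true_neg / len(values_healthy)

# Made-up fasting glucose values (mg/dl), for illustration only.
diabetic = [130, 150, 180, 210, 250]
healthy = [85, 90, 105, 110, 125]

# A high cutoff is specific but insensitive; a low cutoff is the reverse.
sens_hi, spec_hi = sens_spec(diabetic, healthy, 200)  # specific, insensitive
sens_lo, spec_lo = sens_spec(diabetic, healthy, 100)  # sensitive, nonspecific
```

Sweeping the cutoff across the whole range of observed values and plotting each (1 - specificity, sensitivity) pair traces out the ROC curve described above.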

The figure below shows two typical ROC curves. Curve B represents less of a tradeoff between sensitivity and specificity than curve A, has an area under the curve closer to one, and thus represents the "better" test.