Abstract

Early detection strategies for lung cancer may be improved by using valid risk prediction models to identify persons at highest risk for the disease. However, external validation of lung cancer risk prediction models has been limited. We sought to externally validate the PLCOM2012 model in the Kaiser Permanente Northern California (KPNC) Research Program on Genes, Environment, and Health (RPGEH) cohort. The model predicts the probability of lung cancer within six years on the basis of age, race, education, body mass index, chronic obstructive pulmonary disease, personal history of cancer, family history of lung cancer, and smoking status, quantity, duration, and quit years. To increase comparability with the populations of smokers used to develop and validate the PLCOM2012 model initially, we restricted our analysis to the 28,757 ever-smokers aged 55 to 74 years with no history of lung cancer, no history of other non-melanoma skin cancers in the prior five years, and complete data on all model predictors. For each person, the predicted probability of lung cancer was estimated from RPGEH survey data on all predictors except quit years, which was ascertained from electronic health records. Using KPNC Cancer Registry data, we identified 672 persons diagnosed with lung cancer within six years post-survey. Model performance was assessed in terms of both calibration and discrimination. Calibration was assessed by determining the mean absolute difference between observed and predicted probabilities of lung cancer for each decile of predicted risk. Discrimination was assessed by estimating the area under the receiver operating characteristic curve (AUC). The absolute difference between observed and predicted probabilities of lung cancer was generally small: <0.010 in half of the analytic cohort and <0.035 in 90% of it.
Although the mean absolute difference between observed and predicted probabilities of lung cancer was smallest for the lowest decile (0.003; observed probability: 0.003) and largest for the highest decile (0.049; observed probability: 0.08), it did not increase monotonically across deciles of predicted risk. Discrimination was modest, with an AUC of 0.73 (95% confidence interval: 0.71, 0.75). In this large, independent population of smokers, the PLCOM2012 model did not perform as well as previously reported.
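
The two performance measures described above can be illustrated with a minimal Python sketch. It is not the analysis code used in this study. The function names (`auc`, `calibration_by_decile`) and the simulated data are hypothetical, and the calibration summary shown here (absolute difference between the observed event proportion and the mean predicted probability within each decile) is one plausible reading of the decile-based measure, assumed for illustration.

```python
import random


def auc(y_true, y_score):
    """Rank-based AUC: the probability that a randomly chosen case has a
    higher predicted risk than a randomly chosen non-case (ties count 0.5)."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied scores and give them the average rank.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(y for _, y in pairs)
    n_neg = n - n_pos
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def calibration_by_decile(y_true, y_pred, n_groups=10):
    """Split persons into groups of increasing predicted risk and return
    (group, observed proportion, mean predicted, absolute difference)."""
    order = sorted(range(len(y_pred)), key=lambda i: y_pred[i])
    n = len(order)
    rows = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        observed = sum(y_true[i] for i in idx) / len(idx)
        predicted = sum(y_pred[i] for i in idx) / len(idx)
        rows.append((g + 1, observed, predicted, abs(observed - predicted)))
    return rows


if __name__ == "__main__":
    # Simulated data only: predicted risks are drawn from a skewed
    # distribution, and outcomes are drawn from those same risks.
    random.seed(0)
    predicted = [random.betavariate(1, 40) for _ in range(5000)]
    outcome = [1 if random.random() < p else 0 for p in predicted]
    print("AUC:", round(auc(outcome, predicted), 3))
    for row in calibration_by_decile(outcome, predicted):
        print("decile %d: observed=%.4f predicted=%.4f abs diff=%.4f" % row)
```

Because the simulated outcomes are generated from the predicted risks themselves, the sketch shows the mechanics of the calculation only. It does not reproduce the study's results.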