"Increasingly, the diagnostic codes from administrative claims data are being used to measure clinical outcomes," Dr. Bruce M. Psaty from the University of Washington, Seattle, said by email. "The methods of using only claims data as outcomes nonetheless influence the results. Methods that seek to avoid misclassification tend to underestimate event rates, and methods that attempt to include all events tend to include misclassified events such as non-event hospitalizations as part of the outcome."

Dr. Psaty's team used data from the Cardiovascular Health Study (CHS) to evaluate the degree of both misclassification and underestimation of event rates for cardiovascular disease outcomes identified solely from claims data compared with those identified through active surveillance.

An ICD-9 code of 410 in the first position had a 90.6% positive predictive value (PPV) for myocardial infarction (MI), but this code identified only 53.8% of the incident MIs ascertained by active surveillance. Including this code as a secondary diagnosis identified an additional 16.6% of MI events.

Similarly, main stroke codes in the first position had an 80.4% PPV for stroke, but identified only 63.8% of the incident stroke events. For heart failure, main diagnostic codes had a high PPV of 93.2%, but identified only 27.2% of heart failure events.
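
The two figures quoted for each outcome measure different things. PPV is the share of code-flagged events that surveillance confirmed, while the "identified only" percentages correspond to sensitivity, the share of surveillance-confirmed events that the code picked up. A minimal Python sketch of the distinction follows; the counts are hypothetical, chosen only for illustration, and are not taken from the CHS analysis.

```python
def ppv(true_pos: int, false_pos: int) -> float:
    """Share of code-flagged events that were confirmed as real events."""
    return true_pos / (true_pos + false_pos)


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of confirmed events that the claims code actually flagged."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts (NOT study data): 100 claims carry the code,
# 90 of them are confirmed events, and surveillance found 60 further
# events that the code missed.
confirmed_and_coded = 90
coded_but_not_confirmed = 10
confirmed_but_not_coded = 60

print(f"PPV: {ppv(confirmed_and_coded, coded_but_not_confirmed):.1%}")
print(f"Sensitivity: {sensitivity(confirmed_and_coded, confirmed_but_not_coded):.1%}")
```

With these made-up counts, a code can be right most of the time it appears and still miss a large share of events, which is the pattern the study describes.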

Estimates of disease incidence differed markedly depending on whether they were derived from CHS surveillance, a first-position diagnostic code, or a diagnostic code in any position.

In general, misclassified events in the administrative claims data appeared to have little effect on the magnitude of associations for most cardiovascular disease risk factors, the researchers report in Circulation, online Nov. 4.

"No study is perfect, and some incomplete identification of events in a study is common," Dr. Psaty explained. "The effect depends on both the type of study and the degree of incompleteness. In a clinical trial, if there is no bias, the relative-risk comparison between the two groups remains valid even if some of the events were missed. If, on the other hand, the goal is the development of a model for the prediction of event rates to decide on whether to start a cholesterol lowering therapy, the incomplete identification of events introduces a bias in the model that is directly related to the degree of incompleteness in the identification of events."
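
Dr. Psaty's contrast between the two study types can be illustrated with a little arithmetic. The Python sketch below rests on simplifying assumptions that are not from the study: the risks and the capture fraction are hypothetical, events are missed at the same rate in both trial arms, and no non-events are falsely counted.

```python
def observed_risk(true_risk: float, capture_fraction: float) -> float:
    """Risk seen in the data when only a fraction of true events is captured."""
    return true_risk * capture_fraction


# Hypothetical values (NOT study data).
true_risk_treated = 0.08   # true event risk in the treated arm
true_risk_control = 0.10   # true event risk in the control arm
capture_fraction = 0.6     # same share of events captured in both arms

obs_treated = observed_risk(true_risk_treated, capture_fraction)
obs_control = observed_risk(true_risk_control, capture_fraction)

true_rr = true_risk_treated / true_risk_control
obs_rr = obs_treated / obs_control

# The capture fraction cancels in the ratio, so the relative risk survives.
print(f"True relative risk:     {true_rr:.3f}")
print(f"Observed relative risk: {obs_rr:.3f}")

# The absolute risk does not survive: it is scaled down by the capture fraction.
print(f"True control-arm risk:     {true_risk_control:.3f}")
print(f"Observed control-arm risk: {obs_control:.3f}")
```

Under these assumptions the relative risk is unchanged, but the absolute risk, which is what a prediction model would be fit to, is understated in proportion to the share of events missed.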

"Events data collection should be appropriate for the study," Dr. Psaty concluded. "Published studies need to provide sufficient detail so that readers can judge whether the methods were indeed fit for purpose."