Background

We plan to conduct a case-control study to investigate whether exposure to nitrogen dioxide (NO2) increases the risk of stroke. In case-control studies, selective participation can lead to bias and loss of efficiency. A two-phase design can reduce bias and improve efficiency by combining information on non-participating subjects with information from participating subjects. In our planned study, we will have access to individual disease status and area-level data on NO2 exposure for a large population sample from Scania, southern Sweden. A smaller sub-sample will be selected for the second phase, in which exposure and covariates are assessed at the individual level. In this paper, we simulate a case-control study based on our planned study. We develop a two-phase method for this study and compare its performance with that of other two-phase methods.

Methods

A two-phase case-control study was simulated with a varying number of first- and second-phase subjects. Estimation methods:

Method 1: Effect estimation with second-phase data only.

Method 2: Effect estimation by adjusting the first-phase estimate by the difference between the adjusted and unadjusted second-phase estimates. The first-phase estimate is based on individual disease status and residential address for all study subjects, linked to register data on NO2 exposure for each geographical area.

Method 3: Effect estimation using the expectation-maximization (EM) algorithm without taking area-level register data on exposure into account.

Method 4: Effect estimation using the EM algorithm and incorporating group-level register data on NO2 exposure.
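The adjustment in Method 2 can be illustrated with a minimal sketch. It assumes the estimates are combined on the log odds ratio scale, which the abstract does not state explicitly; the function name and all numbers are hypothetical and are not results from the study.

```python
import math


def method2_log_or(first_phase_log_or, second_adjusted_log_or, second_unadjusted_log_or):
    """Shift the first-phase log OR by the second-phase correction.

    Assumption: the correction is the difference between the adjusted and
    unadjusted second-phase estimates, taken on the log odds ratio scale.
    """
    correction = second_adjusted_log_or - second_unadjusted_log_or
    return first_phase_log_or + correction


# Hypothetical odds ratios, for illustration only.
first_phase_or = 1.30    # from area-level exposure, all first-phase subjects
second_adj_or = 1.20     # second phase, adjusted for individual covariates
second_unadj_or = 1.25   # second phase, unadjusted

combined_log_or = method2_log_or(
    math.log(first_phase_or),
    math.log(second_adj_or),
    math.log(second_unadj_or),
)
combined_or = math.exp(combined_log_or)
print(f"Method 2 style OR estimate: {combined_or:.3f}")
```

On the odds ratio scale this is the same as multiplying the first-phase OR by the ratio of the adjusted to the unadjusted second-phase OR.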

Results

In the simulated scenarios, all methods gave unbiased or marginally biased (< 7%) odds ratio (OR) estimates. The efficiency of method 4 was generally higher than that of methods 1 and 2. The standard errors of method 4 decreased further when the case-control ratio in the second phase was above one. For all methods, increasing the number of first-phase controls did not substantially reduce the standard errors.
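Bias and efficiency in a simulation study of this kind might be summarized as in the sketch below. The abstract does not give the exact definitions used, so the choices here are assumptions: bias is the relative deviation of the geometric-mean OR from the true OR, the standard error is the empirical standard deviation of the log OR across replicates, and relative efficiency is a ratio of squared standard errors. All function names are hypothetical.

```python
import math
import statistics


def relative_bias_percent(or_estimates, true_or):
    """Percent deviation of the geometric-mean OR estimate from the true OR."""
    mean_log_or = statistics.fmean([math.log(e) for e in or_estimates])
    return 100.0 * (math.exp(mean_log_or) - true_or) / true_or


def empirical_se(or_estimates):
    """Sample standard deviation of the log OR across simulation replicates."""
    return statistics.stdev([math.log(e) for e in or_estimates])


def relative_efficiency(se_reference, se_method):
    """Variance ratio; values above 1 mean the method beats the reference."""
    return (se_reference / se_method) ** 2


# Hypothetical replicate estimates, for illustration only.
replicates = [1.18, 1.25, 1.31, 1.22, 1.27]
print(relative_bias_percent(replicates, true_or=1.25))
print(empirical_se(replicates))
```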

Conclusion

In the setting described here, method 4 performed best at improving efficiency while adjusting for varying participation rates across areas.

List of abbreviations

EM: Expectation-Maximization

GIS: Geographical Information Systems

ML: Maximum-Likelihood

NO2: Nitrogen dioxide

OR: Odds ratio

Electronic supplementary material

The online version of this article (doi:10.1186/1476-069X-6-34) contains supplementary material, which is available to authorized users.