Assessing the impact of population stratification on association studies of rare variation.

Abstract

AIMS:

The study of rare variants, which may explain a large proportion of heritability, has emerged as an important topic in human gene mapping of complex diseases. Although several statistical methods have been developed to increase the power to detect disease-related rare variants, none of these methods addresses an important issue that often arises in genetic studies: false positives due to population stratification. Using simulations, we investigated the impact of population stratification on false-positive rates of rare-variant association tests.

METHODS:

We simulated a series of case-control studies assuming various sample sizes and levels of population structure. Using these data, we examined the impact of population stratification on collapsing and burden tests of rare variation. We further evaluated the ability of two existing methods (principal component analysis and genomic control) to correct for stratification in such rare-variant studies.
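The simulation design described above can be illustrated with a minimal sketch. It is not the authors' simulation or their CMC or burden implementation: it uses a simple carrier-indicator collapse (a subject is scored 1 if they carry any rare allele in the region) followed by a 2x2 Pearson chi-square test. The carrier probabilities in `CARRIER_PROB`, the function names, and the choice to assign ancestry at random (so the ancestry fractions hold in expectation only) are all assumptions made for illustration.

```python
import random

# Assumed (hypothetical) probabilities of carrying at least one rare allele
# in the region, by ancestry. These values are illustrative only.
CARRIER_PROB = {"AFR": 0.15, "EUR": 0.05}
CHI2_CRIT_1DF = 3.841  # 5% critical value of the chi-square distribution, 1 df


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / margins


def count_carriers(n, afr_frac, rng):
    """Draw n subjects, each African-ancestry with probability afr_frac; count carriers."""
    carriers = 0
    for _ in range(n):
        ancestry = "AFR" if rng.random() < afr_frac else "EUR"
        if rng.random() < CARRIER_PROB[ancestry]:
            carriers += 1
    return carriers


def type1_error(case_afr_frac, n_per_group=500, n_rep=500, seed=1):
    """Fraction of null replicates rejected at the 5% level.

    Carrier status depends on ancestry only, never on disease status,
    so every rejection is a false positive.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_rep):
        case_carriers = count_carriers(n_per_group, case_afr_frac, rng)
        ctrl_carriers = count_carriers(n_per_group, 0.5, rng)  # controls: 50% AFR
        stat = chi_square_2x2(
            case_carriers, n_per_group - case_carriers,
            ctrl_carriers, n_per_group - ctrl_carriers,
        )
        if stat > CHI2_CRIT_1DF:
            rejections += 1
    return rejections / n_rep
```

Comparing `type1_error(0.5)` with `type1_error(0.9)` contrasts ancestry-matched and ancestry-mismatched designs. Under the assumed carrier probabilities, the mismatched design should reject the null noticeably more often, even though no variant affects disease risk.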

RESULTS:

We found that population stratification can have a significant influence on studies of rare variants, especially when the sample size is large and the population is severely stratified. Principal component analysis performed well in most situations, while genomic control often yielded conservative results.
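For readers unfamiliar with genomic control, the correction can be sketched as follows. The inflation factor λ is estimated as the median of a set of 1-df chi-square test statistics divided by the median of the chi-square distribution with 1 df (about 0.455), and each statistic is then divided by λ. How the authors chose the statistics used to estimate λ is not stated here, so the input list, the function name `genomic_control`, and the convention of never letting λ fall below 1 are assumptions of this sketch.

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Return (lambda, corrected statistics) for a list of 1-df chi-square statistics.

    lambda is the observed median divided by the expected null median.
    It is floored at 1.0 (a common convention, assumed here) so that the
    correction never inflates the statistics.
    """
    lam = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)
    corrected = [s / lam for s in chi2_stats]
    return lam, corrected
```

Because every statistic is scaled by the same factor, a λ that overestimates the inflation affecting a particular test will shrink that test's statistic too much, which is one way a uniform correction of this kind can become conservative.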

CONCLUSIONS:

Our results imply that researchers need to carefully match cases and controls on ancestry in order to avoid false positives caused by population structure in studies of rare variants, particularly if genome-wide data are not available.

Figure 1. Type 1 error rate for the CMC and burden tests, uncorrected or corrected by principal component analysis and genomic control, for a 10 kb region. 1-A: 500 cases and 500 controls collapsed by the CMC method; 1-B: 500 cases and 500 controls collapsed by the burden test; 1-C: 1000 cases and 1000 controls collapsed by the CMC method; 1-D: 1000 cases and 1000 controls collapsed by the burden test. Note that in all simulations, 50% of controls have African ancestry and 50% have European ancestry, while the proportion of cases with African ancestry varies across simulations (X-axis).

Figure 2. Power of rare-variant sequencing studies subject to population stratification. 2-A: θ (the odds ratio of disease risk between African and European subjects) = 1, collapsed by the CMC method; 2-B: θ = 4, collapsed by the CMC method; 2-C: θ = 1, collapsed by the burden test; 2-D: θ = 4, collapsed by the burden test.