Imputation without doing imputation: a new method for the detection of non-genotyped causal variants.

Howey R, Cordell HJ - Genet. Epidemiol. (2014)

Bottom Line:
This observation motivates popular but computationally intensive approaches based on imputation or haplotyping. These two SNPs are used as predictors in linear or logistic regression analysis to generate a final significance test. Previous analysis showed that fine-scale sequencing of a Gambian reference panel in the region of the known causal locus, followed by imputation, increased the signal of association to genome-wide significance levels.

fig02: Results from a single power replicate (top plot) and a single type I error replicate (bottom plot) of Scenario 1. Gray crosses show the results obtained from imputation (Imp), black dots show the results obtained from AI and black crosses show the results obtained from single-SNP logistic regression (LR) analysis in PLINK.

Mentions:
Interestingly, Figure 1 shows that, with imputation, the proportion of replicates showing type I errors is much larger than with single-SNP logistic regression, haplotype analysis, and AI. Thus, although the left-hand panels of Figure 1 suggest that imputation generally has slightly higher detection power than AI, this higher detection power is achieved at the expense of a higher number (although not necessarily a higher rate) of type I errors. We attribute this phenomenon to the larger multiple testing burden that is incurred when carrying out imputation, on account of the larger number of SNPs that are imputed (and therefore tested). To illustrate this phenomenon, we present in Figure 2 the results seen within a single simulation replicate of power (top plot) or type I error (bottom plot). In the power replicate, both imputation and AI are successful in detecting the simulated disease SNP. However, in both the power replicate and the type I error replicate, the much larger number of tests performed when using imputation results in a much higher likelihood of observing a false detection (at any given significance level) within any region, compared with single-SNP analysis or AI, even though we anticipate that the nominal (i.e., per-SNP) type I error rate for all three methods should be the same.
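
The multiple testing argument above can be illustrated with a minimal sketch. It assumes the tests within a region are independent, which real SNPs in linkage disequilibrium are not, so the numbers only indicate the direction of the effect. The function name, the per-SNP threshold, and the SNP counts are hypothetical values chosen for illustration and do not come from the paper.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive among n_tests
    independent tests, each carried out at per-test level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Hypothetical values, not taken from the paper.
alpha = 1e-4        # nominal per-SNP significance level
n_genotyped = 100   # SNPs tested in a region by single-SNP analysis
n_imputed = 2000    # SNPs tested in the same region after imputation

fwer_genotyped = familywise_error_rate(alpha, n_genotyped)
fwer_imputed = familywise_error_rate(alpha, n_imputed)

print(f"P(>=1 false detection), genotyped SNPs only: {fwer_genotyped:.4f}")
print(f"P(>=1 false detection), with imputed SNPs:   {fwer_imputed:.4f}")
```

Under these assumptions the per-SNP error rate is identical in both settings, yet the chance of at least one false detection in the region grows with the number of SNPs tested, which matches the distinction drawn above between the number and the rate of type I errors.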
