The effects of linkage disequilibrium in large scale SNP datasets for MDR.

Grady BJ, Torstenson ES, Ritchie MD - BioData Min (2011)

Bottom Line:
In this study, we examined the effect of LD on the sensitivity of the Multifactor Dimensionality Reduction (MDR) software package. Higher levels of LD begin to confound the MDR algorithm and lead to a drop in sensitivity with respect to the identification of a direct association; it does not, however, affect the ability to detect indirect association. As such, the results of MDR analysis in datasets with LD should be carefully examined to consider the underlying LD structure of the dataset.

Background: In the analysis of large-scale genomic datasets, an important consideration is the power of analytical methods to identify accurate predictive models of disease. When trying to assess sensitivity from such analytical methods, a confounding factor up to this point has been the presence of linkage disequilibrium (LD). In this study, we examined the effect of LD on the sensitivity of the Multifactor Dimensionality Reduction (MDR) software package.

Results: Four relative amounts of LD were simulated in multiple one- and two-locus scenarios in which the position of the functional SNP(s) within LD blocks varied. Simulated data were analyzed with MDR to determine the sensitivity of the method in different contexts, where sensitivity was gauged as the number of times out of 100 that the method identified the correct one- or two-locus model as the best overall model. As the amount of LD increases, the sensitivity of MDR to detect the correct functional SNP drops, but the sensitivity to detect the disease signal and find an indirect association increases.
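
The sensitivity measure described in the Results can be illustrated with a minimal Python sketch. The function names, SNP identifiers, LD-block labels and replicate counts below are all hypothetical and are not taken from the study. "Indirect" sensitivity is assumed here to mean that the chosen model falls in the same LD block(s) as the functional SNP(s), which is one plausible reading of the abstract rather than the authors' stated definition.

```python
def direct_sensitivity(best_models, functional_snps):
    """Fraction of replicates whose best model is exactly the functional SNP set."""
    target = frozenset(functional_snps)
    hits = sum(1 for model in best_models if frozenset(model) == target)
    return hits / len(best_models)


def block_sensitivity(best_models, functional_snps, block_of):
    """Fraction of replicates whose best model lies in the functional LD block(s).

    Assumption: an 'indirect' hit means every SNP in the chosen model sits in
    the same LD block as one of the functional SNPs.
    """
    target_blocks = sorted(block_of[snp] for snp in functional_snps)
    hits = sum(
        1
        for model in best_models
        if sorted(block_of[snp] for snp in model) == target_blocks
    )
    return hits / len(best_models)


# Hypothetical outcome of 100 simulated replicates for a one-locus scenario.
block_of = {"snpA1": "blockA", "snpA2": "blockA", "snpB1": "blockB"}
best_models = [("snpA1",)] * 60 + [("snpA2",)] * 30 + [("snpB1",)] * 10

direct = direct_sensitivity(best_models, ("snpA1",))
indirect = block_sensitivity(best_models, ("snpA1",), block_of)
print(f"direct: {direct:.2f}, block-level: {indirect:.2f}")
```

In this made-up example the functional SNP itself is chosen in 60 of 100 replicates, while a SNP from the correct block is chosen in 90, which mirrors the qualitative pattern the abstract reports under higher LD.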

Conclusions: Higher levels of LD begin to confound the MDR algorithm and lead to a drop in sensitivity with respect to the identification of a direct association; it does not, however, affect the ability to detect indirect association. Careful examination of the solution models generated by MDR reveals that MDR can identify loci in the correct LD block, though the selected locus is not always the functional SNP. As such, the results of MDR analysis in datasets with LD should be carefully examined to consider the underlying LD structure of the dataset.

Figure 6: D' and r² for one-locus models chosen by MDR. D' and r² between models selected by MDR and the functional locus in cases where MDR picked a one-locus model. Points with no transparency indicate a count of at least 20 models.
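
For readers less familiar with the two LD measures named in this caption, the following Python sketch computes D' and r² for a pair of biallelic loci from allele and haplotype frequencies, using the standard textbook definitions. The function name and the example frequencies are illustrative assumptions, not values from the study.

```python
def ld_measures(p_a, p_b, p_ab):
    """Return (D', r^2) for two biallelic loci.

    p_a, p_b : frequency of the chosen allele at locus 1 and at locus 2
    p_ab     : frequency of the haplotype carrying both chosen alleles
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    if not (max(0.0, p_a + p_b - 1.0) <= p_ab <= min(p_a, p_b)):
        raise ValueError("haplotype frequency is inconsistent with allele frequencies")

    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max  # D scaled by its maximum given the allele frequencies
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))  # squared allelic correlation
    return d_prime, r2


# Hypothetical pair: allele frequencies 0.2 and 0.3 (within 0.1 of each other),
# with every copy of the rarer allele on a haplotype carrying the other allele.
d_prime, r2 = ld_measures(p_a=0.2, p_b=0.3, p_ab=0.2)
print(f"D' = {d_prime:.3f}, r^2 = {r2:.3f}")
```

With these made-up frequencies D' is at its maximum of about 1 while r² is only about 0.58, which shows how a pair of SNPs can look tightly linked by D' yet only moderately correlated by r² when their allele frequencies differ.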

Mentions:
In drawing conclusions from the research presented in this paper, we wish to make recommendations about the future use of MDR for gene-gene interaction analysis in data with substantial LD among the SNPs. We propose that the linkage disequilibrium structure surrounding MDR results be carefully considered before undertaking a follow-up study, including patterns of both D' and r². We found that MDR sometimes selects SNPs that have a high D' with the functional SNP even when the r² would not be considered strong (Figures 6 and 7). Figure 6 shows the r² and D' between the single embedded disease SNP and all single-locus best models chosen during MDR analysis. Figure 7 displays the r² and D' between both functional SNPs participating in epistatic interactions and each of the SNPs selected by MDR, restricted to cases where the best model chosen was a two-locus model. SNPs selected with disproportionate D' and r² values tend to have a minor allele frequency (MAF) within 0.1 of that of the functional locus, but the resulting low r² might preclude follow-up of the functional locus these SNPs are tagging. This scenario is seen much more often as the size of the genetic effect increases; in such cases, MDR has a higher probability of selecting tagging SNPs with lower r² values. We therefore recommend considering the area around significant SNPs that have high D', as well as areas with high r², at least for the purposes of replication. In addition, it might be useful to pick a single tagging SNP for regions with extremely high levels of LD (r² > 0.90) when performing gene-gene interaction analysis with MDR, because such high LD could result in different best models being chosen in separate cross-validation intervals during the MDR process. This would in turn lower the cross-validation consistency of each model containing loci that tag the disease signal and reduce the detection sensitivity of MDR. While current large-scale genetic association studies benefit from segments of linkage disequilibrium across the genome, it is important during quality control and analysis to consider the possible detrimental effects of this disequilibrium.
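
The recommendation to keep one tagging SNP per region of very high LD could be applied before an MDR run with a simple greedy pruning pass, sketched below in Python. This is a generic pruning heuristic and not the authors' procedure or a feature of the MDR software. The function name, SNP identifiers and pairwise r² values are hypothetical, and treating unlisted pairs as unlinked (r² = 0) is an assumption made for brevity.

```python
def pick_tag_snps(snps, pairwise_r2, threshold=0.90):
    """Greedy pruning: keep a SNP only if its r^2 with every SNP already kept
    is at or below the threshold.

    snps        : SNP identifiers in the order they should be considered
    pairwise_r2 : dict mapping frozenset({snp_i, snp_j}) -> r^2
                  (pairs not listed are assumed to have r^2 = 0)
    """
    kept = []
    for snp in snps:
        in_high_ld = any(
            pairwise_r2.get(frozenset((snp, other)), 0.0) > threshold
            for other in kept
        )
        if not in_high_ld:
            kept.append(snp)
    return kept


# Hypothetical region: snp1, snp2 and snp3 are nearly redundant; snp4 is not.
snps = ["snp1", "snp2", "snp3", "snp4"]
pairwise_r2 = {
    frozenset(("snp1", "snp2")): 0.97,
    frozenset(("snp1", "snp3")): 0.93,
    frozenset(("snp2", "snp3")): 0.95,
    frozenset(("snp3", "snp4")): 0.40,
}

tags = pick_tag_snps(snps, pairwise_r2)
print(tags)
```

Which SNP survives from a high-LD cluster depends on the input order, so in practice the list would be sorted by some preference (for example call rate or prior evidence) before pruning. Any SNP removed this way should still be kept in mind at the follow-up stage, since the text above stresses that the functional locus may be one of the SNPs being tagged.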
