Bottom Line:
Here we develop and evaluate a Bayesian hierarchical regression method that incorporates prior information on the likelihood of variant causality through weighting of variant effects. By simulation studies using both simulated and real sequence variants, we compared a standard single variant test for analyzing variant-disease association with the proposed method using different weighting schemes. We found that by leveraging linkage disequilibrium of variants with known GWAS signals and sequence conservation (phastCons), the proposed method provides a powerful approach for detecting causal variants while controlling false positives.

Affiliation: Center for Human Genome Variation, Duke University School of Medicine, Durham, North Carolina, United States of America. n.long@duke.edu

ABSTRACT: Although many methods are available to test sequence variants for association with complex diseases and traits, methods that specifically seek to identify causal variants are less developed. Here we develop and evaluate a Bayesian hierarchical regression method that incorporates prior information on the likelihood of variant causality through weighting of variant effects. By simulation studies using both simulated and real sequence variants, we compared a standard single variant test for analyzing variant-disease association with the proposed method using different weighting schemes. We found that by leveraging linkage disequilibrium of variants with known GWAS signals and sequence conservation (phastCons), the proposed method provides a powerful approach for detecting causal variants while controlling false positives.

pcbi-1003093-g003: Causal variant detection in the exome sequencing data analysis. (A): NOD2 data; (B): ITPA data. The two top panels are from one replicate of the simulation. For the single variant test, SNP effect size is represented by the −log10 p value from a logistic regression model; for the Bayesian liability model, it is represented by the standardized effect estimated at each SNP. Red dots indicate the two causal variants (see Table 1 for more information). Blue vertical bars show the values of the SNP weights (r × phastCons). The horizontal dashed line indicates the effect size at the significance threshold (permutation p value = 0.01). The bottom panel shows the proportion of simulations in which a variant was detected (i.e., significant at the permutation p = 0.01 level). Causal variants are marked in red.
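
The r × phastCons weight shown as blue bars in the figure can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that r is the absolute Pearson correlation between each variant's genotype and the genotype at a known GWAS SNP, and that phastCons scores are supplied as values in [0, 1]. The function names `ld_r` and `variant_weights` and the toy data are hypothetical.

```python
import numpy as np

def ld_r(genotypes, gwas_index):
    """Absolute Pearson correlation of every variant with the GWAS SNP.

    genotypes: (n_individuals, n_variants) array of allele counts (0, 1, 2).
    gwas_index: column index of the known GWAS SNP (assumed to be in the data).
    Monomorphic variants get r = 0 because their correlation is undefined.
    """
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)
    ref = centered[:, gwas_index]
    num = centered.T @ ref
    denom = np.sqrt((centered ** 2).sum(axis=0) * (ref ** 2).sum())
    r = np.zeros(g.shape[1])
    ok = denom > 0
    r[ok] = np.abs(num[ok] / denom[ok])
    return r

def variant_weights(genotypes, gwas_index, phastcons):
    """Per-variant weight: LD with the GWAS signal times conservation score."""
    return ld_r(genotypes, gwas_index) * np.asarray(phastcons, dtype=float)

# Toy data: variant 1 is in perfect LD with the GWAS SNP (variant 0),
# variant 2 is monomorphic.
toy_genotypes = [[0, 0, 1],
                 [1, 1, 1],
                 [2, 2, 1],
                 [0, 0, 1]]
toy_phastcons = [0.5, 0.8, 0.9]
weights = variant_weights(toy_genotypes, 0, toy_phastcons)
```

Under these assumptions, a variant receives a large weight only when it is both in strong LD with the GWAS signal and located at a conserved position.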

Mentions:
In Figure 3 (A) and (B), we first show a Manhattan plot from the single variant test and one from the Bayesian liability model with the r×phastCons weight, based on one representative example of the 100 simulated data sets. We then summarize results from both methods by displaying, for each candidate variant, the proportion of the 100 simulations in which it was detected (i.e., declared significant).
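
The per-variant detection proportion described above can be sketched in Python as follows. This is only an illustration: it assumes each replicate yields one statistic per variant (for example, −log10 p or a standardized effect) and one significance threshold per replicate, already obtained by permutation. How that threshold is derived is not covered here. The function name `detection_proportion` and the toy numbers are hypothetical.

```python
import numpy as np

def detection_proportion(stats, thresholds):
    """Fraction of replicates in which each variant exceeds the threshold.

    stats: (n_replicates, n_variants) array of per-variant statistics.
    thresholds: (n_replicates,) array, one significance cutoff per replicate.
    """
    stats = np.asarray(stats, dtype=float)
    cutoffs = np.asarray(thresholds, dtype=float).reshape(-1, 1)
    detected = stats > cutoffs          # boolean matrix: replicate x variant
    return detected.mean(axis=0)        # average over replicates

# Toy example with 4 replicates and 2 variants.
toy_stats = [[3.0, 0.5],
             [2.5, 1.2],
             [0.9, 0.1],
             [4.0, 2.2]]
toy_thresholds = [2.0, 2.0, 2.0, 2.0]
proportions = detection_proportion(toy_stats, toy_thresholds)
```

In this toy input, the first variant exceeds the cutoff in three of four replicates and the second in one of four, which is the kind of quantity summarized in the bottom panel of the figure.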
