Many common diseases are complex genetic traits determined by the interplay of numerous genes and their interactions with environmental factors. One expectation of genome-wide association studies was that they would provide the decoder for these complex genetic diseases, which, together with information about environmental exposure, could be used to compute an individual's risk for a disease and to suggest appropriate prophylactic treatments or lifestyle changes. While genetic and exposure data may provide the information needed to decipher complex diseases, constructing comprehensive models that inform about the disease risk of a specific patient from such data has proved very challenging. Many statistical methods to identify gene x environment interactions have been proposed, but sample size, efficient analysis, and computational feasibility remain challenges. Furthermore, interpreting models with many interactions and translating them into useful tools for clinicians is difficult. Supported by NIH/NHLBI R01 HL87681-01, "Genome-Wide Association Studies in Sickle Cell Anemia and in Centenarians", we introduced a novel Bayesian approach to estimate the genetic predisposition to exceptional longevity using more than 200 SNPs discovered in a genome-wide association study of exceptional longevity. The approach used an ensemble of models for prediction and to compute an individual's genetic risk profile, which provides a graphical display of the relative contribution of each SNP to the prediction. We demonstrated that cluster analysis of the genetic risk profiles can help dissect the complex phenotype of exceptional longevity into sub-phenotypes with characteristic genetic signatures.
Here we propose to expand this approach to include the effect of environmental factors, and to evaluate the method in genome-wide association studies of exceptional longevity and of phenotypic diversity in sickle cell anemia, in three specific aims.
Specific Aim 1: Development of a novel class of Bayesian genetic risk models that include gene x environment interactions. We propose to expand the class of Bayesian directed graphical models to include non-genetic risk factors and their multiple interactions with genetic variants, and to develop a search procedure to build these models from data. The approach will be applicable to massive data sets, such as those produced by next-generation sequencing technology. The Bayesian models generated with this approach will produce gene x environment risk profiles that can be used to graphically display and interpret the joint effect of genes and environment on an individual's risk for disease.
Specific Aim 2: Discovery of gene x environment signatures of complex traits. We will develop a method to cluster the risk profiles obtained in Specific Aim 1 and discover gene x environment signatures of complex traits. This method is relevant because it will help summarize the complex interactions between many genetic variants and risk factors at the population level, and clarify the relative contribution of genes and environment to disease prevalence.
Specific Aim 3: Implementation and Evaluation. We will implement the procedures in an open-source statistical package written in R. We will evaluate the procedures on simulated data and on two real genome-wide association studies, of exceptional longevity and of phenotypic diversity in sickle cell anemia, for which we have rich phenotypic, genetic and exposure data and the opportunity for independent replication of the findings.
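To make the idea of a gene x environment risk profile in Specific Aim 1 concrete, the following is a minimal illustrative sketch and not the proposed method: it assumes a naive-Bayes-style model in which each SNP genotype depends on case status and a single binary exposure. The function names (`fit_tables`, `risk_profile`, `posterior_case`), the toy records, and the smoothing parameter `alpha` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy records: (case status, binary exposure, genotypes coded 0/1/2).
records = [
    (1, 1, (2, 0)), (1, 1, (2, 1)), (1, 1, (2, 0)), (1, 0, (0, 0)),
    (0, 1, (0, 0)), (0, 1, (0, 1)), (0, 1, (1, 0)), (0, 0, (0, 1)),
]

def fit_tables(records, n_snps, alpha=1.0):
    """Estimate P(genotype | status, exposure) per SNP with Dirichlet(alpha) smoothing."""
    counts = [Counter() for _ in range(n_snps)]
    for status, exposure, genotypes in records:
        for j, g in enumerate(genotypes):
            counts[j][(status, exposure, g)] += 1

    def prob(j, g, status, exposure):
        num = counts[j][(status, exposure, g)] + alpha
        den = sum(counts[j][(status, exposure, k)] for k in (0, 1, 2)) + 3 * alpha
        return num / den

    return prob

def risk_profile(prob, exposure, genotypes):
    """Per-SNP log-likelihood ratios (case vs. control), conditional on exposure."""
    return [
        math.log(prob(j, g, 1, exposure) / prob(j, g, 0, exposure))
        for j, g in enumerate(genotypes)
    ]

def posterior_case(profile, prior=0.5):
    """Combine the per-SNP contributions with the prior odds into P(case)."""
    log_odds = math.log(prior / (1 - prior)) + sum(profile)
    return 1 / (1 + math.exp(-log_odds))

prob = fit_tables(records, n_snps=2)
profile = risk_profile(prob, exposure=1, genotypes=(2, 0))
posterior = posterior_case(profile)
print(profile, posterior)
```

Each entry of `profile` is the contribution of one SNP to the log odds of being a case given the exposure, which is the quantity a graphical risk profile would display; vectors of this kind are also what a clustering step such as the one in Specific Aim 2 would take as input.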

Public Health Relevance

Many statistical methods to identify gene x environment interactions have been proposed, but sample size, efficient analysis, and computational feasibility remain challenges. We propose a novel approach to build genetic risk models that include the effect of environmental factors, and to evaluate the method in genome-wide association studies of exceptional longevity and of phenotypic diversity in sickle cell anemia. Our proposal will deliver a general class of genetic risk models, an approach for dissecting the effects of genes and environment, an implementation of these methods in open-source software, and a better understanding of the role of genes and the environment in long and healthy lives and in the different phenotypes of sickle cell anemia.