Studies of human genetic variation over the last decade have revealed that mixture events between highly diverged population groups (archaic admixture), such as between Neandertals and non-Africans, have been a common occurrence and are likely to have had a major impact on human phenotypes. For example, analyses of individual loci, such as the MHC locus, have documented this phenotypic impact. However, a lack of adequate analytical tools has hindered a systematic understanding of the phenotypic impact of archaic admixture. This K99/R00 proposal aims to develop and validate statistical methods to infer the genetic structure arising from archaic admixture and to leverage this structure to identify genetic variants introduced by archaic admixture that influence phenotypes. Insights from the application of these methods will not only produce a more complete understanding of the genetic factors underlying complex phenotypes, such as common diseases, but will also ensure that currently under-served minority populations, many of whom descend from admixture events or from ancestral groups distinct from those of Europeans, can be studied just as effectively as populations of European descent and can benefit from the discoveries of genomic medicine. The first goal of this proposal is to extend and validate our current statistical model for accurate inference of local ancestry under archaic admixture. The model integrates a large number of patterns of genetic variation using the statistical framework of Conditional Random Fields (CRF). An important first application of this model is the inference of Neandertal local ancestry in non-African populations. The inferred Neandertal ancestry will be leveraged for the second goal: to associate Neandertal variants with specific phenotypes.
This goal will be pursued by analyzing a custom array designed to capture Neandertal-derived variants and by extending the CRF to infer Neandertal ancestry from SNP genotyping arrays rather than from next-generation sequencing. A complementary approach for studying the action of natural selection on Neandertal variants, using a novel diffusion process-based statistical test, will also be explored. Finally, the CRF will be generalized to handle multiple ancestral populations, as well as the case where no reference genomes are available for the ancestral populations, and will be tested and validated on an important example of each case: Denisovan admixture into Melanesian populations, and sub-Saharan African populations that show evidence of unknown archaic ancestry. All of the methods and the results from this research will be made publicly available.

Public Health Relevance

Although mixture events between highly diverged population groups (archaic admixture) are now known to have been common throughout human history and are likely to have had a major impact on human phenotypes, this impact has not been systematically studied because powerful analytical tools have been lacking. The proposed research will develop and validate statistical methods to understand the phenotypic impact of archaic admixture. Insights from the application of these methods will not only produce a more complete understanding of genetic variants that modulate complex phenotypes, such as common diseases, but will also ensure that currently under-served minority populations, many of whom descend from admixture events or from ancestral groups distinct from those of Europeans, can be studied just as effectively as populations of European descent and can benefit from the discoveries of genomic medicine.