Department of Integrative Biology, University of California Berkeley, Berkeley, CA.

2 Department of Molecular Biotechnology and Health Sciences, University of Torino, Turin, Italy.

3 School of Natural Sciences, University of California Merced, Merced, CA.

Abstract

Comparisons of DNA from archaic and modern humans show that these groups interbred and that, in some cases, modern humans gained an evolutionary advantage from doing so. This process, adaptive introgression, may lead to a faster rate of adaptation than is predicted by models with mutation and selection alone. In the last few years, several studies have identified regions of the genome that are likely examples of adaptive introgression. In many cases, once a region was ascertained as introgressed, commonly used statistics based on haplotype and allele frequency information were employed to test for positive selection. Introgression by itself, however, changes both the haplotype structure and the distribution of allele frequencies, thus confounding traditional tests for detecting positive selection. Patterns generated by introgression alone may therefore lead to false inferences of positive selection. Here we explore models involving both introgression and positive selection to investigate the behavior of various statistics under adaptive introgression. In particular, we find that the number and allele frequencies of sites that are uniquely shared between archaic humans and specific present-day populations are particularly useful for detecting adaptive introgression. We then examine the 1000 Genomes dataset to characterize the landscape of uniquely shared archaic alleles in human populations. Finally, we identify regions that were likely subject to adaptive introgression and discuss some of the most promising candidate genes located in these regions.

Receiver operating characteristic (ROC) curves for a scenario of adaptive introgression (s = 0.1) compared with a scenario of neutrality (s = 0), using 1,000 simulations for each case. Populations A and B split from each other 4,000 generations ago, and their ancestral population split from population C 16,000 generations ago. Population sizes were constant and set at . The admixture event occurred 1,600 generations ago, from population C into population B, at a rate of 2% (top panels) or 25% (bottom panels). The right panels are zoomed-in versions of the left panels.
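One way to build ROC curves like these is to treat each simulation's statistic as a score and sweep a detection threshold over it. The sketch below is a minimal Python illustration under our own assumptions: larger values of the statistic are taken to indicate adaptive introgression, and the helper names `roc_points` and `auc` are hypothetical, not taken from the original analysis.

```python
from typing import List, Sequence, Tuple


def roc_points(neutral: Sequence[float], selected: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs for every threshold.

    A simulation is called positive when its statistic is >= the threshold.
    Thresholds are the distinct observed values, scanned from high to low,
    so the curve runs from (0, 0) to (1, 1).
    """
    thresholds = sorted(set(neutral) | set(selected), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        fpr = sum(v >= t for v in neutral) / len(neutral)    # neutral sims wrongly flagged
        tpr = sum(v >= t for v in selected) / len(selected)  # selected sims correctly flagged
        points.append((fpr, tpr))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the curve by the trapezoid rule (points already sorted by FPR)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

With 1,000 values of a statistic simulated under s = 0 and 1,000 under s = 0.1, `roc_points(neutral, selected)` traces the curve and `auc` summarizes it; an area near 0.5 means the statistic does not separate the two scenarios.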

Joint distribution of the U and Q95 statistics for different choices of w (1%, 10%) and x (20%, 50%). We set y to 100% in all cases. We sampled 100 individuals from population A, 100 from population B, and 2 from population C. The demographic parameters were the same as in .
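The U and Q95 statistics can be sketched from per-site archaic allele frequencies. This minimal Python version rests on our reading of the parameters, not on a verbatim definition from the text: w is assumed to be an upper frequency cutoff in the non-introgressed population A, x a lower frequency cutoff in the target population B, and y the archaic allele frequency required in population C. U counts sites passing all three filters, and Q95 is taken as the 95% quantile of target-population frequencies at sites passing the A and C filters. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile (the same rule as NumPy's default); NaN if empty."""
    xs: List[float] = sorted(values)
    if not xs:
        return math.nan
    pos = q * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def u_stat(p_a: Sequence[float], p_b: Sequence[float], p_c: Sequence[float],
           w: float, x: float, y: float) -> int:
    """Number of sites with archaic-allele frequency < w in A, > x in B, and >= y in C."""
    return sum(1 for a, b, c in zip(p_a, p_b, p_c) if a < w and b > x and c >= y)


def q95_stat(p_a: Sequence[float], p_b: Sequence[float], p_c: Sequence[float],
             w: float, y: float) -> float:
    """95% quantile of archaic-allele frequencies in B at sites with < w in A and >= y in C."""
    kept = [b for a, b, c in zip(p_a, p_b, p_c) if a < w and c >= y]
    return quantile(kept, 0.95)
```

Under these assumed definitions, the w = 1%, x = 20% panel corresponds to `u_stat(p_a, p_b, p_c, w=0.01, x=0.2, y=1.0)` paired with `q95_stat(p_a, p_b, p_c, w=0.01, y=1.0)`.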

We computed the number of uniquely shared sites, in the autosomes and on the X chromosome, between particular archaic humans and different present-day non-African panels X (x-axis) from phase 3 of the 1000 Genomes Project, using a shared frequency cutoff of 0% (A), 20% (B), and 50% (C). Nea-only: Neanderthal-specific alleles. Den-only: Denisovan-specific alleles. Nea-all: alleles present in the Neanderthal genome. Den-all: alleles present in the Denisovan genome. Both: alleles present in both archaic genomes. Finally, we scaled each of the statistics in panels A–C by the number of segregating sites in each 1000 Genomes population panel, yielding panels D–F.
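Counting uniquely shared sites at several cutoffs and then scaling by the panel's segregating sites can be sketched as below. The inputs are assumptions made for illustration: per-site archaic allele frequencies in the African outgroup and in panel X, a boolean flag marking sites where the archaic genome is homozygous for the allele, and an outgroup cutoff w = 1% chosen only as an example. The function names are hypothetical.

```python
from typing import Dict, Sequence


def uniquely_shared_by_cutoff(
    p_out: Sequence[float],
    p_panel: Sequence[float],
    archaic_hom: Sequence[bool],
    cutoffs: Sequence[float],
    w: float = 0.01,
) -> Dict[float, int]:
    """Count sites where the archaic allele is rare in the outgroup (< w),
    homozygous in the archaic genome, and above each frequency cutoff in the panel."""
    return {
        x: sum(1 for a, b, h in zip(p_out, p_panel, archaic_hom) if h and a < w and b > x)
        for x in cutoffs
    }


def scale_by_segregating(counts: Dict[float, int], p_panel_all: Sequence[float]) -> Dict[float, float]:
    """Divide each count by the number of sites segregating in the panel (0 < p < 1)."""
    n_seg = sum(1 for b in p_panel_all if 0.0 < b < 1.0)
    if n_seg == 0:
        raise ValueError("panel has no segregating sites")
    return {x: c / n_seg for x, c in counts.items()}
```

Scaling matters because panels differ in sample size and diversity, so raw counts of shared sites are not directly comparable across panels.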

For each population panel from the 1000 Genomes Project, we jointly plotted the U and Q95 statistics with an archaic frequency cutoff of within each population. For each category (Nea-only, Den-only, Nea-all, Den-all, and Both), U and Q95 were computed on the corresponding set of archaic alleles.

We partitioned the genome into non-overlapping 40-kb windows. Within each window, we computed the number of uniquely shared sites with EUR and with EAS as the target introgressed population, using Out = EAS + AFR as the outgroup when EUR is the target and Out = EUR + AFR when EAS is the target. We searched for Neanderthal-specific alleles, Denisovan-specific alleles, and alleles present in both archaic genomes that were uniquely shared with either EUR or EAS at frequencies above different cutoffs (x = 0%, 20%, 50%, and 80%). Windows in the upper tail of the distribution for each modern-archaic population pair are colored red (P < 0.001/number of pairs tested); the remaining windows are colored blue, except for those on the X chromosome, which are green. Ovals enclose groups of contiguous windows with uniquely shared alleles. For comparison, the number of high-frequency sites uniquely shared between Denisova and Tibetans is also shown, although Tibetans are not included in the 1000 Genomes data and that region is 32 kb rather than 40 kb long, so this count may be an underestimate.
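The windowing and upper-tail calls can be sketched as follows. This assumes 1-based coordinates with windows starting at k × 40,000 + 1, which is consistent with coordinates such as chr11:120120001–120200000 given for the candidate regions. It also assumes that a window's upper-tail P value is the fraction of all genome windows whose count is at least as large; that empirical definition is our assumption, not a statement of the original method. Function names are hypothetical.

```python
import bisect
from collections import Counter
from typing import Dict, Iterable, List

WINDOW = 40_000  # window size in bp


def window_start(pos: int) -> int:
    """First base of the 40-kb window containing a 1-based position."""
    return (pos - 1) // WINDOW * WINDOW + 1


def count_per_window(positions: Iterable[int]) -> Dict[int, int]:
    """Uniquely shared sites per window (windows with no such sites are absent)."""
    return dict(Counter(window_start(p) for p in positions))


def upper_tail_windows(counts: Dict[int, int], n_windows: int, n_pairs: int,
                       alpha: float = 0.001) -> List[int]:
    """Windows whose empirical upper-tail P value is below alpha / n_pairs.

    n_windows is the total number of windows in the genome, so windows with
    zero uniquely shared sites still count toward the denominator.
    """
    threshold = alpha / n_pairs
    ordered = sorted(counts.values())
    hits = []
    for start, v in counts.items():
        n_at_least = len(ordered) - bisect.bisect_left(ordered, v)  # windows with count >= v
        if n_at_least / n_windows < threshold:
            hits.append(start)
    return sorted(hits)
```

Dividing alpha by the number of modern-archaic pairs tested mirrors the caption's P < 0.001/number of pairs threshold, a Bonferroni-style correction.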

We plotted the 40-kb regions that fall above the 99.9% quantile of both the U and Q95 statistics, for different choices of target introgressed population (Pop), outgroup non-introgressed population (Out), and archaic allele frequency cutoff within the target population (x). (A) Extreme regions for the continental populations EUR (Out = EAS + AFR), EAS (Out = EUR + AFR), and Eurasians (EUA, Out = AFR), using a cutoff x of 20%. (B) Extreme regions from the same statistics as in panel A, but with a more stringent cutoff x of 50%. (C) Extreme regions for individual non-African populations in the 1000 Genomes data, using all African populations (excluding African-Americans) as the outgroup and a cutoff x of 20%. (D) Extreme regions from the same statistics as in panel C, but with a more stringent cutoff x of 50%. For each category (Nea-only, Den-only, and Both), U and Q95 were computed on the corresponding set of archaic alleles.
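Selecting regions that are extreme for both statistics at once amounts to intersecting two quantile filters. A minimal sketch, assuming per-window U and Q95 values keyed by window start and the same linear-interpolation quantile rule as NumPy's default; "above the 99.9% quantile" is interpreted here as strictly greater than the cutoff, and windows with an undefined (NaN) Q95 are dropped. Names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile (the same rule as NumPy's default); NaN if empty."""
    xs = sorted(values)
    if not xs:
        return math.nan
    pos = q * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def joint_extremes(u_by_window: Dict[int, float], q95_by_window: Dict[int, float],
                   q: float = 0.999) -> List[int]:
    """Windows strictly above the q-quantile of both U and Q95."""
    # Keep windows scored by both statistics; a NaN Q95 means no qualifying sites.
    shared = sorted(k for k in set(u_by_window) & set(q95_by_window)
                    if not math.isnan(q95_by_window[k]))
    u_cut = quantile([u_by_window[k] for k in shared], q)
    q_cut = quantile([q95_by_window[k] for k in shared], q)
    return [k for k in shared if u_by_window[k] > u_cut and q95_by_window[k] > q_cut]
```

Requiring both cutoffs favors windows that carry many uniquely shared archaic alleles and also carry them at high frequency, rather than either signal alone.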

We explored the haplotype structure of six candidate regions with strong evidence for adaptive introgression (AI). For each region, we applied a clustering algorithm to the haplotypes of particular human populations and then ordered the clusters by decreasing similarity to whichever archaic genome had the larger number of uniquely shared sites (see Methods). We also plotted the number of differences to that archaic genome for each human haplotype, sorting haplotypes simply by decreasing similarity. No clustering was performed in this case, so the rows in the cumulative difference plots do not necessarily correspond to the rows in the adjacent haplotype structure plots. POU2F3: chr11:120120001–120200000. BNC2: chr9:16720001–16760000. LARS: chr5:145480001–145520000. FAP/IFIH1: chr2:163040001–163120000. OAS1: chr12:113360001–113400000. LIPA: chr10:90920001–90980000.
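The cumulative difference plots can be approximated by a running count of mismatches between each human haplotype and the archaic sequence, with rows sorted by total differences. This sketch assumes both are encoded as equal-length strings of alleles at the same sites (for example "0"/"1" for ancestral/derived) and treats the archaic genome as a single haploid sequence, which is a simplification; it does not reproduce the clustering step. Names are hypothetical.

```python
from itertools import accumulate
from typing import Dict, List, Tuple


def cumulative_differences(haplotype: str, archaic: str) -> List[int]:
    """Running count of sites where the haplotype differs from the archaic sequence."""
    if len(haplotype) != len(archaic):
        raise ValueError("haplotype and archaic sequence must cover the same sites")
    return list(accumulate(int(h != a) for h, a in zip(haplotype, archaic)))


def sort_by_similarity(haplotypes: Dict[str, str], archaic: str) -> List[Tuple[str, List[int]]]:
    """Rows ordered by decreasing similarity (fewest total differences first; ties by name)."""
    rows = [(name, cumulative_differences(seq, archaic)) for name, seq in haplotypes.items()]
    rows.sort(key=lambda row: (row[1][-1] if row[1] else 0, row[0]))
    return rows
```

Because this ordering ignores the clusters, its row order generally differs from that of a clustered haplotype plot, which is why the two panels need not line up row by row.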