Abstract

Population mixture is an important process in biology. We present a suite of methods for learning about population mixtures, implemented in a software package called ADMIXTOOLS, that support formal tests for whether mixture occurred and make it possible to infer proportions and dates of mixture. We also describe the development of a new single nucleotide polymorphism (SNP) array consisting of 629,433 sites with clearly documented ascertainment that was specifically designed for population genetic analyses and that we genotyped in 934 individuals from 53 diverse populations. To illustrate the methods, we give a number of examples that provide new insights about the history of human admixture. The most striking finding is a clear signal of admixture into northern Europe, with one ancestral population related to present-day Basques and Sardinians and the other related to present-day populations of northeast Asia and the Americas. This likely reflects a history of admixture between Neolithic migrants and the indigenous Mesolithic population of Europe, consistent with recent analyses of ancient bones from Sweden and the sequencing of the genome of the Tyrolean "Iceman."

f-statistics: (A) A simple phylogenetic tree, (B) the additivity of branch lengths; the genetic drift between (A, C) computed using our f-statistic-based methods is the same as the sum of the genetic drifts between (A, B) and (B, C), regardless of the population in which SNPs are ascertained, (C) phylogenetic tree with simple admixture, (D) a more general form of (C), (E) example of an outgroup case, and (F) example of admixture with an outgroup.
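The additivity property in panel (B) can be illustrated numerically. The following is a minimal sketch, not the ADMIXTOOLS implementation: it assumes that f2 is estimated as the mean squared allele-frequency difference across SNPs (without the finite-sample correction), that population B lies on the path between A and C, and that drift on each branch can be mimicked by small independent zero-mean perturbations. The function name `f2` and all simulation parameters are hypothetical.

```python
import random


def f2(p, q):
    """Mean squared allele-frequency difference across SNPs.

    Assumption: no correction for finite sample size is applied.
    """
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


rng = random.Random(0)
n_snps = 20000

# Hypothetical lineage A -> B -> C. Drift on each branch is modeled as an
# independent zero-mean perturbation, kept small so that frequencies stay
# strictly inside (0, 1) and no clipping is needed.
freq_a = [rng.uniform(0.2, 0.8) for _ in range(n_snps)]
freq_b = [a + rng.uniform(-0.05, 0.05) for a in freq_a]
freq_c = [b + rng.uniform(-0.05, 0.05) for b in freq_b]

f2_ab = f2(freq_a, freq_b)
f2_bc = f2(freq_b, freq_c)
f2_ac = f2(freq_a, freq_c)

# The two numbers agree up to sampling noise because the drift increments
# on the two branches are uncorrelated.
print(f2_ab + f2_bc, f2_ac)
```

The cross term between the two branch increments averages to roughly zero, which is why the sum of the two shorter drifts approximates the drift over the whole path.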

D-statistics provide formal tests for whether an unrooted phylogenetic tree applies to the data, assuming that the analyzed SNPs are ascertained as polymorphic in a population that is an outgroup to both populations (Y, Z) that make up one of the clades. (A) A simple unrooted phylogeny, (B) phylogenies in which (Y, Z) and (W, X) are clades that diverge from a common root, (C) phylogenies in which (Y, Z) are a clade and W and X are increasingly distant outgroups, and (D) a phylogeny to test if human Eurasian populations (A, B) form a clade with respect to sub-Saharan Africans (Yoruba).
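As a rough illustration of the statistic behind these tests, the sketch below computes D from per-SNP allele frequencies in four populations, using the ratio-of-sums form with numerator (w − x)(y − z) and denominator (w + x − 2wx)(y + z − 2yz). The function name `d_statistic` and the toy frequencies are hypothetical, and the sketch omits the standard-error computation needed for a formal test.

```python
def d_statistic(w, x, y, z):
    """D-statistic from per-SNP allele frequencies in populations W, X, Y, Z.

    Each argument is a sequence of frequencies in [0, 1], one entry per SNP.
    A value near zero is what is expected if (W, X) and (Y, Z) form clades
    in an unrooted tree.
    """
    num = 0.0
    den = 0.0
    for wi, xi, yi, zi in zip(w, x, y, z):
        num += (wi - xi) * (yi - zi)
        den += (wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
    return num / den


# Toy allele frequencies at two SNPs (hypothetical values).
w = [0.1, 0.5]
x = [0.3, 0.5]
y = [0.2, 0.6]
z = [0.4, 0.6]

d = d_statistic(w, x, y, z)
print(d)
```

Swapping W and X (or Y and Z) flips the sign of D, and the normalization keeps the statistic between -1 and 1.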

Admixture graph fitting: We show an admixture graph fitted by qpGraph for simulated data. We simulated 50,000 unlinked SNPs ascertained as heterozygous in a single diploid individual from the outgroup Out. Sample sizes were 50 in all populations and the historical population sizes were all taken to be 10,000. The true values of parameters are before the colon and the estimated values afterward. Mixture proportions are given as percentages, and branch lengths are given in units of Fst (before the colon) and f2 values (after). F2 and Fst are multiplied by 1000. The fitted admixture weights are exact, up to the resolution shown, while the match of branch lengths to the truth is rather approximate.

rolloff simulation results: We simulated data for 100 individuals of 20% European and 80% African ancestry, where the mixture occurred between 50 and 800 generations ago. Phased data from the HapMap3 CEU and YRI populations were used for the simulations. We performed rolloff analysis using CEU and YRI (A) and using Gujarati and Maasai (B) as reference populations. We plot the true date of mixture (dotted line) against the estimated date computed by rolloff (blue points in A, green points in B). Standard errors were calculated using the weighted block jackknife. To test for bias in the estimated dates, we repeated each simulation 10 times. The estimated date based on the 10 simulations is shown in red.
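The weighted block jackknife mentioned above can be sketched for the simplest case, where the estimator is a mean over per-SNP values and the blocks may have unequal sizes. This is an illustrative sketch under those assumptions, not the rolloff code; the function name `weighted_block_jackknife` and the toy blocks are hypothetical. Each block is deleted in turn, and the leave-one-out estimates are combined with weights that depend on block size.

```python
import math


def weighted_block_jackknife(blocks):
    """Weighted block jackknife for the mean of per-SNP values.

    `blocks` is a list of lists; each inner list holds the values that fall
    in one block (for example, one chromosome or one genomic window).
    Returns (jackknife estimate, standard error).
    """
    sizes = [len(b) for b in blocks]
    sums = [sum(b) for b in blocks]
    n = sum(sizes)   # total number of observations
    g = len(blocks)  # number of blocks
    total = sum(sums)
    theta = total / n  # estimate from all data

    # Leave-one-block-out estimates.
    loo = [(total - s) / (n - m) for s, m in zip(sums, sizes)]

    # Bias-corrected jackknife estimate.
    theta_jack = g * theta - sum((1 - m / n) * t for m, t in zip(sizes, loo))

    # Variance from size-weighted pseudo-values.
    var = 0.0
    for m, t in zip(sizes, loo):
        h = n / m
        pseudo = h * theta - (h - 1) * t
        var += (pseudo - theta_jack) ** 2 / (h - 1)
    var /= g
    return theta_jack, math.sqrt(var)


est, se = weighted_block_jackknife([[1, 2], [3, 4], [5, 6]])
print(est, se)
```

With equal block sizes this reduces to the ordinary delete-one jackknife, so for the mean the standard error equals the sample standard deviation of the block means divided by the square root of the number of blocks.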

rolloff analysis of real data: We applied rolloff to compute admixture LD between all pairs of markers in each admixed population. We plot the correlation as a function of genetic distance for (A) Xhosa, (B) Uygur, (C) Spain, (D) Greece, and (E) CEU and French. The title of each panel includes information about the reference populations that were used for the analysis. We fit an exponential distribution to the output of rolloff to estimate the date of the mixture (estimated dates ±SE shown in years). We do not show inter-SNP intervals of <0.5 cM, as we have found that at this distance admixture LD begins to be confounded by background LD.
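To make the dating step concrete, the sketch below fits a decay curve of the form A·exp(−n·d) to correlation values, where d is genetic distance in Morgans and n is the number of generations since mixture. It is a simplified stand-in rather than the fitting procedure used by rolloff: it assumes noise-free positive correlations and uses a log-linear least-squares fit. The function name `fit_exponential_decay`, the synthetic curve, and the generation time used to convert to years are all assumptions made for illustration.

```python
import math


def fit_exponential_decay(dist_cm, corr):
    """Fit corr = A * exp(-n * d) by least squares on log(corr).

    dist_cm: genetic distances in centimorgans.
    corr:    correlation values, assumed strictly positive.
    Returns (A, n), where n is the estimated number of generations.
    """
    xs = [d / 100.0 for d in dist_cm]  # convert cM to Morgans
    ys = [math.log(c) for c in corr]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic, noise-free curve: amplitude 0.02, mixture 70 generations ago.
# Distances start at 0.5 cM, mirroring the cutoff described above.
dist_cm = [0.5 + 0.1 * i for i in range(200)]
corr = [0.02 * math.exp(-70 * d / 100.0) for d in dist_cm]

amplitude, generations = fit_exponential_decay(dist_cm, corr)

YEARS_PER_GENERATION = 29  # assumed generation time, for illustration only
years = generations * YEARS_PER_GENERATION
print(amplitude, generations, years)
```

With real data the correlations are noisy and can be negative at large distances, so a nonlinear least-squares fit on the untransformed values would be the more robust choice.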

Bell-Beaker culture. On the left we show some Beaker culture objects (from Bruchsal City Museum). On the right we show a map of Bell-Beaker attested sites. We are grateful to Thomas Ihle for the Bruchsal Museum photograph. It is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license and the GNU Free Documentation License. The map is public domain, licensed under a Creative Commons license, and adapted from a map in Harrison (1980).