Inferring human population size and separation history from multiple genome sequences.

Schiffels S, Durbin R - Nat. Genet. (2014)

Bottom Line:
The availability of complete human genome sequences from populations across the world has given rise to new population genetic inference methods that explicitly model ancestral relationships under recombination and mutation. The multiple sequentially Markovian coalescent (MSMC) analyzes the observed pattern of mutations in multiple individuals, focusing on the first coalescence between any two individuals. Results from applying MSMC to genome sequences from nine populations across the world suggest that the genetic separation of non-African ancestors from African Yoruban ancestors started long before 50,000 years ago and give information about human population history as recent as 2,000 years ago, including the bottleneck in the peopling of the Americas and separations within Africa, East Asia and Europe.

Abstract:
The availability of complete human genome sequences from populations across the world has given rise to new population genetic inference methods that explicitly model ancestral relationships under recombination and mutation. So far, application of these methods to evolutionary history more recent than 20,000-30,000 years ago and to population separations has been limited. Here we present a new method that overcomes these shortcomings. The multiple sequentially Markovian coalescent (MSMC) analyzes the observed pattern of mutations in multiple individuals, focusing on the first coalescence between any two individuals. Results from applying MSMC to genome sequences from nine populations across the world suggest that the genetic separation of non-African ancestors from African Yoruban ancestors started long before 50,000 years ago and give information about human population history as recent as 2,000 years ago, including the bottleneck in the peopling of the Americas and separations within Africa, East Asia and Europe.

Figure 3: Population size inference from whole genome sequences. (a) Population size estimates from four haplotypes (two phased individuals) from each of nine populations. The dashed line was generated from a reduced data set of only the Native American components of the MXL genomes. Estimates from two haplotypes for CEU and YRI are shown for comparison as dotted lines. (b) Population size estimates from eight haplotypes (four phased individuals) from the same populations as above except MXL and MKK. Compared with the four-haplotype estimates, these estimates extend to more recent times. For comparison, we show the result from four haplotypes for CEU, CHB and YRI as dotted lines. Data for this figure are available via Supplementary Table 5.

Mentions:
The results from two individuals (four haplotypes) are shown in Figure 3a. In all cases the inferred population history from four haplotypes matches the estimates from two haplotypes where their inference ranges overlap (60kya-200kya, see thin lines for CEU and YRI in Figure 3, and Supplementary Figure 7a). We find that all non-African populations that we analyzed show a remarkably similar history of population decline from 200kya until about 50kya, consistent with a single non-African ancestral population that underwent a bottleneck at the time of the exodus from Africa around 40-60kya [21-23]. The prior separation of non-African and African ancestral population size estimates begins much earlier at 150-200kya, clearly preceding this bottleneck, as already observed using PSMC [7]. We will quantify this further below by directly estimating the relative cross coalescence rate over time. In contrast, we see only a mild bottleneck in the African population histories, with an extended period of relatively constant population size more recently than 100kya. Between 30kya and 10kya we see similar expansions in population size for the CEU, TSI, GIH, and CHB populations. For the Mexican ancestors we see an extended period of low population size following the out-of-Africa bottleneck, with the lowest value around 15kya, which is particularly pronounced when filtering out genomic regions of recent European ancestry due to admixture (dashed line in Figure 3, see Online Methods). This extended bottleneck is consistent with estimates of the time that the Native American ancestors crossed the Bering Strait and moved into America [21, 24-26]. We repeated all analyses based on four haplotypes on a replicate data set, based on different individuals, available for all populations except MXL. All results are well reproduced and show differences only in the most recent time intervals, as shown in Supplementary Figure 8.
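To give a rough sense of why analyses with more haplotypes resolve more recent times, the sketch below computes the expected time to the first coalescence among n haplotypes under the standard neutral coalescent. It is an illustration only, not the MSMC inference itself: the constant diploid population size of 10,000, the 30-year generation time, and the function name are assumptions chosen for the example.

```python
from math import comb


def expected_first_coalescence_generations(n_haplotypes, diploid_size):
    """Expected generations until the first coalescence among n haplotypes.

    Standard coalescent, constant population size (an assumption):
    each of the C(n, 2) pairs coalesces at rate 1 / (2N) per generation,
    so the first event among all pairs is expected after 2N / C(n, 2).
    """
    if n_haplotypes < 2:
        raise ValueError("need at least two haplotypes")
    pairs = comb(n_haplotypes, 2)
    return 2.0 * diploid_size / pairs


# Illustrative values only; not estimates taken from the paper's results.
ASSUMED_DIPLOID_SIZE = 10_000
ASSUMED_GENERATION_YEARS = 30

for n in (2, 4, 8):
    generations = expected_first_coalescence_generations(n, ASSUMED_DIPLOID_SIZE)
    years = generations * ASSUMED_GENERATION_YEARS
    print(f"{n} haplotypes: ~{generations:,.0f} generations (~{years:,.0f} years)")
```

Because the number of pairs grows quadratically with n, the expected first coalescence moves quickly toward the present as haplotypes are added, which is in line with the more recent estimates obtained from eight haplotypes than from four or two.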
