February 01, 2012

A timeline of human prehistory

I will be using the following equation to estimate some times in human prehistory:

t = g * log(1 - Fst) / log(1 - 1/(2*Ne))

Time t can be estimated if one has an estimate of the effective population size Ne and of the Fst between a pair of ancestral populations. We also need an estimate of the generation length g to convert generations into years.

I will use the Fst values I recently inferred using ADMIXTURE on a large dataset of Old World variation; note that this dataset did not include African hunter-gatherers.

I use the estimates of Ne by Li et al. My procedure is as follows: I note which population is modal for each of the 12 ancestral components, and note its Ne. For each pair of components with an Fst value, I take the average Ne of the two corresponding modal populations.

For example, the Atlantic_Med component is modal in the HGDP French_Basque population (Ne=6,137), and the Sub_Saharan in the HGDP Yoruba population (Ne=9,960). The average is Ne=8,049. The Fst between these two components is 0.185. Hence, the time of divergence is:

23*log(1-0.185)/log(1-1/(2*8049)) = 75.7ky
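The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the formula as applied in the example; the function name `divergence_time_years` and its parameter names are labels chosen here, not part of any published tool. Natural logarithms are used, although the base cancels in the ratio.

```python
import math

def divergence_time_years(fst, ne, generation_years=23.0):
    """Divergence time in years: g * log(1 - Fst) / log(1 - 1/(2*Ne))."""
    return generation_years * math.log(1.0 - fst) / math.log(1.0 - 1.0 / (2.0 * ne))

# Ne values quoted in the post for the two modal populations.
ne_french_basque = 6137
ne_yoruba = 9960
ne_avg = (ne_french_basque + ne_yoruba) / 2  # 8048.5, quoted as 8,049 in the post

# Fst between the Atlantic_Med and Sub_Saharan components.
t_years = divergence_time_years(0.185, ne_avg)
print(f"{t_years / 1000:.1f} ky")  # about 75.7 ky
```

Swapping in another pair's Fst and the average Ne of its two modal populations gives the corresponding estimate.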

Below are the pairwise divergence times of the maximally differentiated components: Atlantic_Med, Southeast_Asian, Sub_Saharan; within the HGDP, these are modal in French_Basque, Dai, and Yoruba.

These dates appear plausible. An important calibration point is the appearance of the first modern humans in Europe, which has been archaeologically placed at precisely the time given above for the Atlantic_Med vs. Southeast_Asian divergence.

We can probably assume that the differentiation between Eurasians and Sub-Saharan Africans is best estimated by the Sub_Saharan vs. Southeast_Asian comparison, since Southeast_Asians would have had extremely limited opportunity to experience subsequent gene flow with Africans in either direction. The time of 86.4ky is older than the 70ky estimated for mtDNA haplogroup L3, which almost certainly marks the major human population expansion. The difference can probably be attributed to the absorption of different populations in Africa and Asia (archaic humans or pre-L3 expansion modern humans).

Of particular interest to me is an evaluation of my womb of nations theory, with respect to the origin of West Eurasians. The Caucasus component exhibits Fst distances between 0.033 and 0.052 with the components I have labeled "the Six". Within the HGDP, the following populations are modal for them:

There is reason to think that the last 3 ages are inflated: the Northwest_African and Gedrosia components appear divergent in a Sub_Saharan and South_Asian direction respectively, so they probably represent stabilized mixtures of a West Eurasian with a substrate element. Moreover, the effective population size used for the Bedouin is considerably higher than for any other West Eurasian population, presumably because of the African admixture in that population; that high effective size is, hence, not reflective of the effective size of the Southwest_Asian ancestral component.

All in all, these results are consistent with a mainly Neolithic origin of West Eurasian populations, and incompatible with their pre-LGM differentiation. As I have argued in the womb of nations, the expanding Neolithic population probably absorbed, to a limited extent, aborigines from outside the core area of the Near East; however, the bulk of the population appears to share a fairly recent common ancestry.

But you need to show us ALL of the calculations. For example, Caucasus to Sub-Saharan Africa.

Plus my first thought is that the estimate for Atlantic_Med vs. Sub_Saharan is 10k lower than for Sub_Saharan vs. Southeast_Asian. That is either a huge error or reflects a later flow into Europe from Africa, possibly via Tunisia or Gibraltar. I suppose differential archaic input is also possibly a factor, but this pattern of effect does not seem to fit our current assumptions.

Whatever, 10k minimum must then be added to the Caucasus comparison if we assume 86k ya out of Africa.

And in any case there is archaeological and historical evidence of ongoing bidirectional flow between the Caucasus and Western Europe. These were not isolated populations, not even relatively isolated. The flow would soften any original differences.

Plus the Neolithic did not hit Western Europe until 5 ka, so the magic number for population replacement (French Basque) from the Caucasus ought to be circa 5 kya. Not 10 kya.

No, I don't, since I've described my method and linked to the data, so that anyone can repeat my calculations.

"Plus my first thought is that the estimate for Atlantic_Med vs. Sub_Saharan is 10k lower than for Sub_Saharan vs. Southeast_Asian. That is either a huge error or reflects a later flow into Europe from Africa, possibly via Tunisia or Gibraltar. I suppose differential archaic input is also possibly a factor, but this pattern of effect does not seem to fit our current assumptions."

It does reflect gene flow of some kind, but the data does not allow us to determine whether it reflects gene flow Out- or Into-Africa.

"And in any case there is archaeological and historical evidence of ongoing bidirectional flow between the Caucasus and Western Europe. These were not isolated populations, not even relatively isolated. The flow would soften any original differences."

The bidirectional gene flow would have brought _populations_ closer, but I don't see why it would bring the _ancestral components_ peculiar to each region closer.

"Plus the Neolithic did not hit Western Europe until 5 ka, so the magic number for population replacement (French Basque) from the Caucasus ought to be circa 5 kya. Not 10 kya."

There is no component centered on Western Europe. Two components predominate there, the Atlantic_Med and North_European.

The former is modal in French Basques and Sardinians, and the Neolithic was established in Sardinia about 9,000 years ago, consistent with the divergence between the Caucasus and Atlantic_Med components, and with the apparent affinity of Ötzi with present-day Sardinians.

The North_European component is modal in the Baltic area, where there seems to be an excess of mtDNA U types, and this is consistent with a stronger pre-Neolithic stratum in that population.

"The time of 86.4ky is older than the 70ky estimated for mtDNA haplogroup L3 which almost certainly marks the major human population expansion. The difference can be probably attributed to the absorption of different populations in Africa and Asia (archaic humans or pre-L3 expansion modern humans)".

>The North_European component is modal in the Baltic area, where there seems to be an excess of mtDNA U types

Interesting. Central European and East European groups that are equally rich in mtDNA H appear to differ in their full-genome "North_European" scores. Why? There seems to be a correlation between the U4+U5 percentage and the "North_European" component, and not between the latter and any particular candidate for the major Neolithic haplogroup, say H. Nor does the diversity of U4/U5 take part in this correlation: it reaches its maximum in Central (or Central-Eastern) Europe, due to the higher population density there than between the Baltic area and the Urals.

"The bidirectional gene flow would have brought _populations_ closer, but I don't see why it would bring the _ancestral components_ peculiar to each region closer"

You again appear to be assuming the intra-West Eurasian ADMIXTURE components represent actual ancient populations. From looking at the behavior of these components as K is varied, it's obvious they can't be meaningful in that sense for most (if any) values of K. Nor have you given us any particular reason to think this K is the "correct" one.

Ask ADMIXTURE to model West Eurasians as a mixture of six "ancestral components", and ADMIXTURE will give you the best fit it can find. This does not mean West Eurasians are profitably conceived of as a mix of six ancient populations, nor have you demonstrated that they are. Even if W. Eurasians were a composite of six discrete populations of Neolithic Caucasian origin, it would remain to be demonstrated that the ancestral components picked up by ADMIXTURE correspond to these populations.

The best genetic insight into ancient population movements will come from haplotype-based approaches (and, ideally, whole genome data). I'm fairly confident your preferred model will not turn out to be the correct one. What you seem to be mainly picking up here is that populations closer to one another geographically tend to be more similar genetically, and the Caucasus falls roughly in the middle of W. Eurasia.

"You again appear to be assuming the intra-West Eurasian ADMIXTURE components represent actual ancient populations. From looking at the behavior of these components as K is varied, it's obvious they can't be meaningful in that sense for most (if any) values of K. Nor have you given us any particular reason to think this K is the 'correct' one."

Use of the ancestral populations does not mean that these were actually existing populations, but it helps remove layer(s) of admixture from extant populations.

For example, Fst between Basques and Bedouins does not give an accurate age estimate for the divergence of the common ancestors of the two, because the latter contain African admixture, which inflates this estimate.

The choice of K is not particularly material for establishing the time frame. For example, the Atlantic_Baltic vs. West_Asian Fst of 0.035 at K7b is not that different from the Caucasus vs. Atlantic_Med (0.033) or Caucasus vs. North_European (0.041) values at K12b. Since effective sizes in West Eurasian populations are also quite similar to each other, the use of K=7 or K=12 will not materially affect the conclusions about time depth.
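To make the sensitivity concrete, here is a minimal sketch that plugs the three Fst values above into the same formula used in the post. It assumes, purely for illustration, a single Ne of 6,137 (the French_Basque value quoted in the post) for every pair and a 23-year generation; the function and variable names are hypothetical.

```python
import math

def divergence_time_years(fst, ne, generation_years=23.0):
    """Divergence time in years: g * log(1 - Fst) / log(1 - 1/(2*Ne))."""
    return generation_years * math.log(1.0 - fst) / math.log(1.0 - 1.0 / (2.0 * ne))

# ASSUMPTION: one shared Ne for all pairs, used only to isolate the effect of Fst.
ASSUMED_NE = 6137

fst_by_pair = {
    "Caucasus vs Atlantic_Med (K12b)": 0.033,
    "Atlantic_Baltic vs West_Asian (K7b)": 0.035,
    "Caucasus vs North_European (K12b)": 0.041,
}

times = {pair: divergence_time_years(fst, ASSUMED_NE) for pair, fst in fst_by_pair.items()}
for pair, t in times.items():
    print(f"{pair}: {t / 1000:.1f} ky")
```

Under that assumed Ne, all three estimates fall in roughly the 9 to 12 ky range; a different Ne rescales them almost proportionally.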

There are two possible ways in which West Eurasians came to be so similar to each other: one is that they share fairly recent common descent, modified by drift and substrate admixture during the last 10,000 years or so.

The other is that they were very dissimilar to each other initially, but became so thoroughly intermixed along their entire range that what we now perceive as differences are tiny differences of ancestral components that were originally very different from each other. Moreover, this intermixture would have to be limited at some point to allow for the emergence of the very distinct foci of geographical structure picked up by ADMIXTURE, and also visible in PCA and other methods.

"There is reason to think that the last 3 ages are inflated: the Northwest_African and Gedrosia components appear divergent in a Sub_Saharan and South_Asian direction respectively, so they probably represent stabilized mixtures of a West Eurasian with a substrate element."

Alternatively, what do you think about this proposition? The labeled South Asian component in the K12a, which peaks in the Pulayar of Tamil Nadu from the Metspalu et al. data-set, is, as a stand-alone component, very ASI-like. But it still has a subsumed West-Eurasian fraction. This is illustrated by its low Fst distance of 0.075 with the Gedrosian (modal in the Brahui) component, which in turn has a low Fst distance with the Caucasus (modal in Abkhazians) component at 0.036. In light of this, a Gedrosian-like (or perhaps Gedrosian itself) West-Eurasian component is very possibly what makes up the West-Eurasian fraction subsumed under the South Asian component. This is why Gedrosia is shifted towards the South Asian component: it has contributed to the formation of that composite, as opposed to being a composite in itself.

But the Gedrosia component is also shifted towards the East_Asian component, so its closer distance to the South_Asian cannot be explained simply by the South_Asian incorporating some of it. The Gedrosia must include some ASI-like component which pulls it towards South Asians, and, since ASI is related to East Asians, towards East Asians as well.

Dienekes, is there any paper discussing the nature of such proximity? Is it possible to estimate the time of separation of the two using techniques more sophisticated than Fst-Ne considerations, for instance with shared phased haplotypes? It would be interesting to see which part is the contribution of the initial peopling of S/E Asia and which part is admixture from E Asia.

"The bidirectional gene flow would have brought _populations_ closer, but I don't see why it would bring the _ancestral components_ peculiar to each region closer"

The stats relate to similarity and difference. They aren't really ancestral components, just approximations of ancestral components, and are affected by your assumptions, which is why those found by Eurogenes are slightly different. They only really work for divergence if the populations are relatively isolated. Connection will make the original components and the apparent ancestrals seem closer, as I understand it.

Thinking about archaics: as I recall, Eurasians are sitting at about 3% archaic, with the South East Asians having an additional 5% from the Denisovans. South East Asia (0.209) and East Asia (0.206) have near-identical Fst values against SSAs, so I think we can discard the effect of the archaics.

Thinking about the Fsts: it seems to me that you can only date Out of Africa from the biggest Fst, which is the Siberians at 0.219. So, assuming the same population (a big assumption), 94,041 ya. Seems a bit large to me, but it's not inconceivable.

It seems to me that the lower Fsts we are seeing in comparison with SSA must relate to later waves of Out of Africa. These are not easily dated because of the admixture of the later waves. There do seem, however, to be several distinct and closely related groups. To me it looks like:

(Wave 1) Out of Africa, most probably represented by the Siberians (0.218, 94k).

(Wave 2) Affecting the ancestor population (because the numbers are so similar) of the South East Asians (0.209) and the East Asians (0.206).

(Wave 3) Affecting the ancestor population of the Atlantic_Med (0.185) and Northern Europeans (0.183). This could also be a higher admixture of wave 2. I expect this ancestor population was in the Caucasus or somewhere nearby at the time.

(Wave 4) Affecting the Caucasus (additionally) after the Atlantic_Med and Northern Europeans had left. Ancestor population to Gedrosia (0.173), Caucasus (0.171), South Asia (0.172), and South West Asia (0.169).

Dienekes, can you explain whether the human race is degenerating due to drift? Does the effective population size of all people on Earth point to a disaster through genetic drift? How could the present diversity have arisen if the effective population size suggests the opposite trend? Confusing.

Thanks, Dienekes. So, am I right in saying that both the Gedrosia and South-Asian components are composites, except that the former is predominantly West-Eurasian and the latter is predominantly aboriginal South-Asian/ASI? I ask because the equation used to calculate total inferred ANI (West-Eurasian) fractions for the South-Asian reference populations and participants using Dodecad K12a is: Mediterranean + North_European + Caucasus + Gedrosia + 0.16*South_Asian + SW Asian

This assumes Gedrosia to be a wholly West-Eurasian component. With this formula, for instance, the Pakistani Pathan are 77.16% West-Eurasian in total, which agrees with the ~76.9% (±1.1) ANI score inferred by Reich. Likewise, the HGDP Pakistani Sindhi score using this formula is 72.3%, which is close to Reich's total ANI of ~73.7% (±1.1).
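The formula above is simple enough to write out as a weighted sum. The sketch below is illustrative only: the function name is made up, and the component percentages in the example are invented placeholders, not real Dodecad K12a output for any population.

```python
# Weights taken from the formula quoted above; every listed component counts
# fully as West-Eurasian except South_Asian, which contributes 16%.
ANI_WEIGHTS = {
    "Mediterranean": 1.0,
    "North_European": 1.0,
    "Caucasus": 1.0,
    "Gedrosia": 1.0,
    "South_Asian": 0.16,
    "SW_Asian": 1.0,
}

def west_eurasian_fraction(components):
    """Weighted sum of K12a component percentages; unlisted components count as zero."""
    return sum(weight * components.get(name, 0.0) for name, weight in ANI_WEIGHTS.items())

# HYPOTHETICAL percentages, chosen only to exercise the function.
example = {
    "Mediterranean": 2.0,
    "North_European": 10.0,
    "Caucasus": 10.0,
    "Gedrosia": 40.0,
    "South_Asian": 25.0,
    "SW_Asian": 3.0,
    "East_Asian": 10.0,  # not part of the formula, so ignored
}
example_score = west_eurasian_fraction(example)
print(f"{example_score:.2f}% West-Eurasian")
```

If Gedrosia is itself a composite, its weight of 1.0 is the term that would need revisiting.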

In addition to my usual hesitation regarding effective population sizes over long times with many bottlenecks and expansions, I wonder to what extent the effective population sizes in Li et al. would need to be much increased if talking about wide groups, such as Atlantic_Med, North_European, or Southwest_Asian (rather than specific sub-populations). For N_e >>1 and F_st <1 and N_e >> 1/F_st, the equation becomes:

t = t_gen * 2*N_e * F_st * (1 + F_st/2 + F_st^2/3 + ...)

that is, it is approximately linear in both N_e and F_st. (In your example, the above gives t=68,500 linearly, and 74,800 with the first correction, and then 75,600, etc.).

In other words, if we get the effective population size wrong by a factor of two, so is the age estimate.
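A quick numerical check of the expansion and of the factor-of-two remark, as a sketch only. It reuses the numbers from the worked example in the post (Fst=0.185, Ne=8,049, 23 years per generation); the function names are made up for this illustration.

```python
import math

def exact_time(fst, ne, t_gen=23.0):
    """Full formula: t_gen * log(1 - Fst) / log(1 - 1/(2*Ne))."""
    return t_gen * math.log(1.0 - fst) / math.log(1.0 - 1.0 / (2.0 * ne))

def series_time(fst, ne, terms, t_gen=23.0):
    """Truncated expansion: t_gen * 2*Ne * (Fst + Fst^2/2 + ... + Fst^terms/terms)."""
    return t_gen * 2.0 * ne * sum(fst ** k / k for k in range(1, terms + 1))

fst, ne = 0.185, 8049
for terms in (1, 2, 3):
    print(terms, round(series_time(fst, ne, terms)))
print("exact", round(exact_time(fst, ne)))

# Doubling Ne roughly doubles the estimated age.
ratio = exact_time(fst, 2 * ne) / exact_time(fst, ne)
print(f"ratio when Ne is doubled: {ratio:.4f}")
```

The partial sums climb toward the exact value from below, and the ratio stays very close to 2, so any error in Ne passes almost one-for-one into the date.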

"Plus the neolithic did not hit Western Europe until 5 ka so the magic number for population replacement (French Basque) from the Caucasus ought to be circa 5kya. Not 10 kya."

I'm guessing that the estimated genetic divergence times refer to the times at which the ancestral components departed from West Asia. The Neolithic presumably made its way across Europe either gradually or in spurts. Thus a departure time of 10K is consistent with an arrival time of 5K or whenever. The Caucasus component could conceivably then have overlaid the earlier Neolithic components as an Indo-European conquest.

Dienekes' Anthropology blog is dedicated to human population genetics, physical anthropology, archaeology, and history.

You are free to reuse any of the materials of this blog for non-commercial purposes, as long as you attribute them to Dienekes Pontikos and provide a link to either the individual blog entry or to Dienekes Anthropology Blog.
