July 30, 2012

Estimating the age of Y-chromosome Adam (again)

I have used the official phase1 chrY SNP data instead of the working data used in my first experiment. The histogram of pairwise TMRCA values looks much sharper now; I am not sure what the difference between the two datasets was.

In any case, the divergence of the most basal African clade is very evident here on the right, corresponding to an age for the human Y-MRCA of 159,298 years.

Also of interest are the other peaks in the distribution of pairwise TMRCAs which correspond to 6.5, 40.0, 66.3 thousand years. I think we are getting some good signals corresponding to Out-of-Arabia (66.3ky?) where a hyper-arid phase in Arabia may very well have caused a bottleneck in the population of modern humans, and full behavioral modernity/UP revolution (40.0ky?) where modern humans start turning up all over Eurasia, and even some Africans look like UP Europeans.

Perhaps I'll spend some more time assigning the Y-chromosomes to haplogroups so that I can give a more complete estimate of the major clades of the Y-chromosome phylogeny.

UPDATE: Node Ages

I will add various node ages as I calculate them. Thanks to ISOGG for a convenient correspondence between haplogroups and SNP genetic positions.

(*) This is, strictly speaking, the common ancestor of J1 and J2, since J*(xJ1, J2) chromosomes have also been observed.
(**) This is also the common ancestor of R1a and R1b, since R1* chromosomes have also been observed.
(***) Paragroup DE* chromosomes have also been observed.
(****) Paragroup E* chromosomes have also been observed, so this is, strictly speaking, the common ancestor of E1 and E2; this is only a small underestimate, given that the DE node (62,205 years) is only marginally older
(*****) There is also the paragroup I2a1* and I2a1c

(x) Due to the absence of a published bifurcating structure within K-M9, I estimated this using the same model-free method used for Y-MRCA. This results in an "older peak" of pairwise K-M9 TMRCAs of 36,389 years. This seems appropriate, as it lies between IJK (41,910 years) and P (33,043 years).

44 comments:

That means that humans arrived in Australia very soon after that split between C and F.

"IJK (IJ-P125 vs. K-P131): 41,910 years"

So SE Asian K must be more recent than C's arrival in Australia. Haplogroups M and S must be even more recent than 41,910 years old. Probably around the time of NO 32,467 years and P 33,043 years. Fits perfectly with what I see showing up in other evidence. C was the first Y-DNA into Australia. K-derived haplogroups in New Guinea and Australia arrived perhaps as recently as 30,000 years ago.

Dienekes, thanks for looking into the prior work/idea/methodology that I did back in April 2012 (which you have already nicely summarized in http://dienekes.blogspot.com.au/2012/05/drawing-human-y-chromosome-tree-with.html ).

See "UPDATE10: TMRCA of Y-Haplogroups - based of 1000 Genomes Project data" in http://www.goggo.com/terry/HaplogroupI1/ . And also in PDF format: http://www.goggo.com/terry/HaplogroupI1/SNP_y-Haplogroup_Tree_v5.pdf

Similar dates to yours, using two different calibrations, are shown there. A computed phylogeny tree is shown as well.

Good to see that your dates are similar. As I have said elsewhere, I think this method of dating, using whole Y-chromosome SNP counts, is in theory more accurate and reliable than STR methods. I am glad that others, such as yourself, are now trying the same thing, and confirming earlier results.

Hector, there were no haplogroup "M" or "S" samples in the 1000 Genomes data. But there are seven haplogroup "LT" samples.

As shown in the computed phylo tree at the bottom of http://www.goggo.com/terry/HaplogroupI1/ , the age of the paragroup "K" is somewhat over 40 thousand years (depending on the calibration scale you use).

Nice work. As almost always, I would feel more comfortable with slightly older dates - although things seem to be converging rapidly, now with the use of full-chromosome SNPs. I'd apply about a factor of 1.2 (20%) throughout:

neolithic diversity peak: 4-8kya --> 4.8-9.6kya

ice age minimum: 14-18kya --> 16.8-21.6kya

culturally modern human expansion peak: 40-44kya --> 48-52.8kya

post-Toba adverse climate minimum: 52-60kya --> 62.4-72kya

end of initial growth, steady-state before expansion into northern realms and (the far reaches of) SE Asia: 60-66kya --> 72-79.2kya
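As a quick check of the arithmetic in the list above, here is a minimal Python sketch that applies one scaling factor to each date range. The dictionary and function names are made up for illustration; only the ranges and the factor of 1.2 come from the comment.

```python
# Date ranges in thousands of years ago (kya), copied from the list above.
RANGES_KYA = {
    "neolithic diversity peak": (4, 8),
    "ice age minimum": (14, 18),
    "culturally modern human expansion peak": (40, 44),
    "post-Toba adverse climate minimum": (52, 60),
    "end of initial growth": (60, 66),
}


def rescale(ranges, factor):
    """Multiply both ends of every range by the same factor (rounded to 0.1 ky)."""
    return {
        name: (round(low * factor, 1), round(high * factor, 1))
        for name, (low, high) in ranges.items()
    }


for name, (low, high) in rescale(RANGES_KYA, 1.2).items():
    print(f"{name}: {low}-{high} kya")
```

Because the same factor is applied everywhere, the relative spacing of the events stays the same; only the absolute scale moves.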

I think this makes several dates fit better. It also combines the green deserts growth with more-or-less immediate expansion eastward, when there was still opportunity to do so before the route got cut off due to extreme aridity.

The culturally modern expansion may appear early, but we must remember that such an expansion must have taken place for the first time in northern areas of India, China, and Pakistan and also in SE Asia before further expansion to Siberia, Western Asia, and finally Europe.

As for phylogenetics of the MNOPS branch, Terry's tree appears to indicate that it bifurcates into NO and P, NOT NOP and MS, because the exclusive common ancestry of NO and P is less than 2000 years.

So M, S and K1, K2 etc. will most likely attach themselves to either NO or P (obviously appropriately extended NO and P branches). However, even if just one of the Southeast Asian Ks attaches itself to P instead of NO, an interesting conclusion is inevitable: P has a Southeast Asian origin.

At best, you are finding the TMRCA of V221. The only individual that you have that does not belong to BT is NA21313, a Maasai from Kenya that is derived for M32. Your estimate is ~50% older than that of Cruciani et al. 2011. Given the similar mutation rates in both analyses, the difference is considerable. Any thoughts on this matter?

After initial colonization, Europeans received almost continuous further input from West Asians - which I believe is one reason for the rather early "culturally modern" date I posted above. That is, the expansion of the earliest modern Europeans is a smallish subset of the y-DNA diversity of all modern humans.

It would be interesting to do the same study with subgroups. For example, just using Europeans against East Asians might reveal a firmer date for culturally modern human expansion. Given sufficient data, one could also posit I vs. East Asia and R1b vs. East Asia (not Central or South Asia!) and see if there are any differences.

"The assumption that the first wave of Australian settlers carried C haplogroup ancestral to any of extant C has a problem. That would mean that TMRCA of extant C lived in Australia following the principle of parsimony"

Not that the first Australians would have had a haplogroup ancestral to C as a whole. They would have had ancestral C, which then formed C4 in Australia through drift, and C2 in Southern Wallacea.

"C1-C3 (C1-P122 vs. C3-Z1453): 25,022 years"

That would place C1 and C3 on a separate, but combined, line from both C2 and C4. I have long thought that C's phylogeny is yet to be finalised.

"However, even if just one of the Southeast Asian Ks attaches itself to P instead of NO, an interesting conclusion is inevitable: P has a Southeast Asian origin".

Were you not already aware of that probability? I have long suspected that P has an SE Asian origin. I'd also place big money on mt-DNA R having an SE Asian origin. Interestingly Behar et al give P the earliest R-derived haplogroup's expansion at around the same time as that of CF here.

"So was the refugium during the Toba Eruption for AMH outside of Africa in Arabia?"

One challenge that the whole-y-Chromosome-SNP-count method faces, for dating the splits between haplogroups, is that the mutation rate is not well known (yet).

As with Dienekes, I first used an assumed mutation rate per generation (and I used the same value as he did I think). But that left the challenge of then estimating the average father-to-son generation time to use over the last hundred thousand years or so.

That introduces a constant, but unknown, relative error in all the dates that are computed from the SNP counts.
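To make the point about a constant relative error concrete, here is a minimal sketch of a rate-based conversion from SNP differences to years. It assumes a simple molecular clock, in which two lineages that differ at d sites out of L compared sites diverged d / (2 * mu * L) generations ago. That formula, the function name, and every number below are illustrative assumptions, not values taken from either analysis.

```python
def tmrca_years(snp_differences, sites_compared, mu_per_site_per_generation,
                generation_years):
    """Pairwise TMRCA under a simple molecular clock (an assumed model).

    Mutations accumulate along both lineages, hence the factor of 2.
    """
    generations = snp_differences / (
        2.0 * mu_per_site_per_generation * sites_compared
    )
    return generations * generation_years


# Placeholder inputs, chosen only to show the scaling behaviour.
D, L, MU = 300, 1_000_000, 3.0e-8

for g in (25, 30):
    print(f"generation time {g} y -> TMRCA {tmrca_years(D, L, MU, g):,.0f} years")

# Every date scales linearly with the assumed generation time, so a wrong
# choice shifts all dates by the same relative amount (30 / 25 = 1.2 here).
```

The ratios between node ages do not depend on the generation time at all, which is why an error in it leaves the shape of the tree intact and only stretches or shrinks the timescale.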

The next approach I took, and that is shown in the second timeline I added to the computed phylo tree, is that I just assumed a supposed date for the split in the CT paragroup (between the DE Eurasian group, and the CF Afrasian group) based on what the archaeological evidence might be indicating according to some. That approach sidesteps the average father-to-son generation time estimation, but puts any error into the assumed CT date.

With a few estimates from external assumptions about haplogroup split dates, one might come up with a timescale that is consistent with more assumed event dates.

The second timeline I used is, I hope, in the right ballpark.

For those wishing to try that approach, I added the schematic map of where the modern haplogroups are located, as shown at the end of Update 10 in http://www.goggo.com/terry/HaplogroupI1/ . Pick any pair of haplogroups (or enclosing paragroups) in that schematic world map, and tell me when in time you think they split, and then I could use that to calibrate the computed phylo tree based on that single date assumption.
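A single-date calibration of this kind could be sketched as follows. The node ages are the ones quoted in the post above, used here only as stand-ins for relative ages; the anchor choice (IJK at 50,000 years) is a purely hypothetical input, and the function name is made up for illustration.

```python
# Node ages in years as quoted in the post; only their ratios matter here.
NODE_AGES = {
    "Y-MRCA": 159298,
    "DE": 62205,
    "IJK": 41910,
    "K-M9": 36389,
    "P": 33043,
    "NO": 32467,
}


def recalibrate(node_ages, anchor_node, assumed_anchor_age):
    """Rescale every node so that the anchor node lands on the assumed age."""
    factor = assumed_anchor_age / node_ages[anchor_node]
    return {node: age * factor for node, age in node_ages.items()}


# Hypothetical anchor: suppose someone argues that the IJK split is 50,000 years old.
for node, age in recalibrate(NODE_AGES, "IJK", 50000).items():
    print(f"{node}: {age:,.0f} years")
```

All of the uncertainty then sits in the assumed anchor date: every other node moves by the same factor, so two anchors that imply different factors cannot both be satisfied by one linear timescale.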

CF looks heavily bottlenecked in comparison to DE, and even E. The TMRCA difference between DE and CF, at about 15,000 years, is even larger than the difference between BT and CT, at less than 10,000 years.

So was the refugium during the Toba Eruption for AMH outside of Africa in Arabia? What does that say about any earlier AMH remains in East Asia - were they wiped out?

IMO, no refugium sensu stricto was required. I think Denisovan admixture data indicate that SE Asia was in parts - but not heavily - populated before Toba, while further expansion and admixture got seriously started up after (humans mostly avoid rain forests for the lack of reliably available protein and imminent dangers - lions, and tigers, and bears! Oh my!).

It is interesting using this method that YDNA CT-M168 would coalesce close to the same time as mtDNA L3 as determined by Soares '11 (~63.1KYA). However, E-V12 vs E-V13 time of 13.8 KYA should not be younger than E-V13 vs E-V22 time of 16.2 KYA, since E-V22 and E-V13 have a more recent common ancestor (Z1919/1920) than either of them have with E-V12.
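The consistency argument here can be expressed as a simple check: a split node cannot be older than the node it descends from. The sketch below encodes only the nesting and the two ages mentioned in this comment; the data layout and function name are assumptions made for illustration.

```python
# Nesting described in the comment: the E-V13/E-V22 split (Z1919/1920)
# sits below the E-V12/E-V13 split. Ages are in thousands of years.
SPLITS = {
    "E-V12 vs E-V13": {"parent": None, "age_ky": 13.8},
    "E-V13 vs E-V22": {"parent": "E-V12 vs E-V13", "age_ky": 16.2},
}


def inconsistent_splits(splits):
    """Return (child, parent) pairs where the child node is older than its parent."""
    problems = []
    for name, node in splits.items():
        parent = node["parent"]
        if parent is not None and node["age_ky"] > splits[parent]["age_ky"]:
            problems.append((name, parent))
    return problems


print(inconsistent_splits(SPLITS))
```

With the two ages given, the check flags the E-V13/E-V22 split as older than the split it is nested under, which is the inconsistency the comment points out.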

I think that the problem all along has been undercounting mutations? GD doesn't look at the history of a DYS locus; it just observes the present difference of allele values. The math suggests that the faster mutators have a high probability of 1 or more mutations in 500-600 years. How many would you expect in 10X that number? Variance calculations work on the same principle.

It's nice to see an analysis that makes sense. Kudos to Terry and Dienekes. I would sure love to see estimates for P312, U106 and U152 and even L21.

I stumbled across this website (below) while looking for distribution information on R1b1a2a1a1a5a vs R1b1a2a1a1a5b.

It is obviously a personal view of haplogroup affiliations, but if true this data puts both these haplogroups (subgroups of U106) in the UK for 7k+ years. That is thousands of years before the British Bronze Age. More Neolithic than anything, and this is a SUBCLADE of U106. So the major R splits must be even earlier.

"Interestingly Behar et al give P the earliest R-derived haplogroup's expansion at around the same time as that of CF here".

Sorry. I should have actually checked. Behar et al give mt-DNA P's expansion as 54,801.3 years ago. That is 7000 years before the CF split.

"CF looks heavily bottlenecked in comparison to DE, and even E. The TMRCA difference between DE and CF, at about 15,000 years, is even larger than the difference between BT and CT, at less than 10,000 years".

As they stand the figures indicate that DE's expansion considerably pre-dates that of CF. That indicates that D's expansion through Eurasia is more ancient than the split between C and F. Interesting.

"I think that the problem all along has been undercounting mutations?"

I agree. It is even more likely in the case of some specific haplogroups.

"Kudos to Terry and Dienekes".

Again, agreed.

"Not necessarily. 70000 years ago instincts were different. The place in India where they got Pre-Toba proof is full of tigers, leopards even 2000 years ago".

But were those regions heavily forested? I actually agree completely with Eurologist on this matter:

"(humans mostly avoid rain forests for the lack of reliably available protein and imminent dangers - lions, and tigers, and bears!"

Obviously, I did not mean that Wizard of Oz citation literally. But humans seem to be more comfortable with predators out in the open. And note that during arid times, much if not most of the subcontinent was not rainforest, but savannah or desert.

IMO, clearly, the subcontinent and SE Asia were the main reservoirs of AMH before, during, and after Toba - although Toba certainly mixed things up. The subcontinent IMO is also the most logical and central place for the UP cultural (and final cognitive?) revolution.

Annie, I agree - we have a few such data points, now - although some of the ancient DNA must still be further analyzed for appropriate subgroups. For example, is Ötzi G2a4* or something still further down the tree? Those are not subtleties - it could mean a huge scaling factor in the results.

As to the website: "R1b1a2a1a1a5 (Z381) The most important subgroup of U106, all members descended from a Germanic speaking ancestor"

I wouldn't quite date Germanic that early.... ;) Funny when reality catches up with all these Celtic and Germanic myth lovers. I agree with U106 expanding locally with the early neolithic. With my temporary 1.2 fudge factor, that brings it to 7,800ya - locally in Western Europe.

I have contemplated before that perhaps the northern R lineages came north along the Rhine with La Hoguette and then intermingled with the westernmost LBK. The date fits perfectly well with La Hoguette. La Hoguette clearly also had different breeds of animals, so if the (partial) LBK collapse was caused by diseases, it could have simply been a re-filling from the West rather than a Mesolithic revenge (or both) that saw R1b sky-rocketing against G. And, when you continue north, you eventually end up at the U106 "Hochburg" - the Isles.

"As they stand the figures indicate that DE's expansion considerably pre-dates that of CF. That indicates that D's expansion through Eurasia is more ancient than the split between C and F. Interesting."

I have stated before that the peculiar remnant-like distribution of D indicates that they were the first AMHs in East Asia (although at that point, I still had two different migrations out of Northern Africa in mind). And, more correctly, they are the only haplogroup from that time that survived, there. Note that with my "fudge factor," DE is pre-Toba. And so is CF, likely - because such a huge time difference between those two (as the numbers suggest) is impossible. CT did not undergo a single mutation for 15,000 years? In fact, a post-Toba population collapse likely is distorting the numbers in its vicinity.

Above, I mentioned a possible La Hoguette link with U106. I should note that this does not imply a shipment of highly diverse Rs via the Mediterranean to Southern France, at that time. A pre-LGM arrival of R1b to Western Europe is still quite viable - if not even required by these latest dates.

Now I'm curious, what could have caused all those roughly simultaneous major splits at 24-26ky? What's more, AFAIK all of them are implicated in known post-ice age re-population events with wide dispersals.

Let me speculate a little bit here. I don't know of any global event at ~25kya that could have caused that.

However, if we locate the splits at the verge of the late glacial maximum, with the climate improving from 17-16 kya onwards, everything seems to fit: the splits would correlate with the documented exits and dispersals of northern Eurasian groups from ice age refugia.

What do you think? Does it make sense to push the dates forward a bit**? And if not, why ~25kya?

**I don't know what changes that would require in the underlying calculation, nor the implications for the other dates, but the coincidence seems too good to ignore.

"However, if we locate the splits at the verge of the late glacial maximum, with the climate improving from 17-16 kya onwards, everything seems to fit: the splits would correlate with the documented exits and dispersals of northern Eurasian groups from ice age refugia".

That is quite possibly the explanation.

"Does it make sense to push the dates forward a bit**?"

It is probably not necessary to push the dates forward. 25 kya could still be a good fit if the haplogroups managed to squeeze north at that time but were prevented from expanding widely until the climate improved. In other words O, for example, became isolated in 3 separate regions after it formed from NO (NO at 32,467 years, O diversification 26,145-27,870). Something similar may be the explanation for Europe as well.

"And if not, why ~25kya?"

Good question. Perhaps all haplogroups are descended from haplogroups who had made it as far north as possible during the LGM. Then several lines became extinct with the subsequent cooling leaving remnant populations.

Here is another interesting fact: Q splits from R and N splits from O around the same time at ~32 kya. I think it is fair to assume that those two haplogroups, Q and N, were both north of the others (they are among the most common haplogroups in the northernmost areas of Siberia (N) and across Beringia (Q)).

So we'd imagine something significant should be taking place in Siberia at this time, but there is no obvious single event.

Say my speculation is on track, the date should be roughly 2/3 that of the table, so in this case ~21kya.

What is happening in Siberia at 21kya?

From what I've checked, at 22-21kya we see the emergence of the Siberian Late Upper Paleolithic industry, after a gap of 4ky. In other words, that is the precise date when Siberia was apparently being repopulated. (By Q and N ?)

But now there was a difference from previous human occupation there. Mammoth was increasingly scarce. There is evidence that human groups during this phase had begun to hunt reindeer and bison. Well...

Hunting reindeer (and later herding) = N ? Hunting bison (and later going after them all the way through Beringia) = Q ?

I almost think like I'm onto something here (even without the revised dates, though they'd seem to make more sense).

(By the way, I didn't mention in the above post, but the C3-C1 split ALSO dates to the magic 25kya ! (or 16 kya). And, just like the other offshoots of that era, C3 seems to have expanded widely post-ice age, this time in eastern Siberia – but after Q...)

I wouldn't get hung up so much on haplogroup splits - much of it is ad-hoc definition/naming and coincidence. There is nothing magical about it other than more-or-less random letter assignments. The meat is in either long, isolated roots (e.g., I, I1; I2a2/a) or sudden, almost contemporaneous wide divergences (U106). The former indicate either good times followed by severe pruning, and/or extremely long periods of isolation in large populations; the latter indicate really good times.

As I observed above, correlation timing with global events and climate indicates at least a factor of 1.2 on all of these dates. But, of course, mutation rates are not the same everywhere and for each group, so one should be prepared for a +- 20% deviation or so at least. So, something called 25kya here IMO is more likely 30kya, or with error bars, something like 25-36kya.

40-30kya climate was relatively mild, compared to later times - so my bet is on diversification of haplogroups more likely during this time than any time later until after LGM.

"Now I'm curious, what could have caused all those roughly simultaneous major splits at 24-26ky? What's more, AFAIK all of them are implicated in known post-ice age re-population events with wide dispersals."

Given likely archaeological dates for the first modern human compared to Y-DNA Adam, CT compared to Out of Africa dates, settlement of Papua New Guinea and Australia compared to C1-C3 dates, and the coincidence around 25kya for haplogroups that it would make sense to see expanding around the Upper Paleolithic revolution ca. 40kya, I am inclined to think that the calibration of the Y-DNA is off and that the actual dates are 50%-60% higher.

Thus, the 25kya dates suddenly become Upper Paleolithic Revolution dates rather than emerging ice age dates, which coincide with archaeological evidence of declining sophistication of lithic tool kits, a time when existing haplogroups should be culled, not a time of population and range expansions leading to major expansions of multiple haplogroups. The C1-C3 date starts to approach, within the margin of error, the date for settlement of Australia and New Guinea. The R1b1a2a1a1a5 date moves to within the margin of error of the Mediterranean Neolithic or Epipaleolithic rather than the Chalcolithic. And I becomes appropriately timed to fit its presumed role as the dominant UP Y-DNA haplogroup in Europe.

The C1-C3 date is still surprisingly low relative to O however. I would have expected a date long before the dates for O, not almost the same, given that there is C4 in Australia and C2 in Melanesia, for which a natural differentiation time would have been ca. 45,000 years ago, given that Denisovan admixture suggests that men with Y-DNA C would have been among those admixing with Denisovans and hence presumably in the first wave of modern human migration into the area, given the absence of Y-DNA O in Australia and New Zealand despite overlap on the mainland with C.

Also, given that D is so old, why is there no Denisovan admixture in its carriers? Does it perhaps become distinct in Africa, but not actually leave that continent until tens of thousands of years later?

"I have stated before that the peculiar remnant-like distribution of D indicates that they were the first AMHs in East Asia (although at that point, I still had two different migrations out of Northern Africa in mind). And, more correctly, they are the only haplogroup from that time that survived, there".

That is quite possible. It seems unlikely D reached Australia though. It seems Y-DNA is unlikely to have been eliminated in a population that presumably had become rapidly dispersed through the continent. On the other hand we do have quite a diversity of mt-DNA Ns in Australia. Perhaps D carried N as far as Wallacea, and was then almost completely replaced through the region.

"So we'd imagine something significant should be taking place in Siberia at this time, but there is no obvious single event".

I would expect that at that time selection would have been severe in any population that had managed to move very far north. The diversification may have happened as various representatives of the haplogroups became isolated and so diversified.

"I almost think like I'm onto something here"

Could be.

"The C1-C3 date is still surprisingly low relative to O however. I would have expected a date long before the dates for O, not almost the same, given that there is C4 in Australia and C2 in Melanesia, for which a natural differentiation time would have been ca. 45,000 years ago"

That's why I said earlier that C1 and C3 may be connected, and C2 and C4 are older splits in Y-DNA C. I really think someone should do some serious work on Y-DNA C.

50% to 60% is too much, IMO. If you look at my post above and the overall pairwise histogram, you can see that a factor of 1.2 fits well-dated events really well, whereas a factor of 1.5 would make a mess of things.

When you include the error bars, things don't look as bleak as you make it sound. For example, using my 20% and slight upward correction (10ky error bar!), CF becomes pre-Toba, as it should be to explain the known intricacies of Denisovan admixture, and there is no longer a problem with Australian settlement dating.

Likewise, IJ dating then fits first Europeans perfectly. Don't forget that I has a very long root: what we see is probably all other "IJs" dying out in Europe except for I, perhaps coinciding with the Aurignacian --> Gravettian transition.

As to D, as I often state, it very much looks like a remnant population of the earliest arrivers. That is, most y-DNA D does not exist any longer - but autosomally, things carried on: in particular, those D populations that had admixture lived where there are now y-replaced populations that do have admixture. In other words, Denisovan admixture was a (just slightly) localized event that did not occur for the source populations in the extremes of Tibet or Japan or for the original Andaman Island population. One of the Denisovan papers talks exactly about such a secondary pathway for Denisovan DNA.

"you can see that a factor of 1.2 fits well-dated events really well, whereas a factor of 1.5 would make a mess of things".

That could be correct. Interestingly I've just done a quick check between the Y-DNA dates Dienekes has here and the mt-DNA dates in Behar et al. The CT and DE are earlier than either mt-DNA N or M. But mt-DNA N appears about 3000 years before Y-DNA CF, much earlier than the appearance of mt-DNA M. With Eurologist's adjustments they become contemporaries. E appears around the same time as M but to me it is extremely doubtful that the two haplogroups have any connection. The IJ/K split occurs soon after M appears however. Again, with Eurologist's adjustments they too are contemporaries. Obviously both sets of dates are open to adjustment but the results are interesting. I have long accepted that Y-DNA C and mt-DNA N are connected while Y-DNA F and mt-DNA M are connected. The dates in the two calculations fit that scenario.

Eurologist, actually, I wasn't referring to "letters" at all, rather to representatives of deep splits in the 1K Genomes and the UCSC Browser that were available but not used in the calculations. The "letters" are just mnemonics for the actual deep subclades that are represented. ("G" being G2a3-L30*, as I said.) I am aware that I2a1 is in fact as old as or older than the J1-J2 split and R1 as a whole. I mentioned the G2a3-L30*s because the split of G with the remainder of F is very ancient and we can now calculate the relative ages of F*, the split with G, and IJK*. This will give us a rough intermediate date for H and F2 which will be somewhere in between, and an upper bound for H* in India.


Dienekes' Anthropology blog is dedicated to human population genetics, physical anthropology, archaeology, and history.

You are free to reuse any of the materials of this blog for non-commercial purposes, as long as you attribute them to Dienekes Pontikos and provide a link to either the individual blog entry or to Dienekes Anthropology Blog.
