August 02, 2013

A couple of important new papers on human Y-chromosome phylogeny appeared in Science today.

Francalacci et al. reconstruct the phylogeny of European Y-chromosomes based on a huge sample of 1,200 Sardinians. Naturally, Sardinians don't have every haplogroup in Europe or the planet, but with such a huge sample it was possible to find almost everything, minus obvious newcomers such as Uralic haplogroup N.

Poznik et al. build a human Y-chromosome phylogeny from 69 male genomes. The main thrust of their paper is to reconcile the "younger" Y-chromosome vs. "older" mtDNA in humans. In my opinion, that ship has sailed with the discovery of Y-haplogroup A00 which now makes the Y-chromosome MRCA of humans ("Adam") much older than the mtDNA one ("Eve").

And, indeed, the fact that the two are of different ages is not particularly troubling or in need of remedy, since for most reasonable models of human origins we do not expect them to be of the same age. Well, unless you believe the latest archaeological models that have early proto-sapiens perfecting their craft by scratching lines and perforating beads in some south African cave before spreading out to colonize the planet in one swift swoop.

The issue of the "discrepancy" aside, Poznik et al. resolve the issue of the binary structure of Y-haplogroup F, by showing that Y-haplogroup G (which is the Iceman's haplogroup, and the lineage most strongly associated with early European farmers) branches off first from the tree.

Haplogroup G is an unambiguously west Eurasian lineage, so the fact that it is basal within F surely has implications about the origins of this most successful Eurasian group. The pattern is not quite clear, however, because the next most basal branch is H, which is unambiguously South Asian, and the next one after that is IJ vs. K, with IJ again being west Eurasian, with most east Eurasians nested within K. But, if we go up the tree, we see the split of C (Asian) vs. F (Eurasian), and further up DE (African+Eurasian) vs. CF (Eurasian). It seems to me that apart from the unambiguous African rooting of the entire tree, the rest of the topology paints a picture of a complex peopling of Eurasia, rather than a simple model of successive founder effects.

Another interesting finding of Poznik et al. is the discovery of deep substructure within Y-haplogroup A-L419 (bottom of the picture).

The authors arrive at the following mutation rate:

Using entry to the Americas as a calibration point, we estimate a mutation rate of 0.82 × 10^-9 per base pair (bp) per year [95% confidence interval (CI): 0.72 × 10^-9 to 0.92 × 10^-9/bp/year] (table S3).

This is notable for being lower than the directly estimated mutation rate of 1 × 10^-9 of Xue et al. (2009).
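To see what the lower rate means in practice: under a strict molecular clock, an age estimate scales inversely with the assumed mutation rate. The sketch below is only that first-order rescaling (the function name is hypothetical, and it ignores confidence intervals and any modelling details in the paper). It is applied to the 120 to 156 thousand year Y-chromosome TMRCA given in the Poznik et al. abstract.

```python
# Rates in mutations per bp per year, as quoted in the post.
POZNIK_RATE = 0.82e-9   # calibrated on entry to the Americas
XUE_RATE = 1.0e-9       # direct estimate (Xue et al. 2009)


def rescale_age(age_years, rate_used, rate_new):
    """Rescale an age to a different mutation rate (strict clock assumed).

    The number of observed mutations is fixed by the data,
    so age * rate stays constant when the rate is changed.
    """
    return age_years * rate_used / rate_new


# Y-chromosome TMRCA range from the Poznik et al. abstract, in years.
poznik_tmrca = (120_000, 156_000)
with_xue_rate = tuple(rescale_age(t, POZNIK_RATE, XUE_RATE) for t in poznik_tmrca)
print([round(t) for t in with_xue_rate])
```

With the faster direct rate, the same data would imply a younger Y-chromosome MRCA, roughly 98 to 128 thousand years.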

The usefulness of "archaeological calibration" eludes me, which brings us back to Francalacci et al. who also archaeologically calibrate their mutation rate and find:

Considering that our analysis focused on approximately 8.97 Mbp of sequence from the Y chromosome X-degenerated region, this rate is equivalent to 0.53 × 10^-9 bp^-1 year^-1.

So, the Francalacci et al. mutation rate is about half that of Xue et al., with that of Poznik et al. being intermediate. The Francalacci et al. rate was calibrated by "the initial expansion of the Sardinian population". Now, whether the current Sardinian population is descended from that initial expansion or from a later successful founder remains to be seen. In any case, using their ultra-slow mutation rate, these authors suggest that:

The main non-African super-haplogroup F-R shows an average variation of 534.8 (±28.7) SNPs, corresponding to a MRCA of ~110,000 years ago, in agreement with fossil remains of archaic Homo sapiens out of Africa (7, 18) though not with mtDNA, whose M and N super-haplogroups coalesce at a younger age (13). The main European subclades show a differentiation predating the peopling of Sardinia, with an average variation ranging from 70 to 120 SNPs (Table 1), corresponding to a coalescent age between 14,000 and 24,000 years ago, which is compatible with the postglacial peopling of Europe.

I am personally skeptical of all such archaeological calibrations and I'd like to see the mutation rate directly estimated using a well-behaved process (say, a 1,000-year old deep pedigree between two modern males separated by 60 meioses). It seems that there is no escape from mutation rate controversies in human genetics.
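The arithmetic linking SNP counts to the ages quoted from Francalacci et al. can be sketched as follows. This is a back-of-the-envelope conversion using only the two figures quoted above (0.53 × 10^-9 per bp per year over about 8.97 Mbp); the function names are hypothetical and this is not the authors' actual procedure.

```python
# Figures quoted from Francalacci et al. in the post above.
RATE_PER_BP_PER_YEAR = 0.53e-9
SEQUENCED_BP = 8.97e6  # ~8.97 Mbp of the X-degenerate region


def years_per_snp(rate=RATE_PER_BP_PER_YEAR, n_bp=SEQUENCED_BP):
    """Average waiting time for one new SNP along a single lineage."""
    return 1.0 / (rate * n_bp)


def snps_to_age(n_snps):
    """Convert an average SNP count below a node into an age in years."""
    return n_snps * years_per_snp()


print(round(years_per_snp()))  # about 210 years per SNP
for n in (534.8, 70, 120):
    # Ages rounded to the nearest thousand years.
    print(n, round(snps_to_age(n), -3))
```

The results land close to, though slightly above, the quoted ~110,000 years and 14,000 to 24,000 years, which is what one would expect if the quoted rate is rounded.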

The most striking piece of data from this paper is the following figure:

Going from left-to-right:

R2 in Sardinia! This is extremely rare in Europe and underscores the importance of large sample sizes. It'd be wonderful to study it in the future in the context of, say, South Asian R2 which is much more numerous.

The clear "explosive" expansion of R1b-related lineages

A very deep common ancestry of haplogroups L and T.

Quite deep coalescences within Y-haplogroup J

"Explosive" growth of I2a1a1; this "southwest European" lineage attains its maximum in Sardinia and looks like a clear founder effect. It should definitely be visible in the ancient DNA record of the island.

Fairly deep splits within G2a. It would be interesting to see how G2 compares with Caucasian G1. We now know that G is a very old lineage in West Eurasia (being the first split in haplogroup F), but how much of its present diversity dates back to splits shortly after the haplogroup's appearance?

Finally, the deep splits within African haplogroup E correspond to the likely varied origins of these lineages

There's probably much more of interest in these twin papers, so if you notice anything in the supplementary materials, feel free to leave a comment.

Science 2 August 2013:
Vol. 341 no. 6145 pp. 562-565

DOI: 10.1126/science.1237619

Sequencing Y Chromosomes Resolves Discrepancy in Time to Common Ancestor of Males Versus Females

G. David Poznik et al.

The Y chromosome and the mitochondrial genome have been used to estimate when the common patrilineal and matrilineal ancestors of humans lived. We sequenced the genomes of 69 males from nine populations, including two in which we find basal branches of the Y-chromosome tree. We identify ancient phylogenetic structure within African haplogroups and resolve a long-standing ambiguity deep within the tree. Applying equivalent methodologies to the Y chromosome and the mitochondrial genome, we estimate the time to the most recent common ancestor (TMRCA) of the Y chromosome to be 120 to 156 thousand years and the mitochondrial genome TMRCA to be 99 to 148 thousand years. Our findings suggest that, contrary to previous claims, male lineages do not coalesce significantly more recently than female lineages.

Genetic variation within the male-specific portion of the Y chromosome (MSY) can clarify the origins of contemporary populations, but previous studies were hampered by partial genetic information. Population sequencing of 1204 Sardinian males identified 11,763 MSY single-nucleotide polymorphisms, 6751 of which have not previously been observed. We constructed a MSY phylogenetic tree containing all main haplogroups found in Europe, along with many Sardinian-specific lineage clusters within each haplogroup. The tree was calibrated with archaeological data from the initial expansion of the Sardinian population ~7700 years ago. The ages of nodes highlight different genetic strata in Sardinia and reveal the presumptive timing of coalescence with other human populations. We calculate a putative age for coalescence of ~180,000 to 200,000 years ago, which is consistent with previous mitochondrial DNA–based estimates.

44 comments:

Gs are present in a lot of royalty. My guess is they are a remnant of some wave of invasion that has mostly washed out and am not too surprised by eastern connections.

Not impressed by yet more backsolving to get results that agree with your presuppositions, though.

I2a is found everywhere megaliths are found (which sprang up all over the earth simultaneously), and even a bit in south america. That genetiker guy or whatever his name was based a lot of what he said on that. I don't agree with much of it but it's not much more implausible than OoA and black civilizations in asia (obviously it's more like asia came to africa not vice versa, as happened to europe and americas). I think it's simply too far back to ever know for sure and sort it all out, unless we get some huge discoveries.

"Haplogroup G is an unambiguously west Eurasian lineage, so the fact that it is basal within F surely has implications about the origins of this most successful Eurasian group. The pattern is not quite clear, however, because the next most basal branch is H, which is unambiguously South Asian, and the next one after that is IJ vs. K, with IJ again being west Eurasian, with most east Eurasians nested within K."

Please do not forget the F-others, such as F1-P91/P104, F2-M427/M428, F3-L279/L281/L284/L285/L286/M282/P96, and F4-M481. Of these, F1 and F4 seem to have been found only in South Asians, F2 has been found mainly in Loloish peoples of southwestern China, and F3 has been found with low frequency throughout Western Eurasia as far as I can tell. There may be some additional F* Y-DNA in South Asia and East Asia that does not belong to any of the aforementioned subclades.

Without knowing how the F-others relate to G, H, and IJK, I think it is too soon to declare that the opposition of G vs. H+IJK is indicative of a Western Eurasian origin of the entire F-M89 clade.

"R2 in Sardinia! This is extremely rare in Europe and underscores the importance of large sample sizes. It'd be wonderful to study it in the future in the context of, say, South Asian R2 which is much more numerous." Giving a hint: isn't the Gedrosian/Baloch component present in Sardinians? Good day.

"[F3 is] also found in the big sample of Sardinians where it is a sister group of G."

Quoting JCA's post at the former Quetzalcoatl forum,

"In Table S2 on page 34 of the supplementary material of Poznik et al., it is shown that HGDP00528, a French individual who has been classified previously as F3-M282, exhibits the derived allele (T) and therefore belongs to haplogroup HIJK-M578. It is also shown that two Lahu individuals, HGDP00757 and HGDP01318, who belong to haplogroup F2-M427, exhibit the ancestral allele (C) and therefore belong to paragroup F-M89(xHIJK-M578). I have not found any indication that F2-M427 is more closely related to G-M201 than to HIJK-M578, so, at present, we must assume a trifurcation of F-M89 into HIJK-M578 (Greater Eurasia), G-M201 (Western Eurasia), and F2-M427 (Yunnan)."

So it appears that the positions of at least F2 and F3 have been resolved already, with F3 being a branch of HIJK-M578 and F2-M427 being a branch of F-M89(xHIJK-M578) according to Poznik et al. (2013).

There is a discrepancy between these two papers in regard to their placement of the F3 clade. Poznik et al. have found that a French F3-M282 individual shares the M578 SNP in common with members of haplogroups H, I, J, and K and in contrast to members of A-M13, A-M6, B1-M236, B2-M112, C1-M8, C2-M38, C3-M217, C5-M356, C*-M130, D1-M15, D2-M55, D3a-P47, E1b1a1a1f1a-M191, F2-M427, G1-M285, G2a*-P15, G2a1a-P16, G2a3*-M485, G2a3a-M406, G2a3b1-P303, G2a3b1a2-L497, G2a3b1a1a-M527, and G2b-M377.

Francalacci et al. have found one SNP in common among their samples of F3, G2a2b, and G2a3 Sardinians as opposed to their samples of I1a3a2, I2a1a, I2a1b, I2a2a, I2c, J1c, J2a, J2b, L, T, Q1a3c, R1a1a1, R1b1a2, R1b1c, and R2a1 Sardinians.

One or the other SNP must have back-mutated at least once. I suppose, with the data we have at present, Poznik's opposition between HIJK+F3-M578 and F-M89(xM578) (with the latter paragroup including at least G-M201 and F2-M427) is more robust than Francalacci's opposition between IJK and G+F3 because Poznik et al. have confirmed the absence of the M578 mutation throughout the rest of the human Y-DNA phylogeny, including even haplogroups B, C, and D.

Ebizur, we noticed this discrepancy as well and were curious. Francalacci et al. report a T to C variant shared by F3 and G at coordinate 21551752. We do not report this variant because it did not pass our quality filters (e.g., there were no mapped sequencing reads in 23/69 samples). Interestingly, we do observe evidence for a C allele in both G individuals, but also in the R individual and one of the hgA samples. However, the discrepancy between the two trees is resolved as follows: 20 of the 320 F3 derived alleles reported by Francalacci et al. are possessed by our hgH individual! So some fascinating new structure has emerged from the intersection of these data sets.

I hadn't really paid much attention to this, but one of the striking features to me is that the I/J split is very little removed from K, or the G/ H/ IJK split, in general. This thus may indicate one of the earliest western back-migrations from the NW subcontinent population.

I was really hoping that someone would see fit to answer my question. I am genuinely curious to know what the “calibration date” that the rest of the math is based on is and I do not have access to the papers to check for myself. I share Dienekes’ skepticism about archeological calibrations so I think that it is important to know what the base assumption is before making judgments about the research. Would someone please assist me?

R2 is untypical for Roma. Typical for Roma is H; any other haplogroup among them is mostly typical of their host country. For example, R2* was only found among Roma from Tajikistan, a country which itself has a significant presence of it.

"Without knowing how the F-others relate to G, H, and IJK, I think it is too soon to declare that the opposition of G vs. H+IJK is indicative of a Western Eurasian origin of the entire F-M89 clade".

I have long assumed that G originated in Western Eurasia rather than in South Asia. In fact I have got into many arguments on the matter. But we do know (in spite of Grognard's doubts) that F's ancestors lived in Africa. Surely we should therefore see a progressive 'peeling off' of haplogroups with increasing distance from the continent. I do agree, though, that H branching off before IJK is a bit of a surprise.

Yeah, the dating is a bit suspect. For Native Americans, it is based on Q-M3 -- but that haplogroup is also found in Siberia (e.g., Malyarchuk et al., 2011). But even if it were not present there, given the harsh environment of Beringia, clearly it would be easy for a haplogroup to just vanish (compared to the expansion in the Americas). Theoretically, it could have been present 30,000 ya - doubling the time estimate the same way the true ooA date likely doubles that estimate.

Given that AMHs were present in Siberia ~35,000 ya (based on known, dated sites), and in Beringia likely >~30,000 ya (based on autosomal studies), I don't believe any of the time estimates for Siberian nor Native American haplogroups.

I echo eurologist's concerns about calibration dates. You should be calibrating Native American mtDNA dates from when the population became isolated, no later than their arrival in Beringia and possibly earlier as he suggests, not from their arrival deep into South America.

A 50kya calibration date for OoA likewise is dubious given the archaeological record.

I find it interesting that the Q lineage is basal to the R lineages. This may be due to how they filtered their SNPs rather than a true relation.

What irritates me though is the assumption that Haplogroup Q came from Siberia. There is more Y-STR diversity in America than there is in Siberia indicating that the Q's in Siberia were actually a back migration from America, not the other way around.

Perhaps we need to look for another source of Haplogroup Q to America. It would be interesting to see the Y-STR variability of Q in West Asia. I consider that Q and R are sister clades and emanated out from West Asia, R went to Europe and Q went to America.

@ Aaron "What irritates me though is the assumption that Haplogroup Q came from Siberia. There is more Y-STR diversity in America than there is in Siberia indicating that the Q's in Siberia were actually a back migration from America, not the other way around."

Well, you clearly have missed the past 2 years of discussion as to how it is almost pointless to speculate where a haplogroup 'originates' from based on minuscule differences in STR variation.

Before I learned of the dates used for “calibration” in this study (thank you once again Barthélémy) I thought that there would be much to criticize about the date of entry into the Americas but now I find the 50kya date for OoA the much more dubious assumption. I am continually amazed at the malleability of this date amongst OoA supporters. I have heard everything from 250kya up to 50kya and everything in between. The date always varies and is usually “calibrated” so as to include whatever fossil that particular proponent thinks makes the best case for OoA. As for the date of entry into the Americas I am glad to see the researchers acknowledge the Monte Verde dates but question their use for calibration purposes. In my opinion it is unlikely that the humans responsible for Monte Verde traveled straight to that site from Beringia but since there is currently no evidence to indicate that they did not I cannot rule it out. I think the date of Monte Verde should be considered a bare minimum for entry into the Americas.

Thus, the P/Q split could correspond to a post-Toba differentiation. Although a more conservative timing could point to the G/H/IJK split to represent a post-Toba expansion, I see no reason why the lines shouldn't have split soon after they first entered the Subcontinent, and I prefer to think that as pre-Toba. It is crazy enough that there were 30,000 years between that and ooA, as is (although this ignores the DE (D) and C lineages, which may have blocked F-spread until an opportune moment).

Moving on, R started with the UP, and the R1a/ R1b split conforms with the beginning of the Gravettian. That could indicate the source of the (proto-) E Siberian admixture in Europeans, and would leave sufficient time to make these haplogroups pre-LGM in Europe. Also, compare to the first main I split (230 SNPs = 46,000 y), which conveniently coincides with the first populating of Europe.

"But even if it were not present there, given the harsh environment of Beringia, clearly it would be easy for a haplogroup to just vanish (compared to the expansion in the Americas). Theoretically, it could have been present 30,000 ya - doubling the time estimate the same way the true ooA date likely doubles that estimate".

Quite.

"I find it interesting that the Q lineage is basal to the R lineages".

That's not really the case. Both R and Q derive independently from P.

"What irritates me though is the assumption that Haplogroup Q came from Siberia. There is more Y-STR diversity in America than there is in Siberia indicating that the Q's in Siberia were actually a back migration from America, not the other way around".

Q's phylogeny strongly suggests an origin west of the Altai/Hindu Kush mountain ranges. Same goes for R. Just the derived versions Q1a2-L56 (America and Ket/Selkup) and Q1a1a1-F1096 (northern China) are found further east, although Q1a-MEH2 is listed in Koryaks and Eskimos but downstream mutations may not have been checked.

"I consider that Q and R are sister clades and emanated out from West Asia, R went to Europe and Q went to America".

"It might be bad dating (which applies to all dating) but there's now quite a few sites that put things back a lot further than this."

"Clovis First" has joined the ranks of the undead and, like other zombie hypotheses, will likely be with us for at least another generation. Among the laity and in disciplines outside of North-American anthropology, it will persist even longer.

"I find it interesting that the Q lineage is basal to the R lineages. This may be due to how they filtered their SNPs rather than a true relation.

What irritates me though is the assumption that Haplogroup Q came from Siberia. There is more Y-STR diversity in America than there is in Siberia indicating that the Q's in Siberia were actually a back migration from America, not the other way around."

Absolutely agree with your critique of the assumption that Q is derived from Siberia. In fact it's a clear sign of a back migration. (Just like hg E is a clear sign of a back migration to Africa).

Just to be clear about the R2 found in Sardinia, it's the form of R2a1 (L295+). This is the most common R2a subclade found in India today. In the R2-WTY project this specific clade is also found in Arabia and a very small number in Europe (Britain, Italy, and Iberia).

Whether it's from a Roma background or not I cannot say since we have yet to receive a Roma sample in the project. The only R2 among the Roma I'm aware of goes back to the Spencer Wells study where a small number of Sinti tested high for this haplogroup. But this is an old study so we're unsure if the lineage they carried was R2a1 or not.

Also another important note to mention about R2a in Europe is that it's mostly present among Ashkenazi Jews, but it has been established that the European Jewish R2a is L295-.

I agree with you that Y-STR diversity is not perfect, but the difference between Central Americans and Siberians is not slight, it is massive, the Asians are clearly branches off of the much larger diversity seen in the Americans.

@German Dziebel

On the Phylogeny Tree in the Francalacci et al. paper you can see that the Q lineage has a very short branch length. It only varies from the R-Q split by 7 SNPs. I still think this might be due to their filtering methods. The authors stated that they only considered SNPs valid if there were 2 instances of that SNP appearing. Since there was only 1 Q haplotype individual in their Sardinian sample, their filtering may have shortened the branch length.
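The filtering argument above can be illustrated with a toy example. The sketch below uses invented data (it is not the authors' pipeline, and the `min_carriers` rule is only one reading of the "2 instances" filter described in the comment): a lineage represented by a single sample loses all of its private SNPs under such a filter, while a lineage with two samples keeps the SNPs they share.

```python
# Toy data: each SNP is the set of samples carrying the derived allele.
# Sample names and SNP counts are invented for illustration.
snps = (
    [{"Q_1", "R_1", "R_2"}] * 5   # shared by Q and R (above the Q-R split)
    + [{"Q_1"}] * 20              # private to the lone Q sample
    + [{"R_1", "R_2"}] * 20       # shared by both R samples
    + [{"R_1"}] * 3               # private to one R sample
)


def filter_snps(snps, min_carriers=2):
    """Keep SNPs whose derived allele is seen in at least min_carriers samples."""
    return [carriers for carriers in snps if len(carriers) >= min_carriers]


def branch_length_below_split(snps, sample, split_members):
    """Count SNPs carried by `sample` but not by every member of the split."""
    return sum(1 for c in snps if sample in c and not split_members <= c)


everyone = {"Q_1", "R_1", "R_2"}
kept = filter_snps(snps)
q_before = branch_length_below_split(snps, "Q_1", everyone)
q_after = branch_length_below_split(kept, "Q_1", everyone)
r_after = branch_length_below_split(kept, "R_1", everyone)
print(q_before, q_after, r_after)
```

In this toy the lone Q branch collapses entirely while the R branch keeps its shared SNPs. The real tree still shows 7 SNPs on the Q branch, so the actual filtering cannot be exactly this; the sketch only shows the direction of the effect.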

I should also note that these two dates roughly align with the two MIS 5c (100 kya) and MIS 5a (80 kya) wet phases. So, AMHs outside the D and C male lineages may indeed have been stuck in some still decently wet area of the NE subcontinent before being able to move on.

There is a typo in Dienekes' blog post. I quote: "The main non-African super-haplogroup FR shows an average variation of 534.8 (± 28.7) SNPs, corresponding to a MRCA of ~ 110,000 years ago, in agreement with fossil remains of archaic Homo sapiens out of Africa". 534 SNPs obtained for CT. Apparently he meant haplogroup CT, as it is connected with the output from Africa.

To support Birko's comment, it is worth highlighting the insignificance of R2a-M124 in the Roma of Europe. Of several studies concerning their Y-DNA, I've only ever seen one confirmed sample (1/39 (2.6%), Tokaj Vlax Roma of Hungary).

The Sinti Roma of Central Asia do have an excess of R2a. Given the higher frequency of R2a in Central Asia, it's more likely in my opinion that the high frequency is nothing more than genetic drift on account of either an aboriginal Central Asian line or one from the Indian Subcontinent.

"On the Phylogeny Tree in the Francalacci et al. paper you can see that the Q lineage has a very short branch length. It only varies from the R-Q split by 7 SNPs. I still think this might be due to their filtering methods. The authors stated that they only considered SNPs valid if there were 2 instances of that SNP appearing. Since there was only 1 Q haplotype individual in their Sardinian sample, their filtering may have shortened the branch length."

In Fig.1 of Francalacci, the average no. of SNPs in Q is given as 13.8. Where did you get 7? R2a1 is 8.5. There are fewer SNPs between Q and the QR node than between R1 and the QR node, but it seems that R2 has even fewer. The whole picture is unusual in the sense that Q looks sandwiched between two Rs - R1 and R2.

I agree with you, though, that the diversity of Qs in America is much larger than in South Siberia suggesting a back-migration.

"the diversity of Qs in America is much larger than in South Siberia suggesting a back-migration".

Not necessarily so at all. The diversity in Siberia may have been greatly reduced by the reduction in habitat available during periods of intense cold whereas diversity in America is a product of the haplogroup's huge expansion there.

The chart in the Francalacci et al. paper is a little misaligned, and I can see why you thought the 13.8 is associated with the Q haplogroup, but if you look at the color coding, you can see that the 13.8 is the diversity in the R1a samples they had. The Q individual is a purple branch that is only 7 SNPs off from the Q-R split.

534 SNPs obtained for CT. Apparently he meant haplogroup CT, as it is connected with the output from Africa.

Vladimir,

The paper cannot say much about CT, because it does not have any C nor D or T samples. The 535 SNPs is indeed for the FR / DE split, measured on the FR side. For comparison, E is a quite similar 542. Apparently, E split from DE right at ooA, while F (including all descendants - so better FR or FT, in general) is the only survivor of likely many groups on that side.

"Whether it's from a Roma background or not I cannot say since we have yet to receive a Roma sample in the project. The only R2 among the Roma I'm aware of goes back to the Spencer Wells study where a small number of Sinti tested high for this haplogroup. But this is an old study so we're unsure if the lineage they carried was R2a1 or not."

Thank you, and these Sinti from the Spencer Wells study were Sinti from Tajikistan, which itself has a significant frequency of this haplogroup. So I assume it was more a local thing. At least other studies on Roma show no presence of R2 but H and quite a significant number of other local haplogroups (depending on the country the tested individuals are from).

"Thank you, and this Sinti from Spencer Wells study were Sintis from Tajikistan, which itself has significant frequency of this Haplogroup. So I assume it was more a local thing. At least other studies on Roma show no presence of R2 but H and quite significant number of other local Haplogroups (depending on the country the tested individuals are from."

While the Sinti sample was taken from Central Asia, they were originally deported from Europe by the Nazis during the Second World War.

Giving a hint isn't the Gedrosian/baloch component present in Sardinians?

The answer is no; although the Gedrosia component is present throughout western Europe, Sardinia is one of the places where this component reaches 0%. In my opinion European Gedrosia is associated with R1b, and apparently most R1b reached Sardinia late and didn't get as common as in western/central Europe.

I know they were deported from Europe, but then, since R2a is almost extinct among European Sinti and Roma but common among Central Asians, I assume that they got it somewhere there or it was a founder effect. The only Europeans with a significant frequency of R2a are the Jews.

Dienekes wrote: "Another interesting finding of Poznik et al. is the discover of deep substructure within Y-haplogroup A-L419 (bottom of the picture)."

Actually it just shows what were once called A2 and A3, now A1b1a and A1b1b, which were just in the past year or so discovered by Thomas Krahn to be united by the L419 mutation.

When you consider that these are the most recent major downstream clades of A, with A1a and especially A0 being much deeper, yet widespread in some parts of Africa, it gives you a clue how much their choice to use only those two subclades limits their TMRCA to a more recent timeframe than would have been the case with a more complete set of A haplogroups represented.


Dienekes' Anthropology blog is dedicated to human population genetics, physical anthropology, archaeology, and history.

You are free to reuse any of the materials of this blog for non-commercial purposes, as long as you attribute them to Dienekes Pontikos and provide a link to either the individual blog entry or to Dienekes Anthropology Blog.

Feel free to send e-mail to Dienekes Pontikos, or follow @dienekesp on Twitter.