May 10, 2010

Origin and dispersal of Y chromosome haplogroup C (Zhong et al. 2010)

A beautiful new paper has appeared that tackles the distribution and substructure of Y chromosome haplogroup C, a widely dispersed lineage that binds Asia, Oceania, and the Americas. This will be invaluable as a resource for students of East Eurasian anthropology and genetics. My only problem with the paper is its use of the evolutionary mutation rate, which I have criticized elsewhere.

From the paper:

Hg C is prevalent in various geographical areas (Figures 1 and 2), including Australia (65.74%), Polynesia (40.52%), Heilongjiang of northeastern China (Manchu, 44.00%), Inner Mongolia (Mongolian, 52.17%; Oroqen, 61.29%), Xinjiang of northwestern China (Hazak, 75.47%), Outer Mongolia (52.80%) and northeastern Siberia (37.41%). Hg C is also present in other regions, extending longitudinally from Sardinia in Southern Europe all the way to Northern Colombia, and latitudinally from Yakutia of Northern Siberia and Alaska of Northern America to India, Indonesia and Polynesia, but absent in Africa.

On the structure of haplogroup C:

As shown in Figure 1, most of the subhaplogroups of Hg C have a geographically pronounced distribution. Hg C6, which is defined by a recently identified marker, was not detected in our samples. Hg C1 and C4 are completely restricted to Japan and Australia, respectively, and not detected in the other samples from East Asia and Southeast Asia. Hg C5 occurs in India and its neighboring regions Pakistan and Nepal. In mainland East Asia, four Hg C5 individuals were detected, including two in Xibe, one in Uygur and one in Shanxi Han. Although the dispersal of Hg C2 is relatively wide, its distribution remains limited to Oceania and its neighboring regions, except Australia. In our samples, only three Hg C2 individuals were observed in Eastern Indonesia, which is consistent with previous reports. Hg C3 is the most widespread subhaplogroup, which was detected in Central Asia, South Asia, Southeast Asia, East Asia, Siberia and the Americas, but absent in Oceania. Different subhaplogroups of Hg C that do not overlap between the regions suggest that these individuals have undergone long-time isolation. As these subhaplogroups have a common origin by sharing the M130-derived allele, their geographical distributions enable us to infer the prehistoric migration routes of this lineage.

The MDS plot is quite instructive. Notice the duality of Japanese C chromosomes, which parallels what we know about the dual origins of the Japanese. It would be instructive to test European haplogroup C outliers to see where they fall within haplogroup C diversity.

The C3 MDS plot is also instructive: it shows clearly how C3 diversity is distributed among Chinese ethnic populations.

Off the top of my head, I detect a top-right Mongolian-Manchu-Tibetan quadrant (note that Mongolians and Tibetans are also linked by the rare haplogroup D) and a left Central/Southern Chinese ethnic quadrant. Notice the closeness of Hani to Yi, which may validate the former's oral traditions.

Finally, getting back to the controversial issue of Y-chromosome age estimation, here are the dates proposed by the authors for the age of STR variation within haplogroups and their divergence times. In my opinion these are overestimates due to the use of the evolutionary rate.

A case in point is haplogroup C3b-P39; according to the authors' date, this ought to be related to the early arrival of the ancestors of Amerindians, but haplogroup C in the Americas has a strong relationship with Na-Dene speakers such as Athapaskans, and it seems to me that a late spread of this haplogroup is more consistent with its limited geographical distribution and strong linguistic associations.

Global distribution of Y-chromosome haplogroup C reveals the prehistoric migration routes of African exodus and early settlement in East Asia

Hua Zhong et al.

The regional distribution of an ancient Y-chromosome haplogroup C-M130 (Hg C) in Asia provides an ideal tool of dissecting prehistoric migration events. We identified 465 Hg C individuals out of 4284 males from 140 East and Southeast Asian populations. We genotyped these Hg C individuals using 12 Y-chromosome biallelic markers and 8 commonly used Y-short tandem repeats (Y-STRs), and performed phylogeographic analysis in combination with the published data. The results show that most of the Hg C subhaplogroups have distinct geographical distribution and have undergone long-time isolation, although Hg C individuals are distributed widely across Eurasia. Furthermore, a general south-to-north and east-to-west cline of Y-STR diversity is observed with the highest diversity in Southeast Asia. The phylogeographic distribution pattern of Hg C supports a single coastal ‘Out-of-Africa’ route by way of the Indian subcontinent, which eventually led to the early settlement of modern humans in mainland Southeast Asia. The northward expansion of Hg C in East Asia started ~40 thousand of years ago (KYA) along the coastline of mainland China and reached Siberia ~15 KYA and finally made its way to the Americas.

Thanks Natsuya. I've just looked at Supplementary Table 1 of this study, where detailed information about the included populations (including their classification and geographic region) is found. According to it, the populations classified in this study as Altaic are Evenks (Ewenki), Manchus and Mongolians, and the populations classified in this study as Western Altaic are Uyghurs (Uygurs), Kazaks, Kyrgyz, Xibe (Xibo) and, surprisingly, Hui (as they are predominantly Chinese-speaking Muslims and none of them speaks an Altaic language). All of the Turkic-speaking Western Altaics are from the Xinjiang region. So the westernmost Western Altaics on the MDS plots are those that live in Xinjiang.

The age estimates for both C-M8-derived chromosomes (basically Jomon Japanese), at 27-59 kya (when artifact dating supports about 30 kya), and C3b-P39 in Native Americans, at 9-20 kya, seem right on target so long as one treats them as confidence intervals rather than giving undue weight to the mean. C5 at 14-52 kya also seems right in line with where it should be, and would be problematic if one went much younger.

The key issue with the confidence intervals IMHO is to recall that the confidence intervals are not simply measurement error bars; they reflect the fundamental inexactitude of the fundamentally probabilistic mutation rate that is being measured. There should be results all across the confidence intervals if the theory is sound; they are irreducibly uncertain. Indeed, one should expect about a third of the results to be outside of a one SD confidence interval and about one in twenty to be outside a two SD confidence interval.
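The "about a third" and "about one in twenty" figures can be checked with a few lines of Python. This is only a sketch, and it assumes the errors are normally distributed, which is a simplification (real age estimates are usually skewed); the function name is my own and is not taken from the paper.

```python
import math

def fraction_outside(k_sd: float) -> float:
    """Two-sided probability that a normally distributed value lies
    more than k_sd standard deviations from its mean.
    Assumption: normal errors, chosen purely for illustration."""
    return math.erfc(k_sd / math.sqrt(2.0))

for k in (1, 2):
    # Print the share of results expected to fall outside a k-SD interval.
    print(f"outside {k} SD: {fraction_outside(k):.4f}")
```

Under that assumption the shares come out near 0.32 for one SD and near 0.046 for two SD, so "a third" and "one in twenty" are fair rules of thumb.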

The fact that a few mean estimates are significantly out of whack with what we know to be more likely does not imply that the mutation rate has to be wrong.

What is so odd about hap C in NW Norway? Hap C is common in a lot of arctic or near arctic populations. If you look down on a globe instead of looking at a typical world map, this makes a great deal of sense.

Arctic populations adjacent to each other have been sharing genes with neighboring populations for millennia, and it is only to be expected that some of those genes would end up in NW Norway.

"The key issue with the confidence intervals IMHO is to recall that the confidence intervals are not simply measurement error bars; they reflect the fundamental inexactitude of the fundamentally probabilistic mutation rate that is being measured."

Even if the mutation rate/model was known with exactitude there would be residual uncertainty about the age estimate, depending on the time scale of the process in question.

For example, if the mutation rate is 1/200 generations, then this is like using a sundial to time the 100m dash at the Olympics: it's simply not a very good clock.
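To put a rough number on the sundial analogy, here is a minimal sketch that treats mutations on a single lineage as a Poisson process. That is a deliberate simplification and not the estimator used in the paper; the 1/200 rate is just the illustrative figure from above, and the function name and the choice of 400 generations are my own assumptions.

```python
import math

def relative_clock_error(generations: float,
                         rate_per_generation: float = 1 / 200,
                         loci: int = 1) -> float:
    """Relative standard error (SD / mean) of a Poisson mutation count.
    Assumptions: one lineage, independent loci, exactly known rate."""
    expected_mutations = generations * rate_per_generation * loci
    # For a Poisson count, SD = sqrt(mean), so SD / mean = 1 / sqrt(mean).
    return 1.0 / math.sqrt(expected_mutations)

# Hypothetical time depth of 400 generations, with one locus and with eight.
for n_loci in (1, 8):
    err = relative_clock_error(400, loci=n_loci)
    print(f"{n_loci} loci: relative error {err:.2f}")
```

With one locus only two mutations are expected over 400 generations, so the count alone is uncertain by roughly 70%. Eight loci bring that down to about 25%, which is still a coarse clock even though the rate was taken as known.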

As I have noted before there are multiple sources of error in age estimation, and the mutation rate is only part of the problem.

1: "Notice the duality of Japanese C chromosomes, which parallels what we know about the dual origins of the Japanese. It would be instructive to test European haplogroup C outliers to see where they fall within haplogroup C diversity."

So cool that we finally see evidence for a dual migration to Japan: an early Upper Paleolithic migration to Japan and a much more recent one.

2: "A case in point is haplogroup C3b-P39; according to the authors' date, this ought to be related to the early arrival of the ancestors of Amerindians, but haplogroup C in the Americas has a strong relationship with Na-Dene speakers such as Athapaskans, and it seems to me that a late spread of this haplogroup is more consistent with its limited geographical distribution and strong linguistic associations."

So, based on the data of the Henn paper, I suspect another dual migration, this time of Na-Dene speakers: early migrants mixed with later Na-Dene speakers. I'm not sure in which group, and at what time, Na-Dene Y-haplogroup C arrived, although for the purposes of understanding the Y clock it would be helpful to find out.

It is problematic to associate the Na-Dene with Eskimo rather than Amerind populations.

The Ket of Siberia, whose language (Yeniseian) is part of the same language family as Na-Dene, are not ethnically or linguistically closely related to modern Eskimo (aka Inuit) populations.

The presence of the Y-DNA C3b (P39) haplogroup in the Na-Dene implies a lack of genetic identity with the Inuit or other modern Eskimo populations (and a likely ancestral connection to South American tribes, one of which has also shown an mtDNA X2 haplotype). Similarly, mtDNA haplotypes D2a in the Aleuts and Eskimos and D3 in the Eskimos are private haplotypes not found in the Na-Dene, and mtDNA haplotype A2b is found in the Inuit and Siberians but not the Na-Dene, which also suggests a lack of population genetic identity. The mtDNA A2, B2, C1 and D1 divergence dates are all similar, and the alignment of those divergence dates with the divergence dates of the related Northeast Siberian clades basal to them also suggests a single episode of pre-Eskimo arrival for Native Americans. The fact that the Na-Dene, Inuits and Siberians share mtDNA type A2a "is probably due to secondary expansions of haplogroup A2 from Beringia long after the end of the LGM" according to Achilli (2008) (i.e. it back-migrated, quite possibly due to introgression from Na-Dene into Inuit populations after the Inuit reached North America ca. 3000 BCE).

So using archeological evidence of Eskimo culture arriving ca. 3000 BCE to imply Na-Dene culture arriving at that time is problematic. There is indisputable evidence of modern human populations in the Americas from 13500 years ago in the Pacific Northwest, and I am aware of no archeological evidence that distinguishes a later date.

The Na-Dene have about 25% mtDNA X2a, a haplotype shared also by 15% of Algonquins, 10% of Sioux and 6-10% of other Northwest Coastal Indian groups, as well as by a variety of Asians (the only group to have X*, X1 and X2 in abundance are the Druze in Israel, so it probably originated in Asia).

Y-DNA Q (M242) is also present at about 25% in the Na-Dene and, while unusual in being found in both Northeast Asia and the Americas, has an estimated date of origin consistent with being part of the original migration.

The migration of the Na-Dene from the Pacific Northwest to the Navajo areas in the American Southwest dates to about 1000 AD, but this does not mean that the Na-Dene arrived later in time than other early Native Americans.

A more plausible hypothesis is that there were multiple tribes in Beringia, that the Na-Dene tribe was the only one with a substantial Siberian component (perhaps the leader of other groups) and that some daughter tribes of those tribes took a Pacific route, while others took an Atlantic route, with the Na-Dene daughter tribes being among those taking the Atlantic route but not among those taking the Pacific route. It also appears fairly likely from the archeology that the Clovis (whose archeological evidence is more abundant in the East than in the West and generally clines from East to West) may correspond to the Atlantic route rather than the Pacific one, and that Na-Dene daughter tribes may have been among the Clovis.

The fact that the Na-Dene may have closer ties to modern Siberians than some other tribes of Native Americans may be a product of the other tribes being more distant (from the East and Southeast Asia) and may also be a reflection of the Asian relatives of other early American Indian tribes being completely destroyed, while the Ket have merely been reduced to a few thousand people but survived.

1. Note to Aaron and Andrew: the black in "NW Norway and those northern islands in the arctic circle" is not part of the gradient shading that represents Y-DNA haplogroup C, but rather part of the black outline that marks the boundaries of countries and the edges of bodies of land.

2. This study's reported frequency of Y-DNA haplogroup C in Polynesia is much too low. The average for Polynesia as a whole should be greater than 50% C2; some Polynesian populations, such as the Maori of New Zealand, have only about 40% Y-DNA haplogroup C2, but half of present-day Maori males appear to be patrilineally descended from recent European males, as they belong to haplogroups R1-M173, I-M170, etc. If the European influence were excluded, over 80% of the Maori should belong to haplogroup C2.

3. This study's small sample of Manchus from Heilongjiang Province of northeasternmost China contains a much greater percentage of Y-DNA haplogroup C3 than previous studies' samples of Manchus, most of which have been obtained from the province of Liaoning.

4. The frequency of haplogroup C (essentially C3*-M217 in this case) in samples of Kazakhs has varied from about 23% (Karafet et al. 1999) to about 75% (present study). Kazakhs need to be more thoroughly sampled.

I got such a laugh out of "Han Chinese cultural demic diffusion". Bing Su tried to tone down his usual Chinese nationalism, but not successfully. (I suppose that "cultural" was inserted just to avoid protests from Koreans.)

Among his other masterpieces is "Han Chinese Neolithic" apparently referring to Hongshan culture of Inner Mongolia, a place Chinese were forbidden to enter until the 20th century.

If you posted a comment on this yesterday it may have been lost in a moderation accident. Unfortunately I have had to start moderating comments because of the appearance of certain trolls, but I deleted a bunch of comments by accident when I meant to delete a single one. So, if it's not too much trouble, post your comment again.

Li Jin's own massive data had the percentage of C-M130 (RPS4Y711T) among Han Chinese at 2 percent for southern Chinese and 4 percent for northern Chinese. The discrepancies (with this study) cannot all be attributed to sampling. Every effort appears to have been expended in order to make it appear that Han Chinese were the source, not the recipient, of genes in East Asia.

As I have mentioned in my third point above, the present study's sample of Manchus from Heilongjiang Province reveals a much higher proportion of Y-DNA haplogroup C, but the sample size is so small that I am unsure of the statistical significance of this difference:

"I got such a laugh out of "Han Chinese cultural demic diffusion". Bing Su tried to tone down his usual Chinese nationalism, but not successfully. (I suppose that "cultural" was inserted just to avoid protests from Koreans.)"

---

I also don't agree with the authors' statement of "Han Chinese cultural demic diffusion" as the explanation of Hg C3d's southward migration; that should be related to a prehistoric microlithic culture expanding from the north to the southwest of China.

"Interestingly, besides Hg C1, Japanese also have M217-derived individuals who have a close relationship with the Han Chinese ( Figures 3a and b ), rather than with the Altaic-speaking populations."

I really wonder when these C3* moved from China to Japan. It's possible that part of C3* were brought to Japan by Microlithic hunter-gatherers, and part of C3* were carried to Japan by early farmers from the coastline of China. And how would Ainu C3* fit in the picture?

"I got such a laugh out of "Han Chinese cultural demic diffusion". Bing Su tried to tone down his usual Chinese nationalism, but not successfully. (I suppose that "cultural" was inserted just to avoid protests from Koreans.)"

"I also don't agree with the authors' statement of "Han Chinese cultural demic diffusion" as the explanation of Hg C3d's southward migration; that should be related to a prehistoric microlithic culture expanding from the north to the southwest of China."

Pre-historic microlithic cultural expansion from the north to the southwest. . . Can you expand on this?

Can you help us with your thoughts on dates and motivation for some of these expansions?

"I really wonder when these C3* moved from China to Japan. It's possible that part of C3* were brought to Japan by Microlithic hunter-gatherers, and part of C3* were carried to Japan by early farmers from the coastline of China. And how would Ainu C3* fit in the picture?"

I wonder about these questions too.

In so many ways, the differences between Korean, Japanese, Vietnamese, Thai, Han Chinese and Tibetan culture, to name a few, seem thousands of years old.

In many ways, some of the practices of some Asian cultures seem closer to some North American indigenous groups than to each other.

I would guess so. You'll have to ask Underhill et alii if you want to obtain more details about the Maori cases of R1-M173, though, because they have not published any Y-STRs for any of the Maori individuals whose haplogroup has been "considered of European heritage."

"You'll have to ask Underhill et alii if you want to obtain more details"

Well, that's fine. The Maori, and anybody else, are entitled to a little bit of privacy, I think.

I'm really asking because for years I've been puzzled by the European and Asian looking features in indigenous Americans. Even before DNA, I've always thought that it couldn't be only due to recent admixture. So I'm wondering if there will ever be a way to distinguish recent and ancient admixture in American indigenous people.

"The frequency of haplogroup C (essentially C3*-M217 in this case) in samples of Kazakhs has varied from about 23% (Karafet et al. 1999) to about 75% (present study). Kazakhs need to be more thoroughly sampled."

I should not have put an asterisk in C3-M217 in this context. Please excuse this silly mistake of mine.

According to Wells et al. 2001 and Zerjal et al. 2002, most Kazakhs in Kazakhstan (or at least some part of it) belong to the subclade C3c-M48. This contrasts with the present study's sample of Kazakhs from Xinjiang, most of whom have been placed in C3*-M217.

"In sum, the evidence points to the following: (1) an ancestral G allele at the DYS257 STS, (2) a single origin for the DYS257-A allele in human evolution, and (3) the occurrence of both an A>G transition and a G>A reversion at the SRY10831 site. This information conclusively demonstrates that the root of the tree is haplotype 1A and resolves the reticulation in the haplotype network.

This mutational event defines the origin of haplotype 1B from its precursor, 1A. The second (G>A reversion) mutational event at this site, representing the evolution of haplotype 1D from 1C, did not occur until more than 100,000 years later. The estimated age of the YAP insertional event marking the origin of the YAP+ clade was 5,000 ± 19,000 years. Subsequent mutations on this lineage occurred between approximately 11,000 and 31,000 years ago."

If either SRY10831.1 or SRY10831.2 is negative at the site, the haplotype is not B/BT, i.e., African/Khoisan, but European R1a.


Dienekes' Anthropology blog is dedicated to human population genetics, physical anthropology, archaeology, and history.

You are free to reuse any of the materials of this blog for non-commercial purposes, as long as you attribute them to Dienekes Pontikos and provide a link to either the individual blog entry or to Dienekes Anthropology Blog.
