August 17, 2010

mtDNA relics in south China

In an earlier post I noted that the occurrence of low-frequency divergent haplotypes in a population might be a "relic of a bygone age". My point was that early settlement of a region may create a diverse gene pool, because there is plenty of time for variation to accumulate. However, this antiquity of settlement may be obscured by later (including fairly recent) expansions of sublineages that appear young in evolutionary terms.

Hence the importance of outliers in age estimation. They may be "relics" of the most ancient population, predating the expansion of the recent lineages (whether driven by selection or by demographic growth), or they may be lineages introgressed from abroad.

In order to discover outliers, you need a large sample. Working with mtDNA, the authors of this paper discovered five new basal (i.e., near the trunk) lineages within the Eurasian macrohaplogroups M and N. This is less than 0.1% of their huge Chinese sample of 6,093 mtDNAs. In a smaller sample, of the size customary in most mtDNA studies, these outliers would probably have gone undetected.
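To make the sample-size point concrete, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that a relic lineage occurs at some fixed population frequency p and that individuals are drawn independently at random. Under those assumptions the chance of seeing the lineage at least once in n samples is 1 - (1 - p)^n. The frequency p = 0.0002 (one carrier in 5,000) is a hypothetical value, not a figure from the paper, and the function names are made up for this example.

```python
import math


def detection_probability(p: float, n: int) -> float:
    """Chance that a lineage with population frequency p shows up at least
    once in a random sample of n individuals (independent draws assumed)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return 1.0 - (1.0 - p) ** n


def sample_size_needed(p: float, target: float = 0.95) -> int:
    """Smallest n for which the detection probability reaches `target`."""
    if not 0.0 < p < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("p and target must lie strictly between 0 and 1")
    # Solve 1 - (1 - p)^n >= target  =>  n >= log(1 - target) / log(1 - p).
    # log1p(-p) computes log(1 - p) accurately when p is tiny.
    return math.ceil(math.log(1.0 - target) / math.log1p(-p))


if __name__ == "__main__":
    p = 0.0002  # hypothetical frequency: one carrier per 5,000 people
    for n in (500, 6093):
        print(f"n={n}: P(detect) = {detection_probability(p, n):.3f}")
    print("n needed for 95% detection:", sample_size_needed(p))
```

With these assumed numbers, a 500-person sample catches such a lineage only about 10% of the time, while a sample of 6,093 catches it roughly 70% of the time. Even very large samples can therefore still miss some relics.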

What is most interesting is that the authors explicitly tried to distinguish between the two competing hypotheses described above: admixture and "relics". The new lineages do not appear to be the result of foreign admixture (e.g., some rare Indian M subclade that somehow found its way into southern China). Instead, they appear to be true relics.

The existence of relics pushes back the estimated time of settlement and of the Out-of-Africa expansion, as more time is needed to "tie in" the relics with the rest of the tree.

This should serve as a warning for age estimation. All too often, peculiar lineages are brushed aside as oddities with a paragroup label, while researchers focus on the more established and phylogeographically informative lineages. Full-mtDNA sequencing is a viable option, but the same procedure is not widely applied to the Y chromosome, which is much larger than mtDNA and hence more difficult (and expensive) to fully sequence.

A 6,000-strong sample is probably not available for most countries and populations, except through the Genographic Project, which seems to be missing in action of late. There are also large commercial samples, which benefit from the desire of paying customers with unusual haplotypes to look deeper into their ancestry. Unfortunately, these same customers are WEIRD (Western, Educated, Industrialized, Rich, and Democratic), and tell us little about most of mankind, including the most interesting and mysterious aspects of human prehistory.

Nonetheless, there is hope for the future, as sample sizes continue to increase and genotyping costs continue to decrease. While there is reason to share Craig Venter's bleak assessment of the accomplishments of genomics, the single, clear field where human genetics has triumphed and will continue to triumph is that of human origins.

UPDATE: Gene Expression notes that commercial companies like 23andMe have even larger samples, and that customers can download 550k SNPs for their sample. However, most of the people who buy 23andMe tests are, in the global context, near clones of each other, being predominantly of western European origin. Moreover, the hundreds of thousands of SNPs on the chip used by 23andMe include only a limited number of mtDNA and Y-chromosome SNPs. These were chosen for their informativeness: they define already-studied clades of the phylogeny, and are thus unsuitable for discovering new clades, as was done in this paper. I'm pretty sure there are paragroups aplenty in both the 23andMe customer base and the Genographic Project. As far as I know, however, neither of the two aggressively mines its data for SNP discovery or phylogeny refinement. There are also ethical limitations to consider, as people who sign up for either service do not necessarily consent to their DNA sample being used beyond the narrow scope of the service provided.
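The ascertainment problem can be sketched with a toy classifier. This is a minimal illustration under stated assumptions, not how 23andMe or any real haplogroup caller works. The tree, the marker names (`mk_M`, `mk_D`, ...) and the helper functions are all hypothetical. The chip is assumed to report only the markers that define already-known clades. Under those assumptions, a basal relic that carries the M-defining marker plus its own novel mutations comes back only as the paragroup "M*", because the chip never probes the mutations that would set it apart.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy phylogeny: clade -> (parent clade, defining markers).
# Clade names echo real mtDNA haplogroups, but the topology is simplified
# and the marker names are invented placeholders, not real positions.
TREE: Dict[str, Tuple[Optional[str], Set[str]]] = {
    "M": (None, {"mk_M"}),
    "M7": ("M", {"mk_M7"}),
    "D": ("M", {"mk_D"}),
    "D4": ("D", {"mk_D4"}),
}

# Assumption: the chip probes exactly the markers of known clades, nothing else.
CHIP_PROBES: Set[str] = set().union(*(markers for _, markers in TREE.values()))


def path_to_root(clade: str) -> List[str]:
    """Return the clade followed by all of its ancestors up to the root."""
    path: List[str] = []
    node: Optional[str] = clade
    while node is not None:
        path.append(node)
        node = TREE[node][0]
    return path


def chip_view(true_mutations: Set[str]) -> Set[str]:
    """Keep only the mutations the chip actually probes; novel ones vanish."""
    return true_mutations & CHIP_PROBES


def assign(observed: Set[str]) -> str:
    """Deepest clade whose markers, and all its ancestors' markers, are observed.
    A sample that stops at an internal node gets a paragroup label like 'M*'."""
    best: Optional[str] = None
    for clade in TREE:
        path = path_to_root(clade)
        if all(TREE[c][1] <= observed for c in path):
            if best is None or len(path) > len(path_to_root(best)):
                best = clade
    if best is None:
        return "unassigned"
    has_children = any(parent == best for parent, _ in TREE.values())
    return best + "*" if has_children else best


if __name__ == "__main__":
    relic = {"mk_M", "novel_1", "novel_2"}  # hypothetical basal relic
    typical = {"mk_M", "mk_D", "mk_D4"}     # hypothetical well-studied lineage
    print("relic seen by chip:", sorted(chip_view(relic)), "->", assign(chip_view(relic)))
    print("typical sample ->", assign(chip_view(typical)))
```

The design choice to model the chip as a simple set intersection reflects the point made above. Whatever the chip was not designed to look for is invisible, so the relic's distinguishing mutations can only be found by sequencing, not by genotyping known markers.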

Molecular Biology and Evolution, doi:10.1093/molbev/msq219

Large-scale mtDNA screening reveals a surprising matrilineal complexity in East Asia and its implications to the peopling of the region

Qing-Peng Kong et al.

In order to achieve a thorough coverage of the basal lineages in the Chinese matrilineal pool, we have sequenced the mitochondrial DNA (mtDNA) control region and partial coding-region segments of 6,093 mtDNAs sampled from 84 populations across China. By comparing with the available complete mtDNA sequences, 194 of those mtDNAs could not be firmly assigned into the available haplogroups. Completely sequencing 51 representatives selected from these unclassified mtDNAs identified a number of novel lineages, including five novel basal haplogroups that directly emanate from the Eurasian founder nodes (M and N). No matrilineal contribution from the archaic hominid was observed. Subsequent analyses suggested that these newly identified basal lineages likely represent the genetic relics of modern humans initially peopling East Asia, instead of being the results of gene flow from the neighboring regions. The observation that most of the newly recognized mtDNA lineages have already differentiated and show the highest genetic diversity in southern China provided additional evidence in support of the Southern-Route peopling hypothesis of East Asians. Specifically, the enrichment of most of the basal lineages in southern China and their rather ancient ages in Late Pleistocene further suggested that this region was likely the genetic reservoir of modern humans after they entered East Asia.

8 comments:

Divergent haplotypes, especially in the Y chromosome, where SNPs constrain us to a limited time range, can also be due to what I have called "mutations for the tangent" rather than those around the modal. In mtDNA, the SNPs are "mutations", all countable and all counted.

"The observation that most of the newly recognized mtDNA lineages have already differentiated and show the highest genetic diversity in southern China provided additional evidence in support of the Southern-Route peopling hypothesis of East Asians."

The same applies to the various M*, N* and R* that were earlier reported as frequent in South China. But how in the world can it be consistent with the Southern route hypothesis? For the out-of-Africa-along-the-southern-route hypothesis to be true, we need to find those basal lineages in East Africa, North Africa, Southwest Asia, India and only THEN in South China. As of now, we have a huge geographic gap between L3, on the one hand, and M and N areas, on the other, which pretty much falsifies the out-of-Africa-along-the-southern-route hypothesis.

The reason I'm calling it "the out-of-Africa-along-the-southern-route hypothesis" is that this is the analytical unit that needs to be proven. Scholars tend to break it down into the "out of Africa" idea, which is supposedly proven, and a couple of possible migration routes out of Africa, which are under discussion. What this paper has reminded us of once again is that, in the absence of those basal non-African lineages in areas directly adjacent to Africa, out-of-Africa remains a hypothesis in need of a proof.

This study also suggests that India may not be the center of dispersal of M lineages (as previously argued by some researchers). Instead, East/Southeast Asia may in fact be a center of dispersal for both the M and N macrohaplogroups. What we currently have are two or three geographically structured mtDNA clusters with roots in the Mid-Late Pleistocene: L, which expanded in Africa, and M/N, which expanded in East Asia.

"While there is reason to share Craig Venter's bleak assessment of the accomplishment of genomics, the single, clear field where human genetics has triumphed and will continue to triumph is that of human origins."

Just the opposite: per above, human origins research is one of the clearest demonstrations of the truth behind Venter's bleak assessment of genomics' accomplishments to date.

As to M and N originating in Southeast Asia rather than in India: given the huge reservoir there before much of it was flooded, there is surely a strong possibility of a mixture of both (i.e., India plus back-migration).

...our observations ... raise a possibility that southern (especially southwest) China was probably the genetic reservoir of modern humans when they first populated East Asia. However, one has to admit that the lack of extensive data from Southeast Asia, especially for the some key regions e.g. Myanmar, make any precise localization of this initial scenario at best tentative.

I don't think the authors have really shown anything that supports a role for southern China as a reservoir. Rather, the data once again suggest something in line with their mention of Myanmar, and with what I have suggested numerous times. In addition to the very southern route, there is a strong possibility that another group of people migrated (just ever-so-slightly later) via the Brahmaputra valley into northern Myanmar and, from there, into the adjacent regions of China.

I know it goes against all mainstream science, but if we look at a map of the most densely populated regions in the world, we would place the cradle of humanity somewhere in India or China.

If civilization is the accumulation of wealth and knowledge, and it gives better chances for population growth, then it seems only natural to me to look for our origins in the most densely populated areas of the world, which are also the cradles of ancient civilizations.


Dienekes' Anthropology blog is dedicated to human population genetics, physical anthropology, archaeology, and history.

You are free to reuse any of the materials of this blog for non-commercial purposes, as long as you attribute them to Dienekes Pontikos and provide a link to either the individual blog entry or to Dienekes Anthropology Blog.
