Tuesday, January 26, 2016

Four major ancestries in mainland India

PNAS has just released a new paper on the population history of India. It's not a bad effort, but it's very speculative and not particularly insightful, mainly because it doesn't include any ancient DNA from South Asia. Let's be honest: nowadays, if you want a really hard-hitting paper of this sort, you need some ancient DNA. It's open access. Here's the abstract.

India, occupying the center stage of Paleolithic and Neolithic migrations, has been underrepresented in genome-wide studies of variation. Systematic analysis of genome-wide data, using multiple robust statistical methods, on (i) 367 unrelated individuals drawn from 18 mainland and 2 island (Andaman and Nicobar Islands) populations selected to represent geographic, linguistic, and ethnic diversities, and (ii) individuals from populations represented in the Human Genome Diversity Panel (HGDP), reveal four major ancestries in mainland India. This contrasts with an earlier inference of two ancestries based on limited population sampling. A distinct ancestry of the populations of Andaman archipelago was identified and found to be coancestral to Oceanic populations. Analysis of ancestral haplotype blocks revealed that extant mainland populations (i) admixed widely irrespective of ancestry, although admixtures between populations was not always symmetric, and (ii) this practice was rapidly replaced by endogamy about 70 generations ago, among upper castes and Indo-European speakers predominantly. This estimated time coincides with the historical period of formulation and adoption of sociocultural norms restricting intermarriage in large social strata. A similar replacement observed among tribal populations was temporally less uniform.

I find it interesting that there is no discussion of what happens when they try to add an additional ancestral population to the model. Also, there are two pretty novel findings: (1) the arrival of strict caste endogamy across India about 70 generations ago, although the 22.5-year generation time assumed seems to be driven by a historically plausible narrative rather than direct evidence for such a short generation, and (2) the Andamanese really cluster strongly towards Papuans and not with mainland ASI to nearly the extent previously assumed.
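To make the generation-time point concrete, here is a minimal sketch that converts the 70-generation estimate into calendar years. The 22.5-year figure is the one quoted above; the 25- and 29-year values are hypothetical alternatives added purely for comparison, not numbers from the paper.

```python
def generations_to_years(generations, years_per_generation):
    """Convert an admixture date in generations to years before present."""
    return generations * years_per_generation

# 70 generations is the estimate from the abstract; 22.5 is the generation
# length quoted above. 25 and 29 are hypothetical alternatives for comparison.
for gen_len in (22.5, 25.0, 29.0):
    years = generations_to_years(70, gen_len)
    print(f"{gen_len} years/generation -> about {years:.0f} years ago")
```

With 22.5 years per generation this works out to 1,575 years; any longer generation length pushes the onset of endogamy further back, which is why the choice of generation time matters for the historical narrative.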

It is also notable that the samples are top and bottom heavy. Lots of Tribals, lots of upper caste, nobody in the middle.

I just started doing some PCA with ancient samples (putting them on a map made from contemporary individuals) and have run into some problems and questions. I believe I'm almost there, but I can't get over the last hurdle. To make things interesting, I'm working with Roman Iron Age samples from Poland, with full autosomal genome sequencing.

Looking closely at the South Asian components in calculators like HarappaWorld, they appear to be ~50% something CHG-like (a cross between Urkarah and Iraqi-Arab for example, with lots of East African), 25% Mongolia/Oroqen-like and 25% Pacific Islander-like (Tongan/Samoan).

I'd say the most novel finding is the split between Ancestral South Indian and Ancestral Austro-Asiatic components. Most populations that aren't dominated by one or the other seem to have both in roughly equal proportions, e.g. Marathas have 21% of each, West Bengal Brahmins have 10% of each, Iyer have 11% ASI and 8% AAA.

If this result is legit, it would mean that Austro-Asiatic ancestry was spread all over India. That's possible, there is quite a bit of Southeast Asian Neolithic and earlier influence in South Asia.

I'm not convinced that the components are real, though. For upper/middle caste groups the admixture times for ASI and AAA components are identical or nearly so, indicating they come from the same source. Tibeto-Burman tribals from the Northeast and Gonds from Central India have different admixture times for these components, but they have existing Austro-Asiatic neighbours that they have actually mixed with.

There is also the Ancestral Tibeto-Burman component, which might well represent some genuine admixture in Austro-Asiatic tribals, since the admixture date for ATB is a good bit later (10+ generations) than for ASI.

-Admixture is again very sensitive to how admixed the least-admixed representative of a component is. Paniya are not pure ASI, Birhor are not pure Austro-Asiatic, etc. Look at the broader PCAs, "ASI" is pulled toward Onge. True ASI was probably Onge's ancestor's cousin. You can even see this effect within the study by comparing the South Asia-only Admixture on SI pg 3 vs the Admixture with East Asia on SI pg 19. On pg 3 the Tibeto-Burmans don't show any Austro-Asiatic component, but on pg 19 they all have it. Both Admixtures would show less True ASI or True AAA than they currently show "ASI"/"AAA". Quite possibly A LOT less.

-On the PCA, Tibeto-Burman form a pretty straight line, not a spray, from E Asia towards "ANI", not towards "ASI". "AAA" Admixture component is quite consistent among T-B pops. This all matches history for the region (AA first, then T-B intrusion, then I-A), if the T-B was a lot of replacement and I-A was elite.

-In the "ANI"s, "ASI" and "AAA" correlate extremely well, even as far west as Marathi and Gujarati and as far south as Pallan and Iyengar. Perfecting the East Asian component from pg 3 to pg 19 doesn't change this, which makes me think it's real and not an Admixture illusion. This, and the endogamy thing, makes me think "ANI" encountered and admixed with BOTH groups quite early on.

-Wow, even though I think true ANI is being underestimated, we still have low-caste Tamil speakers with majority "ANI". Given that and the old Reich (I think?) result that admixture was older in the South than the North, I have to think Dravidian is an ANI phenomenon and has history much further north. If we dare to draw a parallel to the EEF expansion, the expansion itself didn't see much admixture; it was instead thousands of years of geographical proximity post-expansion that added a little additional HG.

-Accordingly, I'm adjusting my Harappa prediction. Old prediction was mostly (CHG relative) plus some (Paniya-CHG). New prediction: Harappa will look overwhelmingly (CHG relative) with some (Paniya-CHG) and some (Birhor-CHG).

Well, surely there were some non-IE-speaking folks residing there before Indo-Europeans came to India. It's very difficult to show who they were; the best guess is Munda-related, but it can't be proven. However, the IE came there well before 2000 BC... Harappa DNA will have ANE, be sure of that.

And be sure also that the difference in the Indian population before and after the 4.2 kiloyear event will be huge. The Sindhu-Sarasvati population of the north started to migrate from the late Mature Harappan period, which was the start of the formation of the Indian cline that we mostly see now. Actually, every time there is an event there seems to be a migration.

Yeah, and looking into it, the young age of Austroasiatic definitely complicates things. So I'm thinking of other possibilities:

-maybe there was structure in the HGs and Admixture is having difficulty picking up on the northwest HG cuz it was more thoroughly diluted.

-maybe I'm overthinking and Admixture really just likes to turn clinal endpoints into components.

As for the paper, I agree with Chad that it is an ADMIXTURE-driven bogus paper. It uses the wrong methodologies to come to its conclusions and misinterprets its analysis results. The only good thing about the paper is its new samples.

This fits perfectly with what we know of the linguistic history of India. Although it is convenient to say that there are only two "major" language groups in India, all major languages have components from the other two families (Indo-Tibetan and Austro-Asiatic (Mundari)). Take my own mother tongue, Malayalam, which is believed to have a top-to-bottom Sanskrit-Dravidian-Munda structure. However, certain words are shared even with Chinese and other languages of the Indo-Tibetan family (e.g. the word for Intimate "you" equivalent to Hindi "tu" is "ni" same as in Chinese and even some of the northeastern Indo-Tibetan-Burmese languages). Our founding myth (the story of Mahabali and Onam) is shared with some relatively pure Austro-Asiatic Munda tribes. The split between Austro-Asiatic and Dravidian (Ancestral South Indian) is real, supported by linguistic, cultural, surname and other evidence, and very obvious to those of us who are South Indians. The real aberration was that Reich et al. had failed to show this. Anyone dismissing this is at best ignorant, or at worst racist.

I think that this paper is quite valuable, since it puts to rest the Reich et al. theory that South Asian populations are just a simple mix of two ancestral populations, namely ANI and ASI, by bringing in the genuine and equal role of Ancestral Austro-Asiatic and Ancestral Tibeto-Burman in the genetic admixture history of South Asia, particularly India.

The results clearly show that ASI is not a more significant contributor to the genetic heritage of most of South Asia than the Ancestral Austro-Asiatic. I had always suspected, merely from the anthropology, that both Austro-Asiatic and Tibeto-Burman ancestral populations played a more dominant role in the archaic East Eurasian admixture of the northern half of South Asia, whereas genuine ASI played the more important role in the archaic East Eurasian admixture of peninsular southern Indian populations.

No one dismissed anything. It is only because of their recent East Eurasian (Mongoloid) origins that the Tibeto-Burman and Austro-Asiatic populations of India do not fall on the ANI-ASI cline; their native Indian genes are no different from those of Indo-Aryans and Dravidians. Austro-Asiatics and Tibeto-Burmans are latecomers to India from East Eurasia. They are not some Paleolithic or Mesolithic remnants; there is no mystery about them.

Also, neither ANI nor ASI exists in pure form today; ADMIXTURE components based on modern populations are not equal to them. It is even doubtful that ANI ever existed as a single population; apparently multiple populations contributed to the mix we now call "ANI".

Now I see where the problem lies. Non-South Asian researchers are much more interested in interpreting the genetics of modern populations in terms of archaic genetic components, whereas the Indian authors seem to share no such interest. Indian researchers seem to care only about modern populations and their most recent connections to each other, which may also explain how they interpret their results in the paper.

I don't find anything wrong with the results of this Indian paper. They have based their paper strictly on modern populations and are trying to understand their inter-relationships in current time and space. I actually find it more problematic when some amateur genetics enthusiasts run ADMIXTURE by throwing in ancient and modern populations simultaneously, which always messes up the results. At least when the ADMIXTURE run is based entirely on modern populations, the components all share the same time frame by default, i.e. the current time. Why do people never ensure that the time frame is the same when they use ancient populations to generate ADMIXTURE components? I always see Bronze Age, Copper Age, Neolithic, Mesolithic and even some modern populations like Bedouin, Paniya etc. all thrown in at once to generate the distinct components. I think all the reference samples used to generate components in an ADMIXTURE run should come from the same time period.

Reich et al didn't ignore Austro-Asiatic and Tibeto-Burman ancestry, they noted that these people had distinctive East Asian ancestry that pulled them off the ASI-ANI cline. They modelled Indians as a mix of just two ancestral populations because it was a necessary simplification to make the method work, but they note that in reality there would probably have been many different populations.

Whether the Austro-Asiatic component in non-East Indians found by this study is real remains to be seen. ADMIXTURE by itself is not particularly reliable in determining such things, and as noted above the ASI and AAA components in non-East Indians were not distinguished by linkage disequilibrium. Not that it is an easy thing to distinguish.

However, Austro-Asiatic speakers are distinguished by very high frequency of Y haplogroup O2a1-M95, which is quite rare outside of East India. Y haplogroups don't necessarily track autosomal ancestry of course, but this still casts some doubt on the idea that true Austro-Asiatic ancestry is really widespread and important.

"(2) the Andamanese really cluster strongly towards Papuans and not with mainland ASI to nearly the extent previously assumed".

To me that is the most interesting aspect of the paper. It demonstrates that it is unlikely the ancestors of the Andaman/New Guinea population had any ancient connection to India. Or, if they did, that contribution has been obliterated by later migrations.

"I'd say the most novel finding is the split between Ancestral South Indian and Ancestral Austro-Asiatic components".

Didn't we always know that at least the Austro-Asiatic language was intrusive, and from SE Asia?

"However, Austro-Asiatic speakers are distinguished by very high frequency of Y haplogroup O2a1-M95, which is quite rare outside of East India. Y haplogroups don't necessarily track autosomal ancestry of course, but this still casts some doubt on the idea that true Austro-Asiatic ancestry is really widespread and important".

I would suggest that genetic expansion is often led by Y-DNA expansion, although the Y-DNA tends to get diluted at the margins of such expansions. It is quite possible Y-DNA O2a1 has been replaced at the margins of its original expansion. We can also be quite sure that Y-DNA often expands much further than does the language it carries.

"There is also the Ancestral Tibeto-Burman component, which might well represent some genuine admixture in Austro-Asiatic tribals"

The Tibeto-Burman group looks originally totally independent of the Austro-Asiatic group. It is even characterised by different Y-DNA: D1 instead of O2.

"This all matches history for the region (AA first, then T-B intrusion, then I-A), if the T-B was a lot of replacement and I-A was elite".

Northeast Indian Tibeto-Burmans actually tend to have little D1, no more than a few percent, and even then some of it could be ancient in the area. Their typical Y haplogroup is O3a2c1(a). Austro-Asiatics of East India have virtually no O3 (Khasi from Northeast India have a good deal, but that is to be expected). Northeast Indian Tibeto-Burmans on the other hand have quite a lot of O2a1, in agreement with the above order AA > TB.

However, this paper finds some of the TB component in A-A tribals from East India. This could represent some genuine TB ancestry from more recent times, or shared ancestry from the indigenous population of Northeast India, or it could be merely an artifact of admixture.

A significant number of scholars have argued that A-A originated in East or Northeast India, but I agree with those (probably the majority) who favour Mainland Southeast Asia (or perhaps Southern China). As you know O2a1 is extremely unlikely to be native to India.

Their evidence that Onge is closely related to Papuan is the position on a PCA. That means something, but not necessarily very much.

Compare the D statistics

Gorilla, Onge; Dai, Papuan    -0.0341    -8.57
Gorilla, Papuan; Dai, Onge     0.0062     1.761

showing Onge is much closer to East Asians than to Papuans, and Papuans are only marginally (if at all) closer to Onge than to East Asians. The first statistic is exaggerated by the Denisovan in Papuans, however.

In a paper from some time ago Reich et al suggested that Australians/Papuans were about half something distantly related to Onge and half something quite divergent carrying Denisovan admixture. The very distant connection of Onge with Papuans is not evidence against a distant relationship of Onge with ASI, in any case.
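For readers unfamiliar with D statistics, here is a minimal sketch of how one can be computed from per-SNP allele frequencies, using the commonly cited form D(W, X; Y, Z) = sum((w - x)(y - z)) / sum((w + x - 2wx)(y + z - 2yz)). The toy frequencies below are made up for illustration and come from no dataset. The sign convention (with W as the outgroup, a negative D means X shares more alleles with Y than with Z) is an assumption about how the rows above are laid out, and the second number in each row reads like a Z-score, which would need a block jackknife that is omitted here.

```python
def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies (floats in [0, 1]).

    Each argument is a list with one frequency per SNP, all the same length.
    """
    num = sum((a - b) * (c - d) for a, b, c, d in zip(w, x, y, z))
    den = sum((a + b - 2 * a * b) * (c + d - 2 * c * d)
              for a, b, c, d in zip(w, x, y, z))
    if den == 0:
        raise ValueError("no informative SNPs")
    return num / den

# Made-up toy data: W is an outgroup fixed for the ancestral allele (0.0).
w = [0.0, 0.0, 0.0, 0.0]
x = [1.0, 1.0, 0.0, 1.0]
y = [1.0, 1.0, 0.0, 0.0]
z = [0.0, 0.0, 1.0, 1.0]

d_toy = d_statistic(w, x, y, z)
print(f"D = {d_toy:.4f}")  # negative here: X shares more derived alleles with Y
```

In this toy case X matches Y at two SNPs and Z at one, so D comes out negative, the same direction as the first row above.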

"Northeast Indian Tibeto-Burmans actually tend to have little D1, no more than a few percent, and even then some of it could be ancient in the area. Their typical Y haplogroup is O3a2c1(a)".

I agree that Tibeto-Burman speakers included O3a when they moved in. It is very seldom that any migration would include just one haplogroup. And I agree that D1 is not common in India. However I doubt very much its presence there pre-dates the arrival of Tibeto-Burman speakers.

"Northeast Indian Tibeto-Burmans on the other hand have quite a lot of O2a1, in agreement with the above order AA > TB".

Yes, some O2as adopted the incoming language. That is quite normal.

"A significant number of scholars have argued that A-A originated in East or Northeast India, but I agree with those (probably the majority) who favour Mainland Southeast Asia (or perhaps Southern China)".

I agree that the argument for an Indian origin for Austro-Asiatic has long been disproved.

"Their evidence that Onge is closely related to Papuan is the position on a PCA. That means something, but not necessarily very much".

Yes. Papuans and Onge are hardly 'closely related'. Their haplotypes are very different from each other. I personally feel there was a huge amount of toing and froing in SE Asia/South China for a very long time. Populations of various origins in the region have become well and truly mixed up. However, none indicate any Indian origin, except that the Y-DNA F-derived haplotypes must obviously have arrived there from India. Even so, it may have been mainly the Y-DNA that moved, spreading gradually through a pre-existing population.

"showing Onge is much closer to East Asians than to Papuans"

Makes sense considering the Onge's predominant Y-DNA D. But that in turn destroys the claim they are a 'remnant of the first coastal migration'. They came south through Burma before being able to reach the islands. And that was considerably after humans had reached both Australia and New Guinea, probably in that order.

"Papuans are only marginally (if at all) closer to Onge than to East Asians".

Yes, there has been a major move south of East Asians since the Early Paleolithic and they have largely replaced or interbred with the pre-existing SE Asian population. The resulting mixture missed most of New Guinea on its movement out into the Pacific in the form of the Austronesians. But the first Andamanese probably included some element of it.

"The very distant connection of Onge with Papuans is not evidence against a distant relationship of Onge with ASI, in any case".

But it is certainly not evidence for it. In fact it appears to indicate the opposite.

"This all matches history for the region (AA first, then T-B intrusion, then I-A), if the T-B was a lot of replacement and I-A was elite".

I don't think this order is right. AA precedes I-A, but probably by only a few centuries given the relative dates of the appearance of rice agriculture as a litmus test for AA and the appearance of cultural litmus tests for I-A respectively.

T-B is probably the more recent layer by thousands of years - if it wasn't, the geographic distribution of T-B would have much less sharp boundaries and would be far more dispersed. It could be that I-A never made it to some of the corners of the NE subcontinent where T-B is found, with it going directly from Munda to T-B in some areas, but any way you cut it, T-B should be the youngest by a wide margin.

Reich had already oversimplified European genetics in his CHG-EEF-ANE approach, as we learned over the last year. South Asia's population history is surely even more complicated than Europe's, given the earlier arrival of AMH, less restriction on human mobility during the ice ages, and a central position for sea travel around the Indian Ocean. As such, I never believed a simple ANI-ASI model could explain South Asia's complexity. It has, however, helped to point out some general structural patterns that now need to be complemented by further details, of which there should be lots.

"the word for Intimate "you" equivalent to Hindi "tu" is "ni" same as in Chinese"

This leads us quite deep into prehistory. Pronouns are the most stable words. The half-life of "thou" (intimate you) across seven Eurasian language families has been calculated at 10.8 ky, after "I" (77 ky) and "we" (18.7 ky). Only interrogatives (who/where/what/how) are similarly stable.

http://www.pnas.org/content/110/21/8471.full.pdf

A screening suggests the following major pronominal families, restricted to "I"-"thou", as plurals are tricky for inclusive/exclusive distinction, duals etc. Forms have been taken from p.40ff of link 1 and p.254ff of link 2, amended, where necessary, from ASJP Proto-language wordlists (http://asjp.clld.org/), marked by "*". "?" marks families w/o reconstruction to proto-level. Vowels may change, the focus is on the consonants.

http://www.academia.edu/5789756/Once_again_on_the_comparison_of_personal_pronouns_in_proto-languages

http://www.merrittruhlen.com/files/Pronouns.pdf

Interestingly, many families and even languages show several of these patterns. AN, e.g., includes all four, Thai combines (1),(2) and (3). Apparently, we are dealing with deeply-rooted, possibly paleolithic substrate that was gathered up by newcomers.
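As a rough illustration of what the pronoun half-lives quoted above imply, here is a minimal sketch assuming simple exponential decay, i.e. a word has a 50% chance of surviving unreplaced over one half-life. The decay model and the 10 ky time depth are assumptions made for illustration; only the half-life figures come from the comment itself.

```python
def retention_probability(elapsed_ky, half_life_ky):
    """Chance a word is still unreplaced after elapsed_ky thousand years,
    assuming simple exponential decay with the given half-life."""
    return 0.5 ** (elapsed_ky / half_life_ky)

# Half-lives (in ky) as quoted above; the 10 ky time depth is hypothetical.
half_lives_ky = {"I": 77.0, "we": 18.7, "thou": 10.8}
for word, half_life in half_lives_ky.items():
    p = retention_probability(10.0, half_life)
    print(f"{word!r}: {p:.2f} chance of surviving 10 ky")
```

Under these assumptions a form like "ni" surviving for 10 ky or so is not far-fetched, though shared retention is only one of several ways two languages can end up with similar pronouns.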

"This all matches history for the region (AA first, then T-B intrusion, then I-A), if the T-B was a lot of replacement and I-A was elite" was meant to be specific to the far northeast, yeah for the subcontinent as a whole T-B is a newcomer. I'm no expert in the northeast's history, but Wiki has Brahmins arriving in Manipur in the 15th C.

Chad and Onur,

Of course you can't read too much into Admixture. But I was skeptical about Teal and that turned out meaningful. If the "AAA" component was purely an Admixture figment and most populations in the subcontinent can be modeled as (various West Eurasians) and true ASI, I don't think Admixture would've assigned so much "AAA" to the ANI pops across so much geography.

Razib thinks it's drift. I wonder if it couldn't also be structure in the subcontinental HG pops, and the West Eurasians mostly mixed with a HG pop that was related to but not exactly like the HG in Paniya nor Birhor.

Very true, though it's unclear how domesticated that pre-2500BC Ganges rice actually was and how vital it was to their subsistence. I suspect not all that much, because those pops were mobile, didn't deforest much, didn't leave much archaeology, and got mostly replaced, despite the fact that domesticated rice is very, very happy in the Ganges basin (unlike, say, Anatolian crops in Scandinavia). Also, wouldn't we then expect that "AAA" possibly-fake component to show up much stronger in the Bengalis vs the "ASI"?

I'll add that I always found the late deforestation there interesting. I've read several people claim the deforestation came with metals, but I have to think that's a correlate not a cause. Fire, girdling, etc. If the pre-metal Ganges peoples kept their forests I have to think it's because they wanted them.

Not very domesticated, seems to be the main view. Using natural wetlands rather than slash-and-burn, maybe. But still all other things being equal there ought to be increased population density and a greater genetic contribution relative to pure foragers.

There aren't any regular Bengalis in the study, only Bengali Brahmins who you'd expect to be much less admixed with the natives. But they do have more of the AAA component than Gujarati Brahmins.

It's possible, of course. What grabbed my attention wasn't the low absolute "AAA", it's the ~1:1 ratio of "ASI":"AAA" in the ANIs regardless of geography. Makes me think this admixture occurred early and northwest.

Austro-Asiatic populations of India are largely of native Indian stock rather than of Austro-Asiatic invader stock. That is why the much smaller presence, in non-Austro-Asiatic populations of India, of an ADMIXTURE component that peaks in Austro-Asiatic populations of India does not indicate Austro-Asiatic admixture in those populations. In ADMIXTURE, quantitative differences in the distribution of a component in many cases signal qualitative differences. For instance, only high percentages of Gedrosia-like components are a signal of some ASI admixture; in lower percentages such components do not signal any ASI admixture. And I agree with Chad that with formal stats we can make much better estimations of levels of admixture.

Here it's not about Admixture vs. formal stats. It's about how much you can do with modern DNA.

With the right ancient samples Admixture would do a fine job, but with modern samples we just cannot understand the past, regardless of the method used (admixture might be wrong in saying that Kshatriya are 98% ANI, just because they're the most ANI samples in the run, but qpAdm can model Kalash or Pathans as 83% Belarusian, 10% Georgian and 7% Dai with perfect score. So pick your poison).

"If the pre-metal Ganges peoples kept their forests I have to think it's because they wanted them".

I suspect the real reason is that the population was not particularly large. That is why much of the Indian megafauna has survived into modern times. Much the same is probably true of regions of South China and SE Asia where orang-utans, pandas and tapirs also survive.

Compare the D statistics

Gorilla, Onge; Dai, Papuan    -0.0341    -8.57
Gorilla, Papuan; Dai, Onge     0.0062     1.761

showing Onge is much closer to East Asians than to Papuans, and Papuans are only marginally (if at all) closer to Onge than to East Asians. The first statistic is exaggerated by the Denisovan in Papuans, however.

In a paper from some time ago Reich et al suggested that Australians/Papuans were about half something distantly related to Onge and half something quite divergent carrying Denisovan admixture. The very distant connection of Onge with Papuans is not evidence against a distant relationship of Onge with ASI, in any case.

These statistics are likely confounded by Mongoloid admixture in Onge and Papuans (the newest study of Papuans shows that most of them have Neolithic East/Southeast Asian-related admixture, but likely less than Onge), ASI/Onge-related admixture in Dai, and archaic admixture in both. Papuans have an excess of Denisovan, and I remember reading some study claiming that Onge have some unsampled hominid admixture that's neither Neanderthal nor Denisovan. Both seem to have Ust'-Ishim and maybe Oase-related admixture.