The genetic formation of Central and South Asian populations has been unclear because of an absence of ancient DNA. To address this gap, we generated genome-wide data from 362 ancient individuals, including the first from eastern Iran, Turan (Uzbekistan, Turkmenistan, and Tajikistan), Bronze Age Kazakhstan, and South Asia. Our data reveal a complex set of genetic sources that ultimately combined to form the ancestry of South Asians today. We document a southward spread of genetic ancestry from the Eurasian Steppe, correlating with the archaeologically known expansion of pastoralist sites from the Steppe to Turan in the Middle Bronze Age (2300-1500 BCE). These Steppe communities mixed genetically with peoples of the Bactria Margiana Archaeological Complex (BMAC) whom they encountered in Turan (primarily descendants of earlier agriculturalists of Iran), but there is no evidence that the main BMAC population contributed genetically to later South Asians. Instead, Steppe communities integrated farther south throughout the 2nd millennium BCE, and we show that they mixed with a more southern population that we document at multiple sites as outlier individuals exhibiting a distinctive mixture of ancestry related to Iranian agriculturalists and South Asian hunter-gatherers. We call this group Indus Periphery because they were found at sites in cultural contact with the Indus Valley Civilization (IVC) and along its northern fringe, and also because they were genetically similar to post-IVC groups in the Swat Valley of Pakistan.
By co-analyzing ancient DNA and genomic data from diverse present-day South Asians, we show that Indus Periphery-related people are the single most important source of ancestry in South Asia (consistent with the idea that the Indus Periphery individuals are providing us with the first direct look at the ancestry of peoples of the IVC), and we develop a model for the formation of present-day South Asians in terms of the temporally and geographically proximate sources of Indus Periphery-related, Steppe, and local South Asian hunter-gatherer-related ancestry. Our results show how ancestry from the Steppe genetically linked Europe and South Asia in the Bronze Age, and identify the populations that almost certainly were responsible for spreading Indo-European languages across much of Eurasia.

First Turk Empire

Though the abstract is focused on South Asia, the preprint actually has quite a bit about Inner Asia, because of the provenance of the samples. We often view the typical person in the past as a peasant in an agricultural society, and therefore relatively immobile over their lifetime. The story we like to tell ourselves is that non-elites in premodern societies, on the whole, had narrow horizons, delimited by their home village, or the neighboring network of villages.

But results from this work and others show that among pastoralists, mobile populations whose individuals ranged across vast areas of Eurasia within their lifetimes were not that uncommon. We know this historically, as empires such as those of the Turks and Mongols were defined by a ruling elite whose writ extended from eastern to western Eurasia. The genetic heterogeneity of the Sintashta samples, with some individuals very different from the norm in their settlement, is exactly what you’d expect from a social and political culture which was united in some fashion over huge distances.

As the sample sizes for ancient DNA have increased, it seems rather clear that the demographic dynamics that we see in later historical expansions of Inner Asian polities extend back to the Bronze Age. With expanding populations across the ecologically friendly landscape, the ancient proto-Indo-Europeans seem to have mixed with the local substrate wherever they went, just as Turks did later. As they moved west, they mixed with late Neolithic Europeans; as they went east, they mixed with Siberian populations; and as they conquered south, they mixed with descendants of West Asian farmers.

One of the primary things that I think one needs to keep in mind is that one can’t just imagine that this was defined by simple diffusion dynamics. Historically the boundary between pastoralists and peasants could be fluid, but when political resistance collapses pastoralists have been able to use their military prowess to swarm across the lands of agriculturalists. In other words, centuries of gradual inter-demic gene flow might be interrupted by a rapid “pulse” admixture. There’s no reason that pre-literate polities couldn’t have existed. The Inca were one such example, and the homogeneity of the Uruk civilization in the 4th millennium BC is strongly suggestive of an imperial hegemony or paramountcy.
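
To make the contrast between gradual gene flow and a “pulse” concrete, here is a minimal Python sketch. It is a toy model, not anything from the preprint: the migration rate, the pulse timing, and the function names are all assumptions chosen for illustration. Continuous gene flow is modeled as a fixed fraction of the recipient population being replaced by migrants each generation, and the pulse as a single event of the same final size.

```python
def continuous_fraction(rate, generations):
    """Migrant ancestry after `generations` of constant one-way gene flow,
    where a fraction `rate` of the population is replaced each generation."""
    return 1.0 - (1.0 - rate) ** generations


def pulse_fraction(size, pulse_generation, generations):
    """Migrant ancestry when all admixture arrives in one event of the given
    size at `pulse_generation` (zero before that, constant afterwards)."""
    return size if generations >= pulse_generation else 0.0


# Hypothetical parameters: 1% gene flow per generation over 100 generations,
# versus one pulse at generation 50 that ends at the same final fraction.
RATE = 0.01
PULSE_AT = 50
TOTAL = 100
final = continuous_fraction(RATE, TOTAL)

for g in (0, 25, 50, 75, 100):
    print(g,
          round(continuous_fraction(RATE, g), 3),
          round(pulse_fraction(final, PULSE_AT, g), 3))
```

Both histories end at the same ancestry fraction, so a present-day sample alone cannot separate them by proportions; what differs is the trajectory through time, which is why a time transect of ancient samples matters.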

Another dynamic is that pastoralists are highly mobile, and so may leapfrog over territory which is unsuitable. Or, they may move so rapidly that there isn’t much mixing with populations in between point A and point B.

This is apparently the case with the Bactria–Margiana Archaeological Complex. These people were mostly descended from people related to the eastern farmers of West Asia, those in modern day Iran. Some of their ancestry had affinities with Anatolian farmers, and there is some evidence even of Siberian admixture in this region. But there are three important take-homes of this preprint in relation to this area: 1) the BMAC did not contribute much genetically to South Asia at all, 2) steppe ancestry, related to that of the Yamna culture of the Pontic region, only shows up in the BMAC ~2000 BC, and 3) there is actually evidence of South Asian (Indus valley?) migration into the BMAC.

The fact that Yamna-like ancestry shows up in the BMAC region so late is a strong reason to suspect that Indo-Iranian peoples did not move to Iran and India until after 2000 BC. In earlier comments on this issue, I was rather vague about timing, because the Corded-Ware people show up in Europe before 2500 BC, and I was going along with the parsimonious idea that this was part of one single cultural and social revolution.

I was wrong. Going back to the Turkic analogy, there were multiple waves of migration and folk wandering by Turkic pastoralists. By different Turkic groups. One of the major ones occurred due to the rise of the Mongols, and the Mongols were not even Turks. The same seems to be true of Inner Eurasian Indo-European groups.

Moving on to South Asia, there are two primary constructs which come out of this preprint: “Indus Periphery” and “Ancient Ancestral South Indians.” I’ll call the former InPe, and the latter is termed AASI. To some extent these complement and replace the earlier terms “Ancestral North Indian” and “Ancestral South Indian” (ANI and ASI). The AASI are the ancient hunter-gatherers of the Indian subcontinent. The authors suggest that the divergence of this group from other eastern Eurasians occurred very early, and that the division between the ancestors of the Papuans, Onge, and AASI was even polytomic (that is, they separated very quickly, without discernible structure).

The InPe samples are from eastern Iran and the BMAC. They’re unique in having AASI ancestry, at variable fractions (indicating contemporaneous admixture). They also resemble samples from Swat Valley which date to 1200 BC and later, with one major difference: the Swat Valley samples have steppe ancestry.

There are no samples from the Indus Valley proper, so the authors suggest that the InPe are reasonable proxies. Additionally, they assert that ASI can best be modeled as a mixture between InPe and AASI. In other words, there were two admixture events. Their Palliyar samples are actually pretty good proxies for the resultant ASI, while the Kalash of Pakistan are good proxies for the ANI, who are presumably now modeled as a mixture of steppe populations with the InPe.
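
The nested structure of this model (InPe as a mix of Iranian agriculturalists and AASI, then ASI as InPe plus more AASI, and ANI as steppe plus InPe) is easier to see with numbers. Below is a minimal Python sketch. The mixing fractions are made-up placeholders, not estimates from the preprint, and the `mix` helper and source labels are my own names.

```python
def mix(pop_a, pop_b, frac_a):
    """Composition of a population drawing `frac_a` of its ancestry from
    pop_a and the remainder from pop_b. Populations are dicts that map a
    source label to an ancestry fraction."""
    labels = set(pop_a) | set(pop_b)
    return {
        label: frac_a * pop_a.get(label, 0.0)
        + (1.0 - frac_a) * pop_b.get(label, 0.0)
        for label in labels
    }


# The three "atomic" sources in the simplified model.
IRAN = {"Iran_farmer": 1.0}
AASI = {"AASI": 1.0}
STEPPE = {"Steppe": 1.0}

# Hypothetical fractions, for illustration only.
inpe = mix(IRAN, AASI, 0.7)    # first admixture event
asi = mix(inpe, AASI, 0.4)     # InPe plus additional AASI
ani = mix(STEPPE, inpe, 0.3)   # steppe plus InPe

for name, pop in (("InPe", inpe), ("ASI", asi), ("ANI", ani)):
    print(name, {k: round(v, 2) for k, v in sorted(pop.items())})
```

The point of the sketch is that under this model both ANI and ASI carry Iranian-farmer-related and AASI ancestry via InPe; only the steppe component is specific to ANI.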

This resolves the enigmatic result that Priya Moorjani reported to me last year: less than 4,000 years ago “pure” ANI and ASI people existed. She was presumably going off admixture timing estimates. These results suggest that in some form ANI and ASI still exist, and the first admixture occurred with the creation of InPe.

Using a new method the authors contend that InPe emerged 4700-3000 BC. If this is true then the Indus Valley Civilization (IVC) was a compound of AASI and Iranian agriculturalists (sampled from the eastern end of the cline of admixture with Anatolians, that is, they had none of that ancestry). These dates also post-date the first arrival of agriculture at Mehrgarh by at least 2,000 years. I suspect that it will turn out there were earlier admixtures, which are not being detected. For various ecological reasons the West Asian cultural complex was portable only to the northwest fringe of South Asia, and there it persisted for ~4,000 years. This served as a natural eastern limit for cultures which were migrating out of the West Asian zone, and a point where AASI hunter-gatherers constantly mixed into the local population.

As the IVC sites begin to get sampled in the future I predict that instead of a homogeneous transect of admixture over time and space we’ll see a lot of heterogeneity.

In the Swat samples, the authors see two correlated trends, an increase in steppe ancestry, and an increase in AASI ancestry. No doubt this dates to the “great admixture” which occurred between 2000 BC, and some time before 1000 AD (the Bengali admixture with East Asians dates to between 0 and 1000 AD, as does that of Brahmins who left the North Indian plain and mixed with local populations elsewhere).

Finally, the authors detect a skew toward steppe ancestry among some populations, in particular, Brahmins. The skew is in relation to Iranian farmer ancestry, the two being the primary constituents of ANI ancestry. In Who We Are and How We Got Here David Reich says some of the ANI admixture is much more recent than the rest, judging by tract length. And also going by the BMAC and Swat samples it seems that the time period for when Indo-Aryans arrived in South Asia has to be in the interval between 2000 BC and 1200 BC.

There’s another aspect of the preprint which allows for dating. The arrival of Austro-Asiatic people in South Asia probably has to postdate the expansion of the same group in Vietnam about 4,000 years ago (though obviously not necessarily). But the Munda Austro-Asiatic people of northeast India exhibit curious genetic patterns. They clearly have East Asian ancestry related to other Austro-Asiatic populations in Southeast Asia, but they have a lot less “West Eurasian” in their ANI/ASI mix. The authors resolve this by suggesting that the Munda arrived in South Asia when there was still heterogeneity among the ASI, and unadmixed AASI still existed.

After 2000 BC the IVC went into decline. Various groups of Indo-Aryans were expanding and admixing. From the other end of the subcontinent arrived rice cultivators from Southeast Asia. At some point, they ran into an ASI population that had some Iranian admixture, but not as much as typical. All of this probably occurred in the period between 2000 BC and 1000 BC. I know that some researchers have argued that the Gangetic plain was inhabited by Munda speaking peoples before it was inhabited by Indo-Aryans. The main issue I’ve had with this is that modern Munda peoples are very genetically distinctive, and there’s no evidence of East Asian ancestry in most populations of the Gangetic plain (the main exceptions are those which have experienced Tibetan influence/contact).

So here is my interpretation of the genetic and historical evidence:

1) IVC emerges out of a matrix that was a synthesis of West Asian farmers and indigenous hunter-gatherers. I would not be surprised if later genetic work recapitulates the findings in Europe of an initial period of separation, and then a “resurgence” of indigenous ancestry as the barriers between the two groups break.

2) The period between 2000 BC and 1000 BC is the beginning of the transformation of the South Asian genetic and ethnolinguistic landscape, with the intrusion of two different groups from different directions, Indo-Aryans from the west and Austro-Asiatics from the east. Austro-Asiatic rice culture was superior to western wheat culture because rice is more delicious than wheat, but the Indo-Aryans ultimately established cultural supremacy across South Asia by the Iron Age.

3) The situation in South India is more complicated and confused. The admixture of groups like Palliyar from InPe and AASI into the classic ASI configuration seems to be more recent than 2000 BC (their low bound dates go as late as 400 BC). The admixture may have occurred in various places, not just in South India. The evidence from this paper suggests that the Andronovo/Sintashta cultural zone was characterized by some genetic heterogeneity due to variation in admixture with neighboring peoples, and the same could be said for the IVC then. I would not be surprised if northern IVC locations had more AASI than southern IVC, as the latter were more insulated from the east due to the Thar desert (the results are consistent with earlier work that suggests modern populations in the lower Indus basin have less Indo-Aryan and more Iranian, with less AASI).

4) We need to be careful about assuming that everything here is a linear combination of distinct and separable atomic units of cultural integrity and wholeness. What I mean is that though Brahmins and some other North Indian groups are enriched for steppe ancestry, it is not only their purview. Rather, it may be that these upper caste groups simply mixed less with the other populations with Iranian and AASI ancestry. The statistics in this paper do not detect enrichment of steppe ancestry in South Indian Brahmins. I believe this is simply an artifact of the reality that South Indian Brahmins mixed with Iranian-enriched elites, like Reddys, when they emigrated to the south.

Though the model outlined in the preprint is much more complicated than a simple ANI/ASI mix, it still simplifies the demographic histories of many populations. For example, my own survey of the data suggests that Brahmins who left the Indo-Gangetic plain mixed with local elites wherever they went (Bengali Brahmins have East Asian ancestry, just as South Indian Brahmins have more Iranian-like ancestry).

5) Language is important but is not determinative. R1a1a-Z93 arrived in South Asia relatively late with groups from the steppe. Its frequency is highest in the northwest, and among upper castes. That is, it is correlated in a coarse manner to steppe ancestry. But R1a1a-Z93 is pervasive throughout South Asia irrespective of caste and region. Even in Dravidian speaking southern populations, some groups have quite a bit of R1a1a-Z93.

The analogy that presents itself here is Southern Europe, where some groups with high frequencies of R1b, such as the Basques and Sardinians, are clearly descended in the main from pre-steppe populations. What this suggests is that a broad socio-cultural prestige network mediated by males extended itself into regions where its cultural hegemony was not assured. Additionally, the autosomal genetic impact was modest, even if privileges given to particular male lineages allowed them to sweep other groups out of the gene pool.

Tamil history precipitates out only a little later than that of North Indian Indo-Aryan civilization. I suspect that this is not a coincidence, and that South Asia after the collapse of the IVC and the arrival of the Indo-Aryans and Mundas could be thought of as a broad mixing cauldron, genetically and culturally. In many regions, Dravidian languages persisted in the face of the expansive Indo-Aryan, but there was a cultural influence, likely reciprocal. This is why, once Indian civilization reemerged, its coherent unity set against peoples to the west and east was not strange despite the linguistic gap between the north and the south.

The only exception here might be the Munda. As I have said, R1a1a-Z93 is pervasive. But it is nearly absent among the Munda, who tend to carry relatively exotic Southeast Asian Y lineages such as O. I believe that the Munda were in some way losers in a cultural conflict, but they maintained themselves in the hills above the Gangetic plain.

Finally, two reflections, one navel-gazing, one big picture. Genome bloggers in the years around 2010 actually anticipated many of these results. There’s some hindsight bias here because you remember the times you are right and not the times you were wrong. We were right that there was more than one ANI pulse. Additionally, we were looking at the ratio between “Eastern European” and “West Asian” ancestry years ago and noticing the skewed patterns, with North Indian Brahmins biased toward the former and South Indian elite non-Brahmins skewed toward the latter. Chaubey 2010 suggested to us that something was different about the Munda not only in their East Asian ancestry but in their ANI/ASI ancestry. They just didn’t seem to have any Indo-European ancestry (steppe), and a lot of ASI. Over the past few years I’ve been suggesting that Dravidian languages were not primal to South India, but the product of a recent expansion (though part of this is due to scientific publications).

The truth was out there. It just took ancient DNA and the analytic chops of the Reich group and their collaborators to prune the tree of possibilities so that we could zero in on a few precise and likely models.

In general, I wonder about the role of clines, diffusions, and pulses. The models that the foremost practitioners of the science of ancient DNA utilize tend to assume pulse admixtures, rather than isolation-by-distance gene flow. This isn’t always a crazy assumption. But there was a discussion in the paper of a west-east admixture cline between Anatolian farmers and Iranian farmers. Is this cline due to admixture, or was it always there? A paper from a few years ago implied that early farmers were highly structured, structure that broke down later.

Also, the polytomy at the base of the eastern Eurasian human family tree, where all the major lineages diverge rapidly from each other, makes me wonder about gene flow vs. admixture. It seems possible that the polytomy may mask a phylogenetic tree topology which had gradually bifurcating nodes, if periodically a single daughter population replaced all its sister lineages in a local geographic zone. Much of history in human meta-populations may be characterized by isolation-by-distance and gene flow, erased by the extinction of most lineages and expansion of a favored lineage.

27 thoughts on “The maturation of the South Asian genetic landscape”

Lot of interesting stuff in it. It’s a pity we don’t have samples from the IVC though. Dravidian and Indo-Aryan seem to have expanded roughly in the same time period. It’s pretty interesting that some of the Dravidian upper caste groups have so much Iran Neolithic ancestry. The Karnataka Brahmin, Havik Brahmin and the Coorgi are roughly from the same geographic area and have about the same amount of AASI, yet the Coorgi have only about half the steppe ancestry of the Brahmin groups. Makes sense that the Dravidian expansion was mediated by people with limited ASI. There is a case for an early Dravidian presence in coastal Sindh and Gujarat but the more mainstream view makes Dravidian the language of the Godavari Neolithic. We’ll probably need more ancient DNA from India to resolve this.

Some people are asking about the near absence of r1a in swat. Is it possible that the dominance of cremation has affected the data?

The map shows that rather than Sintashta/Andronovo types heading straight south through Bactria to India, they headed way east, and then hooked around and headed west and south through the “West Asian Migration Corridor.” This puts them near western China (border of Xinjiang) in the course of their travels, which presumably raises the probability that they had some role in the foundation of the Shang dynasty in China.

1. “IVC emerges out of a matrix that was a synthesis of West Asian farmers and indigenous hunter-gatherers” The indigenous hunter gatherers were ASI? AASI? originated from east or west? what would be their MtDNA?
2. If InPE is itself an admixture of Iranian farmers and AASi, how to differentiate between AASI and INPE?
3. ” I would not be surprised if northern IVC locations had more AASI than southern IVC” This may be due to ungreening of Thar desert very slowly from 10,000 BP to 4,000 BP.
4. 2000 BC-1000 BC ” two different groups from different directions, Indo-Aryans to the west and Austro-Asiatics from the east”. I have heartburn with the timing of this; the exact age of Austro-Asiatic intrusion has not been established; it may be before or after the Indo-Aryan intrusion, and I am not sure if both intruded into the same population.
5. “I believe this is simply an artifact of the reality that South Indian Brahmins mixed with Iranian-enriched elites, like Reddys, when they emigrated to the south”. This is unclear. There is nothing special with South Indian Brahmins except a very very small subset of ANI-dominated populations slowly seeped into the south over 1,000 years and intermarried with any local women. There was not a lot of Brahmin-Reddy intermarriage as they were both conflicting elite wanna-bes. This of course is quite different in Kerala.

The only point missing in this synthesis post is a discussion of the Neolithic pastoralist ashmound population, who have left archaeological samples all over the south. They correspond well to AASI and if we find some DNA (beyond the Northern IVC samples which may still be an ASI-dominant InPe) we will be gold.

Now, the theory that the “original Indians” aka AASI stretched beyond India in the past is now vindicated.
Such thing had to be because of how CHG formed, and how you can find their presence in the ANE.
Actually, precisely because people like Kostenki signal so strong for them, I believe these AASI to be some kind of Basal lineage (along with the levantine Basal and the Crown).

Good to see the finding of West_Siberian_HG (I think it needs a new name), and disappointed by the paper quality overall, but it’s the preprint at least.

Also disappointed that no older samples from India and IVC were included, do they actually have them or not? Can we expect them in the publication?

Now, the theory that the “original Indians” aka AASI stretched beyond India in the past is now vindicated.

no it’s not. the ppl were IndPe. unfortunately we don’t have pure AASI today though some south india groups come close.

Such thing had to be because of how CHG formed, and how you can find their presence in the ANE.
Actually, precisely because people like Kostenki signal so strong for them, I believe these AASI to be some kind of Basal lineage (along with the levantine Basal and the Crown).

?

Good to see the finding of West_Siberian_HG (I think it needs a new name), and disappointed by the paper quality overall, but it’s the preprint at least.

1. “IVC emerges out of a matrix that was a synthesis of West Asian farmers and indigenous hunter-gatherers” The indigenous hunter gatherers were ASI? AASI? originated from east or west? what would be their MtDNA?

the HG were AASI. also, look at the excel table in the supporting info. the M haplogroup shows up with InPe. so they were M.

2. If InPE is itself an admixture of Iranian farmers and AASi, how to differentiate between AASI and INPE?

InPe shares drift with west eurasians. also some of the J haplogroups in the males. lots of ppl have long suspected that ASI was somewhat west shifted compared to andaman/east asian/papuan clade.

3. ” I would not be surprised if northern IVC locations had more AASI than southern IVC” This may be due to ungreening of Thar desert very slowly from 10,000 BP to 4,000 BP.

i don’t know the paleoecology, but my impression is that the thar has been a serious barrier. you see it in modern dna.

4. 2000 BC-1000 BC ” two different groups from different directions, Indo-Aryans to the west and Austro-Asiatics from the east”. I have heartburn with the timing of this; the exact age of Austro-Asiatic intrusion has not been established; it may be before or after the Indo-Aryan intrusion, and I am not sure if both intruded into the same population.

the indo-aryan migration had to happen after 2000 BC according to this paper. austro-asiatic people push into vietnam 2000 BC. it is possible they got to india first from southern china. but i doubt it.

i made that clear in my post. did you read that?

This is unclear. There is nothing special with South Indian Brahmins except a very very small subset of ANI-dominated populations slowly seeped into the south over a 1000 years and intermarried with any local women. There was not a lot of Brahmin-Reddy intermarriage as they were both conflicting elite wanna-be.

it seems pretty clear the introgression is not from groups like pulliyar, but more like reddy or nadar. the ancestry fractions of various non-brahmin groups in the south does differ.

Some people are asking about the near absence of r1a in swat. Is it possible that the dominance of cremation has affected the data?

some ppl are asking about quality. but i wouldn’t be surprised if indo-aryans came in waves, and different Y groups were represented. in europe haplogroup I seems to have piggy-backed on indo-european expansion.

those asking about ancient DNA: it’s out there. but the collaborations are very difficult and seem to collapse when an impasse shows up. hopefully, the indian groups just publish their own results and release the data.

The West Siberian HG is interesting because it’s an old population and gives no f3 mixture signals, they should have checked if its East Asian portion is similar to the East Asian they found last year in MA-1 and WHG (Paleolithic).

Thanks for the summary post Razib.
I am still trying to digest what is going on because there doesn’t seem to be a coherent story yet. From your previous posts it was clear that ANI = Iran_neo + Steppe, but ASI seems to be all sorts of things. If ASI = InPe + AASI, then what is InPe then?

Is InPe = Iran_neo + AASI? (If it is InPe that spread out of IVC then didn’t AASI piggy-back on it?), do you mean InPe does not have AASI and so InPe = Iran-neo + other? Perhaps if we start with new combination instead of ANI and ASI, are the basic ingredients going to be Iran-neo + AASI + steppe?

Maybe plotting each cline (Iran_neo, steppe, AASI) for one caste is better to get a sense of the geographic distribution of these proportions.

Opening the possibility of Brahmins mixing with other high-caste groups (to increase Iran_neo) in the South is a two-way street for Brahmins also mixing with other elites in the North for enriched “Steppe” in them. Note that Reddy etc (all upper castes in the South) are Sudras in the varna system, and inter-marrying with kshatriyas (or other “twice-born” like vaishyas) has less resistance than inter-marrying with sudras. Therefore, it seems there is a higher chance of inter-marrying in the North rather than in the South.

I am still trying to digest what is going on because there doesn’t seem to be a coherent story yet.

no surprise if the models are revised in the future.

If ASI = InPe + AASI, then what is InPe then?

iran agriculturalist + AASI

May be plotting each cline (Iran_neo, steppe, AASI) for one caste is better to get a sense of geographic distribution of these proportions.

it’s in the supplements brah. read it!

Opening the possibility of Brahmins mixing with other high-caste groups (to increase Iran_neo) in the South is a two-way street for Brahmins also mixing with other elites in the North for enriched “Steppe” in them. Note that Reddy etc (all upper castes in the South) are Sudras in the varna system, and inter-marrying with kshatriyas (or other “twice-born” like vaishyas) has less resistance than inter-marrying with sudras. Therefore, it seems there is a higher chance of inter-marrying in the North rather than in the South.

the twice-born castes tend to be steppe enriched in the north. i can double check my models, but it looks like reddys are the best candidate.

also, in bengal it’s clear that the mixing happened with local non-brahmins who were presumably sudras (i know arguments about kayasthas, but they look like other bengalis). cuz the brahmins have some east asian ancestry they couldn’t have gotten otherwise.

@Razib
A good name for West_Siberian_HG could be LNE – Late North Eurasians – or LNE_HG.

Now, there must be the case for the widespread of the AASI, specially because of CHG and its signals in the ANE.
CHG shows signs for some ancient European Hunter Gatherer, the Basal and the AASI.
Maybe 40~20 thousand years ago they were much more widespread than their India home base. And of course, I’m not talking about the AASI per se, but their direct basal ancestral continuity.
With time, they started receding to India, I presume.

But this is all speculation, I’m only interpreting the signals. Maybe there are ghosts out there, and they’re false signals.

@violet
One shouldn’t take the varna categories too literally. Varna often doesn’t really correlate with social status (especially with non-Brahmin groups) and castes are known to gain acceptance into different varnas at different points in time.

@Razib
It shows in ADMIXTURE, PCAs, qpGraph and qpAdm. There’s some lingering South Asian ancestry, strongly related if not identical to the AASI.
One of the best ways to model CHG on R (actually, there are no good fits) requires the use of some South Asian, AASI-rich population.

Great to see it out there. Thinking aloud a bit, and because you’re talking about the broader journey here and the social dynamics around it all, it’s all been a pretty weird ride to this point from the “Dawn of (Genetic) Knowing”. Like most, 5-6 years ago I would not have found it at all plausible for a population similar to present day Northern Europeans to have brought Indo-Aryan languages to India by a migration that looks approximately at least of 28% (based on the paper’s estimate of Steppe_MLBA ancestry in ANI). Would have felt ethnocentric and very doubtful.

Yet this aDNA model seems to show that this is likely what happened.

Even more than generally Northern European, the Eurogenes blog’s PCA tools imply that the Steppe_MLBA cultures (Andronovo, Sintashta, etc.) really do seem to be closest among contemporary people to Swedes and Norwegians. (Though of course Steppe_MLBA are more distant to them than they are to Bronze Age North-Central European populations). This is also what we find when we look at Fst distance.

(We may have expected Steppe_MLBA as closest to Eastern Europe (Balto-Slavic) based on geography and language. But although also very similar, it is appearing to be the case that Eastern European populations have specific kinds of ancestry from HG rich late Bronze Age populations around the Baltic and from HG low cultures of the Balkans. This seems to be working out very slightly further from whatever reflux wave went back to the steppe. This is unlike the Yamnaya, who seem to be closest to present day Russian, Volga-Ural and North Caucasus populations by Fst.)

Plus this was all probably male biased to a degree (e.g. movement of males to found ANI could be greater than the 28% above based on the autosome).

Finally the steppe MLBA, to explain their European Middle Neolithic ancestry, probably lay their origins in what was probably a mass migration to the steppe from East-Central Europe (possibly even a 100% migration, depending on which European LNBA like population “refluxed” and how much MN ancestry they had). This has an obvious antecedent in theories that proposed “Indo-Germanic” was connected to migration of the Corded Ware people to the steppe, and thence to India.

Above also looks true for the founding of the early Indo-Iranian peoples all across Central Eurasia; Scythians, Sarmatians.

I’ve always treated understanding human prehistory through population genetics as a kind of an abstract curiosity, but reading between the lines and putting all the above together, I can kind of readily understand why David Reich is so careful in his book to draw a distinction between his ideas and his lab’s findings and the ideas of Kossinna and other nastier antecedents (too obvious to name), and why he wanted to publish his book at a similar time to this paper being published, to get well ahead of any comparisons.

(Feel free to delete / move if this seems too contentious / unproductive a topic to broach, or a bad way to phrase it!).

“I would not have found it at all plausible for a population similar to present day Northern Europeans to have brought Indo-Aryan languages to India by a migration that looks approximately at least of 28%”

well, one of the points in this paper is that that’s not really totally true. the reason that the steppe signal looks so ancient despite not being so ancient is that there was genetic variation across the indo-european range due to admixture. these people just don’t have that much EEF, and so either ‘lost’ the reflux or were never subject to it.

modern northern europeans are defined in part by the EEF admixture (i think there was more WHG resurgence on the NE periphery btw).

1. I find the existence of the ANE-heavy hunter-gatherer population in Western Siberia unsurprising, yet fascinating. It helps explain why some modern populations in Central Asia and northern South Asia show higher ANE than one would expect from steppe ancestry alone – because they got a “double dose” via the pre-IE admixture in Turan and later EHG-mediated admixture. I wonder if the Ket are a relatively close analogue for this now-defunct population if not mostly descended from it (though the modern Ket are more East Eurasian than the West Siberia hunter gatherers).

2. That the migration path of the presumed proto-Indo-Iranians went far to the east, through the Inner Asian Mountain Corridor, is interesting in light of the fact that modern linguists tend to consider the Nuristani languages a third branch of Indo-Iranian. This is right around the location where the steppe migrants/invaders would presumably have gone their separate ways, admixing into BMAC (forming proto-Iranian?) and the IVC (forming proto-Indo-Aryan).

3. There are still some unanswered questions here for me. One set relates to the Indo-Aryan superstrate of the Mitanni, and how they got “around” the proto-Iranians. Another is the supposed Uralic influence on either proto-Indo-Iranian or perhaps proto-Indo-Aryan alone, which would seemingly suggest the West Siberian hunter-gatherers were Uralic speaking, not closely related to the Ket. Still, there’s plenty to chew on here.

“the reason that the steppe signal looks so ancient despite not being so ancient is that there was genetic variation across the indo-european range due to admixture.”

As I understand it, the way the final models work is West Steppe_MLBA+Indus_Periphery+AASI? West Steppe_MLBA being the reflux populations with ≈25% European_MN (roughly midway between present day North Europe and Yamnaya), and therefore much more actual Bronze Age European ancestry.

That is, no more West Siberian, etc. ancestry into Steppe_MLBA offsetting EEF prior to interface with Indus_Periphery.

Rather, Steppe_MLBA+Indus_Periphery+AASI was approximated by prior models with Steppe_EMBA+Iran_N+AASI because Indus_Periphery differs from Iran_N by having West Siberian related ancestry. See Fig 2c – the samples representing Indus_Periphery (Gonur2_BA, Shahr-i-Sokhta_BA2 and BA3) have varying levels of AASI and for the remaining ancestry have Iran_N and West Siberian ancestry in about a 1:9 ratio. The steppe signal looks more ancient because of admixture in Indus_Periphery, rather than in Steppe_MLBA.

Of course there could have been West Siberian related ancestry in Steppe_MLBA related groups entering South Asia at this time, and that would make equal or more sense, but it would require the IVC population (assuming it is ancestral to Iron Age and later S Asia) to have less West Siberian related ancestry than Indus_Periphery?

Karl, it looks like Okunevo marks the turning point of recent East Asian coming into western Siberia. Supplements have a plot with Han vs West Siberian HG f4 stats (page 146). Eneolithic Samara and various Sintashta and Srubnaya outliers are on a cline, while Okunevo and some later Kazakhstani groups stand out due to extra Han relatedness.
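For anyone unfamiliar with what an f4 stat measures, here is a toy sketch. It uses the textbook definition, f4(A, B; C, D) as the average over SNPs of (pA - pB) * (pC - pD) on population allele frequencies. The frequencies below are invented for illustration, and the exact population ordering used in the supplement plot is not reproduced here. Real analyses use dedicated software with block-jackknife standard errors, which this leaves out.

```python
from typing import Sequence


def f4(pa: Sequence[float], pb: Sequence[float],
       pc: Sequence[float], pd: Sequence[float]) -> float:
    """f4(A, B; C, D): mean over SNPs of (pA - pB) * (pC - pD)."""
    n = len(pa)
    if n == 0 or not (len(pb) == len(pc) == len(pd) == n):
        raise ValueError("need equal-length, non-empty frequency lists")
    return sum((a - b) * (c - d)
               for a, b, c, d in zip(pa, pb, pc, pd)) / n


# Invented allele frequencies at four SNPs (NOT real data)
outgroup = [0.10, 0.80, 0.50, 0.30]
test_pop = [0.30, 0.60, 0.55, 0.20]
han_like = [0.40, 0.50, 0.70, 0.10]
wshg_like = [0.20, 0.70, 0.40, 0.35]

# The sign shows which of the last two populations shares more drift
# with test_pop relative to the outgroup; near zero means no excess.
print(f4(test_pop, outgroup, han_like, wshg_like))
```

Swapping the first pair (or the second pair) flips the sign, which is why the ordering of populations matters when reading such plots.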

I could be wrong, it’s a big old paper with a lot going on, but that’s how it reads to me. I should probably note actually, Steppe_MLBA_East and pooled Steppe_MLBA_West+Steppe_MLBA_East do work as well, looks like both with lower p, and actually they may have used the pooled West+East in the hierarchical model. Maybe further samples could turn up that flip p values around. But fairly certain the model does not seem to be working through admixture in Steppe_MLBA_East relative to Steppe_MLBA_West to create a pseudo-Steppe_EMBA interacting with a population that’s just Onge-like+Iran_N-like. (Hence also changes in relative proportions from previous models where Steppe_EMBA+Iran_N+Onge).

E.g. supplement page 172, Steppe_MLBA_West higher p than Steppe_MLBA_East for models. Or supplement page 184 – “We finally observe that the SPGT samples and the Steppe_MLBA_East samples are off-cline.” (For modelling of South Asia cline) “The former have excess Iranian agriculturalist relatedness, while the latter have excess Steppe pastoralist relatedness (compared to the proportion of West Eurasian-related ancestry).”

(For how Steppe_MLBA East and West relate, see the Steppe_MLBA_East proximal model on supplement p 158: 94:2:4 Steppe_MLBA_West:Khvalynsk_EN:Okunevo. The distal model in Fig 2c shows East having about the same Anatolian ancestry as West, but a higher ratio of EHG to Iran_N and WHG, probably an abstraction for changes in West Siberian ancestry.)
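As a rough check on why East and West should look so alike in the distal model, here is a small sketch that pushes the 94:2:4 proximal weights through to a single distal component. The ≈25% European_MN figure for West is the one mentioned earlier in this thread. Setting European_MN to zero in the other two sources is my simplifying assumption, not something taken from the paper, and `mix` is just an illustrative helper.

```python
def mix(components):
    """Weighted average of ancestry profiles.

    components: list of (weight, profile) pairs, where profile maps a
    distal component name to its fraction. Weights must sum to 1.
    """
    total = sum(weight for weight, _ in components)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    result = {}
    for weight, profile in components:
        for name, fraction in profile.items():
            result[name] = result.get(name, 0.0) + weight * fraction
    return result


# West: ~25% European_MN as discussed above. The other two profiles are
# ASSUMED to carry no European_MN, purely to keep the sketch simple.
west = {"European_MN": 0.25, "other": 0.75}
khvalynsk = {"European_MN": 0.0, "other": 1.0}
okunevo = {"European_MN": 0.0, "other": 1.0}

east = mix([(0.94, west), (0.02, khvalynsk), (0.04, okunevo)])
print(east)
```

Under those assumptions East ends up only a point or two below West in European_MN, which fits the two behaving almost interchangeably in the models.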

Matt, indeed a lot to pore over but what you’re saying seems to be completely right from what I’ve gotten from cursorily reading through so far. South Asia got an expected drop in estimated steppe ancestry with these new proximal pre-IE sources. I seem to remember some early estimates being almost 50% for some groups though a few people already thought a Steppe_MLBA + proximal high ANE source would be a better model and reduce it.

As for your other comment, I think if someone wanted to make political or historical arguments based on broad modern-day similarities (I mean even if closest out of all due to their other ancestry being more similar, modern Europeans are still quite distant from populations like Sintashta and Andronovo), they would almost no matter what. If anything, this shows that actual IE ancestry in South Asia (at least in the long-term and for most of those early samples) is overall significant but still modest compared to the pre-IE stratum and that pretty much most of Eurasia, especially the steppe corridor, at the time was highly mobile, admixing intensely and always in the making. One can make an equal argument for multiculturalism if they’re so inclined. Really, there’s something for everyone in all of the aDNA data.

It’s a shame Elena Kuzmina missed the genetic confirmation of her favored theory by just a few years, though!

“Also, the polytomy at the base of the eastern Eurasian human family tree, where all the major lineages diverge rapidly from each other, makes me wonder about gene flow vs. admixture. It seems possible that the polytomy may mask a phylogenetic tree topology which had gradually bifurcating nodes, if periodically a single daughter population replaced all its sister lineages in a local geographic zone. Much of history in human meta-populations may be characterized by isolation-by-distance and gene flow, erased by the extinction of most lineages and expansion of a favored lineage.”

This is a really interesting observation, and both possibilities, naively, seem pretty plausible.

It is almost unthinkable that there was complete genetic continuity and stasis in South Asia from prior to 65,000 BP to 6000 BP. All of the examples we have point to very sustained stasis as very rare, even though turnover may have happened less frequently among hunter-gatherers than among Holocene people at the continental level. The late modern history of hunter-gatherer tribes of Native Americans, for example, suggests that moderately long distance folk wanderings and exterminations of whole tribes were relatively common even on time scales of 1000 years or so.

There were probably at least two significant waves of migration and expansion after the one that gave rise to the Papuans in mainland Asia that greatly interrupted HG genetics there.

Y-DNA D people were probably a mid Upper Paleolithic Northern route arrival in Asia (although pre-LGM given Y-DNA D in the Andamanese) and were possibly male dominated (given that the Onge autosomally are close to AASI). They migrated south from Tibet into South Asia, ultimately reaching Burma and the Andamans (since phylogeny-wise the Y-DNA D of India and the Andamans is closer to Siberian and Tibetan Y-DNA D than to Japanese Y-DNA D, which splits at a very basal point from other Y-DNA D). Other Y-DNA D people took the Northern route to become the founding Jomon people of Japan. Most Y-DNA D people of Northern Asia in between were wiped out in the LGM.

Y-DNA C is remarkably rare and quite low in phylogenetic diversity in South Asia. This could simply mean that the coastal route theory for Y-DNA C is wrong, and that instead it took a clockwise northern route to reach East Asia, mainland SE Asia and Island SE Asia, and the lack of phylogenetic diversity of Y-DNA C in South Asia tends to support that reading of the data. But another possibility, given the proportionately high level of C-M130* in South Asia relative to other Y-DNA C haplotypes, is that Y-DNA C differentiated from Y-DNA CF in India, with lots of Y-DNA C people migrating east, but a few remaining, and that Y-DNA F people (including sister Y-DNA clade Y-DNA H people) subsequently wiped out most of the original Y-DNA C-rich population of South Asia, and that a lot of Y-DNA C people in India today are associated with a Y-DNA C1b1 back migration later in the Upper Paleolithic. The fact that autosomal ASI ancestry in India is pretty much proportional to Y-DNA C proportions in India, even though the proportions are low, also points to the antiquity of Y-DNA C in India, followed by later events.

In particular, the Y-DNA C people in India were probably marginalized by the expansions of Y-DNA F*, F1, F3 and H in India in the pre-Neolithic period, with other clades of F and daughter clades derived from F expanding into both West Eurasia and East Eurasia where the expanding clades became dominant. It is hard to know what gave the Y-DNA F/H people a decisive advantage over the Y-DNA C people in India and elsewhere, although if forced to supply my best guess, I might suspect dog domestication or perhaps mastering how to turn wild grains into flour (flour predates the Neolithic revolution by at least ten or twenty thousand years).

It is also worth noting that ancient DNA suggests that, in parallel with these developments in India, Y-DNA C was once much more common in Europe than it is today, which definitely reflects Neolithic and Steppe driven replacement of remaining European HGs with Y-DNA C, but which may also reflect Mesolithic era replacement.

Similarly, in East Asia and SE Asia, Y-DNA O which is also a remote descendant of Y-DNA F, also sweeps those regions even before the Neolithic revolution.

If Indian nationalists want to discuss their basal and formative influence on the rest of the world, they would be well advised to de-emphasize the Bronze Age and to instead focus on how, on one hand, Y-DNA F is the dominant ancestor of modern Eurasian Y-DNA clades and that it probably originated in India (or at least had its first major expansion there), and how, on the other hand, in the Iron Age, Buddhism, which also has its origins in India, came to be a profound and arguably dominant religious influence in East Asia.

Of course, the problem is that Indian Nationalism today is Hindu rather than Buddhist, and Hinduism is a religious movement that India didn’t heavily export, that outside Bali didn’t have much staying power where it was exported, and that isn’t entirely home grown, even though much of it has local roots.

Similarly, the expansion of Y-DNA F people to become the predominant people of Eurasia (especially West Eurasia) is so remote and thinly attested archaeologically that it is hard to identify with those ancient hunter-gatherers.