
Friday, March 10, 2017

Bring it on

AdmixTools 5 is now available at GitHub (see here). I'm messing around with the latest version of qpAdm as I await the expected flood of new ancient samples. Based on first impressions, I'd say it's sharper than previous versions. Here's an attempt to home in on Yamnaya's ancestral makeup; note that the best statistical fits are clearly those with the spatiotemporally closest genomes.

I also had a quick look at South Asia. The likely Eastern Iranian-speaking early Sarmatians from Pokrovka, Russia, recently published along with Unterländer et al., look like a decent enough reference for modern-day Eastern Iranians, but not for Indo-Aryans like the Kalasha and North Indian Brahmins. The latter prefer Ulan IV, the late Yamnaya/early Catacomb sample from Allentoft et al. 2015. It's an intriguing question why.

As far as I can tell right now, the eastern Scythians from Unterländer et al. aren't all that relevant for South Asians. I'll wind things up here with models for a few more populations from Pakistan and India.

55 comments:

Indo Aryans arrive in South Asia in the Bronze Age mixed with Oxus/BMAC people. Eastern Iranics arrive much later in Antiquity and are different from the Yaz Iranians, who are forebears of the Plateau Iranian languages. Those Steppe numbers seem too high and Iran_N numbers way too low, unless there was some massive population displacement and replacement, which is unlikely.

Indeed, Rami, Iran_Neo seems quite low for all these groups based on previous estimates and the likelihood of mixing with BMAC (as you said) prior to crossing the Khyber pass. ~50% are basically Udmurt levels.

+1 on Iran_neo that low being unexpected, and I'm someone that considers the steppe hypothesis pretty well-suited to a variety of evidence. Totally prepared to update my expectations.

Davidski, what if there were HGs (with ANE?) in SC Asia that Iran_neo ended up mixing with, an eastern parallel to EEF/WHG? What kind of effect would that have on your tests? Or are your tests showing small enough residuals that you think this is unlikely? Sorry, I still have a lot to learn about the subtleties of these methods.

SC Asia and the subcontinent are climatically different enough I would expect the HGs to be quite different.

Thanks again for this blog, and to other commenters, endlessly fascinating.

High steppe numbers by about +20% and Iran_N numbers of -20% just seems like a generally unusual feature / question of the formal models at the moment. The same balance of Steppe vs Iran_N in SA was present in Lazaridis's 2016 paper.

Using Basal_K7 plus Fst distances together generates best fit with the following as:

It replaces their East Asian/Siberian. But they may have got it from their Sarmatian-like Indo-Iranian ancestors. If so, then the model is correct.

@Gill

Davidski, do you suppose the pre-Neolithic inhabitants of India were 100% East Eurasian, perhaps very similar to the Onge?

I think they belonged entirely or almost entirely to a closely related sister clade to East Asians and Onge. Not sure if this will be classified as East Eurasian when we see their genomes, but maybe.

@Rami, Anthro Survey and Taymas

Just to reiterate what Matt said, the ratios of Iran Neolithic to Bronze Age Steppe (in this case Ulan IV) are very similar to those in recent scientific literature.

However, new ancient samples from Iran and Central Asia may well shift these estimates. Even using Iran_Hotu instead of Iran_Neolithic lowers the steppe input for the Kalash (though not for the Brahmins). But note that the standard errors also shoot up, probably because of the low quality of the Hotu genome.

Do west Iranics, Brahuis, Balochis, Afghan Pashtuns get any different results with the latest version of qpAdm? I wonder if the Extra East Eurasian in Afghan Pashtuns (compared to Pakistani ones) came from some Scythian groups.

@ Davidski, yeah, I was thinking about separate tests then comparing the chi-square, as in your main post, but messed up my post! I see, it would take longer than I thought. OK, if you (or Chad if he's interested) ever get the time and are intrigued, but understand equally if it's too time consuming.

I was hoping the outgroups should be good for that test, as there are plenty of ancients with information that should hopefully discriminate WHG subclades and offer better chi-square for, e.g. Iberia_EN with La Brana, Iberia_MN with La Brana / Loschbour, etc. Though mainly how El Miron should supposedly be to La Brana (based on f3 sharing), as many of the other ancients (Kotias) might be more of a clade to different WHG.
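For anyone wanting to do this kind of chi-square comparison by hand, here is a minimal Python sketch. It assumes you already have a chi-square value and degrees of freedom for each candidate model, as reported in qpAdm output. The model names and numbers are made-up placeholders, not results from this post, and `chi2_tail` and `rank_models` are hypothetical helpers written for illustration, not part of AdmixTools.

```python
import math


def chi2_tail(chisq, dof):
    """Upper-tail probability of a chi-square statistic (series expansion)."""
    a, x = dof / 2.0, chisq / 2.0
    if x <= 0.0:
        return 1.0
    # Regularized lower incomplete gamma:
    # P(a, x) = exp(-x) * x**a * sum_n x**n / Gamma(a + n + 1)
    term = 1.0 / math.gamma(a + 1.0)
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= x / (a + n)
        total += term
    lower = math.exp(-x + a * math.log(x)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def rank_models(models):
    """Sort (name, chisq, dof) tuples from highest to lowest tail probability."""
    scored = [(name, chi2_tail(chisq, dof)) for name, chisq, dof in models]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical placeholder values, NOT taken from the post.
toy_models = [
    ("model_A", 6.2, 8),
    ("model_B", 21.5, 8),
    ("model_C", 13.0, 8),
]

for name, p in rank_models(toy_models):
    print(f"{name}: tail prob = {p:.3f}")
```

A higher tail probability just means the model is harder to reject. Treat the ranking as a rough guide only, since models run with different outgroup sets probably aren't directly comparable.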

These results for the people of the Indian Subcontinent make sense since they are not descended from people of the Steppe and certainly not from Scythians. Instead people of the Steppe such as Ulan_IV and Sarmatian_Pokrovka owe much of their ancestry to the Subcontinent. Sarmatian_Pokrovka are worse in modeling Brahmins than Ulan_IV because of the extra East Asian ancestry that Scythians have and of course the eastern Scythians have even more.

The recent paper on Scythians and the Tweet from Lazaridis make it clear that the Reich Lab people have given up on trying to model Scythians and people of the Indian Subcontinent as descended from Andronovo. My hypothesis on the origin of the Scythians is that they learned the art of making iron tools from India, spoke Indo-Iranian languages, expanded to eastern Central Asia, and mixed with the people there. Then, perhaps with horse herding techniques learned in Central Asia, together with their knowledge of iron metallurgy, they replaced and extinguished the European-derived bronze-age cultures of the Eurasian steppe. Of course, they also mixed to some extent with these bronze-age people, such as the Srubnaya.

Stop being a whiny bitch David, you're the one bleeding out of your ass.

There is no way North Indians have Steppe ancestry at levels comparable to NE Europeans; you must be smoking a crack pipe if you actually think that. The same modelling has Paniya at 40% Iran_N and 6% Steppe, when it's a known fact they are 80-85% ASI. Clearly there is more to this. As far as Proto-Indo-Aryans go, they are descended from the same stock as those Andronovo Iranians, and their split occurred on the Northern Steppes, not in the Yamnaya region.

Thanks for posting David :P, see my point. What do those Eneolithic Harappa samples look like? Almost 40% steppe for Baloch and 35% steppe for Brahui; there is without a shadow of a doubt much more to this, and only ancient genomes from SC Asia will resolve it.

Can you use any of these new tools to run a side by side comparison of R1a/b samples separated by 4/5 thousand years. Khvalynsk R1a\b [M459/L754]------Scythian R1a-2123 and Sarmatian R1b-2109[CTS-1078] To see how they compare to each other in different time frames. They are relatively close in time and geography to each other.

Basically, the eastern Iranic peoples of Tajikistan, Afghanistan, and northwestern Pakistan can be construed as being substantially derived (genetically) from the historical Eastern Iranians of the ancient steppe.

Now, with the Kalash, I get this:

33.8% Iran_Neolithic + 17.5% Iran_Hotu

37.4% Yamnaya_Samara

11.3% ASI

Distance=0.4269

Compared to Lithuanians:

42.95% Yamnaya_Samara

31.20% Loschbour

25.85% LBK_EN

Distance=0.378

So, again, there is a correspondence between qpAdm and nMonte. Both nMonte and qpAdm have Lithuanians and the Kalasha at around the same percentage amount of Yamnaya-related ancestry.

The exact estimates differ, because nMonte gives lower Yamnaya-related admixture to Europeans, in comparison to qpAdm.
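To make the nMonte "Distance" figures a bit more concrete, here is a rough Python sketch of the general idea: look for mixture proportions of reference populations whose weighted average lands as close as possible, in Euclidean distance, to the target in PCA space. This is not the nMonte script itself (which, as far as I know, is written in R and works by Monte Carlo sampling). It is a simplified, deterministic stand-in, `fit_mixture` is a hypothetical helper, and the reference names and coordinates are made-up toy values.

```python
import math


def mix_distance(weights, refs, target):
    """Euclidean distance between the target and the weighted mix of references."""
    dims = len(target)
    mixed = [sum(w * r[d] for w, r in zip(weights, refs)) for d in range(dims)]
    return math.sqrt(sum((m - t) ** 2 for m, t in zip(mixed, target)))


def fit_mixture(refs, target, slots=200):
    """Greedy search over proportions in steps of 1/slots that sum to 1."""
    n = len(refs)
    counts = [slots // n] * n
    counts[0] += slots - sum(counts)  # hand any remainder to the first reference
    best = mix_distance([c / slots for c in counts], refs, target)
    improved = True
    while improved:
        improved = False
        for i in range(n):
            for j in range(n):
                if i == j or counts[i] == 0:
                    continue
                # Try moving one slot of ancestry from reference i to reference j.
                counts[i] -= 1
                counts[j] += 1
                d = mix_distance([c / slots for c in counts], refs, target)
                if d < best - 1e-12:
                    best = d
                    improved = True
                else:
                    # No improvement, so undo the move.
                    counts[i] += 1
                    counts[j] -= 1
    return [c / slots for c in counts], best


# Toy 2D "PCA" coordinates, purely illustrative (not real populations).
toy_names = ["ref_A", "ref_B", "ref_C"]
toy_refs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
toy_target = (0.3, 0.2)  # built as 50% ref_A + 30% ref_B + 20% ref_C

weights, distance = fit_mixture(toy_refs, toy_target)
for name, w in zip(toy_names, weights):
    print(f"{100 * w:.2f}% {name}")
print(f"Distance={distance:.4f}")
```

The point of the sketch is only that the percentages and the distance come out of the same fit, so a low distance says the references can reproduce the target's position, not that they are its real ancestors.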

Could you post results for the SA groups David posted? Kalash have the highest steppe ancestry among South Asians, and if I recall, the Lazaridis paper mentioned them having such levels, but I do not see that with North Indians or populations east of the Indus, where populations are far more ASI shifted.

Eastern Iranics arrive 1200-1500 years later in Antiquity. Getting results from Dardic groups in Afghanistan would be helpful

Earlier, you stated that the Paniya are a Negrito-like people, noted that they are 80%-85% ASI, and claimed that they are modeled as 6% steppe with qpAdm or nMonte.

For starters, the Paniya are not a "Negrito" population. They have never been described as such, in anything I've read.

In terms of physical appearance, they don't resemble Andaman Islanders, or the "Negrito" populations of Southeast Asia.

Instead, they look rather similar to "scheduled caste" South Indians.

I mean, this documentary is about them, you can see for yourself:

https://www.youtube.com/watch?v=cYGCwsjbCbY.

Again, they look far more West Eurasian-influenced in terms of phenotype, when compared to Andaman Islanders or Southeast Asian "Negritos".

Regardless, with nMonte, they are pretty consistently 60% ASI + 40% West Eurasian. And, the West Eurasian element is always Iran_Neolithic-related, but with more ANE. I don't know why you think this is problematic?

Also, I've never seen the Paniya modeled as 6% steppe, so not sure where you got that number from?

Anyway, as a demonstration, here are the Paniya, using four ASI references:

With Onge

61.4% Onge

33.0% Iran_Neolithic + 5.6% MA1

Distance=3.5843

With Jarawa

59.50% Jarawa

35.55% Iran_Neolithic + 4.95% MA1

Distance=3.5609

With Austroasiatic_Bonda

66.35% Bonda

26.10% Iran_Neolithic + 7.55% MA1

Distance=2.6142

With my ASI simulation

60.15% ASI

31.25% Iran_Neolithic + 8.15% MA1

Distance=0.3154

Beautifully consistent. Always 60% ASI + 40% West Eurasian (and the West Eurasian ancestry is always Iran_Neolithic-related, but with more ANE), no matter what you use.
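For anyone who wants to check the arithmetic, here is a small Python tally of the four Paniya models quoted above. The percentages are copied from this comment; the dictionary layout and the `asi_proxy` key are just my own labels for illustration.

```python
# Percentages copied from the four Paniya models quoted in the comment.
paniya_models = {
    "Onge": {"asi_proxy": 61.4, "Iran_Neolithic": 33.0, "MA1": 5.6},
    "Jarawa": {"asi_proxy": 59.50, "Iran_Neolithic": 35.55, "MA1": 4.95},
    "Austroasiatic_Bonda": {"asi_proxy": 66.35, "Iran_Neolithic": 26.10, "MA1": 7.55},
    "ASI simulation": {"asi_proxy": 60.15, "Iran_Neolithic": 31.25, "MA1": 8.15},
}


def west_eurasian_share(model):
    """Iran_Neolithic plus MA1, i.e. the West Eurasian side of the model."""
    return round(model["Iran_Neolithic"] + model["MA1"], 2)


def summarize(models):
    """Return (reference, ASI-like %, West Eurasian %, total %) per model."""
    rows = []
    for name, model in models.items():
        west = west_eurasian_share(model)
        total = round(model["asi_proxy"] + west, 2)
        rows.append((name, model["asi_proxy"], west, total))
    return rows


for name, asi, west, total in summarize(paniya_models):
    print(f"{name:20s} ASI-like {asi:6.2f}  West Eurasian {west:6.2f}  total {total:6.2f}")
```

With the quoted figures, the West Eurasian share comes out between roughly 34% (Bonda) and 40.5% (Jarawa), and the ASI simulation row sums to 99.55% rather than 100% as posted.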

And yes, the Kalash aren't even technically South Asian.

They actually live in Central Asia. The Pakistani provinces of KPK and Balochistan are, objectively speaking, situated on the "Eurasian plate", while Punjab and Sindh lie on the Indo-Australian plate. So, only Punjabi and Sindhi populations are geographically "South Asian".

Most anthropologists tend to describe the Kalash and the other Dardic peoples of Afghanistan (not to mention Nuristanis) as "Central Asian isolates".

Basically, isolated representatives of ancient Central Asia, with a minimum of later Iranic and Turkic cultural influence. The only times I hear these populations construed as "South Asian" is when geneticists talk about them.

But, the Kalasha steppe element is much more strongly linked with Indians, even ASI-rich Indians, rather than neighboring Pashtuns. It's that shared "Aryan" connection.

Which shouldn't be surprising, as the Dardic languages are "Indian", in the linguistic sense of that term.

So, the fact that they prefer Yamnaya (just like IE Indians), rather than Sintashta/Andronovo/Srubnaya/Sarmatians/Scythians, tells us something about the Indo-Aryans.

Well, I don't want to get into the semantics of what should be South Asian and what should not be. South Asian wrt this whole Yamnaya vs Sintashta thing. As the Indo-Aryan sphere is mainly located in Northern India and Pakistan, this does tie the Kalash to that, and the ethnogenesis follows a similar pattern, albeit with more ASI in the mix. Though classically speaking, west of the Indus the populations are definitely in that SC Asian sphere. In contemporary Pakistan, though, groups will mix, and they already are, and this will only increase with time.

Well, they still have a Negrito-like look, if you look at many of Coon's plates. I guess until they find more archaic West Eurasian proxies from the region, they will have to use Iran_N. The 6% Steppe is from one of the models Matt used.

Thank you DAVIDSKI. It may be impossible, but I would like to find out just when my ancient H6a grandmothers arrived in the Steppe. Was she part of the Ukrainian Mesolithic in a bone hut (alongside R1a)? Did she migrate with the Elshanka Culture from Central Asia into Samara in 7000-6500 BCE? Did she come up through the Caucasus cultures? Was she part of the Repin, Khvalynsk, Samara or Seroglazovo Cultures? I may never know, but I sure love the research & recreating Yamnaya/Corded Ware/Srubnaya jewelry.

I can take a stab at it. It can take quite some time finding the right outgroups. Barcin wouldn't be good to use as it isn't ancestral to the first farmers of Europe. A merge of the Koros and Starcevo samples we have should do fine. I'll report back anything interesting.

On my part, I just wanted to make a note of the fact that (objectively speaking) the Kalasha and other Dardic peoples in Afghanistan/northwestern Pakistan actually live in Central Asia.

They are geographically Central Asian, not geographically South Asian (but the Yamnaya-Sintashta difference does link them with Indians, rather than with neighboring Central Asians). The same goes for Pashtuns in both Afghanistan and northwestern Pakistan; they are, geographically speaking, Central Asians.

And, in the anthropological literature, Kalasha and other Dards are usually described as being isolated remnants of ancient Central Asian culture, barely influenced by the later Iranian influx, and with virtually no influence from the much later Turkic expansions.

Also, I should note that mixture in Pakistan isn't as pervasive as you seem to think.

For example, Pashtuns are viewed rather suspiciously by some Punjabis, who often tend to associate Pakistani Pashtuns with terrorism/violence/supposedly "primitive tribal custom", and also have stereotypes of Pashtuns as being "all brawn, but no brains".

For their part, some tribal Pakistani Pashtuns often associate Punjabis with traits like "effeminacy", "softness" (whatever the hell that means. Not even kidding, "soft" is the literal translation of a Pashto term I've heard used often, in regard to Punjabis), "arrogance", "decadence", etc.

It's all very stupid, and totally nonsensical. But, that's how things are IRL.

No doubt, there is mixture in cosmopolitan/urban settings. You'll find many people of mixed Pashtun-Punjabi heritage, just like how in urban Afghanistan you'll often find people of mixed Pashtun-Tajik heritage, or mixed Pashtun-Uzbek heritage.

But, there is also no doubt that rural Pashtuns, rural Balochis, rural Punjabis, rural Sindhi, etc, have a very strong preference for marrying people of their own ethnic background.

Basically, in the Pashtun tribal belt, and in the villages of Punjab and Sindh, inter-ethnic mixture is extremely rare.

And, with regard to the Paniya, all I'll say is that they physically just look like other South Indian populations.

In my modelling, South Indians tend to be 45% ASI, while the Paniya are around 60% ASI, so it isn't surprising that they resemble other South Indians when it comes to facial features (only a difference of 15% extra ASI).

Anyway, enough with this sort of discussion; here are the results you wanted.

Punjabi_Lahore

46.80% Iran_Neolithic + 12.85% AG3

33.35% ASI

7.00% Yamnaya

UP_Brahmin

42.80% Iran_Neolithic + 11.05% AG3

28.40% ASI

17.75% Yamnaya

With these, I do see some divergence between qpAdm and nMonte.

Although both qpAdm and nMonte agree in having the Kalasha at around the same levels of Steppe_EMBA as Northern/Eastern Europeans, qpAdm shows much higher steppe admixture for South Asians proper, when compared to nMonte.

Personally, I don't think we can truly know which output is more accurate. It's a matter of more aDNA.

The aDNA will be coming sooner, rather than later. So, I don't see why you guys get so heated over this topic, and immediately start throwing around the expletives.

@ Davidski, offer is definitely much appreciated. I've had a few problems in the past with dual booting Linux before. I might look into getting that fixed and get back to you on that in near future.

(For the tldr, essentially, I had a good setup with dual booting Ubuntu just to run basic ADMIXTURE and D-stats for myself. This was a little before the aDNA autosomal revolutions of the last few years, just for curiosity. Then some system problems prompted a system recovery, and I've had some problems getting a dual boot of Linux booting since then, so being lazy I gave up on it.)

@ Chad, sounds good if you have time. Lipson suggests Koros_EN and LBKT_EN might be good as they are described to have respectively 0.0 +/- 1.2% HG and 0.8 +/- 0.9% HG where Starcevo has 2.3 +/- 1.1%. Might help for LBK_EN which only has 4.2 +/- 0.6% in their estimates.

@sein What's becoming more and more apparent is that the local South Asian hunter-gatherer population was not just a monolith of ENA; your own tests show that. Based on what Lazaridis said, Paniya are 80-85% ASI, but clearly even that ASI has a good amount of a very archaic West Eurasian component. Contemporary South Indians do not look like Paniya. They may share a similar skin tone, but features-wise they differ considerably, as Paniya facially are mainly a mix of Veddid, Paleo Mongolids and Negrito, and lack the Indid/local Mediterranid element.

I agree on populations like Kalash having high steppe ancestry, ditto Pashtuns/Tajiks and to a lesser degree some groups originating in the Potohar.

Those models for PJL and the Brahmin make more sense. Fits for SC Asians also are much better as they have much less ENA ancestry.

Yes, classically Pashtuns view Punjabis or people in the plains as "Dal Khors"; ironically, because of poverty and war, many are eating it themselves. Urbanization and a common religion have allowed for much more mixing in cities. In villages, endogamous and tribal views remain, though I don't think it is as rigid as the Caste system. Pashtuns do not have much issue with men marrying non-Pashtun women, but Pashtun women marrying non-Pashtuns is very rare, and prohibited, even in Afghanistan. Pashtun/Uzbek marriages are becoming more common in the North, and in cities Tajik/Pashtun marriages, especially among educated people, are fairly common.

I seem to recall that the best fits for Yamnaya came when including WHG? With nMonte/PCA data it was always like that too: EHG + WHG/SHG + CHG. I wonder if including Latvia_HG in the last model (instead of Lengyel_LN) improves the fit as it does using Global 10.

I find that all Iranian peoples (but especially Eastern Iranians) have a preference for the Sarmatian samples, or show a mix of Sarmatian + Srubnaya/Andronovo + Yamnaya.

By way of contrast, Indo-Aryans only gravitate towards Yamnaya, even the ones that are very different from South Asians proper (for example, the Kalash). They never take any Sarmatian/Srubnaya/Andronovo, if Yamnaya is in the mix.

Although, if my memory serves me right, when one adds Poltavka, Indo-Aryans prefer those samples to Yamnaya. That could be significant, I guess.

Although, I also want to eventually examine the picture using the Srubnaya_outlier sample.

I mean, it's an interesting sample, and it seems that people like her had an important (genetic) role to play in the ethnogenesis of western Scythians (and the Sarmatians), at least in the tests I've tried.

The only thing that gives me pause (when it comes to adding her into the modelling) is the fact that we don't really have a solid handle as to what sort of population she represents.

At the moment, I think she might have been a mix between some ANE-related population (from anywhere between the Urals and Central Asia proper) and Steppe_MLBA (so, ANE + Sintashta/Andronovo/Srubnaya), but with some additional West Asian/Caucasus admixture.

We need to find more samples like her, in order to put her in proper context.

Sein

Yes, for my end the best match with PCA data is still Srubnaya outlier for both IA and Iranians, although much attenuated for the latter (with a peak of 15-18% in Zoroastrian). Sarmatians really don't seem to feature, even with Srubnaya outlier left out ("regular" Srubnaya slots in).

All Eastern Iranians, ancient and modern, seem to be very closely related. So even though Sarmatians aren't ancestral to Pathans and Tajiks, they're close in space and time to the ancestral group for all Eastern Iranians, and that's why they're a good reference pop for Pathans and Tajiks.

The Sarmatians don't produce very good models for Western Iranians, but that's to be expected. Western Iranians prefer Sintashta.

East Iranics like Pashtuns and Tajiks have much higher East Eurasian (Siberian-Mongol and possibly Saka related) ancestry than west Iranians; that's probably why west Iranics don't fit that well with east Eurasian admixed steppe groups.

That's true, although keep in mind that since the Sarmatians produce good fits for modern Eastern Iranians, especially the least admixed, Pamir Tajiks, then not only is their West Eurasian ancestry a good match, but their East Eurasian ancestry is too.

So a fair whack of the East Eurasian admixture in Eastern Iranians is probably from their Eastern Iranian ancestors from the steppe.

Moreover, Western Iranians do have minor East Eurasian ancestry, but it's more East Asian shifted than what we see in the Sarmatians, probably because it's derived from post-Sarmatian Turkic population movements.

David pretty much hit the nail right on the head; I couldn't have said it better myself.

For the king,

Tajik and Pashtun East Eurasian admixture is usually much more genetically "northern", compared to the later Turkic admixture seen in West Asia (more Siberian/Native American-like, rather than like the Mongola/Altaians), so it probably represents steppe Iranian ancestry.

But I thought it would be interesting to compare apples with apples (i.e. PCA nMonte). I here used the weighted approach also, and have included the (good) Sarmatian individual as a source. The results are also the same without the Sarmatian. The Sarmatians doesn't form a part of Iranian ancestry, east or west:

So whilst it's clear that they share a large chunk of common steppe ancestry, the question is when Sarmatians & other Iranians diverged from each other. Probably before the Sarmatian period, more like the MBA.

Just to keep things in proper perspective: I always find myself rather stunned, when it comes to your unbridled brilliance, and intensive analytic depth.

Sometimes, it just gets plain overwhelming. I mean, how do you properly manage that high a level of archeological/historical/anthropological/genetic knowledge, which you so obviously have right in-between your fingertips?

With that much intellectual heft, all in the hands of a single man, one is only (quite naturally) forced to grow envious. ;-)

Anyway, I can't replicate your results.

So, here is a quick analysis of different Iranian peoples, with the addition of an isolated Central Asian Indo-Aryan population (the Kalash), all for your viewing pleasure.

I took some time to pick the right references, and I've begun to implement Huijbregts' suggestions. Please, do refer to his comments, at Anthrogenica.

Also, this should go without saying, but I’ll say it anyway: everyone was tested under the same conditions (same reference populations, same dimensions, etc). And yes, I used the higher quality Sarmatian sample.

Distance=0.0767 (again, overfitting, but I was trying to replicate your results, via the use of multiple steppe references. So, I had no choice)

Ishkashimi

32% Srubnaya_outlier + 14.75% Sarmatian

38.95% Iran_Chalcolithic + 3.95% Iran_Neolithic

8.7% ASI

1.65% Altaian

Distance=0.0931

Pashtun, Pakistan (Karlani tribal confederacy, speaks an archaic dialect of Pashto, one which has substrate influences from an older Eastern Iranian language. The older Eastern Iranian language is still spoken in the vicinity of his tribal territory)

24.7% Sarmatian + 14.2% Srubnaya_outlier + 4.35% Srubnaya

25.5% Iran_Chalcolithic + 23.4% Iran_Neolithic

7.85% ASI

Distance=0.152

Pashtun, Afghanistan (this individual is of the Durrani tribal confederacy)

22.5% Sarmatian + 13.65% Srubnaya_outlier

28.95% Iran_Neolithic + 26.4% Iran_Chalcolithic

5.9% ASI

2.6% Altaian

Distance=0.1786

West Asian Iranians (all Western Iranian speakers)

Persians (no clue about the geographic origins. Anyone who knows should chime in)

Lol Rob, jokes/sarcasm/ball busting aside, let's shift gears, and get serious for a moment (something which is exceedingly difficult for me. If you knew me IRL, you’d know that I got a tight lock on the world title for “sarcastic a-hole who doesn’t take anything too seriously”). This is the general picture I’m seeing.

Western Iranians have minor steppe admixture, and it is pretty much “Steppe_MLBA”-related, nothing else.

By contrast, Eastern Iranians have loads of steppe admixture. In most cases, they are largely steppe-derived, and they do have a preference for the Sarmatian/western Scythian samples, along with the Srubnaya-outlier.

This is quite interesting, because it seems that the Srubnaya_outlier already has some sort of relationship with the Sarmatians and western Scythians.

And, the closest modern population we have to the ancient Indo-Aryans (the Kalasha of the Hindu Kush) is quite unique in this context. They have the same amount of “Steppe_EMBA” as Northern/Eastern Europeans, and it seems that their steppe ancestry is a combination of Yamnaya-like (Poltavka samples are carbon copies of Yamnaya_Samara, with the exception of one Steppe_MLBA-like outlier) and Srubnaya_outlier-like ancestral streams.

Basically, Eastern Iranians, Western Iranians, and the Indo-Aryans have different kinds of steppe ancestry.

And, the western Scythians/Sarmatians do show a heightened relationship to all Eastern Iranians, whether they be Pamiri speakers, or Pashto speakers, or speakers of the Ormuri/Parachi languages.

This should be of no surprise (at all), because Scythians and Sarmatians spoke languages very closely related to contemporary languages like Pashto, the Pamiri cluster, etc. This has been scholarly consensus for quite a while now.

Finally, with regard to this statement you made:

“The Sarmatians doesn't form a part of Iranian ancestry, east or west”

I do have to go back to what David said, because he really did hit the nail right on the head, and I can’t be as concise as he is:

“All Eastern Iranians, ancient and modern, seem to be very closely related. So even though Sarmatians aren't ancestral to Pathans and Tajiks, they're close in space and time to the ancestral group for all Eastern Iranians, and that's why they're a good reference pop for Pathans and Tajiks.

The Sarmatians don't produce very good models for Western Iranians, but that's to be expected. Western Iranians prefer Sintashta.”