Friday, October 14, 2016

Global 10: A fresh look at global genetic diversity

Update 06/02/2018: I'm working on a new version of this test. It's called the Global 25. See here.
...
Below is my new Principal Component Analysis (PCA) or genetic map of global human population structure. I think it's a little bit special, and we can discuss why in the comments if anyone's interested. The datasheet is available here; it can be used to generate 2D and 3D PCA plots, and to model samples of your choice using the nMonte and 4mix R scripts. A similar sheet with average values for the ancient populations is here.
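For readers who want a feel for what this kind of modelling does with the PCA coordinates, here is a rough sketch in Python. It is not the nMonte or 4mix R code; it only illustrates the general idea of describing a target as a weighted average of source populations and searching for the weights that minimise the Euclidean distance in PCA space. The function names, the hill-climbing search and the toy coordinates are all assumptions made for illustration, not values from the datasheet.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mix(sources, counts, total):
    """Average of the source coordinates, weighted by counts[name] / total."""
    dims = len(next(iter(sources.values())))
    return [
        sum(sources[name][d] * counts[name] for name in sources) / total
        for d in range(dims)
    ]


def fit_mixture(target, sources, slots=100, steps=5000, seed=0):
    """Hill-climbing search for mixture weights (hypothetical helper).

    The mixture is a batch of `slots` units, each assigned to one source.
    At every step one unit is moved from one source to another, and the
    move is kept only if the distance to the target gets smaller.
    """
    rng = random.Random(seed)
    names = list(sources)
    counts = {name: 0 for name in names}
    for _ in range(slots):
        counts[rng.choice(names)] += 1
    best = distance(target, mix(sources, counts, slots))
    for _ in range(steps):
        donor = rng.choice([n for n in names if counts[n] > 0])
        receiver = rng.choice(names)
        if donor == receiver:
            continue
        counts[donor] -= 1
        counts[receiver] += 1
        d = distance(target, mix(sources, counts, slots))
        if d < best:
            best = d  # keep the improving move
        else:
            counts[donor] += 1  # undo the move
            counts[receiver] -= 1
    return {name: counts[name] / slots for name in names}, best


# Toy two-dimensional coordinates, made up for this example.
toy_sources = {"A": [0.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 1.0]}
toy_target = [0.3, 0.2]  # 50% A + 30% B + 20% C by construction
weights, fit_distance = fit_mixture(toy_target, toy_sources)
```

With real data, each source and target would be a row of the datasheet (ten PCA coordinates rather than two), and the weights would be read as ancestry proportions only as far as the reference populations are sensible proxies.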

Here are a few examples of nMonte mixture models for highly drifted populations that often confuse the crap out of standard population genetics programs.

It's interesting to note that the Dai from southern China help to improve the fit for Karitiana from the Amazon basin, but not the Wichi from Argentina.
Also, Andronovo significantly improves the fit for the East Iranian Pathans or Pashtuns, but clearly not as much for the Indo-Aryan Kalash, and none at all for Brahmins from India, who are also Indo-Aryans. Why? Don't know, but it might well be an important question in regards to the origins and spread of Indo-Iranian languages.

What happens if Aeta or Agta are included in nMonte runs on Amerindians? There is evidence of coconuts having reached the American Pacific coast (Central America to Ecuador) around 300 BC from the southern Philippines.

Also, I noted that your AdmixQ13 shows Kennewick Man as being 3.6% East Asian, 4.3% Steppe_EMBA, 4.5% Siberian, and 11.9% Beringian. Anzick1 (Clovis) does have some Beringian (6.7%), but lacks the other components. This difference seems worthwhile following up, also as concerns Kennewick's mtDNA X2, the route of which to the Americas is still mysterious.

Well, no Onge, so I can't test what the ASI proxy Paniya are best modelled as in the same terms as Lazaridis 2016, but using an array of SE Asian (Agta, Aeta, Ami, Bajo, Batak, Igorot, Dai), African (Esan_Nigeria, Biaka), Iranian Neolithic / Epipaleolithic (Iran_Hotu, Iran_Neolithic) and one steppe (Afanasievo), to model Paniya I got:

Paniya - Iran_Hotu 52.6, Agta 47.4; distance% = 9.1447

(other populations were fit at 0).

(It's not a great fit!)

Still, if you imputed that into the fits with South Asians from the post, then something like:

which seems to track the correct sort of rough proportion of Iranian, steppe, and ENA groupings we might expect, albeit the Agta are an imprecise reference for the ENA proportion of South Asians and I think Iran_Hotu may be an imprecise reference for the Iranian-like proportion of Paniya.

(Makes more intuitive sense to me than the proportions fit with qpAdm by Lazaridis).
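The substitution described in this comment is simple arithmetic, and can be sketched as follows. This is only an illustration: the Paniya proportions are the ones quoted above, but `example_fit` is a hypothetical South Asian fit invented for the example, not a result from the post or from the comment, and `expand_proxy` is a made-up helper name.

```python
def expand_proxy(fit, proxy, proxy_fit):
    """Replace one source in a fit by that source's own mixture model.

    All proportions are percentages. The proxy's weight is split among
    the proxy's own sources in proportion to proxy_fit.
    """
    expanded = {}
    for source, weight in fit.items():
        if source == proxy:
            for sub, sub_weight in proxy_fit.items():
                expanded[sub] = expanded.get(sub, 0.0) + weight * sub_weight / 100.0
        else:
            expanded[source] = expanded.get(source, 0.0) + weight
    return expanded


# Paniya model quoted in the comment above.
paniya_fit = {"Iran_Hotu": 52.6, "Agta": 47.4}

# Hypothetical fit, for illustration only (not from the post).
example_fit = {"Iran_Neolithic": 40.0, "Andronovo": 20.0, "Paniya": 40.0}

expanded = expand_proxy(example_fit, "Paniya", paniya_fit)
# The 40% Paniya share becomes 21.04% Iran_Hotu and 18.96% Agta.
```

The expanded proportions still sum to 100, and they inherit whatever imprecision the proxy model has (here, a Paniya fit that is admittedly not great).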

Can you repeat with CHG added or in place of Armenia? If it works it would explain the Broushaki ancient fits which showed no CHG or WHG in India proper, just Iran_N and Ust-Ishim, which can proxy for both ANE (Amerindians) and ASI. I thought it was just because of the lack of EHG in that model, but this looks worth a test.

If that were the case, MA1 would be a better choice (theoretically, no Villabruna admixture, compared to 25%-30% in Karelia. And even if it does have some kind of West Eurasian admixture, when used as a baseline, EHG appears to be 25%-30% Villabruna-admixed, compared to it).

The extra ANE is probably because the Indo-Aryans were a fusion between Andronovo/Sintashta-like steppe populations and steppe populations similar to the Srubnaya outlier.

The Srubnaya outlier has more ANE admixture than EHG proper (!), although it also has some minor Near Eastern admixture, the kind we see in Yamnaya. Here is a fit with Andronovo and the Srubnaya outlier:

We have yet to find Indian L657 on the steppes, where it should be. Again, when we do find it, I bet it'll be found in a population that was some sort of genetic fusion between an extremely ANE-rich steppe population like the Srubnaya outlier, and a steppe population similar to Sintashta/Andronovo.

Also, Scythians, Kushans, Dahae, etc, these steppe-derived populations of the historical period most definitely played a role in the ethnogenesis of anarchic, tribal, semi-nomadic, confederacy-based East Iranian ethnic groups, as typified by the Pashtuns of Afghanistan/Pakistan. Not to mention the warrior agriculturalists of the greater Punjab (Jatts).

But they just can't have had any real genetic effect on Indian Brahmins. As per historical records, Brahmins viewed the steppe-derived populations of South Central Asia as "mleccha", unclean foreigners, deemed them unfit for marriage, and didn't even particularly enjoy their social company (to put it mildly).

Also, the Kalasha, the Nuristanis, and the Kohistanis are very isolated people, in terms of culture, geography, and based on what we see with genetic data. The Kalasha don't seem to have had any admixture with neighboring populations, since the Bronze Age.

So, although Pashtuns, Jatts, and the Pamiri people probably have substantial steppe ancestry from populations known to us from historical-era records, the isolated Dardic+Nuristani tribes of the Hindu Kush, and the Brahmins/Kshatriya of India, probably owe their steppe ancestry wholly to the original Indo-Aryan waves.

In global perspective, the Near Eastern, Caucasus and European populations (with the exception of some populations drifting towards East and South Central Asia) form quite a tight cluster in the upper right corner. Then we have a greenish South Asian cline from the Near Eastern/Caucasus cluster reaching all the way down, almost to the East Eurasian cluster (the more ASI-descended populations). Central Asians are somewhere in between the West Eurasian and East Eurasian clusters. Drifting westwards from the Near East, slightly towards the SSA cluster, are the North Africans.

Obviously Karelians weren't living in northern India, but Villabruna-like populations could have existed further east than we might imagine (after all, they did impact on the southern Caucasus significantly). Thus Karelia-like groups could have been dwelling in Central Asia before the Boreal Holocene.

@ Sein

Please re-read what I wrote. I stated that the impact of Andronovo groups was limited exactly to those which you described: the pastoralist East Iranians, not Brahmins.

The problem I have with your 'Srubnayans' fit is that genetic-modelling feasibility needs to align with historicity. At least at initial glance this is problematic, as the Srubnaya culture dates from 1800-1200 BC, and is thus potentially too late on in the piece to be of relevance for the ancestors who allegedly composed the Vedas, not to mention their culture being too 'primitive' (I admit this risks a Marxist & evolutionary bias). Moreover, Srubnaya-Andronovo does not extend south of the Amu Darya. In fact, Srubnaya-Andronovo barely made an impact on the BMAC, let alone later, more southern groups. Thus, at the moment, the only convincing link between EE & India is R1a-Z93.

Lastly, I am not surprised that MA-1 is a worse fit. It's too old and peripheral to be relevant for any modern pops.

The Caucasus hunter-gatherers are in a broad/general neighborhood that always seems to have had populations with substantial Villabruna-related ancestry (Anatolia, the Levant, etc). Perhaps because the Villabruna cluster might be rooted in that general part of the Near East?

Basically, the Villabruna-related ancestry in the ancient southern Caucasus has the same affinities/roots that the Villabruna-related ancestry of ancient Anatolians/Levantines had, broadly speaking.

Even western Iranian farmers had significantly less Villabruna affinity, compared to CHG. So how would Central Asian farmers/ancient foragers look? Definitely not more WHG-shifted, but even less.

Looking at the steppe/Central Asian angle, we can go back to that Srubnaya outlier. If I'm not mistaken, it is from further east than the EHG samples. As a result, it has much more ANE, and much less Villabruna, compared to EHG proper (although, again, it also has some very minor Near Eastern ancestry, which is lacking in EHG proper).

So, the more east one goes, both in the Near East and on the steppe/Central Asia, the further Villabruna affinity goes down.

So it's a pretty safe bet that the farmers/foragers of Central Asia/northern South Asia would have had none of this sort of ancestry.

Also, for what it's worth, I'm not saying that Brahmins have ancestry from that exact Srubnaya outlier sample, or the Srubnaya culture.

Rather, the very existence of a predominately ANE sample, that late, and found in the context of a steppe culture (the Srubnaya outlier is female, so she/her population could have been from somewhere else on the steppe) proves that such people still existed on the steppe (obviously, I'm being almost tautological), and that the Indo-Aryans could be derived from a steppe population with very heavy admixture from that sort of people.

Basically, all you'd need is a population intermediate between Sintashta-like steppe people and steppe people similar to the Srubnaya outlier (again, I was not claiming that Sintashta + Srubnaya outlier = Indo-Aryans. Rather, I'm referring to the genetic affinities).

I think the fact that we have yet to find Indian-specific R1a1a-L657 is of great importance. Basically, it means we still need more steppe samples, to properly match the Indo-Aryans from whom the Brahmins/Kshatriya are descended.

Proto-Indo-Iranians are closely related to Balto-Slavic peoples; both are derived from CW culturally as well as genetically. An Afanasievo/Yamnaya-like Indo-Aryan population is really far-fetched, as Sanskrit contains no Yamnaya/Afanasievo linguistic influences, which are more closely related to Balkan languages; you do not find any proto-Balkan influence in Sanskrit. The Uralic factor also places PII right around the Baltic Steppe. As well as all the points Rob mentioned. If there is a 1% chance of this being true, it would require serious remodelling of how PIE culture spread. Srubnaya culture exists after the Indo-Aryans were more or less wiped out of the Steppe circa 2000 BC.

No such thing as the Baltic steppe. There's no steppe anywhere near the Baltic.

Indeed, Uralics lived and indeed live near the Urals, which are on the Kazakh border, not anywhere near the Baltic.

Also, keep in mind that we have Potapovka, Srubnaya and Andronovo samples that are more similar to Afanasievo than to Corded Ware. It's likely that the masses of the Middle and Late Bronze Age steppes were in large part like this, even though the elites might have been more like Corded Ware.

I see no reason why people mostly like Yamnaya could not have been the main type of Indo-Europeans to enter South Asia as late as 1600 BC and carried mostly Z93.

You do not find Yamnaya-like populations in NE Europe. Kristina mentioned some great points which you should consider. As well, the linguistic record does not contain ANY Yamnaya/Afanasievo words; that is pretty telling considering Sanskrit does contain BMAC and Para-Munda influence. If you're suggesting a large group of Yamnaya-like people were Aryanized by an Andronovo group, that would be incredibly far-fetched; possible, but far-fetched. You need to understand that by 2000 BC, Proto-Indo-Aryans had vanished from the steppe; it was dominated by Iranics, who very likely drove them out or wiped them out. They were fusing with much larger Oxus populations to the south. Rob basically hit the nail on the head: there is some ghost population in Central Asia acting EHG-like in some ways; given what we have seen with Iran_Hotu, I won't be surprised. In any case, more steppe samples and samples from Central Asia are needed to put this to rest.

They are quite equivalent, only with EHG instead of Yamnaya. It is in agreement with Lazaridis et al. 2016, where the populations that work best are Iran_N + Yamnaya or EHG (and actually when further constraints are put on the outgroups, Yamnaya stops working for some populations and only EHG and Samara_Eneolithic work for all).

In that paper Iran_Neolithic is modelled as 60% Basal Eurasian and 40% EHG. But that's in West Iran. I think that as you move to Afghanistan or Tajikistan, it's expected that Basal Eurasian will go down and we'd see populations with higher EHG and lower BE. At least without ancient DNA and based on modern one, that sounds perfectly parsimonious.

On the other hand, the Srubnaya_outlier sample is really a very big outlier. It has very little in common with the other 20+ samples we have from Sintashta, Potapovka, Srubnaya and Andronovo, who plot with modern NE Europe, not in Siberia. It doesn't sound very likely that the steppe population that went south would be better represented by that outlier (or even a mix of it to produce a Yamnaya-like population).

That's not to say there's no Bronze Age steppe admixture in Iran and S-C Asia. But I think this is best represented by Sintashta (based on D-stats, which tend to show a shift towards Sintashta in certain populations), and it's probably not very high. But that shouldn't be a problem for the steppe hypothesis if R1a-Z93 did arrive with those guys. And I do agree with Rob that R1a-Z93 is still the strongest evidence we have for such migration. Though yes, we still have the question about L657. If that one is found south of the steppe then it would be a big problem (and I wonder if that's what they found that made them change the model? I just can't imagine that Krause and Haak would suddenly come up with that new model for no good reason, but we'll see).

Dave, may I ask which dataset and what sample size you used for Turkish and Greeks? Could you add or paste here the Turkish_Kayseri, turkish_balikesir, Turkish_adana, Turkish_aydin and Turkish_Trabzon PCA scores? I want to run nMonte for the Turkish population.

The preference for Yamnaya over Andronovo might be due to the greater ANE affinity in Yamnaya. There's higher ANE affinity in South Asia than is proportional for the expected ancestor (so Afanasievo also often fits, like it did for the Kalash).

If there does turn out to be higher ANE in Neolithic Indus Valley cultures like Mehrgarh, this strongly implies that Central Asian and South Central Asian foragers/HGs were basically ANE or were the southern equivalent of EHG and that Neolithic Iranians mixed with them during their movement into South Asia (if they did move into South Asia, they could just be cousins).

Not that it makes too much impact on these things, but the Villabruna cluster is unlikely to be rooted in the Near East. We have Near Eastern Epipalaeolithic and Pre-Pottery samples from Anatolia to the Caucasus to Israel: it's not there... well, it is, but mixed in with BE.

On the other hand, it is also present in Europe, but without Basal, before the Neolithic.

The only solution is that it came from an intermediate zone, like the Black Sea region. If so, there's no reason why it won't be found in the steppe, and possibly as far as Central Asia after the Khvalynian transgression.

@Rami

There's no reason for the Indo-Iranian family to be connected with Corded Ware. All we know is that the proto-Indo-Iranians showed prolonged linguistic contact with the proto-balto-slavic speakers, and the genetic affinities of some of the later eastern Bronze Age individuals to those from CW indicate that there were contacts between the people living across the steppe. Andronovo, for example, shows a great deal of heterogeneity culturally, and it would not be surprising if there was some degree of genetic heterogeneity within it too.

But that Srubnaya sample represents a population (I mean, obviously, she can't have been magically sui generis). The existence of such people on the steppe is very important (I think it's a pretty big deal that she is predominately ANE), and serves as a clear indication that we need more steppe samples.

It's only a matter of time before we find L657 on the steppe (which is where it certainly will be found, considering that its brother clade is found in Sintashta, Andronovo, etc), and I'm willing to bet it'll be found in a steppe population with significant admixture from people like the Srubnaya outlier.

Regardless, even though we don't have the exact ancestral steppe samples for the Kalasha/Dards and Brahmins/Kshatriya, it is very obvious that these populations have substantial ancestry from the Bronze Age steppe.

As far as I'm concerned, this isn't even the kind of debate that it used to be, around a year ago. Mainly, because the existence of very substantial Bronze Age steppe ancestry throughout South Asia/Central Asia has been recognized in the scientific literature. The Lazaridis et al. paper on the ancient Near East was pretty clear that ANI is Bronze Age steppe + Neolithic Iran. And as per the estimates found in it, Pashtuns/Kalasha are 50% Yamnaya/Afanasievo-admixed. So, the estimates we see in the scientific literature are even higher than what we see with the nMonte method (in conjunction with PCA data).

With regard to South Asian agriculturalists, we should remember that (as per some hints) IVC samples appear to be similar to tribal South Indians. Something to keep in mind.

It's only a matter of time before we find L657 on the steppe (which is where it certainly will be found, considering that its brother clade is found in Sintashta, Andronovo, etc), and I'm willing to bet it'll be found in a steppe population with significant admixture from people like the Srubnaya outlier.

First let me say that archaeologically Srubnaya is the worst choice; IINW it was a culture which moved into Europe from Ukraine, not Asia. I guess you guys are running out of options.

It's also not a question of whether L657 will be found in the steppes or not; it is a matter of how far back in time it has been present in South Asia, the place it is specific to. We also have Z2124 there in moderate frequency, and with larger and correct sampling, many surprises are bound to come.

With regard to South Asian agriculturalists, we should remember that (as per some hints) IVC samples appear to be similar to tribal South Indians. Something to keep in mind.

It's bullshit now; they have reviewed their plan. But it's not even based on Indian DNA yet, AFAIK.

Jij,

the proto-Indo-Iranians showed prolonged linguistic contact with the proto-balto-slavic speakers

Nope. Vedic Sanskrit and Indic altogether don't just show great affinity with B-S, but also with Germanic, Greek, etc.; this has been known for the last 250 years.

If you reread what I wrote, you'll find that I'm not making any claims regarding the Srubnaya culture.

Again, I'm just making a note of the genetic affinities we see with an outlier sample found in a Srubnaya context, and what that means for our discussion regarding Yamnaya versus Corded Ware affinity in the context of Indo-Aryan origins.

Sein, you will also need archaeological correspondence regarding population origins. Without having the correct reference (aDNA) it's not wise to make even speculations, because they will have no real value no matter what you produce. So keeping that in mind, the "affinities" that you speak of is a very, very vague term :).

There is no way Pashtuns are 50% steppe; you previously modeled Pashtuns as 60-70% LBK, which was a bit much. While using Yamnaya as a proxy works in theory because they have high ANE, it still does not work, because Yamnaya populations also have a high amount of WHG, but WHG scores are incredibly low or nil among Pashtuns, ditto with Kalash. This is where Rob's point is making sense. ANI was assumed to be just a one-stream population just a year ago; now it's two. There is a lot of speculation going on, but one thing is for sure: Pashtuns cannot be 50% steppe; they don't have the WHG scores to back it up, simple as that.

I didn't model them as anything. You're making it seem like I magically made them behave that way, in the context of those qpAdm and TreeMix results. It was what it was, and what we see now with new data "is what it is".

Anyway, I don't know how much steppe ancestry Pashtuns have, by some sort of falsely exact percentage. Every method yields different output.

What I do know is that it is anywhere from 30% to 60%, probably 40%.

30% is the lowest they ever get modeled, with very conservative methods. 60% is what they get modeled as, in some output that's based on formal stats. To me, 40% seems reasonable, 30% is somewhat low and 60% seems too high.

Regardless, at the end of the day, it is going to be a very substantial percentage, no matter what (30% is some very serious admixture, and 60% would make it the main genetic element at play).

If you have any other questions, or wish to discuss further, email Iosif Lazaridis. This debate is way too old/tiring, even though it's been settled. I don't have the patience for it anymore, so just talk to the researchers behind the actual papers.

I am not Pegasus btw. FYI. That you modelled them as 60-70% LBK farmer shows your OWD obsession. I am not debating that there is steppe admixture in the region, but you're making assumptions without an iota of ancient genome data from that part of the world. What we do know is that there are very low WHG scores among Pashtuns; that is a concrete fact, and you can TreeMix proportions all you want to eternity.

Lol "Rami", you are Pegasus from Anthrogenica, but that's beside the point, and doesn't add to the discussion. I mean, no one cares. It's tangential to our discussion.

Also, you sound kinda stupid when you say that I modeled them. I have never used qpAdm or TreeMix (although, I am very well read on the underlying methods, and if my computer had more power and less damage, I'd do this stuff myself. Not hard, at all).

Anyway, qpAdm or TreeMix aren't "forced" to model in any direction.

TreeMix is totally unsupervised, doesn't really need any manual tweaking on the part of the user. The trees + migration edges are constructed/found by the program itself.

The same with qpAdm, you just set up left and right populations. Strictly speaking, the user isn't modeling anything, it does the modeling.

Regardless, we do have aDNA data from this part of the world. Those Neolithic Iranians are very South Asian-like. In fact, if we want to think in terms of old component names from ADMIXTURE runs (which I'm sure define how you think about population genetic data), those Iranian farmers are the Baloch/Gedrosia component, in the flesh!

So again, you are wrong, we do have some very relevant data, when it comes to the non-steppe side of ANI.

To be blunt, just learn a little something about the methods involved, and you'll realize why you harping on the "low WHG scores" makes you sound like an idiot.

Besides, like I said, this debate is so old/stale/tiring. Just read the recent Lazaridis et al. paper; there just isn't much room for debate anymore. We're not in the same place we were at, around a year ago.

At the end of the day, Pashtuns + Kalasha are anywhere from 30% to 60% LNBA European, although I think 40% is the true percentage, and the paper in question has them at 50%.

With the Pamiri peoples, at the very least, they are 40% LNBA European. In their case though, that would be the extreme low end of the range. In truth, they could easily be 55%-65% when it comes to this kind of ancestry.

These are facts about South Central Asian genetic history/structure. Stop being so overly sensitive and disingenuous, and just take science the way science was/is meant to be taken: objectively.

In closing, it is what it is, whining won't change anything.

If you want a substantive discussion, again, email Iosif Lazaridis, he's pretty open with discussing these questions.

I'm out (unless you respond, in which case I'll be forced to reciprocate. It's my nature, lol).

PS: Sorry for taking so long, just got home, been out on a trip. Also, no clue what the fuck OWD means.