Wednesday, January 17, 2018

Another look at the genetic structure of Yamnaya

Yamnaya and other similar Eneolithic/Bronze Age herder groups from the Eurasian steppe were mostly a mixture of Eastern European Hunter-Gatherers (EHG) and Caucasus Hunter-Gatherers (CHG). But they also harbored minor ancestry from at least one, significantly more westerly, source that pulled them away from the EHG > CHG north/south genetic cline. This is easy to show with formal statistics (for instance, refer to the qpAdm output here) and illustrate with a decent Principal Component Analysis (PCA).

Over the past couple of years I've come to the conclusion that this minor westerly input probably came from the Carpathian Basin (modern-day Hungary) or somewhere nearby, like the Balkans (see here).
However, this inference was based on just a handful of Neolithic samples from the Carpathian Basin. Now, thanks to Lipson et al. 2017, I have genotype data from tens of individuals from several different Neolithic and Copper Age cultures from the region. So let's revisit the issue by plugging these new samples into qpAdm, and also using the very latest qpAdm methods as described in the scientific literature (with Ethiopia_4500BP as the base pright sample, alongside 15 other ancient pright groups and individuals).
Below are the results, best to worst, sorted by taildiff. For comparison, I ran extra models with ancient populations from other parts of Europe and also West Asia. It's interesting and, I'd say, important to note that the West Asian reference groups produce amongst the worst statistical fits (bolded). What this suggests is that Yamnaya did not harbor extra West Asian ancestry on top of its CHG input. And, by the way, please note that I'm only using Yamnaya_Samara in these runs because I prefer UDG-treated, and thus higher quality, ancient samples.

At the top of the list is Blatterhole_MN. Admittedly this is something of a surprise, considering the geographic distance between Blatterhole, Germany, and Samara, Russia. It's also an intriguing result because of the presence of Y-chromosome haplogroup R1b in both Blatterhole_MN and Yamnaya (see here).
However, this doesn't necessarily mean that Yamnaya harbors direct ancestry from Blatterhole_MN, or even any closely related group from North-Central Europe. Rather, Blatterhole_MN is simply the best proxy in this analysis for the non-CHG/EHG ancestry in Yamnaya, and the important question is why?
Considering also the presence at the top of the list of Koros_HG (which includes Hungary_HG I1507), Germany_MN and Vinca_MN, the likely answer is its high ratio of Western European Hunter-Gatherer (WHG) ancestry. Indeed, when I let qpAdm vary the WHG ratio, by dropping Blatterhole_MN and adding Koros_EN and Koros_HG in its place, I get an even better fit.

Considering the more basic underlying components in the Neolithic samples, all the working models (Blatterhole, Koros_EN, Koros_EN+Koros_HG) are pretty consistent on the total amount of CHG+Anatolia_N, which comes to approximately 50% (it varies from 49.7% to 51.0% when drafting in the WHG+Anatolia_N proportions from Lipson's paper), with the remaining 50% (50.3% to 49.0%) being WHG+EHG.

Within the models, CHG and WHG are positively related (models with a slightly higher CHG fraction have a slightly higher underlying WHG fraction). With so many relevant outgroup statistics on the right, it seems like qpAdm is balancing the model to a very precise point on a general underlying north-south dimension.
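As a rough illustration of the bookkeeping above, here is a minimal Python sketch that expands a qpAdm-style model into its more basic components. The 44% + 44% + 6% + 6% proportions are the ones quoted for the CHG + EHG + Koros_EN + Koros_HG model further down in this thread. The Anatolia_N/WHG breakdowns of Koros_EN and Koros_HG are made-up placeholders, not the values from Lipson et al. 2017, so swap in the published figures before reading anything into the totals.

```python
# Top-level model proportions (the CHG + EHG + Koros_EN + Koros_HG model
# quoted later in this thread).
model = {"CHG": 0.44, "EHG": 0.44, "Koros_EN": 0.06, "Koros_HG": 0.06}

# ASSUMPTION: placeholder breakdowns of each source into basic components.
# These are NOT the published Lipson et al. 2017 estimates.
components = {
    "CHG": {"CHG": 1.0},
    "EHG": {"EHG": 1.0},
    "Koros_EN": {"Anatolia_N": 0.95, "WHG": 0.05},
    "Koros_HG": {"Anatolia_N": 0.10, "WHG": 0.90},
}


def expand(model, components):
    """Sum each basic component's share, weighted by the source proportion."""
    totals = {}
    for source, weight in model.items():
        for basic, share in components[source].items():
            totals[basic] = totals.get(basic, 0.0) + weight * share
    return totals


totals = expand(model, components)
south = totals["CHG"] + totals["Anatolia_N"]  # "southern" side of the split
north = totals["EHG"] + totals["WHG"]         # "northern" side of the split

for name, value in sorted(totals.items()):
    print(f"{name}: {value:.1%}")
print(f"CHG+Anatolia_N: {south:.1%}, WHG+EHG: {north:.1%}")
```

The two sums always add up to 100%, so any shift in the CHG+Anatolia_N total shows up with the opposite sign in WHG+EHG.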

"So is the missing piece of the Yamnaya puzzle a population with roughly equal ratios of Early Neolithic (EN) and WHG ancestries from the Carpathian Basin or surrounds?"

I just want to mention that this is what a lot of us "R1b/Yamnaya skeptics" have been suggesting for a while (ie that R1b-M269 radiated out from around the Carpathians/Danube before some downstream clades then hitched a ride on Yamnaya). Obviously it remains to be seen if that skepticism will be validated.

The "third neolithic" talk on Bell Beaker Blogger's page fits this well IMHO.

Another thing: If I recall correctly it looks like Koros_HG has a tad extra ANE and a tad EEF. I can imagine that would somehow cause qpAdm to choose it over any other WHG sample with a target that has ANE and basal eurasian.

And also all Steppe ancestry in Europe came along with significant EEF/WHG admixture. Those two mega "races" had a long history of mixing. By the time R1b P312 Steppe folk made it to Spain they may have been only 50% Steppe.

Well, it's in the Ruhr Valley, but I doubt if coal or iron ore were of any interest at that stage. There is copper in Sauerland, 40 to 100 km away, but ten minutes googling didn't turn up any references to pre-Roman mining.

Regarding the speculation about copper and Blatterhohle: there's no basis. In fact the Blatterhohle R1b-V88 individual was found without context (in a cave without clear TRB or Baalberge or whichever local MNE culture artefacts). But an interesting clue is the links between Iron Gates, El Trocs and some of the Mariupol Ukraine V88 individuals, further evidence of intra-Europe movement patterns before the big steppe migration.

me wildly speculating - the V88 distribution in Africa is either very odd (if the source was distant) or not odd at all (if it was local) but on the assumption it was distant i googled around one time to see if there was anything in the areas of high concentration of V88 that might have drawn people from a long way away hoping for something fun like gold or copper but iirc the only clear potential candidate was salt. One of the high V88 regions was a major salt production center from ancient times iirc (that's if i got the right spot in the first place so may be nothing).

Who here remembers the Ukrainian hunter-gatherers and that Eneolithic steppe-like sample from Ukraine from Mathieson et al. 2017?

As many of you probably recall, the Ukrainian HGs were not exactly like EHGs. They had some WHG-like ancestry pulling them westward.

So, WHAT IF the WHG-like ancestry in Ukrainians and Romanians was closely related to UHG in Neolithic Anatolians (and in turn EEFs)? It's a relevant question as Eastern Ukraine was, in all likelihood, the birthplace of the steppe ancestral package considering aDNA and archaeology alike.

In fact, I recall modeling Yamnaya Kalmykians a while back on nMonte using a battery of samples. They took no ANF ancestry at all, but preferred a mix of EHG + some WHG + CHG (don't recall the exact samples) instead. So, no surprise David's models prefer Koros_HG and HG-enriched Blatterhole_MN.

In this paper, the authors do not rule out, by any means, the arrival of agricultural, husbandry and pottery-making packages to the Donets/Lower Don regions from the NW Caucasus (or perhaps via the Eurasian route). In fact, they arguably show a tacit preference for this.

Furthermore, agriculture definitively began there at the tail end of 5th millennium BC. This is either contemporaneous with or post-dates some of the earliest "steppic" sites. Hence, if we assume an EEF route of transmission from Tripolye-like peoples, it doesn't preclude the earliest hybrid populations there being CHG+EHG+minor WHG. Either way, it doesn't require local EEF-like populations of E. Ukraine to have been significant contributors to the steppe package in the long run. They could have lived side-by-side and perhaps engaged in limited mixing but ultimately ended up as a dead-end population.

I remain open-minded, of course, and don't discount limited EEF in Yamnaya.

When do you think those genomes from Ukraine and Romania will become available?

Blatterhole_MN has R1b and clear mtDNA links with the northwestern Black Sea region. Most likely, they are some kind of Mesolithic (early? late?) migrants from the western part of the northern Black Sea region. Maybe, under pressure from Neolithic farmers, one part of their ancestral population moved away to Germany, and the other to the Dnieper River in the Neolithic.

Protoboleraz_LCA clearly had cultural contacts with the steppe, and therefore received gene flow. Baden_LCA derives from it and from Tisza_LN, Tiszapolgar_ECA and Balaton_Lasinja_CA.

Hmm. Could this be due to a spread of R1b to the steppe via the expansion of the Magdalenian culture? Archaeology demonstrates that the Magdalenian culture did expand out to at least Poland.

Based on Reich's paper on Ice Age Europe, the El Miron cluster was fairly diverse, with the El Miron individual falling in between the rest of the El Miron cluster and the Villabruna (WHG) cluster.

Or it could have been a slightly later replacement expansion of the Villabruna genetic cluster (WHG), which expanded out to at least the Czech Republic. (https://reich.hms.harvard.edu/sites/reich.hms.harvard.edu/files/inline-files/FuQ_nature17993.pdf)

Blatterhole_MN is smack in the middle of that range and shows ~70% Villabruna or KO1 related WHG ancestry and ~30% EEF farmer ancestry. (https://reich.hms.harvard.edu/sites/reich.hms.harvard.edu/files/inline-files/nature24476_final.pdf)

Would seem parsimonious that the R1b came from the Villabruna lineage.

So R1b expanded out to eastern europe/steppe from western/central europe, where it was found only at low percentages, via the Villabruna-like lineage during the Late paleolithic/early mesolithic?

The area is not far from the Netherlands, where the Vlaardingen Culture could be found in the peat- and marshlands. It was contemporary with the Funnel Beaker culture, and even outlived it, lasting well into the Corded Ware era. It showed clear Mesolithic traits, most likely a continuation of the local Dutch variety of Ertebølle called the Swifterbant culture.

https://en.wikipedia.org/wiki/Vlaardingen_culture

We also have a Funnelbeaker graveyard from near Schwerin - the Ostorf flat grave - which had fully mesolithic mtDNA and showed clear signs of being far more hunter-gatherer than normal funnelbeaker.

epoch2013 "Your scenario seems too far fetched with such examples in the neighbourhood."

It means nothing, as we know neither the genetics of the Dutch cultures nor their relationship with Blatterhole. What's important is that R1b(-V88) was present in the Neolithic both at Blatterhole and on the Dnieper, and that on the Dnieper the population received more WHG compared with the Mesolithic.

That Koros individual I4971 is from a Neolithic context c. 5300 BC, but seems to be yet another 'assimilated hunter-gatherer'. Reported lineages are I2a2 (Y) and K1 (mtDNA), although he can be further determined to be I2a2b. Just more confirmation that the area around the eastern Carpathian Basin and adjacent mountains were hunter-gatherer 'refugia' of I2a2 and probably certain types of R1b.

Isn't that just one way to make an Armenia EBA? You're relying on a pure CHG pop to exist in the Caucasus. That, in all likelihood, has no chance of being true. Why say Hungary_N + Kotias? Just have it be a pop between Armenia_EBA and Iran_ChL.

The Caucasus pops with good Anatolian admixture make the best proxy for an admixing source, going from Khvalynsk > Yamnaya.

My models rely on a largely CHG, but partly EEF and EHG, farmer population existing in the North Caucasus and Don region of the steppe, both of which are yet to be sampled, during the Late Neolithic/Chalcolithic.

This is very plausible considering the archaeological and skeletal finds from that period in the North Caucasus and nearby parts of the steppe.

Mesopotamia isn't south of the Caspian. There's limited data from here and not enough to say it didn't happen. The trail from the Caucasus farmers is from Mesopotamia. There really isn't any way around that. After someone gets a couple hundred samples across this region and the Caucasus, it will be more clear.

@Davidski,"A bigger problem I think is the lack of South Caucasus and South Caspian mtDNA HGs in Yamnaya etc., like U7."

Maybe the Caucasus lacked South Caspian mtDNA HGs but carried EEF mtDNA. That would make a Caucasus route for Yamnaya's EEF stuff possible. But I think it makes a lot more sense that Yamnaya's EEF is from Europe, considering southeast Europe right around the corner was 90% Anatolian.

I think Chad is right: there are too few genomes outside Anatolia to really appreciate what's what. Anyhow, CHG must be either from south of the Caucasus or south/east of the Caspian. In fact, as I've previously suggested, it's probably several geographic and temporal layers.

The Chalcolithic South Caspian populations that were expanding from Mesopotamia and Zagros made it to Egypt and Anatolia. We know this because it's easily seen in the ancient data without having to resort to any mental gymnastics.

They didn't make it onto the steppe, because their markers are missing from Bronze Age steppe populations.

Just follow the data without wishing too hard, and you'll see the reality.

No they didn't. The Mesolithic groups who at one point lived in the NW Caucasus, and who were of southern origin (Imeretian-Zarzian Epipaleolithic), appear to have gone extinct. There is a 1000 year hiatus between the Mesolithic and the arrival of Mariupol-like groups in the North Caucasus (which would be on the WHG-EHG cline, and I2/R1). Then new southerners arrived with the Meshoko horizon from Georgia and mixed with Sredny Stog people. Then again new southerners arrived with Majkop contacts (northern Ubaid and Halaf, therefore not really "Mesopotamian").

>The Chalcolithic South Caspian populations that were expanding from Mesopotamia and Zagros made it to Egypt and Anatolia. We know this because it's easily seen in the ancient data without having to resort to any mental gymnastics.

I'm not advocating anything about ancient Egypt here; just pointing out patterns in the data that have been discussed in scientific literature and on this blog already (ie. the spread of South Caspian/Mesopotamian/Iran_ChL ancestry to ancient Egypt).

The best model posted is the one with CHG + EHG + Koros_EN + Koros_HG, with proportions of 44% + 44% + 6% + 6%.

Ukraine Neolithic was SHG-like, and only received EHG during the Eneolithic with an already admixed Yamnaya-like population.

The North Caucasus (pre-Maykop) had clear links to the NW Black Sea region*, and I'd guess that those Balkans outliers can only realistically come from the North Caucasus (and they plot near modern Europeans).

The South Caucasus (home of CHG) had too much AN ancestry from early on (Armenia_Chl and Armenia_EBA). Everything around the Black Sea was too "western" by the LN/Chl to be the origin of a Yamnaya-like population.

So where does this leave us? We need a population as "eastern" as Samara_Eneolithic (Khvalynsk), but more southern. This would make the lower Volga-Ural region a good candidate. Either that, or Central Asia.

The first Yamnaya-like (but not exactly) sample we have is the Samara Eneolithic sample (probably not from the Khvalynsk tribe) belonging to HG Q, an eastern marker.

The earliest Yamnaya proper (genetically) samples we have are from Samara and the Altai (Afanasievo), ca. 3000 BC.

Apparently (unpublished data), the best match so far for the mtDNA of Yamnaya comes from Ulug Depe**.

@Alberto "Ukraine Neolithic was SHG-like" Wrong. They were not SHG-like; such a term is not applicable to describe a mixture of EHG and WHG. They had an increase in WHG compared to the Mesolithic.

"and only received EHG during the Eneolithic with al already admixed Yamnaya-like population."

@Rob, re: the discussion in this thread between you and Chad, as you know from past discussion, I was pretty interested in the idea of Iran_Chl admixture into the Armenia_EBA population (simply to tie in with general change in the Near East ME region as a single phenomenon).

But, arguing against it, would say I notice in the new West Eurasian Ancient 67 panel, in the higher dimensions than 1 and 2:

- Dimension 3 adds further distinction between CHG and Iran_N / Natufian / Levant_N / Anatolian_N and shows some bending of particularly present-day South Caucasians (Georgian / Abkhazian) towards Satsurblia and Kotias. The North Caucasus also shows some of the same phenomenon (but it seems reduced due to greater EHG affinity).

Armenian and Turkish populations who overlap in Dimension 1 and 2 with South Caucasus listed above do not overlap in this dimension and are more removed to the other, non-CHG end (which makes sense given their languages are from the north / east one way or another and relatively lower isolation means drawing in more ancestry from both Levant and Iran).

Steppe_EMBA samples also seem outbent towards CHG on this dimension (The West Eurasian type Karasuk and Mezhovskaya samples in the run don't seem to be).

- Dimension 4, where NW_Anatolia_N ancestry is distinguished from other Western farmer and Neolithic streams of ancestry (and where Western Europeans and Eastern Europeans are slightly more distinct from each other than in PC2), there also seems to be a slight outbending of North Caucasus and Steppe_EMBA samples towards the "other Western farmer" end (where Levantine and Bedouin samples are at the other pole).

...

Overall, I guess I'm in agreement that the sampling of eastern Anatolia and the early South Caucasus / Armenian plateau is too weak to reject ancestry from there in Yamnaya (the Armenia Chalc and EBA may just be rejected because they are too complex mixes that don't have enough freedom to be fit).

But also seems like there has to be at least some continuity of CHG in South Caucasus, and don't totally see any compelling reason in modern dna as represented there that a "CHG stronghold" should have been in North rather than South Caucasus (working in complete ignorance of archaeology).

Armenia EBA and Iran ChL are the best admixing source to go from Khvalynsk > Yamnaya by f3. Nothing weak or rejected about it. Northern Mesopotamians are the ancestors of Caucasus farmers. Once we get the kind of coverage we have in Europe across West Asia, it'll be pretty clear.

I think that could be down to a different pright; he seems to be using a greater number of European Upper Paleolithic + Mesolithic samples plus Natufian, ancient Ethiopia and Tianyuan. No recent Mbuti, Onge or Karitiana. (Always appreciate your comments, but would we get more out of this if we compared the methodology?)

Not enough power? I'm using the same stuff as you, so I don't follow here. Tianyuan is an early East Asian and not relevant. Karitiana will likely be closer to the ENA side of ANE anyway. Groups closer in time matter more here. Try a run with no farmers or mesolithic samples to see. Tianyuan changes nothing in the outcome, really. I've checked.

Also, there is only an H3a from Halaf and no Y-DNA, so let's not get ahead of ourselves and say it's not in the later steppes. Just wait for the samples to come.

In qpAdm, in terms of modelling a European BA, say Rathlin, on earlier BA populations in the pleft like BB, would including Yamnaya in the pright (along with the obvious necessary ancients) be recommendable in terms of fleshing out things phylogeny wise? Or would that be going too deep/risking too much shared drift between left and right pops?

Somewhat belated, but here it is. I am going to preface this by highlighting three samples of interest: Eneolithic Samara 434, Yamnaya Samara I0357, and Y. Samara I0441 (somewhat less interesting). If you recall from one of Dave's posts, 434 is the Hg Q suspected Kelteminar migrant (or admixed individual).

At a first go, virtually all the Indians just took I0357 and the Eneolithic Samara sample (434) for their steppe and no Sintashta/Andronovo. Kshatriya took 434, primarily. Kalash took mainly the Samara sample, but also I0441. Pashtuns just took I0357, as did Burusho. Iranians from Iran just took Andronovo and Sintashta, though.

Midway through this, I changed my trajectory and decided to do a preliminary assessment of how Iranian Persians relate to Mesopotamians and Arabs, choosing Iraqi_Jews, Arab Israel 1, Saudis, and Leb Muslims.

Guess what? I got back on track as two interesting things happened. The model SNATCHED Iraqi Jews AND opted for I0357 instead of Andronovo/Sintashta. The fit improved from 0.4% to 0.03%. Granted, it was an overfit as it opted to use >10 samples all of a sudden. After a bit of trimming, I was still left with 0.05% or so. One of my best fits (or overfits).

Arab_Israel1 was used, but it's unlikely to reflect Arab-related ancestry since Zoroastrians also took it. Same story with them: fit improved, snatched Iraqi_Jews and opted for I0357. Mazandaranis and Lors "fell into line" in the same fashion.

I've played around with 434 in the past using a panel of various HGs. Unlike the other 2 Eneolithic Samara samples, it actually preferred a good chunk of Iran_N (in addition to CHGs) and the fit was still worse than for the other 2. IIRC, it also took extra ANE.

Spurred on by my Indo-Iranian results, I chose to examine all the Yamnaya_Samara samples, using a panel of HGs AND Eneolithic 434. Guess what? They mainly opted for CHGs and Euro HGs, while I0357 took a considerable slice of 434 and Iran_N. I0441 grabbed these, too, but to a lesser extent.

This leads me to suspect two things:

1. The Samara Bend area may have experienced some exotic steppe-like influence (if projected on 2D PCA, that is) from further East or SE. Possibly some Iran_N+EHG+extra ANE population. I0357 and 434 could be hints of this. Haplogroup-wise? Maybe Q+J2b.

2. If the first Indo-Iranians were, in fact, R1a-Z93 Corded Ware folk carrying some EEF, they hybridized with some of these possible Central Asian(?) populations en route to BMAC and/or India. In other words, Monte's choice of I0357 (and 434 in some) may reflect an imperfect composite tapping into both CWC and a mystery ancestral stream.

Can you look into these 3 samples (or at least I0357) using formal methods somehow?

No, I'm just packing the right pops with as many genetically diverse ancient populations and individuals as I can, while at the same time ensuring that my analyses are each based on at least 100K SNPs, so that I have as much discriminatory power as possible.

And then I look at the output, mainly the taildiff, to see how the models perform. If I'm seeing clear patterns that make sense in terms of biogeographical affinities, with, say, most groups being clearly discriminated against relative to a few that are obviously working well, then I'm happy.
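For anyone who wants to apply the same screening to their own runs, here's a minimal sketch in Python. It assumes the qpAdm results have already been copied by hand into simple records. The ModelResult class, the rank_models helper and the Ref_A/Ref_B/Ref_C numbers are all made up for illustration; none of them are qpAdm output or part of ADMIXTOOLS.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    sources: tuple    # left (source) populations used in the model
    taildiff: float   # qpAdm tail probability; higher means a better fit
    n_snps: int       # number of SNPs the run was based on


MIN_SNPS = 100_000  # the 100K SNP floor mentioned above


def rank_models(results, min_snps=MIN_SNPS):
    """Drop runs with too few SNPs, then sort best to worst by taildiff."""
    usable = [r for r in results if r.n_snps >= min_snps]
    return sorted(usable, key=lambda r: r.taildiff, reverse=True)


# ASSUMPTION: placeholder values, not real results.
results = [
    ModelResult(("CHG", "EHG", "Ref_A"), taildiff=0.02, n_snps=250_000),
    ModelResult(("CHG", "EHG", "Ref_B"), taildiff=0.61, n_snps=180_000),
    ModelResult(("CHG", "EHG", "Ref_C"), taildiff=0.90, n_snps=60_000),
]

ranked = rank_models(results)
for r in ranked:
    print(" + ".join(r.sources), f"taildiff={r.taildiff}", f"SNPs={r.n_snps}")
```

Note that the Ref_C run is dropped despite its high taildiff, because it falls under the SNP floor and so has less discriminatory power.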

By the way, you can e-mail Nick and Iosif about this sort of stuff, especially the more technical aspects. If your questions are legit they'll reply. And if you do find out anything new and useful, feel free to share it here.

@Lee, I believe in the latest papers and generally (Lazaridis 2017 supplement is good to look at) there is a lot of consideration to using qpWave to prune the minimal number of necessary populations in the pright (e.g. if you have all of La Brana, El Miron, GoyetQ-116-1, Villabruna in the pright, and the qpWave is only 2, then there is some way in which you can simplify down to needing only 2).

That said, even in that paper they abandon this for "All" sets including various outgroups, on the basis that "Adding these later populations has one disadvantage: if populations A and B are both included in the larger set and are composed of the same ancestral elements in similar proportions then A may be modeled as deriving most of its ancestry from B and vice versa. This does not clarify the ancestral origins of either population. However, this approach also has the advantage of identifying mixture when the admixing populations are themselves complex. For example, if a population A is a mix of B and C, and B and C are themselves 2- or 3-way mixtures, then this approach might identify a simpler mix in the origin of A than would be possible if B and C were not considered as source populations".

We should probably not think of qpAdm as actually less problematic than ADMIXTURE or PCA in regard to the problem of the pright being arbitrary. Using qpWave there is some degree of testing for redundancy, and whether the pright are even distinguished by multiple streams of ancestry, but it ultimately seems like a somewhat arbitrary choice of selecting populations which are believed to be able to distinguish the pleft in formal stats.

The other new approach in the Lazaridis paper (which is not yet part of ADMIXTOOLS, I think) is the simulation approach: directly simulating mixes of n populations, then running f4 of the form (real, simulated; X, Y) for various X, Y pairs in a pright. The advantage of that is that the results are directly understandable in terms of comparison to (real, real; X, Y) and the f4 Z test for significance.

As well, with the simulation approach, you can run f4(simulated1,simulated2;real;outgroup), so that in the event that two simulations get all the outgroup relationships right, but actually either or both is not very close to the real population, then you could detect that (qpAdm can't really do this at all).
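To make the simulation idea a bit more concrete, here's a toy Python sketch. It is not the Lazaridis et al. code and not ADMIXTOOLS. It assumes each population is just a list of allele frequencies, simulates a mixture as a weighted average of source frequencies, and uses a plain equal-sized block jackknife for the Z-score (the real tools weight blocks by genetic distance). All function names and the random toy data are made up for illustration.

```python
import math
import random


def mix(sources, weights):
    """Simulate a mixed population as a weighted average of allele frequencies."""
    assert abs(sum(weights) - 1.0) < 1e-9
    n_snps = len(sources[0])
    return [sum(w * s[i] for w, s in zip(weights, sources)) for i in range(n_snps)]


def f4(a, b, x, y):
    """f4(A, B; X, Y): mean over SNPs of (a - b) * (x - y)."""
    return sum((pa - pb) * (px - py) for pa, pb, px, py in zip(a, b, x, y)) / len(a)


def f4_z(a, b, x, y, n_blocks=20):
    """Z-score from a simple delete-one block jackknife over contiguous blocks."""
    size = len(a) // n_blocks
    estimate = f4(a, b, x, y)
    loo = []  # leave-one-block-out estimates
    for k in range(n_blocks):
        lo, hi = k * size, (k + 1) * size
        keep = lambda v: v[:lo] + v[hi:]
        loo.append(f4(keep(a), keep(b), keep(x), keep(y)))
    mean = sum(loo) / n_blocks
    var = (n_blocks - 1) / n_blocks * sum((t - mean) ** 2 for t in loo)
    se = math.sqrt(var)
    return estimate / se if se > 0 else 0.0


def drift(pop, scale=0.05):
    """Toy 'related population': the same frequencies plus a little noise."""
    return [min(1.0, max(0.0, p + random.uniform(-scale, scale))) for p in pop]


random.seed(1)
n = 2000
src1 = [random.random() for _ in range(n)]
src2 = [random.random() for _ in range(n)]
x, y = drift(src1), drift(src2)       # toy pright pair, related to the sources

real = mix([src1, src2], [0.7, 0.3])  # stand-in for the real target
good = mix([src1, src2], [0.7, 0.3])  # simulation with the right proportions
bad = mix([src1, src2], [0.2, 0.8])   # simulation with the wrong proportions

print("good:", f4(real, good, x, y), f4_z(real, good, x, y))
print("bad: ", f4(real, bad, x, y), f4_z(real, bad, x, y))
```

A simulation that matches the target gives f4(real, simulated; X, Y) of zero, while a badly proportioned one gives a clearly non-zero statistic. The same f4 function can be reused with the arguments reordered for the f4(simulated1, simulated2; real, outgroup) check.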

But still this does not get you away from arbitrary elements in the pright choice.