Thursday, February 12, 2015

Eastern Europe as a bifurcation hotspot for Y-hg R1

The main angle of the recently released epic manuscript Haak et al. 2015 is that ancient DNA supports the steppe origin of at least some of Europe's Indo-European languages. That's certainly a move in the right direction, so that we can eventually do away with the Anatolian hypothesis, which was always a failed proposition.

But it's clear that the authors are holding back. They've obviously decided to be very cautious until they've looked at more ancient DNA, particularly from the Near East, Central Asia and India, before backing fully any one Proto-Indo-European (PIE) urheimat model.

That's understandable, considering how much opposition there is still to the steppe hypothesis, even though it does by and large have the support of historical linguists, which is what really counts. Nevertheless, my feeling is that Haak et al. are underselling their data, particularly the stuff from Eastern Europe.

I'm of the opinion that the steppe or Kurgan PIE model works just fine, and I'm not surprised by the ancient DNA evidence pointing to a massive expansion of people from the western steppe during the Late Neolithic/Early Bronze Age. So for me, the really big news in this paper is that the only two Eastern European forager samples belong to basal lineages of Y-chromosome haplogroups R1a and R1b. What this suggests, I'd say, is that ancient Eastern Europe was a key bifurcation region for R1.

Remarkably, it's possible to basically lay out the history and phylogeny of R1a in Europe using just three R1a samples from the paper. This can't be a coincidence.

What we can see there is the progression from a basal R1a in pre-Neolithic Northeastern Europe to a derived R1a in late prehistoric Central Europe. The derived R1a is actually R1a1a1b1a2, which is by far the most common subclade of R1a in Europe today, and closely related to the Asian and Indo-Iranian-specific R1a1a1b2.

Interestingly, all seven of the Yamnaya males sampled by Haak et al., mostly from the Samara Valley, belong to R1b-M269, the most common subclade of R1b today. However, five belong to the West Asian-specific R1b-Z2103, and none to the West European-specific R1b-M412. Also, all nine Yamnaya samples show Near Eastern admixture, described in the paper as Armenian-like.

Does this perhaps mean that the Proto-Indo-Europeans (and thus Yamnaya) originated in the Near East, as per the Armenian Plateau hypothesis?

I doubt it. The aforementioned Eastern European R1b forager is also from the Samara Valley, and he clearly lacks Near Eastern admixture. So what are the chances that a Near Eastern population with a frequency of R1b-M269 of around 100% moved into an area of Eastern Europe where a more basal R1b was already present, and in fact in a population with no Near Eastern ancestry? Very slim, I'd say.

So how did the Yamnaya herders acquire their Near Eastern admixture? The answer is obvious if we look at their mtDNA haplogroups. These include H, T and W, all of which might have come to Eastern Europe from the Near East.

Of course this doesn't mean that the Eastern European steppe was overrun by Near Eastern Amazons. It's generally accepted that during the Neolithic the steppe was settled by farmers from the Near East, just like much of the rest of Europe, and I'd say that it was mostly the women from these groups who were incorporated into the later pastoralist societies of the steppe. The men, who probably belonged to Near Eastern haplogroups like G or T, might have been killed or marginalized in some way, so that their reproductive success was seriously hampered.

This is not a far-fetched scenario. Typical hunter-gatherer Y-haplogroups like I2 and C6 have already been recorded alongside Near Eastern-specific mtDNA lineages at several Neolithic sites in Western and Central Europe. The social mechanisms for this might have been different there than on the steppe, but in any case, it seems that European hunter-gatherer males shacking up with farm girls of largely Near Eastern ancestry was not an unusual occurrence back in the day.

Now, if Eastern Europe was indeed a bifurcation hotspot for R1, then a large proportion, or even the majority of R1a and R1b in Eurasia today, might well be of Eastern European origin. If so, there should be some support for this in genome-wide DNA of present-day Asians, and indeed I think there is.

Below are a couple of principal component analyses (PCA). The first is from Haak et al. and the second from my own West Eurasia K8 analysis (see here). Unfortunately, I don't yet have access to the Yamnaya genomes, but I think it's pretty easy to guesstimate where they will land on my plot when I run them in the K8. I marked this spot with an X.

Note that most of the Near Eastern and Caucasian populations are clearly shifted east towards ANE, and also up towards Europe. Moreover, I'd say many of these groups are specifically pushing up towards the Volga-Ural samples and thus the Yamnaya herders.

There's really no other way to explain this outcome. Quite simply, the vast majority of West Asians have relatively recent (post-Neolithic?) ancestry from the Ural or Kazakh steppe, which manifests itself as a west to east cline on PCA, running from the southern Levant to the north Caucasus. This result is easily reproduced on any decent PCA with West Eurasian populations, and can be seen on the Haak et al. plot.

I'm yet to find solid evidence that Indo-European speakers from the Near East, like Armenians, Kurds and Iranians, don't harbor fairly significant ancestry from this northeastern source.

For instance, unlike many people, I don't find unsupervised ADMIXTURE analyses very convincing when they show these groups to be entirely of Near Eastern ancestry. That's because when ADMIXTURE creates a modern Near Eastern/West Asian cluster, it usually lumps within it all of the ancient ancestral components that are today ubiquitous in the Near East. In other words, the steppe admixture which shows up amongst most West Asians on the PCA above is classified as native to the Near East, even though this is unlikely to be true.

"The men, who probably belonged to typical Near Eastern haplogroups like G or T, might have been killed or marginalized in some way, so that their reproductive success was seriously hampered."

David, the Natchez people (Native Americans from the Mississippi) provide a terrific example of a descent system which might provide some insight into the aforementioned problem.

"Natchez society was divided into only two classes, the nobility and the commoners. Within the nobility, the royal family was distinguished by the rank of Suns, but this rank was lost at three generations removed from the royal line. The Sun royal family was also internally ranked, and Sun males filled the top political offices of the nation roughly in order of their genealogical rank. Noble men filled many of the secondary offices in the state, including war chieftainships and presumably some of the state offices mentioned earlier, such as peace chiefs and administrators. Between the nobility and the commoners was a special rank for males, that of Honored men. Honored men by birth may have filled such positions as the head servant or “speaker” of the Great Sun or the Guardians of the Temple, both mentioned by Dumont (in Swanton 1911: 151,161). This rank kept the male descendants of the matrilineal nobility from falling directly into the class of commoners. Men of Honored birth were also allowed to reascend to the class of Nobles if they achieved fame through warlike exploits. The rank of Honored could also be attained by men of commoner birth, through exploits in wars...

Like the Nobles, Honored men were exempt from marriage with Suns, and from mortuary sacrifice through any other means. Extension of the exogamy of the nobility to exempt Honored men from marriages with Suns provided part of a powerful motivation for commoner men to achieve Honored rank. The possibility of achieving Honored status not only gave commoners a sense of continuity with the nobility, but induced their participation in the military and ritual affairs of the state organization.

The rules of hereditary class and rank membership, assuming that the spouses of all nobility and of Honored men were commoners, are summarized as follows:

These rules must be qualified by the provision that descent in the female line of nobility degenerated after three generations, except in the royal line."

http://eclectic.ss.uci.edu/~drwhite/pub/NatchezPeople.pdf

If instead of Suns you have Near Eastern farmers and likewise EHG instead of commoners, you could see the EHG males - specialised in hunting par excellence - dominating the higher classes and relegating male farmers to the commoner class and through time eventually to extinction.

"So what are the chances that a Near Eastern population with a frequency of R1b-M269 of around 100% moved into an area of Eastern Europe where a more basal R1b was already present, and in fact in a population with no Near Eastern ancestry? Very slim, I'd say."

It's not slim. It's typical. Look at the modern-day distribution of R1a L664, Old European and East European Z283.

The basal R1b are early stragglers from a core south of the Caucasus. It also explains why R1b is found in Spain. It's not a mammoth-riding husband who stumbles onto a Cardial cruise ship.

It's because the Cardial launching areas were an easier trek from Armenia / western Iran.

Is the Spanish R1b sample over 90% EEF? Am I seeing this correctly? If so, doesn't this simple fact itself eliminate the Samara as the origin point of R1b, since both samples are only a few hundred years apart but completely different in their components?

In my adventures with personal genomics I've come across clients with Sub-Saharan Y-DNA and East Asian mtDNA, but with 99% European genome-wide structure.

So a single sample with complex and perhaps unusual ancestry doesn't mean much. However, when you have pure northern hunter-gatherers with R1, and then later steppe groups mostly belonging to R1 (Andronovo, Yamnaya), then it's very difficult to claim with a straight face that the Y-DNA of the steppe groups isn't native to the steppe.

Thank you. This article answers the questions I asked in this blog yesterday. I should say though that I still hold an opinion that R1b is Near Eastern, while R1a - local Eastern European, given the fact that now we have foragers who are R1a.

It's not slim. It's typical. Look at the modern-day distribution of R1a L664, Old European and East European Z283.

The basal R1b are early stragglers from a core south of the Caucasus. It also explains why R1b is found in Spain.

R1b1 may or may not have been in the Near East prior to R1b1a2a (the common link of Bell Beaker and Yamnaya). It looks extremely silly that you assume it was there. However, R1b1a2a being in the Yamnaya about 5 kya is close in time to the genesis of R1b1a2a, so we can expect the location of R1b1a2a's origin to be close by on the steppe.

Finally, one of the main reasons David can say the chances are slim that Yamnaya R1b originated in the Near East is that they were 100% R1b1a2a. If Near Eastern men introduced R1b1a2a to the steppe, you would expect some non-R1b1a2a to be found all the way up in the Samara Valley, where the overall composition of the Yamnaya was largely native!

P, Q, R, R1, R1a and R1b all look like ancient steppe haplogroups to me at this point.

postneo,

The obviously significant genetic barriers between Eastern Europe and the Near East have only been breached in a big way on a few occasions (Neolithic transition, Indo-European migrations).

The two regions simply don't share the sort of population history that you're suggesting. So Mesolithic Eastern European genomes are very unlikely to tell anything about Mesolithic and Neolithic Iranians.

Is the Spanish R1b sample over 90% EEF? Am I seeing this correctly? If so, doesn't this simple fact itself eliminate the Samara as the origin point of R1b, since both samples are only a few hundred years apart but completely different in their components?

I'd say the context of MA and R1a matters more than anything else. Having said that, who do you think is more "autosomally pure", as in, who do you think experienced more recent migration from outside their realm?

I have a few questions about interpreting the new paper. What is the "dark green" in the Yamnaya composition on that K chart? It must be the Near Eastern stuff, but why is it different from the Sardinian component? Why not just EHG plus Sardinian?

Also, does the PCA plot have a significant projection bias? A few things look kinda weird. Some ancient samples from Scandinavia, I think the Neo HG, look like extreme northwest Sardinians, which makes no sense. Also, shouldn't they have EHG anyway?

Finally, I can understand how the Tuscans are modeled as early farmer plus Yamnaya, but why would the Spanish be modeled that way? Spanish WHG should not all come down to Yamnaya input, right?

Colin, that PCA is unprojected. MA-1 clusters with Yamnaya when it's projected.

According to the fits, Spanish don't have extra WHG beyond that contained in farmers and Yamnaya so in that regard they are similar to Tuscans. This holds true even in the improved fits that try to account for more recent components. The fit with least residuals (Figure S9.27 C) for Spanish is 18.5% BedouinB+54.4%EN+2.4% Nganasan+0%WHG+24.7% Yamnaya.

Basques in that same fit have 0% BedouinB and 4.6% WHG so it appears they do have extra WHG survival.
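As a quick sanity check on the Spanish fit quoted above, the five proportions should add up to 100%. Here's a minimal Python sketch of that arithmetic; only the percentages come from the comment, while the variable names and the rounding tolerance are my own assumptions.

```python
# Mixture proportions (in percent) for the Spanish fit quoted above.
# The dictionary and variable names are illustrative, not from the paper.
spanish_fit = {
    "BedouinB": 18.5,
    "EN": 54.4,
    "Nganasan": 2.4,
    "WHG": 0.0,
    "Yamnaya": 24.7,
}

total = sum(spanish_fit.values())

# Allow for floating-point rounding when comparing against 100%.
fit_is_complete = abs(total - 100.0) < 1e-6
print(f"total = {total:.1f}%, sums to 100%: {fit_is_complete}")
```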

The dark green cluster is essentially ancient Near Eastern farmer + EHG/ANE + heavy drift among the South Central Asians and North Caucasians, who probably both have extra ANE obtained via a different source than EHG.

But I think it'd be pretty easy to get Yamnaya to score EEF by removing some of the Asians, like the Kalash. I'll try this at some point when the data becomes available.

As for the Spanish, I think what we're seeing in the main ADMIXTURE run is the result of both a local hunter-gatherer bounce back and Iron Age migrations from the north into Spain. But in the Yamnaya test, their WHG mostly gets lumped into EEF, for whatever reason. I'm not really sure about that test; Oetzi comes out part Yamnaya too.

Also, yeah, running PCA with a mix of modern and low quality ancient samples is always tricky. You really never know how things will come out. Often the outcome just looks like garbage, but I think the Haak et al. PCA above actually looks OK.

"The diversity of R, and R1b especially, in west Asia can't be ignored."

Sure, but did they always live where they live now, or did they move south from the steppe?

If the Kartvel languages were originally in the north and migrated south to get away from the horse dudes - like people have been doing throughout history until rifles were invented - that would solve that mystery.

.

"Is the Spanish R1b sample over 90% EEF? Am I seeing this correctly? If so, doesn't this simple fact itself eliminate the Samara as the origin point of R1b, since both samples are only a few hundred years apart but completely different in their components?"

Davidski: The dark green cluster is essentially ancient Near Eastern farmer + EHG/ANE + heavy drift among the South Central Asians and North Caucasians, who probably both have extra ANE obtained via a different source than EHG.

I have some skepticism towards this idea, because when these West Asia clusters / ANI that contribute heavily to populations like South Central Asians and Indians and apparently now heavily to Yamnaya break out in ADMIXTURE, they don't ever actually have particularly high levels of drift by FSTs. Could be right though.

....

Looking at those PCAs you posted up David, I was motivated to overlay them to see where they are the same and where they differ, the following are a few versions of overlays:

http://i.imgur.com/WwACA3e.png - not much attempt to align anything other than getting WHG close and ENF close

http://i.imgur.com/WBDPrGB.png - tried to align this one so the Basques, Spanish and Sardinians fit right

http://i.imgur.com/Vy0ROqv.png - another attempt to align on the Basques, Spanish and Sardinians with a bit more stretching.

(apologies, I tinted them to try and get contrast in a fairly random way, so they look a little different in terms of colour and shade)

Overall, my impression is that the Fateful Triangle is *pretty* similar to the West Eurasia PCA plot from Haak (there are the same two clines, roughly, and similar positions). The only differences are:

a) that Europe is generally more compressed and similar on the Haak PCA plot compared to the Fateful Triangle

b) hunter gatherers are close to modern Europeans on the Haak plot, and there is less difference between Europeans in how close they are to them, compared to the Fateful Triangle

c) the Middle East and West Asia are by comparison much more compressed on the Fateful Triangle, and the Corded Ware and Yamnaya have slightly more separation from West Asia and the Caucasus than they do on Haak's PCA.

d) In all instances, many Middle Neolithic Europeans (particularly from Spain) are outside the triangle

So taking the Haak PCA as more accurate, it seems like the Fateful Triangle overstates slightly differences between HGs and modern Europeans, overstates differences between modern Europeans (particularly West to East), and understates intra-West Asian differences.

The question is whether this is because the triangle is correct but later drifts have skewed everything, or because the ancient populations are actually slightly different from the triangle's MO. (Difficult to see how that would make Yamnaya and the Caucasus more similar though.)

That two long diverged Hgs - the R1a and R1b in the two EE foragers - are otherwise autosomally close suggests the antiquity of those lineages there, ie at least from 10 kya.

However, this does not exclude the possibility that other, at least R1b haplogroups were also found further south, in say Armenia and Iran.

Furthermore, consider that the earliest of the AMH dispersals to north Eurasia failed. The LGM refugia could have been in the south Caspian region, as studies suggest. So the time between the first re-colonization of the north from the south Caspian (~12 kya) and later movements north (our pastoralist event, ~6 kya) leaves ample time for the populations to diverge autosomally, yet still keep similar Y-DNA base profiles (i.e. R1b-derived groups).

In this regard, I think post-Neo has a point: basal clades remain in the periphery. The parallels to linguistics here need no elaboration.

Furthermore, there is the obvious and major autosomal component which shows southern intrusion. Whilst possible, I find it a little hard to accept this can be simply laid at the feet of mail order brides from the southern Caucasus region. To me, this is a little too cutesy, and appears like trying to explain away a big elephant in the room.

What I am suggesting is that the southern intrusions toward the steppe seen autosomally, in mtDNA and archaeologically could also be the R1b-Z2103 seen in eastern Yamnaya, displacing to a large degree the more basal form (admittedly of n=1) from the Mesolithic sample.

The basal clades like R1a-M420* have indeed survived in the peripheries (West and Central Asia). That's why we needed ancient DNA from Eastern European foragers to learn that Eastern Europe was the core.

The basal clades have basically disappeared from the Eastern European steppe because of massive recent expansions there, of both R1a-M417 and R1b-M269. But they survived in the Near East and in the mountains of Tajikistan etc., because these regions were on the periphery of these expansions.

Davidski: The Fateful Triangle PCA packs well over 80% of the total variance into dimensions 1&2. My guess is that the Haak et al. PCA shows much less than 25%, and probably much less than 10%.

Well, is that 80% of the variation in the ADMIXTURE components put into the PCA (which seems pretty easy), or 80% of the genotype variation, which is totally different (and I don't know how close a PCA ever gets to this)? What's your reasoning behind this belief?

Your take is that there is a lot of higher order differentiation between Europeans and between Europeans and Mesolithic Foragers hidden in their Haak PCA? And more similarity between Middle Eastern and West Asian pops that is hidden in their PCA? These being the two main areas where they diverge. It seems likely to me that their PCA with their range of ancient and modern samples is going to be more informative about relationships between the samples put into it, compared to a pipeline where an ADMIXTURE ascertained with synthetic individuals is then fed through a PCA.

Ok, then. So you still argue that the source of the major apparent ANE shift in Europe must have been Yamnaya groups? Coz to me, it seems like you're focusing on only one dimension of the changes seen: the apparent eastward shift. But look globally, and you're actually seeing a "homogenization" of groups in west Eurasia. Yamnaya shift south just as EBA Europeans shift northeast.

The 'extinct' Neolithic groups lie no further outside the modern European range than EHG groups.

I think the Haak et al. PCA is OK, but I wouldn't call it especially informative. The distances between the samples are skewed for a number of reasons, resulting in rather absurd outcomes like Ust'-Ishim being placed near the Caucasus and MA-1 not too far away.

Yes, one way to look at it is that it was a homogenizing process within West Eurasia, sparked by very specific conditions on the steppe. Other such processes included the Neolithic transition which happened earlier, and then various events during the metal ages and the medieval period, as well as isolation-by-distance.

Well, if we are to follow the genetic trail, it seems to me one-eyed to lay the door at the Yamnaya intrusions and stop there, and disregard what arguably catalyzed the "revolution" on the steppe : southerners.

Moreover, the new, "shifted" EBA European population was not solely the result of Yamnaya intrusions. Whilst providing some impetus, I still look to LN groups in Poland (GAC) and Moldavia (TRB), not to mention residual HGs on the greater periphery. The former might be more 'ANE' and cluster apart from the homogenous 'central European' farmers from Hungary and Germany.

Quite simply: unless we find R1b-L51 in the Copper Age Dnieper-Don steppe, while at the same time excluding its earlier or contemporaneous presence in the Balkans and West Asia, the Kurgan hypothesis is doomed. We'd have to admit that a major reason for the apparent shift in BA Central Europe is input from other areas of Eastern Europe ("Mesolithic residuals" and LN farmers from Karelia, eastern Scandinavia and down to the East Carpathians), i.e. your real ancestors, and not those of modern Armenians and non-Slav Pre-Ural Russians :)

I still think the main problem is treating the population that entered the steppe as "Near Eastern".

Once we know that these people were ANE (who gathered some NE ancestry in their way to the steppe), everything will be much easier to explain.

ANE (from Central Asia and/or Iran) were not related to Basal Eurasian populations. They were closer to Europeans. And Y-DNA R was native to them as much as to Europeans.

A couple of answers from the previous post that might have got lost with this new one that I didn't see till now:

@Fanty

"Do modern day Armenians have 60% ANE to arrive at 50% ANE for Yamna?"

As I wrote, modern day Armenians might be direct descendants of these ANE people from Central Asia and Iran. But they moved to the Near East at least 5000 years ago. Hence, their very high Near Eastern admixture.

The ones that entered the steppe and founded the Yamnaya culture had much higher ANE component (actually this component will have to be slightly corrected, and renamed, once a genome from the original population is obtained).

EHG as such ceased to exist once this population (ANE) entered the steppe. A few might have survived a little while longer in Karelia or other remote areas, but they didn't go anywhere. They just went extinct, like WHG before them. (Though of course their genes also survived, but carried by other people, in this case the Yamnaya people).

@Chad

"The Neolithic in Iran starts at 8000BCE. There's no pure EHG coming out of there between 6000-8000BCE, and we have no evidence of EHG in Iran"

Of course. EHG were never in Iran. It was a population that's lately been called ANE (though I can't wait till they rename that as Ancient Central Asian, or Ancient Iranian or whatever) that was there. That population was the one that migrated everywhere, including the steppe, where they replaced (and mixed with) the EHG.

That population was not Basal Eurasian; they were ANE, and had Y-DNA R since the Paleolithic.

@David and others

"Ancient DNA from the Near East will either verify or debunk my claims here. I know there's some on the way later this year, courtesy of the ancient DNA lab at Copenhagen University."

This is where you and others fail: it's not just genetics but ALL other fields which have already debunked the Steppe Hypothesis! If anyone has any objection to that claim, just shoot.....

@Krefter

The main thing is that West Asia-South of Caspian is by very far the best candidate for PIE and its dispersal, BUT all of them have zero data from aDNA, and already we are seeing Asian influx into the steppes corresponding to archaeological and anthropological data. We do have very basal R1a, R1b, R2a and even IJ* clades there in numbers, which can't just be described as coincidence plus peripheral phenomena, taking the above-mentioned data's conclusions into consideration! Just apply common sense...

If we try to understand the spread of IE languages, we should also consider the spread of IE towards the East. Thus it might be better to look at the WHG ratios from K8 in Eastern populations to detect a correlation between IE and the WHG component, which Tajiks and Kurds have at relatively higher levels, while Iranians, Armenians and the Pontic populations of Turkey have relatively less. In my opinion, R1b (R-M269 L23*) in eastern Anatolia or Transcaucasia was not carried by IE speakers; IE would have been learned later by the R1b populations. On the other hand, there is no evidence that IE was spoken in Anatolia before the Hittites (2000 BC), which is after Yamna. Most of the ancient languages of Transcaucasia were related to Caucasian languages, like Sumerian or Hurro-Urartian and the modern-day Kartvelian languages. But according to autosomal results, there was deep contact between ANE and the Caucasian component even before the Bronze Age. Thus the spread of ANE into the Caucasus might have happened in two waves. The first was related to Gedrosia-Caucasus contact during the Neolithic (which brought Anatolian R1b); the other was the IE invasion. Thus the problem is how and where R1b learned or developed IE. Yamna is still the best candidate, but a good alternative is that Corded Ware, rather than Yamna, was the source of IE...

"The Kurgan hypothesis looks far from doomed. In fact, it looks better than ever."

The Kurgan Hypothesis in itself is ok. But the only thing that's missing is that the Yamnaya culture was not the culture of the steppe people, but the culture brought by a foreign population who replaced/mixed with the steppe people.

Other than that, the rest of the hypothesis is ok.

But that detail above implies that Yamnaya can't be the PIE homeland, because the Yamnaya people were not native to the steppe. They were recent migrants, and their homeland was in Asia.

"We have lots of Y-DNA from Kurgans now, and almost all of it is R1, which looks very much native to Eastern Europe. Not a single Kurgan burial has yet produced a Near Eastern Y-haplogroup."

Of course. Why would it? No Near Eastern Y-Haplogroup entered the steppe at this time.

Let's say that the intruding population came from Turkmenistan (rich in R1b). They moved to Armenia (rich in R1b). On the way they picked up NE ancestry, obviously from the women (hence the NE mtDNA). Then they entered the steppe.

I'll repeat again that the population that gave birth to Yamnaya was not native to the Near East. They came from elsewhere in Asia (Iran, Central Asia). And Y-DNA R was native to this ANE population.

"Of course this doesn't mean that the Eastern European steppe was overrun by Near Eastern Amazons. It's generally accepted that during the Neolithic the steppe was settled by farmers from the Near East, just like much of the rest of Europe, and I'd say that it was mostly the women from these groups who were incorporated into the later pastoralist societies of the steppe. The men, who probably belonged to Near Eastern haplogroups like G or T, might have been killed or marginalized in some way, so that their reproductive success was seriously hampered."

This part of your theory completely ignores the fact that older samples from Samara are pure EHG. And then suddenly the population changed. There was no EEF settlement for a couple millennia before the EHG took on them. No EEF in Samara found at all.

It also ignores that the "other" population (not the EHG) were extremely high in ANE (that was lacking in EEF).

"Yamna looks very non-ENF in what I've read so far, and I wonder how they came up with the statistic "50% Armenian"."

Yes, that's the key. That population was not like modern day Armenians. They were Ancient North Eurasians (who picked some ENF ancestry when they moved to West Asia before entering the steppe).

Until we get the genomes we won't know for sure, but my bet (if anyone wants to take it) is that the Samara HG will be around 60% WHG, 40% ANE. And some of the Samara samples will be 30-35% WHG, 50% ANE.

Davidski: I think the Haak et al. PCA is OK, but I wouldn't call it especially informative. The distances between the samples are skewed for a number of reasons, resulting in rather absurd outcomes like Ust'-Ishim being placed near the Caucasus and MA-1 not too far away.

For the ancient samples who aren't that relevant to European variation like Ust-Ishim, K14 or maybe to a lesser extent, MA-1 (especially as this paper seems to half doubt ANE in places), then they might not place that informatively.

I'd expect it to capture much of the variation in the samples that are close to modern Europeans though (compared to using ADMIXTURE components designed to fit both world populations and synthetic persons, which effectively has its own projection bias relative to an actual PCA run on the real genotypes of the real people you're interested in).

I'll probably still take the Haak PCA as more informative, I'm not going to keep pushing you hard over this or anything. You've got your reasons (quality of genotype data).

Still, if the Haak PCA is uninformative, and doesn't explain much of the variance and there's all this hidden differentiation and similarity between populations, then...

it's probably a bit of a fool's errand to try and look at where the Yamnaya samples place on Haak's PCA, looking mainly at where they place relative to Northeast Europe, and then try and place them on the Fateful Triangle based on that. We can't be all like "Oh, this PCA is informative for when we want to place the Yamnaya relative to Northeast Europeans (and supports our preconceptions on that), then not informative otherwise".

David, a few things I disagree with: note that the f4 stats they use are sensitive to admixture proportion, and give the best numbers with the best proportions. The % from the stats is both 1) the most negative stat, aka the best choice for the other half of Yamnaya is Iraqi Jew and Armenian, and 2) the best stat, aka the proportion from Armenian which minimises the residuals. And the PCA in Haak is extremely distorted, which means the position here will probably be wrong.
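To make "the proportion which minimises the residuals" concrete, here is a minimal Python sketch of a two-source least-squares fit. This is not the procedure from Haak et al., just a generic illustration: the function name, the toy statistic vectors and the assumption that the target is modelled as alpha * source1 + (1 - alpha) * source2 are all mine.

```python
def best_mixture_proportion(target, source1, source2):
    """Return (alpha, residual_ss) for target ~ alpha*source1 + (1-alpha)*source2.

    Each argument is a list of statistics (e.g. f4 values against the same
    set of outgroups). alpha is the closed-form least-squares solution and
    is not clamped to the range [0, 1].
    """
    if not (len(target) == len(source1) == len(source2)) or not target:
        raise ValueError("inputs must be non-empty and of equal length")
    # Rearranged model: (target - source2) = alpha * (source1 - source2)
    diffs = [s1 - s2 for s1, s2 in zip(source1, source2)]
    numerator = sum((t - s2) * d for t, s2, d in zip(target, source2, diffs))
    denominator = sum(d * d for d in diffs)
    if denominator == 0:
        raise ValueError("sources are identical; proportion is undefined")
    alpha = numerator / denominator
    residual_ss = sum(
        (t - (alpha * s1 + (1 - alpha) * s2)) ** 2
        for t, s1, s2 in zip(target, source1, source2)
    )
    return alpha, residual_ss


# Hypothetical toy values, chosen so the target is an exact 50/50 mix.
toy_source1 = [0.2, 0.4, -0.1]
toy_source2 = [0.0, -0.2, 0.3]
toy_target = [0.1, 0.1, 0.1]

alpha, residual_ss = best_mixture_proportion(toy_target, toy_source1, toy_source2)
print(f"alpha = {alpha:.3f}, residual sum of squares = {residual_ss:.2e}")
```

With real data the residuals won't vanish, and comparing the residual sum of squares across candidate source pairs is one way to see which pair fits best.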

Matt, to fit the samples in the triangle, have you considered that the entire PCA has to be stretched, and different places by a different factor, to account for the distortion in Haak? The more eastwards we go, the further towards the east and south the stretching has to be. The whole plot has to be stretched SE in general for modern pops to fit David's triangle.

As the Yamnaya proportions between EHG and Armenian/Iraqi Jew give 26-27% ANE (and Krefter, it doesn't matter what language Haak speaks as the genome he and we are analyzing is the same, so it will be forced into the ANE K8 anyway), this places them just west of the easternmost Kets on that plot, and as they have 35-38% ENF, they will be somewhat south of the Moksha and somewhat north of Ukrainians.

They will score around Volga-Ural pops today, but certainly not on the European side of them.

I place them here: http://imgur.com/4NeKb1M

I have always maintained that the Yamnaya will score in a triangle between Volga Ural, NCauc and Cent Asian, as three streams of pops stretch towards that empty area from S Asia, W Asia and East Europe.

@ All

I'm gonna run off some predictions which I had, and which I expect will be corroborated when the paper comes out in entirety.

1) "We also studied differences between the Corded Ware and present-day Europeans using statistics of the form f4(European, Corded_Ware_LN; Other, Chimp), with Other chosen from the list: LBK_EN, Loschbour, Karelia_HG, Yamnaya. These statistics are plotted in Fig. S7.11, and show that both the EHG and the Yamnaya share more alleles with the Corded Ware than with any present-day European population. This is expected in the case of southern Europeans (as the Corded Ware horizon was a central/northern European phenomenon, and one might not expect present-day southern Europeans to form a clade with the Corded Ware population), but we find that it is also true for all present-day northern Europeans as well. This suggests that the ancestry introduced into Europe from the steppe during the Late Neolithic was later diluted, a process that had already begun during the Late Neolithic period itself (Table S7.7). This dilution may have involved the pre-existing farming population of Europe, but in parts of Europe may have included populations with substantial hunter-gatherer ancestry, as indicated by the fact that the statistic f4(European, Corded_Ware_LN; Loschbour, Chimp) is significantly positive for some European populations such as Lithuanians, Estonians, and Icelanders (Fig S7.11c)."

Therefore, Estonians, Lithuanians and so on have the most WHG in excess of Corded Ware, and, looking at that image in the supp info, most Europeans have more WHG than Corded Ware. WHG seems to rise again starting from a focus around the Baltic.

Using that stat, it is quite easy to get a very precise window for Corded Ware. As many Europeans (almost all in the north) share more alleles with Loschbour than Corded Ware does, and the English, Basque, Hungarian and Croatian genomes are those that have ~0 on that statistic, aka share no more alleles with Loschbour than CW does, and the French, Hungarian and Spanish_North share less with Loschbour than CW does, I predict CW is gonna be ~42% - 45% WHG.

Also, it is unlikely that the genomes that contributed directly to modern-day Europeans are that different from Yamnaya. CW is modeled as ~80% Yamnaya, with the rest WHG, EHG and Neolithic, and this was a different culture with a one-time transmission across a climate zone with ample opportunities for introgression. Genomes from other parts of the Yamnaya horizon would probably still be fitted as >90% Samara Yamnaya, so only marginally different.

Their YHaps should be much different I believe, we shouldn't let distr of YHaps today mislead us as to how differently distr they were before.
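For anyone who wants to see what the statistic quoted in point 1 actually measures, here is a minimal sketch, assuming the standard definition of f4(A, B; C, D) as the average over SNPs of (a - b) * (c - d) on allele frequencies. The function and the four-SNP frequency lists are made up for illustration; they are not the paper's data, and real analyses also need standard errors (typically from a block jackknife), which this leaves out.

```python
from statistics import mean


def f4(a, b, c, d):
    """f4(A, B; C, D): average over SNPs of (a - b) * (c - d).

    Each argument is a list of allele frequencies, one entry per SNP.
    """
    if not (len(a) == len(b) == len(c) == len(d)):
        raise ValueError("all populations need the same SNP list")
    return mean((ai - bi) * (ci - di) for ai, bi, ci, di in zip(a, b, c, d))


# Made-up frequencies at four SNPs, for illustration only.
european = [0.8, 0.7, 0.2, 0.1]
corded_ware = [0.5, 0.5, 0.3, 0.2]
loschbour = [1.0, 1.0, 0.0, 0.0]
chimp = [0.0, 0.0, 0.0, 0.0]

stat = f4(european, corded_ware, loschbour, chimp)
# A positive value means the European set shares more alleles with
# Loschbour than the Corded Ware set does.
print(stat > 0)
```

Swapping the first two populations flips the sign, which is why a value near 0 is read as "shares no more alleles with Loschbour than CW does".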

2) An interesting phenomenon is that, despite the 'eastern' and 'western' label, the EHG:WHG ratio is highest for Northwest Europeans, and WHG:EHG ratio highest for Northeast Europeans. I expect this phenomenon to be tracked by the fact that Yamnaya will score high in North Sea and East Euro, but not Baltic, while Corded Ware will show this too but less. In the ADMIXTURE run in the paper, the West Asian portion of Yamnaya behaves in this way, peaking in NW Europe with troughs in NE Europe and SW Europe at K=20.

I will reiterate my previous prediction that Yamnaya will score off the charts for East Euro and West Asian, in a ratio from 1:1 to 2:1, and following that will be North Sea, and following that Atlantic and Baltic. There will be no Med scoring, unless if West Asian is low, in which case variation will be dumped in East Med--but I think this is unlikely. There might be 'exotic' scoring in NAm and Siberian components to account for ANE.

I expect Corded ware to be dominated strongly by North Sea, and Bell Beaker by Atlantic.

3) The f4 stats look absolutely incredible and rock-solid. Defining two pops in relation to 15 outgroups, then using that definition to find if a third population is mixed between the two, is prob gonna displace D-stats and other stats as the centerpiece of papers on late, post-Bronze Age aDNA I think, because in those cases it's not clear what goes in pop A and B, much less the C that we're modelling, so getting rid of the need to know what's in A and B simplifies everything.

I think EHG will turn out to be less and less like a mixture of ANE and WHG the more we analyse it.

If we look at the model in 8.6 carefully, and attempt to understand its implications, it becomes immediately obvious that the only way the model makes Mal'ta and Karelians equidistant to NAms, while modelling Karelian as mixed between Mal'ta and loschbour, is by making the ANE in NAms branch off from Karelian(!), not from Mal'ta. So Mal'ta and EHG are equal in their distance to NAms because 1) EHG shares more drift with NAms than expected, as the contributions from the Karelian into the NAms(!) form a group to the exclusion of Mal'ta, and 2) Loschbour contribution into EHG then pulls the EHG away from NAms just enough, such that EHG and Mal'ta become equally balanced on the edge of a knife in terms of their distance from NAms.

You don't have to be a scientist to recognize that 'this is unparsimonious'. This is nothing but a mathematical artifact, a way to 'balance the equation' that never transpired in real life. Let's say Spanish turn out to be as far from Mexicans as Russians are. There is of course a way of explaining this by saying that the West Eurasian ancestry in Mexicans in fact comes from an old Russian-like population, explaining the closeness of Russian and Mexican, then Asian ancestry in today's Russians pulls them just far enough away from Mexican such that Spanish and Russian are perfectly equal in their distance from Mexican. Does this strike you as a plausible scenario? It violates both geography and parsimony in cladistics.

Of course we can model EHG as admixed between ANE and WHG, but that makes no more sense than modelling Europeans as admixed between Bedouin and Dai. Modelling like this tends to fail depending on the pop you include, which in this case is NAms again, because NAms are closer to Europeans but not closer to Bedouin, disproving the model. Of course, allele freqs in EHG are intermediate between ANE and WHG just like Europeans are intermediate between Bedouin and Dai, but that's because Europeans are their own thing that split from Dai later than Bedouin did (thus sharing more with Dai) but also mixed with Bedouin. Thus spurious admixture of Europeans as Dai+Bedouin.

In fact, I think that one of the authors' scenarios, (ANE(WHG-EHG)) or (WHG(ANE-EHG)), is prob gonna be correct, because a tree-like split of ANE, WHG, and EHG simply fits the f4 stats so much better, aka, where EHG is similar to ANE, it is similar to WHG, and where it is different from ANE it is also different from WHG, instead of the negative correlation you would expect if EHG is ANE+WHG.

Their model sees (ANE(WHG-EHG)) with ANE admixed or (WHG(ANE-EHG)) with WHG admixed, and the admixture comes from a source in (ANE-WHG-EHG) but not from either of these three, aka it split off before these three did, or Basal West Eurasian. Which also makes sense because all it amounts to is 1) an old WEur branch entered Europe, then (ANE-EHG) expanded and an EHG-like pop replaced 48% of the genome of Loschbour in Europe, or 2) an old WEur branch entered Siberia, then (WHG-EHG) contributed 31% to Mal'ta. Which is far better than NAms deriving their ANE from Karelian to the exclusion of Siberia, then voila! Everything balances on the tip of a pencil.

I kinda favour that Loschbour was admixed, aka Basal WEur reached Europe, the EHG-ANE replaced 48% in Eur to create Loschbour, cos the authors discover Kostenki to be closer to WHG than Mal'ta, and closer to Loschbour than to EHG(!), so we already have a candidate here.

Also... because we created EXACTLY this tree several months ago!

https://drive.google.com/file/d/0B9o3EYTdM8lQYUFpRV9EQlNRNU0/view

Researchers, if any of you are reading this blog, please cite David! and give a nod to the cloud too.

This also creates less of a rigid divide between the steppe and Europe, which always struck me as something weird.

@ Ryu: Matt, to fit the samples in the triangle, have you considered that the entire PCA has to be stretched, and different places by a different factor, to account for the distortion in Haak? The more eastwards we go, the further towards the east and south the stretching has to be. The whole plot has to be stretched SE in general for modern pops to fit David's triangle.

Good suggestion.

Though you can't just distort Haak's plot down south towards the east, to match David's Fateful Triangle, as that would depress the Northeast Europeans, who on Haak's plot are already relatively "south" compared to David's plot.

You need to stretch Haak's up to the northeast and stretch down the southeast while stretching the southwest and northwest towards the centre (or reverse to try and get the other plot close), reflecting more southeast-northeast differentiation and less southwest-northwest differentiation on David's plot than the plot from Haak (which I'd interpret as the ANE-ENF-WHG triangle not quite capturing the differentiation properly, as plots based on no ancients and only moderns from Laz and Haak look more like the Haak PCA here than they do the Fateful Triangle).

On page 25, there is a title "pop label for analysis"; these codes are used for figure 3 on page 23 in the ancient part. You will see that the G and T are coded LBK_EN, they are all over 5000 years old, and the code LBK_EN in figure 3 states 100% Early Neolithic. Since the paper also states they all came from Yamnaya, then these 100% EN people were also farming in Yamnaya.

@ Krefter: I think that's wishful thinking at this point. There is not a single piece of hard evidence which shows that Yamnaya are less than 30% ENF, while there are many hard pieces of math which show otherwise.

That plot places Basques as 60% WHG in linear distances, and forces all pops away from Bedouin and closer to Mal'ta and WHG compared to David's PCAs.

Lithuanians are fit as ~50% Yamnaya, ~40% WHG, and only slightly more than 10% LBK. Tell me where all their ~30% ENF ancestry has gone? Note that LBK is not gonna turn out 100% ENF. And this in David's own ANE K8.

In fact, Estonians are 51% WHG and 27% ENF in ANE K8. They are fit as >40% WHG, ~50% Yamnaya, and <10% LBK. Go figure.

"Is the Spanish R1b sample over 90% EEF ? Am I seeing this correctly? If so, doesn't this simple fact itself eliminate the Samara as the origin point of R1b since both samples are only a few hundred years apart but completely different in their components?"

I remind you that EEFs are about 30-40% forager in ancestry, and so it's not strange that he had a "forager" y-DNA haplogroup. In fact, there have been many other EEFs with HG haplogroups, such as C or I2.

RK: Lol that was what I tried to do in GIMP. But gave up after 5 min.

I somehow kept the will to hammer away in Pixlr Editor for more than 5 mins. Yeah, note I still messed it up a little though, as the overlaid Haak plot should also be stretched horizontally.

RK: Lithuanians are fit as ~50% Yamnaya, ~40% WHG, and only slightly more than 10% LBK. Tell me where all their ~30% ENF ancestry has gone? Note that LBK is not gonna turn out 100% ENF. And this in David's own ANE K8.

If WHG is already accounted for by 40% direct contribution, Lithuanians have 12% more only, then the Yamnaya strand should be only at most 24% WHG.

Same with ANE, if Lithuanians have 18% ANE and that only comes through 50% Yamnaya, then Yamnaya have 36% ANE.

That leads to 50% ANE+WHG, 50% something else, which under ANE:WHG:ENF must be ENF (of course, then 50/2= 25, so adding in another 10% ENF via LBK wouldn't work but it would be closer than assuming Yamnaya is like 27% ENF).

So something is up with one set of estimates or the other - they don't really square - or perhaps the whole idea of ANE+WHG+ENF...
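The back-of-envelope arithmetic above can be written out as a small sketch. The function name is made up, and the inputs are just the rough figures from this comment (52% total WHG as 40% direct plus 12% more, 18% ANE, 50% Yamnaya), treating the direct WHG input as pure WHG.

```python
def implied_share_in_source(total, direct, source_fraction):
    """Share of a component inside a source population.

    total: the target's overall share of the component
    direct: the part of that share already supplied by other sources
    source_fraction: how much of the target comes from the source
    """
    if not 0 < source_fraction <= 1:
        raise ValueError("source_fraction must be in (0, 1]")
    return (total - direct) / source_fraction


yamnaya_fraction = 0.50  # Lithuanians fit as ~50% Yamnaya

# Upper bound on WHG inside Yamnaya: 12% left over, spread across 50%.
whg_in_yamnaya_max = implied_share_in_source(0.52, 0.40, yamnaya_fraction)

# ANE inside Yamnaya, if all 18% arrives through the Yamnaya half.
ane_in_yamnaya = implied_share_in_source(0.18, 0.0, yamnaya_fraction)

# Whatever is left must be something else (ENF under ANE:WHG:ENF).
remainder_min = 1.0 - whg_in_yamnaya_max - ane_in_yamnaya

print(round(whg_in_yamnaya_max, 2), round(ane_in_yamnaya, 2), round(remainder_min, 2))
```

On exactly these inputs the remainder comes out at roughly 40% or more, a bit under the round 50% used above but still well over a 27% ENF Yamnaya.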

Btw, thinking, if we have WHG, LBK and Samara Yamnaya estimates in the paper, and we've got WHG synthetic individuals and can generate synthetic LBK individuals - merge 72% synthetic ENF with 28% synthetic WHG, as per Stuttgart estimate, or 80% ENF, 20% WHG if you want to adjust for the paper's estimates for Stuttgart.

So perhaps those could be used to trigger the emergence of a Yamnaya cluster in Europe that fits with the estimates from the paper? Spanish and Tuscans would be ideal to test for triggers of the Yamnaya cluster, as according to the estimates, they are purely LBK plus Samara_Yamnaya.

Then run that synthetic Yamnaya back through ENF, ANE, WHG if you wanted to.
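To make the "merge 72% synthetic ENF with 28% synthetic WHG" idea concrete, here is one hypothetical way to do it: mix per-SNP allele frequencies by weight, then draw diploid genotypes from the mix. This is only a sketch under that assumption. The thread doesn't describe how the synthetic individuals on this blog are actually built, and the function names and two-SNP frequencies below are invented.

```python
import random


def mix_frequencies(freqs_a, freqs_b, weight_a):
    """Weighted per-SNP mix of two allele-frequency lists."""
    if len(freqs_a) != len(freqs_b):
        raise ValueError("frequency lists must cover the same SNPs")
    return [weight_a * fa + (1 - weight_a) * fb
            for fa, fb in zip(freqs_a, freqs_b)]


def sample_genotype(freqs, rng):
    """Draw a diploid genotype (0, 1 or 2 allele copies) at each SNP."""
    return [(rng.random() < p) + (rng.random() < p) for p in freqs]


# Invented frequencies at two SNPs, for illustration only.
synthetic_enf = [1.0, 0.0]
synthetic_whg = [0.0, 1.0]

# 72% ENF plus 28% WHG, as per the Stuttgart estimate mentioned above.
synthetic_lbk = mix_frequencies(synthetic_enf, synthetic_whg, 0.72)

rng = random.Random(0)  # fixed seed so reruns give the same individual
individual = sample_genotype(synthetic_lbk, rng)
print(synthetic_lbk, individual)
```

Changing the weight to 0.80 gives the alternative 80% ENF, 20% WHG version.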

If you want to try forming a Yamnaya cluster with modern pops, Greeks would be preferable to Spanish and Tuscans. Yamnaya+LBK+BedouinB is a better fit for the latter than just LBK+Samara while it's not for Greeks.

"And the PCA in Haak is extremely distorted, which means the position here will probably be wrong."

"Matt, to fit the samples in the triangle, have you considered that the entire PCA has to be stretched, and different places by a different factor, to account for the distortion in Haak? The more eastward we go, the further towards the east and south the stretching has to be. The whole plot has to be stretched SE in general for modern pops to fit David's triangle."

Seriously, ryu, I appreciate you pointing this out.

The fact that the Haak et al plot is distorted makes me kind of ZONE OUT on the PCA results in this paper.

The y-dna from Haak et al is interesting, and happily adds to what we know about R y-dna dispersals, but the rest?

It's like listening to a barely audible radio signal.

For the time being, I'm giving up on trying to figure out who the Samara guys were.

In any case, regarding this statement in the Haak abstract:

"Western and Eastern Europe came into contact ~4,500 years ago, as the Late Neolithic Corded Ware people from Germany traced ~3/4 of their ancestry to the Yamnaya, documenting a massive migration into the heartland of Europe from its eastern periphery"

. . . I maintain that the paper reasonably demonstrates a Yamnaya connection with Corded Ware, but I am not sure this influence came directly from Samara or Yamnaya in the timeframe the paper is suggesting.

Furthermore, the paper does not demonstrate that Yamnaya/Steppe influence came directly to Southern Europe by way of the Steppe in the narrow timeframe they are suggesting.

So the mass migration thing "4,500" years ago is very overstated.

Even with all the problems with this paper, I see it got the royal treatment from Ewen Callaway in Nature News:

@Matt: The elephant in the room is that EHG is not a mix of ANE and WHG. If it were, Norwegians, who can be modelled in admixture as having much less ANE than Balts, would not outscore Balts in EHG and Yamnaya ancestry. If you made the estimates with Norwegians, approx 37% ENF and modeled as 29% LBK, assuming LBK are 0.7 ENF, that leaves approx 0.17 ENF to fit in 0.53 Yamnaya, which is still more than 30 percent ENF. But 30 pc to 50 pc is a huge range, which tells us that trying to force Yamnaya, esp the EHG side of it, into ANE and WHG is not working.

Obviously, Norwegians have some dimension where they are much closer to EHG than their distance from Mal'ta would suggest. This is obviously an instance of the model failing, in the same way as NAms cannot be successfully modelled if Europeans are made to be Dai plus Bedouin, because there is some dimension where they are close to Europeans but as far from Bedouin as anything else, namely the dimension where ANE differs from everyone else. Same for EHG here. Instances like this suggest that EHG is independent of ANE and WHG to a certain degree.

I'm gonna keep harping on this, because even the highly contrived model that they put up, aka the 'pencil balance' model, that tries to account for EHG as ANE plus WHG, still fails. So it fails both Occam's razor and statistically as a model. I think the authors' model that EHG has a tree-like relationship with either WHG or ANE, and the one that it does not have a tree-like branch off of is the one that is admixed with some outgroup, is prob gonna get more support as time goes on. This is the only model that makes sense of all the data, at least. Therefore, EHG did not 'introduce' ANE into Europe in the conventional sense.

I favor that Loschbour is admixed for the reasons described above, because Kostenki tells us smth. If that model is true, then EHG just forms a clade with Mal'ta, and in some ways 'is' just highly diverged ANE that contributed to Loschbour already once before. This would prob explain the R1b in Spain, but conversely no I in the EHG so far.
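The Norwegian estimate in the reply to Matt above is easy to check. A minimal sketch, assuming (as that reply does) that every bit of Norwegian ENF not supplied by LBK has to sit inside the Yamnaya share; the function name is made up and the inputs are the figures quoted there.

```python
def enf_share_in_yamnaya(enf_total, lbk_fraction, lbk_enf_share, yamnaya_fraction):
    """ENF share inside Yamnaya implied by a target's fitted proportions.

    Assumes the target's ENF comes only through LBK and Yamnaya.
    """
    enf_from_lbk = lbk_fraction * lbk_enf_share
    return (enf_total - enf_from_lbk) / yamnaya_fraction


# Norwegians: ~37% ENF, fitted as 29% LBK (at 0.7 ENF) and 53% Yamnaya.
norwegian_estimate = enf_share_in_yamnaya(0.37, 0.29, 0.70, 0.53)
print(round(norwegian_estimate, 3))
```

That leaves about 0.17 ENF to spread over 0.53 Yamnaya, which lands a little over 30%, matching the figure given above.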

@krefter: Thanks for your suggestion. I have 200+ emails a day and a really busy schedule, I'm pulling time out for this lol. You know how college is.

That was David's work though. He should decide what kind of recognition he wants.

@Marine: It always seemed strange to me why a pop that was 60 pc WHG should have no I at all, when that was literally all the Y hap that WHG had. But if EHG had allele freqs between ANE and WHG only because it had contributed to WHG, and thus can be modelled as ANE plus WHG despite this not really reflecting pop movts, then a natural way out is found.

The distrib of I outside of Europe is extremely poor, even in Siberia AFAIK. I don't expect aDNA to change this view. Kristina prob has smth to say abt this.

Ryu: I favor that Loschbour is admixed for the reasons described above, because Kostenki tells us smth. If that model is true, then EHG just forms a clade with Mal'ta, and in some ways 'is' just highly diverged ANE that contributed to Loschbour already once before.

I'm looking at model S8.12.a (p91). Archaeologically, is it a model you think might map OK onto the idea of Epi-Gravettian and Solutrean LGM refugia, where C is your West European Solutrean guys and M is the pan-East European and Southeast European Epi-Gravettian group?

Descendants of M then mix with Basal Eurasian, O, in the Mid East to found the early Neolithic population then C (Upper Paleolithic West European) to produce E, WHG.

(If that is true though it seems possible that EHG itself might be an admixture of M and another in-clade descendent of A to produce position B (but unprovable on the basis of any dna we have).)

"Archaeologically, is it a model you think might map OK onto the idea of Epi-Gravettian and Solutrean LGM refugia, where C is your West European Solutrean guys and M is the pan-East European and Southeast European Epi-Gravettian group?"

I don't think autosomal data has the ability to resolve something like the boundary between the Solutrean, Gravettian and Magdalenian.

That being said, the quite strong evidence for a Gravettian continuum extending into Western Europe, and fanning out across Eastern Europe and Siberia is very interesting.

Regarding specifics, it's too complex a topic to discuss in a blog thread!

"Krefter" could probably teach us a lot, if he wasn't pretending to be such a goofball.

You can't do "simply maths" to effect a meaningful result, when the underlying assumptions used to create these components, like ANE and EHG, are fundamentally distorted and fail to account for layered population processes.

Why do you think Ust'-Ishim is closer to West Eurasians than MA-1 on the Haak et al. PCA? Do you think this actually makes sense?

Also, why do you think that many of the samples which apparently belong to the same clades are so far apart from each other, like the Spanish Neolithic farmers in dimension 2? Do they maybe harbor different levels of ANE-related ancestry? If not, then what's the issue?

On a related note, why is at least one of the SHG higher up in dimension one than Loschbour?

My answer is that the positions of the ancient samples are affected by factors other than their phylogeny and admixtures, such as read depth. This is why Ust'-Ishim is so close to the modern samples from the Near East, while, for instance, the Spanish farmers are further away, which obviously makes no sense whatsoever. That's just one example of the absurdity that I can see on that plot.

So when you try and convince me that the Haak et al. plot can be read more literally than my own plot, well I'm sorry but I just have to shake my head in total befuddlement.

rk,

I'm very skeptical of the suggestion the EHG are unadmixed. That's because I think there's something off about the MA-1 sequence in the Human Origins, which might be affecting the EHG analyses.

Maybe MA-1 is mixed, perhaps between EHG and something close to my ANE component from the K8. Then again, maybe it's slightly contaminated.

Also, the EHG probably come from a population with a very low effective population size, and on top of that their genomes are probably largely made up of pseudo diploid calls. This might well be the double whammy that skews their results and makes them look pure.

Davidski: Also, why do you think that many of the samples which apparently belong to the same clades are so far apart from each other, like the Spanish Neolithic farmers in dimension 2? Do they maybe harbor different levels of ANE-related ancestry? If not, then what's the issue?

On a related note, why are some of the SHG higher up in dimension one than Loschbour?

I expect it's because they have genetic drift from one another which is not captured by the fact they are in the same clade, and the PCA is forced to contain this drift because it includes a number of them? That's not an idea I find mysterious.

Single, highly divergent samples like Ust-Ishim can't push the PCA very much (like including a single Oceanian), so they end up relatively central, oriented only towards the part of the plot they share most drift with, even though it is very little of their total drift. MA1 being comparatively better represented in the dimensions on the graph is separated out more.

Listen, if you're still unconvinced, stay unconvinced. We can leave it at this.

"It doesn't need to be extreme patrilocality or killing of the "other" guys. Please stop making stuff so one dimensional."

Huh? I don't follow. Who talked about killing? Patrilocality just means the wife moves into the husband's lineage. As it happens, extreme forms in which the bride is considered a servant or effective mobile property of the groom's family and sometimes even kidnapped into it are still common to this day in Central Asia and the Caucasus, the very regions we are talking about.

"As it happens, extreme forms in which the bride is considered a servant or effective mobile property of the groom's family and sometimes even kidnapped into it are still common to this day in Central Asia and the Caucasus, the very regions we are talking about."

Too simple, if you're talking about Armenians or Iranians.

And by the way, speaking from what I've heard from people from India or Armenia, some "bride kidnappings" are staged to save the honor of the girl's family when two people decide to elope (going against the dominant custom of marriages arranged by the parents).

I'm not at all excusing the serious situation of child bride marriage, just pointing out that true "bride kidnapping" may not be as common as you think.

For what it's worth, Yamnaya are a complex mixture of ANE, WHG, something from the Near East, and perhaps some very slight ENA (I'm operating on the assumption that EHG is basically WHG+ANE, but perhaps with some minor ENA, which makes geographic sense, and is in line with the fact that SHG are modeled as a mixture of EHG and WHG). So, one can't really draw a true comparison between levels of purely ANE ancestry and levels of Yamnaya ancestry in Europe.

Also, they haven't really abandoned the ANE concept. On the contrary, it's an important part of the model. But they just wanted to focus on something far more specific to Europe, as ANE is very expansive (it peaks in the Amazon basin, the Hindu Kush mountains+Indus Valley, central Siberia, and the northeastern Caucasus. In Europe, the highest it gets is around 20%). Also, temporally speaking, Yamnaya is very relevant to European ancestry.

On a completely different note, an interesting pattern I've noticed is an African affinity among South Asians (it would probably hold for Central Asians as well, if they weren't East Asian admixed). For example, the Sindhi versus Yoruba fst is 0.133. This is much lower than other populations who fall under the "West Eurasian" rubric, even those with slight African admixture. The Mala, a "scheduled caste" South Indian population, have an fst of 0.139, which is low in comparison to other Eurasian populations. This might be related to the African percentages we see throughout South Asia in the "West Eurasia K8" test. Also, on any globally-oriented PCA plot, South Asians are further shifted towards Sub-Saharan Africa on dimension 1 in comparison to Europeans. Perhaps the Near Eastern agriculturalists who contributed substantial ancestry to South Asians had a strong African-like component to their ancestry? I don't know, it's an interesting pattern that needs further exploration.

"It's really annoying that key stats in Haak 2015 are shown as pictures instead of numbers."

I agree. I don't know why they didn't put the many numbers. Trying to hide something?

From the stats they give about Yamnaya admixture in modern pops, taking Lithuanians and Estonians I could figure out the numbers of Yamnaya. We'll see how close they are to the real ones when we get them:

Yamnaya:
WHG: 20.4%
ENF: 45%
ANE: 34.6%

From this I could make 2 models from the Armenian-like population. The first one assumes that they had 0% WHG ancestry (which is more likely):

Armenian-like:
WHG: 0%
ENF: 68%
ANE: 32%

This model would make Yamnaya 33% EHG and 67% Armenian-like pop.

A second model assuming 10% WHG in the Armenian-like population (less likely) would be:

Armenian-like:
WHG: 10%
ENF: 63%
ANE: 27%

With this second model (less likely), Yamnaya would be 20% EHG and 80% Armenian-like.
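The two models above can be checked with a simple two-source mix, Yamnaya = f * Armenian-like + (1 - f) * EHG. The sketch below rests on one added assumption that isn't stated in the comment: EHG carries no ENF, so f can be read straight off the ENF column. Function names are invented, and the inputs are the percentages listed above.

```python
def source_fraction_from_enf(target_enf, source_enf):
    """Fraction of the ENF-bearing source, assuming the other source has no ENF."""
    return target_enf / source_enf


def implied_other_source(target, source, fraction):
    """Solve target = fraction * source + (1 - fraction) * other for 'other'."""
    return {k: (target[k] - fraction * source[k]) / (1 - fraction) for k in target}


yamnaya = {"WHG": 0.204, "ENF": 0.45, "ANE": 0.346}
armenian_like_1 = {"WHG": 0.00, "ENF": 0.68, "ANE": 0.32}  # first model
armenian_like_2 = {"WHG": 0.10, "ENF": 0.63, "ANE": 0.27}  # second model

frac_1 = source_fraction_from_enf(yamnaya["ENF"], armenian_like_1["ENF"])
frac_2 = source_fraction_from_enf(yamnaya["ENF"], armenian_like_2["ENF"])

ehg_1 = implied_other_source(yamnaya, armenian_like_1, frac_1)
ehg_2 = implied_other_source(yamnaya, armenian_like_2, frac_2)

for label, frac, ehg in (("model 1", frac_1, ehg_1), ("model 2", frac_2, ehg_2)):
    print(label, round(frac, 3), {k: round(v, 3) for k, v in ehg.items()})
```

Under that assumption the first model gives about 66% Armenian-like, close to the 67% quoted, with an implied EHG of roughly 60% WHG and 40% ANE. The second gives about 71% Armenian-like rather than 80%, so the 20/80 split was presumably reached with a different constraint.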

I figure if we can convert the numerical values I have acquired for modern Euro references from page 23 into your EEF, ANE and WHG model, and successfully plot the paper's modern groups, such as Norwegians, using the converted values from pg. 23 with the same modern groups from your Fateful Triangle, we may be able to plot some of the ancient samples somewhat accurately without having to have their actual genomes.

That is why I wanted Stuttgart and your Yamnaya stand-in's values, so we can try.

I just realized that one of the Neolithic European individuals has Y-haplogroup H. Based on the close relationship between H and G, I think Y-haplogroup H in South Asia represents Neolithic Near Eastern heritage (H is concentrated mostly in South Asia, and rare outside of it).

Ust'-Ishim's position still makes little sense, I think essentially because it's just too old. But I'd say things would look quite different with small groups of both MA-1's relatives and Ust'-Ishim's relatives.

In any case, the point I'm making here is that just because a PCA appears in a study like Haak 2015 doesn't mean it should be taken as gospel. For one, I'd say those problem Bedouin should be pruned from the Human Origins West Eurasia PCA. Secondly, I don't care who runs the PCA, if it includes a lot of ancient samples with different levels of coverage, then Jesus himself won't be able to make everything work so that it can be taken literally.

Y-HG T does actually have good representation in South Asia. Based on what I've quickly gleaned, it's more common among southern Indian populations, including tribal groups outside the caste system (in fact, it seems to be more common among tribal groups versus caste populations). Interestingly, Y-Hg T decreases the further north one goes, and doesn't show any increased frequencies among Indian high caste populations. I think that means it's a good candidate for something introduced from the Near East very long ago, and preserved best among peripheral groups in south/central India.

Not to mention that T's brother Y-HG L is very well represented in all of South and Central Asia.

Yamna is the only one of the 3 components in Figure 2 with ANE. So, Norwegians scoring higher than Lithuanians doesn't make sense. This and other reasons make it impossible to get ANE K8 scores for the ancient groups, except maybe ones who lack Yamna.

Bronze age Germans appear much less ENF and much more ANE than I expected. I expect Corded ware will be around where we expected Yamna to cluster, and I expect Unetice to cluster in northeast Europe and Scandinavia, or east of them.

I think this is a very good idea, as the distribution of both haplogroups can easily be explained by the arrival of Near Eastern agriculture, and an association between them could easily explain the African signal we see in South Asia.

Certainly. The idea though is that the presence of H in Neolithic Europe, and the relationship between H and G, is good evidence that H entered South Asia with Near Eastern agriculturalists. In this case, the subclade isn't as important as the whole haplogroup.

Sein: Also, they haven't really abandoned the ANE concept. On the contrary, it's an important part of the model.

From the paper: Modeling of the ancient samples shows that while Karelia is genetically intermediate between Loschbour and MA1, the topology that considers Karelia as a mixture of these two elements is not the only one that can fit the data (SI8).

To avoid biasing our inferences by fitting an incorrect model, we developed new statistical methods that are substantial extensions of a previously reported approach, which allow us to obtain precise estimates of the proportion of mixture in later Europeans without requiring a formal model for the relationship among the ancestral populations.

It's not so much that they have abandoned ANE as that they don't have total confidence it is the correct model, one reason being that EHG is no less related to Amerindians than MA1 is (presumably when taking into account Onge related stats), which does not fit if EHG is admixed between WHG and ANE. They don't want to "reify" ANE, when totally different models could have been developed if the Karelian and Samara HG had just happened to be sampled before MA-1, and the evidence that EHG even needs to be admixed between ANE and WHG is weak.

That's why they've chosen to use other models to estimate Yamnaya related admixture directly that don't formally work through whether there is a tree with an Ancient North Eurasian population branching off (S9 and S10).

The paper's Figure S9.14, where other experiments have confirmed admixture and clade patterns through f4 correlations, fails to do so for admixture of MA1 and WHG leading to Samara and Karelia (stronger failure in Karelia).

They're developing formal tests and models of how many and what ancient populations are useful and this paper has only, if anything, thrown ANE into enough doubt that they've not relied on the idea or attempted to model estimates of ANE contribution to populations.

Remember before the paper came out when Patterson showed up and was evasive about ANE quantities in different populations "There's some question about how ANE is defined" (I think he said). And there seems to be. Still viable as a working model, with reasons why they chose not to rely on it too much, rather relying on models that use the samples they have directly and dispense with phylogeny. It'll be interesting to see how EHG behaves towards other populations which behaved as if they were ANE rich, i.e. in South Asia.

Amazing. I must be wrong about why Ust Ishim takes that position on Haak's PCA then, and it must be more to do with UI's extremely basal position in world variation, rather than that the PCA is not oriented to find a dimension of Ust Ishim related variation due to its low sample size. So on that one, you're definitely closer to right and I'm much closer to wrong.

I still think Haak's PCA is likely to tell us more about ancient population related variation than the Fateful Triangle, however. Those PCA you've posted up make it clear what the limitations of PCA are though, and how much they can show and under what conditions, so I'll not push it any more. Remember the Fateful Triangle itself is only derived from your (very impressive) experiments to produce synthetic populations which produce similar ADMIXTURE proportions to outcomes from the previous paper from Reich et al, which they themselves are now not 100% certain of with EHG, so I don't think it's totally a done deal. You yourself have previously said that you were not sure that an ANE, WHG and ENF model would suffice for West Eurasia when you introduced the Fateful Triangle model on your bga101 site.

"Yamna is the only of the 3 components in Figure 2 with ANE. So, Norwegians scoring higher than Lithuanians doesn't make sense. This and other reasons make it impossible to get ANE K8 scores for the ancient groups, except maybe ones who lack Yamna."

Yes, tried with other populations instead of Estonians and Lithuanians and numbers get crazy. The admixture estimates in Figure 1 seem to be quite random (at least useless for estimating the Yamna proportions in K8).

@Davidski, I've seen the Sub-Saharan signal in South Asia. My mother even has South Asian admixture in her autosomes from her Sub-Saharan component. But L is too far down the tree IMO.

@Sein, H is not associated with G except through F. F breaks into G and HIJK. HIJK breaks into H and IJK. IJK breaks into IJ and K. So all of HIJK is a sister clade to G under F. F-M282/H2 is the H clade most westerly in distribution except for the Roma, of course. I think that F-M282 gets convected with T and G2a to Europe with the Neolithic.
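To make the branching order above concrete, here is a minimal sketch that encodes only the splits named in the comment (F into G and HIJK, HIJK into H and IJK, IJK into IJ and K) and looks up the most recent common ancestor of two clades. The dictionary and the helper names (`CHILDREN`, `lineage`, `mrca`) are made up for illustration; this is not a full Y-DNA tree.

```python
# Toy phylogeny limited to the branches named in the comment above.
# CHILDREN, parent_of, lineage and mrca are illustrative names only.
CHILDREN = {
    "F": ["G", "HIJK"],
    "HIJK": ["H", "IJK"],
    "IJK": ["IJ", "K"],
}

def parent_of(clade):
    """Return the parent clade, or None for the root of this toy tree."""
    for parent, kids in CHILDREN.items():
        if clade in kids:
            return parent
    return None

def lineage(clade):
    """Path from a clade up to the root, starting with the clade itself."""
    path = [clade]
    while (p := parent_of(path[-1])) is not None:
        path.append(p)
    return path

def mrca(a, b):
    """Most recent common ancestor of two clades in the toy tree."""
    ancestors_a = lineage(a)
    for node in lineage(b):
        if node in ancestors_a:
            return node
    return None

print(mrca("H", "G"))  # H and G only meet at F
print(mrca("H", "K"))  # H and K meet lower down, at HIJK
```

So, in this toy tree, H is no closer to G than any other branch of HIJK is, which is the point being made.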

Essentially you were right. Ust'-Ishim is close to modern West Eurasians because it's outside the range of modern West Eurasian variation, and as a single sample of its type, it's not able to have much of an effect on the structure of the PCA.

The Oceanian analogy was a bit of a fail. But from memory, single Karitiana do cluster very close to West Eurasians, almost as close as Usti does.

However, just to get back to that Haak PCA for a moment. The methodology is just too arbitrary for my liking.

For instance, if more MA-1-like samples were added to that plot, MA-1 wouldn't cluster where it does. It'd be shifted much further to the right. And this, in my opinion, would be the more correct outcome.

Also, if all of the Spanish Neolithic farmers were very high coverage, they'd make a neat little cluster just north of the Sardinians, rather than a sprawled out one from west to east.

"In SI8 we show that MA1, EHG, and WHG cannot be related to each other by a simple tree and at least one of them must be admixed. While the direction of gene flow cannot be resolved uniquely, the above statistic clearly shows that there is some common genetic drift shared by all “European hunter-gatherers” (both WHG and EHG) at the exclusion of MA1."

Just thinking in spatio-temporal terms, the most logical candidate for being admixed is EHG (and EHG is still closer to WHG than it is to MA1, although MA1's closest population is EHG). Also, the symmetrical relationship that holds for EHG versus Native Americans and MA1 could be explicable via ENA admixture in EHG, which is plausible if we look at the mtDNA results for the eastern hunter gatherers (and if we recognize that Europeans without ENA admixture are always closer to MA1 than they are to Native Americans).

I agree. The most logical conclusion is that EHG is in fact WHG/ANE/ENA, in that order.

Considering geography and their uniparental markers, the fact these EHG individuals are coming out unadmixed probably means there's something off with the methodology, or with the genomes being analyzed, possibly with MA-1 rather than EHG though.

Sein: Just thinking in spatio-temporal terms, the most logical candidate for being admixed is EHG (and EHG is still closer to WHG than it is to MA1, although MA1's closest population is EHG).

Purely on the basis of distance I agree; as you say, logically you'd expect the populations at the opposite ends of the continental landmass to admix in the middle. But there might be a climatic or other reason why either MA1 or Loschbour might end up being a sink / mixed.

Sein: Also, the symmetrical relationship that holds for EHG versus Native Americans and MA1 could be explicable via ENA admixture in EHG

That is a good point, and I'm presuming that they tested D(Mbuti,(ENA);Loschbour,[EHG]) and D(Mbuti,(ENA);MA1,[EHG]) for all pairs of ENA and found nothing. As Ryu has mentioned upthread, their treatment in the phylogenetic models, where there are no admixture edges from the Onge/ENA clade to the EHG clade, suggests that they found no evidence for EHG being closer to ENA. It would be nice if they'd made a specific reference though.

I haven't looked at all the models very closely, but in at least one of the figures Karelia_HG is modeled as 40% of something related to MA-1, but more Karitiana-like. I'm not familiar with ADMIXTUREGRAPH, because I've never used it, but could this actually be a signal of Amerindian-like Arctic admixture which is making the model fit without the need for a mixture edge from the Onge, which might not be the best surrogate for ENA admixture in Karelia_HG?

"NE (from Central Asia and/or Iran) were not related to Basal Eurasian populations. They were closer to Europeans. And Y-DNA R was native to them as much as to Europeans."

Yes. David is unwilling to consider this very real possibility because, for him, the fact that there is Mesolithic R1 in EE means that R1 had been there for 40 000 years, and can only have come from there. Forget that we are only basing this on n=1 from Karelia, and n=0 from the Caucasian highlands-Iran region. Who needs samples when you have religious belief?

Moreover, R1 and NE populations could only have come from north of the Caspian because EEF had no ANE. Like the proverbial ostrich, David and his disciples cannot grasp the complex notion that not all populations east of the Indus to the East Mediterranean- Aegean were a homogeneous group of ANE-devoid proto-Semites.

"The Kurgan Hypothesis in itself is ok. But the only thing that's missing is that the Yamanya culture was not the culture of the steppe people, but the culture brought by a foreign population who replaced/mixed with the steppe people.|| Other that that, the rest of the hypothesis is ok. || But that detail above implies that Yamnaya can't be the PIE homeland, because they Yamnaya people were not native to the steppe. They were recent migrants, and their homeland was in Asia."

Well, then the Kurgan hypothesis is not OK. The Kurgan hypothesis stipulates that the Yamnaya culture was THE homeland, and derived from preceding culture groups, like Repin, etc. If what you're saying - and I'm also entertaining - is true, then the Kurgan hypothesis is up shit creek without a paddle, although David is valiantly doggy-paddling.

Apart from trying to brush away the obvious advent of W/A genes to the western steppe through an entirely unsubstantiated and rather laughable explanation which can be described as a "mass theft of women" from the south, he is in denial about the fact that ANE-rich and R1-derived populations lived in other places and 'cultures' besides Yamnaya, like much further west within Europe proper.

Moreover, I do not share the confidence with which David claims that these PCA plots, F-st and D statistics actually mirror historical-demographic phenomena. These are approximations, simplifications, and ultimately models, based on what is still not an exhaustive sampling range.

@Marnie, I kind of agree with you that these blog comments are rife with geeky sexist banter. Aside from the sexism, I am afraid that many of the models of gender-specific demography suffer from sophomoric conceptions of male/female/offspring production that hearken back to the anthropology of the late 1800s. We don't even know whether the Neolithic revolution was driven by male or female shifts in economy or whether these cultures were matrilocal or patrilocal, both of which are important for discerning demography.

"as ANE is very expansive (it peaks in the Amazon basin, the Hindu Kush mountains+Indus Valley, central Siberia, and the northeastern Caucasus."

If ANE or a component of ANE was a boreal population i.e. far northern or southern but mountainous, then populations adjacent to those areas might have the most: so the steppe population might have a lot through being adjacent to the boreal forest zone but other more disparate populations near more southern mountainous regions like Norway, Balkans, Caucasus, Tien Shan etc might have some or more.

If so might it be the shared component that is being picked up rather than shared ancestry overall?

I think we can be pretty sure that the early Indo-Europeans were patrilocal. This is probably why to date all of the Kurgan Y-DNA has turned out R1.

What happened to the other haplogroups, like G2, E, T and maybe J? We don't know, but we can speculate, and my speculation is that the people who carried them didn't fit into the plans of the Kurgan nomads.

By the way, the Neolithic farmers of Europe were actually quite barbaric themselves. There are mass graves in Western and Central Europe from the period that contain butchered remains. These people might have been victims of battles, sacrifices and/or cannibalism.

Mike,

There are five stages to mourning; denial, anger, bargaining, depression and acceptance.

This is what happened when some plains hunters who hunted on foot got horses. They started hunting on horseback instead.

http://en.wikipedia.org/wiki/Plains_Indians#The_Horse

Later some of them turned into pastoralists because of all the sheep and cattle they stole.

.

I think yesterday's little theory that the raiding was a *result* of pastoralist polygyny is probably wrong - at least at first - because they wouldn't necessarily have been pastoralists at that point.

Polygyny would more likely be the result of the raiding (as a fait accompli) than the initial cause (although it might become part of the cause later).

Ha ha David, I like your reply to Roy, as if you know what you're talking about. You didn't even know about those mass Neolithic graves until I informed you about them two months ago.

Oh, if and when you produce actual proof for a kurgan invasion; then I'll happily revel as much as you do; for we'd have established reasonable proof rather than assumptions and post hoc number massaging.

The big question for me is where did the component currently labeled "near eastern" come from. It might fit current dna in the near east but did the near east have a pastoralist demographic turnover as well?

If so what would be the possible directions?

cool map of silk road era trade routes

assume for the sake of argument those routes became the primary trade routes because they were the paths of least resistance (maybe true, maybe not)

http://s1.hubimg.com/u/8474486_f1024.jpg

If the Armenian like component is ANE rich then it could be via the Caucasus but I wonder about the Tien Shan route: sheep herders from Tajikistan / Kyrgyzstan expanding west and then north west before being pushed back down south into the near east by the horse dudes.

Any Kartvel type language links with Hurrians or any of the pre-Turkic languages around the "istans" near the Tien Shan?

Oh, and to reiterate: mass acts of violence and warfare were common throughout prehistory, no doubt. But these were localised affairs, not a continent-wide cleansing, and there were no greater countries or ethnic groups to form a "Yamnayans vs late Neolithic Central Europeans" entente.

@Davidski, I certainly don't doubt that the Pit Grave/Yamnaya folks were patrilocal and IE languages demonstrate clear signs of patrilineality. It's the Cardial and LBK folks about whose cultures we know so little. Indigenous cultures of the Southeast US were frequently matrilocal and exhibited signs of warfare. Also, an impressed ware Neolithic site near Nice, France showed signs of cannibalism, so violence may be independent of marital locality. I am arguing that we must take a complex nuanced approach to archaeological cultures, an approach too often absent in comments to this blog.

Even if we assume that enough Mesolithic foragers survived in the forests of northeastern Europe to push into Central Europe during the late Neolithic, and had the demographic strength to actually make a real impact on the genetic structure over a very wide area there, how would they achieve this? What social and cultural advantages did they have?

The Corded Ware people also carried R1a, but they also had a new social organization and subsistence strategy, and we know their spin-off of the Kurgan culture dominated most of northern Europe for a long time. So why not them?

Your hunch that what unites Yamnaya with the Caucasus and the Near East is the ANE component is the correct one. I left a comment to the same effect on Dienekes. Basically, there are two "Near Eastern" components: one came to Europe with the farmers and the other one came to Europe with the nomads. But the other one is not the "Near Eastern" one by origin. It's the tail end of ANE that had spread across Eurasia before LGM from the East. As Raghavan et al. 2013 wrote, "Thus, if the gene flow direction was from Native Americans into western Eurasians it would have had to spread subsequently to European, Middle Eastern, south Asian and central Asian populations, including MA-1 before 24,000 years ago." So that's what we find in Yamnaya!

"Apart from trying to brush away the obvious advent of W/A genes to the western steppe through an entirely unsubstantiated and rather laughable explanation which can be described as a "mass theft of women" from the south"

Pretty much everyone expected Samara to show mixed y haplogroups: local HG (probably R1) and intrusive farmers (probably including G and J).

But it didn't.

There are multiple possible explanations but one is as soon as the steppe hunters got horses they started raiding their more settled neighbors (i.e. exactly the same as steppe populations have done throughout recorded history).

So the brushing away is coming from people who won't accept that possibility.

Something that might follow from that possibility is that the Kartvel languages now south of the Caucasus used to be in the north also but moved away from the horse dudes (exactly the same as so many other peoples have done throughout recorded history).

.

If they can't beat raiders then people move away and find some mountains.

" Corded Ware people also carried R1a, but they also had new social organization and substance strategy, and we know their spin off of the Kurgan culture dominated most of northern Europe for a long time. So why not them?"

Yes, them. I heartily agree! That is, with everything but being a spinoff of Yamnaya. Some similarities, borrowings and even outright migration. But this was a two-way affair. Simply put, Corded Ware owes as much if not more to late Tripolye groups than to an outright migration from Yamnaya. This Reich study proves me right.

Krefter - re Genetiker: I was perusing the web for other people's wrap-ups. I have also looked at Dienekes, Anthrogenica etc. My agreement was that, in my analysis, the Reich study disproves the Kurgan hypothesis, at least in its pure form; that is clear to me. Yamnaya received a clear impetus from the south, and it doesn't matter how some might wish to cotton-ball this away. Moreover, this study produced no clear evidence for direct Yamnaya movement to Eastern Europe - the very essence of the Kurgan hypothesis. Now, as for Genetiker's other views, I have no idea; I've never actually looked at his blog before. If you say he's a nut then all well and good.

All I see as fact is samples of G2, T1a, H2, R1a, R1b etc. which ALL have the same age, give or take 100 years, and all are in Europe at the same time.

If you want to find T or L haplogroups, then search for LT-P326, as this is the marker, found in 2011, which represents the markers that both T and L had (and still have) before they split apart. The only places noted as the origin of this marker are Gujarat, and the Russians found it in the north Caucasus on the Caspian Sea side.

There is nothing clear and conclusive showing that Yamnaya regions (i) first developed mounted horse raiding, warfare, or "burglary" and (ii) held a monopoly on this. So your horse argument rests on zero evidence. But rather than Grey's nonsensical gibberish, I think I'll listen to Kuzmina (an expert): "It would have been possible for herders to ride horses using a leather bridle without a leather bit, but it would have been impossible for warrior riders to maintain control of their horses without a bit. The nomadic lifestyle of the early horse breeders, moreover, is refuted by the existence of settlement sites with evidence of pig husbandry. In addition, no clear evidence exists that reveals mass migrations of steppe peoples to the Danube."

Back to Corded Ware: its pastoralism was specially adapted to the new conditions of the LN/EBA. However, its roots are clearly within the preceding Neolithic, where in Late Tripolye-related sites in Moldavia and SE Poland we see animal burials, secondary product exploitation, etc. Even if a large number of the preceding central European Neolithic community was killed/raped/castrated/buried, the originators of the CWC were themselves part of the LN eastern European horizon, but genetically different to those 'central European farmers' to the west of the Carpathians, i.e. Hungary and Germany.

The roots of BA eastern Europe were thus in this small group of R1a & ANE bearers, and not in the predominantly R1b Asian-branch (but autosomally similar, due to a long-existing north Eurasian genetic continuum) Yamnaya peoples.

So, then, if much of BA central & eastern Europe derived from CWC, themselves living west of the Dnieper, north of the open steppe, but east of the Tisza, then that's NOT the Kurgan hypothesis. To me, that's simple.

Maybe, but speculative. And given that horses dominated all the way over in Botai, and not in Ukraine - the doorstep to Central Europe - where caprids dominated the assemblages, the question of whether those horses could be ridden to escape, as you so colourfully hypothesize, is irrelevant.

For now, I'll be happy to hang my hat on the words of the world's leading authorities on horse archaeology, Elena Kuzmina and Robert Drews, and not yours.

This is the crux of the issue at the moment: "the fact that MA1 shares more alleles with Karelia_HG than with Loschbour, but MA1 and Karelia_HG are symmetrically related to Native Americans, and, finally, the fact that the three group of Eurasian hunter-gatherers (EHG, WHG, and ANE) cannot be related to each other by a simple tree, and at least one of them must be admixed."

I'm gonna push slightly against what has been bandied about here, because, if you read the paper carefully, you realise that 1) It is not true that high drift in EHG will distort the f4 stats and remove signals of admixture, as it is not true that the f4 modelling they use is sensitive to recent drift in those populations; 2) it is not true that a model that has EHG admixed between ANE and WHG is geographically more parsimonious; 3) it is not true that a model that has EHG admixed between ANE and WHG is statistically more parsimonious; 4) It is also highly unlikely that the caveat they applied to those f4 models failing to find admixture even when there is admixture, applies to EHG; 5) it is not true that ENA admixture into EHG will explain the data; it will probably cause the fit to be worse, not better, and 6) it is not true that MA-1 as contaminated will explain the data better than the model that fails.

If you can, pls read carefully, so you get a clear idea of what I mean:

1) Reading their methodology for f4s, it becomes clear that the only reason why their method is so powerful is because two populations A and B, e.g. Motala and Stuttgart, are differently related to each of the 15 outgroups, placing them in a slightly different 'place' in relation to the other 15. And if a pop C is in between A and B, then the ratios of its distances to the 15 pops will be in between the ratios of those dists for A and B, to put things in plain language. Say an undrifted, highly heterozygous ANE sample, Alpha, is twice as related to Chukchi as it is to Onge. Compared to this, a highly drifted/highly homogeneous/homozygous/inbred/small populationed ANE sample, Beta, will drift highly in its own dimension apart from everyone else, so its distances to everyone will increase, but if it is pure ANE it will also be twice as related to Chukchi as to Onge. And so on for the proportions to the rest of the 15 outgroups, which all increase to the same degree.

In fact, the authors say this: "Plotting the f4(Test, Ref1; B, C) and f4(Test, Ref2; B, C) statistics to detect admixture has an advantage over the use of the f3(Test; Ref1, Ref2) statistic in that these statistics are not affected by post-admixture drift in the admixed population, but rather rely on allele frequency correlations deep in the phylogeny."

In fact, one would expect drift to affect this stat by close to *zero*, because every population drifts in its own dimension and none drifts closer and closer to another pop unless there is gene flow, so the ratios of distances to the 15 outgroups should be perfectly preserved for all pops no matter the drift, which is why this stat is so powerful. The proportional dists for EHG between the rest of the 15 will stay intermediate between those of WHG and ANE no matter how much it drifts--IF EHG is ANE+WHG, which appears not to be the case, as it does not have proportional dists that are intermediate between them.

The fact that Motala, EHG, and WHG are all highly drifted means Motala cannot be fitted as EHG and WHG in conventional stats, but this method gets rid of all the recent drift and indeed fits Motala as EHG and WHG, testifying to its power. So that it does not fit EHG as ANE and WHG is very revealing.
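For anyone who wants to see the drift argument in numbers, here is a rough toy sketch, not the paper's actual method or software. It uses the standard definition of f4(A, B; C, D) as the average over SNPs of (a - b)(c - d), simulated allele frequencies, and a crude stand-in for drift (independent random noise). The populations, the 30% mixture share, and the helper names are all made up for illustration.

```python
import random

def f4(a, b, c, d):
    """f4(A, B; C, D): average over SNPs of (a - b) * (c - d)."""
    return sum((w - x) * (y - z) for w, x, y, z in zip(a, b, c, d)) / len(a)

def jitter(freqs, width, rng):
    """Toy 'drift': add independent noise in [-width, width] to each frequency."""
    return [p + rng.uniform(-width, width) for p in freqs]

rng = random.Random(0)  # fixed seed so the run is repeatable
n_snps = 5000
ancestral = [rng.uniform(0.2, 0.8) for _ in range(n_snps)]

# Two hypothetical reference populations that drifted apart from 'ancestral'.
ref1 = jitter(ancestral, 0.10, rng)
ref2 = jitter(ancestral, 0.10, rng)

# Two hypothetical outgroups, each sharing drift with one reference.
out_b = jitter(ref1, 0.05, rng)
out_c = jitter(ref2, 0.05, rng)

alpha = 0.3  # assumed share of ref1 ancestry in the test population
test = [alpha * p + (1 - alpha) * q for p, q in zip(ref1, ref2)]
test_drifted = jitter(test, 0.05, rng)  # private drift after the mixture

signal = f4(test, ref1, out_b, out_c)
expected = (1 - alpha) * f4(ref2, ref1, out_b, out_c)
signal_drifted = f4(test_drifted, ref1, out_b, out_c)

print(signal, expected, signal_drifted)
```

In this toy setup the statistic for the mixed population is exactly (1 - alpha) times the statistic for the second reference, and adding private drift to the mixed population barely moves it, because that drift is uncorrelated with the difference between the outgroups. Real drift is not uniform noise, so treat this only as an illustration of the logic.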

4) The authors say this: "it is possible that allele frequency differences between the reference populations (Loschbour and MA1) may not be sufficiently correlated with those of the outgroups, which is a necessary condition for the preservation of this signal of admixture." This is as good as a caveat against their own caveat. The caveat they apply to the case of EHG not being detected as WHG+ANE, due to WHG and ANE being not sufficiently different in relations to the 15 outgroups, is not likely to apply at all, because they proved that the same set of 15 outgroups succeed in teasing apart EHG and WHG just a few paras ago, so why would it fail in teasing apart ANE and WHG?

From fig 9.3, it's pretty obvious that very slight differences in proportional dists between the A and B and the 15 references are enough for getting admixture to show up if tested in C. This is also corroborated by Motala being positive for EHG+WHG, aka the very slight differences between EHG and WHG in their dists to the 15 outgroups produce a strong signal for admix for Motala. In the case of ANE and WHG, the presence of pops like Kharia and Karitiana and Chukchi in the outgroups means that the difference in proportional dists between ANE and WHG to these pops are gonna be really massive, which should expose admix in EHG, but it does not. So I make the prediction that EHG will not show up as admixed between ANE+WHG for any combination of pops we place in the outgroups, should we get our hands on the software package or write our own.

2) It is not true that the model that fits EHG as ANE+WHG is more geographically parsimonious, because it requires ANE into Amerindians from the tip of the tree close to Karelian(!), aka it gets ANE all the way into Amerindians from the same pop as that which gave 40% into Karelian, to the exclusion of Mal'ta and without affecting Mal'ta at all(!), which is much, much more unparsimonious than the proposal that (WHG(EHG-ANE)) or (ANE(EHG-WHG)) is the tree with either WHG or ANE--the easternmost or westernmost pops--having admixture from an outgroup, which allows for the admix into NAms to branch off much further back than Karelians, while Karelians are unadmixed. E.g. the model in the supp info that succeeds in fitting the largest number of pops that is also the most geo parsimonious, shows West Eurasians as [ (Outgroup) (((EHG)Mal'ta)ANE in NAms) ], with WHG being Outgroup + 48% from EHG. In this case the first bifurcation is east-west, and the easterners branched west successively until EHG contrib into Outgroup to create Loschbour. This is also good because it demolishes the 10000s year old 'Mexican fence' between the steppe and Europe--which never made very much sense, I went along with it because we didn't have the evidence we do now--and extends the EHG--->Europe process in PIE as just the most recent, explosive culmination of a process that also occurred with Loschbour.

3) + 6) It is not true that the model that fits EHG as ANE+WHG is more stat parsimonious, because by making karelian instead of Mal'ta contribute to NAms and by pulling Karelians away from NAms with WHG, the model presumes that 1) the process of Malta drifting away from the (Karelian plus ANE portion of Amerindian) clade and 2) the amount of WHG contributing to Karelian, are perfectly balanced, such that the result is that Karelian and Mal'ta are perfectly equal in the distance from Amerindian. This is obviously a product of a mathematically contrived, 'balanced equation' solution to the problem, not something that happens in real life.

As any scientist can tell you, the occurrence of zeros in any phenomenon is a highly significant fact and is not likely to arise by chance. Why would the contrib of WHG to EHG pull it far enough such that the f-stat between Amerindian, Mal'ta and EHG evaluates to almost precisely zero? Let me remind you that ANE and EHG share almost precisely equal amounts of alleles with NAms only if both have a tree-like split w.r.t the ANE that went into NAms, and this is true 99.99999% of the time except in highly contrived 'equation balancing' scenarios.

And the EHG as ANE+WHG model still fails in comparison to the models that see ANE or WHG as admixed and EHG as unadmixed.

It is also not true that the model that has Mal'ta as contaminated will fit, as this just contracts out the 'pencil balancing' problem to contamination, aka this presumes that the amount of contamination/distortion in Mal'ta is perfectly equal in genetic pulling power away from NAms as the amount of WHG/drift in EHG, which is an equally improbable scenario. If the contamination is in EHG, that would pull them away from NAms, not toward them.

Lastly, it is very unlikely that ENA exists in EHG, as if that occurs all the trees they built in ADMIXTOOLS with Onge and Karitiana will be rejected. ADMIXTOOLS uses a very large number of formal f-stat calcs to build the tree. That successful trees show that Onge and the ENA portion of Karitiana split tree-like with all West Eurasians prob means we can preclude any extra closeness between EHG and East Eurasians beyond that between Loschbour and Mal'ta and East Eurasians, indicating--once again--a tree-like split.

It also turns out that when LBK Early Neol is added to a tree with Mal'ta, Loschbour, Onge, Karitiana, Karelia, and Mbuti, aka in the largest trees they try, all successful models show Loschbour as admixed. All these models allow Mal'ta and EHG to form a clade, from which EHG contributes to Loschbour and something that branches deeply in the eastern clade goes into Amerindian.

The discrepancy between David's ADMIXTURE K8, Laz formal stat admix proportions, and Haak formal stat admix proportions probably testifies to this, as it is possible for Norwegian to have both 1) higher allele sharing with EHG than most east euros when the rest is modeled as E neol and WHG, and 2) lower allele sharing with Mal'ta than most east euros when the rest is modeled as E neol and WHG, indicating that allele sharing with EHG is not mediated through sharing with Mal'ta to the great degree we've thought, and that EHG cannot be simply modeled as WHG+ANE to explain the proportion of EHG in Euros today. And this discrepancy ramifies through NW and SW euros too.

@ Sein, while ppl refer to 'ane ancestry' in the paper, it is significant that all instances of ane ancestry being mentioned are in conjunction with references to the old three-pop europeans paper, and that the conc and abstract all refer to EHG as having 'high affinities' to ANE, not high ane ancestry. Nowhere is ANE % defined for any pop other than the attempt to fit EHG as ANE+WHG during tree-building (which fails both mathematically and in terms of geo and occams razor-type parsimony); all euros are modeled as EHG+others. I second what Matt said about Patterson's doubts about ANE as it was previously 'defined', suggesting they are aware of this problem. So not all is well with the old model.

@ Krefter, the figures are in the supp info below the most mathy sections of the f-stats.

@ David, if you are interested, we should try modelling Kostenki, Loschbour, La Brana, Motala, Karelian, Mal'ta, and Sardinian together with Hadza, Mbuti and San in TREEMIX when we get the genomes. This would prob help us further distinguish which of the successful trees is the most reflective of reality.

"2) It is not true that the model that fits EHG as ANE+WHG is more geographically parsimonious, because it requires ANE into Amerindians from the tip of the tree close to Karelian(!), aka it gets ANE all the way into Amerindians from the same pop as that which gave 40% into Karelian, to the exclusion of Mal'ta and without affecting Mal'ta at all(!)"

If you look at one of the Clusters Galore plots that Dienekes ran a few years back:

David, I agree that EHG probably isn't pure. It will probably take UP samples to hash it out. What's labeled WHG may end up being three things, not just EHG and an outgroup. La Brana and Loschbour seem different too. It might not just be a difference in EHG admixture.

Quoting from the paper, if Loschbour is 48% EHG and the rest Outgroup, and SHG is modeled as 65% Loschbour and 35% EHG on top of that, then we have a gradient where Outgroup ancestry declines from ~50% in France in a smooth cline throughout Europe, to ~32% Outgroup in Scand, to 100% EHG east of Finland into Russia. So there really isn't anything that improbable about this.
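A quick back-of-the-envelope check of that gradient, taking the percentages quoted in the comment above at face value (48% EHG in Loschbour, SHG as 65% Loschbour plus 35% EHG). The `mix` helper and the dictionary layout are just illustrative, not anything from the paper.

```python
def mix(parts):
    """Combine (weight, ancestry) pairs into one ancestry breakdown."""
    out = {}
    for weight, ancestry in parts:
        for source, share in ancestry.items():
            out[source] = out.get(source, 0.0) + weight * share
    return out

# Figures as quoted in the comment, not re-derived from the paper.
loschbour = {"EHG": 0.48, "Outgroup": 0.52}
ehg = {"EHG": 1.0, "Outgroup": 0.0}

shg = mix([(0.65, loschbour), (0.35, ehg)])

for name, pop in [("Loschbour", loschbour), ("SHG", shg), ("EHG", ehg)]:
    print(name, {k: round(v, 3) for k, v in pop.items()})
```

On these inputs the Outgroup share in SHG comes out at 0.65 x 0.52, i.e. roughly a third, which is in the same ballpark as the ~32% mentioned above, with Loschbour at about a half and EHG at zero.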

In fact, this model defines almost all European HGs as admixed, so it fits even better with the idea that ancestry must have leached everywhere in such a low pop density area, instead of the 'standing borders' picture drawn when we use MA-1 and Loschbour only.

Chad, I agree that either Samara will turn out to be Karelia + MA-1, or Karelian will turn out as Samara + Loschbour, but the admix proportions are gonna be very low. Note that the negative correlation between Motala and Loschbour vs. Motala and Karelia is >-1, and for Motala between La Brana and Samara is ~-1, while that for Samara is -0.19 and for Karelia + 0.10. So I favour that Samara is admixed, but by very very little.

Some perspective on archaeo here will be good.

The only issue remaining is that, if EHG and WHG ancestry is defined as centering around Karelians and Loschbour, pops that are close to Karelians today have higher WHG:EHG, and pops around Loschbour today have higher EHG:WHG. So there still has to be a source of WHG ancestry that increased in NE Europe post-CW, such that up to 40% of the WHG contribution into NE Europeans is modeled best as separate from EHG or Yamnaya contribs by f-stats. I believe this ties in with my idea of pop continuity in the forest neolithic, but of course ppl are free to disagree.

Ryu, was there anything about EHG in EEF? If some R1b was there, it makes me wonder. Remember, Loschbour didn't fit the whole 56%, but something like 42%, was it?!? The fact that Loschbour goes in at all may point to EHG being in it, but maybe less than Loschbour has.

As far as archaeology goes, I think what we could have is that "WHG" was still moving NE as "EHG" was going west. The two mixed and crossed... The Baltic was a giant piece of ice for a while. Even today half of it freezes during the winter.

@ Chad

We need a clearer definition of terms. OK, for now I suggest we refer to Loschbour as WHG, and ANE as conventionally defined in K8 as ANE. In the model that sees EHG branching off from Mal'ta and contributing to Loschbour, I will refer to EHG as just EHG and the remaining portion of Loschbour as Outgroup, and ANE that is closer to Mal'ta than to EHG as just Mal'ta.

If Loschbour is admixed, then there will be EHG in EEF, in addition to Outgroup, but no Mal'ta. Where Loschbour is defined as unadmixed WHG, there will be no ANE in EEF.

I agree that that would be a good expl for how R1b entered neols, despite neols scoring no ANE, if EHG in Loschbour is 'hidden' by the fact that we don't have a sample of Outgroup, just like North_European in Sardinians is masked if we don't add Bedouins in ADMIXTURE. This might also explain why some neols are even more western than Loschbour, and by quite a lot too.

But this is speculation. I think the stats are pretty unequivocal, esp the fact that the only way for us to fit EHG is in such a nonsensical way. But how much more a new model can explain is not known at this point.

@ MT

I think using the published figures for fitting CW as 80% Yamnaya, 17% LBK and 4-5% WHG would fit, but bump up the ANE and WHG % by 2-3 each after calcs, as there is evidence that the 'Yamnaya' in CW was a shade more EHG-rich.

I'm gonna be tentative about the next sentence. I know this produces a somewhat southern CW, but note that Lithuanian and Estonian have been fitted as ~25% more WHG than CW, and even Norwegians are fitted as more WHG than CW... So this shouldn't be a problem... I think. But this of course conflicts with other estimates, which goes back to the point that fitting Europeans as Yamnaya/CW + LBK + WHG, and attempting to derive WHG, ANE and ENF from that, results in horrible distortions, e.g. Estonians drive ENF estimates way up, Norwegians way down, etc. It's probably just better to deal with WHG, ENF, and EHG itself.

I have to agree with David: EHG is much younger in comparison to MA1, and is geographically intermediate between the ANE and WHG extremes. Just noting these two facts makes postulating them as an unadmixed pole of genetic variation seem misguided. Also, they really are genetically intermediate between MA1 and WHG (this is clearly noted in the paper), but some aspects of how they stack up in terms of genetic affinity add unexpected complexity to a simple model of WHG+MA1. But that is explicable via a number of different factors. We could be dealing with (as yet) unknown genetic structure among these populations, distinct patterns of drift, the pure technicality of genome quality with respect to all of these samples, problems with methodology (I think David is on to something with his point concerning Arctic ENA admixture being too distant from Onge-like ENA, and being collapsed into the Karitiana-shared ANE admixture in EHG), and so on. It could be any of these factors, or all of them.

Honestly, these possibilities make more sense than the notion of such "young" samples from a geographically intermediate location versus WHG and ANE, and with complex uniparental markers (by "complex", I mean the fact that 20% of their mtDNA haplogroups are East Asian, and their Y-HGs are ANE derived), being unadmixed versus older samples from opposite ends of northern Eurasia.

At the same time, I also have to agree with David that a genetic continuum makes much more sense. It really depends on whether you are saying EHG are unadmixed, while WHG and ANE are admixed with EHG and something else (I think this is highly unlikely), or whether you are saying that we are dealing with clines (I think this is very likely).

In this study we have an indisputable R1b farmer far from the steppe, with no detectable Yamnaya-like ancestry. We also have another (probably) R farmer from Central Europe, again without Yamnaya-like ancestry. Clearly European R is not necessarily of steppe forager origin (though it looks like Yamnaya or Yamnaya-like people were a vector for it).

The only decent argument so far made against the Yamnaya R1b coming in with the farmer component is that they have too much farmer mtDNA to also have matching patrilineal gene flow and end up only half farmer autosomally.

But then how much of that EHG might have been original with the farmers - they could potentially be a mix of something EHG-like with the original Neolithic migrants.

We won't know till we have the relevant aDNA from outside of Europe.

@ryukendo

Thanks for all that explanation, this is very interesting.

I was thinking that Mal'ta might have admixture from the west, given possible Gravettian influence on his culture (and perhaps reflected in his mtDNA U). But half the archaeologists seem to say one thing and half the other (as usual).

@ Chad

Agree. Considering that ADMIXTOOLS can give different drifts to different branches, it is not clear how important that is though.

@ Sein

Thanks for your points.

However, I would like once again to push back against using geography or genetic intermediacy in this way. Han being geographically in between Kensiu and NAms does mean that they are intermediate between Kensiu and NAms, and they can probably be modeled as admixed between Kensiu and NAms, but the fact that this model fails when other pops are added tells us something. Same for Europeans between Dai and Bedouin, when NAms are added.

In fact we know for the first situation this is because Han contrib to both Kensiu and NAms instead of the other way, which is equally likely on geo terms.

I'm not disputing that there are other possibilities. However, I consider the behaviour of EHG in f4 stats in this paper an extremely strong indication of what future samples will reveal, because those f4 stats are mathematically impervious to so much noise.
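For anyone unfamiliar with the statistic, here is a toy sketch of what an f4 measures and why drift on a single branch does not bias it. It uses the standard definition, f4(A,B;C,D) = mean over SNPs of (pA - pB)(pC - pD), on simulated allele frequencies for an unadmixed tree ((A,B),(C,D)). It is not ADMIXTOOLS and not the paper's data; the uniform "drift" model, the parameter values, and the function names are all assumptions made for illustration.

```python
import random

def f4(pa, pb, pc, pd):
    """f4(A,B;C,D): mean over SNPs of (pA - pB) * (pC - pD)."""
    n = len(pa)
    return sum((a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)) / n

def simulate(n_snps=20000, drift=0.05, drift_a=0.05, seed=1):
    """Toy allele frequencies for the tree ((A,B),(C,D)) with no admixture.
    Drift is crudely modeled as independent uniform jitter on each branch;
    `drift_a` lets branch A drift more than the others."""
    rng = random.Random(seed)
    A, B, C, D = [], [], [], []
    for _ in range(n_snps):
        root = rng.uniform(0.2, 0.8)           # ancestral frequency
        x = root + rng.uniform(-drift, drift)  # ancestor of A and B
        y = root + rng.uniform(-drift, drift)  # ancestor of C and D
        A.append(x + rng.uniform(-drift_a, drift_a))
        B.append(x + rng.uniform(-drift, drift))
        C.append(y + rng.uniform(-drift, drift))
        D.append(y + rng.uniform(-drift, drift))
    return A, B, C, D

# Give branch A three times the drift of every other branch.
A, B, C, D = simulate(drift_a=0.15)

f4_tree = f4(A, B, C, D)    # pairs match the tree: expected to be near zero
f4_cross = f4(A, C, B, D)   # pairs cut across the tree: clearly positive

print(f"f4(A,B;C,D) = {f4_tree:.6f}")
print(f"f4(A,C;B,D) = {f4_cross:.6f}")
```

In this toy setup the extra drift on A only adds noise to f4(A,B;C,D); its expectation stays at zero because the A-B and C-D branches share no path in the tree. A consistently non-zero value would therefore point to gene flow or a wrong topology rather than to drift alone.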