
Sunday, April 17, 2016

Estimating Basal Eurasian ancestry?

Basal Eurasians (BE) are a hypothetical ghost population that apparently split from other Eurasians no later than 45,000 years ago. If they actually existed, they had a significant impact on the ancestry of early Neolithic farmers, and thus all present-day West Eurasians.
Testing ancestry proportions from ghost populations isn't easy. However, Haak et al. 2015 made use of an f4 equation that seemingly gave an accurate estimate of BE admixture in LBK farmer Stuttgart: f4(Stuttgart,Loschbour;Onge,MA1)/f4(Mbuti,MA1;Onge,Loschbour) = 44%. The other LBK farmers scored an average of 40% BE, which also made sense.
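For readers who want to see the arithmetic, the ratio above can be sketched in a few lines of Python. This is only an illustration: it assumes per-SNP allele frequencies are already available as arrays, the function and argument names are made up here, and the toy frequencies are random numbers rather than real genotype data. Real runs are normally done with ADMIXTOOLS (qpF4ratio), which also reports block jackknife standard errors.

```python
import numpy as np

def f4(a, b, c, d):
    """f4(A,B;C,D): average over SNPs of (a - b) * (c - d)."""
    a, b, c, d = (np.asarray(x, dtype=float) for x in (a, b, c, d))
    return float(np.mean((a - b) * (c - d)))

def be_ratio(x, hg1, ena, hg2, outgroup):
    """f4(X,HG1;ENA,HG2) / f4(Outgroup,HG2;ENA,HG1).

    With X=Stuttgart, HG1=Loschbour, ENA=Onge, HG2=MA1, Outgroup=Mbuti
    this has the same layout as the Haak et al. 2015 ratio quoted above.
    """
    return f4(x, hg1, ena, hg2) / f4(outgroup, hg2, ena, hg1)

# Toy allele frequencies (random, NOT real data), only to show the call pattern.
rng = np.random.default_rng(0)
stuttgart, loschbour, onge, ma1, mbuti = rng.uniform(0.05, 0.95, size=(5, 1000))
estimate = be_ratio(stuttgart, loschbour, onge, ma1, mbuti)
```

Because the toy inputs are random, the resulting number means nothing; with real frequencies the same call would return the ratio as a fraction (0.44 for 44%).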
Unfortunately, this equation doesn't appear to work too well for Caucasus Hunter-Gatherers (CHG) Kotias and Satsurblia. They both score around 25% BE, which, as far as I can see, seems way too low. Perhaps using MA1 in the equation is messing things up because CHG harbor significant MA1-related ancestry?
I tinkered around with Haak's equation and came up with this: f4(X,Iberia_Mesolithic;Dai,Karelia_HG)/f4(Mbuti,Karelia_HG;Dai,Iberia_Mesolithic). The results look solid, at least in relative terms (see image below). But is the equation actually valid?
My main worry is using both Iberia Mesolithic and Karelia HG. They share a lot of drift, much more than Loschbour and MA1. Also, even though both Dai and Onge belong to the so-called Eastern non-African (ENA) clade, they're quite distinct, with Dai a lot less basal in the context of ENA diversity. Any thoughts? Suggestions?

Update 04/18/2016: Interestingly, my f4 equation essentially fails for most post-Neolithic Europeans, particularly those with relatively high ratios of Karelia HG-related ancestry. For instance, Yamnaya Kalmykia scores just 2.9% BE, which can't be right. Yamnaya Samara shows -2.2%, which is obviously wrong.
But I tried several combinations of reference samples and found that by replacing Karelia HG with Hungary HG and Dai with Ust-Ishim I was able to obtain coherent results for a wider range of groups, including Yamnaya.

To be honest, I still don't know what the hell I'm testing here exactly. The results appear to reflect the existence of two components within West Eurasia; one representing ancient hunter-gatherers from Europe and probably surrounding areas of the Near East, and another closely related to present-day Near Eastern populations. The latter might well be a signal of the so called Basal Eurasians, or perhaps a number of as yet unsampled meta populations from the ancient Near East?

35 comments:

I agree that there are a lot of questions here. Look at Ust_Ishim, according to these three ways of doing things. I've found that having a stronger drift with ENA, in pop A, can really inflate BE according to this. I'm not sure what we're really finding here, or if we can say there is one simple Basal Eurasian. Another thing about these below, is that West and East Eurasians share significant drift after Ust_Ishim. All but Kostenki.

I'm not sure if anyone with any ENA can have reliable results. It makes this "BE" signal go up. Put an East Asian in there, and you see them get 196% Basal Eurasian. West Asians having recent ENA will get inflated results too.
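One way to see why ENA ancestry pushes the number up is to write down what the ratio is expected to return on a deliberately simple tree. The sketch below assumes a toy model that is not taken from the post: BE splits first, the remaining Eurasians then split into an ENA branch and a West Eurasian hunter-gatherer branch, and X is a mix of BE (alpha), ENA (beta) and hunter-gatherer ancestry. Here d is the drift on the hunter-gatherer stem after the ENA split, and e is the drift that the ENA source in X shares with the ENA reference. Under those assumptions the expected ratio works out to alpha + beta * (1 + e/d), so any ENA ancestry counts for more than its own weight. All parameter values are hypothetical.

```python
def expected_ratio(alpha, beta, d, e):
    """Expected f4-ratio "BE" estimate on the toy tree described above.

    alpha: true BE fraction in X
    beta:  ENA fraction in X
    d:     drift on the West Eurasian hunter-gatherer stem (must be > 0)
    e:     drift shared by X's ENA source and the ENA reference
    """
    if d <= 0:
        raise ValueError("d must be positive")
    if alpha < 0 or beta < 0 or alpha + beta > 1:
        raise ValueError("alpha and beta must be non-negative and sum to at most 1")
    return alpha + beta * (1.0 + e / d)

# Hypothetical drift lengths, chosen only for illustration.
no_ena = expected_ratio(alpha=0.40, beta=0.00, d=0.010, e=0.010)
some_ena = expected_ratio(alpha=0.40, beta=0.05, d=0.010, e=0.010)
all_ena = expected_ratio(alpha=0.00, beta=1.00, d=0.010, e=0.010)
```

With these made-up numbers, 5% ENA lifts a true 40% to about 50%, and a fully ENA population scores about 200%, which is the same kind of result as the East Asian figure mentioned above.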

It looks difficult to find f4 ratios that would work well for all the populations, though those figures are probably not far off.

I do wonder if, taking the BE estimates for EEFs (which should be the most reliable) as given, someone could calculate the theoretical coordinates for a ghost Basal Eurasian, for example from the latest PCA 9 (or any other of the datasheets). Maybe Matt knows a possible way to do that using PAST 3, and while it would still be a very theoretical approach, it could be interesting for calculating BE admixture in "difficult" populations (with ENA, SSA,...) and making some interesting models.

Basal Eurasian may have never existed. The argument used by Laz 2013 that it did exist was that Loschbour and MA1 have the same relationship to East Asians, despite being so separated in space and time. However, the same could be said about CHG and EEF.

So, which is it? East Asian admixture in Kostenki/MA1/WHG/EHG or Basal Eurasian in CHG/EEF? Is there any way to know which one? Basal Eurasian makes the most sense, but I'm hesitant to say it's that simple.

The results are still infeasible. You have to have the West Eurasian and East Eurasian pops flipped. I've run all those. Kostenki seems to be okay, from between Ust_Ishim and West Eurasians. I'm not sure about using EHG, due to reasons you know from our talks. It would only complicate things. Another thing I've noticed from these formulas is that MA1 comes out about 40% BE, which is odd, considering he is basically equidistant from Ust_Ishim as WHG. We may need a whole new formula.

I still don't think we can completely throw out ENA into WHG, less into EEF, with some SSA, and toss out BE. Kostenki is equally distant from Ust_Ishim as WHG and ENA, yet further from ENA.

It could be that the Near East and farmers have some Kostenki-like ancestry, plus minor SSA.

I'm just looking for a test that can estimate Basal Eurasian proportions in ancient farmers, ancient Near Easterners and also modern Near Easterners who don't have too much Sub-Saharan and South/East Asian ancestry.

All of these groups have very similar ancestry, and I think that's why my equation works. Although it'd be nice to find an alternative to Karelia HG just in case potential shared ancestry between CHG and Karelia HG is underestimating the Basal score in the CHGs.

I wouldn't worry about anyone else. There's just so much a single f4 test can analyze. It can't cover all the bases, especially for more complicated mixtures.

Kostenki14 works OK in place of Karelia HG, but the estimates look a little low across the board IMO.

Here are a few ways to look at it. This is supposed to be capturing ancestry not descended from Ust_Ishim; however, it seems this and the inverse are actually only capturing what is descended from Ust_Ishim, rather than anything deeper. Although, we see other HG and MN get lower amounts, which seems to imply that Ust_Ishim is BE, but not the same as what farmers have.

Very interesting questions, and the output seems very clean/solid, assuming that "Basal Eurasian" is real.

Although, I'm not too sure about this anymore, considering that K14 doesn't have any ancestry basal to Ust-Ishim (I recall a set of stats which demonstrated this), yet is much more distant from ENA compared to ANE/EHG/WHG. Like Chad stated, just having substantial ENA ancestry in West Eurasian hunter-gatherers, with much less in ENF/CHG, and perhaps some African ancestry in ENF/CHG, probably makes more sense considering the data we have post-Lazaridis et al.

I guess we'll just have to wait and see how things play out with more aDNA.

This is pretty cool. If we assume the Basal Eurasian/Crown Eurasian scenario, ENF/CHG are the mixed ones, so it's great to see how they stack up with unadmixed populations (again, unadmixed as per the Basal Eurasian/Crown Eurasian idea). These were done with the same reference populations as the EEF/SHG ones:

I don't think we really know what the Paniya percentages represent in the Balkans, West Asia, the Caucasus, South Central Asia, and in CHG. Also, Srubnaya_outlier is basically an EHG/ANE sample, which is why I used it.

A side note, but this estimate of BEA for Kotias matches what David found using the Haak et al. equation.

I always had problems with the Basal/Crown Eurasian concept. I feel it is (a) oversimplifying things and (b) possibly labelling groups the wrong way, i.e. "Crown" being the more ancient and widespread, and "basal" the later and more geographically restricted group.

Let's run through our sketchy knowledge of paleolithic population movements and genetic admix:

1. Hominids: Neandertal as "basal" (all Eurasian), Denisova as a "crown", mostly SEA population. Quite early, ca. 100 kya, Neandertals had differentiated into a West and an East Eurasian line. The latter, represented by the Altai Neandertal, received early AMH genetic inflow, which European Neandertal samples are lacking. So far, we lack comprehensive information on the extent to which these two Neandertal clades may have acted differently on West/East Eurasian populations, but I am confident that S. Pääbo's team at Leipzig is working on it.

2. Neandertal admixture shows regionally differentiated patterns, see http://www.sciencedirect.com/science/article/pii/S0002929715004863 "Of the three putatively introgressed core haplotypes, III and IV are most similar to the Altai Neandertal genome (..). Core haplotype III is present in all non-African populations. (..) Core haplotype IV is restricted to specific Asian groups." So, on top of Denisova, we have another "crown" East Eurasian component reflected by specific Altai Neandertal gene flow.

3. yDNA: yDNA K, or K2, estimated to have split off around 45 kya, appears to represent well, in Dave's words, "a hypothetical ghost population that apparently split from other Eurasians no later than 45,000 years ago". In fact, most West Eurasian yDNA hgs (G/H/I/J) come from upstream of that split, East Eurasian ones from downstream. The exception, of course, is yDNA R. Irritating here is the tree phylogeny, since Australian K-M256 and Papuan M and S are found downstream of K, yet should in principle represent more ancient, "basal" populations than the West Eurasian G/H/I/J. To put it differently: typical West Eurasian yDNA is indeed "basal" to the yDNA tree, East Eurasian and Oceanian yDNA more downstream ("crown"), but we have little indication that those "basal Eurasians" in pre-historic times ever settled beyond West Eurasia. Apparently, there is still much we have to learn about how the second "out of Africa" wave some 45-50 kya worked. In any case, I suppose that "basal" vs. "crown" Eurasian is closely linked to yDNA pre- vs. post-K(2).

4. mtDNA: Here, the case is less clear-cut. The M-N split appears to predate the emergence of "Basal Eurasian" at around 45 kya. More fitting is mtDNA U, which dominates West Eurasian UP mtDNA. Intriguing here is the case of U2, which is predominantly South Asian, but includes West Eurasian U2e. South Asian U2 thus may represent an Upper Paleolithic West Eurasian migration into South Asia; note in this context also Kostenki's mtDNA U2. Possibly, U2 is related to the "Paniya" component showing up in MA1 and EHG.

In summary, we still have a lot to learn about Paleolithic population movement. Whether a sizeable "common Eurasian" layer has survived at all seems doubtful. Rather, we may be dealing with a quite early east-west split. The eastern side would be represented by Denisova, Altai Neandertal, yDNA K(2), mtDNA M, and Ust-Ishim; the western side by European Neandertals, yDNA GHIJ, mtDNA U (plus possibly others, e.g. HV), and Kostenki. Already during the UP, there should have been several cross-overs.

f4(X,Iberia_Mesolithic;Ust_Ishim,Hungary_HG) is going to be more similar to f4(Mbuti,Hungary_HG;Ust_Ishim,Iberia_Mesolithic) for them*, because they don't have much / any WHG admixture?

(the opposite problem seems to have been attacking the ratio using Karelia). Doesn't seem like it can be valid for EHG+Basal Mixtures. Even for Anatolian farmers, they are in theory WHG+UHG+Basal.

* As f4(X,Iberia_Mesolithic;Ust_Ishim,Hungary_HG) approaches f4(Mbuti,Hungary_HG;Ust_Ishim,Iberia_Mesolithic), the ratio of the two approaches 1.

@ Alberto: Yes, like Ryu is mentioning, if you have a set of stable BE estimates for populations that you know to be simple two-way mixes of BE and WHG (more than 3 populations, including WHG with 0), then you can fit a regression equation in PAST to predict a 100% BE zombie in the PCA space. Then run the zombie data through nMonte for more complex populations. I think you've done something like that?
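A rough sketch of that regression step, in Python with NumPy rather than PAST: fit each PCA dimension as a linear function of the BE fraction, then read off the fitted position at BE = 100%. The function name, the coordinates and the BE fractions below are all hypothetical placeholders, and the approach assumes the populations really are simple two-way mixes that fall on a line in PCA space.

```python
import numpy as np

def extrapolate_ghost(be_fractions, pc_coords, target_fraction=1.0):
    """Fit coords = intercept + slope * BE per dimension, then evaluate at target_fraction."""
    be = np.asarray(be_fractions, dtype=float)
    coords = np.asarray(pc_coords, dtype=float)        # shape (n_pops, n_dims)
    design = np.column_stack([np.ones_like(be), be])   # columns: intercept, BE fraction
    coef, *_ = np.linalg.lstsq(design, coords, rcond=None)
    intercept, slope = coef                            # each has shape (n_dims,)
    return intercept + slope * target_fraction

# Made-up two-dimensional example: populations placed exactly on a WHG-to-ghost line.
whg = np.array([0.10, -0.20])
true_ghost = np.array([-0.30, 0.40])
fractions = np.array([0.0, 0.40, 0.44, 0.35])
coords = whg + fractions[:, None] * (true_ghost - whg)
zombie = extrapolate_ghost(fractions, coords)
```

In this contrived case the fit recovers the ghost position used to build the data. With real PCA coordinates the populations won't sit exactly on a line, so the extrapolated point is only as good as the two-way-mix assumption.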

One thing I would say about whatever Basal Eurasian construct is produced by this method is, the definition of Basal Eurasian is a population that is equally related to WHG, EHG and ENA. So it should meet this definition, of being equally distant, or is not really Basal Eurasian. However I'm not sure how you would test this, as greater and lesser drift in WHG and ENA branches would interfere with measuring this.

@ Ryu, apologies for not replying to your post in the other thread. I'm not on Anthrogenica, unfortunately. Did you and Alberto resolve what it was you were looking for?

@ Krefter, well observed and stated re: CHG and EEF. Also note Kostenki and EEF and CHG also have the same relationship to East Asians (but not to Ust Ishim or Oase1!).

2. Neandertal admixture shows regionally differentiated patterns, see http://www.sciencedirect.com/science/article/pii/S0002929715004863 "Of the three putatively introgressed core haplotypes, III and IV are most similar to the Altai Neandertal genome (..). Core haplotype III is present in all non-African populations. (..) Core haplotype IV is restricted to specific Asian groups." So, on top of Denisova, we have another "crown" East Eurasian component reflected by specific Altai Neandertal gene flow.

...

In summary, we still have a lot to learn about Paleolithic population movement. Whether a sizeable "common Eurasian" layer has survived at all seems doubtful. Rather, we may be dealing with a quite early east-west split. The eastern side would be represented by Denisova, Altai Neandertal, yDNA K(2), mtDNA M, and Ust-Ishim; the western side by European Neandertals, yDNA GHIJ, mtDNA U (plus possibly others, e.g. HV), and Kostenki. Already during the UP, there should have been several cross-overs.

The Dannemann et al. 2015 paper you linked to only uses the Altai Neanderthal genome as a Neanderthal genome, so you cannot make conclusions specific to the Altai Neanderthal based on that paper. Indeed, the Kuhlwilm et al. 2016 paper (the very paper that found the 100,000 yo modern human introgression into the Altai Neanderthal) also found that the Neanderthal introgression into all non-African modern humans (and not just the West Eurasian ones) was from western Neanderthals, and not from eastern Neanderthals such as the Altai Neanderthal:

"When we refine our estimates of gene flow by adding the chromosome 21 sequences of the European Neanderthals to our genome-wide data, G-PhoCS infers significant rates of gene flow from Neanderthals into modern humans outside Africa only for El Sidrón and Vindija Neanderthals (0.3–2.6%) (Fig. 3a), suggesting that Neanderthals from Europe are more closely related than the Altai Neanderthal to the population that interbred with modern humans outside Africa 47,000–65,000 years ago."

Or one can examine the PCA David posted, in which the Srubnaya_Outlier clusters near EHG, at quite a distance from other Srubnaya samples.

And when it comes to the Paniya, assuming Basal Eurasian is real (which is a very complex/interesting discussion in itself), they are probably (at most) around 5% BEA. So the Paniya percentages (from the Balkans to western Pakistan) probably reflect (at least in part) some sort of Crown Eurasian ancestry (for which we don't have any ancient samples). But who really knows. As always, we need aDNA.

Odd detail: despite the presence of MA1, Samara hunter, Karelia hunter, and Afontova Gora (in the PCA nMonte fits I tried), modern populations from Lithuania to India prefer Srubnaya_outlier, while ancient samples do often tend towards the Karelia/Samara hunters. Not sure what that could mean, if anything.

But as David noted, perhaps the PCA-based approach is wholly inadequate in this case, although it does work great with EEF and SHG.

Yes, I basically did that to create the Basal Eurasian ghost, but manually and with only one sample. It seems to work ok for Europe and the Levant (Basal Eurasian peaking at around 52% in BedouinB), but it's more complicated when it comes to CHG-rich populations (Kotias itself being some 30% BE, Satsurblia a bit less). Maybe moving the position of this ghost to the "east" a bit would work better.

Re: distances of BE to WHG, EHG and East Asian, the theory is that they should be equidistant. But I'm not sure how feasible is that on a PCA. Using BE as the only source population in nMonte (as you did before with 4mix) we can calculate the total distance for any given target population (using the Ncycles=1 parameter saves a lot of time):
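Assuming that nMonte's distance for a single source reduces to the plain Euclidean distance between the target and that source in the datasheet coordinates, the equidistance check can be sketched directly. The function name and the coordinates below are hypothetical placeholders, not values from any datasheet.

```python
import numpy as np

def distances_from_source(source, targets):
    """Euclidean distance from one source row to each named target row."""
    src = np.asarray(source, dtype=float)
    return {name: float(np.linalg.norm(np.asarray(vec, dtype=float) - src))
            for name, vec in targets.items()}

# Made-up coordinates in which the ghost is equidistant from all three targets.
ghost = [0.0, 0.0, 0.0]
targets = {
    "WHG_like": [3.0, 4.0, 0.0],
    "EHG_like": [0.0, 3.0, 4.0],
    "ENA_like": [4.0, 0.0, 3.0],
}
dist = distances_from_source(ghost, targets)
spread = max(dist.values()) - min(dist.values())  # 0 would mean perfectly equidistant
```

A spread near zero is what the equidistance definition predicts. A large spread could mean the ghost is misplaced, or just that unequal drift on the target branches distorts the PCA distances.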

So there is something strange there in Loschbour being too far from Dai relative to the others. I checked with the D-stats based datasheet to see if the distances there were more correct (I used the last one provided by Davidski with Scythian as a row):

1) Personally I've always thought the idea of the Neanderthals being wiped out in one big sweep was unlikely. If they were particularly cold adapted then at the very least some in the high altitudes of high latitudes should have survived longer imo and in the most extreme case (north Himalayas?) potentially a lot longer. In which case the time since last 50/50 mixture may have been substantially different between the mid-east and northern mountain regions like the Himalayas.

2) If highly divergent DNA like Neanderthal is generally selected against for some reason (except the particularly good bits) from the time of mixture and at the same rate then if (1) is true then the population that had Neanderthal mixture latest would still have more over time than the earliest (even if declining all the time) until a minimum percentage was reached (which may not have happened yet).

3) I'm not a mathematician (probably obviously) but I've messed around with various equations as part of modding AI for games, so I know equations can throw up odd effects in unusual situations you hadn't thought of.

so

4) how do the various software packages deal with small quantities of highly divergent DNA?

I'm wondering if BE is maybe an artifact revolving around how the software deals with populations whose Neanderthal ancestry was least recent relative to those in whom it's more recent.

"To be honest, I still don't know what the hell I'm testing here exactly. The results appear to reflect the existence of two components within West Eurasia; one representing ancient hunter-gatherers from Europe and probably surrounding areas of the Near East, and another closely related to present-day Near Eastern populations. The latter might well be a signal of the so called Basal Eurasians, or perhaps a number of as yet unsampled meta populations from the ancient Near East?" (Davidski)

"Another thing about these below, is that West and East Eurasians share significant drift after Ust_Ishim" (Chad Rohlfsen).

"I'm wondering if BE is maybe an artifact revolving around how the software deals with populations whose Neanderthal ancestry was least recent relative to those in whom it's more recent" (Grey)

...

After Copernicus arrived.

It appears that the Antelian culture came to an end due to an abrupt change in stone technology, courtesy of the succeeding Kebaran culture. Given Eupedia's hypothesis that N1, N2, X and W = Basal Eurasian, and the fact that those mtDNA haplogroups were dominant in Mesolithic farmers, how likely is it that the Basal Eurasian gene came from the Antelian culture?

The fact that basal Eurasian is highest in Neolithic Middleeasterners could be a sign that Basal Eurasians were assimilated there.

Hey Davidski, just thought of something. I was looking at your Basal Eurasian K7 D-stats and saw that La Braña hasn't been compared yet to Basal Eurasians. La Braña on GEDmatch = F999915. It's just a wild guess but I'm wondering if Mesolithic Europeans/Gravettian culture somehow absorbed some Basal Eurasian before heading to Europe. :)

Wish you the best of luck :). Ultimately it's gonna come down to the fact that, as far as prehistoric archeology is concerned, you are way ahead of your time. Burials of modern Homo sapiens from the Upper Paleolithic, or the Paleolithic for that matter, are simply hard to come by. Therefore you simply do not have enough Paleolithic archeological remains to DNA test for the sole purpose of finding Mr and Miss Basal. But despite our lack of Paleolithic skeletons, we do have our very own yDNA/mtDNA; these two DNA clusters can trace our direct paternal and direct maternal ancestors to Paleolithic times.

https://en.m.wikipedia.org/wiki/List_of_human_evolution_fossils

Is there any way to Download Raw Mtdna/Ydna and test them on the Basal-Eurasian K7? If none of the Ydna;A-T or Mtdna;A-Z have any sign of Basal Eurasian Influence then it might be safe to say that Basal-Eurasian is fluke-dna.

I thought that Lazaridis et al. disputed the idea that Kostenki 14 had Basal Eurasian, or at least the same BE that the EEF had. It does not make sense that Kostenki 14 was heavy on Neanderthal genes when one of the hallmarks of BE is that it is inversely correlated with such admixture.

Do the tests make more sense if you just use the more recent samples (<15K)? If the tests using only the more recent samples make sense but the tests with the older ones don't, maybe it's because the younger and older samples don't carry the same BE.