
Friday, July 22, 2016

The Basal-rich K7

Update 25/01/2018: The Basal-rich K7 is now available to personal genomics customers for $6 USD a pop (see here).
...
I've got a new test. Currently I'm only using it to explore ancient genomes, but at some point I'll make another version available to personal genomics customers, one way or another. However, that might take a bit of work and time, to mitigate the calculator effect and so on.
Below is a spreadsheet featuring a wide range of ancient and present-day samples from recent papers. A table with the Fst genetic distances between the seven ancestral populations is available here.

Please note that the Basal-rich component is unlikely to be a perfect representation of the hypothetical Basal Eurasian population. At the same time, it's likely that the two hunter-gatherer components, Ancient North Eurasian (or AG3-related) and Villabruna-related, contain some Basal Eurasian admixture.
Here's a Principal Component Analysis (PCA) of the West Eurasian populations based on their K7 ancestry proportions. It captures all of the main features of West Eurasian genetic diversity, including the two parallel clines made up of Europeans and Near Easterners, and the intermediate position of South Central Asians between the ancient samples from Neolithic Iran and Bronze Age Europe.

An extra large version of the same PCA, with the samples labeled individually, can be downloaded here.
Also, using the K7 ancestry proportions, I modeled the ancient ancestry of a few present-day populations from the Near East, Northern Europe and South Central Asia with the nMonte R script. Bronze Age steppe admixture in groups from the latter two regions is usually inferred at 40-50% with tools based on formal stats, such as qpAdm and TreeMix, so I wanted to check if I could reproduce such results.

Admittedly, these estimates look very conservative, but certainly not out of the ballpark. I suspect that I'll be able to improve the models and statistical fits as new Bronze Age steppe samples become available. Indeed, I'll be updating the spreadsheet above regularly.
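For readers who want to see the general idea behind this kind of modeling, here is a minimal sketch in Python. It is not the nMonte script: it swaps in a brute-force grid search over mixture weights in 1% steps and picks the mix of source populations whose averaged K7 proportions lie closest (by Euclidean distance) to the target. The population names and all K7 values below are made up for illustration and are not taken from the spreadsheet.

```python
from itertools import product
from math import sqrt

# Hypothetical K7 proportions, in the order: AG3-MA1, Andamanese, Basal-rich,
# Oceanian, Southeast_Asian, Sub-Saharan, Villabruna. Illustrative values only.
SOURCES = {
    "Steppe_like":  [0.55, 0.01, 0.10, 0.00, 0.01, 0.00, 0.33],
    "Farmer_like":  [0.05, 0.00, 0.60, 0.00, 0.00, 0.01, 0.34],
    "Forager_like": [0.15, 0.00, 0.02, 0.00, 0.00, 0.00, 0.83],
}


def mix(weights_pct, sources):
    """Weighted average of the source vectors; weights are integer percentages."""
    names = list(sources)
    k = len(sources[names[0]])
    return [
        sum(w * sources[n][i] for w, n in zip(weights_pct, names)) / 100.0
        for i in range(k)
    ]


def distance(a, b):
    """Euclidean distance between two ancestry-proportion vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_fit(target, sources):
    """Try every combination of whole-percent weights that sums to 100."""
    best = None
    for head in product(range(101), repeat=len(sources) - 1):
        last = 100 - sum(head)
        if last < 0:
            continue
        weights = head + (last,)
        d = distance(mix(weights, sources), target)
        if best is None or d < best[1]:
            best = (weights, d)
    return dict(zip(sources, best[0])), best[1]


# A synthetic target built as a known 45/35/20 mix, so the fit can be checked.
target = mix((45, 35, 20), SOURCES)
fit, dist = best_fit(target, SOURCES)
print(fit, dist)
```

A grid search like this only scales to a handful of sources, which is presumably one reason to prefer a Monte Carlo approach for larger source sets.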

If we assume basal is supposed to be equally related to East and West Eurasians, then these basal figures are too high.

A stat in the form
Mbuti Test Kostenki14 Ust_Ishim
should be 0 in any Africans that share ancestry with Eurasians beyond Mbuti (Yoruba and such), Basal Eurasian, and any miscellaneous Crown Eurasian branches that may exist.

Now if we take the genome that is most related to K14 (V16), we have a stat of -.914.

Bedouins have -.538, so theoretically at most 41.1% of their ancestry should consist of Basal, ENA, African, or misc Crown Eurasian. I say at most because any West Eurasian that split pre V16-K14 will dampen the statistic, and ENA will slightly reverse it (as ENA are closer to Ust than K14).
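The 41.1% figure can be reproduced with a couple of lines of Python. This is only a sketch of the arithmetic, and it assumes the statistic scales linearly with the share of K14-related ancestry, which the caveats above already flag as an approximation. The function name is my own.

```python
def max_unrelated_share(test_stat, reference_stat):
    """Upper bound on the ancestry share that does not behave like the reference.

    Assumes the statistic scales linearly with reference-related ancestry,
    so the reference genome itself (here V16) counts as 100%.
    """
    if reference_stat == 0:
        raise ValueError("reference statistic must be non-zero")
    return 1.0 - test_stat / reference_stat


# Values quoted in the comment: V16 = -0.914, Bedouins = -0.538
bedouin_share = max_unrelated_share(-0.538, -0.914)
print(f"{bedouin_share:.1%}")  # 41.1%
```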

Equally, if you modeled Yamnaya_Kalmykia as 20% Iran_Neolithic, the remaining EHG balance of its ancestry would be 60% AG3-MA1, 40% Villabruna (seems close to what it's supposed to be?). If you modeled Yamnaya_Kalmykia as 23% Anatolia_Neolithic, the remaining HG would be 73% AG3-MA1, 25% Villabruna.

The Fst stats should be important for seeing whether the Basal-rich component meets the criteria for Basal Eurasian.

How do SE Asian populations model in this with the separate Andaman and SE Asian components?

All that said, ultimately not totally sure how well this matches with the estimates from Laz et al 2016 -

I think Chad has argued before, with Shaikorth, that those estimates are wrong, for Iran_Neolithic in particular, due to ANE ancestry, and he may have had an arguable point (not totally sure I understood it). But if he's right, and BE is overweighted here for all the Iranian and Iranian-related points, then that nice correlation with Neanderthal ancestry basically disappears! (And that's quite a serious point for Laz 2016, since it was a strong justification they used to continue with the BE concept.)

ANE, to a point. They covered that in a section and had Iran and Natufians equal in BE. Onge or ENA ancestry is 3x as much of a driver of false BE as ANE. I'm pretty sure David's component here is 80% BE and 20% WHG-like.

Not to make things too complex, but is it possible to show the relationship with Neanderthal/Denisovan averages?
http://science.sciencemag.org/content/sci/suppl/2016/07/13/science.aaf7943.DC1/Broushaki.SM.pdf
Table S21 gives some averages, but not all.
Maybe you could correspond with Broushaki et al. and get Neanderthal/Denisovan averages for your components:
AG3-MA1, Andamanese, Basal-rich, Oceanian, Southeast_Asian, Sub-Saharan, Villabruna

@ Chad, do you have a page ref? The section "Supplementary Information 4" is quite extensive and includes a number of different estimates, so I'm finding it hard to identify the specific bit at a glance.

(I'll give a few examples of the different estimators I can see that they use for populations, examples Anatolia_N, Euro_MNCHL, Iran_N, Steppe_EMBA, Natufian, CHG:

I don't think it's possible to find a pure Basal Eurasian population with methods like ADMIXTURE, STRUCTURE and the like. I've tried both, as well as other programs, and the most basal-rich ghost pop I can find still seems to have at least 10-20% Villabruna and even Onge-like stuff.

Anyway, like I said, I'll update this post later today with more results and the Fst table.

Check that too. I think that would make David's component about 50% BE and 50% WHG. That might be why the BE is getting thrown in ANE, with all the Iran linked pops in the test. It's a long process of adding and subtracting samples to get the proportions in Admixture to match things like qpAdm.

I suspect that a hybrid Basal/Villabruna-related population very similar to the Natufians was a major feature of the Near East for a very long time, maybe longer than anything purely Basal Eurasian. Btw, the Natufians score around 70% of the Basal-rich component in this test, but essentially no Sub-Saharan admixture.

At the same time, it's also possible that both Villabruna and Samara HG have some Basal Eurasian admixture, so Basal Eurasian ancestry, albeit at low levels, may already have been a feature of many pre-farming populations across northern Eurasia.

I ran many tests, usually at low K, looking for specific signals, like ANE and WHG. Whenever a nice cluster formed that reflected what I've seen elsewhere, I made synthetic samples out of it, and used them to trigger the same cluster in the following runs.

After doing so many analyses, I don't see a way to extract a pure Basal Eurasian population, although I'm now also sure that such a population, or populations, did really exist.

They seem to have made hybrid composites in the Near East with Villabruna-like groups in the Levant and MA1-like groups in Iran very early, probably before the Ice Age.

I'll be using this test from now on to explore a lot of things because it does reflect really well the genetic patterns that are found across Eurasia. We can all argue until the cows come home about the ancestry proportions. They'll never be perfect in the context of all analyses, most of which are probably skewed to some extent anyway.

I'm running more samples now. I'll make an update soon with more info.

All of the results above were done in the same way, so the ancestry proportions reflect the same components.

This Afanasievo individual is easily the most northerly shifted in my PCA, so the low Basal-rich result more or less makes sense. Note also that the noise due to deamination here is taken care of by the very low Sub-Saharan proportion. This probably otherwise shows up as increased southern ancestry in other analyses that don't have such an outlet for erroneous basal genotype calls.

Also, I think the Villabruna cluster is indeed part Basal Eurasian. This seems to show up in the Fst and resulting PCA. So this is where some Basal Eurasian will be hiding for many samples.

But all of the Bronze Age steppe samples, especially the early ones, show very low levels of the Basal-rich component, which is something I've noticed before, and that's why I was always stumped by models that showed them to be ~50% Armenian. That never made much sense to me.

Btw, the PCA above is very similar to one that I did with an AG3-MA1 composite, Villabruna, a Levant_Neolithic sample, an Onge, a Papuan, and a few African individuals. The Levant_Neolithic sample was shifted in the same way towards the Africans as the Basal-rich component.

So you do get a convergent estimate for Natufian and Iran_N under the MA1 dropped model here *only*....

But at the same time, in all of these models you get a convergent Basal level between Steppe_EMBA (Yamnaya / Afanasievo) and Euro_MNCHL, and Anatolia_N is still around half the BE level of Iran_N.

Assume the Basal-rich here could be around 20% WHG; if so, we'd get:

It doesn't seem to reproduce what Laz find in all their models, a similar level of BE in EuroMNCHL and Steppe_EMBA, while Iran_N and Anatolia_N are much closer in level here than in the formal models (where Anatolia_N is pretty close to Steppe_EMBA).

Ryu: Matt, I suspect the Basal Component may represent different things in different populations, i.e. there is 'slipping and sliding' going on, so the very low percentage in Afanasievo may not be that representative.

Yes, probably is; at the same time, I'm not sure that the Afanasievo is the only one inconsistent compared to the formal Laz models, if you're saying anything like that, unless I'm reading you wrong.

I think you'd probably need something like 0% Basal in the Villabruna cluster, 40% WHG in Basal Rich, and something like 40% Basal in AG3-MA1 before you got any close estimates, even to the qpAdm model that drops MA1 as an outgroup.

I'm thinking BE was shoved into the ANE component. Admixture can be funky that way. I've got a calculator in progress with separate Natufian, Iranian, and ANE components. As soon as it fits close to qpAdm, I'll see about getting it out.

It wouldn't be 20% less Basal Eurasian unless Anatolia_N was 100% Basal Eurasian. It'd be 10% less Basal Eurasian if Anatolia_N was 50% Basal Eurasian (that's not close to what these models find though).

(Similarly, if their models find CHG is around 0.417 Basal, and Steppe_EMBA is theoretically around 0.5 CHG, then Steppe_EMBA would be predicted to be around 0.208 Basal).
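The reasoning in these two points is just a product of two fractions: the Basal Eurasian a population inherits through one source is the share of ancestry from that source times the Basal fraction of the source. A minimal Python sketch of that arithmetic follows. The function name is mine, and the 20% figure is assumed to refer to a 20% difference in Anatolia_N ancestry, which is how the numbers in the comment work out.

```python
def inherited_basal(admixture_fraction, source_basal):
    """Basal Eurasian contributed via one source population.

    admixture_fraction: share of the target's ancestry from the source (0-1)
    source_basal: Basal Eurasian fraction of that source (0-1)
    """
    return admixture_fraction * source_basal


# 20% less Anatolia_N ancestry only means 20% less Basal if Anatolia_N is all Basal
if_fully_basal = inherited_basal(0.20, 1.00)   # 0.20
# ... and 10% less if Anatolia_N is half Basal
if_half_basal = inherited_basal(0.20, 0.50)    # 0.10
# CHG at 0.417 Basal, Steppe_EMBA at roughly 0.5 CHG
steppe_via_chg = inherited_basal(0.50, 0.417)  # about 0.208

print(if_fully_basal, if_half_basal, steppe_via_chg)
```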

Admixture doesn't work in that way. In some samples the Basal-rich might be a lot more basal than in others.

Btw, I don't think the AG3-MA1 cluster is influenced by Basal. The Iran_N ancestry proportions seem about right, with 5-10% of something non-Basal in the Basal-rich cluster. The rest is AG3-MA1 related, both near and far.

Also, there's an issue with the Basal estimates for the Iranian samples in Laz et al., caused by an eastern component specific to these samples. It's discussed in the paper, can't remember which page though.

I suspect that the steppe samples might also be affected by the same or similar phenomenon.

a) to get anything like their models with the same relative proportions of BE in Steppe_EMBA as Anatolia_N, around 2/3 as much BE in Steppe_EMBA as Anatolia_N, you'd need Basal Rich to be 100% Basal in Afanasievo (7.54%), and then only 25% Basal in Anatolia_Neolithic (to produce around 11% Basal and keep the two populations in ratio).

Or taken more literally - their estimates all show Steppe_EMBA should have around 20% BE - all of the estimates they've included are impossible without BE in another component that is present in Steppe_EMBA to a large degree.

b) if Basal Rich does vary very widely in Basal Eurasian depending on which population it is placed on, it's not a very useful index for how much BE a population actually has?
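The arithmetic behind point a) can be checked with a short Python sketch. The 7.54%, 2/3 and 25% figures come from the comment itself. The last number printed, the Basal-rich score that Anatolia_Neolithic would need for this to hold, is only what those figures imply and is not read off the spreadsheet. Function names are my own.

```python
def required_anatolia_basal(steppe_basal_pct, steppe_to_anatolia_ratio):
    """True Basal % Anatolia_N needs so that Steppe/Anatolia equals the given ratio."""
    return steppe_basal_pct / steppe_to_anatolia_ratio


def implied_basal_rich_score(true_basal_pct, basal_share_of_component):
    """Basal-rich % implied if only part of the component is truly Basal."""
    return true_basal_pct / basal_share_of_component


afanasievo_basal = 7.54  # Basal-rich % in Afanasievo, treated as 100% Basal
anatolia_basal = required_anatolia_basal(afanasievo_basal, 2 / 3)
anatolia_score = implied_basal_rich_score(anatolia_basal, 0.25)

print(round(anatolia_basal, 2))  # 11.31, i.e. "around 11% Basal"
print(round(anatolia_score, 2))  # 45.24, the implied Basal-rich score
```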

Graphically comparing the estimates that crossover between theirs and the BEuK7 - http://imgur.com/a/uP3rb

"Also, there's an issue with the Basal estimates for the Iranian samples in Laz et al., caused by an eastern component specific to these samples. It's discussed in the paper, can't remember which page though."

Sorry, I can't find this, only the note on page 39

Three populations (Kostenki14, MA1, and Han), when dropped as outgroups, result in the quadruple (Test, WHG, EHG, Mota) being consistent with 3 streams of ancestry for all (or nearly all in the case of Han) Test populations. Removing Kostenki14 results in a blowup of the standard errors suggesting that it carries important phylogenetic information that is not present in the other outgroups. Removal of MA1 and Han suggests interactions between West Eurasia and Upper Paleolithic Siberia and East Asia which we explore in Supplementary Information, section 11. For our purpose of estimating Basal Eurasian ancestry, however (and unlike with Kostenki14), removing MA1 and Han from the set of outgroups does not result in a blowup of standard errors which remain modest (less than 10% for most Test populations). In Fig. 2 we plot graphically the Basal Eurasian estimate results when removing MA1, which results in successful modeling of all Test populations, and is thus the main estimate we use in the study. (note these are the same I used in the above graph).

If there are reasons the Iranian estimates are not valid, then OK, though that does mean that their association of Basal Eurasian with 0 Neanderthal ancestry sort of ceases to exist, because it is driven by them. That's a pretty big deal for their paper.

Not home right now, so I can't point you to the specific page. But it's in the supp info, and basically what it says is that the Basal estimates for the Iranian samples might be inflated and not higher than in the Natufians.

Also, the ranges for Basal in the paper, in a table in the supp info, are pretty big, and from memory go down to just 2% for Steppe_EMBA.

So based on what I've seen, plus the uncertainty in modeling Basal ancestry proportions, I don't think it's a done deal that the Early Bronze Age steppe samples have around 20% of Basal. In fact, 20% seems a lot, and I'd say unlikely.

But I haven't tested many samples yet, just a couple of individuals from the steppe, and like I say, variable levels of Basal are probably hiding in the Villabruna cluster for many samples.

Re: error bars and ranges, yeah you're right that those are there, but I did sort of discount them a bit since they exist for all the populations. In Table S4.6 (feasible ranges), Steppe_EMBA gets min 2, max 52, but then Anatolia_N gets min 3, max 49.

I did just notice there is one set of models in their paper that does have a slightly better fit with these BEuK7 values: Table S7.26.

This is the result of a different approach: rather than modeling BE directly into all their populations, they model other populations as admixed between Natufian, Iran_N, EHG and WHG with qpAdm and qpWave, then feed in the model BE values for those four from Section 4.

http://imgur.com/a/JYbxD - correlation of the S7.26 mixture based values with BEuK7.

So there is that at least. But they've kind of gone with the direct estimate in the Neanderthal correlation, so I don't know which they have more confidence in, and I would assume the direct estimate. To accept the Table S7.26 estimate would also mean accepting that Steppe and Anatolia Neolithic are really modeled by admixtures with Iran_N, which is a bit dubious from other perspectives (e.g. PCA, etc.)...

Most of all, Basal Eurasian levels should show an inverse correlation with the levels of north Eurasian forager ancestry, although in the usual West Eurasian PCA samples with inflated levels of ANE, ENA and probably other eastern components are pulled south. So in PCA at least, there's a pseudo-Basal effect for the steppe samples.

Actually, if you look at the graph in the main Laz et al. PDF that compares Basal Eurasian levels with Neanderthal admixture, Anatolia_ChL has about half the Basal of Steppe_EMBA, even though it looks more Near Eastern in all other comparisons. And Iran_Hotu has more Basal than Iran_N, even though it has more EHG affinity.

So WTF is going on there?

I'll run Anatolia_ChL as soon as I get the chance. But you can already see in the above test results that Iran_Hotu looks less Basal than Iran_N, which agrees with everything else, but not the Basal/Neanderthal graph in Laz et al.

Tomorrow I'll see what happens when I tweak the dataset by removing some of the steppe reference samples. If this increases the Basal in Afanasievo and the whole test still works, then that will be interesting.

If nothing happens, or the whole analysis collapses, that'll be interesting too.

Those two, Iran_Hotu and Anatolia_CHL, are the weirdest outliers, though can some of that be laid at the feet of them both being sample size = 1 for their populations, and maybe not having so many SNPs (also Steppe IA), while the larger populations and higher coverage samples can be a bit more robust in the direct model? I'd expect Iran_Hotu to be a lot more like Iran_N, ultimately. Maybe even having 2 or 3 samples makes it much more robust.

Still, Iran_Hotu having more BEu could be consistent with f4(Iran_N, Iran_HotuIIIb; EHG, Mbuti) = -0.00199 (Z=-2.4), if Iran_Hotu had less of some other kind of non-Basal ancestry that was not closely related to EHG. What they're doing in Table S4.9, which the estimates in the Neanderthal graph come from, is qpAdm with the outgroups (Ust_Ishim, Kostenki14, MA1, Han, Onge, Papuan), when the pright are Mota*, EHG, WHG, and dropping any individual outgroup (except Kostenki14, which when dropped gives no meaningful results) doesn't change Hotu's relative position in the BEu ranking.

It might be interesting to run the whole set of D(Iran_N, Iran_Hotu; X, Mbuti) where X is Villabruna, Kostenki14, Ust_Ishim, MA1, AG3, EHG, Han, Onge, Papuan, Kharia to see what comes out of that, and whether those stats themselves agree that Hotu is less Basal Eurasian (particularly should be closer to Ust_Ishim if less BEu, or at least further away) or if it's just more EHG, and looks like a strange, hard to explain "High EHG, high Basal" thing.
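For anyone unfamiliar with what such a run computes, here is a minimal Python sketch of the usual frequency-based definition of D(W, X; Y, Z), looped over a few choices of X. The allele frequencies are invented toy values, not real data, and the sketch gives only the point estimate. A real run would use something like qpDstat in ADMIXTOOLS, which also provides block jackknife standard errors and Z scores.

```python
def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from per-site allele frequencies (equal-length lists)."""
    if not (len(w) == len(x) == len(y) == len(z)):
        raise ValueError("all populations need the same number of sites")
    num = sum((wi - xi) * (yi - zi) for wi, xi, yi, zi in zip(w, x, y, z))
    den = sum(
        (wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
        for wi, xi, yi, zi in zip(w, x, y, z)
    )
    if den == 0:
        raise ValueError("no informative sites")
    return num / den


# Toy allele frequencies at four sites (hypothetical values for illustration)
iran_n = [0.10, 0.80, 0.50, 0.30]
iran_hotu = [0.20, 0.60, 0.50, 0.40]
mbuti = [0.30, 0.50, 0.60, 0.50]
candidates = {
    "Villabruna": [0.00, 1.00, 0.40, 0.20],
    "Ust_Ishim": [0.25, 0.55, 0.60, 0.45],
}

# D > 0 suggests Iran_N shares more alleles with X than Iran_Hotu does
results = {
    name: d_statistic(iran_n, iran_hotu, freqs, mbuti)
    for name, freqs in candidates.items()
}
for name, d in results.items():
    print(f"D(Iran_N, Iran_Hotu; {name}, Mbuti) = {d:+.3f}")
```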

Re: Anatolia_CHL, in the comparisons from Table S4.9 which don't drop MA-1 as an outgroup, Anatolia_CHL does get the more stable 20% Basal Eurasian. Still oddly lower than EuroMNCHL and Steppe_EMBA though.

*Doing a qpAdm which includes Mota and a lack of African outgroups, I guess, allows Mota to emulate the behaviour of Basal Eurasian in their theory.

@Ryu," now that we Know Yamnaya has EEF ancestry, is now an uninterrupted increase in WHG and EEF ancestry from the Khvalynsk-->Yamnaya-->Andronovo-->Iron Age transitions. Which makes me think that your original idea about interactions between the Steppe and CT/other settled societies in Europe creating cultural dynamism, may have something to it."

Andronovo looks more like a brand new population from Central Europe. The change from Yamnaya to Poltavka_Outlier isn't gradual.

@ rob
A. You are partly right. Sorry for the bad wording, but writing on a smartphone is actually a problem.
B. I will delete the comment. However, you seem to make two mistakes. First is to think anyone "listens" beyond their pet theory. They don't. Second, that I try to convince anyone of anything. Because of one, I don't.
C. All I am trying to do is create a digital record of it. Because if I am right, shortly after everybody will argue as if they were never wrong. When in fact all people do is talk amongst themselves, to themselves.

@Rob
So. My point to Ryu was actually meant for a lot more people. See how, instead of looking for the source of EEF in Yamnaya just next door, where the EEF actually seem to have originated, they "pretend not to see" in order to fit the pet theory, which is in fact a very interesting neurocognitive pathway fired by the anterior cingulate cortex... But that is a different story.

You make some solid points. However, if you're looking to validate your opinions through some meaningful discussion, then I think you're giving this place too much credit. Despite its great potential and Davidski's ability, this blog just keeps on propagating the same western European wannabe bullshit.

"...you seem to make two mistakes. First is to think anyone "listens" beyond their pet theory. They dont."

You're projecting. I myself have abandoned numerous pet theories of my own over the years, some of which I felt very strongly about, and I'm far from alone. I'm sure that many people here have. In fact, the scientific method relies on a willingness to abandon one's pet theories, as new data comes in. That's how it's supposed to work, and hopefully most of us understand that, but you don't seem to. So yeah, speak for yourself.

"Second that I try to convince anyone of anything. Because of one,i don't."

Oh. Is that why you spam every other thread on this blog with demands that we all read your thesis?

The gradual admixture scenario is possible of course. Except ancient DNA from Samara shows a sudden appearance of R1a Corded Ware-like people. They lived in the same time period as R1b Yamnaya-like people in the Poltavka culture, as if they were two different populations.

It's similar to Bell Beaker and Corded Ware in Germany. Yes, they weren't contemporary, but they were clearly two different populations, mostly because of different Y DNA.

"Well that's true. The EEF in Andronovo is significantly more similar to Iberia _Neolithic than the EEF in Corded Ware or Yamnaya is."

I'm not confident there's a way to differentiate Iberia_Chl from German_MN.

@Jingus
That is called setting a digital footprint. It's not really meant to convince anyone. You do know what that is, right, and how it's done?

Secondly... thanks. I wish I was that successful at *spamming* my thesis. Not really. Don't have the time to. Although I wish I did a better job at it.

Lastly. If you abandoned several theses, it's because those were not particularly good or well crafted, were they? Mine, on the other hand, has names, dates, phone numbers, street names (sarc)... Not broadly set names such as steppe or OIT, or generally meaningless brush pictures. In the end I will be right or wrong, and believe me, either way I won't lose a second of sleep over it. Now one thing you can be sure of... I wouldn't lose a second trying to convince the likes of you. Waste of time.
Have fun, and over and out for you.

For the Basal Eurasian estimates from Lazaridis 2016's SI, section 4 and section 9, I thought I'd run them through a PCA along with some estimates of my own on how much of the non-Basal for each was WHG or ANE (unfortunately not provided!):

I was having a few more thoughts about the qpAdm method Lazaridis et al use to estimate BEu (on p37-p40 of the supplement), and whether there are some improvements available from the Fu et al samples that weren't available for Laz to use.

The final estimate they use is outgroups Ust_Ishim, Kostenki14, Han, Onge, Papuan - and then model populations Mota, WHG, EHG where Mota is the substitute for the Basal Ancestry.

Useful samples that Fu et al make available, it seems to me, would be GoyetQ116-1 and Vestonice16 (as other UP Europeans with low relatedness to particular recent groups, like Kostenki14, and lacking ENA relatedness and Basal Eurasian), AG3 (as a better representative of ANE than MA-1), and Villabruna (as IIRC the least ANE / ENA shifted member of the WHG cluster).

So maybe it would be interesting to run that qpAdm again, with outgroups still as Ust_Ishim, Kostenki14, Han, Onge, Papuan, and then the model populations as Mota, Villabruna, AG3, GoyetQ116-1/Vestonice 16.

And also the same with outgroups as Ust_Ishim, Kostenki14, MA-1, Han, Onge, Papuan.

But you could also include as test any high coverage UP Europeans (e.g. whichever of GoyetQ116-1 / Vestonice16 you aren't using), and test the method that way, as they shouldn't score any BEu.

I think that would help split apart true Basal Eurasian in the Near East (if it exists) from ancestry which diverged early from WHG and EHG but is still West Eurasian, just without that increased ENA affinity that shows up in WHG and EHG. Laz 2016 doesn't seem to think this is a concern, but it seems like it should be a concern if ANE / Villabruna cluster members are contributing to and from some ENA groups.

(Appreciate you're busy, so this is if this is quick to run, or food for thought for the future).

Yeah, I'll be home in a bit. I think Bichon is the least ANE of the group, without at least 10% UP ancestry. In Admixture, Villabruna scores the most ANE of the 15 I use. I think it may be excess Neandertal keeping him away from ANE a sliver more than Loschbour in Dstats. I've got the following components; SSA, San, WHG, ANE, Natufian, Iran, Onge, and Ami. I may add a UP component based on the Aurignacians too.

"If there are reasons the Iranian estimates are not valid, then OK, though that does mean that their association of Basal Eurasian with 0 Neanderthal ancestry sort of ceases to exist, because it is driven by them. That's a pretty big deal for their paper."

I think this is another problem in the paper, because if Basal Eurasian lacked Neanderthal admixture, wouldn't that make them closer to Africans (and in this case to all Africans, you wouldn't even need to find a specific branch)? (It would probably be better worded as Neanderthal admixture would make non-Basal Eurasians further away from Africans than Basal Eurasians are). I think we can see this effect with Ust-Ishim, Kostenki14 or Vestonice16 (and presumably much more with Oase1, though I haven't seen it explicitly), even if they only have a tad more Neanderthal than WHG/EHG. So why don't we see this effect much more clearly when comparing Iran_N and WHG to Africans?

In general, I think it's difficult to test accurately both of these things (Basal Eurasian having no Neanderthal admixture and Basal Eurasian not being closer to Africans). You probably need better materials and methods. But at least theoretically, it seems to me that you can't argue for both at the same time (unless you do show that Basal Eurasians are indeed closer to Africans, but then argue that this is just the effect of them not having Neanderthal admixture and not them being otherwise more related to Africans).

Interesting comment and questions. I think that such an effect does show to some degree for the Papuans (Denisovan+Neanderthal), and Oase1, so you could infer that if BEu lacked Neanderthal, you would expect it to share more with Africans than more Neanderthal admixed populations in D stat measures.

Another question that comes to my mind is, given Neanderthal is very divergent, if one Eurasian population did lack any Neanderthal ancestry, while a few other Eurasian populations all had it at around 3%, could this create an effect like the Neanderthalised groups forming a clade together, even if that was not phylogenetically true? Might be worth the study authors giving some thought to.

Like, say you had WestEurasianA and WestEurasianB, who form a clade, and then a separate Ust_Ishim clade and ENA clade, then WEB, Ust_Ishim and ENA all pick up some Neanderthal ancestry, which by the processes of selection smooths out to the same level, while WEA is unaffected. Would WEB show some relatedness to Ust_Ishim and ENA that WEA lacks?

How much of the sharing between Eurasians is mediated by Neanderthal ancestry? How much could a few unrelated clades show as related just by sharing the same % of Neanderthal ancestry? How much differentiation between the Eurasians remains when Neanderthal alleles are masked out? Do the D-stat relations found still exist when Neanderthal derived variants (and variants derived from those variants) are masked?

That seems like it would need consideration in light of the idea of Near Eastern populations and particularly ancient ones systematically being derived from some pop that lacked Neanderthal ancestry.

Are you suggesting that BEu is a sister branch to a population related to an UP European group which simply did not mix with Neanderthals, whilst other west Eurasian, as well as all other divergent Eurasians (ENA, U-I) did (somehow) ?

This would have to essentially exclude a southern coastal AMH route wholly (beyond the Persian Gulf), and mean that all Eurasians apart from BEu colonised the globe via a path going through Neanderthal territory.

"This would have to essentially exclude a southern coastal AMH route wholly (beyond the Persoan gulf) , and mean that all Eurasians apart from BEu colonised the globe via a path going through Neanderthal territory"

would a southern migration counter clockwise around the himalayas followed by two back migrations, east and west, work?

Firstly, the non-Basal (i.e. Crown Eurasian) split was more than a two-way split (west & east), but one which has to take into account north Eurasian (ANE, WHG), ENA, Ust-Ishim, etc.

Secondly, Papuans have Neanderthal admixture. If so, and considering the territorial range of Neanderthals, Asia would need to have been colonised via somewhere near the Southeast Caspian region. It can get tricky with the routes around the Himalayas: south of them, north of them, both, layering, etc.
It's been debated since the early days of mtDNA, and recently with TreeMixes etc.
Indeed, a few threads back, most TMs had Crown Eurasian slightly closer to the basal in Iran than Natufians, which suggested to me that Cr Eu split off the basaloid branches closer toward Iran than immediately in the southern Levant.

Rob: Are you suggesting that BEu is a sister branch to a population related to an UP European group which simply did not mix with Neanderthals, whilst other west Eurasian, as well as all other divergent Eurasians (ENA, U-I) did (somehow) ?

Really, it's that since there is an apparent correlation with Near Eastern early Neolithic ancestry and Neanderthal statistics, I'm interested in whether the D(EEF,WHG/ANE,Han;Ust_Ishim,Outgroup) signals would remain when Neanderthal derived variants are removed.

If those signals go (e.g. D(EEF,WHG;U_I,Outgroup) = 0 when Neanderthal derived variants are removed), then a topology like http://imgur.com/a/JvGDS could be possible.

Seems kind of unlikely to me (three separate edges from Neanderthal is less simple, and there's no sign of it in any TreeMix with Neanderthal), and how it would make sense in terms of movements I don't know, but it seems worthwhile for any academics from the Reich group to test, if they're listening in here.

"I'm interested in whether the signals would remain when Neanderthal derived variants are removed."

It would be fantastic if we had an accurate list of all alleles that entered the modern gene pool from Neanderthals or Denisovans.

Unfortunately, this is much more difficult than it seems. Since there was often heterozygosity at the exact same alleles in modern Africans and in archaics, you can't simply separate them on a SNP basis, you would have to use short haplotypes. This requires high coverage sequencing to be very accurate.

The main problem is that we don't have the exact genomic sequence of the Neanderthals that actually admixed at the split toward the Crown Eurasian group.

We are only inferring that info from a very small number of distantly related Neanderthals.

Also, we do not know if the admixing Neanderthals at the base of Crown Eurasians already had any earlier AMH admixture (as the Altai Neanderthal did).

" Since there was often heterozygosity at the exact same alleles in modern Africans and in archaics, you can't simply seperate them on a SNP basis, you would have to use short haplotypes. This requires high coverage sequencing to be very accurate."

There should be plenty of sites where Archaics are hetrozygous to the exclusion of Africans and Eurasians, or even hetrozygous at AMH hetro sites, but with a very different allele frequency. The problem is that we don't have any good arrays that are very well ascertained in Archaics. Even if we put together one from a high coverage genome, the problem is AMH may not be genotyped at those sites (except for some sequences at the Simons Human Diversity Project).

I have worked with the 110K Lazaridis Denisovan ascertained panel, but found that there was too much overlap in allele frequencies at African sites. I did not spend too much time on identifying Denisovan unique sites. That may be a possibility.

With regards to not having the sequence for the Neanderthal groups that actually admixed with AMH, that is not that big a deal. An analogy is even with Africans being very diverse, we are still able to identify Africans in general from Eurasians

"With regards to not having the sequence for the Neanderthal groups that actually admixed with AMH, that is not that big a deal. An analogy is even with Africans being very diverse, we are still able to identify Africans in general from Eurasians"

No doubt. But you are not trying to identify 0-5% admixture from groups that you have only a small amount of data from.

The available Neanderthal genomes are mostly low quality, with the exception of the Altai Neanderthal that clearly has AMH (African-like) admixture.

There must be a few thousand SNPs that originated in the lines leading to Neanderthals and/or Denisovans that will be 100% indicators of that ancestry at those positions.

However, we can only identify those that were present in the small number of sequenced genomes.

The other factor is (as you mentioned) that the enriched genotyping data is great for calling alleles on known heterozygous sites. But we can't know if those sites were also polymorphic in Neanderthals without more data. And we certainly can't know if those sites were polymorphic in the particular Neanderthals that admixed into the base of Crown Eurasians.

Only many more higher quality Ust-Ishim like genomes will be able to shed light upon that.

"No doubt. But you are not trying to identify 0-5% admixture from groups that you have only a small amount of data from."

The amount of admixture is not really the issue. We are able to identify such small amounts of African admixture in Eurasians, even just using a couple of African samples. It would be helpful if we had a few good coverage samples to get an idea of diversity at various Archaic loci, but keep in mind that even if we don't have genomes from the ones that admixed with AMH, Archaics, by virtue of their shared drift with each other, should have loci with a shared common "Archaic" allele frequency to the exclusion of AMH. Naturally at some loci, archaics will show varying allele frequencies amongst each other, but at other loci, archaics from various groups should have a common "archaic" allele frequency different from AMH, or perhaps at those sites AMH may have a MAF of 0 (fixed in AMH), or vice versa.

"The available Neanderthal genomes are mostly low quality, with the exception of the Altai Neanderthal that clearly has AMH (African-like) admixture."

I believe that I have seen a couple of decent coverage genomes (Denisovan and Neanderthal). The African-like admixture is not real, as it is a byproduct of ascertainment bias. The reason for the agreement at those loci between archaics and Africans to the exclusion of Eurasians is not gene flow from Africans to Altai or vice versa, but, as you mentioned, that Africans and Archaics agree at those sites because they are very likely sites ancestral to both AMH and Altai that have mutated in Eurasians only. Whole genome comparisons don't support the African-like admixture in Archaics.

Forgot to mention, you have a point about not having genomes from those archaics that admixed with AMH, not that we would not be able to identify general archaic geneflow to AMH, but rather that geneflow would likely be underestimated if we used distantly related archaics.

An analogy is if I had a bunch of Eurasian genomes and 1 African genome, I could identify % African in those Eurasians, but that would likely be an underestimate of African admixture for somewhat obvious reasons.

"Going forward, said Pinhasi, 'We're eager to study remains from the world's first civilizations, who succeeded the samples analyzed in the study. The people everyone reads about in history books are now within the reach of our genetic technology.'"

Harappa, Sumer, Kemet, Elam, Hittites, Minoan Crete, I'd guess.

In other news - http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.3621.html - "Genomic analysis of Andamanese provides insights into ancient human migration into Asia and adaptation"

"We show that all Asian and Pacific populations share a single origin and expansion out of Africa, contradicting an earlier proposal of two independent waves of migration. We also show that populations from South and Southeast Asia harbor a small proportion of ancestry from an unknown extinct hominin, and this ancestry is absent from Europeans and East Asians."

I don't know about single wave (given ANE findings in Lazaridis 2016). Is this unknown archaic ancestry the Denisovan related we know of, or something else?

Yes, those are good questions for someone to look into if they have the technical ability to do so.

This paper about Andamanese is quite relevant to all this. It clearly shows how archaic admixture pulls populations away from Africans. French and Sardinian (the 2 basal admixed pops in the study) indeed show the lowest levels of Neanderthal (just slightly). And also appear closer to Africans (just slightly too). Also it seems they have a slightly lower Denisovan admixture than the other populations (Supp Fig. 13), though here Papuan is clearly the outlier with high Denisovan (as expected). And then there's that mysterious 3rd archaic population that admixed into South Asians (but not East Asians - Han and Dai), and is tested indirectly by their lower relatedness to Africans. Here it's Australians that clearly seem to have more of it than South Asians.

The final effect of all the accumulated admixtures produces the highest stat like (Supp. Table 6):

If I understand it, I *think* from their TreeMix models (and I think I vaguely remember something similar in another paper) "Ancestral" is a kind of simulated population that has the ancestral states from which Neanderthal, Denisovan and Human are all derived.

That is, they used the chimp genome and genomes for Neanderthal, Denisovan and Human to simulate the last common ancestor of the whole Homo clade, rather than any specific archaic population that shows derived states of its own, or the chimp, which would show lots of derived and ancestral states that are totally irrelevant to Homo.

So far Davidski has published the ADMIXTURE data of only 9 pops. And already I find it hard to combine all the data. What is helpful for me is a datasheet which has the rows and columns ordered by dendrogram. This can be implemented with the R function 'heatmap'. See heatmap.png in my Dropbox: https://www.dropbox.com/sh/2wtnsgvq05zszop/AAD5QDuXZog3gANg8e6fdVrEa?dl=0
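For anyone who wants the same kind of dendrogram ordering outside R, here is a minimal Python sketch. It is not what R's 'heatmap' does internally: it assumes Euclidean distance and average linkage, and the function name `dendrogram_order` and the toy proportions are made up for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length rows of proportions."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def dendrogram_order(rows):
    """Return row indices in the leaf order of an average-linkage clustering.

    Repeatedly merges the two closest clusters, so rows with similar
    ancestry proportions end up next to each other in the output order.
    """
    if not rows:
        return []
    clusters = [[i] for i in range(len(rows))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(euclidean(rows[i], rows[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        a, b = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0]


# Hypothetical two-component proportions for four populations.
toy = [[0.9, 0.1], [0.1, 0.9], [0.85, 0.15], [0.15, 0.85]]
order = dendrogram_order(toy)
```

Applying the same function to the transposed table gives a column order, so both axes of the datasheet can be sorted this way.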

@ Davidski, I see, thanks for trying. If you have a chance, can you see if the model in the paper works, with outgroups as Ust_Ishim, Kostenki14, MA-1, Han, Onge, Papuan, and model populations as Mota, WHG, EHG? If those don't replicate then I must have misunderstood the section somehow. I can't see how putting Villabruna / UP Europeans / AG3 in as ancestors in place of WHG and EHG would break it though.

Which part do you take that from? Dstats in the form of West Asian/CHG Paniya Onge E Asian favor the Onge. I also don't see the West Eurasian part of South Indians being included in TreeMix, which would explain them looking like a separate branch. qpAdm also favors Onge, and pretty much rejects E Asian.

@ Davidski. Thanks. I suspect if there are reasons why the models with the same right and then AG3, Villabruna and UP European as left don't work, rather than just give UP European at 0, it may be because there's not enough information for qpAdm to distinguish between some of the left.

I guess other options to explore the impact of allowing the low ENA affinity, high Ust Ishim affinity UP Europeans in are to make more iterative changes to their model with the UP Europeans (e.g. keep left pops as is, but substitute WHG for Vestonice16 or GoyetQ116-1, or try swapping EHG for AG3) and then see what happens, but whether to explore this in detail is up to you, if and when you have time.

Yes, that's correct. This Ancestral is a simulated common ancestor of Homo and Chimp. It was used in the Mota paper too as an outgroup for estimating archaic admixture.

BTW, from the Mota paper there are a couple of East African populations that don't show any sign of Eurasian admixture: Sudanese and Anuak. In the absence of more African ancient DNA, they seem more relevant to test if Basal Eurasian is closer to Africans than using Yoruba or Mbuti. Though I'm not sure if those populations are publicly available.

From the Andamanese paper, another thing that caught my eye: While the stats in the form D(Loschbour, X; Andamanese, Yoruba) are significantly negative for South and East Asians as expected (Supp. Fig. 10), the ones in the form D(Mal'ta, X; Andamanese, Yoruba) are insignificantly positive (Supp. Fig. 28). At the same time D(Andamanese, Dai/Han; Mal'ta, Yoruba) is insignificantly positive too (Supp. Table 3). Not sure what to make out of it.
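For readers less used to the D(W, X; Y, Z) notation, here is a minimal Python sketch of the statistic computed from per-site allele frequencies, following the usual ABBA-BABA form. It is only an illustration: the function name and toy frequencies are made up, sign conventions differ between tools, and there is no block jackknife here, so it gives no Z-score and cannot say whether a value is "significant".

```python
def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from per-site allele frequencies in [0, 1].

    Under the convention used here, a positive value means W shares
    more alleles with Y (or X with Z), and a negative value the reverse.
    """
    numerator = 0.0
    denominator = 0.0
    for pw, px, py, pz in zip(w, x, y, z):
        numerator += (pw - px) * (py - pz)
        denominator += (pw + px - 2 * pw * px) * (py + pz - 2 * py * pz)
    if denominator == 0:
        raise ValueError("no informative sites")
    return numerator / denominator


# Toy frequencies at three sites (hypothetical, not from any dataset).
w = [1.0, 0.0, 1.0]
x = [0.0, 0.0, 1.0]
y = [1.0, 1.0, 0.0]
z = [0.0, 0.0, 0.0]
d_wx = d_statistic(w, x, y, z)  # W looks closer to Y at the one informative site
d_xw = d_statistic(x, w, y, z)  # swapping W and X flips the sign
```

In practice the standard error comes from a block jackknife over chunks of the genome, which is what turns a raw D into the "significantly negative" or "insignificantly positive" calls quoted above.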

"We found that the relatively homogeneous population seen across western Eurasia today, including Europe and the Near East, used to be a highly substructured collection of people who were as different from one another as present-day Europeans are from East Asians," said David Reich, comparing apples to oranges".

It's rather obvious that the area between the Black Sea, the Med and the Persian Gulf has been a sink, rather than a source, of the patrilinear cultures that grew into the ancient civilisations.

That they all used domesticated plants and animals is well known. It's also established that the respective populations were growing out of patrilinear dynasties - forming "extended families" ("ethnicities") - as they grew in numbers and built the first, large civilisations.

What this study establishes is that the various branches of the paleolithic genome - forming these ethnicities - have developed various speciations (sub-groups) of the plants and animals used domestically.

Today we also know that domestication of both plants and animals had already started when settlements like Aurignac, Kostenki, Madeleine, Maglemose, Malta, Mladec, Solutre and Sunghir existed.

Moreover we know that (all) these populations were very homogeneous, sharing 'close genetic ties' across Europe, from Spain to Siberia.

Moreover we know that ALL of the known paleolithic sites went extinct, during the Last Glacial Terminus. This started with the LGM at 23.000 - 18.000 BP and peaked during The Younger Dryas, 12.900-12.100 yrs BP.

It was during the very last period that the last populations of larger species of land-animals disappeared, some 45 of them into extinction.

The same 'evolution' can be spotted among the remains of the arctic humans that populated arctic Eurasia - where a number survived the LGM, only to disappear during the 'extinction event' known as the Younger Dryas.

This implies that the specific haplotypes from paleolithic Eurasia are basically extinct, too. Except for a small group or two who happened to live in a climatic refugium, where they could survive the mass-extinction of the Younger Dryas.

We still don't know where this refugium (or these refugia) was located, but Pinhasi et al have already reported (2014) that only "small groups" of people had survived the devastating end of the Eurasian ice-age.

Since the results from Ust-Ishim, Kostenki and Malta-Buret we have also known that ALL later Eurasians - including the arctic Hunters as well as the arctic Gatherers, Trappers, Fishers, Foragers, Gardeners, Herders, Dog-breeders, Goat-breeders, Horse-breeders and Cattle-breeders alike - have a common ancestry from a (small) group of SURVIVORS from the European Paleolithic.

Consequently we may explain the occurrence of the 'Anatomically Modern Caucasian' as a result of an ice-time refugium - where the specific traits and phenotypes necessary to survive in the arctic hemisphere COULD develop.

Moreover - we may find that the spread of the later, caucasian y-lines - defined by an ancestral CT/CF-macrogroup - is the sole origin of the y-lines that made the respective dynasties that ruled the first, known civilisations - such as hg G, H, I, J, K.

Obviously there's a mutation of K2, forming R1a/b, that can be linked to the spread of Cattle-breeders, which seems to be the somewhat limited understanding of "farming", as the term is used by modern academics.

Thus it's good to see that Pinhasi et al have looked a bit deeper into these factors, finally substantiating what most of us already knew - that VARIOUS ethnicities developed a v-a-r-i-e-t-y of produce, each adapted to regional biotopes and climates. Which explains why there are agricultural societies forming already some 10.000 yrs ago - not only in Central America and China, but even in the arctic and semi-arctic regions of northern India, Anatolia, northern Africa and Europe.

An effective spread of the arctic Caucasians was possible only after the Younger Dryas and the end of the Eurasian Ice-time. That may explain how the initial, "leap-frog" spread of both agriculture and the I-E languages can be linked to the older y-dna-lines of G, H, I and J. As their cousin-lines of K2/R1 were able to develop an infantilised ability to digest milk and dairy, we got the later spread of "livestock-agriculture", where the massively effective milking-cows start to roam all plains and lowlands.

Thanks to a common, post-glacial ancestor-group ("Noah") - the homogeneity between the various caucasian populations of mesolithic Eurasia was also very high. Thus the various branches from "Noah" (C/CF) - forming today's 'brother-lines' of G, H, I, J, K+ (etc.) - became distinctively separate family-lines already at the beginning of the Holocene, as soon as the repopulation of arctic Eurasia COULD start.

Finally - since the various y-dna lines from Paleolithic Eurasia went extinct as late as the Younger Dryas, we may have to look for a Last Common Ancestor to the e-x-t-a-n-t y-lines (from CF) as a result of the last 12.000 years - only.

Interesting facets:
- Invariably some small level of the Levant / Natufian cluster in Mandenka, even where IIRC the old SW Asian clusters never occurred there.
- Zoroastrian cluster seems to contain Euro_HG, and not just as a by-product of containing Yamnaya cluster. Not sure why.
- Specific Levant/Natufian cluster absent from ancient Europeans always shows up in modern Southern Europe (except Basque).
- Iran_N always takes CHG cluster plus South_Asian cluster. Possibly not enough power to make its own component.

It's Admixture. It's not very reliable for differentiating ancient mix. It tends to load on groups with lots of samples, such as the Zoroastrians. I can check tomorrow, but I doubt they're very significantly Iranian over Levantine.

Genetiker is a schmuck. He probably hasn't cottoned onto the fact yet that some of these Zoroastrians need to be removed from ADMIXTURE runs because they show inflated pairwise IBD sharing, so they form their own cluster at high K, and pull a lot of other South/Southwest Asians into it.

Btw, the Kusunda from the Human Origins dataset are really mixed. They pretty much have everything that exists in South Asia. Hard to make sense of them.

"Genetiker is a schmuck". As Genetiker is very likely a German, and "Schmuck" in German means "gioiello" in Italian, and if we put together the two metaphors (Gioiello too is a "schmuck")... I don't know if you know all the tens of languages I know, but...
1) I think that Genetiker's results about SNPs are reliable, above all about Villabruna, and that against Sergey Malyshev
2) I don't give a dime to the autosome, I leave it to the Ptolemaics

Davidski, we all are waiting for this paper, because from a scientific point of view the last word has to be given to "proofs", but the only serious theory against mine is that of batman's (vespertilius, who as Hegel's noctule doesn't like Mediterranean light and naked bodies), i.e. that there was one only refuge. We'll see whether in the cold North or in the sunny South.

I have always left open the possibility that everything might have come from the North, just when I said that R1a-M420 did come from North-Western Europe, and the close link of IE with Uralic languages should make us think of the North, and also that R-L11* seems to have expanded from the Baltic with the German peoples' migration to South and Western Europe, even though it seems very unlikely to me that some R1 (b or a) were there 14000 years ago. As usual I am basing myself on a single Y: I am testing c/o www.yseq.net the J1* of an adopted man who is wandering through Europe in search of his true father, who seems an Italian, I think from Abruzzi or nearby. His Y separated from the subclades about 15000 years ago... I expect that also J will be found in Italy, older than Satsurblia and all the rest.

Davidski: He probably hasn't cottoned onto the fact yet that some of these Zoroastrians need to be removed from ADMIXTURE runs because they show inflated pairwise IBD sharing, so they form their own cluster at high K, and pull a lot of other South/Southwest Asians into it.

Probably not, and he doesn't realise, at very high K, though in this case I think I misspoke, and to be clear the Zoroastrians don't form a component in any of his K11-K13 runs. I should have said "Zoroastrian population samples seem to contain Euro_HG, and not just as a by-product of containing Yamnaya cluster. Not sure why." not Zoroastrian cluster (which there isn't).

@ Ryu, thanks for the comments. Re: Sardinians, I think it depends on which mainland South Europe population they are being compared to, as it's different for South Italian / Spanish.

Lol, The Batman. Not only does he think IE came from the ice age west Baltic, but it seems like he's suggesting that all humanity came from a surviving group in the Baltic after the YD, so 11kya, although he freely admits that he doesn't know the first thing about genetics, such as directions of gene flow.

It'll also be good when his Gio shuts up about everything coming from his granddad's backyard in Tuscany because he "proved" it in, you know, all those thousands of letters he wrote based on STRs, but he was blocked because there is a conspiracy against his genius, like Copernicus when he pointed out the world was round.

Yes, it will be interesting to see what yDNA the early Iberian Bell Beakers had. I think some people here are suggesting they had R1b. And that this had migrated somehow along the Mediterranean from West Asia (or Italy). But this is rather unlikely, because we already know samples from northern Iberia dated to 2900 - 2600 BC, and they were predominantly I2a, complemented with some G2a, I and H. One sample from El Mirador, I1277, was carrying I2a even as late as 2500-2400 BC. And one of the Remedello samples, RISE486, also postdates the Bell Beaker age, he lived after 2134 BC, yet he had haplogroup I. While already at 2500 BC Bell Beakers from Germany were purely R1b. I also point out that from early Bronze Age Armenia 2600-2500 BC we have only one R1b, and that wasn't M269, and from chalcolithic Armenia we have several L1a, but no R1b. So it's clear on which thesis I'm going to put my money...

@ Davidski
"I think the new paper will confirm the steppe hypothesis, with both R1a-M417 and R1b-M269 coming from Eastern European Hunter-Gatherers via Khvalynsk, Repin and/or Sredny Stog".

Davidski, this is the first serious thing you have said since I've known you. So reasons a Copernican: he makes a theory and looks for proofs to prove or disprove it, even though Koeppernick was very likely closer to genetiker than you.

@ CroMagnon

You who believe descending from an extinct species, and perhaps coming from Holy Spirit, Villabruna was a slap of mine, confess, but it was only the first... este paratus!

It's also noteworthy that modern Iberians have non-negligible, quite substantial LNBA European admixture that is even visible to the naked eye. And this can't be all from Visigoths and Suebi. If Celtic had spread from Iberia, with direct roots in West Asia, then the modern picture would probably look very different.

MDLP World-22 averages and a TreeMix on the components. First of all, it's obvious that Kusunda are endogamous and get a "Tibetan" component minimized in all other populations because of that first and foremost. TreeMix helps in checking the affinities of said component, once you check where the components peak from the sheet. Note that "Austronesian" is clearly misnamed and should be Australian.

Don't forget the theory which argues that early Iberian BB was steppic from the outset. I.e. steppe colonists somehow reached Iberia already by 3200 BC. It'll be interesting to see what pans out, but I agree with you that we're looking at a culmination of gene flows from south (=Iberian maritime beakers) and east (single graves, etc). I think Ryu's tests were a teaser but indicative of the significant complexity we should expect, esp once we get samples from several Beaker regions.

Yeah, that's true. But seems kind of hard to believe that Yamnaya herders went to coastal Portugal around 2900 BC.

@all

Of course you could argue that El Mirador was the wrong culture, i.e. not Bell Beaker at all, and moreover in an area where non-IE Basque may have survived for a long time. We know e.g. from the steppe that the culture matters, and even neighbouring cultures may have had very different haplogroup profiles. And you could also specially plead that Northern Italy was only culturally affected by Bell Beakers, but not much by immigrants, and hence explain the late Remedello sample. And the steppe admixture in modern Iberians could be explained as a Celtic back-migration. We've seen a back-migration on the steppe too, so why not in Iberia? These arguments are all possible, but not very parsimonious at the moment. I think they'll only be justified if new data forces us to adopt them.

The European LNBA input in Iberia is about 50-60%, and the R1b was most probably introduced by them. The Bell Beakers of Iberia were probably carrying typical neolithic haplogroups, such as I2, G2a, etc.

Addendum to prevent all misunderstandings: What I've just been suggesting was that Celtic and R1b might have back-migrated to Iberia at a later date in the Bronze or Iron age thereby freshly introducing the LNBA ancestry there. That's what I consider the less parsimonious theory atm.

R1b-P312 going to Iberia then back east is an unnecessary complication. It went one direction East>West.

I'm most interested in DNA from Bronze age SouthEast Europe. IMO, we'll see gene flow from Northern West Asia and Northern Europe. By the time Greeks began writing they were very mixed by Bronze age standards.

Anyway, without a "super paper" on bronze age samples this is what "I know" as I suppose anyone who reads anthropology also should “see”. So, this “super Bronze age” paper is going to disprove me on R1b. Ok

a. R1b M269 was in the southern Caucasus north of the Kura river, so between the Lesser and the Greater Caucasus, and also in the basin of the Araxes river in Aratashen (from 6000 BC to 4900 BC - see how precise the dates are!). The culture is today called Aratashen-Shulaveri-Shomu, because it existed in both pockets (oases), separated by the Lesser Caucasus Mountains.
b. Aratashen got kicked out first. Aratashen is near Lake Sevan, where that Bronze Age R1b (P25?) was found. So by 5300 BC they were kicked out by the Ophidians (snake people) coming from beyond the Zagros mountains (a couple of centuries earlier), so the source people of Ubaid, Uruk and whatever was left after the period 5500 BC to 4500 BC. Anyway, Aratashen fell first, at the same time as the Halaf was transitioning to Ubaid. Maybe, just maybe, these were the M269 (xL23).
c. By 4900 BC, Mentesh Tepe fell (that is north of the Lesser Caucasus mountains, where the Shulaveri lived). That is where R1b-M269 lived. That is the point where L23 arose. Some had it, some didn't. Pushed by the Ophidian people (Ubaid/Uruk) coming from the east/south, they were pushed north.
d. By 4800 BC, some were near the shores of the Black Sea. Some were in Anaseuli (near the Black Sea), others, with R1b-L23, near Kvachara - where you cross from the southern Caucasus to the northern Caucasus - and later part of Maykop territory. I believe the M269 (xL23) were in Anaseuli and moved south back to Anatolia (where the highest variance of R1b still exists) to coalesce and then immediately south to … that is the rest of my theory.
e. The ones moving north near the Black Sea (also with L23) were in Nalchik by, say… 4800/4700 BC. And kept moving north to Samara, Khvalynsk, Sredny Stog, etc.

So, just hope your super Bronze Age paper is not talking about R1b-M269-L23 in … "I think the new paper will confirm the steppe hypothesis, with both R1a-M417 and R1b-M269 coming from Eastern European Hunter-Gatherers via Khvalynsk, Repin and/or Sredny Stog." … millennia later, because the right answer to that would be… No shit, Sherlock!

Just hope that super Bronze Age paper will give us Bronze Age L51, L11 in Sredny Stog and eastern Europe at least by the Bronze Age. Otherwise it's a double “no shit Sherlock”.

What everyone should be trying to figure out is who the Ophidian people were and where they came from: the people who during the 6th millennium BC moved to northwestern Iran, then to north Mesopotamia, to Iraq, Syria, and even up to parts of Anatolia. They all fell fast. Shulaveri (both at Shomutepe and in Aratashen), the Halaf, etc. That is the admixture you see and we have been talking about these last weeks. They admixed with everyone and made the first civilizations… and scared the living shit out of everyone with their deformed heads and face paintings to look like snakes, so much that a whole lot of people ran and kept running until they couldn’t see them no more.

The VCF 4.2 files are preferable for running an analysis, since they have calls for all the SNPs on their respective chips, and the VCF 4.2 "DRP" field contains all the high-quality read counts for each allele too. (There should be "no reads" for those SNPs that have no reads, so you can distinguish a "homozygous identical with Build37" result from a true "no call".)
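To make the "homozygous reference vs. true no call" distinction concrete, here is a minimal Python sketch that classifies one sample's genotype on a single VCF data line. It relies only on the standard GT sub-field (where "." marks a missing allele). The layout of the "DRP" field mentioned above isn't given here, so it is deliberately not parsed, and the function name and example lines are made up for illustration.

```python
def classify_call(vcf_line, sample_index=0):
    """Classify one sample's genotype on a tab-separated VCF data line.

    Returns "no-call", "hom-ref", "hom-alt" or "het", based only on GT.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")            # 9th column is FORMAT
    sample_values = fields[9 + sample_index].split(":")
    sample = dict(zip(format_keys, sample_values))

    genotype = sample.get("GT", ".")
    alleles = genotype.replace("|", "/").split("/")
    if any(allele == "." for allele in alleles):
        return "no-call"      # nothing was called at this site
    if all(allele == "0" for allele in alleles):
        return "hom-ref"      # matches the reference build
    if len(set(alleles)) == 1:
        return "hom-alt"
    return "het"


# Hypothetical data lines: same site, one called as reference, one missing.
called = "\t".join(["1", "1000", "rs1", "A", "G", ".", "PASS", ".", "GT", "0/0"])
missing = "\t".join(["1", "1000", "rs1", "A", "G", ".", "PASS", ".", "GT", "./."])
results = (classify_call(called), classify_call(missing))
```

With low-coverage ancient samples it's usually worth checking the read counts as well before trusting a "hom-ref" call, which is where a per-allele read count field would come in.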

WC1 is a whole genome sequence (not from a capture array) at 10.42x coverage with 622,993,765 raw read pairs, extremely high coverage for an ancient sample. Between these two datasets we should be able to get excellent IBD between WC1 and other ancient samples, and all sets of modern individuals too.

Most of you may not realize it, but WC1 is a complete mystery. It's from a femur found in Wezmeh Cave in the Central Zagros, 7455-7082 calBCE (9465-9092 BP). The femur was dragged into this cave in a cliff by an animal, apparently from an exposed grave somewhere else.

WC1 was a full-on grain farmer, whose diet consisted mostly of grain. He possibly also ate domesticated goats and cattle (and dairy?). There's no evidence that he hunted wild animals like gazelle at all. This is based on an isotope analysis of the collagen of the bone. There are very few radiocarbon dated Pre-Pottery Neolithic sites in Iran at this time. There are even fewer excavated sites, because most of these were from surface finds.

The contemporary site of Tepe Guran in particular is only 33 miles / 53 km southeast of Wezmeh Cave.

This period in the Central-Northern Zagros of Iran is called the "ACN" or Aceramic Neolithic. It's contemporary with the Middle PPNB and Late PPNB in Northern Syria, Southeast Anatolia, and Northern Iraq, and also with the ECA I (Early Central Anatolian Neolithic I) of Çatalhöyük. It predates the Pottery Neolithic Jarmo Culture of the northern and central Zagros by several hundred years.

I think it's very clear that the ancestors of WC1 recently arrived in the Central Zagros from somewhere else with their wheat and cattle. Where did they come from? The PPNB region of Southeast Anatolia?

If Iran_IA is the baseline for Iranic ancestry in modern western Iranics, then Iranic ancestry should be very low, less than 15%. The non-Iran_IA part of the ancestry of modern West Iranians is most similar to modern Pamiri Tajiks. I don't think this surprises anyone though.

Part of the steppe ancestry in modern West Iranians predates the proto West Iranics (probably total steppe ancestry is over 20%). And it's probably from groups like Hurrians et al; even though Hurrians et al didn't speak an Indo-European language, their religion and culture have IE influence all over.

Iran_IA looks like Armenia_EBA (Kura-Araxes) on 2D PCA, but its admixture profile, although not by much, is still different. It shows less steppe, more Iran_N and Levant type admixture, though this should be expected judging by its southern location and more recent date. On 3D PCA, modern West Iranics are on a cline running through modern South-Central Asians to Iran_IA.

I also checked Iran_recent, she's most similar to Feyli Kurds, has slightly more Pamiri Tajik like ancestry than modern West Iranians.

It's true that the state of research into the earliest Neolithic in Iran is very poor. It does however look like the early Ganj Dareh people were only nomadic goat herders, not settled farmers. The Ganj Dareh people were "almost" nomadic hunter-gatherers who "herd managed" wild goats.

Of course, the Zagros grain farmers and cattle herders were not directly derived from the Anatolian Neolithic people. We do know that the tMRCA of G2-P287 will be just after the LGM, around 18,200 ybp, not before it. There must be some sort of connection between the two groups that post-dates the LGM. It seems likely that both groups originate in the Middle Euphrates region, and migrate in opposite directions admixing with the local hunter-gatherers. The Anatolian Neolithic people are all in various branches of Y haplogroup G2a2-L1259 with a tMRCA of 16,800 ybp, while the Neolithic Iranians are in G2b2a-Z8022 (WC1) and G2a1a-Z6553 (SG2/I1671). (There's also a Chalcolithic sample from Seh Gabi, I1674/SG21, in G1a1b-GG362/Z3189.) Even leaving out the G1a, the tMRCA between all the G2-P287 samples is 20,800 ybp, during the LGM.

This certainly is in the Upper Paleolithic, but they are not separated by 45,000 years as the authors of the various studies claim. The Anatolian Neolithic Y results indicate that the G2a2-L1259 lineages had been co-migrating together since 16,800 ybp, and they weren't too far off before that from the other G2-P287s.

The paradox is, where is the shared ancestry that must exist between these two groups that is at least 25,000 years later than the claimed earliest shared ancestry? Does this have something to do with the high percentage of so-called "Basal Eurasian" found in both groups, a kind of IBD that may have been lacking in the Iranian Hunter-Gatherers such as the sample from Hotu Cave?

@rob, Hummm, so prior to being able to extract DNA from inhumations we knew nothing about history? So even theories like steppe PIE had no business being put forward prior to DNA extraction? Or,

Can you point us to genetic data that make us believe that pharaonic Egypt stems from the movement of people from the upper Nile onto the lower Nile and the mixing of both? As far as I know there is no genetic data to support this assumption. So every ancient Egypt expert is an idiot because there is no DNA to support their claims?

You're deflecting the question by false analogy. The expansion of Pharaonic Egypt is documented archaeologically, and makes no bold or specific calls about genetic lineages coming from a specifically defined geography, as you do, hence my question about specific evidence, esp. in light of how "obvious" you keep claiming it to be and the "evidence" you mentioned but did not elaborate upon or cite.

Moreover, your theory reads more like a cartoon than one developed from a proper evaluation of evidence & rooted in deep methodological understanding. Some apparent parallels between Zambujal and Shulaveri enclosures, albeit they're separated by 2 millennia with no bridging? Essentially, the snake people chased everyone away - that's the crux of your theory?

@open genomes, my problem is this. Apart from the growing evidence stemming from goats, cattle, cereals and so forth to support your previous comment, what we know is that a leveling and admixing force came to the Caucasus, Mesopotamia, even part of Anatolia, from 5500 BC to 4500 BC and changed the region. Halaf to Ubaid, Ubaid, Uruk, end of Shulaveri, etc. It was the snake people, and it first shows up in northwestern Iran, near Zagros. So, it could have been a local event then. Is there any indication of "foreign DNA" from east or southeast of Mesopotamia/Caucasus/Anatolia that was added to the mix and shows up in Chalcolithic populations there? What is it with the L1a guys found in the Caucasus?

G2 in Anatolians can be from Iranian-like admixture. Remember, Lazaridis had Anatolians as about equal parts Iranian, Levantine, and WHG. The Levantine is 2/3 Natufian and 1/3 Anatolian-like, meaning Anatolians have been modeled as a decent amount more Iranian than Natufian. So, Anatolians may be something like 25% Natufian, 40% Iranian, 35% WHG. The G2s may have only recently gone separate ways, even after the Older Dryas.
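Taking the "equal parts" figures in the comment above literally, the circular definition (Anatolian is partly Levantine, Levantine is partly Anatolian-like) can be resolved with a little algebra. This is only a minimal sketch: the exact thirds are an assumed simplification, not the fitted values from the paper, and the variable names are made up for illustration.

```python
from fractions import Fraction

# Assumed inputs (a literal reading of "about equal parts"):
#   Anatolian = 1/3 Iranian + 1/3 Levantine + 1/3 WHG
#   Levantine = 2/3 Natufian + 1/3 Anatolian-like
third = Fraction(1, 3)
anatolian = {"Iranian": third, "Levantine": third, "WHG": third}
levantine = {"Natufian": Fraction(2, 3), "Anatolian": third}

# Substitute Levantine into Anatolian and solve for A:
#   A = 1/3 I + 1/3 W + 1/3 (2/3 N + 1/3 A)
#   A * (1 - 1/9) = 1/3 I + 1/3 W + 2/9 N
loop = anatolian["Levantine"] * levantine["Anatolian"]  # the 1/9 that points back at A
scale = 1 / (1 - loop)                                  # 9/8

resolved = {
    "Natufian": anatolian["Levantine"] * levantine["Natufian"] * scale,
    "Iranian": anatolian["Iranian"] * scale,
    "WHG": anatolian["WHG"] * scale,
}

for name, share in resolved.items():
    print(f"{name}: {float(share):.1%}")
```

Under those assumed thirds this works out to 25% Natufian, 37.5% Iranian and 37.5% WHG, so the 25/40/35 guess is in the same range.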

@ rob and friends. I do not seek your validation, right? So, I couldn't really care about what you think... But let me just put a challenge to you guys. Just yesterday, this, a copper awl, was found in an old pit at Perdigoes, and you know how important that is to my thesis. So probably near 3000 BC. See the pic: http://perdigoes2011.blogspot.pt/2016/07/0168-dia-9-2-fase.html

Now, this is a picture of the oldest copper awl found in the Middle East, from around 5000 BC (when the Shulaveri were on the run). This copper awl has had its origins traced to Arukhlo, the heartland of Shulaveri itself, and was not local! See the awl: http://www.techtimes.com/articles/13864/20140824/scientists-unearth-7-000-year-old-copper-awl-oldest-metal-object-in-middle-east.htm

Now, I do not have a pic, but a similar copper awl was found in Maadi on the Lower Nile (so in the Merimde region), so... 3800 BC?

Easy. Go find similar copper awls along a different route in the Copper Age and show us here... Because there are similar ones in the North Caucasus... coming from South Caucasus ore. So show the spread by steppe and so forth.

Oh, Bell Beakers in Iberia had ivory. But oddly enough the ivory was from "Savannah elephant", not even the "jungle elephant" that existed in the Copper Age in North Africa. See, even in east Iberia, all elephant ivory, as in east Mediterranean sites for that matter, was from "Asian elephant"... Do you know where similar ivory was found? In ancient Egypt, at Heliopolis near the Nile Delta... yeah, right.

I could go on forever...

Well, if R1b didn't spread with BB from Iberia, then that is a different story...

I do agree that they are likely more Natufian than Iran_EN, but check this out. I don't think the Iranian is so minor, but not far from Natufian.

Depending on what is closer to the WHG in Anatolians, we see that the numbers aren't very significant. Hungary_HG, as we know, does look closer to Anatolians, but that might be because of possible minor BE and some ANE, which is likely in Anatolians with CHG/Iranian influence. If the hunter that admixed into Anatolians is more like Loschbour or Villabruna, then they aren't that much more Natufian than Iranian.

The two should share a good amount of drift if they're both a mix of Natufian, WHG, and Iranian. The stat above, Mbuti Iran Levant Anatolian, shows that the Levant + Iran + WHG is valid. Even with more BE, Levant is further away, so Anatolia must have a good amount of Iranian. Considering that Anatolia has significant WHG, which is much closer to Natufian and Levantine than to Iranian, I would think this confirms that Iranian into Anatolia must be comparable to WHG, just as Natufian Anatolia WHG Iran confirms.

Nah, Iran_N has a lot of ANE. That qpGraph in the last paper showing WC1 as 60/40 Basal/ANE isn't far from the truth.

Anatolia_N has no ANE that it didn't get through the Villabruna stuff. So it can't have any Iran_N.

Like I say, Anatolia_N and Iran_N do share ancestry, and some of the Anatolia_N individuals have clear CHG (Iran_N-related) admixture. But the idea that Anatolia_N is significantly Iran_N is just plain stupid, and I can tell you it will eventually be corrected.

Some Anatolia_N individuals have some CHG or another type of Iran_N related admixture.

But Anatolia_N by and large can't have a lot of Iran_N ancestry, because Iran_N is rich in ANE, while Anatolia_N lacks it totally.

I can't see any way around this by positing the existence of a ghost hunter-gatherer group. There's simply a problem with the model, possibly related to having Levant_N as an unadmixed pop with no Villabruna-related input, when clearly it has such input.

I'm not saying Anatolia_N doesn't have MA1-related ancestry. What I'm saying is that this didn't come from Iran_N admixture, and most of it is contained within its Villabruna-related ancestry, apart for a few Anatolia_N samples that clearly have CHG or some other Iran_N related admixture.

This is what non-Iran_IA part of the modern West Iranics ancestry looks like:

36.36 Potapovka_I0419
13.03 Andronovo_SG_RISE505
2.73 Sintashta_MBA_RISE395
29.70 Iran_ChL_I1661
5.45 Pima
11.21 Paniya
1.21 Koryak
0.30 Nganasan

About Sardinia: I actually saw some of the Nuraghes head statues last year, but don't know too much about them, indeed they're a mystery to historians. I suspect that they're just Neolithic descendants.

But apparently the Nuraghics are sometimes counted amongst the "sea peoples", so perhaps trans-Mediterranean contact is to be expected. More pertinently, Sardinia was colonized by Phoenicians, becoming a Carthaginian colony until the Romans came. After Rome, it never fell to Arabs / Berbers.

* About Balkans: It was immediately obvious after the Satsurblia genomes came out and Dave did his new CHG K8 that extra CHG was required into Europe, probably via an Anatolia - Balkan route. It makes sense archaeologically. I remember that only up until a few weeks ago Sam attempted to vehemently deny such a movement, but others (eg Roy) thought it should be placed in the 2200 BC period ("bronze age collapse"). I always maintained it could/should have begun as early as c. 4000 BC, after the copper age collapse.

So maybe we should expect Iranian - Anatolian like stuff in *some* (eg lower Thrace), but perhaps not all Balkan Bronze Age samples (eg further north toward Hungary), as it probably penetrated Europe more slowly compared to the Eastern European steppe-like mixture, or was initially limited to Southern Europe. I suspect there were even 2 differential types and streams: an island Cypro-Minoan type and an inland Anatolian-Thracian type. Of course, we might be looking at an additional Anatolia -> SEE movement in 2200 BC, also.

The Croatian paper you're after is by Pinhasi. Abstract only: http://bib.irb.hr/prikazi-rad?rad=8246

Quite the opposite! But it depends on the region. In Anatolia, it seems there occurred an "exhibit readjustment" with relative continuity. But in Greece, there appear to have been fresh migrations from Anatolia, maintained by archaeologists from the 60s to this day. A second, smaller movement appears to have come from the NW Balkans, from the Cetina culture area.

I agree with you that Hurrians should also have spread some Steppe-like ancestry. After all, they were neighbours of IE and they definitely have IE influence, both linguistic and religious (Teshub). But on the other side, from whom does the Southern shift of modern Armenians come? It is impossible that this shift was mediated solely by 'Semitic' influence. After all, we haven't seen any J2a, J1*, G2a in North Near East BA / IA context. So I still expect substructures in the North Near East mountains. A lot of people should be there without any Steppe, just a mixture of Iran Chl and Anatolia Chl. Plus some Levant with loads of J and G.

We should have the means to do at least some comparisons like they did with South Asians to evaluate the algorithm's performance in other locations - comparing it to Broushaki's modeling of moderns as ancient West Eurasians + mota/yoruba/han. With the obvious limitations, won't work in South/Southeast Asia or deeper in Africa because of a lack of ancients and can't measure something like direct steppe ancestry because there's no high coverage sample.

But let's consider Karitiana. In Lazaridis merge it's 5.22% Ust-Ishim, 9.46% Han and 85.31% self copy. In the Busby merge it's 36.45% Ust-Ishim 60.81% Han and 2.74% self copy. In this case it's obvious that U-I is standing in for the bulk of its ANE, though Han can contain a bit, and the self-copy was just drift - same thing resulting in the Native American/Amazonian component in ADMIXTURE. The problem is, Ust-Ishim isn't always standing in for ANE - Papuans have 50% in both sets and it means something very different there.
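One way to check the "self-copy was just drift" reading is to drop the self-copy share from each Karitiana result above and rescale what is left. A rough sketch, using only the percentages quoted in this comment; the function name `without_self` is made up for illustration and this is not how Chromopainter itself reports anything.

```python
# Karitiana copying proportions as quoted above (percent).
lazaridis = {"Ust-Ishim": 5.22, "Han": 9.46, "self": 85.31}
busby = {"Ust-Ishim": 36.45, "Han": 60.81, "self": 2.74}

def without_self(props):
    """Drop the self-copy share and rescale the remaining donors to 100%."""
    donors = {name: value for name, value in props.items() if name != "self"}
    total = sum(donors.values())
    return {name: 100 * value / total for name, value in donors.items()}

for label, props in (("Lazaridis merge", lazaridis), ("Busby merge", busby)):
    rescaled = without_self(props)
    summary = ", ".join(f"{name} {value:.1f}%" for name, value in rescaled.items())
    print(f"{label}: {summary}")
```

With self-copy removed, the two merges land within about two percentage points of each other (roughly 36/64 versus 37/63 Ust-Ishim/Han), which fits the idea that the big self-copy share in the Lazaridis merge is mostly drift.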

Extra WHG wouldn't create the affinity to Iran, vs Natufians or Levantines. It is significantly closer to Levantines than Iran. Iran isn't even significantly closer to ANE than WHG. If there was no Iran or very minor Iranian in Anatolians we wouldn't see this. Even Levant EN is around Z>3 closer to Iran than Natufians. Then, Anatolia is closer yet, with substantially less BE. This can only be explained by more Iranian than Levantines have. Again, if it were only WHG, Anatolians would be significantly further from Iran than Levantines, and not significantly closer. It's got to be a good amount to completely offset the extra WHG. It may not be 40%, but it's definitely not just a little Iran/CHG in a few Anatolians.
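For anyone not used to reading these stats, here is a bare-bones sketch of how a D-statistic of the form D(Outgroup, Iran; Levant, Anatolia) is computed from allele frequencies, following the usual Patterson-style definition. The frequencies below are toy numbers invented purely to exercise the function, not real genotype data, and the Z-scores quoted in this thread additionally need a block jackknife across the genome (as done by tools like qpDstat), which is not shown here.

```python
def d_stat(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies (equal-length lists)."""
    num = 0.0
    den = 0.0
    for pw, px, py, pz in zip(w, x, y, z):
        num += (pw - px) * (py - pz)
        den += (pw + px - 2 * pw * px) * (py + pz - 2 * py * pz)
    return num / den

# Toy allele frequencies at four SNPs (made up for illustration only).
outgroup = [0.1, 0.1, 0.1, 0.1]
iran     = [0.9, 0.8, 0.2, 0.1]
levant   = [0.2, 0.3, 0.3, 0.2]
anatolia = [0.6, 0.5, 0.2, 0.2]

# With W as an outgroup, D > 0 means X shares more alleles with Z than with Y.
d = d_stat(outgroup, iran, levant, anatolia)
print(f"D(Outgroup, Iran; Levant, Anatolia) = {d:+.3f}")
```

Swapping the last two populations flips the sign, which is why the order of names in a stat matters when comparing results between commenters.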

David, I'm converting the files to 23andMe format, including the SNPs that are the same as Build37. From there I can create plink files. The problem is that the tools either create VCF 4.2 files that aren't supported by plink (even plink 1.90) or VCF 4.1 files that only show differences from Build37. The files I'll create will also list the alleles for each set (Human Origins Array and OmniExpress) that were read but agree with the Reference Sequence too.

@ Ryu, yes, while the Chromopainter analysis may *somehow* contain more information, it's to me at least quite obscure about what they actually mean, and how to trace back to the actual model.

I haven't read through "A tutorial on how (not) to over-interpret STRUCTURE/ADMIXTURE bar plots" very much, though I have the following comments:

Looking at their K=11 ADMIXTURE simulations, I'm a bit circumspect about their comment that

"Note that these simulations were performed with 12 populations but only results for the four most relevant populations are shown."

Presumably these are other simulated pure populations, but I do think it makes the figure a little more obscure to not include them.

The objections about indistinguishable ADMIXTURE plots are reasonable, but at that stage you would incorporate more information, e.g. in the B Ghost scenario P2 would be poorly fitted while not in A, and in the C recent bottleneck scenario, P1 and P2 would both form an exact clade to outgroups (via fst, D stats, etc.), which you'd not expect if P2 was truly admixed.

In fairness I do think they get into this (though not having read the full paper), at least for the goodness of fit (they could do with discussing cladistics as well). Hence "use [of algorithms like STRUCTURE] should represent the beginning of a detailed demographic analysis, not the end".

I'm also a bit critical about their comment that "This exercise is relevant in particular because human history is in fact full of episodes in which groups such as the Bantu and the Han have used technological, cultural or military advantage or virgin territory to multiply until they make up a substantial fraction of the world’s population. The history of the world told by STRUCTURE or ADMIXTURE is thus a tale that is skewed towards populations that have grown from small numbers of founders, with the bottlenecks that that implies."

While theoretically possible, I would qualify this that I don't think they actually *know* that groups like the Bantu / Han have gone through sharp bottlenecks / founder effects that have an effect in ADMIXTURE following the invention of new technologies.

IIRC, the group we have an example of as early agriculturalists with a decent enough genome quality to recreate population size - the Anatolia Neolithic - specifically seemingly did *not* go through a bottleneck at that time and has a more relaxed population history than many others. That may be generally true of most of the populations who have been "winners" in our history.

So this to me feels very much an assumption and they could stand to be more explicit that it is. It's not enough to support this assumption with vague notions about injustice due to favouring the winners; they should work to explicitly demonstrate that it is plausible with the data we have.

"C recent bottleneck scenario, P1 and P2 would both form an exact clade to outgroups (via fst, D stats, etc.)"

Not necessarily via fst though? Case in point the drifted Italian_South in the clustering diagrams with Lazaridis 2016 ancients you did. Outgroup to everyone or very loose clustering with Sardinians as outgroup to everyone.

Re: other stuff, I think that they were getting at Han & co being oversized populations compared to their effective population size, and that's what a bottlenecked population technically is. Problem here is that populations commonly understood as bottlenecked also have other characteristics relative to those that aren't, like more long RoH and so on. Let's consider Japan and Bangladesh. These have populations of considerable size, known to be admixed, but one shows more RoH and gets its own ADMIXTURE component easily (then appearing unadmixed) while the other doesn't...

A discussion over at Jabal al-Lughat translated a passage from the Arab geographer al-Idrisi saying that Sardinians were originally "barbarized Roman Africans". The article "Sardinia in Arabic sources" has some interesting stuff about North African connections back and forth with Sardinia. There's certainly evidence in the Y DNA.

@ Shaikorth, I think you make sense of the Karitiana result in that analysis, with the Ust Ishim fraction being concordant with the expected ANE fraction, though, even within that data, if you generalize to the other Amerinds, it looks like you have odd scenarios like (using their analysis of Lazaridis data):

No great pattern. IIRC Bolivian_Pando are not admixed with Europeans. Self copy also seems to have a loose relationship with the degree of drift in a branch (Karitiana don't have notably that much more than various of the other groups, IIRC?). Seems like Ust-Ishim can or can't proxy for non-ENA ancestry in Amerinds in this analysis, depending on the population.

Looking at the results of Indian and Native American populations, it's pretty clear that U-I can proxy for some kind of ENA too, and ANE haplotypes seem to prefer U-I over Loschbour. Lack of high coverage ANE sample hurts here. I made the Karitiana example because they're in both sets and the Busby merge result allowed inferring self-copy as drift for them. It's also true for Surui, they are clear U-I and Han split in the Busby merge and proportions approximate ANE/ENA based on formal testing. Bolivians unfortunately aren't in both sets, though they may have a bit of euro mixture (about the same as Maya).

Given the ancients available it should work best for those with no Ust-Ishim or high self-copy so we can avoid guessing. However in the case of something like Papuans I'm pretty sure the self-copy is more than drift, if the upcoming study about first OOA remnant in Sahul is to be trusted they should have a lot of pre-Ust Ishim ancestry when that's combined with Denisovan.

@Chad, "Even Levant EN is around Z>3 closer to Iran than Natufians. Then, Anatolia is closer yet, with substantially less BE. This can only be explained by more Iranian than Levantines have."

Natufian is less close to all Eurasians (incl. Iran_N) than Levant_N is, and Anatolia_N is closer to all Eurasians (incl. Iran_N) than Levant_N is.

IMO, future ancient genomes will surprise you and show Anatolia_N has little if any Iran_N ancestry. Looking at D-stats from modern Middle Easterners I couldn't see how anyone could have significant CHG ancestry outside of the Caucasus. Then I saw D-stats from Iran_N and Natufians, which were out of this world crazy and were an answer on how some could have significant CHG-related ancestry. We need to think outside of the box of the current ancient genomes we have.

Again, regarding Bell Beakers and R1b, what strikes me is the rather negative association between west European R1b and excess West Asian admixture (excess relative to the CHG-related part in Yamnaya). In western Europe R1b peaks among the Basques and some extreme northwestern groups like the Irish. The Basques are known to be rather un-West Asian, compared to others. The modern Irish do have additional West Asian admixture, like all IEs, but R1b was already predominant in their early Bronze Age ancestors who didn't have it. Hence it's extremely unlikely that R1b-M269 reached Iberia from West Asia.

But one comment Alberto made also made sense to me: In Iberia there is no correlation between increased steppe admixture and formerly Celtic language. I would add the same holds true for R1b. To the contrary, in Iberia R1b seems to be most common where in pre-Roman times non-IE languages were spoken. In contrast, the IEs differ from the Basques in their additional West Asian admixture. It's only a minor difference, but it seems to be real. For instance, in the new PuntDNAL K12 calculator, some English people have more than 9% additional Iranian Neolithic-like ancestry, in addition to the steppe stuff. That's probably just a random deviation, because the English average is lower, but the pattern is persistent, and for example my south German + 1/4 Swiss grandmother scored strong southeast European scores in several tests. I'm tempted to think that Celtic started to spread after R1b did, in the Bronze Age, from southeastern Europe, where steppe influence had mixed with Natufian- and Iran_N-related influence that had reached the area after 2600/2500 BC.

I am also running out of patience trying to form separate Iran N and CHG clusters. I have tried all kinds of K supervised and unsupervised, with various combinations of references.

This is a shortcoming of ADMIXTURE, which is very sensitive to recent drift. It works fine with moderns, assuming adequate sample sizes and a good genotype rate for the run.

With ancients thrown in, things change, and one finds oneself wrestling to have reasonable clusters form around ancients, and have reasonable mixture proportions for the test samples.

I think there is a 50% chance that with the release of additional ancient Near Eastern genomes from the recent papers over the next couple of weeks, it will become easier to form those types of clusters. Then again, ADMIXTURE does not perform well with ancient/modern mixes.

I don't think it'll be possible to get both Neolithic Iranian and CHG clusters in ADMIXTURE, unless we have at least several high quality samples from each grouping that share ethnic-specific drift.

But that won't help in properly capturing the relevant ancient components.

The early Neolithic Iranians and CHG look like populations on almost the same cline. The main differences are that CHG has less Basal and more Villabruna affinity/admixture. Also, considering the really high level of something that looks very close to AG3/MA1 in the Iranian farmers, I don't see the need for an Iranian Neolithic cluster at the same time as an AG3-MA1 cluster.

As I said above, Iran_Hotu does look distinct from the early Neolithic farmers, but like CHG, it's probably on basically the same cline as CHG and the farmers, apart probably from more Central/South Asian forager affinity/admixture.