This may have been pointed out in the paper, but what I find intriguing is that the Scythians from the Zevakino-Chilikta group look somewhat different from the rest, because instead of falling on the Europe-Siberia cline, they fall on the Europe-Central Asia cline. Not sure what that's about yet; might be worth investigating.

It might mean that Z93 moved out into Asia before the ancestors of Andronovo, Potapovka, Sintashta and Srubnaya acquired EEF.

But those eastern Scythians are Z93 (Z2124+), same as Sintashta, which suggests that either there were Sintashta groups identical to Yamnaya, like the early Baltic Corded Ware, or Z2124 was acquired by the eastern Scythians, who may have been mostly of Afanasievo origin, through a founder effect with little autosomal admixture.

@Nirjhar007: "It will be fascinating to see how Indians do, among others of course. I sense Z2124 is the northern branch of Z94. Further aDNA should clear it up."

Reread my explanation of how the IE word *swesor entered Proto-Uralic, and everything will be clear: IE speakers from Samara went east as far as Central Asia and mixed with Uralic and Altaic speakers; afterwards they moved south to India. The Scythians are a later people of the Russian steppes.

For what it's worth, I've been looking into how Central and South Asians stack up, and it seems that I've verified a notion I've held for quite some time; namely, Indo-Aryans derive their steppe ancestry from Yamnaya-related populations, while South Central Asian peoples (who, with the exception of small Dardic and Nuristani populations, are all Iranian speakers) have an affinity towards Iranian steppe populations from the historical era (Scythians).

A few quick examples should suffice.

This is just my standard setup, but with all of these new samples thrown into the mix.

In terms of recent ancestry, Pashtuns like myself should be mixtures between people like the Waziristani Pashtuns/Afghan Pashtun highlanders and South Central Asian Dardic highlanders like the Kalasha. So, the greater importance of the older "Aryan" Yamnaya-related element (in my case) makes sense.

So, I think Indian L657 will be found in Yamnaya-like populations, while Pashtun/Tajik Z2124 (which also happens to be my subclade) simply descends from Iranic steppe peoples.

I'm a little unsure about using a wide set of recent Siberian references to fit these samples (Ket / Mongola), as they're probably a bit admixed.

I'd also say that these samples are extremely dispersed, and I think they're difficult to treat as population averages rather than sample by sample, except maybe the two Sarmatians.

Here are a couple of visualisations of PC2 (the main East-West Eurasian PC) vs other PCs for these samples, plus averages of other groups:

http://i.imgur.com/q4EbZ5P.png and http://i.imgur.com/1AhPDiX.png

Re: Ze6b from Zevakino-Chilikta, it looks kinda Central Asian (not Steppe-Siberian?) in PC2 vs PC1 and, to an extent, PC2 vs PC4, but it is no more South Asian than the others in PC2 vs PC3 (Oceanian vs North Asian cline), PC6 (Iran_Neolithic vs EuroHG), PC7, PC9 and PC10 (South Asia specific against Kotias / EEF).

Tu, Daur and Mongola are probably the least steppe/EHG/Beringian admixed among the relevant groups in the region, as suggested by the Lazaridis 2016 fits; this SpaceMix plot has similar implications: http://oi64.tinypic.com/2emdbfl.jpg

Oh, also, I'll have to echo Matt here; it seems that these people were extremely heterogeneous.

South Central Asians prefer the least genetically East Eurasian samples (with the exception of myself; I always gravitate towards Siberian-rich samples), so keep that in mind when looking at the labels in my modelling.

Modern DNA of Indo-Iranians will not tell us much about the exact genetic profile of early Indo-Aryans and Iranics. South Asians look closer to Yamnaya because they have much less EEF than Iranics, and maybe they have some extra EHG from pre-Indo-Europeans. There is probably significant direct Yamnaya/Afanasievo ancestry among Central Asian Turkics and some Tajiks; here Yamnaya/Afanasievo predated the Indo-Iranians, and R1b is still frequent. Pashtuns, Kalash and especially South Asians probably don't have much of it, but we don't know with whom the Indo-Aryans mixed before they arrived in South Asia. Indo-Aryans in Central Asia were likely steppe/BMAC/North Central Asian (EHG/Yamnaya/Siberian) hybrids before they arrived in South Asia.

The earliest Indo-Iranian samples, from Sintashta, Potapovka or the Poltavka_outlier, were the least Yamnaya-like, and L657/M780 expanded earlier than Z2124. So I actually think that if we find very early L657/M780 on the steppe, it will be even less Yamnaya-like than Sintashta, because this archaic L657 had less time to mix with local Afanasievo/Yamnaya and EHG people.

That model is pretty cool, but since Afanasievo is basically the same as Yamnaya_Samara, might it not be wiser to try the same sort of modelling without Yamnaya_Samara in the right pops, with everything else unaltered, if you think it's worthwhile?

Thanks in advance.

Coldmountains,

Even once you account for extra ANE (over what Iran_Neolithic had), or just use Iran_Hotu, Indo-Aryan speakers just can't be modeled with Sintashta/Andronovo/Srubnaya; they only take Yamnaya/Afanasievo.

This can actually be seen in my models above.

UP Brahmins are 11.05% ANE + Iran_Neolithic, while Kalash are 4.6% ANE + Iran_Neolithic, yet they still only receive substantial Yamnaya-related admixture, nothing from the more West Eurasian of the Scythians/Sarmatians.

By contrast, all Pashtuns show a mix of Yamnaya and Scythian/Sarmatian, as do all Pamiri people, not to mention western Iranians.

It's something to chew over, but we'll only know for sure once we have "Aryan" aDNA samples, perhaps from those Swat Valley skeletons.

I'll bet they'll be rather similar to the Kalash, but with even more Yamnaya-related admixture.

@Davidski: The earliest Z93 found yet is from the Poltavka_outlier, who was already EEF-shifted, at around 66% Yamnaya-like and 33% EEF-like. He probably lived not long after Z93 and Z94 arose, so I expect the same among the earliest L657/M780. But on the steppe they could quickly change their autosomal DNA, so maybe at some point in history they did become very Yamnaya-like.

@Nirjhar: Maybe L657 is too young to be found on the steppe, but M780, which is ancestral to L657, will be found there. There is actually one Ukrainian Cossack who belongs to it, so it is still present in Eastern Europe.

Do you know how close Pashtun Z2124 is to the Scythian Z2124 found so far? Most Pashtun Z2124 is quite homogeneous and could be a founder effect from the historical Saka, Hephthalites or Kushans. Scythian or Scythian-like ancestry makes sense among Pashtuns and is expected. I guess the Saka groups of South Central Asia, which have not been tested yet, had a significant genetic impact on the Tajiks and Pashtuns there. I am more sceptical about Scythian/Saka ancestry among western Iranics like the Kurds, who are rather descendants of early West Iranics who resembled Scythians but arrived much earlier.

I didn't say that R1a-Z93 was born in Italy, only that it is perhaps old there and did not arrive recently. I think that R1a-M420* might be in the Italian Refugium, and we'll see whether it is found. Certainly R1a arrived in India recently, from the Russian steppes, with the IE languages.

I have been saying that since the beginning of my analysis. I don't agree with Davidski that R1b and the centum languages came from Samara to Western Europe; I think it happened the other way around, with the satem languages being more recent than the centum ones, a mutation that didn't affect the older "lateral areas" (aree laterali).

Just keeping it historically accurate. Yamnaya is needed in the outgroups to bring down the standard errors. I can model South Asians with Sintashta, but each group can be quite different and a pain in the ass. There's definitely extra ANE and ENA giving a Yamnaya impression.

L657 is extremely rare in Kurds. So far there's only one L657 case found among Kurds (IR5_20, M576 from Underhill et al.), from what is now Iran. It should be around a few percent in Iran, I think, but I'm not well versed in the clade.

In my bad English: I don't think that Villabruna is our ancestor, but very likely he belonged to a tribe of hunter-gatherers of the Italian Refugium, where there were many related R1b lineages; only one of them survived, as R-P297*, and is our ancestor. Only stupid people would think that, having tested a single Y aDNA sample in Italy, they found the only R1b1 living there. Ask why these PhDs test hundreds of aDNA samples elsewhere and only 5 in Italy...

@Nirjhar007: "I see. Okay, I have saved your opinion :)". I thank you, but let me add what I replied to Ted Kandell, who is a clever person but also has some trouble understanding:

TED KANDELL, OR THE EVIDENCE. Evidence? Everyone has understood that Samara was made up of hg. R1a and a little R1b-L23; that they migrated to the Baltic carrying the Balto-Slavic languages (no R1b has been found among them); and that they migrated to Andronovo and Sintashta as Indo-Iranians and gave rise to the Iranian-speaking Scythians. The tiny R1b subclade, belonging only to R-L23-Z2105 (with perhaps some extinct lines), took part in those migrations (which mostly carried hg. R1a) as far as the Mongol/Chinese/Turkic peoples, and later also to the Indian subcontinent, where a few of those haplotypes may have survived. But of these, only a few subclades survived in Eastern Europe, the Caucasus and the Middle East, and they are different from the Western European ones, which in any case did not derive from them. I have explained in my previous letters which subclades may have derived from these haplotypes there and which did not. Evidence? You lack:
- R-M335
- R-V88 and all its subclades (not older than 5000 years in Africa and the Middle East)
- R-L389+ (except the haplotype with YCAII=23-23 found in Armenia, whereas Italy has all 4 haplotypes known so far)
- R-Z2109-Y4512, only in Western Europe
- R-Z2110 and subclades, found in Western Europe and back-migrated eastward as CTS9219
- R-M269
- R-L51
- R-L11
- R-U106
- R-P312 and all the Western European subclades...

Evidence?

For example, I have the UP_Brahmins at 11% ANE, in addition to their Iranian_Neolithic-related percentages.

On top of that, I have the UP_Brahmins at almost 30% ASI, so obviously there is a lot of ENA in their case.

But even with the extra ANE + ENA combo accounted for, and even when given a choice between the Steppe_MLBA and Steppe_EMBA samples, no South Asian population prefers Steppe_MLBA; they always choose Steppe_EMBA.

The Indo-Aryans of South Central Asia (the Kalasha) have the same preference as Indians (despite having way, way less ASI compared to Indians, and despite having less of a preference for pure ANE), while Iranian South Central Asians prefer Scythians (the ancient Iranians of the steppe).

So, the differences show a very striking/clean correlation between Iranian vs Indo-Aryan language and Scythian vs Yamnaya affinity, and don't correlate with West Eurasian/ENA levels.

We'll figure it all out, once we see the requisite aDNA.

And, I actually like the model you've tried (it's very interesting), but my impression has been that having virtually identical populations in both your left and right pops is somewhat problematic?

I mean, Yamnaya_Samara and Afanasievo are basically identical when it comes to autosomal ancestry, and we now know that they even have the same Y-DNA.

“Excavations between the rivers Orel' and Samara have uncovered burials of a syncretic nature that attest contacts between the spheres of the Corded Ware and Yamna cultures. It is suggested that these may indicate early contacts between proto-Indo-Iranians and the prehistoric ancestors of the Balts and Slavs.”

So maybe Yamna culture was pre-Indo-Iranian and proto-Indo-Iranians originated there after mixing with Corded Ware Balto-Slavs who introduced some R1a-Z645 there.

There might well be something in this claim about the supposed Yamnaya-like effect in formal stats for the Scythians and also South Asians.

It's at least certain that the Scythians aren't a straight two-way mix between Yamnaya and East Asians, because as Samuel points out, they have West/Central Asian mtDNA hgs like U7, U1 and HV2 that Yamnaya and other European steppe groups lack.

But keep in mind that we can argue about this for months on end, and then a couple of key ancient genomes might totally contradict all our conclusions.

I've done some tests, and the results are very interesting, but I'm going to mail off the output to Iosif for now, instead of making any bold statements on the topic here.

"So maybe Yamna culture was pre-Indo-Iranian and proto-Indo-Iranians originated there after mixing with Corded Ware Balto-Slavs who introduced some R1a-Z645 there".

I thought it happened the other way around: that the people from Yamnaya who migrated to the Baltic Sea brought the satem Indo-European that became Balto-Slavic, while those who migrated eastward became the Indo-Iranians. But I should study genetics and languages together.

How trustworthy is the analysis that gave West Asians (Iran, Iraq, Caucasus) the most Western Scythian ancestry among modern people? If it is trustworthy, couldn't proto-Iranian speakers, rather than Scythians, be responsible for such ancestry?

It does seem like if excess ANE is not accounted for in the eastern Scythians, then they prefer Yamnaya instead of Sintashta. But when it is, by adding AG3 to the models, then Sintashta does pretty well. In fact, minor Esperstedt_MN also works, in tandem with Yamnaya.

Rather, what interests me is the relationship these samples have to Central and South Asians.

What I've found is that Iranian South Central Asians prefer these Scythians, while Indo-Aryan South Central Asians and Indians prefer Steppe_EMBA. And this is after extra ANE and ENA are accounted for.

Also, the same occurs when using Steppe_MLBA, instead of the Scythians.

I'm just struck by the linguistic correlation, although there could be confounding factors.

I'm seeing the same: a good amount of Andronovo, a smaller amount of Yamnaya and a smaller one of Iran_ChL. And then Siberian admixture (some 8-10% in western Scythians, closer to 50% in eastern ones). They're rather diverse, so to make a short version I had to make choices, but the fits remained very close, so I think it makes sense like this:

@Alberto Any idea why they prefer Iran ChL over Iran Neolithic or CHG? IS2 is quite admixed with Iran ChL (20%). He/she also scores around 25% East Eurasian. I wonder if they had higher Iran ChL before mixing with East Eurasians.

The preference was not unanimous. They are diverse (not sure about their coverage/quality yet), so the preference was a bit divided between Iran_ChL, Armenia_EBA and Iran_Neolithic (hardly any Kotias). In the end I opted to just use Iran_ChL since the differences were quite small.

Initially they also didn't pick Andronovo. Instead it was more Karelia_HG + Anatolia_Neolithic. So these models are more "supervised" than "unsupervised", just based on what seemed to make enough sense for all the samples after trying many different combinations.

After doing some research, I found out that most Pashtuns seem to be Z2124<Z2125<YP413<M12280. I guess you also belong to it. It is also found among some Indians, Arabs and Armenians, and there is one Bulgarian belonging to it. It has not been found in ancient DNA yet, but Z2125 has been found in Sintashta, Andronovo and Karasuk. My feeling is that it is related to the Hephthalites in some way.

@ Shaikorth: Tu, Daur and Mongola are probably the least steppe/EHG/Beringian admixed among the relevant groups in the region,

Hmm, well, the HGDP group designated Mongola tended to get around 5% in West Eurasian components in old school calculators like Dienekes Globe13; I haven't looked to see what it gets in other newer stuff. Tu (Monguor) are about 10% West Eurasian in classic Eurogenes K13 and again around 5% in Dienekes Globe13.

I ran a PCA on a limited subset of the samples in Globe13: http://i.imgur.com/4uYuS1u.png

The Tu and Mongola samples seem similar in position on the North vs South East Asian line to other locals like Korean and Japanese, but offset slightly towards West Eurasia.
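For anyone who wants to try this kind of PCA themselves, here is a minimal sketch, assuming the input is a table of calculator proportions with one row per population and one column per component (Globe13-style). The population names and numbers are invented placeholders, not real Globe13 output, and `pca_scores` is a hypothetical helper. It centres the columns and takes principal components from an SVD, which is one standard way to compute a PCA.

```python
import numpy as np


def pca_scores(table, n_components=2):
    """Principal component scores for each row of a proportions table.

    table: (n_populations, n_calculator_components) array, e.g. one row of
    admixture-calculator percentages per population (an assumed layout).
    Returns (scores, explained_variance_ratio).
    """
    x = np.asarray(table, dtype=float)
    centered = x - x.mean(axis=0)  # PCA operates on column-centred data
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Row scores on the leading PCs; equal to centered @ vt.T[:, :n_components].
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


# Hypothetical 3-component proportions; not real Globe13 output.
names = ["PopA", "PopB", "PopC", "PopD"]
table = [
    [0.80, 0.15, 0.05],
    [0.70, 0.28, 0.02],
    [0.60, 0.38, 0.02],
    [0.75, 0.15, 0.10],
]
scores, explained = pca_scores(table)
for name, (pc1, pc2) in zip(names, scores):
    print(f"{name}: PC1={pc1:+.3f} PC2={pc2:+.3f}")
print("explained variance ratio:", np.round(explained, 3))
```

PC signs are arbitrary, so whether a group sits "towards West Eurasia" should be judged from which components drive each PC, not from the sign alone.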

Not to any huge degree like Ket, true (based on the above PCA, about similar to comparing Thai vs Dai, not Dai vs Cambodian or Burmese vs Han or Malay vs Atayal), so I may have misspoken by putting them in the same breath as Ket. But I don't see any advantage to using them compared to just using Oroqen, Ulchi, Nganasan and Han, which are less displaced towards West Eurasia. Daur seems OK and a bit more useful in this context.

(It's also true the samples designated Mongola are usually much less admixed than the ones designated Mongolian used in some analyses. But this does not mean Mongola are at 0.)

Btw, based on the above, here are a couple of minimal nMonte sets I may test to see whether they can fit the Scythians/Sarmatians: 1 - Andronovo, Daur, Okunevo, Ulchi or 2 - Afanasievo, Andronovo, Daur, Sakha. See http://i.imgur.com/wLWAMST.png or http://i.imgur.com/QkXB1JT.png
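As I understand it, an nMonte fit looks for non-negative source weights summing to 1 whose weighted average of coordinates comes as close as possible (in Euclidean distance) to the target's coordinates. The sketch below is NOT nMonte: it swaps in projected gradient descent onto the probability simplex to solve that same constrained least-squares problem, which is easy to write in a few lines. `fit_mixture` and `project_to_simplex` are hypothetical helpers; the source names follow Set 1, but every coordinate is a made-up placeholder.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = k[u - (css - 1.0) / k > 0][-1]
    theta = (css[rho - 1] - 1.0) / rho
    return np.maximum(v - theta, 0.0)


def fit_mixture(sources, target, n_iter=10000):
    """Simplex weights w minimising ||w @ sources - target|| (Euclidean).

    sources: (n_sources, n_dims) coordinates; target: (n_dims,) coordinates.
    Uses projected gradient descent, not nMonte's Monte Carlo search.
    Returns (weights, distance).
    """
    s = np.asarray(sources, dtype=float)
    t = np.asarray(target, dtype=float)
    # Step 1/L, where L = 2 * largest eigenvalue of S S^T is the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * np.linalg.eigvalsh(s @ s.T).max())
    w = np.full(s.shape[0], 1.0 / s.shape[0])  # start from equal weights
    for _ in range(n_iter):
        grad = 2.0 * s @ (w @ s - t)
        w = project_to_simplex(w - step * grad)
    return w, float(np.linalg.norm(w @ s - t))


# Source names follow Set 1 above; every coordinate here is a made-up placeholder.
names = ["Andronovo", "Daur", "Okunevo", "Ulchi"]
sources = [
    [0.12, 0.05, 0.02, 0.01],
    [-0.10, -0.03, 0.04, -0.02],
    [0.01, -0.06, 0.03, 0.05],
    [-0.12, -0.02, -0.05, 0.00],
]
sample = [0.05, 0.01, 0.02, 0.01]  # hypothetical Scythian sample
weights, distance = fit_mixture(sources, sample)
for name, weight in zip(names, weights):
    print(f"{name}: {weight:.1%}")
print(f"distance: {distance:.4f}")
```

A deterministic solver like this returns one optimum; nMonte's randomised search can be handy when several source combinations fit almost equally well.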

Remember that PCA positions are affected by sample homogeneity (endogamy). Ulchi is useful as a source because it shares the most drift with Neolithic Northeast Asians. But if we don't use Tu or Mongola, we shouldn't use Nganasans either: they seem to be basically a more drifted mix of their neighbours, not something that could have been ancestral to the Scythians, which is why in haplotype-based models they do not contribute to any South Siberian populations.

Mongolic-speaking populations may all have minor western steppe ancestry but that isn't much more significant than the ANE-shift detected in Ulchi compared to Devil's Gate, and they could plausibly represent Iron Age eastern steppe. In Lazaridis 2016 Onge-ANE models their fits are closer to Ulchi than Nganasans, Yakuts, Dolgans and comparable populations.

As Ryan says, Kets are an apparent ANE relic (https://verenich.files.wordpress.com/2016/05/maltavsgoyetq116-1.png?w=1600, the difference between Goyetq116 IBS and MA-1 IBS in Eurasia), but Okunevo is available to be used instead as an actual ANE-rich Bronze Age South Siberian population.

"The results I'm getting for the Scythians are: Siberia + Andronovo/Sintashta + CHG/Iran Neo. Typical West and SC Asian mtDNAs (U7, U1 and HV2) are documented in Scythians."

It's not only West and Central Asian mtDNA; there is also more typical "West Asian" Y-DNA among Steppe Iranics, such as J1, J2 and G2a, and, if we count the Huns among them (since the Huns were basically Scythians), L too.

@Shaikorth, I don't totally follow the sample homogeneity argument, though. In any case, for the Scythians and Sarmatians, it looks to me like Tu and Mongola are too far "south" within NE Asia to be good proxies, and we may as well use Daur, who are fairly minimally admixed along the east-west axis, are much more Siberian/northern than the Tu or Mongola, and are Mongolic speakers to boot.

(Unless they're in there as a counterbalance to the extreme "northness" of Nganasan or Ket and EHG, as appears to be the case in some of Ryu's models.)

@Ryan, hmm... Kets look somewhere between Okunevo and Nganasan from what I can see. They may be a bit more ANE-rich than a simple mix of the two, but that doesn't seem enough to me to justify using a recently admixed population to model ancestry for an ancient one.

Anyway, here are the fits with the groups I mentioned in my post above (using the normal Globe10 sheet with no weighting or alterations):

@Rob, it looks like Okunevo is displaced towards the "ANE" direction in PCA relative to what a combination of Ulchi+Andronovo or Ulchi+Afanasievo can produce. That would seem to make it useful for fitting the Aldybel+Pazyryk samples.

A combination of Yamnaya+Sakha could serve a similar purpose (as it does in Set 2). At least this is how it looks to me from the PCA dimensions; anyone can test whether a set of Andronovo, Daur, Yamnaya, Ulchi works better than Andronovo, Daur, Okunevo, Ulchi.

Giving up a bit on the nMonte modelling, and to complement Alberto's post, here are the top 30 population Euclidean distances for each sample, including distances to each other and to other ancients:

http://i.imgur.com/HjJkoen.png

Scythians are always closest to another Scythian, and Sarmatians are always closest to a Scythian.
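The "top 30" lists above can be reproduced in spirit with a short sketch, assuming each sample and each reference population is a row of Global10-style coordinates (for example, copied from the spreadsheet). `nearest_populations` is a hypothetical helper, the coordinates are invented placeholders, and no scaling or weighting is applied, in line with the "no weighting" note earlier in the thread.

```python
import numpy as np


def nearest_populations(sample, reference, top_n=30):
    """Rank reference populations by unweighted Euclidean distance to one sample.

    sample: (n_dims,) coordinates; reference: dict mapping name -> coordinates.
    Returns a list of (name, distance) pairs, closest first.
    """
    target = np.asarray(sample, dtype=float)
    distances = {
        name: float(np.linalg.norm(np.asarray(coords, dtype=float) - target))
        for name, coords in reference.items()
    }
    return sorted(distances.items(), key=lambda item: item[1])[:top_n]


# Placeholder coordinates, not real Global10 rows.
reference = {
    "Scythian_X": [0.020, 0.031, -0.004],
    "Sarmatian_Y": [0.028, 0.025, 0.001],
    "Karasuk": [0.010, 0.045, -0.010],
    "Hungary_IA": [0.060, 0.010, 0.012],
}
for name, dist in nearest_populations([0.022, 0.030, -0.002], reference, top_n=3):
    print(f"{name}: {dist:.4f}")
```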

Hungary Iron Age and Altai also tend to be high on the list in closeness to Scythians.

Karasuk or the Karasuk outlier is the closest pre-IA population (or, in one case, Okunevo). Per Wikipedia, the Karasuk culture were "a group of Bronze Age societies who ranged from the Aral Sea to the upper Yenisei in the east and south to the Altai Mountains and the Tian Shan in ca. 1500–800 BC (with a distribution which) covers the eastern parts of the Andronovo culture, which it appears to replace."

Daur should work, though they don't seem to be that different compared to Tu and Mongola.

The homogeneity argument is basically that Nganasans are a recently formed but bottlenecked population, and not a plausible source of admixture in Scythians. They show all the patterns: elevated LD compared to other northern Siberians (Pugach 2016), and haplotype-based fits don't pick them as a donor but as a recipient (which happens with one-way gene flow: Hazara get Pathan donors but not vice versa)... Falush et al. 2016 state that population-specific drift can be incorporated into PCA dimensions, as it can into ADMIXTURE components.

Ah, I see what you mean now in theory, though I have no way to tell how much it actually affects Globe10.

Re: comparing Mongola and Tu vs Daur, and whether they seem that different: just calculating the simple Euclidean distances in these dimensions gives the following:

- The overall Euclidean distance between Daur and Mongola is about the same as between HanNChina and Miao, or between Han from Southern China and Korean.
- Daur and Tu are about the same distance apart as Han from Southern China and South Vietnamese.
- Mongola and Tu are somewhat more distant from Daur than they are from HanNChina.
- In these dimensions, the distance from Han to Daur is about double the distance from Han from South China to Tu.
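Comparisons like these amount to reading entries off a pairwise distance matrix and taking ratios. A minimal sketch, assuming one row of coordinates per population: `distance_matrix` is a hypothetical helper, the labels echo the list above (with "Han_South_China" as an invented label), and the coordinates are placeholders, so the printed ratio means nothing about the real populations.

```python
import numpy as np


def distance_matrix(coords):
    """All pairwise Euclidean distances between the rows of coords."""
    x = np.asarray(coords, dtype=float)
    diff = x[:, None, :] - x[None, :, :]  # broadcast to (n, n, n_dims)
    return np.sqrt((diff**2).sum(axis=-1))


# Labels echo the comparison above; coordinates are invented placeholders.
names = ["Daur", "Mongola", "Tu", "Han", "Han_South_China"]
coords = [
    [-0.30, 0.12, 0.05],
    [-0.29, 0.08, 0.04],
    [-0.28, 0.06, 0.03],
    [-0.27, 0.03, 0.02],
    [-0.27, 0.01, 0.00],
]
d = distance_matrix(coords)
idx = {name: i for i, name in enumerate(names)}
ratio = d[idx["Han"], idx["Daur"]] / d[idx["Han_South_China"], idx["Tu"]]
print(f"Han-Daur vs Han_South_China-Tu distance ratio: {ratio:.2f}")
```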

Not 100% sure that is accurate to the real genetic distances, but that's how it is in these. Judging by these, I guess you could say they don't seem that different if the other population pairs don't seem that different.

Of course, what makes them distant may not be as relevant in using them as a reference for Scythians.

I'm pretty sure that won't hold with haplotype analysis, though. If you run them as the sole source in qpAdm, which doesn't necessarily care as much about genealogical descent, Nganasans might work well because they have the right ANE/ENA balance, but have you tried adding multiple sources?

@Matt

OTOH Daur and Mongola in Lazaridis 2016 fits can differ by as little as 0.1%. But I'm fine with using whichever works best in nMonte really.

Neolithic East Germany, Hungary, and Spain have been beaten to death with DNA testing. That's an amazing study but I wish they'd get Neolithic genomes from other locations like Ukraine and Romania and Serbia and Italy.

The Neolithic hunter-gatherers from Blätterhöhle have Y-DNA R1b1. One has Y-DNA R1b1 and mtDNA U5b2a2, like me. Coincidence.

- It looks like the Blätterhöhle Cave groups, who were earlier thought by researchers (Brandt?) to be an example of hunter-fisher HGs living peacefully alongside farmers without sharing genes, are genetically a mixed Neolithic-WHG group, with 40-50% HG. (Could that earlier view have been sex-biased, if they were looking at mtDNA?)

- They reproduce a HG structure with ElMiron+LaBrana at one end and EHG at the other, which correlates as expected with latitude (Bichon+Losch closer to LaBrana, KO1 closer to EHG and Villabruna intermediate). Comparing this to early Neolithic farmers, they find a mild correlation with regional farmer ancestry... and a result:

"We find that almost all ancient groups from Hungary have ancestry significantly closest to one of the more eastern WHG individuals (either KO1 or Villabruna); the samples from present-day Germany have greatest affinity to Loschbour; and all three Iberian groups contain LB1-related ancestry (Figure 2C; Extended Data Table 2). This pattern implies that admixture into European farmers occurred multiple times from local hunter-gatherer populations. Moreover, combining the proportions and sources of hunter-gatherer ancestry, populations from the three regions are distinguishable at all stages of the Neolithic. Thus, any further migrations that may have occurred after the initial spread of farming were not substantial enough within the studied regions to disrupt the observed heterogeneity."

(How much does / doesn't this contribute to present day structure?).

Looks really worth reading if we're interested in trying to quantify how much survival of HG in Europe tended to be *really* local in the Neolithic and Chalcolithic, as opposed to being spread all around Europe by continuing diffusion of farmers.

Also from "Parallel ancient genomic transects reveal complex population history of early European farmers": "We observed discrete signals of admixture in LB1 and KO1 via f3- and f4-statistics [29], and both fit best as admixed in the scaffold model, LB1 with ancestry from a deeper European hunter-gatherer lineage and KO1 with a small proportion of FEF admixture (Supplementary Information section 6)."

Nice to see some corroboration in the literature of an effect which various of us (at least Chad I think?) have modeled from time to time.

Makes me wonder whether this is just KO1, or whether the Balkan HGs may have more...

From my testing, restricting to 4 pops it would probably be Andronovo, Yamnaya, Iran_ChL and Ulchi. Adding Itelmen as a 5th improves fits further, but it's not going to be much more informative.

BTW, for the nMonte-like testing you might find useful a script I use but never released publicly (because newer versions of 4mix were said to be coming, though they haven't arrived so far, and because I'm not very good at R scripting). If you're interested, drop me a line: alberto6674 at gmail.com. (Ryu, the same goes for you, or for anyone else who does many runs.)

The only ones where fits were comparable to my original sets were I0576 (Set 1) and I0577 (Set 2). But for the majority of the samples, it looks like the distinction between Andronovo and the more Iran_Neolithic/CHG-related populations is more important for explaining them than the Daur, Sakha vs Ulchi distinction (at least when Okunevo is included).

@Alberto, that set sounds plausible given the above, but I may test whether adding Karasuk/Okunevo as a substitute for one of those steppe-related sources (Andronovo, Yamnaya, Iran_ChL) works better. Thanks also for the offer of the nMonte script.

"the notion of Pashtun linkages with Scythians/Hephthalites/Kushans/etc is very old. In fact, it's been the standard opinion held by those handful of western scholars who've spilled ink on our historical roots."

I'm more familiar with papers on modern Afghan DNA investigating the common claims of Greek ancestry and Jewish ancestry in Pathans. The Lacau et al 2012 paper could not find support for Greek admixture, but found indications of Khazarian admixture in Pathans:

Furthermore, the high frequencies of R1a1a-M198 and the presence of G2c-M377 chromosomes in Pathans might represent phylogenetic signals from Khazars

Although Greeks and Jews have been proposed as ancestors to Pathans [3, 4], their genetic origin remains ambiguous. The Lasithi Plateau isolate, in the highlands of eastern Crete, partitions relatively close to the Afghanistan populations in the CA graph (Figure 3a), which could be attributed to the elevated proportion of R1a1a chromosomes [20] shared among them. However, the absence of the predominantly Greek E1b1b1a2-V13 lineage [39] in Pathans does not argue for genetic contributions from Greece.

We envision a plausible scenario in which the converted Khazars could have been absorbed by the early Pathans and that R1a1a-M198 drifted to high frequency in Afghanistan

Underhill and his co-authors above don't clarify how much of the R1a1a in modern Pathans or Afghans they think is from Khazars, although they do imply that Khazarian R1a1a input into the "early Pathans" may be connected to R1a1a-M198 drifting "to high frequency in Afghanistan".

Quite a few Pashtuns show signals of Siberian/East Asian ancestry. Some of it is probably from recent intermixing with Hazara or Uzbeks, but there is surely no connection with the Khazars, who never lived close to Afghanistan. Underhill's conclusions were simply idiotic, to be honest. Applying the same logic, someone could say that Pashtuns got R1a from Slavs because both belong to R1a-M198. Nevertheless, there is a connection between the Khalaj Turks of South Central Asia and the Ghilzai Pashtuns, so it would be interesting to see whether Ghilzai Pashtuns show more signals of eastern Scythian/early Turkic ancestry. But as far as I know they don't really differ from other Afghan Pashtuns and tend to be very similar to Durrani Pashtuns.

The authors may have conflated the western Turkic Khazar Khaganate (Khazaria) with the eastern Turkic Khaganate. Or they might be suggesting backflow from west to east, to explain claims among some Pathans about having Jewish ancestry.

The genetic signatures of both the eastern and western Khaganates may have started off similar. See the Wikipedia map of the Turkic Khaganate, which looks like it includes northern Afghanistan.

A page about Y haplogroups in Turkic populations contains another map demarcating the eastern Turkic Khaganate (which looks to overlap part of Afghanistan) and the slightly younger and more western Khazar Khaganate.

So a study on how much of the (Eastern) Scythian signal in modern Pathans and other Afghans has been mediated via any of the Turkic Khaganates would be useful, either to lay to rest or to find support for the conclusions of the Lacau et al. 2012 paper.

So, we have to rely on brief, and often very unsavory, references made by non-Pashtun writers.

And even those are pretty rare.

Which isn't surprising, as Pashtun tribes have traditionally inhabited isolated/inaccessible eyries and tracts, and have been adept at avoiding co-option into state systems based in both greater Iran and greater India.

Al-Biruni mentions Pashtuns very briefly, and just notes that they're a bunch of rebellious/violent tribes that inhabit mountain ranges, near the western border of India (Indus river).

That was the 11th century. And, as is typical for these early sources, he refers to Pashtuns as "Afghan".

Apparently, Ibn Battuta wasn't a fan, and just says this:

"Their mountains are difficult of access, having narrow passes. These are a powerful and violent people; and the greater part of them are highway robbers..." — Ibn Battuta, 14th Century

That's almost everything we have on Pashtuns, before the first Pashto book, and before Babur's military campaigns.

In fact, Babur was the first person to describe Pashtun tribal geography in full detail, because he spent much of his time fighting tribesmen in what is now eastern Afghanistan/northwestern Pakistan.

Regardless, of the few western scholars who have attempted to theorize on our "origins", most have tended towards the idea of Scythian, or Hephthalite, descent.

For some reason, Russian anthropologists have mainly pushed the Hephthalite connection.

And when it comes to local Pashtun folklore, the story remains one of Israelite origins, which obviously can't be true.

Coldmountains,

I have data for some Ghilzai Pashtuns from Afghanistan, and I've found that they are somewhat different from the Durrani.

The Durrani have more Iran_Chalcolithic, less steppe ancestry, and less ASI (the Ghilzai I have show the same amount of ASI as myself, and I'm a northeastern Pashtun with no Ghilzai ancestry), when compared to the Ghilzai.

And, the Ghilzai I have don't show any excess of Turkic affinity.

On top of that, the Ghilzai Y-DNA profile is identical to other Pashtun populations (mostly R1a, followed by G, Q, L, etc).

I still find this to be quite surprising, because I was basically sold on the Khalaj-Ghilzai connection.

Then again, it could still be real, but the Turkic ancestry is now diluted?

Interestingly though, on the topic of the Durrani, I've found that they almost look genetically intermediate between Pamiri Tajiks and Balochistanis.

You could draw a line between Tajik_Shughan and the Balochistanis, and the Durrani Pashtuns fall almost perfectly on it.

I still believe that the Ulchi-related ancestry in the Zevakino and Pazyryk samples is overestimated. If I were you, I would try adding the DevilsGate genomes to the outgroup (right) populations in qpAdm.

For anyone interested, one thing I tried recently, since discovering Principal Coordinates Analysis, was running the component Fsts from Basal K7 through PCoA, which produces a nice set of 7-dimensional distances between them: http://i.imgur.com/EvBqtUk.png.

Then you can project all the rows from the spreadsheet onto those dimensions, like so:

(Useful if you want to look at what the euclidean distances between populations should be based on the Basal K7).

Here's the output of the original PCoA on component Fsts, with the projection of the rows: http://pastebin.com/GWXiYfGx, in case anyone wants to try using it for nMonte, etc. (Including the Scythians).
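For anyone wanting to repeat this, here is a minimal sketch of classical PCoA (Gower/Torgerson scaling): square the distances, double-centre them, and scale the eigenvectors by the square roots of the positive eigenvalues. The projection step is an assumption on my part: it places each spreadsheet row at the proportion-weighted average of the component coordinates, which may not be exactly how the projection above was done. `pcoa` and `project_rows` are hypothetical helpers, and the Fst matrix and rows are invented placeholders. Since Fst is not guaranteed to be a Euclidean distance, any negative eigenvalues are simply dropped.

```python
import numpy as np


def pcoa(dist, tol=1e-10):
    """Classical principal coordinates analysis (Gower/Torgerson MDS).

    dist: symmetric (n, n) matrix of distances, e.g. pairwise component Fst.
    Returns (n, k) coordinates, one column per positive eigenvalue; negative
    eigenvalues (the non-Euclidean part of the distances) are dropped.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d**2) @ centering  # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > tol * np.abs(eigvals).max()
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])


def project_rows(proportions, component_coords):
    """Place each row at the proportion-weighted mean of the component coordinates."""
    p = np.asarray(proportions, dtype=float)
    p = p / p.sum(axis=1, keepdims=True)  # accepts percentages or fractions
    return p @ component_coords


# Hypothetical 3-component Fst matrix and spreadsheet rows (percentages).
fst = [[0.00, 0.10, 0.15], [0.10, 0.00, 0.12], [0.15, 0.12, 0.00]]
component_coords = pcoa(fst)
rows = [[60, 30, 10], [20, 20, 60]]
print(np.round(project_rows(rows, component_coords), 4))
```

Euclidean distances between the projected rows can then be fed to nMonte-style fitting, with the caveat that they inherit whatever distortion the dropped negative eigenvalues represent.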