Thursday, November 26, 2015

The Khvalynsk men

This is where the three Samara Eneolithic or Khvalynsk samples from the recent Mathieson et al. paper plot on my Principal Component Analysis (PCA) of ancient West Eurasia. They're labeled as Steppe_CA (steppe Copper Age). I've also marked them with their Y-chromosome haplogroups.

Individual I0433, belonging to Y-chromosome haplogroup R1a, is almost a pure Eastern European Hunter-Gatherer, which is perhaps surprising, considering he was buried with copper artifacts. On the other hand, sample I0434, the one belonging to haplogroup Q1a and positioned further east than the other two, appears to have been whacked over the head a few times and simply thrown into a ditch.

The PCA also has most of the other samples featured in Mathieson et al., including Neolithic Anatolians (labeled Anatolia_N), as well as extra samples from Allentoft et al. and Jones et al.

I am willing to bet that when their yDNA is in, a majority of Dnipro-Donetsk men will be R1a, and so will much if not most of Serednj Stih (Sredny Stog), as well as much of the Dnipro-Donetsk-admixed North Trypilian Chapajevka people (early inhumation phase). Interesting times ahead...

@Davidski

"Individual I0433, belonging to Y-chromosome haplogroup R1a, is almost a pure Eastern hunter-gatherer, which is somewhat surprising, considering he was buried with copper artifacts."

The individual buried with copper artifacts is I0122, of Y haplogroup R1b. See pages 9 and 10 of the Supplementary Information:

"I0122 / SVP35 (grave 12)
Male (confirmed genetically), age 20-30, positioned on his back with raised knees, with 293 copper artifacts, mostly beads, amounting to 80% of the copper objects in the combined cemeteries of Khvalynsk I and II. Probably a high-status individual, his Y-chromosome haplotype, R1b1, also characterized the high-status individuals buried under kurgans in later Yamnaya graves in this region, so he could be regarded as a founder of an elite group of patrilineally related families. His MtDNA haplotype H2a1 is unique in the Samara series."

"I0433 / SVP46 (grave 1)
Male (confirmed genetically), age 30-35, positioned on his back with raised knees, with a copper ring and a copper bead. His R1a1 haplotype shows that this haplotype was present in the region, although it is not represented later in high-status Yamnaya graves. His U5a1i MtDNA haplotype is part of a U5a1 group well documented in the Samara series."

David, you've done Human Origins World 1&2 PCAs for various ancient samples. Can you add NE1 and KO2 (as a proxy for the Neolithic Anatolians) in the same plot as Mota and Kotias? If you can accommodate K14 and MA-1, and the WHGs, SHGs, and EHGs, add those too.

As you can see from the World 1&2 PCA, Stuttgart is *not* where the Bedouin B are at all. Stuttgart is "above the peak of the apex of the triangle", higher up than the Sardinians. La Brana-1 is not far from Kotias, near the North_Ossetians and the other WHGs are in the same vicinity. This is the true picture of Eurasian variation. We have a continuum that goes from the "EF" (or rather, Kebaran Hunter-Gatherers at the end of the LGM) to the Ulchi at the other end. Kotias is already "on the way" into Eurasia, which is why the CHGs cluster with Kostenki and MA-1 on the TreeMix graphs apparently even from 41,500 years onward, when the climate turned colder and drier, the populations became isolated, and the drift began.

The basic point here is that you cannot show the real picture of Eurasian variation without including Africans, particularly Mota and the Hadza. Are PC 1&2 83% of the variation? Other dimensions, PC 1&3, 2&3, etc. would be valuable too.

@Open Genomes, the "triangle"-shaped PCA with Africans actually doesn't show much about the real variation of Eurasians as Africans will squeeze some of the Eurasian variation out and the Sardinia vs. East Asia polarity is largely due to drift or sample sizes. This obscures certain things easily verifiable with formal testing, like Papuans actually being the most divergent Eurasians.

If you want Africans and Eurasians on the same plot, I think SpaceMix from Coop et al. provides a decent one, like: http://oi64.tinypic.com/jt13k4.jpg

I can do a few runs again, with some South Asians. I haven't converted my new set with the Paniyas yet, but I can do several groups. It would be preferable to use CHG, but I'm not set up for that yet. The thing that I'm looking at is the potential for ENA and CHG in EHG. I've got several intriguing Dstats which I will post here in a couple minutes. I have to move to my laptop with Plink and Admixture.

Davidski, given that Yamnaya = EHG + CHG, and given the archaeological context of this mix, would it be possible to date this admixture event using ROLLOFF, which would also test this model's accuracy?
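For readers unfamiliar with the idea behind this kind of dating: admixture linkage disequilibrium is expected to decay roughly as exp(-n * d), where n is the number of generations since a single admixture pulse and d is the genetic distance in Morgans. The following is only a minimal sketch of that curve-fitting step on synthetic numbers, not ROLLOFF itself; the function name, the toy curve, and the 29 years per generation are all assumptions made for illustration.

```python
import math

def fit_decay_generations(distances_morgans, ld_values):
    """Least-squares fit of log(LD) = log(A) - n * d; returns n (generations).

    Hypothetical helper, not part of ROLLOFF or ADMIXTOOLS.
    """
    xs = list(distances_morgans)
    ys = [math.log(v) for v in ld_values]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -(sxy / sxx)  # the slope of the log curve is -n

# Synthetic, noise-free curve for a single pulse 60 generations ago (assumed).
true_n = 60
dists = [0.005 * i for i in range(1, 21)]            # 0.5 cM to 10 cM, in Morgans
curve = [0.1 * math.exp(-true_n * d) for d in dists]

est_n = fit_decay_generations(dists, curve)
years = est_n * 29                                   # assumed generation time
print(round(est_n, 2), round(years))
```

The fit assumes one pulse. If admixture was spread over many generations, the real curve is a mixture of exponentials and a single fitted n is only a rough average.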

We can see that it's really critical to show *all* samples, including Africans. The presence of Africans, Oceanians, East Asians, and Native Americans changes the picture completely for Eurasia.

Rather than "compressing" Eurasians, the presence of Africans shows us some fascinating things: The WHG-SHG-EHG group trends downward toward MA-1, who in turn leads to the Inuit and Na-Dene, and distantly to the Native Americans. However, Ust'-Ishim leads off on a separate upper South-Asian / Austroasiatic edge towards a vertex consisting of Japanese and Taiwanese Aboriginals. These in turn, on another edge, lead downward through Paleo-Siberians (Chukchi, etc.) to Native Americans.

A closer examination of the upper left reveals that the Early Farmers (EF) are nowhere near the Bedouin B, who appear to be admixed with Sub-Saharans, but rather, represent their own separate Eurasian vertex, today only populated by WHG-admixed Sardinians. The Kebaran Levantine Hunter-Gatherers would be even more isolated, beyond KO2 (Starcevo_EN) and the Anatolian Neolithic.

The lesson here is that all ancient and modern genomes need to be plotted together, and only then can we "zoom in" on a particular region of interest, knowing which way the drift is headed. The real "projection bias" (or rather "biased projection" ;) is when certain regions are left out, and an arbitrary 2-dimensional projection leaves out key variation that makes samples appear to be "related" when in fact they are not.

South Asia is still difficult to crack, because of all the layers of admixture, and the geographic and social clines in these layers that exist there.

But Yamnaya always shows a clear preference for Kotias. This can be seen in ADMIXTURE especially, and I might post some results later today or during the week.

The interesting thing is that Yamnaya Samara often shows inflated affinity to WHG, and in ADMIXTURE also some admixture from WHG. Whatever this is, it might be pulling Yamnaya Samara closer to LBK_EN too.

Btw, I don't think it's possible to run any X chromosome tests with the steppe samples. They usually have much less than 4000 SNPs on the X, which isn't enough.

Vox,

Kotias-related admixture entered the steppe during the Khvalynsk period, at the latest. This can be seen on the PCA above, with the Khvalynsk samples forming a cline from EHG to the Bronze Age steppe.

I don't think ROLLOFF can provide more accurate dates than ancient DNA, especially in this case, because most of the admixture appears to have happened gradually over a long period of time.

So the interesting question is why the admixture happened during the Khvalynsk period. As per the sad tale above of the Q1a man ending up dead in a ditch, it might have happened amidst both hostile and friendly relations with people coming from the south and east.

In other words, I suspect that in some cases men were killed and their women taken, and in others women were married off and moved hundreds of kilometers from their homes to be with their husbands.

Looking at some various stats with Ust_Ishim and the hunters, I'm curious about something. I think it may be possible that Motala is closer to CHG, but not because Motala is closer to Crown West Eurasian, but that they have some CHG, maybe about the same as Karelia. I've seen that despite being closer to ENA, compared to other hunters, they're also further from Ust_Ishim more significantly than WHG. I'm wondering if this means that they have a few percent of actual ENA and CHG. Maybe, someone else can come up with some more stats to test.

Your PCA is much better scaled than anything I've seen in the recent studies. Just a few questions to figure out who is who there. Who are the most 'Southern' Near Easterners (crosses)? Bedouin_B? And which symbol marks the modern Armenians: Caucasian circles or Near Eastern crosses? Thanks in advance.

@ Chad “I'm wondering if this means that [Motala] have a few percent of actual ENA and CHG”

Haplogroups also support that view. On the one hand, we see C1f in Mesolithic Karelia (a sister clade of C1c, found in Apache and Arsario), and C1e is found in modern Icelanders (its sister clade C1b is found in Apache and Cayapa). The people (yDNA Q1a?) who brought these haplogroups to Scandinavia probably carried Northeast Asian ENA.

On the other hand, we see H, which is probably H2a2b (which I previously erroneously named H2b), and yDNA J in Mesolithic Karelia. CHG was probably carried along to Fennoscandia with these haplogroups. H2a2b is still sporadically found throughout Fennoscandia. All mtDNA H2a (http://so-many-ancestors.blogspot.fi/2015/03/matrilineal-monday-haplogroup-h2a.html) looks to have spread from north of the Caucasus to Fennoscandia.

As for yDNA J, this FamilyTree map is interesting https://www.familytreedna.com/public/Y-DNA_J/default.aspx?section=ymap . Distribution of “J2b M12 confirmed near predicted and suspected, subclade un recognizable” in Scandinavia could be the result of a Caucasian wave. J2b M12 is still today frequent in Vologda and Rybinsk area and Volga Ural.

Any idea when we will see the Mathieson et al. 2015 and Jones et al. 2015 data converted/incorporated into Eurogenes K6-K10 & K15 for public viewing? For example, the above R1a-R1b-Q Khvalynsk samples?

I doubt it, but if it's West Eurasian admixture in Mbuti these ones should tell better and identify the possible offender(s) and the degree of it. Yoruba has at least double the West Eurasian admixture than Mbuti, and Mota possibly none. So the 3 should be significantly different. Chimp and Gorilla should be equal, and closer to Mota:

Actually, one of the strangest things about that Mota study is that it estimated a purported ~7% layer of Western Eurasian "admixture" across the board, with the Mbuti at 6% and Yoruba at 7%; the Dinka, Ju'hoansi, and Bantu speakers had identical results.

Maybe the first one was using only transversion sites? Anyway, I think all the others make more sense than that first one.

Re: Indian Mesolithic sample, I also remember not long ago something about it. It was some HG from the Gangetic plain that showed some degree of affinity with modern inhabitants of the region, but it was not DNA, only craniometric data, I seem to remember. Not sure, though.

A 3-D PCA plot is much more informative than a 2-D projection, because any projection can appear to falsely superimpose samples and foreshorten distances. With this interactive 3-D plot it's easy to see the true relationships between populations and ancient samples, and even the directions of admixture.
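As a toy illustration of the foreshortening point, here is a minimal sketch with two made-up samples that nearly coincide on PC1 and PC2 but sit far apart on PC3. The coordinates are hypothetical and come from no real dataset.

```python
import math

def dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical (PC1, PC2, PC3) coordinates for two samples.
sample_a = (0.10, 0.20, -0.30)
sample_b = (0.11, 0.21, 0.30)

d2 = dist(sample_a[:2], sample_b[:2])   # what a PC1 vs. PC2 plot shows
d3 = dist(sample_a, sample_b)           # distance once PC3 is included

print(f"2-D distance: {d2:.3f}, 3-D distance: {d3:.3f}")
```

In a real PCA the later components usually explain less variance than the first two, so how much weight a PC3 separation deserves depends on the eigenvalues.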

For example, it's possible to see that Mota clusters with the Hadza and Sandawe rather than with the Aari Cultivators of Ethiopia.

There does seem to be a correlation between Y haplogroups and the plot. Notice that the Early Farmer (EF) group (Y-DNA G, T, and H2) is a completely basal branch of Eurasians right at "Out of Africa", and that there is a progression of Y-DNA J => H1/H3 => NO => O toward the Austronesians, while another migration edge is roughly I2 => R1a/R1b => C2 => Q1a toward the Americas.

I am confused: is ANE a West or East Eurasian component, a mix of both, or a unique component?

I ask this because I notice that, for example, the Ulchi sample scores approximately 13% ANE according to estimates by Lazaridis in this thread: http://www.anthrogenica.com/showthread.php?4990-Studies-Find-Mysterious-Link-Between-Native-Americans-And-Indigenous-Australasians/page2

but at the same time scores approximately 100% East Eurasian in this admixture result of a run at K3: http://www.anthrogenica.com/showthread.php?5711-E-Eurasian-vs-W-Eurasian

Does this mean that the Ulchi samples actually have some West Eurasian ancestry, and if so, how much?

Thanks for clarifying that, from the D statistics, the Yamnaya (and Afanasievo) are the only populations that we know to favor Kotias over LBK_EN. Even Corded Ware, which is supposed to be 80% Yamnaya, favors LBK_EN over Kotias.

Mbuti Corded_Ware_LN Kotias LBK_EN 0.0178 4.702 302875
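For anyone trying to read lines like the one above, here is a minimal sketch of how such output could be parsed and interpreted. It assumes the columns are W, X, Y, Z, D, Z-score and SNP count, that W is an outgroup, and that a positive D means X shares more alleles with Z than with Y, which is the reading used in this comment. The |Z| >= 3 cutoff is a common convention rather than a hard rule, and the function names are hypothetical.

```python
def parse_dstat(line):
    """Split a 'W X Y Z D Zscore nSNPs' line into a dict (assumed column order)."""
    w, x, y, z, d, zscore, nsnps = line.split()
    d, zscore = float(d), float(zscore)
    return {
        "W": w, "X": x, "Y": y, "Z": z,
        "D": d, "Zscore": zscore,
        "SE": d / zscore,          # standard error implied by D and Z
        "nSNPs": int(nsnps),
    }

def interpret(stat, threshold=3.0):
    """Say which of Y or Z the test population X leans toward, if either."""
    if abs(stat["Zscore"]) < threshold:
        return f"{stat['X']}: no significant preference"
    closer = stat["Z"] if stat["D"] > 0 else stat["Y"]
    return f"{stat['X']} shares more alleles with {closer}"

row = parse_dstat("Mbuti Corded_Ware_LN Kotias LBK_EN 0.0178 4.702 302875")
print(interpret(row))
```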

All modern European populations must strongly favor LBK_EN. South Asian and East Asian populations do not choose between LBK_EN and Kotias. For the Near East, I found the following statistics that Davidski calculated.

It will be good to find out how more Near Eastern populations choose between the two. I suspect that they will favor LBK_EN, even people of the Caucasus. Davidski could you calculate the following D statistics when you get the time?

I don't understand why people keep talking about Kotias when asking for the inclusion of CHG in a new calculator. Isn't it clear from the paper that Kotias is (A) the younger and (B) the EF-admixed (25%), and therefore less pure, of the two CHG samples?

Satsurblia is the one that shows no signs of outside admixture whatsoever, so if any, then Satsurblia should be used for future calculators, not Kotias.

Satsurblia looks less mixed because it's a low coverage haploid genome. Kotias looks more mixed because it's a high coverage diploid individual representing a whole population against various heavily drifted modern populations.

By the way, the Scythian from Mathieson shares highest drift with Latvians and Lithuanians. :p

Thanks. So that's interesting for the methodology of using qpAdm. Probably as RK said a while back, better to add one by one to test each combination separately and see if they improve the model or not.

Also, knowing that Paniya is ASI+CHG and takes no European LNBA, it could serve as a more realistic base than Dai for getting accurate results about Andronovo/Sintashta admixture. For example, one could model an Indo-Aryan population as Paniya + Kotias + X, where X can be Sintashta/Andronovo, EHG, MA1,...

"By the way, the Scythian from Mathieson shares highest drift with Latvians and Lithuanians. :p"

According to which study lol. He fits perfectly as a "mixed" individual based on his admixture results, similar to Yamna and Andronovo (if not even slightly more South and eastern shifted) on PCA plots.

Maybe he shares "highest drifts" with them but that doesn't mean he is automatically very close to them either. As we know things such as "highest" are relative ;)

And about the Kotias and Satsurblia issue: the former might be high coverage, but the study itself states there is EEF-like admixture in Kotias, probably from Anatolian farmers slowly reaching the Caucasus. Satsurblia, on the other hand, is low coverage, yes, but he doesn't seem to show signs of EF admixture; in combination with his age, this is a strong indication that he is less mixed.

That's what I tried to say. No surprise that they share the "highest" drift compared to other populations. But highest is relative. Turkic groups in Iran have the "highest" East Asian admixture, but that doesn't make them remotely similar to East Asians. Just to give a more extreme and drastic example to make my point clear.

Obviously any ancient Satem Indo-European and Uralic group from the steppe region will share significant drift with Lithuanians/Latvians. But from what I have seen, the Iron Age Scythian sample looks like it belongs to a population which can be modeled as in between Lithuanians and a different West and South_Central Asian group. This leads back to my statement years ago that the North and East Iranic groups are the ones who fill the gap between the North Caucasus, South_Central Asians and East Europeans.

The use of Dai is a worse fit in qpAdm. In fact, Dai are closer to West Eurasian Caucasus pops than the Onge. I've said before, and I still believe that the Dai do have West Eurasian admixture, and cause more problems. Here are the Kharia, with the Dai

Onge 46.0%
Dai 27.6%
Hadza 1.0%
Georgian 25.4%

chisq 2.330 tail prob .675224
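As a sanity check on how the "tail prob" relates to the chi-square value, here is a minimal sketch using the closed-form upper tail of the chi-square distribution for even degrees of freedom. The 4 degrees of freedom are my assumption: the output above does not state them, but that value reproduces the reported tail probability to about three decimals. The function name is hypothetical.

```python
import math

def chi2_tail_even_df(chisq, df):
    """Upper-tail probability of a chi-square variate (closed form, even df only)."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = chisq / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k           # builds (chisq/2)^k / k!
        total += term
    return math.exp(-half) * total

p = chi2_tail_even_df(2.330, 4)    # df=4 is an assumption, see above
print(f"tail prob ~ {p:.4f}")
```

A tail probability this far above the usual 0.05 cutoff means the four-way model is not rejected by the test, which is a weaker statement than saying it is correct.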

The Dai clearly have West Eurasian ancestry, and show clear affinity to EHG, MA1, and Nganasan. This is why they are closer to South Asians, it is their West Eurasian ancestry and ENA. The Onge, clearly model with better fits for ASI, which is probably a mix of Onge and Atayal-like stuff.

Obviously, we are dealing with radically different methods, and radically different kinds of output. Any direct comparison is somewhat problematic. And anything based on formal stats takes precedence over ADMIXTURE output. The qpAdm output is determinative. But I'm simply struck by the similarity. For example, "South Asian" is a West Eurasian component that peaks in South Asia, West Asia, the Caucasus, and has a bias towards appearing more strongly in Northern Europe rather than Southern Europe. Basically, it acts like CHG. "SW Asian" is a composite of the EEF-like, Bedouin-like, and CHG-like components that often appear in ADMIXTURE. Here, it takes the place of AEN/EEF for Pashtuns, but takes some CHG with it. "South Asian" + some of "SW Asian" is identical to the amount of CHG shown by qpAdm. The "European" score is almost identical to the EHG percentage. And the percentage of "Onge" + "East Asian" is identical to the ENA score in qpAdm. I just find it interesting that the two sets of results are so similar. Probably a good indication that this model closely approximates reality.

A weird detail that I just noticed, Pashtuns have about the same amount of EHG as populations from the British Isles, while the Kalash have about the same amount as Scandinavian and Eastern European populations, looking at the Haak et al. supplements.

David,

If possible, could you try to model Pashtuns as Andronovo + Paniya + Armenian, and Kalash as Andronovo + Paniya + Georgian? In the absence of South Asian aDNA, the Paniya are great for this sort of thing. Thanks in advance.

@Chad: "The Dai clearly have West Eurasian ancestry, and show clear affinity to EHG, MA1, and Nganasan. This is why they are closer to South Asians."

There may be another, much simpler explanation for Dai being close to South Asians: IVC is known to have grown rice, at least during its late stages. While the "homeland" of rice domestication hasn't yet been unambiguously determined, Yunnan ranks at the top of the candidate regions. Since migration of crops tends to be associated with migration of people, I deem a migration of Dai-like people into the Indus Valley by around, say, the first half of the 3rd mill. BC, anything but unlikely.

Conversely, Yunnan is the world's single largest tin producer today - a commodity that is indispensable for bronze production, but only found in mineable concentrations in a few places around the globe. Bronze appears rather early in Yunnan, in high technical and artistic sophistication, geographically disconnected from the main entrance route of bronzeworking into East Asia along the northern branch of the silk road.

https://en.wikipedia.org/wiki/Yunnan#History
"By this time (2nd ct. BC), agricultural technology in Yunnan had improved markedly. The local people used bronze tools, plows and kept a variety of livestock, including cattle, horses, sheep, goats, pigs and dogs. Anthropologists have determined that these people were related to the people now known as the Tai."

https://en.wikipedia.org/wiki/Dian_Kingdom

Moreover, the standard theory of bronzemaking being disseminated southward from Northern China into SEA is more and more getting into conflict with C14 dating of SEA sites. Discussion is still on-going.

Hi, Tobus, thanks for the stats. It looks like Dai might be slightly closer to Onge, although not at significant levels. According to the analysis of Khrunin et al., the Han, Sherpa, Dai and Malaysians harbour about 19% Australian-like admixture. Does anybody else have any ideas?

Whereas Armenian has much more LBK_EN than CHG related ancestry, Georgian shows no preference for either LBK_EN or CHG. Clearly Georgian has much more CHG ancestry than Armenian. The Caucasus mountains have been quite effective in impeding gene flow. I had meant to request the statistics for Iranian but had mistyped.

Chimp Iranian LBK_EN Kotias

I expect Iranian to favor LBK_EN over Kotias but I may be wrong and perhaps Iranian too will have no preference.

Alberto, I think the "secret" of this 3-D PCA Plot is that in fact it does include Africans. The way to examine West Eurasians closely is just to rotate the plot appropriately, and then zoom in real close on that smaller section of Eurasia, and examine the 3-D relationships. There's no way we could have seen the "pull" toward Sub-Saharan Africans in the Palestinians, Bedouin and North Africans which is *not* from the EF Early Farmers unless we have Mota and the other Africans on the plot. The most striking finding is that the EF (Early Farmers) are indeed *the* "Basal Eurasian" branch, and the CHGs (Caucasus Hunter-Gatherers) are in fact *not* any sort of "Basal Eurasian" but something headed out to Ust'-Ishim, the Austroasiatics, and the Austronesians of Taiwan. Call it "ASI/ANI" if you will. ("ASI" includes the Andamanese.) Likewise, the WHG/SHG/EHG group is related to Mal'ta boy and on to the Native Americans, it's a kind of "WHG-ANE" continuum. Of course, this is precisely what we've seen in TreeMix, except there's a bit of confusion between the Early *European* Farmers and the EF "Basal Eurasians".

Given that there are "poles of drift" - or rather, the "points, tail and string of the kite" ;) then perhaps TreeMix would work best with Atayal, Karitiana, Ust'-Ishim, MA-1, Kostenki K14, and the Starcevo Early Farmer KO2 (who seems to be "ultra-Anatolian Farmer"), with Mota and the Ju-Hoan as outgroups. The Papuans / Australians may prove useful too. That way, adding the CHGs (preferably Satsurblia over Kotias) and the various "European" Hunter-Gatherers will reveal the real combination of admixture found in any "test" individual or population.

Also, is Saqqaq Man in the data? What about Clovis Anzick-1 and Kennewick? Given that Native Americans are one "pole" of admixture, these ancient genomes are going to be very important, particularly to fill in the "ANE" migration path between Mal'ta boy and the Native Americans. It seems that since the R1a1* Karelian EHG comes out at "16% Native American", it's very important to have these ancient Native Americans to distinguish any so-called "ANE" from the Caucasus and the area of Tajikistan from something related to Scandinavian Q-L805 which is in a "Native American" Y clade, Q-M930 that also includes Q-M3 (Kennewick) and below Q-M1107 which includes the Q-Z780 sister clade of Q-M930 to which Clovis Anzick-1 belongs. It would seem that this one single Q-L805 represents the unique instance of actual Beringian admixture in north Eurasia. BTW, the sample I0434 from Khvalynsk is Q-L474 xL56, in the same clade as Saqqaq Man.

I think with these additional ancient American samples the North Eurasian drift towards the Native Americans will become clearer.

Really, all Eurasians are some combination of drift toward or away from KO2, the Ami/Atayal, and the Karitiana, and back towards Mota, except for the Oceanians, whose Denisovan admixture pulls them in another direction.

Let's see what these other ancient American genomes do to the PCA and TreeMix.

Q-L805 is not the only Beringian suspect, there are also the Eurasian mitochondrial C1 clades (and it is possible that more upstream clades like L330 are Beringian too). We don't know if I0434 is more related to Saqqaq than any given Q1a; Saqqaq is in Q1a1a-NWT01(xM120) specifically, the Khvalynsk man is just Q1a(xQ1a2).

Amerindians are a "pole" of admixture because they have a lot of specific drift, not necessarily because they have any importance as an ancestral population outside the New World - though I do think there is likely to be significant Beringian ancestry in Eurasia, I strongly doubt it is the main source of ANE.

I'm still working on the ADMIXTURE stuff, but I can tell you that this Scythian has around 10% of Siberian ancestry, and I'm not talking about ANE here. Much more than Finns, but not as much as Chuvash.

Thank you very much. The statistics that you have provided show that CHG could not have moved from the Caucasus to the Indian Subcontinent. CHG had a hard enough time going from Georgia to Armenia. From the Caucasus to Iran to the Subcontinent there are more formidable geographical barriers.

Iranian has more CHG-related ancestry than Armenian. I think this is because Iranian received CHG-related ancestry both from the Caucasus and from India. Still Iranian has more LBK_EN related ancestry than CHG-related ancestry.

Chimp Iranian LBK_EN Kotias -0.0099 -2.857 507266
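To make the sign of such a statistic less mysterious, here is a minimal sketch of how D(W, X; Y, Z) can be computed from per-SNP allele frequencies, following the ratio form used in the ADMIXTOOLS papers as I understand it. The toy SNPs are invented, and the Z-score step is left out; real tools get it from a block jackknife over the genome.

```python
def d_statistic(freqs):
    """D(W, X; Y, Z) from an iterable of per-SNP (w, x, y, z) allele frequencies."""
    num = 0.0
    den = 0.0
    for w, x, y, z in freqs:
        num += (w - x) * (y - z)
        den += (w + x - 2 * w * x) * (y + z - 2 * y * z)
    return num / den

# Hypothetical toy data: W is an outgroup fixed for the ancestral allele (0.0).
snps = [
    (0.0, 1.0, 1.0, 0.0),   # X shares the derived allele with Y
    (0.0, 1.0, 1.0, 0.0),   # X shares the derived allele with Y
    (0.0, 1.0, 0.0, 1.0),   # X shares the derived allele with Z
    (0.0, 0.0, 1.0, 0.0),   # uninformative: W and X agree
]

d = d_statistic(snps)
print(round(d, 4))          # negative here: X leans toward Y
```

With W as the outgroup, a negative value means X shares more derived alleles with Y than with Z, which matches reading the Iranian line above as a lean toward LBK_EN, although |Z| = 2.857 falls just short of the usual threshold of 3.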

All this goes to show that agriculture in South Asia which is at least 10,000 years old did not come with migrants from the Near East. They would have had more of LBK_EN ancestry. It was an indigenous development. This also means that ANI has been in the Subcontinent since the Late Pleistocene. ADMIXTURE analysis further suggests that CHG-related ancestry in South Asia is of the Gedrosia kind different from the Caucasus kind.

"agriculture in South Asia which is at least 10,000 years old did not come with migrants from the Near East. They would have had more of LBK_EN ancestry."

It is an interesting situation for sure. The archaeology shows multiple centers of early farming with local plants. Yet clearly, agriculturalists across the entire fertile crescent very early on started using the exact same domesticated crops. Any useful traits were bred into their own local landraces.

So it seems that in that region 10,000 years ago, at least some seeds and animals were traded and passed around much faster than people were admixing.

And some farming knowledge must have also been passed around. How else could such diverse people all coincidentally domesticate the exact same eight plant species at exactly the same time?

I think Dai having West Eurasian ancestry is causing them to look closer than the Onge. It's a weak fit to make the Dai without Siberian and Onge Admixture. Using Onge and Atayal makes a much better fit for ASI.

@capra internetensis, the idea that Native Americans are a "pole of admixture" does not mean that in fact Eurasians have (any substantial) "Beringian" ancestry, aside from the somewhat small clades you mentioned. Rather, the correct term should be a "pole of drift", where Central Siberian migration to the Americas was the end result of a process of isolation and drift we already see in Eurasia, i.e. with Mal'ta boy. Regardless, this "pole of drift" is in fact important for our understanding of Eurasian drift and admixture. As we know, "ANE" was modeled on the derived alleles shared between the Karitiana and Mal'ta boy, so this represents what has been called "ANE", even if the concept may not be entirely accurate regarding more southerly Eurasians such as in the Northeast Caucasus and the other "ANE hotspot" around Tajikistan.

The value in using the ancient Native American genomes is of course that they are much closer in time (and space) to the Eurasian source of the drift. It may be that Saqqaq man is purely Paleo-Siberian (probably a Koryak) and therefore from a completely different source population and migration than Clovis and Kennewick. Between this apparent "Dorset" population in Eastern Canada and Greenland, the Na-Dene related to the Kets and other Yeniseians, the Amerinds / "First Americans" (with an "East Asian" as well as an "ANE" component to their ancestry), and the apparent minor "Papuan" element among a few South American tribes like the Karitiana, we can see there were quite a few populations that contributed to this "pole of drift". This is really why Native Americans are at the extreme of a "triangle" rather than a "line". (Notice too that South American tribes "make a turn" in the general drift in the Americas, due to some additional ancestral element, perhaps this "Papuan" ancestry.)

Regardless, since the Native Americans are at several extremes of drift that was already taking place before the settlement of the Americas, all of these ancestral components accentuate and emphasize this Eurasian drift in a way that would not be possible if they were not on the 3-D PCA.

I can think of other apparent population isolates that are not in this Human Origins Array dataset, namely the Onge and the Tibetans, the Semang of Malaysia and the Aeta of the Philippines. I suspect that the Tibetans in particular will show up at some unusual place within the triangle because of their long isolation due to their physical adaptations to the extreme altitude of the Tibetan Plateau. We can see this in their unusually high percentage of Y haplogroup D, just like the Andamanese and Japanese, other physically isolated East Asian populations.

Perhaps something can be done to "round out" the dataset by including these other isolates along with the ancient Americans?

I suspect that this may create some "pull outward" even for such Siberian-admixed populations like the Karelian Hunter-Gatherers and further clarify the PCA plot.

The main point here is that even a close examination of the PCA of a small region on the plot, such as Europe, cannot be done properly without including *all* extremes of drift in the same analysis. Without the Atayal, Karitiana, the Mbuti and the San on the very same plot as the LBK, Corded Ware and Bell Beaker samples, we would never have seen that the CHGs were very different from the Early Farmers ("Basal Eurasians") and headed in the direction of the Austronesians, that the European Hunter-Gatherers (all three groups) were in fact headed in the direction of the Native Americans, the true nature of "LBK" (in fact, EF) admixture in Africa, or that the EFs are the only true "Basal Eurasians" and not at all "Bedouin_B-like".

@David, can we have a Global9 with Clovis, Kennewick, Saqqaq, Tibetans, Onge, Semang, Aeta (or related people), to "round out" the PCA? It seems reasonable that the Europeans and the CHGs will be "less compressed" if these were on the plot, because they should accentuate the sources of drift in Eurasia. Thanks.

However Atayal gives similar West Eurasian shifts compared to Onge as Dai do, so if there is that kind of ancestry in Dai it should be in Atayal too. This would leave Onge as the one pure ENA reference since Papuans etc. are complicated by archaic admixture.

After reading FrankN's interesting comment and looking at the stats, it's looking more like most of what we call ASI could be a late migration from SE Asia to India. This migration is also supported by a recent study from National Geographic regarding Y haplogroup O-M95.

This would also be a more parsimonious explanation for the late estimates of admixture between ANI and ASI. It looks quite clear that ANI was in the Indus Valley long before 2200 BC (the oldest estimated date of admixture), so it could have been ASI which arrived there during the late Harappan period.

I don't think that all the ANI-ASI admixture will turn out to be a single event/migration. It's probably going to be quite a bit more complicated, with different waves at different times, both ways. But it's looking like the biggest event might have been this hypothetical late Bronze Age migration from SE Asia to India.

This would increase the chances of the Harappan DNA (if/when it comes) being pure ANI, which would be quite interesting too.

"it's looking more like most of what we call ASI could be a late migration from SE Asia to India"

Then who was in India before that? ANE like people, CHG like people? This should have been a territory with a large sustained population for a very long time. How could they have disappeared with so little a trace in such recent history?

The Kharia have more admixture from a Dai/Atayal group. Paniyas should be more Onge like. Onge like people are probably native to South Asia, and I would be surprised if something ANI like dates to the Paleolithic/Mesolithic. That may be why the South Asian cluster is a pain in the ass to break down. It's a mixed and heavily drifted group. I think the Austronesian came later on, more like the Mesolithic to Neolithic timeframe.

Yes, before ASI arrived in North India, the people would be something like CHG + ANE. That's still the base of the populations of North India and Pakistan, so they didn't disappear, they just got influx from ASI populations.

Chad is probably right. There was probably an Onge-like component in South India earlier, and during the Bronze Age a SE Asian migration might have taken place, bringing Austro-Asiatic and expanding southern populations to the north.

Probably a complicated history, but in any case the point is mostly about the study of ANI-ASI admixture from a few years back with age estimates between 2200 BC and 900 BC (?). This was taken by many as proof of Aryan invasions, but it's looking more like what it was detecting was a Dai-like migration to India and the subsequent ASI (a mix of Dai and Onge) expansion to the north.

Hypothetical, of course. But now more parsimonious than the old theory of Aryan Invasion, I think.

Zero result, no sign of any age effect (at least using genomes with decent coverage).

@Alberto

There was certainly a late Neolithic migration (or multiple waves of migration) from Southern China/Southeast Asia into India (c. 2000 BC?), bringing Austroasiatic languages, polished shouldered axes, and corded ware, as well as Y haplogroup O2a1-M95. Some of the Neolithic Gangetic sites have very early dates, before 6000 BC, but I'm not sure whether these are securely associated with Southeast Asian elements.

But this wave is associated with Austroasiatic tribals, and to a lesser degree with East Indians generally. O2a1 is Holocene age (there are not enough samples, but I suspect in East India it is largely or almost entirely the young O2a1a2-F789 clade). South Indians with high ASI have negligible Y DNA O and do not show the Southeast Asian component in HarappaWorld admixture that Munda do (nor any significant East Eurasian outside of what is contained in the South Indian component).

There are earlier connections with Southeast Asia, e.g. Hoabinhian-type lithics, but this is all poorly dated. During the LGM there was mostly horrible desert lying to the northwest of India (though along the edge of the Himalayas and Pamirs was probably OK) while India was covered mostly with savanna grassland and open forest, separated from not too different habitats in Southeast Asia by the Naga Hills.

Altogether I see no reason to think ASI is predominantly due to late gene flow from the East. I also think the Harappans were mainly ANI, but the earliest admixture dates come from Dravidian speakers of South India, and may represent the arrival of Neolithic/Chalcolithic farmers/pastoralists from the north. I guess the situation in the subcontinent was quite complicated, with plenty of gene flow in and out, and with major autochthonous components. It will be very hard to disentangle without aDNA.

The North Indian admixture dates from Moorjani et al are very late indeed, Iron Age and historical era. Considerably later than the appearance of Southeast Asian Neolithic elements in the Ganges valley and the eastward migration of the Harappans. There must have been early admixture events but they are being obscured by late ones.

Alberto, hi. Interesting hypothesis. Like Capra, however, I was going to point out that Y hg O and Austroasiatic are not common enough in India to account for ASI? But I'm sure you've thought about this :)

Is the Scythian a ghost population that contributed to both Lithuanians and the Kalash?

No, the early Indo-Europeans from the Bronze Age steppe carrying R1a-M417/Z645 are the ghost population that contributed to Scythians, Lithuanians and South Asians.

There might be some minor Scythian ancestry in Lithuanians. But it can't be much considering the very low level of R1a-Z93 in the East Baltic and Siberian admixture at only a couple per cent, if even that. That Scythian is R1a-Z93 and has around 10% of Siberian admixture.

Thanks, I don't know much about the details of Indian prehistory so it's good to hear a good summary and that it's not in disagreement with what I'm more or less seeing.

Indeed, the Y hg O is restricted to Austro-asiatic and not too relevant in itself, but the tests so far don't seem to show 2 clearly different types of ASI. The ENA in Paniya and in Kharia don't look too different, and both look quite Dai-like, and Dai itself being a mix of Atayal-like and Onge or Papuan-like. So yes, probably a complicated history, but with a result that these components are mixed more or less equally in South Indian and in SE Asian populations.

Maybe further tests will be able to find the difference (like Admixture does, though Admixture could be doing so for other reasons), but for now it looks to me that whichever migrations took place from SE Asia to India seem to have homogenized the ENA component, regardless of hg O or AA language. Or maybe we just don't have any good proxy for the "real" ASI, so it shows up as Atayal+Onge/Papuan because Onge is just too drifted and not too related to continental ASI.

@ryukendo: is it possible for you to pass the data for CHG to Chad or Tobus in any form?

David has said the CHG data is freely available from the author, so I should be able to get it easily enough. The issue is that it can take a while to process and merge with my existing data set, depending on the format etc., and I don't have the time to dedicate to that at present... maybe this weekend I'll give it a go.

Karelian gives only 0.5% into Atayal, and that's definitely too little to explain the significant shift of Karelia HG towards Atayal over Onge. Maybe the formula from the Flegontov paper, though it worked for Siberians and Native Americans, just isn't good enough here.

Kostenki was 1%, but that isn't a very good reference since Onge shares additional drift over Loschbour over it.

That should bring EHG closer to Onge too, assuming it is in the same clade as Dai and Atayal. North Siberians and Native Americans also prefer Atayal over Onge, which should not happen if both Atayal and Onge are fully ENA.

I'm not sure where you guys are getting these 1% and 6% numbers. That is not what qpAdm shows. Best fit for Dai is a mix of Atayal, Nganasan, and Onge. If we have more Karelia into Dai and more Onge into Dai, then both can appear more closely related than they are. Onge are further from West Eurasians than the Dai and Atayal. Just Onge into Dai would make the Dai a bit further from West Eurasians, but the additional West Eurasian into Dai almost evens it out.

You are probably already well aware of this, but it hasn't been mentioned here. The early Mehrgarh folk were largely Sundadonts, which pretty much necessitates ancestry from SE Asia (or Dai-like).

One thing to keep in mind is that Sundaland has been very heavily Sinicized, so a good proxy might be something more akin to Ainu at its northernmost edge (assuming they and Okinawans are slightly more Jomonese, less Yayoi-ized in their make-up). In that case, this Dai-Mehrgarh component of ASI or whatever might have had paternal haplogroups more like C & D..?

The Austroasiatic (Munda) is doubtless, but should AFAIK have come from somewhere more south than Yunnan. The issue of the Tai-Kadai homeland appears to be nearly as intensively debated as the IE homeland, so I refrain from any opinion on whether during the Late Neolithic the Dai already lived where they are recorded today, or much further towards the South Chinese coast. In any case, they don't appear to be a particularly good proxy for a "pure" population.

"Recent archaeological discoveries in Harappa and Chanhu-daro suggest that sericulture, employing wild silk threads from native silkworm species, existed in South Asia during the time of the Indus Valley Civilization dating between 2450 BC and 2000 BC, while evidence for silk production in China back to around 2570 BC and earlier.[4][5] The Indus silks were obtained from more than one species Antheraea and Philosamia (Eri silk). Antheraea assamensis and A. mylitta were widely used. It is widely believed that silk process techniques of degumming and reeling were purely Chinese technology."

Hence, I tend to stick to my "first half of the 3rd millennium" dating. 2000 BC seems to be slightly too young for the move into India, though possibly correct for a "tin explorer and bronze producer" India-to-SW China/SEA scenario.

Interesting, and confirming something I have already been supposing for some time. The possible links may be the "Sea Nomads", today scattered in three groups (Andaman Sea, Southern Sumatra, Borneo/Sulawesi/Southern Philippines), but possibly a far more widespread phenomenon in ancient times (I wonder if any DNA analysis has ever been done on them..)

https://en.wikipedia.org/wiki/Sama-Bajau_peoples

Why an "anciently more widespread phenomenon"? For a start, Sulawesi/E. Borneo, i.e. today's epicentre of the Sama Bajau, corresponds to the genetic and linguistic "homeland" of the Malagasy people on Madagascar. Nearby Halmahera, the largest island of the Moluccas, has been demonstrated as the genetic origin of the Polynesian rat, and is as such believed to be the origin of the Lapita expansion (from 3000 BC) into Melanesia, Samoa and Tonga. There is evidence for obsidian trade from New Britain to NE Borneo at the end of the 4th mill. BC, a distance of 3,500 km!

In addition, domesticated coconut from the South Philippines was shipped to Southern Ecuador around 300 BC: http://link.springer.com/article/10.1007%2Fs10722-008-9362-6

Last but not least, there is the story of the banana: Originally domesticated on New Guinea, with additional hybridisation in the Southern Philippines and a second one somewhere around the South China Sea, all before 3000 BC. From 2000 BC, there is archeological evidence of bananas in Pakistan. Most banana terms on the Indian subcontinent can be traced back to *qaRutay, a root that developed in the Northern Philippines. Reflexes of this root are present both on Northern Sumatra and the Nicobars, and along a 'land route' through North Vietnam, Yunnan, Burma and Northern Bangladesh, which makes it difficult to define the migration path. By about the same time at the latest, Papuan/East Indonesian domesticates reached East Africa (East African 'banana' terms are pre-Bantu substrate, which provides a terminus ante quem). A separate transfer brought bananas from around the Celebes Sea directly to West Africa, i.e. without the genotypes in question being found in India, Arabia or East Africa, with the first archeological evidence (Cameroon) dating to around 500 BC. http://www.pnas.org/content/108/28/11311.full

English 'banana' is believed to have been borrowed from Wolof 'banaana', which may be a reflex of the root *punti that is widespread around Eastern Indonesia, and also spread eastward into Melanesia. The origin of Spanish 'plátano' is somewhat obscure: It is assumed to have been borrowed from a Carib language, which, however, would imply a pre-Columbian presence of bananas in the Caribbean.

In short: A maritime network centered around the Celebes Sea, today's home of the Sama Bajau "sea-nomads", appears to have existed at least from 3000 BC onwards. Around 300 BC, this network stretched from Cameroon to Ecuador, probably sporadically, but intensively enough to transplant bananas and coconuts, and allow for colonisation of Madagascar and Polynesia. The Andamans (plus Sri Lanka, Maldives etc.) would have been apt stopovers. If anybody gets bored over long winter nights and feels like running some admix statistics along the above-mentioned routes, say Buginese (Sulawesi) vs. Onge, Mbum (Cameroon), Amerindians from Ecuador, I'd be curious about the results.

We are doing f4 ratio estimates for the connection between Dai and EHG, so far with no success.

@bellbeakerblogger

I am somewhat skeptical of the value of dental morphology in tracing long-term genetic relationships (as opposed to the appearance of novel populations, etc). Sundadonty in particular seems to be a relatively generic pattern, possibly close to the ancestral form; e.g. Africans and some mixed populations (like South Siberians) cluster near to Sundadonts.

@FrankN

I expect it was complicated, as usual. The questions of Daic, Tibeto-Burman, and Austroasiatic homelands have already seen some genetic study at relatively low resolution, but the proliferation of full sequences and the likelihood of more ancient DNA from China are very promising.

As I wait for the Indian DNA to wash out some of the bullshit here, here is something related to the anthropology. There appear to be two types of hunters in the Holocene. The first type clustered strongly with Upper Paleolithic Europeans and was concentrated in the Ganges plain and further west; some even find that these north Indians were taller than other Mesolithic populations of Eastern and Western Europe! The second type of hunter contrasted with the Ganges type and was concentrated in the South. A good hypothesis is that the Ganges type was perhaps related to ANE and the southern type was related to South Eurasian (ASI). The Harappans were largely Caucasoid, the same as the modern North Indian populations around Haryana etc. I think we just have to wait for the DNA to solve the riddle. http://scholarspace.manoa.hawaii.edu/bitstream/handle/10125/29101/AP_V49No1_singh.pdf?sequence=1

"...all populations carrying East Asian post-neolithic ancestry, incl Southeast Asians, Polynesians and Indians such as the Austroasiatics, have high levels of EDAR, while all ENA populations without East Asian post-Neolithic ancestry, such as Papuans and Onge, do not."

Chaubey et al. (2011) have published the following figures for the frequency of the 1540C allele of the EDAR gene in their Indian samples grouped by language family:

61% in Tibeto-Burmans (but with only 57 samples), 40% in Khasis (but with only 20 samples), 5% in Mundas, 1% in Aryans, 0% in Dravidians.

The frequency of EDAR 1540C does appear to be moderately high in the Khasis (though not nearly fixed as it is in e.g. Native Americans, northern Han Chinese, or Koreans), but it is actually quite low in Kolarian populations of India. Rather than saying that "Indians such as the Austroasiatics...have high levels of EDAR," I think it would be prudent to say that Munda-speaking populations exhibit non-zero frequencies of the EDAR 1540C allele.
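Since two of those figures come with small sample counts, a rough idea of the uncertainty helps. The sketch below computes a 95% Wilson score interval for the two groups whose sample sizes are quoted above. It assumes, as a simplification, that each reported sample counts as one independent observation (the paper's actual chromosome counts are not given here), and the function name `wilson_interval` is just an illustrative helper.

```python
import math

def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a proportion p_hat observed in n trials."""
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

# Frequencies and sample counts as quoted in the comment above.
# Assumption: n is treated as the number of independent observations.
reported = {
    "Tibeto-Burmans": (0.61, 57),
    "Khasis": (0.40, 20),
}

intervals = {name: wilson_interval(p, n) for name, (p, n) in reported.items()}
for name, (lo, hi) in intervals.items():
    print(f"{name}: {lo:.2f} to {hi:.2f}")
```

Under that assumption the Khasi estimate in particular is quite loose, which fits the caution about the sample sizes.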

I would think the -ve sign is irrelevant there, no? I mean, it's just because Chad changed the order of the Onge and Gorilla, so both results are negative. And -/- = + (the results would be -ve only if one was -ve and the other +ve).
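The sign argument can be checked with a small sketch. It uses the textbook definition of f4 as the average over SNPs of (p_a - p_b) * (p_c - p_d), and a generic f4 ratio of the form f4(A, O; X, C) / f4(A, O; B, C). The population labels and allele frequencies below are invented placeholders, not the actual populations or results from the tests discussed in this thread.

```python
import math

def f4(freqs, a, b, c, d):
    """f4(a, b; c, d): mean over SNPs of (p_a - p_b) * (p_c - p_d)."""
    rows = zip(freqs[a], freqs[b], freqs[c], freqs[d])
    prods = [(pa - pb) * (pc - pd) for pa, pb, pc, pd in rows]
    return sum(prods) / len(prods)

def f4_ratio(freqs, numerator, denominator):
    """Ratio of two f4 statistics, each given as a 4-tuple of labels."""
    return f4(freqs, *numerator) / f4(freqs, *denominator)

# Made-up allele frequencies at four SNPs for five placeholder populations.
freqs = {
    "A": [0.1, 0.5, 0.9, 0.3],
    "O": [0.2, 0.4, 0.7, 0.6],
    "X": [0.3, 0.6, 0.8, 0.2],
    "B": [0.5, 0.7, 0.6, 0.1],
    "C": [0.4, 0.2, 0.5, 0.8],
}

original = f4_ratio(freqs, ("A", "O", "X", "C"), ("A", "O", "B", "C"))
# Swap the order of the second pair in BOTH numerator and denominator.
swapped = f4_ratio(freqs, ("A", "O", "C", "X"), ("A", "O", "C", "B"))

print(f4(freqs, "A", "O", "X", "C"), f4(freqs, "A", "O", "C", "X"))
print(original, swapped)
```

Swapping two populations within a pair flips the sign of each f4 on its own, but when the same swap is applied to numerator and denominator the ratio is unchanged.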

@Shaikorth

But it doesn't seem very clear that Dai and Onge form any kind of tight clade. The 6% Dai in Karelia_HG is probably more from a Han-like source from Siberia, so not related to Onge.

@RK

Yes, I'm not saying that ASI didn't actually exist in South India long before as a specific component. But with the samples we have now and the tests run so far, it doesn't seem to show a specifically different pattern from SE Asian. Maybe it's just that we don't have the right samples to see it, but it looks strange that Paniya is no more Onge-like vs. Atayal-like than Dai is. Some kind of homogenization seems to have taken place, even if it didn't bring AA language, Y hg O, EDAR or straight hair to Paniya (or the opposite to Dai). But let's see whether further tests can actually find differences in the components or not.

Razib shared data which included Paniya samples a few years ago, but I recall that some of the samples looked like they were mislabeled. Or there are two unrelated populations of Paniya? Anyway, when you get the Paniya samples, please check that you have a single population before analyzing them as a meaningless mixture.

In the samples I have, GSM536916 is not the same population as GSM536806, GSM536807 and GSM536808 but they are all labeled Paniya.

And it is possible that I made some copy and paste mistake when I was first learning how to work with the data following Razib's tutorials. I am not an expert in this.

Interesting work, your 3D PCA, thanks for sharing. But I've made some observations that seem kind of odd:
- The CHG are in no way outliers but cluster closely with some modern people from SE Europe, West Asia and the Caucasus.
- The BA Armenians plot very far from each other. One is like a true outlying pole of genetic variation, much more than the CHG, while another one plots close to central European Bell Beakers, Sintashta, Swedish Battle Axe and modern Latvians! I didn't see anything that extremely northern or divergent among the BA Armenians in previous analyses. So in this PCA the IA Armenian can be modeled as a mix of different BA Armenians.
- A modern Makrani plots close to Estonians, Loschbour and Bichon. That's too odd to be true.
- The Andronovo people are extremely diverse. Some are close to Corded Ware, another one is far off in the Aleut area. I think that may be the admixed one, so this observation is less odd than the others.