Siberian aDNA and Turkic, Iranic, and Uralic populations

Hi everyone,

As promised, I ran a set of nMonte fits for Northern Eurasia, from the Scandinavian Peninsula to East Asia, using Davidski's new Global25 datasheet, focusing on the fits for Iranic, Turkic, and Uralic populations. My thinking was that the higher number of dimensions should let the PCA capture more recent drift and so track gene flows better. The results are pretty good.

I limited all contributors to samples from the late Bronze Age to Iron Age and later, or to populations that could plausibly have had gene flow with the target pop in the Iron Age or later.

The reason for this was to avoid long-distance overfitting, i.e., where the algorithm, instead of using proximal sources only, uses a few proximal sources to account for the pop's recent drift plus a large number of small % contributions from very distant sources. E.g. for Greek it uses Mycenean, Slav_Medieval and Anatolia_BA (which makes sense), plus small % Natufian, Dai and Ju_Hoan (which does not). Or it fits Spanish with Iberia_BA, England_IA and Levant_BA (which makes sense) and 1% Mbuti instead of ~5-10% Mozabite (which does not). Or it uses EHG and CHG instead of Yamnaya in Europeans (ditto). The algorithm may do this because the ratios of ancestries in the proximal contributors may not match those of the actual contributors exactly, so by adding small % old or distant ancestries separately it can "fine-tune" the fit until all the distances approach 0, which is overfitting in this case.
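To make the overfitting point concrete, here is a minimal sketch in Python. It is not nMonte itself: it is a simple greedy stand-in that splits 100 "slots" among the sources and moves one slot at a time while the Euclidean distance to the target falls. The population names and coordinates are invented toy values, not Global25 data.

```python
import math

def distance(a, b):
    """Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mix(sources, counts, slots):
    """Coordinates of the mixture: slot-weighted average of the sources."""
    dims = len(next(iter(sources.values())))
    return [sum(sources[n][d] * counts[n] for n in sources) / slots
            for d in range(dims)]

def fit(target, sources, slots=100):
    """Greedy fit (assumption: a stand-in for nMonte, not its algorithm).

    Start with every slot on the first source, then keep moving single
    slots between sources as long as the distance strictly decreases.
    """
    names = list(sources)
    counts = {n: 0 for n in names}
    counts[names[0]] = slots
    best = distance(target, mix(sources, counts, slots))
    improved = True
    while improved:
        improved = False
        for src in names:
            for dst in names:
                if src == dst or counts[src] == 0:
                    continue
                counts[src] -= 1
                counts[dst] += 1
                d = distance(target, mix(sources, counts, slots))
                if d < best - 1e-12:
                    best = d
                    improved = True
                else:
                    # Undo the move: it did not improve the fit.
                    counts[src] += 1
                    counts[dst] -= 1
    return {n: c / slots for n, c in counts.items()}, best

# Toy data (hypothetical): the target sits midway between two proximal
# sources, with a small offset in the third dimension they cannot explain.
target = [0.05, 0.05, 0.004]
proximal = {"proximal_A": [0.10, 0.00, 0.00],
            "proximal_B": [0.00, 0.10, 0.00]}
with_distal = dict(proximal, distal_D=[0.05, 0.05, 0.40])

weights_prox, dist_prox = fit(target, proximal)
weights_all, dist_all = fit(target, with_distal)
print(weights_prox, round(dist_prox, 5))
print(weights_all, round(dist_all, 5))
```

In this toy setup the distant source picks up about 1% of the mixture and the distance falls well below that of the proximal-only fit, even though nothing says the target really has that ancestry. The small distal percentage is only absorbing the mismatch between the proxies and the true contributors.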

I kept trying fits using only proximal populations from the late BA to IA and later until the algorithm produced relatively good fits, which it was able to do in most cases. Many of the fits look very reasonable and make archaeological and historical sense; I am pretty impressed.

Interestingly, the closest populations list displayed patterns at least as interesting as the fits themselves.

I split the populations to the following sets:

SET 1: Altai-centered Turkics

SET 2: Central Asian Turkics

SET 3: Tajiks.

SET 4: Uralic-like Turkics, such as Chuvash and Tatars, + Lipka Tatars.

SET 5: Samoyed-like Uralics.

SET 6: Volga Uralics.

SET 7: Finnics and Saami

Within each set, the populations showed similar lists of closest populations and similar lists of contributors, while the sets differed systematically from one another.

Last edited by Ryukendo; 03-22-2018 at 12:19 AM.

Quoted from this Forum:

"Which superman haplogroup is the toughest - R1a or R1b? And which SNP mutation spoke Indo-European first? There's only one way for us to find out ... fight!"

These populations could not be fit without Ket and Nganasan. The nearest aDNA sample, Karasuk_Outlier, was at a distance of 0.06 from them, i.e. even the nearest population was at a 6% difference in distance. So Ket and Nganasan were added back to improve the fit, and they are themselves broken down later through fits in a second step.
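For anyone wanting to reproduce a "closest populations" list, a minimal sketch follows, assuming it is simply a ranking by Euclidean distance in the PCA coordinates. The reference names and coordinates are hypothetical toy values, chosen so that the nearest reference sits at 0.06.

```python
import math

def closest(target, refs, k=3):
    """Return the k nearest references as (distance, name), nearest first."""
    ranked = sorted((math.dist(target, coords), name)
                    for name, coords in refs.items())
    return ranked[:k]

# Toy coordinates (hypothetical, not real Global25 values).
target = [0.00, 0.00]
refs = {
    "ref_near": [0.06, 0.00],
    "ref_mid": [0.06, 0.08],
    "ref_far": [0.30, 0.40],
}

nearest = closest(target, refs)
for dist, name in nearest:
    print(f"{name}: {dist:.2f}")
```

A nearest distance as large as 0.06 is a sign that no single reference is a good proxy, which is why extra sources had to be added back here.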

For the next two sets of populations, I did not think that they received direct gene flow from populations like Dai in S China, Ulchi in Manchuria etc. after the IA (i.e. it was mediated by some ENA-admixed population from the IA and later, instead of being airlifted across Siberia from far Eastern Asia), so all East Asian populations were purged:

SET 1: Altai Turkics. Note the close distances between aDNA samples and these populations. Note also the consistent appearance of Scythians (especially Scythian_Pazyryk), Altai_IA, Mongola, and Karasuk_Outlier among the closest populations.

BURYAT (they are Mongolic, but autosomally virtually identical to Turkics. Historically, they are known to descend from the Kurykans, who were Turkic. Their language may be due to a linguistic shift after the 13th century, when they were conquered by the Mongols.)

The patterns seem very consistent: Turkics around the Altai are Scythian_Pazyryk+Mongola (Inner Mongolian Mongols, almost pure ENA with very little West Eurasian ancestry) at approximately a 4:1 ratio. As we move into Siberia, Itelmen (Beringian-like) and Karasuk_Outlier ancestry start to appear.

Since all fits were satisfactory, I did not pursue any alternate models with inclusions or exclusions of other populations.

SET 2: Central Asian Turkics. Scythian_AldyBel starts to appear, together with more West Eurasian ancestry from the Caucasus and West Asia, but the Pazyryk+Mongola pattern still dominates. The closest populations still tend to be Scythians, Mongola and Karasuk_Outlier, followed by other East Asians (at a much greater distance), resembling the pattern for SET 1.

There is one population that doesn't fit anywhere: the Yakuts. They are quite far from any other population (the closest I could find were Evenks, at a distance of 0.08, which is still very far) and I could not find good fits for them. The Siberian part of their ancestry is probably badly represented by the samples we have now.