September 28, 2010

Some ADMIXTURE estimates in Eurasia

(Last Update: Sep 29)

Continuing my exploration of ADMIXTURE, I turned to the HGDP data, which has 660,918 SNPs for a wide assortment of worldwide populations. After pruning 12,086 SNPs with more than 1% missing genotypes, I was still left with ~650k SNPs.
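The post does not say which tool did the pruning, so what follows is only a minimal sketch of the idea: drop every SNP whose share of missing genotype calls exceeds 1%. The function name, the 0/1/2 allele-count encoding, and the -1 missing code are assumptions made for illustration. In practice PLINK's `--geno 0.01` option applies the same kind of per-SNP filter.

```python
import numpy as np

def prune_by_missingness(genotypes, max_missing=0.01, missing_code=-1):
    """Drop SNP columns whose missing-genotype rate exceeds max_missing.

    genotypes: 2-D array, rows = individuals, columns = SNPs. Entries are
    allele counts (0/1/2); missing_code marks a missing call. This encoding
    is a hypothetical choice for the sketch, not the HGDP file format.
    Returns (filtered_matrix, boolean_keep_mask).
    """
    genotypes = np.asarray(genotypes)
    # Fraction of individuals with a missing call, computed per SNP.
    missing_rate = (genotypes == missing_code).mean(axis=0)
    keep = missing_rate <= max_missing
    return genotypes[:, keep], keep

# Toy data: 200 individuals, 3 SNPs.
toy = np.zeros((200, 3), dtype=int)
toy[:1, 1] = -1   # 1 of 200 missing: 0.5%, kept
toy[:5, 2] = -1   # 5 of 200 missing: 2.5%, pruned
filtered, keep = prune_by_missingness(toy)
print(keep.tolist(), filtered.shape)
```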

Here are some experiments on this dataset. First, a clustering with K=2 of Han Chinese, Russians, and Orcadians (left to right):

The emergence of 2 clusters (red=Mongoloid, blue=Caucasoid) is as expected, with Russians showing a small participation in the red cluster (7.2%). These northern Russians are believed to have a substantial Finno-Ugric genetic origin, so this is in line with a recent estimate that the eastern component in the westernmost Finno-Ugric speakers is less than 10% (but see below).
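A population-level figure like the one above is presumably an average over individuals. ADMIXTURE writes one row per individual with K ancestry fractions in its Q output, so the step can be sketched as below. The helper name and the numbers are made up for illustration and are not the values from this run. The population labels are assumed to come from elsewhere (for instance the sample list), since ADMIXTURE itself never sees them.

```python
from collections import defaultdict

def mean_proportions(q_text, labels):
    """Average ADMIXTURE-style Q rows within each population label."""
    rows = [[float(x) for x in line.split()]
            for line in q_text.strip().splitlines()]
    if len(rows) != len(labels):
        raise ValueError("one label per Q row is required")
    grouped = defaultdict(list)
    for label, row in zip(labels, rows):
        grouped[label].append(row)
    # Column-wise mean of the rows belonging to each population.
    return {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in grouped.items()
    }

# Made-up numbers for illustration only; not the post's results.
q_text = """
0.99 0.01
1.00 0.00
0.10 0.90
0.04 0.96
"""
labels = ["Han", "Han", "Russian", "Russian"]
means = mean_proportions(q_text, labels)
for pop, props in means.items():
    print(pop, [round(p, 3) for p in props])
```

Keeping the per-individual rows around, rather than only the averages, is what makes it possible to spot single admixed individuals.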

Notice a couple of Chinese individuals with a small Caucasoid component: as I've mentioned before, Mongolians, and presumably northern Han, have a small Caucasoid component from early movements of Iranian speakers from the west. That's an advantage of doing your own admixture analysis: you can look at the data in fine detail and not rely on the published figures.

Druze appear completely Caucasoid (red), Bantu completely Negroid (save for a couple of individuals), while Bedouins show a quite variable minor Negroid component. This variable African contribution (0-17.6%) makes an elongated cluster out of Bedouins in a recent analysis, pulling them away from other Middle Eastern populations in a Sub-Saharan direction.

Finally, I clustered European populations together with Mandenka and Han Chinese:

The populations are in the following order: Han, Mandenka, Orcadian, French Basque, French, North Italian, Tuscan, Sardinian, Russian.

Here are the admixture proportions:

Notice how the eastern component in Russians is now estimated as 10.9%. This probably reflects the inclusion of French Basque and Sardinians, i.e., populations which historically had no opportunity for eastern Eurasian admixture, rather than only Orcadians. This underscores the importance of having appropriate poles in inter-continental admixture estimates (see Appendix I).

Note also that the 100% value for the Han Chinese is not incompatible with the presence of the two aforementioned Caucasoid-admixed individuals, who are present here with an estimated 1.9% and 0.5% such admixture. However, this contributes little to the sample average of 40+ individuals.
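A quick back-of-the-envelope check of that dilution, as a sketch under stated assumptions: the sample size of exactly 40 is assumed (the text only says 40+), and every other individual is treated as having zero western admixture.

```python
def sample_average(values):
    """Plain arithmetic mean of per-individual percentages."""
    return sum(values) / len(values)

n_individuals = 40  # assumption: the post says only "40+"
# Two admixed individuals (1.9% and 0.5%); the rest assumed to be at 0%.
western = [1.9, 0.5] + [0.0] * (n_individuals - 2)
avg_western = sample_average(western)
print(f"average western share: {avg_western:.2f}%")
print(f"eastern share at whole-percent precision: {round(100 - avg_western)}%")
```

With a larger sample the average western share would be smaller still.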

The minor (0.1%) Sub-Saharan admixture in Tuscans and Sardinians is also interesting. As you can guess from the figure, this stems from a handful of individuals (green specks) with less than 1% admixture, which is, however, more than the numerical low of 0.001% inferred for most Europeans by the software.

UPDATE I: Eurasian Cline

Below is a run for the following populations (left to right: French Basque, Russians, Uygur, Mongolians, Daur, Han Chinese). Notice that the Mongolic speakers (Mongolian and Daur from HGDP) have a small Caucasoid admixture, as I have mentioned before.

APPENDIX I: The importance of choosing poles

The choice of appropriate poles in the estimation of inter-continental admixture is extremely important.

If there is a racial admixture continuum between two major races, such as we observe in Eurasia, then we can express each intermediate population as a weighted sum of populations that live to the east and west of it.

For example, I will use a variable in the interval [0, 1] to represent the position in the continuum, with 0 being pure western and 1 being pure eastern.

A population at 0.4 can be expressed as the following weighted sum:

0.4 = 0.6*0 + 0.4*1

i.e., as an admixture of 60% western, and 40% eastern.

But, it can also be expressed as e.g.,

0.4 ≈ 0.612*0.02 + 0.388*1

Notice that the choice of a slightly eastward-tilted "western pole" (at position 0.02 in the continuum) has resulted in a reduction of the inferred eastern component (from 40% to 38.8%).

This is exactly what happened in our example: the Russian eastern admixture was lower when we used Orcadians, rather than French Basque, as the western pole.
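The arithmetic above can be sketched in a few lines. This is only the one-dimensional toy model of this appendix, not what ADMIXTURE computes internally, and the function name and the extra pole position 0.05 are my own illustrative choices.

```python
def eastern_share(position, west_pole=0.0, east_pole=1.0):
    """Eastern share e that reproduces `position` as a two-way mix:
    position = (1 - e) * west_pole + e * east_pole."""
    if not west_pole <= position <= east_pole or west_pole == east_pole:
        raise ValueError("position must lie between two distinct poles")
    return (position - west_pole) / (east_pole - west_pole)

# The same population (at 0.4) measured against three western poles.
for west in (0.0, 0.02, 0.05):
    e = eastern_share(0.4, west_pole=west)
    print(f"west pole at {west:.2f}: eastern share = {e:.1%}")
```

The further the western pole itself sits from 0, the smaller the eastern share inferred for the same population.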

Note also that this is all done automatically. No one told ADMIXTURE to identify these two poles; it was the presence of unlabeled individuals from different ends of the spectrum that influenced the admixture estimates for the rest.

APPENDIX II: Latent populations

Another important point that needs to be remembered has to do with the possible existence of latent ancestral populations.

For example, it is true that Eurasia (minus South Asia) is economically described as a continuum from the Caucasoids of the Atlantic coast to the Mongoloids of the Pacific, with a transition zone in Central Asia and Siberia, and spillovers on either side. But, we cannot exclude the prehistoric existence of other races in the Eurasian landmass that do not exist today in a relatively unadmixed form.

In Eurasia, the Proto-Uralic race was postulated as such a "third race" with features of its own and not reducible to simple Caucasoid-Mongoloid admixture. It is difficult to see whether these features are ancestral peculiarities (prior to admixture with Caucasoids and Mongoloids), or if they have arisen in a mixed Caucasoid-Mongoloid population.

It is also important to understand how such latent populations affect genetic continua:

First, if the latent population is equidistant from the two major races, then its admixture has no effect on an individual's position in the continuum between the two races. However, it is possible that the latent population was more related to one of the two major races. In that case, admixture with it will move a population towards that race.

So while the jury is still out about the existence of a Proto-Uralic race in Eurasia, its effects on admixed populations indicate that, if it existed, it was genetically closer to Mongoloids than to Caucasoids.

24 comments:

I wouldn't use Orcadians for any population admixture analysis, they are NOT a good sample!

On 23AndMe they are skewed far away from the other European clusters, just like the Basque and Sardinians. Americans who are of British, Scandinavian and Native American heritage, frequently cluster with the Orcadians - providing evidence that there is some Central Asian/Native American heritage in the Orkenyar - or else that there is some very ancient North Eurasian heritage there. Either way, they don't represent Northern or Western Europeans too well.

On DeCodeMe, although I am Irish, I am closer to the French and Icelandic samples than the Orkney one - so a poor proxy for British/Irish too.

I wouldn't use Orcadians for any population admixture analysis, they are NOT a good sample!

I wouldn't use not just Orcadians, but also any other isolated population like Sardinians and Basques. Caucasoidness should be measured using typical West European populations like the English, French, etc. as poles, just as Mongoloidness is measured using typical central East Asian populations like the Han Chinese, Japanese, etc. as poles.

I don't know much about Orcadian history. But if they are genetically non-isolated (unlike Basques and Sardinians) and are genetically typical West Europeans, they can be used as a pole for Caucasoidness.

I should put it this way: Basques are from a genetically relatively inbred/isolated segment of SW Europeans, so they shouldn't be used as a pole for Caucasoidness. Spaniards as a whole or the Portuguese represent the genetics of SW Europe much better than Basques.

I should put it this way: Basques are from a genetically relatively inbred/isolated segment of SW Europeans, so they shouldn't be used as a pole for Caucasoidness. Spaniards as a whole or the Portuguese represent the genetics of SW Europe much better than Basques.

Basques deviate from other Iberians to some extent (not necessarily very much) in all the freely accessible studies (the study you mention isn't freely accessible) I've seen that included them both. But other Iberians too are distinct from other Europeans to some extent.

My proposal is this: If we are to calculate the Caucasoidness and Mongoloidness of a population, we'd better compare it to a bundle of West European and a bundle of East Asian populations. This will prevent the effects of population-specific deviations.

If we were to use Iberians in general, we would be using a population with North African (and in some cases Negroid) admixture.

My proposal is this: If we are to calculate the Caucasoidness and Mongoloidness of a population, we'd better compare it to a bundle of West European and a bundle of East Asian populations. This will prevent the effects of population-specific deviations.

Your proposal is misguided, as there are Caucasoid/East Asian populations with East Asian/Caucasoid admixture. Hence, your study design would not produce an accurate estimate.

If one is interested in determining how much gold was added to an alloy, they do it by measuring against pure (or as unadmixed as possible) gold, they do not measure against what most "gold" artifacts are.

You guys are way over me, but isn't race genetics about structure and not viewing populations as being elemental (irreducible)?

Supposedly the French Basques are the most "European" because they've had less admixture. But that implies there is a parent population shared by Europeans being considered pure. Haven't the Basques been drifting from this parent population due to evolution? Also, that parent population was never perfectly homogeneous. I just don't understand how you decide on an elemental population. The above, combined with the fact that the Basques are more isolated from other Europeans than, say, the English, possibly makes them less European. Wouldn't the English better capture recent European-specific mutations?

What is the Negroid admixture in Spaniards? I think it's being exaggerated too. I think the admixture in all of Spain would be comparable to most Europeans. The southern Portuguese would be the only exceptional Iberian population in this regard, but only due to recent migration.

there are Caucasoid/East Asian populations with East Asian/Caucasoid admixture

If ingredients of the West European and East Asian averages are chosen from the racially purest West European and East Asian populations, then their averages can be used as gold standards for Caucasoidness and Mongoloidness respectively. A similar gold standard for Dravidoidness (South Asian race) can be established from an average of the racially most Dravidoid South Asian populations. In fact, similar gold standards can be established for virtually all races.

Fanty, your third group of maps are implausible as they show the "southern Mongoloid" component in Asia Minor in equal shade with Kazakhstan and show no visible "southern Mongoloid" component in southern Central Asia, and also as they show the "northern Mongoloid" component more in East Europe and Asia Minor than in most of Central Asia. These are all in direct contradiction with the results of Behar et al. and all other genetic studies of these regions so far and also with anthropology and craniometry in addition to genetics.

The dog bites itself into the tail.

"Racial purity" could be only measured by autosomal genetic testing.

But to interpret the results you need to know which are the "racially purest" populations.

So this is doomed to fail.

My formula above is plausible. All I need is experiments to show that it works.

In Russia the largest East Asian component probably is in the regions affected by the Turco-Mongol invasion, like those close to Kazakhstan, Southern Volga and Tatarstan. Turkicworld.org/genetics describes both mtDNA and Y in various regions, and the share of eastern mtDNA is high in these areas but absent in old Finno-Ugric populations like Mari and Komi-Zyryans. In the Far North there might be another source of Eastern mtDNA that originally came from the Paleosiberian Q populations to the Khanty over the Ob-Yenisey river systems. Some of it was transmitted with the fur trade westwards to the Komi and Vepsians and from them to Novgorod. The Saami are very heterogeneous and might have an old Iberian component by the Atlantic coast but also a Q component by the Atlantic coast from the East. A trace of Q can still be found in the Trondheim region in Norway, and might have reached the Orkney islands also. DNA Tribes seems to find some "Amerind" influence over Siberia. Turkicworld.org/genetics now has a rather new article by Anatoli Klyosov that might deserve its own discussion.

The North Han Chinese with 0.3% Caucasian admixture are most likely descended from Hui Chinese. Millions of Hui assimilated during the Mao Zedong era and even abandoned their Muslim faith, and even today there are millions of Hui Chinese living in the northern Han provinces.


Dienekes' Anthropology blog is dedicated to human population genetics, physical anthropology, archaeology, and history.

You are free to reuse any of the materials of this blog for non-commercial purposes, as long as you attribute them to Dienekes Pontikos and provide a link to either the individual blog entry or to Dienekes Anthropology Blog.

Feel free to send e-mail to Dienekes Pontikos, or follow @dienekesp on Twitter.