October 18, 2012

Relatives/duplicates in ADMIXTURE

The presence of relatives in a dataset tends to throw ADMIXTURE off, but this does not always happen. In particular, I've noticed that at low K, relatives do not appear to form their own hyper-specific clusters. A good example of this is the Yunusbayev et al. Armenians_Y sample (N=16), which happens to include what appears to be an individual (or a twin?) in common with my own Armenian_D sample from the Dodecad Project. I discovered this the last time I ran ADMIXTURE, and since then I have used a subset of 15 Armenians (Armenians_15_Y) from that dataset whenever I also include my Dodecad sample.
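The overlap above was noticed from ADMIXTURE output, but a direct check is also possible. What follows is only a minimal sketch, assuming genotypes are coded as 0/1/2 allele counts with None for missing calls; the sample IDs, the toy genotypes, and the 0.99 threshold are all hypothetical, and this is a plain genotype-concordance check rather than anything ADMIXTURE itself does.

```python
from itertools import combinations

def concordance(g1, g2):
    """Fraction of SNPs with identical genotype calls, skipping missing (None)."""
    shared = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if not shared:
        return 0.0
    return sum(a == b for a, b in shared) / len(shared)

def likely_duplicates(genotypes, threshold=0.99):
    """Return (id1, id2, concordance) for every pair at or above the threshold.

    The threshold is an assumption; real data would need a value chosen
    with the genotyping error rate in mind.
    """
    hits = []
    for (id1, g1), (id2, g2) in combinations(sorted(genotypes.items()), 2):
        c = concordance(g1, g2)
        if c >= threshold:
            hits.append((id1, id2, c))
    return hits

# Toy data with hypothetical sample IDs, 8 SNPs each.
toy = {
    "Armenians_Y_01": [0, 1, 2, 1, 0, 2, 1, 1],
    "Armenian_D_07":  [0, 1, 2, 1, 0, 2, 1, 1],
    "Armenians_Y_02": [1, 1, 0, 2, 0, 1, 2, 0],
}

for id1, id2, c in likely_duplicates(toy):
    print(id1, id2, c)
```

A pair flagged this way is a candidate for dropping one copy, which is the same outcome as using the 15-individual subset. Close relatives that are not duplicates would score lower and would need a proper relatedness estimate instead.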

In my ongoing analysis of the world dataset, I included two versions of the Sakilli, Paniya, and Malayan samples, from Behar et al. and Chaubey et al. I believe that the HarappaDNA Project has previously found that some of these are not exactly the same individuals, so I wanted to see what the ancestry of all these individuals was, to help me decide which ones to keep.

As I move forward in my "world" analysis, I've decided to drop GSM536916 and the Chaubey et al. versions of Sakilli and Malayan. Thus, PANIYA will refer to the Southeast Asian-like individuals of the Behar et al. set, and Paniya_Ch to the South Asian-like individuals of the Chaubey et al. set, with one copy of the duplicated individual removed.

5 comments:

Have you managed to track down any Australian Aboriginal or Papuan populations? They would make any 'world' analysis much more meaningful than any analysis without them. I would presume that the people who managed to cross Wallace's Line had advanced enough boating technology to expand back west and north as well.

Unfortunately, I cannot perform my clinality test on this particular dataset, since you have opted not to report your K=2 results. I have found that the components ADMIXTURE generates on a global level, even at higher K values, vary significantly as a function of the 'clinality' (or lack thereof) of the particular dataset....

In general, your idea of determining the clinality of a dataset by taking pairwise differences across the sorted order is interesting, and could be useful for real clines that occur within the human species.
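For readers unfamiliar with the idea, here is a rough sketch of what "pairwise differences across the sorted order" could look like. The commenter's actual test is not spelled out in this thread, so the function names, the toy K=2 proportions, and the use of the largest gap as a summary are all assumptions made for illustration.

```python
def sorted_gaps(proportions):
    """Sort populations by their K=2 proportion and return successive gaps.

    `proportions` maps a population name to its share of one of the two
    components (a value between 0 and 1).
    """
    ordered = sorted(proportions.items(), key=lambda kv: kv[1])
    gaps = []
    for (name_a, p_a), (name_b, p_b) in zip(ordered, ordered[1:]):
        # Round to suppress floating-point noise in the differences.
        gaps.append((name_a, name_b, round(p_b - p_a, 6)))
    return gaps

def largest_gap(proportions):
    """Return the neighbouring pair separated by the biggest jump."""
    return max(sorted_gaps(proportions), key=lambda g: g[2])

# Hypothetical K=2 proportions for four made-up populations.
toy_k2 = {"pop_A": 0.02, "pop_B": 0.35, "pop_C": 0.40, "pop_D": 0.98}

for name_a, name_b, gap in sorted_gaps(toy_k2):
    print(name_a, name_b, gap)
```

Small, even gaps would suggest a smooth gradient in the proportions, while a few large jumps would suggest discrete groups. Neighbouring positions in this ordering say nothing by themselves about shared origins.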

However, at K=2 there is no such cline in the human species; in particular Caucasoids (who appear intermediate in a K=2 analysis) are not really the product of admixture between Africans and East Eurasians (who appear terminal). Populations of Caucasoid+African or Caucasoid+Asian ancestry (or even African+Asian, although there are not many of those) may come to occupy neighboring positions in terms of their K=2 proportions, despite having very different origins.

But there is indeed a cline that is pegged by Africans on one side and East Asians / Amerindians on the other. Furthermore, the component that appears at K=3, peaking in Basques and Sardinians, certainly emerges from the synthesis of those polarized K=2 components, as evidenced by the fact that the West Asian component that emerges at K=3 has an intermediate Fst distance with respect to the East Asian and African components already present at K=2. Whether this 'intermediateness' reflects a signal of common ancestry with Africans and East Asians that was later preserved in West Asians, or a signal of later admixture events, is of course a different story altogether. But the facts do indicate that it is an intermediate cluster that is composed of ~1/3 African and ~2/3 East Asian in terms of K=2 ADMIXTURE proportions.

Dienekes' Anthropology blog is dedicated to human population genetics, physical anthropology, archaeology, and history.

You are free to reuse any of the materials of this blog for non-commercial purposes, as long as you attribute them to Dienekes Pontikos and provide a link to either the individual blog entry or to Dienekes Anthropology Blog.