March 19, 2012

The effects of ascertainment on admixture estimates

In a previous experiment, I discovered a clear signal of West Eurasian-like admixture in Sub-Saharan African populations, using a set of markers ascertained in a San individual. The marker panels of the Harvard HGDP dataset have been ascertained on different individuals from around the world, and so they are very useful in showing the effects of ascertainment on admixture estimates.

I have repeated the same K=5 experiment, using the panels ascertained on French, Han, Papuan1, San, and Yoruba individuals. Irrespective of the panel used, the same five components emerged: Asian, West-Eurasian, African, Australasian, and Amerindian. However, there are substantial differences in the inferred admixture proportions. The average admixture proportions can be found in this spreadsheet.

The levels of the "African" component in the HGDP African populations are summarized below:

These are almost the same in the French/Han/Papuan1 ascertainments, despite the different number of markers used. When SNPs are ascertained on Eurasian individuals, many SNPs present in African populations are not discovered, and hence, African populations appear "purer" by having a higher proportion of the "African" ancestral component.
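A rough way to see why Eurasian-ascertained panels miss African variation is to ask how likely a SNP is to be discovered at all. The sketch below assumes that each panel consists of sites found heterozygous in a single diploid individual and that genotypes follow Hardy-Weinberg proportions; both are simplifying assumptions, and the allele frequencies are made up for illustration rather than taken from the HGDP data.

```python
def discovery_probability(p):
    """Chance that one random diploid individual is heterozygous at a SNP
    whose allele frequency in that individual's population is p
    (Hardy-Weinberg proportions assumed)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return 2.0 * p * (1.0 - p)


# Hypothetical allele frequencies, for illustration only.
snps = {
    "shared_common": {"French": 0.40, "Yoruba": 0.35},
    "african_private": {"French": 0.00, "Yoruba": 0.30},
}

for name, freqs in snps.items():
    for panel, p in freqs.items():
        prob = discovery_probability(p)
        print(f"{name:16s} {panel:7s} panel: {prob:.3f}")
```

Under these assumptions a variant that is absent in the French can never enter a French-ascertained panel, however common it is in Africa, which is the sense in which African populations look "purer" on Eurasian panels.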

However, when SNPs are ascertained on African populations, a different picture emerges, with clear evidence of West Eurasian admixture:

San ascertainment:

Yoruba ascertainment:

It is now evident that there is Eurasian admixture in African populations that was "hidden" in panels of SNPs ascertained on Eurasian individuals. Moreover, this Eurasian admixture seems to be more related to West Eurasians.

Someone might argue that the observed West Eurasian admixture in African populations is the result of a second migration Out-of-Africa that only affected West Eurasians. This, however, is a weak argument, because -with the exception of a few populations with known recent African admixture, such as the HGDP Arabs- there is no variation in African ancestry in Eurasia: it is everywhere virtually zero. On the contrary, in Africa, there are populations with a lot of West Eurasian admixture (Mozabites), intermediate (Yoruba, Bantu) and minimal (San, Pygmies). If the HGDP included more East African and Saharan populations, we would see an even clearer view of variable West Eurasian admixture throughout Africa. In short: the variable levels of West Eurasian admixture in Africa, coupled with the constant lack of any substantial African admixture in West Eurasia is a tell-tale sign that it was a West Eurasia-to-Africa migration, and not the reverse.

This migration created a cline of West Eurasian admixture in Africa, with minima in isolated African hunter-gatherer (San/Pygmy populations), maxima in North and East Africa, and sharp transitions across geographical barriers (such as the Sahara), or ethnic differences (e.g., African agriculturalists vs. foragers). It is no longer tenable to view West Eurasian back-migrations as limited events that affected only North and East Africa: their effects are clearly evident throughout Africa, having affected different populations to a different extent.

The existence of Eurasian admixture throughout Africa is an interesting and novel finding. How much such admixture is there? As I explain here, in the case of admixed populations, the proportion of foreign admixture of a population increases if we include "purer" indigenous populations: Mexican Mestizos are less "European" if remote Amazonians are excluded from the analysis; North Indians appear more "South Asian" if South Indian tribals are excluded.
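The dependence on the reference population can be made concrete with a little arithmetic. This is a minimal sketch under a simple linear mixing model, not what ADMIXTURE actually computes: the function name and the numbers are hypothetical, and "reference purity" stands for the real indigenous share of whatever population ends up defining the indigenous component.

```python
def apparent_foreign_fraction(true_indigenous, reference_purity):
    """Foreign fraction inferred when the 'indigenous' component is defined
    by a reference population that is itself only partly indigenous.

    true_indigenous  : real indigenous fraction of the admixed population
    reference_purity : real indigenous fraction of the reference population
    """
    if not 0.0 < reference_purity <= 1.0:
        raise ValueError("reference_purity must be in (0, 1]")
    if not 0.0 <= true_indigenous <= reference_purity:
        raise ValueError("population cannot be more indigenous than its reference")
    # The weight w placed on the reference component must satisfy
    # w * reference_purity == true_indigenous.
    weight_on_reference = true_indigenous / reference_purity
    return 1.0 - weight_on_reference


# Hypothetical numbers, for illustration only.
for purity in (1.0, 0.9, 0.8):
    foreign = apparent_foreign_fraction(0.5, purity)
    print(f"reference purity {purity:.1f} -> apparent foreign share {foreign:.3f}")
```

The less pure the reference, the smaller the apparent foreign share, so any estimate made without truly unadmixed reference populations is best read as a lower bound.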

The African hunter-gatherers (San and Pygmies) are the least admixed Africans currently in existence, but we cannot tell what their proportion of indigenous African vs. Eurasian ancestry actually is. We simply don't have the genomes of pre-back-migration Africans to compare against, although there are strong hints from palaeoanthropology that these included forms that do not fit within the present-day Homo sapiens continuum, such as Iwo Eleru.

It is at present unknown what percentage of African genomes is derived from Eurasian back-migrants, anatomically modern humans in Africa, as well as more divergent indigenous African hominins.

Hopefully the techniques of "virtual genomes", inference techniques allowing migration, such as TreeMix, together with full genome sequencing, and (hopefully) ancient African DNA will help elucidate the emerging picture of the multiple origins of the Sub-Saharan Africans. Africa may have been the cradle of H. sapiens but many of her Eurasian sons came back.

37 comments:

It would be interesting to know what subgroups of Yoruba and Mandenka were used. If Eurasian admixture truly is present here, perhaps known circumstances can explain it. The Mandenka live in the Sahel-Savannah, and some samples might be affected by Maghrebian and/or Fulani-like admixture. Many Kenyan Bantu tribes are known to be admixed with Cushitic peoples etc., and parts of Northern Yoruba land were repeatedly invaded by the Fulani in the 19th century (one might also consider dealings in parts of the region with the not-too-distant Chadic speakers). Perhaps a larger sample of African groups should also be used, including some with a less well-known history of such contact.

Regarding Zulu: Nguni and many S. African Bantu subgroups are a branch of the Eastern Bantu family, which may have coalesced near the Great Lakes. Cushitic peoples inhabited the eastern parts of this region, and some southern Bantu, like Bantu groups in much of the East, may have a small Cushitic admixture.

"The Eurasian admixture is also present in Bantu (both Kenya and South Africa). It is not limited to Yoruba and Mandenka."

I mentioned Kenyan Bantu. They like many Eastern Bantu Speakers (including the Southern Bantu which are a branch of the former) often have some amount of Cushitic admixture, which of course itself has a Eurasian component.

By commenting on your past couple of posts, I've revealed several times now that I don't think a scenario with DE back-migrating is unthinkable; quite far from it.

However, you still fail to address the significance of the structure in the mtDNA phylogeny. mtDNA is arguably a more reliable sign of continuity than Y-DNA. That you continue to see any increased Eurasian affinity relative to the San and Pygmies, who carry L0 and L1, respectively, as admixture is not the right approach. I cannot take estimates based on such an assumption seriously.

>> That you continue to see any increased Eurasian affinity relative to the San and Pygmies, who carry L0 and L1, respectively, as admixture is not the right approach.

I am not entirely sure what you are arguing regarding San and Pygmies. They do have Y-haplogroup E, so they may have Eurasian admixture under my model. They may also have other H. sapiens ancestry from pre-OoA Africans (who must have belonged to A/B and other deep subclades).

And it's certainly admixture: no ifs and buts. It is pan-African and clinal, it is a sure sign that Eurasian populations back-migrated to Africa and affected different African populations to a different extent.

If it was due to E migrations Out-of-Africa, we'd see E-bearing populations having substantial (or at least measurable) African admixture. We see nothing of the sort. On the contrary, we see that E bearing populations in Africa have the maximum Eurasian admixture, while the A/B bearing ones such as the San/Pygmies have minimum such admixture.

>> San and Pygmies are mtDNA L0 and L1, mainly. The majority of West Africans belong to the L2'6 clade, along with all Eurasians. It's not very complicated.

Stating facts without making an argument does not allow me to understand what your objection to my model actually is.

If I may guess, I believe that you are suggesting that Eurasian affinity in West Africans relative to the San is due to West Africans and Eurasians belonging to the L2'6 clade. This may very well be -in part- but it does not at all explain why this Eurasian affinity is specifically West Eurasian, since L2'6 is symmetrical with respect to all Eurasians.

The most parsimonious explanation for this experiment is that the closer the ascertainment is made to an African population, and more specifically to 'older' and genetically more diverse African populations, the more likely it is that ADMIXTURE and other model-based analyses will pick up the ancestral variation of non-Africans, and more specifically of West Eurasians, that is most certainly present in modern-day Africans. This would be consistent with the declining genetic variation found further from Africa due to the serial population bottlenecks that arose in the peopling of our world.

The glitch, however, is that the Yoruba are 1.8 non-African with the French, 3.2 non-African with the Han, 3.3 non-African with the Papuan, 10.1 non-African with themselves and 17.1 non-African with the San ascertainment. Plot this out with respect to geographic distance, starting somewhere in, say, Namibia, and in order to have a reasonable best-fit line the Yoruba should have somewhere between 3.2 and 10.1 non-African with the French ascertainment, and not 1.8. This tells me that something is wrong with your experiment somewhere, or that a full genome scan may very well change the outcome.

The above two paragraphs say absolutely nothing. "Serial population bottlenecks" do NOT explain why Sub-Saharan Africans have West Eurasian admixture, because in the "serial bottleneck" model, Eurasians are supposed to be symmetrically related to Africans.

The experiment shows one thing only: that Sub-Saharan Africans have Eurasian (and in particular West Eurasian) admixture.

Just for clarification, how do you know that the West Eurasian genes were not part of a later contact with explorers and traders? We do know, for example, that the Phoenicians reached Britain. They could well have traveled south as well. Further back, there is now the theory that there was a rapid expansion down the western coast of North America ~15K ago. I think that contacts happened repeatedly and also that OoA happened repeatedly. There is the idea that somehow in Christian culture there is a strong urge to see events as singular: one OoA, one movement down N. America (Clovis), even one Big Bang.

"If I may guess, I believe that you are suggesting that Eurasian affinity in West Africans relative to the San is due to West Africans and Eurasians belonging to the L2'6 clade. This may very well be -in part- but it does not at all explain why this Eurasian affinity is specifically West Eurasian, since L2'6 is symmetrical with respect to all Eurasians."

Part of this could be due to admixture events like those I suggested above. This might partly explain why the "East Asian" and "Amerindian" components look to correlate much less with the overall Eurasian portion. (The two Pygmy groups, of course, are an exception to this in their "East Asian".) Australasian also seems to correlate less in general with overall Eurasian, and to be higher in the San than in all Africans but the Kenyan Bantu. The sharing by Africans of E (whatever its origin) with West and not East Eurasians might also influence the nature of the Eurasian affinity.

"If it was due to E migrations Out-of-Africa, we'd see E-bearing populations having substantial (or at least measurable) African admixture. We see nothing of the sort. On the contrary, we see that E bearing populations in Africa have the maximum Eurasian admixture, while the A/B bearing ones such as the San/Pygmies have minimum such admixture".

I find that very convincing.

"San and Pygmies are mtDNA L0 and L1, mainly. The majority of West Africans belong to the L2'6 clade, along with all Eurasians. It's not very complicated".

But those groups have minimal Eurasian admixture. That amount could be explained by male entry only. The argument that they would not have moved without taking women with them does not carry any weight with me.

While the argument is plausible, I have to point out that various African populations seem to have fairly recent North and East African admixture, something which has been known for quite some time. People like the Zulu show East African influenced individuals even by the standards of physical anthropology.

The African populations used are in the sphere of influence of known male back migration of R1b and E1b1b in particular.

So while I think Dienekes is right, there is still a lot to do and populations to test and compare with, to get the full picture.

I don't get why the Mozabites (Berbers) get the most African ancestry in the French ascertained panel. Shouldn't it be the lowest in the Eurocentric panel? Since autosomally Berbers are mainly derived from Caucasoids?

No, it's the opposite. The French ascertainment misses a lot of African alleles, and hence makes African groups appear more homogeneous and more "African".

In a French ascertainment African alleles are missed, and hence the most divergent African populations (San/Pygmies) appear to be closer to Europeans, and hence Mozabites become more "African-like" because the definition of the "African" component in a French ascertainment is more Eurasian-like than in an African ascertainment.

A good analogy is to think of a paint that is made by mixing 80% white and 20% black paint. If you had to work with white and 50-50 grey paint instead, then you'd need to mix in more of the grey paint (40%) and less of the white paint (60%) to achieve the same hue.

Eurasian admixture is certainly evident in some groups where it perhaps wasn't immediately obvious, such as the Maasai, but these results don't seem to provide sufficient evidence that it is present across Africa. ADMIXTURE cluster memberships can not be taken at face value as corresponding to some real historical population.

As someone mentioned, the exact same result could be predicted by a near tree-like history where non-Africans are phylogenetically nested within African populations. In such a model, non-Africans could be more differentiated to Pygmies than they are to Mandenka/Yoruba, and even more differentiated to San.

The reason why the shared component is with West Eurasians could be that West Eurasians have experienced less genetic drift than East Eurasians, Australasians and Native Americans (e.g. Keinan et al. 2007, several others). West Eurasians are thus less differentiated (e.g. have lower Fst) from Africans even though in a phylogenetic sense all non-Africans could share as recent a history with Africans.

One prediction to make is that if you increase the number of Ks, there will no longer be shared cluster memberships between Africans and non-Africans.

>> Eurasian admixture is certainly evident in some groups where it perhaps wasn't immediately obvious, such as the Maasai, but these results don't seem to provide sufficient evidence that it is present across Africa. ADMIXTURE cluster memberships can not be taken at face value as corresponding to some real historical population.

There are no Maasai in this analysis. Better read the post before commenting on it.

>> The reason why the shared component is with West Eurasians could be that West Eurasians have experienced less genetic drift than East Eurasians, Australasians and Native Americans (e.g. Keinan et al. 2007, several others). West Eurasians are thus less differentiated (e.g. have lower Fst) from Africans even though in a phylogenetic sense all non-Africans could share as recent a history with Africans.

TreeMix explicitly flips migration edges when it sees that there is a great residual between a pair of populations.

>> So while I think Dienekes is right, there is still a lot to do and populations to test and compare with, to get the full picture.

I am getting a new ascertainment from a San individual. Working with FGS is computation-heavy so I'm not sure whether I'll succeed, but hopefully it will have enough common SNPs with Illumina platforms; the Harvard HGDP is done in conjunction with Affymetrix. Assuming a large enough number of SNPs, I can process the entirety of my African population collection on a set of SNPs ascertained on a San. Keeping my fingers crossed that I have the firepower to achieve this.

Regarding the TreeMix results, you might try leaving in both the San and Yoruba (and perhaps other African populations as well). I've definitely seen West Eurasian contributions to the Mandenka in analyses like this, but nothing to e.g., the Yoruba or San, and not on the scale you're seeing.

In our paper, we show that a common error made by TreeMix is to get the direction of gene flow wrong, esp. when it involves the outgroup(s). I think others have reported African admixture into the Sardinians. You might consider including a few African populations and perhaps Chimp to test the robustness.

Are you mistaking the Mandinka for the Wodaabe (a sub-group of the larger Fulani group who seem to possess a minor NW African component)? The Mandinka have consistently been shown to be overwhelmingly West African with no significant foreign contributions.

It is based on the _assumption_ that Sardinians are shifted on the CHB-San axis relative to Northern Europeans because of African admixture.

In my initial criticism of the above-mentioned paper, I pointed out that there is an alternative explanation, namely that North European populations are shifted _away_ from Sardinians on the same axis. My analysis of Oetzi's genome which reveals an eastward origin of extant Europeans relative to Oetzi/Sardinians is supportive of that conclusion.

The use of CEU as representative of Northern Europeans is doubly wrong, because CEU includes recent Amerindian admixture and is hence shifted on the San-CHB axis for that reason.

The current analysis points to a yet another potential source of the shift, namely West Eurasian admixture in Africa from a southern Caucasoid source.

The latter is the case irrespective of the TreeMix results, since it has been demonstrated by ADMIXTURE analysis with African ascertainment of markers.

The issue with the TreeMix results is whether admixture was with a small and highly divergent African component, or with a large and little divergent African component. I leave that question open, because it is difficult to evaluate the significance of the solution arrived by TreeMix in the North Eurasian experiment.

I forgot to add a fourth reason for the Sardinian-North European shift along the San/CHB axis, namely Northern Europeoid admixture in China. Such admixture has been repeatedly discovered in Siberian Mongoloid populations, and may in fact exist in China itself; there is a strong hint of it in the Northern Han group in the Han ascertainment of the present analysis.

>> Regarding the TreeMix results, you might try leaving in both the San and Yoruba (and perhaps other African populations as well). I've definitely seen West Eurasian contributions to the Mandenka in analyses like this, but nothing to e.g., the Yoruba or San, and not on the scale you're seeing.

>> In our paper, we show that a common error made by TreeMix is to get the direction of gene flow wrong, esp. when it involves the outgroup(s). I think others have reported African admixture into the Sardinians. You might consider including a few African populations and perhaps Chimp to test the robustness.

Thanks for clarifying. There is a significant distinction between African hunter-gatherers and other Africans, and yet the estimated "West Eurasian" gene flow for the San (63%) and Yoruba (64%) is almost identical.

The gene flow inferred by TreeMix is clearly unrelated to mtDNA L2'6 (or Y-DNA E, for that matter). It's a result of the shortcomings of TreeMix. I'd like to see how it looks with some more samples.

Well, regardless of your interpretation of the Moorjani paper (and I know that the authors disagree with your interpretation!), testing the robustness of TreeMix by including other populations is probably a good idea.

You might also try playing around with 3- and 4-population tests to see what they look like. As you're aware, Structure models (like that implemented in ADMIXTURE) do not provide formal statistical tests of mixture.

If you think it'll be worthwhile for people, I have basic code for 3- and 4-population tests that run off of TreeMix input files, but which I haven't put online.
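For readers unfamiliar with these tests, here is a minimal sketch of the two statistics in their simplest textbook form, computed from per-SNP population allele frequencies. It is not the commenter's code and does not read TreeMix input files; it also omits the finite-sample correction and the block-jackknife standard errors that a real test needs, and the frequencies are made up for illustration.

```python
def _check_lengths(*freq_lists):
    """All populations must be described over the same, non-empty SNP set."""
    n = len(freq_lists[0])
    if n == 0 or any(len(f) != n for f in freq_lists):
        raise ValueError("frequency lists must be non-empty and equally long")
    return n


def f3(c, a, b):
    """f3(C; A, B): mean over SNPs of (c - a) * (c - b).
    A clearly negative value suggests C is a mixture related to A and B."""
    n = _check_lengths(c, a, b)
    return sum((ci - ai) * (ci - bi) for ci, ai, bi in zip(c, a, b)) / n


def f4(a, b, c, d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d).
    Expected to be zero if (A, B) and (C, D) form clades without gene flow."""
    n = _check_lengths(a, b, c, d)
    return sum((ai - bi) * (ci - di)
               for ai, bi, ci, di in zip(a, b, c, d)) / n


# Hypothetical allele frequencies at three SNPs, for illustration only.
pop_a = [0.1, 0.9, 0.5]
pop_b = [0.5, 0.3, 0.5]
pop_d = [0.2, 0.2, 0.2]
# C is built as an equal mixture of A and B, so f3(C; A, B) comes out negative.
pop_c = [0.5 * x + 0.5 * y for x, y in zip(pop_a, pop_b)]

print("f3(C; A, B)    =", f3(pop_c, pop_a, pop_b))
print("f4(A, B; C, D) =", f4(pop_a, pop_b, pop_c, pop_d))
```

With real data the point estimate alone says little; what matters is how many standard errors it lies from zero.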

>> Well, regardless of your interpretation of the Moorjani paper (and I know that the authors disagree with your interpretation!)

The authors never explained in their paper why they took the Southern European-Northern European shift along the African-East Asian axis as evidence for African admixture, and failed to explore any of the additional explanations for the observed pattern, such as:

- Southern Caucasoid admixture in Africans
- Mongoloid admixture in North Europeans
- North European admixture in East Asians

Any of the above could explain the observed shift, yet the authors preferred one explanation (African admixture in southern Caucasoids) without arguing for it.

They can disagree with me all they want, but that does not alter in the least the facts of the case.

I see; just read your post again. Regarding African gene flow into southern Europe, my understanding is that there is admixture LD in the southern European populations that is absent from the northern European populations (e.g., Figure 3 in Moorjani et al.).

The precise estimates of gene flow might be influenced by the things you mention (which are of course not mutually exclusive with gene flow from Africa), but as far as I'm aware, any mis-specification along these lines should not create admixture LD where there is none.

It would be interesting to see if you could generate a spurious signal of admixture LD by using two admixed populations as references in rolloff. Not sure if this has been done.

The admixture estimates of Moorjani et al. are based on f4 admixture estimation, and hence are spurious, as they treat shift along the African-Asian axis in Europeans (which is measured by the f4 test) as evidence of African admixture in South Europeans, without considering the alternatives mentioned above.

Their admixture estimates are also higher than every other method, including both methods using LD, and without using LD:

Their Sardinian estimates are particularly wrong, because Northern Europeans are shifted relative to Sardinians towards Mongoloids, and this is misinterpreted by the f4 test as "African admixture".

The ROLLOFF procedure also produces shorter age estimates of admixture compared to both HAPMIX and StepPCO, a discrepancy never addressed in the paper.

The paper does not consider the possibility that North Europeans are shifted towards Asians, and sports this immortal line:

"Application of the test to each West Eurasian population (using A = YRI and B = CEU) finds little or no evidence of mixture in North Europeans but highly significant evidence in many Southern European, Levantine and Jewish groups (Table 1)."

In other words, application of the test with a North European reference population (CEU) finds no evidence of African admixture in... Northern Europeans.

Repeat the procedure with Sardinians and CHB, and you'll find every single North European population shifted towards CHB.

All the following are true:

- Sub-Saharan Africans have West Eurasian-like admixture
- Northern Mongoloids have North European-like admixture
- Northern Europeans are shifted towards Mongoloids relative to Southern Europeans, and there is plenty of evidence for the presence of Mongoloid mtDNA in Northern Europe, as well as of a major lineage (Y-haplogroup N) of recent Siberian origin.

I will also add the fact that CEU has recent Amerindian admixture, which renders it inappropriate as a reference population.

Moorjani et al. is a wonderful example of confirmation bias, as it sought to understand the extent of African admixture in Europe and West Asia, and ignored every other possible explanation for the observed data, never testing any of them.

>> It would be interesting to see if you could generate a spurious signal of admixture LD by using two admixed populations as references in rolloff. Not sure if this has been done.

As of April of 2011, ROLLOFF was "not really mature" to be shared with me, despite the fact that it was supposed to be "available on request" in the published paper. As of this writing it has not been made publicly available.

I'll be happy to evaluate its properties when/if it becomes publicly available.

"I will also add the fact that CEU has recent Amerindian admixture, which renders it inappropriate as a reference population."

This is almost certainly true. A massive percentage of white Americans claim NA heritage. However, there's little in the CEU samples you've used, either in averages (in world 9, CEU Amerindian levels are on par with NW European noise levels, for example), or in portrait. In fact, according to the latter, SSA heritage appears more common than NA, though granted, Amerindian wasn't a component on the portraits that I saw, and I had to assume East and SE Asian admixture acted as proxy. But the fact that you said 'has' rather than 'may have' suggests you've seen data I haven't. Did you remove NA-admixed individuals from the CEU dataset?

Now for a more on-topic query: Does usual (i.e. European-primed) ascertainment bias have the effect of inflating or of underestimating SSA admixture in West Eurasians, or could it go either way? I understand that, given SSA contains a WE element, previous efforts of European-centered estimation have neglected that element. But my question is in regard to recent SSA admixture, for which the deep WE affinities can be set aside in the same way that the WE and SSA components in North Africans can be set aside when dealing with the component 'NW African' in and of itself. So does ascertainment bias over or underestimate recent SSA admixture?

A final question: Is the Amerindian component in the above runs that is found in African and European groups at varying levels simply noise? It, in addition to Asian levels, exceeds other measures of East Eurasian admixture in the groups in which it occurs. Do Amerindians have a WE component which the software inaccurately interprets as NA and projects back onto Eurasian (and SSA, for whatever reason) source populations? Thanks.

If you'll permit, I'll expand a little more on my last comment. I apologise in advance for focusing on mainly tangential issues. According to this (https://docs.google.com/spreadsheet/ccc?key=0AuW3R0Ys-P4HdHdGXzdLR0prSk1UUWZjNGl5cUEtWlE#gid=0) spreadsheet from the Harappa Project, the Amerindian groups here treated as pure are, at k=3, split approximately 25/75 between East Asian and European. This admixture isn't recent, since at k=5 these two components are almost entirely absorbed into the new 'Amerindian' component. I wouldn't call this counterintuitive, because we already know that modern North Asians contain a Northern European element, and this element may have been present before the migration of Siberians to the Americas. The Amerindian noise may therefore occur because an analysis can't treat an 'impure' population as 'pure' without doing some statistical violence to the populations from which, in various quantities, it descends. Hence the presence of the component in most Eurasian populations.

New-World-to-Old-World backflow in the period between the discovery of 'Vinland' and the present is certainly absurd as an explanation of the component in the populations in which it occurs, and interpretation of the component in Europeans as being a slight misinterpretation, on the part of the analysis, of Siberian elements again seems spurious, since the element isn't proportionally constant in relation to Asian components, nor is it notably higher in Siberians/other East Asians than in some European groups. Rather, it seems that the analysis is misinterpreting the direction of certain affinities. The absence of the Amerindian component in Sardinians is interesting and seems to suggest that the affinity between Amerindians and Europeans is pre-Neolithic (considering Sardinians are the most Neolithic-descended European group, as far as I remember). This, then, would explain the larger 'Amerindian' portions in Northern Europeans that may superficially seem to correlate with Siberian input (e.g. in Russians), though a closer inspection (e.g. of the French) shows the correlation to be weak, and argues rather for the fact that the N. European element in Siberians is responsible for this.

I also think treatment of Amerindians as a primordial group (in terms of the other principal clusters identified) possibly belies certain Australasian affinities in Southern American natives. Strangely, again returning to the Harappa spreadsheet, at k=4, Pima, for example, still express no affinities with S. Asians, which one would assume would act as proxy for Australasian (and this is seen in the charts of this blog entry, which show Australasian in those Asian groups that are likely to be mixed with Dravidian elements). The more than slight physical differences between North, Central and South American natives remain a mystery.

As for recent Amerindian in the CEU, I was originally surprised by the seeming lack of Amerindian elements in them. I understand that only a portion of the CEU dataset is used by Dodecad (and Harappa, presumably), but even so, it seems strikingly rare given the usual boastings of such heritage by a massive proportion of pred. European Americans. I suppose I'm just looking for a rough quantification. It's possible, though not likely, that Amerind. mixed individuals were pruned from Dodecad's CEU samples, since Dodecad prefers to deal with Old World admixture. But if the percentage of Amerind. admixed people in the CEU dataset is so minor that random portions of the CEU dataset sampled by Dodecad and Harappa contain no/few such people, then that is a quite striking statement on the psychology of 'ancestry bragging'.


Dienekes' Anthropology blog is dedicated to human population genetics, physical anthropology, archaeology, and history.

You are free to reuse any of the materials of this blog for non-commercial purposes, as long as you attribute them to Dienekes Pontikos and provide a link to either the individual blog entry or to Dienekes Anthropology Blog.
