April 26, 2011

Sub-Saharan admixture in West Eurasian groups (Moorjani et al. 2011)

Let me preface this by saying that I don't doubt that there exists some Sub-Saharan admixture in some West Eurasian (Caucasoid) groups, and I've quantified the different types of African admixture that can be found in many such groups, most recently here.

However, there are serious methodological flaws in a new paper by Moorjani et al. which render its estimates unreliable. This is unfortunate, as the authors assembled an important dataset, but they only consider a very simplistic model of 2-population admixture which is completely inappropriate for the problem they are studying.

Caucasoids on the Chinese-San axis of variation

Moorjani et al. motivate their study by projecting various West Eurasian groups from Europe and the Near East onto the first principal component of variation defined by CHB (Chinese) and San (Bushmen). The reasoning is the following:

To study the signal of African gene flow into West Eurasian populations, we began by computing principal components (PCs) using San Bushmen (HGDP-CEPH-San) and East Eurasians (HapMap3 Han Chinese-CHB), and plotted the mean values of the samples from each West Eurasian population onto the first PC, a procedure called "PCA projection" [17,18]. The choice of San and CHB, which are both diverged from the West Eurasian ancestral populations [19,20], ensures that the patterns in PCA are not affected by genetic drift in West Eurasians that has occurred since their common divergence from East Eurasians and South Africans.

This is indeed a good idea: if some Caucasoid group A has a common ancestral element with Sub-Saharans that is lacking in another Caucasoid group B, then A is expected to be shifted towards the San side of the first PC relative to B. Indeed, this is what the authors observe:

We observe that many Levantine, Southern European and Jewish populations are shifted towards San compared to Northern Europeans, consistent with African mixture, and motivating formal testing for the presence of African ancestry (Figure 1, Figure S2).

However, this is clearly a case of seeing the glass half full. The authors prefer the hypothesis that some Caucasoid groups have African ancestry, although the hypothesis that other Caucasoid groups have East Asian ancestry can equally well explain the observed pattern. Indeed, both hypotheses may explain the phenomenon they observe.

For example, African ancestry in Palestinians has been well-documented, so Palestinians are expected to be San-shifted relative to northern Europeans. On the other hand, East Eurasian ancestry has also been well-documented in HGDP Russians, so we expect them to be CHB-shifted relative to southern Europeans.

Things are not that clear for other Caucasoid populations, e.g., southern Europeans or northwestern Europeans. The authors assume that the different position of these two groups on the San-Chinese axis is due only to Sub-Saharan admixture in southern Europeans. This implicit assumption is the Achilles' heel of the paper.

Tests of population admixture

Because of genetic drift, two populations that diverged from a common ancestor will have different allele frequencies. However, imagine if we looked at these allele differences and saw that a population A not only had different frequencies than B, but also the difference in frequencies tended to be in the direction of a Sub-Saharan population. For example, at some locus f(A)=0.4, f(B)=0.3, and f(Sub-Saharan)=0.1. You can see that B's frequency deviates from A's in the direction of Sub-Saharans. This may occur due to random drift for one particular marker, but if it occurs systematically across the genome, then admixture is a likely explanation. This is the basis of the 3-population test used by the authors.
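As a rough illustration of the idea behind the 3-population test, here is a minimal Python sketch. It is not the authors' implementation: the function name f3, the toy frequencies, and the 90/10 mixture are all assumptions made for illustration, and the real statistic also involves a bias correction and a jackknife-based Z-score, which are omitted here.

```python
import numpy as np

def f3(target, ref1, ref2):
    """Naive 3-population statistic: mean over markers of
    (target - ref1) * (target - ref2). No bias correction, no Z-score."""
    target, ref1, ref2 = (np.asarray(x, dtype=float) for x in (target, ref1, ref2))
    return float(np.mean((target - ref1) * (target - ref2)))

# The single locus from the text: f(A)=0.4, f(B)=0.3, f(Sub-Saharan)=0.1.
# B lies between A and the Sub-Saharan frequency, so the product is negative.
single_locus = f3([0.3], [0.4], [0.1])

# Hypothetical frequencies at three markers (made up for illustration).
freq_a = np.array([0.4, 0.6, 0.2])
freq_ssa = np.array([0.1, 0.9, 0.5])
# Assumed toy model: B is 90% A-like and 10% Sub-Saharan-like, with no drift.
freq_b = 0.9 * freq_a + 0.1 * freq_ssa
mixed = f3(freq_b, freq_a, freq_ssa)

print(single_locus, mixed)
```

A consistently negative value across many markers is what the test treats as evidence that the target is a mixture of groups related to the two references; with drift added to B after the mixture, the signal can be masked.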

Another idea is to see whether frequency differences between A and B are correlated with frequency differences between Sub-Saharans and another Eurasian population unrelated to either A or B. Differences between Caucasoids and Sub-Saharans are (in part) due to divergence between Sub-Saharans and ancestral Eurasians. Suppose, for example, that we've identified a group (e.g., Papuans) unlikely to have admixed with Caucasoids. If B differs from A (over many markers) in the same direction that Sub-Saharans differ from Papuans, this is consistent with the notion that B has some Sub-Saharan admixture that A lacks. This is the basis of the 4-population test.

Note that because of symmetry, a highly negative value in their 4-population test (x, CEU, Papuan, YRI) indicates Sub-Saharan admixture, while a highly positive one would indicate "Papuan" admixture! The authors do observe positive values, suggesting that some northern European populations are Papuan-shifted even with respect to CEU, most notably Russia with a Z-score of 11.4. Thankfully, we are spared a paper on Papuan admixture in Russia.
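The sign symmetry can be seen in a small Python sketch of the underlying quantity. This is only an illustration under stated assumptions: the function name f4, the four-marker frequencies, and the 5% mixtures are hypothetical, and the published test additionally normalizes the statistic and converts it to a Z-score.

```python
import numpy as np

def f4(a, b, c, d):
    """Naive 4-population statistic: mean over markers of (a - b) * (c - d)."""
    a, b, c, d = (np.asarray(x, dtype=float) for x in (a, b, c, d))
    return float(np.mean((a - b) * (c - d)))

# Hypothetical allele frequencies at four markers (made up for illustration).
ceu = np.array([0.4, 0.6, 0.2, 0.5])
yri = np.array([0.1, 0.9, 0.5, 0.2])
papuan = np.array([0.5, 0.5, 0.3, 0.6])

# Two assumed test populations: CEU-like with 5% from one other source.
x_african = 0.95 * ceu + 0.05 * yri
x_papuan = 0.95 * ceu + 0.05 * papuan

# Same ordering as in the text: (x, CEU, Papuan, YRI).
f4_african = f4(x_african, ceu, papuan, yri)   # negative: shifted towards YRI
f4_papuan = f4(x_papuan, ceu, papuan, yri)     # positive: shifted towards Papuan
print(f4_african, f4_papuan)
```

Swapping the last two populations flips the sign, which is why a positive value cannot simply be ignored: it is the mirror image of the signal being interpreted as admixture.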

Comparison to the Indian Cline work

These tests are an important statistical tool, and many of this paper's authors have used them before to study the Indian Cline of populations. However, the current paper has two important shortcomings in comparison to Reich et al. (2009).

In their study of the Indian Cline, Reich et al. (2009) excluded groups that were shifted towards CHB, thus ensuring that they were left with groups that could be modeled as a simple mix of two ancestral population elements.

Moreover, they used the Onge, a relatively isolated population from the Indian Ocean, as a control group that could be said to form a clade with Ancestral South Indians to the exclusion of West Eurasians. In the current paper it is simply assumed that northern Europeans have no African admixture.

Application of the test to each West Eurasian population (using A = YRI and B= CEU) finds little or no evidence of mixture in North Europeans but highly significant evidence in many Southern European, Levantine and Jewish groups (Table 1).

In other words: taking CEU (a northern European population) as the standard, northern Europeans have no evidence of African admixture.

Sardinians: an important test case

Sardinians are an important test case for the authors' model. Their 3-population test shows no evidence of admixture, while the 4-population test does. Moreover, their STRUCTURE analysis shows a trivial 0.2%, whereas the authors estimate their Sub-Saharan admixture as 2.9%.

Let's begin by performing a PCA analysis of Sardinians, CHB, and CEU, which is shown below.

(All PCA analyses are done in smartpca as implemented in EIGENSOFT 4.0 beta, with numoutlieriter set to 0. All analyses are performed over datasets merged in PLINK with the --geno 0.001 flag, which effectively keeps only markers common to the merged datasets and ensures a high quality dataset.)

CEU is shifted towards CHB relative to Sardinian. This is made more visually obvious if we blow up the CEU/CHB portion of the above plot:

CEU is shifted towards CHB by 2.4% relative to Sardinians. This is quite close to the 2.5% East/South Asian K=3 admixture for Britons in my most recent analysis, done with a different East Asian reference and a different method (ADMIXTURE); the CEU sample of White Utahns has been repeatedly shown to be most similar to people from the British Isles or Northwestern Europe.
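One plausible way to express such a shift as a percentage is the displacement of one population's mean along the first eigenvector, as a fraction of the distance between the baseline population and the pole. The Python sketch below is an assumption about the calculation, not a record of it: the function name relative_shift and the coordinates are made up, and real input would come from the smartpca eigenvector output.

```python
import numpy as np

def relative_shift(pc1, labels, test, baseline, pole):
    """Shift of `test` away from `baseline` towards `pole` along one PC,
    as a fraction of the baseline-to-pole distance (population means)."""
    pc1 = np.asarray(pc1, dtype=float)
    labels = np.asarray(labels)

    def mean_of(pop):
        return pc1[labels == pop].mean()

    return float((mean_of(test) - mean_of(baseline)) /
                 (mean_of(pole) - mean_of(baseline)))

# Hypothetical PC1 coordinates for six individuals (not real data).
pc1_values = [-0.02, 0.02, 0.01, 0.03, 0.9, 1.1]
pop_labels = ["Sardinian", "Sardinian", "CEU", "CEU", "CHB", "CHB"]

shift = relative_shift(pc1_values, pop_labels,
                       test="CEU", baseline="Sardinian", pole="CHB")
print(f"CEU shift towards CHB relative to Sardinians: {100 * shift:.1f}%")
```

With these toy numbers the CEU mean sits 2% of the way from the Sardinian mean to the CHB mean; the figure depends entirely on the invented coordinates.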

Now, let's look at Sardinians, CHB, and YRI:

and a blowup:

Sardinians are shifted 1.1% relative to CEU towards YRI. Again, this is close to the 0.9% K=3 Sub-Saharan ADMIXTURE result I recently obtained.

So, where does the 2.9% Sub-Saharan admixture in Sardinians come from? Moorjani et al. estimate this percentage under the assumption that Northern Europeans are not shifted towards Chinese, i.e., that East Eurasians are irrelevant. Clearly, as we have seen, this is wrong. As we shall see, this erroneous assumption leads to the erroneous admixture estimate.

2.9% Sub-Saharan admixture in Sardinians (?)

Now, I will demonstrate how the spurious 2.9% result can be obtained. By doing so, it will become obvious that Moorjani et al. obtained this result because they ignored the East Asian shift of their northern European sample in their analysis.

Here is a PCA plot of Sardinians, CEU, CHB, YRI:

and the blowup:

When we run all four populations together, Sardinians are shifted towards YRI along Dimension 1, and CEU are shifted towards CHB along Dimension 2. Given that the eigenvalue for PC1 is approximately twice (50.15) that for PC2 (25.31), and doing a little high school geometry on the triangle (Sardinian, CEU, YRI), we project Sardinian onto the CEU-YRI line, intersecting at point X. We thus obtain the estimated "CEU" admixture as:
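The original calculation is not reproduced in this text. Purely as an illustration of the kind of geometry described, the Python sketch below projects a point onto the line through two other points in a two-dimensional PC plane, after optionally rescaling the axes. Everything in it is an assumption: the function name fraction_along_line, the axis_weights parameter (whether eigenvalues, their square roots, or no weighting should be used is not stated above), and the coordinates, which are made up.

```python
import numpy as np

def fraction_along_line(point, start, end, axis_weights=(1.0, 1.0)):
    """Orthogonally project `point` onto the line start->end (after scaling
    each axis by `axis_weights`) and return the position of the foot X as a
    fraction: 0.0 at `start`, 1.0 at `end`."""
    w = np.asarray(axis_weights, dtype=float)
    p, s, e = (np.asarray(v, dtype=float) * w for v in (point, start, end))
    direction = e - s
    return float(np.dot(p - s, direction) / np.dot(direction, direction))

# Hypothetical (PC1, PC2) population means; not taken from the plots.
ceu_xy = (0.0, 0.1)
yri_xy = (1.0, 0.0)
sardinian_xy = (0.02, 0.0)

# Fraction of the way from CEU to YRI at which X falls, with the axes
# weighted by the two eigenvalues quoted in the text (an assumed choice).
yri_fraction = fraction_along_line(sardinian_xy, ceu_xy, yri_xy,
                                   axis_weights=(50.15, 25.31))
ceu_fraction = 1.0 - yri_fraction
print(yri_fraction, ceu_fraction)
```

Under a strict two-source model, the position of X along the CEU-YRI line is read as a mixture proportion, which is exactly why any displacement of CEU along the second dimension feeds into the estimate.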

The example of the Sardinians showed how failing to control for East Eurasian shift tends to overestimate the degree of Sub-Saharan admixture. Another test case is that of Ashkenazi Jews. The authors find no evidence of admixture with their 3-population test, but do find such evidence with their 4-population test, as well as with STRUCTURE.

On a PCA plot of CHB, Ashkenazi (Behar et al. 2011), and CEU, the Ashkenazi are shifted 3.3% towards CHB along eigenvector 1.

On a PCA plot of YRI, CEU, and Ashkenazi, the Ashkenazi are shifted by 5.3% towards YRI.

In the case of the Sardinians, their African-shift together with CEU's Asian-shift caused Sardinians/CEU to diverge on the African-Asian axis, and Moorjani et al. took the entirety of this divergence to represent African admixture in Sardinians.

In this case Ashkenazi are both Asian- and African-shifted relative to CEU. The two shifts partially cancel each other out: Ashkenazi are pulled towards Africans on the YRI-CHB axis because of their YRI-shift, and away from them because of their CHB-shift. Failing to account for these processes, the authors assume that only Sub-Saharan admixture in Ashkenazi can account for the different positions of CEU and Ashkenazi on the Asian-African axis, coming up with a 2.8-3.2% "Sub-Saharan admixture" in two different samples.

And here is a second way of seeing how this spurious admixture estimate follows from the phenomenon I am describing. CEU are (in terms of Fst) 0.76 times as distant from CHB as they are from YRI (Fst=0.129 and 0.17, respectively). In other words, Sub-Saharan admixture is more "potent" at shifting a population than East Eurasian ancestry is. Ashkenazi are YRI-shifted by 5.3%, and they are CHB-shifted by 3.3%. Multiplying the latter by 0.76 we obtain: 5.3-0.76*3.3 = 2.8%!
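The arithmetic can be checked with a few lines of Python. The variable names are mine, and the assignment of the two Fst values to the CEU-YRI and CEU-CHB pairs is inferred from the 0.76 ratio quoted above.

```python
# Fst values quoted in the text; which pair each belongs to is inferred
# from the stated ratio of 0.76 (0.129 / 0.17).
fst_ceu_yri = 0.17
fst_ceu_chb = 0.129

# Relative "potency" of an East Eurasian shift compared to a Sub-Saharan one.
potency = fst_ceu_chb / fst_ceu_yri

# Ashkenazi shifts relative to CEU, in percent, as given in the text.
yri_shift = 5.3
chb_shift = 3.3

# Net displacement on the African-Asian axis when the two shifts oppose.
net_shift = yri_shift - potency * chb_shift

print(round(potency, 2), round(net_shift, 1))  # 0.76 2.8
```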

In other words, the 2.8% Sub-Saharan admixture in Ashkenazi Jews is a compromise between two different phenomena in a tug-of-war. It is not an accurate estimate of admixture.

Papuans

I have also carried out an experiment with Sardinians, Ashkenazi Jews, CEU, and Papuans, instead of CHB, as Papuans are also used in the paper as an outgroup population.

and the blowup:

It is clear that the populations show differential shift towards Papuans that is concordant with their above-described shift towards the Chinese.

Luhya and Bilala

Failure to correct for differential shift towards Chinese/Papuans is problem enough, but the paper also fails to properly take into account non-West African populations. North African groups are conspicuous in their absence, while the HapMap3 Luhya (LWK) and a Bilala sample are used to represent East Africa.

Henn et al. (2011) contains Tuscan, Yoruba, Maasai, and Bulala samples, so I ran the Tuscans as test data in a supervised ADMIXTURE 1.1 analysis together with these African groups, HGDP-CEPH North_Italian, and HapMap3 CEU. That is, I'm playing along, for the sake of argument, with the idea that East Eurasians are irrelevant, and Tuscans can be seen as a mixture of CEU "Europeans" and African groups.

The results are unambiguous: Tuscans/North Italians are found to be 2.1%/1.2% "Maasai" and 0% of all the other African groups. In other words whatever element there is in common between Tuscans and Africans is not particularly West African.

The inclusion in the paper of the HapMap3 Luhya Bantu but not of the HapMap3 Maasai is puzzling, and the choice of one group over the other is passed over in silence.

In my own experiments, I distinguish between North, Sub-Saharan, and East African ancestral components.

Beyond a binary worldview

Much more can be said, but let's summarize: the model of Moorjani et al. (2011) fails because:

It does not account for the West-East Eurasian axis, folding everything onto the North European-Sub-Saharan African one

It undersamples African diversity by excluding both North African and East African populations

Perhaps I'll add more in the future, but I believe I've already said enough to cast serious doubt on this paper's conclusions.

PLoS Genet 7(4): e1001373. doi:10.1371/journal.pgen.1001373

The History of African Gene Flow into Southern Europeans, Levantines, and Jews

Previous genetic studies have suggested a history of sub-Saharan African gene flow into some West Eurasian populations after the initial dispersal out of Africa that occurred at least 45,000 years ago. However, there has been no accurate characterization of the proportion of mixture, or of its date. We analyze genome-wide polymorphism data from about 40 West Eurasian groups to show that almost all Southern Europeans have inherited 1%–3% African ancestry with an average mixture date of around 55 generations ago, consistent with North African gene flow at the end of the Roman Empire and subsequent Arab migrations. Levantine groups harbor 4%–15% African ancestry with an average mixture date of about 32 generations ago, consistent with close political, economic, and cultural links with Egypt in the late middle ages. We also detect 3%–5% sub-Saharan African ancestry in all eight of the diverse Jewish populations that we analyzed. For the Jewish admixture, we obtain an average estimated date of about 72 generations. This may reflect descent of these groups from a common ancestral population that already had some African ancestry prior to the Jewish Diasporas.

21 comments:

I have an additional reason for believing that the amount of African admixture among Jews has been exaggerated. The estimate that Jews acquired their African admixture 72 generations ago while other Levantine populations remained unmixed until 32 generations ago does not look reasonable. It seems to be the result of over-estimating African admixture among Jews. This would make the observed level of linkage disequilibrium, which is not high, seem falsely to be the result of very early admixture.

Great points, Dienekes. It's amazing how a lack of multidimensional spatial awareness leads even academics astray.

This brings me to my own (DOD232) case, though. Your K=10 run has me with .002 E.Asian and .006 NE Asian, but 0 N. or E. African. On your recent K=11, wherein you conflate the two Asian components, I then get .0055 E. African (and a lower .0045 combined E. Asian score). Could this be a spurious pull to compensate for my Asian really being more northeasterly?

PS While your breakdown of Mediterranean into Basque and Sardinian matches Davidski's values to the percentage, your NE and NW scores for me essentially reverse what he obtained for me (.15/.08 vs .07/.16). I wonder if you have any thoughts on that.

The estimate that Jews acquired their African admixture 72 generations ago while other Levantine populations remained unmixed until 32 generations ago does not look reasonable. It seems to be the result of over-estimating African admixture among Jews.

I did not go into ROLLOFF in the post, but if you notice, it takes into account the inferred admixture proportions. How that affects the estimate is another matter.

I agree with the thrust of your comment. If the "Roman Empire" is to blame, then most of the Levant was in the Roman Empire, while the part that wasn't (Arabia) had even more opportunity, and for much longer, to undergo African admixture. Hence, the idea that Jews received African admixture before Arabs did is difficult to believe.

(Mind you, I don't believe there is substantial Sub-Saharan admixture in most of the populations listed in the paper. What they have probably discovered is events linking West Eurasia and Africa such as mtDNA haplogroup M1 or Y-haplogroup E. The expansion of the latter from East Africa has brought a common population element to both West Eurasia and Sub-Saharan Africa)

George, the earlier run only used Yakuts and Chinese, while this one uses pretty much a representative from every Mongoloid group (from HGDP and Rasmussen) instead of two populations.

With respect to "East African", it's a good idea to always look at Fst divergences between components that happen to be named the same. You will see that this "East African" is not the same as the "East African" of the K=10, and in fact it is equidistant (tilted to the West asian side) from Caucasoids and Sub-Saharans.

Unfortunately there are only so many names one could use to label components, and, between the alternatives of keeping them un-named or re-using the same names, I chose the latter, as it makes it easier to remember where particular components peak.

Just to elaborate on the ROLLOFF calculations: If the African segments in a population are, on average, short, it could be because the admixture occurred a long time ago or it could be because the admixture was small. Once Moorjani et al. assumed that the admixture was not small, they had to conclude that the admixture was early.

Very interesting, you basically demolished many of their results/interpretations. African-like admixture in Europe and ME is real IMO, even though I'm increasingly convinced it wholly or largely corresponds to an old North African/Saharan population whose genes are today completely admixed into a Neolithic gene pool with a predominant more Northerly origin (Mesopotamia/Anatolia/Levant). As some people may know, I believe African admixture in the Levant and in Southern Europe is mostly due to an Out-of-Egypt expansion some 7000-4000 years ago, associated with more efficient Neolithic capabilities and Afro-Asiatic languages, as well as common Y haplogroups (E and J?) and the rare L mit haplogroup. A typical uniparental distribution for a Neolithic wave, the opposite of what is expected for absorption of slave populations. No wonder it appears in Jews, who may have preserved some cloudy memories of these events better than most (even though references to Egyptian ancestry also appear in Greek Mythology and even among the legends of the Gaels of Ireland).

Why do you censor me and my work -like a "Nazi" irreverent- and also shush impeding to publish my articles that authors, scientifics, colleagues and institutions welcomes?

The blog has a rule: avoid double-posting. You consistently flout that rule, by not only double- but also triple-, quadruple-, quintuple-, ... posting

http://tinypic.com/r/k3vbt/7

I have brought this issue to your attention before, and you continue to flout the blog's rules, adding personal insults to your replies.

Thankfully, blogger's spam folder is handy as a dumpster for repetitive posts, and that's where your comments invariably end.

Readers can read the n-th iteration of your world-famous 2001 Jewish DNA research in the supplied pic which is a snapshot of my spam folder, and you are free to repeat them in any of the venues that welcome you, but not here.

Much thanks for this article. I was getting worried when I originally read this. There's no WAY that the %'s of SSA could be that high in Italy. How come Greeks are never mentioned as having SSA. It's only the Italians that are ever mentioned. How about other Southern Europeans? It's not like they have a magic shield over them that protects them from Negroes. Furthermore, the blog you link us to "Genetic Structures of West Eurasians", paints a rather false picture of SSA ancestry in Southern Italians. It shows them as having lots of it, while Greeks have none? How can this be possible. What are the real %'s of admixture in Italians?

What is your idea that the picture of SSA ancestry is "false" based on? There is fairly good evidence for the introgression of African Y chromosomes and mtDNA in Italian populations, and this is consistent with a little Sub-Saharan autosomal ancestry detected there (order of 1%).

Are there any other European populations besides Italians that have SSA ancestry? The numbers seem exaggerated in the Moorjani report. I looked over the link you provided me with. Go figure, Sicilians have 1%. All other Italian populations have 0.6% at most. That might even be noise. What are your thoughts on that? It seems completely devoid in mainland Italians and absent in Northern Italians. Was there an agenda with the Moorjani study?

Because the link that you just provided me with proves that it's bull-crap. There's no way that ALL Southern Italians have 2.7% SSA admixture. The most I've seen is 1% in Sicilians only. Mainland Italians don't even reach a full percentage mark as provided by your study. Doesn't the fact that you wrote this prove that they used some sort of agenda when doing this study?

Well the Arabs did conquer Iberia and Sicily, so it does make sense that there would be significant SSA ancestry. No offense, but you don't seem to have any academic credentials in genetic science, unlike the people who wrote the paper.


Dienekes' Anthropology blog is dedicated to human population genetics, physical anthropology, archaeology, and history.

You are free to reuse any of the materials of this blog for non-commercial purposes, as long as you attribute them to Dienekes Pontikos and provide a link to either the individual blog entry or to Dienekes Anthropology Blog.
