September 29, 2012

More on the surprising link between Africans and Denisovans

In a previous post, I showed that there is an unexpected link between Africans and Denisovans. Papuans appeared more "Denisovan" than other populations irrespective of the SNP subset used, but Africans appeared more "Denisovan" than Eurasians both for a subset of SNPs polymorphic in Eurasians and monomorphic in Africans and for a subset of SNPs polymorphic in all 5 major populations.

In the current post, I explore this issue further by using the SNP ascertainment panels released by Patterson et al. (2012). In particular, I use panel #3, which involves 48,531 SNPs ascertained in a Papuan individual.

All of these D-statistics are positive, and many of them are significant (Z-score greater than 3). Africans appear more "Denisovan" than West/East Eurasians and Amerindians using this panel. So, perhaps, this is another indication of the "surprising link" I discovered in my previous post.

This link may have been overlooked in previous analyses which found that Africans are less Denisovan than all Eurasian groups. But, as I argue in my previous post, this is potentially due to introgression of archaic African alleles into living Sub-Saharan Africans which shifted them away from Denisovans. So, the African story may involve admixture between a population somehow related to Denisovans (whether due to an early Out-of-Africa that affected them, or due to an Into-Africa event), and divergent native Palaeoafrican populations.

It would be worthwhile to follow up on these observations with the recently published high-quality Denisovan genome, to see whether they hold up.

24 comments:

Overall, the greater proximity a) of Papuans, followed by Africans, to Denisovans vs. Europeans and East Asians, on the one hand, and b) of Asians, followed by Europeans, to Neandertals vs. Africans, on the other, makes good sense from the point of view of the distribution of such cultural traits as myths, where we tend to see the Continental Eurasian cluster opposed to the Indo-Pacific-African cluster (http://anthropogenesis.kinshipstudies.org/2012/08/comparative-mythology-and-the-study-of-modern-human-origins/).

Worldwide musical traditions, in the original Lomax-Grauer "two-root" model, also seem to endorse the same division. See http://anthropogenesis.kinshipstudies.org/2012/08/the-evolution-of-language-and-music/. If Victor Grauer is reading this, maybe he can comment?

I'm surprised not to see (South) Amerindians as close to Denisovans, as South American Indians belong to both the mythological and the musicological clusters. This could be a sampling issue, although the Surui and Karitiana (unlike North American Indians) are precisely where I would be searching for links to Denisovans.

Skoglund did rank South American Indians second only to Papuans in Denisovan admixture (http://anthropogenesis.kinshipstudies.org/2012/03/american-indians-neanderthals-and-denisovans-pca-views/), so I would check the data again.

Needless to say, I don't believe in Neandertal or Denisovan admixture, but rather in common descent of humans and these hominins. It's Africans who probably picked up an archaic African genetic component rich in all those chimpanzee homologies that make Africans look so divergent. If cultural data does correlate with those pseudo-admixture signals outside of Africa, then it's unlikely that admixture was so deep as to create those major cultural divisions.

Let's get this straight, for I still think this is all a hoax. According to you, among the SNPs polymorphic in Eurasians there are fewer Denisovan than Neanderthal SNPs that are also monomorphic in the African Mbuti Pygmies. Is that because the relationship between Denisovans and Pygmies is ancestral, so that they share ancestral SNPs? If so, this might reflect a lower penetration into Africa of the shared Denisovan-Neanderthal portion of archaic admixture.

I mean, some simple calculations based on the data of Meyer, Hochreiter and Hu will show you that Africa actually harbors a substantial portion of Neanderthal genes that are shared exclusively by Africans and Europeans. In such a scenario, Denisovans are not more 'African'; rather, Denisovans and Pygmies alike are less Neanderthal-admixed.

Please verify my calculations in "Expanding Hybrids And The Rise Of Our Common Genetic Denominator": http://rokus01.wordpress.com/2012/09/29/expanding-hybrids-and-the-rise-of-our-common-denominator/

We likely have modern humans in the Arabian Peninsula circa 62,000 BCE. It might be possible that both routes out of Africa were used, and that people using the Arabian rather than the Egyptian route met and interbred with Denisovan populations in Asia before people using the northern route got there. There could then have been back-migration by this Arabian-route population to Africa, perhaps even under pressure from the northern-route population then entering Asia, and this migration would have carried Denisovan genes to African populations around 50,000 BCE, when most of the waves of modern humans out of Africa would already have left.

Wouldn't substantial Neanderthal admixture in Eurasians also produce this signature? Due to incomplete lineage sorting and subsequent mutations, Neanderthals will be more Chimp-like than Denisovans at some sites.

These results could indicate Denisovan-like admixture in Africa, but they can also be explained by ascertainment bias. The Papuan panel used contains only polymorphic alleles, and since Papuans are much closer to Eurasians than to Africans, it will naturally contain more SNPs that are monomorphic in African populations, which causes Africans to appear more Denisovan-like in the D-statistic. (I explain this in more detail in the previous post.)

It should be noted that the source of the Papuan panel, Patterson et al. (2012), shows that different ascertainments can affect the D-statistic result by 5% (Table 2). They also state repeatedly that the D-statistic returns 0 "if we ascertain in an outgroup", indicating that SNP selection is important in getting a meaningful result.

I understand the problem: if Africans have admixture from multiple archaic populations, then it's possible that any Denisovan admixture will be "hidden" in comparisons, because the other archaic admixture will make Africans appear further away from Denisovans than a population with no admixture. By limiting the SNPs to those known to contain Denisovan DNA (i.e. Papuan), you increase the chance of detecting Denisovan admixture that might otherwise be hidden in a wider panel. Unfortunately, though, you also introduce an inherent bias into the data, which makes the result less reliable.

I had a quick look at Rokus' blog and I see he is firmly on the side of evolution (not just human) being driven very much by hybridism between different populations. I must say I agree completely with him on this matter. There was never any simple, single OoA. Movement both into and out of the continent has occurred many times over our evolution. No species derives from the expansion of a single genetically isolated population.

To clarify my statement above, what I mean is that at sites where the Denisovan and modern human expectation is "B", the Papuan panel by definition will often have the "A" polymorphism from common Eurasian Neanderthal admixture. So "BABA" is enhanced over "ABBA" because in this panel Eurasians share the "A" from common Neanderthal admixture while Africans don't, not because Africans are more Denisovan-like.
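The pattern notation used throughout this thread can be made concrete with a short sketch. It assumes biallelic sites with a single haploid call per group, listed in the order (Pop1, Pop2, Denisovan, Chimp); each site is coded relative to the chimp allele ("A"), and D is the excess of BABA over ABBA sites. The function names and toy sites are hypothetical; real analyses work with allele frequencies across many individuals and use a block jackknife for the Z-score.

```python
from collections import Counter

def pattern(site):
    """Code a (Pop1, Pop2, Denisovan, Chimp) site relative to the chimp allele.

    The chimp (outgroup) allele is labelled "A"; any other allele is "B".
    Assumes biallelic sites with one haploid call per group.
    """
    outgroup = site[-1]
    return "".join("A" if allele == outgroup else "B" for allele in site)

def d_statistic(sites):
    """D = (nBABA - nABBA) / (nBABA + nABBA).

    Positive values mean Pop1 shares more derived alleles with Denisovans.
    """
    counts = Counter(pattern(s) for s in sites)
    baba, abba = counts["BABA"], counts["ABBA"]
    if baba + abba == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (baba - abba) / (baba + abba)

# Hypothetical toy sites: (Pop1, Pop2, Denisovan, Chimp)
sites = [
    ("T", "C", "T", "C"),  # BABA: Pop1 carries the Denisovan allele
    ("G", "A", "G", "A"),  # BABA
    ("C", "T", "T", "C"),  # ABBA: Pop2 carries the Denisovan allele
    ("T", "T", "T", "C"),  # BBBA: uninformative for D
    ("C", "C", "T", "C"),  # AABA: uninformative for D
]
print(d_statistic(sites))  # prints 0.3333333333333333
```

An ascertained panel does not change this arithmetic; it changes which sites make it into the list in the first place, which is the crux of the bias argument.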

I've thought about this and am no longer sure that Denisovan admixture in Africans would get hidden in D-statistics due to "other" Paleo-African admixture.

My thoughts in brief: The D-statistic only looks at "BABA" and "ABBA" sites. Assuming the archaic admixture is in Pop1, then each admixed Denisovan allele turned a previous "AABA" site into a "BABA". To hide this "BABA" from the D-statistic the "other" admixture needs to change a "BBBA" site into an "ABBA" site - the "ABBA" will counteract the "BABA". (Note that changing the "BABA" to anything else would be removing the Denisovan admixture from the genome, not hiding it.) In order to change "BBBA" to "ABBA" the "other" would need to have "A"s where Denisovans, Pop1 and Pop2 all have "B"s, meaning it would have split from the modern human lineage before the Denisovans did. Also, "BBBA" sites only occur in about 2.5% of the genome* (they only occur when the "B" mutation happened after Chimps but before Denisovans diverged from modern humans). This means that each allele of admixture from the "other" has only a 2.5% chance of being one that turns a "BBBA" to "ABBA", and therefore, there needs to be ~40 times more admixture from the other to ensure that all of the "BABA"s are counteracted.

So, for Denisovan admixture to be hidden from the D-statistic, a population would need 40 times more admixture from a pre-Denisovan hominid. I don't think this has happened in Africans.

(*) 2.5% is an estimate: assuming 97% of our DNA is shared with Chimps, a Chimp split 6 Mya and a Denisovan split 1 Mya, 5/6 of the 3% non-Chimp DNA is shared with Denisovans ("BBBA").
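The footnote's arithmetic, and the "~40 times" figure built on it, can be reproduced in a few lines. This is a sketch of the commenter's own back-of-the-envelope reasoning under the same assumptions (3% non-Chimp sites, splits at 6 and 1 Mya, derived mutations accumulating uniformly in time, no incomplete lineage sorting); the thread later revisits whether 40 is the right multiplier. Under a strict clock the Denisovan lineage would also accumulate about 0.5% of sites after the split, which matches the 0.5% figure given later in the thread.

```python
# Inputs as stated in the footnote (treated here as assumptions).
non_chimp_fraction = 0.03    # ~3% of sites carry a non-Chimp allele on the human lineage
chimp_split_mya = 6.0        # human-Chimp split
denisovan_split_mya = 1.0    # human-Denisovan split

# Assume derived mutations accumulate uniformly in time (no ILS, no back-mutation).
shared_share = (chimp_split_mya - denisovan_split_mya) / chimp_split_mya  # 5/6
bbba = non_chimp_fraction * shared_share   # derived before the Denisovan split
after_split = non_chimp_fraction - bbba    # derived after the split, per lineage

print(f"derived alleles shared with Denisovans: {shared_share:.1%}")
print(f"BBBA fraction: {bbba:.2%}; post-split fraction per lineage: {after_split:.2%}")
# The footnote's reasoning: each "other" allele has a ~2.5% chance of hitting a BBBA site.
print(f"'other' alleles needed per Denisovan allele: {1 / bbba:.0f}")
```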

Assuming the archaic admixture is in Pop1, then each admixed Denisovan allele turned a previous "AABA" site into a "BABA". To hide this "BABA" from the D-statistic the "other" admixture needs to change a "BBBA" site into an "ABBA" site - the "ABBA" will counteract the "BABA".

BBBA sites (i.e. sites where two modern human groups and Denisovans match and chimps do not) occur much more frequently than AABA sites, simply because BBBA are concordant with the tree topology and AABA are not.

So, Denisovan admixture from AABA to BABA or ABBA has fewer sites to work with than Palaeoafrican admixture from BBBA to BABA or ABBA.

Yes, that is true - roughly 2.5% of the genomes are BBBA sites, and 0.5% in Denisovan are xxBA (where x is A, C or D)

So, Denisovan admixture from AABA to BABA or ABBA has fewer sites to work with than Palaeoafrican admixture from BBBA to BABA or ABBA.

This is only true if the Paleoafrican genome has the Chimp allele at more than 20% of the BBBA sites (i.e. it diverged at least 2 Mya), but I think this is irrelevant: our situation has already presupposed some degree of Denisovan admixture regardless of the chance of it happening. In Pop1 and Pop2 there are 97% A's, 2.5% B's and 0.5% not-B. So a B from Denisovans will form BABA at every A in Pop1 (>97% of sites), while an A from "other" will form ABBA only if there's a B in Pop2 (2.5% of sites). Assuming "other" is a full Chimp genome, for each B allele received from Denisovan admixture you will need, on average, about 40 alleles from "other" for it to be cancelled out in the D-statistic.

My point is that in order to have hidden Denisovan admixture in the D-statistic, Africans would need to have substantial admixture with a pre-Denisovan hominid to a degree that would render them non-AMH. Clearly this hasn't happened.

There's a major cline between the New World/Papua New Guinea and Africa/Europe of decreasing intergroup genetic diversity and linguistic stock diversity. There were no archaic hominins in the New World or Papua New Guinea, hence admixture cannot explain this important pattern that cuts across different lines of evidence.

The BBBA pattern occurs much more frequently than the AABA pattern, because modern humans and Denisovans share much more recent common ancestry than humans and chimpanzees.

The pre-Denisovan Palaeoafrican *does* have an increased probability of having an A allele. A simple way to see this is to slide the "split" of the Palaeoafrican population up and down the tree: if it goes close to the Human-Chimp split, then the Palaeoafrican tends to have B or A with equal probability (because it tends to be equidistant from modern humans and chimps). Conversely, if it goes down the tree, then it tends to have B much more than A. But, given that the genetic divergence between modern humans and Denisovans is about 1 million years (using the slow mutation rate), the divergence between modern humans and pre-Denisovan Palaeoafricans will be more than 1 million years.
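The sliding-split argument, and the earlier "20% of BBBA sites at a 2 Mya split" threshold, can be illustrated with a deliberately simple toy model. It is a hypothetical sketch, not either commenter's calculation: a BBBA mutation is assumed to arise on the human lineage at a time drawn uniformly between the Chimp split (6 Mya) and the Denisovan split (1 Mya), and a Palaeoafrican lineage that branched off earlier than the mutation keeps the Chimp allele. Because it ignores incomplete lineage sorting, it returns 100% Chimp alleles for a split at 6 Mya, whereas with ILS that probability would be lower; treat it only as an illustration of the trend.

```python
def chimp_allele_prob(palaeo_split_mya, chimp_split_mya=6.0, denisovan_split_mya=1.0):
    """Fraction of BBBA sites at which a Palaeoafrican lineage carries the Chimp allele.

    Toy model (hypothetical): mutation times are uniform between the Chimp split
    and the Denisovan split, and a lineage that split off before a mutation arose
    keeps the Chimp allele. Ignores ILS, recurrent mutation and drift.
    """
    # Clamp the split time into the interval where the model is defined.
    t = min(max(palaeo_split_mya, denisovan_split_mya), chimp_split_mya)
    return (t - denisovan_split_mya) / (chimp_split_mya - denisovan_split_mya)

for split in (1.0, 1.5, 2.0, 4.0, 6.0):
    print(f"split {split} Mya -> Chimp allele at {chimp_allele_prob(split):.0%} of BBBA sites")
```

Under these assumptions a split at 2 Mya gives exactly the 20% threshold mentioned earlier in the thread.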

A few percent Denisovan admixture can be tracked by the *smaller* number of AABA sites that shift to BABA or ABBA due to admixture. This can be nullified by the *larger* number of BBBA sites that shift to BABA or ABBA due to admixture with pre-Denisovan Palaeoafricans, where the Palaeoafricans have a substantial probability of possessing the A allele.

In short, I don't believe in your calculation that you need 40 times as much Palaeoafrican admixture to nullify the Denisovan admixture. The calculation is complex, as it relies on the relative abundance of BBBA and AABA sites and the genetic divergence of the admixing Palaeoafrican population.
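One way to see why the site counts matter is a hypothetical toy model, not the calculation of either commenter. Suppose Pop1 receives Denisovan admixture (fraction d) and pre-Denisovan Palaeoafrican admixture (fraction p), and Pop2 receives neither. Denisovan admixture turns AABA sites into BABA at rate d, while Palaeoafrican admixture turns BBBA sites into ABBA at rate p·q, where q is the chance that the Palaeoafrican carries the Chimp allele at such a site. The site fractions default to the 0.5% and 2.5% estimates used in the thread; drift, ILS and second-order interactions between the two admixtures are ignored.

```python
def expected_d(den_admix, palaeo_admix, q, aaba=0.005, bbba=0.025):
    """Expected D(Pop1, Pop2; Denisovan, Chimp) in a toy two-source admixture model.

    den_admix    -- Denisovan admixture fraction in Pop1 (AABA -> BABA)
    palaeo_admix -- pre-Denisovan Palaeoafrican admixture fraction in Pop1
    q            -- chance the Palaeoafrican carries the Chimp allele at a BBBA site
    aaba, bbba   -- genome fractions of AABA and BBBA sites (hypothetical defaults)
    """
    baba = den_admix * aaba
    abba = palaeo_admix * q * bbba
    if baba + abba == 0:
        raise ValueError("no informative sites in this configuration")
    return (baba - abba) / (baba + abba)

def break_even_palaeo(den_admix, q, aaba=0.005, bbba=0.025):
    """Palaeoafrican admixture fraction at which the toy D returns to zero."""
    return den_admix * aaba / (q * bbba)

for q in (1.0, 0.5, 0.2):
    p = break_even_palaeo(0.02, q)
    print(f"q = {q}: {p:.1%} Palaeoafrican admixture offsets 2% Denisovan admixture")
```

In this toy setting a fully Chimp-like source (q = 1) needs only a fifth as much admixture as the Denisovan component, because there are five times as many BBBA as AABA sites; a source that split about 2 Mya (q = 0.2) needs roughly the same amount.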

"There's a major cline between the New World/Papua New Guinea and Africa/Europe of decreasing intergroup genetic diversity and linguistic stock diversity"

Quite possibly because of a greater difference between the New World population and the Papua New Guinea population than between Africans and Europeans. The last two have much more in common than do the Eastern groups.

Dienekes, we know there is Incomplete Lineage Sorting (ILS) between the chimpanzee, bonobos, and AMH. Can you run these against the bonobo instead of the chimp?

I would like to see a few simple things:

1. compare against chimpanzee, bonobo, Denisovan and Neanderthal, just to make sure which alleles are truly derived and which are ILS

2. run all Paleoafrican populations, including the Hadza, who are just as divergent on some MDS plots as the San and Mbuti

Since we know that humans, chimpanzees and bonobos have ILS, and that Denisovans and Neanderthals form an early clade, it might be expected that there would be an even larger or a different set of ILS alleles between the Neanderthal-Denisovan clade and chimps and bonobos than between humans and chimps and bonobos, because the Neanderthal-Denisovan clade would preserve alleles that died out later.

One other thing: There may be signs of a yet earlier "nested" East Asian Archaic admixture among Denisovans, presumably from East Asian Homo erectus. Maybe some of the similarity here derives from a common Archaic ancestral population that was (partially) directly ancestral to the Denisovans, but indirectly ancestral to the Africans via intermediaries (or a structured African population, or ILS, as the case may be).

One last question: which population would have the least "Archaic" admixture? A population isolated from the Neanderthals, Denisovans and "Archaic Africans", for example an Ancestral South Indian group or the Andaman Islanders?

Thanks for your calculation. If I understand correctly, the numbers change, though, when using particular panels. When they are built from polymorphisms of a particular population, they will not just pick any "B", but (mostly? often?) those that also have "A" in that population even in the (normally) larger subgroup where "B" is the modern expectation.

So, if pop 2 has a similar legacy or e.g. a Neanderthal admixture similar to that of the panel (but pop 1 does not), then there is a bias towards BABA.

Which population would have the least "Archaic" admixture?

Ted,

I am afraid such a population may have never existed, because who is to say what part of admixture before ~120,000 ya is truly archaic and what part is not? I think you can only answer such a question for a specific region, e.g., Europe and West Asia, or East Asia - because that gives you a reference date.

@Dienekes: In short, I don't believe in your calculation that you need 40 times as much Palaeoafrican admixture to nullify the Denisovan admixture.

After going over this again, I have come to the conclusion that you are right - there wouldn't need to be 40 times admixture, just the same or greater. My thinking down that line was a red herring.

I think it should be possible to create a D-statistic test that avoids the "other" Chimp-like admixture but will still show Denisovan admixture. For instance, I think something like D(Mbuti, Chimp, Neandertal, Denisovan) will count sites where the Mbuti has a Denisovan or Neandertal allele, but skip sites where Mbuti has the Chimp allele. This means the result will not be affected by any pre-Denisovan Chimp-like admixture. It should be non-zero if there's admixture from either or both, unless there's exactly the same amount of both.... or am I missing something?
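The site filtering in the proposed D(Mbuti, Chimp, Neandertal, Denisovan) can be sketched directly. With the ordering (W, X; Y, Z) = (Mbuti, Chimp; Neandertal, Denisovan), a BABA site has Mbuti matching Neandertal and Chimp matching Denisovan, an ABBA site the reverse, and any site where Mbuti carries the Chimp allele is never counted. The helper names and sites are hypothetical, and the sketch only shows which sites are counted; whether this removes the effect of Chimp-like admixture is what the thread is debating. Swapping the first two positions only flips the sign, so this is the same comparison as the D(Chimp, San; Neandertal, Denisova) quoted later from Reich and Patterson's notes, with the same caveat about Vindija sequence quality.

```python
def classify(mbuti, chimp, neand, deni):
    """Classify a biallelic site for D(Mbuti, Chimp; Neandertal, Denisovan).

    Returns "BABA" if Mbuti shares the Neandertal allele and Chimp the Denisovan
    one, "ABBA" for the reverse, and None otherwise. Every site where Mbuti
    carries the Chimp allele falls into the None case.
    """
    if mbuti == chimp:
        return None
    if mbuti == neand and chimp == deni:
        return "BABA"
    if mbuti == deni and chimp == neand:
        return "ABBA"
    return None

def d_mbuti(sites):
    """Positive: Mbuti closer to Neandertal; negative: closer to Denisovan."""
    labels = [classify(*site) for site in sites]
    baba, abba = labels.count("BABA"), labels.count("ABBA")
    if baba + abba == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (baba - abba) / (baba + abba)

# Hypothetical toy sites: (Mbuti, Chimp, Neandertal, Denisovan)
sites = [
    ("T", "C", "T", "C"),  # BABA: Mbuti matches Neandertal
    ("A", "G", "A", "G"),  # BABA
    ("T", "C", "C", "T"),  # ABBA: Mbuti matches Denisovan
    ("C", "C", "T", "T"),  # skipped: Mbuti carries the Chimp allele
]
print(d_mbuti(sites))  # prints 0.3333333333333333
```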

@Eurologist: When they are built from polymorphisms of a particular population, they will not just pick any "B", but (mostly? often?) those that also have "A" in that population even in the (normally) larger subgroup where "B" is the modern expectation.

The bias comes from the implied monomorphism of the other population's alleles and the intrinsic double-mutation of using 3 levels of ancestry.

Doing D(Pop1, Pop2, Denisovan, Chimp) means that only sites that have mutated away from Chimp in Denisovans will be used; they'll all be xxBA. This means that if Pop1 or Pop2 is more closely related to Denisovan than Chimp, then it will have the Denisovan "B" more often than the original Chimp "A" at those sites. Sapiens diverged from Chimps 6 Mya and from Denisovans 1 Mya, so 5/6 of our non-Chimp alleles will be the same as Denisovans even without admixture. Even though "A" is much more prevalent across the genome, we'd expect "B" at 83% of the particular sites the D-statistic is looking at.

In a random or non-specific SNP panel this is irrelevant: both Pop1 and Pop2 will have an equal number of sites that randomly mutated away from the "B" back to the "A". Since random non-admixed "BABA" and "ABBA" sites are equally likely to occur, they cancel each other out and we get zero when there's no admixture. When you use a polymorphic/monomorphic panel, however, the polymorphic population will contain more random mutations than the monomorphic one. This is because, for the most part, monomorphic alleles are ancestral and polymorphic ones are derived. Since one of the populations has a much greater number of random mutations back to "A", you get a lot more "BABA"s than "ABBA"s (or vice versa if Pop1 is the polymorphic one), and you will get a signal of admixture in the monomorphic population even though there may be no admixture.
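The ascertainment argument above can be checked with a small simulation. Everything in it is a hypothetical assumption: at sites where Denisovans carry the derived "B" allele, both populations share the same underlying B frequency (drawn from Beta(5, 1), mean 5/6, echoing the 83% figure), so there is no admixture and no real asymmetry; each population is sampled with 10 chromosomes. D is computed with the frequency form from Patterson et al. (2012), numerator (w - x)(y - z) and denominator (w + x - 2wx)(y + z - 2yz), summed over sites. The ascertainment step, keeping sites monomorphic in Pop1 and polymorphic in Pop2, mimics the "polymorphic/monomorphic panel" described in this comment, not the actual Papuan panel.

```python
import random

def sample_freq(rng, f, n):
    """Observed frequency of the Denisovan-shared ("B") allele in n chromosomes."""
    return sum(rng.random() < f for _ in range(n)) / n

def d_from_freqs(pairs):
    """Frequency form of D(Pop1, Pop2; Denisovan, Chimp) at sites where the
    Denisovan carries B (y = 1) and the Chimp carries A (z = 0).

    The per-site numerator reduces to w - x and the denominator to w + x - 2wx;
    D is the ratio of the two sums over all sites.
    """
    num = sum(w - x for w, x in pairs)
    den = sum(w + x - 2 * w * x for w, x in pairs)
    return num / den

def simulate(n_sites=20_000, n_chrom=10, seed=1):
    rng = random.Random(seed)
    all_sites, ascertained = [], []
    for _ in range(n_sites):
        # Hypothetical: identical underlying B frequency in both populations.
        f = rng.betavariate(5, 1)
        w = sample_freq(rng, f, n_chrom)  # Pop1, e.g. Africans
        x = sample_freq(rng, f, n_chrom)  # Pop2, e.g. Eurasians
        all_sites.append((w, x))
        # Toy ascertainment: monomorphic in Pop1, polymorphic in Pop2.
        if w in (0.0, 1.0) and 0.0 < x < 1.0:
            ascertained.append((w, x))
    return d_from_freqs(all_sites), d_from_freqs(ascertained)

d_all, d_asc = simulate()
print(f"all sites: D = {d_all:+.3f}; ascertained panel: D = {d_asc:+.3f}")
```

In this toy setting D stays near zero over all sites but becomes strongly positive on the ascertained subset: a site that is monomorphic in Pop1 is far more often fixed for the majority "B" allele, and each such site adds 1 - x > 0 to the numerator, so the monomorphic population looks "Denisovan" without any admixture.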

So, if pop 2 has a similar legacy or e.g. a Neanderthal admixture similar to that of the panel (but pop 1 does not), then there is a bias towards BABA.

Exactly, although Neandertal admixture in this instance is not necessarily relevant: Papuans are closer to Eurasians regardless of Neandertal admixture. It's also important that the panel was intentionally designed to ensure that the SNPs in it are monomorphic in most of the other populations. Since Africans are the furthest from Papuans genetically, we expect that the panel will contain more Eurasian polymorphic sites than African ones, and will thus show a bias towards African admixture.

For instance, I think something like D(Mbuti, Chimp, Neandertal, Denisovan) will count sites where the Mbuti has a Denisovan or Neandertal allele, but skip sites where Mbuti has the Chimp allele.

In the notes of their dataset, Reich and Patterson mention the following:

"5. The sequencing work on Denisova and Vindija was not symmetric and the Vindija calls are of much worse quality. For some important questions it is critical to consider this. For instance we wish to know whether our African samples are symmetric between (Denisova, Neandertal). This would imply that the D-statistic (see qpDstat) D(Chimp, San; Neandertal, Denisova) should be zero, in panel 4 (San ascertainment). We actually observe D = -.05 which corresponds to a Z-score of -5.4. It is unclear whether this is artefactual, perhaps caused by sequencing and alignment biases in the Neandertal samples. Please consider this possibility if such biases may affect your results. "

Dienekes, how about trying the phylogenetic approach, which has not been tried much except for certain HLA alleles and a few other genes, and then only with a few SNPs rather than full sequences?

Can you select a region which has good coverage in all these samples, and then try to create a full sequenced-based phylogenetic tree, using the chimp, bonobo, Neanderthal, Denisovan, all the Paleoafricans (if possible, including the Hadza), the two Papuans, and the others?

The D-statistic of course works well as an overall survey. Many of these SNPs are tag SNPs for larger segments in full LD. However, I would like to see a phylogenetic position for some of these alleles, not just the 10 regions with deep phylogenies used by Reich et al. in the "Neanderthal" paper.

I think this approach with a comprehensive set of well-chosen alleles can distinguish the following cases, just what we are looking to distinguish:

1. ILS dating before Homo sp. with chimps and bonobos

2. "Early hominid" (i.e. H. heidelbergensis) ILS

3. Generalized "Eurasian Archaic" Neanderthal+Denisovan admixture, but not specific to either, which may in fact have taken place "in" or out of Africa, the "in" being an adjacent area sometimes isolated and sometimes not from Africa

4. Specific late "African Archaic" admixture that is not from an ancestral "structured African population"

5. Specific late Neanderthal admixture (even in places like North Africa adjacent to Iberia, or in eastern Central Asia)

6. Specific late Denisovan admixture in East Asia

We know from phylogenies that many of the immune system alleles may in fact be ILS that predates Homo sp. or is "Archaic". I have built such partial, full-sequence-based phylogenies from dbMHC for HLA-B using the full sequences of just exons 2 and 3. (Why didn't they do the entire gene, exons 1-9 and all the introns, given the medical importance of HLA-B and the pre-Homo ILS HLA-B*27/B*73 allele for ankylosing spondylitis?) The strong selective pressure on HLA alleles combined with the pre-Homo ILS renders the chr6 HLA region unsuitable for distinguishing early ILS from true archaic admixture events.

On the other hand, many alleles across the genome appear to be monomorphic for each large subclade. These can be used for the relative dating of population splits, which is important, but tell us nothing about either a "structured African population" or archaic admixture "within" Africa (or just outside, but not completely isolated from Africa in all periods) or outside (totally isolated from) Africa.

There is a "just right" set of alleles that we are beginning to identify with the D-statistic which may show clear evidence of the various non-ILS admixture cases above. I think that the best evidence of true archaic admixture of any sort will be a set of parallel phylogenies across the genome, with approximately the same biogeographical distribution and frequencies. If there was real archaic admixture of any sort, it would not be confined to just a few alleles, or only to alleles from genes of a certain functional category under high selective pressure like the immune system. It should be possible to filter out derived mutations that fail the neutrality test, whether they are protein-coding or outside coding regions. It should not be difficult to use DNAsp to create PHYLIP files from just the polymorphic SNPs derived from VCF files or an ascertainment panel (or full alignments). The results can be displayed using SplitsTree.
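For the phylogenetic test proposed above, SNP columns first have to become an alignment file. The comment suggests DNAsp for this; as a hypothetical alternative, a few lines of Python can write the strict sequential PHYLIP layout (a header line with the number of taxa and characters, then each taxon name in a fixed 10-character field followed by its sequence). Sample names and bases below are made up, and a real pipeline would also have to handle heterozygous calls and missing data.

```python
def to_phylip(snps_by_sample):
    """Render {sample name: SNP string} as a strict sequential PHYLIP alignment.

    Assumes one haploid (or consensus) base per SNP column; heterozygotes and
    missing data would need IUPAC codes or "?" handling, which is not done here.
    """
    lengths = {len(seq) for seq in snps_by_sample.values()}
    if len(lengths) != 1:
        raise ValueError("every sample needs the same number of SNP columns")
    (n_chars,) = lengths
    lines = [f"{len(snps_by_sample)} {n_chars}"]
    for name, seq in snps_by_sample.items():
        # Strict PHYLIP reserves exactly 10 characters for the taxon name.
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines) + "\n"

# Hypothetical toy data: four SNP columns for four samples.
alignment = to_phylip({
    "Chimp": "ACGT",
    "Denisova": "ACGA",
    "Mbuti": "GCGA",
    "Papuan": "GCTA",
})
print(alignment, end="")
```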

Dienekes, perhaps you can find some test cases that are not in the Reich 10 regions or the X DYS44 region and try this phylogenetic test using those?

Exactly, although Neandertal admixture in this instance is not necessarily relevant...

Tobus,

Not sure I agree with this, especially if you take "Neanderthal admixture" more broadly to include other, quite possibly early OoA, but as yet unknown admixtures, and look beyond the rather poor current Neanderthal genome specification.

E.g., if the test panel and #2 populations have 6% common ancient admixture outside of Africa, and half of this is different from Denisovans (as perhaps approximately expected), and both of the above have good representation of these alleles in the sampled groups, then there is (admittedly naively) a 3% bias in BABA. That's probably a worst-case scenario, but not a negligible amount in these number games.

This should not be surprising. Iwo Eleru would be closer to Homo erectus, which is in all probability the ancient unknown DNA in the Denisovan genome, because they both split off from our ancestral path around the same time, whereas Neanderthals and "Denisovans proper" split off much later. And Austronesians are closer to Sub-Saharan Africans through purely Homo sapiens sapiens channels: Y haplogroup C split off from the "African" line BEFORE haplogroups F, D, and E.

I have been looking into this intriguing phenomenon as of late. I have discovered that my Guanche Canarian Caribbean side of the family has a specifically South Asian phenotype (thus our greater risk for diabetes and CAD), plus I have around 1% Denisovan, which I inherited from my maternal side (my father's side is primarily English, so it wouldn't have come from his side). So my maternal side may contain as much as, and probably up to twice as much, Denisovan as I have. My speculation is that the Guanche may be the product of a back-migration into North Africa from South Asia some time after the Pleistocene. Since the Guanche remained isolated for thousands of years until the Spanish conquest in the late 1400s, the survivors of the Guanche (mainly on the maternal side) may hold clues that could fill the gaps we still have about population movements during these remote time spans.


Dienekes' Anthropology blog is dedicated to human population genetics, physical anthropology, archaeology, and history.

You are free to reuse any of the materials of this blog for non-commercial purposes, as long as you attribute them to Dienekes Pontikos and provide a link to either the individual blog entry or to Dienekes Anthropology Blog.

Feel free to send e-mail to Dienekes Pontikos, or follow @dienekesp on Twitter.