February 18, 2010

Complete Khoisan and Bantu genomes

The number of published complete genomes seems to be growing exponentially. Pretty soon, I'm guessing, once most major groups are covered by at least one individual, they will cease to be paper-worthy, and we will move on to the era of full-genome population studies.

One of the published individuals is Desmond Tutu, so I am adding this to the Famous DNA label as well.

From the paper:

In the 117 megabases (Mb) of sequenced exome-containing intervals, the average rate of nucleotide differences between a pair of the Bushmen was 1.2 per kilobase, compared to an average of 1.0 per kilobase differing between a European and Asian individual.

It's striking that two Bushmen are more different from each other than a European and an Asian are. This agrees with the idea that Bushmen have a substantial Palaeoafrican genomic component, while Eurasians and most other Africans emerged from a younger population flowering within the species, which I have termed "Afrasian". Eurasians are descended from these (presumably East African) "Afrasians". So are most Africans, but with varying degrees of intermixture with other (non-"Afrasian") groups of humans who had lived on the continent since time immemorial by the time of the Out-of-Africa (and "Deeper-into-Africa") expansions.
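A figure like "1.2 per kilobase" is simply the number of sites at which two aligned sequences differ, divided by the number of sites compared, times 1,000. The sketch below is a minimal Python illustration under simplifying assumptions (two already-aligned sequences of equal length, with 'N' marking an uncalled site); the function name `diffs_per_kb` is hypothetical, and this is not the paper's pipeline, which also has to handle genotype calling and coverage. The last lines only scale the quoted rates to the 117 Mb interval as a back-of-envelope count, as if every base had been compared.

```python
def diffs_per_kb(seq_a: str, seq_b: str) -> float:
    """Differences per 1,000 compared sites between two aligned sequences.

    Sites where either sequence has 'N' (no call) are skipped, so they count
    in neither the numerator nor the denominator.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "N" or b == "N":
            continue  # uncalled site: excluded entirely
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return 1000.0 * differing / compared


# Back-of-envelope: what the quoted rates imply over 117 Mb, if every base were compared.
INTERVAL_BP = 117_000_000
for pair, rate_per_kb in [("two Bushmen", 1.2), ("European vs Asian", 1.0)]:
    print(f"{pair}: ~{INTERVAL_BP * rate_per_kb / 1000:,.0f} differences")
```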

From the paper, number of SNPs shared:

On the right we see some interesting facts for three individuals (Bushman: KB1; Chinese: YH; European-American: J.C. Venter). Two-way sharing between individuals is roughly of the order of 500. The Bushman has about twice as much "private" variation as the Eurasians, consistent with the ancient basal position in the human family.
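The "shared" and "private" counts in such a figure are a set partition: for three individuals, each variant falls into exactly one region of a three-way Venn diagram. A minimal Python sketch using a tiny made-up set of variant IDs (the `TOY_SNPS` data and the `venn_counts` helper are hypothetical; real counts come from genotype calls against a reference genome):

```python
from itertools import combinations

# Made-up variant IDs for three individuals (not the paper's data).
TOY_SNPS = {
    "KB1": {"s1", "s2", "s3", "s4", "s5"},
    "YH": {"s1", "s2", "s6"},
    "Venter": {"s1", "s3", "s6", "s7"},
}


def venn_counts(snps: dict[str, set[str]]) -> dict[str, int]:
    """For three individuals, assign every variant to one Venn region and count each region."""
    names = list(snps)

    def union_of_others(excluded: set[str]) -> set[str]:
        # all variants carried by anyone NOT in `excluded`
        return set().union(*(snps[n] for n in names if n not in excluded))

    counts = {"shared by all": len(set.intersection(*snps.values()))}
    for name in names:
        counts[f"private to {name}"] = len(snps[name] - union_of_others({name}))
    for a, b in combinations(names, 2):
        # carried by exactly this pair, not by the remaining individual
        counts[f"{a} & {b} only"] = len((snps[a] & snps[b]) - union_of_others({a, b}))
    return counts


for region, n in venn_counts(TOY_SNPS).items():
    print(region, n)
```

Because the regions are disjoint, their counts add up to the total number of distinct variants across the three individuals.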

From the paper:

The large number of novel SNPs raises concerns regarding the ability of current genotyping arrays to capture effectively the true extent of genetic diversity and haplotype structure represented in southern Africa. Assessing percentage heterozygosity for 1,105,569 autosomal SNPs using current-content Illumina arrays, we were surprised to find lower heterozygosity in KB1 compared to a region-matched European control (Supplementary Data and Supplementary Fig. 3a, b), because it is well known that genetic diversity is highest in Africa. However, analysis of whole-genome sequencing data for KB1 and ABT revealed high percentages of heterozygous SNPs (59% and 60%, respectively), as expected. This discrepancy underscores the inadequacy of current SNP arrays for analysing southern African populations.

This is not very surprising: the SNPs used in microarrays were not discovered in Bushmen, so testing Bushmen for heterozygosity with them actually underestimates it. By not including Bushmen in the SNP-discovery process, you implicitly assume non-variability (hence "no SNP") at sites where Bushmen (but not other humans) are variable.

With whole-genome sequencing we are able to see that Bushmen are indeed highly heterozygous. This should serve as a warning against making too much of heterozygosity patterns in microarray genotype data, even from the latest ~1M-SNP arrays.
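To make the ascertainment effect concrete, here is a toy Python calculation with invented allele frequencies (the `SITES` list and the function names are hypothetical, not data from the paper). Expected heterozygosity at a site with allele frequency p is 2p(1-p) under Hardy-Weinberg; the "array" is modelled as only those sites that were polymorphic in a non-Bushman discovery panel. In this made-up example the Bushman sample looks less heterozygous than the discovery population on the array, yet more heterozygous once all sites are counted, the same kind of reversal the paper reports.

```python
# Invented sites: (allele frequency in a non-Bushman discovery panel, allele frequency in Bushmen)
SITES = [
    (0.5, 0.5),  # common variant in both
    (0.3, 0.1),
    (0.2, 0.0),  # variable only in the discovery panel
    (0.0, 0.4),  # variable only in Bushmen: never chosen for the array
    (0.0, 0.3),
    (0.0, 0.5),
]


def expected_heterozygosity(freqs) -> float:
    """Sum of 2p(1-p) over sites: expected heterozygous sites per individual under Hardy-Weinberg."""
    return sum(2 * p * (1 - p) for p in freqs)


def on_array(sites):
    """Mimic array design: keep only sites polymorphic in the discovery panel."""
    return [s for s in sites if 0 < s[0] < 1]


def summary(sites):
    chip = on_array(sites)
    return {
        "discovery panel, array sites": expected_heterozygosity(d for d, _ in chip),
        "discovery panel, all sites": expected_heterozygosity(d for d, _ in sites),
        "Bushmen, array sites": expected_heterozygosity(b for _, b in chip),
        "Bushmen, all sites": expected_heterozygosity(b for _, b in sites),
    }


for label, value in summary(SITES).items():
    print(f"{label}: {value:.2f}")
```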

The PCA is also interesting, as it shows the familiar differentiation among Europeans, African farmers and Bushmen at the top, and between Bushmen and West/South Africans at the bottom.
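For readers wondering what goes into such a plot: PCA of genotype data is usually run on an individuals × SNPs matrix of 0/1/2 allele counts, with each SNP centred and scaled before a singular value decomposition. The sketch below uses NumPy on simulated genotypes from three hypothetical populations (`popA`, `popB`, `popC`, not the paper's samples); scaling each SNP by sqrt(p(1-p)) is one common convention and not necessarily the one the authors used.

```python
import numpy as np


def genotype_pca(genotypes: np.ndarray, n_components: int = 2) -> tuple[np.ndarray, np.ndarray]:
    """PCA of an individuals x SNPs matrix of 0/1/2 allele counts.

    Monomorphic SNPs are dropped; each remaining SNP is centred on 2p and
    scaled by sqrt(p(1-p)), where p is the sample allele frequency.
    Returns (PC scores per individual, fraction of variance explained per PC).
    """
    g = genotypes.astype(float)
    p = g.mean(axis=0) / 2.0                   # sample allele frequency of each SNP
    keep = (p > 0) & (p < 1)                   # monomorphic SNPs carry no information
    g, p = g[:, keep], p[keep]
    x = (g - 2.0 * p) / np.sqrt(p * (1.0 - p))
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s**2 / np.sum(s**2))[:n_components]
    return scores, explained


# Simulated data: three hypothetical populations whose allele frequencies
# are perturbed around a shared ancestral spectrum.
rng = np.random.default_rng(0)
n_snps, n_per_pop = 500, 20
ancestral = rng.uniform(0.1, 0.9, n_snps)
blocks, labels = [], []
for name in ("popA", "popB", "popC"):
    freqs = np.clip(ancestral + rng.normal(0.0, 0.15, n_snps), 0.01, 0.99)
    blocks.append(rng.binomial(2, freqs, size=(n_per_pop, n_snps)))
    labels += [name] * n_per_pop
genotypes = np.vstack(blocks)

scores, explained = genotype_pca(genotypes)
print("variance explained:", explained.round(3))
for name in ("popA", "popB", "popC"):
    rows = [i for i, lab in enumerate(labels) if lab == name]
    print(name, "mean (PC1, PC2):", scores[rows].mean(axis=0).round(2))
```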

NB: The paper is freely available to non-subscribers, so you can go ahead and read it in full.

Nature doi:10.1038/nature08795

Complete Khoisan and Bantu genomes from southern Africa

Stephan C. Schuster et al.

Abstract

The genetic structure of the indigenous hunter-gatherer peoples of southern Africa, the oldest known lineage of modern human, is important for understanding human diversity. Studies based on mitochondrial[1] and small sets of nuclear markers[2] have shown that these hunter-gatherers, known as Khoisan, San, or Bushmen, are genetically divergent from other humans[1,3]. However, until now, fully sequenced human genomes have been limited to recently diverged populations[4-8]. Here we present the complete genome sequences of an indigenous hunter-gatherer from the Kalahari Desert and a Bantu from southern Africa, as well as protein-coding regions from an additional three hunter-gatherers from disparate regions of the Kalahari. We characterize the extent of whole-genome and exome diversity among the five men, reporting 1.3 million novel DNA differences genome-wide, including 13,146 novel amino acid variants. In terms of nucleotide substitutions, the Bushmen seem to be, on average, more different from each other than, for example, a European and an Asian. Observed genomic differences between the hunter-gatherers and others may help to pinpoint genetic adaptations to an agricultural lifestyle. Adding the described variants to current databases will facilitate inclusion of southern Africans in medical research efforts, particularly when family and medical histories can be correlated with genome-wide data.

22 comments:

Among the many things this paper suggests, it shows how misleading Y-chromosome-only analysis can be. Looking at the PCA plots, MD8 appears in the Bushman cluster, which is widely separated from the Bantu (Niger-Congo) cluster where ABT (Desmond Tutu) appears.

However, both MD8 and ABT have E1b1 Y chromosomes.

Perhaps this dichotomy between whole-genome analysis and Y-chromosome analysis exists only in African populations?

The paper also puts the tightly clustered homogeneous Europeans into perspective. (Maybe we can stop fighting each other now?)

"Perhaps this dichotomy between whole-genome analysis and Y-chromosome analysis exists only in African populations?"

I don't find that surprising at all, and I am sure it exists to almost the same degree elsewhere (although, of course, it cannot exist at such an extreme level outside of Africa). Take Y-haplogroup I2, for example. Its lineage may have separated from R some 40,000 years ago. Yet I am sure that if you take an average I2 man from northern/central Europe and compare him to his R1b neighbor in the same village, their autosomal DNA will be extremely close, likely closer than that of any two men with the same Y-haplogroup who live just several hundred miles apart.

That's simply the nature of following only one of two parents up the tree, in populations that are sufficiently dense and not isolated. After just ten generations (say, 200 to 250 years), the expected autosomal contribution of that original male is down to about 0.1%.
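The arithmetic behind the ten-generation figure is repeated halving: on average, each generation back halves a given ancestor's share of one's autosomes, while the Y chromosome passes from father to son intact (apart from new mutations). A minimal Python sketch, assuming 20-25 years per generation and no pedigree collapse (`expected_autosomal_share` is a hypothetical helper name):

```python
def expected_autosomal_share(generations: int) -> float:
    """Expected fraction of a person's autosomal DNA inherited from one particular
    ancestor `generations` back, assuming no pedigree collapse: 1/2 per generation."""
    return 0.5 ** generations


for g in (1, 5, 10):
    low, high = 20 * g, 25 * g  # assumed 20-25 years per generation
    print(f"{g:>2} generations (~{low}-{high} years): {expected_autosomal_share(g):.4%}")
```

This is an expected value only: because of recombination, the actual share from a particular ancestor varies, and after enough generations it can be zero.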

I'll take issue with the wording "the ancient basal position in the human family" in the original post.

The Bushmen are, of course, no more ancient than my father. They are alive today. There is strong genetic evidence that the Khoisans have a more ancient common ancestor with Eurasian and other African populations, but this makes neither side of the split itself ancient.

Why fuss over the language? Because describing a people as "ancient" fosters the false inference that they have a stronger genetic similarity with the ancient common ancestor than those on the other side of the split.

This very likely isn't true. If anything, the reverse is likely. Bushmen have faced very heavy selective pressure over the 120,000-160,000 years of hunter-gatherer living that have elapsed since that common ancestor, as is apparent from even a cursory read of these first four genomes. These selective pressures may very well have changed Bushmen more, relative to the ancestor, than other humans, who may have lived in environmental conditions more similar to those of the common ancestor for much of that time (at least from the split until the Out-of-Africa movement into the Near East, a period during which many lived in Ethiopia, and probably until modern humans colonized Europe, Asia and Australia). In addition, the effective population size of the Bushmen has probably been smaller than that of the other side of the split for much of that period: from ca. 60,000 years ago until about 13,000 years ago because the other side occupied a larger territory, and afterwards because of its higher population density.
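The population-size part of this argument rests on a standard result for neutral drift: in an idealised diploid Wright-Fisher population with effective size N_e, expected heterozygosity decays as H_t = H_0 (1 - 1/(2N_e))^t, so smaller populations drift faster. A minimal Python sketch with hypothetical N_e values chosen only for contrast; it ignores mutation, migration and selection, so it describes the loss of existing variation, not the diversity a population carries at equilibrium.

```python
def heterozygosity_after(h0: float, ne: float, generations: int) -> float:
    """Expected heterozygosity after pure neutral drift in a diploid Wright-Fisher
    population of effective size `ne`: H_t = H_0 * (1 - 1/(2*Ne))**t.
    No mutation, migration or selection is modelled."""
    return h0 * (1.0 - 1.0 / (2.0 * ne)) ** generations


# Hypothetical effective sizes over the same hypothetical span of generations.
for ne in (1_000, 10_000):
    h = heterozygosity_after(0.5, ne, 2_000)
    print(f"Ne = {ne:>6}: H after 2,000 generations = {h:.3f}")
```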

This isn't to deny that Bushmen provide extremely valuable information on the human genetic legacy that isn't available from any other population. They do shed light on ancient humans in ways that wouldn't be possible elsewhere. But it would be a mistake to fail to recognize how indirect this insight is.

An ideal approach to determining the genome of the earliest modern humans would use a sophisticated version of a Venn diagram to compare three sources, giving each approximately equal weight: Bushman genomes (one branch of the first major split); Neanderthal genomes (the presumed source gene pool before the first major split); and a proto-"other human" genome (the presumed other branch of the first major split), inferred analytically rather than by crude averaging, much as a proto-language is reconstructed, from modern human genetics in light of the phylogenetic relationships of the genome donors.

A Bayesian approach that makes some presumptions about the end distribution, drawn from already available data, rather than a purely "neutral" statistical analysis, is also likely to produce a more accurate view of our deepest common roots.

Very interesting about the I2 and their R1b neighbours. Has anybody done an autosomal analysis?

It will be interesting as more of these papers on full autosomal analysis come out. What will be really confusing will be to compare Y against autosomal and mtDNA against autosomal.

Andrew, I'd agree with your beef about "ancient basal position" except for one thing. The Khoisan seem to show tremendous diversity, more than anyone else. It is this diversity that has led the authors to infer an ancient basal position. No?

True, the Khoisan are no more ancient than your father. Thank you for saying this. I think people often think of Africans as more "ancient," forgetting that Africans have been subjected to just as much natural selection as any other population. Maybe more.

I guess you guys saw the film "The Gods Must Be Crazy." It's been many years since I saw that film, but I remember thinking at the time how funny and ironic it was. The joke was on us.

"Why fuss over the language? Because describing a people as "ancient" fosters the false inference that they have a stronger genetic similarity with the ancient common ancestor than those on the other side of the split."

That is not true. Bushmen are more ancient and basal in the sense that they split off from the rest of humanity (or the rest of humanity split from them, it's symmetrical) at an earlier date and closer to the root of the tree.

That does not mean that they are more similar to the common ancestor; indeed this paper shows, by comparison to the chimp genome, that the great majority of new "Bushman" SNPs are not ancestral.
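Telling "ancestral" from "derived" alleles this way is outgroup polarisation: at a biallelic site, the allele carried by the chimpanzee is taken to be the ancestral human state. A minimal Python sketch with made-up sites (`classify_novel_allele` and the `sites` list are hypothetical); it assumes no recurrent mutation at the site and no polymorphism shared with chimpanzees, assumptions that occasionally fail in practice.

```python
from collections import Counter


def classify_novel_allele(novel: str, other: str, chimp: str) -> str:
    """Polarise the novel allele at a biallelic site using chimpanzee as outgroup.

    chimp == novel -> novel allele presumed ancestral
    chimp == other -> novel allele presumed derived
    anything else  -> unresolved (chimp base matches neither human allele)
    """
    if chimp == novel:
        return "ancestral"
    if chimp == other:
        return "derived"
    return "unresolved"


# Made-up sites: (novel allele seen in a Bushman, other human allele, chimp base)
sites = [
    ("A", "G", "G"),
    ("T", "C", "C"),
    ("C", "T", "C"),
    ("G", "A", "A"),
    ("A", "C", "T"),
]
tally = Counter(classify_novel_allele(n, o, c) for n, o, c in sites)
print(dict(tally))
```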

...this paper shows, by comparison to the chimp genome, that the great majority of new "Bushman" SNPs are not ancestral.

Yes, and this, together with the generally large separation from other Africans demonstrated here (almost comparable to that between Europeans and Asians), seems to support the idea that East Africans separated from SW Africans long before the migration out of Africa, and that SW Africans were essentially isolated until historic (or agriculturalist) times.

"That does not mean that they are more similar to the common ancestor; indeed this paper shows, by comparison to the chimp genome, that the great majority of new "Bushman" SNPs are not ancestral."

Well, that kinda makes me think that Bushmen are not the first population to branch off. The paper also attributes Y-DNA E lineages in Africa to a back migration from Asia. This means a vast group of African populations migrated into Africa. I just wonder if Bushmen also came from some place in Asia, went through a bottleneck as they traversed the vast Eurasian terrains, lost 94% of ancestral SNPs and then re-expanded within Africa. Why don't we find any of the earliest African-specific lineages (say, Y-DNA A and B) outside of Africa? They should have been much more widely spread as they had more time to expand.

"Why don't we find any of the earliest African-specific lineages (say, Y-DNA A and B) outside of Africa? They should have been much more widely spread as they had more time to expand."

Perhaps because not all populations are nomadic. Some populations seem to stay put for millennia, while others seem to have crossed and recrossed entire continents on time scales of about a thousand years.

Africa also has been populated the longest. Populations there experience tremendous population pressure in a way that Eurasians do not. Their strategies to deal with this pressure include migration, but not to the same extent as Eurasian populations. African populations and cultures are also much more adapted to deal with large scale deaths due to disease and warfare.

As an example, I would use the Ashanti in Ghana. (Sorry for this reference, which is not related to the Bushmen, but I might as well talk about something I know about.)

In Ashanti culture, most property and many other things are inherited through the mother. The society is also somewhat polygamous, in that some men have two wives, often sisters. Both customs are believed to be adaptations that promoted cultural survival when the Ashanti were in a constant state of war, allowing the culture to survive when many men did not.

So, getting back to your question, I would say that due to tremendous population pressure, as well as greater incidence of tropical disease, there must be a great number of missing pieces in the genetic picture of our African ancestors.

"The paper also attributes Y-DNA E lineages in Africa to a back migration from Asia. This means a vast group of African populations migrated into Africa."

Hmmm. Beyond the R1b back migration, that's new. It would be nice to read more on this.

"Hmmm. Beyond the R1b back migration, that's new. It would be nice to read more on this."

The supplemental material (table 1, p. 9) for this paper says just that. Chiaroni et al. 2009 "Y chromosome diversity, human expansion, drift, and cultural evolution" mentions this is a possibility as well. The background is that DE/E is nested within the otherwise non-African CFDE clade.

"The paper also attributes Y-DNA E lineages in Africa to a back migration from Asia".

I have long assumed that to be the case because 'DE/E is nested within the otherwise non-African CFDE clade'. Back-migration easily explains its distribution. On the other hand, I don't accept at all that Y-haps A and B are immigrants to Africa, unless it be long, long ago, way before 'modern' humans developed. Of course, that all means that just a single Y-hap made it out of Africa, or at least just one surviving Y-hap.

"On the other hand I don't accept at all the Y-haps A and B are immigrants to Africa, unless it be long, long ago. Way before 'modern' humans developed. Of course that all means that just a single Y-hap made it out of Africa, or at least just one surviving Y-hap."

Going along these lines, I would consider the possibility that it is outside of Africa that some ancient haplotypes didn't survive, such as the one(s) found in the Australian Mungo Man. They didn't survive for the simple reason that population sizes (effective and demographic) were lower in Asia, Australasia and America than in Africa and Europe. Older lineages drifted out outside of Africa but survived in Africa. Places of survival aren't necessarily places of origin.
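The "drifted out" scenario can be illustrated with a neutral haploid Wright-Fisher simulation of Y lineages: each generation, every man inherits the lineage of a randomly chosen man from the previous generation, and lineages are lost by chance faster when the population is small. A minimal Python sketch with hypothetical population sizes and a hypothetical `surviving_lineages` helper; it has no mutation, population structure or selection, so it only shows the direction of the effect, not realistic numbers.

```python
import random


def surviving_lineages(pop_size: int, n_lineages: int, generations: int, seed: int = 1) -> int:
    """Neutral haploid Wright-Fisher sketch of Y-lineage loss by drift.

    Starts with `n_lineages` founder lineages spread evenly over `pop_size` men;
    in each generation every man inherits the lineage of a uniformly random man
    of the previous generation. Returns the number of founder lineages left.
    """
    if pop_size < n_lineages:
        raise ValueError("need at least one man per founder lineage")
    rng = random.Random(seed)
    population = [i % n_lineages for i in range(pop_size)]
    for _gen in range(generations):
        population = [rng.choice(population) for _ in range(pop_size)]
    return len(set(population))


# Hypothetical population sizes; same number of founders and generations for each.
for size in (50, 500, 2000):
    print(size, surviving_lineages(size, n_lineages=10, generations=200))
```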

Actually, Desmond Tutu does look a little like those Bantu folk in the face... but the article I read elsewhere makes it sound like somewhat of a condemnation, for some unknown reason. Maybe they're jealous that he is a priest, because he's not the only African from the jungle who holds a position of high regard. Many Africans come from the same root, yet they live different lifestyles, and they all live within the same country: some live on the plains, some are dictators, some are warriors, and so on; many wear suits and have fancy wealth, especially the dictators. Africans hold many different roles and live varied lifestyles.

BTW, Terry, when you say "On the other hand I don't accept at all the Y-haps A and B are immigrants to Africa, unless it be long, long ago," have you considered the possibility that mutation rate was simply higher for these lineages?
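The point about lineage-specific mutation rates comes down to simple arithmetic: a naive age estimate divides the number of mutations on a branch by an assumed per-generation rate, so if a lineage actually mutated twice as fast as assumed, its inferred age is inflated by a factor of two. A minimal Python sketch with hypothetical counts and rates (not real Y-chromosome values); `estimated_age` is an illustrative helper name.

```python
def estimated_age(mutations: float, rate_per_generation: float, years_per_generation: float = 25.0) -> float:
    """Naive age (years) of a lineage: mutations on the branch divided by a constant
    per-generation rate, converted to years. Assumes the rate is constant along the branch."""
    if rate_per_generation <= 0:
        raise ValueError("rate must be positive")
    return mutations / rate_per_generation * years_per_generation


# Hypothetical branch with 10 mutations, dated under two different rates.
assumed = estimated_age(10, 0.005)   # rate assumed to be the "standard" one
faster = estimated_age(10, 0.010)    # same branch if its true rate were twice as high
print(f"age at assumed rate: {assumed:,.0f} years")
print(f"age if the lineage mutated twice as fast: {faster:,.0f} years")
```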

I'm getting back to a comment you made further up about this paper inferring that the Y E1b1a lineage represents a back migration into Africa.

I read through the supplemental material. It states: "E1b1a is an African lineage believed to have expanded from northern African to sub-Saharan and equatorial Africa with the Bantu agricultural expansion."

A back migration from northern Africa to southern Africa is not a back migration into Africa.

Yes, and right below it, in the next sentence, it says "E1b1a is an African lineage believed to have expanded from northern African to sub-Saharan and equatorial Africa with the Bantu agricultural expansion."

So when it says "emerged via back migration into Africa" I believe they mean "into sub-Saharan Africa."

The West African (Bantu) stories of origin speak of migrating from the North. However, it is pretty clear that West Africans are, genetically speaking, very African (except for a small number of cases of R1b).

I don't think you've made a case for a significant back migration into Africa. Africans originated in Africa. All non-Africans migrated out of Africa sometime in the distant past.

As far as I am concerned, our African progenitors are more than worthy ancestors.

I didn't try to make a case for a back migration into Africa on the basis of the distribution of the DE clade. Other scholars did so more than 10 years ago. See "Out of Africa and Back Again: Nested Cladistic Analysis of Human Y Chromosome Variation," by M. F. Hammer et al. Mol. Biol. Evol. 15(4):427–441. 1998. It takes some time for these ideas to sink in.

"have you considered the possibility that mutation rate was simply higher for these lineages?"

Once you start considering such possibilities, along with incredible levels of drift, it's possible to justify any theory you might care to.

"So when it says 'emerged via back migration into Africa' I believe they mean 'into sub-Saharan Africa'".

The authors base the comment on Y-hap E being a member of CDEF*, most downstream mutations of which are outside Africa. Haplogroups are not an indicator of 'race', and the Bantu migration took place long after the formation of Y-hap E. So E could well have been an 'outside Africa' haplogroup originally.

"Once you start considering such possibilities, along with incredible levels of drift, it's possible to justify any theory you might care to."

If you don't consider levels of drift in small populations and lineage-based mutation rate you can justify any theory just as easily. Genetics is a science with lots of unknowns. Only a cross-disciplinary check can verify which scenario is most likely.


Dienekes' Anthropology blog is dedicated to human population genetics, physical anthropology, archaeology, and history.

You are free to reuse any of the materials of this blog for non-commercial purposes, as long as you attribute them to Dienekes Pontikos and provide a link to either the individual blog entry or to Dienekes Anthropology Blog.
