February 23, 2008

Huge paper on human genetic relationships based on 650K SNPs

If you thought that this week's Nature paper on human variation was nice, a new paper in Science should be another pleasant surprise. It seems that every time I turn my head, geneticists have raised the number of SNPs they study. This one has lots of populations, 650K SNPs, and a treasure trove of new insight into where humans come from and how we differ from each other.

Before I get into the details of the paper, I want to reiterate my conviction that the problem of human origins is not really a hard one. It just requires a lot of data: many populations, many individuals sampled, and many genetic markers. Our history, our race, and now it seems even our ethnicity can be read off our genes; we just need to invest the money and effort to find out. That said, it is a pity that the same old roster of populations from the CEPH panel makes yet another appearance.

There is a lot in the paper that might interest you, but I will note a few points. First, look at the following PC plot from the paper.

No, your eyes aren't deceiving you. This paper is proof positive that European ethnicities can be distinguished from each other genetically. Even neighboring populations (in this case the French and the Italians) are neatly separated. As geographic distance increases there isn't even a hint of confusion: Russians, Orcadians, and Basques, for example, are clearly separated from the other groups. Doubtless there would be more overlap if more individuals per population were sampled, but the thrust of the discovery stands: several European ethnicities and local populations seem to make sense not only culturally but also biologically.
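For readers wondering how a PC plot like this is produced, here is a minimal sketch in Python with NumPy. It does not use the paper's data or pipeline: it simulates two hypothetical populations whose allele frequencies differ only slightly (an arbitrary F_ST of 0.01, chosen purely for illustration, via the Balding-Nichols model), standardizes the genotype matrix SNP by SNP, and takes the leading singular vectors. With thousands of SNPs, even this small per-SNP difference accumulates enough to pull the two groups apart on PC1, which illustrates why adding markers helps.

```python
import numpy as np


def simulate_genotypes(n_per_pop=50, n_snps=5000, fst=0.01, seed=0):
    """Simulate 0/1/2 genotype counts for two hypothetical populations.

    Synthetic data only (not the HGDP-CEPH genotypes). Each population's
    allele frequencies are drawn around a shared ancestral frequency with
    the Balding-Nichols model, an assumption made for illustration.
    """
    rng = np.random.default_rng(seed)
    ancestral = rng.uniform(0.1, 0.9, size=n_snps)
    # Beta parameters giving variance fst * p * (1 - p) around the ancestral p.
    a = ancestral * (1 - fst) / fst
    b = (1 - ancestral) * (1 - fst) / fst
    genotypes, labels = [], []
    for pop in range(2):
        freqs = rng.beta(a, b)  # population-specific allele frequencies
        # Each individual carries two alleles per SNP (Hardy-Weinberg sampling).
        genotypes.append(rng.binomial(2, freqs, size=(n_per_pop, n_snps)))
        labels.extend([pop] * n_per_pop)
    return np.vstack(genotypes).astype(float), np.array(labels)


def principal_components(genotypes, k=2):
    """Return the top-k principal component scores for each individual."""
    p = genotypes.mean(axis=0) / 2.0  # sample allele frequency per SNP
    keep = (p > 0) & (p < 1)          # drop monomorphic SNPs (zero variance)
    g, p = genotypes[:, keep], p[keep]
    # Center each SNP and scale it by its binomial standard deviation.
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k] * s[:k]


if __name__ == "__main__":
    genotypes, labels = simulate_genotypes()
    pcs = principal_components(genotypes)
    for pop in (0, 1):
        scores = pcs[labels == pop, 0]
        print(f"population {pop}: PC1 from {scores.min():.1f} to {scores.max():.1f}")
```

The two simulated groups should land on opposite sides of PC1 with no overlap. Re-running with a much smaller `n_snps` (a few hundred, say) is an easy way to watch the clusters start to blur together.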

Now take a look at the standard STRUCTURE analysis, which provides meaningful results up to K=7. The usual Sub-Saharan, Native American, East Asian, and Oceanian clusters are there, but now there is meaningful structure within the Caucasoids as well: they are broken into "Middle Eastern", "European", and "Central South Asian" groups.

I would guess that the "brown" Middle Eastern cluster is largely an Arab/Semitic phenomenon, although the inclusion of the Berber Mozabites is interesting. If it reflected a prehistoric phenomenon, it would be difficult to explain its apparent total lack of influence in Europe, apart from a barely perceptible spillover into Tuscany, which once again underlines the idiosyncratic "Middle Eastern" trace in that Italian population, likely descended from the Etruscans.

The "Central South Asian" group is also extremely interesting, for several reasons. First, it reinforces the earlier claim that the Kalash are not Greek descendants, as some romantics would have them, but simply a native non-European population with no evidence of European ancestry. Second, it shows that there is minimal yet evident European influence in Central Asia, which I would relate to the eastern Indo-Iranians. Third, Central Asian influence in Europe is not evident, except for a trace among the Russians (and substantially more among the Adygei, a people of the Caucasus). Fourth, the minority Mongoloid and "Boreal" (purple) influence in Russians is confirmed. Note that we are dealing with ethnic Russians from the north (Vologda oblast), and Russians are a heterogeneous people in terms of their origin.

We can only hope that future studies of this kind will include further populations. In particular, samples from eastern and southeastern Europe, non-Arab West Asia, and Siberia would be invaluable for further understanding Caucasoid origins, and perhaps for uncovering additional structure.

Now that the CEPH panel has revealed that ethnic groups can be distinguished genetically, there is little more it can offer in terms of understanding origins. The next milestone will be either to include previously unsampled populations or to dig into ethnic groups themselves and see whether sub-ethnic entities are also discernible in our genomes. Genetic genealogists are in for a good time in the coming years...

Human genetic diversity is shaped by both demographic and biological factors and has fundamental implications for understanding the genetic basis of diseases. We studied 938 unrelated individuals from 51 populations of the Human Genome Diversity Panel at 650,000 common single-nucleotide polymorphism loci. Individual ancestry and population substructure were detectable with very high resolution. The relationship between haplotype heterozygosity and geography was consistent with the hypothesis of a serial founder effect with a single origin in sub-Saharan Africa. In addition, we observed a pattern of ancestral allele frequency distributions that reflects variation in population dynamics among geographic regions. This data set allows the most comprehensive characterization to date of human genetic variation.
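The "serial founder effect" mentioned in the abstract can be made concrete with a deliberately simple calculation. The sketch below is a toy model, not the paper's analysis: each step away from the source stands for a new colony founded by a small group that stays at that size for a number of generations, and under pure drift expected heterozygosity shrinks by a factor of (1 - 1/(2N)) per generation. The function name and all parameter values (starting heterozygosity, founder size, bottleneck length) are hypothetical, and mutation and migration are ignored.

```python
def serial_founder_heterozygosity(h0, founders, generations_per_step, n_steps):
    """Expected heterozygosity after 0..n_steps serial founding events.

    Toy model with hypothetical parameters: each colony is founded by
    `founders` diploid individuals and stays at that size for
    `generations_per_step` generations before the next colony buds off.
    Only drift during these bottlenecks is counted; mutation, migration,
    and drift after each colony expands are ignored.
    """
    # Under drift, expected heterozygosity decays by (1 - 1/(2N)) per generation.
    loss_per_step = (1 - 1 / (2 * founders)) ** generations_per_step
    return [h0 * loss_per_step ** k for k in range(n_steps + 1)]


if __name__ == "__main__":
    # Each step stands in for one more founding event farther from the source.
    for step, h in enumerate(serial_founder_heterozygosity(0.75, 500, 50, 10)):
        print(f"step {step:2d}: expected heterozygosity {h:.3f}")
```

Because the loss per step is small, the decline is close to linear over a handful of steps, roughly the shape one would expect when heterozygosity is plotted against the number of founding events separating a population from sub-Saharan Africa.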


Dienekes' Anthropology blog is dedicated to human population genetics, physical anthropology, archaeology, and history.

You are free to reuse any of the materials of this blog for non-commercial purposes, as long as you attribute them to Dienekes Pontikos and provide a link to either the individual blog entry or to Dienekes Anthropology Blog.
