December 01, 2010

Human genetic variation: the first 50 dimensions

Here is a huge data dump for anyone interested in human variation. Part of the reason I started the Dodecad Project was to be able to analyze data on my own, rather than having to squint to make sense of a plot, to speculate about what might show up at higher dimensions, or with more clusters, to wonder how the inclusion of additional populations would affect the results, and so on.

The following dataset represents the culmination (so far) of my efforts.

In the RAR file (~11MB) you will find 49 scatterplots (5000x5000 pixels each) representing the first 50 dimensions of a multi-dimensional scaling analysis of this dataset, together with information about the samples and their sources. There is a plot of the 1st and 2nd dimensions, 2nd and 3rd, 3rd and 4th, and so on, until the 49th and 50th.
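As a quick check on the arithmetic, pairing each dimension with the next one yields 49 plots from 50 dimensions. The snippet below is only an illustrative sketch; the variable names are made up and are not taken from the scripts used to generate the plots.

```python
# Number of dimensions kept from the multi-dimensional scaling analysis.
n_dims = 50

# Each plot pairs dimension i with dimension i + 1 (1-based numbering).
pairs = [(i, i + 1) for i in range(1, n_dims)]

print(len(pairs))   # number of scatterplots
print(pairs[0])     # first plot: dimensions 1 and 2
print(pairs[-1])    # last plot: dimensions 49 and 50
```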

I don't believe Picasa allows such huge pics, so I've made a few smaller (still 1600x1600 pixels each) ones to give you an idea of what to expect. Note that the legend in these small ones is only partly visible.

In all plots, population labels have been placed on the population averages; these usually correspond to blobs of datapoints belonging to that population, but occasionally they are shifted due to the presence of outliers.
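To see how a single outlier can pull a label away from its blob, here is a minimal sketch with made-up coordinates for one hypothetical population. The median is shown only for contrast; the plots themselves use the average, as stated above.

```python
from statistics import mean, median

# Hypothetical 2D coordinates for one population:
# four individuals in a tight blob plus one outlier.
points = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.0), (5.0, 5.0)]

xs = [x for x, _ in points]
ys = [y for _, y in points]

# Label position as the population average: dragged toward the outlier.
label_mean = (mean(xs), mean(ys))

# Coordinate-wise median, for comparison only: stays inside the blob.
label_median = (median(xs), median(ys))

print("average:", label_mean)
print("median: ", label_median)
```

With these numbers the average lands well outside the tight blob around (1, 1), while the median stays in it.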

Before I proceed, it might be worth giving a visual representation of the three poles of human variation in its broadest context; these are Basques/Sardinians, Mbuti/Biaka Pygmies, and She. Well, these are marginally more toward the three poles than many others, but they will do:

Inspection of these plots gives you an idea of why Clusters Galore works so well. It can detect "clusteredness" of individuals along multiple dimensions. It does not look at a series of 2D plots, but it considers proximity of individuals to each other along multiple dimensions, and adapts to the shape, size, and orientation of the clusters.

Why are Australoids (Papuans etc) closer to Africa than East Asians and Amerindians are in the first two dimensions? Don't they possess one of the greatest Fst distances to Africa, even more so than some East Asians?

I don't know how far they are from Africa off the top of my head, but you must remember that you need ALL dimensions to recreate the distance matrix.

If A is closer to B than to C in one 2D projection, you CANNOT conclude that A is closer to B than to C using the full distance matrix.
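A toy illustration of this point, using invented coordinates for three hypothetical samples A, B and C along three dimensions (not values from the dataset): B sits near A in the first two dimensions but far away along the third.

```python
import math

# Made-up coordinates along three dimensions (illustration only).
coords = {
    "A": (0.0, 0.0, 0.0),
    "B": (1.0, 0.0, 5.0),
    "C": (2.0, 0.0, 0.0),
}

def distance(p, q, dims):
    """Euclidean distance using only the first `dims` coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p[:dims], q[:dims])))

# Distances as seen in the 2D projection (dimensions 1 and 2).
ab_2d = distance(coords["A"], coords["B"], 2)
ac_2d = distance(coords["A"], coords["C"], 2)

# Distances using all three dimensions.
ab_full = distance(coords["A"], coords["B"], 3)
ac_full = distance(coords["A"], coords["C"], 3)

print("2D:   A-B =", ab_2d, " A-C =", ac_2d)
print("full: A-B =", ab_full, " A-C =", ac_full)
```

In the 2D projection A looks closer to B than to C, yet with the third dimension included A is closer to C. The same reversal can happen with 50 dimensions instead of 3.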

"Why are Australoids (Papuans etc) closer to Africa than East Asians and Amerindians are in the first two dimensions?"

This is my interpretation. The first dimension (y, height) is a measure of African vs European. By this measure Australoids are more closely related to Europeans (or more likely the common ancestors in the Middle East/India) than Africans.

The second dimension is East Asia (x). So movement to the right is a measure of Asian-ness.

But these are modern populations and things have changed over the last 200,000 years. Europeans have become increasingly European and Asians have become increasingly Asian (and Africans increasingly African).

At the time the East Asians and Papuans split off from the common population, it would have sat at about where the Ethiopians are now. This is why they lie at the same height. These populations seem to be the earliest to split away.

Other populations may have split off later (greater height) as the Europeans continued to deviate. This is why there are a series of lines fanning out from the Europeans roughly towards East Asia.

To complicate matters, however, there are a number of overlaid admixed populations. This is best illustrated by the African Americans, who lie distributed along the line that connects the modern Africans to the modern Europeans. Other admixed populations also form the spokes that fan out roughly between Europe and Asia. It appears that there is more than one spoke either because there are several East Asian populations in connection with the Europeans, or because these represent connections at different times in history. So the Australoid "spoke" represents a genetic tendency towards Asian-ness perhaps earlier in history (lower down, so when the Europeans were less deviant). The Amerindian "spoke" is higher, representing flow when the European population was more deviant (more recent, higher).

So basically the Australoids are actually slightly less African than the Chinese and just appear closer because they are less Asian than the Chinese/Pima/Maya. Also Maya and Pima are less African than the Chinese and Papuans but more African than the Europeans, by this measure.

There are no spokes connecting Asia and Africa because there was little historical connection.

I'm curious to know the precise geographical origin of the Egyptian sample. Is it from all over the country or just one part (north, south, Cairo, etc.)? I can't seem to find the answer in the Behar article or the GEO database.

Thanks, but I already know how to open it up. My problem is that everything becomes blurry. I have had no problem viewing all the other plots on this site. It's only this one I have a problem with. Dienekes, could you help me out?

Ok, the ones I was looking at must be the 1600x1600 you were talking about. The problem is I can't download the rar on my phone or download the supporting browsers, such as Firefox. Is there any way to just relabel the populations for the small number of pixels or something else?

Dienekes' Anthropology blog is dedicated to human population genetics, physical anthropology, archaeology, and history.

You are free to reuse any of the materials of this blog for non-commercial purposes, as long as you attribute them to Dienekes Pontikos and provide a link to either the individual blog entry or to Dienekes Anthropology Blog.

Feel free to send e-mail to Dienekes Pontikos, or follow @dienekesp on Twitter.