Within a drop of blood, you can find all the information you need to reasonably guess where a person came from, without ever having to look at their face, name or passport. Small variations in our DNA are enough for the task. They can be used to pinpoint someone’s place of origin to a remarkable degree of accuracy, often to within a few hundred kilometres.

The new discovery comes from a team of Swiss and American researchers led by John Novembre at UCLA, who wanted to understand how the human genome varies on a continental scale. To that end, they looked at the genomes of over 1,300 people sampled from almost three dozen countries across Europe. The sample was originally collected by GlaxoSmithKline to hunt out genetic variations that influence the effectiveness of drugs and their side effects, but Novembre’s team put it to use in understanding the links between genes and geography instead.

They analysed single-letter differences in DNA (“single nucleotide polymorphisms” or SNPs) at about 200,000 places in each of the genomes. They compared this data to each person’s country of origin as well as that of their grandparents if possible.

To work with this massive collection of information, Novembre applied a mathematical technique called principal component analysis (PCA) to transform the unwieldy set of data into a more manageable form. The technique looked for underlying patterns in the massive collection of SNPs and boiled them all down to just two variables, known as principal components. The upshot is that each person could be plotted as a point on a simple two-dimensional graph, whose axes correspond to the two principal components. It collapsed a complicated cloud of data into a simple sheet.
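For readers who want to see what "boiling down" looks like in practice, here is a minimal sketch of PCA in Python using NumPy. It is not the team's actual pipeline: the function name `top_two_components` and the toy genotype data (two made-up groups with slightly different allele frequencies, each SNP coded as 0, 1 or 2 copies of a variant) are assumptions for illustration only.

```python
import numpy as np

def top_two_components(genotypes):
    """Project an (individuals x SNPs) matrix onto its first two principal components."""
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)  # centre each SNP column on zero
    # Singular value decomposition of the centred data; U * S gives each
    # individual's coordinates along the principal axes, largest variance first.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :2] * S[:2]

# Toy data (hypothetical): two groups of 20 people, 500 SNPs each.
rng = np.random.default_rng(0)
n_snps = 500
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0, 0.15, n_snps), 0.05, 0.95)
group_a = rng.binomial(2, freq_a, size=(20, n_snps))
group_b = rng.binomial(2, freq_b, size=(20, n_snps))

# Each row of coords is one person's (x, y) position on the two-dimensional "genetic map".
coords = top_two_components(np.vstack([group_a, group_b]))
```

Plotting the two columns of `coords` against each other gives the kind of two-dimensional scatter described above, with one point per person.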

The result was startling – the genetic and geopolitical maps of Europe overlap to a remarkable degree. On the two-dimensional genetic map, you can make out Italy’s boot and the Iberian peninsula where Spain and Portugal sit. The Scandinavian countries appear in the right order, and in the south-east, Cyprus sits distinctly off the “coast” of Greece.

Zoom in closer, and the map even reveals distinct genetic clusters within Switzerland based on the language people speak. German-speaking Swiss cluster to the east, Italian speakers to the south and Francophones to the west. Even so, the clusters overlap and in general, the data reveals a genetic continuum between Europeans, where the borders of the genetic map are fuzzier than those of its geographical counterpart. As far as genes are concerned, the closer together two people live, the more similar their DNA is.

There were a few exceptions to the genetic map’s accuracy, with a few countries appearing in odd positions. Slovakia, for example, turns up in the middle of Italy rather than next to the Czech Republic where it belongs. Russia too is further west than its actual position and appears to be hugging Poland (which I find ironically unsettling in the light of recent political events). But Novembre says that both exceptions are probably due to small sample sizes – “Russia” in this case was only represented by six people, and just one poor individual was waving the flag for Slovakia.

Exceptions aside, the overlay between the two maps is startlingly accurate. Using only genetic information, Novembre’s team can place over 90% of people within 700km of their place of origin, and over 50% of people within 310km. The graph below shows the different degrees of accuracy for different countries.
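To make the "within 700km" figure concrete, here is a rough sketch of how such an accuracy score could be computed once you have a predicted and a true location for each person. It is an assumption that distances are measured along the Earth's surface with the haversine formula; the function names `haversine_km` and `fraction_within` and the example coordinates are hypothetical, and this is not a description of the team's own calculation.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def fraction_within(pairs, limit_km):
    """Share of (true, predicted) location pairs that lie no more than limit_km apart."""
    hits = sum(1 for true, pred in pairs if haversine_km(*true, *pred) <= limit_km)
    return hits / len(pairs)

# Made-up example: one prediction about 111km off, one about 1,112km off.
pairs = [((0.0, 0.0), (0.0, 1.0)), ((0.0, 0.0), (0.0, 10.0))]
share = fraction_within(pairs, 700)
```

With real data, `pairs` would hold one entry per person, and calling `fraction_within` with 700 and then 310 would give the two percentages quoted above.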

The results have implications for a lot of biomedical research. Many scientists are scanning entire genomes on a hunt for SNPs that affect a person’s risk of diseases like cancer or their reaction to drugs. Novembre says that researchers who are running these “whole-genome studies” need to bear in mind where their sample has come from. Even if a study looks at small and seemingly related parts of Europe, it would have to adjust for any geographical influences in the genetic variations it uncovers.

This study is just the beginning. At the moment, the analysis is too crude to detect rare genetic variants that are the result of new mutations. These tend to cluster around the place where the mutation first sprang into being, and as such, they can give us more information about the structure of populations on an even finer scale. As more and more genomes are sequenced and statistical methods improve, the genetic map will become clearer and clearer.

After reading some European history (Tony Judt’s “Postwar”), I would guess that this correlation is the work of the ethnic cleansing during (and after) WWII. I doubt that such good correlation would have been seen earlier than 1941.

Within a drop of blood, you can find all the information you need to reasonably guess where a person came from, without ever having to look at their face, name or passport.

I highly doubt that would work for most of the so called new world melting pots. I myself trace my family tree back through Hungarian and Danish ancestry and was born in Brazil. I don’t have any data to back up my hunch but I suspect that the mixing of genes in such countries would encompass many people of African origins, native Brazilian Indians, Europeans of all stripes, Asians and so on and on.

Omer, I take your point about so-called “ethnic cleansing” (or genocide as I prefer to call it). That might affect some parts of Eastern Europe quite markedly. However the proportions of the population involved in Western Europe by and large would have been too small to make much difference; moreover many of those countries have had considerably more immigration since the war than before. Take the UK, for instance. Immigration only really picked up when air travel became reasonably cheap.

Pedantic note:
Genocide = attempting to kill all of an ethnic group
Ethnic cleansing = attempting to remove an ethnic group from an area
All genocide is also ethnic cleansing, not all ethnic cleansing is genocide, and the terms are not synonymous, Tony

I think a topographic map, instead of a geopolitical one, might also have been interesting. The Alps and the Pyrenees, at least, seem to have had a definite isolating effect as regards the Spanish and Italian clusters, and waterways like the Channel the opposite effect.
Ethnic cleansing might also play into it, however, especially regarding PL and DE – I bet you would have found much more genetic overlap there before WW II.
Another interesting overlay might be a map of historically predominant religions, come to think of it.

Woobegone, the methods of removal usually, if not always, amount to genocide. Moreover I refuse to use the term “cleansing” in this context because, obviously, it’s an utterly revolting image. We should not condone dehumanization or let it enter into our language.

This homogeneity has more to do with the simple human behavior of marrying people close by and of the same social status than with violent nationalism. If ethnic slaughter were the cause, then places like England and Spain would show greater heterogeneity. The truth of the matter is that European communities have perennially been rather conservative marriage-wise and European society, at least since the fall of the Roman empire in the West, rigidly segmented.
It would be interesting to see such a study in the Americas.

Who We Are

Phenomena is a gathering of spirited science writers who take delight in the new, the strange, the beautiful and awe-inspiring details of our world. Phenomena is hosted by National Geographic magazine, which invites you to join the conversation. Follow on Twitter at @natgeoscience.

Ed Yong is an award-winning British science writer. Not Exactly Rocket Science is his hub for talking about the awe-inspiring, beautiful and quirky world of science to as many people as possible.