A guide to the science and pseudoscience of A Troublesome Inheritance, part I: The genetics of human populations

This is the first in a series of guest posts in which Chris Smith will examine the evolutionary claims made in Nicholas Wade’s book A Troublesome Inheritance. Chris is an Associate Professor of Evolutionary Ecology at Willamette University. He uses population genetic approaches to understand coevolution of plants and insects, and he teaches the interdisciplinary course “Race, Racism, and Human Genetics” with Emily Drew.

A Troublesome Inheritance was published in 2014 by Penguin Press. Cover image via Google Books.

Last month the former New York Times writer Nicholas Wade released his latest book on human evolution, A Troublesome Inheritance: Genes, Race, and Human History (2014, Penguin Press). In it, Wade argues that the genomic data amassed over the past ten years reveal real and meaningful biological differences between races, and that these differences explain much of the cultural and socioeconomic differences between people. If you haven’t read a newspaper or picked up a magazine in the last month, you may not have noticed that Wade’s book has, predictably, prompted intense and impassioned reactions from scientists, sociologists, and commentators across the political spectrum. Writing for the Wall Street Journal, Charles A. Murray, author of The Bell Curve, called Wade’s book “a delight to read … [that] could be the textbook for a semester’s college course on human evolution.” On the other hand, Arthur Allen, in his review for the New York Times, predicts that many readers will find Wade’s book to be “a rather unconvincing attempt to promote the science of racial difference.”

Writers with considerably more gravitas than I have already pointed out that Wade seems to have a rather poor handle on the literature he reviews. Mike Eisen, a professor of molecular biology at the University of California, Berkeley, and a Howard Hughes Medical Institute (HHMI) investigator, writes that “the book is riddled with scientific and logical flaws” and that Wade’s “representation of modern genetics is simplistic and selective.” Likewise, Allen Orr, former president of the Society for the Study of Evolution, warns in his essay for the New York Review of Books that “[Wade] is not the surest guide to a technical literature.” That is, to say the least, an understatement. Indeed, many of Wade’s claims represent significant misunderstandings or misinterpretations of the literature.

Here, I offer the first in what I hope will be a series of posts examining Wade’s scientific claims, with a particular focus on his arguments about evolution and human genetics. I aim to review these in greater detail than has been done elsewhere, but in terms that are still accessible to a general audience. I will not deal here with Wade’s arguments about the history of Western civilization and the relative contributions of economics and culture to the ascendancy of the West; those topics are well outside my expertise.

Wade begins with the premise that recent population genetic studies reveal that human evolution has been “recent, copious, and regional.” On these very general points, I have no disagreement. The idea that all changes in allele frequencies ceased with the invention of agriculture is a notion that no one—apart from some of my introductory biology students—takes at all seriously. Likewise, it is inarguable that human populations vary genetically. As the evolutionary geneticist Richard Lewontin put it, “you don’t need a population geneticist to tell you that.”

However, starting from these uncontroversial (and frankly, rather banal) premises, Wade goes on to draw all manner of dramatic conclusions. Among other things, Wade concludes that modern genetics confirms the existence of “three primary races”, that Europeans were genetically preprogrammed to become the world’s dominant culture, that African Americans may have evolved through natural selection to be inherently violent and socially deviant, and that Jews are genetically predisposed to careers in banking. (Seriously. You can’t make this stuff up!) Needless to say, the available evidence does not support Wade’s grandiose conclusions, many of which are directly contradicted by the very work he cites in support of his arguments. Over the next several weeks I will review several of Wade’s major claims and evaluate what, if anything, the available data say about them.

Does modern genetics confirm the existence of human races?

Humans as a whole are unusual among primates in that we are remarkably genetically similar to one another (Gagneux et al. 1999). Needless to say, however, humans are not all genetically identical, and variation among humans is not distributed randomly. Rather, as is true of most mammals, genetic variation has a measurable geographic pattern: people who live near each other tend to be more genetically similar to one another. In humans, that pattern of geographic variation (or geographic structure) also bears a very strong mark of human history. Humans originated in Africa and began to disperse into the rest of the world about 50,000 years ago, moving first into the Middle East, then into Europe, Asia, and finally into Oceania and the Americas. As a result, most of human genetic diversity is found within Africa, the source population.

Human populations outside of Africa show progressively less variation the further they are from Africa (Wang et al. 2007); as humans colonized each part of the globe in turn, each group of colonists carried with it only a subset of the genetic variation found in its source population. The combined effects of geographic structure and of humanity’s spread from our African homeland mean that, for some genes, particular variants (alleles) are more common in some parts of the world than in others. So, given a sample of DNA from an individual, by looking at which genetic variants that individual carries at many, many genes, we can estimate where in the world that individual originated, often with a stunning degree of precision (Novembre et al. 2008). In addition, work by Noah Rosenberg and colleagues showed that when you use statistical tools (for example, the program STRUCTURE) to group individuals into a predetermined number of idealized, randomly mating populations, these populations largely correspond to continents of origin, but with some important exceptions, as we will see (Rosenberg et al. 2002; Rosenberg et al. 2005).

A measure of genetic variation (expected heterozygosity) contained within human populations located progressively further from east Africa, where modern humans originated. Image is from Wang et al. (2007), figure 2A.
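The decline in diversity with distance from Africa can be illustrated with a little arithmetic. The Python below is a minimal sketch, not the method of Wang et al. (2007): it computes expected heterozygosity at one locus as H = 1 − Σ p_i² and then applies a textbook idealization (my assumption here) in which each founding group of N diploid people carries 2N gene copies drawn at random from its source, which multiplies expected heterozygosity by 1 − 1/(2N). The allele frequencies and founder sizes are hypothetical, and the model ignores mutation, population growth, and later migration.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity H = 1 - sum(p_i^2) at one locus."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


def serial_founder_decay(h0, founder_sizes):
    """Expected heterozygosity after a chain of founder events.

    Idealized model (an assumption, not Wang et al.'s method): each founding
    group of N diploid people carries 2N gene copies sampled at random from
    its source, which multiplies expected heterozygosity by 1 - 1/(2N).
    """
    values = [h0]
    h = h0
    for n in founder_sizes:
        h *= 1.0 - 1.0 / (2 * n)
        values.append(h)
    return values


# Hypothetical source population: four equally common alleles at one locus.
h_source = expected_heterozygosity([0.25, 0.25, 0.25, 0.25])  # 0.75
# Hypothetical chain of three founding groups, each of 50 people.
for step, h in enumerate(serial_founder_decay(h_source, [50, 50, 50])):
    print(f"after {step} founder events: H = {h:.4f}")
```

Each successive founder event removes a little more of the diversity that was present in the source, which is the qualitative pattern in the figure; smaller founding groups would remove more.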

In summarizing these facts about human genetic variation, Wade is largely on the mark. However, he misses several important points, each of which has major implications for his conclusions. The first, as Jennifer Raff has pointed out on her blog Violent Metaphors and as Jeremy Yoder explains in great technical detail at The Molecular Ecologist, is that STRUCTURE (the software used to cluster individuals into populations) does not, on its own, identify how many clusters actually exist. Rather, the investigator defines the number of populations in advance, and STRUCTURE then clusters the individuals accordingly[1], trying to find the statistically ‘best’ arrangement of individuals.

So, for example, a scientist might obtain a sample of genetic data from people living in each of several villages in the Alps, some in Germany and some in Switzerland. She would then feed these data into STRUCTURE, which asks for directions about how the data should be analyzed, including how many clusters to use when grouping the individuals. In this case, since samples were taken from two countries, the scientist might tell STRUCTURE to assign the people into two populations. STRUCTURE will then assign each individual to a particular population, trying to create populations that, based on the frequency of genotypes within each resulting cluster, appear to be freely interbreeding.
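Why does "appearing to be freely interbreeding" help sort people into groups? If two populations with different allele frequencies are lumped together, the pooled sample contains fewer heterozygotes than Hardy-Weinberg proportions predict, a pattern known as the Wahlund effect. The Python sketch below shows this for a single two-allele locus; the allele frequencies, and the assumption of two equal-sized groups that each mate at random internally, are illustrative choices rather than data from any study.

```python
def hwe_genotype_freqs(p):
    """Hardy-Weinberg frequencies of AA, Aa and aa when allele A has frequency p."""
    q = 1.0 - p
    return p * p, 2 * p * q, q * q


def pooled_heterozygosity(p1, p2):
    """Observed vs. Hardy-Weinberg-expected heterozygosity when two equal-sized,
    internally random-mating groups are analysed as if they were one."""
    # Each group is in Hardy-Weinberg proportions on its own, so the pooled
    # sample's heterozygote frequency is the average of the two groups'.
    observed = 0.5 * hwe_genotype_freqs(p1)[1] + 0.5 * hwe_genotype_freqs(p2)[1]
    # The naive expectation applies Hardy-Weinberg to the pooled allele frequency.
    expected = hwe_genotype_freqs(0.5 * (p1 + p2))[1]
    return observed, expected


# Hypothetical allele frequencies in two groups.
obs_het, exp_het = pooled_heterozygosity(0.9, 0.1)
print(f"observed {obs_het:.2f} vs expected {exp_het:.2f}")  # observed 0.18 vs expected 0.50
```

A clustering program that rewards Hardy-Weinberg proportions within each cluster is therefore pushed to split such a pooled sample apart; when the two groups share allele frequencies, observed and expected values coincide and there is nothing to split.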

Depending on how STRUCTURE organizes the individuals into these clusters, potentially interesting conclusions could be drawn. For example, we might find that the two clusters correspond to political boundaries—with people from villages in each country clustering together. Alternatively, the results might show that people from different villages that speak the same language cluster together, with the French and German speakers each forming separate groups, suggesting that language is more important than geography in determining who mates with whom. However, if the scientist had chosen to group the people into three clusters, instead of two, a different result might have emerged. For example, she might have found that both geography and language matter, with all the French speakers from Switzerland forming one cluster, the German-speaking Swiss another, and the people living in Germany forming a third.
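The point that the investigator, not the software, sets the number of clusters can be made concrete with a toy clustering routine. The Python below is deliberately not STRUCTURE: STRUCTURE fits a Bayesian model by MCMC (optionally allowing admixture), whereas this sketch substitutes a simpler hard-assignment, maximum-likelihood clustering for illustration. It assumes unlinked two-allele loci scored as 0, 1, or 2 copies of one allele, Hardy-Weinberg proportions within each cluster, and a deterministic "farthest-first" starting point; the helper names and the three hypothetical villages are my own inventions.

```python
import math


def _log_lik(genotype, freqs):
    """Log-likelihood of one person's genotypes under Hardy-Weinberg proportions."""
    return sum(
        math.log(math.comb(2, g)) + g * math.log(p) + (2 - g) * math.log(1 - p)
        for g, p in zip(genotype, freqs)
    )


def _allele_freqs(members, n_loci):
    # The +1 / +2 pseudocount keeps every frequency strictly between 0 and 1.
    return [(sum(m[l] for m in members) + 1) / (2 * len(members) + 2) for l in range(n_loci)]


def _distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cluster_genotypes(genotypes, k, max_iter=100):
    """Assign every individual to one of exactly k clusters chosen by the user."""
    n_loci = len(genotypes[0])
    # Farthest-first seeding: start from individual 0, then repeatedly add the
    # individual farthest from all seeds chosen so far.
    seeds = [0]
    while len(seeds) < k:
        seeds.append(max(range(len(genotypes)),
                         key=lambda i: min(_distance(genotypes[i], genotypes[s]) for s in seeds)))
    freqs = [_allele_freqs([genotypes[s]], n_loci) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Put each person in the cluster under which their genotypes are most likely.
        new_labels = [max(range(k), key=lambda c: _log_lik(g, freqs[c])) for g in genotypes]
        if new_labels == labels:
            break
        labels = new_labels
        # Re-estimate each cluster's allele frequencies from its current members.
        freqs = [_allele_freqs([g for g, lab in zip(genotypes, labels) if lab == c], n_loci)
                 for c in range(k)]
    return labels


villages = [
    [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0],  # hypothetical village 1
    [2, 2, 0, 0, 0, 0], [2, 2, 0, 0, 0, 0],  # hypothetical village 2
    [2, 2, 2, 2, 2, 2], [2, 2, 2, 2, 2, 2],  # hypothetical village 3
]
print(cluster_genotypes(villages, 2))  # [0, 0, 0, 0, 1, 1]
print(cluster_genotypes(villages, 3))  # [0, 0, 2, 2, 1, 1]
```

Asked for two clusters, the routine lumps the intermediate village in with its nearer neighbor; asked for three, it separates all three. Both answers are the "best" arrangement for the number of clusters requested (the cluster numbers themselves are arbitrary labels), which is why deciding how many clusters really exist needs a separate criterion.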

Importantly, how many clusters are identified is a decision made by the scientist, not something that STRUCTURE determines. So, figuring out how many populations actually exist requires some other criterion. At the time Rosenberg and colleagues completed their initial analyses, appropriate statistical tools for identifying the “optimal” number of clusters had not been developed [2]. Rosenberg’s group did, however, evaluate how the number of clusters chosen in advance affected “clusteredness” (the extent to which each individual is identified as belonging to one population, as opposed to having ancestry in multiple populations). They found that the highest levels of clusteredness were reached when STRUCTURE was asked to group individuals into 5 or 6 clusters (both produced similar levels of clusteredness when all individuals and all the genetic data were included) (Rosenberg et al. 2005).
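To make "clusteredness" more concrete, here is a hedged Python sketch of one way to score it. Each individual's STRUCTURE output is a vector of membership proportions q (one entry per cluster, summing to 1). The score below is 0 when a person is spread evenly across all K clusters and 1 when they belong entirely to one; it is in the spirit of the measure described above, but the exact normalization is my assumption and may differ from the formula Rosenberg et al. (2005) used. The membership vectors are hypothetical.

```python
import math


def membership_clusteredness(q):
    """Distance of a membership vector from the even split (1/K, ..., 1/K),
    scaled so that full membership in one cluster scores 1 (assumed normalization)."""
    k = len(q)
    if k < 2:
        raise ValueError("need at least two clusters")
    spread = sum((qi - 1.0 / k) ** 2 for qi in q)
    # The largest possible spread, reached by a vector like (1, 0, ..., 0), is (K - 1) / K.
    return math.sqrt(spread * k / (k - 1))


def mean_clusteredness(memberships):
    """Average score over all individuals in a sample."""
    return sum(membership_clusteredness(q) for q in memberships) / len(memberships)


# Hypothetical membership vectors for K = 3.
print(membership_clusteredness([1.0, 0.0, 0.0]))  # one cluster only: 1.0
print(membership_clusteredness([0.5, 0.5, 0.0]))  # split between two clusters: 0.5
```

Averaging such a score over a sample, and repeating the analysis for different numbers of clusters, gives a rough sense of which choice of K produces the most clear-cut assignments.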

The second important point that Wade seems to miss is that these idealized populations (or population clusters) do not correspond to any conventional racial classification. Although Wade, conveniently, never explicitly defines what he means by race, he repeatedly claims that modern genetics identifies “three primary races,” namely Africans, East Asians, and Europeans. These groupings correspond to the ‘negroid’, ‘mongoloid’, and ‘caucasoid’ races described by classical physical anthropology. The trouble is that none of the contemporary studies of human genetic variation actually find this.

Rosenberg and colleagues’ work showed that with five clusters the resulting groups correspond reasonably well to continent of origin: Africa, Europe, Asia, Oceania, and the Americas. (Wade manages to fold this into his ‘three primary races’ narrative by calling the Oceanian and American groups ‘minor continental races’.) However, subsequent work by Sarah Tishkoff and colleagues, which used a statistical criterion to identify the ‘best’ number of population clusters, identified 14 groups, nine of which were contained entirely within Africa (Tishkoff et al. 2009). That is, if we allow the data to identify human ‘races’ without guidance, we find 14, not Wade’s “three primary races.”

Results of a STRUCTURE analysis assuming 14 clusters. Each thin bar in the diagram represents a single individual, and each color represents a different cluster. The fraction of each thin bar (i.e., each individual) of a particular color represents the proportion of that individual’s genome derived from that cluster. For example, among the individuals from Nigeria (shown in the first row, under “Western Africa”), each bar is almost entirely orange, indicating that for most of these individuals almost 100% of their genome is derived from the Central African cluster. In contrast, among the Parsi people (shown in the third row, in the left-most section of “India”), almost all individuals have equal parts pink and blue, indicating that roughly equal portions of their genomes are derived from the European and South Asian clusters. Image from Tishkoff et al. (2009), figure 4.

Tishkoff’s finding that there is marked genetic variation within races[3] is also reflected in recent work by Andres Moreno-Estrada, a postdoctoral scholar at the Stanford Center for Computational, Evolutionary and Human Genomics (CEHG). Moreno-Estrada and colleagues examined genetic variation within and between indigenous cultures in Mexico, and their work revealed dramatic genetic differentiation and substructure between populations. Indeed, the genetic differentiation[4] between the Seri people of northwestern Mexico and the Lacandon people of southeastern Mexico was greater than that between Europeans and Asians (Moreno-Estrada et al. 2014).

A Seri woman from Punta Chueca, Sonora, Mexico. The Seri are as genetically distinct from other Native American people in Mexico as Europeans are from Asians (Moreno-Estrada et al. 2014). Image is by Tomás Castelazo, via Wikimedia Commons.
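Because a low-diversity group can still be strongly differentiated from its neighbors, it may help to see how a differentiation statistic is computed. The Python below is a sketch of the textbook ratio F_ST = (H_T − H_S)/H_T for one two-allele locus, where H_S is the average within-population heterozygosity and H_T is the heterozygosity expected if everyone belonged to one randomly mating population. This is my assumed, simplified form with no sample-size correction; it is not necessarily the estimator Moreno-Estrada and colleagues used, and the allele frequencies are hypothetical.

```python
def fst_single_locus(pop_freqs):
    """F_ST = (H_T - H_S) / H_T for one two-allele locus, weighting populations equally.

    pop_freqs: frequency of one allele in each population.
    """
    h_s = sum(2 * p * (1 - p) for p in pop_freqs) / len(pop_freqs)  # mean within-population diversity
    p_bar = sum(pop_freqs) / len(pop_freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # diversity if everyone formed one random-mating population
    if h_t == 0:
        return 0.0  # the locus does not vary at all, so there is nothing to differentiate
    return (h_t - h_s) / h_t


# Hypothetical pair with little diversity inside each group but strong differentiation...
print(round(fst_single_locus([0.95, 0.05]), 2))  # 0.81
# ...versus a pair with much more diversity inside each group but weak differentiation.
print(round(fst_single_locus([0.6, 0.4]), 2))  # 0.04
```

In the first pair each group holds little variation on its own (H_S is about 0.095), yet the two differ sharply; in the second, each group is quite diverse (H_S is 0.48) but they barely differ. Differentiation and variation are distinct quantities.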

Third, the genetic data and traditional racial classifications conflict not just over the number of clusters but also over which groups belong to which races. For example, traditional racial groupings classified the indigenous peoples of New Guinea, Australia, and Tasmania as ‘negroid’, implying a close affinity with Africans (Morton 1839). However, the genomic data indicate that these people are genetically quite distinct from Africans and are most similar to people from Southeast Asia (as geography would predict) (Tishkoff et al. 2009). Likewise, people from South Asia (including the Indian subcontinent) have traditionally been classified as Caucasian, but genetic analyses consistently indicate that most South Asian peoples are genetically admixed (that is, they are not clearly assigned as either European or East Asian), and some ethnic groups, such as the Kalash people, appear as their own genetically distinct cluster (Rosenberg et al. 2005).

A last but important point is that the picture of human genetics that Wade paints (as well as that painted by many of the early population genetic studies of humans) ignores the enormous fraction of the world’s population that is of mixed ancestry (or, in the language of the U.S. Census, that belongs to ‘two or more races’). That is, research by Rosenberg and by others (both before and since Rosenberg’s 2002 study) relied on samples from the Human Genome Diversity Cell Line Panel (Rosenberg et al. 2002), which intentionally focused solely on ‘non-admixed’ populations: people who were not of mixed ancestry (Cavalli-Sforza et al. 1991). The focus on non-admixed peoples was understandable given the aims of these studies, which were largely to reconstruct the impacts of human prehistory on contemporary genetic diversity. However, the result is that much of the early work on human genetic variation conformed to what one eminent human population geneticist called “Nineteenth-century notions about race.”

A more realistic portrayal of human genetic variation must take account of the fact that, because of migration since the advent of trans-oceanic sea travel, a sizeable fraction of the world’s population today is of mixed ancestry. For example, within Mexico the average person has fifty percent Native American ancestry (Bryc et al. 2010), and across the Caribbean most people are of mixed ancestry, with many having a combination of European, Native American, and African roots (Moreno-Estrada et al. 2013). Likewise, studies of genetic variation within African Americans show that people of African descent living in the United States can also trace part of their ancestry to European sources, with between ten and fifteen percent of their genomes being of European origin (Tishkoff et al. 2009). Last, as described above, South Asia is home to 1.6 billion people, the vast majority of whom have both Asian and European roots. So, although Wade continually repeats his mantra that human evolution has been “recent, copious, and regional,” he conveniently overlooks the recent human evolution driven by gene flow, which has largely acted to homogenize human populations and reduce the (already minimal) genetic differences that existed between human populations up until the fifteenth century.
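Statements such as "ten to fifteen percent of the genome is of European origin" come from estimating an individual's admixture proportion. The Python below is a minimal sketch of that idea under strong assumptions of my own: exactly two source populations whose allele frequencies are already known, unlinked two-allele loci, and a simple grid search for the maximum-likelihood proportion q. Programs such as STRUCTURE estimate ancestry proportions and source frequencies jointly and far more carefully; all numbers here are hypothetical.

```python
import math


def admixture_log_lik(q, genotypes, p_a, p_b):
    """Log-likelihood of an individual's genotypes if a fraction q of their
    ancestry comes from source A and 1 - q from source B (loci assumed unlinked).
    The binomial coefficient is omitted because it does not depend on q."""
    total = 0.0
    for g, pa, pb in zip(genotypes, p_a, p_b):
        p = q * pa + (1 - q) * pb  # allele frequency in the mixed ancestral pool
        total += g * math.log(p) + (2 - g) * math.log(1 - p)
    return total


def estimate_admixture(genotypes, p_a, p_b, steps=100):
    """Grid-search maximum-likelihood estimate of q on [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: admixture_log_lik(q, genotypes, p_a, p_b))


# Hypothetical source frequencies at ten loci (they must lie strictly between 0 and 1).
p_source_a = [0.99] * 10
p_source_b = [0.01] * 10
# Hypothetical individual: two copies of the allele at five loci, none at the other five.
person = [2] * 5 + [0] * 5
print(estimate_admixture(person, p_source_a, p_source_b))  # 0.5
```

With real data the source frequencies are themselves estimated, many thousands of loci are used, and linkage between nearby loci matters, so this sketch only conveys the logic of the calculation.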

So, in summary, although human populations exhibit geographic variation, this is not the same thing as ‘race.’ Genetic data do indicate that human populations are genetically variable (unsurprisingly), and that populations that are geographically close to one another are more similar than populations that live on opposite sides of the world (also unsurprising). However, to call these populations “races” would require a very different definition of race from that used by physical anthropologists and sociologists, or indeed from what most of us mean by ‘race’ in everyday speech. Indeed, the genetic groupings found in modern genomic studies largely contradict traditional notions of race, and a large fraction of the world’s populace is of mixed ancestry, blurring the lines between traditionally recognized “races.”

In my next post, I will look at Wade’s claim that natural selection has favored violent and impulsive behaviors in some human populations.

Thanks are due to Sarah Tishkoff for discussion of her 2009 Science paper, and to Emily Drew, Malia Santos, Furey Stirrat, and Jeremy Yoder for comments on early drafts of this work.

[1] STRUCTURE groups individuals into a predetermined number of clusters ‘K’, in such a way that the distribution of genetic variation (the genotype frequencies) within each cluster makes it appear that mating is occurring at random. That is, the software seeks to arrange individuals into groups in a way that minimizes the overall departures from Hardy-Weinberg equilibrium and linkage equilibrium.

[2] A statistical approach has since been proposed (Evanno et al. 2005) that selects the optimal number of clusters from how the probability of observing the data changes as more clusters are posited: it looks for the number of clusters at which the second-order rate of change of the log probability (scaled by its variability across replicate runs) peaks. To my knowledge this approach has not been used to identify the optimal number of clusters in the Rosenberg dataset.

[3] The distinction between genetic variation among races and genetic variation within races is an important one (and one that Wade entirely misses); I will take it up in a future post.

[4] Note that ‘genetic differentiation’ is not the same thing as genetic variation. Overall, Native American populations harbor far less genetic variation than the people of any other continent, having traveled further from Africa than any other group. The Moreno-Estrada result refers to a statistical measure of genetic differentiation called Wright’s FST, which reflects the extent of genetic exchange between two populations or, more precisely, the degree to which the distribution of genetic variation differs from what we would expect if the people living in each population were as likely to mate with someone from the other population as with someone from their own.
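For readers curious about the Evanno et al. (2005) criterion mentioned in footnote 2, here is a hedged Python sketch. It takes the log probability of the data, ln P(X|K), from several replicate runs at each number of clusters K, and computes ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd(L(K)) using the mean log probability at each K, which is how the criterion is commonly applied in practice; treat the exact averaging as my assumption. The log-probability values are invented for illustration, not taken from any real analysis.

```python
import statistics


def evanno_delta_k(log_probs):
    """log_probs maps each K to a list of ln P(X|K) values from replicate runs.

    Returns delta K for every K whose neighbours K - 1 and K + 1 were also run.
    Requires at least two replicates per K and nonzero spread among them.
    """
    means = {k: statistics.mean(v) for k, v in log_probs.items()}
    delta = {}
    for k in sorted(log_probs):
        if k - 1 in means and k + 1 in means:
            # Second-order rate of change of the mean log probability.
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            delta[k] = second_diff / statistics.stdev(log_probs[k])
    return delta


# Hypothetical replicate runs: the log probability rises steeply, then levels off.
runs = {
    1: [-5010, -5000, -4990],
    2: [-4510, -4500, -4490],
    3: [-4210, -4200, -4190],
    4: [-4190, -4180, -4170],
    5: [-4180, -4170, -4160],
}
delta = evanno_delta_k(runs)
print(max(delta, key=delta.get))  # 3
```

The criterion picks the K where gains in fit stop accelerating, rather than the K with the highest raw probability, which usually keeps creeping upward as more clusters are added. Like any such rule, it answers a statistical question about a particular dataset, not a question about how many "races" exist.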

8 comments on “A guide to the science and pseudoscience of A Troublesome Inheritance, part I: The genetics of human populations”

Fantastic description. I’ve just started reading the book, inspired partly by the outrage it has generated. I had assumed that the first part, where he describes the genetics, would be okay, and it would get dodgy only when he got into the more “speculative” domain. But no, it seems that the disingenuousness starts on page one. Looking forward to the rest of the series.

Hi good post. Some details:(1) Its Noel Rosenberg. (2) I do not think that the ancestral southern indian population is any more related to east asians than to western eurasians, and nor are melanesians, which makes shoehorning variation into these 3 groups particularly difficult. (3) the structure program did give a way of estimating K but it was not particularly good, in any case K is not terribly meaningful since it is dataset dependent amongst other things. (4) FWIW Rosenberg 2002 identified 6 clusters in the global analysis, one of which was the Kalash. Kalash are very drifted but allowing for that definitely cluster with local populations, but with some admixture (5) Structure and other methods are good at modern but not ancient admixture, of which there is plenty hiding. (6) All of that said, blumenbach was correct to notice a similarity between the western eurasian populations, which is the basis of “caucasian”. The relative genetic similarity of these groups also provides good evidence for a common source of ancestry for these groups. But there has been lots of admixture with very distinct sources too and this helps to understand the great variation amongst definitions about who was considered caucasian and who wasn`t. Wade essentially overinterprets this physical evidence in the same way that he over-interprets the genetic evidence but my impression is that are fairly concordant, especially if skin colour is not given too much weight (an issue the early physical anthropologists varied on in predictable fashion, depending on how evidence and logic oriented they were..).

Thanks to Daniel Falush for very thoughtful comments, and to both Daniel and Jon for correcting my embarrassing mistake about Rosenberg’s first name.

To Daniel, I agree that the ‘three races’ model is a particularly poor representation of the genetic data, but for some reason Wade seems particularly enamored of this scheme.

Also, I hope to deal with morphological data in a future post, although that literature is a bit farther afield from my primary area of expertise. That said, my understanding is that Howells’ classical cranial morphology data group Australians together with Africans, consistent with the traditional racial definitions but in conflict with the genetic data. So, there may be places of both agreement and disagreement between the genetics and the morphology.

A minor point. I think that STRUCTURE really hides the hierarchical nature of variation when it exists. Daniel Falush’s comment about the highly drifted Kalash actually grouping with other local populations really highlights this. It seems possible that an accurate portrayal of that hierarchy (e.g. by estimating a phylogenetic network) might give slightly more credence to popular conceptions of race, although I have to admit that very high divergence among subpopulations resulting from rapid drift and selection (nevermind admixture), even if it were nested in a hierarchy broadly congruent with those conceptions, certainly argues against making biological generalizations based on them.
