TWO NEW STUDIES IN EUROPEAN PREHISTORY have recently made headlines. The first (Haak et al. 2015) purports to show, using genetic data from ancient skeletons, that massive migration from the Central Asian Steppes into Europe during the Bronze Age likely introduced Indo-European languages, thus supporting the venerable “Kurgan” hypothesis championed by Marija Gimbutas decades ago. The second study (Smith et al. 2015) identified DNA from domesticated wheat (Triticum) in submarine peat deposits off the southern coast of England dating to 8000 years ago, two millennia before such plants formed an identifiable component of crop assemblages in known terrestrial sites in England, and thus well ahead of the Neolithic agricultural “front.”

The validity of the latter results remains to be seen: the DNA in question was cored out of the ocean bottom, and while the published results appear robust, it is unlikely that these data will single-handedly overturn the long-standing archaeological narrative of the Neolithic. That study does, however, provide a convenient point of departure for reexamining the first study and other similar studies of ancient genetics. Archaeologists have typically used two kinds of models to explain the past: diffusionist models in which ideas, things, and practices move, and migrationist models in which ideas, things, and practices move because people move. The movement of domesticated plants and animals of Near Eastern origin into Europe (the so-called “Neolithic Revolution”) has been the bellwether case for testing these two types of explanations in archaeology. In the last three decades or so, human genetics has entered the picture as a way of testing competing hypotheses, first using modern DNA samples from living people in Europe and the Near East, and increasingly in the last decade, using ancient DNA (aDNA) extracted from archaeological burials (Pinhasi et al. 2012).

European population genetics, modern and ancient

Early studies of modern biological patterning (initially using blood types and other proteins) suggested a broad SE-NW trend in frequencies (Ammerman and Cavalli-Sforza 1984), one that was later confirmed when DNA sequencing became possible. This pattern was immediately interpreted as the outcome of a Neolithic-period migration out of the Near East into Europe beginning after 8000 BC, swamping out “indigenous” European peoples (and their genes) that had been in place since at least the end of the last ice age (c. 12,000 years ago or longer). Vigorous debate ensued as some researchers argued that this trend could have resulted from a much earlier peopling of Europe by modern humans c. 45,000 years ago, or possibly during the reoccupation of Europe after the last glacial maximum (c. 22,000 years ago) by people who had occupied glacial refugia further south in Europe (see Pinhasi et al. 2012 and Deguilloux et al. 2012 for reviews of this work).

In 2005, the first study of DNA from actual early Neolithic skeletons was published (Haak et al. 2005), and the results were quite different from what most researchers had expected. As it turns out, early Neolithic skeletons, at least in central Europe (associated with an archaeological culture called the Linienbandkeramik or LBK), contain gene frequencies that are quite unlike those found in modern European populations. Specifically, mitochondrial DNA (mtDNA) haplogroups (sets of genomes related by shared mutations at particular locations on the genome suggesting common origins) thought to be clear markers of Neolithic population growth and movement were only present at relatively low frequencies, while one particular haplogroup, N1a, which is present at extremely low frequencies anywhere in modern-day Eurasia and Africa, was quite common in the early Neolithic gene pool. In the ensuing ten years, there has been a rapidly growing set of aDNA analyses performed in Europe, both on mtDNA (tracing descent through females) and Y-chromosome aDNA (tracing descent through males). As with modern DNA, measuring descent through males and females provides somewhat different answers, and suggests that on the whole, women have been more mobile than men in Europe (likely indicating a very old predominant pattern of patrilocality, e.g., Seielstad et al. 1998). In some places (parts of Northern Spain, for instance; see Sampietro et al. 2007), early Neolithic gene frequencies are not that different from earlier ones or modern ones, while in central Europe at least, the early Neolithic did witness a massive reshaping of the genetic landscape from a relatively genetically homogeneous late-Paleolithic and Mesolithic background to a much more diverse Neolithic one, and little similarity is evident between the Neolithic and the present day (see Pinhasi et al. 2012 and Deguilloux et al. 2012 for reviews of this work).

The Haak et al. (2015) study (published earlier this month) identifies gene flow between central Asia and Europe in the Bronze Age, and a series of other recent studies also clearly demonstrate that the Neolithic is not the end of the story either. The researchers postulate a massive migration of Steppe populations into Europe associated with the Yamnaya archaeological culture, one that has been previously hypothesized to have spread Indo-European languages both eastwards and westwards out of Central Asia. aDNA research is for the most part slowly hammering the nail into the coffin of diffusionist models for the spread of agriculture, and for many, is now offering strong support for the spread of languages through massive migratory events (including Renfrew’s [1988] hypothesized spread of Indo-European during the early Neolithic).

Human “populations”

Care needs to be taken in interpreting these results, however. Population geneticists model the human past as a series of admixture events between discrete populations (see Hellenthal et al. 2014 for a recent attempt to define how many such populations there are). These populations may be defined in a number of ways—by geography (typically by continent), by language, by self-defined or externally perceived ethnicity, and in the case of palaeogenetics, by archaeological culture. There is thus an “LBK” or a “PPNB” set of gene frequencies which can admix or not (see for instance Fernández et al. 2015). This is a convenient shorthand, because it allows a small number of analyses (aDNA studies have sampled at most a few hundred individuals to date, while even modern studies are based on only thousands of individuals) to be taken as representative of some larger analytically meaningful population.

But what is a human population? That we have many ways of categorizing each other is unquestioned—we divide people up by race, income, clothing style, dialect, neighborhood, country, and a thousand other ways. It is also not implausible to imagine that real geographical boundaries such as major mountain chains, oceans, deserts, and so forth, may produce long-term vicariant barriers inhibiting interaction (i.e., people having sex with one another). That this is so is clearly demonstrated by the fact that modern gene frequencies are strongly patterned by geography in Europe, so much so that a multi-dimensional scaling plot (a way of representing many axes of variability on a single two-dimensional plot) of gene frequencies virtually recreates the geographic shape of Europe (Novembre et al. 2008). It is also everyone’s experience that humans live in social groups that can feel very real and rigid, and thus it might seem clear that human populations can be defined. However, the issue in palaeogenetics is different, namely, whether people live in sexual groups impermeable and long-lasting enough to explain the long-term configuration and development of gene frequencies, as well as serving as the basic scaffolding for other forms of human identity including the transmission of learning through time (i.e., culture and languages, including Indo-European ones).
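To make the scaling idea concrete, the following is a minimal sketch of classical multi-dimensional scaling applied to invented data. It does not reproduce the analysis of Novembre et al. (2008): the sampling grid, the number of loci, the linear frequency clines, and the helper names `classical_mds` and `pairwise_distances` are all assumptions made for illustration. The toy frequencies vary smoothly with location and contain no discrete populations at all, yet the two-dimensional plot of gene frequencies still recovers the layout of the sampling sites.

```python
import numpy as np


def classical_mds(dist, k=2):
    """Classical (Torgerson) MDS: place points in k dimensions from a distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram (inner-product) matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)      # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]          # indices of the k largest
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Hypothetical sampling sites laid out on a 5 x 4 grid (arbitrary map units).
xs, ys = np.meshgrid(np.arange(5.0), np.arange(4.0))
geography = np.column_stack([xs.ravel(), ys.ravel()])

# Assumed toy data: each locus is a smooth linear cline running in its own
# compass direction. There are no discrete "populations" anywhere in this setup.
n_loci = 12
angles = np.pi * np.arange(n_loci) / n_loci
directions = np.column_stack([np.cos(angles), np.sin(angles)])
centered = geography - geography.mean(axis=0)
frequencies = 0.5 + 0.05 * centered @ directions.T   # sites x loci

# Scale the "genetic" distances down to two dimensions and compare with the map.
embedding = classical_mds(pairwise_distances(frequencies), k=2)
upper = np.triu_indices(len(geography), k=1)
correlation = np.corrcoef(
    pairwise_distances(geography)[upper],
    pairwise_distances(embedding)[upper],
)[0, 1]
print(f"correlation between map and MDS distances: {correlation:.3f}")
```

In this contrived case the correlation is essentially 1, because the clines were built to be linear and evenly spread across directions. Real data are far noisier, but the sketch shows why a map-like plot is evidence of geographic patterning and not, by itself, evidence of bounded groups.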

Analytical simplification and historical reality

If the goal is simply to abstractly model how genes may have moved across the landscape historically, then perhaps an analytical fiction of discrete human “populations” is adequate for the job, similar to the use of the “gene” as analytical shorthand for modeling the complex network of DNA-RNA-protein interactions that drive biological function (e.g., Dawkins 2009). In econometrics, Friedman (1970) argued that it did not matter whether models were based on plausible assumptions, as long as those models generated testable predictions that matched observations and resulted in predictive power. However, while predictive power may result even from a model with unrealistic starting assumptions, if social scientists want to explain what actually happened, our starting assumptions do matter. Their plausibility must be evaluated by examining how consistent they are with our knowledge of the world, updated in light of new information. If those assumptions are subsequently found wanting, we must reject the basic plausibility of our models, even if they produce outcomes consistent with empirical data (Nooteboom 1986). This is simply another way of stating that the same outcome can often be generated by several different models, and we need to turn to other lines of information to choose between them. In a recent paper, Pickrell and Reich (2014) use simulation to demonstrate that a number of opposing population genetic models used to explain human genetic patterning can produce the exact same results when operating over long periods of time.
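The point that different models can yield the same outcome can be illustrated with a deliberately simple calculation. This is not the simulation of Pickrell and Reich (2014); it is a deterministic sketch that ignores drift, selection, and any change in the source group, and the frequencies, the number of generations, and the function names `single_pulse` and `steady_trickle` are hypothetical. It compares one large migration event with a slow, continuous trickle of gene flow calibrated to leave the same cumulative share of incoming ancestry.

```python
def single_pulse(resident_freq, migrant_freq, total_fraction):
    """Allele frequency after one migration event replacing `total_fraction` of the gene pool."""
    return (1 - total_fraction) * resident_freq + total_fraction * migrant_freq


def steady_trickle(resident_freq, migrant_freq, rate, generations):
    """Allele frequency after low-level gene flow at `rate` per generation."""
    freq = resident_freq
    for _ in range(generations):
        freq = (1 - rate) * freq + rate * migrant_freq
    return freq


# Hypothetical starting frequencies of one allele in the two source groups.
resident, migrant = 0.10, 0.70
generations = 100

# Model A: a single "massive migration" contributing half of the gene pool.
pulse_fraction = 0.50

# Model B: a per-generation rate chosen so that, after 100 generations, the
# cumulative share of incoming ancestry also equals one half (about 0.7% per generation).
trickle_rate = 1 - (1 - pulse_fraction) ** (1 / generations)

pulse_result = single_pulse(resident, migrant, pulse_fraction)
trickle_result = steady_trickle(resident, migrant, trickle_rate, generations)

print(f"single pulse:   {pulse_result:.4f}")
print(f"steady trickle: {trickle_result:.4f}")
print(f"per-generation trickle rate: {trickle_rate:.4%}")
```

Both models end at the same frequency, so the final gene frequencies alone cannot distinguish a one-off mass migration from generations of modest intermarriage. Other lines of evidence, such as dated intermediate samples or archaeological context, would be needed to choose between them.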

As more aDNA analyses are published, the number of population migrations required to explain observed palaeo- and modern-gene frequencies in Europe (and by implication elsewhere) appears to be steadily increasing, in some cases seemingly at a rate of one per study (e.g., Hervella et al. 2015). This situation reminds me somewhat of the addition of spheres to the Ptolemaic system of planetary motions. Eventually, the Ptolemaic system grew so ponderous that some doubted it merely on the principle of parsimony. It took a radical rethinking of planetary positioning to generate a far simpler explanation of planetary motion. In the case of palaeogenetics (and other explanations of the past), perhaps a similar shift in thinking is required, one that moves away from the monolithic “billiard-ball” model of cultures and populations to something more plausible.

The human network

What should be the unit of analysis in historical genetics (and historical explanation more generally), and how do we create models that are consistent with other observations about human social structure and sexual behavior? In other words, how do we distinguish between competing historical genetic models by evaluating the basic plausibility of those models? One promising avenue comes from the recent explosion of interest in network analysis, which provides a robust method and body of knowledge for describing human social structure and comparing it to genetic patterning (e.g., Terrell 2010), and which does not necessarily require that one define broader units of analysis in advance, such as archaeological cultures. The challenge is to combine our knowledge of network structure in the human population (small-worlds and the like) with our understanding of genetics to create more plausible models of the human past. How this is to be accomplished in a formal mathematical sense remains to be seen.

This is more than just an academic concern. The popular media picks up on these studies and reinforces the viewpoint that humans do in fact come in particular “types” that can be identified through the new science of genetics. For instance, a recent distillation of one such aDNA study in a major media outlet described the results as indicating that modern Europeans derive from “three tribes” of ancient people, one of which may have been previously “unknown” to science (Rincon 2014). Do we really need “pulse-stasis” models for human population structure in the past? How do we adequately account for the fact that archaeological evidence suggests expansive social networks wherever and whenever we look, and that modern political/continental boundaries and perceived historical and cultural areas are not adequate units of analysis for splitting populations then or now? What happens if we resample our data and begin arbitrarily drawing lines that don’t correspond to these perceived political, geographical, linguistic, or archaeological categories? Does the story stay the same? A social networks perspective on the past is one way to transcend these problematic but common-sense ideas of human population(s) structure. If wheat can move beyond “Neolithic” communities thousands of years earlier than previously supposed, what else was moving?