Abstract

Lactase persistence (LP) is common among people of European ancestry but, with the exception of some African, Middle Eastern and southern Asian groups, it is rare or absent elsewhere in the world.

Lactase gene haplotype conservation around a polymorphism strongly associated with LP in Europeans (−13,910 C/T) indicates that the derived allele is recent in origin and has been subject to strong positive selection. Furthermore, ancient DNA work has shown that the −13,910*T (derived) allele was very rare or absent in early Neolithic central Europeans. It is unlikely that LP would provide a selective advantage without a supply of fresh milk, and this has led to a gene-culture coevolutionary model in which lactase persistence is favoured only in cultures practicing dairying, and dairying is favoured more in lactase persistent populations.

We have developed a flexible demic computer simulation model to explore the spread of lactase persistence, dairying, other subsistence practices and unlinked genetic markers across the geographic space of Europe and western Asia. Using data on −13,910*T allele frequency and farming arrival dates across Europe, and approximate Bayesian computation to estimate parameters of interest, we infer that the −13,910*T allele first underwent selection among dairying farmers around 7,500 years ago in a region between the central Balkans and central Europe, possibly in association with the dissemination of the Neolithic Linearbandkeramik culture over Central Europe. Furthermore, our results suggest that natural selection favouring a lactase persistence allele was not stronger at northern latitudes as a result of an increased requirement for dietary vitamin D. Our results provide a coherent and spatially explicit picture of the coevolution of lactase persistence and dairying in Europe.

Author Summary

Most adults worldwide do not produce the enzyme lactase and so are unable to digest the milk sugar lactose. However, most people in Europe and many from other populations continue to produce lactase throughout their lives (lactase persistence). In Europe, a single genetic variant, −13,910*T, is strongly associated with lactase persistence and appears to have been favoured by natural selection in the last 10,000 years. Since adult consumption of fresh milk was only possible after the domestication of animals, it is likely that lactase persistence coevolved with the cultural practice of dairying, although it is not known when lactase persistence first arose in Europe or what factors drove its rapid spread. To address these questions, we have developed a simulation model of the spread of lactase persistence, dairying, and farmers in Europe, and have integrated genetic and archaeological data using newly developed statistical approaches. We infer that lactase persistence/dairying coevolution began around 7,500 years ago between the central Balkans and central Europe, probably among people of the Linearbandkeramik culture. We also find that lactase persistence was not more favoured at northern latitudes because of an increased requirement for dietary vitamin D. Our results illustrate the possibility of integrating genetic and archaeological data to address important questions on human evolution.

Introduction

Lactase persistence (LP) is an autosomal dominant trait enabling the continued production of the enzyme lactase throughout adult life. Lactase non-persistence is the ancestral condition for humans, and indeed for all mammals [1]. Production of lactase in the gut is essential for the digestion of the milk sugar lactose. LP is common in northern and western Europeans as well as in many African, Middle Eastern and southern Asian pastoralist groups, but is rare or absent elsewhere in the world [1]–[4]. In Europeans, LP is strongly associated with a single C-to-T transition in the MCM6 gene (−13,910*T), located 13.91 kb upstream of the lactase gene (LCT) [5]. Furthermore, in vitro studies have indicated that the −13,910*T allele can directly affect LCT promoter activity [6]. The frequency of the −13,910*T allele ranges from 6%–36% in eastern and southern Europe, through 56%–67% in central and western Europe, to 73%–95% in the British Isles and Scandinavia [7],[8], while the frequency of LP ranges from 15%–54% in eastern and southern Europe, through 62%–86% in central and western Europe, to 89%–96% in the British Isles and Scandinavia [9]. This makes the −13,910*T allele a good candidate for predicting LP in Europe. However, genotype/phenotype frequency comparisons have shown that the −13,910*T allele cannot account for LP frequencies in most African [3] and Middle Eastern populations [10]. Instead, different LP-associated alleles occurring in the same genomic region have been reported, indicating convergent evolution [2],[4],[10],[11].
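Because LP is dominant, a rough expected LP frequency can be derived from a −13,910*T allele frequency p, provided one assumes Hardy-Weinberg proportions and that −13,910*T alone determines LP. Both are simplifying assumptions for illustration, not claims made above. Under them only CC homozygotes are non-persistent, so the expected LP frequency is 1 − (1 − p)². The Python sketch below applies this to the endpoints of the allele-frequency ranges quoted above; the function name `expected_lp_frequency` is hypothetical.

```python
"""Expected LP frequency from -13,910*T allele frequency (illustrative sketch).

Assumes Hardy-Weinberg equilibrium and that -13,910*T alone confers
dominant lactase persistence; neither assumption is tested here.
"""


def expected_lp_frequency(t_allele_freq: float) -> float:
    """Return the expected fraction of lactase persistent individuals.

    With dominance, only CC homozygotes (frequency (1 - p)^2 under
    Hardy-Weinberg) are lactase non-persistent.
    """
    if not 0.0 <= t_allele_freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return 1.0 - (1.0 - t_allele_freq) ** 2


if __name__ == "__main__":
    # Endpoints of the -13,910*T allele frequency ranges quoted in the text.
    regions = {
        "eastern/southern Europe": (0.06, 0.36),
        "central/western Europe": (0.56, 0.67),
        "British Isles/Scandinavia": (0.73, 0.95),
    }
    for region, (low, high) in regions.items():
        print(f"{region}: expected LP "
              f"{expected_lp_frequency(low):.1%} to {expected_lp_frequency(high):.1%}")
```

The printed expectations can be set beside the observed LP ranges quoted above; any mismatch would reflect departures from the stated assumptions as much as anything else.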

Using long-range haplotype conservation [8] and variation in closely linked microsatellites [12] as proxies for allelic age, the −13,910*T variant has been estimated to be between 2,188 and 20,650 years old and between 7,450 and 12,300 years old, respectively. These recent age estimates, when considered in conjunction with modern allele frequencies, indicate that −13,910*T has been subjected to very strong natural selection (s = 0.014–0.19; [8]). It is interesting to note that similar estimates for the strength of selection have been obtained for one of the major African LP variants [4].
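To see why recent allele ages combined with high modern frequencies imply strong selection, it helps to iterate the textbook deterministic recursion for a dominant advantageous allele in a single randomly mating population. Genotypes carrying at least one T have relative fitness 1 + s and CC homozygotes have fitness 1, so p' = p(1 + s)/w̄ with w̄ = 1 + s[1 − (1 − p)²]. This is a minimal sketch that ignores drift, migration and population structure; it is not the method used in [8] to obtain the quoted estimates, and the starting frequency, target frequency and 25-year generation time below are illustrative assumptions.

```python
"""Deterministic spread of a dominant advantageous allele (illustrative sketch).

Assumptions (not from the source study): one infinite, randomly mating
population; no drift, migration or mutation; TT and CT fitness 1 + s, CC fitness 1.
"""


def next_frequency(p: float, s: float) -> float:
    """One generation of selection on allele frequency p under dominance."""
    q = 1.0 - p
    mean_fitness = 1.0 + s * (1.0 - q * q)  # carriers (TT, CT) have fitness 1 + s
    return p * (1.0 + s) / mean_fitness


def generations_to_reach(p0: float, target: float, s: float,
                         max_gen: int = 100_000) -> int:
    """Count generations for the allele to rise from p0 to at least target."""
    if s <= 0:
        raise ValueError("s must be positive for the allele to increase")
    p, gen = p0, 0
    while p < target:
        if gen >= max_gen:
            raise RuntimeError("target not reached within max_gen generations")
        p = next_frequency(p, s)
        gen += 1
    return gen


if __name__ == "__main__":
    GENERATION_YEARS = 25  # assumed generation time
    for s in (0.014, 0.05, 0.19):
        g = generations_to_reach(p0=0.001, target=0.5, s=s)
        print(f"s = {s}: {g} generations (~{g * GENERATION_YEARS:,} years) "
              f"from 0.1% to 50%")
```

Comparing the generation counts across the quoted range of s shows how strongly the time needed to reach a common frequency depends on the selection coefficient.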

It is unlikely that lactase persistence would provide a selective advantage without a supply of fresh milk, and this has led to a gene-culture co-evolutionary model in which lactase persistence is favoured only in cultures practicing dairying [13]–[16], and dairying is favoured more in lactase persistent populations [14], [17]–[19]. The reasons why LP, in conjunction with dairying, should confer such a strong selective advantage remain open to speculation. Flatz and Rotthauwe [20] proposed the calcium assimilation hypothesis, whereby a lactase persistence allele is favoured in high-latitude regions because reduced levels of sunlight do not allow sufficient synthesis of vitamin D in the skin. Vitamin D is required for calcium absorption, and milk provides a good dietary source of both nutrients. Additional factors are likely to include the ability to consume a calorie- and protein-rich food source, the relative constancy of the milk supply (in contrast to the boom-and-bust of seasonal crops), and the value of fresh milk as a source of uncontaminated fluids. It is likely that the relative advantages conferred by these various factors differ between Europe and Africa.

Estimates of the age of the −13,910*T allele correspond well with estimates of the onset of dairying in Europe. Slaughtering age profiles in sheep, goats and cattle suggest that dairying was present in south-eastern Europe at the onset of the Neolithic [21],[22], while residual milk proteins preserved in ceramic vessels provide evidence for dairying in present-day Romania and Hungary 7,900–7,450 years BP [23]. Furthermore, analyses of residual fats indicate dairying at the onset of the Neolithic in England, some 6,100 years BP [24],[25], and as early as 8,500 years BP in the western parts of present-day Turkey [26]. Allelic age estimates are also consistent with the results of a recent ancient DNA study [27], which showed that the −13,910*T allele was rare or absent among early farmers from Central and Eastern Europe. These observations lend support to the view that −13,910*T, and thus LP, rose rapidly in frequency only after the onset of dairying, as opposed to the ‘reverse-cause’ hypothesis [14], [17]–[19], whereby dairying developed in response to the evolution of LP.

Important questions remain regarding the location of the earliest −13,910*T-carrying dairying groups and the demographic and gene-culture co-evolutionary processes that shaped the modern distribution of LP in Europe. The present-day distribution of the −13,910*T allele might be taken to indicate an origin in Northwest Europe. However, the earliest archaeozoological and residual lipid and protein evidence for dairying is found in the Near East, in Southeast Europe and in Mediterranean Europe [21],[26],[28]. While these observations can seem contradictory, forward computer simulations have shown that the centre of distribution of an allele can be far removed from its location of origin when a population expands along a wave front [29],[30].

Assuming that the −13,910*T allele was subjected to strong natural selection only in dairying groups, it is likely that −13,910*T-carrying dairyers underwent demographic expansion to a greater extent than non-dairying groups. While gene flow between dairying and non-dairying groups would ultimately lead to genetic homogeneity, under conditions of limited gene flow between cultural groups it is plausible that the earliest LP peoples would have made a greater contribution to the European gene pool than their non-LP neighbours. In this study we use demic forward computer simulations to examine potential scenarios for the spread of LP in Europe. We simulate three interacting cultural groups (hunter-gatherers, non-dairying farmers and dairying farmers) and track the spread of an allele that is selected in only one group (dairying farmers). We also track the expected proportion of genetic ancestry from the geographic region where LP/dairying coevolution began. We parameterize intrademic gene flow between cultural groups, interdemic gene flow, sporadic longer-distance migration, the cultural diffusion of subsistence practices and selection favouring lactase persistent dairyers. We compare the frequencies of an LP allele and the arrival dates of farmers predicted by the simulations with known frequencies of the −13,910*T allele [3],[8] and radiocarbon-based estimates of the arrival dates of farmers [31] at different locations throughout Europe. We employ approximate Bayesian computation (ABC), a set of methods that allow the estimation of parameters under models too complex for a full-likelihood approach [32]. By comparing summary statistics computed on the observed data with those computed on our simulated datasets, ABC enables us to estimate key demographic and evolutionary parameters, including the region where LP-dairying coevolution began in Europe.
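The rejection form of ABC can be illustrated on a deliberately tiny problem. In the Python sketch below, which is not the study's simulation model, a single parameter (the selection coefficient s) is drawn from a uniform prior, a modern allele frequency is simulated with the deterministic dominant-allele recursion plus a normal approximation to sampling noise, and a draw is kept when the simulated summary statistic falls within a tolerance `epsilon` of the observed one. The starting frequency `P0`, `GENERATIONS`, `SAMPLE_SIZE`, the prior bounds and the tolerance are all hypothetical; the study itself compares many summary statistics (allele frequencies and farming arrival dates at multiple locations) under a far richer demic model.

```python
"""Rejection ABC for a selection coefficient (toy illustration only).

Hypothetical setup: a single population, allele starting at frequency P0,
GENERATIONS generations of selection, and a modern sample of SAMPLE_SIZE
chromosomes whose allele frequency is the only summary statistic.
"""
import math
import random

P0 = 0.01           # assumed starting allele frequency
GENERATIONS = 200   # assumed number of generations of selection
SAMPLE_SIZE = 200   # assumed number of sampled chromosomes


def simulate_frequency(s: float) -> float:
    """Deterministic allele frequency after GENERATIONS of dominant selection."""
    p = P0
    for _ in range(GENERATIONS):
        q = 1.0 - p
        p = p * (1.0 + s) / (1.0 + s * (1.0 - q * q))
    return p


def simulate_summary(s: float, rng: random.Random) -> float:
    """Simulated observed frequency: true frequency plus approximate sampling noise."""
    p = simulate_frequency(s)
    sd = math.sqrt(p * (1.0 - p) / SAMPLE_SIZE)  # normal approximation to binomial sampling
    return min(1.0, max(0.0, rng.gauss(p, sd)))


def abc_rejection(observed, n_draws=5000, epsilon=0.02, prior=(0.0, 0.1), seed=1):
    """Return prior draws of s whose simulated summary lies within epsilon of observed."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        s = rng.uniform(*prior)                   # 1. sample a parameter from the prior
        simulated = simulate_summary(s, rng)      # 2. simulate data and its summary statistic
        if abs(simulated - observed) <= epsilon:  # 3. keep draws close to the observed data
            accepted.append(s)
    return accepted


if __name__ == "__main__":
    observed = simulate_frequency(0.04)  # pretend data generated with s = 0.04
    posterior = abc_rejection(observed)
    mean_s = sum(posterior) / len(posterior)
    print(f"accepted {len(posterior)} draws; posterior mean s = {mean_s:.3f}")
```

The accepted draws approximate the posterior distribution of s; shrinking `epsilon` makes the approximation tighter at the cost of accepting fewer draws, which is the usual trade-off in rejection ABC.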

Supplementary Video S1 - Animation graphically representing the geographic frequency distribution of the −13,910*T allele at 10-generation time slices over the last 9000 years (assuming a generation time of 25 years), taken from simulations that best fitted data on modern −13,910*T allele frequency and timing of the arrival of farming in Europe.

Supplementary Video S2 - Animation graphically representing the geographic frequency distribution of the −13,910*T allele at 10-generation time slices over the last 9000 years (assuming a generation time of 25 years), taken from simulations that best fitted data on modern −13,910*T allele frequency and timing of the arrival of farming in Europe.

Supplementary Video S3 - Animation graphically representing the geographic frequency distribution of the −13,910*T allele at 10-generation time slices over the last 9000 years (assuming a generation time of 25 years), taken from simulations that best fitted data on modern −13,910*T allele frequency and timing of the arrival of farming in Europe.

1. Research Department of Genetics, Evolution and Environment, University College London, London, United Kingdom; and CoMPLEX (Centre for Mathematics & Physics in the Life Sciences and Experimental Biology)

2. Research Department of Genetics, Evolution and Environment, University College London, London, United Kingdom; and AHRC Centre for the Evolution of Cultural Diversity, Institute of Archaeology, University College London, London, United Kingdom

3. School of Animal and Microbial Sciences, The University of Reading, Whiteknights, Reading, United Kingdom

5. Research Department of Genetics, Evolution and Environment, University College London, London, United Kingdom; and AHRC Centre for the Evolution of Cultural Diversity, Institute of Archaeology, University College London, London, United Kingdom