The Italian Academy for Advanced Studies in America, Columbia University, New York, New York 10027, USA.

Abstract

Farming was first introduced to Europe in the mid-seventh millennium bc, and was associated with migrants from Anatolia who settled in the southeast before spreading throughout Europe. Here, to understand the dynamics of this process, we analysed genome-wide ancient DNA data from 225 individuals who lived in southeastern Europe and surrounding regions between 12000 and 500 bc. We document a west-east cline of ancestry in indigenous hunter-gatherers and, in eastern Europe, the early stages in the formation of Bronze Age steppe ancestry. We show that the first farmers of northern and western Europe dispersed through southeastern Europe with limited hunter-gatherer admixture, but that some early groups in the southeast mixed extensively with hunter-gatherers without the sex-biased admixture that prevailed later in the north and west. We also show that southeastern Europe continued to be a nexus between east and west after the arrival of farmers, with intermittent genetic contact with steppe populations occurring up to 2,000 years earlier than the migrations from the steppe that ultimately replaced much of the population of northern Europe.

PCA of 486 ancient individuals, projected onto principal components defined by 777 present-day West Eurasian individuals (grey points). This differs from in that the plot is not cropped and the present-day individuals are shown.
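Projecting an ancient individual onto axes defined by present-day individuals can be sketched as a least-squares fit restricted to the SNPs that the ancient sample covers. The legend does not state how missing genotypes were handled, so the approach below is an assumption rather than a description of the authors' pipeline. The function name, the `None` encoding for missing genotypes, and the restriction to two axes are all hypothetical.

```python
def project_lsq(sample, means, pc1, pc2):
    """Least-squares coordinates of one sample on two fixed PC axes.

    sample: per-SNP genotypes, None where missing (hypothetical encoding).
    means, pc1, pc2: per-SNP reference means and loadings, assumed to have
    been computed beforehand from the present-day individuals only.
    """
    # Accumulate the 2x2 normal equations over covered SNPs only.
    a11 = a12 = a22 = b1 = b2 = 0.0
    for g, mu, u, v in zip(sample, means, pc1, pc2):
        if g is None:
            continue  # missing genotype: this SNP contributes nothing
        d = g - mu
        a11 += u * u
        a12 += u * v
        a22 += v * v
        b1 += u * d
        b2 += v * d
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("too few covered SNPs to place the sample")
    # Solve [[a11, a12], [a12, a22]] @ (x, y) = (b1, b2) by Cramer's rule.
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)
```

Because the axes are fixed by the present-day panel, low-coverage ancient samples do not influence the principal components themselves; they are only placed on them.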

Supervised ADMIXTURE analysis modeling each ancient individual (one per row) as a mixture of populations represented by clusters that are constrained to contain Anatolian Neolithic (grey), Yamnaya from Samara (yellow), EHG (pink) and WHG (green) populations. Dates in parentheses indicate the approximate date range of individuals in each population. This differs from in that it contains some previously published samples, and includes sample IDs.

Unsupervised ADMIXTURE plot from k=4 to 12, on a dataset consisting of 1099 present-day individuals and 476 ancient individuals. We show newly reported ancient individuals and some previously published individuals for comparison.

Spatial structure in hunter-gatherers. The estimated effective migration surface (EEMS) is obtained by fitting a model of genetic relatedness in which individuals move in a random direction from generation to generation on an underlying grid, so that genetic relatedness is determined by distance. The migration parameter m, which defines the local rate of migration, varies across the grid and is inferred. This plot shows log10(m), scaled relative to the average migration rate (which is arbitrary). Thus log10(m)=2, for example, implies that the rate of migration at this point on the grid is 100 times higher than average. To restrict the analysis as far as possible to hunter-gatherer structure, the migration surface is inferred using data from 116 individuals that date to earlier than ~5000 BCE and have no NW Anatolian-related ancestry. Though the migration surface is sensitive to sampling, and fine-scale features may not be interpretable, the migration “barrier” (region of low migration) running north-south and separating populations with primarily WHG ancestry from those with primarily EHG ancestry seems to be robust, and is consistent with inferred admixture proportions. This analysis suggests that Mesolithic hunter-gatherer population structure was clustered and not smoothly clinal, in the sense that genetic differentiation did not vary consistently with distance. Superimposed on this background, pies show the WHG, EHG and CHG ancestry proportions inferred for populations used to construct the migration surface (another way of visualizing the data in , – we use two population models if they fit with p>0.01, and three population models otherwise). Pies with only a single color are those that were fixed to be the source populations.
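As a reading aid for the colour scale, the plotted value can be converted back into a fold change relative to the average migration rate. This is a minimal sketch of that arithmetic only; the function name is hypothetical and nothing here reproduces the EEMS fitting itself.

```python
def fold_vs_average(log10_m):
    """Convert a plotted log10(m) value into a fold change relative to the average rate."""
    return 10.0 ** log10_m

# Negative values mark regions of low migration ("barriers"),
# zero marks the average rate, positive values mark corridors.
for value in (-1.0, 0.0, 2.0):
    print(f"log10(m) = {value:+.0f} -> {fold_vs_average(value):g} x average")
```

Because the average rate is arbitrary, only these ratios carry meaning, not the absolute value of m.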

Log-likelihood surfaces for the proportion of female (x-axis) and male (y-axis) ancestors that are hunter-gatherer-related for the combined populations analyzed in , and the two populations with the strongest evidence for sex bias. Numbers in parentheses give the number of individuals in each group. The log-likelihood scale ranges from 0 to -10, where 0 is the feasible point with the highest likelihood.
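To see why the female and male proportions can be separated at all, it helps to write down what each point on the surface predicts for different parts of the genome. The sketch below uses the standard expectations that autosomes are inherited equally through both sexes, while two-thirds of X chromosomes are carried by females. That this matches the exact likelihood model behind the figure is an assumption, and the function name is hypothetical.

```python
def expected_ancestry(f, m):
    """Expected hunter-gatherer-related ancestry by genomic compartment.

    f: proportion of female ancestors that are hunter-gatherer-related.
    m: proportion of male ancestors that are hunter-gatherer-related.
    """
    return {
        "autosomes": (f + m) / 2.0,    # equal contribution from both sexes
        "X": (2.0 * f + m) / 3.0,      # two-thirds of X chromosomes are in females
        "mtDNA": f,                    # maternally inherited
        "Y": m,                        # paternally inherited
    }

# Illustrative (made-up) values: more hunter-gatherer men than women.
print(expected_ancestry(f=0.1, m=0.3))
```

With m greater than f, the autosomal proportion exceeds the X-chromosome proportion, which is the pattern described as male-biased hunter-gatherer ancestry.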

A: Populations modeled as a mixture of NW Anatolia Neolithic, WHG, and EHG. Dashed lines show temporal relationships between populations from the same geographic region. Percentages indicate the proportion of WHG+EHG ancestry. Standard errors range from 0.7% to 6.0% (). B: Z-scores for the difference in hunter-gatherer-related ancestry on the autosomes compared to the X chromosome when populations are modeled as a mixture of NW Anatolia Neolithic and WHG (N=126 individuals, group sizes in parentheses). Positive values indicate more hunter-gatherer-related ancestry on the autosomes and thus male-biased hunter-gatherer ancestry. “Combined” populations merge all individuals from different times from a geographic area. C: Hunter-gatherer-related ancestry proportions on the autosomes, X chromosome, mitochondrial DNA (i.e. mt haplogroup U), and the Y chromosome (i.e. Y chromosome haplogroups I2, R1 and C1). Points show qpAdm (autosomes and X chromosome) or maximum likelihood (mitochondrial DNA and Y chromosome) estimates and bars show approximate 95% confidence intervals (N=109 individuals, group sizes in parentheses).
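The legend does not spell out how the Z-scores in panel B or the intervals in panel C were computed. A common construction, shown here as a sketch under the assumption that the autosomal and X-chromosome estimates are independent and approximately normal, divides the difference by its combined standard error and uses plus or minus 1.96 standard errors for a 95% interval. The function names and the example numbers are hypothetical.

```python
import math

def z_difference(auto_est, auto_se, x_est, x_se):
    """Z-score for autosomal minus X-chromosome ancestry, assuming independent estimates."""
    diff = auto_est - x_est
    se = math.sqrt(auto_se ** 2 + x_se ** 2)
    return diff / se

def ci95(est, se):
    """Approximate 95% confidence interval under a normal approximation."""
    return est - 1.96 * se, est + 1.96 * se

# Made-up inputs: 20% (s.e. 1%) on the autosomes, 10% (s.e. 3%) on the X chromosome.
z = z_difference(0.20, 0.01, 0.10, 0.03)
print(f"Z = {z:.2f}; X-chromosome 95% CI = {ci95(0.10, 0.03)}")
```

A positive Z corresponds to the panel B convention of more hunter-gatherer-related ancestry on the autosomes than on the X chromosome.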