
Friday, July 22, 2016

Update 25/01/2018: The Basal-rich K7 is now available to personal genomics customers for $6 USD a pop (see here).
...
I've got a new test. Currently I'm only using it to explore ancient genomes, but at some point I'll make another version available to personal genomics customers, one way or another. However, that might take some work and time, because I'll need to mitigate the calculator effect and so on.
Below is a spreadsheet featuring a wide range of ancient and present-day samples from recent papers. A table with the Fst genetic distances between the seven ancestral populations is available here.

Please note that the Basal-rich component is unlikely to be a perfect representation of the hypothetical Basal Eurasian population. At the same time, it's likely that the two hunter-gatherer components, Ancient North Eurasian (or AG3-related) and Villabruna-related, contain some Basal Eurasian admixture.
Here's a Principal Component Analysis (PCA) of the West Eurasian populations based on their K7 ancestry proportions. It captures all of the main features of West Eurasian genetic diversity, including the two parallel clines made up of Europeans and Near Easterners, and the intermediate position of South Central Asians between the ancient samples from Neolithic Iran and Bronze Age Europe.

An extra large version of the same PCA, with the samples labeled individually, can be downloaded here.
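For anyone who wants to try a similar PCA on their own ancestry proportions, here's a minimal sketch in plain Python. It is not the procedure used for the plot above. The sample names and values are invented, only three components are used instead of seven to keep it short, and the first principal axis is found with simple power iteration on the covariance matrix.

```python
import math

# Hypothetical ancestry proportions (invented for illustration only).
# Each row sums to 1, as admixture proportions do.
samples = {
    "sample_A": [0.70, 0.20, 0.10],
    "sample_B": [0.65, 0.25, 0.10],
    "sample_C": [0.20, 0.30, 0.50],
    "sample_D": [0.15, 0.35, 0.50],
}


def pc1_scores(table, iterations=500):
    """Return each sample's score on the first principal component."""
    names = list(table)
    rows = [table[name] for name in names]
    n, k = len(rows), len(rows[0])

    # Center every component on its mean across samples.
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]

    # Sample covariance matrix of the components.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(k)]
        for i in range(k)
    ]

    # Power iteration: repeatedly multiply by the covariance matrix and
    # renormalize, which converges on the axis of greatest variance.
    axis = [1.0 / (i + 1) for i in range(k)]  # fixed, non-uniform start
    for _ in range(iterations):
        product = [sum(cov[i][j] * axis[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in product))
        axis = [x / norm for x in product]

    # Project the centered samples onto that axis.
    return {
        name: sum(x * a for x, a in zip(row, axis))
        for name, row in zip(names, centered)
    }


scores = pc1_scores(samples)
for name, score in scores.items():
    print(f"{name}: {score:+.3f}")
```

Samples with similar proportions end up close together on the axis. The sign of the axis is arbitrary, so only the relative positions matter.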
Also, using the K7 ancestry proportions, I modeled the ancient ancestry of a few present-day populations from the Near East, Northern Europe and South Central Asia with the nMonte R script. Bronze Age steppe admixture in groups from the latter two regions is usually inferred at 40-50% with tools based on formal stats, such as qpAdm and TreeMix, so I wanted to check if I could reproduce such results.

Admittedly, these estimates look very conservative, but certainly not out of the ballpark. I suspect that I'll be able to improve the models and statistical fits as new Bronze Age steppe samples become available. Indeed, I'll be updating the spreadsheet above regularly.
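To give a rough idea of what this kind of modeling involves, here's a minimal sketch in Python. It is not the nMonte script and doesn't claim to reproduce its algorithm. It's a simple random search that looks for the mixture of reference populations whose combined ancestry proportions lie closest, in Euclidean distance, to those of a target. The source names, the target and all values are hypothetical.

```python
import math
import random

# Hypothetical reference populations and target (invented values).
sources = {
    "source_1": [0.8, 0.1, 0.1],
    "source_2": [0.1, 0.8, 0.1],
    "source_3": [0.1, 0.1, 0.8],
}
target = [0.45, 0.31, 0.24]


def mix(refs, weights):
    """Weighted average of the reference ancestry proportions."""
    k = len(next(iter(refs.values())))
    return [sum(weights[name] * refs[name][i] for name in refs) for i in range(k)]


def distance(a, b):
    """Euclidean distance between two proportion vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_mixture(tgt, refs, slots=100, steps=5000, seed=0):
    """Random search over mixtures built from `slots` equal-sized shares."""
    rng = random.Random(seed)
    names = list(refs)

    # Start from a random allocation of the shares to the references.
    counts = {name: 0 for name in names}
    for _ in range(slots):
        counts[rng.choice(names)] += 1

    def current_distance():
        weights = {name: c / slots for name, c in counts.items()}
        return distance(tgt, mix(refs, weights))

    best = current_distance()
    for _ in range(steps):
        # Move one share from one reference to another ...
        give, take = rng.sample(names, 2)
        if counts[give] == 0:
            continue
        counts[give] -= 1
        counts[take] += 1
        trial = current_distance()
        if trial < best:
            best = trial  # ... and keep the move only if the fit improves.
        else:
            counts[give] += 1
            counts[take] -= 1
    return {name: c / slots for name, c in counts.items()}, best


weights, fit = fit_mixture(target, sources)
for name, weight in weights.items():
    print(f"{name}: {weight:.0%}")
print(f"distance: {fit:.4f}")
```

With real data the references are rarely this well separated, so the fitted distance is worth checking before reading much into the percentages.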

Thursday, July 14, 2016

Abstract: We sequenced Early Neolithic genomes from the Zagros region of Iran (eastern Fertile Crescent), where some of the earliest evidence for farming is found, and identify a previously uncharacterized population that is neither ancestral to the first European farmers nor has contributed significantly to the ancestry of modern Europeans. These people are estimated to have separated from Early Neolithic farmers in Anatolia some 46-77,000 years ago and show affinities to modern day Pakistani and Afghan populations, but particularly to Iranian Zoroastrians. We conclude that multiple, genetically differentiated hunter-gatherer populations adopted farming in SW-Asia, that components of pre-Neolithic population structure were preserved as farming spread into neighboring regions, and that the Zagros region was the cradle of eastward expansion.

Monday, July 11, 2016

Abstract: It is a long standing question as which genes define the characteristic facial features among different ethnic groups. In this study, we use Uyghurs, an ancient admixed population to query the genetic bases why Europeans and Han Chinese look different. Facial trait variations were analyzed based on high dense 3D facial images; numerous biometric spaces were examined for divergent facial features between European and Han Chinese, ranging from inner-landmarks to dense shape geometrics. A series of genome-wide association analyses were conducted on a discovery panel of Uyghurs. Six significant loci were identified and four of which, rs1868752, rs118078182, rs60159418 at or near UBASH3B, COL23A1, PCDH7 and rs17868256 were replicated in two independent cohorts of Uyghurs or Southern Han Chinese. We further developed a quantitative model to predict 3D faces based on 277 top GWAS SNPs. In hypothetic forensic scenarios, this model was found to significantly enhance the rate of suspect verification, suggesting a practical potential of related research.

Back in May I hypothesized that present-day East Asians were prehistoric hybrids of partly Ancient North Eurasian (ANE) origin. I got the idea from a series of TreeMix runs (see here).
This was essentially confirmed recently in the Lazaridis et al. 2016 preprint. Refer to page 147 in the paper's supplementary information PDF here.
However, based on more recent TreeMix runs featuring data from Lazaridis et al., I'd say the situation is more complex than just some minor ANE-related admixture in East Asians. I suspect now that all East Asians, including even the Onge, an ancient isolate population from the Andaman Islands, harbor significant ANE-related ancestry that may have arrived in East Asia in separate waves.
Here's what I'm talking about. Note that all of the samples on the East Asian node - Upper Paleolithic west Siberian forager Ust-Ishim, Han Chinese and Onge - are influenced by a massive migration edge from the base of the AG3-MA1 or ANE branch. However, as per the second graph, only the ancestors of more northerly East Asians, like those of the Han, appear to have been recipients of the latest ANE-related admixture into East Asia.

Indeed, when I add the Natufians from the Epipaleolithic Levant to the analysis, Ust-Ishim and the East Asians join AG3-MA1 on the same branch, but now receive a 36% migration edge from a point basal to all Eurasians. This is not admixture from the hypothesized Basal Eurasian clade, but probably from another basal clade, specific to East Asians, which I'd say occasionally shows up as pseudo Sub-Saharan admixture in East Asians.

Saturday, July 9, 2016

Lazaridis et al. showed that their Steppe_EMBA grouping, which included Afanasievo, Poltavka and Yamnaya, as well as two Potapovka samples, one Russia_EBA sample and one Srubnaya_outlier sample, was best modeled in the following two ways using qpAdm:

I'm not a huge fan of either of these models, but especially the first one, even though I understand that they're both statistically very sound. For one, the uniparental markers don't match, and two, TreeMix seems to disagree (see here).

So let's try something a little different and see what happens when I model Steppe_EMBA as EHG, CHG, and Anatolia Chalcolithic.

As far as I can tell, it's a very decent fit, especially considering that I'm using 12 outgroups and three reference populations. To me, at least, the standard errors look surprisingly low for such a complex model: 0.033, 0.046 and 0.020, respectively.
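As a rough guide to what standard errors of this size mean, here's a small Python sketch that turns them into approximate 95% interval half-widths. It assumes the coefficient estimates are roughly normally distributed, which is an assumption of this sketch rather than something qpAdm reports. The labels follow the order in which the three reference populations are listed above.

```python
# Standard errors quoted above, in the order EHG, CHG, Anatolia Chalcolithic.
standard_errors = {
    "EHG": 0.033,
    "CHG": 0.046,
    "Anatolia_ChL": 0.020,
}

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def half_width(standard_error, z=Z_95):
    """Half-width of an approximate confidence interval (normal assumption)."""
    return z * standard_error


half_widths = {name: half_width(se) for name, se in standard_errors.items()}
for name, width in half_widths.items():
    print(f"{name}: coefficient +/- {width:.3f}")
```

Under that assumption, each coefficient is pinned down to within a few percentage points on either side.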
Now, I'm not arguing here that Chalcolithic Anatolia is the answer. What I'm saying is that multiple lines of evidence do not support Chalcolithic Iran as a real source of admixture for Steppe_EMBA, and I'm offering what I see as a plausible alternative among the currently available samples.
I know that this is a work in progress for the Broad MIT/Harvard team, and we'll have to wait for more ancient samples and another paper or two before a consensus is reached on the topic.
But here's my prediction: Steppe_EMBA only has 10-15% admixture from the post-Mesolithic Near East not including the North Caucasus, and basically all of this comes via female mediated gene flow from farming communities in the Caucasus and perhaps present-day Ukraine.
See also...
Another look at the genetic structure of Yamnaya

Friday, July 8, 2016

Abstract: In a recent interdisciplinary study, Das and co-authors have attempted to trace the homeland of Ashkenazi Jews and of their historical language, Yiddish (Das et al. 2016. Localizing Ashkenazic Jews to Primeval Villages in the Ancient Iranian Lands of Ashkenaz. Genome Biology and Evolution). Das and co-authors applied the geographic population structure (GPS) method to autosomal genotyping data and inferred geographic coordinates of populations supposedly ancestral to Ashkenazi Jews, placing them in Eastern Turkey. They argued that this unexpected genetic result goes against the widely accepted notion of Ashkenazi origin in the Levant, and speculated that Yiddish was originally a Slavic language strongly influenced by Iranian and Turkic languages, and later remodeled completely under Germanic influence. In our view, there are major conceptual problems with both the genetic and linguistic parts of the work. We argue that GPS is a provenancing tool suited to inferring the geographic region where a modern and recently unadmixed genome is most likely to arise, but is hardly suitable for admixed populations and for tracing ancestry up to 1000 years before present, as its authors have previously claimed. Moreover, all methods of historical linguistics concur that Yiddish is a Germanic language, with no reliable evidence for Slavic, Iranian, or Turkic substrata.

Monday, July 4, 2016

Courtesy of Arbuckle et al. at the Journal of Archaeological Science. Emphasis is mine:

Abstract: In this paper we address the timing of and mechanisms for the appearance of domestic cattle in the Eastern Fertile Crescent (EFC) region of SW Asia through the analysis of new and previously published species abundance and biometric data from 86 archaeofaunal assemblages. We find that Bos exploitation was a minor component of animal economies in the EFC in the late Pleistocene and early Holocene but increased dramatically in the sixth millennium BC. Moreover, biometric data indicate that small sized Bos, likely representing domesticates, appear suddenly in the region without any transitional forms in the early to mid sixth millennium BC. This suggests that domestic cattle were imported into the EFC, possibly associated with the spread of the Halaf archaeological culture, several millennia after they first appear in the neighboring northern Levant.

These findings more or less correlate with the results in the new Lazaridis et al. preprint:

During subsequent millennia, the early farmer populations of the Near East expanded in all directions and mixed, as we can only model populations of the Chalcolithic and subsequent Bronze Age as having ancestry from two or more sources. The Chalcolithic people of western Iran can be modelled as a mixture of the Neolithic people of western Iran, the Levant, and Caucasus Hunter Gatherers (CHG), consistent with their position in the PCA (Fig. 1b).

In other words, the small cows weren't just imported into the Eastern Fertile Crescent; they came with people who also made a major genetic impact on the region.
Here's my own PCA featuring the relevant Lazaridis et al. samples. Key: Caucasus_HG = Caucasus Hunter-Gatherer; Iran_ChL = Iran Chalcolithic; Iran_HG = Iran Hunter-Gatherer; Iran_N = Iran Neolithic; Levant_N = Levant Neolithic.

Thus, it would seem that after the early Neolithic farmers from Iran migrated to South Asia, they were largely replaced in their own homeland by Halaf pastoralists and/or related groups. Moreover, their descendants in South Asia, and especially South Central Asia, were then largely replaced by pastoralists from the Bronze Age Eurasian steppe (for instance, see here).
Obviously, this doesn't square too well with the idea of a Proto-Indo-European homeland in the Zagros Mountains of western Iran, does it?
See also...
Yamnaya =/= Eastern Hunter-Gatherers + Iran Chalcolithic
Zarathushtra and his steppe posse

Saturday, July 2, 2016

If Indo-Iranian languages didn't expand from the Andronovo horizon, but rather from an earlier archaeological steppe culture, which is what it seems like based on the latest analysis of ancient genomes from the steppe (see page 123 here), then I reckon the best option is the Catacomb Culture.
As far as I can tell, one of the Yamnaya samples from Allentoft et al. 2015, RISE552 from the Ulan IV burial, might actually be a Catacomb sample. That's because Ulan IV is classified as a West Manych Catacomb Culture site. Check out this awesome paper on one of the graves from this site here.

Here's a qpAdm model of the Kalasha from the Hindu Kush featuring Ulan IV RISE552 based on over 200K SNPs:

I'm not saying this model is definitive by any stretch, but it's more or less statistically sound, with fairly low standard errors for each of the coefficients (0.051, 0.066, 0.041, 0.023 respectively). It's also very similar to the optimal qpAdm model of the Kalasha in Lazaridis et al. 2016.
Interestingly, it also matches closely a TreeMix analysis that I posted at my other blog last year, months before I even knew that ancient genomes from Neolithic Iran were on the way (see here). This is what I said in that blog entry:

Both of these models are correct; they just show the same thing in different ways. So if we mesh them together the Kalash and Pathans come out ~65% LNE/EBA European (which includes substantial Caucasus or Caucasus-related ancestry), ~12% ASI, and ~23% something as yet undefined.
If I had to guess, I'd say the mystery ~23% was Neolithic admixture from what is now Iran.

That's not bad considering how difficult it is to make predictions about ancient population movements without direct evidence from ancient DNA. In any case, it's a lot better than what has been published on the topic in some major journals.