April 30, 2015

The Kalash represent an enigmatic isolated population of Indo-European speakers who have been living for centuries in the Hindu Kush mountain ranges of present-day Pakistan. Previous Y chromosome and mitochondrial DNA markers provided no support for their claimed Greek descent following Alexander III of Macedon's invasion of this region, and analysis of autosomal loci provided evidence of a strong genetic bottleneck. To understand their origins and demography further, we genotyped 23 unrelated Kalash samples on the Illumina HumanOmni2.5M-8 BeadChip and sequenced one male individual at high coverage on an Illumina HiSeq 2000. Comparison with published data from ancient hunter-gatherers and European farmers showed that the Kalash share genetic drift with the Paleolithic Siberian hunter-gatherers and might represent an extremely drifted ancient northern Eurasian population that also contributed to European and Near Eastern ancestry. Since the split from other South Asian populations, the Kalash have maintained a low long-term effective population size (2,319–2,603) and experienced no detectable gene flow from their geographic neighbors in Pakistan or from other extant Eurasian populations. The mean time of divergence between the Kalash and other populations currently residing in this region was estimated to be 11,800 (95% confidence interval = 10,600−12,600) years ago, and thus they represent present-day descendants of some of the earliest migrants into the Indian sub-continent from West Asia.

Large-scale genomic data offer the prospect of deciphering the genetic architecture of natural selection. To characterize natural selection, various analytical methods for detecting candidate genomic regions have been developed. We propose to perform genome-wide scans of natural selection using principal component analysis. We show that the common Fst index of genetic differentiation between populations can be viewed as a proportion of variance explained by the principal components. Looking at the correlations between genetic variants and each principal component provides a conceptual framework to detect genetic variants involved in local adaptation without any prior definition of populations. To validate the PCA-based approach, we consider the 1000 Genomes data (phase 1) after removal of recently admixed individuals, resulting in 850 individuals from Africa, Asia, and Europe. The number of genetic variants is on the order of 36 million, obtained at low sequencing coverage (3X). The correlations between genetic variation and each principal component provide well-known targets for positive selection (EDAR, SLC24A5, SLC45A2, DARC), and also new candidate genes (APPBPP2, TP1A1, RTTN, KCNMA, MYO5C) and non-coding RNAs. In addition to identifying genes involved in biological adaptation, we identify two biological pathways involved in polygenic adaptation that are related to the innate immune system (beta defensins) and to lipid metabolism (fatty acid omega oxidation). PCA-based statistics retrieve well-known signals of human adaptation, which is encouraging for future whole-genome sequencing projects, especially in non-model species for which defining populations can be difficult. The PCA-based genome scan is implemented in the open-source and freely available PCAdapt software.
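To make the idea of "correlations between genetic variants and each principal component" concrete, here is a minimal NumPy sketch. It is not the PCAdapt implementation: the function name `pc_snp_correlations` and the input layout (individuals in rows, allele counts 0/1/2 in columns) are assumptions made for illustration. The squared correlation it returns is the share of a variant's variance explained by a given PC, which is the same kind of quantity the abstract relates to Fst.

```python
import numpy as np

def pc_snp_correlations(genotypes, n_components=2):
    """Squared correlation between every SNP and each of the top PCs.

    genotypes: (n_individuals, n_snps) array of allele counts (0, 1, 2).
    Returns an (n_snps, n_components) array; monomorphic SNPs get NaN.
    Hypothetical helper for illustration, not part of PCAdapt.
    """
    G = np.asarray(genotypes, dtype=float)
    G = G - G.mean(axis=0)                  # center each SNP
    sd = G.std(axis=0)
    keep = sd > 0                           # drop SNPs with no variation
    Z = G[:, keep] / sd[keep]               # unit-variance SNP columns

    # PCA of the individuals via SVD; columns of U are the PC scores
    # (zero mean, unit norm) because Z is column-centered.
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components]

    # Each column of Z has norm sqrt(n), each score has norm 1,
    # so the dot product divided by sqrt(n) is the Pearson correlation.
    n = Z.shape[0]
    corr = Z.T @ scores / np.sqrt(n)

    r2 = np.full((G.shape[1], n_components), np.nan)
    r2[keep] = corr ** 2
    return r2
```

Variants with unusually high values on a PC are the candidates for local adaptation along that axis of population structure. No population labels are needed at any step, which is the point the abstract makes.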

Principal components analysis (PCA) is a widely used tool for inferring population structure and correcting confounding in genetic data. We introduce a new algorithm, FastPCA, that leverages recent advances in random matrix theory to accurately approximate top PCs while reducing time and memory cost from quadratic to linear in the number of individuals, a computational improvement of many orders of magnitude. We apply FastPCA to a cohort of 54,734 European Americans, identifying 5 distinct subpopulations spanning the top 4 PCs. Using a new test for natural selection based on population differentiation along these PCs, we replicate previously known selected loci and identify three new signals of selection, including selection in Europeans at the ADH1B gene. The coding variant rs1229984 has previously been associated with alcoholism and shown to be under selection in East Asians; we show that it is a rare example of independent evolution on two continents.
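The abstract does not spell out the algorithm, so the following is only a generic randomized-SVD sketch of how top PCs can be approximated without ever forming the individuals-by-individuals matrix. Treating this as representative of FastPCA is an assumption; the function name `randomized_pca` and its parameters are hypothetical. Every step multiplies by the genotype matrix or its transpose, so the cost grows linearly with the number of individuals.

```python
import numpy as np

def randomized_pca(Z, k, n_oversample=10, n_power_iter=4, seed=0):
    """Approximate the top-k PC scores and singular values of Z.

    Z: (n_individuals, n_snps) matrix, assumed already centered and scaled.
    Generic randomized range finder with power iterations; an
    illustrative sketch, not the FastPCA code.
    """
    Z = np.asarray(Z, dtype=float)
    rng = np.random.default_rng(seed)
    width = min(k + n_oversample, min(Z.shape))

    # Project the SNP axis onto a few random directions, then orthonormalize.
    Q = np.linalg.qr(Z @ rng.standard_normal((Z.shape[1], width)))[0]

    # Power iterations sharpen the separation between top and trailing PCs.
    for _ in range(n_power_iter):
        W = np.linalg.qr(Z.T @ Q)[0]
        Q = np.linalg.qr(Z @ W)[0]

    # Exact SVD of the small (width x n_snps) projected matrix.
    B = Q.T @ Z
    Ub, S, _ = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], S[:k]
```

The oversampling and power-iteration counts trade accuracy against run time; the defaults here are arbitrary illustrative choices.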

April 19, 2015

Mitochondrial diversity of Iñupiat people from the Alaskan North Slope provides evidence for the origins of the Paleo- and Neo-Eskimo peoples

Jennifer A. Raff et al.

ABSTRACT

Objectives:

All modern Iñupiaq speakers share a common origin, the result of a recent (∼800 YBP) and rapid trans-Arctic migration by the Neo-Eskimo Thule, who replaced the previous Paleo-Eskimo inhabitants of the region. Reduced mitochondrial haplogroup diversity in the eastern Arctic supports the archaeological hypothesis that the migration occurred in an eastward direction. We tested the hypothesis that the Alaskan North Slope served as the origin of the Neo- and Paleo-Eskimo populations further east.

Materials and Methods:

We sequenced HVR I and HVR II of the mitochondrial D-loop from 151 individuals in eight Alaska North Slope communities, and compared genetic diversity and phylogenetic relationships between the North Slope Iñupiat and other Arctic populations from Siberia, the Aleutian Islands, Canada, and Greenland.

Results:

Mitochondrial lineages from the North Slope villages showed a low frequency (2%) of non-Arctic maternal admixture and included all haplogroups (A2, A2a, A2b, D2a, and D4b1a, formerly known as D3) found in previously sequenced Neo- and Paleo-Eskimos and in living Inuit and Eskimo peoples from across the North American Arctic. Lineages basal for each haplogroup were present in the North Slope. We also found the first occurrence of two haplogroups in contemporary North American Arctic populations: D2a, previously identified only in Aleuts and Paleo-Eskimos, and the pan-American C4.

Discussion:

Our results yield insight into the maternal population history of the Alaskan North Slope and support the hypothesis that this region served as an ancestral pool for eastward movements to Canada and Greenland, for both the Paleo-Eskimo and Neo-Eskimo populations.

April 13, 2015

The origin of Iranian speakers is a big puzzle as in ancient times there were two quite different groups of such speakers: nomadic steppe people such as Scythians and settled farmers such as Persians and Medes.

I am guessing that the story of Iranian origins will only be solved together with that of their Indo-Aryan brethren and their more distant Indo-European relations.

Clearly, G1 cannot be Proto-Indo-European as it has a rather limited distribution in Eurasia, but it could very well have been a marker of a subset of Indo-Europeans. If it was present in ancestral Iranians, then this would geographically constrain the places where ancestral Iranians were formed.

Y-chromosomal haplogroup G1 is a minor component of the overall gene pool of South-West and Central Asia but reaches up to 80% frequency in some populations scattered within this area. We have genotyped the G1-defining marker M285 in 27 Eurasian populations (n = 5,346), analyzed 367 M285-positive samples using 17 Y-STRs, and sequenced ~11 Mb of the Y-chromosome in 20 of these samples to an average coverage of 67X. This allowed detailed phylogenetic reconstruction. We identified five branches, all with high geographical specificity: G1-L1323 in Kazakhs, the closely related G1-GG1 in Mongols, G1-GG265 in Armenians and its distant brother clade G1-GG162 in Bashkirs, and G1-GG362 in West Indians. The haplotype diversity, which decreased from West Iran to Central Asia, allows us to hypothesize that this rare haplogroup could have been carried by the expansion of Iranic speakers northwards to the Eurasian steppe and via founder effects became a predominant genetic component of some populations, including the Argyn tribe of the Kazakhs. The remarkable agreement between genetic and genealogical trees of Argyns allowed us to calibrate the molecular clock using a historical date (1405 AD) of the most recent common genealogical ancestor. The mutation rate for Y-chromosomal sequence data obtained was 0.78×10⁻⁹ per bp per year, falling within the range of published rates. The mutation rate for Y-chromosomal STRs was 0.0022 per locus per generation, very close to the so-called genealogical rate. The “clan-based” approach to estimating the mutation rate provides a third, middle way between direct father-to-son comparisons and using archeologically known migrations, whose dates are subject to revision and of uncertain relationship to genetic events.
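The arithmetic behind calibrating a clock on a dated genealogical ancestor can be sketched as follows. The abstract gives neither the mutation counts nor the exact elapsed time, so both functions and all their inputs are hypothetical; this is the simplest possible estimator, not necessarily the authors' procedure.

```python
def sequence_mutation_rate(mean_mutations_per_lineage, callable_bp, years_elapsed):
    """Mutations per bp per year, given a dated common ancestor.

    mean_mutations_per_lineage: average number of new mutations found on
        each sampled lineage since the common ancestor (hypothetical input).
    callable_bp: length of sequence compared, in base pairs.
    years_elapsed: years from the common ancestor to the sampled men.
    """
    if callable_bp <= 0 or years_elapsed <= 0:
        raise ValueError("callable_bp and years_elapsed must be positive")
    return mean_mutations_per_lineage / (callable_bp * years_elapsed)


def str_rate_per_generation(mean_str_mutations_per_lineage, n_loci,
                            years_elapsed, generation_years):
    """STR mutations per locus per generation.

    generation_years is an assumed generation length used to convert
    calendar years into generations.
    """
    if n_loci <= 0 or years_elapsed <= 0 or generation_years <= 0:
        raise ValueError("n_loci, years_elapsed and generation_years must be positive")
    generations = years_elapsed / generation_years
    return mean_str_mutations_per_lineage / (n_loci * generations)
```

The attraction of the clan-based calibration is that `years_elapsed` comes from a recorded date rather than from an archaeological estimate, so the main uncertainty is in the mutation counts and the assumed generation length.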

Punctured extinct cave bear femora were misidentified in southeastern Europe (Hungary/Slovenia) as ‘Palaeolithic bone flutes’ and the ‘oldest Neanderthal instruments’. These are not instruments, nor human made, but products of the most important cave bear scavengers of Europe, hyenas. Late Middle to Late Pleistocene (Mousterian to Gravettian) Ice Age spotted hyenas of Europe occupied mainly cave entrances as dens (communal/cub raising den types), but went deeper for scavenging into cave bear dens, or used in a few cases branches/diagonal shafts (i.e. prey storage den type). In most of those dens, about 20% of adult to 80% of bear cub remains have large carnivore damage. Hyenas left bones in repeating similar tooth mark and crush damage stages, demonstrating a butchering/bone cracking strategy. The femora of subadult cave bears are intermediate in damage patterns, compared to the adult ones, which were fully crushed to pieces. Hyenas produced round–oval puncture marks in cub femora only by the bone-crushing premolar teeth of both upper and lower jaw. The punctures/tooth impact marks are often present on both sides of the shaft of cave bear cub femora and are simply a result of non-breakage of the slightly calcified shaft compacta. All stages of femur puncturing to crushing are demonstrated herein, especially on a large cave bear population from a German cave bear den.

April 04, 2015

Although initial studies suggested that Denisovan ancestry was found only in modern human populations from island Southeast Asia and Oceania, more recent studies have suggested that Denisovan ancestry may be more widespread. However, the geographic extent of Denisovan ancestry has not been determined, and moreover the relationship between the Denisovan ancestry in Oceania and that elsewhere has not been studied. Here we analyze genome-wide SNP data from 2493 individuals from 221 worldwide populations, and show that there is a widespread signal of a very low level of Denisovan ancestry across Eastern Eurasian and Native American (EE/NA) populations. We also verify a higher level of Denisovan ancestry in Oceania than that in EE/NA; the Denisovan ancestry in Oceania is correlated with the amount of New Guinea ancestry, but not the amount of Australian ancestry, indicating that recent gene flow from New Guinea likely accounts for signals of Denisovan ancestry across Oceania. However, Denisovan ancestry in EE/NA populations is equally correlated with their New Guinea or their Australian ancestry, suggesting a common source for the Denisovan ancestry in EE/NA and Oceanian populations. Our results suggest that Denisovan ancestry in EE/NA is derived either from common ancestry with, or gene flow from, the common ancestor of New Guineans and Australians, indicating a more complex history involving East Eurasians and Oceanians than previously suspected.

Dienekes' Anthropology blog is dedicated to human population genetics, physical anthropology, archaeology, and history.

You are free to reuse any of the materials of this blog for non-commercial purposes, as long as you attribute them to Dienekes Pontikos and provide a link to either the individual blog entry or to Dienekes Anthropology Blog.

Feel free to send e-mail to Dienekes Pontikos, or follow @dienekesp on Twitter.