Africa is the geographic origin of anatomically modern humans; it is also home to a third of all modern languages, including four major language families: Niger-Kordofanian, Afro-Asiatic, Nilo-Saharan, and Khoesan. Despite the importance of African populations for studying human origins and the complexity of demographic and linguistic relationships among African populations, genome-wide analyses of sub-Saharan variation have been sparse. To address this deficiency, we used Illumina 1M-Duo SNP arrays to genotype samples (N=697) from 44 sub-Saharan populations, which we supplemented with published data sets. Principal components analysis (PCA) and linear regression were used to assess the statistical effect of geography and linguistics on the partitioning of genetic variation. As ascertainment bias can distort the allele frequency spectrum, we examined patterns of linkage disequilibrium (LD), haplotype sharing, and identity by descent (IBD) to understand the demographic relationship among populations. To affirm that LD-based analyses were robust to ascertainment bias, we assessed the rank correlation of estimates of effective population size from the rate of LD decay within populations and estimates of population size based on the variance of microsatellite repeat lengths from previously published data (Spearman’s ρ=0.782, p=0.011). Additionally, the presence of long IBD tracts between individuals indicates recent common ancestry. Thus, we used the GERMLINE algorithm to infer IBD tracts between individuals in hunting-gathering populations and neighboring agriculturalist and pastoralist populations. To infer the time to most recent common ancestor and test demographic models while accounting for the confounding effects of migration and changes in population sizes, we employed Approximate Bayesian Computation (ABC) using summaries of haplotype frequency, diversity and sharing within and between populations. 
We report, for the first time, evidence for recent common ancestry of Ethiopian hunter-gatherers and the Kenyan Sanye/Dahalo, who speak a language with remnant clicks, with click-speaking eastern African Khoesan populations. This work supports archaeological and linguistic studies that indicate that the distribution of Khoesan speaking populations may have extended as far north as Ethiopia.
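
The rank-correlation check mentioned in the abstract (LD-based effective population sizes against microsatellite-based ones) can be sketched as follows. This is only a minimal illustration, not the authors' code: the population-size values are invented placeholders and the function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented placeholder values, one entry per population (NOT from the study).
ne_from_ld_decay = [12000, 9500, 15000, 7000, 11000]
ne_from_microsatellites = [10000, 9000, 14000, 8000, 13000]

print(spearman_rho(ne_from_ld_decay, ne_from_microsatellites))
```

Because only the ranks matter, the two estimators can be on different absolute scales and still agree on which populations are larger or smaller, which is the property the abstract relies on.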

Not very surprising to me, as I detected a contribution of the "Palaeo_African" component (which has one of its peaks in San) in East Africans.

Comparative study of the Y chromosome diversity in some ethnic groups living in Iran and populations of the Middle East.

L. Andonian et al.

Background: The main goal of this study is to conduct a population genetic study of: a) Armenians living in Iran, in the context of general Armenian population; and b) Iranian Azeris, one of the biggest ethno-linguistic communities, in comparison with other Turkic-speaking populations of the Middle East (from eastern Turkey, Azerbaijan Republic and Turkmenistan). Methods: Buccal cells of 89 Armenian males from central Iran, the descendants of Armenians forcibly moved to Iran in the beginning of the 17th century CE, and 105 Turkic-speaking Azeri males from north-west Iran (Tabriz) were collected by mouth swabs. The samples were screened for 12 single nucleotide polymorphism (SNP) and 6 microsatellite markers on the non-recombining portion of the Y chromosome. The results of genetic typing were statistically analyzed using Arlequin software. Results: Iranian Armenians display a moderate level of genetic variation and are genetically closer to Western Armenians, which is in agreement with historical records. Iranian Azeris demonstrate much weaker genetic resemblance with Turkmens (as putative source population) than with their geographic neighbors. Conclusion: Political, religious and geographic isolation had moderate influence on the genetic structure of modern Iranian Armenians during the last four centuries, which is expressed in lower diversity of their patrilineal genetic legacy. The imposition of Turkic language to the populations of north-west Iran was realized predominantly by the process of elite dominance, i.e. by the limited number of invaders who left weak traces in the patrilineal genetic history of Iranian Azeris.

A direct characterization of human mutation.

J. X. Sun et al.

Mutation and recombination provide the raw material of evolution. This study reports the largest study of new mutations to date: 2,058 germline mutations discovered by analyzing 85,289 Icelanders at 2,477 microsatellites. We find that the paternal-to-maternal mutation rate ratio is 3.3, and that the mutation rate in fathers doubles between the ages of 15 to 45 whereas there is no association to age in mothers. Strong length constraints apply for microsatellites, with longer alleles tending to mutate more often and decrease in length, whereas shorter alleles tend to mutate less often and increase in length. Based on these direct observations of the microsatellite mutation process, we build a model to estimate key parameters of evolution without calibration to the fossil record. The sequence substitution rate per base pair is estimated to be 1.84-2.21×10⁻⁸ per generation (95% credible interval). Human-chimpanzee speciation is estimated to be 3.92-5.91 Mya, challenging views of the Toumaï fossil as dating to >6.8 Mya and being on the hominin lineage since the final separation of humans and chimpanzees.

This microsatellite based estimate of human-chimp speciation contrasts with a recent SNP-based estimate of 7 million years.
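
To see how a per-generation substitution rate turns into a date, here is a rough sketch of the standard conversion: two lineages accumulate differences at rate 2μ per generation, so the divergence time is d / (2μ) generations. Only the two rate bounds come from the abstract. The divergence value and the generation time are assumptions chosen for illustration, and the function name is hypothetical. This is not the authors' model.

```python
def divergence_time_years(divergence_per_bp, mu_per_bp_per_generation, generation_years):
    """Time since two lineages split, assuming a constant rate on both branches.

    Differences accumulate along both lineages, hence the factor of 2.
    """
    generations = divergence_per_bp / (2.0 * mu_per_bp_per_generation)
    return generations * generation_years


# ASSUMPTIONS for illustration only (not taken from the abstract):
ASSUMED_DIVERGENCE_PER_BP = 0.012   # roughly 1.2% human-chimpanzee sequence divergence
ASSUMED_GENERATION_YEARS = 25.0     # average generation time in years

# Bounds of the 95% credible interval quoted in the abstract.
for mu in (1.84e-8, 2.21e-8):
    years = divergence_time_years(ASSUMED_DIVERGENCE_PER_BP, mu, ASSUMED_GENERATION_YEARS)
    print(f"mu = {mu:.2e} per bp per generation -> {years / 1e6:.1f} million years")
```

Note that this calculation gives an average sequence divergence time, which is expected to be older than the speciation date, because lineages sampled from the two species coalesce some time before the populations actually separated. A faster assumed rate or a shorter assumed generation time both pull the date toward the present.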

Genetic structure of Jewish populations on the basis of genome-wide single nucleotide polymorphisms.

N. M. Kopelman

The Jewish population forms a genetically structured population, due to historical migrations and diverse histories of the various Jewish communities. Discerning the ancestry and population structure of different Jewish populations is important for understanding the complex history of the Jewish communities as well as for research on the genetic basis of disease. Using >500,000 genome-wide single-nucleotide polymorphisms, we investigated patterns of population structure in 438 samples from 30 Jewish populations in the context of additional samples from non-Jewish populations. The collection of Jewish populations studied incorporates a variety of populations not previously included in other genomic population structure studies of Jewish groups (e.g. NM Kopelman et al. 2009 BMC Genet 10:80; G Atzmon et al. 2010 AJHG 86:850-859; DM Behar et al. 2010 Nature 466:238-242; SM Bray et al. 2010 PNAS 107:16222-16227; JB Listman et al. 2010 BMC Genet 11:48). We identify fine-scale population structure within the Jewish samples, including notable distinctions separating Ashkenazi, Mizrahi, Sephardi, and North African populations. Additionally, we identify distinctions within major regional groups, including a separation among the North African populations of Libyan, Moroccan, and Tunisian Jewish samples and a separation among the Mizrahi populations of Bukharan, Georgian, Iranian, and Iraqi Jewish samples. These results supply enhanced information regarding Jewish population structure, providing a basis for further detailed analysis of the genetic history of Jewish populations.

Hopefully the wealth of this new Jewish and non-Jewish data will be made publicly available.

LD patterns in dense variation data reveal information about the history of human populations worldwide.

S. Myers et al.

A detailed understanding of population structure in genetic data is vital in many applications, including population genetic analyses and disease gene mapping, and relates directly to human history. However, there are still few methods that directly utilize information contained in the haplotypic structure of modern dense, genome-wide variation datasets. We have developed a set of new approaches, founded on a model first introduced by Li and Stephens, which fully use this powerful information, and are able to identify the underlying structure in large datasets sampling 50 or more populations. Our methods utilize both Bayesian model-based clustering and principal component analyses, and by using LD information effectively, consistently outperform existing approaches in both simulated and real data. This allows us to infer ancestry with unprecedented geographical precision, in turn enabling us to characterize the populations involved in ancient admixture events and, critically, to precisely date such events. We applied our new techniques to combined data for 30 European populations sampled by us, or publicly available, and the worldwide HGDP data. We find almost all human populations have been influenced by mixture with other groups, with the Bantu expansion, the Mongol empire and the Arab slave trade leaving particularly widespread genetic signatures, and many more local events, for example North African (Moroccan) admixture into the Spanish that we date to 834-1394AD. Dates of admixture events between European groups and groups from North Africa and the Middle East, seen in multiple Mediterranean countries, vary between 800 and 1700 years ago, while Greece, Croatia and other Balkan states show signals of admixture consistent with Slavic migration from the north, which we date to 600-1000AD. At the finest scale, we are able to study admixture patterns in data gathered by a project (POBI) examining people within the British Isles. 
Our approaches reveal genetic differences between individuals from different UK counties, and show that the current UK genetic landscape was formed by a series of events in the millennium following the fall of the Roman Empire.

Existing methods (see comments below) for dating historical admixture events differ from each other by a factor of two, and they all assume a 2-population model. Hopefully the research described here will be an improvement, especially if it is encapsulated in an easy-to-use piece of software. It will definitely be interesting to see the evidence for Slavic admixture in the Balkans, which probably corresponds somewhat to the "East European" component discovered in the Dodecad Project which differentiates Balkan populations from their Italian and West Asian neighbors.

Evidence for extensive ancient admixture in different human populations.

J. Wall et al.

We generated whole-genome sequences from four Biaka pygmies and analyzed them along with the publicly available genomes of 69 individuals from a range of different ethnicities. We scanned each of the 73 genomes for regions with unusual patterns of genetic variation that might have arisen due to ancient admixture with an ‘archaic’ human group. While a majority of the most extreme regions were really misalignment errors, we did find hundreds of regions that likely introgressed in from archaic human ancestors, and we estimate the amount and the timing of these ancient admixture events. These regions were found in the genomes of both sub-Saharan African and non-African populations. While Neandertals are a natural source population for ancient admixture into non-Africans, the source for ancient admixture into sub-Saharan African populations is less obvious.

Wall and Hammer have been arguing for archaic admixture for years, and there's a good chance they finally found the "smoking gun" here. I've argued before that Homo sapiens was not the only species in Africa at the time of its emergence, due to the great ecological diversity of the continent, and the long adaptation of humans there. We are unlikely to ever be able to find and sequence Paleolithic non-sapiens Homo from tropical Africa, but the signal is there to be discovered in modern African hunter-gatherers.

Validating the authenticity of the pedigrees of Chinese Emperor CAO Cao of 1,800 years ago.

H. Li

Deep pedigrees are of great value for studying the Y chromosome evolution. However, the authenticity of the pedigree information requires careful validation. Here, we validated some deep pedigrees in China with full records of 70-100 generations spanning over 1,800 years by comparing their Y chromosomes. The present clans of these pedigrees claim to be descendants of Emperor CAO Cao (155AD-220AD). Haplogroup O2-M268 is the only one that is enriched significantly in the claimed clans (P=9.323×10⁻⁵, OR=12.72), and therefore, is most likely to be that of the Emperor. Moreover, our analysis showed that the Y chromosome haplogroup of the Emperor is different from that of his claimed ancestry of the earlier CAO aristocrats (Haplogroup O3-002611). This study offers a successful showcase of the utility of genetics in studying the ancient history.

This is probably the oldest attested Y-chromosome lineage currently available. Confucius next? It will be interesting to know how many likely Cao descendants there are today, as a control on the rate with which a socially-selected lineage can grow.

Exceptions to the "One Drop Rule"? DNA evidence of African ancestry in European Americans.

J. L. Mountain et al.

Genetic studies have revealed that most African Americans trace the majority (75-80%, on average) of their ancestry to western Africa. Most of the remaining ancestry traces to Europe, and paternal lines trace to Europe more often than maternal lines. This genetic pattern is consistent with the "One Drop Rule," a social history wherein children born with at least one ancestor of African descent were considered Black in the United States. The question of how many European Americans have DNA evidence of African ancestry has been studied far less. We examined genetic ancestry for over 77,000 customers of 23andMe who had consented to participate in research. Most live in the United States. A subset of about 60,000 shows genetic evidence of fewer than one in 16 great-great-grandparents tracing ancestry to a continental region other than Europe. They are likely to consider themselves to be entirely of European descent. We conducted two analyses to understand what fraction of this group has genetic evidence of some ancestry tracing recently to Africa. We first identified individuals whose autosomal DNA indicates that they are predominantly of European ancestry, but who carry either a mitochondrial (mt) DNA or Y chromosome haplogroup that is highly likely to have originated in sub-Saharan Africa. Of the 60,000 individuals with 95% or greater European ancestry, close to 1% carry an mtDNA haplogroup indicating African ancestry. Of approximately 33,000 males, about one in 300 trace their paternal line to Africa. We then identified the subset of these European Americans who have estimates of between 0.5% and 5.0% of ancestry tracing to Africa. This subset constitutes about 2% of this set of individuals likely to be aware only of their European ancestry. The majority (75%) of that group has a very small estimated fraction of African ancestry (about 0.5%), likely to reflect African ancestry over seven generations (about 200 years) ago.
We estimate that, overall, at least 2-3% of individuals with predominantly European ancestry have genetic patterns suggesting relatively deep ancestry tracing to Africa. This fraction is far lower than the genetic estimates of European ancestry of African Americans, consistent with the social history of the United States, but reveals that a small percentage of “mixed race” individuals were integrating into the European American community (passing for White) over 200 years ago, during the era of slavery in the United States.
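
The generation arithmetic in the abstract follows from the halving rule: a single ancestor g generations back contributes an expected (1/2)^g of one's autosomal genome. A minimal sketch of that rule, with hypothetical function names, is below; it deals only with expected values and ignores the random variation that recombination introduces.

```python
import math


def expected_fraction(generations):
    """Expected autosomal share inherited from one ancestor `generations` back."""
    return 0.5 ** generations


def generations_for_fraction(fraction):
    """Invert the halving rule: generations back that give this expected share."""
    return math.log2(1.0 / fraction)


# One of 16 great-great-grandparents (4 generations back).
print(expected_fraction(4))

# About 0.5% ancestry, the typical estimate reported for the low-fraction group.
print(generations_for_fraction(0.005))
```

An expected share of 0.5% falls between seven and eight generations back, which matches the abstract's "over seven generations ago". Actual inherited shares scatter around the expectation, so a single individual's estimate is only a loose guide to the number of generations.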

Hopefully this was not done with 23andMe's "Ancestry Painting" that grossly overestimates European ancestry with even East Africans and South Asians often getting >90% "European". The search for non-white ancestry seems to be a favorite pastime of many people who test at 23andMe, so this could potentially bias the results; on the other hand, I've encountered many, many more people who are seeking that elusive Amerindian ancestor of family lore, so, perhaps this is not as big of a problem for the detection of African ancestry.

Estimating a date of mixture of ancestral South Asian populations.

P. Moorjani

Linguistic and genetic studies have shown that most Indian groups have ancestry from two genetically divergent populations, Ancestral North Indians (ANI) and Ancestral South Indians (ASI). However, the date of mixture still remains unknown. We analyze genome-wide data from about 60 South Asian groups using a newly developed method that utilizes information related to admixture linkage disequilibrium to estimate mixture dates. Our analyses suggest that major ANI-ASI mixture occurred in the ancestors of both northern and southern Indians 1,200-3,500 years ago, overlapping the time when Indo-European languages first began to be spoken in the subcontinent. These results suggest that this formative period of Indian history was accompanied by mixtures between two highly diverged populations, although our results do not rule out other, older ANI-ASI admixture events. A cultural shift subsequently led to widespread endogamy, which decreased the rate of additional population mixtures.
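
The general idea behind dating a mixture event from admixture linkage disequilibrium is that, after a single pulse of admixture, LD between markers decays roughly as exp(-n·d), where n is the number of generations since mixture and d is the genetic distance in Morgans. The sketch below fits that curve to synthetic, noise-free data by linear regression on the logarithm. It is an illustration of the principle under those assumptions, not the authors' method or software; the function name, the data, and the generation time are all hypothetical.

```python
import math


def fit_admixture_generations(distances_morgans, ld_values):
    """Least-squares fit of ln(LD) = ln(A0) - n * d; returns the estimate of n.

    Assumes every LD value is positive, which holds for the noise-free
    synthetic curve below but not necessarily for real, noisy data.
    """
    logs = [math.log(v) for v in ld_values]
    count = len(distances_morgans)
    mean_d = sum(distances_morgans) / count
    mean_log = sum(logs) / count
    sxy = sum((d - mean_d) * (l - mean_log) for d, l in zip(distances_morgans, logs))
    sxx = sum((d - mean_d) ** 2 for d in distances_morgans)
    return -sxy / sxx


# Synthetic curve generated with a known answer (NOT real data).
TRUE_GENERATIONS = 100
distances = [0.005 * i for i in range(1, 21)]           # 0.5 cM to 10 cM
ld_curve = [0.2 * math.exp(-TRUE_GENERATIONS * d) for d in distances]

estimated_generations = fit_admixture_generations(distances, ld_curve)

ASSUMED_GENERATION_YEARS = 29.0  # assumption; the result scales linearly with it
print(estimated_generations, estimated_generations * ASSUMED_GENERATION_YEARS)
```

Two things are worth noticing. The date in years depends directly on the assumed generation time, and a single exponential only describes a single pulse of mixture, so repeated or continuous gene flow would blur the estimate toward some intermediate date.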

I have previously highlighted that ROLLOFF, the method used by these authors, produces age estimates that are about half those of HAPMIX and StepPCO. As of this writing, ROLLOFF does not seem to be available for independent evaluation, so it is not entirely clear to me whether it, or the older methods, are right. It would be great if this issue is dealt with in the publication arising from this research.

Another issue that must be dealt with is the spurious inference that Ancestral North Indians are more closely related to Europeans than to West Asians in the previous publication on the ANI/ASI division, an inference that was an artifact of unequal sample sizes between Adygei and CEU.

Synthesis of autosomal and gender-specific genetic structures of the Uralic-speaking populations.

K. Tambets et al.

The variation of uniparentally inherited genetic markers - mitochondrial DNA (mtDNA) and non-recombining part of Y chromosome (NRY) - has suggested somewhat different demographic scenarios for the spread of maternal and paternal lineages of North Eurasians, in particular those speaking Uralic languages. The west-east-directed geographical component has evidently been the most important factor that has influenced the proportion of western and eastern Eurasian mtDNA types among Uralic-speakers. The palette of maternal lineages of Uralic-speakers resembles that of their geographically close European or Western Siberian Indo-European and Altaic-speaking neighbours. However, the NRY type most frequent in North Eurasia, N1c, which is a common patrilineal link between almost all Uralic-speakers on the eastern and western sides of the Ural Mountains, is rare among Indo-European-speakers, with the notable exception of Latvians, Lithuanians and North Russians. In this study the information on genetic variation of uniparentally inherited markers in Uralic-speaking populations from 13 Finno-Ugric and 3 Samoyedic speakers is combined with the results of their genome-wide analysis of 650 000 SNPs (Illumina Inc.) to assign their place in a landscape of autosomal variation of North Eurasian populations and globally. The genome-wide analysis of the genetic profiles of studied populations showed that the proportion between western and eastern ancestry components of Uralic-speakers is concordant with their mtDNA data and is determined mostly by geographical factors. Interestingly, among the Saami - the population which is often considered as a genetic outlier in Europe - the dominant western component is accompanied by about one third of the eastern component, making the Saami genetically more similar to Volga-Finnic populations than to their closest Fennoscandian-East Baltic neighbors.
The high frequency of pan-northern-Eurasian paternal lineage N1c among Saami cannot explain this phenomenon alone - genetic ancestry profiles of autosomes of other Finnic- and Baltic-speaking populations, who share the high N1c with the Saami, do not show a considerable eastern Asian contribution to their genetic makeup.

This study seems to include more Northern Eurasian references, but we will have to wait and see how its components are defined. Notice the slight discrepancy between its eastern Saami estimate (1/3) and that of the following study (22%), which is probably an artefact of the different range of samples used.

Population genetics of Finland revisited - looking Eastwards.

K. Rehnström et al.

We have previously reported that the genetic structure within Finland correlates well both with geography and known population history. While these studies have quantified the genetic distances between Finland and European neighbours to the south and the west, the influence of the Eastern and the Northern populations has not been described using genome-wide tools. Here we investigated the degree of Asian ancestry in Northern Europe. We also studied the genetic ancestry of geographic and linguistic neighbours of Finns, using genome-wide SNP data in a dataset comprising over 2200 individuals. First we quantified the proportions of European (represented by HapMap CEU) and Asian (HapMap CHB/JPT) genetic ancestry. Within Finland, the average Asian ancestry proportion varied from 2.5% in the Swedish speaking Finns to 5.1% in Northern Finland. The Saami population, being the indigenous inhabitants of Northern Finland, showed a surprisingly high proportion of Asian genetic ancestry (17.5%). We therefore hypothesize that, as genetic sharing between individuals in Northern Finland and Saami is higher than in other parts of the country, the Asian genetic ancestry in Finland could partly be through admixture with the Saami. Using a model-based estimation of individual ancestry, three ancestral populations provided a best fit for the combined Finnish and Saami dataset. Particularly, one of these ancestral populations was predominant in the Saami (average 78%), and higher in Northern Finland (average 14%) compared to the rest of the country (average 4%). Despite the fact that Finns are the closest relatives of the Saami of all populations included in this study, in general, our results show that language and genetics are only weakly related. The Finns are more closely related to most Indo-European speaking populations than to linguistically related populations such as the Saami.
These analyses are currently being extended to sequence level variation using genome-wide sequence data for 100 Finns as part of the 1000 Genomes project, and 200 further individuals from the North-Eastern Finnish subisolate of Kuusamo. These 200 individuals provide good power to identify founder haplotypes within this isolate. Next, we aim to investigate the power to extend the imputation of haplotypes to the rest of Northern Finland as well as to the rest of the country.

It is unfortunate that these researchers used HapMap populations to study admixture in Finns; the Chinese, especially, are not a very good proxy for the East Eurasian element in the Finnish population. There is much data available on North Eurasian populations at this point, so I find the continued use of HapMap populations puzzling; hopefully this will be remedied when this research finds itself in the journals.

The current Dodecad estimate of East Eurasian admixture in the 1000 Genomes FIN population is 5.9%, the bulk of which is "Northeast Asian", a component which peaks in Nganasan, Chukchi, and Koryak, and is also well-represented in Central Siberia among Selkups. I don't have 5 Swedish-speaking Finns to report an average yet, but the ones I have are in the ~2-4% "Northeast Asian" range.

I also ran a quick test of FIN together with CEU and CHB and the ~186k SNPs I am currently considering for the next version (Dodecad v4) of my ancestry analysis. At K=2, FIN is 3.7% Asian, which seems consistent with the authors reporting the highest Asian ancestry of 5.1% in northern Finland, and also shows how the use of CHB as an Asian reference underestimates the degree of Eastern Eurasian admixture.
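
Why a distant reference population can deflate an admixture estimate is easy to show with a toy calculation. The sketch below uses a simple least-squares estimate of the mixing fraction from allele frequencies, which is a stand-in for illustration and not the model-based clustering used in the analyses above. All frequencies are invented, and the labels (reference, true source, proxy) are hypothetical.

```python
def mixture_fraction(p_target, p_source_a, p_source_b):
    """Least-squares fraction f in: target = (1 - f) * A + f * B, per-SNP frequencies."""
    numerator = sum((t - a) * (b - a) for t, a, b in zip(p_target, p_source_a, p_source_b))
    denominator = sum((b - a) ** 2 for a, b in zip(p_source_a, p_source_b))
    return numerator / denominator


# Invented allele frequencies at four SNPs (toy values, NOT real data).
western_ref = [0.1, 0.1, 0.5, 0.5]
true_eastern_source = [0.5, 0.5, 0.5, 0.5]
# A proxy that shares the true source's differences from the western reference
# at SNPs 1-2, but has drifted on its own at SNPs 3-4.
distant_proxy = [0.5, 0.5, 0.9, 0.1]

TRUE_FRACTION = 0.10
target = [(1 - TRUE_FRACTION) * w + TRUE_FRACTION * e
          for w, e in zip(western_ref, true_eastern_source)]

print(mixture_fraction(target, western_ref, true_eastern_source))
print(mixture_fraction(target, western_ref, distant_proxy))
```

With the true source as the second reference the estimate recovers the simulated 10%. With the drifted proxy it drops, because the proxy's private differences from the western reference enlarge the denominator without matching anything in the target.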

8 comments:

Our analyses suggest that major ANI-ASI mixture occurred in the ancestors of both northern and southern Indians 1,200-3,500 years ago, overlapping the time when Indo-European languages first began to be spoken in the subcontinent.

That may explain why there is still significant and consistent genetic difference between the castes of a certain region in terms of ANI-ASI ratio in South Asia (especially in the southern and central regions). If the major ANI-ASI mixture occurred in South Asia thousands of years before the Aryan invasion, ANI-ASI ratio differences should be expected to be less related, or even unrelated, to the caste system, which is almost certainly a legacy of the Aryan invasion.

The Turkic source population (that is, the Turkic population that invaded what is now Turkmenistan, present-day Azeri lands and Anatolia) is not the present-day Turkmens, nor can it be represented by them. Present-day Turkmens are themselves the result of mixture, in the centuries following the Seljuq invasion, between the Turkic source population and the Iranian natives of what is now Turkmenistan (which was almost exclusively Iranian prior to the Seljuq invasion), and thus clearly must have less Mongoloid ancestry than the Turkic source population had. The Turkic source population lived almost exclusively in what is now Kazakhstan until the Seljuq invasion, so we need to look at Kazakhstan (through both modern and ancient DNA studies) to identify it. That population invaded what is now Turkmenistan, present-day Azeri lands and Anatolia with the Seljuqs in the 11th century, in the twinkling of an eye, and then gradually Turkified the natives of those lands during the following centuries.

The imposition of Turkic language to the populations of north-west Iran was realized predominantly by the process of elite dominance, i.e. by the limited number of invaders who left weak traces in the patrilineal genetic history of Iranian Azeris.

Based on the facts I stated in my second comment in this thread, genetic traces of the Turkic invaders must be even weaker than what the authors of this paper seem to suggest (this is true not just for Azeris, but also for Turks).

Onur, what do you mean by switching to a Turkic language? Is it only from a non-Turkic language or does it also include switching from a Turkic language?

Carlos, of course I meant only switching from a non-Turkic language to a Turkic language; it is clear from the context of my statement.

"We find that the paternal-to-maternal mutation rate ratio is 3.3, and that the mutation rate in fathers doubles between the ages of 15 to 45 whereas there is no association to age in mothers"

That makes it impossible that any 'molecular clock' will be found for the Y-chromosome.

"Human-chimpanzee speciation is estimated to be 3.92-5.91 Mya, challenging views of the Toumaï fossil as dating to >6.8 Mya and being on the hominin lineage since the final separation of humans and chimpanzees".

I thought when the Toumai fossil was first found that it could just as easily be a chimp ancestor as a human one. This date suggests it could be a common ancestor of both.

"Our analyses suggest that major ANI-ASI mixture occurred in the ancestors of both northern and southern Indians 1,200-3,500 years ago"

Even doubling the date only takes us back to 7000 years. This suggests that the two populations remained separated for thousands of years, unless ... Homo sapiens is not actually so very ancient in South Asia. Perhaps the South Asian population is made up of people who came in from the northwest and those from the east. The haplogroup evidence goes some way to supporting that scenario.

Dienekes' Anthropology blog is dedicated to human population genetics, physical anthropology, archaeology, and history.

You are free to reuse any of the materials of this blog for non-commercial purposes, as long as you attribute them to Dienekes Pontikos and provide a link to either the individual blog entry or to Dienekes Anthropology Blog.
