March 28, 2012

A rare look at the Y chromosomes of Afghanistan

I often bemoan the fact that some of the regions of the world that are most interesting to the student of prehistory (e.g., Mesopotamia and the Iranian Plateau) seem to also be the ones with more than their fair share of political trouble, hindering efforts to study them with the newest set of tools. Afghanistan is certainly one case that hasn't been quite the most welcoming of places in recent decades.

The country is transitional between the Iranic-speaking world of Iran and the Indo-Aryan-speaking world of South Asia, as well as between the Indo-Iranian world and the (mostly) Turkic-speaking world of Central Asia. Hence, the absence of data for that country has been acutely felt by all those who are trying to understand "what happened" in Eurasia.

The appearance of a new paper by the Genographic Project is a welcome sight, and a good example of what is best about this Project. I haven't been exactly a fan of the Genographic's interpretation of their own data, but kudos to them for getting them in the first place.

From the paper:

Pashtuns are the largest ethnic group in Afghanistan, accounting for about 42 percent of the population, with Tajiks (27%), Hazaras (9%), Uzbeks (9%), Aimaqs (4%), Turkmen people (3%), Baluch (2%), and other groups (4%) making up the remainder [6]. In the present study, eight ethnic groups were examined, with a focus on the largest four groups:
- The Pashtuns, traditionally lived a seminomadic lifestyle, they reside mainly in southern and eastern Afghanistan and in western Pakistan. They speak Pashto which is a member of the Eastern Iranian languages.
- The Tajiks are a Persian-speaking ethnic group which are closely related to the Persians of Iran. In Afghanistan, they are the largest Tajik population outside their homeland to the north in Tajikistan.
- The Hazara population speaks Persian with some Mongolian words. They believe they are descendants of Genghis Khan's army that invaded during the twelfth century.
- The Uzbeks are a Turkic speaking group that have been living a sedentary farming lifestyle in Northern Afghanistan.

The main features of the Y-chromosome gene pool:

Genotyping revealed 32 haplogroups present in Afghanistan's ethnic groups among our samples. Haplogroups R1a1a-M17, C3-M217, J2-M172, and L-M20 were the most frequent when Afghan ethnic groups were pooled, together comprising >66% of the chromosomes. Absolute and relative haplogroup frequencies are tabulated in Table S4.

The PCA analysis (left) showcases wonderfully the correspondence between different haplogroups and the three main regions of the Near East (green), South Asia (yellow), and Central Asia (purple).

It is a real shame that the newer markers available within the most prominent R-M17 haplogroup were not tested:

The prevailing Y-chromosome lineage in Pashtun and Tajik (R1a1a-M17), has the highest observed diversity among populations of the Indus Valley [46]. R1a1a-M17 diversity declines toward the Pontic-Caspian steppe where the mid-Holocene R1a1a7-M458 sublineage is dominant [46]. R1a1a7-M458 was absent in Afghanistan, suggesting that R1a1a-M17 does not support, as previously thought [47], expansions from the Pontic Steppe [3], bringing the Indo-European languages to Central Asia and India.

Nonetheless, I can't really disagree with the dismissal of the R-M17/Indo-European theory. R-M17 is simply too populous in South Asia to be the genetic legacy of "Indo-Europeans": (i) under an elite-dominance model, its frequency is far too high (compare well-attested examples of elite dominance, e.g., Hungary or Turkey, where the genetic legacy of the elite element is in the minority); (ii) under a folk migration model, it is difficult to understand why a hypothetical migrating Indo-European people would have such an overwhelming influence in the region while hardly influencing at all the other densely occupied agricultural landscapes of the Eurasian steppe periphery. Moreover, no autosomal signal corresponding to a migration from eastern Europe to South Asia really exists (the main cline of variation links South with West Asia, not Europe), and the small signal that does exist does not really correspond to observed levels of R-M17.

From the paper:

The E1b1b1-M35 lineages in some Pakistani Pashtun were previously traced to a Greek origin brought by Alexander's invasions [48]. However, RM network of E1b1b1-M35 found that Afghanistan's lineages are correlated with Middle Easterners and Iranians but not with populations from the Balkans.

Greek populations are not homogeneous in their haplogroup E frequencies, so it would be useful to consider the possibility that the lack of this frequent Southeastern European haplogroup in South Asia may not reflect a complete lack of Greek influence in this region, but rather, an influence from a structured ancient Greek population.

Looking at the Y-haplogroup composition:

A few points of interest:

The clear link between C/N/O with Central Asia

A clear difference between Persian and Pashto speakers in terms of inverse J2a/R1a frequencies

The paucity of J1 chromosomes (only 1 Tajik) testifies to the absence of relatively recent Middle Eastern influences associated with the spread of Islam; consistent with the absence of the autosomal "Southwest Asian" component in South/Central Asia.

Paucity of R1b, except in a couple Uzbeks and a Tajik; I have argued before that R1a had an early distribution in the arc of flatlands north and east of the Caspian, while R1b a complementary distribution in the smaller arc of the highlands west and south of it, out of which the Tocharians may have originated.

The small Nurestani sample consists of J2a, R1a, and R2; these are linguistic relatives of the Kalash of Pakistan who, unlike the latter, were converted to Islam in the 19th century.

I would say that the evidence is pretty clear that the earliest Iranians may have included haplogroups R1a and J2, although I would not wager on their relative proportions and overall contribution to modern Iranian-speaking populations. For whatever reason, it seems that Kurds and Persians ended up with a J2-over-R1a advantage, while Pathans and (plausibly) Turkified Central Asian former Iranian speakers with the reverse. Nonetheless, the occurrence of both haplogroups in most Iranian groups, as well as in most Indo-Aryan ones, is quite telling. It is unfortunate that the relationships between these Y chromosomes (still J2a*! six years after Sengupta et al.) and their West Eurasian brethren were not further pursued.

Hopefully, the data can be re-used down the road once the phylogeny of different haplogroups (and R1a in particular) is better understood. As I've stated before on this blog, I take Y-STR based age estimates with a huge grain of salt, so I would not put much faith in any of the ones presented in this paper.
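One way to see how much such estimates can move is a rough sketch of the average-squared-distance (ASD) approach, in which the age of a lineage is taken as the ASD in repeat counts from an assumed founder haplotype divided by a per-locus mutation rate. Everything below is illustrative rather than taken from the paper: the haplotypes are hypothetical toy data, the function names are invented for this example, and the two mutation rates are commonly quoted "germline" and "evolutionary effective" figures treated here as assumptions, as is the 25-year generation time.

```python
from statistics import mean

def asd_to_founder(haplotypes, founder):
    """Average squared difference in repeat counts from the founder,
    averaged first over haplotypes and then over loci."""
    per_locus = []
    for locus, founder_repeats in enumerate(founder):
        per_locus.append(
            mean((h[locus] - founder_repeats) ** 2 for h in haplotypes)
        )
    return mean(per_locus)

def age_in_years(asd, mutation_rate, years_per_generation=25):
    """Under a simple stepwise model, ASD grows by about mutation_rate per
    generation, so generations elapsed is roughly ASD / mutation_rate."""
    return asd / mutation_rate * years_per_generation

# Hypothetical toy data: repeat counts at three Y-STR loci.
founder = (13, 24, 15)
haplotypes = [
    (13, 24, 15),
    (14, 24, 15),
    (13, 23, 16),
    (13, 24, 15),
]

# Assumed per-locus, per-generation rates (not from the paper).
GERMLINE_RATE = 2.1e-3
EVOLUTIONARY_RATE = 6.9e-4

asd = asd_to_founder(haplotypes, founder)
age_germline = age_in_years(asd, GERMLINE_RATE)
age_evolutionary = age_in_years(asd, EVOLUTIONARY_RATE)

print(f"ASD = {asd:.2f}")
print(f"age with germline rate:     {age_germline:,.0f} years")
print(f"age with evolutionary rate: {age_evolutionary:,.0f} years")
```

With the same toy data, the two assumed rates give ages that differ by about a factor of three, before any uncertainty about the founder haplotype, population growth, or the choice of loci is taken into account.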

Afghanistan has held a strategic position throughout history. It has been inhabited since the Paleolithic and later became a crossroad for expanding civilizations and empires. Afghanistan's location, history, and diverse ethnic groups present a unique opportunity to explore how nations and ethnic groups emerged, and how major cultural evolutions and technological developments in human history have influenced modern population structures. In this study we have analyzed, for the first time, the four major ethnic groups in present-day Afghanistan: Hazara, Pashtun, Tajik, and Uzbek, using 52 binary markers and 19 short tandem repeats on the non-recombinant segment of the Y-chromosome. A total of 204 Afghan samples were investigated along with more than 8,500 samples from surrounding populations important to Afghanistan's history through migrations and conquests, including Iranians, Greeks, Indians, Middle Easterners, East Europeans, and East Asians. Our results suggest that all current Afghans largely share a heritage derived from a common unstructured ancestral population that could have emerged during the Neolithic revolution and the formation of the first farming communities. Our results also indicate that inter-Afghan differentiation started during the Bronze Age, probably driven by the formation of the first civilizations in the region. Later migrations and invasions into the region have been assimilated differentially among the ethnic groups, increasing inter-population genetic differences, and giving the Afghans a unique genetic diversity in Central Asia.

R1a* (essentially a rare lineage that split off the main tree right after the first R1a came into existence) was found in Iran, Anatolia, Italy, Germany, France, and Britain. Quite a fascinating "marching route".

But there is this problem:

R1a1a* (a rare lineage that split off the main tree shortly after M17 came into existence) was found in Britain, France, Germany, Spain, Italy and Yugoslavia.

That makes it look like M17 came into existence in Western Europe. But maybe it's just an oddity created by drift (exotic M17 vanished from everywhere except Western Europe) plus a sampling bias toward Western Europe.

Dienekes raises important points on elite dominance and the tiny signature it would leave in most cases.

Dare I say that the percentages are so small that there is a bias toward studying the more widespread haplogroups? Certainly this bias exists on the "genetic genealogy" boards, where theories casting R1b males as the Ueber-race are common. (Never mind that it is the most common Hg of the posters, hahaha).

If one reads one's Cavalli-Sforza and Cunliffe, one finds a decades-old prediction for the spread and distribution of western European megalithic culture, and subsequent genetic data indicate this matches the distribution of I-M26. Simply put, I-M26 males can be found currently in every town, port, and township where a megalith can be found, at a frequency of 1-3% -- a stunning coincidence. And there are other promising theories, resting on elite dominance, for several other minor clades. But by and large the focus remains on the major clades.

I'm impressed that the sample has some R1a, R1b, R2 and R*, as well as F* (with R, of course, derived from F). Afghanistan has to be located close to the root of Y-DNA hg R in general to sport that kind of diversity.

The H and L are quintessentially South Asian, with L quite common in the sample. L and R2 both have Indus River Valley distributions, and a plausible source of both in Afghanistan would be late Harappan during the period when the Harappans established BMAC trade colonies to the Northwest of the Indus River Valley.

"The prevailing Y-chromosome lineage in Pashtun and Tajik (R1a1a-M17), has the highest observed diversity among populations of the Indus Valley. R1a1a-M17 diversity declines toward the Pontic-Caspian steppe where the mid-Holocene R1a1a7-M458 sublineage is dominant. R1a1a7-M458 was absent in Afghanistan, suggesting that R1a1a-M17 does not support, as previously thought, expansions from the Pontic Steppe, bringing the Indo-European languages to Central Asia and India."

The notion of R1a1a-M17 being an Indus River Valley marker, together with R2 and L, is certainly suggestive, again, of a lot of Afghanistan having a genetic source as a Harappan colony. Prior to Harappan agriculture adapted for the Afghan high desert by the BMAC civilization (and perhaps a predecessor or two), Afghanistan would only have been able to support a thin, semi-nomadic population that could easily have drifted to somewhere else in Central Asia or Iran after facing population pressure from Harappan colonist farmers, so the notion of the current population of Afghanistan being traceable to that era isn't implausible. And, an elite dominance impact from Central Asian/SW European Indo-Europeans makes much more sense if they are absorbing a farming civilization, than a bunch of semi-nomadic herders or hunter-gatherers. Farmers stay stuck to the land.

Hard to know what to make of the three (relatively geographically close but genetically distinct) men with Y-DNA hg B, an East African/Paleoafrican marker. My best guess would be remnants from the Indian Ocean trade who continued to follow their cargo on a Silk Road caravan. Because of the Silk Road, Afghanistan was a cosmopolitan part of the ancient world, as its blend of Eastern and Western Y-DNA hgs suggests. But if so, why isn't there Y-DNA hg T, so common on the Indian Ocean coast of Africa? Then again, a low frequency Y-DNA hg could easily be omitted in a sample of this size due to random sampling issues.
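The sampling point can be made concrete with a small sketch. Assuming men are sampled independently and at random (a simplification, since the study sampled by ethnic group and location), the chance that a haplogroup with true frequency f is absent from n samples is (1 - f)^n. The frequencies below are hypothetical, the function name is invented for this example, and 204 is the total Afghan sample size given in the paper's abstract.

```python
def prob_haplogroup_absent(frequency: float, sample_size: int) -> float:
    """Probability that none of `sample_size` independently sampled men
    carries a haplogroup whose true population frequency is `frequency`."""
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    if sample_size < 0:
        raise ValueError("sample_size must be non-negative")
    return (1.0 - frequency) ** sample_size

SAMPLE_SIZE = 204  # total Afghan samples reported in the paper's abstract

# Hypothetical true frequencies, chosen only for illustration.
for freq in (0.005, 0.01, 0.02):
    p = prob_haplogroup_absent(freq, SAMPLE_SIZE)
    print(f"true frequency {freq:.1%}: chance of seeing zero carriers = {p:.1%}")
```

Under these assumptions a haplogroup present at 1% would be missed entirely in roughly 13% of samples of this size, and more often than that within any single ethnic subsample, since each subsample is smaller.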

"E1b1b1-M35 found that Afghanistan's lineages are correlated with Middle Easterners and Iranians."

@moreisbetter The coincidence of Y-DNA hg I generally, and I-M26 in particular, with the geographic expanse of the megalithic culture is not as wonderful as you would suggest. Many megalithic centers (Portugal, Wales, England, Scottish Isles, and South Sweden) have only trace (less than 1%) I-M26 (pre Rootsi 2004), but it makes up more than 40% of the male Sardinian population. It is not found in Norway, among the Danes or among the Dutch that had megaliths. Also I-M26 is found at appreciable frequencies in places in the interior of France and North Africa that lack megalithic ties (e.g. Macedonia and the Czech Republic) at percentages greater than some megalithic areas. Different Y-DNA I haplogroups are predominant in many places where I-M26 is found at mere trace levels, and many of these regions have a great diversity of quite distant Y-DNA hg I clades.

Elite dominance can make sense as a theory and perhaps even in Afghanistan. But there seem to be lots of instances where we see what looks like more than an elite dominance impact in Indo-European influenced areas, so looking to trace impacts in a place that had lots of traders passing through for millennia seems like a weak strategy for demonstrating that.

Since G2c-M377 is 2% of Mestizos from Merida, Mexico, ~5-10% of Ashkenazi Jews, and about 6% of Pathans (of the Ghori tribes, and their descendants among other tribes like Afridis), then maybe the "interested" parties would like to PAY for a FTDNA 454 Y full sequence test for an Ashkenazi G2c-M377 that matches them, and a Pathan G2c1-M283s. Or maybe find a G2c-M377 from Merida and test him instead.

No, I never said "Nephites". No one believes that. Yucatan? It's all about "Family History" and those nice "Centers". It's a free public genealogical service, with no other motivations in mind.

"The E1b1b1-M35 lineages in some Pakistani Pashtun were previously traced to a Greek origin brought by Alexander's invasions [48]. However, RM network of E1b1b1-M35 found that Afghanistan's lineages are correlated with Middle Easterners and Iranians but not with populations from the Balkans. Greek populations are not homogeneous in their haplogroup E frequencies, so it would be useful to consider the possibility that the lack of this frequent Southeastern European haplogroup in South Asia may not reflect a complete lack of Greek influence in this region, but rather, an influence from a structured ancient Greek population."

There is no reason to presume a Greek origin of the haplogroup E Y-DNA found in this study's samples from Afghanistan. Only one of the five Afghanistani haplogroup E individuals, an E1b1b1a1-M78(xV12,V22,V65) Uzbek from Mazar-e Sharif, Balkh Province in northernmost Afghanistan (near the border with Surkhandarya Province of Uzbekistan and Khatlon Province of Tajikistan), may possibly belong to E-V13, the clade that is quite common in modern Greeks and other populations of southeastern Europe. The other three Uzbeks from Mazar-e Sharif sampled in this study belong to haplogroups N1-LLY22g, R1a1a-M17(xM458), and R1b1a2-M269(xU106).

As for the other haplogroup E individuals, one is a Baloch whose father is from Nimruz Province of southwesternmost Afghanistan and who belongs to the E1b1b1a1c-V22 subclade, and the remaining three individuals are Hazaras whose fathers are from Balkh, Samangan, or Baghlan, three contiguous provinces in northern Afghanistan near the border with Uzbekistan and Tajikistan. These three Hazaras belong to the E1b1b1c1-M34 subclade, which, as I am sure you know, is common among Semites and other populations in the vicinity of Southwest Asia, but not among Greeks.

There is no reason to presume a Greek origin of the haplogroup E Y-DNA found in this study's samples from Afghanistan.

The point is that because haplogroup E has a variable frequency in Greek populations, its relative absence cannot be interpreted as a lack of Greek input in these populations, since we do not know whether it was existent/frequent in all ancient Greek populations.

The argument: "No E, or No E-V13 => No Greek influence" presupposes that E and E-V13 would be present in the relevant ancient Greek populations (principally Macedonians and Ionian Greeks), for which we do not have actual data.

I understand your point, but if not haplogroup E, then what might be considered a genetic trace of the (supposedly numerous) Greek inhabitants of Seleucid Bactria and latterly the Greco-Bactrian Kingdom? Neither E-V13 nor R1b-M269, two of the most frequently occurring Y-DNA haplogroups in modern Greek populations, is found with significant frequency in modern populations that inhabit the territories of the ancient Greek(-influenced) kingdoms of South-Central Asia.

In general, the Y-DNA pool of the Pashtuns of Afghanistan seems to be characterized by a high frequency of R1a1a-M17(xM458), with a minority of males belonging to L1c-M357. The Pashtuns of a small but densely populated area of east-central Afghanistan in the vicinity of the national capital, Kabul, exhibit an unusual concentration of haplogroup Q-M242(xM378,MEH2), along with some haplogroup H-M69. The high frequency of G2c-M377 in this study's small sample of Pashtuns from Wardak Province is probably just a fluke.

Andrew, it has to be sampling error on the B and M1 part. The strange thing is I also came across haplogroup O3 in Spain, which is really weird. It's really strange that R1b1b1-M73 is found at a high frequency (32%) in Sengupta et al. (2006) but at 0% in this study of yours. Also, haplogroup O3, another marker that came from the Mongols, is not in your study but is at 8% in this Hazara study. I don't know about the other groups, but something is definitely wrong with this Hazara study.


Dienekes' Anthropology blog is dedicated to human population genetics, physical anthropology, archaeology, and history.

You are free to reuse any of the materials of this blog for non-commercial purposes, as long as you attribute them to Dienekes Pontikos and provide a link to either the individual blog entry or to Dienekes Anthropology Blog.
