June 01, 2008

Wise words on Y chromosome phylogeography

How can you determine when and where a lineage originated? And how does the origin and spread of a lineage relate to what we think of as the origin of a population? These are rather contentious issues. According to the simplest way of thinking, current high frequency and high diversity may mark the place of origin of a lineage; but high frequency can also arise by genetic drift, and high diversity by admixture. The time can be calculated in several ways, and a wide range of mutation rates can be used, so molecular dates are much less certain than archaeological ones. In thinking about the second question, we can paraphrase the Italian geneticist Guido Barbujani: imagine that at some time in the future Indian astronauts colonise Mars, and geneticists then type their Y chromosomes. We may well find that their lineages date back to 9,000–20,000 years ago. But we would not be wise to infer that they have been living on Mars for 9,000 years.
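To make the point about mutation rates concrete, here is a minimal sketch of one common style of molecular dating: dividing the variance in STR repeat counts within a lineage by a per-generation mutation rate. The function name, the variance value, the two rates and the 25-year generation time are illustrative assumptions, not figures taken from the paper quoted above; the linear variance-to-age relationship also assumes a simple stepwise mutation model and a single founder.

```python
def str_age_years(variance, mu_per_generation, generation_years=25.0):
    """Rough lineage age from STR repeat-count variance.

    Assumes a stepwise mutation model in which variance grows
    linearly with time: variance ~= mu * generations.
    """
    if variance < 0 or mu_per_generation <= 0 or generation_years <= 0:
        raise ValueError("variance must be >= 0; rate and generation time must be > 0")
    generations = variance / mu_per_generation
    return generations * generation_years


# Hypothetical observed variance, identical in both scenarios.
observed_variance = 0.25

# Illustrative rates only: a faster and a slower per-locus, per-generation rate.
for label, mu in (("faster rate", 0.002), ("slower rate", 0.0007)):
    age = str_age_years(observed_variance, mu)
    print(f"{label}: about {age:,.0f} years")
```

The same data give ages that differ by nearly a factor of three depending only on the rate chosen, which is why such dates carry wide uncertainty.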

27 comments:

Diversity can certainly be tricky. In some other context someone mentioned that Brazil has much higher genetic diversity than Portugal, probably even when considering only European genes, which, as we know, doesn't mean that Brazilians colonized Portugal, but that Brazilians have a much wider array of parental sources.

So yes, it's a difficult issue. I think the correct approach must be to ponder all the elements like any good detective (or like any good prehistorian actually). Certainly genetics alone can't replace archaeology but can give very important clues of archaeological value, maybe as important as the classical bones and stones.
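The Brazil and Portugal point can be illustrated with a small sketch: pooling two source populations raises haplotype diversity even though the pooled population is the younger one. The haplotype labels and counts below are made up for illustration, and the statistic is Nei's unbiased gene diversity, which is one of several possible diversity measures.

```python
from collections import Counter


def gene_diversity(counts):
    """Nei's unbiased gene diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical haplotype counts in two parental populations.
source_a = Counter({"hap1": 8, "hap2": 2})
source_b = Counter({"hap3": 7, "hap4": 3})

# A recently admixed population drawing equally on both sources.
admixed = source_a + source_b

for name, pop in (("source A", source_a), ("source B", source_b), ("admixed", admixed)):
    print(f"{name}: diversity = {gene_diversity(pop):.3f}")
```

The admixed sample comes out more diverse than either parent, so high diversity on its own does not identify the place of origin.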

Wise words are always welcome. We have to agree with some of the points made by the authors regarding India.

At the same time, most of the authors and scientists draw so many conclusions that one is led to doubt their motives for interpretation.

The so called Fertile Crescent:

Let's look. How suitable has that area been for human habitation and growth leading to human dispersal in the last 60,000 years? That area has a minuscule proportion of linguistic, cultural and genetic diversity or history, except for Egypt. Even its current genetic diversity is negligible.

Just because it is close to Europe and the Bible and Quran originated there, it cannot be the source of human dispersal.

The wise words caution against the work of Sengupta. But they do not guide us to a reasonable answer based on the current genetic dataset. For that matter, nobody wants to show interest after the initial interest.

People are only ready to give second-hand recognition to the region around India, as it is not close to Europe and the Abrahamic belt.

They cannot deny the Jwalapuram evidence from the Toba eruption. They cannot deny the southern dispersal theory. They cannot deny the massive presence of mtDNA haplogroup M in India. They cannot deny the presence of all derivatives of F. They cannot deny that more R haplogroups are found there than in any other area. They cannot deny the sprinkling (not the origin) of haplogroups starting from India only.

If not India it has to be a place close to India like central Asia. It is definitely not this fertile or golden or some mythical crescent.

- the apparent greater K diversity in Melanesia, which is a very unlikely urheimat for that clade as a whole considering all the rest (geography, lack of archaeological connections...)

Stuff like that. For the case of K I actually think the best explanation is a branching out in northern South Asia, probably near the Ganges, so some branches (M, NO, S) went eastward (with C and D, or after them) and others (L, P, T) westward (with IJ and G, or after them).

Central Asia is not really necessary to explain most of the migrations and looks like a relatively undesirable choice (too cold and arid) for early human colonists, but it has some archaeology of its own that should not be ignored anyhow.

You see now, Dienekes? So much for the stability and credibility of genetics, eh? When I told you that genetics is a very young science, that its findings change from month to month, and that they can't be used to show things for sure (in contrast to linguistics), you said that this was wrong! And what about now? Doesn't this post agree with my comment? Didn't I say that the patterns and affinities of the various haplogroups used to show the background and identity of various populations change every month or two (two years of stability at most) and thus are not very reliable? Didn't I say that, since the conclusions we can draw from genetics are so volatile, how can you accept them so easily while being so circumspect about accepting the linguistic evidence on the Indo-European question, which has been around for 200 years? Do you remember all this? Do you remember me saying that the science of genetics is a million times more fragile and unstable than linguistics? What do you have to say now?

Maju. Note the haplogroup Os came to India from the east. They didn't evolve in India. I thought C* was found in India as well. Not mentioned. Did it too come from the east? I know C5 is found mainly in the north, specifically Pakistan.

Indian O is surely a back-migration and likely not too old anyhow, as it's mostly associated with Austroasiatic tribes.

But C* is found especially in India, mostly in the South. As for C5, ISOGG mentions it's mostly found in India with one single instance found in Pakistan - so it does not look very northern or western, really.

But as for the coastal model, when going through India, people probably took other routes as well, especially the Narmada-Son-Ganges one and another between Cape Rama and the Coromandel Coast, both spontaneously present in the GIS modelling of J.S. Field (2007) and well attested archaeologically. The longer purely coastal route also appears in the GIS model but it's not so well attested archaeologically, surely because the coast of that period is now submerged.

So people going from the coasts of the Arabian Sea to those of the Bay of Bengal probably took three different routes to cross India (plus whichever westward or northward routes), maybe meeting again, after many generations, at the end of these journeys at Bengal (or maybe not).

Based on what we see now, I'd say that people with Y-DNA K may have taken the Narmada route through the north primarily, while those with its brother clade H went along the coast to the south, along with some "C people". IJ and G instead may have tended to head westward or remain near Hormuz, but I guess they only really headed to West Asia together with at least one subclade of K: T, much like C and D may have headed eastwards with subclades of K too. In fact C and K derivatives are almost always found coupled east of India, which would seem to support this joint migration to SE Asia (and beyond it).

"But C* is found especially in India, mostly in the South. As for C5, ISOGG mentions it's mostly found in India with one single instance found in Pakistan - so it does not look very northern or western, really."

Also, the single instance of haplogroup C5 that has been reported in a published study of Y-DNA from Pakistan noted that it was found in a sample of Brahui, a semi-Dravidian tribe from southern Pakistan. Haplogroup C5 could have been recently introduced into the Brahui in Pakistan due to migration from Dravidian-speaking regions of India.

Maju wrote: "So people going from the coasts of the Arabian Sea to those of the Bay of Bengal probably took three different routes to cross India." The branches in C are unresolved and deep-rooted. If your comment were true, wouldn't we be able to discern these three divisions, and a gradual regional diversification in C with the Indian ones being closer to each other than they are to more distant ones? All branches of C are basically confined to separate regions. It seems in fact that C diversified early after a rapid expansion. It may have been through India, but there's no need to assume it was without more evidence. C5 and C* are actually as different from each other as are any other two Y-chromosome C haplogroups. Suggests they originated in two different regions. C* is actually found in island SE Asia and I've read a paper from some Indian scientists that it is found mostly in coastal regions of India. (Can't find the paper at the moment).

C* is not a haplogroup but a paragroup. It's the same as saying "undifferentiated C". But we really don't know how it's differentiated downstream of the C node. Surely different individuals or groups in this paragroup have different (unknown) SNP mutations and maybe some are frequent enough to define new haplogroups in the future, when more research is done.
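The haplogroup versus paragroup distinction can be sketched as a simple classification rule: a sample carrying the marker for C but none of the known subclade markers is reported as C*. The marker names below are hypothetical placeholders, not real SNP names, and the subclade list is only the handful named in this thread.

```python
# Hypothetical marker names, used only to illustrate the logic.
ROOT_MARKER = "marker_c"
SUBCLADE_MARKERS = {
    "C1": "marker_c1",
    "C2": "marker_c2",
    "C3": "marker_c3",
    "C5": "marker_c5",
}


def classify(derived_markers):
    """Return a known subclade name, 'C*' for undifferentiated C, or None."""
    derived = set(derived_markers)
    if ROOT_MARKER not in derived:
        return None  # not in haplogroup C at all
    hits = sorted(name for name, marker in SUBCLADE_MARKERS.items() if marker in derived)
    if len(hits) > 1:
        raise ValueError(f"sample is derived for conflicting subclades: {hits}")
    # No known downstream marker: the sample falls into the paragroup.
    return hits[0] if hits else "C*"


print(classify({"marker_c", "marker_c5"}))      # a defined subclade
print(classify({"marker_c", "private_snp_1"}))  # unknown downstream SNP, so paragroup
```

Two C* samples may therefore be quite unrelated below the C node; the label only says that no currently typed downstream marker was found.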

Suggests they originated in two different regions.

Not to me. With the exception of C3 and C1, all clades, including C*, are found in an arc from India to Australia, so either India or SE Asia should be the main diversification hub. C1 and C3 must have arrived in NE Asia via the coastal route early on, IMO.

Surely semi-open grassland has been hunter/gatherers' preferred habitat through most of our existence. Why is it that the most obvious route for human movement out of Africa, through the region today known as Israel and Jordan, is so totally dismissed? Even today the region lies on a fairly well defined boundary between the African (E) and one Asian (F-T, in the form of F's descendant J) Y-chromosome lines. The chromosomes have flowed each way across the boundary though. Modern tribal groups in the region usually have representatives from both Y-chromosome lines. The Y-chromosomes within each tribe are mixed but obviously cannot be mixed in individuals. Incidentally, did Abraham have Y-chromosome J or E?

Although the four lines C, D, E and F-T probably belonged to a single population it's unlikely they each reached fixation in just four neighbouring valleys. E is African and F presumably Indian so the population was at least that widespread. Why on earth must we assume C and D too must have been confined within just this region?

If E and F are widely separated why on earth should not D and C have arisen in even more distant regions?

Regarding C*, I believe the authors claim this is now a much smaller catchall group as they have placed most of it in another group.

Surely semi-open grassland has been hunter/gatherers' preferred habitat through most of our existence.

Why are you so sure of that? I don't imagine the Pygmies or other now semi-agricultural peoples of the jungles of Papua, SE Asia or America preferring the savannah to their jungle. Each people may have their own preferences or rather be culturally adapted to some ecological niche.

Also, why do you seem to dismiss the fruitful ecosystems that coastal areas provide? Don't you like fish and seafood?

Why is it that the most obvious route for human movement out of Africa, through the region today known as Israel and Jordan, is so totally dismissed?

Because of genetics (highest diversity in or near South Asia) and also because of apparent discontinuity of the H. sapiens archaeological record in West Asia, where it was probably replaced for some time by Neanderthals. Certainly you cannot discard that route 100% but it seems much less likely than the coastal one, IMO. If it was the real OOA route, then it seems to imply a total loss and a late recovery of that territory. The result would be about the same: South Asia acting as the main dispersal hub.

Even today the region lies on a fairly well defined boundary between the African (E) and one Asian (F-T, in the form of F's descendant J) Y-chromosome lines. The chromosomes have flowed each way across the boundary though. Modern tribal groups in the region usually have representatives from both Y-chromosome lines. The Y-chromosomes within each tribe are mixed but obviously cannot be mixed in individuals. Incidentally, did Abraham have Y-chromosome J or E?

Not sure what you mean but I'd say that the presence of IJ (and J and its major subclades) in West Eurasia is much older than that of E3b, which probably expanded in the Mesolithic from the area between Sudan and Egypt (Nubia) in relation with the expansion of Afroasiatic languages and the arrival of Capsian to North Africa and maybe even triggering the Mesolithic (grain-gathering economies) in the Eastern Mediterranean.

The distributions of E and J around the Mediterranean are not parallel anyhow. They may even be negatively correlated, suggesting two separate patterns of expansion. You see that in Greece, in the Balkans, in the duality West Asia/North Africa, in Iberia... Both have a Neolithic-related Mediterranean distribution and are mixed here and there (naturally) but often where one is dominant the other is rather low. Also their cores (or areas where they are strongest and more variegated) are very different: J is centered in West Asia clearly and E3b is centered in North and East Africa instead.

And no idea about Abraham: he is probably just a mythical ancestor, not any real person. Moshe instead may have been J1 (as per his brother Aaron and the supposed Cohen haplotype).

Although the four lines C, D, E and F-T probably belonged to a single population it's unlikely they each reached fixation in just four neighbouring valleys. E is African and F presumably Indian so the population was at least that widespread. Why on earth must we assume C and D too must have been confined within just this region?

If E and F are widely separated why on earth should not D and C have arisen in even more distant regions?

What are you suggesting? Not sure if you realize that the brother clade of F is C, not E. E is a "cousin", "brother" of D. The names were given before their filiation was well known.

Obviously at some moment all four clades had a single common ancestor (CR or CT). And at that time they were together for sure. Same for CF. D and E may have split in Africa or, as some might argue (weakly these days), in Asia (with E then having back-migrated and coalesced there, which in the end is about the same).

Arguably each clade might have followed a separate route, but only from some common original spread core. For F and C that common origin was probably in Asia, more specifically South Asia (or its immediate surroundings). As for D, you can imagine it doing whatever you wish, as long as it moved fast and left almost no traces on its way towards Eastern Eurasia, but a putative South Asian transit is justified and sort of common sense, considering what the others did.

Regarding C*, I believe the authors claim this is now a much smaller catchall group as they have placed most of it in another group.

Do you mean Indian C*? It's possible. It may be one or several "hidden haplogroups". But the same is true for all * paragroups. Even when downstream SNPs are known, if these are "private" (i.e. found in one or very few individuals) the definition of a separate haplogroup is not considered justified. But certainly it would be interesting to know down to the highest possible resolution.

"presence of IJ ... in West Eurasia is much older than that of E3b". Now Maju, I never claimed the boundary had remained static. However son lines may well have replaced parent lines and remained roughly in the same region.

"They may even be negatively correlated, suggesting two separate patterns of expansion". And two separate places of origin for F and E.

"Both ... are mixed here and there (naturally) but often where one is dominant the other is rather low". Replacing it?

Moshe is also almost certainly a mythical ancestor.

"the brother clade of F is C, not E. E is a 'cousin', 'brother' of D". But E and D are separated by huge distances. Why would C and F necessarily originate any closer to each other? As you say it seems reasonable to assume all four had "some common original spread core". But the problem remains, how widespread was that core population? It certainly spread from Africa to India. I'd argue it had also moved some distance onto the Iranian Plateau, and perhaps even beyond.

They specifically write that C5 is found in South and CENTRAL Asia. Not just India, obviously.

"I don't imagine the Pygmies or other now semi-agricultural peoples of the jungles of Papua, SE Asia or America preferring the savannah to their jungle". Hang on. Agriculture didn't develop until long after population pressure drove people into habitats other than savannah. The same is probably true of the Pygmies. Increasing population on the African savannah forced some groups to move into deep forest to survive.

I certainly don't "dismiss the fruitful ecosystems that coastal areas provide". I simply have trouble accepting humans had the ability to utilise these regions, other than by walking, until long after they'd left Africa.

"Both ... are mixed here and there (naturally) but often where one is dominant the other is rather low". Replacing it?

Not for E3b and J. It rather seems to be a matter of distinct founder effects as far as Europe is concerned. Neither did E3b replace J in West Asia nor did J displace E3b anywhere in Africa (Egypt?). Both have penetrations but remain secondary to the older clade in their respective areas - at least in the overall view.

"the brother clade of F is C, not E. E is a 'cousin', 'brother' of D". But E and D are separated by huge distances. Why would C and F necessarily originate any closer to each other?

Certainly D and E must have shared the same locality, at least in the form of ancestral DE. The separation of D and E (i.e. the Arabian Sea) is one of the reasons (not the only one) that suggests the coastal migration model. D and E both originated in (probably) East Africa and now you see D primarily in the Andaman Islands and Eastern Eurasia (with some remains in Australia and probably India). Certainly D moved fast and possibly it was small enough to have been mostly "drifted out" in some of the areas through which it may have passed too.

But it could also be argued, I guess, that D went ahead and suffered a genocide at the hands of the CF clan, surviving only in the most remote areas. The problem is that you cannot correlate that presumed early move with any mtDNA clade (more likely to survive drift and even genocide than Y-DNA normally).

As always, you have to take everything into consideration.

As for C and F they are both perfectly explained with a South Asian (or South Arabian) divergence - somewhere near Mumbai maybe? D is the only oddball and the easiest explanation is that it just came along with its "uncle" CF or "cousins" C and F.

The actual "mystery" is why there is no D nor C (other than what can be attributed to recent erratics) in Western Eurasia. That must mean a founder effect: a population (or populations) that either was small enough to "drift out" D and C or was already just only F since the beginning in North/NW South Asia and that derived into the clades we see now.

It would seem that the main direction of expansion, maybe because it was already largely free of older hominins, maybe because of the more suitable tropical conditions, was eastward along the coast and that such enterprise attracted more people (and therefore more variety of clades) than the one into the western semiarid regions (and full of Neanderthals).

The wise words analogy of Mars looks very lame coming from veterans as wise words.

As a student of science I like to believe what the science is uncovering so far.

Whether they moved to Mars or Timbuktu, the entity of 1 billion has some significance. The root population may have been only 100. If they originated in Africa or Europe, what does it matter? This 1 billion has traces pointing to the east.

Also current political boundaries are not representation of genetic boundaries.

But political, personal and regional agendas are trying to influence the core results of the research. The days of plain discovery are gone.

These are all some of the sour grapes observed on these forums from research papers.

1. Mt haplo "effected" Spanish Basque region.

Here, what is the effect? People migrate and genes merge. Some traces will be left. Some will be lost due to genocide.

2.Sudden discovery of diversity of R1, R2, R1b in India and not in Europe.

If R2 had not been found, they might have moved R to Europe long ago. Even Wikipedia is dominated by Europe and gives the least credit to its origin, Central Asia.

3. Skeptics of the southern dispersal theory. A very fragile minority with a lot of self-doubt. For them 1 billion people and 20% of humanity is nothing. Sometimes sour grapes.

For me, whether my ancestors came from Africa or Siberia, via China or India, does not matter.

People just like to prove something and try to use the Goebbels technique.

And you mean? Basque mtDNA is nearly 100% of apparent Paleolithic origins (there are almost no Neolithic intrusions, just the odd erratic).

Sudden discovery of diversity of R1, R2, R1b in India and not in Europe.

Not R1b. R1b is restricted to West and Central Eurasia. In fact I'd say it has a very similar distribution to mtDNA H.

Even Wikipedia is dominated by Europe and gives the least credit to its origin, Central Asia.

It's like H: it has Central Asian clades and European clades. The ultimate origin? My best guess is Turkey but certainly the European-only clades and haplotypes have a much more western center of gravity, justifying fully the post-LGM resettlement model. It is very possible that the Central Asian clades are just European offshoots anyhow.

...

You can really. How about Y-hap C with mtDNA N and Y-hap F with mtDNA M?

Maybe if you tried the other way around you could make some sense. F is present in both Eastern And Western Eurasia, like N. But C and M are almost only present in Southern and Eastern Eurasia (Sahul included).

But then you see C subclades all the time together with K or H, both subclades of F, so it makes sense if both went together.

But why C and F? Why not a spread of C-T minus the DE haplogroup eventually breaking up and becoming fixed in two different populations?

Why yes? I see no added value in a separate migration of D (with what mtDNA, btw?)

"F is present in both Eastern And Western Eurasia, like N. But C and M are almost only present in Southern and Eastern Eurasia".

I couldn't understand how you reached that conclusion until I realised you actually consider that the trees reveal nothing.

F in Eastern Eurasia? Southeast, possibly yes, but further north only if you don't realise K-T (M9, P128, P131 and P132) should be collapsed into just a single branch of any original F distribution. F is then seen to be spread from Southwest Asia (although, as you say, G and I/J may have come out of India) through India and Sri Lanka, and possibly to SE Asia (K). Hardly coincident with mtDNA hap N.

Because mtDNA N is not present in India at all once you collapse the R-K group into just one branch of any original N distribution, presumably a more recent expansion. N is seen then to stretch across Central Asia as I, W, X, A, Y and down to Australia (S), with R possibly in either SE Asia or India. The original Y-hap F and mtDNA N distribution hardly overlap at all, except for possible overlap in SE Asia.

And Y-hap C is present right across Central Asia in the order C5, C3, C1 then south to C2, C6, C4. Corresponding fairly well with mtDNA N.

But once again the mtDNA line M is confined to India and East Asia, especially if you collapse the C/Z line, which is presumably the product of a later expansion. Corresponds much more with Y-hap F's distribution than C's.

The Y-haps K-T and mtDNA haps R-K presumably represent a more recent expansion from somewhere. We'll forget about from where for now.

K is F. It is a subclade of F, like G or H. O3a is not less descendant of Mr. F than J1. F is the dominant clade in all Eurasia and derived areas, except some pockets like Australia, where it's still second.

There is no any "R-K" group. In fact mtDNA R is more dominant where Y-DNA F(xK) is: in West Asia. But in any case R is a branch of N.

And Y-hap C is present right across Central Asia in the order C5, C3, C1 then south to C2, C6, C4.

I can't consider those clades "Central Asian": they are East (C3, C1) or South Asian (C5) by overall distribution. Their presence in Central Asia surely reflects immigration from the other two areas, either in the late Paleolithic, Neolithic, Iron Age or even historical Silk Road times. Of them only Turco-Mongol C3 can be considered a major clade in the area.

But once again the mtDNA line M is confined to India and East Asia, especially if you collapse the C/Z line, which is presumably the product of a later expansion. Corresponds much more with Y-hap F's distribution than C's.

F(xK) is dominant only in West Asia (IJ, G) and southern India (H). F had four major branches and four known minor ones (F1-4), all of whom, save widespread K, are confined to India and West of it. There's no (meaningful) F(xK) east of India.

Check your facts: you could try to couple (Y-DNA) F and (mtDNA) N or F and R but certainly there's no way F and M can be associated that way, much less if you exclude K from the equation.

The Y-haps K-T and mtDNA haps R-K presumably represent a more recent expansion from somewhere. We'll forget about from where for now.

What the heck is "mtDNA R-K"?

Also what you call "K-T" is just K. L-T are the major subclades of K, just nomenclature.

And I don't think there's any realistic reason to think in two waves of expansion nor to make any correlation between the two large second order macro-haplos Y-DNA K and mtDNA R. It's a possibility maybe but not any clear thing. A good example is West Asia, where mtDNA R is dominant and Y-DNA K derivates very secondary.

Maju, by that reasoning Y-chromosome haplogroup R1b1c7 is simply haplogroup F. It comes from India and its distribution is a result of drift and founder effect. After all it's "just nomenclature". Surely the branches of the haplogroup trees mean SOMETHING.

"There is no any 'R-K' group". So the fact that mtDNA haplogroups B, F, HV, H, V, P, J, T, U and K share a series of mutations is totally irrelevant?

It comes from India and its distribution is a result of drift and founder effect.

In a sense it does. Certainly its ancestor P came out from South Asia with all that little swarm of K subclades.

"There is no any 'R-K' group". So the fact that mtDNA haplogroups B, F, HV, H, V, P, J, T, U and K share a series of mutations is totally irrelevant?

MtDNA nomenclature is a lot more chaotic and unregulated than that of Y-DNA (sadly) but still all that list of clades (plus many others) are grouped under R. K in fact is just a subclade of a subclade of a subclade (U8) of a subclade (U) of R, while B is just a regular 1st order subclade instead.

Indian subclades R1, R2, R6, R7, R8, R30 are sisters of East Eurasian R11 or B, or West Eurasian U (incl. K), JT and R0 (incl. HV), or Sahulian P - or even an ill defined same-level group that includes Indian R5 and R31, some Melanesian R* and East Asian R9b and F.

So, when you look at it properly, the highest top level diversity for R is in India and K is a local West Eurasian phenomenon derived from U, via U8. You can surely say that U is primarily Western but you certainly cannot claim that of widespread R.
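The counting argument in this comment can be sketched as follows: tally the first-order branches of mtDNA R by the region the comment assigns them to, and measure how deep K sits below R. The region assignments and the branch list are taken from the comment as written, not checked against any current phylogeny; the node name `U8-sub` is a placeholder for the intermediate clade the comment leaves unnamed, and the "ill-defined same-level group" is left out.

```python
from collections import Counter

# First-order branches of mtDNA R and their regions, as listed in the comment.
BASAL_R_BRANCHES = {
    "R1": "South Asia", "R2": "South Asia", "R6": "South Asia",
    "R7": "South Asia", "R8": "South Asia", "R30": "South Asia",
    "R11": "East Eurasia", "B": "East Eurasia",
    "U": "West Eurasia", "JT": "West Eurasia", "R0": "West Eurasia",
    "P": "Sahul",
}

# Parent links for the K lineage; "U8-sub" is a placeholder name.
PARENT = {"U": "R", "U8": "U", "U8-sub": "U8", "K": "U8-sub"}


def basal_counts(branches=BASAL_R_BRANCHES):
    """Number of first-order branches per region."""
    return Counter(branches.values())


def depth_below(clade, ancestor, parent=PARENT):
    """Number of parent links between a clade and a given ancestor."""
    steps = 0
    node = clade
    while node != ancestor:
        if node not in parent:
            raise KeyError(f"{clade} does not descend from {ancestor} in this map")
        node = parent[node]
        steps += 1
    return steps


print(basal_counts().most_common())
print("K is", depth_below("K", "R"), "levels below R")
```

On these assumptions South Asia holds the most basal branches, while K is several levels down inside U, which is the contrast the comment is drawing.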

MtDNA surely requires a radical review of nomenclature: it is totally obsolete and largely Eurocentric, giving a very wrong impression and creating much confusion.

What about the ancestral clade R1a*, which arose in Kashmir? This ancestral clade has been observed at a significant frequency among Kashmiri Pandits and the Saharia tribe of central India. This ancestral clade is virtually absent anywhere else in the world. The R1a* ancestral clade belongs to South Asia and so does R1a1. Stop whining over accepted facts. No amount of admixture can produce an "ancestral" clade.

There are loads of R subclades in South Asia (R1, R2, R5, R6, R7, R8, R30 - plus F-related R5 and R31). After discussing with Terry for quite a while here and at Remote Central, I have decided to create and post schematic maps for N and R distribution in my blog, Leherensuge, take a look: http://leherensuge.blogspot.com/.

While you could maybe argue for a SE Asian gravity center of N, R has a very clear center in South Asia.

"What about ancestral clade R1a* which arose in Kashmir? This ancestral clade has been observed at a significant frequency among Kashmiri Pandits and Saharia tribe of central India."

Also, in Iran: “In north Iran, individuals within the R1-M306 clade can be further subdivided into R1-M306*, R1a1*-M198, R1b1a-M269 and R1a*-SRY1532 (XM198) occurring with frequencies of 3.0, 3.03, 15.15 and 3.03%, respectively ... the detection of rare R1-M173* and R1a-SRY1532 lineages in Iran at higher frequencies than observed for either Turkey, Pakistan or India suggests the hypothesis that geographic origin of haplogroup R may be nearer Persia.”

Dienekes' Anthropology blog is dedicated to human population genetics, physical anthropology, archaeology, and history.

You are free to reuse any of the materials of this blog for non-commercial purposes, as long as you attribute them to Dienekes Pontikos and provide a link to either the individual blog entry or to Dienekes Anthropology Blog.

Feel free to send e-mail to Dienekes Pontikos, or follow @dienekesp on Twitter.