If you have not read my post “To the antipode of Asia”, this might be a good time to do so if you are unfamiliar with the history, prehistory, and ethnography of mainland Southeast Asia. In this post I will focus on mainland Southeast Asia, how it relates implicitly to India and China genetically, and what inferences we can make about demography and history. Though I will touch upon the Malay peninsula in the preliminary results, I have removed the Indonesian and Philippine samples from the data set entirely. This means that in this post I will not touch upon the spread of the Austronesians.

I present before you two tentative questions:

– What was the relationship of the spread of Indic culture to Indic genes in mainland Southeast Asia before 1000 A.D.?

– What was the relationship of the spread of Tai culture to Tai genes in mainland Southeast Asia after 1000 A.D.?

The two maps above show the distribution of Austro-Asiatic and Tai languages in mainland Southeast Asia. Observe that when you join the two together in a union they cover much of the eastern 2/3 of mainland Southeast Asia. The fragmented nature of Austro-Asiatic languages in the northern region, edging into the People’s Republic of China, immediately suggests that in the past there was a continuous zone of Austro-Asiatic speech in this region. From the histories and mythologies of the Tai people we know that this group migrated from the southern fringes of China around 1000 A.D. This is obvious when we note that there are still Tai people in southern China, and the expansion of the Tai across what is today Thailand is to some extent historically attested. Between 1000 and 1500 there was a wholesale ethnic reorganization of the Chao Phraya river basin. Was that a matter of demographic replacement, or cultural assimilation, or some of both?

Second, what was the impact of Indians upon mainland Southeast Asia? One of the easiest ways to ascertain Indian influence is script. Burmese, Thai and Cambodian scripts all derive from Grantha, an archaic Tamil script (non-Islamic scripts in island Southeast Asia, such as Javanese and Balinese, also derive from South Indian precursors). The Indian religious influences are also more southern than northern, manifesting in the southern forms of Shaivite Hinduism and Sri Lankan Theravada Buddhism.

There are three data sets which I looked at. I ran most of them from K = 2 to K = 12. This means that I threw all the individuals into a common pool and told the ADMIXTURE program to estimate their individual proportions of K number of populations. In this way we can get a general sense of the relationships of the populations. Remember that these aren’t necessarily real populations, and, the nature of the variation thrown into the pool impacts the nature of the inferred components greatly. I’m not reporting clear, distinctive, and objective entities extracted out of the data set. We’re looking at human intelligible interpretations of the patterns dependent upon the inputs and parameters. They’re telling us something real, but this isn’t like measuring the acceleration of a falling ball. It’s like describing the position of the ball in relation to a different set of reference objects. There’s a real ball with a specific position, but the descriptions are going to vary depending on what references you use (e.g., to the left of object A and below B, to the right of object C and above object D, etc.).
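To make the "K = 2 to K = 12" sweep concrete, here is a minimal sketch that only builds the command lines for such a sweep, without running anything. It assumes the basic ADMIXTURE invocation of the form `admixture <input.bed> <K>`. The file name `pooled_samples.bed` and the helper `admixture_commands` are hypothetical, and the post does not say which options were actually used.

```python
def admixture_commands(bed_path, k_min=2, k_max=12):
    """Build one ADMIXTURE command line per value of K (nothing is executed)."""
    return [["admixture", bed_path, str(k)] for k in range(k_min, k_max + 1)]


# "pooled_samples.bed" is a placeholder name, not a file from the post.
commands = admixture_commands("pooled_samples.bed")
for cmd in commands:
    print(" ".join(cmd))
```

Each run is independent of the others, so the components inferred at one K need not nest neatly inside those inferred at the next.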

Here are the sets:

1) A “large” set which includes the mainland Pan-Asian populations, the white Americans from the HapMap, and some Malay peninsular groups.

2) A “medium” set which prunes most of the North Asian groups, Malaysian groups, and the white Americans. So it’s mostly mainland Southeast Asia, southern China, and India.

3) A “small” set, which removes many of the Southeast Asian populations, but keeps the Indian ones. I purposely overloaded this set with Indians to examine possibilities of Indian admixture in a few Southeast Asian groups.

Some notes. The Pan-Asian data set has ~56,000 markers. This is tolerable, but not optimal. It’s definitely good enough for European vs. Indian vs. East Asian vs. Negrito, but less optimal for intra-regional variation. So take it with a grain of salt. But since I’m looking at Indian vs. East Asian, I’m mildly confident of that finding in relation to this data set. Second, the intersection of white Americans with the Pan-Asian set was ~30,000 markers. For Cambodians it was only ~22,000. There were ~100 white Americans, but only ~11 Cambodians. Be very cautious of the Cambodian results for this reason. Finally, remember that the ancestral components are abstractions: stable, long-admixed hybrid populations can show up as their own distinct component, as can highly inbred isolates.

There are three analyses and visualizations I will display below.

1) ADMIXTURE bar plots, which show the ancestral proportions of groups or individuals of a particular ancestral element.

2) Fst estimates across ancestral elements. This is a rough summary of genetic distance. I’ll also show you a two dimensional visualization on occasion, but remember that this removes some relationship information. The table is more accurate, though the visualization is easier to read.

3) Finally, I used EIGENSOFT to run some PCAs. This means that I took the pool of data and allowed the program to extract out the independent dimensions of variation. I ran it so that it pulled out the top 6 dimensions. The west-east dimension is always the largest by many multiples. Remember that the plots are not scaled.
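For readers who want to see what "extracting the independent dimensions of variation" means mechanically, here is a minimal sketch of a genotype PCA using NumPy. It is not EIGENSOFT itself. The normalization (centering each marker and scaling by its binomial standard deviation) is a common choice that I am assuming here, and the genotypes are randomly generated toy data, not the Pan-Asian samples.

```python
import numpy as np


def genotype_pca(genotypes, n_components=6):
    """PCA on a samples-by-markers matrix of 0/1/2 allele counts."""
    g = np.asarray(genotypes, dtype=float)
    freq = g.mean(axis=0) / 2.0              # allele frequency per marker
    keep = (freq > 0) & (freq < 1)           # drop monomorphic markers
    g, freq = g[:, keep], freq[keep]
    scaled = (g - 2 * freq) / np.sqrt(2 * freq * (1 - freq))
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    n_components = min(n_components, len(s))
    scores = u[:, :n_components] * s[:n_components]    # sample coordinates
    share = (s ** 2)[:n_components] / (s ** 2).sum()   # variance share per axis
    return scores, share


# Toy data only: two groups of 10 individuals with slightly different frequencies.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.1, 0.9, 200)
freq_b = np.clip(freq_a + rng.normal(0, 0.15, 200), 0.05, 0.95)
toy = np.vstack([rng.binomial(2, freq_a, size=(10, 200)),
                 rng.binomial(2, freq_b, size=(10, 200))])
scores, share = genotype_pca(toy)
print(share.round(3))
```

The `share` vector is the reason to keep in mind that the plots are not scaled: the first axis can carry several times the variance of the second even when both are drawn at the same length.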

I should also say that the K’s I’m showing are the highest I could reach before inbred subgroups within the reported populations started breaking out into their own components (this happened especially within the Indians).

Starting at the beginning, I have noticed in the Pan-Asian data set that some groups, particularly Mons and Malays, seem to show Indian admixture. My question: is this really Indian admixture, or perhaps recent European admixture? That’s why I had the large data set, with white Americans. Here are the results:

So it seems unlikely that the Mon and Malay admixture with a West Eurasian element is from Europeans. Rather, it is consistent with Indians. In fact, I’m pretty confident it isn’t West Asian either, as is a possibility in the case of the Malays, because that component tends to align with Europeans at this scale. Finally, I will tell you that the admixture in both Mon and Malays is relatively even. In other words, the group estimates aren’t being shifted by one or two highly admixed individuals, which would be a good tell as to recent intermarriage. Not unheard of. Mahathir Mohamad’s paternal grandfather was a Kerala Muslim.

Now let’s look at the PCA. I’ll focus on dimensions 1, 2, and 3. Remember that these are the three largest dimensions of genetic variance rank ordered. Dimension one is by far the largest, by a factor of at least five usually in these plots. It’s the west vs. east Eurasian dimension.

I’ve highlighted the important bits. Two notes. First, I think you do see the suggestion that the Mon & Malay are shifted toward the Indians, not the Europeans. This is in perfect alignment with the ADMIXTURE result. Second, please note that the “Indian Singapore” population is heterogeneous. It is mostly Tamil, but there are clearly other Indians in the sample, and, some individuals who have Malay or Chinese ancestry.

Additionally, please note in the ADMIXTURE result above the similarity between the Tai and the Zhuang. The Zhuang are China’s second largest ethnic group, and reputedly the source population for the Tai migrations into mainland Southeast Asia. Before I move on, you should have some sense of the locations and ethno-linguistic affinities of some of the more obscure groups:

| Location | Group | Language group |
| --- | --- | --- |
| Northern Thailand | Htin | Austro-Asiatic |
| Northern Thailand | Lawa | Austro-Asiatic |
| Northern Thailand | Mon | Austro-Asiatic |
| Northern Thailand | Palong | Austro-Asiatic |
| Northern Thailand | Plang | Austro-Asiatic |
| Southern China | Wa | Austro-Asiatic |
| Northern Thailand | Yao | Hmong-Mien (Mien) |
| Southern China and Northern Thailand | Hmong | Hmong-Mien |
| Southern China | Zhuang | Tai |
| Northern Thailand | Karen | Tibeto-Burman |
| Southern China | Jinuo | Tibeto-Burman |

One aspect which isn’t listed here is the classification of some of these populations as “hill tribes” or not. The Mon and the H’tin are both Austro-Asiatic, but the former are in some ways analogous to the Greeks on mainland Southeast Asia, while the latter are a tribal isolate which has preserved its identity in the hills of northern Thailand. By Greeks, I mean that the Mon have been assimilated or dominated by the Bamar in Burma and the Tai in Thailand, but in both cases have imparted to these groups the essence of Southeast Asian Indic high culture. The Mon were at one point ascendant from the lower Irrawaddy in southern Burma to the lower Chao Phraya basin in Thailand, the terminus of which today is Bangkok. In contrast, groups like the H’tin and Lawa were presumably relatively insulated from Indic influence. The Hmong are relative newcomers to Southeast Asia, which explains their status as animists for example. Finally, you have groups like the Wa which are technically not even Southeast Asian, but are Austro-Asiatic. They should give us a sense of Austro-Asiatics without an Indic imprint.

Let’s move on to step two, the medium data set. I’m removing the white Americans, Malaysians, and North Asian groups. And now I’m including the Cambodians.

Again, the Mon have the Indian component. And so do the Cambodians. Remember that while everyone else has 56,000 SNPs, the Cambodians only have 22,000, so we need to be careful. Though you see this element in the HGDP runs as well. That is, an Indian affiliated component. It’s relatively evenly distributed among the Cambodians, so you can’t chalk it up to a few admixed individuals. Again, you see the similarity between the Zhuang and the Tai. The main difference is that the Tai seem to have admixed with various Southeast Asian groups. That’s to be expected. What surprised me though is that from these results it seems that the Tai expansion was demographically, not just linguistically, dominant. This is clear even in the Bangkok sample. More on this later.

Below are the genetic distances (Fst) between the inferred ancestral groups. The labels give the modal population, and then the language family:

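| | Jinuo_Burman | Htin_Austro | Tai | SouthAsian | Palong_Austro | Hmong |
| --- | --- | --- | --- | --- | --- | --- |
| Jinuo_Burman | 0 | 0.073 | 0.057 | 0.115 | 0.092 | 0.085 |
| Htin_Austro | 0.073 | 0 | 0.03 | 0.088 | 0.065 | 0.06 |
| Tai | 0.057 | 0.03 | 0 | 0.09 | 0.064 | 0.047 |
| SouthAsian | 0.115 | 0.088 | 0.09 | 0 | 0.117 | 0.117 |
| Palong_Austro | 0.092 | 0.065 | 0.064 | 0.117 | 0 | 0.09 |
| Hmong | 0.085 | 0.06 | 0.047 | 0.117 | 0.09 | 0 |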
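One way to turn the Fst values in the table above into a two-dimensional picture is classical multidimensional scaling. The sketch below is an illustration under assumptions, not necessarily how the visualizations in this post were produced: it treats each Fst value directly as a distance, and the function name `classical_mds` is my own. It also reports how much of the positive-eigenvalue variance the two retained axes keep, which quantifies the earlier point that a 2D plot drops some relationship information.

```python
import numpy as np

labels = ["Jinuo_Burman", "Htin_Austro", "Tai",
          "SouthAsian", "Palong_Austro", "Hmong"]
fst = np.array([
    [0.000, 0.073, 0.057, 0.115, 0.092, 0.085],
    [0.073, 0.000, 0.030, 0.088, 0.065, 0.060],
    [0.057, 0.030, 0.000, 0.090, 0.064, 0.047],
    [0.115, 0.088, 0.090, 0.000, 0.117, 0.117],
    [0.092, 0.065, 0.064, 0.117, 0.000, 0.090],
    [0.085, 0.060, 0.047, 0.117, 0.090, 0.000],
])


def classical_mds(dist, n_dims=2):
    """Classical MDS: double-center squared distances, then eigendecompose."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    top = np.clip(eigvals[:n_dims], 0.0, None)   # guard against negative values
    coords = eigvecs[:, :n_dims] * np.sqrt(top)
    retained = top.sum() / eigvals[eigvals > 0].sum()
    return coords, retained


coords, retained = classical_mds(fst)
for name, (x, y) in zip(labels, coords):
    print(f"{name:14s} {x: .3f} {y: .3f}")
print(f"share of positive variance kept in 2D: {retained:.2f}")
```

The signs and orientation of the axes are arbitrary, so only the relative positions of the points carry meaning.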

Here are some visualizations:

And here’s the PCA:

In this plot you see both the Mon and Cambodians shifted toward the Indians, again. Also, note the Zhuang and the Tai mostly overlap rather well. The y-axis is defined it seems by Austro-Asiatic hill tribes, then the Tibeto-Burman groups, and a gap until you hit the Tai cluster, which eventually merges with the Hmong. There’s a reasonable language family affinity here, insofar as the Yao are between the Tai and the Hmong.

Finally, we move to the Indo-centric run. I’ve removed a lot of the Southeast Asian groups now. Some of the hill tribes are obviously relatively isolated, and so throw up their own clusters or diverge on PCA rather easily. That’s a function of genetic differences which build up if you are relatively insulated from gene flow. Because I removed so many populations I’m only left with three K’s before you get quasi-family clusters showing up as K’s. Also, I’m going to show you individual bar plots for Cambodians and Mon to illustrate that the Indian component isn’t just isolated instances of admixture:

The Fsts are straightforward in this case:

| | Austro-Asiatic | Tai | South Asian |
| --- | --- | --- | --- |
| Austro-Asiatic | 0 | 0.028 | 0.084 |
| Tai | 0.028 | 0 | 0.085 |
| South Asian | 0.084 | 0.085 | 0 |

It’s the PCA which is really interesting in this run. The first isn’t too exceptional:

OK, first, since this is an Indian focused set, you see that there’s more than the standard west-east dimension. You have several lower order dimensions which separate Indians! I had previously assumed that the Indian component which always shows up in the Cambodians in the HGDP was a function of deep ancient ancestry shared with the “Ancestral South Indians” (ASI) of Reich et al. This ancient population may have had affinities with many groups out toward Southeast Asia, and so the residual cluster in Cambodians may have been part of the deep Ice Age ancestry of this group. These results convince me that this is not so straightforward an explanation. In this sample the group that has the highest ASI is the Bhils, a tribal population. In one of the plots you see that the Bhils form one end of the distribution, and Gujarat Vaishyas the other. It is clear that this is an Ancestral North Indian-Ancestral South Indian (ANI-ASI) cline. The Mon and Cambodians don’t deviate much from the center, suggesting to me that they aren’t too skewed toward the ASI! Additionally, the “center” of the distribution is weighted toward caste South Indians. This then is a nice resolution, because it dovetails perfectly with the historical evidence for a South Indian specific influence on Southeast Asia in the early historic period.

This isn’t a slam dunk. There need to be estimates of the time since admixture. It should post-date the ANI-ASI admixture event, and be in the same range as the Uyghurs. Unfortunately with only 56,000 SNPs I’m not sure this estimate is possible, but I’ll look into it. Additionally, a deeper survey of Y and mtDNA lineages needs to be done in Southeast Asia. They may show sex-biased migration. I did look for the West Eurasian specific SLC24A5 variant, which goes no lower than ~50% in South India, but that’s not in the Pan-Asian SNP data set. It is in the HGDP, and none of the 11 Cambodians have it. This would lean toward the ASI hypothesis, but seeing as how the West Eurasian variant may only be at ~50%, and the Cambodians are less than 10% South Asian, it isn’t totally implausible that it wouldn’t show up in 22 gene copies (using realistic assumptions I get a ~50% probability that a West Eurasian copy of SLC24A5 wouldn’t be found in the Cambodians with N = 11).
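The back-of-the-envelope probability above can be reproduced with a short calculation. The exact assumptions behind the ~50% figure are not spelled out in the post, so this is a sketch under my own simplifying assumptions: each of the 22 gene copies is treated as independent, the variant is assumed to enter only through South Asian ancestry, and the ancestry fractions tried are illustrative values below 10%. The function name `prob_variant_absent` is hypothetical.

```python
def prob_variant_absent(ancestry_fraction, allele_freq_in_source, n_copies):
    """P(no sampled gene copy carries the variant), assuming independent copies."""
    # Chance that a single copy is both of South Asian origin and carries the variant.
    per_copy = ancestry_fraction * allele_freq_in_source
    return (1.0 - per_copy) ** n_copies


N_COPIES = 2 * 11  # 11 diploid Cambodians
for ancestry in (0.04, 0.06, 0.08, 0.10):
    p = prob_variant_absent(ancestry, 0.5, N_COPIES)
    print(f"South Asian ancestry {ancestry:.0%}: P(no copy seen) = {p:.2f}")
```

Under these assumptions a South Asian ancestry fraction of about 6% gives a probability close to one half, in line with the figure quoted above, while at 10% it falls to roughly one third.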

I’ve not devoted too much space to the Tai-Zhuang connection in this post, because it’s obvious in the plots. The Tai are obviously somewhat shifted toward Austro-Asiatic groups, but far less than I would have expected. In fact, taking the ADMIXTURE components too literally you might infer that there’s been more Tai admixture into the Mon and Khmer than the other way around! This might not be totally implausible when you consider that Thailand’s population is nearly five times that of Cambodia. But the standard model I’ve read suggests that Tai warrior bands conquered the Mon-Khmer indigenes, and absorbed much of their high culture. These results don’t cohere easily with that in terms of demographics.

I have a possible explanation for what occurred. Much of Thailand may not have been too populous until the past ~1,000 years, with lowland agriculture being driven by elite direction. The Tai may have brought superior agricultural techniques, and so entered into a phase of rapid population expansion into the lowland frontier, which had no parallel during the Mon and Khmer period of dominance. In other words, the Tai bands were small and initially outnumbered by the Mon and Khmer. But through favorable resource direction and priority allocation of newly arable land to co-ethnics the small Tai population might quickly have come to dominate the previous inhabitants. This is the model which is outlined in the Rise of Islam and the Bengal Frontier. In it the author basically argues that eastern Bengal was lightly populated until large scale Muslim elite driven projects to open up the agricultural frontier. The recruited peasants were either Muslim or converted to Islam, because the cultural landscape was relatively fluid and unsettled, in contrast to the more static peasant economy of western Bengal, which remained Hindu. The Islamicization of eastern Bengal in this model had less to do with the conversion of native tribes, and more to do with the rapid demographic expansion of Bengali peasant colonies which were enabled by agricultural projects, colonies which were Islamicized or were drawn from the minority Muslim peasantry of the western zone by Mughal elites intent on creating a region where the Hindu upper castes were marginalized. Similarly, the Tai expansion in Southeast Asia may have been into a de facto “empty” landscape. During the period when Mon and Khmer high culture was absorbed the Tai may have been the smaller element in terms of numbers. The current ratios are a function of later social and demographic processes.

It’s unlikely that there is extensive Tai admixture in the Khmer sample. There are Thai and Lao villages in Cambodia, but they have remained separate from the general population, and Thai influence did not extend to the most populous provinces in the east.

Something that should be considered is that the samples currently available for Cambodia and Thailand may have been taken from population centers with high recent Chinese admixture. Estimates for recent Chinese admixture in Bangkok and Phnom Penh vary in the ranges of 15-30% Chinese. It is not clear to me if this was considered in the analysis above.

Northern Thailand and Western Lao would be expected to be the areas with the highest Tai demographic impact, while central Thailand may preserve more of the pre-Tai populations (there are still Mon speaking communities in Bangkok).

Garvan

http://blogs.discovermagazine.com/gnxp Razib Khan

It’s unlikely that there is extensive Tai admixture in the Khmer sample. There are Thai and Lao villages in Cambodia, but they have remained separate from the general population, and Thai influence did not extend to the most populous provinces in the east.

the sample is from cambodia fyi. or at least khmer refugees in santa ana, california.

Something that should be considered is that the samples currently available for Cambodia and Thailand may have been taken from population centers with high recent Chinese admixture. Estimates for recent Chinese admixture in Bangkok and Phnom Penh vary in the ranges of 15-30% Chinese. It is not clear to me if this was considered in the analysis above.

there’s surprisingly little inter-regional difference in the thai (there are thai samples from bangkok above, i aggregated them since they weren’t distinguishable from the northern thai). there doesn’t seem to be a chinese admixture, though that’s more obvious in the earlier post. for the thailand sample i think they must have tried to exclude those with known chinese ancestry proactively, since the consortium already has lots of han.

Justin Giancola

Would it be more likely the South Indians came by land or sea?

http://blogs.discovermagazine.com/gnxp Razib Khan

#3, look at a map there’s no connection between s india and se asia. though more seriously, most india – sea connections are by sea.

Justin Giancola

wow I had to read that so many times to clarify you weren’t totally screwing with me. sea = S.E.A.

I knew about maritime connections of India and SE Asia in antiquity but this seemed potentially earlier and seemed more people to boat than merchants and monks. I thought it might also be leap-frogging through Bangladesh or there was a marginal continuum there that got replaced. I don’t know where all the Indian group names are by heart and their attested histories!

I thought it might also be leap-frogging through Bangladesh or there was a marginal continuum there that got replaced.

the mon focus was to the south if it was through bengal then the old indic zone would have been further north i think. also, the writing system is the tell. that’s a very old south indian script which is the root of southeast asian ones. bengali script is from a different root. interesting, the old script of the ahoms of assam is from a south indian root. that proves the point: the ahom arrived from burma within the last 1,000 years.

Hla Thein

In Burma – first millennium central Irrawaddy links were generally overland, both to Yunnan and Inner Asia, and also to Assam, Bengal and the Ganges basin. Coastal (mainly Mon-speaking) links were across the Bay of Bengal to Orissa, south India and Ceylon. There were also Persian arrivals along the coast. Also in the late first millennium was a likely central Asian (Turkish and Mongol) influx along the Yunnan borderlands.

Gene Expression

This blog is about evolution, genetics, genomics and their interstices. Please beware that comments are aggressively moderated. Uncivil or churlish comments will likely get you banned immediately, so make any contribution count!

About Razib Khan

I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. In relation to nationality I'm an American Northwesterner, in politics I'm a reactionary, and as for religion I have none (I'm an atheist). If you want to know more, see the links at http://www.razib.com