Category Archives: Austronesian

This ran first a long time ago, but I just sold an ad on this post, so I decided to repost it. Rereading it, it’s a great Historical Linguistics post.

One of the reasons that I am doing this post is that one of my commenters asked me a while back to do a post on the theories of long-range comparison like Joseph Greenberg’s and how well they hold up. That will have to wait for another day, but for now, I can at least show you some principles of Historical Linguistics, a subfield that I know a thing or two about. I will keep this post pretty non-technical, so most of you ought to be able to figure out what is going on.

Let us begin by looking at some proposals about the classification of Vietnamese.

The Vietnamese language has been subject to a great deal of speculation regarding its classification. At the moment, it is in the Mon-Khmer or Austroasiatic family with Khmer, Mon, Muong, Wa, Palaung, Nicobarese, Khmu, Munda, Santali, Pnar, Khasi, Temiar, and some others. The family ranges through Vietnam, Cambodia, Laos, Thailand, Malaysia, Burma, China, and over into Northeastern India.

The homeland of the Austroasiatics was probably in Yunnan, in Southwest China. They probably moved down from China around 5,000 years ago. Some of the most ancient Austroasiatics are probably the Senoi people, who came down from China into Malaysia about 4,000 years ago. Others put the time frame at about 4,000-8,000 YBP (years before present).

A major fraud has been perpetrated lately based on Senoi Dream Therapy. I discussed it on the old blog, and you can Google it if you are interested. In Anthropology classes we learned all about these fascinating Senoi people, who based their lives around their dreams. Turns out most of the fieldwork was poor to fraudulent like Margaret Mead’s unfortunate sojourn in the South Pacific.

The Senoi resemble Veddas of India, so it is probably true that they are ancient people. Also, their skulls have Australoid features. In hair, they mostly have wavy hair (like Veddoids), a few have straight hair (like Mongoloids) and a scattering have woolly hair (like Negritos). Bottom line is that ancient Austroasiatics were probably Australoid types who resembled what the Senoi look like today.

There has long been a line arguing that the Vietnamese language is related to Sino-Tibetan (the family that Chinese is a part of). Even those who deny this acknowledge that there is a tremendous amount of borrowing from Chinese (especially Cantonese) to Vietnamese. This level of borrowing so long ago makes historical linguistics a difficult field.

Here is an excellent piece by a man who has done a tremendous amount of work detailing his case for Vietnamese as a Sino-Tibetan language. It’s not for the amateur, but if you want to dip into it, go ahead. I spent some time there, and after a while, I was convinced that Vietnamese was indeed a Sino-Tibetan language. One of the things that convinced me is that if borrowing was involved, seldom have I seen such a case for such a huge amount of borrowing, in particular of basic vocabulary. I figured the case was sealed.

Not so fast now.

Looking again, and reading some of Joseph Greenberg’s work on the subject, I am now convinced otherwise. There is a serious problem with the cognates between Vietnamese and Chinese, of which there are a tremendous number.

This problem is somewhat complex, but I will try to simplify it. Briefly, if Vietnamese is indeed related to Sino-Tibetan, its cognates should be not only with Chinese, but with other members of Sino-Tibetan also. In other words, we should find cognates with Tibetan, Naga, Naxi, Tujia, Karen, Lolo, Kuki, Nung, Jingpho, Chin, Lepcha, etc. We should also find cognates with those languages, where we do not find them in Chinese. That’s a little complicated, so I will let you think about it a bit.

Further, the comparisons between Chinese and Vietnamese should be variable. Some should look quite close, while others should look much more distant.

So there’s a problem with the Vietnamese as ST theory.

The cognates look like Chinese.

Problem is, they look too much like Chinese. They look more like Chinese than they should in a genetic relationship. Further, they look like Chinese and only Chinese. When we look for relationships in S-T outside of Chinese, we find few if any.

That’s a dead ringer for borrowing from Chinese to Vietnamese. If it’s not clear to you how that is, think about it a bit.
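The reasoning above can be put as a rough rule of thumb: if nearly all of a language's lookalikes with a family sit in a single member of that family, suspect borrowing; if they are spread across many members, including distant ones, inheritance looks more likely. Here is a minimal Python sketch of that idea. The function name, the 0.9 threshold and the counts in the usage example are all invented for illustration; they are not real cognate counts and not an established method.

```python
from typing import Dict


def classify_lookalikes(matches_by_language: Dict[str, int],
                        dominant_share: float = 0.9) -> str:
    """Crude heuristic: how concentrated are the lookalikes in one language?

    matches_by_language maps each member of the candidate family to the
    number of lookalike words found there. The 0.9 cutoff is an arbitrary
    assumption chosen for illustration only.
    """
    total = sum(matches_by_language.values())
    if total == 0:
        return "no evidence"
    top = max(matches_by_language.values())
    if top / total >= dominant_share:
        # Almost everything points at a single language: the borrowing pattern.
        return "borrowing-like"
    # Matches are spread over several members: the inheritance pattern.
    return "inheritance-like"


# Invented placeholder counts, not real data.
print(classify_lookalikes({"member_1": 95, "member_2": 3, "member_3": 2}))
print(classify_lookalikes({"member_1": 30, "member_2": 25,
                           "member_3": 20, "member_4": 25}))
```

A real comparison would also weigh how regular the sound correspondences are and how basic the vocabulary is, which this toy ignores.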

Looking at Mon-Khmer, the case is not so open and shut. There seem to be more cognates with Chinese than with Mon-Khmer. So many more that the case for Vietnamese as AA looks almost silly, and you wonder how anyone came up with it.

But let us look again. The cognates with AA and Vietnamese are not just with its immediate neighbors like Cambodian and Khmu but with languages far off in far Eastern India like Munda and Santali. There are words that are found only in the Munda branch in one or two obscure languages that somehow show up again as cognates in Vietnamese.

Now tell me how Vietnamese borrowed ancient basic vocabulary from some obscure Munda tongue way over in Northeast India? It did not. How did those words end up in some unheard of NE Indian tongue and also in Vietnamese? Simple. They both descended long ago from a common ancestor. This is Historical Linguistics.

The concepts I have dealt with here are not easy for the non-specialist to figure out, but most smart people can probably get a grasp on them.

A different subject is the deep relationships of AA. Is AA related to any other languages? I leave that as an open question now, though there does appear to be a good case for AA being related to Austronesian.

One good piece of evidence is the obscure AA languages found in the Nicobar Islands off the coast of Thailand. Somehow, we see quite a few cognates in Nicobarese with Austronesian. We do not see them in any other branches of AA, only in Nicobarese. This seems odd, and it’s hard to make a case for borrowing. On the other hand, why cognates in Nicobarese and only in Nicobarese?

Truth is there are some cognates outside of Nicobarese but not a whole lot. In historical linguistics, one thing we look at is morphology. Those are parts of words, like the -s plural ending in English.

In both AA and Austronesian, we have funny particles called infixes. Those are what in English we might call prefixes or suffixes, except they are stuck in the middle of the word instead of at the end or the beginning. So, in English, we have pre- as a prefix meaning “before” and -er meaning “object that does X verb”. So pre-destination means that our lives are figured out before we are even born. Comput-er and print-er are two objects, one that computes and the other that prints.

If we had infixes instead, pre-destination would look something like destin-pre-ation and comput-er and print-er would look something like com-er-pute and prin-er-t.
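For anyone who thinks better in code, the made-up English examples above amount to splicing an affix into the middle of a stem. A minimal Python sketch follows; the function name and the choice to pass the insertion point as a character position are my own assumptions for illustration, since real infixes are placed by phonological rules, not by counting letters.

```python
def insert_infix(stem: str, infix: str, position: int) -> str:
    """Insert an affix inside a stem, marking the seams with hyphens.

    position is the number of stem characters that come before the infix.
    It must fall strictly inside the stem; otherwise the affix would be a
    prefix or a suffix, not an infix.
    """
    if not 0 < position < len(stem):
        raise ValueError("an infix must land strictly inside the stem")
    return f"{stem[:position]}-{infix}-{stem[position:]}"


print(insert_infix("destination", "pre", 6))  # destin-pre-ation
print(insert_infix("compute", "er", 3))       # com-er-pute
print(insert_infix("print", "er", 4))         # prin-er-t
```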

Anyway, there are some fairly obscure infixes that show up not only in some isolated languages in AA but also in far-flung Austronesian languages in, say, the Philippines. Ever heard of the borrowing of an infix? Neither have I. So were those infixes borrowed, and what are they doing in languages as far away as Thailand and the Philippines, and none in between? Because they got borrowed? When? How? Forget it.

Bottom line is that said borrowing did not happen. So what are those infix cognates doing there? They are probably ancient particles left over from a common language from which both Austronesian and AA descended, probably spoken somewhere in SW China maybe 9,000 years ago or more.

Why is this sort of long-range comparison so hard? For one thing, after 9,000 years or more, there are hardly any cognates left, because languages keep changing, and they tend to change at a certain rate.

After 1000X years, so much change has taken place that even if two languages were once “sprung from a common source,” in the famous words of Sir William Jones in his epochal lecture to the Asiatic Society in Calcutta on February 2, 1786, there is almost nothing, or actually nothing, left to show of that relationship. Any common words have become so mangled by time that they don’t look much or anything alike anymore.
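To put rough numbers on this, here is a small Python sketch of the classic glottochronology calculation. It assumes, as that method does, that each language independently keeps about 86% of a 100-item basic word list per thousand years. Both the constant and the method are contested, and neither comes from this post; treat the figures as an illustration of the decay curve, not as a dating tool.

```python
# Assumed retention rate per millennium for a 100-item basic word list.
# This is the classic glottochronology constant, not a settled fact.
RETENTION_PER_MILLENNIUM = 0.86


def shared_fraction(millennia_since_split: float,
                    retention: float = RETENTION_PER_MILLENNIUM) -> float:
    """Expected share of the word list still cognate in BOTH daughters.

    Each daughter keeps retention ** t of the list on its own, so the
    share retained by both at once is retention ** (2 * t).
    """
    return retention ** (2 * millennia_since_split)


for t in (1, 3, 6, 9):
    print(f"{t},000 years after the split: {shared_fraction(t):.1%} shared")
```

Under these assumptions, two languages that split 9,000 years ago would share well under a tenth of the list, which on a 100-word list is a handful of words, and those few survivors would themselves be heavily altered by sound change.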

So are AA and Austronesian related? I think so, but I suppose it’s best to say that it has not been proven yet. This thesis is part of a larger long-range concept known as “Austric.” Paul Benedict, a great scholar, was one of the champions of this. Austric is normally made up of AA, Austronesian, Tai-Kadai (the Thai language and its relatives) and Hmong-Mien (the Hmong and Mien languages). Based on genetics, the depth of Austric may be as deep as 30,000 years, so proving it is going to be a tall order indeed.

What do I think?

I think Tai-Kadai and Austronesian are proven to be related (more on that later). AA and Austronesian seem to be related also, with a lesser depth of proof. Hmong-Mien seems to be related to Sino-Tibetan, not Austric.

The case for Vietnamese being related to S-T is still very interesting, and I still have an open mind about it.

All of these discussions are hotly controversial, and mentioning them in linguistics circles is likely to set tempers flaring.

Results. A ratings system was designed in terms of how difficult it would be for an English-language speaker to learn the language. In the case of English, English was judged according to how hard it would be for a non-English speaker to learn the language. Speaking, reading and writing were all considered.

This post will look at the Tsou language in terms of how difficult it would be for an English speaker to learn it.

Austro-Tai

Austronesian

Formosan

Tsouic

Tsou is a Taiwanese aborigine language spoken by about 2,000 people in Taiwan.

Tsou is also ergative like most Formosan languages. Tsou is the only language in the world that has no prepositions or anything that looks like a preposition. Instead it uses nouns and verbs in the place of prepositions. Tsou allows more potential consonant clusters than most other languages.

About 1/2 of all possible CC clusters are allowed. Tsou has an inclusive/exclusive distinction in the 1st person plural and a very strange visible and non-visible distinction in the 3rd person singular and plural. Both adjectives and adverbs can turn into verbs, as they are marked for voice in the same way that verbs are. Verbs are extensively marked for voice.

Nouns are marked for a variety of odd cases, often referring to perception (visible/invisible) and person and place deixis:

'e "visible and near speaker"
si/ta "visible and near hearer"
ta "visible but away from speaker"
'o/to "invisible and far away or newly introduced to discourse"
na/no ~ ne "non-identifiable and non-referential"*
*often when scanning a class of elements

Kwaio is an Austronesian language spoken by 13,000 people in the center of Malaita Island in the Solomon Islands. It has four different forms of number to mark pronouns – not only the usual singular and plural but also the rarer dual and the very rare paucal. In addition, there is an inclusive/exclusive contrast in the non-singular forms.
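To see how quickly these distinctions multiply, here is a small Python sketch that enumerates the pronoun slots such a system implies. It rests on two assumptions of mine, not on a Kwaio grammar: that the inclusive/exclusive contrast applies only to the first person (where such contrasts normally live), and that every remaining person-number combination has its own form.

```python
from itertools import product

PERSONS = ("1st", "2nd", "3rd")
NUMBERS = ("singular", "dual", "paucal", "plural")


def pronoun_categories():
    """List (person, number, clusivity) slots under the stated assumptions."""
    categories = []
    for person, number in product(PERSONS, NUMBERS):
        if person == "1st" and number != "singular":
            # "We" can include or exclude the person spoken to.
            categories.append((person, number, "inclusive"))
            categories.append((person, number, "exclusive"))
        else:
            categories.append((person, number, None))
    return categories


print(len(pronoun_categories()))  # 15
```

English gets by with a handful of distinct forms for the same job, which gives a feel for the extra load on a learner.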

Sakao is a very strange language spoken by 4,000 people in Vanuatu. It is a polysynthetic Austronesian language, which is very weird. It allows extreme consonant clusters. Sakao has an incredible seven degrees of deixis. The language has an amazing four numbers: singular, dual, paucal and plural. The neighboring language Tolomako has singular, dual, trial and plural. The trial form is very odd. Sakao’s paucal derived from Tolomako’s trial:

jørðœl
“they, from three to ten”

jørðœl løn
“the five of them” (Literally, “they three, five”)

All nouns are always in the singular except for kinship forms and demonstratives, which are the only ones that display the plural:

ðjœɣ – “my mother/aunt” -> rðjœɣ – “my aunts”

walðyɣ – “my child” -> raalðyɣ – “my children”

It has a number of nouns that are said to be “inalienably possessed”, that is, whenever they occur, they must be possessed by some possessor. These often take highly irregular inflections:

Here, “mouth” is either œsɨŋœ-, ɔsɨŋɔ- or œsœŋ-, and “hair” is either uly-, ulœ- or nøl-

Sakao, strangely enough, may not even have syllables in the way that we normally think of them. If it does have syllables at all, they would appear to be at least a vowel optionally surrounded by any number of consonants.

This post will look at the Tagalog language in terms of how difficult it would be for an English speaker to learn it.

Philippine
Greater Central Philippine
Central Philippine
Tagalog

We recently looked at two easy languages, Bahasa Indonesia and Malay, which are actually two forms of the same language. A well-known nearby language is Tagalog, the national language of the Philippines. Tagalog is much harder than Malay or Indonesian. Compared to many European languages, Tagalog syntax, morphology and semantics are often quite different. Also, Tagalog is typically spoken very fast. Unlike Malay, verbs conjugate quite a bit in Tagalog. The main idea of Tagalog grammar is something called focus. Once you figure that out, the language gets pretty easy, but until you understand that concept, you are going to have a hard time. Everything is affixed in Tagalog.

This post will look at the Malay and Bahasa Indonesia languages in terms of how difficult it would be for an English speaker to learn them.

Malayo-Polynesian
Malayo-Chamic
Malayic
Malay

Bahasa Indonesia is an easy language to learn. For one thing, the grammar is dead simple. There are only a handful of prefixes, only two of which might be seen as inflectional. There are also several suffixes. Verbs are not marked for tense at all. And the sound system of these languages, in common with Austronesian in general, is one of the simplest on Earth, with only two dozen phonemes. Bahasa Indonesia has few homonyms, homophones, homographs, or heteronyms. Words in general have only one meaning.

Though the orthography is not completely phonetic, it only has a small number of nonphonetic exceptions. The orthography is one of the easiest on Earth to use.

The system for converting words into either nouns or verbs is regular. To make a plural, you simply repeat a word, so instead of saying “pencils,” you say “pencil pencil.”
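The plural rule really is simple enough to fit in one line of code. Here is a minimal Python sketch; the function name is my own, and it only covers plain full reduplication, written with a hyphen as in standard Indonesian spelling. It does not attempt the cases where reduplication means something other than plural.

```python
def reduplicated_plural(noun: str) -> str:
    """Form a plural by repeating the whole noun, joined by a hyphen."""
    return f"{noun}-{noun}"


print(reduplicated_plural("pensil"))  # pensil-pensil ("pencils")
print(reduplicated_plural("buku"))    # buku-buku ("books")
```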

Bahasa Indonesia gets a 1.5 rating, extremely easy to learn.

Malay is only easy if you learn the standard spoken form or one of the creoles. Learning the literary language is quite a bit more difficult. However, the Jawi script, which is Malay written in Arabic script, is often considered to be perfectly awful.

This post will look at the Tsou language in terms of how difficult it would be for an English speaker to learn it.

Austro-Tai
Austronesian
Tsouic

Tsou is a Taiwanese aborigine language spoken by about 2,000 people in Taiwan. It has the odd feature whereby the underlying glides y and w turn into or surface as non-syllabic mid vowels e̯ and o̯ in certain contexts:

jo~joskɨ -> e̯oˈe̯oskɨ = “fishes”

Tsou is also ergative like most Formosan languages. Tsou is the only language in the world that has no prepositions or anything that looks like a preposition. Instead it uses nouns and verbs in the place of prepositions. Tsou allows more potential consonant clusters than most other languages. About 1/2 of all possible CC clusters are allowed.

Tsou has an inclusive/exclusive distinction in the 1st person plural and a very strange visible and non-visible distinction in the 3rd person singular and plural. Both adjectives and adverbs can turn into verbs and are marked for voice in the same way that verbs are. Verbs are extensively marked for voice. Nouns are marked for a variety of odd cases, often referring to perception (visible/invisible), person, and place deixis. The place deixis cases can be seen below:

‘e – visible and near speaker
si/ta – visible and near hearer
ta – visible but away from speaker
‘o/to – invisible and far away, or newly introduced to discourse
na/no ~ ne – non-identifiable and non-referential (often when scanning a class of elements)

A fellow who I believe is Chinese came to the site a while back with some very interesting ideas about the earliest speakers of the Tai-Kadai languages, of which Thai and Lao are the most famous. His statement is in blockquotes below.

He argues for a close relationship between Austronesian and Tai-Kadai, two huge language families in Southeast Asia and Oceania. Tai-Kadai researchers have long opposed this notion, including a professor who I worked with quite a bit while obtaining my Master’s Degree.

French linguist Laurent Sagart has recently proven to my satisfaction that Austronesian and Tai-Kadai are indeed related. I have looked over the evidence, and it looks very good. Sagart is clearly an expert on the language families of the region, including Sino-Tibetan, Tai-Kadai and Austronesian.

However, the field has not yet accepted Austro-Tai. Historical Linguistics has become so conservative in recent years that one wonders whether any new prominent language families will ever be proven to the satisfaction of the field. In this sense, ultra-conservative “scientism” has clearly taken over Diachronic Linguistics, and the only people making any headway these days are the trailblazers who are practicing what boils down to “fringe science” and are expectedly being trashed from here to Kingdom Come for not going along with the ultra-conservative mindset of the day.

The problem is that, as with cryptozoology, psi, ghosts, UFOs and so many other fields, ultraconservative people practicing scientism and not science have set up the biggest roadblocks imaginable to dismantling any paradigms or in fact discovering anything new or breathtaking.

Modern science reminds me of the Catholic Church in the Middle Ages. It’s another faith-based fundamentalist philosophy. I guess we already know everything there is to know, and there’s nothing more to learn. In fact, incredibly, some scientistic practitioners are actually making statements along these lines.

Sagart’s new language would be called Austro-Tai, from which two branches, Tai-Kadai and Austronesian, descended. We know that the homeland of the Austronesians was in Taiwan and on the mainland adjacent to Taiwan possibly 5,000 YBP. From there, they mostly spread to the east – to the Philippines, Malaysia, Indonesia, Polynesia, Melanesia and Micronesia, with some going back to Mainland Southeast Asia (most prominently the Malay, but also the Chams, etc.)

That Tai-Kadai and Austronesian were together as a macro-language on and west of Taiwan over 5,000 years ago makes intuitive sense on a lot of levels. They split up, with Tai-Kadai moving west and inland and Austronesian moving out to the islands to the east as the Lapita Culture.

Here it is below, with some edits and additions:

I have some words about the Zhuang to tell you. First of all, your article claims that the Proto-Tai came from Central Asia. That’s a questionable study. The most recent research on linguistics has revealed that the Proto-Tai-Kadai migrated back from Taiwan and they are closely related to the Austronesians.

The basic lexicon of the two branches Hlai and Kadai in the Tai-Kadai language family shows a striking similarity to Austronesian, e.g. Indonesian. However, examining the Tai branch, linguists see that the original lexicon in the Tai branch was replaced by some other linguistic stock. That shows linguistic contact between Proto-Tai and other groups in ancient times, and genetic mixing may also have taken place.

In conclusion, according to linguistic studies, the original Tai-Kadai Urheimat may have been on Austronesian-inhabited Taiwan. Then later, when moving back to the mainland of Southern China, they probably mixed with other ethnolinguistic groups.

It’s also worth mentioning that a trace of an old Kam-Tai language from 2-3,000 YBP, an earlier form of Proto-Tai, has been discovered in the southern part of the ancient Chu State (1030 BC–223 BC) by comparing the non-Sinitic words on unearthed inscription materials with reconstructed Old Chinese.

This indicates that the geographic distribution of Proto-Tai speakers may have been quite different from our current understanding. And the identity of the group that they mixed with that replaced much of their original Austro-Tai lexicon is still not known. The location of Tai-Kadai speakers, especially the present-day Tai speakers in Yunnan in South China is quite a ways away from the location of most Austronesian speakers such as Malay and Indonesian speakers in Mainland and Island Southeast Asia.

The Tamil-Japonic connection isn’t quite as off the wall as one might think at first glance. There’s apparently a strong Andaman-Indonesian language connection. The convention of repeat plurals seems to have found its way to Japan. There’s also some similarity between the Finno-Ugric languages, which are Uralic outliers in a sea of Indo-European languages, and Dravidian languages that have a remnant in Pakistan. Contact between proto-Dravidian-Uralic and Altaic languages is a real possibility.

If Uralic is close to anything, it is close to Altaic and Indo-European and probably even closer to Chukotko-Kamchatkan, Eskimo-Aleut, Yukaghir and Nivkh. Yukaghir may actually be Uralic itself, or maybe the family is called “Uralic-Yukaghir.”

There is no connection between Austronesian (Indonesian) and the Andaman Islanders. Austronesian is indeed related to Thai though (Austro-Tai); in my opinion, this has been proven. If the Andaman languages are related to anything at all, they may be related to some Papuan languages and an isolate in Nepal called Kusunda. A good case can be made connecting Kusunda with some of the Papuan languages.

Typology is not that great a way to classify. Typology is areal, and it spreads via convergence. What you are looking for in a search for genetic relationships among languages, more than anything else, is morphology. After that, a nice set of cognates.

There is probably no connection between Dravidian and Uralic in particular. Dravidian is outside of most everything in Eurasia. If it is close to anything, it might be close to Afro-Asiatic. There also looks to be a connection with Elamite.

Dravidian and Afro-Asiatic are probably older than the rest of the Eurasian languages, and they were located further to the south. Afro-Asiatic is very old, probably ~15,000 YBP.