October 13, 2011

Origin and evolution of word order

From the paper's conclusion:

The distribution of word order types in the world’s languages, interpreted in terms of the putative phylogenetic tree of human languages, strongly supports the hypothesis that the original word order in the ancestral language was SOV. Furthermore, in the vast majority of known cases (excluding diffusion), the direction of change has been almost uniformly SOV > SVO and, beyond that, primarily SVO > VSO/VOS. There is also evidence that the two extremely rare word orders, OVS and OSV, derive directly from SOV.

These conclusions cast doubt on the hypothesis of Bickerton that human language originally organized itself in terms of SVO word order. According to Bickerton, “languages that did fail to adopt SVO must surely have died out when the strict-order languages achieved embedding and complex structure” (50). Arguments based on creole languages may be answered by pointing out that they are usually derived from SVO languages. If there ever was a competition between SVO and SOV for world supremacy, our data leave no doubt that it was the SOV group that won. However, we hasten to add that we know of no evidence that SOV, SVO, or any other word order confers any selective advantage in evolution. In any case, the supposedly “universal” character of SVO word order (51) is not supported by the data.

Recent work in comparative linguistics suggests that all, or almost all, attested human languages may derive from a single earlier language. If that is so, then this language—like nearly all extant languages—most likely had a basic ordering of the subject (S), verb (V), and object (O) in a declarative sentence of the type “the man (S) killed (V) the bear (O).” When one compares the distribution of the existing structural types with the putative phylogenetic tree of human languages, four conclusions may be drawn. (i) The word order in the ancestral language was SOV. (ii) Except for cases of diffusion, the direction of syntactic change, when it occurs, has been for the most part SOV > SVO and, beyond that, SVO > VSO/VOS with a subsequent reversion to SVO occurring occasionally. Reversion to SOV occurs only through diffusion. (iii) Diffusion, although important, is not the dominant process in the evolution of word order. (iv) The two extremely rare word orders (OVS and OSV) derive directly from SOV.

26 comments:

This paper's results are in complete concordance with those of another recent study according to which word order of body language is SOV in all humans irrespective of the word order of the spoken language.

Surely you must know that Merritt Ruhlen is considered a CRACKPOT by the overwhelming majority of linguists. Whatever he says has next to zero impact in the mainstream. And the same could be said for most "linguistics" articles published through PNAS, btw. Not much to see here, sorry.

Surely you must know that Merritt Ruhlen is considered a CRACKPOT by the overwhelming majority of linguists.

I don't believe in democracy when it comes to truth, so I couldn't care less what "the overwhelming majority of linguists" think. I'll be happy to consider specific argued objections to this work or any other, but no argumentum ad populum.

Whatever he says has next to zero impact in the mainstream. And the same could be said for most "linguistics" articles published through PNAS, btw. Not much to see here, sorry.

I don't believe in peer-review or prestige journals either. If it's wrong, or if it's right, why is it wrong and why is it right? I'd evaluate it just the same no matter where it appeared.

Basque word order is not strictly SOV; it depends on what you mean to emphasize. Whatever that is, it goes before the verb (or before the auxiliary verb, if there is one).

You can say (SOV) 'Mikelek liburua dauka' (Mikel has the book, with emphasis on the book), but you can also say 'Mikelek dauka liburua' (SVO) with emphasis on Mikel (Mikelek), or you can even say 'liburua dauka Mikelek' (OVS). What you do not normally find is VSO or VOS (but OSV is also possible). SOV is somewhat more common, but it is just one among several possibilities (and these multiply once you include objects other than the direct one).

Hence I think this oversimplified grammatical structure debate is rather pointless. Not to mention the fantasy of "Nostratic" and "Dene-Caucasian" (are they still beating those dead horses?)

I have to agree that it verges on crackpottery and that Murray Gell-Mann really would have been better off sticking to physics. The data don't support the evidence, and the macrolinguistic categories of the Russian school that he uses really have no serious support from linguistic evidence.

The notion that Basque has a closer relationship to Chinese and Ket than to other languages has no support lexically, grammatically, or genetically. Neither do the alleged connections between, for example, Ainu and Indo-Hittite languages. The notion that American Indian languages are less related to the American Indian Na-Dene languages than to other American Indian languages is also dubious. It is all really rather random. In short, the extremely undetailed phylogenetic tree is poppycock.

There is evidence that sign language isolates that arise in communities of deaf people with no other language strongly tend towards SOV word order, but this partially has to do with the fact that it is a sign language rather than a verbal one. The chart they provide doesn't even show a consensus in any of the presumably oldest language groupings like Khoisan, Australian, Indo-Pacific and Amerind. And, as Maju indicates, their data set isn't even accurate.

I'm not even questioning the SOV thesis. It may be right, or may be wrong. The point is that definitive evidence won't come from studies like this one.

There are a lot of better papers around that hold the same core thesis without relying on Ruhlen's shady methodology, which linguists reject as quackery. But these high-quality papers just happen to be less grandiose and bombastic, and so are not good enough to make it into the news, I guess.

As for the other point you made: PNAS regularly publishes a lot of dubious linguistics papers, and is often used by people who want to circumvent peer review. Say what you want, but why would a linguist decide to do that?

Let me answer my own question: this paper we are talking about is not new. It has been around since at least 2005 in manuscript form and, typically for Ruhlen, has been cited by non-linguists and fringe linguists ever since. They probably tried to publish it elsewhere before and just couldn't, and so went for PNAS. Their target audience of non-specialists won't care anyway.

(Oh, and as we are speaking of notability: Gell-Mann is a Nobel prize-winning... physicist :P )

As for the other point you made: PNAS regularly publishes a lot of dubious linguistics papers, and is often used by people who want to circumvent peer review. Say what you want, but why would a linguist decide to do that?

There are many possible reasons why someone would want to circumvent peer review. Some people don't care about peer review at all. Some people have ideas that are out there and are rejected by peer review. And, some people are crackpots with bad ideas that are rightly rejected by peer review.

Whether something has been peer reviewed or not perhaps has some meaning for funding agencies, or journalists, or tenure committees, but it has no meaning to me. If this paper is wrong (I have no particular opinion on it), people can argue what's wrong with it: saying that they had to publish it in PNAS because it's crap is not, in my opinion, an argument.

The data don't support the evidence, and the macrolinguistic categories of the Russian school that he uses really have no serious support from linguistic evidence.

I am suspicious of the broad groupings as much as the next guy, and, indeed, I am not at all convinced of the monophyletic origin of language itself. Of course linguists debate language phylogeny even for well-studied groups like Indo-European, let alone all world languages, so it would be a good idea to show the robustness of the SOV inference across different proposed phylogenies.

@Dienekes

I never said it's an argument against the paper, only that it raises a red flag, just as the words "Merritt Ruhlen", "Nostratic" and "Dene-Caucasian" raise several red flags. It's a nightmare combination, frankly. Otherwise, I wouldn't even have mentioned it. Of course, good (and bad) scholarship can appear anywhere, any time, regardless of peer review, the prestige of the journal, etc. etc.

This paper's results are in complete concordance with those of another recent study according to which word order of body language is SOV in all humans irrespective of the word order of the spoken language.

Imagine trying to signal that person A is supposed to kill person B by pointing fingers and using the slashing hand-across-throat motion.

SVO is then very obviously ambiguous, OSV the same if not outright false, while SOV (especially with a slight break after S, a quick OV, and perhaps the "case-defining" nod to B right after pointing to A) is rather clear. ;)

There is evidence that sign language isolates that arise in communities of deaf people with no other language strongly tend towards SOV word order, but this partially has to do with the fact that it is a sign language rather than a verbal one.

There is also evidence that all humans use an SOV word order in sign or body language under natural conditions (unless they have been introduced to one of the recently created, synthetic and thus unnatural, sign languages), irrespective of the word order of the language they speak.

I have read these paragraphs several times, and they are self-contradictory. In the Gell-Mann/Ruhlen update we have, in successive sentences, the single earlier language being SVO and the ancestral language being SOV.

I suggest that general warning (W) sounds were the first 'words', which later evolved into 'subject' words (person, place, thing) for the various types of warnings in the most primitive speech. And these evolved together with, not separate from, action (A) gestures and sounds, which later evolved into simple verbs. Declarative statements grew from a simple one-sound S to S+V; questions grew from a simple V (with inflection) to V+S with inflection. Objects were often pointed to, nodded toward, or inferred until S's and V's became more complex and numerous and things in the cave got more complex. Only a suggestion. I guess we'd have to experiment with live subjects to really know, and it would take a few centuries.

PS: It did occur to me that in primitive speech the expletive was probably the very first sound/word, and the warning came immediately thereafter. But then it seemed only natural that the first warning was just that, an expletive.

"The Amerind macrofamily is one of the few that have languages with all six possible orders...Every branch except Almosan contains at least some SOV languages, and in many branches this order is either the only one found or overwhelmingly predominant (Keresiouan, Hokan, Tanoan, Chibchan, Paezan, Andean, Macro-Tucanoan, Macro-Panoan, Macro-Ge)...Given these data, the hypothesis that Proto-Amerind was an SOV language would seem to be the most parsimonious."

"There is also evidence that the two extremely rare word orders, OVS and OSV, derive directly from SOV."

This suggests, at least if we take the paper at face value, that Amerindian languages are the most diverse and that the rarest types that are unique to Amerindians (Austronesian and Austric have two more cases of OSV) are derived directly from the ancestral human SOV order. This is consistent with the fact that the greatest diversity of languages, as measured in terms of language stocks, is in the New World, followed by Papua New Guinea. It's also exactly the kind of picture I observed in Amerindian kinship systems (see Dziebel, The Genius of Kinship, 2007). This flies in the face of ideas that Amerindians recently colonized the New World. In fact, they may be the closest to the root.

@Dienekes and @To

It's true that most linguists think that Ruhlen is a quack. But typology is a whole different matter, as this is work in progress: linguists are just figuring out how to analyze languages typologically on a global scale and how to make diachronic inferences from typological data. So, in this case Ruhlen is just a pioneer with a lot of energy and a solid background in mathematics and linguistics. Notably, Edward Vajda, who provided proof for the Na-Dene-Ket connection (a proposal that met with wide approval from mainstream linguists), named Ruhlen's work on Na-Dene and Ket as one of his inspirations. Ironically, however, Vajda gives Ruhlen special credit for the famous cognate for 'birch; birch bark', but in this specific case Ruhlen was likely wrong, as it's a borrowing from Selkup into Ket and not an inherited part of the vocabulary (at least per Lyle Campbell).

"The Amerind macrofamily is one of the few that have languages with all six possible orders...Every branch except Almosan contains at least some SOV languages,"

This contains two good examples of the kind of thing that discredits this paper. "Amerind" is undemonstrated, and there has been plenty of time since it was proposed for someone to have made it look a lot more valid than it so far still does. It's one thing to buck the community, but at some point the voice crying in the wilderness starts to sound like it is preaching a flat earth. You can't be the only one in the formation who is in step. So that's a fail.

Almosan is not a branch of anything, it is a Sprachbund and any similarities in it can hardly shed any light on a proto-language that by definition cannot exist.

Merritt Ruhlen, like his mentor Joseph Greenberg, who introduced the S, O, V typology, believes that very distant historical relationships between languages are identifiable. Many historical linguists may disagree, but this point of view is not regarded as that of a "crackpot" or "quack."

Having said that, I must take exception to the article's assumption that word order necessarily signals syntax. Word order also has other functions which can be characterized as rhetorical. In many languages inflections do much of the work of signalling syntactical relations and word order plays a mostly rhetorical role.

Another big problem with the phylogeny approach is that we know next to nothing about the early history of human language and the paper isn't very explicit about what assumptions it is based upon in that regard.

Even if language has its origins in H. sapiens and not earlier, there are 150,000 years or so from H. sapiens in Africa to Out of Africa, give or take a dozen or two millennia, and another 50,000 years from their first appearance in the Levant to signs of a H. sapiens presence in Australia and New Guinea. A common language's dialects can diversify enough to become mutually unintelligible in something on the order of 1,000 years. SVO/SOV/etc. word order is susceptible enough to change that almost every language family for which we have any appreciable sample size has multiple word orders. In the 15,000-year period of Native American languages, which originated from a founding population that may have been as small as 75 reproducing adults (even if they had more than one language, they can't have spoken all that many languages), every single variation in word order ultimately emerged. The evidence from this study alone that word order is not stable at great time depth is overwhelming.

Also, using individual languages on a more or less equal basis as the unit of analysis is unsound, because language families have expanded and diversified at unequal rates. It isn't sound to overweight the Romance languages or Indo-Aryan languages when trying to look at the deep past, because all Romance languages came from a single Latin language in the last 1,500 years and all Indo-Aryan languages came from a single Sanskrit language in the last three thousand or so years. You need good information about the linguistic mix at the oldest date possible, including now-dead languages, to have any hope of discerning what languages looked like at an earlier period.

While working out a full-fledged proto-language as a prerequisite for establishing a typology is probably overkill, the notion that you need to fit your hypothesis to a family tree of languages, rather than employing mass comparison on an every-language-is-created-equal basis, doesn't work unless you have a trait comparison (of something like common lexical roots) rich enough to infer a family tree from. Tracking a single variable with six values that can arise, and have arisen, independently isn't sufficiently rich to support that kind of analysis.

A good analysis by Gell-Mann and Ruhlen would have concluded that noise utterly overwhelms signal in this case and that there is no way to know what word order was used in an ultimate proto-language.

I once was cycling along a beach with my dogs (a Jack Russell and a large lurcher). The lurcher was in all the bushes but the Jack Russell stayed close. After a while I noticed a seagull was trailing me and making a noise like a little yappy dog. A few minutes later the lurcher emerged from the bushes and a second seagull started to make a deep big dog noise (completely unlike my lurcher's high pitched yip). We travelled down the beach for some time with the chorus of alarm following us. littledog bigdog, littledog bigdog

My point is that I agree warnings are probably the first words and language is much more widespread than I had imagined at that time.

Having said that, I must take exception to the article's assumption that word order necessarily signals syntax. Word order also has other functions which can be characterized as rhetorical. In many languages inflections do much of the work of signalling syntactical relations and word order plays a mostly rhetorical role.

But there is also a default word order in all languages, and that is what the authors of this paper mean by word order.

Probably not. Nor the one related to the other side of the human 'coin'. It was probably something similar to a warning noise made today by baboons, and easy to scream. I think your suggestion and "Holy Moly Batman!" and "Shaaaazam!" came a few generations later. You're on the right track, no bout a'doubt it.

I myself am a bit surprised to see PNAS let through a paper that takes for granted linguistic groupings dismissed by most linguists as spurious. Obviously, first Greenberg's and now Gell-Mann's membership and reputation allowed Ruhlen to leap-frog some barriers to get published. Earlier, PNAS published Ruhlen's paper on Na-Dene-Ket.

On the other hand, the paper could've avoided "Amerind", "Almosan", "Khoisan", etc. altogether and mapped the distribution of word order types geographically. For the purposes of the paper, these phylogenetic groupings are irrelevant: they don't add value, but they don't invalidate the findings either.

@Andrew

" In the 15,000 year period of Native American languages which originated from a founding population that may have been as small as 75 reproducing adults (even if they had more than one language they can't have spoken all that many languages), every single variation in word order ultimately emerged. The evidence that word order is not stable at great time depth from this study alone is overwhelming."

The rare word order types are also recorded outside of America, namely in Australia, PNG, Oceania and SE Asia. This suggests that it's an ancient pattern of variation largely superseded by expansive SOV, SVO and VSO. If I were to hypothesize what word order variation existed among the latest hominids and early humans, I probably wouldn't bet on SOV being the absolute root; I'd say it spanned the whole gamut, and then 3 out of 6 became dominant on all continents. Notably, these rare types aren't found in Africa or Western Eurasia. So, the diversity is reduced there, which is consistent with other linguistic findings.

"A good analysis by Gell-Mann and Ruhlen would have concluded that noise utterly overwhelms signal in this case"

Your 75 reproducing adults who colonized the New World 15,000 years ago (and killed off all of the megafauna, right?) is a myth. So, it's not even on the level of noise. It's more like the belief that Amerindians originated from the lost tribes of Israel.

It is true that Greenberg and Ruhlen are "pioneers" in the sense you describe. However, the fact is that they tend to avoid most of the hard, painstaking comparative work that historical linguists consider essential, and instead focus on dubious lists of "cognates" as "proof". I can tell you from experience that it takes linguists a lot more "energy" to sort out cognates in the "traditional" way than it takes to assemble such lists.

Also, if Ruhlen himself recognized that this work is only a starting point, fine; linguists wouldn't mind him at all. The real problem, however, is that he and his associates don't. They take these starting points for the "real deal", and then just repeat their controversial conclusions (often to non-specialist audiences, and typically circumventing peer review) as if they were almost self-evident.

@Charles

IMO, your first two claims are incorrect. Most people agree that you can eventually establish long-range genetic relationships. However, there is an accepted method for that (which is hard and demanding, sure, but necessary), and Ruhlen refuses to follow it. And yes, I think it is an indisputable fact that most historical linguists consider him a quack. Now, whether that is fair or not is another matter.

"However, the fact is that they tend to avoid most of the hard, painstaking comparative work that historical linguists consider essential - and instead focus on dubious lists of "cognates" as "proof"."

I agree with the spirit of your comment. If linguistics were the only science in town, your criticism would be 200% valid. At this point, it's probably 90% valid. And the reason it's not 100% valid is that both Greenberg and Ruhlen (and I know him personally, just as well as I know his main detractors) have been influenced by methodologies widely practiced outside of linguistics, especially in biology. Geneticists never sample the whole population (they generally sample .0001 of it), yet they arrive at world-historic conclusions. And we don't even know how many hot spots are out there to be sure of gene identity by descent rather than by coincidence.

Again, I completely agree with you when it comes to Greenberg's and Ruhlen's treatment of linguistic data, if I assume a traditional linguistic perspective. What makes it interesting to me is that some very mainstream geneticists eagerly latch onto their classifications to support both out of Africa and the recent peopling of the Americas, seeing a lot of correspondence between linguistics and genetics. If we subsume all 140 language stocks in the Americas under "Amerind", we can easily compare it to, say, Y-DNA hg Q, which is found everywhere in America. The problem is that the Amerind classification is bogus from the traditional linguistic point of view. How can it fit genetic data?

"I can tell you from experience that it takes linguists a lot more "energy" to sort out cognates in the "traditional" way than it takes to assemble such lists."

I can tell you from my experience that even traditional linguists don't do due diligence on correctly identifying cognate sets. They routinely avoid patterns of semantic evolution and artificially divorce relatable data into 2 or 3 different cognate sets. I wrote abundantly about this problem with Indo-European reconstructions, but only in Russian at this point. Don't try to open my website: it's been attacked by some cyberspammers and I haven't fixed it yet. My apologies.

"I can tell you from my experience that even traditional linguists don't do due diligence on correctly identifying cognate sets. They routinely avoid patterns of semantic evolution and artificially divorce relatable data into 2 or 3 different cognate sets. "

Oh God, and isn't it always the same peevologists who whine about people "misusing words" (semantic drift) who are the ones who deny semantic drift in language evolution and insist on denying perfectly obvious cognate sets? By their prissy standards most of the cognate list for Germanic is bunk.

"Oh God, and isn't it always the same peevologists who whine about people "misusing words" (semantic drift) who are the ones who deny semantic drift in language evolution and insist on denying perfectly obvious cognate sets? By their prissy standards most of the cognate list for Germanic is bunk."

I love how people without even a family name spitball insults in the name of Science. Just because a few smart men 200 years ago put those obvious Germanic cognate sets together for your free enjoyment doesn't mean you stop being a dud yourself.


Dienekes' Anthropology blog is dedicated to human population genetics, physical anthropology, archaeology, and history.

You are free to reuse any of the materials of this blog for non-commercial purposes, as long as you attribute them to Dienekes Pontikos and provide a link to either the individual blog entry or to Dienekes Anthropology Blog.
