Can linguistic features reveal time depths as deep as 50,000 years ago?

Throughout much of our history, language was transitory, existing only briefly within its speech community. The invention of writing systems provided a way of recording some of its recent history, but for the most part linguists lack anything equivalent to the stone tools archaeologists use to explore the early history of ancient technological industries. The question of how far back we can trace the history of languages is therefore an immensely important, and highly difficult, one to answer. However, it’s not impossible. Just as biologists use highly conserved genes to probe the deepest branches on the tree of life, some linguists argue that highly stable linguistic features hold the promise of tracing ancestral relations between the world’s languages.

Previous attempts to infer the relatedness of languages from cognates are generally limited to the last 6,000 to 10,000 years. In the present study, Greenhill et al (2010) set out to examine linguistic features that might be more stable than the lexicon, arguing:

If some typological features are consistently stable within language families, and resistant to borrowing, then they might hold the key to uncovering relationships at far deeper levels than previously possible. For example, Nichols (1994) uses typological features to argue for a spread of languages and cultures around the Pacific Rim, connecting Australia, Papua New Guinea, Asia, Russia, Siberia, Alaska and the western coasts of North and South America. If this is correct, then these typological features must be reflecting time depths at least 16 000 years and possibly as deep as 50 000 years ago.

Still, to get the most information possible, it’s best to use a large corpus reflecting the diversity of the world’s languages. This is where the World Atlas of Language Structures (WALS) comes in: it contains a vast body of information about 141 typological features across 2,561 languages. It’s a great resource, comparable to the online databases available to geneticists, and Greenhill et al apply phylogenetic analyses to its typological data. They divide their approach into three parts. First, they apply a network method to the observed patterns of typological variability to look for any deep signal in the data. Second, they “quantify the fit of typological and lexical features onto known family trees for two of the world’s largest and best-studied language families”, Indo-European and Austronesian. Lastly, they estimate the rates of evolution of typological and lexical features within these families and compare the two.

Using a network technique (see figure below), the authors visualise the divergence between languages through the length of the branches, with box-like structures representing conflicting signals, where different typological features support incompatible language groupings. So if typological features are stable, we would expect the groupings to reflect known linguistic history, with relatively little conflicting signal. Conversely, typological features that evolve too rapidly, or that diffuse between neighbouring languages, will produce a star-like network with many boxes and little clear grouping.
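Network methods of this kind, such as NeighborNet, usually start from a matrix of pairwise distances between languages rather than from the raw features. The sketch below is a minimal illustration under our own assumptions: it computes such a matrix from binary feature codings, skipping features that are missing in either language, as sparse WALS-style data would require. The language labels and feature values are hypothetical, and the code neither builds the network itself nor reproduces how Greenhill et al coded the WALS data.

```python
from itertools import combinations

# Hypothetical toy data (NOT taken from WALS): four unnamed languages coded
# for five binary typological features; None marks a feature with no data.
FEATURES = {
    "A": [1, 0, 1, 1, None],
    "B": [1, 1, 1, 0, 1],
    "C": [1, 1, 0, 0, 1],
    "D": [0, 1, 0, 1, 0],
}


def feature_distance(x, y):
    """Proportion of differing features, counting only features coded in both languages."""
    shared = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not shared:
        return float("nan")  # no overlapping data, so the distance is undefined
    return sum(a != b for a, b in shared) / len(shared)


def distance_matrix(data):
    """Return sorted language names and a symmetric dict keyed by (language, language)."""
    names = sorted(data)
    dist = {(n, n): 0.0 for n in names}
    for p, q in combinations(names, 2):
        dist[p, q] = dist[q, p] = feature_distance(data[p], data[q])
    return names, dist


names, dist = distance_matrix(FEATURES)
for p in names:
    print(p, " ".join(f"{dist[p, q]:.2f}" for q in names))
```

In this toy example B and C come out closest (0.20). A matrix like this is what a network program would then turn into splits and boxes; conflicting pairwise patterns are what show up as the box-like structures described above.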

So how did they fare? Well, the network shown above does recover some known families, such as Indo-European, Altaic and Nakh-Daghestanian. Other families, however, were not recovered, including Sino-Tibetan, Uralic and Trans-New Guinea. The authors also note a substantial amount of conflicting signal (the box-like structures), so that many well-attested phylogenetic relationships within major language families are recovered inaccurately. For example, the network links German with French, when German is in fact more closely related to English. There are, however, high-level clusters in the data, such as a cluster of languages from continental Eurasia, which may suggest an ancient common ancestry. This is consistent with the hypothesis that typological features evolve slowly enough to allow linguists to identify deep historical relationships, but as the authors note:

[…] phylogenetic networks cannot distinguish between similarity owing to common ancestry and similarity owing to areal diffusion or chance resemblances arising through independent innovation… If some typological features are highly stable and good indicators of common ancestry, then we would expect them (i) to fit well with established language groupings and (ii) to show slower rates of change than lexical features as a whole.
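Criterion (i), how well a single feature fits an established tree, can be measured in several ways. One standard option is to count the minimum number of changes the feature needs on that tree under Fitch parsimony and convert this into a consistency index, where values below 1 indicate homoplasy. The sketch below assumes strictly binary-branching trees and uses hypothetical languages and feature values; it illustrates the idea only and is not necessarily the fit statistic Greenhill et al used.

```python
# Hypothetical reference tree written as nested pairs; leaves are language labels.
TREE = ((("A", "B"), "C"), ("D", "E"))


def fitch(tree, states):
    """Fitch parsimony: return (possible ancestral states, minimum number of changes)."""
    if isinstance(tree, str):  # a leaf: its observed state, no changes needed
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch(left, states)
    right_set, right_steps = fitch(right, states)
    common = left_set & right_set
    if common:  # the children can agree, so no extra change at this node
        return common, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1  # one extra change


def consistency_index(tree, states):
    """Minimum conceivable changes divided by changes required on this tree (1.0 = no homoplasy)."""
    _, steps = fitch(tree, states)
    minimum = len(set(states.values())) - 1
    return minimum / steps if steps else 1.0


# A feature that matches the tree's two main clades, and one that cuts across them.
clade_feature = {"A": 1, "B": 1, "C": 1, "D": 0, "E": 0}
scattered_feature = {"A": 1, "B": 0, "C": 0, "D": 1, "E": 0}

print(consistency_index(TREE, clade_feature))      # 1.0
print(consistency_index(TREE, scattered_feature))  # 0.5
```

A feature whose states were shaped mainly by common descent should score close to 1 on the true tree, whereas features affected by borrowing or independent innovation need extra changes and score lower.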

To assess the shape of language evolution, they fitted typological and lexical data onto the established family trees. In both Indo-European and Austronesian, the lexical data provided a significantly better fit to the expected family trees than the typological data, with lexical networks displaying a much more tree-like signal. To estimate rates of change, they then calculated the maximum-likelihood estimate of the rate of evolution for each feature across the posterior distribution of trees in each family:

In both families, the distributions of lexical and typological rates are comparable. The similar ranges evident in these plots indicate that there is in fact no substantial difference between the slowest rates of lexical and typological change in either family.
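To make the rate-estimation step more concrete, here is a minimal sketch of a maximum-likelihood rate estimate for one binary feature on one fixed tree. It assumes a symmetric two-state Markov model, Felsenstein's pruning algorithm and a crude grid search. The tree, branch lengths and feature codings are hypothetical, and the model is our simplifying assumption rather than the exact model Greenhill et al fitted.

```python
import math

# Hypothetical four-language tree. An internal node is a tuple of
# (child, branch_length) pairs; a leaf is a language label.
TREE = (
    ((("A", 1.0), ("B", 1.0)), 1.0),
    ((("C", 1.0), ("D", 1.0)), 1.0),
)


def transition(rate, t):
    """2x2 transition probabilities for a symmetric two-state model over time t."""
    same = 0.5 + 0.5 * math.exp(-2.0 * rate * t)
    return [[same, 1.0 - same], [1.0 - same, same]]


def partial_likelihood(node, states, rate):
    """Felsenstein pruning: P(data below node | node is in state 0 or 1)."""
    if isinstance(node, str):
        vec = [0.0, 0.0]
        vec[states[node]] = 1.0
        return vec
    result = [1.0, 1.0]
    for child, t in node:
        child_vec = partial_likelihood(child, states, rate)
        p = transition(rate, t)
        for s in (0, 1):
            result[s] *= p[s][0] * child_vec[0] + p[s][1] * child_vec[1]
    return result


def log_likelihood(tree, states, rate):
    """Log-likelihood with equal (0.5) probability for each state at the root."""
    root = partial_likelihood(tree, states, rate)
    return math.log(0.5 * root[0] + 0.5 * root[1])


def ml_rate(tree, states, grid=None):
    """Maximum-likelihood rate found by a simple grid search (0.01 to 5.00 by default)."""
    if grid is None:
        grid = [i / 100 for i in range(1, 501)]
    return max(grid, key=lambda r: log_likelihood(tree, states, r))


clade_feature = {"A": 1, "B": 1, "C": 0, "D": 0}      # one change explains it
scattered_feature = {"A": 1, "B": 0, "C": 1, "D": 0}  # needs at least two changes

print(ml_rate(TREE, clade_feature))
print(ml_rate(TREE, scattered_feature))
```

With these toy inputs the clade-consistent feature gets an estimate of about 0.27 changes per unit of branch length, while the scattered feature is pushed to the top of the grid (5.0): on a tree this small, its best explanation is that the feature is effectively saturated. An analysis like the one described above would repeat such estimates over a posterior sample of trees rather than a single fixed tree.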

Lastly, they found that, in agreement with their previous research, rates of lexical change are correlated across language families. In contrast, rates of typological change show no significant correlation between Indo-European and Austronesian. This absence of correlation suggests that there is no set of universally stable typological features. In fact, their analysis failed to identify any typological features that evolve at consistently slower rates than the basic lexicon. Assuming, then, that the lexical signal does stretch back 10,000 years, the authors suggest that the typological signal is limited to a similar time horizon. And this is not the only difficulty in inferring deep ancestral relationships. First, typological features show high rates of homoplasy (similarity that arises independently rather than through common descent), so shared typological features are an even less reliable indicator of common ancestry than shared basic vocabulary. Second, features can diffuse between languages that are geographically close to one another:

This can occur through processes like language shift (Thomason & Kaufman 1988)–where speakers of one language change to another owing to societal influences, yet retain morphology or phonology from their original language, or metatypy (Ross 1996)–where a language rearranges some aspect of typology (e.g. morphosyntax) owing to contact between languages without explicit borrowing between the languages, usually as an outcome of intimate cultural contact.
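The comparison of rates between Indo-European and Austronesian described above amounts to asking whether a feature that changes quickly in one family also changes quickly in the other. A minimal way to test this, assuming per-feature rate estimates for the same features in both families, is a Spearman rank correlation with a permutation test. This is our illustrative choice, not necessarily the test Greenhill et al ran, and the rate values below are placeholders rather than figures from the paper.

```python
import math
import random


def ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation (undefined if either input has zero variance)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p_value(x, y, n_perm=10_000, seed=0):
    """Two-sided p-value: how often shuffled pairings give |rho| at least as large."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Placeholder per-feature rates for the SAME features in two families (not real data).
rates_family_1 = [0.12, 0.40, 0.33, 0.05, 0.27, 0.61]
rates_family_2 = [0.30, 0.08, 0.45, 0.22, 0.51, 0.14]

rho = spearman(rates_family_1, rates_family_2)
p = permutation_p_value(rates_family_1, rates_family_2)
print(f"rho = {rho:.2f}, permutation p = {p:.3f}")
```

A rank correlation is used here because it only asks whether features keep the same relative ordering of rates across families, which is the question at stake, without assuming the rates are on comparable absolute scales.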

Ultimately, I think the current study highlights how little linguists know about the shape and tempo of language change. Contrary to the notion that structural elements of language change on a near-glacial time scale, structural change appears to proceed at rates comparable to lexical change, and structural features are not shielded from diffusion between languages. Another important finding is that rates of structural evolution differ between language families. Greenhill et al suggest that, just as frequency of use is crucial in lexical change, the frequency with which different structural elements are used may help determine structural change. A complication is that, whereas word use is relatively constant across languages, structural features depend on the other structural constraints operating within a language. It might invite incredulity, but despite the problems outlined here, I do think future studies will be able to use phylogenetic methods, together with the growing body of available data, to test specific hypotheses about the mechanisms driving the shape and tempo of language evolution.