
This research theme served to support the related theme on Cross-Disciplinary Prehistory. It did so by focusing on the single most basic key to the linguistic record of population (pre)history, namely language relatedness, and developing new quantitative approaches to address both of its main facets:

To help evaluate whether given language lineages do or do not stem from a common origin…

And if they do, to measure as finely as possible just how closely they are related, or in other words, how far they have diverged from their common ancestor.

Once we can produce meaningful measures of language difference and divergence, these can feed into a growing trend: taking the latest techniques of phylogenetic, probabilistic and statistical analysis, drawn originally from the biological sciences, and applying them to data on language divergence, to help analyse and represent how languages relate to each other. These are extremely powerful tools with undoubted potential for clarifying language (pre)histories, even if applying them to language ‘evolution’ is by no means problem‑free.

Often the most appropriate analyses are those that are not restricted to modelling language relationships as a ‘family tree’ with binary branches, but can also visualise networks. Networks are often a more realistic representation of how language varieties actually relate to each other, and indeed of the underlying real-world processes that shaped those relationships in the first place (see Heggarty et al. 2010). These language data, and new quantitative tools for analysing them, can open up valuable new perspectives on the linguistic signals that survive from past relationships between human populations, which can then be compared and combined with those of other disciplines.

However sophisticated these ‘number‑crunching’ tools may be, though, their results can only ever be as good as the numbers we feed into them in the first place. This research therefore centred on that critical prior stage: how best to ‘encode’ and put numbers on real language data, with all its complexities. Language is an inherently non‑numerical phenomenon, so realistically all we can aspire to is the most meaningful approximation to it in figures. A key concern is to ensure that our measures are appropriately weighted against each other, in terms of their respective real linguistic significance. Quite how to assess that is a core question for this research, as it aims to establish basic methodological principles for language quantification, as advanced in Heggarty (2006), Heggarty (2010), and a chapter in preparation for the forthcoming Oxford Handbook of Diachronic and Historical Linguistics.
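To illustrate what weighting measures against each other can mean in practice, here is a minimal sketch in Python. It is not the method of the publications cited: the feature names, the weights and the 0-to-1 difference scores are all hypothetical, chosen only to show the arithmetic of a weighted average.

```python
def weighted_difference(differences, weights):
    """Weighted mean of per-feature difference scores (each 0.0 to 1.0)."""
    if differences.keys() != weights.keys():
        raise ValueError("differences and weights must cover the same features")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[f] * differences[f] for f in differences) / total_weight


# Hypothetical weights: an assumption for illustration, not values from the source.
WEIGHTS = {"place": 2.0, "manner": 2.0, "voicing": 1.0}

# [p] vs [b]: same place and manner, different voicing.
P_VS_B = {"place": 0.0, "manner": 0.0, "voicing": 1.0}
# [p] vs [s]: different place and manner, same voicing.
P_VS_S = {"place": 1.0, "manner": 1.0, "voicing": 0.0}

print(weighted_difference(P_VS_B, WEIGHTS))
print(weighted_difference(P_VS_S, WEIGHTS))
```

Under these assumed weights, a voicing-only difference counts for less than a difference in both place and manner. Whether such a ranking reflects real linguistic significance is exactly the kind of question the research raises.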

This research theme has developed its own new techniques for measuring distances between language varieties. Some of these do look to the traditional data source of lexical semantics, although they are purposely devised to be very different indeed from traditional ‘lexicostatistics’, with its many known weaknesses (see Heggarty 2010). The novel techniques developed here aim to help assess whether given language families are or are not related to each other in the first place (e.g. Heggarty 2010, Heggarty 2011).

Principally, though, this theme looks to phonetics, where divergence can be measured to an especially fine-grained level: between languages closely related to each other within the same family, and indeed down to the dialect and accent level. For any one family, the raw data collected are recordings of specific sets of common words, cognate (i.e. directly related) across all language varieties within that family. These are first transcribed phonetically, and the sounds within any one word are then matched up against those in the corresponding pronunciations in all other languages in that family. This matching is achieved through node forms that encapsulate the basic knowledge of the common ancestral language from which that family derived. As an example, the individual sounds in French étoile, Spanish estrella and Romanian steauă (all meaning ‘star’) are matched up against each other through their respective relationships of derivation from the sounds within the original Latin stella(m) from which each of the modern words derives. This matching is highly automated by the phonetic analysis programme used, but is always followed by revision by an expert linguist. From these transcriptions and matchings as input data, a purposely developed programme produces measures of how far these languages have diverged from each other in phonetics, which in turn provides a perspective on how the family in question (in this example, Romance) developed through (pre)history.
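The matching-then-measuring procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the programme used in this research. The slot alignments below are simplified and hypothetical: they ignore, for instance, the initial vowel of étoile and estrella and the r of estrella. The binary same-or-different score is also a stand-in for whatever finer-grained phonetic measure the actual programme applies.

```python
from itertools import combinations

# Slots follow the segments of the ancestral node form, Latin "stella".
ANCESTOR_SLOTS = ["s", "t", "e", "ll", "a"]

# Hypothetical, simplified alignments: for each modern word, the sound(s)
# that correspond to each ancestral slot ("" means nothing corresponds).
ALIGNED = {
    "French étoile":    ["",  "t", "wa", "l", ""],
    "Spanish estrella": ["s", "t", "e",  "ʎ", "a"],
    "Romanian steauă":  ["s", "t", "ea", "w", "ə"],
}


def slot_distance(a, b):
    """Crude stand-in metric: 0.0 if the slot fillers are identical, else 1.0."""
    return 0.0 if a == b else 1.0


def word_distance(x, y):
    """Mean slot distance between two words aligned to the same ancestor."""
    if len(x) != len(y):
        raise ValueError("both words must be aligned to the same slots")
    return sum(slot_distance(a, b) for a, b in zip(x, y)) / len(x)


def distance_matrix(aligned):
    """Pairwise distances for every pair of varieties."""
    return {
        (p, q): word_distance(aligned[p], aligned[q])
        for p, q in combinations(aligned, 2)
    }


for pair, dist in distance_matrix(ALIGNED).items():
    print(pair, dist)
```

Because every modern form is aligned to the same ancestral slots, any two varieties can be compared slot by slot without aligning them to each other directly. Averaging such word-level figures over a whole set of cognates would give one overall distance per pair of varieties.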

This approach has been explored so far particularly for varieties of English worldwide and through history, and more widely across dialects of Germanic (e.g. Heggarty et al. 2010, Maguire et al. 2010). Analysis is now under way on similar databases already collected for further studies on the Slavic and Romance families, and in particular for a further specific project on the Sounds of the Andean Languages.