Abstract

Recent work which combines methods from linguistics and evolutionary biology has been fruitful in discovering the history of major language families because of similarities in evolutionary processes. Such work opens up new possibilities for language research on previously unsolvable problems, especially in areas where information from other sources may be lacking. I use phylogenetic methods to investigate Tasmanian languages. Existing materials are so fragmentary that scholars have been unable to discover how many languages are represented in the sources. A clustering algorithm which identifies admixture is used to identify source materials representing more than one language. The Neighbor-Net algorithm identifies 12 languages in five clusters. Bayesian phylogenetic methods reveal that the families are not demonstrably related, an important result given the significance of Tasmanian Aborigines for information about how societies have responded to population collapse in prehistory. This work provides insight into the societies of prehistoric Tasmania and illustrates a new utility of phylogenetics in reconstructing linguistic history.

1. Introduction

The indigenous people of Tasmania were severely affected by European settlement in the nineteenth century [1]. Although it is known from ethnographic sources and early reports [2] that Indigenous Tasmanians comprised 48 bands in nine tribes [3,4] (figure 1), the number of languages and their internal phylogenetic relationships have remained a mystery. Previous work [5–9] has identified anywhere from a single language [5] to as many as 12 [6]. Despite the dearth of information about them, Tasmanian Aborigines have long held an important place in anthropology [10–12]. Their toolkit, for example, was the simplest of any attested group in the nineteenth century, and they are often cited as an example of how population collapse may also lead to technological collapse and societal decomplexification [10,11] (the so-called ‘Tasmanian effect’).

Information from language has thus far been underused in studying Tasmanian society; nonetheless, it provides an important window on Tasmanian internal diversity. The linguistic information may, indeed, be the only investigable source for Tasmanian heterogeneity at the level of the whole island. The anthropological, archaeological and genetic data are all insufficient here. Ethnographically, Ryan [4] describes Tasmanian tribes as a single culture bloc with extensive shared practices and beliefs (such as ‘star gods’ and the evil spirit Wrageowrapper), and a common toolkit. Jones [13] provides evidence for a strong cultural boundary between eastern and western Tasmania, but also notes many shared practices across the island. While recognizing nine distinct tribes, both Ryan & Jones [3] focus on the documentation of exchange networks and seasonal travel which reinforce reciprocal links across the island. Other work assumes a monolithic view of Tasmania without discussion [14]. The archaeological record is patchy, with few Pleistocene sites [15]; moreover, Tasmanians did not have a rich material culture and the Tasmanian climate is not conducive to long-term preservation of wooden artefacts. There is, however, some evidence of internal diversity in the archaeological record (for example, the abandonment of rainforest sites after the Late Pleistocene [16] and the expansion of people down the western coast over the last 3000 years [13]). Within genetics, there is not sufficient genetic information to be able to determine any differences between Tasmanian populations, and subsequent history has led to sufficient European admixture that such work is not possible. There is, however, work which compares genetic data from Tasmanians with other populations [17,18], including those from Australia, such as Presser et al. [19], who find evidence of mitochondrial DNA links between Tasmania and the mainland.

It is known that the Tasmanian population underwent a population crash following the flooding of Bass Strait at the end of the Last Glacial Maximum [13], approximately 12 000 years ago. The population remained well below carrying capacity and was only recovering at the time of European colonization. Presumably, the Early Holocene population collapse led to a reduction in linguistic diversity on the island. It is not known, however, whether rates of language diversification were rapid enough to obliterate any evidence of a bottleneck, or whether the current languages and families show a common ancestor which predates the flooding of Bass Strait. If the data show that Tasmanian languages most probably belong to a single family, this would provide good evidence for slow rates of change in small societies, since there is no evidence for population replacement during the Holocene. However, it is also possible that any evidence for a linguistic bottleneck would be obliterated by subsequent linguistic diversification. This would have implications for our interpretation of the closeness of linkages between Tasmanian groups, since populations require isolation for linguistic diversification. I return to these points below.

Given the paucity of island-wide research into genetic, archaeological and ethnographic diversity, language may provide us with the best opportunity of inferring change in prehistory. However, records of Tasmanian languages are poor [5,6]. The 44 known wordlists were recorded between 1777 and 1847. Vocabularies were recorded opportunistically, often with very little information about speakers or locations of recording. They vary in length from a single word to nearly 1040 items and originate from all over the island. In five cases, there is no information about provenance. Backhouse and Walker, for example, recorded vocabularies on Flinders Island from displaced persons of unknown tribal affiliation [5]. Other sources combined, or ‘admixed’, vocabulary from multiple locations, as evidenced both by the number of synonyms given in the lists and from comments from compilers. Other lists contain only general or ambiguous location information.

Previous attempts [1,6,9] to discover the linguistic history of Tasmania are rife with equivocations and are internally irreconcilable, despite being based on identical source material. Roth [1] was convinced that there was a single language, despite quoting considerable ethnographic evidence to the contrary. Walker [20] follows Robinson in arguing for four languages, but does not provide any evidence for this conclusion. Schmidt [9] found two languages, one with three dialects; O'Grady [7] also found ‘at least two languages’ (but not the same two as Schmidt), while Crowley & Dixon [6] argued that source materials are too poor to determine the number of languages, but are detailed enough to tentatively reject a single family. Most authors have not provided objective measures of degrees of cognacy of materials, relying instead on inspection of isolated forms to gauge the extent to which the varieties represent related languages or dialects.

Here I present a new analysis of the extant Tasmanian sources. The aims are to identify the number and composition of discrete linguistic units (languages) in the data, to determine whether they are demonstrably related, and if so, in how many families, to classify the wordlists without information about provenance, and to determine levels of admixture within wordlists. I use several methods from evolutionary biology to systematically investigate the Tasmanian corpus. Such concepts and approaches have proved useful in shedding light on linguistic prehistory [21–23]. Although linguistic traits evolve more rapidly than genetic data, they exhibit many of the same properties and problems [24,25]. Some of these methods have been used on historical records already [23,26], though this work is the first to investigate Tasmanian data in this way, and the first to use methods in population biology to test for admixture in old sources.

2. Results and discussion

Previous work on Tasmanian languages did not estimate the degree of source mixture within vocabularies. Here, that degree was estimated with Structure [27], a Bayesian clustering algorithm designed to identify admixture. Figure 2 shows the results of the Structure analysis for K = 2–5 hypothesized groups, with K = 5 yielding the highest likelihood value. Mean-likelihood values and further discussion are given in the electronic supplementary materials. Structure provides both information on admixture levels and tentative assignments of wordlists of unknown origin to clusters. These include the Norman vocabulary, which appears to group with the northeast languages, and the Lhotsky and Backhouse vocabularies, which are a separate language within the northeastern cluster. The Fisher vocabulary appears to be western, and the vocabularies designated by Plomley as ‘southern’ and ‘northern’ belong predominantly to the western group, with southeastern and northern mixture, respectively. Furthermore, several vocabularies identified as belonging to the Oyster Bay region show substantial mixing with the northeastern region. This mixture is probably what has led previous authors to posit relatedness between the languages of the east coast. Since this pattern is confined to only two sources, it is likely to represent source mixing rather than a genuine fact about the languages.

To investigate language relations further, 26 vocabularies of more than 100 items which did not show evidence of admixture were coded for similarity, and a Neighbor-Net [28] was produced using SplitsTree [29] (figure 3). This network shows five primary clusters: the same groups recovered in the Structure analysis. Furthermore, these clusters nest well with the tribal groups identified by Jones [3] and shown in figure 1. The southeastern/Bruny Island cluster corresponds to the Bruny tribe. The Oyster Bay cluster contains the Oyster Bay and Big River tribes. The northeastern cluster comprises vocabularies associated with the northeastern, Ben Lomond and north midlands tribes. The western cluster includes the northwestern and southwestern tribes, and the northern cluster includes the northern tribe and vocabularies designated by Milligan [30,31] as ‘western’ and ‘northwestern’; these vocabularies were recorded on the Flinders Island mission. Electronic supplementary materials provide further discussion. There is no evidence here to group the southeastern, Oyster Bay and northeastern languages into a single macro-group, contra Schmidt [9] and O'Grady [3,7].

Figure 3. Neighbor-Net of wordlists, with those showing high levels of missing data or admixture removed, showing five language clusters.

Linguists consider the boundary between languages and dialects often difficult to define in absolute terms. In this respect, languages are similar to biological species [32]. There are no single arbitrary cut-off points for measurements of similarity between languages and dialects, and measures based on the percentage of shared vocabulary are unreliable [33]. In the absence of speaker intuitions, however, we are reliant on the use of arbitrary measures of similarity. Here I group together as a single ‘language’ those vocabularies which have an uncorrected p-distance of less than 0.15. These languages are marked on figure 3 within each cluster. They accord well with geographical placement, where known. This measure suggests that 12 languages are represented in the source materials. Two of the languages are represented exclusively by vocabularies recorded on the Flinders Island mission. These are the Milligan vocabularies (SW_W and NW_nw) in the northern cluster and the Milligan Bruny Island vocabulary (SE_s_mj). The Milligan Oyster Bay vocabulary clusters with other Oyster Bay sources, and the other late Flinders Island vocabulary (UNK_bkwb), recorded by Backhouse and Walker, clusters with materials recorded earlier. Plomley [5, p. 19] has suggested that Milligan's vocabularies are heavily influenced by the Tasmanian pidgin [8] in use on Flinders Island, and this is the most likely explanation (though they could also represent an otherwise unrecorded language).
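The grouping criterion above can be illustrated with a minimal sketch. It assumes that each wordlist is a string of present/absent characters (‘1’/‘0’) with ‘?’ for missing data, that missing positions are ignored pairwise, and that grouping is transitive (single linkage). None of these details is stated in the text, and the wordlist names and data are invented for illustration only.

```python
from itertools import combinations

MISSING = "?"  # assumed symbol for a character with no data


def p_distance(a, b):
    """Uncorrected p-distance: the proportion of comparable characters
    on which two wordlists differ (missing positions are skipped)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != MISSING and y != MISSING]
    if not pairs:
        return None  # no overlapping data, so the distance is undefined
    return sum(x != y for x, y in pairs) / len(pairs)


def group_languages(wordlists, threshold=0.15):
    """Put wordlists in the same group when their distance is below the
    threshold; grouping is transitive (an assumption of this sketch)."""
    names = list(wordlists)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the representative of n's group.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        d = p_distance(wordlists[a], wordlists[b])
        if d is not None and d < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Invented toy data: four hypothetical wordlists, 20 characters each.
toy = {
    "list_a": "11110000111100001111",
    "list_b": "11110000111100001110",
    "list_c": "00001111000011110000",
    "list_d": "0000111100001111????",
}
print(group_languages(toy))
```

In this toy example list_a and list_b differ on 1 of 20 characters (0.05) and fall into one group, while list_c and list_d agree on all 16 comparable characters and form a second group.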

Neighbor-Nets assume relatedness between all taxa under analysis. Moreover, they cannot be used to date divergences in lineages, or to quantify support for groups except informally. To further explore these topics, a Bayesian maximum-likelihood analysis was conducted in BEAST [34], using a covarion model with base frequencies estimated empirically (figure 4). While the same clusters identified in the previous methods were also identified here with strong support, more remote groups had only very weak support, including the grouping of northern and western vocabularies. The exception is the node joining the Bruny Island and Oyster Bay clusters, which had strong support. Therefore, we can, with a high degree of confidence, recover four Tasmanian macro-families; though there is weak evidence that the languages of the north and west are remotely related, as are, perhaps, the three families of the eastern coast (see the electronic supplementary materials for discussion of cognate words). There is linguistic support here for Jones' [13] archaeological division between eastern and western Tasmania, and the clusters here correspond to groups previously labelled as ‘dialects’ or ‘languages’ in work such as Schmidt [9].

Figure 4. Fifty per cent consensus tree showing posterior probabilities of internal nodes, based on a Bayesian maximum-likelihood analysis of 2777 data points from 26 non-admixed vocabularies.

Evidence for a single Tasmanian macro-family, however, is non-existent. Only 24 words (out of 3412) are found in all main branches, and most of those are either terms for recently introduced items, such as cattle, or are cultural or mythological terms and thus likely to be borrowed. If the languages were all ultimately descended from a common ancestor, that ancestor was spoken too far in the past to be recoverable. Calibrating the divergence of the western and northern group to Jones' identified spread of people down the west coast of Tasmania 3000 years ago [13] leads to a root age estimate of approximately 8500 years BP. However, root estimate confidence limits are very poor (see electronic supplementary material, figure S5); a better interpretation of the tree is that there is very weak support for any relationship beyond the five clusters already identified.

3. Conclusions

The findings here have implications for wider prehistory. Previous work regarding Tasmanian external relationships [35–37], whether to Australian languages or as part of an Indo-Pacific family, assumes the unity of a Tasmanian language family, and in fact requires such an assumption. The data, however, do not support this view. There is no evidence at this stage that Tasmanian languages are all related to one another.

Likewise, there is no evidence here that ‘Tasmanian’ is related to the attested Indigenous languages on the Australian mainland. The languages closest to Tasmania in mainland Australia are Pama-Nyungan, a family that most probably spread from northern Australia in the Mid-Holocene [38–41], well after Tasmania was separated from the Australian mainland. If we were to look for Tasmania's nearest linguistic relatives on the mainland, we would perhaps find them among the groups who were replaced by Pama-Nyungan speakers in southern Victoria. However, those languages have been lost without trace.

The results here—12 languages in five clusters—are plausible. Linguistic populations will diversify due to drift once isolated from one another [42]. Thus, for Tasmanian groups to have maintained one or two languages over thousands of years would have required long-standing cohesive social connections over a population numbering several thousands. But what the ethnographic literature clearly describes is a collection of mobile hearth groups of ca 50 people with strong local affiliations (in ‘bands’) and a much looser set of regional affiliations (‘tribes’) with seasonal contact and reciprocal exchange rights [1,3,4,43]. Jones infers clan exogamy but tribal endogamy, which would also reinforce localist [44] rather than island-wide stances and would lead to boundary enforcement between tribes. Unless the ethnographic practices and principles of social organization documented at European settlement arose in the recent past prior to Tasmanian colonization, we would expect to find multiple languages, and—depending on the length of time those social patterns had been in place—multiple language families. The results presented here are thus consistent with the anthropological record, while a single language or language family is not.

It is established here that the Tasmanian linguistic scene was diverse at the time of European settlement. The age of that diversity is difficult to establish. It is possible that the Early Holocene demographic collapse led to a reduction in linguistic diversity. Bayesian dating using calibration from divergence along the western coast implies a common ancestor of 8000 years BP, around the time of the demographic collapse. However, root age estimates from the linguistic data are so poor that such a conclusion should be treated with extreme caution. Alternatively, the attested families could represent diversity preserved from a time prior to the demographic collapse. This would imply that language diversity may persist among small groups, even where cultural diversity is low. The former view (that the diversity is more recent) is more likely, however, since modern attested responses to demographic collapse [45] show clear and substantial reductions in linguistic diversity.

4. Methods

All vocabulary materials in this study come from N. J. B. (Brian) Plomley's [5] compilation of manuscript and published materials on the languages of Tasmania. The materials are categorized by recorder or compiler, geographical area, and occasionally by speaker (where that information is known). Plomley grouped words from all sources by a standard gloss, and within a gloss, provided language ‘headwords’; that is, words from different vocabularies which he argued to represent the same form-meaning pair. The headword sets are closely parallel to a character set in computational phylogenetics [46–48]. The electronic supplementary materials provide further discussion and illustration.

To determine which sources likely contained admixture, Structure [27] was used. This is a Bayesian clustering procedure that uses multilocus genotype data both to model population structure and to assign individuals to populations. Individuals may be assigned to more than one population, in which case the proportionate membership in each population is inferred. The algorithm identifies populations which are characterized by specific allele frequencies at the loci under study. In linguistic terms, the algorithm identifies clusters of vocabularies with cognate items. It has previously been used in linguistics by Reesink et al. [21] to infer ancient population clusters in Sahul. Following Reesink et al., each wordlist is treated as an individual, and the character values are analogically equivalent to alleles (see the electronic supplementary material for discussion). In the Structure analysis, data were treated as haploid and no linkage was inferred. The model was a simple admixture model [21]. Run parameters included 35 individuals (the vocabularies under consideration here) with 559 loci. Each locus is an English gloss (examples are in the electronic supplementary material, table S2). One hundred independent runs were completed for each K-value between 2 and 20, with a 10 000 iteration burn-in followed by 25 000 reps. Electronic supplementary material, figure S1 (compiled with DISTRUCT v. 1.1 [49]) gives the distribution of log-likelihood scores.
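The analogy between wordlists and haploid individuals can be sketched as follows. This is an illustration under stated assumptions rather than the preparation procedure actually used. It assumes at most one headword per gloss per wordlist, integer allele codes assigned per locus in order of first appearance, and -9 as the missing-data code. The wordlist names, glosses and headword labels are hypothetical placeholders.

```python
MISSING_CODE = -9  # assumed missing-data code; not specified in the text


def to_haploid_rows(wordlists, glosses):
    """Turn wordlists into one row of integer 'alleles' per individual.

    wordlists: {wordlist name: {gloss: headword label}}
    glosses:   ordered list of glosses, each treated as one locus
    Returns (allele_codes, rows), where allele_codes maps each gloss to
    its {headword label: integer code} table and rows maps each wordlist
    to its list of codes, one per locus.
    """
    allele_codes = {g: {} for g in glosses}
    rows = {}
    for name in sorted(wordlists):
        row = []
        for g in glosses:
            headword = wordlists[name].get(g)
            if headword is None:
                # No form recorded for this gloss in this wordlist.
                row.append(MISSING_CODE)
            else:
                codes = allele_codes[g]
                # Reuse an existing code or assign the next integer.
                row.append(codes.setdefault(headword, len(codes) + 1))
        rows[name] = row
    return allele_codes, rows


# Hypothetical placeholder data, not taken from the Tasmanian sources.
toy_lists = {
    "list_a": {"water": "form_1", "fire": "form_3"},
    "list_b": {"water": "form_1"},
    "list_c": {"water": "form_2", "fire": "form_3"},
}
codes, rows = to_haploid_rows(toy_lists, ["water", "fire"])
for name, row in rows.items():
    print(name, *row)
```

Wordlists that share a headword for a gloss receive the same allele code at that locus, which is the sense in which clusters of vocabularies with cognate items correspond to populations with characteristic allele frequencies.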

For the Bayesian maximum-likelihood analysis conducted with the software program BEAST [34], 26 non-admixed vocabularies were used. Data were 2777 Plomley headwords (out of 3412 possible), coded as present or absent. Headwords which were absent, but for which there was another headword recorded with the same gloss, were coded as 0. Items for which no headword was recorded in that gloss were recorded as missing. The excluded data were those with greater than 70 per cent missing data and acculturation terms (that is, terms for recently introduced items such as guns and cattle). Shared terms for such items reflect post-contact loan patterns rather than shared ancestry. Results reported here are from the covarion model; other models are discussed in the electronic supplementary materials.
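The coding scheme described above can be sketched in a few lines. The sketch assumes that presence is coded as ‘1’, that missing data are written as ‘?’, and that the 70 per cent filter is applied per headword; the text does not specify these details. All glosses, headword labels and vocabulary names are hypothetical placeholders.

```python
def code_headwords(records, vocabularies):
    """Code each headword as present, absent or missing per vocabulary.

    records: {gloss: {headword label: set of vocabularies attesting it}}
    Returns {(gloss, headword): {vocabulary: '1' | '0' | '?'}}.
    """
    matrix = {}
    for gloss, headwords in records.items():
        # Vocabularies that record at least one headword for this gloss.
        attested = set().union(*headwords.values())
        for headword, sources in headwords.items():
            row = {}
            for v in vocabularies:
                if v in sources:
                    row[v] = "1"  # this headword is recorded
                elif v in attested:
                    row[v] = "0"  # a different headword is recorded
                else:
                    row[v] = "?"  # nothing recorded for this gloss
            matrix[(gloss, headword)] = row
    return matrix


def drop_sparse(matrix, max_missing=0.70):
    """Keep only headwords whose share of missing cells does not exceed
    max_missing (per-headword filtering is an assumption here)."""
    kept = {}
    for key, row in matrix.items():
        share = sum(state == "?" for state in row.values()) / len(row)
        if share <= max_missing:
            kept[key] = row
    return kept


# Hypothetical placeholder data, not taken from Plomley's compilation.
records = {
    "water": {"form_1": {"A", "B"}, "form_2": {"C"}},
    "fire": {"form_3": {"A"}},
}
vocabularies = ["A", "B", "C", "D"]
coded = drop_sparse(code_headwords(records, vocabularies))
for (gloss, headword), row in sorted(coded.items()):
    print(gloss, headword, "".join(row[v] for v in vocabularies))
```

In the toy data, the single headword for ‘fire’ is missing in three of four vocabularies (75 per cent) and is therefore dropped, while both headwords for ‘water’ are retained.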

Acknowledgements

Research was funded by NSF grant BCS-844550 ‘Pama-Nyungan and the prehistory of Australia.’ Plomley's materials were digitized by Tyler Lau. The phylogenetic character sets used for the paper are reported in the electronic supplementary material; original data are published in Plomley [5]. Thanks to Keith Hunley, Erich Round, Catherine Sheard and Russell Gray for comments.

2004 The coherence and distinctiveness of the Pama-Nyungan language family within the Australian linguistic phylum. In Australian languages: classification and the comparative method (eds Bowern C., Koch H.), pp. 69–92. Amsterdam, The Netherlands: John Benjamins.