“In this talk I will discuss ways in which language documentation (LD) is aligning linguistic methods with aims that are being adopted more generally by digital humanists. Descriptive linguistics arose out of the structuralist tradition and has mainly been concerned with writing linguistic analyses of narrowly focused phenomena and, occasionally, of lesser-known languages. In the recent past a theory of language documentation has developed, partly in reaction to the abstraction of, for example, the minimalism of Chomsky and his followers, but also in response to the needs of people we work with in the field.

LD can be seen as delivering a new kind of linguistics by acknowledging the partiality of the data on which an analysis is based, and by explicitly inserting the linguist into the process of recording, annotating and preparing the corpus for scrutiny by others. It also emphasises the presentation of the context of an utterance, in contrast with the earlier practice of basing a theoretical point on a decontextualised example sentence. All of this can be seen as increasing self-reflexivity and contingency in linguistic analysis, but at the same time it is building a set of data that can be used in a scientific method of data gathering, hypothesis development, testing and analysis, in which the critical aspects are replication of analyses, external review of collections and processes, and data recording and sharing.

In this context I will present the Pacific and Regional Archive for Digital Sources in Endangered Cultures (PARADISEC) as an example of humanities scholars taking responsibility for building the necessary infrastructure to curate research outputs. In this way the records can be cited and reused, both by researchers and by the speakers and others who want to access them. While we do not have the programming skills to build the necessary computational infrastructure, we have developed the interlingua required to work with programmers and to successfully attract the funding to support the various projects around PARADISEC. Our experience suggests that computational infrastructure is necessary but not sufficient to ensure that LD data can be secured. We must also advocate new methods and train practitioners to understand why they should create the data properly to begin with.”