In honor of World Digital Preservation Day, members of the University of Texas Libraries’ Digital Preservation team have written a series of blog posts to highlight preservation activities at UT Austin, and to explain why the stakes are so high in our ever-changing digital and technological landscape. This post is part three in a series of five. Read part one and part two.

At AILLA, we are developing guidelines to help language researchers and activists organize their collections of recordings and annotations of Indigenous, and often endangered, languages and ingest them into digital repositories, so that these valuable digital resources can be preserved for the future. One focus of these guidelines is the importance of using open and sustainable file formats to increase the likelihood that digital files can be opened and read in the future. To help explain these ideas, we produced a short animated video that is available under a Creative Commons license on YouTube at https://youtu.be/2JCpg6ICr8M.

Many digital documents are produced using proprietary software, and future users will need to have the same, or similar, software to open the files or read their contents. While documents in proprietary formats can be put into a digital repository so their bitstreams (all the ones and zeroes) are preserved well into the future, the exact copy of the file a user downloads years from now may be impossible to use if the proprietary software it was made with is no longer available. Documents preserved in these non-open and non-sustainable formats then end up like cuneiform tablets: objects whose marks and features have survived a long passage through time but can only be read by a small number of people after considerable effort and study.

Choosing sustainable open formats helps ensure that materials are not just preserved but are accessible and usable into the future, since open-source applications can be more easily built to read files stored in non-proprietary formats.
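As a rough illustration of how a depositor might check a collection before handing it to a repository, the sketch below sorts file names by extension. The extension lists and the function names are hypothetical examples chosen for this post, not AILLA's official guidelines, and a real check would follow the preferred-format list of the receiving archive.

```python
from pathlib import Path

# Hypothetical example lists, NOT an official AILLA policy.
# Formats assumed here to be open and widely documented.
OPEN_EXTENSIONS = {".txt", ".csv", ".xml", ".wav", ".flac", ".png", ".tif", ".tiff"}
# Formats assumed here to depend on specific proprietary software.
REVIEW_EXTENSIONS = {".doc", ".psd", ".wma", ".indd"}


def classify_format(filename: str) -> str:
    """Return 'open', 'review', or 'unknown' based on the file extension."""
    ext = Path(filename).suffix.lower()
    if ext in OPEN_EXTENSIONS:
        return "open"
    if ext in REVIEW_EXTENSIONS:
        return "review"
    return "unknown"


def files_to_review(filenames: list[str]) -> list[str]:
    """List the files that are not in one of the assumed open formats."""
    return [name for name in filenames if classify_format(name) != "open"]


if __name__ == "__main__":
    collection = ["interview_01.WAV", "fieldnotes.doc", "wordlist.csv", "poster.psd"]
    for name in files_to_review(collection):
        print(f"Consider converting or documenting: {name}")
```

A file flagged here is not necessarily unusable. The point is only to prompt a conversation about converting it to an open format, or keeping an open-format copy alongside the original, before it is archived.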

Featured photo: Howard Reid’s collection of research materials from his ethnographic field work with the Hup in Brazil; photo: S. Kung

Susan Kung, manager of the Archive of the Indigenous Languages of Latin America (AILLA), kicked off work on the new National Endowment for the Humanities grant, Archiving Significant Collections of Endangered Languages: Two Multilingual Regions of Northwest South America (PD-260978-18, Co-PIs Patience Epps and Susan Kung) with a seven-week trip to the UK and France to acquire and begin the work of digitizing three of the eight collections included in the grant.

Kung’s work in the UK relied heavily on collaboration with the Endangered Languages Archive (ELAR) at the School of Oriental and African Studies (SOAS), University of London. ELAR, like AILLA, is a digital repository that specializes in providing online access to, and long-term preservation of, multimedia materials in and about endangered indigenous languages. Kung’s trip started in London with a series of meetings at SOAS, where she helped train researchers in language documentation, archiving, and preservation methodologies, and helped ELAR’s staff plan for its imminent data migration.

From there, Kung headed to Cajarc in the southwest of France to work with Dr. Elsa Gomez-Imbert, a retired researcher from the French National Research Center who conducted linguistic fieldwork in the Colombian Vaupés from 1973 to 2010 on several different languages of the region, including Tatuyo, Barasana, Karapana, Eduria, Bará, and Makuna, all of which are members of the Eastern Tukanoan language family.

Susan Kung & Elsa Gomez-Imbert in Cajarc, France; photo: S. Kung

Kung and Gomez-Imbert spent four days compiling metadata and creating an inventory of Gomez-Imbert’s audio tapes and slides, all of which Kung then transported to London for digitization at SOAS.

Cajarc, France; photo: S. Kung

Back in London, Kung spent a day doing similar work with Dr. Howard Reid, an anthropologist, documentary filmmaker for the BBC, and chair of the Royal Anthropological Institute’s Film Committee, who lived with the hunter-gatherer Hup people in the Amazon basin in 1974–76.

Susan Kung and Howard Reid in London

Kung finished up the acquisition part of her trip with four days of inventory and metadata work with Dr. Stephen Hugh-Jones, Emeritus Research Associate at the Cambridge University Department of Social Anthropology, at his office in King’s College, Cambridge. Hugh-Jones and his wife, Christine Hugh-Jones, lived with the Barasana people in the Colombian Vaupés in 1968–1971 and again in 1978–1979, along with their two young children on the second occasion. Over the course of 50 years, Hugh-Jones has worked with the Barasana, as well as the Bará, Eduria, Makuna, and Tatuyo people in the Colombian Amazon. His research has included ritual, symbolism and mythology, shamanism, kinship, architecture, barter and gift exchange, food and drugs, and ethno-education.

The Hugh-Jones collection consists of born-digital and analog (cassette and open reel) audio recordings, 45 field notebooks, manuscript transcriptions of recordings, photographs and negatives, and an unprecedented accumulation of indigenous artworks. Kung, along with Bernard Howard, the sound technician for the SOAS Linguistics Department, spent three weeks digitizing these collections at SOAS, where Howard concentrated on digitizing the 137 audio tapes (cassettes and open reels) and Kung focused on scanning slides and paper documents.

Bernard Howard, sound technician, SOAS, working with cassette tapes from the collection of Elsa Gomez-Imbert

When it was time for Kung to return to Austin in mid-October, she and Howard had completely finished digitizing two of the three collections—those of Elsa Gomez-Imbert and Howard Reid—and Kung had finished digitizing the indigenous art compiled by the Hugh-Joneses.

Before returning home, Kung returned Reid’s and Gomez-Imbert’s collections to them, and shipped the remainder of the Hugh-Jones collection to AILLA, where it will be digitized during this academic year and then returned to the Hugh-Joneses. Once all the digital files from all three collections have been curated in collaboration with Gomez-Imbert, Reid, and Hugh-Jones, they will be ingested into AILLA and made available for public viewing.

The Archive of the Indigenous Languages of Latin America (AILLA) has received a pilot grant from the Humanities Collections and Reference Resources program of the National Endowment for the Humanities. This grant will improve access to some of the archive’s thousands of audio recordings in indigenous languages by supporting pilot efforts to crowdsource the creation of digital texts for manuscript transcriptions and translations that accompany recordings already in AILLA’s collections. Specifically, the grant will support the transcription of materials in the Mixtec languages of Mexico that are included in the MesoAmerican Languages Collection of Kathryn Josserand. These materials include a very broad survey of the grammar and vocabulary of the Mixtec languages spoken in over 100 towns and villages of southern Mexico.

Transcription of Tehuelche, from the AILLA archive of Jorge Suárez

Digital transcriptions will improve users’ access to these materials and will also facilitate their reuse for humanistic and especially linguistic research studying the dialectology of the Mixtec languages, which, decades after these materials were collected, is still not completely understood. They will also contribute to research on the prehistory of the Mixtec-speaking people, who today number almost a half-million in Mexico. One component of the project will be the development of educational modules that will use the transcription task to teach lessons on linguistic transcription, language description, and historical linguistics. This pilot project will also allow AILLA to develop transcription workflows that can be applied to other significant collections of handwritten documents in the archive’s collections.

Pilot project will improve access to a collection of Mixtec audio recordings.

The National Endowment for the Humanities, created in 1965 as an independent federal agency, supports research and learning in history, literature, philosophy, and other areas of the humanities by funding selected, peer-reviewed proposals from around the nation. Additional information about the National Endowment for the Humanities and its grant programs is available at www.neh.gov.

For more information on the AILLA transcription project, contact Ryan Sullivant.

The National Endowment for the Humanities (NEH) has awarded a Documenting Endangered Languages Preservation Grant of $227,365 to Patience Epps and Susan Smythe Kung of the Archive of the Indigenous Languages of Latin America (AILLA) for support of their upcoming project entitled “Archiving Significant Collections of Endangered Languages: Two Multilingual Regions of Northwestern South America.”

The AILLA grant is one of 199 grants, totaling $18.6 million, announced by the NEH on April 9, 2018.

This is a three-year project that will gather together, curate, and digitize a set of eight significant collections of South American indigenous languages, the results of decades of research by senior scholars. The collections will be archived at AILLA, a digital repository dedicated to the long-term preservation of multimedia in indigenous languages. These materials constitute an important resource for further linguistic, ethnographic, and ethnomusicological research, and are of high value to community members and scholars. They include six legacy collections from the Upper Rio Negro region of the northwest Amazon (Brazil and Colombia), and two collections focused on Ecuadorian Kichwa, most notably the Cañar variety.

All of the languages concerned are endangered or vulnerable to varying degrees, and the collections are heavily focused on threatened forms of discourse, such as ritual speech and song. Of the Upper Rio Negro set, the collections of Elsa Gomez-Imbert, Stephen Hugh-Jones, and Arthur P. Sorensen, Jr., include the East Tukanoan languages Bará, Barasana, Eduria, Karapana, Tatuyo, Makuna, and Tukano. The collections of Howard Reid and Renato Athias are focused on Hup, while Reid’s collection also contains a few materials from two languages of the wider region, Nukak and Hotï (yua, isolate). Robin Wright’s collection involves Baniwa. Of the Ecuadorian Kichwa set, Judy Blankenship’s and Allison Adrian’s collections are both focused on Cañar Highland Kichwa, while Adrian’s also includes some material from Loja Highland Kichwa (qvj, Quechua).

The two regions targeted by these collections are highly significant for our understanding of language contact and diversity in indigenous South America. The multilingual Upper Rio Negro region, famous for the linguistic exogamy practiced by some of its peoples, has much to tell us about language contact and maintenance, while Ecuadorian Kichwa varieties can shed light on the dynamics of pre-Columbian language shift. These collections will be made accessible in AILLA in standard formats, and will provide a foundation for further study of these fascinating regions and multilingual dynamics.
