D-LUCEA: Curation of the UCU Accent Project Data

The UCU Accent Project was set up in 2010 to collect a wide variety of non-native and native accents of English in an environment where English is the lingua franca, namely an international liberal arts and sciences college in Utrecht in the Netherlands. The recordings were made longitudinally over the three years of undergraduate study, and four cohorts of students were recorded in total. This yielded over 1,000 speech recordings over a six-year period in which the development of both native and non-native English accents in a non-native environment can be examined. In order to facilitate sharing the data with the wider research community, the D-LUCEA project undertook to curate the data. For each recording, the relevant concomitant metadata was produced, giving information to users of the database about the speaker, the technical specifications, the kinds of speech material recorded, and so forth. The project was funded by CLARIN, and specific CLARIN tools for curation were made available to us, including the Component Metadata Infrastructure (CMDI). To date, all of the speech data has been processed such that the metadata is available, and research is already running on this corpus, on topics as varied as prosodic convergence, L1 phonetic drift and phone convergence. Further plans include work with speaker recognition, accent recognition and models of language learning such as Flege’s Speech Learning Model, the Critical Theory Hypothesis, and the Perceptual Assimilation Model.