The Voice Bank Corpus: Design, Collection and Data Analysis of a Large Regional Accent Speech Database

Publication Type

Journal Article

Abstract

The University of Edinburgh has started the development of a new speech database, the Voice Bank corpus, specifically designed for the creation of personalised synthetic voices for individuals with speech disorders. This corpus already constitutes the largest corpora of British English currently in existence, with more than 300 hours of recordings from approximately 500 healthy speakers. New recordings are continuously being made in order to get the best coverage of the different combinations of regional accents, social classes, age and gender across Britain. This paper describes the motivation and the processes involved in the design and recording of this corpus as well as some analysis of its content. The paper concludes with our future plans to further extend this corpus and to overcome its current limitations.