NON-NATIVE SPEECH CORPORA FOR THE DEVELOPMENT OF COMPUTER ASSISTED PRONUNCIATION TRAINING SYSTEMS

Although Automatic Speech Recognition (ASR) is now often incorporated in systems for Computer Assisted Language Learning (CALL), difficulties with automatic assessment and error detection have commonly hampered its use as a means of providing detailed, accurate feedback on specific pronunciation problems in Computer Assisted Pronunciation Training (CAPT) systems. Discussions in the literature have often addressed general aspects, such as the types of pronunciation errors that should be addressed and their degree of seriousness, while paying little attention to the fact that specific choices probably need to be made for specific combinations of L1 (the learner’s native language) and L2 (the target language). However, research on L2 speech production and perception (Flege, 1995; Best et al., 2001; Escudero, 2005) has provided ample evidence that learners with different L1s may experience different problems in acquiring the pronunciation of the L2, and it is now generally acknowledged that pronunciation acquisition by L2 learners is highly dependent on their L1 phonological system. Contrastive analysis of the phonological systems of L1 and L2 has proved insufficient for obtaining data about the types of learners’ mispronunciations (Cucchiarini et al., 2011), because it is based on a priori intuition and gives no information on the frequency and relevance of errors, nor on the influence that exposure to L2 input has on the correct acquisition of L2 sounds.
In previous research, we have developed and studied innovative ASR-based systems for e-Learning and e-Health (Strik, 2012). For these systems we developed novel speech technology for automatic speech recognition and error detection. In order to develop and optimize such speech technology, suitable training material is needed.
If information about learner errors is available, it can be used to design pronunciation activities, to evaluate learners’ pronunciation, and to provide meaningful feedback. In order to obtain quantitative information about the pronunciation difficulties of L2 learners, annotated non-native spoken corpora are needed.
In this paper we report on research that was carried out with this purpose in mind. We describe and compare the results of two studies aimed at compiling and investigating L2 speech corpora for the purpose of CAPT development for two distinct L1-L2 combinations: 1) L2 Dutch - L1 Spanish, and 2) L2 Spanish - L1 Japanese. Both studies focused on obtaining recurrent patterns and frequency lists of pronunciation errors that should be addressed when designing CAPT systems for these language pairs. Vowel confusion appeared to be the most frequent difficulty for Spanish learners of L2 Dutch, due to problems with vowel length, vowel height, and front rounding. Japanese speakers of L2 Spanish, on the other hand, showed a tendency towards vowel devoicing, confusion between liquid consonants, and syllabic reorganization by means of vowel epenthesis, most likely due to the phonotactic constraints of Japanese.
The consequence of these large differences is that CAPT systems for specific language pairs should be optimized in different ways. However, this raises methodological issues that need to be handled carefully. These issues will be discussed.