Jeju Island, Korea
October 4-8, 2004

Antoine Raux

Carnegie Mellon University, USA

This paper describes a method to improve speech recognition for
non-native speech in a spoken dialogue system. Starting from very
general rules describing possible vocalic substitutions, the frequency
of each substitution in different phonetic contexts is estimated on a
small set of recordings. The most frequently observed
substitutions are applied to the lexicon of the recognizer. Speakers in
the training set are automatically clustered according to their
preferred phonetic variants, and a specific lexicon is built for each
cluster. Acoustic adaptation is also performed on each cluster.
Experiments show that lexical adaptation provides a 16.4% relative
word error rate (WER) reduction over acoustic adaptation alone.
Lexical clustering can
further reduce WER if the system can reliably select the cluster best
matching each input utterance.
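The lexical adaptation step summarized above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the toy lexicon, the rule format (canonical vowel, realized vowel) with observed counts, and the frequency threshold are all assumptions made for the example; the paper additionally conditions substitution frequencies on phonetic context, which is omitted here for brevity.

```python
# Hypothetical sketch of frequency-based lexical adaptation: candidate
# vowel substitutions observed on a small adaptation set are applied to
# the recognizer lexicon when they occur frequently enough.
# All names, symbols, and the threshold are illustrative assumptions.

from collections import Counter

# Toy lexicon: word -> canonical phone sequence (ARPAbet-like symbols).
LEXICON = {
    "rice": ["R", "AY", "S"],
    "lease": ["L", "IY", "S"],
}

# Counts of vowel substitutions (canonical -> realized) observed on a
# small set of transcribed non-native recordings (made-up numbers).
observed = Counter({("AY", "AA"): 7, ("IY", "IH"): 9, ("AY", "EY"): 1})

MIN_COUNT = 5  # keep only frequently observed substitutions (assumed threshold)

def adapted_pronunciations(phones, rules):
    """Generate pronunciation variants by applying each frequent
    vowel substitution to the canonical phone sequence."""
    variants = {tuple(phones)}
    for (src, dst), count in rules.items():
        if count < MIN_COUNT:
            continue  # rare substitution: leave the lexicon unchanged
        for base in list(variants):
            if src in base:
                variants.add(tuple(dst if p == src else p for p in base))
    return sorted(variants)

# Build the adapted lexicon: each word keeps its canonical pronunciation
# and gains one variant per frequent substitution that applies to it.
adapted = {w: adapted_pronunciations(p, observed) for w, p in LEXICON.items()}
for word, prons in adapted.items():
    print(word, prons)
```

Because the rare variant ("AY", "EY") falls below the threshold, "rice" gains only the ("R", "AA", "S") variant; a per-cluster lexicon, as described in the abstract, would simply run this procedure on each speaker cluster's own substitution counts.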