LVCSR-BASED LANGUAGE IDENTIFICATION
Tanja Schultz, Ivica Rogina, Alex Waibel
Interactive Systems Laboratories
University of Karlsruhe (Germany)
Carnegie Mellon University (USA)
Published at ICASSP 96
Automatic language identification is an important problem in
building multilingual speech recognition and understanding
systems. While building a language identification module for four
languages, we studied how applying different levels of knowledge
sources (phonetic, phonotactic, lexical, and syntactic-semantic)
affects a large vocabulary continuous speech recognition (LVCSR)
based approach.
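The abstract does not spell out the scoring rule, so the following Python sketch is only an illustration under stated assumptions. It assumes that each language-specific LVCSR system returns one log score per knowledge-source level, that the levels are combined by a weighted sum, and that the language with the best combined score is chosen. The `Hypothesis` class, the weights, and all scores are hypothetical toy values, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping, Sequence, Tuple

# Knowledge-source levels named in the abstract (identifiers are hypothetical).
LEVELS = ("phonetic", "phonotactic", "lexical", "syntactic_semantic")


@dataclass
class Hypothesis:
    """Assumed output of one language-specific recognizer for one utterance."""
    language: str
    log_scores: Mapping[str, float]  # log score contributed by each knowledge-source level


def combined_score(hyp: Hypothesis, weights: Mapping[str, float]) -> float:
    """Weighted sum of per-level log scores (an assumed combination rule).
    Levels absent from `weights` are ignored, mimicking a system without them."""
    return sum(w * hyp.log_scores.get(level, 0.0) for level, w in weights.items())


def identify_language(hyps: Iterable[Hypothesis], weights: Mapping[str, float]) -> str:
    """Pick the language whose recognizer yields the highest combined score."""
    return max(hyps, key=lambda h: combined_score(h, weights)).language


def identification_rate(
    test_set: Sequence[Tuple[Sequence[Hypothesis], str]],
    weights: Mapping[str, float],
) -> float:
    """Fraction of utterances whose true language is selected."""
    correct = sum(identify_language(hyps, weights) == truth for hyps, truth in test_set)
    return correct / len(test_set)


# Toy scores for one German utterance (illustrative only, not from the paper).
hyps = [
    Hypothesis("German", {"phonetic": -1200.0, "phonotactic": -80.0,
                          "lexical": -40.0, "syntactic_semantic": -60.0}),
    Hypothesis("English", {"phonetic": -1190.0, "phonotactic": -95.0,
                           "lexical": -70.0, "syntactic_semantic": -90.0}),
]
acoustic_only = {"phonetic": 1.0}
all_levels = {level: 1.0 for level in LEVELS}

print(identify_language(hyps, acoustic_only))  # English (-1190 > -1200)
print(identify_language(hyps, all_levels))     # German (-1380 > -1445)
```

In this toy case the acoustic score alone favors English, while adding the phonotactic, lexical, and syntactic-semantic levels flips the decision to German, which is the kind of effect the knowledge-source comparison is meant to measure.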
The resulting language identification (LID) module can identify
the language of spontaneous speech input and can serve as a front-end
for our multilingual speech-to-speech translation system JANUS-II.
A comparison of five LID systems showed that incorporating
lexical and linguistic knowledge reduces the language identification
error in the two-language tests by up to 50%.
Based on these results, we built an LID module for German, English,
Spanish, and Japanese, which achieves an 84% identification rate on the
Spontaneous Scheduling Task (SST).