Rhodes, Greece
September 22-25, 1997

Viterbi Based Splitting of Phoneme HMM's

Luis Javier Rodriguez, Ines M. Torres

Continuous Speech Recognition (CSR) systems usually include large
sets of context-dependent units to model contextual variations in the
pronunciation of phones. The goal of this work was to obtain adequate
sets of sub-lexical models by using acoustic information but excluding
any previous phonological knowledge. At each iteration of a classical
Viterbi training scheme, each acoustic model was split into a set of
more accurate models. This approach was evaluated on a Spanish
acoustic-phonetic decoding task. The experimental results showed that
this approach yields recognition rates similar to those of classical
triphones.
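The abstract does not say how a model is split. The sketch below is only a hypothetical illustration of one Viterbi-based splitting iteration. It assumes 1-D features and replaces HMM likelihoods with squared error to a per-unit mean. It also assumes each unit is split in two by 2-means clustering of the means of its Viterbi-aligned segments. The names `forced_align`, `two_means` and `split_iteration`, and the `.0`/`.1` sub-unit suffixes, are invented for this example and are not the authors' method.

```python
from statistics import mean


def forced_align(frames, means):
    """Viterbi forced alignment of 1-D frames to a left-to-right unit sequence.

    Each unit is scored by squared error to its mean (an assumed stand-in for
    HMM log-likelihoods) and must cover at least one contiguous frame.
    Returns one (start, end) frame range per unit, end exclusive.
    """
    T, U = len(frames), len(means)
    if T < U:
        raise ValueError("need at least one frame per unit")
    INF = float("inf")
    cost = [[INF] * U for _ in range(T)]  # best cost with frame t in unit u
    back = [[0] * U for _ in range(T)]    # unit occupied at frame t - 1
    cost[0][0] = (frames[0] - means[0]) ** 2
    for t in range(1, T):
        for u in range(U):
            stay = cost[t - 1][u]
            enter = cost[t - 1][u - 1] if u > 0 else INF
            prev, best = (u, stay) if stay <= enter else (u - 1, enter)
            cost[t][u] = best + (frames[t] - means[u]) ** 2
            back[t][u] = prev
    # Backtrace from the last frame, which must lie in the last unit.
    labels, u = [0] * T, U - 1
    for t in range(T - 1, -1, -1):
        labels[t] = u
        u = back[t][u]
    # Turn the per-frame unit indices into contiguous segments.
    segs, start = [], 0
    for t in range(1, T + 1):
        if t == T or labels[t] != labels[t - 1]:
            segs.append((start, t))
            start = t
    return segs


def two_means(values, iters=10):
    """1-D k-means with k=2; returns the two centroids (low, high)."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        a = [v for v in values if abs(v - lo) <= abs(v - hi)]
        b = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not a or not b:
            break  # degenerate split: keep current centroids
        lo, hi = mean(a), mean(b)
    return lo, hi


def split_iteration(models, corpus):
    """One hypothetical splitting pass.

    models: dict unit name -> mean; corpus: list of (frames, unit-name list).
    Returns (new_models, new_corpus) with each unit replaced by two sub-units
    and every aligned segment relabelled with its nearer sub-unit.
    """
    seg_means = {name: [] for name in models}
    aligned = []
    for frames, units in corpus:
        segs = forced_align(frames, [models[u] for u in units])
        ms = [mean(frames[s:e]) for s, e in segs]
        aligned.append(ms)
        for u, m in zip(units, ms):
            seg_means[u].append(m)
    new_models = {}
    for name, vals in seg_means.items():
        c0, c1 = two_means(vals or [models[name]])
        new_models[name + ".0"], new_models[name + ".1"] = c0, c1
    new_corpus = []
    for (frames, units), ms in zip(corpus, aligned):
        relabelled = [
            u + ".0" if abs(m - new_models[u + ".0"]) <= abs(m - new_models[u + ".1"])
            else u + ".1"
            for u, m in zip(units, ms)
        ]
        new_corpus.append((frames, relabelled))
    return new_models, new_corpus
```

Repeating `split_iteration` (followed by re-estimation) mimics the idea of refining models at each training iteration. A real system would also need a stopping or merging criterion, which this toy omits.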