EUROSPEECH '97
5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

Continuous Speech Recognition Using a Context Sensitive ANN and HMM2s

Nicolas Pican, Jean-Francois Mari, Dominique Fohr

CRIN-CNRS & INRIA Lorraine,
Vandoeuvre-les-Nancy, France

The phonetic context has a large effect on phonemes in a continuous
speech signal [1]. Recognition systems that model allophones using
context-dependent Hidden Markov Models have therefore been
implemented [4]. Second-order HMMs (HMM2s) segment speech well in the
temporal domain [6][7] but have some difficulty in recognition because
Maximum Likelihood Estimation (MLE) training is not discriminative,
whereas discrimination is one of the strengths of Artificial Neural
Network (ANN) models. In the last three years we have developed a new ANN model
named OWE (Orthogonal Weight Estimator) [10][11]. The OWE is an ANN
that classifies an input pattern according to its contextual
environment. This architecture tackles the problem of training
context-dependent behaviour. Roughly, it is based on a main MLP
(Multilayer Perceptron) in which the value of each synaptic weight is
estimated by another MLP (an OWE) with respect to a representation of
the context. In this paper, we
present two hybrid systems for phoneme recognition. In both systems, 48
context-independent HMM2s segment the input signal. In the first
system, the OWE labels the segments; in the second system, the OWE
outputs are the input frames of the HMM2s. Experiments on TIMIT yield
accuracies ranging from 56% to 67% on the 48-phoneme set.
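
The OWE principle described above, a main MLP whose synaptic weights are each produced by a small context-driven MLP, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the layer sizes, the tanh and sigmoid activations, the two-dimensional one-hot context vectors and the random untrained parameters are all assumptions made for illustration, and training is not shown.

```python
import math
import random


def estimate_weight(estimator, context):
    """Hypothetical weight estimator: a one-hidden-layer MLP that maps
    a context vector to the value of a single main-network weight."""
    w_hidden, b_hidden, w_out, b_out = estimator
    hidden = [
        math.tanh(sum(w * c for w, c in zip(row, context)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def init_estimator(rng, context_dim, hidden_dim):
    """Random (untrained) parameters for one weight estimator."""
    w_hidden = [
        [rng.uniform(-0.5, 0.5) for _ in range(context_dim)]
        for _ in range(hidden_dim)
    ]
    b_hidden = [0.0] * hidden_dim
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
    return (w_hidden, b_hidden, w_out, 0.0)


def init_layer(rng, n_in, n_out, context_dim, hidden_dim):
    """One estimator per connection of the main MLP layer; the extra
    input (n_in + 1) stands for the bias connection."""
    return [
        [init_estimator(rng, context_dim, hidden_dim) for _ in range(n_in + 1)]
        for _ in range(n_out)
    ]


def forward(layers, x, context):
    """Forward pass of the main MLP. Every weight is recomputed from the
    context by its own estimator before it is applied to the input."""
    activation = list(x)
    for layer in layers:
        extended = activation + [1.0]  # constant input for the bias weight
        activation = [
            1.0 / (1.0 + math.exp(-sum(
                estimate_weight(est, context) * a
                for est, a in zip(unit, extended)
            )))
            for unit in layer
        ]
    return activation


# Assumed toy sizes: 3 inputs, 4 hidden units, 2 output classes.
rng = random.Random(0)
layers = [
    init_layer(rng, n_in=3, n_out=4, context_dim=2, hidden_dim=3),
    init_layer(rng, n_in=4, n_out=2, context_dim=2, hidden_dim=3),
]
x = [0.2, -0.1, 0.7]
context_a = [1.0, 0.0]  # assumed one-hot code for one phonetic context
context_b = [0.0, 1.0]  # assumed one-hot code for another context
out_a = forward(layers, x, context_a)
out_b = forward(layers, x, context_b)
```

In this sketch the same input frame `x` gives different class scores under `context_a` and `context_b`, because the context changes every weight of the main network rather than being appended to the input.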