Studies have demonstrated that articulatory information can model speech
variability effectively and can potentially help to improve speech recognition
performance. Most studies involving articulatory information have focused
on estimating it effectively from speech, and few studies have actually used
such features for speech recognition. Speech recognition studies using
articulatory information have mostly been confined to digit or medium-vocabulary
speech recognition, and efforts to incorporate them into large-vocabulary systems
have been limited. We present a neural network model that estimates articulatory
trajectories from speech signals; the model was trained on synthetic speech
signals generated by Haskins Laboratories’ task-dynamic model of speech
production. The trained model was then applied to natural speech, and the
estimated articulatory trajectories were used in conjunction with
standard cepstral features to train acoustic models for large-vocabulary
recognition systems. Two different large-vocabulary English datasets were used in
the experiments reported here. Results indicate that employing articulatory
information improves speech recognition performance not only under clean
conditions but also in the presence of background noise. Perceptually motivated
robust features were also explored in this study, and the best performance was
obtained when systems based on articulatory, standard cepstral and perceptually
motivated features were all combined.
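The following is a minimal, hypothetical sketch of the feature pipeline described above, not the model used in this work. It assumes a one-hidden-layer feed-forward regressor (tanh hidden units, linear outputs) that maps a spliced window of cepstral frames to a vector of articulatory trajectory values. It also assumes the estimated trajectories are simply appended to each cepstral frame before acoustic-model training. The names `splice`, `TinyMLP` and `combine_features`, the 13-dimensional cepstra, the ±4-frame context and the eight outputs are all illustrative choices. The weights are random rather than trained on synthetic speech.

```python
import math
import random
from typing import List

Vector = List[float]


def splice(frames: List[Vector], t: int, context: int) -> Vector:
    """Concatenate frames t-context .. t+context into one input vector.

    Frames outside the utterance are replaced by the nearest edge frame.
    """
    n = len(frames)
    out: Vector = []
    for k in range(t - context, t + context + 1):
        out.extend(frames[min(max(k, 0), n - 1)])
    return out


class TinyMLP:
    """Hypothetical one-hidden-layer regressor: tanh hidden layer, linear output.

    Weights are random here; a real estimator would be trained on synthetic
    speech paired with the articulatory trajectories that generated it.
    """

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x: Vector) -> Vector:
        # Hidden activations: tanh(W1 x + b1)
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        # Linear output layer: W2 h + b2 (one value per articulatory trajectory)
        return [sum(w * hi for w, hi in zip(row, h)) + b
                for row, b in zip(self.w2, self.b2)]


def combine_features(cepstra: List[Vector], net: TinyMLP, context: int) -> List[Vector]:
    """Append the estimated articulatory vector to each cepstral frame."""
    combined = []
    for t in range(len(cepstra)):
        estimated = net.forward(splice(cepstra, t, context))
        combined.append(list(cepstra[t]) + estimated)
    return combined


# Usage with dummy data: 20 frames of 13 cepstral coefficients.
n_ceps, context, n_artic = 13, 4, 8
cepstra = [[0.1 * (t + d) for d in range(n_ceps)] for t in range(20)]
net = TinyMLP(n_in=n_ceps * (2 * context + 1), n_hidden=32, n_out=n_artic)
feats = combine_features(cepstra, net, context)
print(len(feats), len(feats[0]))  # 20 21
```

Appending the trajectories frame by frame keeps the combined stream synchronous with the cepstral features, so it can feed a standard acoustic-model trainer unchanged. The actual estimator architecture and combination strategy used for the reported systems may differ.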