Abstract

Phoneme posterior probabilities estimated using Multi-Layer Perceptrons
(MLPs) are extensively used both as acoustic scores and as
features for speech recognition. In this paper, we explore a different
application of these posteriors: as phonetic event detectors for
speech recognition. We show how these detectors can be built to reliably
capture phonetic events in the acoustic signal by integrating
both acoustic and phonetic information about sound classes. These
event detectors are used along with Segmental Conditional Random
Fields (SCRFs) to improve the performance of speech recognition
systems on the Broadcast News task.