A SIMULATION STUDY ON THE USEFULNESS OF BROAD PHONETIC
CLASSIFICATION IN AUTOMATIC SPEECH RECOGNITION.

Gertjan Vernooy
Gerrit Bloothooft
Yvonne van Holsteijn

Research Institute of Language and Speech
University of Utrecht, The Netherlands

Investigations into the use of broad phonetic classes in automatic
speech recognition systems are mostly limited to the level of the broad categories
themselves. Results are reported in terms of the number of cohorts, maximum cohort size,
expected cohort size, etc. However, the aim of automatic speech recognition is not to
identify broad phonetic classes but individual words. A simulation study, using a
12113-word lexicon of high-frequency Dutch words, was conducted to make an inventory of the
additional acoustic information needed to identify all words of the lexicon uniquely
after broad phonetic classification.

After broad classification the lexicon is divided into a number of
cohorts; the words in each cohort share a unique sequence of global acoustic labels. When more than one
word is present in a cohort, some further acoustic processing is needed to identify each
word separately. For this, we have to refine one or more of the broad phonetic labels into
finer phonetic categories (possibly phonemes). We found that a complete identification of
all words in a cohort can be obtained after refinement of different subsets of the labels
of the cohort, i.e. there exist several different refinement strategies which are all
adequate. In our investigation we examined two criteria for making the best choice among
these strategies. The first criterion was to minimize the number of different acoustical
refinements, irrespective of the type of refinement (including, for instance, an /n/-/m/
distinction). The second criterion was to arrive at acoustical refinements which are
relatively simple to perform (for instance, choosing to resolve an /a/-/u/ and a /b/-/d/
distinction rather than the distinction between /n/ and /m/). The latter criterion made
use of data on perceptual confusions between phonemes.
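
The notions of cohort and adequate refinement described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the broad classes in BROAD_CLASS, the six-word toy lexicon (one character per phoneme), and the function names are hypothetical and are not the classes, lexicon, or procedure used in the study. The sketch groups words by their broad label sequence and then lists, for one cohort, the smallest sets of label positions that must be refined to phonemes to separate all words. It covers only the counting of positions and does not model the two selection criteria.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical broad classes, chosen only for this illustration.
BROAD_CLASS = {
    "p": "STOP", "b": "STOP", "t": "STOP", "d": "STOP", "k": "STOP",
    "m": "NAS", "n": "NAS",
    "a": "VOW", "u": "VOW", "i": "VOW",
    "s": "FRIC", "f": "FRIC",
}


def broad_key(word):
    """Map a word (one character per phoneme) to its broad label sequence."""
    return tuple(BROAD_CLASS[phoneme] for phoneme in word)


def build_cohorts(lexicon):
    """Group words that share the same sequence of broad labels."""
    cohorts = defaultdict(list)
    for word in lexicon:
        cohorts[broad_key(word)].append(word)
    return dict(cohorts)


def adequate_refinements(cohort):
    """Return all smallest sets of label positions whose refinement to
    phonemes makes every word in the cohort unique.

    Returns [()] for a single-word cohort (nothing to refine) and []
    if the cohort contains identical phoneme strings.
    """
    length = len(cohort[0])
    for size in range(length + 1):
        found = [
            positions
            for positions in combinations(range(length), size)
            # Project each word onto the refined positions and check
            # that no two words collapse onto the same projection.
            if len({tuple(w[i] for i in positions) for w in cohort}) == len(cohort)
        ]
        if found:
            return found
    return []


# Toy lexicon (hypothetical, not taken from the Dutch lexicon of the study).
LEXICON = ["bad", "dab", "bud", "mat", "nat", "sum"]
cohorts = build_cohorts(LEXICON)

for key, words in cohorts.items():
    print(key, words, adequate_refinements(words))
```

In this toy example the cohort containing "bad", "dab" and "bud" can be resolved either by refining the first and second label or by refining the second and third label, which shows how more than one adequate strategy can exist for the same cohort.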

On the basis of these criteria we developed an iterative procedure
which resulted in an optimal hierarchy of acoustical information needed to identify each
word of the lexicon uniquely. We will present the resulting broad phonetic classes and the
additional acoustic refinements needed to identify all words in the cohorts defined by
these classes. This information may be useful for the design of an acoustic-phonetic
speech recognition system.