Abstract

A number of alternative spectral representations have been proposed for vowel spectra [see H. Hermansky, J. Acoust. Soc. Am. 87, 1738–1752 (1990)]. To evaluate the perceptual relevance of some of these, 972 vowels were synthesized. Each stimulus was 115 ms long with a falling F0 contour (125–100 Hz). F1 ranged (in 0.5-Bark steps) from 250 to 760 Hz, F2 from 750 to 2260 Hz, and F3 from 1360 to 3080 Hz. F4 and F5 were fixed at 3500 and 4500 Hz, respectively. (Constraints were placed on formant separations to ensure relatively natural stimuli.) Fifteen speakers of Western Canadian English categorized the stimuli as the vowels /i, ɪ, e, ɛ, æ, ʌ, ɒ, o, ʊ, u, ɝ/. Preliminary results indicate that, while the nominal synthesis formant frequencies provide a relatively good fit to the data, alternative representations such as cepstral coefficients based on Hermansky’s PLP analysis may provide moderate improvements in fit. However, linear transformations of the PLP cepstra correlate strongly with formant frequencies [similar to the correlations noted by D. Broad and F. Clermont, J. Acoust. Soc. Am. 86, 2013–2017 (1989)]. [Work supported by SSHRC.]
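
As an illustration of the Bark-spaced formant grid described above, the following is a minimal sketch rather than the procedure actually used in the study. The abstract does not say which Hz-to-Bark conversion was applied, so the Traunmüller (1990) approximation is assumed here. The helper names (hz_to_bark, bark_to_hz, bark_grid) are hypothetical, and the formant-separation constraints are not modeled. Because the stated endpoints need not lie a whole number of 0.5-Bark steps apart under this formula, the sketch uses the nearest whole number of steps, so the spacing is only approximately 0.5 Bark.

```python
import numpy as np


def hz_to_bark(f_hz):
    """Hz to Bark via the Traunmueller (1990) approximation (an assumed choice)."""
    f_hz = np.asarray(f_hz, dtype=float)
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z):
    """Algebraic inverse of hz_to_bark."""
    z = np.asarray(z, dtype=float)
    return 1960.0 * (z + 0.53) / (26.28 - z)


def bark_grid(lo_hz, hi_hz, step_bark=0.5):
    """Frequencies (Hz) equally spaced in Bark between lo_hz and hi_hz.

    The number of steps is the whole number nearest to the Bark span
    divided by step_bark, so the actual spacing is close to, but not
    necessarily exactly, step_bark.
    """
    z_lo, z_hi = hz_to_bark(lo_hz), hz_to_bark(hi_hz)
    n_steps = max(1, int(np.round((z_hi - z_lo) / step_bark)))
    return bark_to_hz(np.linspace(z_lo, z_hi, n_steps + 1))


# F1 range quoted in the abstract; F2 and F3 would be handled the same way.
f1_grid = bark_grid(250.0, 760.0)
print(np.round(f1_grid))
```

A full stimulus set would take the Cartesian product of the F1, F2, and F3 grids and then discard combinations that violate the separation constraints, which the abstract mentions but does not specify.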