Abstract

The purpose of this paper is to provide insight into how speech is processed by the auditory system, by quantifying the nature of nonsense speech sound confusions. (1) The Miller and Nicely [J. Acoust. Soc. Am. 27(2), 338–352 (1955)] confusion matrix (CM) data are analyzed by plotting the CM elements as a function of the signal-to-noise ratio (SNR). This allows for the robust clustering of perceptual feature (event) groups, not robustly defined by a single CM table, where clusters depend on the sound order. (2) The SNR is then re-expressed as an articulation index (AI), and used as the independent variable. The normalized log scores then become linear functions of AI, on log-error versus AI plots. This linear dependence may be interpreted as an extension of the band-independence model of Fletcher. (3) The model formula for the average score for the finite-alphabet case is then modified to include the effect of entropy. Due to the grouping of sounds with increased SNR (and AI), the sound-group entropy plays a key role in this performance measure. (4) A parametric model for the confusions is then described, which characterizes the confusions between competing sounds within a group.