The multidimensional phoneme identification model is applied to consonant confusion matrices obtained from 28 postlingually deafened cochlear implant users. This model predicts consonant matrices based on these subjects’ ability to discriminate a set of postulated spectral, temporal, and amplitude speech cues as presented to them by their device. The model produced confusion matrices that matched many aspects of individual subjects’ consonant matrices, including information transfer for the voicing, manner, and place features, despite individual differences in age at implantation, implant experience, device and stimulation strategy used, as well as overall consonant identification level. The model was able to match the general pattern of errors between consonants, but not the full complexity of all consonant errors made by each individual. The present study represents an important first step in developing a model that can be used to test specific hypotheses about the mechanisms cochlear implant users employ to understand speech.
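The feature information transfer mentioned above can be computed from a consonant confusion matrix in the Miller and Nicely style. A minimal sketch (this is not the authors' model; the function name, binary feature coding, and example matrices are illustrative):

```python
import numpy as np

def feature_info_transfer(confusions, feature):
    """Relative information transfer for a binary articulatory feature
    (e.g. voicing), Miller & Nicely style. `confusions` is an (n, n)
    count matrix (rows = presented consonant, columns = response) and
    `feature` assigns each consonant a 0/1 feature value."""
    confusions = np.asarray(confusions, dtype=float)
    feature = np.asarray(feature)
    # Collapse the consonant matrix into a 2x2 feature confusion matrix.
    m = np.zeros((2, 2))
    for i in (0, 1):
        for j in (0, 1):
            m[i, j] = confusions[np.ix_(feature == i, feature == j)].sum()
    p = m / m.sum()
    px, py = p.sum(axis=1), p.sum(axis=0)  # stimulus / response marginals
    nz = p > 0
    # Transmitted information T(x; y) in bits.
    T = (p[nz] * np.log2(p[nz] / np.outer(px, py)[nz])).sum()
    # Stimulus feature entropy H(x); relative transfer is T / H(x).
    Hx = -(px[px > 0] * np.log2(px[px > 0])).sum()
    return T / Hx if Hx > 0 else 0.0
```

A perfectly identified consonant set transfers all feature information (value 1.0), while a fully random response pattern transfers none (0.0).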

Key features of the voice—fundamental frequency (F0) and formant frequencies (Fn)—can vary extensively among individuals. Some of this variation might cue fitness-related, biosocial dimensions of speakers. Three experiments tested the independent, joint, and relative effects of F0 and Fn on listeners’ assessments of the body size, masculinity (or femininity), and attractiveness of male and female speakers. Experiment 1 replicated previous findings concerning the joint and independent effects of F0 and Fn on these assessments. Experiment 2 established frequency discrimination thresholds (or just-noticeable differences, JNDs) for both vocal features to use in subsequent tests of their relative salience. JNDs for F0 and Fn were consistently in the range of 5%–6% for each sex. Experiment 3 put the two voice features in conflict by equally discriminable amounts and found that listeners consistently tracked Fn over F0 in rating all three dimensions. Several non-exclusive possibilities for this outcome are considered, including that voice Fn provides more reliable cues to one or more dimensions and that listeners’ assessments of the different dimensions are partially interdependent. The results highlight the value of first establishing JNDs for discrimination of specific features of natural voices in future work examining their effects on voice-based social judgments.
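The Experiment 3 manipulation, shifting F0 and Fn in opposite directions by equally discriminable amounts, reduces to scaling both cues by a common fractional JND. A sketch assuming a 5.5% JND (within the 5%–6% range reported above); the function and baseline frequencies are illustrative, not the study's stimuli:

```python
# Fractional JND assumed at 5.5%, within the 5%-6% range reported for
# both F0 and Fn; the baseline values below are illustrative only.
JND = 0.055

def conflict_voice(f0_hz, formants_hz, n_jnd=1):
    """Shift F0 and the formants in opposite directions by n_jnd JNDs,
    so the two cues disagree by equally discriminable amounts."""
    f0_shifted = f0_hz * (1 + JND) ** n_jnd                     # F0 raised
    fn_shifted = [f * (1 - JND) ** n_jnd for f in formants_hz]  # Fn lowered
    return f0_shifted, fn_shifted

f0, fn = conflict_voice(120.0, [500.0, 1500.0, 2500.0])
```

With one JND per cue, a 120 Hz F0 rises to 126.6 Hz while each formant falls by 5.5%, letting listeners' ratings reveal which cue they track.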

Electrical field interaction caused by current spread in a cochlear implant was modeled explicitly in an acoustic model (the SPREAD model) presented to six listeners with normal hearing. The typical processing of cochlear implants was modeled more closely than in traditional acoustic models by careful selection of parameters related to current spread, or parameters that could amplify the electrical field interactions caused by current spread. These parameters were the insertion depth, electrode spacing, electrical dynamic range, and dynamic range compression function. The hypothesis was that current spread could account for the asymptote in performance observed at around seven stimulation channels in speech intelligibility experiments in a number of cochlear implant studies. Speech intelligibility for sentences, vowels, and consonants at three noise levels (SNR of +15 dB, +10 dB, and +5 dB) was measured as a function of the number of spectral channels (4, 7, and 16). The SPREAD model appears to explain the asymptote in speech intelligibility at seven channels for all noise levels and all speech material used in this study. It is shown that the compressive amplitude mapping used in cochlear implants can have a detrimental effect on the number of effective channels.
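The compressive acoustic-to-electric amplitude mapping identified above as detrimental can be sketched with a logarithmic loudness-growth function of the kind used in cochlear implant processing; the parameter values below are illustrative, not those of the SPREAD model:

```python
import numpy as np

def acoustic_to_electric(level_db, a_min=30.0, a_max=70.0, c=500.0):
    """Map an acoustic level (dB) onto a normalized 0-1 electrical range
    with a logarithmic loudness-growth function. Strong compression pushes
    mid-range inputs high into the electrical range, which can blur
    amplitude contrasts between channels."""
    # Normalize the acoustic input dynamic range to [0, 1].
    x = np.clip((level_db - a_min) / (a_max - a_min), 0.0, 1.0)
    # Logarithmic compression controlled by steepness parameter c.
    return np.log(1.0 + c * x) / np.log(1.0 + c)
```

With these values, the 30 dB floor maps to 0, the 70 dB ceiling to 1, and the 50 dB midpoint already maps near 0.9, illustrating how compression squeezes most of the acoustic range into the top of the electrical range.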

When a target-speech/masker mixture is processed with the signal-separation technique known as the ideal binary mask (IBM), intelligibility of the target speech is remarkably improved in both normal-hearing and hearing-impaired listeners. Intelligibility of speech can also be improved by filling speech gaps with unmodulated broadband noise. This study investigated whether intelligibility of target speech in the IBM-treated target-speech/masker mixture can be further improved by adding a broadband-noise background. The results show that following the IBM manipulation, which remarkably released target speech from speech-spectrum-noise, foreign-speech, or native-speech masking (experiment 1), adding a broadband-noise background with a signal-to-noise ratio no less than 4 dB significantly improved intelligibility of target speech when the masker was either noise (experiment 2) or speech (experiment 3). The results suggest that because the added noise background fills in the silent regions in the time-frequency domain of the IBM-treated target-speech/masker mixture, abrupt transient changes in the mixture are smoothed and the perceived continuity of target-speech components is enhanced, leading to improved target-speech intelligibility. The findings are useful for advancing computational auditory scene analysis, hearing-aid/cochlear-implant designs, and the understanding of speech perception under “cocktail-party” conditions.
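The IBM manipulation and the noise-background fill can be sketched on magnitude time-frequency representations. A minimal sketch, assuming a 0 dB local criterion and an illustrative noise gain (neither taken from the study):

```python
import numpy as np

def ideal_binary_mask(target_tf, masker_tf, lc_db=0.0):
    """IBM: keep only the time-frequency units where the local
    target-to-masker energy ratio exceeds the local criterion lc_db."""
    ratio_db = 10.0 * np.log10((target_tf ** 2 + 1e-12) /
                               (masker_tf ** 2 + 1e-12))
    return (ratio_db > lc_db).astype(float)

def ibm_with_noise_floor(mixture_tf, mask, noise_tf, noise_gain):
    """Apply the IBM, then add a scaled broadband-noise background so the
    masked-out (silent) regions are partially filled in rather than
    left abruptly empty."""
    return mixture_tf * mask + noise_gain * noise_tf
```

Units dominated by the target survive the mask; masked-out units, instead of dropping to silence, sit on the low-level noise floor, which is the manipulation the abstract links to enhanced perceived continuity.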

Voice command sound pressure levels (SPLs) were recorded at distances up to 1500 m. Received SPLs were related to the meteorological condition during sound propagation and compared with the outdoor sound propagation standard ISO 9613-2. Intelligibility of the received signals was calculated using ANSI S3.5. Intelligibility results for the present voice command indicate that the meteorological condition has little to no effect on intelligibility when the signal-to-noise ratio (SNR) is low (<−9 dB) or high (>0 dB); in these two cases the signal is reliably unintelligible or intelligible, respectively. However, at moderate SNRs, variations in received SPL can cause a fully intelligible voice command to become unintelligible, depending on the meteorological condition along the sound propagation path. These changes in voice command intelligibility often occur on time scales as short as minutes during upward-refracting conditions, typically found above ground during the day or upwind of a sound source. Reliably predicting the intelligibility of a voice command in a moderate-SNR environment can be challenging due to the inherent variability imposed by sound propagation through the atmosphere.
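The three SNR regimes described above can be expressed as a simple classifier. The boundary values (−9 dB and 0 dB) come directly from the abstract, while the function name and interface are hypothetical:

```python
def intelligibility_regime(received_spl_db, noise_spl_db):
    """Classify a received voice command by the SNR boundaries reported
    above: below about -9 dB it is unintelligible regardless of meteorology,
    above 0 dB it is intelligible, and in between the meteorological
    condition along the propagation path decides."""
    snr_db = received_spl_db - noise_spl_db
    if snr_db < -9.0:
        return "unintelligible"
    if snr_db > 0.0:
        return "intelligible"
    return "meteorology-dependent"
```

Only in the middle band do minute-scale meteorological fluctuations in received SPL flip a command between intelligible and unintelligible.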