Cochlear hearing loss is often associated with a loss of basilar-membrane (BM) compression, which in turn may contribute to degraded processing of suprathreshold stimuli. Behavioral estimates of compression may therefore be useful as long as they are valid over a wide range of levels and frequencies. Additivity of forward masking (AFM) may provide such a measure, but research to date lacks normative data from normal-hearing (NH) listeners at high sound levels, which is necessary to evaluate data from hearing-impaired (HI) listeners. The present study measured AFM in six NH listeners for signal frequencies of 500, 1500, and 4000 Hz in the presence of background noise, designed to elevate signal thresholds to levels similar to those experienced by HI listeners. Results consistent with compressive BM responses were found for all six listeners at 500 Hz, five listeners at 1500 Hz, but only two listeners at 4000 Hz. Further measurements in the absence of background noise also indicated a lack of consistent compression at 4000 Hz at higher signal levels, in contrast to earlier results collected at lower levels. A better understanding of this issue will be required before AFM can be used as a general behavioral estimate of BM compression.

This paper describes a neurocognitive model of pitch segregation in which it is proposed that recognition mechanisms initiate early in auditory processing pathways, so that long-term memory templates may be employed to segregate and integrate auditory features. In this model, neural representations of pitch height are primed by the location and pattern of excitation across auditory filter channels in relation to long-term memory templates for common stimuli. Since waveform-driven pitch mechanisms may produce information at multiple frequencies for tonal stimuli, pitch priming was assumed to include competitive inhibition that would allow only one pitch estimate at any time. Consequently, concurrent pitch information must be relayed to short-term memory via a parallel mechanism that employs pitch information contained in the long-term memory template of the chord. Pure tones, harmonic complexes, and two-pitch chords of harmonic complexes were correctly classified by correlating templates comprising auditory nerve excitation and off-frequency inhibition with the excitation patterns of the stimuli. The model then replicated behavioral data for pitch matching of concurrent vowels. Comparison of model outputs to the behavioral data suggests that an inability to recognize a stimulus was associated with poor pitch segregation due to the use of inappropriate pitch priming strategies.

Two experiments investigated listeners’ ability to use a difference of two semitones in fundamental frequency (F0) to segregate a target voice from masking harmonic complex tones with speech-like spectral profiles. Masker partials were in random phase (experiment 1) or in sine phase (experiment 2), and stimuli were presented over headphones. The harmonicity of the target and of the masker was each distorted by F0 modulation and reverberation: the F0 of each source was either monotonized or modulated by 2 semitones at 5 Hz, manipulated factorially, and all sources were presented from the same location in a virtual room with controlled reverberation, assigned factorially to each source. In both experiments, speech reception thresholds increased by about 2 dB when the F0 of the masker was modulated and increased by about 6 dB when, in addition to F0 modulation, the masker was reverberant. Masker partial phases did not influence the results. The results suggest that F0 segregation relies upon the masker’s harmonicity, which is disrupted by rapid modulation; this effect is compounded by reverberation. In addition, F0 segregation was found to be independent of the depth of masker envelope modulations.
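The F0 manipulation described above can be sketched numerically. The base F0 of 120 Hz and the reading of the 2-semitone depth as a ±1-semitone excursion around the base are illustrative assumptions, not values from the study.

```python
import numpy as np

# Sketch of the two F0 conditions: monotonized (constant F0) versus
# sinusoidally modulated at 5 Hz. The 120-Hz base F0 and the +/-1-semitone
# excursion are assumptions made for this illustration only.
fs = 16000
t = np.arange(fs) / fs                       # 1 s of time samples

f0_base = 120.0                              # Hz (hypothetical voice F0)
f0_monotone = np.full_like(t, f0_base)       # "monotonized" condition
f0_modulated = f0_base * 2.0 ** (np.sin(2 * np.pi * 5.0 * t) / 12.0)

# The modulated trajectory sweeps one semitone above and below the base F0.
print(f0_modulated.min(), f0_modulated.max())
```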

In many experiments on comodulation masking release (CMR), both across- and within-channel cues may be available. This makes it difficult to determine the mechanisms underlying CMR. The present study compared CMR in a flanking-band (FB) paradigm for a situation in which only across-channel cues were likely to be available [FBs placed distally from the on-frequency band (OFB)] and a situation where both across- and within-channel cues might have been available (proximally spaced FBs, for which larger CMRs have previously been observed). The use of across-channel cues was selectively disrupted using a manipulation of auditory grouping factors, following Dau et al. [J. Acoust. Soc. Am. 125, 2182–2188 (2009)], and the use of within-channel cues was selectively disrupted using a manipulation called “OFB reversal,” following Goldman et al. [J. Acoust. Soc. Am. 129, 3181–3193 (2011)]. The auditory grouping manipulation eliminated CMR for the distal-FB configuration and reduced CMR for the proximal-FB configuration. This may indicate that across-channel cues are available for proximal FB placement. CMR for the proximal-FB configuration persisted when both manipulations were used together, which suggests that OFB reversal does not entirely eliminate within-channel cues.

Spectrally shaped steady noise is commonly used as a masker of speech. The effects of inherent random fluctuations in amplitude of such a noise are typically ignored. Here, the importance of these random fluctuations was assessed by comparing two cases. For one, speech was mixed with steady speech-shaped noise and N-channel tone vocoded, a process referred to as signal-domain mixing (SDM); this preserved the random fluctuations of the noise. For the second, the envelope of speech alone was extracted for each vocoder channel and a constant was added corresponding to the root-mean-square value of the noise envelope for that channel. This is referred to as envelope-domain mixing (EDM); it removed the random fluctuations of the noise. Sinusoidally modulated noise and a single talker were also used as backgrounds, with both SDM and EDM. Speech intelligibility was measured for N = 12, 19, and 30, with the target-to-background ratio fixed at −7 dB. For SDM, performance was best for the speech background and worst for the steady noise. For EDM, this pattern was reversed. Intelligibility with steady noise was consistently very poor for SDM, but near-ceiling for EDM, demonstrating that the random fluctuations in steady noise have a large effect.

Spectral density (D), defined as the number of partials comprising a sound divided by its bandwidth, has been suggested as a cue for the identification of the size and shape of sound sources. Few data are available, however, on the ability of listeners to discriminate differences in spectral density. In a cue-comparison, forced-choice procedure with feedback, three highly practiced listeners discriminated differences in the spectral density of multitone complexes varying in bandwidth (W = 500–1500 Hz), center frequency (fc = 500–2000 Hz), and number of tones (N = 6–31). To reduce extraneous cues for discrimination, the overall level of the complexes was roved, and the frequencies were drawn at random uniformly over a fixed bandwidth and center frequency for each presentation. Psychometric functions were obtained relating percent correct discrimination to ΔD in each condition. For D < 0.02 Hz−1, the steepness of the functions remained constant across conditions, but for D > 0.02 Hz−1, it increased with D. The increase, moreover, was accompanied by a reduction in the upper asymptote of the functions. The data were well fit by a model in which spectral density discrimination is determined by the frequency separation of components on an equivalent rectangular bandwidth scale, yielding a roughly constant Weber fraction of ΔD/D = 0.3.
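The density definition and the reported Weber fraction can be made concrete with a small sketch. The Weber fraction of 0.3 is the value from the abstract; the stimulus parameters in the example are arbitrary illustrations.

```python
# Spectral density as defined in the abstract: number of partials divided
# by bandwidth. The Weber fraction of 0.3 is the reported value; the example
# complex below (12 tones over 1000 Hz) is an arbitrary illustration.
def spectral_density(num_partials, bandwidth_hz):
    return num_partials / bandwidth_hz

def just_detectable_increment(d, weber_fraction=0.3):
    # Constant-Weber-fraction rule: a change is detectable when
    # delta_D / D reaches about 0.3.
    return weber_fraction * d

d = spectral_density(num_partials=12, bandwidth_hz=1000.0)
print(d, just_detectable_increment(d))
```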

The relationship between the ability to hear out partials in complex tones, discrimination of the fundamental frequency (F0) of complex tones, and frequency selectivity was examined for subjects with mild-to-moderate cochlear hearing loss. The ability to hear out partials was measured using a two-interval task. Each interval included a sinusoid followed by a complex tone; one complex contained a partial with the same frequency as the sinusoid, whereas in the other complex that partial was missing. Subjects had to indicate the interval in which the partial was present in the complex. The components in the complex were uniformly spaced on the ERBN-number scale. Performance was generally good for the two “edge” partials, but poorer for the inner partials. Performance for the latter improved with increasing spacing. F0 discrimination was measured for a bandpass-filtered complex tone containing low harmonics. The equivalent rectangular bandwidth (ERB) of the auditory filter was estimated using the notched-noise method for center frequencies of 0.5, 1, and 2 kHz. Significant correlations were found between the ability to hear out inner partials, F0 discrimination, and the ERB. The results support the idea that F0 discrimination of tones with low harmonics depends on the ability to resolve the harmonics.
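Uniform spacing on the ERBN-number scale can be computed with the standard Glasberg and Moore (1990) Cam formula. The endpoint frequencies and number of components below are illustrative choices, not the values used in the study.

```python
import numpy as np

def hz_to_cam(f_hz):
    # ERB_N-number (Cam) scale of Glasberg & Moore (1990).
    return 21.4 * np.log10(0.00437 * f_hz + 1.0)

def cam_to_hz(cam):
    # Inverse mapping from Cams back to frequency in Hz.
    return (10.0 ** (cam / 21.4) - 1.0) / 0.00437

# Eight components uniformly spaced in Cams between 500 and 2000 Hz
# (endpoints and component count are illustrative assumptions):
partials = cam_to_hz(np.linspace(hz_to_cam(500.0), hz_to_cam(2000.0), 8))
print(np.round(partials, 1))
```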

The analysis of musical signals to extract audio descriptors that can potentially characterize their timbre has been disparate and often too focused on a particular small set of sounds. The Timbre Toolbox provides a comprehensive set of descriptors that can be useful in perceptual research, as well as in music information retrieval and machine-learning approaches to content-based retrieval in large sound databases. Sound events are first analyzed in terms of various input representations (short-term Fourier transform, harmonic sinusoidal components, an auditory model based on the equivalent rectangular bandwidth concept, the energy envelope). A large number of audio descriptors are then derived from each of these representations to capture temporal, spectral, spectrotemporal, and energetic properties of the sound events. Some descriptors are global, providing a single value for the whole sound event, whereas others are time-varying. Robust descriptive statistics are used to characterize the time-varying descriptors. To examine the information redundancy across audio descriptors, correlational analysis followed by hierarchical clustering is performed. This analysis suggests ten classes of relatively independent audio descriptors, showing that the Timbre Toolbox is a multidimensional instrument for the measurement of the acoustical structure of complex sound signals.
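The use of robust statistics for time-varying descriptors can be illustrated with a median and interquartile range. The toy spectral-centroid track below is invented for the example and is not Timbre Toolbox output.

```python
import numpy as np

# Toy spectral-centroid track (Hz) with one outlier frame. Robust statistics,
# as described in the abstract, summarize such a track with the median and
# interquartile range rather than the mean and standard deviation, so a
# single bad frame barely affects the summary.
track = np.array([1200.0, 1250.0, 1230.0, 5000.0, 1240.0])

robust_center = np.median(track)
robust_spread = np.percentile(track, 75) - np.percentile(track, 25)

print(robust_center, robust_spread)   # barely affected by the outlier frame
```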

The factors influencing the stream segregation of discrete tones and the perceived continuity of discrete tones as continuing through an interrupting masker are well understood as separate phenomena. Two experiments tested whether perceived continuity can influence the build-up of stream segregation by manipulating the perception of continuity during an induction sequence and measuring streaming in a subsequent test sequence comprising three triplets of low and high frequency tones (LHL-). For experiment 1, a 1.2-s standard induction sequence comprising six 100-ms L-tones strongly promoted segregation, whereas a single extended L-inducer (1.1 s plus 100-ms silence) did not. Segregation was similar to that following the single extended inducer when perceived continuity was evoked by inserting noise bursts between the individual tones. Reported segregation increased when the noise level was reduced such that perceived continuity no longer occurred. Experiment 2 presented a 1.3-s continuous inducer created by bridging the 100-ms silence between an extended L-inducer and the first test-sequence tone. This configuration strongly promoted segregation. Segregation was also increased by filling the silence after the extended inducer with noise, such that it was perceived like a bridging inducer. Like physical continuity, perceived continuity can promote or reduce test-sequence streaming, depending on stimulus context.

Compression in the basilar-membrane input–output response flattens the temporal envelope of a fluctuating signal when more gain is applied to lower-level than to higher-level temporal components. As a result, level-dependent changes in gap detection for signals with different depths of envelope fluctuation, and for subjects with normal and impaired hearing, may reveal effects of compression. To test these assumptions, gap detection with and without a broadband noise was measured with 1000-Hz-wide (flatter) and 50-Hz-wide (fluctuating) noise markers as a function of marker level. As marker level increased, background level also increased, maintaining a fixed acoustic signal-to-noise ratio (SNR) to minimize sensation-level effects on gap detection. Significant level-dependent changes in gap detection were observed, consistent with effects of cochlear compression. For the flatter marker, gap detection worsened with increasing level up to mid levels and improved with further increases in level; this pattern may be explained by an effective flattening of the temporal envelope at mid levels, where compression effects are expected to be strongest. A flatter effective temporal envelope corresponds to a reduced effective SNR. The effects of a reduction in compression (resulting in larger effective SNRs) may contribute to the better-than-normal gap detection observed for some hearing-impaired listeners.
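The envelope-flattening effect described above can be sketched with a simple broken-stick input-output function. The knee point and compressive slope are illustrative textbook-style values, not measurements from the study.

```python
# Broken-stick sketch of a compressive basilar-membrane I/O function:
# linear (1 dB/dB) below the knee, compressive (0.2 dB/dB) above it.
# The 40-dB knee and 0.2 slope are illustrative assumptions.
def bm_output_db(in_db, knee_db=40.0, slope=0.2):
    if in_db <= knee_db:
        return in_db
    return knee_db + slope * (in_db - knee_db)

# A 20-dB peak-to-valley envelope fluctuation presented entirely above the
# knee is compressed to a much smaller output fluctuation, i.e., the
# effective temporal envelope is flattened:
print(bm_output_db(80.0) - bm_output_db(60.0))   # much less than 20 dB
```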

This work was aimed at determining whether binaural interference occurs in electric hearing, and if so, whether it occurs as a consequence of perceptual grouping (central explanation) or if it is related to the spread of excitation in the cochlea (peripheral explanation). Six bilateral cochlear-implant listeners completed a series of experiments in which they judged the lateral position of a target pulse train, lateralized via interaural time or level differences, in the presence of an interfering diotic pulse train. The target and interferer were presented at widely separated electrode pairs (one basal and one apical). The results are broadly similar to those reported for acoustic hearing. All listeners but one showed significant binaural interference in at least one of the stimulus conditions. In all cases of interference, a robust recovery was observed when the interferer was presented as part of an ongoing stream of identical pulse trains, suggesting that the interference was at least partly centrally mediated. Overall, the results suggest that both simultaneous and sequential grouping mechanisms operate in electric hearing, at least for stimuli with a wide tonotopic separation.

The present study examined the effect of combined spectral and temporal enhancement on speech recognition by cochlear-implant (CI) users in quiet and in noise. The spectral enhancement was achieved by expanding the short-term Fourier amplitudes in the input signal. Additionally, a variation of the Transient Emphasis Spectral Maxima (TESM) strategy was applied to enhance the short-duration consonant cues that are otherwise suppressed when processed with spectral expansion. Nine CI users were tested on phoneme recognition tasks and ten CI users were tested on sentence recognition tasks, both in quiet and in steady, speech-spectrum-shaped noise. Vowel and consonant recognition in noise were significantly improved with spectral expansion combined with TESM. Sentence recognition improved with both spectral expansion and spectral expansion combined with TESM. The amount of improvement varied across individual CI users. Overall, the present results suggest that customized processing is needed to optimize performance according to not only individual users but also listening conditions.
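The idea of expanding short-term Fourier amplitudes can be illustrated on a single magnitude spectrum. The exponent and the energy normalization below are assumptions made for the sketch, not the exact scheme used in the study.

```python
import numpy as np

# Minimal sketch of spectral expansion on one short-term magnitude spectrum:
# raise magnitudes to an exponent > 1, then rescale so the frame keeps its
# original energy. Exponent and normalization are illustrative assumptions.
def expand_spectrum(mag, exponent=2.0):
    expanded = mag ** exponent
    scale = np.sqrt(np.sum(mag ** 2) / np.sum(expanded ** 2))
    return expanded * scale

mag = np.array([1.0, 0.5, 0.25, 1.0])   # toy spectrum: peaks and valleys
out = expand_spectrum(mag)

# Peaks grow relative to valleys, i.e., spectral contrast increases:
print(out.max() / out.min(), mag.max() / mag.min())
```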