Abstract

Individuals with Asperger syndrome (AS) often have difficulties in perceiving speech in noisy environments. The present study investigated whether this might be explained by deficient auditory stream segregation ability, that is, by a more basic difficulty in separating simultaneous sound sources from each other. To this end, auditory event-related brain potentials were recorded from a group of school-aged children with AS and a group of age-matched controls using a paradigm specifically developed for studying stream segregation. Differences in the amplitudes of ERP components were found between groups only in the stream segregation conditions and not for simple feature discrimination. The results indicated that children with AS have difficulties in segregating concurrent sound streams, which ultimately may contribute to the difficulties in speech-in-noise perception.

In clinical practice, difficulties in perceiving speech in noisy situations are often reported by individuals with AS or their parents. However, research on this topic has so far been quite sparse. Alcantara et al. (2004) showed that adults with autism spectrum disorders required a higher signal-to-noise ratio in order to be able to perceive and repeat sentences during various kinds of background noise. This result was later replicated, and extended to the repetition of nonsense syllables (Alcantara et al., 2006). Teder-Sälejärvi et al. (2005) used both behavioral methods and event-related brain potentials (ERPs) to investigate whether high-functioning adults with autism are able to focus on sounds coming from a certain spatial location, while ignoring similar sounds coming from different locations. The subjects heard eight streams of complex sounds at the same time, each coming from a different spatial location, and were instructed to press a response button to target sounds coming from a designated location. The subjects with autism were found to be less able to focus on the relevant sound source, and it was therefore concluded that their auditory selective attention skills were deficient. However, because the ability to segregate the sounds into multiple streams was inherent in this task, an alternative interpretation of this result is that there was an underlying impairment in auditory streaming abilities.

The fundamental ability to distinguish between different speakers and to hear a single voice in a crowd is referred to as auditory scene analysis (Bregman, 1990). Auditory scene analysis involves the ability to segregate sounds that come from different sources, and integrate those that belong together. The stream segregation process relies on a variety of acoustic cues, such as sound frequency and stimulation rate. In general, sounds with similar acoustic features are heard as coming from a single source, whereas sounds that are acoustically distant from each other are perceptually segregated and perceived as originating from different sources.

In the present study, auditory event-related potentials (ERPs) were used to investigate whether school-age children with AS have deficits in their auditory stream segregation abilities as compared to typically developing children. Specifically, the mismatch negativity (MMN) component of the ERP was used as the index of stream segregation. For the MMN elicitation, no behavioural response is required, and MMN can be recorded even when the subject is ignoring the sound stimuli (Näätänen et al., 1993; Paavilainen et al., 1993). Therefore, using the MMN, it is possible to study auditory stream segregation without the interfering influence of attention-based abilities.

MMN reflects early cortical stages of sound discrimination, and is elicited by perceptibly different sounds (“deviants”) embedded into a sequence of repetitive sounds (“standards”) (Näätänen et al., 1978; for a review, see Näätänen et al., 2007). In other words, the MMN is elicited when an incoming sound does not match with a sensory memory trace formed by the standard sound (Näätänen, 1992). Importantly, MMN elicited under passive conditions is associated with behavioral discrimination abilities, as its amplitude and latency to a particular sound contrast closely parallel the individual’s discrimination ability of that contrast (Amenedo & Escera, 2000; Kujala et al., 2001; Lang et al., 1990; Novitski et al., 2004).

The MMN is often recorded to occasional sound changes in a sequence of repeated standard sounds (Oddball condition; Fig. 1a). When stream segregation is studied, intervening sounds are embedded in the original oddball sequence (e.g., Sussman, 2005; Sussman et al., 2001; Winkler et al., 2003). The paradigm is set up so that the segregation of sounds to streams is a prerequisite for the detection of within-stream deviants. In other words, when the intervening sounds are far in frequency from the Oddball sequence, they segregate from each other, and the MMN is elicited by the deviant sounds in the Oddball sequence (Fig. 1c). In contrast, when the intervening sounds are near in frequency, they integrate with the Oddball sequence, and no MMN is elicited (Fig. 1b). Importantly, both in children and adults the MMN elicitation in this paradigm has been shown to correspond with the subject’s perception of the sounds as one integrated or two segregated streams in a separate behavioral study (Sussman et al., 2001; Sussman & Steinschneider, 2009). Therefore, the elicitation of the MMN can be used as a marker for determining whether stream segregation occurred.

A schematic illustration showing segments of the three experimental conditions. Rectangles represent tones; their y-axis coordinate shows the tone frequency. Different intensity values are marked with different shades of gray. Note that the pace at which...

In the present study, MMN was expected to be elicited by deviant tones in both the Oddball and Segregated conditions but not in the Integrated condition, as suggested by previous studies using the same paradigm (Sussman et al., 2001). Children with AS were expected to have diminished or absent MMN responses in the Segregated condition. This result would be consistent with less efficient stream segregation abilities.

Methods

Participants

The participants were 16 children with AS (mean age 8.10 years, range 7.0–9.10, 13 boys) and 14 typically developing children (mean age 8.10 years, range 7.6–9.7, 12 boys). The children with AS had been diagnosed by experienced clinicians according to the ICD-10 (World Health Organization, 1993) and DSM-IV (American Psychiatric Association, 1994) criteria at the Helsinki University Central Hospital or at the Helsinki Asperger Center (Dextra Medical Center). Their mean VIQ was 113 (range 90–145), and mean PIQ 106 (range 80–134) as assessed with WISC-III (Wechsler, 1991). They had no co-morbid diagnoses, and were all unmedicated at the time of testing.

A group of children without AS served as controls. They had no past or present neurological disorders, language or learning difficulties, and no emotional problems. Additionally, none of the children had close relatives diagnosed with autism spectrum disorder or other neurodevelopmental or psychiatric disorders. The mean VIQ of the control group on the WISC-III was 112 (range 96–137), and mean PIQ 109 (range 85–136).

The two groups did not differ in age (t(28)= −0.06, ns.), VIQ (t(28)= −0.27, ns.), or PIQ (t(28)= 0.47, ns.). One child with AS was left-handed, and one control child was ambidextrous; all other children were right-handed.

The study was conducted in accordance with the Declaration of Helsinki, and was approved by the Ethical Committees of the Department of Psychology, University of Helsinki, and the Helsinki University Central Hospital. Informed written consent was obtained from the parents and assent from the children.

Stimuli & Paradigm

In the Oddball condition, there was one repetitive standard sound (440 Hz, 52 dB SPL) that was randomly replaced with a deviant sound of higher intensity (67 dB SPL, p = .12) (Fig. 1a). There was always at least one standard sound in between the deviants. The stimulus onset asynchrony (SOA) was 300 ms, and the sound sequence consisted of 2000 stimuli in total.

The Segregated condition was similar to the Oddball condition, except that two tones were presented between consecutive tones of the Oddball sequence, resulting in an SOA of 100 ms (Fig. 1c). These intervening tones were 2637 Hz tones, occurring with a random equiprobable distribution of four intensity values (47, 57, 62, and 72 dB SPL). Note that the intensity values of the intervening tones spanned above and below the intensities of the Oddball sequence. Thus, the Oddball sequence contained neither the loudest nor the softest sounds in the overall sequence. The resulting two simultaneous sound streams shared the same intensity range but were separable on the basis of their frequency distance (440 Hz vs. 2637 Hz). Segregation of the sounds by frequency allows the oddball intensity (67 dB) to be detected as a deviant. The Segregated condition consisted of 6000 stimuli.

The Integrated condition was similar to the Segregated condition, except that the frequency value of the intervening tones (523 Hz) was near to that of the Oddball sequence (Fig. 1b); thus, the sounds were integrated into a single sound stream. The Integrated condition consisted of 6000 stimuli. It should be noted that although the SOA was shorter in the Segregated and Integrated conditions than in the Oddball condition, “oddball sequence” tones occurred once every 300 ms in every experimental condition (Fig. 1).
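The construction of the three sequences can be illustrated with a minimal sketch. It is not the stimulation software used in the study; the function names are hypothetical, and the way deviant positions are drawn (a fixed count of round(n × p) deviants placed at random non-adjacent positions) is an assumption that merely satisfies the stated constraints of p = .12 and at least one standard between deviants.

```python
import random

ODDBALL_HZ = 440                   # frequency of the Oddball-sequence tones
STANDARD_DB = 52                   # standard intensity (dB SPL)
DEVIANT_DB = 67                    # deviant intensity (dB SPL)
DEVIANT_P = 0.12                   # deviant probability within the Oddball sequence
INTERVENING_DB = (47, 57, 62, 72)  # equiprobable intensities of intervening tones
INTERVENING_HZ = {"oddball": None, "integrated": 523, "segregated": 2637}


def oddball_intensities(n_tones, p_deviant=DEVIANT_P, rng=None):
    """Intensities of the Oddball-sequence tones, never two deviants in a row.

    Assumption: the deviant count is fixed at round(n_tones * p_deviant).
    """
    if rng is None:
        rng = random.Random()
    n_dev = round(n_tones * p_deviant)
    # Draw n_dev distinct slots, then shift the i-th slot by i so that
    # consecutive deviant positions differ by at least 2.
    slots = sorted(rng.sample(range(n_tones - n_dev + 1), n_dev))
    deviant_pos = {slot + i for i, slot in enumerate(slots)}
    return [DEVIANT_DB if i in deviant_pos else STANDARD_DB
            for i in range(n_tones)]


def build_sequence(condition, n_oddball=2000, rng=None):
    """Return (list of (frequency_hz, intensity_db) tones, SOA in ms)."""
    if rng is None:
        rng = random.Random()
    intervening_hz = INTERVENING_HZ[condition]
    soa_ms = 300 if intervening_hz is None else 100
    tones = []
    for level in oddball_intensities(n_oddball, rng=rng):
        tones.append((ODDBALL_HZ, level))
        if intervening_hz is not None:
            # Two intervening tones follow every Oddball-sequence tone, so
            # Oddball tones still occur once every 300 ms.
            for _ in range(2):
                tones.append((intervening_hz, rng.choice(INTERVENING_DB)))
    return tones, soa_ms
```

With 2000 Oddball-sequence tones, the Segregated and Integrated variants contain 6000 tones each at a 100-ms SOA, which matches the stimulus counts given above.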

The present paradigm makes it possible to obtain an objective physiological index of stream segregation. When the sounds are integrated into a single stream, the intensity variation of the intervening tones prevents the regularity of the standard-intensity tone in the Oddball stream from being detected, and therefore no MMN is elicited (Rahne et al., 2007; Sussman et al., 2001; Sussman & Steinschneider, 2006). In contrast, when the sounds segregate by frequency, the Oddball tones emerge as a separate stream, allowing the detection of the louder intensity Oddball tones and the elicitation of the MMN. Thus, when MMN is elicited by the deviant sounds in the Oddball sequence, we can conclude that these tones were physiologically segregated from the intervening tones.

Control conditions to delineate the MMN

For each experimental condition, a corresponding control condition was conducted that was identical to its respective experimental condition, except for the standard and the deviant sounds being exchanged so that the standard was 67 dB SPL, and the deviant 52 dB SPL. This way the ERPs to the 67 dB SPL tones when they were deviants in each experimental condition could be compared with the ERPs to the 67 dB SPL tones when they were standards in the corresponding control condition. The purpose of this was to control for the different stimulus effects on the obligatory responses elicited by the standard and deviant stimuli that may occur when presenting a louder sound. The oddball control condition included 500 stimuli, and the other control conditions 1500 stimuli.

The experiment was carried out in an electrically shielded and sound-attenuated chamber. During the experiment, the children were instructed to watch a self-selected soundless video and ignore the sound stimuli presented through four loudspeakers, two of which were located on the right side, and two on the left side of the TV screen where attention was directed. The children were video-monitored throughout the experiment, and accompanied by a parent if necessary. The order of presentation of the conditions was counterbalanced across participants.

ERP recording

High-density EEG (amplified by BioSemi ActiveTwo amplifiers, band pass DC–67 Hz, sampling rate 256 Hz) was recorded using a 64-location electrode cap and additional electrodes at mastoids. Eye movements were monitored with electrodes placed below and at the outer corner of both eyes.

EEG epochs of 700 ms (including a 100-ms pre-stimulus time) were averaged offline, separately for each stimulus type and condition. Epochs contaminated by artifacts causing peak-to-peak deflections exceeding 100 μV in any channel were excluded. The epochs were digitally filtered with a 1–20 Hz band-pass filter and baseline-corrected with respect to the 100-ms pre-stimulus period. The data were re-referenced to the average of the left and right mastoid recordings. The final data set consisted of, on average, 197 (range 139–238) accepted deviant trials per condition for controls, and 195 (range 113–234) for children with Asperger syndrome.
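The rejection, baseline-correction, and averaging steps can be sketched as follows. This is an illustrative outline only, not the analysis software used in the study: the function names and the plain-list data layout (one list of samples in μV per channel) are assumptions, and the 1–20 Hz filtering and mastoid re-referencing steps are omitted for brevity.

```python
SFREQ_HZ = 256     # sampling rate reported for the recording
REJECT_UV = 100.0  # peak-to-peak rejection threshold in microvolts


def n_samples(duration_ms, sfreq_hz=SFREQ_HZ):
    """Number of samples covering duration_ms, rounded to the nearest sample."""
    return round(duration_ms * sfreq_hz / 1000)


def is_clean(epoch, threshold_uv=REJECT_UV):
    """True if no channel's peak-to-peak deflection exceeds the threshold.

    epoch: list of channels, each a list of samples in microvolts.
    """
    return all(max(ch) - min(ch) <= threshold_uv for ch in epoch)


def baseline_correct(epoch, n_baseline):
    """Subtract each channel's mean over its first n_baseline samples."""
    corrected = []
    for ch in epoch:
        baseline = sum(ch[:n_baseline]) / n_baseline
        corrected.append([v - baseline for v in ch])
    return corrected


def average_epochs(epochs, n_baseline):
    """Average the artifact-free, baseline-corrected epochs.

    Returns (ERP as a list of channels, number of accepted epochs).
    """
    kept = [baseline_correct(e, n_baseline) for e in epochs if is_clean(e)]
    if not kept:
        raise ValueError("no artifact-free epochs to average")
    n_ch, n_t = len(kept[0]), len(kept[0][0])
    erp = [[sum(e[c][t] for e in kept) / len(kept) for t in range(n_t)]
           for c in range(n_ch)]
    return erp, len(kept)
```

At 256 Hz, the 100-ms baseline corresponds to about 26 samples and the 700-ms epoch to about 179 samples, so `n_samples(100)` would be passed as `n_baseline`.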

Data analysis

First, given that MMN is evoked by a deviant sound, it was important to determine that there were no significant group differences in the ERP responses evoked by the standard stimuli (e.g., the P1 and N2 components in children). The peaks of the standard-ERP responses obtained from the control conditions were measured. The P1 peak latency was identified at 50–150 ms, and the N2 peak latency at 200–400 ms from stimulus onset. The P1 and N2 amplitudes (integrated over 50 ms) were measured at the latencies of their individual peak amplitudes at the Fcz electrode. The significance of each peak was assessed at the Fcz electrode with a two-tailed t-test against zero. Group differences were studied by one-way ANOVAs for repeated measures.

The MMN is delineated by subtracting the ERP elicited by the standard from the ERP elicited by the deviant. These deviant-minus-standard ERP difference waveforms were constructed for each condition using the deviant-ERP obtained from the experimental condition and the standard-ERP obtained from the control condition. The MMN peak latency was identified at 100–300 ms from stimulus onset from these waveforms. The MMN amplitudes (integrated over 50 ms) were measured at the latencies of their individual peak amplitudes at the Fcz electrode. The MMN mean amplitudes were also measured at the grand-average peak latencies at the Fcz electrode; these data were used for the testing of the statistical presence of the MMN in different conditions before group comparisons.
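The subtraction and peak measurement can be sketched for a single-channel (e.g., Fcz) waveform as below. Two points are assumptions rather than details given in the text: "integrated over 50 ms" is interpreted as the mean amplitude in a 50-ms window centered on the peak, and the MMN peak is taken as the most negative point in the 100–300 ms search window. The function names are hypothetical.

```python
SFREQ_HZ = 256    # sampling rate reported for the recording
PRESTIM_MS = 100  # pre-stimulus portion of each epoch


def ms_to_index(t_ms, sfreq_hz=SFREQ_HZ, prestim_ms=PRESTIM_MS):
    """Sample index of a latency given in ms relative to stimulus onset."""
    return round((t_ms + prestim_ms) * sfreq_hz / 1000)


def difference_wave(deviant_erp, standard_erp):
    """Deviant-minus-standard difference waveform for one channel."""
    return [d - s for d, s in zip(deviant_erp, standard_erp)]


def mmn_peak(diff, start_ms=100, end_ms=300, window_ms=50,
             sfreq_hz=SFREQ_HZ, prestim_ms=PRESTIM_MS):
    """Return (peak latency in ms, mean amplitude around the peak).

    Assumption: the peak is the most negative sample in the search window,
    and the amplitude is the mean over window_ms centered on that sample.
    """
    lo = ms_to_index(start_ms, sfreq_hz, prestim_ms)
    hi = ms_to_index(end_ms, sfreq_hz, prestim_ms)
    peak_idx = min(range(lo, hi + 1), key=lambda i: diff[i])
    half = round(window_ms / 2 * sfreq_hz / 1000)
    window = diff[max(0, peak_idx - half):peak_idx + half + 1]
    amplitude = sum(window) / len(window)
    latency_ms = peak_idx * 1000 / sfreq_hz - prestim_ms
    return latency_ms, amplitude
```

Here `deviant_erp` would be the 67 dB SPL deviant response from an experimental condition and `standard_erp` the 67 dB SPL standard response from the corresponding control condition, so that physically identical tones are compared.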

The statistical presence of the MMN was determined in each condition at the Fcz electrode on the difference waveforms using a two-tailed t-test to verify whether the mean amplitude was significantly different from zero. Mixed model Analysis of Variance (ANOVA) for repeated measures was used to compare amplitude across groups. The between-subjects factor was group (AS vs. Control) and the within-subjects factor was condition (Oddball vs. Segregated). An additional analysis of scalp distribution of the MMN was conducted using a mixed model ANOVA for repeated measures (Group × Condition × Electrode) including the electrodes F3, F4, Fc3, Fc4, C3, C4, Cp3, Cp4, P3, and P4. The MMN latencies were analyzed at Fcz with two-way (Group × Condition) ANOVA. The Greenhouse-Geisser correction was applied when appropriate. Newman-Keuls post hoc test was performed to calculate the sources of significant main effects and interactions.
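The test for the statistical presence of the MMN is a one-sample t-test of the individual mean amplitudes against zero. As a minimal sketch (the function name is hypothetical, and only the test statistic and degrees of freedom are computed, not the p-value), it can be written with the Python standard library:

```python
import math
import statistics


def one_sample_t(values, popmean=0.0):
    """Return (t statistic, degrees of freedom) for a one-sample t-test.

    t = (mean - popmean) / (sd / sqrt(n)), with df = n - 1.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    standard_error = statistics.stdev(values) / math.sqrt(n)
    t_stat = (statistics.mean(values) - popmean) / standard_error
    return t_stat, n - 1
```

With 14 control children and 16 children with AS, such a test has 13 and 15 degrees of freedom, respectively.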

Results

Grand averaged ERP waveforms elicited by control tones at the Fcz electrode in children with AS (dotted line) and their controls (solid line). P1 and N2 responses to the control tones are clearly seen in the Oddball and Segregated conditions; in the latter...

A statistically significant MMN was elicited in both groups of children by deviant sounds in the Oddball condition (controls: t(13) = 5.23, p < .001; children with AS: t(15) = 5.23, p < .001), and in the Segregated condition (controls: t(13) = 3.96, p < .002; children with AS: t(15) = 5.43, p < .001), but not in the Integrated condition (controls: t(14) = 0.99, ns.; children with AS: t(16) = 0.58, ns.), which is consistent with previous child studies (e.g., Sussman et al., 2001) (Table 1, Figs. 3, 4 & 5). As the MMN was not significantly elicited in the Integrated condition for either group of children, this condition was not further analyzed for comparisons of MMN amplitude and latency.

Deviant-minus-standard difference waveforms at the Fcz electrode in children with AS (dotted line) and their controls (solid line). Note that these difference waveforms were constructed for each condition using the deviant-ERP obtained from the experimental...

Mean MMN amplitudes and latencies in children with AS and their controls at the Fcz electrode.

The main result was that the MMN amplitude in the Segregated condition was significantly smaller in the group of children with AS compared with the control group (interaction between group and condition: F(1,28) = 6.11, p < .02) (Table 1, Figs. 3, 4 & 5). In contrast, the MMN amplitude did not significantly differ between groups in the Oddball condition. Further, there was no significant group difference in the MMN scalp distribution (Fig. 6) or latency (Table 1).

Voltage maps showing the MMN scalp distribution (in ±25 ms time windows surrounding the grand-average peak latencies) in children with AS (right) and age-matched controls (left).

Discussion

In the present study, the ability of children with AS to segregate sound streams based on frequency separation was investigated using auditory event-related brain potentials. Previous studies have shown that the elicitation of the MMN can be used as an index of stream segregation, since being able to detect changes in the oddball sequence is dependent on successful streaming (e.g., Sussman, 2005; Sussman & Steinschneider, 2006). The main finding of the present study was a diminished MMN amplitude elicited by deviant probe tones in children with AS compared to the control children, when two simultaneous sound streams were presented, and the change detection required successful segregation of these streams (Segregated condition). This result indicates that auditory stream segregation ability is less efficient in children with AS compared to typically developing children. It is especially notable given that in the Oddball condition there were no MMN amplitude differences between the groups. This shows that children with AS do not differ from controls in the ability to passively detect intensity changes embedded in a sound sequence when stream segregation is not required. Thus, the reduced MMN amplitude to the same intensity deviants in the Segregated condition suggests that children with AS have deficits in sound discrimination when more than one sound stream is present. That is, they process complex mixtures of sounds less efficiently than their peers.

Another notable finding was that there were no significant group differences in the elicitation or amplitudes of the obligatory P1 and N2 responses. This result is important because it indicates that the group difference in MMN obtained in the Segregated condition reflects memory-related processes, and is not caused by differences in the basic auditory feature processing.

Although children with AS may be well able to perceive and process sounds in quiet conditions, deficient passive stream segregation abilities may make it difficult for them to discriminate sounds in noisy environments, that is, when multiple streams are simultaneously active. This may also result in deficient abilities to separate different voices in noisy environments, making it difficult to focus on one speaker in the room while ignoring other speakers. Furthermore, this could also impact on the ability to make sense of what is being said in a noisy environment. This may be seen, for example, in the difficulty of children with AS to attend to and understand a teacher’s instructions in noisy classroom situations.

Previous behavioral and MMN studies have suggested that the discrimination of frequency is enhanced in autism spectrum disorders (Bonnel et al., 2003; Ferri et al., 2003; Heaton, 2003; Lepistö et al., 2005, 2006, 2007). Therefore, it is interesting to note that children with AS in the current study had difficulties in using frequency proximity as a cue for streaming. However, it has been shown that frequency discrimination acuity has no direct correlation with stream segregation ability (Rose & Moore, 2005; Sussman et al., 2007). For example, although 9–11-year-old children are as able as adults to discriminate frequency differences, their stream segregation abilities are not yet adult-like (Sussman et al., 2007). This is probably because in addition to the precise coding of acoustic features, stream segregation involves more general perceptual and cognitive processes, such as attention, integration of prior context, and schematic knowledge (Snyder & Alain, 2007). That is, stream segregation is a higher-level complex skill perhaps affected more by experience than by the ability to discriminate sound frequencies (Sussman et al., 2007).

This preliminary study investigating complex auditory skills in children with AS used non-speech stimuli, and a paradigm in which the sounds would be clearly perceived as either one or two streams (Sussman et al., 2007). Therefore, caution should be taken in drawing conclusions, and clearly, further studies are needed. Nonetheless, the present results are consistent with the notion that deficient passive stream segregation abilities may in part contribute to difficulties in perceiving speech in noisy environments reported in individuals with AS. The finding has important implications, since speech perception rarely takes place in quiet conditions without any background noise.

Acknowledgments

We thank all the children and their families for participation. The study was supported by the Finnish Cultural Foundation, the University of Helsinki, the Academy of Finland (grant number 128840), and the National Institutes of Health (DC 06003).
