The crowding effect is the reduction in the visibility of a target when it is presented with neighboring distractors. It has been explained either by lateral inhibition at a pre-attentive level or by the coarse spatial resolution of attention. To test these theories, we used high-resolution fMRI to measure the V1 response to the target in the presence or absence of distractors, in both attended and unattended conditions. We found that the cortical response to the target was not affected by the presence of distractors in the unattended condition. However, the spatial distribution of attention modulation at the target and in its surrounding area depended on the crowding configuration. When distractors were placed on the same radial axis as the target, a configuration with a severe crowding effect, significant attention enhancements were observed not only at the target's and the distractors' locations but also in regions next to the target where no stimulus was presented. This spread of attention enhancement did not occur when distractors were placed on the same circumference as the target, a configuration with a weak crowding effect. This pattern of interaction between attention and target-distractor configuration supports the view that crowding results from the coarse spatial resolution of attention.

Introduction

When a target is presented with neighboring distractors, its visibility is reduced (Figure 1A). This phenomenon is known as crowding in psychophysics (Bouma, 1970; Strasburger, Harvey, & Rentschler, 1991) but is referred to as surround suppression in neurophysiology (Cavanaugh, Bair, & Movshon, 2002; Levitt & Lund, 1997). The crowding effect occurs under a wide range of conditions and tasks, including Vernier acuity (Levi, Klein, & Aitsebaomo, 1985; Williams & Essock, 1986), stereoacuity (Westheimer & Truong, 1988), orientation discrimination (Westheimer, Shimamura, & McKee, 1976), contrast discrimination (Wilkinson, Wilson, & Ellemberg, 1997), letter recognition (Pelli, Palomares, & Majaj, 2004), and face recognition (Louie, Bressler, & Whitney, 2007; Martelli, Majaj, & Pelli, 2005). Two competing theories propose different explanations for the crowding effect. One holds that it is due to sensory-level lateral inhibition (Banks, Larson, & Prinzmetal, 1979; Levi, Klein, & Yap, 1987) and occurs at a pre-attentive level (Pelli et al., 2004). The other proposes that attention is a key factor (Walley & Weiden, 1973; Wolford & Chambers, 1983) and that crowding can be attributed to the coarse spatial resolution of attention (He, Cavanagh, & Intriligator, 1996, 1997; Intriligator & Cavanagh, 2001) or to unfocussed spatial attention (Strasburger, 2005; Strasburger et al., 1991). The crowding effect has been an active research topic in visual psychophysics for more than four decades (Bouma, 1970; Stuart & Burian, 1962). However, few neuroimaging studies have explored the neural mechanism underlying the crowding effect or tested these theories. The goal of the current study is to investigate two issues using high-resolution fMRI: (1) How is the cortical response to the target affected by the presentation of distractors when attention is not directed to the target (unattended condition)? (2) How does attention modulate the cortical responses in crowded and non-crowded conditions at the target's and the distractors' locations? With answers to these two questions, we would be in a better position to address some long-standing questions about the crowding effect.

Figure 1

Example stimuli and experimental design. (A) Example stimuli used in the experiment. A target was always positioned on the right horizontal meridian. It was presented either alone or with two distractors positioned either tangentially (above and underneath the target) or radially (left and right of the target). (B) Schematic description of the experiment. Stimulus blocks were interleaved with blank intervals. A stimulus block consisted of ten trials, in which subjects were asked to perform either a luminance discrimination task at the fixation point (attend-to-fixation condition) or a contrast discrimination task to the target (attend-to-target condition).

A total of seven healthy subjects (3 male and 4 female) participated in the experiments, all of whom had extensive experience as subjects in psychophysical and fMRI experiments. They were right-handed, reported normal or corrected-to-normal vision, and had no known neurological or visual disorders. Ages ranged from 23 to 34. They gave written, informed consent in accordance with the procedures and protocols approved by the human subjects review committee of the University of Minnesota.

Stimuli and design

The stimulus elements used in the main experiment were round checkered patches with a mean luminance of 120 cd/m² and were centered at 2.88°, 5°, and 8.33° eccentricity in the right visual field. Size and spatial frequency were scaled for cortical magnification (Duncan & Boynton, 2003), with patches subtending 1.65°, 2.59°, and 4.06° of visual angle, respectively. There were three stimulus configurations (Figure 1A): single, tangential, and radial. In the single configuration, only a 2.59° patch was presented at 5° eccentricity; this patch was the target in this experiment. In the tangential configuration, two 2.59° patches were presented immediately above and below the target at the same eccentricity (along the circumference). In the radial configuration, a 1.65° patch and a 4.06° patch were presented immediately to the left and right of the target at 2.88° and 8.33° eccentricity, respectively. These patches were the distractors in this experiment.
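The radial layout described above can be checked with a little arithmetic. The following Python sketch uses only the eccentricities and patch sizes quoted in the text and assumes that the quoted sizes are diameters measured along the horizontal meridian; the function and variable names are illustrative and not part of the original study.

```python
# Patch geometry taken from the text: (eccentricity, diameter) in degrees.
PATCHES = {
    "inner distractor": (2.88, 1.65),
    "target": (5.00, 2.59),
    "outer distractor": (8.33, 4.06),
}


def radial_extent(eccentricity, diameter):
    """Return the (near, far) edges of a patch along the horizontal meridian."""
    half = diameter / 2.0
    return eccentricity - half, eccentricity + half


def radial_gaps(patches):
    """Gap in degrees between consecutive patches, ordered by eccentricity."""
    ordered = sorted(patches.values())
    extents = [radial_extent(ecc, size) for ecc, size in ordered]
    return [nxt[0] - cur[1] for cur, nxt in zip(extents, extents[1:])]


# Gaps between neighboring patches; values near zero mean the patches abut.
gaps = radial_gaps(PATCHES)

# Size-to-eccentricity ratio of each patch (assumed to be a rough index of scaling).
size_to_ecc = {name: size / ecc for name, (ecc, size) in PATCHES.items()}
```

Under that assumption, both gaps come out within about 0.01° of zero, which agrees with the statement that the distractors were placed immediately next to the target.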

There were six 260-s functional scans in the main experiment. Each scan consisted of seven 20-s blank intervals interleaved with six 20-s stimulus blocks (Figure 1B). A stimulus block contained ten trials. In each trial, two successive stimuli (single, tangential, or radial configuration) were presented for 0.3 s each, separated by a 0.4-s blank interval and followed by a 1-s blank interval that served as the response period. From the first stimulus to the second, the contrast of each patch increased or decreased randomly and independently by 0.08, under the constraint that its value remained between 0.2 and 0.4. The phases of the patches were also independently counterphase randomized. The two stimuli in a trial also differed in luminance within a tiny region (0.12° × 0.12°) at the center of the fixation cross. During a stimulus block, subjects were asked to fixate the cross and perform either a luminance discrimination task at the center of the fixation cross (attend-to-fixation condition) or a contrast discrimination task on the target (attend-to-target condition), depending on a task cue presented throughout the preceding 20-s blank interval. The task cue was a slight increase in the length of either the vertical bar (attend-to-fixation) or the horizontal bar (attend-to-target) of the fixation cross. To help the subjects localize the target, a thin black circle with the same diameter as the target was presented at the target's position during all the blank intervals. This kind of position cue has been shown to have little effect on crowding in the periphery (Strasburger, 2005; Wilkinson et al., 1997).
The main experiment had a total of six combinations of attention condition (attend-to-fixation and attend-to-target) and stimulus configuration (single, tangential, and radial), which were distributed within a functional scan (one combination per stimulus block) and counterbalanced within subjects. The stimuli in the attend-to-target and attend-to-fixation conditions were identical.
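The timing stated above is internally consistent: ten 2-s trials fill a 20-s block, and thirteen 20-s intervals fill a 260-s scan. The Python sketch below reproduces that arithmetic and shows one possible way to draw a contrast pair for a patch. The durations and contrast limits come from the text; the sampling rule in `contrast_pair` (uniform first contrast, direction chosen among the in-range options) is an assumption, since the text does not say how the constraint was enforced, and all names are hypothetical.

```python
import random

STIM_S, ISI_S, RESPONSE_S = 0.3, 0.4, 1.0  # durations stated in the text
TRIALS_PER_BLOCK = 10
N_BLANKS, N_BLOCKS, INTERVAL_S = 7, 6, 20.0


def trial_duration():
    """Two stimulus presentations, the gap between them, and the response period."""
    return STIM_S + ISI_S + STIM_S + RESPONSE_S


def scan_schedule():
    """Return (label, onset_s) pairs for one scan: blank, stimulus, ..., blank."""
    schedule = []
    for i in range(N_BLANKS + N_BLOCKS):
        label = "blank" if i % 2 == 0 else "stimulus"
        schedule.append((label, i * INTERVAL_S))
    return schedule


def contrast_pair(rng, lo=0.2, hi=0.4, step=0.08):
    """Draw the first and second contrast of one patch within a trial.

    Assumption (not from the text): the first contrast is uniform in [lo, hi],
    and the direction of change is picked at random among the directions that
    keep the second contrast inside [lo, hi].
    """
    first = rng.uniform(lo, hi)
    options = [s for s in (step, -step) if lo <= first + s <= hi]
    second = first + rng.choice(options)
    return first, second


schedule = scan_schedule()
scan_length_s = schedule[-1][1] + INTERVAL_S
first_c, second_c = contrast_pair(random.Random(0))
```

Because the allowed contrast range (0.2) is wider than twice the step (0.16), at least one direction of change is always available under this rule.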

Retinotopic visual areas were defined by a standard method developed by Engel, Glover, and Wandell (1997) and Sereno et al. (1995). Five 10-Hz counterphase-flickering patches (Figure 2A) were used in a block-design scan to localize five regions of interest (ROIs; central, upper, lower, left, and right) corresponding to the locations of the target and the distractors in the main experiment. Except that they were at full contrast, these five patches for defining the ROIs were the same as those in the main experiment. The ROI scan consisted of five cycles, and each cycle consisted of six 10-s blocks, one for each of the five patches and one blank interval. This scan started with a 10-s blank interval. Before the subjects were scanned, they were given a 30-min training session in the psychophysics lab to practice with the stimuli used in the main experiment.

Figure 2

Regions of interest. (A) Five flickering round checkered patches with a full contrast were used to define the ROIs (central, upper, lower, left and right). They occupied the same spatial extents as the target and the distractors. (B) Cortical activations by the five patches are depicted in a representative inflated brain. The red, green, blue, yellow, and light blue areas correspond to the left, central, right, lower, and upper ROIs, respectively. V1 is defined by retinotopic mapping and its boundaries are indicated by the white dashed lines.

The anatomical volume for each subject in the retinotopic mapping session was transformed into the AC-PC space and then inflated using BrainVoyager 2000. Functional volumes in all the sessions for each subject were preprocessed; preprocessing included 3D motion correction using SPM99, followed by linear trend removal and high-pass (0.015 Hz) filtering (Smith et al., 1999) using BrainVoyager 2000. Head motion within any fMRI session was less than 1.5 mm for all subjects except two female subjects. The data from these two subjects were excluded from further analysis. The images were then aligned to the anatomical volume in the retinotopic mapping session and transformed into the AC-PC space. The first 14 s of BOLD signals were discarded to minimize transient magnetic-saturation effects.

A general linear model (GLM) procedure was used for ROI analysis. The five ROIs in V1 were defined as areas that responded more strongly to the corresponding flickering round patch than to the blank interval (p < 10⁻⁴, corrected) and that were confined by the V1/V2 boundaries defined in the retinotopic mapping experiment. Even with such a strict statistical threshold, a few voxels were still assigned to more than one ROI in some subjects. We excluded these voxels from further analyses, so that all the ROIs were spatially non-overlapping.
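The exclusion of shared voxels can be illustrated with a few lines of Python. This is a minimal sketch, not the pipeline actually used (which ran in BrainVoyager 2000): it assumes each ROI is available as a set of voxel indices, and the function name and toy indices are invented for illustration.

```python
from collections import Counter


def drop_shared_voxels(rois):
    """Remove every voxel that appears in more than one ROI.

    `rois` maps an ROI name to a set of voxel indices; a new dict is returned
    and the input is left unchanged.
    """
    counts = Counter(v for voxels in rois.values() for v in voxels)
    return {
        name: {v for v in voxels if counts[v] == 1}
        for name, voxels in rois.items()
    }


# Toy voxel indices, for illustration only.
example = {
    "central": {1, 2, 3, 4},
    "left": {4, 5, 6},
    "upper": {3, 7},
}
cleaned = drop_shared_voxels(example)
```

In the toy example, voxels 3 and 4 are each claimed by two ROIs, so they are removed from both rather than assigned to either one.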

The BOLD signals induced by the stimulus blocks were calculated separately for each ROI and each subject. For each fMRI scan, the time course of MR signal intensity was first extracted by averaging the data across all the voxels within the pre-defined ROI and then normalized by the average of the last two time points of all 20-s blank intervals. The peak response in an ROI was extracted by averaging the response within a 6- to 20-s interval after the start of the stimulus block and then averaged according to different experimental conditions. Attention modulation was defined as the BOLD signal difference between the attend-to-target condition and the attend-to-fixation condition. Paired t-tests were used to evaluate BOLD signal differences between experimental conditions for each of the ROIs.
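The normalization and averaging steps above can be sketched as follows. Several details are assumptions rather than facts from the text: the repetition time (`tr_s`) is not stated in this section, the result is expressed as percent signal change, and each time point is indexed by its acquisition onset. The function names are hypothetical.

```python
def percent_signal_change(timecourse, blank_onsets_s, blank_len_s, tr_s):
    """Normalize an ROI-averaged time course by the blank-interval baseline.

    The baseline is the mean of the last two time points of every blank
    interval, as described in the text. Units (percent) are an assumption.
    """
    pts = int(round(blank_len_s / tr_s))
    baseline_samples = []
    for onset in blank_onsets_s:
        start = int(round(onset / tr_s))
        baseline_samples.extend(timecourse[start + pts - 2:start + pts])
    baseline = sum(baseline_samples) / len(baseline_samples)
    return [100.0 * (x - baseline) / baseline for x in timecourse]


def block_response(psc, block_onset_s, tr_s, window_s=(6.0, 20.0)):
    """Average the normalized signal from 6 to 20 s after block onset."""
    first = int(round((block_onset_s + window_s[0]) / tr_s))
    last = int(round((block_onset_s + window_s[1]) / tr_s))
    window = psc[first:last]
    return sum(window) / len(window)


def attention_modulation(attend_target, attend_fixation):
    """Attend-to-target response minus attend-to-fixation response."""
    return attend_target - attend_fixation


# Toy time course with an assumed TR of 2 s: blank, stimulus block, blank.
TR_S = 2.0
toy = [100.0] * 10 + [102.0] * 10 + [100.0] * 10
psc = percent_signal_change(toy, blank_onsets_s=[0.0, 40.0], blank_len_s=20.0, tr_s=TR_S)
response = block_response(psc, block_onset_s=20.0, tr_s=TR_S)
```

Starting the averaging window 6 s after block onset skips the initial rise of the hemodynamic response, so the average reflects the sustained part of the block response.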

Eye movement recording

Eye movements were recorded at 60 Hz. For four subjects, they were recorded in the 3T magnet during the experiment with the long-distance optics module of an ASL eye tracker (Applied Science Laboratories, Bedford, Massachusetts). For the other three subjects, they were recorded in a psychophysics lab with an iView X RED eye tracker (SensoMotoric Instruments GmbH, Teltow, Germany) while the subjects viewed the same stimuli as those in the magnet.

Results

The behavioral result (Figure 3A) shows that adding distractors induced a crowding effect and impaired contrast discrimination of the target, especially in the radial configuration (t = 17.962, p < 0.001), although the difference between the single and tangential configurations did not reach significance. A hallmark of the crowding effect is its radial-tangential anisotropy: there is much less interference between items when they are arranged tangentially, rather than radially, relative to the point of fixation (Toet & Levi, 1992).

Figure 3

Behavioral and cortical responses to the target in the single, tangential and radial configurations. (A) Performance in the contrast discrimination task. (B) BOLD responses to the target with the luminance discrimination task at the fixation point. Error bars denote 1 SEM calculated across subjects.

For the fMRI data, we first looked at the BOLD signals at the location of the target when subjects directed their attention away from the target and performed a demanding central fixation task (accuracy mean ± SEM: 0.83 ± 0.03). There was no significant difference among the three stimulus configurations (single, tangential, and radial) in this unattended condition (Figure 3B).

We further investigated the attention modulation in different stimulus configurations. Figure 4 shows the BOLD signals in all ROIs in the three stimulus configurations when subjects attended to either the target or the fixation point. As defined above, attention modulation was the BOLD signal in the attend-to-target condition minus that in the attend-to-fixation condition. At the target location, there was significant attention enhancement in all three stimulus configurations (all t > 3.738 and p < 0.02) but no significant difference among them (Figure 5A). However, the spatial distribution of attention modulation differed significantly between the stimulus configurations; that is, attention modulation differed at the locations around the target. For the single configuration, the BOLD signals were strongly enhanced by attention in the left ROI (t = 4.347, p = 0.012) and showed a weaker, nonsignificant enhancement in the right ROI (t = 1.828, p = 0.142). Attention had little effect in the upper and lower ROIs, with a weak suppression in the upper ROI.
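The t values reported here come from paired t-tests across subjects. As a reminder of how such a statistic is formed, the Python sketch below computes the paired t and its degrees of freedom using only the standard library. The numbers in the example are placeholders chosen for illustration and are not data from this study; with five subjects (seven recruited minus the two excluded for head motion), the test would have 4 degrees of freedom.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Return (t, df) for a paired t-test on matched samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    # Standard error of the mean difference (sample standard deviation, n - 1).
    se = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / se, len(diffs) - 1


# Placeholder numbers only; these are NOT values from the study.
attend_target = [1.2, 0.9, 1.5, 1.1, 1.3]
attend_fixation = [0.8, 0.7, 1.0, 0.9, 0.8]
t_stat, df = paired_t(attend_target, attend_fixation)
```

A positive t here corresponds to attention enhancement, and a negative t to suppression, given the attend-to-target minus attend-to-fixation convention.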

Figure 4

BOLD signals in the left, right, upper, lower, and central ROIs in the single, tangential, and radial configurations when subject attended to either the fixation (left part of a panel) or the target (right part of a panel). Error bars denote 1 SEM calculated across subjects.

When we compared the spatial distribution of attention modulation in the single configuration with those in the tangential and radial configurations, there were always larger attention enhancements at the locations where distractors were present. The enhancements at the locations of distractors are likely to be automatic and stimulus driven (Corbetta & Shulman, 2002). It is more interesting to examine the attention modulation at the “no-distractor present” ROIs next to the target in the tangential and radial configurations. Compared to the single configuration, the attention enhancements in the tangential configuration were significantly smaller in the left (t = 3.118, p = 0.036) and right (t = 3.623, p = 0.022) ROIs, and even became slightly suppressive in the right ROI (Figure 5A, gray bars). In contrast, the attention enhancements in the radial configuration were significantly larger in the upper (t = 3.64, p = 0.022) and lower (t = 3.863, p = 0.018) ROIs (Figure 5A, black bars).

The overall picture of attention modulation across the locations of the target and the distractors is that, relative to the single configuration, attention spread into the neighboring regions in all directions in the radial configuration but was narrowed down to form a ridge along the circumference in the tangential configuration (Figure 4B). This effect was consistently found in all subjects.

Eye movement data demonstrated that subjects could fixate very well. Figure 6 shows the horizontal and vertical eye positions during an fMRI scan averaged across subjects. Their eye movements were small and further statistical analyses confirmed that both horizontal and vertical mean eye positions did not significantly deviate from the fixation point in both attention conditions and for all stimulus configurations (all t < 2 and p > 0.12). These results suggest that it is unlikely that our results could be significantly confounded by eye movements.

With the behavioral and fMRI data, we examined the two distinct theories of the crowding effect. If the crowding effect results from pre-attentive lateral inhibition, presenting the distractors should decrease the BOLD signal at the target location, as a previous study has demonstrated (Zenger-Landolt & Heeger, 2003). However, there was no significant difference among the three stimulus configurations (single, tangential, and radial) in the unattended condition. The discrepancy between our study and Zenger-Landolt and Heeger's study might be attributed to differences in the stimuli, in the tasks subjects performed, and in the fMRI protocol (e.g., high resolution vs. standard resolution). In another study, Arman, Chung, and Tjan (2006) varied the distance between the target and the distractors to manipulate the strength of the crowding effect. They found that this manipulation did not affect the overall V1 response to the target and the distractors, which is in line with our observation.

In this study, we measured cortical responses not only at the target location, but also at its surrounding area, which made it possible to investigate the spatial distribution of attention modulation in the crowded and non-crowded conditions. Many researchers have used a “spotlight” metaphor to describe that attention can be restricted to a small part of the visual field and that this beam of attention can move around. Psychophysical measurements of the distribution of attention within the beam suggested that the gradient of attention has a tear-drop shape, orientated along radial lines from fixation (Andersen & Kramer, 1993; LaBerge & Brown, 1989). The schematic description of attention enhancement for the single configuration stimulus (see Figure 4B) supports the claim about the shape of “attention spotlight” acquired from psychophysics.

Although there was no significant difference in attention enhancement at the target location among the three stimulus configurations, we found that the spatial distribution of attention modulation was altered by crowding. Compared with the single configuration, attention enhancement spread into the neighboring regions in all directions in the radial configuration. However, this phenomenon did not occur in the tangential configuration. The radial arrangement of target and distractors is pervasive in crowding studies because it is very effective at inducing a profound crowding effect. Previous studies by He et al. (1996, 1997) suggested that the spatial resolution of attention could determine the strength of the crowding effect. Intriligator and Cavanagh (2001) found that the spatial resolution of attention is coarser in the radial direction than in the tangential direction, which they used to explain the radial/tangential asymmetry in the crowding effect. Similarly, Strasburger (2005) has argued that crowding is due to spatially imprecise focusing of attention, especially in the radial configuration. Either a coarse resolution of spatial attention or imprecise focusing of attention could make it difficult to localize a target. Our subjects reported this kind of difficulty in the radial configuration, even with a preceding position cue. The spread of attention enhancement in the radial configuration might reflect a coarse spatial resolution of attention and/or unfocussed attention accompanying the localization difficulty. In other words, the radial configuration made it difficult to restrict spatial attention precisely to the target, and consequently attention enhancement spread into the regions next to the target, including those where no stimulus was presented. Overall, our data support the attention explanation of the crowding effect.

As mentioned above, the crowding effect occurs under a wide range of conditions and tasks. Although our data favor the attention explanation, it will be important to test in the future whether our conclusion generalizes to other conditions. Furthermore, although we did not find that the BOLD response to the target was modulated by the presence of the distractors in the unattended condition, it remains possible that such a modulation, if it exists, could be detected with more sensitive techniques and better designs.

In summary, we found that the local cortical response to a peripheral target was not affected by the presence of neighboring distractors in the unattended condition. However, crowding altered the spatial distribution of attention modulation at the target and in its surrounding area. Our data showed that in the radial configuration, attention enhancement spread into the target's surrounding area regardless of whether a stimulus was presented there. This pattern of results is more consistent with the view that the crowding effect arises from coarse spatial resolution of attention and unfocussed spatial attention than with the view that it arises from sensory-level lateral inhibition at a pre-attentive level.

Acknowledgments

We thank Jay Hegde for his helpful comments. This work is supported by the James S. McDonnell Foundation and NIH grant R01 EY02934. The 3T scanner at the University of Minnesota, Center for Magnetic Resonance Research was supported by NCRR P41 008079 and P30 NS057091 and by the MIND Institute.
