The exact function of color vision for natural-scene perception has remained puzzling. In rapid serial visual presentation (RSVP) tasks, categorically defined targets (e.g., animals) are typically detected slightly better in color than in grayscale stimuli. Here we test the effect of color on animal detection, recognition, and the attentional blink. We present color and grayscale RSVP sequences with up to two target images (animals) embedded. In some conditions, we modify either the hue or the intensity of each pixel. We confirm a benefit of color over grayscale images for animal detection over a range of stimulus onset asynchronies (SOAs), with improved hit rates from 50 to 120 ms and overall improved performance from 90 to 120 ms. For stimuli in which the hue is inverted, performance is similar to grayscale for small SOAs and indistinguishable from original color only for large SOAs. For subordinate category discrimination, color provides no additional benefit. Both color and grayscale sequences show an attentional blink, but differences between color and grayscale are fully explained by single-target differences, ruling out the possibility that the color benefit is purely attentional.

Introduction

The primate visual system is remarkably fast in grasping the “gist” of a complex natural scene (Biederman, 1972; Potter & Levy, 1969). Although the exact definition of what constitutes such a gist has remained elusive, Fei-Fei, Iyer, Koch, and Perona (2007) have provided a working (albeit somewhat circular) definition as the “contents of a glance.” Experimental tests on the limits of perception within a glance frequently employ detection and/or recognition tasks. For example, observers are asked whether a scene contained a given high-level category (e.g., animal, means of transportation). When scenes are presented in isolation and without postmask, humans perform such tasks near ceiling for presentation durations as short as 20 ms (Thorpe, Fize, & Marlot, 1996). In that study, manual responses were given in under 300 ms, and the earliest category-dependent signal in the event-related potential (ERP) emerged as early as 150 ms after stimulus onset. In a later study that used a forced-choice saccade task, saccades had to be made to the hemifield where an animal had been briefly shown before, and some participants had reaction times as short as 120 ms (Kirchner & Thorpe, 2006). These findings are not restricted to animal targets, but are also valid for inanimate items (vehicles/no vehicles; VanRullen & Thorpe, 2001). Nonhuman primates show qualitatively similar rapid categorization and are even somewhat faster than humans (Delorme, Richard, & Fabre-Thorpe, 2000; Fabre-Thorpe, Richard, & Thorpe, 1998).

Complementary to asking for possible neural implementations (Thorpe, Delorme, & VanRullen, 2001; Thorpe & Gautrais, 1997) of rapid scene processing, two questions arise on a behavioral level: First, which features are responsible for rapid recognition, and second, how does rapid recognition relate to attention processes? Wichmann, Drewes, Rosas, and Gegenfurtner (2010) addressed the former question and particularly the role of the power spectrum in rapid animal detection. They found that a spectral cue eases animal detection without being causal. In another detailed analysis of the former question, Elder and Velisavljević (2009) investigated the role of several potential cues on visual processing in a rapid (30–120 ms) animal/no animal categorization task: two-dimensional boundary shape, texture, luminance, and color. They found that the fastest mechanisms relied on shape, while somewhat slower mechanisms integrated shapes with texture cues to become more robust. Color and luminance played virtually no role in this categorization task. Meng and Potter (2008) found similar results in an RSVP detection task with varying presentation durations (53, 107, 213, and 426 ms). Removing color information did not affect performance. In contrast, Delorme et al. (2010) investigated visual features for rapid (32-ms-presentation) animal categorization without postmask presentation in natural scenes and found a small but significant benefit of color in accuracy for responses later than 325 ms, while there was no benefit of color for the fastest responses. In addition to global image characteristics like luminance and color, they also tested the dependence of accuracy and reaction time on diagnostic animal features and target configuration. The most crucial features leading to high accuracy and speed turned out to be the presence of a typical animal posture and the area occupied by the animal (20%–30%). 
Wichmann, Braun, and Gegenfurtner (2006) reported an increase in performance of 2%–3% for colored as compared to grayscale pictures in a rapid animal/no animal categorization task. In monkeys and humans, color had a small but significant effect on reaction times when they had to detect food, but not when they had to detect animals, and performance dropped slightly in some humans when color was removed (Delorme et al., 2000). The authors concluded that rapid identification may rely mainly on fast feed-forward processing of achromatic information in the magnocellular pathway.

In a rapid serial visual presentation (RSVP) paradigm, Yao and Einhäuser (2008) found again little effect of color on the detection of a single animal target among natural-scene distractors, though color boosted observers' confidence. In contrast, when participants were presented with two animal targets that belonged to different subordinate categories (species) within the same stream, the colored target was preferentially reported. This suggests that color, though having little effect on initial detection, plays a role for retrieval from memory. Not only retrieval from but also encoding into memory is influenced by color (Gegenfurtner & Rieger, 2000; Spence, Wong, Rusan, & Rastegar, 2006; Wichmann, Sharpe, & Gegenfurtner, 2002). Gegenfurtner and Rieger (2000) showed that color helps recognition in two ways, through adding a cue in coding at an early level and adding a cue in retrieval at a later stage. Thus they differentiated between the early, sensory influence and the later, cognitive influence of color. Although the benefit of color in early visual processing is small, it depends on the natural-scene content. If color is diagnostic for certain natural scenes (e.g., sea), it speeds up recognition without affecting accuracy (Oliva & Schyns, 2000; Rousselet, Joubert, & Fabre-Thorpe, 2005) and thus in these cases mediates rapid scene recognition. Nonetheless, the questions remain how the sensory influence of color develops over time, whether it affects detection of a superordinate category (e.g., animal) and the recognition of subordinate categories (e.g., animal species) alike, and whether or not color yields attentional benefits.

The role of attention in rapid visual processing of natural scenes has been the subject of many studies during recent years. When briefly peripherally flashed pictures had to be categorized into animals/no animals concurrently with an attentionally demanding central task, performance in both tasks did not drop as compared to single-task performance (Li, VanRullen, Koch, & Perona, 2002). Importantly, however, attention-demanding peripheral tasks, like detecting a rotated L or T instead of detecting animals, led to a drop in performance, implying that animals can be detected even in the (near) absence of attention. This logic was later extended to specific subordinate classification tasks, such as gender discrimination (Reddy, Wilken, & Koch, 2004). Using a similar paradigm, Fei-Fei, VanRullen, Koch, and Perona (2005) found that grayscale pictures could also be processed very efficiently when attention was engaged elsewhere; furthermore, animal detection performance in the peripheral task was not impaired when a distractor image was shown in the periphery simultaneously with the target image at a location where either target or distractor could appear. This indicates that early visual processing of natural scenes is not only nearly attention-free but also highly parallelized. This parallelization of early visual processing was also found by Rousselet, Fabre-Thorpe, and Thorpe (2002), who used an animal/no animal categorization task showing either one or two pictures at the same time (left and/or right of central fixation) in a stream of pictures. Reaction times were the same in both conditions, and this was confirmed by category-related ERPs that emerged simultaneously (occipital: after 140 ms; frontal: after 160 ms) in both conditions and only differed after 190 ms.

When two items appear in close temporal succession in an RSVP stream, an attentional blink (AB) is frequently observed: Detection of a second target (T2) is impaired when it is presented in a time window of 200–700 ms after a first target (T1). This decreased detection rate is usually absent if T2 immediately succeeds T1 (“lag-1-sparing”; Raymond, Shapiro, & Arnell, 1992). Initially, in these AB paradigms, artificial items were used (Chun & Potter, 1995; Raymond et al., 1992), but more recently a number of AB studies using natural scenes have been conducted. Evans and Treisman (2005) used the AB in their experiments 4 through 7 as a tool to test attentional effects on natural-scene processing. They presented a series of 12 natural scenes for 110 ms each, two of which contained targets. Here, target categories were animals and vehicles. When both targets had to be identified by giving a subordinate category, an AB was clearly measured and was more severe when targets were of different categories than when both targets were of the same category. There was also a subtle difference between categories, since animals were in general identified slightly better than vehicles. When both targets only had to be detected without being identified, the AB disappeared for targets of the same category and was only marginally present for sequences containing targets of different categories. Another study also found this dependency on stimulus category in an AB paradigm using natural scenes, where target categories were faces and watches (Einhäuser, Koch, & Makeig, 2007). Target identification was better and the AB was shorter for faces than for watches. Since the function of color vision in human beings and monkeys is frequently associated with attentional processes (Frey, Honey, & König, 2008; Maunsell & Treue, 2006; Motter, 1994; Zhang & Luck, 2009), the question arises whether color has an impact on the timing and depth of the AB.

To investigate the role of color in rapid visual processing and in particular its relation to attention, we conducted four RSVP experiments with animals as the target category. In the first experiment, observers in each trial had to report whether there were zero, one, or two animal targets in a 2-s stream, followed by a four-alternative forced-choice subordinate classification. Streams could be either colored or grayscale. This allowed us to replicate the small but frequently significant benefit of color for single-target processing and to characterize the dependence of subordinate classification on color and the modulation of the AB by color. In the second experiment, we asked whether the observed color benefits were a consequence of color being diagnostic for animals. To this end, we inverted the hue of each pixel (roughly: red and green swapped, blue and yellow swapped, etc.) while keeping saturation and luminance constant. In the third experiment, we tested whether the effects of color remained the same when stimulus presentation duration was decreased to 50 ms, using the same stimuli as in Experiment 2. In the fourth experiment, we tested the effect of color at six different presentation durations (which were also the stimulus onset asynchronies, SOAs) to characterize how the dependence on color develops over time.

Methods

In total we conducted four experiments. Experiment 1 targeted the effect of natural color in images on detection, recognition, and the time course of the attentional blink. Experiment 2 aimed at dissociating the effects of color that result from color's diagnosticity for animal images from those that result from other color-related effects. Experiment 3 investigated whether the results of Experiment 2 held for shorter SOAs. Experiment 4 analyzed the detection and recognition of single targets for a larger variety of SOAs.

Stimuli

A total of 480 animal target stimuli were taken from the COREL data set: animals, vehicles, and distractors (http://vision.stanford.edu/resources_links.html; Li et al., 2002). For subordinate classification, animal images were subdivided into canine (including wolves, foxes, and dogs), feline (including tigers, pumas, and leopards), avian (including all kinds of birds), and ungulate (including horses, deer, cows, and goats), with 120 images in each category (Figure 1A). Distractor images were taken from the same database. In Experiments 1–3, the same subset of 360 target stimuli (90 per category) was used; in Experiment 4, all 480 were used. Stimuli were 384 × 256 pixels in size. We used four conditions, which we refer to as “original color,” “color inverted,” “grayscale,” and “gray inverted.” To modify stimuli, they were first transformed into the physiologically defined DKL color space (Derrington, Krauskopf, & Lennie, 1984). DKL color space is a three-dimensional space, in which the z-axis defines the luminance (for convenience, we map the minimal displayable luminance to −0.5 and the maximal displayable luminance to +0.5) and the other axes are spanned by the differential cone excitations: the difference between L and M cones (L − M axis) and the difference between the sum of L and M cones and S cones (S − (L + M) axis).

The original image (Figure 1A) was kept unchanged. For the grayscale condition (Figure 1B), saturation in the DKL space was set to 0 (i.e., each pixel was projected on the luminance axis). For the color-inverted condition (Figure 1C), the DKL space was rotated by 180° about the luminance axis, which results in a mapping of each hue to its opponent hue without any change in saturation or luminance. Since it is not guaranteed that the modified image can be displayed within the screen's gamut, we applied the following procedure to keep luminance and inverted hue as unaffected as possible. After the hue inversion, we determined for each pixel the maximal chroma (distance from the luminance axis in DKL space) the screen could display for the given hue. If the chroma of the pixel was at or below this maximum, the pixel remained unchanged. If the chroma of the pixel was above this maximum, it was reduced to this maximally displayable value while keeping luminance and hue unchanged (i.e., both cardinal color axes were scaled by the same factor, while the luminance axis was not scaled). On average, 7.2% ± 9.9% of pixels were affected by such a reduction, and on average the scaling factor for the axes was 0.98. That is, the reduction in chroma was small and affected only a small number of pixels. For the gray-inverted condition (Figure 1D), the luminance values were inverted as follows: Luminance was mapped to the interval [0, 1] (i.e., 0.5 was added to the luminance axis of the DKL space), the square root of the resulting values was subtracted from 1, that result was squared, and it was then mapped back to [−0.5, 0.5] by subtracting 0.5.
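The per-pixel manipulations described above can be sketched as follows. This is a minimal illustration and not the code used in the study: it assumes each pixel is already given as a DKL triple (luminance, L − M, S − (L + M)), that chroma is the Euclidean distance from the luminance axis, and that the maximal displayable chroma (`max_chroma`, a hypothetical input that depends on the monitor gamut) is supplied by the caller. All function names are illustrative.

```python
import math

def to_grayscale(lum, lm, s):
    """Project a DKL pixel onto the luminance axis (chroma set to 0)."""
    return (lum, 0.0, 0.0)

def invert_hue(lum, lm, s):
    """Rotate by 180 degrees about the luminance axis: negate both chromatic axes."""
    return (lum, -lm, -s)

def clip_chroma(lum, lm, s, max_chroma):
    """Scale both chromatic axes by one common factor so that chroma <= max_chroma.

    max_chroma is a hypothetical per-pixel gamut limit; luminance is never scaled.
    """
    chroma = math.hypot(lm, s)
    if chroma <= max_chroma:
        return (lum, lm, s)
    k = max_chroma / chroma
    return (lum, lm * k, s * k)

def invert_luminance(lum):
    """Gray inversion for luminance in [-0.5, 0.5]: (1 - sqrt(lum + 0.5))**2 - 0.5."""
    u = lum + 0.5                      # map to [0, 1]
    return (1.0 - math.sqrt(u)) ** 2 - 0.5
```

Under these assumptions the luminance inversion maps the darkest displayable value to the brightest and vice versa, and applying it twice returns the original luminance.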

Procedure

In Experiments 1–3, observers viewed streams of 20 natural scenes that contained either no target, one target, or two targets (Figure 1E). In Experiment 4, all streams contained either no target or one target. All images in a stream (target and distractors) were subjected to the same color condition. Observers were asked to fixate the center of the screen and press and release a button to start each trial. After viewing each stream, observers were first asked how many animals they had seen in the preceding stream. Then they were asked to choose the animal class (if they had responded “one”) or classes in order (if they had responded “two”; Experiments 1–3) among the set of four options (feline, canine, avian, ungulate). The number of queries depended on the response, not on ground truth. That is, even if a detection was a false alarm, observers had to respond which category they had recognized, and they were not prompted for categorization if they had not detected any target.

In Experiment 1, the SOA was 100 ms and only grayscale and original-color conditions were used. For each color condition (grayscale, original), Experiment 1 included 240 streams with zero targets, 240 streams with one target, and 240 streams with two targets, 48 for each tested lag (one, two, three, four, and seven frames). This yielded a total of 1,440 (2 × 3 × 5 × 48) trials. The order of trials was randomized and the experiment was split into two sessions of about equal length. In Experiment 2, the SOA was also 100 ms, but each stream was presented in all four color conditions, with 120 streams with zero targets, 120 streams with one target, and 120 streams with two targets (all at lag 2) for each condition, again yielding a total of 1,440 (4 × 3 × 120) trials that were split into two sessions of about equal length. In Experiment 3, the SOA was 50 ms and the experiment was otherwise identical to Experiment 2. In Experiment 4, six different SOAs were used: the 50 and 100 ms of the previous experiments as well as 30, 60, 90, and 120 ms. Each stream was presented in all four color conditions with 80 streams with zero targets and 80 streams with one target per color condition and per SOA, yielding 2 × 4 × 6 × 80 = 3,840 trials. They were split into three sessions of about equal length. In Experiment 1, each of the 360 target stimuli was used twice per condition (in different streams of distractors); in Experiments 2 and 3, each of the 360 target stimuli was used once per condition; and in Experiment 4, each of the 480 target stimuli was used once per color condition.
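As a consistency check on the design, the trial totals and the number of times each target image is needed per color condition can be recomputed from the stream counts given above. The short sketch below only restates that arithmetic; the helper name `target_presentations` is illustrative.

```python
# Trial totals as factorized in the text.
exp1_trials = 2 * 3 * 5 * 48      # color conditions x target counts x lags x streams per lag
exp2_trials = 4 * 3 * 120         # color conditions x target counts x streams
exp4_trials = 2 * 4 * 6 * 80      # target counts x color conditions x SOAs x streams

def target_presentations(one_target_streams, two_target_streams):
    """Target images needed per color condition (two-target streams use two images)."""
    return one_target_streams + 2 * two_target_streams

exp1_uses = target_presentations(240, 240) / 360   # uses per target image, Experiment 1
exp2_uses = target_presentations(120, 120) / 360   # Experiments 2 and 3
exp4_uses = target_presentations(6 * 80, 0) / 480  # Experiment 4, summed over six SOAs
```

The three ratios come out as 2, 1, and 1, matching the statement that each target was used twice per condition in Experiment 1 and once per condition in Experiments 2–4.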

Setup

The study was conducted in a dark and sound-isolated room. Stimuli were presented on a 19.7-in. EIZO Flex Scan F77S CRT monitor set to 1024 × 768 pixel resolution at 100 Hz, located 73 cm from the observer, whose head was stabilized with a chin rest and a forehead rest. The maximum luminance (white) was 66.0 cd/m², the minimum luminance (black) was 0.11 cd/m², and the CIE color coordinates of the monitor's guns (x, y) were (0.623, 0.344) for red, (0.287, 0.609) for green, and (0.151, 0.065) for blue. Stimuli spanned 11.6° × 7.8° on a gray background. Before each trial, a gray fixation screen with a black fixation cross was presented.

Since the design for all experiments was “within-subject” for all variables of interest, all analyses treated observers as repeated measures. For analyses of more than one factor or more than two levels per factor, a repeated-measures ANOVA was used. For post hoc pairwise comparisons and for factors with two levels, paired t tests were used.
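For the pairwise comparisons, the paired t statistic is the mean of the within-observer differences divided by its standard error, with n − 1 degrees of freedom. The sketch below is a generic illustration with made-up numbers, not the analysis code or data of the study; obtaining a p value additionally requires the t distribution, which is not part of the Python standard library.

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Return (t, df) for paired samples a and b (one value per observer)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical per-observer scores in two conditions (illustrative only).
cond_a = [1.0, 2.0, 3.0, 4.0]
cond_b = [0.0, 0.0, 1.0, 1.0]
t_value, df = paired_t(cond_a, cond_b)
```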

Two types of analysis have to be distinguished, hereafter “detection” and “recognition.” Detection refers to the question whether the number of targets the observer reported corresponded to the number of targets present in the stream. We tested results for zero-target, one-target, and two-target trials separately, and refer to the relevant variables by standard signal-detection-theory terms.

For the first part of analysis in all experiments, we considered only single-target and no-target trials. For zero-target streams, we defined the report of any target (one or two) as a false alarm. For single-target streams there are two possible errors: the report of no target or the report of two targets. Although the latter was rare (see Appendix A for each individual's 3 × 3 matrix of all possible truth/response combinations), we performed the analysis for both definitions: at least one target reported and exactly one target reported. For the computation of d′ (computed as the difference between the z-scored hit and false-alarm rates; Macmillan, 1993), we used the former definition. In Experiment 4, only zero or one target was possible, so that the hit and false-alarm rates are unambiguously defined.
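The two hit definitions and the computation of d′ can be sketched as follows. This is a minimal illustration under stated assumptions: `counts` is a hypothetical 3 × 3 truth/response matrix for one observer and one condition, the numbers are invented, and no correction for hit or false-alarm rates of exactly 0 or 1 is applied (the text does not specify one, and `inv_cdf` raises an error for those values).

```python
from statistics import NormalDist

def detection_rates(counts):
    """counts[t][r] = number of trials with t targets shown and r targets reported."""
    n_zero = sum(counts[0])
    n_one = sum(counts[1])
    false_alarm = (counts[0][1] + counts[0][2]) / n_zero   # any report in zero-target streams
    hit_any = (counts[1][1] + counts[1][2]) / n_one        # at least one target reported
    hit_exact = counts[1][1] / n_one                       # exactly one target reported
    return hit_any, hit_exact, false_alarm

def d_prime(hit_rate, false_alarm_rate):
    """Difference between the z-scored hit and false-alarm rates."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

# Hypothetical counts (rows: 0, 1, 2 targets shown; columns: 0, 1, 2 reported).
counts = [
    [200, 38, 2],
    [50, 185, 5],
    [30, 90, 120],
]
hit_any, hit_exact, fa = detection_rates(counts)
sensitivity = d_prime(hit_any, fa)   # uses the "at least one" definition, as in the text
```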

In recognition, we asked whether the target was correctly identified according to the four available categories. Most analyses are based on “recognition given detection”; that is, they refer only to trials in which the target or targets were detected. In one-target streams for which two targets were reported, the target was counted as recognized if at least one of the two responses matched the target category. When analyzing recognition for two-target streams in which exactly one target was detected and T1 was of the same category as T2, it is impossible to infer from the response whether T1 or T2 was recognized (as both require the same response). For this particular analysis, we therefore excluded trials for which T1 and T2 were from the same category.
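The difference between unconditional recognition and recognition given detection for single-target streams can be illustrated with a small sketch. The trial representation (a target category paired with the list of reported categories) and the example trials are hypothetical and serve only to make the two definitions concrete.

```python
def unconditional_recognition(trials):
    """Share of all single-target trials in which the target category was reported."""
    return sum(1 for target, reported in trials if target in reported) / len(trials)

def recognition_given_detection(trials):
    """Share of detected single-target trials (at least one report) with a matching category."""
    detected = [(target, reported) for target, reported in trials if len(reported) >= 1]
    if not detected:
        return None  # undefined when no target was ever reported
    recognized = sum(1 for target, reported in detected if target in reported)
    return recognized / len(detected)

# Hypothetical single-target trials: (target category, categories reported in order).
trials = [
    ("feline", ["feline"]),              # detected and recognized
    ("avian", []),                       # missed, so no category query
    ("canine", ["ungulate", "canine"]),  # two reported, one matches: counted as recognized
    ("ungulate", ["avian"]),             # detected but misclassified
]
raw = unconditional_recognition(trials)
conditional = recognition_given_detection(trials)
```

In this toy example the unconditional rate is 2/4 and the conditional rate is 2/3, which shows how misses lower the former but not the latter.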

Results

Detection of single targets

For a first analysis, we consider zero-target and single-target trials (Figure 2; Tables 1 and 2). In Experiments 1 and 2, with an SOA of 100 ms, color sequences had more hits (Figure 2A; Appendix B) and fewer false alarms (Figure 2B; Appendix B) than grayscale sequences. The difference in hit rates was in the typically observed range, no matter whether hits were defined as response 1 or 2 in one-target trials (Experiment 1: 4.1% ± 3.8%; Experiment 2: 3.1% ± 2.3%; all data mean ± SD) or as response of exactly 1 (Experiment 1: 4.8% ± 3.0%; Experiment 2: 4.0% ± 2.2%). Qualitatively, the same held for Experiment 3 (SOA: 50 ms), with a difference of 4.6% ± 4.4% (response > 0) or 5.1% ± 3.8% (response = 1) between color and grayscale. In Experiment 4, where hit rates are well defined, as only responses 0 and 1 are possible, color had higher hit rates than grayscale for all SOAs (Figure 2A, right). This qualitative benefit for color is also reflected in the separability (d′), which combines hits and false alarms into a measure of performance (Table 2; Figure 2C): The value of d′ for color sequences is larger than for grayscale sequences across all conditions (Figure 3). In Experiments 2–4 we had two additional conditions: color inverted and gray inverted. The gray-inverted images have fewer hits, more false alarms, and consequently a smaller d′ for all conditions tested (Figures 2 and 3, gray). The color-inverted condition (Figures 2 and 3, red) shows a more mixed pattern: For small SOAs (Experiment 3 and the short SOAs of Experiment 4) it tends to be close to the grayscale condition, while for larger SOAs (Experiment 2 and the long SOAs of Experiment 4) it tends to be close to the color condition.

Figure 2. Detection in zero-target and single-target trials. (A) Hit rate for Experiments 1–3 (left, sorted by SOA) and the different SOAs of Experiment 4. Different colors code different conditions (blue: original color; red: color inverted; black: grayscale; gray: gray inverted). The left panel defines hits as any response to single-target trials (response 1 or 2), the middle panel as an exact response (response 1). For Experiment 4, there was no two-target option. (B) False alarms. Notation as in (A). Left panel: any false alarm (response 1 or 2); middle panel: single false alarm (response 1). (C) Value of d′ as computed from z-scored hit and false-alarm rates of (A) and (B); the “>0” definition of hits and false alarms is used for this computation. Error bars are mean and standard error of the mean across observers.

To quantify these effects statistically, for the experiments with more than two conditions (2–4) we first tested whether the factor color condition had an effect at all by means of a repeated-measures ANOVA (in Experiment 4 with SOA as an additional factor). For hits (in either definition), false alarms, and d′ we find main effects of condition in all experiments (Table 1). In Experiment 4, we additionally find a main effect of SOA for hits and d′ (though not for false alarms), but no interaction between condition and SOA (Table 1). This allowed us to test post hoc, for all experiments and for each SOA level in Experiment 4, which color conditions differ from each other in terms of hits, false alarms, and/or d′. In the remainder of the main text we will focus on d′; hit and false-alarm data are analyzed in Appendix B.

When considering d′ as a performance measure that combines hits and false alarms and is thus insensitive to subjective criteria, the difference between color and grayscale sequences increases monotonically up to 100 ms (Figure 3) and becomes significant at 90 ms and above (Table 2). This indicates that there is a benefit induced by color that increases with increasing SOA, at least up to 100 ms.

To address whether the performance benefit derives from color being diagnostic for animal scenes, we included the color-inverted images in Experiments 2–4. For SOAs of 90 and 100 ms (Experiment 4), where color already excels over grayscale, the color-inverted sequences yield significantly worse performance than the original color sequences (Figure 3; Table 2). Only for the longest SOAs (100 ms in Experiment 2 and 120 ms in Experiment 4) do color-inverted sequences yield (or tend to yield) better performance than grayscale and become indistinguishable from original color.

Performance in the gray-inverted condition is significantly worse than in any other condition (all t(7) > 3.02, all p < 0.02), with the exception of an SOA of 60 ms in Experiment 4, where it is indistinguishable from grayscale and color inverted. As the target is clearly identifiable in these images if viewing time is unlimited, the gray-inverted condition verifies that even at the largest SOAs tested, detection is not yet trivial (i.e., it is not equivalent to prolonged viewing).

In sum, the benefit of color for detection increases with increasing SOA (Figure 3, blue), but only at large SOAs can a similar benefit be observed for color-inverted images (Figure 3, red). This suggests that at short SOAs, the color benefit results from mechanisms that require the correct hue (e.g., the hue being diagnostic for target images), while for longer SOAs other mechanisms, which only require color contrasts to be intact, may come into play.

Color and grayscale targets both induce an attentional blink

In Experiment 1, we tested two-target streams at a variety of lags (1, 2, 3, 4, 7) between targets. When analyzing color and grayscale sequences separately (Appendix C), we find reduced performance for short lags, the attentional blink (AB). Performance is worst at lag 1; that is, we do not observe lag-1 sparing (Figure 4A). This absence of lag-1 sparing also holds when only trials with T1 and T2 from the same category are considered, ruling out the possibility that it results from the dissimilarity between categories.

If the detection of one target in a two-target stream were independent from the detection of the other, the probability of detecting both targets would equal the square of the single-target hit rate. Using this baseline, we find a significant AB at lag 2 in Experiment 2 (Figure 4B) and Experiment 3 (Figure 4C) for all color conditions (Appendix C). Hence, there is an attentional blink (without lag-1 sparing) for lags 1, 2, and 3 for any color condition and for short (50 ms) and long (100 ms) SOAs.

Any effect of color on the attentional blink is explained by single-target performance alone

When testing the two-target detection rate at each lag, there appears to be an effect of color: Detection performance is better for color than for grayscale sequences at all lags (all ts > 2.7, all ps < 0.03; Figure 4A). Similarly, there is a main effect of color condition on the two-target hit rate in Experiment 2, F(3, 21) = 59.27, p = 2.0 × 10⁻¹⁰ (Figure 4B), and in Experiment 3, F(3, 21) = 3.96, p = 0.022 (Figure 4C). This raises the question whether there is an attentional benefit of color or whether this effect can solely be explained by differences in single-target performance. To answer this question, we subtracted the baseline, defined by the individual's squared single-target hit rate in the respective color condition, from the plain two-target hit rate. For Experiment 1, we find that these baseline-corrected data do not show differences between color and grayscale at any lag (all ts < 0.69, all ps > 0.21; Figure 4D). Similarly, there is no main effect of color condition on two-target detection performance after baseline correction in Experiment 2, F(3, 21) = 1.18, p = 0.34 (Figure 4E), or Experiment 3, F(3, 21) = 1.14, p = 0.36 (Figure 4F). Consequently, while we find an attentional blink for two-target detection in any color condition, we do not observe any effect of color in addition to what single-target detection performance had predicted.
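The independence baseline and its subtraction can be made concrete with a short sketch. The rates below are invented for illustration and are not data from the experiments; they are chosen so that a raw two-target advantage for color disappears after correction, which is the pattern reported above.

```python
def independence_baseline(single_hit_rate):
    """Probability of detecting both targets if the two detections were independent."""
    return single_hit_rate ** 2

def baseline_corrected(two_target_hit_rate, single_hit_rate):
    """Two-target hit rate minus the squared single-target hit rate (negative = blink)."""
    return two_target_hit_rate - independence_baseline(single_hit_rate)

# Hypothetical rates for one observer at one lag (illustrative only).
color = baseline_corrected(two_target_hit_rate=0.50, single_hit_rate=0.80)
gray = baseline_corrected(two_target_hit_rate=0.4225, single_hit_rate=0.75)
```

Here both corrected values equal −0.14: the raw two-target rates differ (0.50 versus 0.4225), but the difference is fully accounted for by the single-target hit rates.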

Color effects in single-target recognition are explained by detection performance

Besides the mere detection of animals in a sequence of distractors, we also tested the capability of observers to identify the subordinate category. Of all single-target trials in Experiment 1, the subordinate animal category is correctly identified in 85.8% ± 6.1% of grayscale and 89.3% ± 4.7% of color sequences, with a significant benefit of color, t(7) = 3.30, p = 0.01. Similarly, there is a significant effect of color condition on recognition in Experiment 2, F(3, 21) = 56.95, p = 2.9 × 10⁻¹⁰, and in Experiment 3, F(3, 21) = 26.79, p = 2.26 × 10⁻⁷. In Experiment 4, a 6 × 4 repeated-measures ANOVA reveals a main effect of SOA, F(5, 35) = 265.31, p < 10⁻²⁰, and of color condition, F(3, 21) = 140.37, p = 4.72 × 10⁻¹⁴, but no interaction, F(15, 105) = 1.71, p = 0.059. In line with the absence of an interaction, color condition has an effect on recognition at each SOA (Fs > 18.67, ps < 3.90 × 10⁻⁶), and in turn, SOAs have an effect on recognition in all color conditions (Fs > 118.3, ps < 10⁻²⁰).

This analysis considered recognition unconditionally; that is, it compared raw recognition rates independent of whether the target was detected at all. However, when only considering single-target trials in which the target is correctly detected (recognition given detection), subordinate recognition is indistinguishable between grayscale and color (Figure 5). In Experiment 1, 94.8% ± 3.2% of grayscale and 94.4% ± 3.4% of color sequences in which a target is correctly detected have the target also correctly identified, t(7) = 0.48, p = 0.65. For Experiments 2 and 3, there are still main effects of color condition on this recognition-given-detection performance (Experiment 2: F(3, 21) = 27.4, p = 1.9 × 10⁻⁷; Experiment 3: F(3, 21) = 16.37, p = 1.02 × 10⁻⁵), but this is solely explained by the difference between the gray-inverted condition and all other conditions (Appendix D; Figure 5). For the recognition-given-detection analysis in Experiment 4, there is a main effect of SOA, F(5, 35) = 26.93, p = 4.46 × 10⁻¹¹, and of color, F(3, 21) = 62.54, p = 1.22 × 10⁻¹⁰, but no interaction, F(15, 105) = 1.0, p = 0.46. This main effect of color also results almost entirely from the difference between gray-inverted and all other conditions (Appendix D; Figure 5). In general, once animal detection has been successful, color has no additional effect on subordinate animal categorization. In contrast, the polarity of luminance has an effect.

Figure 5. Recognition given detection. Number of single-target trials for which the correct category was reported divided by the number of single-target trials in which at least one target was reported (hits). Colors as in Figure 2; bars denote standard error of the mean across observers.
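The distinction between unconditional recognition and recognition given detection can be made concrete with a minimal sketch. The `Trial` record and its field names are hypothetical and not taken from the study's analysis code; the sketch also assumes that a trial with no reported target counts as not recognized.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """Hypothetical per-trial record; field names are illustrative only."""
    n_targets: int          # ground truth: targets embedded in the stream
    n_reported: int         # response: number of targets the observer reported
    category_correct: bool  # reported subordinate category matches the target


def recognition_rates(trials):
    """Return (unconditional, given_detection) rates for single-target trials."""
    single = [t for t in trials if t.n_targets == 1]
    # Hits in the "at least one target reported" sense.
    detected = [t for t in single if t.n_reported >= 1]
    recognized = [t for t in detected if t.category_correct]
    nan = float("nan")
    unconditional = len(recognized) / len(single) if single else nan
    given_detection = len(recognized) / len(detected) if detected else nan
    return unconditional, given_detection


if __name__ == "__main__":
    # Invented data: 10 single-target trials, 8 detected, 6 of those recognized.
    demo = (
        [Trial(1, 1, True)] * 6
        + [Trial(1, 1, False)] * 2
        + [Trial(1, 0, False)] * 2
        + [Trial(0, 0, False)] * 5  # zero-target trials are ignored
    )
    print(recognition_rates(demo))
```

Because misses lower the unconditional rate but leave the conditional rate untouched, a condition that only improves detection raises the first measure without changing the second, which is the pattern reported for color versus grayscale.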

For the data of Experiment 1, we tested whether lag and/or color has an effect on recognition performance in trials in which two targets are correctly detected by means of a two-way (2 color conditions × 5 lags) repeated-measures ANOVA. We find a main effect of lag on the probability that both targets were correctly identified, F(4, 28) = 4.62, p = 0.0055, but no effect of color condition, F(1, 28) = 0.0017, p = 0.97, and no interaction, F(4, 28) = 1.04, p = 0.40. This shows that there is an attentional blink for recognition on top of that for detection, but no additional effect of color. Once detection partially fails, subsequent recognition is insensitive to color or the attentional blink, which also holds for Experiments 2 and 3 (Appendix E).

Discussion

The present study shows that color is beneficial for rapid scene perception (“color benefit”). Targets in rapidly presented sequences of images are slightly easier to detect when sequences are in color as compared to grayscale, which is qualitatively and quantitatively in line with several earlier reports (Delorme et al., 2010; Wichmann et al., 2006). We find that this sensory color benefit increases monotonically with presentation time up to about 100 ms. For short presentation times, the color benefit requires the hue to be intact, pointing out that color being diagnostic for images containing an animal may be the dominant effect driving the color benefit for short SOAs. For longer SOAs, hue-modified images tend to approach original-color performance, suggesting that a general benefit of color as such, possibly related to a segmentation process, comes into play. Color does not aid performance in naming subordinate animal categories, provided detection of the category “animal” was successful. Finally, color has no influence on the characteristic of the attentional blink beyond the effects explained by single-target trials alone. Together, these results suggest a preattentive, rather than attentional, source of the color benefit.

While some previous studies have also found an effect of color on performance in rapid detection (Delorme et al., 2010; Wichmann et al., 2006), others have not (Elder & Velisavljević, 2009; Meng & Potter, 2008). In an RSVP paradigm, Meng and Potter (2008) instructed participants about the target with short descriptions of the scene to expect and found no effect of color on the detection of the scene for a wide range of SOAs (53–426 ms) and for normal and impoverished viewing conditions alike. One possible explanation for the absence of an effect could be that color might be beneficial only for broad categories (like animals), not for more detailed descriptions, especially if they implicate a spatial relation (like the “businessmen at table” example in Meng & Potter, 2008). In contrast to the present study and Meng and Potter's, Elder and Velisavljević (2009)—who did not find an effect of color—did not use sequences of images but instead used masked presentations of isolated images. Depending on the exact design of the mask, the colored mask might be relatively more effective than a grayscale mask as compared to the difference between the temporally adjacent color or grayscale frames in RSVP. Whether the color benefit extends from animals to other categories, whether it depends on the richness of the instruction, whether it depends on whether the instructions imply spatial relations, and whether there is a fundamental difference between RSVP and isolated masked images are interesting issues for further research.

When detecting targets in complex backgrounds, separating figure from ground is an important role of segmentation processes. Other studies have proposed that color facilitates figure–ground segmentation and have suggested this mechanism as an early contribution of color to visual processing (Gegenfurtner & Rieger, 2000; Skiera, Petersen, Skalej, & Fahle, 2000). Wurm, Legge, Isenberg, and Luebker (1993) found that color improved accuracy in recognition of food targets irrespective of its diagnosticity of the target object, which points to a rapid low-level contribution of color to object recognition. This early contribution of color to visual processing, and particularly to figure–ground segmentation, has also been shown in an fMRI study in which activity related to figure–ground segmentation in checkerboards by color, luminance, and motion stimuli was already found in the primary visual cortex (Skiera et al., 2000). For long SOAs, performance in color-inverted sequences trends more towards original-color performance than towards grayscale performance (Figure 3). Since segmentation in natural scenes benefits from chromatic boundaries (Hansen & Gegenfurtner, 2009), which are unaffected by our hue inversion, it is conceivable that for long SOAs, color aids detection by fostering segmentation independent of hue being diagnostic for the target category.

Although we focus on sensory aspects and find an effect of color on detection but not on recognition or attention, our results do not contradict the notion that color also plays a prominent role in later stages of visual processing. It has already been proposed that color aids visual processing not primarily during detection but in the later stages, when—for example—memory has to be accessed (Yao & Einhäuser, 2008). This prominent role of color in encoding and retrieval in recognition memory paradigms has been shown in several studies (Gegenfurtner & Rieger, 2000; Spence et al., 2006; Wichmann et al., 2002) and has typically exceeded the comparably subtle effect in rapid visual categorization tasks (Delorme et al., 2000, 2010; Wichmann et al., 2006).

Unlike the original characterization of the AB (Raymond et al., 1992), we observe lag-1 sparing in our Experiment 1 neither for color nor for grayscale sequences. Visser, Zuvic, Bischof, and Di Lollo (1999) demonstrated that lag-1 sparing occurred only when T1 and T2 were at the same location, a condition that may be violated in complex scenes where the target does not fill the full image. In addition, lag-1 sparing decreases for lower similarity between T1 and T2 (Visser, Davis, & Ohan, 2009). On a basic feature level, animal targets can be rather dissimilar, and the subordinate categorical similarity does not seem to be of relevance for lag-1 sparing in our experiment (detection at lag 1 was virtually identical, no matter whether T1 and T2 were of the same or different categories). Finally, there are a number of other conditions under which lag-1 sparing is not observed, for example, when no short-term consolidation takes place (Dell'Acqua, Jolicœur, Pascali, & Pluchino, 2007) or when T1 is masked (Martin & Shapiro, 2008). So while lag-1 sparing is widely considered a hallmark of the AB, there are AB conditions in which no lag-1 sparing is observed, and therefore lag-1 sparing is not necessarily considered indicative of an AB effect (MacLean & Arnell, 2012).

In our paradigm, responses were unspeeded and we did not measure reaction times. Hence, we cannot fully rule out the possibility that for some targets, especially at short SOAs, color could have sped up the responses without affecting accuracy, as has been reported earlier (Oliva & Schyns, 2000; Rousselet et al., 2005). Since decreased reaction times are associated with increased confidence (Henmon, 1911), such speeding up could, however, also be related to increased subjective confidence for color as compared to grayscale sequences, which has been reported earlier even in the absence of a performance difference (Yao & Einhäuser, 2008). It should be noted, however, that our results of increased hit rates cannot be explained by a shift in criteria towards more liberal responses, since false-alarm rates were indistinguishable between conditions.
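The criterion argument can be illustrated with a small sketch based on the standard equal-variance signal detection model. That model is an assumption made here for illustration, not something stated by the study, and the function names are hypothetical. Under this model, a pure shift towards a more liberal criterion raises hit and false-alarm rates together, so higher hit rates at unchanged false-alarm rates imply a change in sensitivity.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # zero mean, unit variance


def z(rate):
    """z-score of a rate; the rate must lie strictly between 0 and 1."""
    return _STD_NORMAL.inv_cdf(rate)


def d_prime(hit_rate, fa_rate):
    """Sensitivity: difference of z-scored hit and false-alarm rates."""
    return z(hit_rate) - z(fa_rate)


def criterion(hit_rate, fa_rate):
    """Criterion c; positive values mean conservative (misses exceed false alarms)."""
    return -0.5 * (z(hit_rate) + z(fa_rate))


def rates_from(d, c):
    """Hit and false-alarm rates predicted by equal-variance SDT for given d' and c."""
    hit = _STD_NORMAL.cdf(d / 2 - c)
    fa = _STD_NORMAL.cdf(-d / 2 - c)
    return hit, fa


if __name__ == "__main__":
    # Invented numbers: same sensitivity, conservative versus more liberal criterion.
    for c in (0.5, 0.2):
        hit, fa = rates_from(2.0, c)
        print(f"c={c}: hit={hit:.3f}, fa={fa:.3f}")
```

In this sketch the liberal criterion increases both rates, whereas raising the hit rate alone requires a larger `d_prime`.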

Figure 6. Raw data, hit rates, and false-alarm rates. All hit and false-alarm rates for all experiments, conditions, and SOAs. Within each experiment, the same color denotes the same observer. Responses 1 and 2 are counted as false alarms and hits for this representation. If data points were exactly overlapping, the covered data point was moved horizontally out of the axes and connected to its original location with a thin line.

Whether detection and recognition are based on the same underlying process is a matter of debate. Grill-Spector and Kanwisher (2005) found the same performance and reaction times in detection and basic-level categorization tasks and concluded that figure–ground segmentation leading to detection and basic-level categorization are closely linked and mediated by one mechanism. This hypothesis has in turn been challenged by subsequent studies that found better performance in detection than in basic-level categorization (Bowers & Jones, 2008; Mack, Gauthier, Sadr, & Palmeri, 2008; Mack & Palmeri, 2010). It has been shown that both mechanisms can be selectively manipulated (Mack et al., 2008; Mack & Palmeri, 2010), and thus there is no intrinsic link between them. Here we find, consistently over all experiments and SOAs, that color has little influence on recognition once the target has been detected. In contrast, the gray-inverted condition shows that luminance information influences recognition for detected targets. Since there are more false alarms for the gray-inverted than for any other condition, this result could still be explained by an increased number of guesses within the population of hits and thus decreased recognition performance. However, a similar effect is observed in all conditions of Experiment 4 when decreasing the SOA, which does not affect false-alarm rates. Decreasing the SOA not only reduces performance (in terms of d′) but also further reduces the fraction of correctly recognized targets among the correctly detected ones. This argues against entirely overlapping mechanisms for detection and recognition. Nonetheless, even for short SOAs and the gray-inverted condition, recognition for detected targets is clearly above chance (>60%, with chance level at 25%). This offers an alternative explanation that, at least for high performance at larger SOAs, might contribute to the strong coupling between detection and recognition: For difficult targets (those only detected at large SOAs), the report of a detection may depend on some subordinate recognition. This view is supported by the conservative criterion nearly all observers apply for detection under difficult conditions (Figure 6). Whether the mechanisms are distinct or not, however, our data clearly show that color influences recognition and detection alike, such that once a target has been detected, the probability of it being correctly recognized does not depend on the presence of color.

Acknowledgments

This work was supported by the German Research Foundation (DFG; grants: EI 852/1, EI 852/3, IRTG-1901, and SFB/TRR135).

With a few exceptions, observers in all experiments show consistent patterns with respect to their performance. In general, all are conservative (making more misses than false alarms), and observers with comparably liberal criteria tend to remain so across all conditions (Figure 6). Considering all nine combinations of ground truth and response for Experiments 1–3, the incidence of double false alarms is small (Figure A1) and—except for a few individuals and conditions—so are false alarms in single-target trials (i.e., two reported targets where one is correct).

Figure A1. Raw data, all combinations of ground truth and response. For Experiments 1–3, there were nine combinations of correct responses (truth) and actual responses (response). For each individual, experiment, and condition, the raw percentage of responses for the respective truth are color-coded and provided (i.e., columns sum to 100%). The large matrix on the top right defines signal detection theory (SDT) terms. Note that “hit(*)” contains 1 hit and 1 false alarm (truth 1, response 2) and yields the two different definitions of hit used in the article.

In all experiments, we find main effects of condition for hits (in either definition) and for false alarms, while in Experiment 4 we additionally find a main effect of SOA for hits though not for false alarms (Table 1). There are significantly more hits (in either definition) for color than for grayscale images across all experiments and SOAs, with the exception of the shortest SOA (30 ms) in Experiment 4 (Table 3). Interestingly, such a difference cannot be identified for false alarms (Table 4). With respect to hit rates, the color-inverted condition is different from grayscale and indistinguishable from original color at the 100-ms SOA of Experiment 2 when hits are defined as reporting at least one target in one-target trials (Table 3). If we instead restrict hits to correct responses (response = 1 for one-target streams), the picture reverses: The color-inverted condition is now indistinguishable from grayscale but yields significantly fewer hits than the original-color condition. For the 50-ms SOA of Experiment 3, this reversed pattern holds for either definition. This underlines the importance of conducting Experiment 4, where zero targets and one target were the only possible response options. Considering hits alone, the color-inverted condition is indistinguishable from gray for all SOAs of Experiment 4 and different from color for 30, 50, and 90 ms (Table 3). In the gray-inverted condition, there are significantly fewer hits than in any other condition in all experiments: for response > 0, all t(7) > 2.74, all ps < 0.03; for response = 1, all t(7) > 2.81, all ps < 0.026.
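The two hit definitions used above follow directly from the truth-by-response tabulation of Figure A1. The following is a minimal sketch, assuming trials are available as hypothetical `(truth, response)` pairs with values 0, 1, or 2; the function names are illustrative and not taken from the study's analysis code.

```python
from collections import Counter


def response_matrix(trials, n_levels=3):
    """Tabulate (truth, response) pairs as percentages.

    Returns matrix[response][truth]; each truth column sums to 100
    (or stays at 0 if that truth value never occurred).
    """
    counts = Counter(trials)
    matrix = [[0.0] * n_levels for _ in range(n_levels)]
    for truth in range(n_levels):
        total = sum(counts[(truth, r)] for r in range(n_levels))
        if total == 0:
            continue
        for response in range(n_levels):
            matrix[response][truth] = 100.0 * counts[(truth, response)] / total
    return matrix


def hit_rates(matrix):
    """Single-target hits in percent: (response > 0, response = 1)."""
    lenient = matrix[1][1] + matrix[2][1]
    strict = matrix[1][1]
    return lenient, strict


def false_alarm_rates(matrix):
    """Zero-target false alarms in percent: (response > 0, response = 1)."""
    return matrix[1][0] + matrix[2][0], matrix[1][0]


if __name__ == "__main__":
    # Invented counts for one observer and condition.
    demo = (
        [(1, 1)] * 7 + [(1, 2)] * 1 + [(1, 0)] * 2
        + [(0, 0)] * 9 + [(0, 1)] * 1
        + [(2, 2)] * 5
    )
    m = response_matrix(demo)
    print(hit_rates(m), false_alarm_rates(m))
```

The lenient definition counts the mixed "hit(*)" cell (truth 1, response 2) as a hit, while the strict definition excludes it, which is why the two definitions can rank conditions differently.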

Appendix C: Detailed analysis of the attentional blink

In the separate analysis of color and grayscale sequences, we find a significant main effect of lag on the probability that both targets are detected (gray: F(4, 28) = 18.63, p = 1.4 × 10⁻⁷; color: F(4, 28) = 33.26, p = 2.9 × 10⁻¹⁰). There is a monotonic increase in performance up to lag 4 (Figure 4A). Pairwise post hoc tests show in both color conditions that lags 1, 2, and 3 are significantly different from lag 7, all ts > 2.5, all ps < 0.05, while lag 4 is not different from lag 7: gray: t(7) = 0.48, p = 0.64; color: t(7) = 1.05, p = 0.33. Using the squared single-target hit rate of the respective color condition as a baseline, lags 1, 2, and 3 are different from this baseline, all ts > 2.5, all ps < 0.05, while lags 4 and 7 are indistinguishable from the baseline, all ts < 2.01, all ps > 0.08. Similarly, in Experiments 2 and 3, where we tested only lag 2, there is a significant difference between the two-target hit rate and the baseline in each color condition: Experiment 2: all ts > 2.4, all ps < 0.04, Figure 4B; Experiment 3: all ts > 3.24, all ps < 0.014, Figure 4C.
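The baseline comparison can be sketched as follows. If the two targets were detected independently, the probability of detecting both would equal the squared single-target hit rate, so a two-target hit rate below this baseline indicates an attentional blink. The sketch assumes a paired t test across observers, which is consistent with the reported t(7) for eight observers but is not spelled out in the text; all numbers are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for per-observer values a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


def blink_vs_baseline(single_hit, both_hit):
    """Compare the independence baseline (squared single-target hit rate)
    with the observed rate of detecting both targets at a given lag."""
    baseline = [h ** 2 for h in single_hit]
    return paired_t(baseline, both_hit)


if __name__ == "__main__":
    # Invented per-observer rates (8 observers) for a single lag.
    single_hit = [0.80, 0.75, 0.85, 0.70, 0.90, 0.78, 0.82, 0.74]
    both_hit = [0.45, 0.40, 0.60, 0.35, 0.70, 0.50, 0.55, 0.42]
    t, df = blink_vs_baseline(single_hit, both_hit)
    print(f"t({df}) = {t:.2f}")
```

A positive t statistic means that performance at this lag falls short of the baseline; converting it to a p value requires the t distribution with the returned degrees of freedom.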

In Experiments 2 and 3, the main effect of color condition on recognition-given-detection performance is solely explained by the gray-inverted condition: While the gray-inverted condition is different from all other conditions (Experiment 2: ts > 6.06, ps < 5.1 × 10⁻⁴; Experiment 3: ts > 4.22, ps < 0.0039), there are no pairwise differences between any of the other conditions (Experiment 2: all ts < 1.71, ps > 0.13; Experiment 3: ts < 1.66, ps > 0.14). In Experiment 4, the main effects of SOA and color on this measure also result almost entirely from the difference between the gray-inverted and all other conditions, all ts > 2.99, all ps < 0.02, with the exception of the difference between gray-inverted and color at an SOA of 100 ms, which is not significant, t(7) = 1.54, p = 0.17, and the difference between grayscale and color-inverted at an SOA of 120 ms, which is significant, t(7) = 2.90, p = 0.023.

Figure 2. Detection in zero-target and single-target trials. (A) Hit rate for Experiments 1–3 (left, sorted by SOA) and the different SOAs of Experiment 4. Different colors code different conditions (blue: original color; red: color inverted; black: grayscale; gray: gray inverted). The left panel defines hits as any response to single-target trials (response 1 or 2), the middle panel as an exact response (response 1). For Experiment 4, there was no two-target option. (B) False alarms. Notation as in (A). Left panel: any false alarm (response 1 or 2); middle panel: single false alarm (response 1). (C) Value of d′ as computed from z-scored hit and false-alarm rates of (A) and (B); the “>0” definition of hits and false alarms is used for this computation. Error bars are mean and standard error of the mean across observers.
