Spatial filtering models are currently a widely accepted mechanistic account of human lightness perception. Their popularity can be ascribed to two reasons: They correctly predict how human observers perceive a variety of lightness illusions, and the processing steps involved in the models bear an apparent resemblance to known physiological mechanisms at early stages of visual processing. Here, we tested the adequacy of these models by probing their response to stimuli that have been modified by adding narrowband noise. Psychophysically, it has been shown that noise in the range of one to five cycles per degree (cpd) can drastically reduce the strength of some lightness phenomena, while noise outside this range has little or no effect on perceived lightness. Choosing White's illusion (White, 1979) as a test case, we replicated and extended the psychophysical results, and found that none of the spatial filtering models tested was able to reproduce the spatial-frequency-specific effect of narrowband noise. We discuss the reasons for failure for each model individually, but we argue that the failure is indicative of the general inadequacy of this class of spatial filtering models. Given the present evidence, we do not believe that spatial filtering models capture the mechanisms that are responsible for producing many of the lightness phenomena observed in human perception. Instead, we think that our findings support the idea that low-level contributions to perceived lightness are primarily determined by the luminance contrast at surface boundaries.

Introduction

It has been known for a long time that the luminance ratio, or contrast, at edges (i.e., steps in luminance) is of crucial importance for the perception of lightness¹ (e.g., Cornsweet, 1970; Wallach, 1948; Whittle, 1994; also see Gilchrist, 2006, for a historical overview). At the same time, mid- to high-level factors also appear to be important for lightness perception (e.g., Bloj & Hurlbert, 2002; Gilchrist, 1980; Knill & Kersten, 1991; Mach, 1886; Maertens, Wichmann, & Shapley, 2015; Radonjic, Todorovic, & Gilchrist, 2010). It is therefore unlikely that early visual processes alone will provide a complete account of lightness perception, but it remains an open question to what extent low- versus high-level visual mechanisms contribute to different lightness phenomena. We think that understanding their relative contributions to the computation of lightness is crucial, because low-level visual mechanisms provide the input to mid- and high-level vision.

One popular class of low-level models of lightness perception comprises the multiscale spatial filtering models (Blakeslee & McCourt, 1999; Dakin & Bex, 2003; Otazu, Vanrell, & Párraga, 2008; Robinson, Hammon, & de Sa, 2007). They attempt to explain lightness phenomena as the result of a spatial filtering operation where the model output is a weighted sum of spatial frequency (and sometimes orientation) tuned filter responses. The weights are determined through contrast or response energy normalization mechanisms, the details of which differ from model to model.

As Kingdom (2011, p. 661) pointed out, “However appealing is the idea that contrast normalization is responsible for many brightness errors, there is at present little actual evidence for it. Experiments that manipulate the amount of contrast normalization, perhaps via adaptation or masking, in order to test whether the predicted changes in the magnitude and direction of brightness errors occur, would be welcome.”

Salmela and Laurinen (2009) took an important step in the direction suggested by Kingdom (2011). They generated versions of simultaneous brightness contrast, White's illusion (White, 1979; see Figure 1), and the Benary cross (Benary, 1924), and masked them with narrowband noise. Then they measured the effect of the noise's spatial frequency and orientation on perceived brightness. They found that noise with a narrow frequency band (between 1 and 5 cycles per degree [cpd]) had a strong masking effect on the illusions, meaning that the apparent brightness differences were considerably reduced, whereas noise at higher or lower frequencies had little effect (see Figure 2). They concluded that their results “suggest that filters below 1 cpd and above 5 cpd are not used in the computation of surface brightness” (Salmela & Laurinen, 2009, p. 688).

An example of White's illusion. To most observers, the gray patch on the dark bar looks lighter than the gray patch on the light bar, even though the two are equiluminant. The illusion cannot simply be explained in terms of contrast, because both test patches share an equal amount of border with dark and light regions.

Figure 1

Illustration of the effect of narrowband noise on White's illusion. Left: Stimulus is masked with a noise center frequency of 0.58 cpd, Middle: 3 cpd, Right: 9 cpd (assuming a viewing distance of 40 cm). White's effect should be reduced or absent in the middle panel.

Figure 2

We replicated and extended Salmela and Laurinen's (2009) results, and tested a number of different multiscale spatial filtering models with respect to their capability to reproduce the psychophysically observed effects of narrowband noise on lightness perception. In particular, we implemented the models by Blakeslee and McCourt (1999) and Dakin and Bex (2003), and used publicly available implementations of the models by Robinson et al. (2007) and Otazu et al. (2008). We used White's illusion to test the models, because its phenomenological effect in terms of perceived lightness is opposite to that of (simultaneous) contrast, and it is more difficult to account for. Spatial filtering models were shown to account for both effects, White's illusion and simultaneous contrast, and we decided to take the more difficult one as the test case in order to probe for the postulated mechanisms. We show that all models systematically fail to reproduce the noise-masking effect that was observed psychophysically. We discuss the results with regard to the mechanisms that are implemented in the models and argue that the models, despite an apparent resemblance to known physiological mechanisms, do not capture the mechanisms that are crucial for lightness perception in human observers. Instead we think that our results point to an important role of luminance edges for lightness perception.

Multiscale filtering models

This section introduces the different models that we tested and explains how each of the models accounts for White's illusion. Our Python implementations of the models by Blakeslee and McCourt (1999) and Dakin and Bex (2003) are available at https://github.com/TUBvision/lightness_models. The models by Robinson et al. (2007) and Otazu et al. (2008) can be obtained from the respective authors, and also from https://github.com/TUBvision/betz2015_noise, where we supply all code required to reproduce the results of the present article.

The oriented difference of Gaussians model

In the oriented difference of Gaussians (ODOG) model by Blakeslee and McCourt (1999), the input image is convolved with ODOG filters in six orientations (0°, 30°, 60°, 90°, 120°, 150°) and seven spatial frequencies (zero-crossing distance 0.13°–8.16° in octave steps, corresponding to peak frequencies of 6.5, 3.25, 1.63, 0.81, 0.41, 0.2, and 0.1 cpd). The outputs of filters within the same orientation are summed, with weights that are determined by the spatial frequency. Lower frequencies receive smaller weights than higher frequencies, and the weights fall off with an exponent of 0.1 (i.e., halving the filter frequency decreases the filter weight by a factor of 2^0.1). This yields six multifrequency orientation responses. Each orientation response is normalized by its root-mean-square (RMS) energy, which is computed across all pixels, and the normalized responses are then summed to yield the model output. In other words, each orientation response is multiplied by the inverse of its own RMS energy (the normalization factor) prior to summation. As a consequence of the response normalization, orientations with little energy in the input image will have a proportionally larger influence on the model output.
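The weighting, normalization, and summation steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from the repository cited above: it assumes that the convolution with the 42 filters has already been carried out, and the function names and the array layout (orientation × frequency × height × width) are hypothetical.

```python
import numpy as np

# Peak frequencies (cpd) of the seven ODOG scales, as listed in the text.
PEAK_FREQS_CPD = np.array([6.5, 3.25, 1.63, 0.81, 0.41, 0.2, 0.1])


def frequency_weights(freqs, exponent=0.1):
    """Weights proportional to f**exponent, scaled so the highest frequency gets 1."""
    freqs = np.asarray(freqs, dtype=float)
    return (freqs / freqs.max()) ** exponent


def odog_combine(responses, freqs=PEAK_FREQS_CPD):
    """Combine filter responses of shape (n_orient, n_freq, height, width).

    Assumes the input image has already been convolved with every filter.
    """
    weights = frequency_weights(freqs)
    # Weighted sum over spatial frequency: one response map per orientation.
    per_orientation = np.tensordot(responses, weights, axes=([1], [0]))
    # Global RMS energy of each orientation response, computed over all pixels.
    rms = np.sqrt(np.mean(per_orientation ** 2, axis=(1, 2), keepdims=True))
    normalized = per_orientation / rms
    # The sum of the normalized orientation responses is the model output.
    return normalized.sum(axis=0)
```

Because every orientation response is divided by its own global RMS energy, rescaling one orientation before normalization leaves the output unchanged. This is the sense in which orientations with little energy gain proportionally more influence.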

In the ODOG model, two mechanisms contribute to the difference in predicted lightness between the two equiluminant test patches in White's illusion. The first critical mechanism is the orientation normalization step (cf. Robinson et al., 2007). After the filter responses are summed across spatial frequencies, the response of the filter that is oriented orthogonal to the square wave grating is positive for the test patch on the dark grating bar and negative for the test patch on the light grating bar (fourth row in Figure 3). This happens because filters in this orientation are sensitive to the contrast between the test patch and the bar on which the patch is placed. Conversely, the response of the filters that are oriented parallel to the grating is positive for the test patch on the light bar and negative for the test patch on the dark bar (first row in Figure 3). This is because the filters oriented parallel to the grating are sensitive to the contrast between the test patch and the adjacent bars. The responses of the filters that are sensitive to all other orientations lie in between these values. For the model to produce White's illusion, the response of the filters orthogonal to the grating must receive a larger weight, because it is this response that entails a difference in the correct direction between the two test patches. The normalization by RMS energy does precisely that, because the filters oriented orthogonal to the grating do not respond to the grating itself, and thus have lower total response energy (see Figure 3).

Illustration of the ODOG model. The model consists of ODOG filters with seven spatial scales and six different orientations. To predict brightness, it processes images in four steps: First, an input image is convolved with all 42 (6 × 7) filters. Second, the filter outputs of different spatial frequency at the same orientation are summed, and outputs of higher frequencies receive slightly larger weights, indicated by the inset figure above the summation, where each dot corresponds to the weight of one spatial frequency. Third, the different orientation responses are normalized by pointwise division through their RMS energy computed over the entire image. Fourth, the normalized responses are summed to yield the model output. White's illusion in the model response is mainly caused by a higher weight given to the filter oriented orthogonal to the grating (fourth row), because its response has little energy. Since in this filter response, the test patch on the left has a higher value than the one on the right, it also receives a higher value in the final output.

Figure 3

A second mechanism is at work for stimulus versions in which the grating has a relatively high spatial frequency. Here, the response of filters with lower spatial frequency will perform a spatial smoothing of the luminance differences between test patches and underlying grating, and this will cause the test patches to perceptually assimilate with the adjacent bars even for filters that are oriented parallel to the grating. This explains why the model correctly predicts White's illusion to increase with increasing grating frequency (Blakeslee & McCourt, 2004).

Frequency-specific locally normalized ODOG

Robinson et al. (2007) criticized the global nature of the response normalization in the ODOG model and proposed an extended model that uses local normalization. This model computes the RMS energy of the filter responses in a spatially more restricted, local window, and applies the normalization at each pixel based on the energy in its surround. Robinson et al. (2007) argued that the more local model is physiologically more plausible than the original ODOG model. In addition, the model makes qualitatively correct predictions for a number of stimuli for which ODOG failed, such as radial versions of White's illusion, while it still accounts for all the stimuli that the original model predicted correctly. Robinson et al. also presented a model in which the normalization is frequency specific in addition to being local. This model is called the FLODOG model (frequency-specific locally normalized ODOG). In this model, unlike in the original ODOG model, the responses of the individual spatial frequencies are not added prior to normalization. Instead, the normalization weight for each filter response and location is computed as the RMS energy of the responses of filters with identical orientation and similar spatial frequency in a local surround. This implements a form of surround inhibition that is orientation- and frequency-specific. The model adds more stimuli to the list of correctly predicted illusions. The frequency specific normalization makes the model a good candidate for investigation with the noise-masked stimuli, so it was included in our analysis.
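The difference between global and local normalization can be illustrated with a short sketch. The code below divides a filter response by the RMS energy in a Gaussian-weighted neighborhood of each pixel. It is an assumption-laden simplification: the window shape, the parameter `sigma_px`, and the function names are hypothetical, and the pooling of energy across similar spatial frequencies that distinguishes FLODOG is omitted.

```python
import numpy as np


def gaussian_kernel(sigma_px):
    """1-D Gaussian kernel that sums to one, truncated at three sigma."""
    half = int(np.ceil(3 * sigma_px))
    x = np.arange(-half, half + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma_px ** 2))
    return kernel / kernel.sum()


def local_mean(image, sigma_px):
    """Separable Gaussian-weighted local average (zero padding at the borders).

    The image must be larger than the kernel along both axes.
    """
    kernel = gaussian_kernel(sigma_px)
    rows = np.apply_along_axis(np.convolve, 1, image, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="same")


def locally_normalize(response, sigma_px, eps=1e-6):
    """Divide a filter response by the RMS energy in its local neighborhood."""
    local_rms = np.sqrt(local_mean(response ** 2, sigma_px))
    return response / (local_rms + eps)
```

In contrast to the global scheme, the normalization factor here varies across the image, so a region with strong local filter energy is attenuated more than a region with weak local energy.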

The FLODOG model has two free parameters, the size of the orientation normalization window, n, and the weighting of nearby frequencies, m. We chose the values n = 4 and m = 0.5 in our simulations, because those are the values that Robinson et al. consider to yield the best overall predictions.

Dakin and Bex

Dakin and Bex's (2003) model of lightness perception is inspired by the shape of the (average) amplitude spectrum of natural images. In this model, the average amplitude spectrum of natural images is believed to guide the visual system's attempt to reconstruct the input image based on its internal filter activations. Their main focus was the Craik-O'Brien-Cornsweet (COBC) illusion (Cornsweet, 1970; Craik, 1966; O'Brien, 1958), but they also used their model to account for White's illusion. Their model consists of isotropic (i.e., nonoriented) log Gabor filters. The filters span the entire range of frequencies encountered in an image and they are spaced in half-octave steps (from 1 cycle per image [cpi] up to the Nyquist frequency, corresponding to 0.08 to 20.4 cpd with the image parameters we used in our model evaluation).

The normalization, unlike in the ODOG model, is frequency specific. Individual frequencies are weighted so as to make the amplitude spectrum of the reconstructed image more similar to the average amplitude spectrum observed in natural images (1/f). The model assumes that all input images are composed of spatial frequencies following this 1/f distribution. It weights the actual filter responses such that the energy distribution in the model output conforms to this assumption. Thus, if in the actual input, low frequencies are underrepresented, they will be boosted, and vice versa.
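The idea of reweighting frequency bands toward a 1/f amplitude spectrum can be sketched as follows. For brevity, this example operates on annular bands of the Fourier spectrum rather than on log Gabor filter outputs, which is a simplifying assumption and not the procedure of Dakin and Bex (2003). The function name and the band edges (in cycles per image) are hypothetical.

```python
import numpy as np


def reweight_to_one_over_f(image, band_edges):
    """Rescale annular frequency bands so their mean amplitude follows 1/f.

    image: square 2-D array. band_edges: increasing frequencies in cycles per image.
    Returns the reconstructed image and the gain applied to each band.
    """
    n = image.shape[0]
    spectrum = np.fft.fft2(image)
    freqs = np.fft.fftfreq(n) * n  # cycles per image
    radius = np.hypot(freqs[np.newaxis, :], freqs[:, np.newaxis])
    reweighted = np.zeros_like(spectrum)
    gains = []
    for low, high in zip(band_edges[:-1], band_edges[1:]):
        band = (radius >= low) & (radius < high)
        center = np.sqrt(low * high)  # geometric center of the band
        mean_amp = np.abs(spectrum[band]).mean() if band.any() else 0.0
        # Underrepresented bands get a gain above one, overrepresented bands below.
        gain = (1.0 / center) / mean_amp if mean_amp > 0 else 0.0
        reweighted[band] = gain * spectrum[band]
        gains.append(gain)
    return np.real(np.fft.ifft2(reweighted)), np.array(gains)
```

For an input with a flat amplitude spectrum, such as white noise, the gains decrease with frequency. Low frequencies are boosted relative to high frequencies, which corresponds to the behavior described above for inputs in which low frequencies are underrepresented.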

This model predicts a difference between the two test patches in White's illusion, because filters tuned to very low frequencies will average the luminances of the test patches and their surround. A test patch on a dark bar increases the local mean luminance by replacing a part of the dark bar with mean gray. It will thus have a relatively higher response in those low-frequency filters. A test patch on a light bar decreases the local mean luminance by replacing a part of the light bar with mean gray, and hence leads to a relatively lower response in those low-frequency filters. Since the amplitude spectrum of White's stimulus has more power in high frequencies than natural stimuli, the model increases the weight given to the low frequencies, and so the effect of local surround averaging from the low-frequency filters is seen in the model prediction.

Brightness induction wavelet model

The last model that we tested is the brightness induction wavelet model (BIWAM) by Otazu et al. (2008). It is similar to the ODOG and FLODOG models as it also decomposes the image into frequency- and orientation-specific subbands, but the filters are not ODOG filters, the number of orientation bands is lower (three instead of six), and the frequency space is larger, ranging from 1 cpi to the Nyquist frequency. The most crucial difference, again, is the normalization scheme. Normalization is performed within each frequency and orientation subband, so in this aspect, the BIWAM is most similar to the FLODOG model. However, unlike in FLODOG, the normalization factor is determined by the relationship between the filter response and the average response in the local surround, not simply by the total local response energy. Computationally, the normalization is achieved by computing the standard deviation of a filter response in a square centered at each location, and dividing it by the standard deviation of a surround area that is 3 times as large as the central area and excludes the center square. This yields a normalization factor that, after some nonlinear but monotonic transformations, is used to weight the filter response at the location for which it was computed. Thus, at each location, filters that have a stronger response in the center than in the surround will be boosted and filters with a high surround response will be suppressed. By contrast, in the FLODOG model, the suppression is dependent on the weighted sum of the response energy in the center and in the surround, not on the ratio between the two. BIWAM is less well known than the ODOG model, but we included it in our analysis because its normalization scheme is more neurophysiologically plausible than that of ODOG. At the same time, it makes qualitatively (and in many cases quantitatively) correct predictions for all the stimuli tested by Blakeslee and McCourt (1999). Furthermore, the fact that the normalization scheme is frequency specific makes the model an interesting candidate for explaining the frequency-specific effects of narrowband noise masking.
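The center-to-surround ratio at the heart of this normalization scheme can be sketched for a single location. The code assumes that "3 times as large" refers to the side length of the surround square, and it leaves out the subsequent nonlinear transformations. The function name and its parameters are hypothetical and do not come from the BIWAM implementation.

```python
import numpy as np


def center_surround_ratio(response, row, col, half_width, eps=1e-12):
    """Ratio of the response std in a center square to the std in its surround.

    The center is a square of side 2 * half_width + 1 around (row, col). The
    surround is the concentric square three times as wide, center excluded.
    """
    size = 2 * half_width + 1
    reach = half_width + size  # half-extent of the three-times-wider square
    r0, r1 = row - reach, row + reach + 1
    c0, c1 = col - reach, col + reach + 1
    if r0 < 0 or c0 < 0 or r1 > response.shape[0] or c1 > response.shape[1]:
        raise ValueError("window does not fit inside the response map")
    window = response[r0:r1, c0:c1]
    in_center = np.zeros(window.shape, dtype=bool)
    in_center[size:2 * size, size:2 * size] = True
    # Values above one mean the center is more active than its surround.
    return window[in_center].std() / (window[~in_center].std() + eps)
```

A ratio above one would lead to a boost of the filter response at that location, and a ratio below one to suppression.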

Effects of noise on perceived lightness

Salmela and Laurinen (2005) showed that brightness polarity identification (i.e., the judgment whether a target patch is relatively lighter or darker than its surround) depends on information represented in a narrow spatial frequency band. They found that brightness polarity identification of an oval test area on a uniform surround was compromised by narrowband Gaussian noise, and that the effect of the noise strongly depended on its spatial frequency. Importantly, the most effective noise frequency did not scale proportionally with the size of the test area. Increasing the stimulus 16-fold shifted the most effective masking frequency only about one octave (i.e., 2-fold) towards lower frequencies. They drew two conclusions from these results: First, brightness perception does not depend on filters that are matched in size to the test areas. Second, the visual system seems to use edge information to determine brightness, but does not exploit the full frequency range that would be informative about edges.² In a later study, Salmela and Laurinen (2009) extended their previous results by quantitatively assessing brightness (not just brightness polarity) in different types of classical lightness illusions. They found a strong effect of noise masking on perceived brightness and the effect again depended on the spatial frequency of the noise. Comparable results have been reported by Perna and Morrone (2007), who used filtered images instead of noise masking, and also found that brightness information was mediated by a narrow spatial frequency band (in their case, centered at somewhat lower frequencies, around 1 cpd).

A similar type of noise masking has also been used by Solomon and Pelli (1994) to show that letter identification is mediated by a single frequency channel. Thus in letter identification the visual system failed to integrate relevant information from different spatial frequency bands, similar to the results in brightness perception discussed here. However, the relationship between the most effective noise frequency and the stroke width of the letters was stronger than for brightness polarity estimation. A 5-fold increase in stimulus size led to a 2.8-fold increase in most effective masking frequency (Petkov & Westenberg, 2003). As a side note, although both types of experiments showed some dependence of the most effective noise frequency on the scale of the stimuli, Salmela and Laurinen (2005, 2009) emphasized that the scaling was incomplete, whereas Petkov and Westenberg (2003) emphasized the presence of scaling. Still, it seems clear that the scaling between stimulus frequency and most effective noise frequency is not 1:1. This can be easily demonstrated by looking at Figure 2 from different viewing distances. Even though the visibility and the lightness of the test patches change as a function of distance, the relationship between noise frequency and grating frequency is not affected by the change in viewing distance. The same is true for letter stimuli (see figure 1 in Solomon & Pelli, 1994, and also the cover of that issue).

The effect of narrowband noise on perceived brightness provides an opportunity to test, as Kingdom (2011) suggested, whether contrast normalization is indeed responsible for the brightness differences in White's illusion. The hypothetical processing channels in the visual system that are thought to be affected by narrowband noise are precisely those that the models attempt to capture. The effects of narrowband noise on the models' output are therefore a critical test for a model's adequacy to account for White's illusion. If the mechanism causing the illusion in the visual system is in fact the same mechanism as the one emulated in the model (or at least one of the models), then the model should predict the psychophysically observed effects of noise masking. We repeated the lightness matching experiment of Salmela and Laurinen (2009), and included versions of White's illusion at three different spatial frequencies. Thus, we could examine the relationship between stimulus size and most effective noise frequency not only for polarity matching or detection, but directly for lightness matching. We replicated the main findings of Salmela and Laurinen (2009). Therefore a strong test of lightness models is testing their predictions of results in the noise-masking experiments. Specifically, the models should predict (a) a reduction in illusion strength for noise frequencies between 1 and 5 cpd, and (b) the location of this reduction along the frequency axis should not shift proportionally with changes in grating frequency. We tested these predictions by simulating our noise-masking experiment with the four models.

Psychophysical experiment

Methods

Participants

Eleven observers (authors TB and MM, two experienced and seven naive observers; four male) participated in the experiment. Observers' mean age was 29 years (range of 23 to 35 years). All observers had normal or corrected-to-normal vision. Naive observers were financially compensated for their time. All observers gave written informed consent to their participation in the study.

Stimuli and apparatus

The test stimuli consisted of a horizontal square wave grating in which a single square test patch was embedded. Three different grating frequencies were used. At the lowest frequency (0.2 cpd), the grating contained four bars, two light and two dark bars, with a total size of 10.2° × 10.2°. The test patch measured 2.55° × 2.55°. At the medium frequency (0.4 cpd), the grating contained six bars, total size was 7.66° × 7.66°, and the test patch measured 1.28° × 1.28°. This condition corresponded to the stimulus dimensions used by Salmela and Laurinen (2009) in their matching experiment. The highest frequency grating (0.8 cpd) consisted of 12 bars, total size was again 7.66° × 7.66°, and the test patch measured 0.64° × 0.64°. While higher frequencies than 0.8 cpd are often used in experiments on White's illusion, in our particular case with square test patches, the test patches would have become so small at grating frequencies above 0.8 cpd that lightness matching would have been very difficult. The test patch was always located on the bar directly below the center of the grating, and centered horizontally on the grating. Test stimuli were embedded in a 16.3° × 16.3° noise mask. These noise masks were slightly larger than those employed by Salmela and Laurinen (2009) in order to accommodate the larger low-frequency grating. The noise was uniform white noise, band-pass filtered with a Gaussian filter with one octave spatial frequency bandwidth (full-width at half height). The RMS contrast of the noise was 0.2. We used six different noise center frequencies ranging in logarithmic steps from 0.58 to 9 cpd, and one control condition without noise. The noise masks were created with Matlab code kindly provided by Dr. Salmela, saved as .mat files, and then loaded into our own Python scripts.
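For readers who want to experiment with similar masks, the following sketch generates band-pass filtered uniform white noise. It is not the Matlab code used in the experiment. It assumes that the Gaussian filter is defined on a log2 frequency axis with a full width at half height of one octave, and that RMS contrast is the standard deviation of the contrast modulation around mean luminance. The function name and the parameter `ppd` (pixels per degree) are hypothetical.

```python
import numpy as np


def narrowband_noise(size_px, ppd, center_cpd, bandwidth_oct=1.0, rms=0.2, seed=None):
    """Square noise mask as contrast modulation (luminance = mean * (1 + noise))."""
    rng = np.random.default_rng(seed)
    white = rng.uniform(-1.0, 1.0, size=(size_px, size_px))
    # Frequency of every FFT coefficient in cycles per degree.
    freqs = np.fft.fftfreq(size_px, d=1.0 / ppd)
    radius = np.hypot(freqs[np.newaxis, :], freqs[:, np.newaxis])
    # Convert full width at half height (in octaves) to a Gaussian sigma.
    sigma = bandwidth_oct / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    gain = np.zeros_like(radius)
    nonzero = radius > 0  # the DC component is removed entirely
    gain[nonzero] = np.exp(
        -np.log2(radius[nonzero] / center_cpd) ** 2 / (2.0 * sigma ** 2)
    )
    filtered = np.real(np.fft.ifft2(np.fft.fft2(white) * gain))
    filtered -= filtered.mean()
    # Scale so that the standard deviation equals the requested RMS contrast.
    return filtered * (rms / filtered.std())
```

A call such as `narrowband_noise(512, 31.3, 3.0, seed=1)` would produce one mask centered on 3 cpd. The value of 31.3 pixels per degree is only an example.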

In all conditions, the dark bars had a luminance of 41.8 cd/m² and the light bars of 46.2 cd/m², corresponding to a Michelson contrast of 0.05. Salmela and Laurinen (2009) reported a contrast of 0.1 for their grating, but presented the grating and the noise in alternating frames, which resulted in an effective grating contrast over time of 0.05, so we used this value. The background luminance was 44 cd/m², and the test patch luminance was also 44 cd/m². The comparison square was always 2.37° × 2.37° in size. Its initial luminance was randomly set in each trial to a value between 35.2 and 52.8 cd/m². It was presented on top of a random checkered background that consisted of 6 × 6 checks of size 0.8° × 0.8° with gray values sampled uniformly from values between 35 and 53 cd/m².
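The stated grating contrast can be checked directly from the two bar luminances. The helper below is a hypothetical convenience function that applies the standard Michelson definition.

```python
def michelson_contrast(l_max, l_min):
    """Michelson contrast: (Lmax - Lmin) / (Lmax + Lmin)."""
    return (l_max - l_min) / (l_max + l_min)


# Bar luminances in cd/m^2, as given in the text.
grating_contrast = michelson_contrast(46.2, 41.8)  # approximately 0.05
mean_luminance = (46.2 + 41.8) / 2                 # approximately 44, the background
```

The mean of the two bar luminances coincides with the background and test patch luminance, so the test patch is an increment of the same size on a dark bar as it is a decrement on a light bar.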

The noise mask with the embedded grating was centered on the screen. The comparison background was placed on the left of the screen (Figure 4).

Illustration of the screen during matching. The observer adjusted the comparison square on the left to match the lightness of the test patch in the grating. The gray background was actually larger and has been cropped for this illustration. The contrast of the grating has been increased for better visibility in this illustration.

Figure 4

Stimuli were presented on a linearized 21-in. Siemens SMM21106LS monitor (400 × 300 mm, 1024 × 768 px, 130 Hz) controlled by a DataPixx (VPixx Technologies Inc., Saint-Bruno, QC, Canada) and custom presentation software developed in our lab and published at https://github.com/TUBvision/TUBvision/hrl. Observers were seated 70 cm from the screen, and their position was fixed with a chin-rest. Responses were recorded with a ResponsePixx button-box (VPixx Technologies Inc.).
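The monitor geometry and viewing distance given above determine how many pixels correspond to one degree of visual angle. A minimal sketch of that conversion, with a hypothetical function name, is:

```python
import math


def pixels_per_degree(screen_width_mm, screen_width_px, distance_mm):
    """Pixels per degree of visual angle at the center of the screen."""
    pixel_mm = screen_width_mm / screen_width_px
    degrees_per_pixel = math.degrees(2 * math.atan(pixel_mm / (2 * distance_mm)))
    return 1.0 / degrees_per_pixel


# 400 mm across 1024 px, viewed from 70 cm.
ppd = pixels_per_degree(400, 1024, 700)
```

With these values the result is roughly 31 pixels per degree, so a 16.3° noise mask spans on the order of 510 pixels. This is only an approximation, because the small-angle conversion is applied at the screen center.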

Procedure

Our goal was to measure the effect of noise on the perceived lightness of the test patches in White's illusion. On each trial, observers first saw a low-contrast fixation ring (inner diameter 4 pixels, outer diameter 10 pixels, ring luminance 26.4 cd/m²), centered on the position where the test patch would later appear. The low contrast of the fixation ring was chosen to avoid afterimages or adaptation that could interfere with perception of the test patch. Upon a button press, the stimulus was shown for 0.5 s, after which the fixation ring reappeared. Observers could then adjust the lightness of the comparison patch so as to match the perceived lightness of the target patch that was embedded in the square wave grating of the White's stimulus. They could review the stimulus as often as they liked with a button press, but the viewing time on each presentation was limited to 0.5 s. We chose this procedure in order to reduce the effect of strategies by different observers (see Difficulties in replication experiment section). Observers indicated when they were satisfied with their setting by a button press and continued to the next trial.

Experimental design

The independent variables were the frequency of the noise, which could take seven values (six different frequencies, or no noise), and the frequency of the grating. Grating frequencies were blocked, such that each observer first completed all trials at one grating frequency before moving on to the next. The order in which the different grating frequencies were tested was randomized across observers. The topmost bar of the grating could either be light or dark, so that the test patch could become an increment or a decrement with respect to the bar on which it was placed.

Overall, the experiment consisted of 420 trials (7 noise types × 3 grating frequencies × 2 grating phases × 10 repetitions). Each repetition used a different random noise mask, but the masks were the same for all observers. Masks 6–10 were 180° rotated versions of Masks 1–5. Experienced observers only completed five repetitions. Since both incremental and decremental test patches were presented in the same position, differences in the luminance levels of the noise masks should not affect White's illusion, which was computed as the difference in matched luminance between incremental and decremental test patches with the same noise mask.
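The factorial structure of the design can be enumerated as a quick check on the trial count. The sketch assumes that the six noise center frequencies are evenly spaced on a logarithmic axis between the stated endpoints of 0.58 and 9 cpd. Blocking by grating frequency and randomization are omitted, and all variable names are hypothetical.

```python
import itertools

# Six noise center frequencies in equal logarithmic steps (assumed spacing).
noise_freqs = [0.58 * (9 / 0.58) ** (i / 5) for i in range(6)]
noise_conditions = [None] + noise_freqs  # None marks the no-noise control
grating_freqs = [0.2, 0.4, 0.8]          # cpd
top_bar = ["light", "dark"]              # grating phase
repetitions = range(10)

trials = list(
    itertools.product(noise_conditions, grating_freqs, top_bar, repetitions)
)
n_trials = len(trials)  # 7 * 3 * 2 * 10
```

Under the even-spacing assumption, the fourth noise frequency falls close to 3 cpd, the value used for the middle panel of Figure 2.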

Results

The matching data were analyzed by computing the difference in match luminance between each pair of trials that differed only in whether the test patch was placed on a dark or a light grating bar, but that had the same noise mask and the same grating frequency. This difference was taken as a measure of the strength of White's illusion. For the no-noise control trials, no such pairings could be made, so the illusion strength at each individual trial was computed as the difference between the match luminance at that trial and the mean match luminance of the trials with a complementary background bar lightness. Data from a typical observer are shown in Figure 5. It can be seen that at all grating frequencies, there is a small range of noise frequencies for which illusion strength is reduced. Results of all individual observers are available as supplemental material.
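The pairing logic can be written down compactly. The sketch below assumes a hypothetical trial format (a dict with the keys `grating`, `mask`, `bar`, and `match`) and defines illusion strength as the match for the dark-bar trial minus the match for the light-bar trial, so that a positive value corresponds to White's illusion. The sign convention is an assumption.

```python
from collections import defaultdict
from statistics import mean


def illusion_strength(trials):
    """Difference in matched luminance for trials sharing grating and noise mask."""
    pairs = defaultdict(dict)
    for trial in trials:
        pairs[(trial["grating"], trial["mask"])][trial["bar"]] = trial["match"]
    return {
        key: matches["dark"] - matches["light"]
        for key, matches in pairs.items()
        if "dark" in matches and "light" in matches
    }


def control_strength(dark_matches, light_matches):
    """No-noise control: each trial against the mean of the complementary trials."""
    from_dark = [d - mean(light_matches) for d in dark_matches]
    from_light = [mean(dark_matches) - l for l in light_matches]
    return from_dark + from_light
```

For example, two trials with the same grating and mask and matches of 45.0 cd/m² (dark bar) and 43.5 cd/m² (light bar) yield an illusion strength of 1.5 cd/m².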

Noise-masking results for one typical observer. x-axis indicates the center spatial frequency of the noise mask, in cycles per degree. 0.06 cpd would correspond to 1 cycle per image (cpi) in these stimuli. y-axis indicates illusion strength, measured as the difference in matched lightness between test patches placed on different background bars. The grating frequency is indicated by the star on the x-axis. Large circles are means across trials, small circles are individual trials. The results in the no-noise control are shown on the very left. Light gray circles are trials where the test patch was an increment with respect to the bar on which it was placed; dark gray circles are trials where the test patch was a decrement. These circles are plotted only to give an impression of the variance of the data in the no-noise condition. The dotted line indicates the overall mean illusion strength of the no-noise condition. The light gray line is the best fit to the data, and the location of the minimum of the fit is indicated by the light gray text inset.

Figure 5

To quantify the location of this most effective noise frequency range, a log Gaussian function was fitted to the data with the lmfit package in Python. The function was defined as

where b determined the upper asymptote of the function, m the minimum value at the dip, σ the width of the dip, and μ the location of the dip. The parameter of interest was μ, the location of the dip on the spatial frequency axis (i.e., the noise frequency that most strongly reduced White's illusion). This value was computed for all observers and all grating frequencies, and is plotted in Figure 6. Two observers (observers n2 and e1, Supplemental Figures S2 and S8) did not show a clear effect of the noise, and one was extremely variable in her responses (observer n7, Supplemental Figure S7), so they were excluded from this analysis. An additional observer (n4, Supplemental Figure S4) only showed a clear noise effect at the two higher grating frequencies, so no function was fit to her low-grating frequency data. It is clear from these results that all observers for whom a clear effect of the noise was measurable were most affected by noise in the range between 1 and 5 cpd. Furthermore, the most effective noise frequency increased with increasing grating frequency, but not proportionally. The mean slope across observers in Figure 6 is 0.63 for both line segments. The 95% confidence intervals (bootstrapped with 10,000 trials with replacement) are [0.51, 0.75] for the lower segment and [0.52, 0.76] for the upper segment.
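One function with these four parameters is an inverted Gaussian on a logarithmic frequency axis. The sketch below is an assumption about the parameterization (natural logarithm, σ in log units), not necessarily the exact form used for the fits. The coarse grid search over μ, with b, m, and σ held fixed, only stands in for a proper nonlinear least-squares fit such as the one carried out with lmfit. The frequencies and parameter values are illustrative.

```python
import math

def log_gaussian_dip(x, b, m, mu, sigma):
    """Inverted Gaussian on a log-frequency axis.

    b: upper asymptote, m: value at the bottom of the dip,
    mu: frequency at the dip, sigma: dip width in natural-log units.
    """
    z = (math.log(x) - math.log(mu)) / sigma
    return b - (b - m) * math.exp(-0.5 * z * z)

def dip_location(freqs, strengths, b, m, sigma, candidates):
    """Candidate mu with the smallest sum of squared errors
    (b, m and sigma are held fixed here for simplicity)."""
    def sse(mu):
        return sum(
            (log_gaussian_dip(f, b, m, mu, sigma) - s) ** 2
            for f, s in zip(freqs, strengths)
        )
    return min(candidates, key=sse)

# Synthetic, noise-free data generated from the function itself.
freqs = [0.5, 1.0, 2.0, 3.0, 5.0, 9.0]
strengths = [log_gaussian_dip(f, b=4.0, m=1.0, mu=3.0, sigma=0.6) for f in freqs]
candidates = [0.5 * k for k in range(1, 19)]  # 0.5, 1.0, ..., 9.0
estimate = dip_location(freqs, strengths, b=4.0, m=1.0, sigma=0.6,
                        candidates=candidates)
print(estimate)  # 3.0
```

By construction the function equals m at x = μ and approaches b far away from the dip, which matches the roles given to the parameters in the text.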

Summary of the effect of narrowband noise on White's illusion at different grating frequencies. The x-axis indicates the spatial frequency of the grating, the y-axis the frequency at which the noise had the largest effect on illusion strength. Results from the psychophysical experiment are shown in blue (individual observers in light blue, mean across observers in dark). The results for the two models that also predict noise in a specific frequency range to be most effective are shown in red and green. Both models predict much lower noise frequencies to be most effective, and both predict the increase in most effective noise frequency with increasing grating frequency to be steeper than observed psychophysically.

Figure 6

Modeling results were obtained by computing the output of each of the models to each of the noise stimuli used in the psychophysical experiment, with three minor changes. First, to speed up computations, and to avoid shifts in mean stimulus luminance caused by the introduction of a test patch on either a light or a dark grating bar, all stimuli contained two test patches, one on the bar just above the center of the stimulus, one on the bar just below. The test patches were shifted to opposite sides so that there was 2° of space between them. Second, we included three additional lower noise frequencies, because for some of the models, a frequency specific effect appeared at lower frequencies than for observers, and we wanted to quantify this effect. These frequencies were not included in the psychophysical experiment to reduce the number of trials. We had found in pilot experiments that responses at these low frequencies did not differ from those at the lowest frequency tested in the main experiment. Third, we included 50 different noise masks, instead of the 10 used in the psychophysical experiment. The relatively small number of 50 samples was chosen because it was already sufficient to make the standard error in the model prediction small in comparison to the mean effect size of the noise frequency. Due to the random nature of the noise (i.e., random luminance variation across the image) it was possible that by chance different noise values would be added to the luminances at each of the two test patch positions. To counteract such random fluctuation in test patch luminance, illusion strength was computed as the difference between responses to incremental and decremental test patches that were placed in the exact same noise environment on consecutive trials. 
Since the scaling of the output is arbitrary in three out of the four models, illusion strengths were normalized such that the illusion strength for a stimulus with the medium grating frequency in the absence of noise received a value of 4. This approximately corresponds to the mean illusion strength across observers in this condition. The normalization does not unduly alter the results and makes comparing the models easier. Model results can be compared to the psychophysical data shown in Figure 5.
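The normalization is a single multiplicative rescaling, which is why it leaves the shape of the predicted curves unchanged. A minimal sketch, with made-up raw model outputs and a hypothetical function name:

```python
def normalize_strengths(strengths, reference, target=4.0):
    """Rescale so that the reference condition (medium grating
    frequency, no noise) maps onto `target`."""
    scale = target / reference
    return [s * scale for s in strengths]

# Made-up raw illusion strengths in arbitrary model units;
# the first entry plays the role of the reference condition.
raw = [0.5, 0.25, -0.125]
normalized = normalize_strengths(raw, reference=raw[0])
print(normalized)  # [4.0, 2.0, -1.0]
```

Because every value is multiplied by the same positive factor, ratios between conditions and the sign of a predicted reversal are preserved.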

The oriented difference of Gaussians model

The ODOG model did not reproduce the psychophysical effects (Figure 7). The illusion was abolished at all but the highest noise frequencies, not just in a narrow range between 1 and 5 cpd. The noise with 9 cpd was least effective, which is plausible given that the highest filter frequency in the model has a center frequency of 6.5 cpd. Increasing the grating frequency had little effect on the model results, other than generally raising the predicted illusion strength.

Noise masking results for the ODOG model, analogous to Figure 5. x-axis indicates the center spatial frequency of the noise mask, y-axis indicates illusion strength, measured as the difference in model output between test patches placed on different background bars, but in the same noise environment. The grating frequency is indicated by the star on the x-axis. Large circles are means across trials; small circles are individual trials. For most models and noise frequencies, the effect of different noise masks is so small that individual trial data cluster together and form a line, or are hidden by the mean. The dotted horizontal line indicates the overall mean illusion strength of the no-noise condition. No single trial data are plotted for the no-noise condition, since the model response does not vary across trials without noise.

Figure 7

The results show that the ODOG model cannot account for the effect of narrowband noise on lightness perception in White's illusion. To understand this failure, recall that the model produces White's effect mainly through orientation normalization. Adding isotropic noise to the stimuli adds energy in all orientations, thus reducing the ratio of weights between different orientations, and in turn reducing the effect size. As long as the range of noise frequencies is within the range picked up by the model (i.e., between 0.1 and 6.5 cpd) the frequency of the noise will have no specific effect, because the energy is computed after summing across all spatial frequencies. Therefore, the model cannot capture this aspect of human lightness perception: White's illusion is not affected by adding low or very high-frequency noise.
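The effect of isotropic noise on the ratio of normalization weights can be illustrated with a toy calculation. This is not the ODOG implementation. It assumes that each orientation channel is weighted by the inverse of its root-mean-square response and that the noise is uncorrelated with the stimulus, so that it adds the same energy to every channel. The energy values are arbitrary.

```python
import math

def weight_ratio(energy_strong, energy_weak, noise_energy):
    """Ratio of normalization weights (1 / RMS) between a weakly and a
    strongly responding orientation channel after adding noise energy."""
    rms_strong = math.sqrt(energy_strong + noise_energy)
    rms_weak = math.sqrt(energy_weak + noise_energy)
    return rms_strong / rms_weak  # equals weight_weak / weight_strong

# Arbitrary channel energies; only the trend matters.
ratios = [weight_ratio(16.0, 1.0, n) for n in (0.0, 4.0, 100.0)]
print([round(r, 2) for r in ratios])  # [4.0, 2.0, 1.07]
```

As the added energy grows, the ratio approaches 1 regardless of the spatial frequency that carried the energy, which is the frequency-unspecific behavior described above.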

Frequency-specific locally normalized ODOG

The ODOG model failed to account for the frequency-specific masking effect because energy was pooled across all spatial frequencies within an orientation channel. The FLODOG model might be better equipped to account for the effect of noise on White's illusion, because it uses frequency specific normalization. Indeed, as depicted in Figure 8, the frequency of the noise did have an effect on the predicted illusion strength in the FLODOG model. However, the most effective masking frequency was much lower than the 1 to 5 cpd observed to be most effective psychophysically. Furthermore, the masking frequency scaled almost proportionally with the grating frequency, which can be seen by comparing the middle and bottom panels in Figure 8. This is also evident in Figure 6, where the slope of the line connecting the two values for the FLODOG model is 0.82. The fit for the low-frequency grating should be ignored, since it is not sufficiently constrained by data at noise frequencies below the dip frequency.

Noise-masking results for the FLODOG model. Same conventions as in Figure 7. The light gray line is the best fit to the data, and the location of the minimum of the fit is indicated by the light gray text inset.

Figure 8

The precise location of the dip in the FLODOG model is determined by two factors: an orientation-specific effect and a spatial smoothing related effect. These are most easily understood by considering which of the 42 individual filter responses contributed most to White's illusion (see Figure 3 for individual filter responses to a stimulus without noise). One needs to look at filters that have a higher average response to the region that corresponds to the test patch on the dark bar than to the region that corresponds to the test patch on the light bar. This is the case in all filters that are oriented orthogonal to the grating. However, the difference is most pronounced in the filters with a spatial frequency that matches that of the grating. In addition, all the low-frequency filters have a higher average response to the test patch on the dark bar, regardless of filter orientation. This latter effect is due to spatial smoothing. A reduction in the normalization weight for any of those filters that contribute to White's illusion will lead to a reduction in predicted illusion strength.

Following the above reasoning, adding noise with a spatial frequency that matches that of the grating has a pronounced effect. However, an even larger effect is achieved with a noise frequency slightly below the grating frequency, because that noise affects both the orientation specific and the spatial smoothing related contributions to the illusion. These model predictions are in contradiction to the psychophysical results, where the largest masking effect is achieved with noise between 1 and 5 cpd, and the scaling relationship between the grating frequency and the most effective masking frequency is not constant. The above analysis also implies that changing the free parameters of the FLODOG model (i.e., the size of the local normalization window or the influence of neighboring spatial frequencies on response normalization) could not make the model reproduce the psychophysical results, because no such change would make noise between 1 and 5 cpd most effective if the grating frequency is below 1 cpd.

Dakin and Bex

In the model of Dakin and Bex (2003), White's illusion is not produced by orientation normalization, because the model contains only isotropic filters. Their model increases the contribution of low spatial frequency filters to the output image, and the resulting spatial smoothing between the surround grating and the test patches causes the illusion. As illustrated in Figure 9, the Dakin–Bex approach failed in the presence of narrowband noise. Adding energy, in the form of noise, to the low frequencies caused the model to assign less weight to the filters with low frequencies. Therefore, the model predicted the illusion to disappear or even reverse in the presence of low-frequency noise. Conversely, for higher noise frequencies, the model increased the weights given to low-frequency responses and so the predicted illusion strength rose above baseline level, contradicting the data in Figure 5. Since in the model, White's illusion is based on spatial smoothing between the grating and the test patch, the model predictions for a high-frequency grating are shifted towards higher spatial frequencies and predictions for a lower frequency grating are shifted downward by an amount that corresponds almost exactly to the frequency difference between the two gratings. This shift also contradicts the psychophysical data.

It should be noted that the most prominent reduction in illusion strength for low-frequency noise was predicted for frequencies that were not included in the psychophysical experiments. As mentioned above, the illusion was not much affected by noise at these frequencies in pilot experiments. The reader can convince her- or himself of this fact by inspecting the example stimuli with very low-frequency noise that are depicted in Figure 10.

Illustration of the effect of low-frequency noise on White's illusion. The leftmost stimulus is masked with a noise center frequency of 0.11 cpd, the central stimulus with 0.19 cpd, the rightmost stimulus with 0.33 cpd (assuming a viewing distance of 40 cm). None of these masks appear to cause a large reduction in illusion strength.

Figure 10

In the BIWAM, White's illusion is explained through a combination of contrast and assimilation effects mediated by center-surround interactions of the filter responses. Since BIWAM output is not arbitrarily scaled and can be interpreted in the same units as the input image, the results show that the illusion strength predicted by BIWAM is much smaller than observed psychophysically (dotted lines in Figure 11). These small effect sizes are a result of the low contrast of the stimulus. At least for the two lower frequencies, the predicted effect is so small that it would be impossible to measure. In addition, the BIWAM incorrectly predicted that White's illusion is reversed for the low-frequency grating (the dotted line is below the zero-line in the topmost panel of Figure 11, implying that even without noise, the test patch on the dark bar is predicted to be darker than the test patch on the light bar at the lowest spatial frequency). The BIWAM is also very sensitive to random differences in the noise masks, which can be seen in the large variability across the individual trials. Finally, the BIWAM predicted noise at approximately the frequency of the carrier grating to be most effective in reducing White's illusion. Thus, although the model predicted a frequency specific effect of narrowband noise on White's illusion, it predicted the wrong frequency to be most effective. It also incorrectly predicted a stronger coupling between the most effective noise frequency and the frequency of the square wave grating than observed psychophysically. The slope of the line connecting the two values for the BIWAM in Figure 6 is 0.86, which lies outside the 95% confidence interval computed from the psychophysical data.
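The comparison of model slopes with the psychophysical confidence intervals can be checked directly from the reported numbers. In the sketch below, the slopes and intervals are those given in the text. The helper `loglog_slope` assumes that slopes in Figure 6 are computed on logarithmic axes, so that proportional scaling corresponds to a slope of 1; that is an assumption, and the function is shown only to make the notion of proportional scaling concrete.

```python
import math

# Values reported in the text.
CI_LOWER_SEGMENT = (0.51, 0.75)
CI_UPPER_SEGMENT = (0.52, 0.76)
MODEL_SLOPES = {"FLODOG": 0.82, "BIWAM": 0.86}

def loglog_slope(grating_a, dip_a, grating_b, dip_b):
    """Slope of dip frequency against grating frequency on log-log axes
    (assumed convention; proportional scaling gives a slope of 1)."""
    return (math.log(dip_b) - math.log(dip_a)) / (
        math.log(grating_b) - math.log(grating_a)
    )

def outside(value, interval):
    low, high = interval
    return value < low or value > high

for name, slope in MODEL_SLOPES.items():
    flags = [outside(slope, ci) for ci in (CI_LOWER_SEGMENT, CI_UPPER_SEGMENT)]
    print(name, slope, flags)

# Doubling both frequencies is proportional scaling (hypothetical values).
proportional = loglog_slope(0.2, 1.0, 0.4, 2.0)
```

Both model slopes lie above the upper bound of either interval, whereas the mean psychophysical slope of 0.63 lies inside both.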

Noise-masking results for the BIWAM. Same conventions as in Figure 7. The light gray line is the best fit to the data, and the location of the minimum of the fit is indicated by the light gray text inset.

Figure 11

None of the spatial filtering models tested in this study was able to reproduce the effect of narrowband noise on White's illusion that had been observed psychophysically. This failure is critical, because, as we argued above, narrowband noise specifically interferes with the mechanisms that are supposedly responsible for White's illusion in these models. Therefore, if this type of interference does not result in a model response that parallels the observed perceptual effects, something important is wrong. The fact that the models seemed to capture human perception in the simple case without noise appears to have been a coincidence. We have therefore arrived at the conclusion that the models are not simply incomplete, but qualitatively inadequate.

A common criterion for evaluating models of any kind is their predictive power. For example, Allred, Radonjic, Gilchrist, and Brainard (2012, p. 12) write that “[a] complete model of the perception of surface lightness would allow prediction of the lightness of any image region.” However, in addition to being interested in predicting how observers perceive the lightness of an image region, we are also interested in explaining why they perceive it as they do. We agree with Kaplan and Craver (2011, p. 602) who argued that “models [in systems or cognitive neuroscience], like models in ‘lower-level’ neuroscience, carry explanatory force to the extent, and only to the extent, that they reveal (however dimly) aspects of the causal structure of a mechanism.” Our criticism of the models discussed here touches precisely on their mechanistic adequacy. The models considered in this article aimed at more than prediction. This is evident from the fact that the authors of each model we considered discussed the model's physiological plausibility. And although some parts of the models were not expected to be found in visual cortex, the critical mechanisms for the explanation of lightness effects, such as the response normalization and surround integration, were intended to have physiological equivalents. To summarize, we think that the failure of multiscale spatial filtering models in the presence of narrowband noise speaks against their adequacy as explanations of human lightness perception. Even if a model is designed to carry out computations that bear a resemblance to those happening in cortex, that does not mean that it is a correct model of a specific process, such as lightness perception.

Lightness perception and luminance edges

Given that spatial filtering models cannot explain lightness perception in the presence of noise, the question is whether there are other low-level mechanisms that might be able to account for the observed effects. Salmela and Laurinen (2005) proposed that their results are compatible with the idea that lightness perception is served by mechanisms that respond to luminance edges. This conclusion requires some explanation, because luminance edges are not narrowband, but instead would elicit responses in filters sensitive to a range of frequencies. However, the results from letter detection (Solomon & Pelli, 1994) already indicated that the visual system does not always exploit the full range of spatial frequencies for solving a specific task. If the edges that determine perceived lightness are similarly detected only in a limited frequency range, this would explain why noise at these frequencies interferes with lightness perception. The argument linking the noise-masking results to the importance of luminance edges for lightness perception further relies on the finding that the most effective noise frequency did not scale proportionally with the size of the stimulus. Luminance edges are an obvious candidate to account for that effect, because they remain locally unchanged when the stimulus is scaled. Thus, a mechanism that first detects edges would still be impaired by noise at the same frequency, even if the surface bounded by the edge becomes smaller or larger. This interpretation also fits well with the perceptual impression many observers have in Figure 2: In the central panel, the edges of the test patch are difficult to see, and this in turn makes the test patches merge with the bars on which they are embedded.

One argument against the edge-based explanation is that the most effective noise frequency was not fully constant across stimulus scales. There was some scaling with grating frequency, although much less than what would be expected if White's effect depended on the response of filters matched to the scale of the test patches. The partial scaling of the effective noise frequency with grating frequency could hint at the involvement of more than one mechanism in the judgment of lightness in the present task. For example, it has been suggested that White's illusion depends on a perceptual scission of the test patches and the grating bars into two layers (Anderson, 1997). In addition, separating the test patches and the grating in depth through stereo presentation has also been found to affect the illusion (Taya, Ehrenstein, & Cavonius, 1995). If indeed multiple mechanisms are involved in causing White's illusion, it might be that some of these mechanisms are not scale-invariant. A further complication could be caused by the fact that the scission mechanism may also be influenced by noise. A number of observers have informally reported that they perceive the low-frequency noise as a layer of clouds overlaying the grating stimulus. This layer separation does not seem to happen for the high-frequency noise.

There are a number of studies that support the importance of luminance edges in lightness perception, psychophysically (Anstis, 2013; Geier & Hudák, 2011; Kurki, Peromaa, Hyvärinen, & Saarinen, 2009; Robinson & de Sa, 2013; Shapley & Tolhurst, 1973), as well as physiologically (von der Heydt, Friedman, & Zhou, 2003; Zurawel, Ayzenshtat, Zweig, Shapley, & Slovin, 2014). Rudd (Rudd, 2013, 2014; Rudd & Zemach, 2005) proposed a model of lightness perception in which edge integration is a critical factor in the computation of lightness. We have recently provided additional support for this approach by showing that White's illusion is largely determined by the luminance contrast across the edges of the test patch (Betz, Shapley, Wichmann, & Maertens, 2015). In that study, we used contour adaptation (Anstis, 2013) to selectively mask the edges of the test patch that are either orthogonal or parallel to the inducing grating. We found that adapting to the orthogonal edges greatly reduced, and for some observers reversed White's illusion. Adapting to the parallel edges had a smaller effect, and tended to enhance the illusion. These results support our conclusion that narrowband noise affects perceived lightness primarily in the frequency range between 1 and 5 cpd because noise in that frequency range interferes with the edge detection that is a first step in the computation of surface lightness.

Edge-based models and the filling-in problem

There is one obvious advantage of spatial filtering models over models that are based on luminance edges. Spatial filtering models sidestep the filling-in problem, which refers to the problem that in order to perceive homogeneous surfaces, information at the edges of a surface must somehow be used to fill-in the entire surface area. Spatial filtering models solve the filling-in problem by including large filters that are tuned to low spatial frequencies and these filters produce a surface response.

The argument that the perceptual phenomenon of filling-in is mediated by low spatial frequency signals was most prominently formulated by Dakin and Bex (2003). They stressed the importance of low spatial frequency content for lightness perception, but the precise meaning of “low” was not defined. Instead they presented a demo using the COBC illusion in which either the high or the low spatial frequency content was shuffled. Phase shuffling the high-frequency content left the illusion intact, whereas phase shuffling the low-frequency content destroyed it (Figure 12A through C).

COBC-type illusion (after Dakin & Bex, 2003) demonstrating the effect of shuffling the phase information in different frequency bands. (A) Unshuffled stimulus. The hair and cap look darker than the face, even though luminance differences are only present at the border between the regions. (B) Shuffling the phases below 30 cpi destroys the illusion. (C) Shuffling phases above 30 cpi preserves the illusion. (D) Shuffling phases below 6 cpi also preserves the illusion; in fact, the shuffling is hardly noticeable, since there is by design very little energy in these low frequencies in the COBC-type image. This shows that the effect depends on some frequency band below 30 cpi and above 6 cpi, not simply on all low frequencies. Note that at a viewing distance of 40 cm, these images subtend around 6° visual angle, so the critical band is between 1 and 5 cpd. (A–C) recreated after Dakin and Bex (2003), but using a face image that is in the public domain, as done previously by Geier, J. (2009). A diffusion based computational model and computer simulation for the lightness illusions. Perception ECVP Abstract, 38, 95.

Figure 12

At first sight this demo seems to contradict the claim advocated here that luminance edges, detected predominantly at frequencies within a band of 1 to 5 cpd, are most relevant for lightness perception. However, Dakin and Bex (2003) referred to all frequencies below 30 cpi as low frequencies. This would correspond to about 5 cpd in an image that is 4-cm wide and viewed at a distance of 40 cm. Their original demo, thus, cannot distinguish between the alternative hypotheses that all frequencies below 5 cpd are relevant for lightness perception, or that the effect is actually dependent on a specific frequency band somewhere between 0 and 5 cpd. Figure 12 shows a demo that is analogous to the one used by Dakin and Bex, but with an additional stimulus, in which the phase shuffling has only been applied to frequencies below 6 cpi (≈1 cpd; Figure 12D). Readers can judge for themselves, but to us it seems that this manipulation leaves the illusion intact. This demo supports the finding that some intermediate frequency band is important for lightness perception. The apparent conflict between Dakin and Bex's demo and the present results can thus be resolved in favor of the importance of luminance borders. However, if we abandon spatial filtering models as accounts for the perception of surface lightness in favor of models that are based on luminance edges, the filling-in problem needs to be addressed in future experiments and modeling.
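The conversion from cycles per image to cycles per degree used in this argument depends only on the image size and the viewing distance. A short sketch of that arithmetic, using the 4-cm image width and 40-cm viewing distance stated above (the function names are ours):

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle subtended by an image of the given width."""
    return 2 * math.degrees(math.atan(size_cm / (2 * distance_cm)))

def cpi_to_cpd(cycles_per_image, size_cm, distance_cm):
    """Convert cycles per image to cycles per degree of visual angle."""
    return cycles_per_image / visual_angle_deg(size_cm, distance_cm)

angle = visual_angle_deg(4, 40)
upper = cpi_to_cpd(30, 4, 40)
lower = cpi_to_cpd(6, 4, 40)
print(round(angle, 2), round(upper, 2), round(lower, 2))  # 5.72 5.24 1.05
```

The 30-cpi and 6-cpi cutoffs thus bracket a band of roughly 1 to 5 cpd under these viewing conditions.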

One might question whether the failure of four specific models shown here suffices to argue against an entire class of models, and ask if there are possible modifications that could save the models. The current spatial filtering models are incompatible with the idea that lightness perception depends on a two-stage process that first computes luminance ratios across edges, and then extrapolates the lightness values that are based on these edge ratios to the entire surface bounded by the edges. In spatial filtering models, surface lightness depends on the response of filters that are centered on the surface, and whose response may be influenced by the surround through large receptive fields, and potentially through normalization effects. There is no mechanism for extrapolating edge responses to the center of a surface, and thus frequency-specific effects of noise masking will always be coupled to the spatial scale of the target surface. This holds independently of whether the specific implementation of the filters is explicitly designed as in the models tested here, or learned in a neural network approach (Corney & Lotto, 2007).

Difficulties in replication experiment

We found in pilot experiments that a one-to-one replication of Salmela and Laurinen's (2009) study was difficult, because there was high variability in responses even between experienced observers. While some observers, including the first author, produced data very similar to observer VS in the original study, others showed very little effect of the noise masking. In conditions where the test patch was hard to detect, and thus looked similar in lightness to the grating bar on which it was placed, some observers looked at the stimulus for a very long time trying to detect the test patch, while others simply matched the lightness of the grating bar. In order to reduce the influence of such different strategies, we attempted to make the task more objective by using a two-alternative forced choice (2AFC) paradigm, in which observers simply indicated which of two simultaneously presented test patches appeared brighter. However, close inspection of the psychometric functions from that experiment revealed that for some noise frequencies, there was no point at which test patches placed on dark and light grating bars appeared equal in lightness. When observers did not see the patches, they judged the lightness of the bars, and the test location on a light bar was then always lighter than the test location on a dark bar. However, when they did see the patches, they almost always saw the patch on a light bar as darker, regardless of the precise luminance values of the two patches. Thus, if their response in the 2AFC task was around 50% for some test luminance value, this was probably because they saw the test patch in half of the trials and did not see it in the other half, not because the test and the standard, when both were visible, looked equal in lightness at these luminance values.

This implies that analyzing the 2AFC data by fitting psychometric functions and estimating a point of subjective equivalence is misleading when measuring White's illusion masked by narrowband noise. It also hints at the important connection between edge visibility and perceived lightness of the patches. After further testing, we found that reducing the presentation time of the stimuli led to more similar behavior across subjects, so we opted for a lightness-matching paradigm with short presentation times. But even then, two of our 11 observers showed behavior very different from the others, and the magnitude of both White's illusion and the effect of noise differed widely across observers (see individual observer data in the supplemental material).
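The mixture interpretation of the 2AFC responses described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that on each trial the patches are detected with some probability `p_see`, that a detected patch on a light bar is essentially never judged brighter, and that an undetected patch leads to a judgment of the bars themselves. The function name, parameter names, and numerical values are hypothetical and are not fitted to any data.

```python
def p_light_bar_judged_brighter(p_see, p_if_seen=0.0, p_if_unseen=1.0):
    """Probability of responding 'test on the light bar is brighter'.

    p_see       -- hypothetical probability that the test patches are detected
    p_if_seen   -- response probability when the patches are seen (assumed ~0)
    p_if_unseen -- response probability when only the bars are judged (assumed ~1)
    """
    # Two-state mixture: the observed response rate is a weighted average
    # of the rates in the "seen" and "not seen" states.
    return p_see * p_if_seen + (1.0 - p_see) * p_if_unseen


if __name__ == "__main__":
    for p_see in (0.0, 0.25, 0.5, 0.75, 1.0):
        rate = p_light_bar_judged_brighter(p_see)
        print(f"p_see = {p_see:.2f} -> P(light-bar test judged brighter) = {rate:.2f}")
```

Under these assumptions, a detection probability of 0.5 yields a response rate of 50%, although in neither perceptual state do the two patches look equal. A point of subjective equality read off such a function would therefore reflect detectability rather than matched lightness.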

One possible explanation for these difficulties is that the noise may make the matching task perceptually ill-defined. The luminance of the test patches is not homogeneous, and observers may have different strategies for arriving at a single lightness value that they use for their match. In the low-frequency noise conditions, the noise can appear as a layer of clouds or haze in front of a homogeneous grating, and this layer separation makes the matching relatively straightforward. At very high noise frequencies, the noise is so fine-grained that it is not difficult to get an impression of the average lightness of the test patch, which in that case may appear as textured. At intermediate noise frequencies (i.e., those where we find a reduction or reversal of White's illusion) it can be difficult even to detect the test patch as a separate region, which makes the matching most difficult. In that case, most observers matched the average lightness in the region where the test patch would have been visible without the noise, which explains why in this condition, test patches (or rather test areas) on a light grating bar were often matched with higher lightness values than test areas on a dark grating bar. To us, these observations suggest that there is a close connection between image segmentation (i.e., the explicit perceptual separation of the test patches as a distinct region) and lightness perception. Still, despite these difficulties with interpreting precisely what the lightness matches mean, the data consistently demonstrate that for most observers, noise in the range from 1 to 5 cpd has the largest effect on perceived lightness, and that the most effective noise frequency does not scale proportionally to the grating frequency.

Conclusion

We started from the question of to what extent low-level visual mechanisms alone can account for different lightness phenomena. Our analysis showed that the most popular class of low-level models, spatial filtering models, cannot provide an adequate explanatory account of White's illusion. While this could indicate that higher-level factors are required for the explanation of lightness perception, the importance of edge information in the computation of surface lightness still leaves potential for a low-level mechanism.

There were other (low-level) models of lightness perception that advocated the importance of luminance edges (Grossberg & Todorovic, 1988; Kelly & Grossberg, 2000; Kingdom & Moulden, 1992; Morrone & Burr, 1988; Watt & Morgan, 1985). According to Kingdom (2011), the problem with some of these models (Kingdom & Moulden, 1992; Morrone & Burr, 1988; Watt & Morgan, 1985) is that the integration of edge information over two-dimensional images is intractable. A further argument against the model by Grossberg and Todorovic (1988) is that the proposed filling-in mechanism did not capture neurophysiological data (von der Heydt, 2003). However, “[t]he final reason why [these] models have failed in their bid to account for brightness phenomena is that they have been superseded by another class of spatial-filtering model” (Kingdom, 2011, p. 660). In light of the failure of this new class of models that was demonstrated here, we believe that it may be time to reconsider the edge-based approach to lightness perception.

Acknowledgments

This work was supported by the German Research Foundation (GRK 1589/1 “Sensory Computation in Neural Systems,” and DFG MA5127/1-1 to Marianne Maertens). Felix Wichmann was funded, in part, by the German Federal Ministry of Education and Research (BMBF) through the Bernstein Computational Neuroscience Program Tübingen (FKZ: 01GQ1002). We would like to thank Viljami Salmela for providing the Matlab code used to create the narrowband noise stimuli as well as psychophysical data, Alan Robinson for the code of his Matlab implementations of the ODOG and FLODOG models, and Xavier Otazu for his implementation of the BIWAM.

Footnotes

1 Lightness is the perceived reflectance of a surface, while brightness is its perceived luminance. In the stimuli considered here, the two are not separable, since there is no information about illumination, or surfaces for that matter. We opt to use the term lightness throughout because we believe that reflectance (i.e., achromatic color black, white, gray) is a more accessible perceptual category than luminance, and thus should be the ultimate target of lightness/brightness models. When citing the work of others, we use the term brightness if that term is used in their work.

2 Sharp edges cause a response in a wide range of frequency-selective filters. The defining feature of an edge is not that it is localized in frequency, but that the phases of all frequency components are aligned.

Figure 1

An example of White's illusion. To most observers, the gray patch on the dark bar looks lighter than the gray patch on the light bar, even though the two are equiluminant. The illusion cannot simply be explained in terms of contrast, because both test patches share an equal amount of border with dark and light regions.

Figure 2

Illustration of the effect of narrowband noise on White's illusion. Left: Stimulus is masked with a noise center frequency of 0.58 cpd, Middle: 3 cpd, Right: 9 cpd (assuming a viewing distance of 40 cm). White's effect should be reduced or absent in the middle panel.

Figure 3

Illustration of the ODOG model. The model consists of ODOG filters with seven spatial scales and six different orientations. To predict brightness, it processes images in four steps: First, an input image is convolved with all 42 (6 × 7) filters. Second, the filter outputs of different spatial frequency at the same orientation are summed, and outputs of higher frequencies receive slightly larger weights, indicated by the inset figure above the summation, where each dot corresponds to the weight of one spatial frequency. Third, the different orientation responses are normalized by pointwise division through their RMS energy computed over the entire image. Fourth, the normalized responses are summed to yield the model output. White's illusion in the model response is mainly caused by a higher weight given to the filter oriented orthogonal to the grating (fourth row), because its response has little energy. Since in this filter response, the test patch on the left has a higher value than the one on the right, it also receives a higher value in the final output.
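The four processing steps named in this caption can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the implementation that was tested: the filter shape (isotropic center minus a 2:1 elongated surround), the octave-spaced scales in `sigmas`, the weighting exponent `weight_slope`, the circular boundary handling, and the stimulus dimensions in `whites_stimulus` are all hypothetical choices made for the example.

```python
import numpy as np


def odog_kernel(shape, sigma, theta):
    """Oriented difference of Gaussians on the full image grid.

    Assumption: isotropic center minus a surround elongated 2:1 along the
    axis given by theta. Both parts have unit sum, so the kernel sums to zero.
    """
    h, w = shape
    yy, xx = np.meshgrid(np.arange(h) - h // 2, np.arange(w) - w // 2,
                         indexing="ij")
    u = xx * np.cos(theta) + yy * np.sin(theta)    # along the long axis
    v = -xx * np.sin(theta) + yy * np.cos(theta)   # across the long axis
    center = np.exp(-(u ** 2 + v ** 2) / (2 * sigma ** 2))
    surround = np.exp(-(u ** 2 / (2 * (2 * sigma) ** 2)
                        + v ** 2 / (2 * sigma ** 2)))
    kernel = center / center.sum() - surround / surround.sum()
    return np.fft.ifftshift(kernel)  # move the kernel center to index (0, 0)


def odog_response(image, sigmas, thetas, weight_slope=0.1):
    """Simplified four-step pipeline: filter, weight, normalize, sum."""
    image_ft = np.fft.fft2(image)
    output = np.zeros(image.shape)
    for theta in thetas:
        summed = np.zeros(image.shape)
        for sigma in sigmas:
            # Step 1: circular convolution with one filter, done via the FFT.
            kernel_ft = np.fft.fft2(odog_kernel(image.shape, sigma, theta))
            filtered = np.real(np.fft.ifft2(image_ft * kernel_ft))
            # Step 2: weighted sum across scales within one orientation;
            # smaller sigma (higher frequency) gets a slightly larger weight.
            summed += sigma ** (-weight_slope) * filtered
        # Step 3: divide by the RMS of this orientation over the whole image.
        rms = np.sqrt(np.mean(summed ** 2))
        # Step 4: sum the normalized orientation responses.
        output += summed / max(rms, 1e-12)
    return output


def whites_stimulus(size=256, bar_width=32, patch_height=64):
    """Square-wave grating with one gray patch on a dark and one on a light bar."""
    bars = ((np.arange(size) // bar_width) % 2).astype(float)  # 0 dark, 1 light
    image = np.tile(bars, (size, 1))
    rows = slice(size // 2 - patch_height // 2, size // 2 + patch_height // 2)
    dark_cols = slice(2 * bar_width, 3 * bar_width)    # bar index 2 is dark
    light_cols = slice(5 * bar_width, 6 * bar_width)   # bar index 5 is light
    image[rows, dark_cols] = 0.5
    image[rows, light_cols] = 0.5
    return image, (rows, dark_cols), (rows, light_cols)


image, dark_patch, light_patch = whites_stimulus()
sigmas = 2.0 ** np.arange(7)                 # 7 hypothetical scales, in pixels
thetas = np.deg2rad(np.arange(0, 180, 30))   # 6 orientations
response = odog_response(image, sigmas, thetas)
print("mean output, test on dark bar: ", response[dark_patch].mean())
print("mean output, test on light bar:", response[light_patch].mean())
```

The normalization in step 3 is the part that matters for the explanation in the caption: an orientation channel with little overall energy is divided by a small RMS value and so contributes more strongly to the final sum. Whether this simplified sketch reproduces the direction or size of the effect reported for the full model depends on the assumed parameters and has not been checked against it.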

Figure 4

Illustration of the screen during matching. The observer adjusted the comparison square on the left to match the lightness of the test patch in the grating. The gray background was actually larger and has been cropped for this illustration. The contrast of the grating has been increased for better visibility in this illustration.

Figure 5

Noise-masking results for one typical observer. x-axis indicates the center spatial frequency of the noise mask, in cycles per degree. 0.06 cpd would correspond to 1 cycle per image (cpi) in these stimuli. y-axis indicates illusion strength, measured as the difference in matched lightness between test patches placed on different background bars. The grating frequency is indicated by the star on the x-axis. Large circles are means across trials, small circles are individual trials. The results in the no-noise control are shown on the very left. Light gray circles are trials where the test patch was an increment with respect to the bar on which it was placed; dark gray circles are trials where the test patch was a decrement. These circles are plotted only to give an impression of the variance of the data in the no-noise condition. The dotted line indicates the overall mean illusion strength of the no-noise condition. The light gray line is the best fit to the data, and the location of the minimum of the fit is indicated by the light gray text inset.

Figure 6

Summary of the effect of narrowband noise on White's illusion at different grating frequencies. The x-axis indicates the spatial frequency of the grating, the y-axis the frequency at which the noise had the largest effect on illusion strength. Results from the psychophysical experiment are shown in blue (individual observers in light blue, mean across observers in dark). The results for the two models that also predict noise in a specific frequency range to be most effective are shown in red and green. Both models predict much lower noise frequencies to be most effective, and both predict the increase in most effective noise frequency with increasing grating frequency to be steeper than observed psychophysically.

Figure 7

Noise-masking results for the ODOG model, analogous to Figure 5. x-axis indicates the center spatial frequency of the noise mask, y-axis indicates illusion strength, measured as the difference in model output between test patches placed on different background bars, but in the same noise environment. The grating frequency is indicated by the star on the x-axis. Large circles are means across trials; small circles are individual trials. For most models and noise frequencies, the effect of different noise masks is so small that individual trial data cluster together and form a line, or are hidden by the mean. The dotted horizontal line indicates the overall mean illusion strength of the no-noise condition. No single trial data are plotted for the no-noise condition, since the model response does not vary across trials without noise.

Figure 8

Noise-masking results for the FLODOG model. Same conventions as in Figure 7. The light gray line is the best fit to the data, and the location of the minimum of the fit is indicated by the light gray text inset.

Figure 10

Illustration of the effect of low-frequency noise on White's illusion. The leftmost stimulus is masked with a noise center frequency of 0.11 cpd, the central stimulus with 0.19 cpd, the rightmost stimulus with 0.33 cpd (assuming a viewing distance of 40 cm). None of these masks appear to cause a large reduction in illusion strength.

Figure 11

Noise-masking results for the BIWAM. Same conventions as in Figure 7. The light gray line is the best fit to the data, and the location of the minimum of the fit is indicated by the light gray text inset.

Figure 12

COBC-type illusion (after Dakin & Bex, 2003) demonstrating the effect of shuffling the phase information in different frequency bands. (A) Unshuffled stimulus. The hair and cap look darker than the face, even though luminance differences are only present at the border between the regions. (B) Shuffling the phases below 30 cpi destroys the illusion. (C) Shuffling phases above 30 cpi preserves the illusion. (D) Shuffling phases below 6 cpi also preserves the illusion; in fact, the shuffling is hardly noticeable, since there is by design very little energy in these low frequencies in the COBC-type image. This shows that the effect depends on some frequency band below 30 cpi and above 6 cpi, not simply on all low frequencies. Note that at a viewing distance of 40 cm, these images subtend around 6° visual angle, so the critical band is between 1 and 5 cpd. (A–C) recreated after Dakin and Bex (2003), but using a face image that is in the public domain, as done previously by Geier, J. (2009). A diffusion based computational model and computer simulation for the lightness illusions. Perception ECVP Abstract, 38, 95.
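The band-limited phase manipulation and the unit conversion in this caption can be sketched as follows. This is an illustrative approximation, not the procedure used to create the figure: phases inside the band are replaced by the phases of a random noise image, as a stand-in for shuffling, and the function and parameter names are hypothetical.

```python
import numpy as np


def shuffle_phases(image, low_cpi, high_cpi, rng):
    """Randomize Fourier phases at radial frequencies in [low_cpi, high_cpi).

    Frequencies are in cycles per image (cpi). The amplitude spectrum is
    left untouched, and the mean (DC) component is never altered.
    """
    h, w = image.shape
    fy = np.fft.fftfreq(h) * h   # vertical frequency in cycles per image
    fx = np.fft.fftfreq(w) * w   # horizontal frequency in cycles per image
    radius = np.hypot(*np.meshgrid(fy, fx, indexing="ij"))
    band = (radius >= low_cpi) & (radius < high_cpi) & (radius > 0)
    # Phases taken from the spectrum of a real noise image are
    # antisymmetric, so the manipulated spectrum still yields a real image.
    noise_phase = np.angle(np.fft.fft2(rng.standard_normal(image.shape)))
    spectrum = np.fft.fft2(image)
    new_phase = np.where(band, noise_phase, np.angle(spectrum))
    return np.real(np.fft.ifft2(np.abs(spectrum) * np.exp(1j * new_phase)))


def cpi_to_cpd(cycles_per_image, image_size_deg):
    """Convert cycles per image to cycles per degree of visual angle."""
    return cycles_per_image / image_size_deg


rng = np.random.default_rng(0)
demo_image = rng.standard_normal((64, 64))   # placeholder, not the face image
shuffled = shuffle_phases(demo_image, 6, 30, rng)
print("band edges in cpd:", cpi_to_cpd(6, 6), "to", cpi_to_cpd(30, 6))
```

For an image subtending about 6° of visual angle, 6 cpi and 30 cpi correspond to 1 cpd and 5 cpd, which is the critical band stated in the caption.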