Research has shown that the processing time for discriminating illusory contours is longer than that for real contours. However, little is known about whether the visual processes associated with detecting regions of illusory surfaces are also slower than those responsible for detecting luminance-defined images. Using a speed–accuracy trade-off (SAT) procedure, we measured accuracy as a function of processing time for detecting illusory Kanizsa-type and luminance-defined squares embedded in 2D static luminance noise. The data revealed that the illusory images were detected at a slower processing speed than the real images, while the points in time at which accuracy departed from chance were not significantly different for the two stimuli. The classification images for detecting illusory and real squares showed that observers employed similar detection strategies, using the surface regions of the real and illusory squares. The lack of significant differences between the x-intercepts of the SAT functions for illusory and luminance-modulated stimuli suggests that the detection of the surface regions of both images could be based on the activation of a single mechanism (the dorsal magnocellular visual pathway). The slower speed of detecting illusory images compared with luminance-defined images could be attributed to slower filling-in of the regions of illusory images within the dorsal pathway.

Introduction

Segmentation of visual objects from the background is usually determined by physical differences in luminance, color, texture, etc. However, visual information is often incomplete: the contrast between the object and the background could be below threshold at some locations, or the object could be occluded in some regions of the image. The visual system has to bind the fragments of the object into a single percept and therefore must rely on the brain's reconstruction of features absent from the retinal images.

Visual illusory objects, which are perceived in the absence of local stimulus borders (Kanizsa, 1979), have been used extensively to explore the mechanisms of perceptual grouping. Illusory objects are usually created by circular disks (inducers) presented on a uniform background, with wedges cut out of the disks and oriented so that the extensions of the wedge edges form the illusory figure (Kanizsa, 1979). Although the region of the illusory figure and the background have the same luminance, observers perceive a surface of distinct apparent brightness bounded by apparent contours.

What is the neural representation of such illusory images? Initially, cognitive theories interpreted the completion of illusory contours as an attempt to find the most probable solution to a perceptual problem (Gregory, 1972). Computational models (Grossberg & Mingolla, 1985; Hess & Field, 1999; Spillmann & Dresp, 1995) and neuroimaging studies (for review, see Seghier & Vuilleumier, 2006) have proposed that distinct global attributes of an illusory object are processed by separate mechanisms. A fast, local, low-level mechanism involving cells in striate and extrastriate visual areas (Bakin, Nakayama, & Gilbert, 2000; Lee & Nguyen, 2001; Nieder & Wagner, 1999; Peterhans & von der Heydt, 1989; Ramsden, Hung, & Roe, 2001; Redies, Crook, & Creutzfeldt, 1986; von der Heydt, Peterhans, & Baumgartner, 1984) is responsible for the initial encoding of illusory contours. Intracranial recordings in monkeys showed that Kanizsa-type illusory contours first activated cells in V2 (70–95 ms after stimulus onset), followed by a later response in V1 at 100–190 ms (Lee & Nguyen, 2001). The responses of V1 cells to the contours of real (bright, gray, and outline) squares appeared earlier (45 ms) than the responses to the illusory contours induced by the Kanizsa square (100 ms). These findings suggested that contour completion in V1 might be due to feedback modulation from V2 cells.

A global mechanism, located in the lateral occipital complex (LOC) and including higher areas such as V3A, V4v, V7, and V8, has also been proposed to play a role in the processing of illusory images. Mendola, Dale, Fischl, Liu, and Tootell (1999) found that population fMRI signals in the LOC were greater for illusory Kanizsa stimuli than for luminance-defined images, while luminance-defined images activated visual areas V1 and V2 more strongly than illusory images did. Combined current density mapping, source analysis, and fMRI results showed that the illusory contour effect (“IC effect”), estimated as the difference between the visual evoked potentials (VEPs) elicited by stimuli producing the perception of illusory figures and those elicited by stimuli that did not form illusory contours, occurred 88–102 ms after stimulus onset (Murray, Wylie et al., 2002). It was proposed that object recognition proceeds from coarse to fine scale in three stages: (1) dorsal stream regions create a coarse global representation of object space; (2) illusory contours are then processed (“IC effect”) in LOC areas of the ventral stream; (3) lower visual areas (e.g., V2 and V1) establish a representation of spatially precise and complete contours via feedback modulation from higher visual areas. Given the latencies of illusory contour responses in monkeys reported by Lee and Nguyen (2001) and the timing of activation across cortical regions in awake monkeys (Schroeder, Mehta, & Givre, 1998), Murray, Wylie et al. (2002) suggested that the results of Lee and Nguyen (2001) discussed above may reflect feedback modulation from higher visual areas rather than feedforward activation of V1 and V2.

Is there a difference between the perceptual dynamics of real and illusory image processing? Using backward masking of Kanizsa illusory figures, Reynolds (1981) found that at short stimulus onset asynchronies (SOAs of 50 ms) subjects reported seeing only the separate inducers of the illusory figure; at SOAs of 75 ms they saw the illusory figure, while the detailed figure shape (curved or straight edged) could be discriminated only at longer SOAs (>100 ms). Ringach and Shapley (1996) studied the time course of illusory contour processing with a shape discrimination task using Kanizsa-type figures whose inducers were rotated to form fat (bulging outward) or thin (tapering inward) illusory shapes. Performance in discriminating fat and thin illusory contours was reduced by a mask containing local orientation information when the mask was flashed at SOAs shorter than 117 ms. A second, Kanizsa-type mask interfered with task performance at longer SOAs (140–200 ms). The authors suggested that illusory contour processing involves two stages: detection of local boundary segments followed by integration of global illusory contours. Using the same paradigm, Imber, Shapley, and Rubin (2005) found that late masking effects on illusory contours could be observed even with illusory masks that did not overlap spatially with the target illusory contours. In contrast, real luminance-defined contours were not effective as late-stage masks. These results led to the suggestion that late-stage masking may occur at cortical stages involved in the shape categorization of illusory surfaces bounded by illusory contours.

Gold, Murray, Bennett, and Sekuler (2000) found direct evidence that observers use perceptually interpolated contours to discriminate the shape of fat/thin illusory figures embedded in static luminance noise. The derived classification images showed that the luminance noise around the interpolated contours was correlated with the observers' responses. Using the same technique with dynamic noise, Gold and Shubel (2006) showed that, during the first 175 ms of stimulus presentation, observers were influenced more gradually by noise at locations corresponding to illusory contours than at locations corresponding to real contours (thin black lines connecting the inducers). Similar findings were reported by Keane, Lu, and Kellman (2007), who estimated classification images for moving, noise-corrupted fat and thin illusory figures.

These findings are in line with the results of a study using VEPs and electrical neuroimaging analysis (Murray, Imber, Javitt, & Foxe, 2006). Illusory contour sensitivity (“IC effect”), associated with automatic boundary completion, was observed 124–186 ms after stimulus onset, while the electrophysiological correlates of shape discrimination of thin/fat Kanizsa-type illusory squares appeared about 200 ms later (330–406 ms). The electrophysiological correlates of shape discrimination of real contours, however, occurred earlier (154–192 ms) than those for illusory contours.

A shape discrimination task requires information mainly about the perceived contours of illusory figures. The illusory contours produced by Kanizsa-type patterns, however, bound a surface whose apparent brightness differs from that of the background. Two different mechanisms have been suggested to underlie surface-based and contour-based visual processes. One line of evidence comes from visual search experiments with Kanizsa-type illusory figures embedded in an array of distracters whose inducers were rotated so as not to induce a complete figure. The parallel detection of the Kanizsa-type illusory figure reported by Davis and Driver (1994) was found not to depend on the presence of illusory contours (Gurnsey, Poirier, & Gascon, 1996). Conci, Müller, and Elliott (2007) showed that the “pop-out” effects could be attributed to the presence of a global surface but not to the perceived illusory contours.

Another line of evidence comes from studies of the processing of “salient regions” created by inducers whose shapes are rounded to produce the impression of enclosed regions without sharp contours. Stanley and Rubin (2003) found that stimuli producing the perception of salient regions, as well as Kanizsa-type illusory figures, significantly elevated fMRI responses in the LOC compared with control stimuli made of inducers that eliminated the perception of an occluding region. They proposed that the LOC may perform fast but crude region-based segmentation of visual images. This proposal was challenged by Shpaner, Murray, and Foxe (2009), who recorded VEPs, which have a higher temporal resolution than the fMRI recordings used by Stanley and Rubin (2003). The results showed that during the early sensory processing period (154–203 ms after stimulus onset), LOC areas were significantly more sensitive to illusory contours than to stimuli forming salient regions. The authors proposed that crude spatial segmentation may take place in the dorsal magnocellular pathway, preceding the activation of the LOC.

The mechanisms processing the surfaces and contours of illusory figures were also investigated by measuring performance in a simultaneous detection–discrimination experiment (Barlasov-Ioffe & Hochstein, 2008). Subjects were presented with four backward-masked inducers producing a Kanizsa-type illusory parallelogram or triangle, or an image that did not induce an illusory figure. They reported whether a figure was present (detection) and which figure they perceived (discrimination). The percentage of correct responses for both figure detection and shape discrimination increased with stimulus-to-mask onset asynchrony. A group of naive subjects performed better in figure detection than in shape discrimination. Other subjects, with an intermediate level of experience, were able to discriminate the figure shape without detecting the figure at longer SOAs. A third group (“experts”) showed nearly identical detection and discrimination performance at all SOAs. The authors attributed these differences to the subjects' different perceptual abilities as well as to differences in their decision strategies. The individually different detection–discrimination relationships were taken to suggest that the detection of illusory figures and the discrimination of their shape are two distinct processes.

Little is known about whether the surface-based visual processes elicited by illusory images are also slower than those associated with the perception of real surfaces. Pegna, Khateb, Murray, Landis, and Michel (2002) measured reaction times (RTs) for identifying the type of triangle figure: a Kanizsa-type illusory shape, a triangle made of thin lines connecting the inducers, and an image with no contours produced by outwardly rotated inducers. This paradigm did not require shape discrimination; rather, it could be regarded as a surface-based task. RTs for the outline figure were significantly faster than those for the figures with illusory contours and with no contours, and the latter two did not differ significantly. These data corresponded to differences in the global field power (GFP) of the VEPs elicited by the stimuli: the GFP for outline figures had a smaller amplitude and peaked about 10 ms earlier than the GFP evoked by illusory triangles. The difference in the processing dynamics of real and illusory images, however, may be due to the stronger perceptual saliency of the figure with real contours compared with the illusory image (Pegna et al., 2002). It should be noted that RT data are of limited value for measuring processing dynamics, because RTs can vary with differences in detectability, differences in processing speed, or a mixture of the two (McElree & Carrasco, 1999).

The present study aimed to investigate the processing dynamics of the perceived surfaces of Kanizsa-type illusory and luminance-modulated images. To this end, we used a detection task that is likely to be based on information about the perceived image surface, in contrast to shape discrimination tasks, which mainly use information about the perceived image contours. As noted above, classification images for discriminating the fat/thin shapes of illusory images revealed that observers used information about the perceived contours of the illusory figures (Gold et al., 2000; Gold & Shubel, 2006; Keane et al., 2007). Generally speaking, classification images may not reflect all elements of the visual representation of a pattern; rather, they are assumed to reveal the strategy, or internal “template”, that observers use to match the visual representation of a figure in order to recognize it. It should be noted that the use of contour information in a shape discrimination task does not necessarily require the perception of a surface whose apparent brightness differs from that of the background. Murray (2002) measured classification images for discriminating fat/thin illusory contours produced by Kanizsa squares with two white and two black inducers, which do not induce global brightness differences between the illusory surface and the background (Matthews & Welch, 1997). The derived classification images revealed that observers judged the shape of the perceptually grouped figures by using the contours of the perceived squares. Additionally, Murray (2002) found similar classification images for L-squares, defined by four inward-facing Ls, which are perceptually grouped as a square without being perceived as a filled-in surface. To test the suggestion that a detection task is based on information about the image surface, we estimated classification images for real and illusory images embedded in luminance noise.

Processing dynamics were evaluated with a speed–accuracy trade-off (SAT) procedure (Dosher, 1976, 1979; McElree, 1993; McElree & Carrasco, 1999; Reed, 1973). The SAT procedure estimates time-course functions that describe the growth of accuracy over processing time. SAT functions show a period of chance performance, followed by a period in which accuracy grows, and finally a period of asymptotic performance in which accuracy no longer grows as processing time increases. Thus, the procedure allows joint measurement of detectability and processing dynamics.

Previous studies of the shape discrimination of illusory and real figures using backward masking (Imber et al., 2005) found that the most efficient masking between luminance-defined figures occurred at SOAs shorter than 100 ms, while masking between illusory figures arose at longer SOAs of 280–350 ms. Imber et al. (2005) suggested that the late masking might be due to interference at the level of the illusory surface representation. These findings lead to the prediction that, if the perception of illusory surfaces is slower than the processing of real surfaces, the SAT functions for illusory images should be shifted to longer processing times relative to those for real images. A slower rate of accuracy growth toward asymptote, reflecting the speed of illusory image completion, may also be present for illusory images compared with real images.

Methods

Stimuli

The stimuli were generated by a PC and presented on an RGB monitor with a refresh rate of 75 Hz and a spatial resolution of 800 × 600 pixels. The stimulation field had a mean luminance of 30 cd/m² and a size of 30 × 23 deg. A custom video summation device (Pelli & Zhang, 1991) was used to produce 256 gray levels with 12-bit precision. The luminance response of the display was measured with an OptiCal photometer (Cambridge Research Systems) interfaced to the PC. When the stimulus images were computed, the monitor luminance was linearized using the inverse of the non-linear luminance response. Stimuli were viewed binocularly at a viewing distance of 60 cm. Participants sat in a darkened room in which the only light source was the computer display. Custom software written in Pascal for MS-DOS was used to generate the stimuli and control the experiment.
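The linearization step can be illustrated with a short sketch. It assumes that the photometer readings are available as a table of (DAC level, luminance) pairs with strictly increasing luminance and that the inverse is obtained by piecewise-linear interpolation; the paper does not state how the inverse function was computed, and the function names and the gamma-2.2 calibration in the example are hypothetical.

```python
from bisect import bisect_left


def make_inverse_response(dac_values, luminances):
    """Build an inverse of a measured display response by linear interpolation.

    dac_values: increasing list of measured video (DAC) levels.
    luminances: photometer readings (cd/m^2) for those levels, assumed to be
    strictly increasing. Returns a function mapping a desired luminance to a
    (fractional) DAC level.
    """
    def inverse(target):
        # Clamp requests outside the measured range to the end points.
        if target <= luminances[0]:
            return float(dac_values[0])
        if target >= luminances[-1]:
            return float(dac_values[-1])
        # First index j with luminances[j] >= target; j >= 1 here.
        j = bisect_left(luminances, target)
        l0, l1 = luminances[j - 1], luminances[j]
        d0, d1 = dac_values[j - 1], dac_values[j]
        return d0 + (d1 - d0) * (target - l0) / (l1 - l0)

    return inverse


# Hypothetical calibration: a gamma-2.2 display peaking at 60 cd/m^2.
levels = [0, 32, 64, 96, 128, 160, 192, 224, 255]
measured = [60.0 * (v / 255) ** 2.2 for v in levels]
to_dac = make_inverse_response(levels, measured)
print(round(to_dac(30.0), 1))  # DAC level for a 30 cd/m^2 mean luminance
```

Interpolating the measured table, rather than fitting a parametric gamma function, avoids committing to a particular functional form for the display response; a denser calibration table reduces the interpolation error.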

Two types of target stimuli were used in the present experiments: (1) a luminance-decremental (real) square surrounded by an incremental diamond-shaped frame, with the centers of both figures coinciding and the square corners positioned on the frame (Figure 1a); and (2) an illusory square (Figure 1b) created by inward openings within the incremental frame, which were made by setting the real square contrast to zero (Kanizsa, 1979). The non-target stimulus (Figure 1c) contained outward openings within the white frame, so that observers did not perceive an illusory square. The inducing frame subtended approximately 3.5 deg of visual angle and had a contrast (C) of 4%, defined as

Figure 1

Illusory and real stimuli used in the present experiments. (a) A real square defined by a homogeneous luminance-decremental square. (b) An illusory square created by occluded parts of an incremental frame. (c) A non-target stimulus, which does not produce the perception of an illusory square owing to small changes in the inducing frame. All stimuli were embedded in 2D static Gaussian luminance noise samples.

Stimulus duration was 106.7 ms. The support ratio of the illusory square [the ratio between the length of the illusory square side that is defined by the frame and the total length of the side (Ringach & Shapley, 1996)] was 0.5. The stimuli were embedded in 2D static Gaussian luminance noise samples (noise pixel size of 2.15 min of arc), which allowed us to modulate the visibility of the illusory and real contours. The Gaussian noise distribution was truncated at ±2.5 standard deviations (SDs) from the background luminance.
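A minimal sketch of how such a noise sample could be generated is given below. It assumes that the noise SD is expressed as a fraction of the background luminance (as the percentages reported in the Procedure suggest), that truncation means redrawing values beyond ±2.5 SDs, and that each noise element is replicated over a square block of screen pixels (the hypothetical `check` parameter); none of these implementation details is specified in the text.

```python
import random


def truncated_noise_field(rows, cols, sd, background, limit=2.5, check=1, seed=None):
    """Static Gaussian luminance noise truncated at +/- limit SDs.

    sd is the noise SD as a fraction of the background luminance (e.g. 0.12
    for 12%). Each noise element is replicated over a check x check block of
    screen pixels. Out-of-range draws are redrawn (an assumption; clipping
    is another possible reading of "truncated").
    """
    rng = random.Random(seed)
    field = [[0.0] * (cols * check) for _ in range(rows * check)]
    for r in range(rows):
        for c in range(cols):
            z = rng.gauss(0.0, 1.0)
            while abs(z) > limit:  # redraw values beyond the truncation limit
                z = rng.gauss(0.0, 1.0)
            lum = background * (1.0 + sd * z)
            for dr in range(check):
                for dc in range(check):
                    field[r * check + dr][c * check + dc] = lum
    return field


noise = truncated_noise_field(64, 64, sd=0.12, background=30.0, check=2, seed=1)
```

Because out-of-range draws are redrawn, the SD of the resulting luminance values is slightly smaller than the nominal `sd`.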

Procedure

SAT experiments

In the Yes/No experiments, on each trial observers were presented with either a near-threshold target (an illusory square or a real decremental square) or a non-target, with equal probability. Target and non-target stimuli were embedded in different Gaussian noise samples. The observers' task was to indicate the presence or absence of the target by pressing the appropriate button with the index or middle finger of their dominant hand. Observers were not informed whether the target was an illusory or a real square. They were instructed to respond within 300 ms of a response cue (a brief tone). If observers failed to respond within the allotted time, feedback (a small circle) appeared at the location of the permanent fixation point at the screen center. The missed trial was reinserted among the remaining trials and repeated later.

Participants completed five 1-h training sessions. In the first training session, they were familiarized with an illusory square created by a high-contrast frame (20%), low noise SD (5%), and a long stimulus duration (1000 ms). The frame contrast was then lowered to 4% and the stimulus duration shortened to 106.7 ms. Both illusory and real squares were used during the training sessions. Observers were trained to respond within 300 ms of the response cue. Noise SD was varied to keep the detectability index (d′) below 2 at the longest response-cue lag. After training, each observer participated in 10 sessions, the results of which were used in the data analysis. At least 100 trials per experimental condition were collected at each response lag for each participant.

In Experiment 1, stimuli containing an illusory square were embedded in visual noise of 3 different SDs. In Experiment 2, noise SD was fixed and real decremental squares of 3 contrast levels (−0.3, −0.5, and −0.7%) were presented. Trials with illusory and real squares were presented in separate blocks. The response cue was presented at 6 different lags between 120 and 907 ms after stimulus onset, and the lag was varied randomly across trials. In both experiments, the detectability index (d′) was measured as a function of processing time (lag plus mean response latency).

Classification image experiment

The experimental conditions were similar to those used in Experiments 1 and 2. The first condition contained trials with a randomly presented illusory target or non-target; the second condition included trials with a randomly presented real target or non-target. Observers indicated which stimulus was presented without time constraints (no response cues were used). In each experimental condition, 10,000 trials were presented in ten sessions. The noise samples had an SD of 14% (KR) and 12% (MSM); the contrast of the real squares was −0.3%.

Reaction time experiment

Reaction times were measured under the same conditions as in Experiments 1 and 2. However, no response cue was presented, and the subjects were instructed to press the appropriate button as soon as they had detected the presence or absence of the target. Two of the observers (IH and JGI) took part in this experiment.

Data analysis

SAT experiments

The time course of accuracy (expressed in d′ units) for each experimental condition was computed at each response lag as the difference between the z score of the hit rate on target-present trials and the z score of the false-alarm rate on target-absent trials. The empirical SAT functions were fit with an exponential function (Dosher, 1976, 1979; McElree, 1993; McElree & Carrasco, 1999):

$$d'(t) = \begin{cases} \lambda\left(1 - e^{-\beta (t - \delta)}\right), & t > \delta, \\ 0, & \text{otherwise}, \end{cases} \qquad (2)$$

where λ is the asymptote parameter, corresponding to detectability at maximum processing time; δ is the x-intercept parameter, reflecting the discrete point in time at which d′ departs from 0 (e.g., owing to sensory encoding, transmission, and motor response delays); and β is the rate parameter, indexing the speed with which detectability grows from chance to asymptote.
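The two computations described above, d′ from hit and false-alarm rates and the SAT function of Equation 2, can be sketched as follows. Times are assumed to be in milliseconds and β in ms⁻¹; the correction for hit or false-alarm rates of 0 or 1 is a common convention assumed here, not one reported in the paper, and the function names are illustrative.

```python
from math import exp
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def d_prime(hits, n_signal, false_alarms, n_noise):
    """Yes/No d' = z(hit rate) - z(false-alarm rate).

    Rates of 0 or 1 are moved to 1/(2N) or 1 - 1/(2N); this correction is
    an assumed convention, since the paper does not specify one.
    """
    h = min(max(hits / n_signal, 0.5 / n_signal), 1 - 0.5 / n_signal)
    f = min(max(false_alarms / n_noise, 0.5 / n_noise), 1 - 0.5 / n_noise)
    return _z(h) - _z(f)


def sat(t, lam, beta, delta):
    """Equation 2: lam * (1 - exp(-beta * (t - delta))) for t > delta, else 0."""
    if t <= delta:
        return 0.0
    return lam * (1.0 - exp(-beta * (t - delta)))
```

For example, with hypothetical parameters λ = 2, β = 0.01 ms⁻¹, and δ = 300 ms, `sat(400, 2.0, 0.01, 300)` returns 2(1 − e⁻¹) ≈ 1.26: about 63% of the asymptote is reached 1/β = 100 ms after the intercept.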

SAT data were fit with a set of nested models ranging from a null model, in which all functions were fit with a single asymptote (λ), x-intercept (δ), and rate (β) (1λ–1β–1δ), to a fully saturated model, in which each function was fit with a unique set of parameters (3λ–3β–3δ).

The quality of fit was assessed by the corrected Akaike's Information Criterion (AICc), which was calculated using the following equation (Burnham & Anderson, 2002, pp. 60–85):

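$$\mathrm{AIC_c} = n \ln\left(\frac{\sum_{i=1}^{n} \left(\alpha_i - \alpha_i^{\mathrm{est}}\right)^2}{n}\right) + 2K + \frac{2K(K+1)}{n - K - 1} \qquad (3)$$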

where $\alpha_i$ are the data values, $\alpha_i^{\mathrm{est}}$ are the model predictions, n is the number of data points, and K is the number of free parameters plus one.

The AICc approach is based on information theory and does not use the traditional “hypothesis testing” statistical paradigm; rather, it determines how well the data support each model. The model with the smallest AICc value is the most likely to be correct. If $A_a$ and $A_b$ are the AICc values for models a and b, respectively, and $A_a < A_b$ (so that $\Delta = A_b - A_a > 0$), Akaike's weight,

$$\text{Akaike's weight} = \frac{1}{1 + e^{-0.5\Delta}}, \qquad (4)$$

gives the probability that model a is correct.

The evidence ratio, defined as

$$\text{Evidence ratio} = \frac{1}{e^{-0.5\Delta}} = e^{0.5\Delta}, \qquad (5)$$

shows how many times more likely model a is than model b.
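Equations 3–5 can be computed directly, as in the following sketch; the function names are illustrative. The weight and the evidence ratio take the AICc of the better model first, so that Δ is positive.

```python
from math import exp, log


def aicc(observed, predicted, n_free_params):
    """Equation 3; K is the number of free parameters plus one."""
    n = len(observed)
    k = n_free_params + 1
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return n * log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)


def akaike_weight(aicc_better, aicc_worse):
    """Equation 4: probability that the better (lower-AICc) model is correct."""
    delta = aicc_worse - aicc_better  # positive by construction
    return 1.0 / (1.0 + exp(-0.5 * delta))


def evidence_ratio(aicc_better, aicc_worse):
    """Equation 5: how many times more likely the better model is."""
    return exp(0.5 * (aicc_worse - aicc_better))
```

With Δ = 5.27 (observer KR, 1λ–1β–1δ versus 3λ–1β–1δ in Table 1), `akaike_weight` returns about 0.933 and `evidence_ratio` about 13.9, consistent with the tabulated 93.3% and 14. Assuming one d′ value per condition and lag, the 3λ–1β–1δ fit has n = 18 points (3 conditions × 6 lags) and five free parameters, so K = 6.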

An additional criterion for choosing among the fits was the consistency of the parameter estimates across observers (McElree & Carrasco, 1999).

Classification image experiment

Classification images were calculated by sorting the noise samples according to stimulus (S1, target; S0, no target) and response (R1, target; R0, no target), averaging them within each stimulus–response category, and combining the averages using the following equation (Ahumada, 2002; Ahumada & Lovell, 1971):

where $k_{ij}$ represents the kernel element (i, j), m is the kernel size, $n_{00}$, $n_{01}$, $n_{10}$, and $n_{11}$ denote the number of trials in each stimulus–response category, and $\sigma_N$ is the standard deviation of the external noise.

Classification images for an ideal observer were calculated using a hypothetical observer that applies a statistically optimal decision rule (Green & Swets, 1966). To this end, each noisy stimulus was cross-correlated with the luminance profiles of the target and the non-target. The ideal observer decided that the target or the non-target had been presented according to whether the difference between the cross-correlations of the stimulus with the target and with the non-target was higher or lower than a fixed criterion, respectively. Using a simulation of 10,000 trials for each experimental condition (illusory and real squares), classification images of the ideal observer were calculated by means of Equation 6.
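Equation 6 itself is not reproduced in this text, so the sketch below uses the common unnormalized form of the Ahumada classification image: the mean noise of the two “target” response categories minus the mean noise of the two “no target” categories. The normalization by trial counts, kernel size, and σN in Equation 6 is left out. The ideal-observer rule is likewise a toy version that assumes stimulus profiles expressed relative to the background, and the function names are hypothetical.

```python
def classification_image(noise_fields, stimuli, responses):
    """Unnormalized classification image (Ahumada-style).

    noise_fields: flattened noise samples, one list per trial.
    stimuli, responses: 1 = target, 0 = no target, one entry per trial.
    Returns mean(S0R1) + mean(S1R1) - mean(S0R0) - mean(S1R0), pixel by pixel.
    """
    npix = len(noise_fields[0])
    sums = {(s, r): [0.0] * npix for s in (0, 1) for r in (0, 1)}
    counts = {(s, r): 0 for s in (0, 1) for r in (0, 1)}
    for noise, s, r in zip(noise_fields, stimuli, responses):
        counts[(s, r)] += 1
        acc = sums[(s, r)]
        for i, v in enumerate(noise):
            acc[i] += v

    def mean(key, i):
        # Empty categories contribute zero rather than raising an error.
        return sums[key][i] / counts[key] if counts[key] else 0.0

    return [mean((0, 1), i) + mean((1, 1), i) - mean((0, 0), i) - mean((1, 0), i)
            for i in range(npix)]


def ideal_response(stimulus, target, nontarget, criterion=0.0):
    """Template matching: respond 1 (target) when the zero-lag cross-correlation
    with the target exceeds that with the non-target by more than criterion."""
    c_target = sum(x * y for x, y in zip(stimulus, target))
    c_nontarget = sum(x * y for x, y in zip(stimulus, nontarget))
    return 1 if c_target - c_nontarget > criterion else 0
```

Run on simulated trials in which the target differs from the non-target only at a few pixels, the ideal observer's classification image is expected to be positive at those pixels and close to zero elsewhere, i.e., it recovers the difference between the two templates.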

The probability that each pixel of the classification images differed from zero was estimated from the ratio of the pixel value to its standard error (t-value) and the t-distribution. To correct for multiple comparisons, we employed an adaptive procedure for controlling the false discovery rate (Benjamini, Krieger, & Yekutieli, 2006), using a Matlab function written by David Groppe (http://www.mathworks.com/matlabcentral/fileexchange/27423-two-stage-benjamini-krieger-yekutieli-fdr-procedure). This procedure is less conservative and more powerful than the Bonferroni correction for multiple comparisons.
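The two-stage procedure of Benjamini, Krieger, and Yekutieli (2006) can be sketched as follows: a Benjamini–Hochberg pass at level q/(1 + q) estimates the number of true null hypotheses, and a second pass at a correspondingly raised level makes the final decisions. This is a sketch of the published definition, not a port of the Matlab function cited above; the per-pixel p-values are assumed to have been computed from the t-values beforehand, and the function names are illustrative.

```python
def bh_reject(pvals, alpha):
    """Benjamini-Hochberg step-up procedure at level alpha; returns booleans."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0  # largest rank k with p_(k) <= k * alpha / m
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def bky_two_stage(pvals, q=0.05):
    """Two-stage adaptive FDR procedure (Benjamini, Krieger, & Yekutieli, 2006)."""
    m = len(pvals)
    q1 = q / (1 + q)
    first = bh_reject(pvals, q1)  # stage 1: BH at the reduced level q / (1 + q)
    r1 = sum(first)
    if r1 == 0 or r1 == m:
        return first
    m0_hat = m - r1  # estimated number of true null hypotheses
    return bh_reject(pvals, q1 * m / m0_hat)  # stage 2: BH at q1 * m / m0_hat
```

For p-values of 0.01, 0.02, 0.03, and 0.04 at q = 0.05, all four are rejected, whereas a Bonferroni threshold of 0.05/4 = 0.0125 would reject only the first, which illustrates why the adaptive procedure is less conservative.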

Participants

Four right-handed volunteers (Oldfield, 1971) took part in the experiments. The observers (3 females and 1 male) were 27–40 years old and had normal visual acuity. All subjects except MSM (who is also an author) were naive to the aim of the study.

Results

Experiment 1: SAT functions for detecting illusory squares

Figure 2 depicts the empirical SAT data for illusory squares embedded in three noise levels, which were selected individually for each subject to produce near-threshold target detection. Asymptotic performance increased as noise SD decreased. The average d′ values of the empirical asymptotic accuracy [the average accuracy at the two longest lags (McElree & Carrasco, 1999)] were 1.99, 1.29, and 0.98 for the low, medium, and high noise levels, respectively. The Shapiro–Wilk test did not show a significant departure from normality, which allowed us to use ANOVA; it revealed a main effect of noise SD on asymptotic accuracy (F(2,9) = 26.9, p < 0.001). For each subject, we evaluated the significance of the differences in d′ between two experimental conditions using binomial statistics (Macmillan & Creelman, 2005). Pair-wise comparisons showed that the asymptotic accuracies of subjects IH, JGI, and MSM at the medium and low noise levels were significantly higher (p < 0.05, Bonferroni correction) than those at the high noise level, while the differences in asymptotic accuracy for subject KR were not significant.
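The comparison of two d′ values can be sketched with the standard large-sample approximation to the variance of d′ described by Macmillan and Creelman (2005). Whether this is exactly the test used here is an assumption, and the hit rates, false-alarm rates, and trial counts in any call are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution


def d_prime_and_variance(hit_rate, fa_rate, n_signal, n_noise):
    """Yes/No d' and its approximate sampling variance (rates strictly in (0, 1))."""
    zh, zf = _N.inv_cdf(hit_rate), _N.inv_cdf(fa_rate)
    var = (hit_rate * (1 - hit_rate)) / (n_signal * _N.pdf(zh) ** 2) \
        + (fa_rate * (1 - fa_rate)) / (n_noise * _N.pdf(zf) ** 2)
    return zh - zf, var


def compare_d_primes(cond_a, cond_b):
    """Each condition: (hit_rate, fa_rate, n_signal, n_noise).

    Returns the z statistic for d'_a - d'_b and its two-tailed p-value.
    """
    d1, v1 = d_prime_and_variance(*cond_a)
    d2, v2 = d_prime_and_variance(*cond_b)
    z = (d1 - d2) / sqrt(v1 + v2)
    p = 2 * (1 - _N.cdf(abs(z)))
    return z, p
```

Each resulting p-value would then be compared with 0.05 divided by the number of pair-wise comparisons (Bonferroni correction).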

Figure 2

Accuracy as a function of processing time (response-cue lag plus response latency) for illusory squares presented at the different noise SDs shown in the insets (open markers). Curves show the best fits of Equation 2 with the 3λ–1β–1δ model. Filled markers show RTs for correct detection of illusory stimuli in uncued conditions (abscissa = median RT; ordinate = the accuracy level associated with the median RT). Data of all observers are shown.

The filled markers in Figure 2 (subjects IH and JGI) show the median RTs for correct detection of the illusory square in the uncued condition and the corresponding d′ values. The Shapiro–Wilk test showed that the RTs in 5 of the 6 data sets were not normally distributed. Therefore, we used a non-parametric method to test the equality of population medians across conditions. The Kruskal–Wallis ANOVA did not find a significant effect of noise SD on the RTs of subject IH [median ± median absolute deviation: 374 ± 29.9 ms (low SD), 372 ± 32.1 ms (medium SD), 376 ± 22.0 ms (high SD)] or subject JGI [367.5 ± 29.5 ms (low SD), 374 ± 25.8 ms (medium SD), 387 ± 27.9 ms (high SD)].
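The median ± median absolute deviation summaries and the Kruskal–Wallis test can be sketched with the Python standard library alone. The sketch assumes exactly three conditions (the three noise SDs), applies the usual correction for tied ranks, and refers H to a chi-square distribution with 2 degrees of freedom, whose survival function is exactly exp(−H/2); the median absolute deviation is left unscaled, which may differ from the convention used for the values above. The function names are illustrative.

```python
from math import exp
from statistics import median


def median_abs_dev(xs):
    """Median absolute deviation from the median (unscaled)."""
    m = median(xs)
    return median(abs(x - m) for x in xs)


def kruskal_wallis_3(groups):
    """Tie-corrected Kruskal-Wallis H for exactly three groups.

    H is referred to a chi-square distribution with 2 df, whose survival
    function is exp(-H / 2). Data that are all tied are not handled.
    """
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    pooled = sorted((x, g) for g, xs in enumerate(groups) for x in xs)
    n = len(pooled)
    rank_sums = [0.0, 0.0, 0.0]
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    h = 12 / (n * (n + 1)) * sum(r * r / len(xs) for r, xs in zip(rank_sums, groups))
    h -= 3 * (n + 1)
    h /= 1 - tie_term / (n ** 3 - n)
    h = max(h, 0.0)  # guard against tiny negative values from rounding
    return h, exp(-h / 2)
```

In use, `groups` would hold the per-trial RTs of correct detections for the low, medium, and high noise SDs of one observer.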

SAT data were fit with sets of nested models as described in the Methods section. The quality of fit of the nested models was compared using Akaike's method (Table 1). For all observers, the 3λ–1β–1δ model produced the smallest AICc: −38.2 (IH), −37.0 (KR), −49.2 (JGI), and −51.1 (MSM). The differences between the AICc values of the other models and those of the 3λ–1β–1δ model ranged from 5.27 to 39.1 (evidence ratio ≥ 14; Akaike's weight ≥ 93.3%). According to Akaike's method, these findings indicate that the 3λ–1β–1δ model is at least 14 times more likely to be correct than any of the other models. Additionally, the λ values estimated with the 3λ–1β–1δ model were identically ordered for all observers.

Table 1

Best-fitting values of the parameters of sets of models (Equation 2), which were used to fit the empirical data for illusory squares presented in Figure 2. Model comparison is based on the corrected Akaike's Information Criterion (AICc). AW, Akaike's weight; ER, evidence ratio.

Subject   Measure   Model (λ–β–δ)
                    1–1–1      3–1–1   3–1–3   3–3–1   3–3–3
IH        AICc      −17.6      −38.2   −28.5   −27.3   −10.1
          ΔAICc     20.6       –       9.76    10.9    28.1
          AW (%)    >99.9      –       99.2    99.6    >99.9
          ER        3 × 10^4   –       132     234     108
JGI       AICc      −18.2      −49.2   −39.6   −39.8   −22.8
          ΔAICc     31.03      –       9.65    9.49    26.5
          AW (%)    >99.9      –       99.2    99.1    >99.9
          ER        6 × 10^6   –       124     115     6 × 10^5
KR        AICc      −31.7      −37     −27.6   −28.5   −9.57
          ΔAICc     5.27       –       9.42    8.52    27.5
          AW (%)    93.3       –       99.1    98.6    >99.9
          ER        14         –       111     71      9 × 10^5
MSM       AICc      −12.0      −51.1   −39.7   −41.2   −23
          ΔAICc     39.1       –       11.4    9.88    28.2
          AW (%)    >99.9      –       99.7    99.3    >99.9
          ER        3 × 10^8   –       300     140     1.3 × 10^6

Experiment 2: SAT functions for detecting luminance-defined squares

In this experiment, the noise SD was selected for each observer to produce an asymptotic accuracy of about 1 d′ unit for detecting the illusory target: 10% (JGI), 12% (IH, MSM), and 14% (KR). Real decremental squares of three contrast levels (−0.3, −0.5, and −0.7%) were embedded in luminance noise and surrounded by an incremental frame (Figure 1a).

We found that the asymptotic accuracy increased as the magnitude of the real target contrast increased (Figure 3). The average d′ values of the empirical asymptotic accuracy were 1.56, 2, and 2.17 for target contrasts of −0.3, −0.5, and −0.7%, respectively. The Shapiro–Wilk test showed that the data did not significantly depart from normality. ANOVA found a significant effect of target contrast on the empirical asymptotic accuracy averaged across subjects (F(2,9) = 4.28, p < 0.05). Pairwise comparisons using binomial statistics (Macmillan & Creelman, 2005) showed that the asymptotic accuracies of subjects IH and KR at target contrasts of −0.5 and −0.7% were significantly higher (p < 0.05, Bonferroni corrected) than those at −0.3% contrast. For subjects JGI and MSM, the differences in asymptotic accuracy were not significant.


Figure 3

Accuracy as a function of processing time (lag of the response cue plus response latency) for real squares of three contrast levels, shown in the insets (open markers). Filled markers show RTs for correct detection of real stimuli in uncued conditions (abscissa = median RT; ordinate = the accuracy level associated with the median RT). Curves show the best fits of Equation 2 with the 3λ–1β–1δ model. Data of all observers are shown.

Akaike's analysis (Table 2) showed that the 3λ–1β–1δ model produced the smallest AICc values for all observers: −35.2 (IH), −49.7 (JGI), −41.3 (KR), and −49 (MSM). The differences between the AICc values of the other models and that of the 3λ–1β–1δ model ranged from 6.14 to 37.4 (evidence ratio ≥ 21.5; Akaike's weight ≥ 95.6%). Thus, the 3λ–1β–1δ model is at least 21.5 times more likely to be correct than any of the other models. The λ values estimated with the 3λ–1β–1δ model were ordered identically for all observers.


Table 2

Best-fitting values of the parameters of sets of models (Equation 2), which were used to fit the empirical data of real squares presented in Figure 3. The other designations are the same as in Table 1.

| Subject | Measure | 1–1–1 | 3–1–1 | 3–1–3 | 3–3–1 | 3–3–3 |
|---|---|---|---|---|---|---|
| IH | AICc | −13.3 | −35.2 | −23.4 | −25.2 | −6.26 |
| | ΔAICc | 21.9 | | 11.8 | 9.96 | 28.9 |
| | AW (%) | >99.9 | | 99.7 | 99.3 | >99.9 |
| | ER | 6 × 10⁴ | | 372 | 145 | 2 × 10⁶ |
| JGI | AICc | −43.5 | −49.7 | −37.4 | −37.8 | −21.9 |
| | ΔAICc | 6.14 | | 12.3 | 11.9 | 27.8 |
| | AW (%) | 95.6 | | 99.8 | 99.7 | >99.9 |
| | ER | 21.5 | | 469 | 377 | 10⁶ |
| KR | AICc | −15.5 | −41.3 | −30.3 | −31.3 | −20.4 |
| | ΔAICc | 25.8 | | 11.01 | 10.06 | 20.9 |
| | AW (%) | >99.9 | | 99.6 | 99.4 | >99.9 |
| | ER | 4 × 10⁵ | | 245 | 153 | 3 × 10⁴ |
| MSM | AICc | −11.7 | −49 | −38.7 | −39 | −19.6 |
| | ΔAICc | 37.4 | | 10.27 | 10 | 29.4 |
| | AW (%) | >99.9 | | 99.4 | 99.3 | >99.9 |
| | ER | 10⁸ | | 170 | 148 | 3 × 10⁶ |

Comparison of processing dynamics of illusory and real squares

The best-fitting values of the parameters [asymptote (λ), processing speed (β), and x-intercept (δ)] of the models used to fit the SAT functions for real and illusory images are shown in Table 3. The Shapiro–Wilk test found that the estimated speed (β), intercept (δ), and asymptote (λ) parameters did not significantly depart from normality. The speed parameters showed significantly slower processing for illusory than for real squares (paired t-test, p < 0.005). The mean (±SD) difference between the speed parameters (in 1/β units) for illusory and real squares was 32 ± 5.5 ms [25.2 (IH), 32.7 (JGI), 38.5 (KR), and 30.1 (MSM) ms]. The intercept parameters for the two stimuli were not significantly different: the mean (±SD) difference was −2.2 ± 8.2 ms [−10.5 (IH), −1.2 (JGI), 8.7 (KR), and −5.6 (MSM) ms]. We also combined, for each subject, the intercept (δ) and the time interval (1/β) within which accuracy grows from chance to a fixed level (63%) of the asymptotic accuracy into a composite measure of processing dynamics (McElree & Carrasco, 1999). This composite measure showed that processing of illusory squares was significantly slower than that of real squares (paired t-test, p < 0.05), by 14.6 (IH), 31.5 (JGI), 47.3 (KR), and 25.3 (MSM) ms (mean ± SD: 29.7 ± 13.6 ms).
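Because each subject contributes one value per condition, a paired t-test reduces to a one-sample t-test of the per-subject differences against zero. The sketch below applies this to the differences listed in the paragraph above. It assumes a two-sided test (the text does not state sidedness) and uses the closed-form Student t distribution function for 3 degrees of freedom, so it is restricted to four subjects; the function name is hypothetical and this is not the authors' code. Since the reported differences are rounded, the recomputed means and SDs match the reported ones only up to rounding.

```python
import math


def paired_t_test_four_subjects(differences):
    """Two-sided paired t-test on per-subject differences (illusory minus real).

    A paired t-test is a one-sample t-test of the differences against zero.
    The p-value uses the closed-form Student t CDF for 3 degrees of freedom,
    so exactly four differences are required.
    """
    n = len(differences)
    if n != 4:
        raise ValueError("closed-form p-value is only valid for df = 3 (four subjects)")
    mean = sum(differences) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in differences) / (n - 1))
    t = mean / (sd / math.sqrt(n))
    # For df = 3: P(|T| < |t|) = (2 / pi) * (theta + sin(theta) * cos(theta)),
    # where theta = atan(|t| / sqrt(3)).
    theta = math.atan(abs(t) / math.sqrt(3))
    p_two_sided = 1 - (2 / math.pi) * (theta + math.sin(theta) * math.cos(theta))
    return mean, sd, t, p_two_sided


# Per-subject differences (ms) for IH, JGI, KR, and MSM, as reported in the text.
speed_diff = [25.2, 32.7, 38.5, 30.1]      # 1/beta, illusory minus real
intercept_diff = [-10.5, -1.2, 8.7, -5.6]  # delta, illusory minus real
composite_diff = [14.6, 31.5, 47.3, 25.3]  # delta + 1/beta, illusory minus real

for label, diffs in [("1/beta", speed_diff), ("delta", intercept_diff),
                     ("delta + 1/beta", composite_diff)]:
    mean, sd, t, p = paired_t_test_four_subjects(diffs)
    print(f"{label}: {mean:.1f} +/- {sd:.1f} ms, t(3) = {t:.2f}, p = {p:.4f}")
```

With these inputs, the speed and composite differences fall below the reported significance thresholds, whereas the intercept difference does not, in line with the results above.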

The asymptotic accuracy increased as the contrast of the square increased in magnitude from 0 (illusory square) to −0.7% at a fixed noise SD for each subject [10% (JGI), 12% (IH, MSM), and 14% (KR)]. ANOVA showed a main effect of contrast across subjects (F(3,12) = 9.2, p < 0.005). A post hoc Tukey HSD test found that the across-subject mean asymptotic accuracies for real squares of −0.5% (2.06) and −0.7% (2.24) contrast were significantly higher (p < 0.05 and p < 0.005, respectively) than that for illusory squares (1.38; zero contrast).

To compare the processing speed for detecting real squares of various contrast levels, including zero contrast (an illusory square), at a fixed noise SD, we analyzed the best-fitting parameter values of the 3λ–3β–3δ model. The mean 1/β value (Figure 4, black circles) decreased, that is, processing became faster, as the contrast of the square increased in magnitude from 0 to −0.7%. ANOVA found a main effect of target contrast level (F(3,12) = 5.2, p < 0.05). A post hoc Tukey HSD test showed that processing of real squares of −0.5 and −0.7% contrast (mean 1/β: 104 and 102 ms) was significantly faster (p < 0.05) than that of illusory squares of zero contrast (151 ms). Processing of real squares of −0.3% contrast (119 ms) was faster than, but not significantly different from, that of illusory squares. The intercept parameter (δ) showed no significant main effect of target contrast.

The RTs for detecting illusory and real stimuli in an uncued condition were not significantly different (Kruskal–Wallis test). However, the error rates (misses) averaged across subjects increased as stimulus strength decreased, where stimulus strength was determined by the noise SD for illusory images and by stimulus contrast for real images. Because the error rates were normally distributed, we tested the effect of stimulus strength with a one-way ANOVA, which was significant (F(2,9) = 4.48, p < 0.05). A post hoc Tukey HSD test found that the mean error rate in the easiest condition (illusory squares in low noise SD and real squares of −0.7% contrast; 13 ± 5%) was significantly lower (p < 0.05) than in the more difficult conditions (illusory squares in medium noise SD and real squares of −0.5% contrast, 17 ± 10%; illusory squares in high noise SD and real squares of −0.3% contrast, 27 ± 14%).

Experiment 3: Classification images for real and illusory squares

The classification images for detecting a decremental square surrounded by an incremental frame are shown in Figure 5a (upper row). The brightness of each pixel indicates the correlation between the noise contrast at that pixel and the observer's judgment that a real square was presented. The images in the lower row of Figure 5a show the pixels of the corresponding classification images that reached statistical significance (p < 0.05) after false discovery rate correction for multiple comparisons (Benjamini et al., 2006). Red pixels are significantly greater than zero; blue pixels are significantly less than zero. The left gray image in Figure 5a is the classification image of the ideal observer, which uses all available information about the stimuli. It shows that the ideal observer uses regions within the area of the real square (blue pixels in Figure 5a, left lower image). The ideal observer also uses a small number of pixels of incremental luminance at the edge of the real square, owing to differences between the luminance profiles of the frames in the target and non-target stimuli (red pixels in Figure 5a, left lower image). The classification images of observers KR and MSM show that they used mainly the central and lower regions of the real squares [blue pixels in Figure 5a, middle (KR) and right (MSM) lower images].
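The correlation-based classification image and its multiple-comparison correction can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the authors' pipeline: a hypothetical 4 × 4 noise field, a simulated observer that says "yes" when a hypothetical central target region looks darker, per-pixel Pearson correlations with the binary responses, two-sided p-values from the Fisher z approximation, and the classic Benjamini–Hochberg (1995) step-up procedure as a simpler stand-in for the adaptive procedure of Benjamini et al. (2006). It also ignores the separation of trials by stimulus class that a real analysis would use.

```python
import math
import random

GRID = 4                       # hypothetical 4 x 4 noise field
TARGET_PIXELS = {5, 6, 9, 10}  # hypothetical central 2 x 2 target region


def simulate_trials(n_trials, rng):
    """Simulated observer: says 'yes' when the target region looks darker than a noisy criterion."""
    fields, responses = [], []
    for _ in range(n_trials):
        field = [rng.gauss(0.0, 1.0) for _ in range(GRID * GRID)]
        evidence = -sum(field[i] for i in TARGET_PIXELS) + rng.gauss(0.0, 2.0)
        fields.append(field)
        responses.append(1 if evidence > 0 else 0)
    return fields, responses


def classification_image(fields, responses):
    """Per-pixel Pearson correlation between noise contrast and 'yes' responses."""
    n = len(responses)
    mean_r = sum(responses) / n
    ss_r = sum((r - mean_r) ** 2 for r in responses)
    image = []
    for pix in range(len(fields[0])):
        xs = [f[pix] for f in fields]
        mean_x = sum(xs) / n
        cov = sum((x - mean_x) * (r - mean_r) for x, r in zip(xs, responses))
        ss_x = sum((x - mean_x) ** 2 for x in xs)
        image.append(cov / math.sqrt(ss_x * ss_r))
    return image


def correlation_p_value(r, n):
    """Two-sided p-value for a correlation via the Fisher z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure; returns a significance mask."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    n_rejected = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            n_rejected = rank  # largest rank that passes its threshold
    significant = [False] * m
    for idx in order[:n_rejected]:
        significant[idx] = True
    return significant


rng = random.Random(2024)
fields, responses = simulate_trials(2000, rng)
ci = classification_image(fields, responses)
p_vals = [correlation_p_value(r, len(responses)) for r in ci]
significant = benjamini_hochberg(p_vals, q=0.05)
```

In this toy, darker noise in the target region makes a "yes" more likely, so the target pixels come out with significant negative correlations, analogous to the blue pixels in Figure 5.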

The ideal observer's classification image for stimuli producing perception of an illusory square (Figure 5b, left column) had statistically significant pixels (p < 0.001) only within the gaps in the luminance profiles of the frames of the target-present and target-absent stimuli. The blue (red) patches represent areas of lower (higher) luminance in the target-present than in the target-absent stimuli. The observers' classification images for illusory squares [Figure 5b, middle (KR) and right (MSM) columns], however, show that their judgments were based on information from regions within the area of the illusory square: 94% (KR) and 98% (MSM) of the significant pixels [blue pixels in Figure 5b, middle (KR) and right (MSM) lower images] were located within the area of the illusory square.

The detectability index (d′) was 1.73 (KR) and 1.67 (MSM) for the real target of −0.3% contrast and 1.59 (KR) and 1.45 (MSM) for the illusory target. Binomial statistics (Macmillan & Creelman, 2005) showed that, for each subject, d′ for the real target was significantly higher than for the illusory target: KR, t = 3.54, p < 0.001; MSM, t = 5.39, p < 0.001.
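A common way to compare two independent d′ values with binomial statistics is to divide their difference by the square root of the summed sampling variances, each approximated from the hit rate H, false-alarm rate F, and trial counts in the large-sample form given by Macmillan and Creelman (2005). The sketch below assumes a yes/no task with d′ = z(H) − z(F); the hit rates, false-alarm rates, and trial counts are hypothetical because they are not reported in this section, and the function names are illustrative only.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1


def d_prime(hit_rate, fa_rate):
    """Yes/no sensitivity: d' = z(H) - z(F)."""
    return STANDARD_NORMAL.inv_cdf(hit_rate) - STANDARD_NORMAL.inv_cdf(fa_rate)


def d_prime_variance(hit_rate, fa_rate, n_signal, n_noise):
    """Large-sample variance of d' arising from binomial variability of H and F."""
    phi_h = STANDARD_NORMAL.pdf(STANDARD_NORMAL.inv_cdf(hit_rate))
    phi_f = STANDARD_NORMAL.pdf(STANDARD_NORMAL.inv_cdf(fa_rate))
    return (hit_rate * (1 - hit_rate) / (n_signal * phi_h ** 2)
            + fa_rate * (1 - fa_rate) / (n_noise * phi_f ** 2))


def compare_d_primes(cond_a, cond_b):
    """Test statistic and two-sided p for the difference between two independent d' values.

    Each condition is a tuple (hit_rate, fa_rate, n_signal_trials, n_noise_trials).
    """
    d_a, d_b = d_prime(*cond_a[:2]), d_prime(*cond_b[:2])
    se = math.sqrt(d_prime_variance(*cond_a) + d_prime_variance(*cond_b))
    z = (d_a - d_b) / se
    return d_a, d_b, z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical rates and trial counts, chosen only to illustrate the calculation.
real_square = (0.80, 0.20, 1000, 1000)
illusory_square = (0.76, 0.24, 1000, 1000)
d_real, d_illusory, z, p = compare_d_primes(real_square, illusory_square)
```

Treating the two d′ values as independent is itself an assumption; it holds when they come from separate blocks of trials.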

Discussion

In the present study, we used the speed–accuracy trade-off approach to study the processing dynamics of luminance-modulated and illusory Kanizsa-type figures (Kanizsa, 1979) embedded in visual noise. The illusory figure was made by inward openings within an incremental diamond-shaped frame, which produced perception of an illusory dark square (Figure 1b). The real target was produced by presenting a decremental square within the area of the illusory square (Figure 1a). The target-absent stimulus (Figure 1c) contained the same frame, but with outward-oriented gaps at the locations of the square corners, which did not produce perception of an illusory square. This design allowed us to investigate illusory and real targets by modulating only the luminance of the square: the illusory square had the mean background luminance, whereas the real square was defined by a luminance modulation rather than by boundary contours. This experimental paradigm differs from the standard detection paradigm, which employs discrimination between a luminance-modulated stimulus and a uniform background. The presence of the frame might have affected the observers' strategy for detecting the real target. For example, observers could base their judgments for target-present and target-absent stimuli on the shape of the frame gaps. The classification images for detecting real squares (Figure 5a), however, show that observers used regions of the target surface rather than the frame gaps. In addition, both observers reported that they based their judgments on the presence or absence of an apparent square. Central fixation and the short presentation time (100 ms) did not allow the observers to move their eyes toward the frame gaps.

Using a task in which observers detected a uniform decremental disk of 1.36 deg embedded in annular ring noise, Shimozaki, Eckstein, and Abbey (2005) found that the amplitude of the classification images was maximal at the border (with a peak at the border and a trough in the surround) and decreased rapidly toward the stimulus center. Similar results were reported by Kurki, Peromaa, Hyvärinen, and Saarinen (2009), who used similar stimuli and a brightness matching task. In our classification images for real squares, clusters of significant pixels were positioned around some of the square borders, but such clusters were also present within the region of the square. These differences could be due to our use of 2D pixel noise, whereas the above studies used annular ring noise. As Shimozaki et al. (2005) noted, the smaller non-local effects relative to the local effects could reflect a strategy of weighting information at the edge of the disk more heavily than in the signal area, because the number of ring pixels increases with increasing radius.

The ideal observer's classification image (Figure 5b, left images) showed that if the observer's strategy for detecting an illusory square were based on the luminance profiles of the frames in target-present and target-absent stimuli, only the frame gaps would contribute to the observer's judgments. In contrast, the classification images for detecting illusory squares revealed that both observers used regions of the illusory square whose luminance was equal to the background luminance. Thus, the real observers used a completely different detection strategy from that of the ideal observer. These results agree with the observers' reports that they based their decisions on the presence or absence of a darker illusory square. Using a classification image technique, Dakin and Bex (2003) reported similar filling-in effects for Craik–O'Brien–Cornsweet stimuli, which contained low spatial frequency components produced by filtering with a Laplacian-of-Gaussian filter. They suggested that amplification of the low spatial frequency structure of the image produced filling-in of the illusory image. Note, however, that using annular ring noise, Kurki et al. (2009) found that the classification images for brightness matching of a Craik–O'Brien–Cornsweet stimulus contained a peak at the border of the surface, which attenuated rapidly toward the center of the stimulus.

The SAT data for real squares of different contrast levels were best fit by an exponential function (Equation 2) whose asymptote increased as stimulus contrast increased, while the intercept and rate parameters did not change (3λ–1β–1δ model; Table 3). The SAT data for illusory squares were also well described by the 3λ–1β–1δ model. The points in time (x-intercepts) at which accuracy departs from chance were not significantly different for illusory and real squares. However, the illusory image was processed more slowly (by 32 ms in 1/β units) than the real image. The composite measure of processing dynamics (δ + 1/β) was also significantly slower, by about 30 ms, for illusory than for real squares.
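The role of the three parameters can be made explicit with a short derivation. It assumes that Equation 2 (given in the Methods section, not reproduced here) has the standard exponential approach-to-limit form used in SAT studies (e.g., McElree & Carrasco, 1999), which is consistent with the description of 1/β as the time for accuracy to grow from chance to 63% of asymptote. Under the 3λ–1β–1δ model, each condition c has its own asymptote λ_c but shares β and δ:

```latex
d'_c(t) =
\begin{cases}
  \lambda_c \left(1 - e^{-\beta (t - \delta)}\right), & t > \delta, \\
  0, & t \le \delta,
\end{cases}
\qquad c = 1, 2, 3.

d'_c\!\left(\delta + \tfrac{1}{\beta}\right)
  = \lambda_c \left(1 - e^{-1}\right)
  \approx 0.632\, \lambda_c .
```

Under this assumption, δ + 1/β is the time at which accuracy reaches about 63% of its asymptote regardless of λ_c, which is why the composite measure compares processing dynamics independently of differences in asymptotic accuracy.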

It should be emphasized that the illusory square was produced by inward openings within an incremental diamond-shaped frame without luminance modulation (zero contrast), whereas the real target additionally contained a decremental luminance square of low contrast (−0.3, −0.5, or −0.7%). This raises the question of whether there is a sharp division between the processing speeds for detecting illusory and real squares at a fixed level of luminance noise SD. According to the best-fitting parameter values of the 3λ–3β–3δ model (Figure 4), processing was significantly faster for real targets of −0.5 and −0.7% contrast than for the illusory square (zero contrast). Processing of a real target of −0.3% contrast was also faster (by 32 ms in 1/β units) than that of the illusory square, but this difference was not significant, which may be due to an insufficient number of subjects and/or experimental trials. Alternatively, the luminance-modulated square of −0.3% contrast may have been too weak to affect processing speed significantly, compared with the effect of the incremental Kanizsa-type frame surrounding the real target.

We found that the RTs for detecting illusory and real stimuli in an uncued condition were not significantly different despite the different levels of accuracy. This finding could be explained by a speed–error trade-off: rather than responding more slowly to stimuli of lower detectability, observers made more errors on them.

In the present study, we used a detection task based on information about the image surface, as shown by the classification images for detecting illusory and real squares (Figure 5). This task differs from the shape discrimination task used to investigate the perception of illusory figures (Ringach & Shapley, 1996), which is based on information about the illusory boundary contours (Gold et al., 2000; Gold & Shubel, 2006). The detection of surface regions of illusory figures and the discrimination of figure shape could be two separate processes (Barlasov-Ioffe & Hochstein, 2008). The processes extracting global surface regions of illusory images have been suggested to involve higher visual areas responsible for filling-in of the image surface: (1) the lateral occipital complex (LOC; Mendola et al., 1999; Stanley & Rubin, 2003) and (2) the dorsal visual regions (Murray, Wylie et al., 2002; Shpaner et al., 2009). Both neural structures have large receptive fields, which could respond optimally to large image surfaces. Large luminance-defined images are assumed to be processed by the fast magnocellular dorsal pathway (DeYoe & Van Essen, 1988). Therefore, the detection of both illusory and luminance-defined images could be based on activation of similar fast magnocellular neural structures within dorsal visual regions. The lack of a significant difference between illusory and real stimuli in the point in time at which information first becomes available for decision making (a time that includes sensory encoding, transmission, and motor response delays) supports the suggestion that a single visual pathway underlies the detection of surface regions of illusory and luminance-modulated images. The slower speed for detecting illusory images as compared to luminance-defined images could be attributed to slower filling-in of the global regions of illusory images within the dorsal pathway.

As discussed in the Introduction section, Imber et al. (2005) suggested that the late masking (SOAs of 280–350 ms) between illusory images might occur at the level of illusory surface representation. They also noted that masking between luminance-defined images occurred at short SOAs of less than 100 ms. These findings led to the prediction that SAT functions for illusory figures should be shifted to longer processing times than those for real images. We found that the x-intercepts of the SAT functions for both patterns were similar, which does not confirm this prediction. This suggests that the late masking could reflect processes of contour completion rather than of illusory surface representation. This suggestion could be tested by measuring SAT functions in a shape discrimination task, which might reveal a shift of the SAT functions for illusory contours to longer processing times compared with those for real contours.

The extraction of global surface information plays an important role in the rapid analysis of a visual scene. For example, the speed of Kanizsa figure detection in a visual search task is determined by the surface regions of illusory images rather than by their boundary contours (Conci et al., 2007). The results of the present study show that fragmented visual objects, which require binding of image fragments into a single percept, are detected with slower processing speed than luminance-defined objects. In complex visual scenes, this may slow the detection of visual objects whose retinal images contain incomplete object information.

Acknowledgments

We are grateful to both anonymous reviewers for their valuable suggestions and comments on previous versions of the manuscript.

Lee, T. S., & Nguyen, M. (2001). Dynamics of subjective contour formation in the early visual cortex. Proceedings of the National Academy of Sciences of the United States of America, 98, 1907–1911.