Object substitution is a type of visual masking that occurs when a briefly presented display is followed by several small dots that surround the location of a target image, but do not touch it. An example of a display sequence from a typical object substitution masking experiment is shown below. The visual display consists of a number of different shapes, with one shape (the circle) singled out by the four small dots that surround it. This entire display, including the four dots, is flashed briefly and the task of the observer (sometimes called the study participant) is to identify this shape. If the display terminates after a single brief flash of these shapes, identification accuracy of the shape surrounded by the four dots is nearly perfect. If, however, the shapes disappear but the dots remain visible for a little while longer, then identification accuracy is reduced, sometimes to the level of pure guessing. In the subjective experience of the observer, it is as if the square shape that is merely implied by the outline of the four dots has replaced the original target shape. (Note that chance accuracy is 1/5 in this example because there are five different shapes and the observer's task is to report the shape of one of them, the target.)
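The trial sequence described above can be sketched in code. The following Python fragment is only an illustration, not the original figure: the class name `Trial`, its field names, and the millisecond values in the usage lines are hypothetical and are not taken from any of the cited experiments. It also assumes that guessing means choosing one of the shape alternatives at random, which yields the 1/5 chance level mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """Hypothetical description of one four-dot masking trial."""
    n_shapes: int            # number of different shapes in the display
    display_ms: float        # duration of the brief flash (assumed value)
    trailing_mask_ms: float  # time the dots stay on after the shapes vanish

    def frames(self):
        """Return the trial as a list of (contents, duration_ms) frames."""
        seq = [("shapes + four dots", self.display_ms)]
        if self.trailing_mask_ms > 0:
            # Only the dots remain: the condition in which masking is reported.
            seq.append(("four dots only", self.trailing_mask_ms))
        return seq

    def chance_accuracy(self):
        """Accuracy expected from pure guessing among the shape alternatives."""
        return 1.0 / self.n_shapes


# Usage with made-up durations:
common_offset = Trial(n_shapes=5, display_ms=10.0, trailing_mask_ms=0.0)
trailing_mask = Trial(n_shapes=5, display_ms=10.0, trailing_mask_ms=300.0)
print(common_offset.frames())
print(trailing_mask.frames())
print(trailing_mask.chance_accuracy())
```

The two example trials differ only in whether a "four dots only" frame follows the flash, which is the manipulation that separates near-perfect identification from masked identification in the description above.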

Masking by four surrounding dots was first reported by Enns & Di Lollo (1997), who noted that it differed in several important ways from metacontrast masking, which is a reduction in visibility of a briefly presented target image that is followed by a second image that fits snugly around the contours of the target image, but does not touch it. A first point of difference was that the four dots did not act as a mask when the target location was known in advance. That is, unlike metacontrast masking, which is very effective when the target appears at the center of gaze, the four dots act as effective masks only when the observer is uncertain about the location of the target in advance of its appearance. More recent work has shown that object substitution masking can still occur if the location of the target is known in advance, provided that participants are engaged in a concurrent cognitively-demanding task (Dux, Visser, Goodhew & Lipp, 2010). This means that for the four dots to be an effective mask, attentional resources must be prevented from focussing on the target, either via spatial or other attentional manipulations.

A second difference from metacontrast masking is the lack of sensitivity to the proximity between target and masking contours. The strongest metacontrast masking occurs when the target and mask contours are adjacent to one another, and it falls off sharply in effectiveness as the distance between contours is increased. In contrast to this, masking of a target shape by four surrounding dots is insensitive to the distance between contours of the target and the four dots; what matters is simply that the target and mask are seen as occupying the same region of space (Di Lollo, Enns, & Rensink, 2000; Enns & Di Lollo, 1997). Moreover, when the target and four dots are presented in different spatial locations, the critical factor is whether or not the dots that remain on view are experienced as being the same dots as those that first appeared. If they are seen as the same dots, albeit transformed over time because of motion or a change in viewing distance, then strong masking occurs even when they occupy different locations (Lleras & Moore, 2003). Conversely, if they are not seen as the same dots, then masking is weakened even when the target and dots occupy the same locations. Four dot masking is also stronger for dots that move away from the center of gaze than for dots that move toward the center (Jiang & Chun, 2001).

A third difference from metacontrast masking was demonstrated by Chakravarthi and Cavanagh (2009). These authors used a ‘crowded’ display (Pelli & Tillman, 2008) in which flankers that are close in proximity to a peripherally-presented target obscure its visibility. Chakravarthi and Cavanagh (2009) reported that metacontrast masks applied to the distractor objects eliminated crowding (allowed the target to be perceived), whereas object substitution masks did not. This implies that object substitution masking has a later locus of suppression than either crowding or metacontrast masking.

Common onset masking

Common onset masking refers to the visual masking of one part of an image by another part of the same image that remains visible after the first part has been turned off (Bischof & Di Lollo, 1995; Cohene & Bechtoldt, 1974, 1975; Di Lollo, Bischof & Dixon, 1993). The specific combination of a four dot mask with common onset masking was introduced by Di Lollo et al. (2000), who used the results to argue that existing theories of visual masking were all incapable of accounting for the masking that occurred when four dots were presented at the same time as the target but remained on display even after the target display had been removed. The existing theories of masking were based on (1) the visual integration, and therefore perceptual confusion, of events occurring in close spatiotemporal proximity (Di Lollo, 1980; Kahneman, 1968), (2) the interruption of processing of one stimulus by a stimulus that followed it closely in time and space (Kolers, 1968; Spencer & Shuntich, 1970; Turvey, 1973), and (3) the reduction of visibility in one stimulus by competitive neural interactions with another stimulus (Breitmeyer & Ganz, 1976; Keysers & Perrett, 2002; Weisstein, Ozog, & Szoc, 1975).

Di Lollo et al. (2000) proposed a general theoretical framework to account for both masking by four dots and common onset masking, which they referred to as a reentrant theory of perception. They also offered a specific computational model of their experimental results, which they referred to as a computational model of object substitution, or CMOS. The central idea in both cases is that perception is based on the activity of modules, arranged in an anatomical hierarchy, and arrayed spatially over the visual field. Each module is conceptualized as a circuit comprising connections between a lower-order cortical area (e.g., primary visual area or V1) and a topographically related region in an extrastriate visual area (e.g., inferotemporal cortex or IT). The output of each module is a representation of the spatial pattern within its receptive field.

Perception emerges in this theory after one or more iterative exchanges between the low-level activity generated by the initial spatial pattern and the higher-level activity representing the objects that are familiar to the system. For example, given a brief display in which the target and the mask appear and disappear simultaneously, target processing can be based on the persistence in neural activity that follows a brief display. The fact that the spatial pattern of activity at the low-level decays uniformly means that there is no imbalance in activity between mask and target pattern representations, and observers are consequently able to identify the target accurately.

A reduction in target visibility (masking) occurs when there is a mismatch between the reentrant signal from the higher level and the ongoing activity at a lower level. This occurs when the target item is deleted shortly after it has appeared, leaving only the four-dot mask in the target location. The ongoing activity at the lower level then consists of an image of the mask, maintained by continued sensory input, and a decaying representation of the target at the higher level. Given this kind of conflict, what is perceived will depend on the number of iterations required to identify the target. If only a few iterations are required, conscious target identification may be completed before the target signal has faded completely. However, if more iterations are needed, a new perceptual hypothesis is formed that is consistent with the currently predominant low-level activity; and the ‘four dots alone’ percept replaces the ‘target shape plus dots’ percept.
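The dependence of the percept on the number of iterations can be illustrated with a toy simulation. The sketch below is not CMOS and does not reproduce any published model or data; it only assumes, for illustration, that low-level activity decays exponentially once a stimulus is removed, that each reentrant iteration takes a fixed time, and that substitution occurs if the target-to-mask activity ratio falls below a threshold before the required iterations are finished. The function name `percept` and all parameter values (`iteration_ms`, `tau_ms`, `ratio_threshold`) are hypothetical.

```python
import math


def percept(mask_trailing_ms, iterations_needed,
            iteration_ms=30.0, tau_ms=80.0, ratio_threshold=0.3):
    """Toy illustration of iteration-dependent substitution (not CMOS).

    Time zero is the offset of the target. The mask stays on for
    mask_trailing_ms after that moment (0 means common offset).
    """
    for k in range(1, iterations_needed + 1):
        t = k * iteration_ms
        # The target is gone, so its low-level activity decays from offset.
        target = math.exp(-t / tau_ms)
        if t < mask_trailing_ms:
            # The mask is still on view and fully supported by sensory input.
            mask = 1.0
        else:
            # The mask has also been removed and decays in the same way.
            mask = math.exp(-(t - mask_trailing_ms) / tau_ms)
        if target / mask < ratio_threshold:
            # Low-level activity is now dominated by the mask.
            return "four dots alone"
    # All required iterations finished while the target was still represented.
    return "target plus dots"


# Usage with made-up settings:
print(percept(mask_trailing_ms=0.0, iterations_needed=10))    # common offset
print(percept(mask_trailing_ms=300.0, iterations_needed=3))   # few iterations
print(percept(mask_trailing_ms=300.0, iterations_needed=4))   # more iterations
```

With a common offset the two activities decay together, so the ratio never changes and the toy rule always reports the target. With a trailing mask the outcome depends on whether the required iterations finish before the ratio crosses the assumed threshold, which is the qualitative pattern described in the text.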

Alternative interpretations

Critics of the reentrant theory of perception and the CMOS model have offered a number of alternative theoretical ideas to help explain object substitution masking. One possibility is that the masking produced by a portion of an initial display that remains visible is caused by the abrupt termination of the mask (Macknik, Martinez-Conde & Haglund, 2000). Offset transients in a visual display can cause a sharp burst of neural activity in cortical area V1 and these bursts have been shown to contribute to the degree of visibility reduction in metacontrast masking (Macknik & Livingstone, 1998). The authors of the reentrant theory have countered that although offset transients may play a role, they are not the primary influence in object substitution masking for two main reasons (Enns & Di Lollo, 2000). First, masking is very much in evidence when the mask stays on view for 300 ms or more, which is too long after the target presentation for a burst of neural activity in response to an offset to be effective in reducing visibility of a target that occurred so long ago. Second, whereas the offset-transient hypothesis predicts progressively weaker masking with increasing mask duration (Macknik & Livingstone, 1998), object substitution masking actually becomes progressively stronger with increasing mask duration (Di Lollo, von Mühlenen, Enns & Bridgeman, 2004).

Other critics have argued that rather than invoking reentrant processes to explain object substitution masking, it may be the relatively long duration of the mask that causes attention to be drawn to it rather than to the short-lived target at the same location (Francis & Hermens, 2002). The authors of the reentrant theory countered that this hypothesis was ruled out by experiments in which the mask was shown for an even longer period, beginning before the onset of the target display and outlasting it (Di Lollo, Enns, & Rensink, 2002). Although this modification had the effects of both increasing the overall duration of the mask and focusing attention directly on it, masking was nonetheless sharply reduced. The authors of the reentrant theory pointed out that it is the presence of the mask after the target that is critical, not its absolute duration relative to the target; and thus they interpreted this aspect of the time course of masking as support for reentrant processes in perception.

Since its original presentation, object substitution masking has been studied extensively through psychophysical experiments, computational modeling, and neurophysiological experiments on animals and people. Brief summaries and representative publications of this research follow.

Psychophysical experiments

Jiang and Chun (2001) explored the location specificity of object substitution masking, finding that some masking occurred even when targets and masks did not appear in the same spatial locations. In particular, masking was considerably stronger when the mask was eccentric (away from the center of gaze) than when it was presented closer to the center of gaze than the target. A variety of different masks were also compared, with all showing a similar pattern and all showing similar magnitudes of masking. The authors interpreted this as supporting the idea that masking is occurring at the level of type (substitution by another class of object) rather than at the level of token (substitution of one member of an object class by another member). This idea continues to be explored in studies varying the similarity between target and mask at the level of features (different tokens) and of objects (different types) (Gellatly, Pilling, Cole & Skarratt, 2006).

Lleras & Moore (2003) and Moore & Lleras (2005) conducted an examination of the role of object-level representations (as opposed to image-level representations) in object substitution masking. Their main finding was that masking occurs when apparent motion and color are used to maintain the perception of object continuity between the target and the mask displays. Specifically, masking was most effective when participants perceived the mask as representing the same object as the target, albeit after undergoing some changes in its featural description. Conversely, when object continuity was broken (i.e., when observers perceived mask and target stimuli as representing different objects), masking was much reduced, even though the target and mask were in the same spatial locations.

A comparison of four dot masking with noise, pattern, and metacontrast masking was reported by Enns (2004). These experiments also varied the experimental factors of display size, spatial cuing, and the temporal interval between spatial cue and the target-mask sequence. The main finding was that when attention was distributed widely (because of a large set of possible target items and no advance spatial cuing) then all types of mask reduced target visibility in much the same way. However, when spatial attention was cued to the target location in advance, then the various types of mask differed quite dramatically in their effects, with four dots having no effect on target visibility and the other types of masking each revealing unique temporal characteristics. These findings were interpreted as evidence for two fundamentally different masking mechanisms: reduced visibility because of a disturbed object formation process in the first 100 milliseconds of processing, and reduced visibility because of an object substitution process that occurs when the display changes before spatial attention has become focused on a single object. However, four dot masking still occurs when attention is directed only voluntarily with a symbolic cue, rather than aided by a spatial cue with an abrupt onset at the target location, leaving it for future studies to establish exactly which aspects of spatial attention are critically involved (Luiga & Bachmann, 2007).

A question of great interest to perception researchers is whether a target image that has been masked, and therefore prevented from reaching consciousness, is nonetheless represented in the brain in some form. This is seen in some neuro-pathological conditions in the form of blindsight, which is the ability to respond appropriately to an object with actions (eye or limb movements) despite not being able to form a conscious experience of the object. A form of blindsight in healthy adults was demonstrated in the context of object substitution masking by Binsted, Brownell, Vorontsova, Heath & Saucier (2007). In this study, neurologically intact individuals were able to program and execute goal-directed reaching movements to a target object that they were unable to see because it had been masked with a four dot masking display. Despite not having conscious access to the target, the speed and accuracy of their hand-grasping response was still influenced by the size of the target. The authors interpreted this finding to suggest that the dorsal pathway controlling visually guided reaching can act based on feedforward signals from area V1, whereas the ventral pathway responsible for conscious awareness of the target depends on the reentrant signals from higher-level vision feeding back to area V1 in order to form a conscious percept.

Goodhew, Visser, Lipp, and Dux (2011a) examined the time-course of object substitution masking, and reported that with prolonged mask exposure (e.g., 640ms), there was an improvement in target identification relative to intermediate mask durations, creating a non-monotonic masking function across mask duration. This effect was not dependent on the offset of the mask prior to response (Goodhew, Dux, Lipp, and Visser, 2012), implicating higher-level mechanisms. The authors concluded that this is because the brain continues to make inferences that affect the conscious representation of objects over a more prolonged timescale than previously recognised.

Gellatly, Pilling, Carter and Guest (2010) found that brief exposure of the target is not necessary for object substitution masking to occur. That is, even when the target array was exposed for a period of up to 830ms (in the absence of the cue to indicate which item is the target), masking still occurred, although its magnitude was reduced. This reduction in masking was found even when generic placeholders were used instead of the target and distractor objects, suggesting that it is due to allowing increased time for the visual system to consolidate object representations at particular locations, rather than allowing particular features to be identified and stored (Guest, Gellatly, & Pilling, 2012).

Computational modeling

There are several general computational frameworks for visual perception that emphasize reentrant processes as critical for understanding object perception and the dynamics of visual experience over time. These include Adaptive Resonance Theory (Carpenter & Grossberg, 2003; Grossberg, 1995); the Reentrant Model of Visual Attention (Hamker, 2006); Pattern Theory (Mumford, 1991, 1992); and Predictive Coding Theory (Rao, 1999; Rao & Ballard, 1999). These models are all considerably broader in their scope than the specific focus here on object substitution masking, but they are similar in spirit.

Following the initial presentation of the reentrant model known as CMOS, to account for the results of psychophysical experiments on object substitution masking (Di Lollo et al., 2000), Francis & Hermens (2002) examined a subset of these data with a non-reentrant model in which the critical role of attention was modeled directly as masking strength, with focused attention corresponding to a weak mask and distributed attention corresponding to a strong mask. This modeling effort was criticized by Di Lollo et al. (2002) as lacking theoretical plausibility, because it was not clear to them why distributed spatial attention should increase mask strength selectively over target strength. Di Lollo et al. (2002) also criticized Francis & Hermens (2002) for modeling only a subset of the data; specifically, for not modeling the experiments from Di Lollo et al. (2000) in which the mask was physically assigned more energy than the target, because it appeared in advance of the target and remained on for a longer duration, but yet resulted in reduced masking.

Francis (2003) made it possible for researchers other than modelers to compare the ability of various masking models to account for data by designing a website that allows users to enter data values, to set model parameters and then to compare the ability of five models to account for the data: the dual-channel model (Weisstein, 1972); recurrent inhibition (Bridgeman, 1978); decaying trace (Anbar & Anbar, 1982); efficient masking (Francis, 2000); and reentrant processing (Di Lollo et al., 2000). These models were subsequently compared in their ability to account for the factors of apparent brightness and mask duration in metacontrast masking (Di Lollo et al., 2004), with the two models explicitly incorporating feedback processes (recurrent inhibition and reentrant processing) providing the best overall fits to the data.

Animal neurophysiology

The hypothesis that visual masking may come about because reentrant cortical signals, rather than feedforward signals, are disrupted has a history that precedes research on object substitution masking by at least two decades. Bridgeman (1975; 1980) reached this conclusion after a series of metacontrast masking experiments involving single-unit recordings in the primary visual cortical areas of cat and monkey. These studies showed that metacontrast masking was associated with a reduction in visual responses occurring beyond 80 ms and as late as 400 ms after stimulus onset. Earlier visually-evoked responses were affected only minimally, if at all. Bridgeman (1980) interpreted these late responses as representing the influence of reentrant signals from regions beyond the primary visual cortex.

Lamme, Zipser & Spekreijse (2002) came to much the same conclusion in studies of figure-ground perception and the associated responses of primary visual neurons in awake monkeys fitted with chronic microelectrode implants. In a first stage of responding (up to 80 ms from target onset), neurons responded only to local contours within their classical receptive fields. In a second stage (about 80–120 ms), the same neurons began to respond to figure boundaries that lay outside the classical receptive field. In a third stage (beyond about 120 ms), the same neurons responded to the surface properties of the target. Lamme et al. (2002) concluded that within primary visual cortex, different regimes of activity coexist at different points in time. Specifically, there is low-level processing of local features such as edge orientation in the early stages, which is largely unaffected by masking and remains active when animals are anesthetized and when animals indicate not seeing the target. But later there is also neural activity related to higher level processes such as figure–ground perception, which is affected by masking and by anesthesia and is absent when animals indicate not being able to see the target.

Human neurophysiology

Ro, Breitmeyer, Burton, Singhal, & Lane (2003) conducted a metacontrast masking experiment in which they also tested the effects of transcranial magnetic stimulation (TMS) of the visual cortex in humans on the perception of a target shape. The results showed that appropriately timed TMS could induce perception of an otherwise imperceptible target shape that was followed by a masking shape. The interpretation given this result was that the disrupted neural activity caused by the TMS event suppressed perception of the masking shape, allowing the target shape to increase in visibility. Along with other researchers, they interpreted this finding as consistent with the theory that one or more cycles of reentrant processing between higher order brain areas and area V1 are necessary for conscious visual perception (Cowey & Walsh, 2000; Pascual-Leone & Walsh, 2001; Pollen 2007).

Hirose, Kihara, Tsubomi, Mima, Ueki, Fukuyama, & Osaka (2005) used a repetitive form of TMS to study object substitution masking involving displays that began with one of several shapes surrounded by four dots and ended with only the four dots on view. The main finding was that following repetitive TMS over visual area V5/MT+, widely recognized as specialized in visual motion processing, object substitution masking was significantly reduced. The authors interpreted this result as support for the hypothesis that object substitution masking is mediated by visual processes that are aimed at providing continuity of objects over space and time (Lleras & Moore, 2003). In a follow-up study this research group added confirmatory data to this interpretation by showing that repetitive TMS over visual area V1 also reduced object substitution masking, consistent with the reciprocal pattern of signaling between V5/MT+ and V1 (Hirose et al., 2007). Moreover, repetitive TMS over a brain area 2 cm posterior to V5/MT+ had no effect on target visibility, pointing to the specificity of the circuit involved in masking.

Woodman & Luck (2003) used event-related potential (ERP) recordings in humans to demonstrate that a target masked by four dots, and therefore not reportable by the study participant, still contributed to an N2pc component that was sensitive to the identity of the target. They interpreted this result as indicating that the unseen target was sufficient to trigger an attentional shift to the correct location (i.e., the attentional shift was dependent on correct target identification), but that by the time this attentional shift was complete, only the mask remained visible, leading to impaired target report. In this interpretation, shifts of attention are independent and therefore dissociable from perceptual awareness. However, more recent work suggests that the N2pc component is sensitive to target features, rather than shifts of attention (Kiss, Van Velzen, & Eimer, 2008). In light of this, Woodman and Luck’s (2003) result suggests that residual target processing can occur in the absence of awareness, rather than dissociating attention and awareness.

Reiss & Hoffman (2006) examined the N400 component of the ERP waveform in response to printed words that were either visible or reduced in visibility by four dot masking. Their interest was in determining whether four dot masking reduced the semantic content of the word in addition to its effect on target visibility. Their main finding was that the N400 amplitude was significantly diminished with object substitution masking. In a related study, these authors reported that the typical N170 amplitude difference between images of faces and houses was also eliminated with four dot masking (Reiss & Hoffman, 2007). They interpreted both of these findings as indicating that object substitution masking interferes with processing during the formation of an object representation, but prior to its semantic analysis. However, using a masked priming paradigm, Goodhew, Visser, Lipp and Dux (2011b) found that the semantic meaning of a word target influenced speeded responses to identify the colour of the four-dot mask, even when the target was successfully masked to the level of detection. This behavioural finding demonstrates implicit semantic perception in object substitution masking, dissociating high-level processing from awareness.

Fahrenfort, Scholte & Lamme (2007) also used ERP recordings in humans to study metacontrast masking of texture-defined shapes by other texture-defined shapes. One finding was that some components of ERP signals arising from higher-order visual brain areas were activated equally well by target shapes that were visible (i.e., that were not masked) and by target shapes that were invisible (i.e., that were effectively masked). The authors interpreted this as indicating that feedforward signals from the target are preserved, even when target accuracy is at chance as determined by objective measures. They suggested that this early activation (up to about 100 ms following target onset) is unaffected by the signals from the later-occurring mask, but that it is also insufficient to generate visual awareness of the target. However, a second finding of Fahrenfort et al. (2007) was that other ERP components, specifically those typically associated with reentrant processing, were absent when target shapes were not visible to participants (i.e., when they were masked). The authors concluded that masking derives its effectiveness from disrupting the reentrant processing that is necessary for the conscious perception of object shapes as distinct from background surfaces.

Kotsoni, Csibra, Mareschal & Johnston (2007) examined high-density ERP recordings in humans experiencing object substitution masking, with displays that began with one of several shapes surrounded by four dots and ended with only the four dots on view. Target accuracy was affected both by the number of shapes in the display and by the duration of the dots that remained on view after the shapes had been erased. The ERP data revealed a modulation of signals that was mask dependent at about 220 ms after the shape display had been presented. They interpreted this as corresponding to a reactivation of early lower-order visual areas by re-entrant feedback signals originating from higher visual areas.

Two studies to date have examined object substitution masking with the BOLD response obtained from fMRI brain imaging. Weidner, Shah & Fink (2006) reported a positive correlation between the degree of masking of a target and the strength of the BOLD signal in primary visual cortex and extrastriate cortical visual areas, including the inferotemporal-parietal sulcus. The authors interpreted this pattern as consistent with a brain network for object substitution that included both low-level and high-level anatomical cortical regions.

Carlson, Rauschenberger & Verstraten (2007) used fMRI brain imaging in conjunction with an adaptation procedure to test whether a target that was effectively masked by four dots was nonetheless represented in a higher-order brain region known as the lateral occipital cortex (specialized for object perception) and in the primary visual cortex (area V1). The results showed that there was no persisting neural representation of a masked target shape in lateral occipital cortex, though there was in area V1. The authors interpreted these results as supporting the idea that masking disrupts the formation of an object representation. They also pointed out that this ruled out the possibility that a masked target is merely rendered inaccessible to consciousness, as it is in some other psychophysical procedures such as binocular rivalry or the attentional blink.