Abstract

The FACADE model, and its laminar cortical realization and extension in the 3D LAMINART model, have explained, simulated, and predicted many perceptual and neurobiological data about how the visual cortex carries out 3D vision and figure-ground perception, and how these cortical mechanisms enable 2D pictures to generate 3D percepts of occluding and occluded objects. In particular, these models have proposed how border ownership occurs, but have not yet explicitly explained the correlation between multiple properties of border ownership neurons in cortical area V2 that were reported in a remarkable series of neurophysiological experiments by von der Heydt and his colleagues; namely, border ownership, contrast preference, binocular stereoscopic information, selectivity for side-of-figure, Gestalt rules, and strength of attentional modulation, as well as the time course during which such properties arise. This article shows how, by combining 3D LAMINART properties that were discovered in two parallel streams of research, a unified explanation of these properties emerges. This explanation proposes, moreover, how these properties contribute to the generation of consciously seen 3D surfaces. The first research stream models how processes like 3D boundary grouping and surface filling-in interact in multiple stages within and between the V1 interblob-V2 interstripe-V4 cortical stream and the V1 blob-V2 thin stripe-V4 cortical stream, respectively. Of particular importance for understanding figure-ground separation is how these cortical interactions convert computationally complementary boundary and surface mechanisms into a consistent conscious percept, including the critical use of surface contour feedback signals from surface representations in V2 thin stripes to boundary representations in V2 interstripes. Remarkably, key figure-ground properties emerge from these feedback interactions. 
The second research stream shows how cells that compute absolute disparity in cortical area V1 are transformed into cells that compute relative disparity in cortical area V2. Relative disparity is a more invariant measure of an object's depth and 3D shape, and is sensitive to figure-ground properties.

Boundary completion and surface filling-in obey computationally complementary laws. Boundaries complete inwardly in an oriented manner in response to pairs or greater numbers of inducers. Boundary completion also pools across opposite contrast polarities, and thus forms in a manner that is insensitive to contrast polarity. As a result, “all boundaries are invisible.” In contrast, surface filling-in spreads outwardly from each feature contour inducer in an unoriented manner and does not pool opposite contrast polarities, and hence is sensitive to contrast polarity. As a result, all conscious percepts of visual qualia are surface percepts, including percepts of such seemingly simple stimuli as dots or lines, which also generate boundary groupings that contain filling-in of their surface brightnesses and/or colors; cf. simulations in Grossberg and Mingolla ().
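
The complementary polarity properties described above can be illustrated with a minimal sketch. It assumes a single pair of adjacent luminance values and half-wave rectified difference detectors. The function names `boundary_signal` and `feature_signal` are hypothetical and are not taken from the FACADE or 3D LAMINART equations.

```python
def rectify(value):
    """Half-wave rectification: pass positive values, block negative ones."""
    return max(value, 0.0)


def boundary_signal(left, right):
    """Pool two oppositely polarized contrast detectors (assumed form)."""
    light_to_dark = rectify(left - right)
    dark_to_light = rectify(right - left)
    return light_to_dark + dark_to_light


def feature_signal(left, right):
    """Keep the signed contrast, so contrast polarity is preserved (assumed form)."""
    return left - right


edge = (0.9, 0.1)            # light-to-dark edge
reversed_edge = (0.1, 0.9)   # the same edge with reversed contrast

same_boundary = boundary_signal(*edge) == boundary_signal(*reversed_edge)
opposite_feature = feature_signal(*edge) == -feature_signal(*reversed_edge)
```

Under these assumptions the pooled boundary signal is identical for both polarities, whereas the signed feature signal flips sign, which is the sense in which boundaries are insensitive and surfaces are sensitive to contrast polarity.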

Kanizsa square (left panel) and reverse-contrast Kanizsa square (right panel). The Kanizsa square appears brighter than its background due to the brightness induction by the four black pac man figures. In contrast, the reverse-contrast Kanizsa square may be recognized, but not seen, if the brightness induction by the black-to-gray pac man inducers balances the darkness induction due to the white-to-gray pac man inducers after filling-in.

T-junctions influence figure-ground percepts. The two figures in the top row illustrate Kanizsa stratification. In the left panel, the white cross appears in front of the square border most of the time. The white in positions where the cross occludes the square appears to belong to the cross, and is in front of the square, which is amodally completed behind it. On occasion, the percept flips with the square appearing in front of the cross. Then the white area that previously belonged to the cross appears to belong to the square, with the cross amodally completed behind it. In the right panel, even when the extra black vertical lines force the vertical square bar to always appear in front of the cross, the horizontal branches of the square are amodally recognized behind the vertical bars of the cross, leading to a percept of a square that is bent in depth. This latter result is incompatible with a Bayesian statistics account of what the percept should look like based upon the high probability of experiencing flat squares in the world. These percepts are explained in Grossberg () and simulated in Kelly and Grossberg (). In the bottom row (left panel), the two small rectangles are recognized as an amodally completed vertical rectangle behind the horizontal bar. This illustrates amodal completion of recognition without seeing, as do the two stratification figures. This percept, and its variants when the relative contrasts of the rectangles and background are varied, is explained in Grossberg (). The remaining figure in the lower right panel illustrates bistable transparency, whereby the percept of an upper left square as a transparent film in front of a lower right square alternates with the percept of a lower right square as a transparent film in front of an upper left square. This percept, as well as unimodal transparency and no transparency cases, is explained and simulated in Grossberg and Yazdanbakhsh ().

Multiple depth-selective boundary representations regulate filling-in of surface representations within multiple depth-selective Filling-In DOmains (FIDOs). Brightness or color feature contour inputs are topographically distributed across multiple depths (vertical arrows) before being captured by boundaries (horizontal and oblique arrows) that are positionally aligned with them. See Grossberg () for a more complete description of this surface capture process.

Filling-in of closed and open boundaries. The top row illustrates how, at a prescribed depth, a closed boundary contour abuts an illuminant-discounted feature contour. When this happens, the feature contours can fill-in within the closed boundary. The bottom row (left panel) depicts how filling-in of the feature contours is contained by this closed boundary contour, thereby generating large contrasts in filled-in activity at positions along the boundary contour. Contrast-sensitive surface contour output signals can then be generated in response to these large contrasts. The bottom row (right panel) depicts a boundary contour that has a big hole in it at a different depth. Feature contours can spread through such a hole until the filled-in activities on both sides of the boundary equalize, thereby preventing contrast-sensitive surface contour output signals from forming at such boundary positions.
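
The containment property described above can be sketched in one dimension. The following is a minimal illustration, not the model's filling-in equations: it assumes conservative nearest-neighbor diffusion that is blocked wherever a boundary gate is present, and a simple threshold on neighboring activity differences as the contrast-sensitive surface contour output. The names `fill_in`, `surface_contours`, `rate`, and `threshold` are hypothetical.

```python
def fill_in(features, gates, steps=5000, rate=0.25):
    """Diffuse feature activity between neighboring cells.

    gates[i] is True when a boundary blocks flow between cell i and cell i + 1.
    """
    x = list(features)
    for _ in range(steps):
        # Compute all flows from the current state before updating any cell.
        flows = [0.0 if gates[i] else rate * (x[i + 1] - x[i])
                 for i in range(len(x) - 1)]
        for i, flow in enumerate(flows):
            x[i] += flow
            x[i + 1] -= flow
    return x


def surface_contours(filled, threshold=0.1):
    """Signal positions where neighboring filled-in activities differ strongly."""
    return [abs(filled[i + 1] - filled[i]) > threshold
            for i in range(len(filled) - 1)]


# Feature contours sit just inside the two edges of a region (cells 2..5).
features = [0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
closed = [False, True, False, False, False, True, False]   # both edges gated
gapped = [False, True, False, False, False, False, False]  # right edge missing

closed_contours = surface_contours(fill_in(features, closed))
gapped_contours = surface_contours(fill_in(features, gapped))
```

With the closed boundary, activity stays inside the region and a surface contour arises at both edges. With the gap, activity spreads through the missing edge until both sides equalize, so no surface contour arises there. The left edge still contains activity in the gapped case only because this one-dimensional sketch has no path around it, which is an artifact of the simplification.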

How closed boundaries regulate seeing and recognition in depth. A closed boundary can form at the nearer depth Depth 1 by combining a binocular vertical boundary at the left side of the square with three monocular boundaries that are projected along the line of sight to all depths. Surface contour output signals can thus be generated by the FIDO at Depth 1, but not the FIDO at the farther depth Depth 2. The Depth 1 surface contours excite, and thereby strengthen, the boundaries at Depth 1 that controlled filling-in at Depth 1. These surface contours also inhibit the redundant boundaries at Depth 2 at the same positions. As a result, the pruned boundaries across all depths, after the surface contour feedback acts, can project to object recognition networks in inferotemporal cortex to facilitate amodal recognition, without being contaminated by spurious boundaries. See Fang and Grossberg () for simulations of how this process works in response to random dot stereograms.
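
The pruning step described above can be sketched as follows. This is a toy illustration under strong assumptions: a square is reduced to four boundary segments, a FIDO emits unit surface contours only when all four segments at its depth are present, and feedback is a simple additive excitation and subtractive inhibition. The names `is_closed`, `surface_contours`, `apply_feedback`, and `gain` are hypothetical.

```python
SIDES = ("left", "top", "right", "bottom")


def is_closed(boundary):
    """A boundary is treated as closed when every side is active."""
    return all(boundary[side] > 0 for side in SIDES)


def surface_contours(boundary):
    """Emit unit surface contours at all sides only if filling-in is contained."""
    closed = is_closed(boundary)
    return {side: (1.0 if closed else 0.0) for side in SIDES}


def apply_feedback(near, far, gain=1.0):
    """Near-depth surface contours excite near boundaries and inhibit far ones."""
    contours = surface_contours(near)
    near_out = {s: near[s] + gain * contours[s] for s in SIDES}
    far_out = {s: max(far[s] - gain * contours[s], 0.0) for s in SIDES}
    return near_out, far_out


# Depth 1: binocular left edge plus three monocular edges (closed).
depth1 = {"left": 1.0, "top": 1.0, "right": 1.0, "bottom": 1.0}
# Depth 2: only the three monocular edges projected along the line of sight.
depth2 = {"left": 0.0, "top": 1.0, "right": 1.0, "bottom": 1.0}

pruned1, pruned2 = apply_feedback(depth1, depth2)
```

In this sketch the Depth 1 boundaries are strengthened and the redundant Depth 2 boundaries at the same positions are removed, leaving only the pruned boundaries to project onward.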

A cross-section of the inhibitory off-surround across depth that is caused by surface contour outputs. The top row shows the inhibitory signals in response to a less bright Kanizsa square. The bottom row shows the inhibitory signals in response to a more bright Kanizsa square. The numerals 1 and 2 indicate one of the depths where the inhibitory signals are equal. This illustrates how the brighter Kanizsa square (at depth 1) can inhibit boundaries at more depths between that of the Kanizsa square and its inducers (such as depth 2), thereby making the brighter square stand out more in depth.

T-junctions and end gaps in figure-ground perception. (A) T-Junction Sensitivity: (left panel) T-junction in an image. (middle panel) Bipole cells activate long-range recurrent excitatory horizontal connections (cooperation, +) and also short-range inhibitory interneurons (competition, −). (right panel) An end gap in the vertical boundary arises because, for cells near where the horizontal top and vertical stem of the T come together, the top of the T activates bipole cells along the top of the T much more than bipole cells are activated along the T stem. As a result, the stem boundary gets inhibited by the short-range inhibitory signals from the horizontal bipole cells, whereas the top boundary does not receive comparable inhibition from the vertical bipole cells (Reprinted with permission from Grossberg, ). (B) Necker cube. This 2D picture can be perceived as either of two 3D parallelepipeds whose shapes flip bistably through time. (C) When attention switches from one circle to another, that circle pops forward as a figure and its brightness changes. See text and Grossberg and Yazdanbakhsh () for an explanation. Reprinted with permission from Tse ().

Model circuit for transforming absolute disparity into relative disparity: the input consists of dots arranged on a disparity axis along a single position in the plane. The fixation plane is assigned a disparity of 0°. This input is mapped to complex cells in V1 layer 2/3 that are tuned to absolute disparity and positioned along a disparity axis. The inputs from V1 layer 2/3 to V2 layers 6 and 4 define a shunting on-center off-surround network whose lateral inhibition causes a peak shift in V2 disparity tuning that matches relative disparity data, as illustrated in Figures , . Reprinted with permission from Grossberg et al. ().
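
For orientation, a generic feedforward shunting on-center off-surround network and its steady state can be written as below. This is the textbook form of such a network, given here as an assumption; it is not claimed to be the exact equation used in the model. The symbols $A$, $B$, $D$, $C_{ki}$, $E_{ki}$, and $I_k$ are illustrative, and the Gaussian form of the off-surround kernel is assumed only to show where an amplitude such as $D^{-}$ and a width such as $\sigma_{inh}$ could enter.

```latex
% Generic shunting on-center off-surround dynamics for cell i (assumed form)
\frac{dx_i}{dt} = -A x_i + (B - x_i) \sum_k C_{ki} I_k - (x_i + D) \sum_k E_{ki} I_k

% Setting dx_i/dt = 0 gives the steady state
x_i = \frac{\sum_k \left( B\, C_{ki} - D\, E_{ki} \right) I_k}
           {A + \sum_k \left( C_{ki} + E_{ki} \right) I_k}

% Assumed Gaussian off-surround kernel with amplitude D^- and width sigma_inh
E_{ki} = D^{-} \exp\!\left( -\frac{(k - i)^2}{2 \sigma_{inh}^2} \right)
```

The denominator shows why such a network normalizes its activities, which remain between $-D$ and $B$, while the off-surround term in the numerator is the source of the lateral inhibition to which the caption attributes the peak shift.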

Relative disparity data and simulations. (Left panel) Sample cell data from experiments and model: (A) Experimental data of two V2 cell responses for relative disparity (Reprinted with permission from Thomas et al., ). (B) Two model V2 layer 4 neurons with disparity tuning curves with changes in surround disparity. The model neurons simulate the position of data peaks and their shifts, but not all aspects of the amplitudes in the data, because of the simplicity of the model. Despite this simplicity, the model captures the key shift properties. (Right panel) Shift ratio statistics. The shift ratio is defined as the shift in peaks of the tuning curve relative to the difference, or shift, of surround disparities. The shift ratio summarizes the statistics of the type of disparity observed: (C) Shift ratio summary reprinted with permission from Thomas et al. (). (D) Shift ratio summary from the model showing best results with D− = 0.2 and σinh = 1.0. An exhaustive comparison would have required every pair of surrounds chosen without repetition from a set of 200 cells, which gives 19,900 combinations. However, the best available data from Thomas et al. () have a maximum of 91 shifts, so a random selection was compared with their summary statistics. This random selection chose, for each cell, four shift ratios to derive a total of 1600 shifts and 800 shift ratios. These shift ratios were, in turn, randomly sampled without replacement to select 75 and 91 shifts, respectively, to match the number of shifts computed in the experimental data [Reprinted with permission from Grossberg et al. ()].
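
The shift ratio definition and the sampling counts above can be made concrete with a short sketch. The tuning curves and the pool of ratios below are made-up placeholder values chosen only to exercise the arithmetic; they are not data from Thomas et al. or from the model. The helper names `peak_position` and `shift_ratio` are hypothetical.

```python
import math
import random


def peak_position(disparities, responses):
    """Return the disparity at which a tuning curve is maximal."""
    best = max(range(len(responses)), key=lambda i: responses[i])
    return disparities[best]


def shift_ratio(disparities, curve_a, curve_b, surround_a, surround_b):
    """Peak shift of the tuning curve divided by the shift in surround disparity."""
    peak_shift = (peak_position(disparities, curve_b)
                  - peak_position(disparities, curve_a))
    return peak_shift / (surround_b - surround_a)


# Placeholder tuning curves measured with two different surround disparities.
disparities = [-0.4, -0.2, 0.0, 0.2, 0.4]
curve_a = [0.1, 0.5, 1.0, 0.5, 0.1]   # surround at 0.0, peak at 0.0
curve_b = [0.1, 0.2, 0.5, 1.0, 0.4]   # surround at 0.4, peak at 0.2
ratio = shift_ratio(disparities, curve_a, curve_b, 0.0, 0.4)

# Number of unordered pairs of surrounds drawn from 200 candidates.
pair_count = math.comb(200, 2)

# Subsample without replacement, as in the caption (placeholder ratio pool).
rng = random.Random(0)
ratio_pool = [rng.random() for _ in range(200 * 4)]
subsample = rng.sample(ratio_pool, 91)
```

Here the peak moves by 0.2 while the surround moves by 0.4, giving a shift ratio of 0.5. A ratio near 0 corresponds to a cell whose tuning follows absolute disparity, and a ratio near 1 to a cell whose tuning follows relative disparity. The pair count evaluates to 19,900, matching the figure quoted in the caption.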

Shift ratio statistics due to varying D− and σinh using the same shift sampling method as in Figure . Shifts toward absolute disparity or relative disparity depend on these parameters. (A) D− = 0.5; σinh = 1.0. Shift toward absolute disparity. This is the usual profile of V1 disparity cells. (B) D− = 1.0; σinh = 0.5. Absolute disparity is observed with a larger amplitude and narrower width of the off-surround kernel. (C) D− = 0.2; σinh = 1.0. A weak amplitude modulation (D− = 0.2) and a wide inhibitory surround (σinh = 1.0) together generate a gradient from absolute to relative disparity resembling the data. Thus, the nature of the surround inhibition in V1 and V2 accounts for the type of disparity sensitivity, a fact that is important in explaining some figure-ground data of Zhang and von der Heydt (); see the text. The parameters used in Figure and (C) are the same [Reprinted with permission from Grossberg et al. ()].

(A) The standard test for determining the effect of border ownership on edge responses. In the left two images, identical contrast edges are presented in the recorded receptive field. In the left-most figure, the light-dark edge is at the right side of a light square. In the figure to its right, it is at the left side of a dark square. The relation is analogous between the right two images, with reversed contrasts. Adapted from Zhou et al. (). (B) 2D pictures can appear flat or slanted in depth. The two figures in bold lines are made of the same set of surfaces, but due to the different arrangement of their surfaces, they give rise to different slanted 3D percepts, even though the figural sides from which they are composed can individually be perceived as flat. The left bold figure has a positive tilt (near-to-far), while the right bold figure has a negative tilt (far-to-near). The tilts of near-to-far vs. far-to-near straight lines are determined by the combinations of angle cells between which they lie. The corresponding combinations of surrounding angle cells and straight bipole cells are associated with each other during normal 3D vision. Reprinted with permission from Grossberg and Swaminathan ().