First revision submitted April 2000
Second revision submitted September 2001
Third revision submitted June 2002
Accepted for publication September 2002
Peer commentaries received February 2003
Response to commentaries submitted March 2003
Response second revision submitted April 2003
Copy-edited article submitted July 2003
Copy-edited response to commentators submitted September 2003
Final paper published volume 26, number 4, pp 375-444, March 2004

Abstract1

The subjective experience of visual perception is of a world composed
of solid volumes, bounded by colored surfaces, embedded in a spatial
void. These properties are difficult to relate to our
neurophysiological understanding of the visual cortex. I propose
therefore a perceptual modeling approach, to model the information
manifest in the subjective experience of perception, as opposed to the
neurophysiological mechanism by which that experience is supposedly
subserved. A Gestalt Bubble model is presented to demonstrate how the
dimensions of conscious experience can be expressed in a quantitative
model of the perceptual experience that exhibits Gestalt properties.

Abstract2

A serious crisis is identified in theories of neurocomputation marked
by a persistent disparity between the phenomenological or experiential
account of visual perception and the neurophysiological level of
description of the visual system. In particular conventional concepts
of neural processing offer no explanation for the holistic global
aspects of perception identified by Gestalt theory. The problem is
paradigmatic, and can be traced to contemporary concepts of the
functional role of the neural cell, known as the Neuron Doctrine. In
the absence of an alternative neurophysiologically plausible model, I
propose a perceptual modeling approach, i.e. to model the percept as
experienced subjectively, rather than the objective neurophysiological
state of the visual system that supposedly subserves that
experience. A Gestalt Bubble model is presented to demonstrate how the
elusive Gestalt principles of emergence, reification, and invariance,
can be expressed in a quantitative model of the subjective experience
of visual consciousness. That model in turn reveals a unique
computational strategy underlying visual processing, which is unlike
any algorithm devised by man, and certainly unlike the atomistic
feed-forward model of neurocomputation offered by the Neuron Doctrine
paradigm. The perceptual modeling approach reveals the primary
function of perception as that of generating a fully spatial
virtual-reality replica of the external world in an internal
representation. The common objections to this "picture-in-the-head"
concept of perceptual representation are shown to be ill founded.

1 Introduction

Contemporary neuroscience finds itself in a state of serious
crisis. For the deeper we probe into the workings of the brain, the
farther we seem to get from the ultimate goal of providing a
neurophysiological account of the mechanism of conscious
experience. Nowhere is this impasse more evident than in the study of
visual perception, where the apparently clear and promising trail
discovered by Hubel and Wiesel leading up the hierarchy of feature
detection from primary to secondary and to higher cortical areas,
seems to have reached a theoretical dead-end. Besides the troublesome
issues of the noisy stochastic nature of the neural signal, and the
very broad tuning of the single cell as a feature detector, the notion
of visual processing as a hierarchy of feature detectors seems to
suggest some kind of "grandmother cell" model in which the activation
of a single cell or a group of cells represents the presence of a
particular type of object in the visual field. However it is not at
all clear how such a featural description of the visual scene could
even be usefully employed in practical interaction with the world.
Alternative paradigms of neural representation have been proposed,
including the suggestion that synchronous oscillations play a role in
perceptual representation, although these theories are not yet
specified sufficiently to know exactly how they address the issue of
perceptual representation. But the most serious indictment of
contemporary neurophysiological theories is that they offer no hint of
an explanation for the subjective experience of visual
consciousness. For visual experience is more than just an abstract
recognition of the features present in the visual field; those
features are vividly experienced as solid three-dimensional objects,
bounded by colored surfaces, embedded in a spatial void. There are a
number of enigmatic properties of this world of experience identified
decades ago by Gestalt theory, suggestive of a holistic emergent
computational strategy whose operational principles remain a mystery.

The problem in modern neuroscience is a paradigmatic one, that can be
traced to its central concept of neural processing. According to the
Neuron Doctrine, neurons behave as quasi-independent
processors separated by relatively slow chemical synapses, with
strictly segregated input and output functions through the dendrites
and axon respectively. It is hard to imagine how such an assembly of
independent processors could account for the holistic emergent
properties of perception identified by Gestalt theory. In fact the
reason why these Gestalt aspects of perception have been largely
ignored in recent decades is exactly because they are so difficult to
express in terms of the Neuron Doctrine paradigm. More recent
proposals that implicate synchronous oscillations as the
neurophysiological basis of conscious experience (Crick & Koch 1990,
Crick 1994, Eckhorn et al. 1988, Llinas et al. 1994,
Singer 1999, Singer & Gray 1995) seem to suggest some kind of holistic
global process that appears to be more consistent with Gestalt
principles, although it is hard to see how this paradigm, at least as
currently conceived, can account for the solid three-dimensional
nature of subjective experience. The persistent disparity between the
neurophysiological and phenomenal levels of description suggests that
either the subjective experience of visual consciousness is somehow
illusory, or that the state of our understanding of neural
representation is far more embryonic than is generally recognized.

Pessoa et al. (1998) make the case for denying the primacy of
conscious experience. They argue that although the subjective
experience of filling-in phenomena is sometimes accompanied by some
neurophysiological correlate, such an isomorphism between
experience and neurophysiology is not logically necessary, but is
merely an empirical issue, for, they claim, subjective experiences can
occur in the absence of a strictly isomorphic correlate. They argue
that although the subjective experience of visual consciousness
appears as a "picture" or three-dimensional model of a surrounding
world, this does not mean that the information manifest in that
experience is necessarily explicitly encoded in the brain. On this
view, consciousness is an illusion based on a far more compressed or
abbreviated representation, in which percepts such as that of a
filled-in colored surface can be explained neurophysiologically by
"ignoring an absence" rather than by an explicit point-for-point
mapping of the perceived surface in the brain.

In fact, nothing could be farther from the truth. For to propose
that the subjective experience of perception can be more enriched and
explicit than the corresponding neurophysiological state flies in the
face of the materialistic basis of modern neuroscience. The modern
view is that mind and brain are different aspects of the same physical
mechanism. In other words, every perceptual experience, whether a
simple percept such as a filled-in surface, or a complex percept of a
whole scene, has two essential aspects: the subjective experience of
the percept, and the objective neurophysiological state of the brain
that is responsible for that subjective experience. Like the two faces
of a coin, these very different entities can be identified as merely
different manifestations of the same underlying structure, viewed from
the internal first-person vs. the external third-person
perspectives. The dual nature of a percept is analogous to the
representation of data in a digital computer, where a pattern of
voltages present in a particular memory register can represent some
meaningful information, either a numerical value, or a brightness
value in an image, or a character of text, etc. when viewed from
inside the appropriate software environment, while when viewed in
external physical terms that same data takes the form of voltages or
currents in particular parts of the machine. However whatever form is
selected for encoding data in the computer, the information content of
that data cannot possibly be of higher dimensionality than the
information explicitly expressed in the physical state of the
machine. The same principle must also hold in perceptual experience,
as proposed by Müller (1896), in the psychophysical
postulate. Müller argued that since the subjective experience
of perception is encoded in some neurophysiological state, the
information encoded in that conscious experience cannot possibly be
any greater than the information encoded in the corresponding
neurophysiological state. While we cannot observe phenomenologically
the physical medium by which perceptual information is encoded in the
brain, we can observe the information encoded in that medium,
expressed in terms of the variables of subjective experience. It
follows therefore that it should be possible by direct
phenomenological observation to determine the dimensions of conscious
experience, and thereby to infer the dimensions of the information
encoded neurophysiologically in the brain.

The "bottom-up" approach that works upwards from the properties of the
individual neuron, and the "top-down" approach that works downwards
from the subjective experience of perception are equally valid and
complementary approaches to the investigation of the visual
mechanism. Eventually these opposite approaches to the problem must
meet somewhere in the middle. However to date, the gap between them
remains as large as it ever was. Both approaches are essential to the
investigation of biological vision, because each approach offers a
view of the problem from its own unique perspective. The disparity
between these two views of the visual representation can help focus on
exactly those properties which are prominently absent from the
conventional neural network view of visual processing.

Arnheim (1969) presents an insightful analysis of this concept, which
can be reformulated as follows. Consider (for simplicity) just the
central "Y" vertex of figure 6 A depicted in figure 6 C. Arnheim
proposes that the extrinsic constraints of inverse optics can be
expressed for this stimulus using a rod-and-rail analogy as shown in
figure 6 D. The three rods, representing the three edges in the visual
input, are constrained in two dimensions to the configuration seen in
the input, but are free to slide in depth along four rails. The rods
must be elastic between their end-points, so that they can expand and
contract in length. By sliding along the rails, the rods can take on
any of the infinite three-dimensional configurations corresponding to
the two-dimensional input of figure 6 C. For example the final percept
could theoretically range from a percept of a convex vertex protruding
from the depth of the page, to a concave vertex intruding into the
depth of the page, with a continuum of intermediate perceptual states
between these limits. There are other possibilities beyond these, for
example percepts where each of the three rods is at a different depth
and therefore they do not meet in the middle of the stimulus. However
these alternative perceptual states are not all equally likely to be
experienced. Hochberg & Brooks (1960) showed that the final percept is
the one that exhibits the greatest simplicity, or prägnanz. In the
case of the vertex of figure 6 C the percept tends to appear as three
rods whose ends coincide in depth at the center, and meet at a mutual
right angle, defining either a concave or convex corner. This reduces
the infinite range of possible configurations to two discrete
perceptual states. This constraint can be expressed emergently in the
rod and rail model by joining the three rods flexibly at the central
vertex, and installing spring forces that tend to hold the three rods
at mutual right angles at the vertex. With this mechanism in place to
define the intrinsic or structural constraints, the rod-and-rail model
becomes a dynamic system that slides in depth along the rails, and
this system is bistable between a concave and convex right angled
percept, as observed phenomenally in figure 6 C. Although this model
reveals the dynamic interaction between intrinsic and extrinsic
constraints, this particular analogy is hard-wired to modeling the
percept of the triangular vertex of figure 6 C. I will now develop a
more general model that operates on this same dynamic principle, but
is designed to handle arbitrary input patterns.
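
The bistability of the rod-and-rail analogy can be illustrated with a small numerical sketch. It is offered only as an illustration under stated assumptions and is not part of the model's specification: the three rods are assumed to leave the vertex at 120-degree intervals with unit projected length, depth is taken to increase away from the viewer, the "spring" energy is taken to be the sum of squared dot products between rod pairs, and the names rod_vectors, energy, and relax are hypothetical.

```python
import math

# Assumption: the three image edges leave the vertex at 120-degree
# intervals, each with unit length in the image plane.
ANGLES = [math.radians(a) for a in (90.0, 210.0, 330.0)]

def rod_vectors(depths):
    """3-D rod vectors; depths[i] is the depth of rod i's outer end
    relative to the central vertex (its position along the rail)."""
    return [(math.cos(a), math.sin(a), d) for a, d in zip(ANGLES, depths)]

def energy(depths):
    """Assumed spring energy: sum of squared dot products between rod
    pairs, zero only when all three rods are mutually orthogonal."""
    v = rod_vectors(depths)
    total = 0.0
    for i in range(3):
        for j in range(i + 1, 3):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            total += dot * dot
    return total

def relax(depths, step=0.1, iterations=2000):
    """Slide the rod ends along their rails by gradient descent on energy."""
    d = list(depths)
    for _ in range(iterations):
        v = rod_vectors(d)
        grad = [0.0, 0.0, 0.0]
        for i in range(3):
            for j in range(3):
                if i != j:
                    dot = sum(a * b for a, b in zip(v[i], v[j]))
                    grad[i] += 2.0 * dot * d[j]
        d = [di - step * gi for di, gi in zip(d, grad)]
    return d

# Rod ends starting behind the vertex settle into the convex corner;
# rod ends starting in front of it settle into the concave corner.
convex = relax([0.3, 0.2, 0.4])
concave = relax([-0.3, -0.2, -0.4])
```

In this sketch the two starting configurations relax to mirror-image solutions in which all three rod ends lie at the same relative depth, one behind and one in front of the vertex, corresponding to the two discrete perceptual states described above.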

8.1 A Gestalt Bubble Model

For the perceptual representation I propose (Lehar 2003) a volumetric block or
matrix of dynamic computational elements, as suggested in figure 7 A,
each of which can exist in one of two states, transparent or opaque,
with opaque state units being active at all points in the volume of
perceptual space where a colored surface is experienced. In other
words upon viewing a stimulus like figure 6 A, the perceptual
representation of this stimulus is modeled as a three-dimensional
pattern of opaque state units embedded in the volume of the perceptual
matrix in exactly the configuration observed in the subjective
perceptual experience when viewing figure 6 A, i.e. with opaque-state
elements at all points in the volumetric space that are within a
perceived surface in three dimensions, as suggested in figure 6 B. All
other elements in the block are in the transparent state to represent
the experience of the spatial void within which perceived objects
appear to be embedded. More generally opaque state elements should
also encode the subjective dimensions of color, i.e. hue, intensity,
and saturation, and intermediate states between transparent and opaque
would be required to account for the perception of semi-transparent
surfaces, although for now, the discussion will be limited to two
states and the monochromatic case. The transformation of perception
can now be defined as the turning on of the appropriate pattern of
elements in this volumetric representation in response to the visual
input, in order to replicate the three-dimensional configuration of
surfaces experienced in the subjective percept.
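
As a minimal sketch of the data structure implied by this description, the two-state monochromatic matrix might be represented as follows. The indexing order, the matrix size, and the names make_matrix, add_frontal_surface, and count_opaque are assumptions made for illustration only.

```python
TRANSPARENT, OPAQUE = 0, 1

def make_matrix(width, height, depth):
    """Volumetric matrix indexed as matrix[z][y][x]; every element starts
    in the transparent state, representing the spatial void."""
    return [[[TRANSPARENT] * width for _ in range(height)]
            for _ in range(depth)]

def add_frontal_surface(matrix, z, x0, x1, y0, y1):
    """Switch on the opaque elements of a flat surface patch at depth z,
    covering x0 <= x < x1 and y0 <= y < y1."""
    for y in range(y0, y1):
        for x in range(x0, x1):
            matrix[z][y][x] = OPAQUE

def count_opaque(matrix):
    """Number of elements currently in the opaque state."""
    return sum(cell for plane in matrix for row in plane for cell in row)

# A 4 x 4 surface patch perceived at depth 3 inside an 8 x 8 x 8 matrix.
matrix = make_matrix(8, 8, 8)
add_frontal_surface(matrix, z=3, x0=2, x1=6, y0=2, y1=6)
```

A fuller representation would attach color and a graded transparency value to each element, as noted above; the binary form is kept here to match the restricted case under discussion.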

Figure 7

A: The Gestalt Bubble model consisting of a block of dynamic local
elements which can be in one of several states. B: The transparent
state, no neighborhood interactions. C: The opaque coplanarity state
which tends to complete smooth surfaces. D: The opaque orthogonality
state which tends to complete perceptual corners. E: The opaque occlusion
state which tends to complete surface edges.

8.2 Surface Percept Interpolation

The perceived surfaces due to a stimulus like figure 6 A appear to span
the structure of the percept defined by the edges in the stimulus,
somewhat like a milky bubble surface clinging to a cubical wire frame.
Although the featureless portions of the stimulus between the visual
edges offer no explicit visual information, a continuous surface is
perceived within those regions, as well as across the white background
behind the block figure, with a specific depth and surface orientation
value encoded explicitly at each point in the percept. This
three-dimensional surface interpolation function can be expressed in
the perceptual model by assigning every element in the opaque state a
surface orientation value in three dimensions, and by defining a
dynamic interaction between opaque state units to fill in the region
between them with a continuous surface percept. In order to express
this process as an emergent one, the dynamics of this surface
interpolation function must be defined in terms of local field-like
forces analogous to the local forces of surface tension active at any
point in a soap bubble. Figure 7 C depicts an opaque state unit
representing a local portion of a perceived surface at a specific
three-dimensional location and with a specific surface
orientation. The planar field of this element, depicted somewhat like
a planetary ring in figure 7 C, represents both the perceived surface
encoded by this element and a field-like influence
propagated by that element to adjacent units. This planar field fades
smoothly with distance from the center with a Gaussian function. The
effect of this field is to recruit adjacent elements within that field
of influence to take on a similar state, i.e. to induce transparent
state units to switch to the opaque state, and opaque state units to
rotate towards a similar surface orientation value. The final state
and orientation taken on by any element is computed as a spatial
average or weighted sum of the states of neighboring units as
communicated through their planar fields of influence, i.e. with the
greatest influence from nearby opaque elements in the matrix. The
influence is reciprocal between neighboring elements, thereby defining
a circular relation as suggested by the principle of emergence. In
order to prevent runaway positive feedback and uncontrolled
propagation of surface signal, an inhibitory dynamic is also
incorporated in order to suppress surface formation out of the plane
of the emergent surface, by endowing the local field of each unit with
an inhibitory field in order to suppress the opaque state in
neighboring elements in all directions outside of the plane of its
local field. The mathematical specification of the local field of
influence between opaque state units is outlined in greater detail in
the appendix. However the intent of the model is
expressed more naturally in the global properties as described here,
so the details of the local field influences are presented as only one
possible implementation of the concept, provided in order to ground
this somewhat nebulous idea in more concrete terms.
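
One possible form for such a local field is sketched below. The equations of the appendix are not reproduced here, so the functional form is an assumption: a Gaussian falloff with distance, multiplied by a term that is positive near the plane of the element and negative away from it. The function name planar_field and the parameters sigma and thickness are hypothetical.

```python
import math

def planar_field(offset, normal, sigma=2.0, thickness=0.5):
    """Signed influence of an opaque element on a neighbor at `offset`.

    `normal` is the unit surface normal of the source element.
    Positive values recruit the neighbor toward the opaque state;
    negative values suppress it. The functional form is assumed.
    """
    r2 = sum(c * c for c in offset)
    # Signed distance of the neighbor from the plane of the element.
    out_of_plane = sum(o * n for o, n in zip(offset, normal))
    falloff = math.exp(-r2 / (2.0 * sigma ** 2))
    in_plane = math.exp(-out_of_plane ** 2 / (2.0 * thickness ** 2))
    # Ranges from +falloff (in the plane) to -falloff (far out of plane).
    return falloff * (2.0 * in_plane - 1.0)

normal = (0.0, 0.0, 1.0)
excite = planar_field((1.0, 0.0, 0.0), normal)   # neighbor within the plane
inhibit = planar_field((0.0, 0.0, 1.0), normal)  # neighbor along the normal
```

With these assumed parameters a neighbor lying in the plane of the element receives a positive influence, a neighbor displaced along the normal receives a negative one, and both fade with distance.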

The global properties of the system should be such that if the
elements in the matrix were initially assigned randomly to either the
transparent or opaque state, with random surface orientations for
opaque-state units, the mutual field-like influences would tend to
amplify any group of opaque-state elements whose planar fields
happened to be aligned in an approximate plane, and as that plane of
active units feeds back on its own activation, the orientations of its
elements would conform ever closer to that of the plane, while
elements outside of the plane would be suppressed to the transparent
state. This would result in the emergence of a single plane of
opaque-state units as a dynamic global pattern of activation embedded
in the volume of the matrix, and that surface would be able to flex
and stretch much like a bubble surface, although unlike a real bubble,
this surface is defined not as a physical membrane, but as a dynamic
sheet of active elements embedded in the matrix. This volumetric
surface interpolation function will now serve as the backdrop for an
emergent reconstruction of the spatial percept around a
three-dimensional skeleton or framework constructed on the basis of
the visual edges in the scene.
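
The bubble-like behavior of the interpolated surface can be illustrated with a deliberately simplified stand-in. Instead of the volumetric matrix with excitatory and inhibitory fields, the sketch below uses a two-dimensional depth map whose border is pinned to a slanted frame, and relaxes the interior by repeated local averaging (Jacobi relaxation), as a soap film would settle on a wire frame. This substitution, the grid size, and the name relax_surface are assumptions made for illustration.

```python
def relax_surface(depth, fixed, sweeps=500):
    """Repeatedly replace each free element's depth by the mean of its
    four neighbors; elements flagged in `fixed` keep their depth."""
    rows, cols = len(depth), len(depth[0])
    for _ in range(sweeps):
        new = [row[:] for row in depth]
        for y in range(1, rows - 1):
            for x in range(1, cols - 1):
                if not fixed[y][x]:
                    new[y][x] = (depth[y - 1][x] + depth[y + 1][x] +
                                 depth[y][x - 1] + depth[y][x + 1]) / 4.0
        depth = new
    return depth

# Frame: the border of a 7 x 7 patch slants in depth from 0 at the left
# edge to 6 at the right edge; the interior starts at depth 0.
N = 7
fixed = [[y in (0, N - 1) or x in (0, N - 1) for x in range(N)]
         for y in range(N)]
depth = [[float(x) if fixed[y][x] else 0.0 for x in range(N)]
         for y in range(N)]
surface = relax_surface(depth, fixed)
```

Purely local interactions suffice here to fill the featureless interior with a continuous slanted plane spanning the frame, which is the global outcome the surface interpolation function is meant to produce.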

8.3 Local Effects of a Visual Edge

A visual edge can be perceived as an object in its own right, like
a thin rod or wire surrounded by empty space. More often however an
edge is seen as a discontinuity in a surface, either as a corner or
fold, or perhaps as an occlusion edge like the outer perimeter of a
flat figure viewed against a more distant background. The interaction
between a visual edge and a perceived surface can therefore be modeled
as follows. The two-dimensional edge from the retinal stimulus
projects a different kind of field of influence into the depth
dimension of the volumetric matrix, as suggested by the gray shading
in figure 7 A, to represent the three-dimensional locus of all
possible edges that project to the two-dimensional edge in the
image. In other words, this field expresses the inverse optics
probability field or extrinsic constraint due to a single visual
edge. Wherever this field intersects opaque-state elements in the
volume of the matrix, it changes the shape of their local fields of
influence from a coplanar interaction to an orthogonal, or corner
interaction as suggested by the local force field in figure 7 D. The
corner of this field should align parallel to the visual edge, but
otherwise remain unconstrained in orientation except by interactions
with adjacent opaque units. Visual edges can also denote occlusion,
and so opaque-state elements can also exist in an occlusion state,
with a coplanarity interaction in one direction only, as suggested by
the occlusion field in figure 7 E. Therefore, in the presence of a
single visual edge, a local element in the opaque state should have an
equal probability of changing into the orthogonality or occlusion
state, with the orthogonal or occlusion edge aligned parallel to the
inducing visual edge. Elements in the orthogonal state tend to promote
orthogonality in adjacent elements along the perceived corner, while
elements in the occlusion state promote occlusion along that edge. In
other words, an edge will tend to be perceived as a corner or
occlusion percept along its entire length, although the whole edge may
change state back and forth as a unit in a multistable manner. The
appendix presents a more detailed mathematical description of how
these orthogonality and occlusion fields might be
defined. The presence of the visual edge in figure 7 A therefore tends
to crease or break the perceived surface into one of the different
possible configurations shown in figure 8 A through D. The final
configuration selected by the system would depend not only on the
local image region depicted in figure 8, but also on forces from
adjacent regions of the image, in order to fuse the orthogonal or
occlusion state elements seamlessly into nearby coplanar surface
percepts.
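
The local effect of an edge can be sketched as follows, under simplifying assumptions: the edge field is taken as a parallel projection of the edge pixels through every depth of the matrix, opaque elements are stored in a dictionary keyed by position, and the whole edge adopts a single state drawn with equal probability. The name apply_edge and the state labels are hypothetical.

```python
import random

COPLANAR, ORTHOGONAL, OCCLUSION = "coplanar", "orthogonal", "occlusion"

def apply_edge(states, edge_pixels, rng):
    """states maps (x, y, z) -> state for opaque elements only.
    edge_pixels is a set of (x, y) image locations on the visual edge.
    Every opaque element lying behind an edge pixel, at any depth,
    switches to one shared state drawn with equal probability."""
    new_state = rng.choice([ORTHOGONAL, OCCLUSION])
    hit = [p for p in states if (p[0], p[1]) in edge_pixels]
    for p in hit:
        states[p] = new_state
    return hit, new_state

# An opaque frontal plane at depth 3, crossed by a vertical edge at x = 4.
states = {(x, y, 3): COPLANAR for x in range(8) for y in range(8)}
edge = {(4, y) for y in range(8)}
hit, chosen = apply_edge(states, edge, random.Random(0))
```

Calling apply_edge again with a fresh draw would flip the whole edge as a unit, which is one simple way to express the multistability described above; the influence of adjacent image regions on the choice is not modeled in this sketch.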

Figure 8

A through D: Several possible stable states of the Gestalt Bubble
model in response to a single visual edge.

8.4 Global Effects of Configurations of Edges

Visual illusions like the Kanizsa figure shown in figure 4 A suggest
that edges in a stimulus that are in a collinear configuration tend to
link up in perceptual space to define a larger global edge connecting
the local edges. This kind of collinear boundary completion is
expressed in this model as a physical process analogous to the
propagation of a crack or fold in a physical medium. A visual edge
which fades gradually produces a crease in the perceptual medium that
tends to propagate outward beyond the edge as suggested in figure 9
A. If two such edges are found in a collinear configuration, the
perceptual surface will tend to crease or fold between them as
suggested in figure 9 B. This tendency is accentuated if additional
evidence from adjacent regions supports this configuration. This can be
seen in figure 9 C where fading horizontal lines are seen to link up
across the figure to create a percept of a folded surface in depth,
which would otherwise appear as a regular hexagon, as seen in figure 9
D.

Figure 9

A: Boundary completion in the Gestalt Bubble model: A single line
ending creates a crease in the perceptual surface. B: Two line
endings generate a crease joining them. C: A regular hexagon figure
transforms into D: a percept of a folded surface in depth, with the
addition of suggestive lines, with the assistance of a global gestalt
that is consistent with that perceptual interpretation.

Gestalt theory emphasized the significance of closure as a prominent
factor in perceptual segmentation, since an enclosed contour is seen
to promote a figure / ground segregation (Koffka 1935 p. 178). For
example an outline square tends to be seen as a square surface in
front of a background surface that is complete and continuous behind
the square, as suggested in the perceptual model depicted in figure 10
A. The problem is that closure is a "gestaltqualität", a quality
defined by a global configuration that is difficult to specify in
terms of any local featural requirements, especially in the case of
irregular or fragmented contours as seen in figure 10 B. In this model
an enclosed contour breaks away a piece of the perceptual surface,
completing the background amodally behind the occluding foreground
figure. In the presence of irregular or fragmented edges the influences
of the individual edge fragments act collectively to break the
perceptual surface along that contour as suggested in figure 10 C, like
the breaking of a physical surface that is weakened along an irregular
line of cracks or holes. The final scission of figure from ground is
therefore driven not so much by the exact path of the individual
irregular edges, as it is by the global configuration of the emergent
gestalt.

Figure 10

A: The perception of closure and figure / ground segregation are
explained in the Gestalt Bubble model exactly as perceived, in this
case as a foreground square in front of a background surface that
completes behind the square. B: Even irregular and fragmented
surfaces produce a figure / ground segregation. C: The perceived
boundary of the fragmented figure follows the global emergent gestalt
rather than the exact path of individual edges.

8.5 Vertices and Intersections

In the case of vertices or intersections between visual edges, the
different edges interact with one another favoring the percept of a
single vertex at that point. For example the three edges defining the
three-way "Y" vertex shown in figure 6 C promote the percept of a
single three-dimensional corner, whose depth profile depends on
whether the corner is perceived as convex or concave. In the case of
figure 6 A, the cubical percept constrains the central "Y" vertex as a
convex rather than a concave trihedral percept. I propose that this
dynamic behavior can be implemented using the same kinds of local
field-forces described in the appendix to promote
mutually orthogonal completion in three dimensions, wherever visual
edges meet at an angle in two dimensions. Figure 11 A depicts the
three-dimensional influence of the two-dimensional Y-vertex when
projected on the front face of the volumetric matrix. Each plane of
this three-planed structure promotes the emergence of a corner or
occlusion percept at some depth within that plane. But the effects due
to these individual edges are not independent. Consider first, for
example, the vertical edge projecting from the bottom of the vertex. By
itself, this edge might produce a folded percept as suggested in
figure 11 B, which could occur through a range of depths, and a
variety of orientations in depth, and in concave or convex form. But
the two angled planes of this percept each intersect the other two
fields of influence due to the other two edges of the stimulus, as
suggested in figure 11 B, thus favoring the emergence of those edges'
perceptual folds at that same depth, resulting in a single trihedral
percept at some depth in the volumetric matrix, as suggested in figure
11 C. Any dimension of this percept that is not explicitly specified
or constrained by the visual input, remains unconstrained. In other
words, the trihedral percept is embedded in the volumetric matrix in
such a way that its three component corner percepts are free to slide
inward or outward in depth, to rotate through a small range of angles,
and to flip in bistable manner between a convex and concave trihedral
configuration. The model now expresses the multistability of the
rod-and-rail analogy shown in figure 6 D, but in a more generalized
form that is no longer hard-wired to the Y-vertex input shown in
figure 6 C, but can accommodate any arbitrary configuration of lines
in the input image. A local visual feature like an isolated Y-vertex
generally exhibits a larger number of stable states, whereas in the
context of adjacent features the number of stable solutions is often
diminished. This explains why the cubical percept of figure 6 A is
stable, while its central Y-vertex alone as shown in figure 6 C is
bistable. The fundamental multistability of figure 6 A can be revealed
by the addition of a different spatial context, as depicted in figure
11 D.

Figure 11

A: The three-dimensional field of influence due to a two-dimensional
Y-vertex projected into the depth dimension of the volumetric
matrix. B: Each field of influence, for example the one due to the
vertical edge, stimulates a folded surface percept. The folded
surface intersects the other fields of influence due to the other two
edges, thereby tending to produce a single corner
percept. C: One of many possible emergent surface percepts in
response to that stimulus, in the form of a convex trihedral surface
percept. D: The percept can also be of a concave trihedral corner, as
seen sometimes at the center in this bistable figure.

8.6 Perspective Cues

Perspective cues offer another example of a computation that is
inordinately complicated in most models. However in a fully reified
spatial model perspective can be computed relatively easily with only
a small change in the geometry of the model. Figure 12 A shows a
trapezoid stimulus, which has a tendency to be perceived in depth,
i.e. the shorter top side tends to be perceived as being the same
length as the longer base, but apparently diminished by
perspective. Arnheim (1969) suggests a simple distortion to the
volumetric model to account for this phenomenon, which can be
reformulated as follows. The height and width of the volumetric matrix
are diminished as a function of depth, as suggested in figure 12 B,
transforming the block shape into a truncated pyramid that tapers in
depth. The vertical and horizontal dimensions represented by that
space however are not diminished, in other words, the larger front
face and the smaller rear face of the volumetric structure represent
equal areas in perceived space by means of unequal areas in representational
space, as suggested by the converging grid lines in the figure. All of
the spatial interactions described above, for example the collinear
propagation of corner and occlusion percepts, would be similarly
distorted in this space. Even the angular measure of orthogonality is
distorted somewhat by this transformation. For example the perceived
cube depicted in the solid volume of figure 12 B is metrically
shrunken in height and width as a function of depth, but since this
shrinking is in the same proportion as the shrinking of the space
itself, the depicted irregular cube represents a percept of a regular
cube with equal sides and orthogonal faces. The propagation of the
field of influence in depth due to a two-dimensional visual input on
the other hand does not shrink with depth. A projection of the
trapezoid of figure 12 A would occur in this model as depicted in
figure 12 C, projecting the trapezoidal form backward in parallel,
independent of the convergence of the space around it. The shaded
surfaces in figure 12 C therefore represent the locus of all possible
spatial interpretations of the two-dimensional trapezoid stimulus of
figure 12 A, or the extrinsic constraints for the spatial percept due
to this stimulus. For example one possible perceptual interpretation
is of a trapezoid parallel to the plane of the page, which can be
perceived to be either nearer or farther in depth, but since the size
scale shrinks as a function of depth, the percept will be experienced
as larger in absolute size (as measured against the shrunken spatial
scale) when perceived as farther away, and as smaller in absolute size
(as measured against the expanded scale) when perceived to be closer
in depth. This corresponds to the phenomenon known as Emmert's Law
(Coren et al. 1994), whereby a retinal after-image appears larger when
viewed against a distant background than when viewed against a nearer
background. Now there are also an infinite number of alternative
perceptual interpretations of the trapezoidal stimulus, some of which
are depicted by the dark shaded lines of figure 12 D. Most of these
alternative percepts are geometrically irregular, representing figures
with unequal sides and odd angles. But of all these possibilities,
there is one special case, depicted in black lines in figure 12 D, in
which the convergence of the sides of the perceived form happens to
coincide exactly with the convergence of the space itself. In other
words, this particular percept represents a regular rectangle viewed
in perspective, with parallel sides and right angled corners, whose
nearer (bottom) and farther (top) horizontal edges are the same length
in the distorted perceptual space. While this rectangular percept
represents the most stable interpretation, other possible
interpretations might be suggested by different contexts. The most
significant feature of this concept of perceptual processing is that
the result of the computation is expressed not in the form of abstract
variables encoding the depth and slope of the perceived rectangle, but
in the form of an explicit three-dimensional replica of the surface
as it is perceived to exist in the world.
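
The scaling argument above can be illustrated with a minimal sketch. The source gives no formula for how the space tapers, so the linear `scale` function, the `taper` parameter, the depth range of 0 to 1, and all function names below are assumptions made purely for illustration.

```python
def scale(depth, taper=0.5):
    """Representational size of one perceived unit at a given depth.

    Assumption: a linear taper, 1.0 at the front face (depth 0) and
    1.0 - taper at the rear face (depth 1). The source gives no formula.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must lie in [0, 1]")
    return 1.0 - taper * depth


def perceived_size(representational_size, depth, taper=0.5):
    """Size as measured against the local (shrunken) spatial scale."""
    return representational_size / scale(depth, taper)


def depth_for_equal_width(near_width, far_width, near_depth=0.0, taper=0.5):
    """Depth at which the far edge is perceived as wide as the near edge."""
    target = perceived_size(near_width, near_depth, taper)
    # Solve far_width / (1 - taper * z) == target for z.
    return (1.0 - far_width / target) / taper


# Emmert's Law: the same representational extent is perceived as larger
# when it is located farther away in depth.
near = perceived_size(1.0, depth=0.2)
far = perceived_size(1.0, depth=0.8)

# Trapezoid with a bottom edge of 1.0 at the front and a top edge of 0.75:
# the depth of the top edge at which the figure reads as a regular rectangle.
z_top = depth_for_equal_width(near_width=1.0, far_width=0.75)
```

Under these assumptions the top edge of the trapezoid must lie at the depth where the space has shrunk by the same ratio as the edge itself, which is the special case described above in which the convergence of the figure matches the convergence of the space.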

Figure 12

A: A trapezoidal stimulus that tends to be perceived as a rectangle
viewed in perspective. B: The perspective modified spatial
representation whose dimensions are shrunken in height and breadth as a
function of depth. C: The parallel projection of a field of influence
into depth of the two-dimensional trapezoidal stimulus. D: Several
possible perceptual interpretations of the trapezoidal stimulus, one
of which (depicted in black outline) represents a regular rectangle
viewed in perspective, because the convergence of its sides exactly
matches the convergence of the space itself.

This cosine function allows the coplanar influence to propagate to
near-coplanar orientations, thereby allowing surface completion to
occur around smoothly curving surfaces. The tolerance to such
curvature can also be varied parametrically by raising the cosine
function to a positive power Q, as shown in Equation 1. So the
in-plane stiffness of the coplanarity constraint is adjusted by
parameter P, while the angular stiffness is adjusted by
parameter Q. The absolute value on the cosine function in
Equation 1 allows interaction between elements when $\theta$ is between
$\pi/2$ and $\pi$.
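
The orientation term described above can be sketched as follows. The function name `orientation_weight` and the default value of Q are assumptions for illustration. The absolute value is taken before raising to the power Q, which agrees with the absolute value of the powered cosine for integer Q and keeps the result real for non-integer Q.

```python
import math


def orientation_weight(theta, Q=2.0):
    """Orientation term |cos(theta)| ** Q (theta in radians).

    A larger Q makes the term fall off faster away from theta = 0,
    i.e. a stiffer angular constraint.
    """
    return abs(math.cos(theta)) ** Q
```

With this form the weight is maximal for parallel normals, vanishes for perpendicular normals, and recovers for anti-parallel normals because of the absolute value.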

Figure 18

Orientation of the field of influence between one element and another.
For an element located at polar coordinates $(r, \theta)$, the influence varies as a cosine function of
$\theta$, the angle between the normal vectors of the two interacting
elements.

The Occlusion Field

The orthogonality and occlusion fields have one less dimension of
symmetry than the coplanarity field, and therefore they are defined
with reference to two vectors through each element at right angles to
each other, as shown in Figure 19 A. For the orthogonality field,
these vectors represent the surface normals to the two orthogonal
planes of the corner, while for the occlusion field one vector is a
surface normal, and the other vector points within that plane in a
direction orthogonal to the occlusion edge. The occlusion field G
around the local element is defined in polar coordinates from these
two vector directions, using the angles $\alpha$ and $\beta$ respectively, as shown
in Figure 19 A. The plane of the first surface is defined as for the
coplanarity field, with the equation
$G_{\alpha\beta r} = e^{-r^2} \sin(\alpha)^P$.
For the occlusion field this planar function should be split in two,
as shown in Figure 19 B to produce a positive and a negative half, so
that this field will promote surface completion in one direction only,
and will actually suppress surface completion in the negative half of
the field. This can be achieved by multiplying the above equation by
the sign (plus or minus, designated by the function sgn()) of a
cosine on the orthogonal vector, i.e.
$G_{\alpha\beta r} = e^{-r^2} \sin(\alpha)^P \,\mathrm{sgn}(\cos(\beta))$.
Because of the negative half-field in this function, there is no need
to normalize the equation. However the oriented component of the field
can be added as before, resulting in the equation

$$G_{\alpha\beta r\theta} = e^{-r^2} \left[\sin(\alpha)^P \,\mathrm{sgn}(\cos(\beta))\right] \left|\cos(\theta)^Q\right|$$

(EQ 2)

Again, the maximal influence will be experienced when the two
elements are parallel in orientation, i.e. when $\theta = 0$. As before, the orientation cosine function
is raised to the positive power Q, to allow parametric
adjustment of the stiffness of the coplanarity constraint.
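
Equation 2 can be evaluated directly, as in the following minimal sketch. The function names, the default values of P and Q, and the restriction of the first angle to the range 0 to pi (where its sine is non-negative) are assumptions for illustration, not part of the model as published.

```python
import math


def sgn(x):
    """Sign of x: -1, 0 or +1."""
    return (x > 0) - (x < 0)


def occlusion_field(alpha, beta, r, theta, P=4.0, Q=2.0):
    """Occlusion field of Equation 2 (all angles in radians).

    alpha is assumed to lie in [0, pi], so that sin(alpha) >= 0.
    """
    radial = math.exp(-r * r)                            # decay with distance
    planar = math.sin(alpha) ** P * sgn(math.cos(beta))  # signed half-planes
    oriented = abs(math.cos(theta)) ** Q                 # orientation term
    return radial * planar * oriented
```

The sign function is what splits the plane: a neighbor on one side of the occlusion edge receives a positive influence and a neighbor on the other side receives an equal negative one.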

Figure 19

The Orthogonality Field

The orthogonality field H can be developed in a similar manner,
beginning with the planar function divided into positive and negative
half-fields, i.e. with the equation
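$H_{\alpha\beta r} = e^{-r^2} \sin(\alpha)^P \,\mathrm{sgn}(\cos(\beta))$,
but then adding another similar plane from the orthogonal surface
normal, producing the equation
$H_{\alpha\beta r} = e^{-r^2} \left[\sin(\alpha)^P \,\mathrm{sgn}(\cos(\beta)) + \sin(\beta)^P \,\mathrm{sgn}(\cos(\alpha))\right]$.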
This produces two orthogonal planes, each with a negative half-field,
as shown schematically in Figure 19 C. Finally, this equation must be
modified to add the oriented component to the field, represented by
the vector $\theta$, such that the maximal influence
on an adjacent element will be experienced when that element is either
within one positive half-plane and at one orientation, or is within
the other positive half-plane and at the orthogonal orientation. The
final equation for the orthogonality field therefore is defined by

Edge Consistency and Inconsistency Constraints

There is another aspect of the field-like interaction between
elements that remains to be defined. Both the orthogonal and the
occlusion states are promoted by appropriately aligned neighboring
elements in the coplanar state. Orthogonal and occlusion elements
should also feel the influence of neighboring elements in the
orthogonal and occlusion states, because a single edge should have a
tendency to become either an orthogonal corner percept, or an
occlusion edge percept along its entire length. Therefore orthogonal
or occlusion elements should promote like-states, and inhibit
unlike-states in adjacent elements along the same corner or edge. The
interaction between like-state elements along the edge will be called
the edge-consistency constraint, and the corresponding field of
influence will be designated E, while the complementary interaction
between unlike-state elements along the edge is called the
edge-inconsistency constraint, whose corresponding edge-inconsistency
field will be designated I. These interactions are depicted
schematically in Figure 20.

Figure 20

A and B: Edge consistency constraint as an excitatory influence
between like-state elements along a corner or edge percept. C and D:
Edge inconsistency constraint as an inhibitory influence between
unlike-state elements along a corner or edge percept. E: The
direction along the edge expressed as the intersection of the
orthogonal planes defined by the sine functions on the two orthogonal
vectors.

The spatial direction along the edge can be defined by the product
of the two sine functions $\sin(\alpha) \sin(\beta)$ defining the orthogonal planes, denoting the
zone of intersection of those two orthogonal planes, as suggested in
Figure 20 E. Again, this field can be sharpened by raising these sine
functions to a positive power P, and localized by applying the
exponential decay function. The edge consistency constraint E
therefore has the form
$E_{\alpha\beta r} = e^{-r^2} \left[\sin(\alpha)^P \sin(\beta)^P\right]$.
As for the orientation of the edge-consistency field, this will
depend now on two angles, $\theta$ and $\phi$, representing the orientations of the two
orthogonal vectors of the adjacent orthogonal or occlusion elements
relative to the two normal vectors respectively. Both the
edge-consistency and the edge-inconsistency fields, whether
excitatory between like-state elements, or inhibitory between
unlike-state elements, should peak when both pairs of reference
vectors are parallel to the normal vectors of the central element,
i.e. when $\theta$ and $\phi$
are both equal to zero. The full equation for the edge-consistency
field E would therefore be

$$E_{\alpha\beta r\theta\phi} = e^{-r^2} \left[\sin(\alpha)^P \sin(\beta)^P\right] \cos(\theta)^Q \cos(\phi)^Q$$

(EQ 4)

where this equation is applied only to like-state edge or corner
elements, while the edge-inconsistency field I would be given by

$$I_{\alpha\beta r\theta\phi} = e^{-r^2} \left[\sin(\alpha)^P \sin(\beta)^P\right] \cos(\theta)^Q \cos(\phi)^Q$$

(EQ 5)

applied only to unlike-state elements. The total influence R on an
occlusion element therefore is calculated as the sum of the influence
of neighboring coplanar, orthogonal, and occlusion state elements as
defined by

$$R_{\alpha\beta r\theta\phi} = G_{\alpha\beta r\theta\phi} + E_{\alpha\beta r\theta\phi} - I_{\alpha\beta r\theta\phi}$$

(EQ 6)

and the total influence S on an orthogonal state element is
defined by

$$S_{\alpha\beta r\theta\phi} = H_{\alpha\beta r\theta\phi} + E_{\alpha\beta r\theta\phi} - I_{\alpha\beta r\theta\phi}$$

(EQ 7)
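
The edge fields of EQ 4 and EQ 5 and their combination in EQ 6 can be sketched as follows. The function names, the state labels, the default values of P and Q, and the reading of EQ 6 as a per-neighbor rule (a coplanar neighbor contributes G, a like-state neighbor contributes +E, an unlike-state neighbor contributes -I) are assumptions made for illustration.

```python
import math


def edge_field(alpha, beta, r, theta, phi, P=4, Q=2):
    """Shared form of EQ 4 (E) and EQ 5 (I); angles in radians.

    Integer P and Q are assumed so that powers of negative sines or
    cosines stay real.
    """
    radial = math.exp(-r * r)
    along_edge = math.sin(alpha) ** P * math.sin(beta) ** P  # zone along the edge
    oriented = math.cos(theta) ** Q * math.cos(phi) ** Q     # peaks at theta = phi = 0
    return radial * along_edge * oriented


def influence_on_occlusion_element(neighbor_state, g_value, edge_value):
    """Contribution of one neighbor to R in EQ 6 (hypothetical reading).

    g_value is the occlusion field G at the neighbor; edge_value is the
    edge field (E or I, which share one form) at the neighbor.
    """
    if neighbor_state == "coplanar":
        return g_value
    if neighbor_state == "occlusion":   # like state: excitatory, +E
        return edge_value
    if neighbor_state == "orthogonal":  # unlike state: inhibitory, -I
        return -edge_value
    raise ValueError("unknown state: " + repr(neighbor_state))
```

The influence S on an orthogonal element in EQ 7 would follow the same pattern with H in place of G and the roles of the occlusion and orthogonal states exchanged.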

Influence of the Visual Input

A two-dimensional visual edge has an influence on the
three-dimensional interpretation of a scene, since an edge is
suggestive of either a corner or an occlusion at some orientation in
three dimensions whose two-dimensional projection coincides with that
visual edge. This influence however is quite different from the local
field-like influences described above, because the influence of a
visual edge should penetrate the volumetric matrix with a planar field
of influence to all depths, and should activate all local elements
within the plane of influence that are consistent with that
edge. Subsequent local interactions between those activated elements
serve to select which subset of them should finally represent the
three-dimensional percept corresponding to the two-dimensional
image. For example, a vertical edge as shown in Figure 21 A would
project a vertical plane of influence, as suggested by the light
shading in Figure 21 A, into the depth dimension of the volumetric
matrix, where it stimulates the orthogonal and occlusion states which
are consistent with that visual edge. For example it would stimulate
corner and occlusion states at all angles about a vertical axis, as
shown in Figure 21 A, where the circular disks represent different
orientations of the positive half-fields of either corner or occlusion
fields. However a vertical edge would also be consistent with corners
or occlusions about axes tilted relative to the image plane but within
the plane of influence, for example about the axes depicted in Figure
21 B. The same kind of stimulation would occur at every point within
the plane of influence of the edge, although only one point is
depicted in the figure. When all elements consistent with this
vertical edge have been stimulated, the local field-like interactions
between adjacent stimulated elements will tend to select one edge or
corner at some depth and at some tilt, thereby suppressing alternative
edge percepts at that two-dimensional location at different depths and
at different tilts. At equilibrium, some arbitrary edge or corner
percept will emerge within the plane of influence as suggested in
Figure 21 C, which depicts only one such possible percept, while edge
consistency interactions will promote like-state elements along that
edge, producing a single emergent percept consistent with the visual
edge. In the absence of additional influences, for example in the
isolated local case depicted in Figure 21 C, the actual edge that
emerges will be unstable, i.e. it could appear anywhere within the
plane of influence of the visual edge through a range of tilt angles,
and could appear as either an occlusion or a corner edge. However when
it does appear, it propagates its own field-like influence into the
volumetric matrix. In this example the corner percept would propagate
a planar percept of two orthogonal surfaces that will expand into the
volume of the matrix, as suggested by the arrows in Figure 21 C. The
final percept therefore will be influenced by the global pattern of
activity, i.e. the final percept will construct a self-consistent
perceptual whole, whose individual parts reinforce each other by
mutual activation by way of the local interaction fields, although
that percept would remain unstable in all unconstrained
dimensions. For example the corner percept depicted in Figure 21 C
would snake back and forth unstably within the plane of influence,
rotate back and forth about its axis through a small angle, and flip
alternately between the corner and occlusion states, unless the
percept is stabilized by other features at more remote locations in
the matrix.
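
The first stage of this process, the projection of a two-dimensional edge into a plane of influence through all depths, can be sketched as follows. The nested-list volume, the function names, and the two state labels are assumptions for illustration. The sketch omits the range of orientations and tilts at each location and the subsequent local interactions that select a single percept.

```python
STATES = ("orthogonal", "occlusion")


def project_into_depth(image, n_depths):
    """Copy a 2-D edge map unchanged to every depth plane.

    This is a parallel projection: the plane of influence does not
    shrink with depth.
    """
    return [[row[:] for row in image] for _ in range(n_depths)]


def candidate_elements(image, n_depths):
    """List every (x, y, depth, state) stimulated by the visual edge."""
    candidates = []
    for z, plane in enumerate(project_into_depth(image, n_depths)):
        for y, row in enumerate(plane):
            for x, value in enumerate(row):
                if value:  # an edge pixel stimulates both edge-like states
                    for state in STATES:
                        candidates.append((x, y, z, state))
    return candidates


# A vertical edge in a 3-pixel-wide, 2-pixel-high image.
vertical_edge = [[0, 1, 0],
                 [0, 1, 0]]
stimulated = candidate_elements(vertical_edge, n_depths=4)
```

Every candidate shares the image column of the edge and differs only in height, depth, and state, which is the ambiguity that the local field interactions are then left to resolve.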

Figure 21

The influence of a visual edge, in this case a vertical edge, is to A:
stimulate local elements in the occlusion or corner percept states at
orientations about a vertical axis, or B: about a tilted axis within
the plane of influence of the edge. At equilibrium C: a single
unified percept emerges, in this case of a perceived corner at some
depth and tilt in the volume of the matrix.