As we have seen in the preceding section, the pairing of RFs that
participate in a multiple-view representation of a 3D object leads to
improved invariance of the representation in the face of changes in
the object's pose. This means that the representation of a given view
may be used over a wider range of viewpoints centered around that
view. Nevertheless, the paired-RF representation merely delays the onset
of the problem posed by the viewpoint-dependent appearance of objects;
it does not solve it. Thus, a system based on view-specific
representations that starts by storing a single view of a novel object
must eventually add more views to its representation of that object.
The present section, following
[13], concentrates on the
relationship between successive views of an object that undergoes
rotation in 3D, and on the way to put this relationship to use in
forming a multiple-view representation of the object. In this, it
complements the work of
[30], who considered the utility of
a multiple-view representation, irrespective of the structure that may
be imposed on the different views stored in the system.

Figure 4: Canonical and non-canonical views of
a 3D object. Any model of recognition must account for the way
the performance of human observers who recognize such objects
depends on viewpoint.

To understand the need for building structure into a multiple-view
model of recognition, one may recall two basic findings of the
relevant psychophysical studies: the existence of canonical views both
for everyday and for novel objects (Figure 4), and the
common patterns in the dependence of recognition performance on the
viewpoint (Figure 5). The notion of a canonical view was
introduced by Palmer, Rosch and Chase, who found that certain views of
familiar objects were consistently recognized more easily and more quickly than
randomly chosen views of the same objects
[29].
Just as, in mental rotation experiments, a gradual and monotonic change in
viewpoint produces an equally gradual and monotonic change in
performance
[38,39], the error rate and the response
time for a test view of a 3D shape were found to grow monotonically
with misorientation relative to a canonical view
[44].
This dependence, however, was shown to weaken with repeated testing
[11]. The simple model described below replicates these
basic characteristics of mental rotation in recognition by imposing a
certain quasi-sequential structure on the collection of units, each of
which represents a particular view of the object.

The model, which we call NMR here (short for No Mental
Rotation), is self-organizing in that it learns to represent a 3D
object from examples
[13]. The basic operating cycle of
NMR is as follows:

Accept a view of a 3D object (the object is fixed throughout the
learning procedure);

If the new view is sufficiently different from all the views stored
in the system:

    Store the view;

    Create a (lateral) link between the newly stored view and the
    previously activated one;

Else:

    Activate the stored view that best matches the input;

    Strengthen the (lateral) link between the current and the
    previously active views.
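The operating cycle above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation described in [13]: views are encoded as feature vectors compared by Euclidean distance, and the class name NMR, the novelty `threshold`, the link `increment`, the optional `max_units` cap (mirroring the limit on recruitable units) and the fallback to the best-matching unit once that cap is reached are all hypothetical choices.

```python
import math


class NMR:
    """Minimal sketch of the NMR operating cycle (hypothetical names and parameters)."""

    def __init__(self, threshold, max_units=None, increment=1.0):
        self.threshold = threshold  # hypothetical novelty threshold on view distance
        self.max_units = max_units  # optional cap on the number of recruitable units
        self.increment = increment  # hypothetical step by which a link is strengthened
        self.views = []             # stored views, one per view-specific unit
        self.links = {}             # (i, j) with i < j -> lateral link strength
        self.active = None          # index of the previously active unit

    def _strengthen(self, i, j):
        # Create the lateral link i-j if absent, otherwise make it stronger.
        if i is None or j is None or i == j:
            return
        key = (min(i, j), max(i, j))
        self.links[key] = self.links.get(key, 0.0) + self.increment

    def present(self, view):
        """Run one operating cycle on `view`; return the index of the active unit."""
        if self.views:
            dists = [math.dist(view, v) for v in self.views]
            best = min(range(len(dists)), key=dists.__getitem__)
            novel = dists[best] > self.threshold
        else:
            best, novel = None, True
        room = self.max_units is None or len(self.views) < self.max_units
        if novel and room:
            self.views.append(tuple(view))  # store the view as a new unit
            current = len(self.views) - 1
        else:
            # Activate the best match (assumption: also done for novel views
            # once the cap on recruitable units has been reached).
            current = best
        self._strengthen(self.active, current)  # link to the previously active unit
        self.active = current
        return current


def view_at(deg):
    # Hypothetical view encoding: a point on the unit circle for rotation angle `deg`.
    r = math.radians(deg)
    return (math.cos(r), math.sin(r))


# One full rotation of the object, sampled every 10 degrees.
model = NMR(threshold=0.15)
for deg in range(0, 360, 10):
    model.present(view_at(deg))
print(len(model.views), len(model.links))  # 36 35
```

With views 10 degrees apart (a distance of about 0.174 in this encoding) and a threshold of 0.15, every view of the first rotation is novel, so the sketch recruits 36 units joined by a purely serial chain of 35 lateral links. Presenting the 0-degree view again reactivates unit 0 and links it to the last unit.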

The simple version of the algorithm, described in
[13], operates under a limitation on the total number of
units that can be recruited and has no provision for "unlearning"
a representation or freeing inactive units. That simple version is
capable, nevertheless, of reproducing the two basic empirical findings
in the study of mental rotation: the dependence of performance (as
measured by response time) on the angular distance to a canonical
view, and the disappearance of that dependence with practice
(Figure 5). Lateral connections play a crucial role in
both these traits of the model's performance.

Figure 5: "Mental rotation" and its
disappearance with practice. The time required to recognize
an object presented at a certain view depends on the
misorientation of that view relative to a canonical one. This
phenomenon resembles a similar dependence observed in experiments
involving comparison between two simultaneously presented images
[38]. Following practice or repeated testing,
the response times become essentially uniform for all tested
views.

Figure 6: A network
implementing multiple-view representation. Both the initial
semblance of "mental rotation" and its disappearance with
practice can be replicated by a model based on lateral links
between view-specific representation units in a network trained to
recognize the object (see the Emergence of ... section).

Initially, the network of lateral connections between the units
representing individual views serves as the medium over which
activation spreads from one unit to the others once a unit has been
activated by the presentation of a particular test view of the target
object (see Figure 6).
The monotonic dependence of response time on viewpoint stems from the
"serial" structure of the lateral connections, which is instilled by the
natural order of presentation of the individual views (the order in
which they appear as the object rotates).
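A toy version of this spreading-activation account makes the link between serial structure and response time concrete. The sketch assumes, hypothetically, that activation advances by one lateral link per time step regardless of link strength, and it takes the response time to be the number of steps needed for every unit to become active, a crude stand-in for the whole network reaching a certain level of activation. The function name `response_time` is illustrative only.

```python
from collections import deque


def response_time(n_units, links, start):
    """Steps until activation starting at unit `start` has reached every unit.

    Crude sketch: a unit becomes active one step after any linked neighbour,
    regardless of link strength (a hypothetical simplification).
    """
    neighbours = {i: set() for i in range(n_units)}
    for i, j in links:
        neighbours[i].add(j)
        neighbours[j].add(i)
    reached = {start: 0}  # unit -> step at which it became active
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in neighbours[u]:
            if v not in reached:
                reached[v] = reached[u] + 1
                queue.append(v)
    if len(reached) < n_units:
        return float("inf")  # some units are never activated
    return max(reached.values())


# A purely serial chain of 9 view units, as laid down by one pass of rotation.
chain = [(i, i + 1) for i in range(8)]
print([response_time(9, chain, s) for s in range(9)])  # [8, 7, 6, 5, 4, 5, 6, 7, 8]
```

On a purely serial chain the response time grows by one step per link of distance from the middle unit, which in this toy plays the role of the fastest view; the sketch does not claim that this is how the original model determines the canonical view.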

Subsequently, additional links are added to the initial "linear"
pattern, creating shortcuts that lead to a faster and more uniform
activation of the entire structure of specific-view units
(Figure 7). Because of these shortcuts, the response
time (modeled by the time it takes the entire network to reach a
certain level of activation) becomes generally shorter, and
progressively less dependent on the identity of the input view (that
is, of the locus of the initial activation of the network).
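The effect of shortcuts can be illustrated with a similar toy. Here the shortcut links are chosen by hand for illustration; in NMR they would presumably arise when non-adjacent views are presented in succession during repeated testing, because the algorithm links the current and previously active views. As a simplifying assumption, response time is again the number of link-steps needed for activation to reach every unit, and `spread_time` is a hypothetical helper name.

```python
from collections import deque


def spread_time(n_units, links, start):
    # Number of steps until activation starting at `start` reaches every unit,
    # assuming each step activates all neighbours of active units (link
    # strengths are ignored) and that the link graph is connected.
    neighbours = [[] for _ in range(n_units)]
    for i, j in links:
        neighbours[i].append(j)
        neighbours[j].append(i)
    steps = [None] * n_units
    steps[start] = 0
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in neighbours[u]:
            if steps[v] is None:
                steps[v] = steps[u] + 1
                queue.append(v)
    return max(steps)


chain = [(i, i + 1) for i in range(8)]         # serial links from one rotation
shortcuts = [(0, 4), (4, 8), (0, 8), (2, 6)]  # hypothetical links added by practice
before = [spread_time(9, chain, s) for s in range(9)]
after = [spread_time(9, chain + shortcuts, s) for s in range(9)]
print(before)  # [8, 7, 6, 5, 4, 5, 6, 7, 8]
print(after)   # [3, 3, 3, 3, 2, 3, 3, 3, 3]
```

With these four shortcuts the response times drop from between 4 and 8 steps to between 2 and 3 steps, and their dependence on the starting view nearly disappears, which is the qualitative pattern described above.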

Figure 7: The same network, with
shortcuts introduced by repeated exposure. The shortcuts
obliterate the initially sequential structure of the lateral links
in the network, leading to uniform response times for the
different views.

According to the NMR model, mental rotation is merely a byproduct of a
mechanism geared to create associations between representations of
certain well-defined entities (in the present case, object views),
provided that the appearance of these entities follows one of the laws
of association known since Aristotle (here, contiguity in time). Thus,
in the case of the NMR model, the postulation of lateral connections
contributes to the parsimony of the modeling process by reducing a
well-known but easily misunderstood phenomenon in visual psychophysics
to an equally well-known phenomenon in general cognition, namely,
associative learning, whose substrate is widely agreed to exist.