Abstract

The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. The present study examines the role of IT neurons in combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
The monkeys recognized as the target only those 2-D views that were close to the familiar training view (Logothetis et al., Current Biology, 1994). Recognition became increasingly difficult as the stimulus was rotated away from the experienced viewpoint, and the animals failed to recognize views farther than about ±40 deg from the training view.
When the animals were trained with as few as three views of an object, 120 deg apart, they could often recognize all views resulting from rotations around the same axis. Such performance suggests that view-invariant recognition of familiar objects by both humans and nonhuman primates involves perceptual learning and may be accomplished by a viewer-centered system that interpolates between only a small number of stored views.
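The idea of interpolating between a few stored views can be illustrated with a minimal sketch. It is not the model used in the study: the Gaussian tuning shape, the tuning width `sigma_deg = 30`, the summation rule, and all function names are assumptions chosen only for illustration. Each stored view is represented by one view-tuned unit, and the recognition score for a test view is the summed response of those units.

```python
import math


def angular_difference(a_deg, b_deg):
    """Smallest absolute rotation (in degrees) between two views on one axis."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def unit_response(view_deg, preferred_deg, sigma_deg=30.0):
    """Response of one hypothetical view-tuned unit.

    Gaussian tuning and the 30 deg width are illustrative assumptions,
    not values reported in the abstract.
    """
    d = angular_difference(view_deg, preferred_deg)
    return math.exp(-(d * d) / (2.0 * sigma_deg ** 2))


def recognition_score(view_deg, stored_views_deg, sigma_deg=30.0):
    """Sum of unit responses over all stored (training) views."""
    return sum(unit_response(view_deg, s, sigma_deg) for s in stored_views_deg)


if __name__ == "__main__":
    one_view = [0.0]
    three_views = [0.0, 120.0, 240.0]  # three training views, 120 deg apart
    for test_view in range(0, 181, 30):
        print(
            test_view,
            round(recognition_score(test_view, one_view), 3),
            round(recognition_score(test_view, three_views), 3),
        )
```

With a single stored view the score falls off steadily with rotation; adding two further stored views raises the score at intermediate rotations. How closely such a scheme matches the behavioral data depends entirely on the assumed tuning width and decision rule.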
A population of IT neurons was found that responded selectively to views of recently learned objects (Logothetis et al., Current Biology, 1995). The cells discharged maximally for one object view, and their response fell off gradually as the object was rotated away from the neuron's preferred view. A systematic analysis of the response of these neurons to various parts of the view revealed that most cells were responding to a complex configuration within the view. The response of some other cells to object parts was highly nonlinear, indicating more configurational or “holistic” selectivity. No selective responses were ever encountered for views that the animal systematically failed to recognize. For a number of objects that were used extensively during the training of the animals, and for which behavioral performance was view-independent, multiple cells were found that were tuned around different views of the same object. A very small number of neurons were selective for all views of one particular object.
Our experiments show that recognition of novel three-dimensional objects is a function of the object's retinal projection. This finding supports the notion of viewer-centered object representations for the purpose of recognition. Our results resemble those obtained under comparable conditions in humans (Rock et al., JEP: Gen., 1981; Bülthoff & Edelman, PNAS, 1992). They also suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. These findings support the idea that a population of neurons -- each tuned to a different object aspect, showing a certain degree of invariance to image transformations, and selective for complex object features -- may, as an assembly, encode at least some types of complex 3-D objects for which structural decomposition is not possible or meaningful.