Introduction

In the visual system, attending to important objects in the visual field relies on
the transfer of top-down, object-based task information to the spatially organised
areas of cortex. How this transfer occurs, and how the information then influences
the dorsal stream to redirect gaze, is not well understood. Current models of the
ventral stream mostly focus on feed-forward mechanisms, and existing feedback
models do not appear to address object-space binding in a comprehensive and
plausible manner.

Methods

We investigated these questions using a modeling framework comprising a bidirectional
ventral stream object recognition hierarchy, from primary visual cortex (V1) up to
anterior inferior temporal cortex (AIT), and a model of the dorsal stream to the frontal
eye fields (FEF) with our previously developed oculomotor system [1]. Selection is
performed in both the object-based mapping of AIT [2] and the spatial mapping of FEF [3]
by basal ganglia loops [4]. The ventral stream model consists of a hierarchy of
increasingly spatially invariant cortical areas linked by both feed-forward excitatory
and feedback connections.
Within each receptive field, candidate representations compete so that the strongest,
and thus most likely, one comes to represent that region; this competition can be biased
by feedback from higher visual areas. Three feedback attention mechanisms were tested:
additive feedback, shunting (multiplicative) feedback and a shunting "gating" of feedback
by feed-forward activity. The model was tested using a simple visual world (colored "flags")
that nevertheless challenged all the main competencies being investigated. Performance
was measured by (i) eliciting saccadic "behavior" in simulated visual search with
different numbers of distractors, and (ii) target segmentation in cluttered scenes
within a fixed time window.
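
The difference between the three feedback schemes can be illustrated with a toy sketch. The code below is not the model used in this study: the function names `combine` and `winner`, the `gain` and `threshold` parameters, and in particular the thresholded form chosen for "gating" are assumptions made purely for illustration. It shows one property that follows directly from the arithmetic: additive feedback can drive a unit that receives no feed-forward input, whereas the multiplicative and gated forms cannot.

```python
def combine(ff, fb, mode, gain=1.0, threshold=0.1):
    """Combine feed-forward drive `ff` with top-down feedback `fb` for one unit.

    All three forms are illustrative assumptions, not the study's equations.
    """
    if mode == "additive":
        # Feedback adds drive regardless of the feed-forward input.
        return ff + gain * fb
    if mode == "shunting":
        # Feedback multiplicatively scales the feed-forward drive.
        return ff * (1.0 + gain * fb)
    if mode == "gating":
        # Hypothetical gate: feedback passes only where feed-forward
        # drive exceeds a threshold.
        gate = 1.0 if ff > threshold else 0.0
        return ff + gain * fb * gate
    raise ValueError(f"unknown mode: {mode!r}")


def winner(ff_drives, fb_signals, mode):
    """Index of the strongest unit in one receptive field after feedback."""
    responses = [combine(ff, fb, mode) for ff, fb in zip(ff_drives, fb_signals)]
    return max(range(len(responses)), key=responses.__getitem__)


# Unit 0 has feed-forward support; unit 1 has none but receives feedback.
ff_drives = [0.6, 0.0]
fb_signals = [0.0, 1.0]
for mode in ("additive", "shunting", "gating"):
    print(mode, winner(ff_drives, fb_signals, mode))
```

With these placeholder values, only the additive form lets the unsupported unit win the competition; the other two leave the feed-forward winner in place.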

Results

In the target segmentation task, the additive feedback model consistently failed to
bind the AIT representation of the object to the correct location in the visual field.
The shunting model segmented 58% of scenes, while the gating model was the most
successful, segmenting 83% (Figure 1). We then took the most successful (gating) model
and challenged it with a conjunction
visual search task. Here, by simulating models that were either trained on or naïve
to the target stimulus, we showed that subsequent learning of a combined representation
of an untrained target stimulus can explain the experimentally observed decrease in
the slope of reaction time against number of distractors for that target (Figure 2) [5].
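
For readers unfamiliar with the measure, the search slope is the gradient of a straight-line fit of reaction time against number of distractors. The sketch below computes it by ordinary least squares. The function name `search_slope` is hypothetical, and the input values are synthetic points placed on straight lines by construction; they are not results from this study and their units are arbitrary.

```python
def search_slope(set_sizes, reaction_times):
    """Least-squares slope of reaction time against number of distractors."""
    n = len(set_sizes)
    if n < 2 or n != len(reaction_times):
        raise ValueError("need two or more paired observations")
    mean_x = sum(set_sizes) / n
    mean_y = sum(reaction_times) / n
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    if sxx == 0:
        raise ValueError("set sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(set_sizes, reaction_times))
    return sxy / sxx


# Synthetic placeholder values, NOT data from the study.
set_sizes = [2, 4, 8, 16]
steep_rt = [300 + 20 * n for n in set_sizes]    # steeper line
shallow_rt = [300 + 5 * n for n in set_sizes]   # shallower line

print(search_slope(set_sizes, steep_rt))
print(search_slope(set_sizes, shallow_rt))
```

A shallower slope means that each additional distractor adds less to the reaction time.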