vision

ventral & dorsal vision

our computer modelling centre

OFTNAI is currently supporting a computer modelling centre within the University of Oxford Department of Experimental Psychology. Modellers at the centre are exploring various aspects of visual processing in the brain, including motion detection, face recognition in natural scenes, and the recognition of objects from novel views.

Over successive stages, the primate visual system develops neurons that respond with view, size and position invariance to objects or faces.

Our models explain how such neurons may develop their firing properties, and hence allow the visual system to recognise objects in natural environments.

the impact of our research on vision

This research has a direct bearing on understanding disorders of visual perception such as amblyopia, in which one eye suffers reduced vision due to interference during early visual development. Amblyopia is the leading cause of vision loss in people under 40 years of age. Other disorders include prosopagnosia, in which patients have difficulty recognising faces, and spatial neglect, in which patients ignore part of their visual field.

Continuous Transformation Learning - Recently, our computer simulations have revealed a powerful new algorithm, Continuous Transformation Learning, that may account for how the brain learns to recognise objects and faces from different viewpoints.
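In outline, Continuous Transformation Learning exploits the overlap between the input patterns produced by successive small transformations of a stimulus: because neighbouring views share many active inputs, the same output neuron tends to win the competition for each of them, and simple Hebbian learning then binds the whole sequence of views onto that neuron. The following is a minimal sketch of this idea in Python; the network sizes, the sweeping-bar stimulus, and the learning rate are illustrative assumptions, not the published model.

```python
import numpy as np

rng = np.random.default_rng(0)

N_IN, N_OUT = 100, 20                  # input retina size, output layer size
LEARNING_RATE = 0.1
BAR_WIDTH = 10

W = rng.random((N_OUT, N_IN))          # random initial feedforward weights
W /= np.linalg.norm(W, axis=1, keepdims=True)

def bar(pos):
    """Binary input pattern: a bar of active inputs starting at `pos`."""
    x = np.zeros(N_IN)
    x[pos:pos + BAR_WIDTH] = 1.0
    return x

positions = range(0, 11)               # 11 strongly overlapping positions
for epoch in range(10):
    # Sweep the stimulus smoothly across the retina, so that successive
    # inputs share 9 of their 10 active units.
    for pos in positions:
        x = bar(pos)
        winner = np.argmax(W @ x)               # winner-take-all competition
        W[winner] += LEARNING_RATE * x          # Hebbian update, winner only
        W[winner] /= np.linalg.norm(W[winner])  # keep weight vector normalised

# After training, the winning neuron at each position shows which output
# cell "owns" that view; overlapping views tend to share the same winner.
winners = [int(np.argmax(W @ bar(p))) for p in positions]
print(winners)
```

Because each shifted bar shares most of its active inputs with its neighbours, the neuron that wins one position is strongly biased to win the next, so far fewer distinct neurons respond across the eleven positions than chance would predict; in the limit a single neuron responds invariantly across the whole sweep, with no explicit temporal trace required.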

This discovery represents a major breakthrough in understanding the operation of the visual system, and should help to guide the treatment of visual disorders arising from developmental problems.

In addition to potential medical benefits, possible engineering applications of this research range from visual control and quality inspection in manufacturing to automated CCTV monitoring. The new Continuous Transformation Learning algorithm may help robots to operate more flexibly in real-world environments by enabling them to recognise objects from different viewpoints.

ventral vision

The ability of the visual brain to analyse and recognise objects under natural viewing conditions is unmatched by today’s computer vision systems. In order to achieve this singular ability, the primate brain develops and utilises a rich tapestry of cells that encode different kinds of visual information.

The later stages of the ventral visual pathway carry out object recognition by integrating information from the more elementary visual features represented in earlier stages.

The ventral visual pathway is thus thought to be responsible for transform-invariant visual object and face recognition in the brain. However, it remains a difficult challenge to understand exactly how these neurons develop their response properties during learning. The learning processes will depend on how the neurons interact with each other through successive layers of the ventral visual pathway as they are driven by rich visual input from natural scenes. We aim to investigate this through computer simulations that accurately model the behaviour of individual neurons, how these neurons are linked together in the brain, how the synaptic connections between cells are modified during learning, and the statistical properties of the visual input from the sensory environment.

Over the past twenty years, the Oxford laboratory has investigated a range of problems in this field using a computer model of the primate ventral visual pathway, VisNet. Like the actual visual system, this model is composed of a series of competitive networks that represent each stage of the system from the primary visual cortex (V1) to the anterior inferior temporal cortex (TE).

Architecture of the computer model of the primate ventral visual pathway VisNet. Similar to the actual visual system, this model is composed of a series of competitive networks.
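To make the architecture concrete, the sketch below builds a toy feedforward hierarchy of this kind: a stack of layers in which each unit receives connections from only a small window of the layer below, so that receptive fields grow with depth, and activity within each layer is filtered by a simple winner-take-all style competition. All layer sizes and parameters here are illustrative assumptions, not those of VisNet itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from the "retina" upward; each unit sees only a small window
# of the layer below, so receptive fields grow with depth.
SIZES = [128, 64, 32, 16, 8]
WINDOW = 8            # width of each unit's receptive field in the layer below

weights = []
for below, above in zip(SIZES[:-1], SIZES[1:]):
    W = np.zeros((above, below))
    centers = np.linspace(0, below - WINDOW, above).astype(int)
    for j, c in enumerate(centers):
        W[j, c:c + WINDOW] = rng.random(WINDOW)   # local random connectivity
    weights.append(W)

def forward(x, k=4):
    """Propagate activity upward, keeping only the k strongest units
    per layer as a crude stand-in for competitive inhibition."""
    for W in weights:
        a = W @ x
        thresh = np.sort(a)[-k]
        x = np.where(a >= thresh, a, 0.0)
    return x

out = forward(rng.random(SIZES[0]))
print(np.count_nonzero(out))
```

In the real model the competition is implemented with lateral inhibition and the connections are shaped by learning; the point of the sketch is only the converging, layered structure, through which top-layer units come to receive information from the entire input retina.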

We are currently investigating a number of further issues, such as the potential role of back-projections, self-organising maps, spiking neural networks and subliminal learning, all of which relate to visually guided learning and visual processing through successive stages of the ventral visual pathway.

Case study: conformation and spatial relations of facial features

Experimental studies have shown that neurons at an intermediate stage of the primate ventral visual pathway encode the conformation and spatial relations of facial features (Freiwald et al., 2009), while neurons in the later stages are selective to the full face (Tsao et al., 2006). In this study, we investigate how these cell firing properties may develop through visually guided learning.

Figure 1A - A hierarchical neural network model of the primate’s ventral visual pathway is trained by presenting many randomly generated faces to the hierarchical competitive neural network while a local learning rule modifies the strengths of the synaptic connections between successive layers (Tromans et al., 2011).

Figure 1B - After training, the model is found to have developed the experimentally observed cell firing properties. In particular, we have demonstrated how the primate brain learns to represent facial expression independently of facial identity as reported in Hasselmo et al. (1989).

Figure 1C - We have also shown how the visual system forms separate representations of facial features such as the eyes, nose and mouth, as well as representations of the spatial relationships between these features, as reported in single unit recording studies (Freiwald et al., 2009).

Therefore, this research makes an important contribution to understanding visual processing in the primate brain.

dorsal vision

Part of our research in vision focuses on understanding the role of the dorsal stream in the sensorimotor transformations required for visually guided actions.

Visual targets are initially encoded in a retinal reference frame. However, this information is transformed in later stages of processing into different supra-retinal coordinate frames that are better suited to guiding our behaviour. A number of neurophysiological studies in the posterior parietal cortex and premotor areas of the primate brain have reported a continuum of reference frames, ranging from eye-centred, head-centred, hand-centred and body-centred to intermediate and gain-modulated representations.

The modellers within our research centre have been producing self-organising neural network models that provide a theoretical framework to explain the development of cells that encode the location of visual targets in different reference frames.

Case study: recognising objects from novel views

One of the major challenges in computer vision is recognising objects from viewpoints that were not encountered during training. Our neural network model of the ventral visual system is able to accomplish this by first learning how elemental features in the environment transform across different viewpoints during early visual development (Stringer & Rolls, 2002).

Left - Architecture of a 4-layer hierarchical neural network model of the ventral visual processing stream. Convergence through the network is designed to provide fourth-layer neurons with information across the entire input retina. Right - Convergence through successive layers of the visual system.

Six visual stimuli, each built from three surface features that occur in three relative positions. Each row shows one of the stimuli rotated through the five rotational views at which the stimulus is presented to the network. From left to right, the views shown are -60, -30, 0 (central view), 30 and 60 degrees.

To simulate early visual development, layers 1 and 2 are trained on pairs of surface features across all five views. Then layers 3 and 4 are trained on the complete stimuli at only four out of the five views.

Results from the neural network simulation after training. The figure shows the response profiles of a top layer neuron to the 6 stimuli across all 5 views. It can be seen that this cell has learned to respond invariantly to one of the stimuli across all views. The network has learned to discriminate between the 6 objects from all views, including the novel view not encountered during training.

The Oxford Foundation for Theoretical Neuroscience and Artificial Intelligence is incorporated through Companies House as a charitable company limited by guarantee (Company No. 5722895), and is registered as a charity with the Charity Commission for England and Wales (Charity No. 1116075). The board of trustees includes academics and research scientists from the Department of Experimental Psychology at the University of Oxford, who have many years of experience in developing computer models of the brain. The registered office is Midland House, West Way, Botley, Oxford OX2 0PH, United Kingdom. +44 1865 809444 trustees@oftnai.org