Perception: Constancy and Change

Konrad Paul Körding studied physics at the University of Heidelberg and at ETH Zurich. He is currently completing his doctorate in Peter König's research group, working on perception in animals and artificial systems. His work can be found at www.koerding.com.

Humans and animals can easily recognize objects, even in cluttered scenes. That objects appear in various sizes, from various viewing directions, or under different lighting does not make recognition harder for us.

For approximately half a century, scientists from various areas of artificial intelligence have attempted to imitate the human ability to recognize objects. Today's artificial systems are very good at recognizing an object that does not change its appearance, even if part of the object is occluded or the entire object is shrouded in fog. Objects in the cluttered scenes of daily life, however, appear in constantly changing sizes and orientations, which makes recognition significantly more difficult.

Two approaches are commonly applied to this problem. The first is to construct an algorithm that undoes the change: if, for example, objects appear at various positions, they can always be moved to the center of the screen in the computer's memory [1]. If they rotate, they can be turned back so that they appear upright. The difficulty is that, in order to move an object to the center of the screen, one must first recognize it, which in this case means separating it from the other objects in the scene. The second common approach is to show the system the same object in different sizes and against different backgrounds, so that it learns to classify the object correctly [2]. This method works well at least in simple situations, but it has a significant drawback: the system must be shown a great many instances, whereas humans can often recognize a new object the first time they see it.
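The first approach can be illustrated with a minimal sketch. It assumes a toy image and a hypothetical helper, center_object, neither of which comes from the cited work. The helper needs a mask that marks which pixels belong to the object, which is exactly the segmentation step the normalization was supposed to make unnecessary.

```python
import numpy as np

def center_object(image, mask):
    """Shift `image` so that the centroid of `mask` lands on the center pixel.

    `mask` marks the pixels that belong to the object. Supplying it is the
    catch: the object has to be separated from the scene before it can be
    moved. (Hypothetical helper, for illustration only.)
    """
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        raise ValueError("mask is empty: the object must be segmented first")
    # Integer shift from the object's centroid to the center pixel.
    dr = image.shape[0] // 2 - int(round(rows.mean()))
    dc = image.shape[1] // 2 - int(round(cols.mean()))
    # np.roll wraps around the border, which is acceptable for a toy image.
    return np.roll(image, shift=(dr, dc), axis=(0, 1))

# Toy scene: a 3x3 bright square near the top-left corner of a 9x9 image.
scene = np.zeros((9, 9))
scene[1:4, 1:4] = 1.0

# Here the mask is trivial because the scene holds a single object on an
# empty background; in a cluttered scene it would not be available.
centered = center_object(scene, scene > 0)
```

In this toy case thresholding is enough to find the object. With several objects or a textured background, the mask itself would require solving the recognition problem first.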

We follow a fundamentally different approach. An object such as a chair casts a constantly changing picture on our eye as we move about, while the identity of the object remains the same. We are building an artificial system that learns to extract exactly those characteristics of the pictures [3] that change as little as possible, which allows it to ignore irrelevant information. We play videos of the real world to the system and let it learn slowly varying characteristics. In this way we hope to construct a system that learns to classify objects by itself. First experiments with very simple objects were successful, and they give us hope that the principle will also work for more difficult problems. In such a system, objects are defined by the statistics of our world and do not need to be defined a priori.
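The idea of learning characteristics that change as little as possible can be sketched with a simple linear method. The code below is not the system described here; it is a minimal illustration that assumes a two-channel synthetic signal instead of video, a purely linear feature, and a hypothetical function name, slowest_feature. It whitens the input so that every direction has unit variance and then picks the direction whose value changes least from one time step to the next.

```python
import numpy as np

def slowest_feature(x):
    """Return the unit-variance linear feature of `x` that varies most slowly.

    `x` has shape (time steps, channels). Assumes the channel covariance is
    full rank. (Hypothetical helper, for illustration only.)
    """
    x = x - x.mean(axis=0)
    # Whiten: rescale along the principal axes so all directions have unit
    # variance. Without this constraint a constant output would be "slowest".
    evals, evecs = np.linalg.eigh(np.cov(x, rowvar=False))
    whitener = evecs / np.sqrt(evals)
    z = x @ whitener
    # Covariance of the step-to-step changes of the whitened signal.
    dcov = np.cov(np.diff(z, axis=0), rowvar=False)
    # eigh sorts eigenvalues in ascending order, so column 0 is the
    # direction with the smallest temporal change.
    _, dvecs = np.linalg.eigh(dcov)
    w = whitener @ dvecs[:, 0]
    return x @ w, w

# Synthetic input: a slow and a fast sinusoid, linearly mixed into two channels.
t = np.linspace(0.0, 2.0 * np.pi, 2000)
slow = np.sin(t)
fast = np.sin(29.0 * t)
mixed = np.column_stack([slow + fast, slow - 2.0 * fast])

feature, w = slowest_feature(mixed)

# The recovered feature should follow the slow source, up to sign and scale.
print(abs(np.corrcoef(feature, slow)[0, 1]))
```

The sign of the recovered feature is arbitrary, which is why the comparison uses the absolute correlation. Real images would need many more input dimensions and, most likely, nonlinear features.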