2009

For simple visual patterns under the experimenter's control, we can impose which information, or features, an observer uses to solve a given perceptual task. For natural vision tasks, however, there is typically a multitude of potential features in a given visual scene which the visual system may be exploiting when analyzing it: edges, corners, contours, etc. Here we describe a novel non-linear system identification technique based on modern machine learning methods that allows the critical features an observer uses to be inferred directly from the observer's data. The method neither requires stimuli to be embedded in noise nor is it limited to linear perceptive fields (classification images). We demonstrate our technique by deriving the critical image features observers fixate in natural scenes (bottom-up visual saliency). Unlike previous studies, where the relevant structure is determined manually, e.g. by selecting Gabors as visual filters, we do not make any assumptions in this regard, but numerically infer their number and properties from the eye-movement data. We show that center-surround patterns emerge as the optimal solution for predicting saccade targets from local image structure. The resulting model, a one-layer feed-forward network with contrast gain-control, is surprisingly simple compared to previously suggested saliency models, yet it is equally predictive. Furthermore, our findings are consistent with neurophysiological hardware in the superior colliculus. Bottom-up visual saliency may thus not be computed cortically, as has previously been thought.
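The abstract does not commit to a particular learner, so the following minimal Python sketch, on purely synthetic stand-in data, only illustrates the general recipe: train a nonlinear classifier to separate image patches at fixated locations from patches at random control locations, and use its decision function as a saliency score. All data, sizes, and parameters here are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in data: 13x13 gray-value patches, flattened to vectors.
# In a real experiment these would be cut from natural images at
# fixated locations (label 1) and random control locations (label 0).
n, p = 400, 13 * 13
X = rng.normal(size=(n, p))        # placeholder patches
y = rng.integers(0, 2, size=n)     # placeholder labels

# Nonlinear classifier on raw pixel values: no hand-picked features
# (e.g. Gabor filters) enter the model.
clf = SVC(kernel="rbf", gamma="scale").fit(X, y)

# The learned decision function plays the role of a saliency score;
# scanning it across an image ranks candidate saccade targets.
scores = clf.decision_function(X[:5])
print(scores)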

This paper proposes a method for computing fast approximations to support vector decision functions in the field of object detection. Our approach builds on an existing algorithm in which the set of support vectors is replaced by a smaller, so-called reduced set of synthesized input space points. In contrast to the existing method, which finds the reduced set via unconstrained optimization, we impose a structural constraint on the synthetic points such that the resulting approximations can be evaluated via separable filters. For applications that require scanning an entire image, this significantly reduces the computational cost of a scan. We present experimental results on a standard face detection database.

We compare two approaches to the problem of estimating the depth
of a point in space from observing its image position in two
different cameras: 1. The classical photogrammetric approach
explicitly models the two cameras and estimates their intrinsic
and extrinsic parameters using a tedious calibration procedure;
2. A generic machine learning approach where the mapping from
image to spatial coordinates is directly approximated by Gaussian
process regression. Our results show that the generic learning
approach, in addition to simplifying the calibration procedure,
can lead to higher depth accuracy than classical calibration,
although no specific domain knowledge is used.
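As a hedged sketch of the learning route, the snippet below fits a Gaussian process regressor from the two cameras' image coordinates to depth on synthetic pinhole-stereo data. The rig parameters, kernel choice, and noise levels are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Hypothetical stereo rig: focal length f (pixels), baseline b (metres),
# both cameras looking down the z axis.
f, b = 500.0, 0.1
P = rng.uniform([-0.5, -0.5, 1.0], [0.5, 0.5, 3.0], size=(300, 3))  # 3D points

def project(P, cx):
    """Pinhole projection for a camera whose centre is shifted by cx."""
    return np.stack([f * (P[:, 0] - cx) / P[:, 2],
                     f * P[:, 1] / P[:, 2]], axis=1)

X = np.hstack([project(P, 0.0), project(P, b)])  # (u1, v1, u2, v2) inputs
X += rng.normal(scale=0.5, size=X.shape)         # pixel noise
z = P[:, 2]                                      # depth targets

gp = GaussianProcessRegressor(kernel=RBF(length_scale=100.0) + WhiteKernel(),
                              normalize_y=True)
gp.fit(X[:200], z[:200])
pred = gp.predict(X[200:])
print("mean absolute depth error [m]:", np.mean(np.abs(pred - z[200:])))
```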

This paper presents a method for single-frame image super-resolution
using an unsupervised learning technique. The required prior
knowledge about the high-resolution images is obtained from
Kernel Principal Component Analysis (KPCA). The original form of
KPCA, however, can only be applied to strongly restricted image
classes due to the limited number of training examples that can be
processed. We therefore propose a new iterative method for performing
KPCA, the Kernel Hebbian Algorithm. By kernelizing the
Generalized Hebbian Algorithm, one can iteratively estimate the kernel
principal components with only linear order memory complexity. The
resulting super-resolution algorithm shows performance comparable to
existing supervised methods on images containing faces and natural
scenes.

We present a new approximation scheme for support vector decision functions in object detection. Our approach builds on an existing algorithm in which the set of support vectors is replaced by a smaller, so-called reduced set of synthetic points. Instead of finding the reduced set via unconstrained optimization, we impose a structural constraint on the synthetic vectors such that the resulting approximation can be evaluated via separable filters. Applications that require scanning an entire
image can benefit from this representation: when using separable filters, the average computational complexity of evaluating a reduced set vector on a test patch of size h × w drops from O(hw) to O(h+w). We show experimental results on handwritten digits and face detection.
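To make the complexity claim concrete, here is a minimal numpy sketch (an illustration, not the paper's implementation). A rank-1 patch, the outer product of a column profile u and a row profile v, is evaluated over all image windows once directly and once as two one-dimensional passes; both give identical responses, but the separable route touches h + w values per window instead of hw.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(1)
img = rng.normal(size=(64, 64))

h, w = 9, 9
u = rng.normal(size=h)        # column profile
v = rng.normal(size=w)        # row profile
patch = np.outer(u, v)        # rank-1 "reduced set vector", h x w

# Direct evaluation: O(hw) per window.
windows = sliding_window_view(img, (h, w))            # (56, 56, h, w)
direct = np.einsum("ijab,ab->ij", windows, patch)

# Separable evaluation: O(h + w) per window.
rows = sliding_window_view(img, w, axis=1) @ v        # filter rows with v
sep = sliding_window_view(rows, h, axis=0) @ u        # then columns with u

assert np.allclose(direct, sep)
print("identical responses, computed with separable filters")
```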

We introduce a learning technique for regression
between high-dimensional spaces. Standard methods typically reduce
this task to many one-dimensional problems, with each output
dimension considered independently. By contrast, in our approach
the feature construction and the regression estimation are
performed jointly, directly minimizing a loss function that we
specify, subject to a rank constraint. A major advantage of this
approach is that the loss is no longer chosen according to the
algorithmic requirements, but can be tailored to the
characteristics of the task at hand; the features will then be
optimal with respect to this objective, and dependence between the
outputs can be exploited.
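For the special case of squared loss, the rank-constrained problem has a classical closed-form solution (reduced-rank regression). The numpy sketch below illustrates, on synthetic data, how the constraint couples the output dimensions; it is a stand-in for the more general, tailor-made losses described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out, r = 500, 20, 15, 3

# Synthetic task whose true input-output map has rank r.
B_true = rng.normal(size=(d_in, r)) @ rng.normal(size=(r, d_out))
X = rng.normal(size=(n, d_in))
Y = X @ B_true + 0.1 * rng.normal(size=(n, d_out))

# Ordinary least squares treats each output dimension independently.
B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)

# The rank constraint couples the outputs: project the fitted values
# onto the top-r right singular vectors of X @ B_ols.
_, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
V_r = Vt[:r].T
B_rrr = B_ols @ V_r @ V_r.T           # rank-r coefficient matrix

print("rank of OLS solution:        ", np.linalg.matrix_rank(B_ols))
print("rank of constrained solution:", np.linalg.matrix_rank(B_rrr))
```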

2003

A new method for performing kernel principal component analysis is
proposed. By kernelizing the generalized Hebbian algorithm, one can
iteratively estimate the principal components in a reproducing
kernel Hilbert space with only linear order memory complexity. The
derivation of the method, a convergence proof, and preliminary
applications in image hyperresolution are presented. In addition,
we discuss the extension of the method to the online learning of
kernel principal components.
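A minimal sketch of the kernelized update, with assumed hyperparameters: writing each principal direction as an expansion over the training points turns the generalized Hebbian update of the weights into an update of an r × n coefficient matrix A. The kernel matrix is precomputed here only for brevity; evaluating kernel values on the fly is what keeps the memory footprint of linear order.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated data

sq = np.sum(X**2, axis=1)
K = np.exp(-0.01 * (sq[:, None] + sq[None, :] - 2 * X @ X.T))  # RBF kernel
n = K.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
Kc = J @ K @ J                        # centred kernel matrix

r, eta = 3, 1e-3                      # components / learning rate (assumed)
A = 0.01 * rng.normal(size=(r, n))    # expansion coefficients

for epoch in range(50):
    for i in rng.permutation(n):
        y = A @ Kc[:, i]                              # projections of phi(x_i)
        delta = -eta * np.tril(np.outer(y, y)) @ A    # Sanger decorrelation
        delta[:, i] += eta * y                        # Hebbian term
        A += delta

# Sanity check: the iterative projections should align with exact
# kernel PCA as training proceeds.
vals, vecs = np.linalg.eigh(Kc)
exact = vecs[:, ::-1][:, :r]          # projections prop. to eigenvectors
kha = (A @ Kc).T
for j in range(r):
    c = abs(np.corrcoef(kha[:, j], exact[:, j])[0, 1])
    print(f"component {j}: |corr| with exact KPCA = {c:.2f}")
```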

1998

In homing tasks, the goal is often not marked by visible objects but must be inferred from the spatial relation to the visual cues in the
surrounding scene. The exact computation of the goal direction would require knowledge about the distances to visible landmarks, information which
is not directly available to passive vision systems. However, if prior assumptions about typical distance distributions are used, a snapshot taken at the
goal suffices to compute the goal direction from the current view. We show that most existing approaches to scene-based homing implicitly assume
an isotropic landmark distribution. As an alternative, we propose a homing scheme that uses parameterized displacement fields. These are obtained
from an approximation that incorporates prior knowledge about perspective distortions of the visual environment. A mathematical analysis proves
that both approximations do not prevent the schemes from approaching the goal with arbitrary accuracy, but lead to different errors in the computed
goal direction. Mobile robot experiments are used to test the theoretical predictions and to demonstrate the practical feasibility of the new approach.

We present a purely vision-based scheme for learning a topological representation of an open environment. The system represents selected places by local views of the surrounding scene, and finds traversable paths between them. The set of recorded views and their connections are combined into a graph model of the environment. To navigate between views connected in the graph, we employ a homing strategy inspired by findings of insect ethology. In robot experiments, we demonstrate that complex visual exploration and navigation tasks can thus be performed without using metric information.

1996

We present a purely vision-based scheme for learning a parsimonious representation of an open environment. Using simple exploration behaviours, our system constructs a graph of appropriately chosen views. To navigate between views connected in the graph, we employ a homing strategy inspired by findings of insect ethology. Simulations and robot experiments demonstrate the feasibility of the proposed
approach.

Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments, and to use this understanding to design future systems.