2009

The human visual system is foveated: outside the central visual field, resolution and acuity drop rapidly. Nonetheless, much of a visual scene is perceived after only a few saccadic eye movements, suggesting an effective strategy for selecting saccade targets. It has been known for some time that local image structure at saccade targets influences the selection process. However, the question of what the most relevant visual features are is still under debate. Here we show that center-surround patterns emerge as the optimal solution for predicting saccade targets from their local image structure. The resulting model, a one-layer feed-forward network, is surprisingly simple compared to previously suggested models, which assume much more complex computations such as multi-scale processing and multiple feature channels. Nevertheless, our model is equally predictive. Furthermore, our findings are consistent with neurophysiological hardware in the superior colliculus. Bottom-up visual saliency may thus not be computed cortically, as has been thought previously.

Background: Treatment of neurodegenerative diseases is likely to be most beneficial in the very early, possibly preclinical stages of degeneration. We explored the usefulness of fully automatic structural MRI classification methods for detecting subtle degenerative change. The availability of a definitive genetic test for Huntington disease (HD) provides an excellent metric for judging the performance of such methods in gene mutation carriers who are free of symptoms.
Methods: Using the gray matter segment of MRI scans, this study explored the usefulness of a multivariate support vector machine to automatically identify presymptomatic HD gene mutation carriers (PSCs) in the absence of any a priori information. A multicenter data set of 96 PSCs and 95 age- and sex-matched controls was studied. The PSC group was subclassified into three groups based on time from predicted clinical onset, an estimate that is a function of DNA mutation size and age.
Results: Subjects with at least a 33% chance of developing unequivocal signs of HD in 5 years were correctly assigned to the PSC group 69% of the time. Accuracy improved to 83% when regions affected by the disease were selected a priori for analysis. Performance was at chance when the probability of developing symptoms in 5 years was less than 10%.
Conclusions: Presymptomatic Huntington disease gene mutation carriers close to estimated diagnostic onset were successfully separated from controls on the basis of single anatomic scans, without additional a priori information. Prior information is required to allow separation when degenerative changes are either subtle or variable.

For simple visual patterns under the experimenter's control we impose which information, or features, an observer can use to solve a given perceptual task. For natural vision tasks, however, there are typically a multitude of potential features in a given visual scene which the visual system may be exploiting when analyzing it: edges, corners, contours, etc. Here we describe a novel non-linear system identification technique based on modern machine learning methods that allows the critical features an observer uses to be inferred directly from the observer's data. The method neither requires stimuli to be embedded in noise nor is it limited to linear perceptive fields (classification images). We demonstrate our technique by deriving the critical image features observers fixate in natural scenes (bottom-up visual saliency). Unlike previous studies where the relevant structure is determined manually, e.g. by selecting Gabors as visual filters, we do not make any assumptions in this regard, but numerically infer their number and properties from the eye-movement data. We show that center-surround patterns emerge as the optimal solution for predicting saccade targets from local image structure. The resulting model, a one-layer feed-forward network with contrast gain-control, is surprisingly simple compared to previously suggested saliency models. Nevertheless, our model is equally predictive. Furthermore, our findings are consistent with neurophysiological hardware in the superior colliculus. Bottom-up visual saliency may thus not be computed cortically as has been thought previously.

This paper presents a fully automated algorithm for reconstructing a textured 3D model of a face from a single photograph or a raw video stream. The algorithm is based on a combination of Support Vector Machines (SVMs) and a Morphable Model of 3D faces. After SVM face detection, individual facial features are detected using a novel regression- and classification-based approach, and probabilistically plausible configurations of features are selected to produce a list of candidates for several facial feature positions. In the next step, the configurations of feature points are evaluated using a novel criterion that is based on a Morphable Model and a combination of linear projections. To make the algorithm robust with respect to head orientation, this process is iterated while the estimate of pose is refined. Finally, the feature points initialize a model-fitting procedure of the Morphable Model. The result is a high-resolution 3D surface model.

Humans perceive the world by directing the center of gaze from one location to another via rapid eye movements, called saccades. In the periods between saccades, the direction of gaze is held fixed for a few hundred milliseconds (fixations). It is primarily during fixations that information enters the visual system. Remarkably, after only a few fixations we perceive a coherent, high-resolution scene even though the visual acuity of the eye decreases quickly away from the center of gaze. This suggests an effective strategy for selecting saccade targets.
Top-down effects, such as the observer's task, thoughts, or intentions, influence saccadic selection. Equally well known is that bottom-up effects, i.e., local image structure, influence saccade targeting regardless of top-down effects. However, the question of what the most salient visual features are is still under debate. Here we model the relationship between spatial intensity patterns in natural images and the response of the saccadic system using tools from machine learning. This allows us to identify the most salient image patterns that guide the bottom-up component of the saccadic selection system, which we refer to as perceptive fields. We show that center-surround patterns emerge as the optimal solution to the problem of predicting saccade targets. Using a novel nonlinear system identification technique, we reduce our learned classifier to a one-layer feed-forward network, which is surprisingly simple compared to previously suggested models that assume more complex computations such as multi-scale processing, oriented filters, and lateral inhibition. Nevertheless, our model is equally predictive and generalizes better to novel image sets. Furthermore, our findings are consistent with neurophysiological hardware in the superior colliculus. Bottom-up visual saliency may thus not be computed cortically as has been thought previously.
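A minimal numerical sketch of such a one-layer feed-forward saliency model: a single hand-built Difference-of-Gaussians (center-surround) unit correlated with the image. The kernel sizes and sigmas here are illustrative assumptions, not the weights learned in the study.

```python
import numpy as np

def dog_kernel(size=15, sigma_c=1.5, sigma_s=3.0):
    """Difference-of-Gaussians (center-surround) kernel, normalized to zero mean."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    g = lambda s: np.exp(-(xx**2 + yy**2) / (2 * s**2)) / (2 * np.pi * s**2)
    k = g(sigma_c) - g(sigma_s)
    return k - k.mean()  # enforce zero response to uniform regions

def saliency(image, kernel):
    """One-layer feed-forward 'saliency': valid correlation of image with kernel."""
    h, w = kernel.shape
    H, W = image.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+h, j:j+w] * kernel)
    return out

# A bright blob on a dark background should excite the center-surround unit.
img = np.zeros((31, 31))
img[14:17, 14:17] = 1.0
s = saliency(img, dog_kernel())
print(np.unravel_index(np.argmax(s), s.shape))  # peak coincides with the blob
```

The single linear filter plus a pointwise nonlinearity (omitted here) is all the model structure the abstract describes; the study's contribution is that this shape emerges from data rather than being designed in.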

Interest point detection in still images is a well-studied topic in computer vision.
In the spatiotemporal domain, however, it is still unclear which features indicate useful interest points. In this paper we approach the problem by learning a detector from examples: we record eye movements of human subjects watching video sequences and train a neural network to predict which locations are likely to become eye movement targets. We show that our detector outperforms current spatiotemporal interest point architectures on a standard classification dataset.

This paper presents a fully automated algorithm for reconstructing a textured 3D model of a face from a single photograph or a raw video stream. The algorithm is based on a combination of Support Vector Machines (SVMs) and a Morphable Model of 3D faces. After SVM face detection, individual facial features are detected using a novel regression- and classification-based approach, and probabilistically plausible configurations of features are selected to produce a list of candidates for several facial feature positions. In the next step, the configurations of feature points are evaluated using a novel criterion that is based on a Morphable Model and a
combination of linear projections. Finally, the feature points initialize a model-fitting procedure of the Morphable Model. The result is a high-resolution 3D surface model.

This paper addresses the bottom-up influence of local image information on human eye movements. Most existing computational models use a set of biologically plausible linear filters, e.g., Gabor or Difference-of-Gaussians filters, as a front-end, the outputs of which are nonlinearly combined into a real number that indicates visual saliency. Unfortunately, this requires many design parameters, such as the number, type, and size of the front-end filters, as well as the choice of nonlinearities, weighting and normalization schemes etc., for which biological plausibility cannot always be justified. As a result, these parameters have to be chosen in a more or less ad hoc way. Here, we propose to learn a visual saliency model directly from human eye movement data. The model is rather simplistic and essentially parameter-free, and therefore contrasts with recent developments in the field that usually aim at higher prediction rates at the cost of additional parameters and increasing model complexity. Experimental results show that, despite the lack of any biological prior knowledge, our model performs comparably to existing approaches, and in fact learns image features that resemble findings from several previous studies. In particular, its maximally excitatory stimuli have center-surround structure, similar to receptive fields in the early human visual system.

Computational models for bottom-up visual attention traditionally consist of a bank of Gabor-like or Difference-of-Gaussians filters and a nonlinear combination scheme which combines the filter responses into a real-valued saliency measure [1]. Recently it was shown that a standard machine learning algorithm can be used to derive a saliency model from human eye movement data with a very small number of additional assumptions. The learned model is much simpler than previous models, but nevertheless has state-of-the-art prediction performance [2]. A central result from this study is that DoG-like center-surround filters emerge as the unique solution to optimizing the predictivity of the model.
Here we extend the learning method to the temporal domain. While the previous model [2] predicts visual saliency based on local pixel intensities in a static image, our model also takes into account temporal intensity variations. We find that the learned model responds strongly to temporal intensity changes occurring 200-250 ms before a saccade is initiated. This delay coincides with typical saccadic latencies, indicating that the learning algorithm has extracted a meaningful statistic from the training data. In addition, we show that the model correctly predicts a significant proportion of human eye movements on previously unseen test data.
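The kind of temporal sensitivity described above can be illustrated with a small hand-built sketch (not the learned model): a biphasic temporal kernel whose positive lobe sits roughly 200-250 ms in the past, so that an intensity change at that delay drives the unit most strongly. The sampling rate and kernel shape are illustrative assumptions.

```python
import numpy as np

# Toy temporal "saliency" unit: a biphasic kernel over the last ~400 ms of
# pixel intensity, sampled at 50 Hz (20 taps). The kernel is a hand-built
# stand-in for the learned one; its positive lobe peaks ~225 ms in the past,
# mimicking the delay reported in the abstract.
fs = 50                                # samples per second
t = np.arange(20) / fs                 # 0 .. 380 ms into the past
kernel = (np.exp(-((t - 0.225) / 0.05) ** 2)
          - np.exp(-((t - 0.100) / 0.05) ** 2))

def temporal_response(history):
    """history[k] = pixel intensity k samples in the past."""
    return float(history @ kernel)

# An intensity step that happened ~225 ms ago drives the unit strongly...
step_225 = (t > 0.2).astype(float)
# ...while a more recent step (~100 ms ago) also hits the suppressive lobe.
step_100 = (t > 0.075).astype(float)
print(temporal_response(step_225) > temporal_response(step_100))
```

In the actual study the kernel weights are learned from eye movement data; the point of the sketch is only how a fixed temporal filter turns an intensity history into a scalar response with a preferred delay.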

We present an approach for designing interest operators based on human eye movement statistics. In contrast to existing methods, which use hand-crafted saliency measures, we use machine learning methods to infer an interest operator directly from eye movement data. That way, the operator provides a measure of biologically plausible interestingness. We describe the data collection, training, and evaluation process, and show that our learned saliency measure accounts for a significant proportion of human eye movements. Furthermore, we illustrate connections to existing interest operators, and present a multi-scale interest point detector based on the learned function.

The human visual system samples images through saccadic eye movements which rapidly change the point of fixation. Although the selection of eye movement targets depends on numerous top-down mechanisms, a number of recent studies have shown that low-level image features such as local contrast or edges play an important role. These studies typically used predefined image features which were afterwards experimentally verified.
Here, we follow a complementary approach: instead of testing a set of candidate image features, we infer these hypotheses from the data, using methods from statistical learning. To this end, we train a non-linear classifier on fixated vs. randomly selected image patches without making any physiological assumptions. The resulting classifier can be essentially characterized by a nonlinear combination of two center-surround receptive fields. We find that the prediction performance of this simple model on our eye movement data is indistinguishable from the physiologically motivated model of Itti & Koch (2000), which is far more complex. In particular, we obtain comparable performance without using any multi-scale representations, long-range interactions, or oriented image features.
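The fixated-vs-random patch classification can be sketched on synthetic data as follows. A plain logistic regression trained by gradient descent stands in for the paper's nonlinear SVM, and the patch statistics (a bright center for "fixated" patches) are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the data: "fixated" patches carry extra central intensity,
# "random" patches are pure noise. (The real study used image patches at human
# fixation points vs. patches at random image locations.)
def make_patch(fixated):
    p = rng.normal(0, 0.3, (9, 9))
    if fixated:
        p[3:6, 3:6] += 1.0
    return p.ravel()

X = np.array([make_patch(i % 2 == 0) for i in range(400)])
y = np.array([1.0 if i % 2 == 0 else 0.0 for i in range(400)])

# Logistic regression trained by gradient descent (a linear simplification;
# the original work used a nonlinear classifier).
w = np.zeros(X.shape[1]); b = 0.0
for _ in range(300):
    z = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    g = z - y
    w -= 0.1 * X.T @ g / len(y)
    b -= 0.1 * g.mean()

acc = np.mean(((X @ w + b) > 0) == (y == 1))
print(f"training accuracy: {acc:.2f}")
```

Inspecting the learned weight vector, reshaped to 9 x 9, shows a center-heavy pattern: the classifier recovers the structure that distinguishes the two patch classes, which is the spirit of the "perceptive field" analysis in the abstract.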

We present a new approach to personalized handwriting recognition. The problem, also known as writer adaptation, consists of converting a generic (user-independent) recognizer into a personalized (user-dependent) one with an improved recognition rate for a particular user. The adaptation step usually involves user-specific samples, which leads to the fundamental question of how to fuse this new information with that captured by the generic recognizer. We propose adapting the recognizer by minimizing a regularized risk functional (a modified SVM) in which the prior knowledge from the generic recognizer enters through a modified regularization term. The result is a simple personalization framework with very good practical properties. Experiments on a 100-class real-world data set show that the number of errors can be reduced by over 40% with as few as five user samples per character.
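The idea of letting the generic recognizer enter through a modified regularization term can be sketched with a squared-loss stand-in for the SVM hinge loss: minimize the loss on the user's samples plus lambda * ||w - w0||^2, which regularizes toward the generic weights w0 rather than toward zero and has a closed-form solution. All dimensions and data below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: a generic linear recognizer w0 and five user-specific
# samples (X_u, y_u). The user's "style" shifts the ideal weights away from w0.
d = 5
w0 = rng.normal(size=d)                 # generic recognizer
X_u = rng.normal(size=(5, d))           # five user samples
y_u = X_u @ (w0 + 0.5)                  # user targets reflect a shifted model

# Adaptation: minimize ||X_u w - y_u||^2 + lam * ||w - w0||^2.
# Closed form: w = (X'X + lam I)^-1 (X'y + lam w0)
lam = 0.1
A = X_u.T @ X_u + lam * np.eye(d)
w = np.linalg.solve(A, X_u.T @ y_u + lam * w0)

err_generic = np.mean((X_u @ w0 - y_u) ** 2)
err_adapted = np.mean((X_u @ w - y_u) ** 2)
print(err_adapted < err_generic)
```

With few samples, the lambda term keeps the adapted weights close to the generic solution; as more user data arrives, the data term dominates and the recognizer personalizes further. This trade-off is exactly what the modified regularizer controls.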

This paper proposes a method for computing fast approximations to support vector decision functions for object detection. We build on an existing algorithm in which the set of support vectors is replaced by a smaller, so-called reduced set of synthesized input space points. In contrast to the existing method, which finds the reduced set via unconstrained optimization, we impose a structural constraint on the synthetic points such that the resulting approximations can be evaluated via separable filters. For applications that require scanning an entire image, this decreases the computational complexity of a scan significantly. We present experimental results on a standard face detection database.

In this paper we present a primal-dual decomposition algorithm for support vector machine training. As with existing methods that use very small working sets, such as Sequential Minimal Optimization (SMO), Successive Over-Relaxation (SOR), and the Kernel Adatron (KA), our method scales well, is straightforward to implement, and does not require an external QP solver. Unlike SMO, SOR, and KA, however, the method is applicable to a large number of SVM formulations regardless of the number of equality constraints involved. We demonstrate the effectiveness of our algorithm on an SVM variant that is more difficult in this respect, namely semi-parametric support vector regression.

We present a new approximation scheme for support vector decision functions in object detection. We build on an existing algorithm in which the set of support vectors is replaced by a smaller, so-called reduced set of synthetic points. Instead of finding the reduced set via unconstrained optimization, we impose a structural constraint on the synthetic vectors such that the resulting approximation can be evaluated via separable filters. Applications that require scanning an entire image benefit from this representation: with separable filters, the average computational complexity of evaluating a reduced set vector on a test patch of size h x w drops from O(hw) to O(h+w). We show experimental results on handwritten digits and face detection.
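The separability trick can be illustrated directly: for a rank-1 kernel K = u v^T, a 2-D correlation factors into a column pass followed by a row pass, replacing O(hw) multiplications per output pixel with O(h) + O(w). This sketch assumes a rank-1 kernel for simplicity; the paper's structural constraint on the reduced set vectors serves the same purpose.

```python
import numpy as np

def corr2d(img, K):
    """Naive 2-D valid correlation: O(h*w) multiplications per output pixel."""
    h, w = K.shape
    H, W = img.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+h, j:j+w] * K)
    return out

def corr2d_separable(img, u, v):
    """Rank-1 kernel K = u v^T applied as two 1-D passes:
    O(h) + O(w) multiplications per output pixel."""
    tmp = corr2d(img, u.reshape(-1, 1))   # column pass with u
    return corr2d(tmp, v.reshape(1, -1))  # row pass with v

rng = np.random.default_rng(0)
u, v = rng.normal(size=4), rng.normal(size=6)
img = rng.normal(size=(12, 15))
full = corr2d(img, np.outer(u, v))
sep = corr2d_separable(img, u, v)
print(np.allclose(full, sep))  # the two evaluations agree
```

The equality holds because sum_ij img[a+i, b+j] u[i] v[j] can be computed by first summing over i with u and then over j with v; when scanning every position of an image, this is where the O(hw) to O(h+w) saving comes from.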

In face detection, support vector machines (SVMs) and neural networks (NNs) have been shown to outperform most other classification methods. While both approaches are learning-based, each has distinct advantages and drawbacks: NNs are difficult to design and train but can lead to very small and efficient classifiers. In comparison, SVM model selection and training is rather straightforward and, more importantly, guaranteed to converge to a globally optimal (in the sense of training error) solution. Unfortunately, SVM classifiers tend to have large representations, which are inappropriate for time-critical image processing applications.
In this work, we examine various existing and new methods for simplifying support vector decision rules. Our goal is to obtain efficient classifiers (as with NNs) while keeping the numerical and statistical advantages of SVMs. For a given SVM solution, we compute a cascade of approximations with increasing complexity. Each classifier is tuned so that its detection rate is near 100%. At run-time, the first (simplest) detector is evaluated on the whole image. Any subsequent classifier is then applied only to those positions that have been classified as positive throughout all previous stages. The false positive rate at the end equals that of the last (i.e., most complex) detector. In contrast, since many image positions are discarded by lower-complexity classifiers, the average computation time per patch decreases significantly compared to evaluating the highest-complexity classifier alone.
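The cascade evaluation scheme can be sketched as follows; the stage scores and thresholds are random stand-ins for real classifier outputs, chosen only to show how the number of evaluations shrinks across stages.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy cascade: each stage has a threshold tuned for a very high detection
# rate; a position is reported positive only if it passes every stage.
# Cheap stages run on all positions, expensive stages only on survivors.
def run_cascade(scores_per_stage, thresholds):
    positions = np.arange(scores_per_stage.shape[1])
    evaluations = 0
    for stage, thr in enumerate(thresholds):
        evaluations += len(positions)            # this stage's cost
        keep = scores_per_stage[stage, positions] >= thr
        positions = positions[keep]              # survivors go to next stage
    return positions, evaluations

# 1000 candidate positions, 3 stages; uniform random scores stand in for
# classifier outputs, so each stage rejects roughly half the positions.
scores = rng.uniform(size=(3, 1000))
survivors, n_eval = run_cascade(scores, thresholds=[0.5, 0.5, 0.5])
print(len(survivors), n_eval)  # far fewer than 3 * 1000 total evaluations
```

Evaluating every stage at every position would cost 3000 classifier calls; because most positions are discarded early, the cascade needs far fewer, which is exactly the per-patch speedup the abstract describes.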

