We study the stereo matching problem of reconstructing the locations of 3D points on an unknown surface patch from two calibrated, identical cameras, without using any a priori information about the pointwise correspondences. We assume that the camera parameters and the pose between the cameras are known. Our approach follows earlier work for coplanar cameras, in which a gradient flow algorithm was proposed to match associated Gramians. Here we extend this method to arbitrary camera poses. We introduce an intrinsic Riemannian Newton algorithm that achieves local quadratic convergence, and we also present a closed-form solution. The efficiency of both algorithms is demonstrated by numerical experiments.

We present a new technique for structured
prediction that works in a hybrid generative/
discriminative way, using a one-class
support vector machine to model the joint
probability of (input, output)-pairs in a joint
reproducing kernel Hilbert space.
Compared to discriminative techniques, like
conditional random fields or structured output
SVMs, the proposed method has the advantage
that its training time depends only
on the number of training examples, not on
the size of the label space. Due to its generative
aspect, it is also very tolerant against
ambiguous, incomplete or incorrect labels.
Experiments on realistic data show that our
method works efficiently and robustly in situations
for which discriminative techniques
have computational or statistical problems.
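To make the hybrid idea concrete, here is a minimal sketch (our illustration, not the paper's implementation) that fits a one-class SVM over a joint feature map of (input, output)-pairs and predicts by scoring candidate outputs; the feature map and the three-label toy problem are assumptions:

```python
# Hypothetical sketch: one-class SVM over a joint (input, output) feature space.
# The joint feature map and the tiny label set are toy assumptions.
import numpy as np
from sklearn.svm import OneClassSVM

LABELS = [0, 1, 2]  # assumed small label set for illustration

def joint_features(x, y):
    """Toy joint feature map phi(x, y): input features blocked by label."""
    phi = np.zeros(len(x) * len(LABELS))
    phi[y * len(x):(y + 1) * len(x)] = x
    return phi

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4)) + np.repeat(np.eye(3, 4) * 3, 20, axis=0)
Y = np.repeat(LABELS, 20)

# Train the one-class SVM on joint representations of observed pairs.
model = OneClassSVM(kernel="rbf", gamma=0.1, nu=0.1)
model.fit([joint_features(x, y) for x, y in zip(X, Y)])

def predict(x):
    # Predict by choosing the label whose joint representation scores highest.
    scores = [model.decision_function([joint_features(x, y)])[0] for y in LABELS]
    return int(np.argmax(scores))

print(predict(X[0]), predict(X[25]), predict(X[45]))  # expect roughly 0, 1, 2
```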

Discovery of knowledge from geometric graph databases is of particular importance in chemistry and biology, because chemical compounds and proteins are represented as graphs with 3D geometric coordinates. In such applications, scientists are not interested in the statistics of the whole database. Instead they need information about a novel drug candidate or protein at hand, represented as a query graph. We propose a polynomial-delay algorithm for geometric frequent subgraph retrieval. It enumerates all subgraphs of a single given query graph which are frequent geometric epsilon-subgraphs under the entire class of rigid geometric transformations in a database. By using geometric epsilon-subgraphs, we achieve tolerance against variations in geometry. We compare the proposed algorithm to gSpan on chemical compound data, and we show that for a given minimum support the total number of frequent patterns is substantially limited by requiring geometric matching. Although the computation time per pattern is larger than for non-geometric graph mining, the total time is within a reasonable level even for small minimum support.
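To make the notion of an epsilon-tolerant geometric match concrete, the following minimal sketch (our illustration, not the paper's retrieval algorithm) tests whether one point set can be rigidly aligned to another to within epsilon, assuming the correspondences are already given; it uses the standard Kabsch construction:

```python
# Minimal sketch of an epsilon-tolerant rigid match test (illustration only).
# Assumes point correspondences between the two subgraphs are already given.
import numpy as np

def rigid_epsilon_match(P, Q, eps):
    """True if some rotation+translation maps each row of P within eps of Q."""
    Pc, Qc = P - P.mean(axis=0), Q - Q.mean(axis=0)
    # Kabsch algorithm: optimal rotation from the SVD of the covariance.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    d = np.sign(np.linalg.det(U @ Vt))      # guard against reflections
    R = (U * [1, 1, d]) @ Vt
    residuals = np.linalg.norm(Pc @ R - Qc, axis=1)
    return bool(np.all(residuals <= eps))

P = np.array([[0., 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
theta = 0.3
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0, 0, 1.]])
noise = 0.01 * np.random.default_rng(1).normal(size=P.shape)
Q = P @ Rz.T + np.array([5., -2, 1]) + noise  # rotated, shifted, perturbed
print(rigid_epsilon_match(P, Q, eps=0.1))     # True
print(rigid_epsilon_match(P, Q, eps=1e-6))    # False
```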

In this paper we present new algorithms for non-negative matrix approximation (NMA), commonly known as the NMF problem. Our methods improve upon the well-known methods of Lee and Seung (2000) for both the Frobenius norm as well as the Kullback-Leibler divergence versions of the problem. For the latter problem, our results are especially interesting because it seems to have witnessed much less algorithmic progress than the Frobenius norm NMA problem. Our algorithms are based on a particular block-iterative acceleration technique for EM, which preserves the multiplicative nature of the updates and also ensures monotonicity. Furthermore, our algorithms also naturally apply to the Bregman-divergence NMA algorithms of Dhillon and Sra. Experimentally, we show that our algorithms outperform the traditional Lee/Seung approach most of the time.
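For reference, the Lee/Seung baseline for the Frobenius-norm problem min ||V - WH|| over non-negative W, H is the classical multiplicative update; a minimal numpy version (the baseline only, not the accelerated block-iterative variants) reads:

```python
# Classical Lee/Seung multiplicative updates for Frobenius-norm NMF
# (the baseline; the paper's accelerated variants are not shown).
import numpy as np

def nmf_multiplicative(V, rank, iters=200, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(iters):
        # Updates keep W, H non-negative and monotonically decrease ||V - WH||.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

V = np.abs(np.random.default_rng(1).normal(size=(30, 20)))
W, H = nmf_multiplicative(V, rank=5)
print(np.linalg.norm(V - W @ H))  # approximation error after 200 iterations
```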

Time-series segmentation in the fully unsupervised scenario, in which the number of segment-types is a priori unknown, is a fundamental problem in many applications. We propose a Bayesian approach to a segmentation model based on the switching linear Gaussian state-space model that enforces a sparse parametrization, so that only a small number of the a priori available dynamics is used to explain the data. This enables us to estimate the number of segment-types within the model, in contrast to previous non-Bayesian approaches, where training and comparing several separate models was required. As the resulting model is computationally intractable, we introduce a variational approximation in which a reformulation of the problem enables the use of efficient inference algorithms.

In this paper we deal with graph classification. We propose a new algorithm for sparse logistic regression on graphs, which is comparable in accuracy with other graph classification methods and additionally produces probabilistic output. Sparsity is required for interpretability, which is often necessary in domains such as bioinformatics or chemoinformatics.
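A rough proxy for the idea (not the paper's graph algorithm) is L1-penalized logistic regression on precomputed binary subgraph-indicator features; the feature matrix below is simulated:

```python
# Rough proxy (not the paper's algorithm): L1-penalized logistic regression on
# precomputed binary subgraph-indicator features; the non-zero weights mark the
# few subgraph patterns the sparse model actually uses.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 50)).astype(float)  # 50 hypothetical patterns
y = (X[:, 0] + X[:, 3] > 1).astype(int)               # labels driven by 2 patterns

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
selected = np.flatnonzero(clf.coef_[0])
print("selected patterns:", selected)             # a small set incl. 0 and 3
print("P(y=1):", clf.predict_proba(X[:3])[:, 1])  # probabilistic output
```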

Box-constrained convex optimization problems are central to several
applications in a variety of fields such as statistics, psychometrics,
signal processing, medical imaging, and machine learning. Two fundamental
examples are the non-negative least squares (NNLS) problem and the
non-negative Kullback-Leibler (NNKL) divergence minimization problem. The
non-negativity constraints are usually based on an underlying physical
restriction: e.g., in applications in astronomy,
tomography, statistical estimation, or image restoration, the underlying
parameters represent physical quantities such as concentration, weight,
intensity, or frequency counts and are therefore only interpretable with
non-negative values. Several modern optimization methods can be
inefficient for simple problems such as NNLS and NNKL because they
are designed to handle far more general and complex problems.
In this work we develop two simple quasi-Newton methods for solving
box-constrained
(differentiable) convex optimization problems that utilize the well-known
BFGS and limited-memory BFGS updates. We position our methods between
projected gradient (Rosen, 1960) and projected Newton (Bertsekas, 1982)
methods, and prove their convergence under a simple Armijo step-size rule. We
illustrate our methods with applications to image deblurring, positron
emission tomography (PET) image reconstruction, and non-negative matrix
approximation (NMA). On medium-sized data we observe performance competitive
with established procedures, while for larger data the results are even
better.
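The projection-plus-Armijo skeleton shared by this family of methods can be sketched as follows for NNLS (projected gradient only; the paper's methods add BFGS/L-BFGS curvature on top):

```python
# Skeleton of projected gradient descent with an Armijo step-size rule for
# NNLS: min_x 0.5*||Ax - b||^2 s.t. x >= 0. Illustrative sketch only.
import numpy as np

def pg_nnls(A, b, iters=100, beta=0.5, sigma=1e-4):
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = A.T @ (A @ x - b)                  # gradient of the objective
        fx = 0.5 * np.sum((A @ x - b) ** 2)
        t = 1.0
        while True:                            # Armijo backtracking line search
            x_new = np.maximum(x - t * g, 0.0)         # projection onto x >= 0
            f_new = 0.5 * np.sum((A @ x_new - b) ** 2)
            if f_new <= fx + sigma * g @ (x_new - x) or t < 1e-12:
                break
            t *= beta
        x = x_new
    return x

A = np.random.default_rng(0).normal(size=(40, 10))
b = A @ np.abs(np.random.default_rng(1).normal(size=10))
x = pg_nnls(A, b)
print(x.min() >= 0, 0.5 * np.sum((A @ x - b) ** 2))  # feasible, small residual
```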

Graph mining methods enumerate frequent subgraphs
efficiently, but they are not necessarily good features for
machine learning due to high correlation among features.
Thus it makes sense to perform principal component analysis
to reduce the dimensionality and create decorrelated
features. We present a novel iterative mining algorithm
that captures informative patterns corresponding to major
entries of top principal components. It repeatedly calls
weighted substructure mining where example weights are
updated in each iteration. The Lanczos algorithm, a standard
algorithm of eigendecomposition, is employed to update
the weights. In experiments, our patterns are shown to
approximate the principal components obtained by frequent
mining.
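The eigendecomposition step can be sketched as follows, with a precomputed pattern-indicator matrix standing in for the patterns that the paper mines on the fly (scipy's Lanczos routine eigsh plays the role of the weight update):

```python
# Sketch of the eigendecomposition step (illustration; the paper interleaves
# this with weighted substructure mining rather than precomputing X).
import numpy as np
from scipy.sparse.linalg import eigsh

rng = np.random.default_rng(0)
X = (rng.random((100, 300)) < 0.1).astype(float)  # examples x patterns (0/1)
Xc = X - X.mean(axis=0)                           # center the pattern features

# Lanczos iteration on the centered Gram matrix yields the top components;
# roughly, each matrix-vector product corresponds to one weighted mining pass.
vals, vecs = eigsh(Xc @ Xc.T, k=3, which="LA")
weights = vecs[:, -1]      # example weights for the next mining round
print(vals[::-1])          # leading eigenvalues (variance captured)
```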

Computational models of spatial vision typically make use of a (rectified) linear filter, a nonlinearity and dominant late noise to account for human contrast discrimination data. Linear–nonlinear cascade models predict an improvement in observers' contrast detection performance when low, subthreshold levels of external noise are added (i.e., stochastic resonance). Here, we address the question of whether a single contrast gain-control model of early spatial vision can account for both the pedestal effect, i.e., the improved detectability of a grating in the presence of a low-contrast masking grating, and stochastic resonance. We measured contrast discrimination performance without noise and in both weak and moderate levels of noise. Making use of a full quantitative description of our data with few parameters combined with comprehensive model selection assessments, we show the pedestal effect to be more reduced in the presence of weak noise than in moderate noise. This reduction rules out independent, additive sources of performance improvement and, together with a simulation study, supports the parsimonious explanation that a single mechanism underlies the pedestal effect and stochastic resonance in contrast perception.

Graph mining methods enumerate frequently appearing subgraph patterns, which can be used as features for subsequent classification or regression. However, frequent patterns are not necessarily informative for the given learning problem. We propose a mathematical programming boosting method (gBoost) that progressively collects informative patterns. Compared to AdaBoost, gBoost can build the prediction rule with fewer iterations. To apply the boosting method to graph data, a branch-and-bound pattern search algorithm is developed based on the DFS code tree. The constructed search space is reused in later iterations to minimize the computation time. Our method can learn more efficiently than the simpler method based on frequent substructure mining, because the output labels are used as an extra information source for pruning the search space. Furthermore, by engineering the mathematical program, a wide range of machine learning problems can be solved without modifying the pattern search algorithm.

Autonomous robots that can adapt to novel situations have been a long-standing vision of robotics, artificial intelligence, and
the cognitive sciences. Early approaches to this goal during the heyday of artificial intelligence research in the late 1980s,
however, made it clear that an approach purely based on reasoning or human insights would not be able to model all the
perceptuomotor tasks of future robots. Instead, new hope was placed in the growing field of machine learning, which promised fully
adaptive control algorithms that learn both by observation and by trial and error. However, to date, learning techniques have yet
to fulfill this promise, as only few methods manage to scale to the high-dimensional domains of manipulator and humanoid
robotics, and scaling was usually only achieved in precisely pre-structured domains. We have investigated the ingredients of
a general approach to motor skill learning in order to get one step closer to human-like performance. To this end, we
study two major components of such an approach: first, a theoretically well-founded general approach to representing
the required control structures for task representation and execution and, second, appropriate learning algorithms that can
be applied in this setting.

Many common machine learning methods such as Support Vector Machines or Gaussian process
inference make use of positive definite kernels, reproducing kernel Hilbert spaces, Gaussian processes, and
regularization operators. In this work these objects are presented in a general, unifying framework, and
interrelations are highlighted.
With this in mind, we then show how linear stochastic differential equation models can be incorporated
naturally into the kernel framework; vice versa, many kernel machines can be interpreted in terms of
differential equations. We focus especially on ordinary differential equations, also known as dynamical
systems, and it is shown that standard kernel inference algorithms are equivalent to Kalman filter methods
based on such models.
In order not to cloud qualitative insights with heavy mathematical machinery, we restrict ourselves to finite
domains, implying that differential equations are treated via their corresponding finite difference equations.
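The stated kernel/Kalman correspondence can be checked numerically in the simplest case, a random walk observed in noise, whose prior covariance is the Brownian-motion kernel k(s,t) = q*min(s,t); this is our illustrative check, not code from the paper:

```python
# Numerical check of the kernel/Kalman correspondence in the simplest case:
# a random walk x_t = x_{t-1} + w_t, w_t ~ N(0, q), observed as y_t = x_t + v_t,
# v_t ~ N(0, r), has prior covariance Cov(x_s, x_t) = q*min(s, t), i.e. the
# Brownian-motion kernel. GP regression with that kernel reproduces the
# Kalman filter's posterior mean at the final time step.
import numpy as np

q, r, T = 0.3, 0.5, 50
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(0, np.sqrt(q), T))   # latent random walk
y = x + rng.normal(0, np.sqrt(r), T)          # noisy observations

# Kalman filter (mean m, variance P), started from x_0 = 0 exactly.
m, P = 0.0, 0.0
for t in range(T):
    P += q                                    # predict
    K = P / (P + r)                           # Kalman gain
    m, P = m + K * (y[t] - m), (1 - K) * P    # update

# GP regression with k(s, t) = q * min(s, t) on times 1..T.
times = np.arange(1, T + 1)
Kmat = q * np.minimum.outer(times, times)
gp_mean = Kmat[-1] @ np.linalg.solve(Kmat + r * np.eye(T), y)

print(m, gp_mean)   # the two posterior means agree
```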

We provide a novel framework for very fast model-based reinforcement learning in continuous state and action spaces. The framework requires probabilistic models that explicitly characterize their levels of confidence. Within this framework, we use flexible, non-parametric models to describe the world based on previously collected experience. We demonstrate learning on the cart-pole problem in a setting where we provide very limited prior knowledge about the task. Learning progresses rapidly, and a good policy is found after only a handful of iterations.

We investigate an implicit method to compute a piecewise linear representation of a surface from a
set of sample points. As implicit surface functions we use the weighted sum of piecewise linear kernel functions.
For such a function we can partition R^d in such a way that the function is linear on each subset of the partition.
For each subset in the partition we can then compute the zero level set of the function exactly as the intersection of
a hyperplane with the subset.

Policy learning approaches are among the best suited methods for high-dimensional, continuous control systems such as anthropomorphic robot arms and humanoid robots. In this paper, we make two contributions: first, we present a unified perspective which allows us to derive several policy learning algorithms from a common point of view, namely policy gradient algorithms, natural-gradient algorithms and EM-like policy learning. Second, we present several applications to both robot motor primitive learning as well as to robot control in task space. Results from both simulation and several different real robots are shown.

We introduce a family of unsupervised algorithms, numerical taxonomy clustering, to simultaneously
cluster data, and to learn a taxonomy that encodes the relationship between the clusters. The algorithms
work by maximizing the dependence between the taxonomy and the original data. The resulting taxonomy is a
more informative visualization of complex data than simple clustering; in addition, taking into account the relations
between different clusters is shown to substantially improve the quality of the clustering, when compared
with state-of-the-art algorithms in the literature (both spectral clustering and a previous dependence maximization
approach). We demonstrate our algorithm on image and text data.

Sliding window classifiers are among the most successful and widely applied techniques for object localization. However, training is typically done in a way that is not specific to the localization task. First a binary classifier is trained using a sample of positive and negative examples, and this classifier is subsequently applied to multiple regions within test images. We propose instead to treat object localization in a principled way by posing it as a problem of predicting structured data: we model the problem not as binary classification, but as the prediction of the bounding box of objects located in images. The use of a joint-kernel framework allows us to formulate the training procedure as a generalization of an SVM, which can be solved efficiently. We further improve computational efficiency by using a branch-and-bound strategy for localization during both training and testing. Experimental evaluation on the PASCAL VOC and TU Darmstadt datasets shows that the structured training procedure improves performance over binary training as well as over the best previously published scores.

We aim to automatically colorize greyscale images, without any manual intervention. The proposed coloring can then be corrected interactively with user-provided color landmarks if necessary. Automatic colorization is nontrivial since there is usually no one-to-one correspondence between color and local texture. The contribution of our framework is that we deal directly with multimodality and estimate, for each pixel of the image to be colored, the probability distribution of all possible colors, instead of choosing the most probable color at the local level. We also predict the expected variation of color at each pixel, thus defining a nonuniform spatial coherency criterion. We then use graph cuts to maximize the probability of the whole colored image at the global level. We work in the L-a-b color space in order to approximate the human perception of distances between colors, and we use machine learning tools to extract as much information as possible from a dataset of colored examples. The resulting algorithm is fast, designed to be more robust to texture noise, and is above all able to deal with ambiguity, in contrast to previous approaches.

The voltage-dependent anion channel (VDAC), also known as mitochondrial porin, is the most abundant protein in the mitochondrial outer membrane (MOM). VDAC is the channel known to guide the metabolic flux across the MOM and plays a key role in mitochondrially induced apoptosis. Here, we present the 3D structure of human VDAC1, which was solved conjointly by NMR spectroscopy and x-ray crystallography. Human VDAC1 (hVDAC1) adopts a β-barrel architecture composed of 19 β-strands with an α-helix located horizontally midway within the pore. Bioinformatic analysis indicates that this channel architecture is common to all VDAC proteins and is adopted by the general import pore TOM40 of mammals, which is also located in the MOM.

For quantitative PET information, correction of tissue photon attenuation is mandatory. Generally in conventional PET, the attenuation map is obtained from a transmission scan, which uses a rotating radionuclide source, or from the CT scan in a combined PET/CT scanner. In the case of PET/MRI scanners currently under development, insufficient space for the rotating source exists; the attenuation map can be calculated from the MR image instead. This task is challenging because MR intensities correlate with proton densities and tissue-relaxation properties, rather than with attenuation-related mass density. METHODS: We used a combination of local pattern recognition and atlas registration, which captures global variation of anatomy, to predict pseudo-CT images from a given MR image. These pseudo-CT images were then used for attenuation correction, as the process would be performed in a PET/CT scanner. RESULTS: For human brain scans, we show on a database of 17 MR/CT image pairs that our method reliably enables estimation of a pseudo-CT image from the MR image alone. On additional datasets of MRI/PET/CT triplets of human brain scans, we compare MRI-based attenuation correction with CT-based correction. Our approach enables PET quantification with a mean error of 3.2% for predefined regions of interest, which we found to be not clinically significant. However, our method is not specific to brain imaging, and we show promising initial results on 1 whole-body animal dataset. CONCLUSION: This method allows reliable MRI-based attenuation correction for human brain scans. Further work is necessary to validate the method for whole-body imaging.

Three simple and explicit procedures for testing the independence
of two multi-dimensional random variables are described. Two
of the associated test statistics (L1, log-likelihood) are defined when the
empirical distribution of the variables is restricted to finite partitions.
A third test statistic is defined as a kernel-based independence measure.
All tests reject the null hypothesis of independence if the test statistics
become large. The large deviation and limit distribution properties of all
three test statistics are given. Following from these results, distribution-free
strong consistent tests of independence are derived, as are asymptotically
alpha-level tests. The performance of the tests is evaluated experimentally
on benchmark data.
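As an illustration of the kernel-based statistic, a biased empirical estimate of an HSIC-style dependence measure (a stand-in; not necessarily the exact statistic used in the paper) can be computed as:

```python
# Illustrative biased empirical estimate of a kernel independence statistic
# (an HSIC-style measure; a stand-in, not necessarily the paper's exact form).
import numpy as np

def rbf_gram(z, gamma=1.0):
    d2 = np.sum(z**2, 1)[:, None] + np.sum(z**2, 1)[None, :] - 2 * z @ z.T
    return np.exp(-gamma * d2)

def hsic(x, y, gamma=1.0):
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    K, L = rbf_gram(x, gamma), rbf_gram(y, gamma)
    return np.trace(K @ H @ L @ H) / n**2     # large => reject independence

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
y_indep = rng.normal(size=(200, 2))
y_dep = x + 0.1 * rng.normal(size=(200, 2))
print(hsic(x, y_indep), hsic(x, y_dep))       # dependent pair scores higher
```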

We provide a comprehensive overview of many recent algorithms for approximate inference in
Gaussian process models for probabilistic binary classification. The relationships between several
approaches are elucidated theoretically, and the properties of the different algorithms are
corroborated by experimental results. We examine both 1) the quality of the predictive distributions and
2) the suitability of the different marginal likelihood approximations for model selection (selecting
hyperparameters) and compare to a gold standard based on MCMC. Interestingly, some methods
produce good predictive distributions although their marginal likelihood approximations are poor.
Strong conclusions are drawn about the methods: The Expectation Propagation algorithm is almost
always the method of choice unless the computational budget is very tight. We also extend
existing methods in various ways, and provide unifying code implementing all approaches.

The use of generous distance bounds has been the hallmark of NMR structure determination. However, bounds necessitate the estimation of data quality before the calculation, reduce the information content, introduce human bias, and allow for major errors in the structures. Here, we propose a new rapid structure calculation scheme based on Bayesian analysis. The minimization of an extended energy function, including a new type of distance restraint and a term depending on the data quality, results in an estimation of the data quality in addition to coordinates. This allows for the determination of the optimal weight on the experimental information. The resulting structures are of better quality and closer to the X-ray crystal structure of the same molecule. With the new calculation approach, the analysis of discrepancies from the target distances becomes meaningful. The strategy may be useful in other applications, for example in homology modeling.

This paper presents a fully automated algorithm for reconstructing
a textured 3D model of a face from a single
photograph or a raw video stream. The algorithm is based
on a combination of Support Vector Machines (SVMs) and
a Morphable Model of 3D faces. After SVM face detection,
individual facial features are detected using a novel
regression- and classification-based approach, and probabilistically
plausible configurations of features are selected
to produce a list of candidates for several facial feature positions.
In the next step, the configurations of feature points
are evaluated using a novel criterion that is based on a
Morphable Model and a combination of linear projections.
To make the algorithm robust with respect to head orientation,
this process is iterated while the estimate of pose is
refined. Finally, the feature points initialize a model-fitting
procedure of the Morphable Model. The result is a high-resolution
3D surface model.

We propose a new measure of conditional dependence of random variables, based on normalized cross-covariance operators on reproducing kernel Hilbert spaces. Unlike previous kernel dependence measures, the proposed criterion does not depend on the choice of kernel in the limit of infinite data, for a wide class of kernels. At the same time, it has a straightforward empirical estimate with good convergence behaviour. We discuss the theoretical properties of the measure, and demonstrate its application in experiments.
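A sketch of the unconditional version of such a normalized cross-covariance statistic, trace(Rx Ry) with R = Kc (Kc + n*eps*I)^{-1} (our reading; the conditional variant adds a further normalization step not shown here):

```python
# Sketch of an (unconditional) normalized cross-covariance statistic,
# trace(Rx Ry) with R = Kc (Kc + n*eps*I)^{-1}; illustrative only.
import numpy as np

def centered_gram(z, gamma=1.0):
    d2 = np.sum(z**2, 1)[:, None] + np.sum(z**2, 1)[None, :] - 2 * z @ z.T
    K = np.exp(-gamma * d2)
    H = np.eye(len(z)) - 1.0 / len(z)           # centering matrix
    return H @ K @ H

def nocco(x, y, eps=1e-3):
    n = len(x)
    Kx, Ky = centered_gram(x), centered_gram(y)
    Rx = Kx @ np.linalg.inv(Kx + n * eps * np.eye(n))
    Ry = Ky @ np.linalg.inv(Ky + n * eps * np.eye(n))
    return np.trace(Rx @ Ry)

rng = np.random.default_rng(0)
x = rng.normal(size=(150, 1))
# the dependent pair typically scores markedly higher:
print(nocco(x, rng.normal(size=(150, 1))), nocco(x, np.sin(3 * x)))
```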

We study a pattern classification algorithm which has recently been proposed by Vapnik and coworkers. It builds on a new inductive principle which assumes that in addition to positive and negative data, a third class of data is available, termed the Universum. We assay the behavior of the algorithm by establishing links with Fisher discriminant analysis and oriented PCA, as well as with an SVM in a
projected subspace (or, equivalently, with a data-dependent reduced kernel). We also provide experimental results.

This paper considers kernels invariant to translation, rotation and dilation. We show that no non-trivial positive definite (p.d.) kernels exist which are radial and
dilation invariant, only conditionally positive definite (c.p.d.) ones. Accordingly, we discuss the c.p.d. case and provide some novel analysis, including an elementary derivation of a c.p.d. representer theorem. On the practical side, we give a support vector machine (s.v.m.) algorithm for arbitrary c.p.d. kernels. For the thin-plate
kernel this leads to a classifier with only one parameter (the amount of regularisation), which we demonstrate to be as effective as an s.v.m. with the Gaussian kernel, even though the Gaussian involves a second parameter (the length scale).

It has been a long-standing goal in the adaptive control community to reduce the generically difficult, general reinforcement learning (RL) problem to simpler problems solvable by supervised learning. While this approach is today's standard for value function-based methods, fewer approaches are known that apply similar reductions to policy search methods. Recently, it has been shown that immediate RL problems can be solved by reward-weighted regression, and that the resulting algorithm is an expectation maximization (EM) algorithm with strong guarantees. In this paper, we extend this algorithm to the episodic case and show that it can be used in the context of LSTM recurrent neural networks (RNNs). The resulting RNN training algorithm is equivalent to a weighted self-modeling supervised learning technique. We focus on partially observable Markov decision problems (POMDPs) where it is essential that the policy is nonstationary in order to be optimal. We show that this new reward-weighted logistic regression used in conjunction with an RNN architecture can solve standard benchmark POMDPs with ease.

Similarity is used as an explanatory construct throughout psychology, and multidimensional scaling (MDS) is the most popular way to assess similarity. In MDS, similarity is intimately connected to the idea of a geometric representation of stimuli in a perceptual space. Whilst connecting similarity and closeness of stimuli in a geometric representation may be intuitively plausible, Tversky and Gati [Tversky, A., Gati, I. (1982). Similarity, separability, and the triangle inequality. Psychological Review, 89(2), 123-154] have reported data which are inconsistent with the usual geometric representations that are based on segmental additivity. We show that similarity measures based on Shepard's universal law of generalization [Shepard, R. N. (1987). Toward a universal law of generalization for psychological science. Science, 237(4820), 1317-1323] lead to an inner product representation in a reproducing kernel Hilbert space. In such a space stimuli are represented by their similarity to all other stimuli. This representation, based on Shepard's law, has a natural metric that does not have additive segments, whilst still retaining the intuitive notion of connecting similarity and distance between stimuli. Furthermore, this representation has the psychologically appealing property that the distance between stimuli is bounded.
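As a sketch of why the induced metric is bounded (assuming the exponential form k(x,y) = exp(-||x-y||) of Shepard's law), the RKHS distance satisfies

```latex
d(x,y)^2 = k(x,x) + k(y,y) - 2\,k(x,y) = 2\bigl(1 - e^{-\|x-y\|}\bigr) < 2,
```

so all pairwise distances stay below sqrt(2), while d(x,y) still grows monotonically as similarity decreases.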

Pattern recognition methods have shown that functional magnetic resonance imaging (fMRI) data can reveal significant information about brain activity. For example, in the debate of how object categories are represented in the brain, multivariate analysis has been used to provide evidence of a distributed encoding scheme [Science 293:5539 (2001) 2425-2430]. Many follow-up studies have employed different methods to analyze human fMRI data with varying degrees of success [Nature Reviews 7:7 (2006) 523-534]. In this study, we compare four popular pattern recognition methods: correlation analysis, support-vector machines (SVM), linear discriminant analysis (LDA) and Gaussian naïve Bayes (GNB), using data collected at high field (7 Tesla) with higher resolution than usual fMRI studies. We investigate prediction performance on single trials and for averages across varying numbers of stimulus presentations. The performance of the various algorithms depends on the nature of the brain activity being categorized: for several tasks, many of the methods work well, whereas for others, no method performs above chance level. An important factor in overall classification performance is careful preprocessing of the data, including dimensionality reduction, voxel selection and outlier elimination.

High performance and compliant robot control require accurate dynamics models which cannot be obtained analytically for sufficiently complex robot systems. In such cases, machine learning offers a promising alternative for approximating the robot dynamics using measured data. This approach offers a natural framework to incorporate unknown nonlinearities as well as to continually adapt online to changes in the robot dynamics. However, the most accurate regression methods, e.g. Gaussian process regression (GPR) and support vector regression (SVR), suffer from exceptionally high computational complexity, which prevents their usage for large numbers of samples or online learning to date. Inspired by locally linear regression techniques, we propose an approximation to standard GPR using local Gaussian process models. Due to the reduced computational cost, local Gaussian processes (LGP) can be applied to larger sample sizes and online learning. Comparisons with other nonparametric regression methods, e.g. standard GPR, nu-SVR and locally weighted projection regression (LWPR), show that LGP has higher accuracy than LWPR, close to the performance of standard GPR and nu-SVR, while being sufficiently fast for online learning.

Maximum variance unfolding (MVU) is an effective heuristic for dimensionality reduction. It produces a low-dimensional representation of the data by maximizing the variance of their embeddings while preserving the local distances of the
original data. We show that MVU also optimizes a statistical dependence measure which aims to retain the identity of individual observations under the distance-preserving constraints. This general view allows us to design "colored" variants of MVU, which produce low-dimensional representations for a given task, e.g. subject to class labels or other side information.

A straightforward nonlinear extension of Granger's concept of causality in the kernel framework is suggested. The kernel-based approach to assessing nonlinear Granger causality in multivariate time series enables us to determine, in a model-free way, whether the causal relation between two time series is present or not and whether it is direct or mediated by other processes. The trace norm of the so-called covariance operator in feature space is used to measure the prediction error. Relying on this measure, we test the improvement of predictability between time series by subsampling-based multiple testing. The distributional properties of the resulting p-values reveal the direction of Granger causality. Experiments with simulated and real-world data show that our method provides encouraging results.

Maximum entropy analysis of binary variables provides an elegant way for studying the role of pairwise correlations in neural populations. Unfortunately, these approaches suffer from their poor scalability to high dimensions. In sensory coding, however, high-dimensional data is ubiquitous. Here, we introduce a new approach using a near-maximum entropy model that makes this type of analysis feasible for very high-dimensional data - the model parameters can be derived in closed form and sampling is easy. We demonstrate its usefulness by studying a simple neural representation model of natural images. For the first time, we are able to directly compare predictions from a pairwise maximum entropy model not only in small groups of neurons, but also in larger populations of more than a thousand units. Our results indicate that in such larger networks interactions exist that are not predicted by pairwise correlations, despite the fact that pairwise correlations explain the lower-dimensional marginal statistics extremely well up to the limit of dimensionality where estimation of the full joint distribution is feasible.

Dynamic system-based motor primitives [1] have
enabled robots to learn complex tasks ranging from tennis
swings to locomotion. However, to date there have been only a
few extensions which have incorporated perceptual coupling to
variables of external focus, and, furthermore, these modifications
have relied upon handcrafted solutions. Humans learn how
to couple their movement primitives with external variables.
Clearly, such a solution is needed in robotics.
In this paper, we propose an augmented version of the dynamic
systems motor primitives which incorporates perceptual coupling
to an external variable. The resulting perceptually driven motor
primitives include the previous primitives as a special case and
can inherit some of their interesting properties. We show that
these motor primitives can perform complex tasks such as the Ball-in-a-Cup or Kendama task even with large variances in the initial
conditions that would challenge a skilled human player. To
do so, we initialize the motor primitives in the traditional way
by imitation learning without perceptual coupling. Subsequently,
we improve the motor primitives using a novel reinforcement
learning method which is particularly well-suited for motor
primitives.

Stimulus selectivity of sensory neurons is often characterized by estimating their receptive field properties such as orientation selectivity. Receptive fields are usually derived from the mean (or covariance) of the spike-triggered stimulus ensemble. This approach treats each spike as an independent message but does not take into account that information might be conveyed through patterns of neural activity that are distributed across space or time. Can we find a concise description for the processing of a whole population of neurons analogous to the receptive field for single neurons? Here, we present a generalization of the linear receptive field which is not bound to be triggered on individual spikes but can be meaningfully
linked to distributed response patterns. More precisely, we seek to identify those stimulus features and the corresponding patterns of neural activity that are most
reliably coupled. We use an extension of reverse-correlation methods based on canonical correlation analysis. The resulting population receptive fields span the
subspace of stimuli that is most informative about the population response. We evaluate our approach using both neuronal models and multi-electrode recordings from rabbit retinal ganglion cells. We show how the model can be extended to capture nonlinear stimulus-response relationships using kernel canonical correlation analysis, which makes it possible to test different coding mechanisms. Our technique can also be used to calculate receptive fields from multi-dimensional neural measurements such as those obtained from dynamic imaging methods.
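The linear version of this reverse-correlation idea can be sketched with off-the-shelf CCA on simulated stimulus and population-response matrices (an illustration; the kernel extension is not shown):

```python
# Sketch of the CCA-based reverse correlation idea (illustration only):
# find stimulus directions and population response patterns that are
# maximally correlated; the stimulus weights then play the role of a
# "population receptive field".
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
S = rng.normal(size=(500, 20))                 # stimuli (trials x features)
w = np.zeros(20); w[:3] = [1.0, -0.5, 0.5]     # hidden stimulus filter
drive = S @ w
R = np.outer(drive, [0.8, 0.6, 0.4]) + 0.5 * rng.normal(size=(500, 3))

cca = CCA(n_components=1).fit(S, R)
prf = cca.x_weights_[:, 0]                     # recovered receptive field
prf /= np.linalg.norm(prf)
print(np.round(prf[:5], 2))  # large weights on features 0..2 (up to sign)
```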

Protein subcellular localization is a crucial ingredient to many important
inferences about cellular processes, including prediction of protein function
and protein interactions. While many predictive computational tools have been
proposed, they tend to have complicated architectures and require many design
decisions from the developer.
Here we utilize the multiclass support vector machine (m-SVM) method to directly
solve protein subcellular localization without resorting to the common approach
of splitting the problem into several binary classification problems. We
further propose a general class of protein sequence kernels which considers all
motifs, including motifs with gaps. Instead of heuristically selecting one or a few
kernels from this family, we utilize a recent extension of SVMs that optimizes
over multiple kernels simultaneously. This way, we automatically search over
families of possible amino acid motifs.
We compare our automated approach to three other predictors on four different
datasets, and show that we perform better than the current state of the art. Further, our method provides some insights as to which sequence motifs are most useful for determining subcellular localization, which are in agreement with biological
reasoning.

We present a theoretical study on the discriminative clustering framework, recently proposed for simultaneous subspace selection via linear discriminant analysis (LDA) and clustering. Empirical results have shown its favorable performance in comparison with several other popular clustering algorithms. However, the inherent relationship between subspace selection and clustering in this framework is not well understood, due to the iterative nature of the algorithm. We show in this paper that this iterative subspace selection and clustering is equivalent to kernel K-means with a specific kernel Gram matrix. This provides significant and new insights into the nature of this subspace selection procedure. Based on this equivalence relationship, we propose the Discriminative K-means (DisKmeans) algorithm for simultaneous LDA subspace selection and clustering, as well as an automatic parameter estimation procedure. We also present the nonlinear extension of DisKmeans using kernels. We show that the learning of the kernel
matrix over a convex set of pre-specified kernel matrices can be incorporated into the clustering formulation. The connection between DisKmeans and several other clustering algorithms is also analyzed. The presented theories and algorithms are evaluated through experiments on a collection of benchmark data sets.

Generalized linear models are the most commonly used tools to describe the stimulus selectivity of sensory neurons. Here we present a Bayesian treatment of such models. Using the expectation propagation algorithm, we are able to approximate the full posterior distribution over all weights. In addition, we use a Laplacian prior to favor sparse solutions. Therefore, stimulus features that do not critically influence neural activity will be assigned zero weights and thus be effectively excluded by the model. This feature selection mechanism facilitates both the interpretation of the neuron model as well as its predictive abilities. The posterior distribution can be used to obtain confidence intervals which makes it possible to assess the statistical significance of the solution. In neural data analysis, the available amount of experimental measurements is often limited whereas the parameter space is large. In such a situation, both regularization by a sparsity prior and uncertainty estimates for the model parameters are essential.
We apply our method to multi-electrode recordings of retinal ganglion cells and use our uncertainty estimate to test the statistical significance of functional couplings between neurons. Furthermore, we use the sparsity of the Laplace prior to select those filters from a spike-triggered covariance analysis that are most informative about the neural response.

We present a model-free reinforcement learning method for partially observable Markov decision problems. Our method estimates a likelihood gradient by sampling directly in parameter space, which leads to lower variance gradient estimates than those obtained by policy gradient methods such as REINFORCE. For several complex control tasks, including robust standing with a humanoid robot, we show that our method outperforms well-known algorithms from the fields of policy gradients, finite difference methods and population based heuristics. We also provide a detailed analysis of the differences between our method and the other algorithms.
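The core idea of sampling directly in parameter space can be sketched as a likelihood-ratio gradient on the mean of a Gaussian search distribution (a generic sketch under toy assumptions, not the paper's exact estimator):

```python
# Generic sketch of likelihood-ratio gradient estimation in parameter space
# (not the paper's exact estimator): sample controller parameters theta from
# a Gaussian, evaluate episode returns, and move the Gaussian's mean uphill.
import numpy as np

def episode_return(theta):
    """Stand-in for a rollout: peak return at theta = (1, -2)."""
    return -np.sum((theta - np.array([1.0, -2.0])) ** 2)

rng = np.random.default_rng(0)
mu, sigma, lr = np.zeros(2), 1.0, 0.1
for it in range(200):
    thetas = mu + sigma * rng.normal(size=(20, 2))   # sample in parameter space
    R = np.array([episode_return(t) for t in thetas])
    b = R.mean()                                     # baseline reduces variance
    grad = ((R - b)[:, None] * (thetas - mu)).mean(0) / sigma**2
    mu += lr * grad
print(np.round(mu, 2))                               # approaches (1, -2)
```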

Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments, and to use this understanding to design future systems.