Secondary metabolic pathways in plants are important for finding druggable candidate enzymes. However, there are many enzymes whose functions are still undiscovered, especially in organism-specific metabolic pathways. We propose reaction graph kernels for automatically assigning EC numbers to unknown enzymatic reactions in a metabolic network. Experiments were carried out on the KEGG/REACTION database, and our method successfully predicted the first three digits of the EC number with 83% accuracy. We also exhaustively predicted missing enzymatic functions in the plant secondary metabolism pathways and evaluated the biochemical validity of our results.

At the heart of many important bioinformatics problems, such as gene finding and function prediction, is the classification of biological sequences, above all of DNA and proteins. In many cases, the most accurate classifiers are obtained by training SVMs with complex sequence kernels, for instance for transcription starts or splice sites. However, an often criticized downside of SVMs with complex kernels is that it is very hard for humans to understand the learned decision rules and to derive biological insights from them. To close this gap, we introduce the concept of positional oligomer importance matrices (POIMs) and develop an efficient algorithm for their computation. We demonstrate how they overcome the limitations of sequence logos, and how they can be used to find relevant motifs for different biological phenomena in a straightforward way. Note that the concept of POIMs is not limited to interpreting SVMs, but is applicable to general k-mer based scoring systems.

In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. At first glance, spectral clustering appears slightly mysterious, and it is not obvious why it works at all and what it really does. The goal of this tutorial is to give some intuition on those questions. We describe different graph Laplacians and their basic properties, present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches. Advantages and disadvantages of the different spectral clustering algorithms are discussed.
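One of the basic properties of the unnormalized graph Laplacian L = D - W can be checked numerically: for a symmetric weight matrix W, f^T L f = 1/2 * sum_ij w_ij (f_i - f_j)^2. The following is a minimal sketch in plain Python. The Gaussian similarity function, the parameter `sigma`, and all function names are illustrative assumptions and are not taken from the tutorial itself.

```python
import math


def gaussian_similarity(points, sigma=1.0):
    """Fully connected similarity graph with w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    The diagonal is set to zero (no self-loops). The choice of a Gaussian
    similarity is an assumption made for this illustration.
    """
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return W


def unnormalized_laplacian(W):
    """Return L = D - W, where D is the diagonal matrix of vertex degrees."""
    n = len(W)
    degrees = [sum(row) for row in W]
    return [[(degrees[i] if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def quadratic_form(L, f):
    """Compute f^T L f."""
    n = len(f)
    return sum(f[i] * L[i][j] * f[j] for i in range(n) for j in range(n))


def smoothness(W, f):
    """Compute 0.5 * sum_ij w_ij (f_i - f_j)^2."""
    n = len(f)
    return 0.5 * sum(W[i][j] * (f[i] - f[j]) ** 2
                     for i in range(n) for j in range(n))
```

For any vector `f`, `quadratic_form(L, f)` and `smoothness(W, f)` agree up to rounding, and a constant vector gives zero, which is why the constant vector is an eigenvector of L with eigenvalue 0.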

Protein subcellular localization is a crucial ingredient to many important inferences about cellular processes, including prediction of protein function and protein interactions. We propose a new class of protein sequence kernels which considers all motifs including motifs with gaps. This class of kernels allows the inclusion of pairwise amino acid distances into their computation. We utilize an extension of the multiclass support vector machine (SVM) method which directly solves protein subcellular localization without resorting to the common approach of splitting the problem into several binary classification problems. To automatically search over families of possible amino acid motifs, we optimize over multiple kernels at the same time. We compare our automated approach to four other predictors on three different datasets, and show that we perform better than the current state of the art. Furthermore, our method provides some insights as to which features are most useful for determining subcellular localization, which are in agreement with biological reasoning.

The abilities to learn and to categorize are fundamental for cognitive systems, be they animals or machines, and therefore have attracted attention from engineers and psychologists alike. Modern machine learning methods and psychological models of categorization are remarkably similar, partly because these two fields share a common history in artificial neural networks and reinforcement learning. However, machine learning is now an independent and mature field that has moved beyond psychologically or neurally inspired algorithms towards providing foundations for a theory of learning that is rooted in statistics and functional analysis. Much of this research is potentially interesting for psychological theories of learning and categorization but also hardly accessible for psychologists. Here, we provide a tutorial introduction to a popular class of machine learning tools, called kernel methods. These methods are closely related to perceptrons, radial-basis-function neural networks and exemplar theories of categorization. Recent theoretical advances in machine learning are closely tied to the idea that the similarity of patterns can be encapsulated in a positive definite kernel. Such a positive definite kernel can define a reproducing kernel Hilbert space which allows one to use powerful tools from functional analysis for the analysis of learning algorithms. We give basic explanations of some key concepts (the so-called kernel trick, the representer theorem and regularization), which may open up the possibility that insights from machine learning can feed back into psychology.

Background: For splice site recognition, one has to solve two classification problems:
discriminating true from decoy splice sites for both acceptor and donor sites. Gene finding systems
typically rely on Markov Chains to solve these tasks.
Results: In this work we consider Support Vector Machines for splice site recognition. We employ
the so-called weighted degree kernel which turns out well suited for this task, as we will illustrate in
several experiments where we compare its prediction accuracy with that of recently proposed
systems. We apply our method to the genome-wide recognition of splice sites in Caenorhabditis
elegans, Drosophila melanogaster, Arabidopsis thaliana, Danio rerio, and Homo sapiens. Our
performance estimates indicate that splice sites can be recognized very accurately in these genomes
and that our method outperforms many other methods including Markov Chains, GeneSplicer and
SpliceMachine. We provide genome-wide predictions of splice sites and a stand-alone prediction
tool ready to be used for incorporation in a gene finder.
Availability: Data, splits, additional information on the model selection, the whole genome
predictions, as well as the stand-alone prediction tool are available for download at
http://www.fml.mpg.de/raetsch/projects/splice.
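To make the weighted degree kernel mentioned above concrete, here is a minimal sketch for two equal-length sequences. It counts, for each substring length d up to a maximum degree, the positions at which both sequences contain the same d-mer, and combines the counts with weights. The weighting beta_d = 2(D - d + 1) / (D(D + 1)) is a choice commonly used in the literature and is an assumption here, as are the function name and the default degree; the abstract does not specify them.

```python
def weighted_degree_kernel(x, y, degree=3):
    """Weighted degree kernel between two sequences of equal length.

    For each d = 1..degree, count positions l where x[l:l+d] == y[l:l+d],
    and sum the counts with weights beta_d = 2 (D - d + 1) / (D (D + 1)).
    The weights are an assumed, commonly used choice.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    length = len(x)
    total = 0.0
    for d in range(1, degree + 1):
        beta = 2.0 * (degree - d + 1) / (degree * (degree + 1))
        matches = sum(1 for l in range(length - d + 1)
                      if x[l:l + d] == y[l:l + d])
        total += beta * matches
    return total
```

Because matches are only counted at identical positions, the kernel is sensitive to where a motif occurs relative to the candidate splice site, not just whether it occurs.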

Invited keynote talk at the launch of BrainGain, the Dutch BCI research consortium, November 2007 (talk)

Abstract

I'll present a perspective on Brain-Computer Interface development from Tübingen. Some of the benefits promised by BCI technology lie in the near foreseeable future, and some further away. Our motivation is to make BCI technology feasible for the people who could benefit from what it has to offer soon: namely, people in the "completely locked-in" state. I'll mention some of the challenges of working with this user group, and explain the specific directions they have motivated us to take in developing experimental methods, algorithms, and software.

Compliant control will be a prerequisite for humanoid robotics if these robots are supposed to work safely and robustly in human and/or dynamic environments. One view of compliant control is that a robot should control a minimal number of degrees-of-freedom (DOFs) directly, i.e., those relevant DOFs for the task, and keep the remaining DOFs maximally compliant, usually in the null space of the task. This view naturally leads to task space control. However, surprisingly few implementations of task space control can be found in actual humanoid robots. This paper makes a first step towards assessing the usefulness of task space controllers for humanoids by investigating which choices of controllers are available and what inherent control characteristics they have. This treatment will concern position and orientation control, where the latter is based on a quaternion formulation. Empirical evaluations on an anthropomorphic Sarcos master arm illustrate the robustness of the different controllers as well as the ease of implementing and tuning them. Our extensive empirical results demonstrate that simpler task space controllers, e.g., classical resolved motion rate control or resolved acceleration control, can be quite advantageous in the face of inevitable modeling errors in model-based control, and that well chosen formulations are easy to implement and quite robust, such that they are useful for humanoids.

PET/MR combines the high soft tissue contrast of Magnetic Resonance Imaging (MRI) and the functional information of Positron Emission Tomography (PET). For quantitative PET information, correction of tissue photon attenuation is mandatory. Usually in conventional PET, the attenuation map is obtained from a transmission scan, which uses a rotating source, or from the CT scan in case of combined PET/CT. In the case of a PET/MR scanner, there is insufficient space for the rotating source and ideally one would want to calculate the attenuation map from the MR image instead. Since MR images provide information about proton density of the different tissue types, it is not trivial to use this data for PET attenuation correction. We present a method for predicting the PET attenuation map from a given MR image, using a combination of atlas-registration and recognition of local patterns.
Using "leave one out cross validation" we show on a database of 16 MR-CT image pairs that our method reliably allows estimating the CT image from the MR image. Subsequently, as in PET/CT, the PET attenuation map can be predicted from the CT image. On an additional dataset of MR/CT/PET triplets we quantitatively validate that our approach allows PET quantification with an error that is smaller than what would be clinically significant.
We demonstrate our approach on T1-weighted human brain scans. However, the presented methods are more general and current research focuses on applying the established methods to human whole body PET/MRI applications.

Deformable registration methods are essential for multimodality imaging. Many different methods exist but due to the complexity of the deformed images a direct comparison of the methods is difficult. One particular application that requires high accuracy registration of MR-CT images is atlas-based attenuation correction for PET/MR. We compare four deformable registration algorithms for 3D image data included in the Open Source "National Library of Medicine Insight Segmentation and Registration Toolkit" (ITK). An interactive landmark based registration using MiraView (Siemens) has been used as gold standard. The automatic algorithms provided by ITK are based on the metrics Mattes mutual information as well as on normalized mutual information. The transformations are calculated by interpolating over a uniform B-Spline grid lying over the image to be warped. The algorithms were tested on head images from 10 subjects. We implemented a measure which segments head interior bone and air based on the CT images and low intensity classes of corresponding MRI images. The segmentation of bone is performed by individually calculating the lowest Hounsfield unit threshold for each CT image. The compromise is made by quantifying the number of overlapping voxels of the remaining structures. We show that the algorithms provided by ITK achieve similar or better accuracy than the time-consuming interactive landmark based registration. Thus, ITK provides an ideal platform to generate accurately fused datasets from different modalities, required for example for building training datasets for atlas-based attenuation correction.

Recently, Udwadia (Proc. R. Soc. Lond. A 2003:1783-1800, 2003) suggested to derive tracking controllers for mechanical systems with redundant degrees-of-freedom (DOFs) using a generalization of Gauss' principle of least constraint. This method allows reformulating control problems as a special class of optimal controllers. In this paper, we take this line of reasoning one step further and demonstrate that several well-known and also novel nonlinear robot control laws can be derived from this generic methodology. We show experimental verifications on a Sarcos Master Arm robot for some of the derived controllers. The suggested approach offers a promising unification and simplification of nonlinear control law design for robots obeying rigid body dynamics equations, both with or without external constraints, with over-actuation or underactuation, as well as open-chain and closed-chain kinematics.

Open source tools have recently reached a level of maturity which makes them suitable for building large-scale real-world systems. At the same time, the field of machine learning has developed a large body of powerful learning algorithms for diverse applications. However, the true potential of these methods is not realized, since existing implementations are not openly shared, resulting in software with low usability, and weak interoperability. We argue that this situation can be significantly improved by increasing incentives for researchers to publish their software under an open source model. Additionally, we outline the problems authors are faced with when trying to publish algorithmic implementations of machine learning methods. We believe that a resource of peer reviewed software accompanied by short articles would be highly valuable to both the machine learning and the general scientific community.

2nd Workshop on Machine Learning and Optimization at the ISM, October 2007 (talk)

Abstract

Many problems in unsupervised learning require the analysis of features of probability distributions. At the most fundamental level, we might wish to determine whether two distributions are the same, based on samples from each - this is known as the two-sample or homogeneity problem. We use kernel methods to address this problem, by mapping probability distributions to elements in a reproducing kernel Hilbert space (RKHS). Given a sufficiently rich RKHS, these representations are unique: thus comparing feature space representations allows us to compare distributions without ambiguity. Applications include testing whether cancer subtypes are distinguishable on the basis of DNA microarray data, and whether low frequency oscillations measured at an electrode in the cortex have a different distribution during a neural spike.
A more difficult problem is to discover whether two random variables drawn from a joint distribution are independent. It turns out that any dependence between pairs of random variables can be encoded in a cross-covariance operator between appropriate RKHS representations of the variables, and we may test independence by looking at a norm of the operator. We demonstrate this independence test by establishing dependence between an English text and its French translation, as opposed to French text on the same topic but otherwise unrelated. Finally, we show that this operator norm is itself a difference in feature means.
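The idea of testing independence through a norm of the cross-covariance operator can be illustrated with an empirical estimate of its squared Hilbert-Schmidt norm, often written HSIC = (1/m^2) trace(K H L H), where K and L are Gram matrices on the two variables and H is the centering matrix. The sketch below is a simplified illustration under stated assumptions: the Hilbert-Schmidt norm is one possible choice of norm, the data are scalar, the kernel is Gaussian, and all names and the bandwidth `sigma` are hypothetical. It computes only the statistic, not a test threshold.

```python
import math


def gaussian_gram(values, sigma=1.0):
    """Gram matrix of a Gaussian kernel on scalar observations."""
    return [[math.exp(-(a - b) ** 2 / (2.0 * sigma ** 2)) for b in values]
            for a in values]


def center(K):
    """Return H K H for a symmetric K, with H = I - (1/m) 1 1^T."""
    m = len(K)
    row_means = [sum(row) / m for row in K]
    total_mean = sum(row_means) / m
    # K is symmetric, so column means equal row means.
    return [[K[i][j] - row_means[i] - row_means[j] + total_mean
             for j in range(m)] for i in range(m)]


def hsic_biased(xs, ys, sigma=1.0):
    """Biased empirical HSIC: (1/m^2) * trace(K H L H)."""
    m = len(xs)
    Kc = center(gaussian_gram(xs, sigma))
    L = gaussian_gram(ys, sigma)
    # trace(H K H L) equals the elementwise sum of (H K H) * L for symmetric L.
    return sum(Kc[i][j] * L[i][j] for i in range(m) for j in range(m)) / m ** 2
```

If one variable is constant, its Gram matrix is constant and the statistic is zero up to rounding; a functional relationship between the variables gives a strictly positive value.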

Recent approaches to action classification in videos have
used sparse spatio-temporal words encoding local appearance
around interesting movements. Most of these approaches
use a histogram representation, discarding the
temporal order among features. But this ordering information
can contain important information about the action
itself, e.g. consider the sport disciplines of hurdle race
and long jump, where the global temporal order of motions
(running, jumping) is important to discriminate between
the two. In this work we propose to use a sequential
representation which retains this temporal order. Further,
we introduce Discriminative Subsequence Mining to find
optimal discriminative subsequence patterns. In combination
with the LPBoost classifier, this amounts to simultaneously
learning a classification function and performing feature
selection in the space of all possible feature sequences.
The resulting classifier linearly combines a small number
of interpretable decision functions, each checking for the
presence of a single discriminative pattern. The classifier is
benchmarked on the KTH action classification data set and
outperforms the best known results in the literature.

While kernel methods are the basis of many popular techniques in supervised learning, they are less commonly used in testing, estimation, and analysis of probability distributions, where information theoretic approaches rule the roost. However, it becomes difficult to estimate mutual information or entropy if the data are high dimensional.

Journal of the Optical Society of America A, 24(10):3233-3241, October 2007 (article)

Abstract

There are 8 cycles/deg ripples or oscillations in performance as a function of location near Mach bands in experiments measuring the masking effects of Mach bands on random-polarity signal bars. The oscillations with increments are 180 degrees out of phase with those for decrements. The oscillations, much larger than the measurement error, appear to relate to the weighting function of the spatial-frequency-tuned channel detecting the broadband signals. The ripples disappear with step maskers and become much smaller at durations below 25 ms, implying either that the site of masking has changed or that the weighting function and hence spatial-frequency tuning is slow to develop.

We describe a technique for comparing distributions without
the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a reproducing kernel Hilbert space. Applications of this technique can be found in two-sample tests, which are used for determining whether two sets of observations arise from the
same distribution, covariate shift correction, local learning, measures of independence, and density estimation.

Assume we are given a sample of points from some underlying
distribution which contains several distinct clusters. Our goal is
to construct a neighborhood graph on the sample points such that
clusters are "identified": that is, the subgraph induced by points
from the same cluster is connected, while subgraphs corresponding to
different clusters are not connected to each other. We derive bounds
on the probability that cluster identification is successful, and
use them to predict "optimal" values of k for the mutual and
symmetric k-nearest-neighbor graphs. We point out different
properties of the mutual and symmetric nearest-neighbor graphs
related to the cluster identification problem.
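The difference between the two graph types can be sketched in a few lines. In the symmetric k-nearest-neighbor graph, two points are connected if either one is among the k nearest neighbors of the other; in the mutual graph, both must be. The code below is a minimal illustration with brute-force neighbor search and Euclidean distance; the function names are hypothetical and ties are broken by index, which is an arbitrary choice.

```python
def knn_sets(points, k):
    """For each point, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points) if j != i
        )
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def symmetric_knn_edges(points, k):
    """Edge (i, j) if i is among the kNN of j OR j is among the kNN of i."""
    nn = knn_sets(points, k)
    n = len(points)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if j in nn[i] or i in nn[j]}


def mutual_knn_edges(points, k):
    """Edge (i, j) only if i is among the kNN of j AND j is among the kNN of i."""
    nn = knn_sets(points, k)
    n = len(points)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if j in nn[i] and i in nn[j]}
```

The mutual graph is always a subgraph of the symmetric one, so for the same k it is less likely to connect different clusters but also more likely to split a single cluster.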

Attempting to model human categorization and similarity judgements is both a very interesting and an exceedingly difficult challenge. Some of the difficulty
arises because of conflicting evidence as to whether human categorization and similarity judgements should or should not be modelled as operating on a mental representation that is essentially metric. Intuitively, this has a strong appeal as it would allow (dis)similarity to be represented geometrically as distance in some internal space. Here we show how a single stimulus, carefully constructed in a
psychophysical experiment, introduces l2 violations in what used to be an internal similarity space that could be adequately modelled as Euclidean. We term this one
influential data point a conflictual judgement. We present an algorithm for analysing such data and identifying the crucial point. Thus there may not be a
strict dichotomy between either a metric or a non-metric internal space but rather degrees to which potentially large subsets of stimuli are represented metrically
with a small subset causing a global violation of metricity.

We propose a highly efficient framework for kernel multi-class models with a large and structured set of classes. Kernel parameters are learned automatically by maximizing the cross-validation log likelihood, and
predictive probabilities are estimated. We demonstrate our
approach on large scale text classification tasks with hierarchical class structure, achieving state-of-the-art results in an order of magnitude less time than previous work.

We present a local learning approach for clustering. The basic idea is that a good clustering result should have the property that the cluster label of each data point can be well predicted based on its neighboring data and their cluster labels, using current supervised learning methods. An optimization problem is formulated such that its solution has the above property. Relaxation and eigen-decomposition are applied to solve this optimization problem. We also briefly investigate the parameter selection issue and provide a simple parameter selection method for the proposed algorithm. Experimental results are provided to validate the effectiveness of the proposed approach.

Recent approaches to independent component analysis have used kernel
independence measures to obtain very good performance in ICA, particularly
in areas where classical methods experience difficulty (for instance,
sources with near-zero kurtosis). In this chapter, we compare two efficient
extensions of these methods for large-scale problems: random subsampling
of entries in the Gram matrices used in defining the independence
measures, and incomplete Cholesky decomposition of these matrices.
We derive closed-form, efficiently computable approximations for the
gradients of these measures, and compare their performance on ICA using
both artificial and music data. We show that kernel ICA can scale up to much larger
problems than yet attempted, and that incomplete Cholesky decomposition
performs better than random sampling.

Machine learning develops intelligent computer systems that are able to generalize from previously seen examples. A new domain of machine learning, in which the prediction must satisfy the additional constraints found in structured data, poses one of machine learning's greatest challenges: learning functional dependencies between arbitrary input and output domains. This volume presents and analyzes the state of the art in machine learning algorithms and theory in this novel field. The contributors discuss applications as diverse as machine translation, document markup, computational biology, and information extraction, among others, providing a timely overview of an exciting field.

Human immunodeficiency virus type 1 (HIV-1) evolves in the human body,
and its exposure to a drug often causes mutations that enhance
the resistance against the drug.
To design an effective pharmacotherapy for an individual patient,
it is important to accurately predict the drug resistance
based on genotype data.
Notably, the resistance is not just
the simple sum of the effects of all mutations.
Structural biological studies suggest that
the association of mutations is crucial:
Even if mutation A or B alone does not affect the resistance,
a significant change might happen
when the two mutations occur together.
Linear regression methods cannot take the associations into account,
while decision tree methods can reveal only limited associations.
Kernel methods and neural networks implicitly use all possible
associations for prediction, but cannot select salient associations
explicitly.
Our method, itemset boosting, performs linear regression
in the complete space of power sets of mutations.
It implements a forward feature selection procedure where,
in each iteration, one mutation combination is
found by an efficient branch-and-bound search.
This method uses all possible combinations,
and salient associations are explicitly shown.
In experiments, our method worked particularly well for predicting the
resistance of nucleotide reverse transcriptase inhibitors
(NRTIs). Furthermore, it successfully recovered many mutation
associations known in biological literature.

Semi-supervised SVMs (S3VMs) attempt to learn low-density separators by maximizing the margin over labeled and unlabeled examples. The associated optimization problem is non-convex. To examine the full potential of S3VMs modulo local minima problems in current implementations, we apply branch and bound techniques for obtaining exact, globally optimal solutions. Empirical evidence suggests that the globally optimal solution can return excellent generalization performance in situations where other implementations fail completely. While our current implementation is only applicable to small datasets,
we discuss variants that can potentially lead to practically useful algorithms.

We propose two statistical tests to determine if two samples are from different distributions. Our test statistic is in both cases the distance between the means of the two samples mapped into a reproducing kernel Hilbert space (RKHS). The first test is based on a large deviation bound for the test statistic, while the second is
based on the asymptotic distribution of this statistic.
The test statistic can be computed in $O(m^2)$ time. We apply our approach to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where our test performs strongly.
We also demonstrate excellent performance when comparing distributions over graphs, for which no alternative tests currently exist.
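The test statistic described above, the distance between the two sample means in the RKHS, can be written entirely in terms of kernel evaluations, which is where the quadratic cost comes from. The following is a minimal sketch of the biased estimate of the squared distance. The Gaussian kernel, its bandwidth `sigma`, and the function names are assumptions made for illustration; the thresholds needed to turn the statistic into either of the two tests are not shown.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel between two points given as tuples of floats."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def mmd2_biased(X, Y, kernel=gaussian_kernel):
    """Biased estimate of the squared RKHS distance between sample means.

    ||mean_x - mean_y||^2 = mean k(x, x') + mean k(y, y') - 2 mean k(x, y),
    computed with three double loops over the samples.
    """
    m, n = len(X), len(Y)
    k_xx = sum(kernel(a, b) for a in X for b in X) / (m * m)
    k_yy = sum(kernel(a, b) for a in Y for b in Y) / (n * n)
    k_xy = sum(kernel(a, b) for a in X for b in Y) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy
```

Since only kernel evaluations are needed, the same function works for any data type with a suitable kernel, for example graphs, by passing a different `kernel` argument.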

We consider the task of tuning hyperparameters in SVM models based on minimizing a smooth performance validation function, e.g., smoothed k-fold cross-validation error,
using non-linear optimization techniques. The key computation in this approach is that of the gradient of the validation function with respect to hyperparameters. We show that for large-scale problems involving a wide choice of kernel-based models and validation functions, this computation can be very efficiently done; often within just a fraction of the training time. Empirical results show that a near-optimal set of hyperparameters can be identified by our approach with very few training rounds and gradient computations.

Establishing correspondence between distinct objects is an important and nontrivial task: correctness of the correspondence hinges on properties which are difficult to capture in an a priori criterion. While previous work has used a priori criteria which in some cases led to very good results, the present paper explores whether it is possible to learn a combination of features that, for a given training set of aligned human heads, characterizes the notion of correct correspondence. By optimizing this criterion, we are then able to compute correspondence and morphs for novel heads.

The extraction of a parametric global motion from a motion field is a task with several applications in video processing. We present two probabilistic formulations of the problem and carry out optimization using the RAST algorithm, a geometric matching method novel to motion estimation in video. RAST uses an exhaustive and adaptive search of transformation space and thus gives -- in contrast to local sampling optimization techniques used in the past -- a globally optimal solution. Among other applications, our framework can thus be used as a source of ground truth for benchmarking motion estimation algorithms. Our main contributions are: first, the novel combination of a state-of-the-art MAP criterion for dominant motion estimation with a search procedure that guarantees global optimality. Second, experimental results that illustrate the superior performance of our approach on synthetic flow fields as well as real-world video streams. Third, a significant speedup of the search achieved by extending the model with an additional smoothness prior.

Most literature on Support Vector Machines (SVMs) concentrates on
the dual optimization problem. In this paper, we would like to point out
that the primal problem can also be solved efficiently, both for linear
and non-linear SVMs, and that there is no reason to ignore this possibility.
On the contrary, from the primal point of view new families of algorithms for
large scale SVM training can be investigated.
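As a rough illustration of what solving the primal directly can look like, the sketch below minimizes the linear SVM objective (lam / 2) * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i (w . x_i + b))^2 with plain batch gradient descent. This is a simplified stand-in and not the algorithm of the paper: the squared hinge loss, the fixed step size, the parameter values, and the function names are all assumptions chosen to keep the example short.

```python
def train_primal_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Batch gradient descent on a linear SVM primal with squared hinge loss.

    X is a list of feature tuples, y a list of labels in {-1, +1}.
    Returns the weight vector w and the bias b.
    """
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            slack = max(0.0, 1.0 - margin)
            if slack > 0.0:
                # derivative of slack^2 / n with respect to the score
                coeff = -2.0 * slack * yi / n
                grad_w = [g + coeff * xj for g, xj in zip(grad_w, xi)]
                grad_b += coeff
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the sign of w . x + b."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1
```

The squared hinge loss is differentiable, which is what makes a simple gradient method applicable to the primal objective without any reference to dual variables.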

A wealth of computationally efficient approximation methods for Gaussian process regression have been recently proposed. We give a unifying overview of sparse approximations, following Quiñonero-Candela and Rasmussen (2005), and a brief review of approximate matrix-vector multiplication methods.

Abstract. This paper considers kernels invariant to translation, rotation and dilation. We show that no non-trivial
positive definite (p.d.) kernels exist which are radial and dilation invariant, only conditionally positive definite
(c.p.d.) ones. Accordingly, we discuss the c.p.d. case and provide some novel analysis, including an elementary
derivation of a c.p.d. representer theorem. On the practical side, we give a support vector machine (s.v.m.) algorithm
for arbitrary c.p.d. kernels. For the thin-plate kernel this leads to a classifier with only one parameter (the
amount of regularisation), which we demonstrate to be as effective as an s.v.m. with the Gaussian kernel, even
though the Gaussian involves a second parameter (the length scale).

The annual Neural Information Processing Systems (NIPS) conference is the flagship meeting on neural computation and machine learning. It draws a diverse group of attendees--physicists, neuroscientists, mathematicians, statisticians, and computer scientists--interested in theoretical and applied aspects of modeling, simulating, and building neural-like or intelligent systems. The presentations are interdisciplinary, with contributions in algorithms, learning theory, cognitive science, neuroscience, brain imaging, vision, speech and signal processing, reinforcement learning, and applications. Only twenty-five percent of the papers submitted are accepted for presentation at NIPS, so the quality is exceptionally high. This volume contains the papers presented at the December 2006 meeting, held in Vancouver.

We consider a model to cluster the components of a vector
time-series. The task is to assign each component of the
vector time-series to a single cluster, basing this assignment
on the simultaneous dynamical similarity of the component
to other components in the cluster. This is in contrast to the
more familiar task of clustering a set of time-series based on
global measures of their similarity. The model is based on
a Dirichlet Mixture of Linear Gaussian State-Space models
(LGSSMs), in which each LGSSM is treated with a prior to
encourage the simplest explanation. The resulting model is
approximated using a collapsed variational Bayes implementation.

We consider the problem of denoising a noisily sampled submanifold $M$ in $R^d$, where the submanifold $M$
is a priori unknown and we are only given a noisy point sample. The presented denoising algorithm is based
on a graph-based diffusion process of the point sample. We analyze this diffusion process using recent results about
the convergence of graph Laplacians. In the experiments we show that our method is capable of dealing with
non-trivial high-dimensional noise. Moreover, using the denoising algorithm as a pre-processing method, we
can improve the results of a semi-supervised learning algorithm.
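A graph-based diffusion step on a point sample can be sketched as follows: each point is moved part of the way towards the average of its nearest neighbors, which corresponds to one explicit step of a diffusion driven by a random-walk graph Laplacian. This is a deliberately simplified illustration and not the algorithm of the paper: it uses an unweighted, directed k-nearest-neighbor graph, and the function names, `k`, and `step` are hypothetical.

```python
def knn_indices(points, i, k):
    """Indices of the k nearest neighbors of points[i] (brute force)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(points[i], q)), j)
        for j, q in enumerate(points) if j != i
    )
    return [j for _, j in dists[:k]]


def diffusion_step(points, k=2, step=0.5):
    """Move every point towards the mean of its k nearest neighbors.

    Equivalent to x_i <- x_i - step * (x_i - mean of neighbors), i.e. one
    explicit diffusion step with an unweighted kNN graph (an assumption).
    """
    new_points = []
    for i, p in enumerate(points):
        nbrs = knn_indices(points, i, k)
        mean = [sum(points[j][c] for j in nbrs) / len(nbrs)
                for c in range(len(p))]
        new_points.append(tuple((1.0 - step) * pc + step * mc
                                for pc, mc in zip(p, mean)))
    return new_points
```

Applied to points scattered around a line, one step shrinks the off-line component of the noise; the graph is rebuilt from the current positions each time the function is called.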

Interest point detection in still images is a well-studied topic in computer vision.
In the spatiotemporal domain, however, it is still unclear which features indicate useful interest points. In this paper we approach the problem by learning a detector from examples: we record eye movements of human subjects watching video sequences and train a neural network to predict which locations are likely to become eye movement targets. We show that our detector outperforms current spatiotemporal interest point architectures on a standard classification dataset.

We present a framework for efficient, accurate approximate Bayesian inference in generalized linear models (GLMs), based on the expectation propagation (EP) technique. The parameters can be endowed with a factorizing prior distribution, encoding properties such as sparsity or non-negativity. The central role of posterior log-concavity in Bayesian GLMs is emphasized and related to stability issues in EP. In particular, we use our technique to infer the parameters of a point process model for neuronal spiking data from multiple electrodes, demonstrating significantly superior predictive performance when a sparsity assumption is enforced via a Laplace prior distribution.
