The accuracy of optical flow estimation algorithms has been improving steadily, as evidenced by results on the Middlebury optical flow benchmark. The typical formulation, however, has changed little since the work of Horn and Schunck. We attempt to uncover what has made recent advances possible through a thorough analysis of how the objective function, the optimization method, and modern implementation practices influence accuracy. We discover that "classical" flow formulations perform surprisingly well when combined with modern optimization and implementation techniques. One key implementation detail is the median filtering of intermediate flow fields during optimization. While this improves the robustness of classical methods, it actually leads to higher-energy solutions, meaning that these methods are not optimizing the original objective function. To understand the principles behind this phenomenon, we derive a new objective function that formalizes the median filtering heuristic. This objective function includes a non-local smoothness term that robustly integrates flow estimates over large spatial neighborhoods. By modifying this new term to include information about flow and image boundaries, we develop a method that can better preserve motion details. To take advantage of the trend towards video in wide-screen format, we further introduce an asymmetric pyramid downsampling scheme that enables the estimation of longer-range horizontal motions. The methods are evaluated on the Middlebury, MPI Sintel, and KITTI datasets using the same parameter settings.
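The median filtering step itself is easy to illustrate. The sketch below applies a spatial median filter to the two components of an intermediate flow field; the 5x5 window and the random field are purely illustrative placeholders, not the settings used in the paper.

```python
import numpy as np
from scipy.ndimage import median_filter

def median_filter_flow(flow, size=5):
    """Median-filter the horizontal and vertical flow components separately.

    flow : (H, W, 2) array holding (u, v) at every pixel.
    size : spatial extent of the median window (5x5 here, purely illustrative).
    """
    filtered = np.empty_like(flow)
    filtered[..., 0] = median_filter(flow[..., 0], size=size)  # u component
    filtered[..., 1] = median_filter(flow[..., 1], size=size)  # v component
    return filtered

# Example: denoise an intermediate flow estimate between warping iterations.
flow = np.random.randn(48, 64, 2).astype(np.float32)
flow_smoothed = median_filter_flow(flow)
```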

Recently, several hierarchical inverse dynamics controllers based on cascades of quadratic programs have been proposed for application on torque-controlled robots. They have important theoretical benefits but have never been implemented on a torque-controlled robot, where model inaccuracies and real-time computation requirements can be problematic. In this contribution we present an experimental evaluation of these algorithms in the context of balance control for a humanoid robot. The presented experiments demonstrate the applicability of the approach under real robot conditions (i.e., model uncertainty, estimation errors, etc.). We propose a simplification of the optimization problem that allows us to decrease computation time enough to implement it in a fast torque control loop. We implement a momentum-based balance controller which shows robust performance in the face of unknown disturbances, even when the robot is standing on only one foot. In a second experiment, a tracking task is evaluated to demonstrate the performance of the controller with more complicated hierarchies. Our results show that hierarchical inverse dynamics controllers can be used for feedback control of humanoid robots and that momentum-based balance control can be efficiently implemented on a real robot.

Humanoid robots operating in human environments require whole-body controllers that can offer precise tracking and well-defined disturbance rejection behavior. In this contribution, we propose an experimental evaluation of a linear quadratic regulator (LQR) using a linearization of the full robot dynamics together with the contact constraints. The advantage of the controller is that it explicitly takes into account the coupling between the different joints to create optimal feedback controllers for whole-body control. We also propose a method to explicitly regulate other tasks of interest, such as the center of mass of the robot or its angular momentum. In order to evaluate the performance of linear optimal control designs in a real-world scenario (model uncertainty, sensor noise, imperfect state estimation, etc.), we test the controllers in a variety of tracking and balancing experiments on a torque-controlled humanoid (e.g., balancing, split-plane balancing, squatting, pushes while squatting, and balancing on a wheeled platform). The proposed control framework shows a reliable push recovery behavior competitive with more sophisticated balance controllers, rejecting impulses up to 11.7 Ns with peak forces of 650 N, with the added advantage of great computational simplicity. Furthermore, the controller is able to track squatting trajectories up to 1 Hz without relinearization, suggesting that the linearized dynamics are sufficient for significant ranges of motion.
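To give a flavor of the underlying machinery, the sketch below computes a generic discrete-time LQR gain from a linearized model; the toy double-integrator matrices A, B and the weights Q, R are placeholders, not the contact-constrained linearization used in the paper.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def dlqr(A, B, Q, R):
    """Discrete-time LQR: returns the feedback gain K for u = -K x."""
    P = solve_discrete_are(A, B, Q, R)                # solve the discrete Riccati equation
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K

# Toy double-integrator stand-in for a linearized robot model.
dt = 0.01
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
K = dlqr(A, B, Q=np.diag([10.0, 1.0]), R=np.array([[0.1]]))
```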

Complexity is a hallmark of intelligent behavior, consisting of both regular patterns and random variation. To quantitatively assess the complexity and randomness of human motion, we designed a motor task in which we translated subjects' motion trajectories into strings of symbol sequences. In the first part of the experiment, participants were asked to perform self-paced movements to create repetitive patterns, copy pre-specified letter sequences, and generate random movements. To investigate whether the degree of randomness can be manipulated, in the second part of the experiment participants were asked to perform unpredictable movements in the context of a pursuit game, where they received feedback from an online Bayesian predictor guessing their next move. We analyzed the symbol sequences representing subjects' motion trajectories with five common complexity measures: predictability, compressibility, approximate entropy, Lempel-Ziv complexity, and effective measure complexity. We found that subjects' self-created patterns were the most complex, followed by drawing movements of letters and self-paced random motion. We also found that participants could change the randomness of their behavior depending on context and feedback. Our results suggest that humans can adjust both complexity and regularity in different movement types and contexts and that this can be assessed with information-theoretic measures of the symbolic sequences generated from movement trajectories.
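Two of the listed measures are easy to illustrate on a symbol string: compressibility (approximated here with zlib) and a Lempel-Ziv-style complexity, computed here as the number of distinct phrases in an LZ78-type parse. This is a sketch of the general idea with a toy alphabet, not the exact estimators used in the study.

```python
import random
import zlib

def compressibility(symbols: str) -> float:
    """Compressed length divided by raw length (lower = more regular)."""
    raw = symbols.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)

def lz78_complexity(symbols: str) -> int:
    """Number of distinct phrases in an LZ78-style incremental parse."""
    phrases, current = set(), ""
    for s in symbols:
        current += s
        if current not in phrases:
            phrases.add(current)
            current = ""
    return len(phrases) + (1 if current else 0)

pattern = "ABCD" * 50                                        # highly regular sequence
noise = "".join(random.choice("ABCD") for _ in range(200))   # random sequence
print(compressibility(pattern), compressibility(noise))
print(lz78_complexity(pattern), lz78_complexity(noise))
```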

In the first simulation, the intrinsic motivation of the agent was given by measuring learning progress through the reduction in informational surprise (Figure 1 A-C). This way, the agent should first learn the action that is easiest to learn (a1), then switch to other actions that still allow for learning (a2), and ignore actions that cannot be learned at all (a3). This is exactly what we found in our simple environment. Compared to the original developmental learning algorithm based on learning progress proposed by Oudeyer [2], our Context Tree Weighting approach does not require local experts to do prediction; rather, it learns the conditional probability distribution over observations given actions in one structure. In the second simulation, the intrinsic motivation of the agent was given by measuring compression progress through the improvement in compressibility (Figure 1 D-F). The agent behaves similarly: it first concentrates on the action with the most predictable consequence and then switches over to the regular action whose consequence is more difficult to predict, but still learnable. Unlike the previous simulation, random actions are also interesting to some extent because the compressed symbol strings use 8-bit representations, while only 2 bits are required for our observation space. Our preliminary results suggest that Context Tree Weighting might provide a useful representation to study problems of development.

This paper introduces a framework for state estimation on a humanoid robot platform using only common proprioceptive sensors and knowledge of leg kinematics. The presented approach extends that detailed in prior work on a point-foot quadruped platform by adding the rotational constraints imposed by the humanoid's flat feet. As in previous work, the proposed Extended Kalman Filter accommodates contact switching and makes no assumptions about gait or terrain, making it applicable on any humanoid platform for use in any task. A nonlinear observability analysis is performed on both the point-foot and flat-foot filters and it is concluded that the addition of rotational constraints significantly simplifies singular cases and improves the observability characteristics of the system. Results on a simulated walking dataset demonstrate the performance gain of the flat-foot filter as well as confirm the results of the presented observability analysis.

Previous work has shown that classical sequential decision making rules, including expectimax and minimax, are limit cases of a more general class of bounded rational planning problems that trade off the value and the complexity of the solution, as measured by its information divergence from a given reference. This allows modeling a range of novel planning problems having varying degrees of control due to resource constraints, risk-sensitivity, trust, and model uncertainty. However, so far it has been unclear in what sense information constraints relate to the complexity of planning. In this paper, we introduce Monte Carlo methods to solve the generalized optimality equations in an efficient and exact way when the inverse temperatures in a generalized decision tree are of the same sign. These methods highlight a fundamental relation between inverse temperatures and the number of Monte Carlo proposals. In particular, it is seen that the number of proposals is essentially independent of the size of the decision tree.
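The single-step version of such a generalized optimality equation is a free-energy (log-sum-exp) value. The hedged sketch below estimates it by Monte Carlo, drawing action proposals from a prior policy; the function name, the toy Q-values, and the priors are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def mc_free_energy(q_values, prior, beta, n_proposals=1000, rng=None):
    """Monte Carlo estimate of (1/beta) * log E_{a~prior}[exp(beta * Q(a))].

    For large positive beta this tends towards max_a Q(a) (expectimax-like),
    for large negative beta towards min_a Q(a) (minimax-like),
    and for beta near zero towards the prior expectation of Q.
    """
    rng = rng or np.random.default_rng(0)
    actions = rng.choice(len(q_values), size=n_proposals, p=prior)
    samples = np.exp(beta * np.asarray(q_values)[actions])
    return np.log(samples.mean()) / beta

# Toy example with three actions and a uniform prior.
q = [1.0, 0.5, -2.0]
p0 = [1 / 3, 1 / 3, 1 / 3]
for beta in (0.1, 1.0, 10.0):
    print(beta, mc_free_energy(q, p0, beta))
```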

In the multiple instance learning setting, each observation is a bag of feature vectors of which one or more vectors indicate membership in a class. The primary task is to identify if any vectors in the bag indicate class membership while ignoring vectors that do not. We describe here a kernel-based technique that defines a parametric family of kernels via conformal transformations and jointly learns a discriminant function over bags together with the optimal parameter settings of the kernel. Learning a conformal transformation effectively amounts to weighting regions in the feature space according to their contribution to classification accuracy; regions that are discriminative will be weighted higher than regions that are not. This allows the classifier to focus on regions contributing to classification accuracy while ignoring regions that correspond to vectors found both in positive and in negative bags. We show how parameters of this transformation can be learned for support vector machines by posing the
problem as a multiple kernel learning problem. The resulting multiple instance classifier gives competitive accuracy for several multi-instance benchmark datasets from different domains.
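The conformal transformation itself has a simple form: a base kernel k(x, y) is reweighted to c(x) k(x, y) c(y) by a positive factor c. The sketch below assumes an RBF base kernel and a Gaussian-bump conformal factor centered on hypothetical "discriminative" points; the parametrization is illustrative, not the one learned in the paper.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def conformal_factor(X, centers, weights, width=1.0):
    """c(x) = 1 + sum_j w_j * exp(-||x - c_j||^2 / width): emphasizes regions near the centers."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return 1.0 + np.exp(-d2 / width) @ weights

def conformal_kernel(X, Y, centers, weights, gamma=1.0, width=1.0):
    """k~(x, y) = c(x) k(x, y) c(y); still a valid positive-definite kernel."""
    cx = conformal_factor(X, centers, weights, width)
    cy = conformal_factor(Y, centers, weights, width)
    return cx[:, None] * rbf_kernel(X, Y, gamma) * cy[None, :]

X = np.random.randn(20, 2)
centers = np.array([[0.0, 0.0]])      # hypothetical discriminative region
K = conformal_kernel(X, X, centers, weights=np.array([2.0]))
```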

The availability of new and fast tools in structure determination has led to a more than exponential growth in the number of structures solved per year. It is therefore increasingly essential to assess the accuracy of new structures by reliable approaches able to assist validation. Here, we discuss a specific example in which the use of different complementary techniques, including Bayesian methods and small angle scattering, proved essential for validating the two currently available structures of the Josephin domain of ataxin-3, a protein involved in the ubiquitin/proteasome pathway and responsible for the neurodegenerative disorder spinocerebellar ataxia type 3. Taken together, our results demonstrate that only one of the two structures is compatible with the experimental information. Based on the high precision of our refined structure, we show that Josephin contains an open cleft which could be directly implicated in the interaction with polyubiquitin chains and other partners.

Volterra and Wiener series are perhaps the best understood nonlinear system representations in signal processing. Although both approaches have enjoyed a certain popularity in the past, their application has been limited to rather low-dimensional and weakly nonlinear systems due to the exponential growth of the number of terms that have to be estimated. We show that Volterra and Wiener series can be represented implicitly as elements of a reproducing kernel Hilbert space by utilizing polynomial kernels. The estimation complexity of the implicit representation is linear in the input dimensionality and
independent of the degree of nonlinearity. Experiments show performance advantages in terms of convergence, interpretability, and system sizes that can be handled.
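The implicit representation boils down to regression with a polynomial kernel over input windows, so that all Volterra terms up to the chosen degree are handled without being enumerated. A hedged sketch using kernel ridge regression; the degree, window length, and toy system are placeholders, not the estimators proposed in the paper.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Toy nonlinear system: y_t depends quadratically on a short input window.
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
window = 5
X = np.stack([x[t - window:t] for t in range(window, len(x))])
y = X[:, -1] + 0.5 * X[:, -1] * X[:, -3] - 0.2 * X[:, -2] ** 2

# An inhomogeneous polynomial kernel of degree 2 spans all Volterra terms up to order 2.
model = KernelRidge(kernel="poly", degree=2, coef0=1.0, alpha=1e-3)
model.fit(X[:1500], y[:1500])
print("test MSE:", np.mean((model.predict(X[1500:]) - y[1500:]) ** 2))
```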

We propose a general framework for computing minimal set covers under a certain class of logical constraints.
The underlying idea is to transform the problem into a mathematical program under linear constraints.
In this sense it can be seen as a natural extension of the vector quantization algorithm proposed by Tipping and Schoelkopf.
We show which class of logical constraints can be cast and relaxed into linear constraints and give an algorithm for
the transformation.
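The transformation can be illustrated on the plain set cover case: choose indicators x_j for the candidate sets, minimize their sum, and require every element to be covered; relaxing x_j in {0, 1} to 0 <= x_j <= 1 gives a linear program. A minimal sketch with scipy; the sets are toy data, and the logical side constraints discussed above are not included.

```python
import numpy as np
from scipy.optimize import linprog

elements = range(6)
candidate_sets = [{0, 1, 2}, {2, 3}, {3, 4, 5}, {0, 5}, {1, 4}]

# Coverage matrix: A[i, j] = 1 if element i belongs to candidate set j.
A = np.array([[int(i in s) for s in candidate_sets] for i in elements])

# Minimize sum_j x_j  subject to  A x >= 1  (every element covered) and 0 <= x <= 1.
res = linprog(c=np.ones(len(candidate_sets)),
              A_ub=-A, b_ub=-np.ones(len(elements)),
              bounds=[(0, 1)] * len(candidate_sets), method="highs")
print("relaxed cover weights:", np.round(res.x, 2))
```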

In computational biology, it is common to represent domain knowledge using graphs. Frequently there exist multiple graphs for the same set of nodes, representing information from different sources, and no single graph is sufficient to predict class labels of unlabelled nodes reliably. One way to enhance reliability is to integrate multiple graphs, since individual graphs are partly independent and partly complementary to each other for prediction. In this chapter, we describe an algorithm to assign weights to multiple graphs within graph-based semi-supervised learning. Both predicting class labels and searching for weights for combining multiple graphs are formulated into one convex optimization problem. The graph-combining method is applied to functional class prediction of yeast proteins. When compared with individual graphs, the combined graph with optimized weights performs significantly better than any single graph. When compared with the semidefinite programming-based support vector machine (SDP/SVM), it shows comparable accuracy in a remarkably short time. Compared with a combined graph with equal-valued weights, our method could select important graphs without loss of accuracy, which implies the desirable property of integration with selectivity.

A major challenge in applying machine learning methods to Brain-Computer
Interfaces (BCIs) is to overcome possible nonstationarity between the data block
the method is trained on and the one it is applied to. Assuming the joint
distributions of the whitened signal and the class label to be identical in two blocks, where
the whitening is done in each block independently, we propose a simple adaptation formula
that is applicable to a broad class of spatial filtering methods including ICA, CSP, and
logistic regression classifiers. We characterize the class of linear transformations for which
the above assumption holds. Experimental results on 60 BCI datasets show improved
classification accuracy compared to (a) the fixed spatial filter approach (no adaptation) and
(b) the fixed spatial pattern approach (proposed by Hill et al., 2006 [1]).
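The underlying idea can be sketched in a few lines: whiten each block with its own covariance, learn the spatial filter on the whitened training block, and apply the same filter to the independently whitened test block. The fixed random filter and the simulated data below are deliberately simple placeholders, not the actual adaptation pipeline of the paper.

```python
import numpy as np

def whitener(X):
    """Return W such that W @ Sigma @ W.T = I, with Sigma the channel covariance.

    X : (channels, samples) array of EEG from one block.
    """
    sigma = np.cov(X)
    vals, vecs = np.linalg.eigh(sigma)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

rng = np.random.default_rng(0)
train_block = rng.standard_normal((8, 5000))
test_block = 1.5 * rng.standard_normal((8, 5000))   # different scaling = simple nonstationarity

W_train, W_test = whitener(train_block), whitener(test_block)

w = rng.standard_normal(8)                 # placeholder spatial filter learned on whitened training data
s_train = w @ (W_train @ train_block)      # training-block source estimate
s_test = w @ (W_test @ test_block)         # same filter reused after block-wise whitening
print(s_train.var(), s_test.var())
```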

Many real-world machine learning problems are situated on finite discrete sets,
including dimensionality reduction, clustering, and transductive inference. A variety
of approaches for learning from finite sets have been proposed from different
motivations and for different problems. In most of those approaches, a finite set
is modeled as a graph, in which the edges encode pairwise relationships among the
objects in the set. Consequently many concepts and methods from graph theory are
adopted. In particular, the graph Laplacian is widely used.
In this chapter we present a systematic framework for learning from a finite set
represented as a graph. We develop discrete analogues of a number of differential
operators, and then construct a discrete analogue of classical regularization theory
based on those discrete differential operators. The graph Laplacian based approaches
are special cases of this general discrete regularization framework. An important
implication of this framework is that we have a wide choice of regularizers on
graphs in addition to the widely used graph Laplacian based one.
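The most familiar special case, graph Laplacian regularization, minimizes a fit term plus a smoothness penalty f^T L f and has a simple closed-form solution. A small sketch, assuming the unnormalized Laplacian and a toy chain graph with two labelled nodes.

```python
import numpy as np

def laplacian_regularization(W, y, lam=1.0):
    """Solve min_f ||f - y||^2 + lam * f^T L f  with L = D - W (unnormalized Laplacian)."""
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.solve(np.eye(len(y)) + lam * L, y)

# Toy chain graph with one positive and one negative label, zeros elsewhere.
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
y = np.array([1.0, 0, 0, 0, 0, -1.0])
print(np.round(laplacian_regularization(W, y, lam=2.0), 3))
```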

Common approaches for the determination of Slow Crack Growth (SCG) parameters
are the static and dynamic loading methods. Since materials with a small Weibull
modulus show a large variability in strength, a correct statistical analysis of the
data is indispensable. In this work we propose the use of the Maximum Likelihood
method and a Bayesian analysis, which, in contrast to the standard procedures, take
into account that failure strengths are Weibull distributed. The analysis provides
estimates for the SCG parameters, the Weibull modulus, and the corresponding confidence
intervals, and overcomes the necessity of manual differentiation between inert
and fatigue strength data. We compare the methods to a Least Squares approach,
which can be considered the standard procedure. The results for dynamic loading
data from the glass sealing of MEMS devices show that the assumptions inherent
in the standard approach lead to significantly different estimates.
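The maximum-likelihood part can be illustrated directly with scipy: fit a two-parameter Weibull distribution to a sample of strengths and read off the Weibull modulus (the shape parameter) and characteristic strength (the scale). The data below are synthetic; the full analysis additionally estimates the SCG parameters and the Bayesian credible intervals.

```python
import numpy as np
from scipy.stats import weibull_min

true_modulus, char_strength = 8.0, 400.0   # synthetic shape m and scale sigma_0 (MPa)
strengths = weibull_min.rvs(true_modulus, scale=char_strength, size=30, random_state=0)

# Two-parameter Weibull MLE: fix the location parameter at zero.
m_hat, loc, sigma0_hat = weibull_min.fit(strengths, floc=0)
print(f"Weibull modulus ~ {m_hat:.1f}, characteristic strength ~ {sigma0_hat:.0f} MPa")
```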

Motivation: In detection of non-coding RNAs, it is often necessary
to identify the secondary structure motifs from a set of putative RNA
sequences. Most of the existing algorithms aim to provide the best
motif or a few good motifs, but biologists often need to inspect all the
possible motifs thoroughly.
Results: Our method RNAmine employs a graph theoretic representation
of RNA sequences, and detects all the possible motifs
exhaustively using a graph mining algorithm. The motif detection problem
boils down to finding frequently appearing patterns in a set of
directed and labeled graphs. In the tasks of common secondary structure
prediction and local motif detection from long sequences, our
method compared favorably in both accuracy and efficiency with
state-of-the-art methods such as CMFinder.

Objective. Despite many research efforts in recent decades, the major pathogenetic mechanisms of osteoarthritis (OA), including gene alterations occurring during OA cartilage degeneration, are poorly understood, and there is no disease-modifying treatment approach. The present study was therefore initiated in order to identify differentially expressed disease-related genes and potential therapeutic targets.
Methods. This investigation consisted of a large gene expression profiling study performed based on 78 normal and disease samples, using a custom-made complementary DNA array covering >4,000 genes.
Results. Many differentially expressed genes were identified, including the expected up-regulation of anabolic and catabolic matrix genes. In particular, the down-regulation of important oxidative defense genes, i.e., the genes for superoxide dismutases 2 and 3 and glutathione peroxidase 3, was prominent. This indicates that continuous oxidative stress to the cells and the matrix is one major underlying pathogenetic mechanism in OA. Also, genes that are involved in the phenotypic stability of cells, a feature that is greatly reduced in OA cartilage, appeared to be suppressed.
Conclusion. Our findings provide a reference data set on gene alterations in OA cartilage and, importantly, indicate major mechanisms underlying central cell biologic alterations that occur during the OA disease process. These results identify molecular targets that can be further investigated in the search for therapeutic interventions.

Small molecules in chemistry can be represented as graphs.
In a quantitative structure-activity relationship (QSAR) analysis, the
central task is to find a regression function that predicts
the activity of a molecule with high accuracy.
Taking QSAR as the primary target, we propose a new linear
programming approach to the graph-based regression problem.
Our method extends the graph classification algorithm by Kudo et al.
(NIPS 2004), which is a combination of boosting and graph mining.
Instead of sequential multiplicative updates, we employ linear
programming (LP) boosting for regression. The LP approach allows us to
include inequality constraints on the parameter vector, which turn out to
be particularly useful in QSAR tasks where activity values are
sometimes unavailable.
Furthermore, the efficiency is improved significantly by employing
multiple pricing.
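To give a flavor of the LP formulation (not the boosting-with-graph-mining algorithm itself), the sketch below solves an L1-loss regression as a linear program and adds a sign constraint on part of the parameter vector; the features, bounds, and data are placeholders.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, p = 40, 3
X = rng.standard_normal((n, p))
y = X @ np.array([1.5, -0.7, 0.3]) + 0.1 * rng.standard_normal(n)

# Variables z = [w (p entries), e (n entries)]; minimize sum(e) with e_i >= |y_i - x_i . w|.
c = np.concatenate([np.zeros(p), np.ones(n)])
A_ub = np.block([[X, -np.eye(n)], [-X, -np.eye(n)]])
b_ub = np.concatenate([y, -y])

# Example inequality constraint on the parameter vector: force w[0] >= 0.
bounds = [(0, None)] + [(None, None)] * (p - 1) + [(0, None)] * n
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
print("weights:", np.round(res.x[:p], 3))
```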

In this paper we introduce a novel approach for incrementally building aspect models, and use it to dynamically discover underlying themes from document streams. Using the new approach, we present an application which we call query-line tracking, i.e., we automatically discover and summarize different themes or stories that appear over time and relate to a particular query. We present an evaluation on news corpora to demonstrate the strength of our method for query-line tracking, online indexing, and clustering.

We consider the problem of constructing a globally smooth analytic function that represents a surface implicitly by way of its zero set, given sample points with surface normal vectors. The contributions of the paper include a novel means of regularising multi-scale compactly supported basis functions that leads to the desirable interpolation properties previously only associated with fully supported bases. We also provide a regularisation framework for simpler and more direct treatment of surface normals, along with a corresponding generalisation of the representer theorem lying at the core of kernel-based machine learning methods. We demonstrate the techniques on 3D problems of up to 14 million data points, as well as 4D time series data and four-dimensional interpolation between three-dimensional shapes.

Despite many years of research on how to properly align sequences in
the presence of sequencing errors, alternative splicing and
micro-exons, the correct alignment of mRNA sequences to genomic DNA is
still a challenging task. We present a novel approach based on large
margin learning that combines kernel based splice site predictions
with common sequence alignment techniques. By solving a convex
optimization problem, our algorithm -- called PALMA -- tunes the
parameters of the model such that the true alignment scores higher
than all other alignments. In an experimental study on the alignments
of mRNAs containing artificially generated micro-exons, we show that
our algorithm drastically outperforms all other methods: It perfectly
aligns all 4358 sequences on a hold-out set, while the best other
method misaligns at least 90 of them. Moreover, our algorithm is very
robust against noise in the query sequence: when deleting, inserting,
or mutating up to 50% of the query sequence, it still aligns 95% of
all sequences correctly, while other methods achieve less than 36%
accuracy. For datasets, additional results and a stand-alone
alignment tool see
http://www.fml.mpg.de/raetsch/projects/palma.

In many graph-based semi-supervised learning algorithms, edge weights are assumed to be fixed and determined by the data points' (often symmetric) relationships in input space, without considering directionality.
However, relationships may be more informative in one direction (e.g. from labelled to unlabelled) than in the reverse direction, and some
relationships (e.g. strong weights between oppositely labelled points) are unhelpful in either direction. Undesirable edges may reduce the amount of influence an informative point can propagate to its neighbours -- the point and its outgoing edges have been ``blunted.'' We present an approach to ``sharpening'' in which weights are adjusted to meet an optimization criterion
wherever they are directed towards labelled points. This principle can be applied to a wide variety of algorithms. In the current paper, we present one ad hoc solution satisfying the principle, in order to show that it can improve performance on a number of publicly available benchmark data sets.

In this paper, an approach to the finite-horizon optimal state-feedback control problem of nonlinear, stochastic, discrete-time systems is presented. Starting from the dynamic programming equation, the value function will be approximated by means of Taylor series expansion up to second-order derivatives. Moreover, the problem will be reformulated, such that a minimum principle can be applied to the stochastic problem. Employing this minimum principle, the optimal control problem can be rewritten as a two-point boundary-value problem to be solved at each time step of a shrinking horizon. To avoid numerical problems, the two-point boundary-value problem will be solved by means of a continuation method. Thus, the curse of dimensionality of dynamic programming is avoided, and good candidates for the optimal state-feedback controls are obtained. The proposed approach will be evaluated by means of a scalar example system.

The regularization functional induced by the graph Laplacian of a random
neighborhood graph built on the data is adaptive in two ways: first, it adapts to an underlying
manifold structure, and second, to the density of the data-generating probability measure.
In this paper we identify the limit of the regularizer and show
uniform convergence over the space of Hölder functions. As an intermediate
step we derive upper bounds on the covering numbers of Hölder functions on
compact Riemannian manifolds, which are of independent interest
for the theoretical analysis of manifold-based learning methods.

The Common Spatial Pattern (CSP) algorithm is a highly successful method for efficiently calculating spatial filters for brain signal classification. Spatial filtering can improve classification performance considerably, but demands that a large number of electrodes be mounted, which is inconvenient in day-to-day BCI usage. The CSP algorithm is also known for its tendency to overfit, i.e. to learn the noise in the training set rather than the signal. Both problems motivate an approach in which spatial filters are sparsified. We briefly sketch a reformulation of the problem which allows us to do this, using 1-norm regularisation. Focusing on the electrode selection issue, we present preliminary results on EEG data sets that suggest that effective spatial filters may be computed with as few as 10--20 electrodes, hence offering the potential to simplify the practical realisation of BCI systems significantly.
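For reference, the standard (non-sparse) CSP filters are the solutions of a generalized eigenvalue problem between the two class covariance matrices; the sparse variant discussed above replaces this with a 1-norm-regularized reformulation. A sketch of the standard case with toy covariances follows.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(sigma_1, sigma_2, n_filters=4):
    """Common Spatial Patterns: generalized eigenvectors of (sigma_1, sigma_1 + sigma_2).

    Filters with the largest/smallest eigenvalues maximize variance for one class
    while minimizing it for the other.
    """
    vals, vecs = eigh(sigma_1, sigma_1 + sigma_2)
    order = np.argsort(vals)
    picks = np.concatenate([order[: n_filters // 2], order[-n_filters // 2:]])
    return vecs[:, picks]

rng = np.random.default_rng(0)
A1, A2 = rng.standard_normal((2, 16, 16))
sigma_1, sigma_2 = A1 @ A1.T + np.eye(16), A2 @ A2.T + np.eye(16)   # toy class covariances
W = csp_filters(sigma_1, sigma_2)
```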

Given a spatial filtering algorithm that has allowed us to identify task-relevant EEG sources, we present a simple approach
for monitoring the activity of these sources while remaining relatively robust to changes in other (task-irrelevant) brain activity. The idea is to keep spatial *patterns* fixed rather than spatial filters, when transferring from
training to test sessions or from one time window to another. We show that a fixed spatial pattern (FSP)
approach, using a moving-window estimate of signal covariances, can be more robust to non-stationarity than a fixed spatial filter (FSF) approach.

Stability is a common tool to verify the validity of sample-based
algorithms. In clustering it is widely used to tune the parameters of
the algorithm, such as the number k of clusters. In spite of the popularity
of stability in practical applications, there has been very little theoretical
analysis of this notion. In this paper we provide a formal definition
of stability and analyze some of its basic properties. Quite surprisingly,
the conclusion of our analysis is that for large sample size, stability is
fully determined by the behavior of the objective function which the
clustering algorithm is aiming to minimize. If the objective function has
a unique global minimizer, the algorithm is stable, otherwise it is unstable.
In particular, we conclude that stability is not a well-suited tool
to determine the number of clusters: it is determined by the symmetries
of the data, which may be unrelated to clustering parameters. We
prove our results for center-based clusterings and for spectral clustering,
and support our conclusions by many examples in which the behavior of
stability is counter-intuitive.

Real-world data often involves objects that exhibit multiple relationships; for example, papers and authors exhibit both paper-author interactions and paper-paper citation relationships. A typical learning problem requires one to make inferences about a subclass of objects (e.g., papers), while using the remaining objects and relations to provide relevant information. We present a simple, unified mechanism for incorporating information from multiple object types and relations when learning on a targeted subset. In this scheme, all sources of relevant information are marginalized onto the target subclass via random walks. We show that marginalized random walks can be used as a general technique for combining multiple sources of information in relational data. With this approach, we formulate new algorithms for transduction and ranking in relational data, and quantify the performance of the new schemes on real-world data, achieving good results in many problems.

Designs of micro electro-mechanical devices need to be robust against fluctuations in mass production. Computer experiments with tens of parameters are used to explore the behavior of the system, and to compute sensitivity measures as expectations over the input distribution. Monte Carlo methods are a simple approach to estimate these integrals, but they are infeasible when the models are computationally expensive. Using a Gaussian process prior, expensive simulation runs can be saved. This Bayesian quadrature allows for an active selection of inputs where the simulation promises to be most valuable, and the number of simulation runs can be reduced further.
We present an active learning scheme for sensitivity analysis which is rigorously derived from the corresponding Bayesian expected loss. On three fully featured, high-dimensional physical models of electro-mechanical sensors, we show that the learning rate of the active learning scheme is significantly better than that of passive learning.
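The core Bayesian quadrature estimate can be written down compactly when a squared-exponential GP kernel is paired with a Gaussian input distribution, because the kernel mean z_i = integral of k(x, x_i) p(x) dx then has a closed form. The sketch below computes the posterior mean of the integral for a 1-D toy integrand under those assumptions; the active selection of inputs is not shown.

```python
import numpy as np

ell, sigma = 0.5, 1.0                          # kernel length-scale, input std (p = N(0, sigma^2))
f = lambda x: np.exp(-x ** 2) + 0.5 * x ** 2   # toy integrand standing in for the simulator

# Evaluate the (expensive) simulator at a few design points.
x = np.linspace(-2.5, 2.5, 12)
y = f(x)

K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / ell ** 2)
# Kernel mean of N(0, sigma^2) under the squared-exponential kernel (closed form).
z = np.sqrt(ell ** 2 / (ell ** 2 + sigma ** 2)) * np.exp(-0.5 * x ** 2 / (ell ** 2 + sigma ** 2))

estimate = z @ np.linalg.solve(K + 1e-6 * np.eye(len(x)), y)

# Compare with a brute-force Monte Carlo estimate of E_p[f(x)].
mc = f(np.random.default_rng(0).normal(0, sigma, 200_000)).mean()
print(estimate, mc)
```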

The ability to detect abnormal events in signals online is essential in many real-world signal processing applications. Previous algorithms require an explicit statistical model of the signal and interpret abnormal events as abrupt changes of that model. The corresponding implementations rely on maximum likelihood or Bayesian estimation theory, with generally excellent performance. However, there are numerous cases where a robust and tractable model cannot be obtained, and model-free approaches need to be considered. In this paper, we investigate a machine learning, descriptor-based approach that does not require an explicit statistical model of the descriptors, based on Support Vector novelty detection. A sequential optimization algorithm is introduced. Theoretical considerations as well as simulations on real signals demonstrate its practical efficiency.
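A descriptor-based detector of this kind can be prototyped with a one-class SVM: train on descriptors of normal signal segments, then flag segments whose decision value is negative. The sliding-window descriptors below (mean, standard deviation, energy) and the injected event are placeholders for whatever descriptors the application provides, not the sequential algorithm introduced in the paper.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def descriptors(signal, win=50):
    """Simple per-window descriptors: mean, std, energy (placeholders)."""
    frames = signal[: len(signal) // win * win].reshape(-1, win)
    return np.column_stack([frames.mean(1), frames.std(1), (frames ** 2).sum(1)])

rng = np.random.default_rng(0)
normal = rng.standard_normal(5000)            # training signal: normal regime only
test = rng.standard_normal(2000)
test[800:900] += 4.0                          # injected abnormal event

detector = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(descriptors(normal))
flags = detector.predict(descriptors(test))   # -1 marks novel / abnormal windows
print(np.where(flags == -1)[0])
```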

In recent years, spectral clustering has become one of the most
popular modern clustering algorithms. It is simple to implement, can
be solved efficiently by standard linear algebra software, and very
often outperforms traditional clustering algorithms such as the
k-means algorithm. Nevertheless, at first glance spectral
clustering looks a bit mysterious, and it is not obvious why it
works at all and what it really does. This article is a tutorial
introduction to spectral clustering. We describe different graph
Laplacians and their basic properties, present the most common
spectral clustering algorithms, and derive those algorithms from
scratch by several different approaches. Advantages and disadvantages
of the different spectral clustering algorithms are discussed.
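A minimal "from scratch" version of one common normalized variant looks as follows: build a similarity graph, form the symmetric normalized Laplacian, take the eigenvectors belonging to the smallest eigenvalues, and run k-means on the (row-normalized) rows. The kernel width, graph construction, and two-ring toy data are assumptions for illustration only.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(X, k, gamma=10.0):
    """Normalized spectral clustering on a fully connected RBF similarity graph."""
    W = np.exp(-gamma * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(W, 0.0)
    d = W.sum(1)
    L_sym = np.eye(len(X)) - (W / np.sqrt(d)[:, None]) / np.sqrt(d)[None, :]
    _, vecs = eigh(L_sym, subset_by_index=[0, k - 1])        # k smallest eigenvectors
    U = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)   # row-normalize
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)

# Two concentric rings: k-means in input space fails, spectral clustering separates them.
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 200)
r = np.concatenate([np.full(100, 1.0), np.full(100, 3.0)]) + 0.05 * rng.standard_normal(200)
X = np.column_stack([r * np.cos(t), r * np.sin(t)])
labels = spectral_clustering(X, k=2)
```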

Motivation: Many problems in data integration in bioinformatics can be posed as one common question: Are two sets of observations generated by the same distribution? We propose a kernel-based statistical test for this problem, based on the fact that two distributions are different if and only if there exists at least one function having different expectation on the two distributions. Consequently we use the maximum discrepancy between function means as the basis of a test statistic.
The Maximum Mean Discrepancy (MMD) can take advantage of the kernel trick, which allows us to apply it not only to vectors, but also to strings, sequences, graphs, and other common structured data types arising in molecular biology.
Results: We study the practical feasibility of an MMD-based test on three central data integration tasks: Testing cross-platform comparability of microarray data, cancer diagnosis, and data-content based schema matching for two different protein function classification schemas. In all of these experiments, including high-dimensional ones, MMD is very accurate in finding samples that were generated from the same distribution, and outperforms its best competitors.
Conclusions: We have defined a novel statistical test of whether two samples are from the same distribution, compatible with both multivariate and structured data, that is fast, easy to implement, and works well, as confirmed by our experiments.
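The (biased) empirical MMD with an RBF kernel takes only a few lines, and a permutation test on the pooled sample then yields a p-value. The kernel width and toy data below are placeholders, not the settings used in the experiments.

```python
import numpy as np

def rbf(A, B, gamma):
    return np.exp(-gamma * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))

def mmd2(X, Y, gamma=1.0):
    """Biased estimate of the squared Maximum Mean Discrepancy between samples X and Y."""
    return rbf(X, X, gamma).mean() + rbf(Y, Y, gamma).mean() - 2 * rbf(X, Y, gamma).mean()

def permutation_pvalue(X, Y, gamma=1.0, n_perm=500, rng=None):
    rng = rng or np.random.default_rng(0)
    observed, pooled, n = mmd2(X, Y, gamma), np.vstack([X, Y]), len(X)
    null = []
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        null.append(mmd2(pooled[idx[:n]], pooled[idx[n:]], gamma))
    return (np.sum(np.array(null) >= observed) + 1) / (n_perm + 1)

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(100, 5))
Y = rng.normal(0.5, 1.0, size=(100, 5))     # shifted mean: the two distributions differ
print(mmd2(X, Y), permutation_pvalue(X, Y))
```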

We propose novel methods for machine learning of structured output
spaces. Specifically, we consider outputs which are graphs with
vertices that have a natural order.
We consider the usual adjacency matrix representation of
graphs, as well as two other representations for such a graph: (a)
decomposing the graph into a set of paths, (b) converting the graph
into a single sequence of nodes with labeled edges.
For each of the three representations, we propose an encoding and
decoding scheme. We also propose an evaluation measure for comparing
two graphs.

Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments, and to use this understanding to design future systems.