In this paper we deal with graph classification. We propose a new algorithm for sparse logistic regression on graphs, which is comparable in accuracy to other graph classification methods and additionally produces probabilistic output. Sparsity is required for interpretability, which is often necessary in domains such as bioinformatics or chemoinformatics.
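The kind of sparse logistic regression described above can be sketched as an L1-penalized logistic loss minimized by proximal gradient descent (ISTA). The sketch below is illustrative only, not the paper's algorithm, and assumes graphs have already been mapped to binary feature vectors (e.g. subgraph-occurrence indicators); all data and parameter values are made up.

```python
import numpy as np

def sparse_logreg(X, y, lam=0.1, lr=0.1, iters=2000):
    """L1-regularized logistic regression via proximal gradient (ISTA).
    X: feature matrix (e.g. binary subgraph-occurrence indicators),
    y: labels in {0, 1}. Returns a sparse weight vector."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
        grad = X.T @ (p - y) / n           # gradient of the logistic loss
        w = w - lr * grad
        # soft-thresholding: proximal step for the L1 penalty,
        # which drives uninformative weights exactly to zero
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w

# Toy data: only the first of four binary features is informative.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 4)).astype(float)
y = X[:, 0]                                # label equals feature 0
w = sparse_logreg(X, y)
probs = 1.0 / (1.0 + np.exp(-X @ w))       # probabilistic output
```

The L1 penalty keeps the weights on the noise features near zero while the informative feature receives a clearly positive weight, which is the interpretability property the abstract refers to.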

Box-constrained convex optimization problems are central to applications in a variety of fields such as statistics, psychometrics, signal processing, medical imaging, and machine learning. Two fundamental examples are the non-negative least squares (NNLS) problem and non-negative Kullback-Leibler (NNKL) divergence minimization. The non-negativity constraints usually reflect an underlying physical restriction: in applications in astronomy, tomography, statistical estimation, or image restoration, for example, the underlying parameters represent physical quantities such as concentration, weight, intensity, or frequency counts, and are therefore interpretable only as non-negative values. Several modern optimization methods can be inefficient for simple problems such as NNLS and NNKL because they are designed to handle far more general and complex problems.
In this work we develop two simple quasi-Newton methods for solving box-constrained (differentiable) convex optimization problems that utilize the well-known BFGS and limited-memory BFGS updates. We position our methods between projected gradient (Rosen, 1960) and projected Newton (Bertsekas, 1982) methods, and prove their convergence under a simple Armijo step-size rule. We illustrate our methods with applications to image deblurring, Positron Emission Tomography (PET) image reconstruction, and Non-negative Matrix Approximation (NMA). On medium-sized data we observe performance competitive with established procedures, while for larger data the results are even better.
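For reference, the projected gradient baseline the abstract positions itself against can be sketched for NNLS in a few lines: take a gradient step on the least-squares objective, then project back onto the non-negative orthant. This is the simple first-order method, not the proposed quasi-Newton method; the test problem is made up.

```python
import numpy as np

def nnls_projected_gradient(A, b, iters=500):
    """Solve min_x 0.5*||Ax - b||^2 subject to x >= 0 by projected
    gradient descent with constant step size 1/L, where L is the
    largest eigenvalue of A^T A (Lipschitz constant of the gradient)."""
    AtA, Atb = A.T @ A, A.T @ b
    L = np.linalg.eigvalsh(AtA)[-1]        # eigvalsh sorts ascending
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = AtA @ x - Atb               # gradient of the objective
        x = np.maximum(x - grad / L, 0.0)  # step, then project onto x >= 0
    return x

# Small test problem whose unconstrained optimum has a negative
# coordinate, so the non-negativity constraint is active.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, -1.0, 0.5])
x = nnls_projected_gradient(A, b)          # converges to (0.75, 0)
```

The projection here is a simple coordinate-wise clipping, which is exactly what makes box constraints cheap to handle compared to general constraint sets.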

Autonomous robots that can assist humans in everyday situations have been a long-standing vision of robotics, artificial intelligence, and cognitive science. A first step towards this goal is to create robots that can learn tasks triggered by environmental context or higher-level instruction. However, learning techniques have yet to live up to this promise, as only a few methods manage to scale to high-dimensional manipulators or humanoid robots. In this tutorial, we give a general overview of motor skill learning for cognitive robotics, using research at ATR, USC, CMU, and Max Planck to illustrate the problems in motor skill learning. To this end, we discuss task-appropriate representations and algorithms for learning robot motor skills. Among the topics are learning basic movements or motor primitives by imitation and reinforcement learning, learning rhythmic and discrete movements, fast regression methods for learning inverse dynamics, and setups for learning task-space policies. Examples on various robots, e.g., SARCOS DB, the SARCOS Master Arm, BDI Little Dog, and a Barrett WAM, are shown and include Ball-in-a-Cup, T-Ball, Juggling, Devil-Sticking, Operational Space Control, and many others.

25th International Conference on Machine Learning (ICML), July 2008 (talk)

Abstract

This tutorial will give an introduction to the recent understanding and methodology of the kernel method: dealing with higher-order statistics by painlessly embedding random variables and probability distributions.
In the early days of kernel machines research, the "kernel trick" was considered a useful way of constructing nonlinear algorithms from linear ones. More recently, however, it has become clear that a potentially more far-reaching use of kernels is as a linear way of dealing with higher-order statistics, by embedding distributions in a suitable reproducing kernel Hilbert space (RKHS). Notably, unlike the straightforward expansion of higher-order moments or the conventional characteristic function approach, the use of kernels or an RKHS provides a painless, tractable way of embedding distributions.
This line of reasoning leads naturally to the questions: what does it mean to embed a distribution in an RKHS? When is this embedding injective (and thus, when do different distributions have unique mappings)? What implications are there for learning algorithms that make use of these embeddings? This tutorial aims at answering these questions.
A great variety of applications in machine learning and computer science require distribution estimation and/or comparison.
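One concrete use of such embeddings is comparing two samples via the RKHS distance between their empirical mean embeddings, known as the maximum mean discrepancy (MMD). The sketch below uses a Gaussian RBF kernel and the simple biased (V-statistic) estimator; the bandwidth and the toy data are illustrative assumptions.

```python
import numpy as np

def mmd2(X, Y, sigma=1.0):
    """Squared maximum mean discrepancy between samples X and Y with a
    Gaussian RBF kernel: the squared RKHS distance between the empirical
    mean embeddings of the two distributions (biased V-statistic)."""
    def k(A, B):
        # pairwise squared distances, then the Gaussian kernel
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-d2 / (2 * sigma**2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
# Two samples from the same distribution: MMD^2 near zero.
same = mmd2(rng.normal(0, 1, (300, 1)), rng.normal(0, 1, (300, 1)))
# Samples from clearly different distributions: MMD^2 well above zero.
diff = mmd2(rng.normal(0, 1, (300, 1)), rng.normal(3, 1, (300, 1)))
```

When the kernel makes the embedding injective (a characteristic kernel, as the injectivity question above asks), the population MMD is zero exactly when the two distributions coincide, which is what makes this a principled two-sample statistic.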

25th International Conference on Machine Learning (ICML), June 2008 (talk)

Abstract

We derive a generalization bound for multi-classification schemes based on grid clustering in categorical parameter product spaces. Grid clustering partitions the parameter space into a Cartesian product of partitions, one for each parameter. The derived bound provides a means of evaluating clustering solutions in terms of the generalization power of a classifier built on them. For classification based on a single feature, the bound serves to find a globally optimal classification rule. Comparing the generalization power of individual features can then be used for feature ranking. Our experiments show that in this role the bound is much more precise than mutual information or normalized correlation indices.
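The single-feature classifiers whose generalization power is being compared can be illustrated by the empirically optimal rule for one categorical feature: map each feature value to its majority label. The toy sketch below shows only this rule and its training error; the generalization bound itself is not reproduced here.

```python
from collections import Counter, defaultdict

def single_feature_rule(xs, ys):
    """Majority-vote classification rule for one categorical feature:
    map each observed feature value to its most frequent label."""
    by_value = defaultdict(Counter)
    for x, y in zip(xs, ys):
        by_value[x][y] += 1
    return {v: c.most_common(1)[0][0] for v, c in by_value.items()}

def empirical_error(rule, xs, ys):
    """Fraction of training samples the rule misclassifies."""
    return sum(rule[x] != y for x, y in zip(xs, ys)) / len(ys)

# Toy categorical feature with three values and binary labels.
xs = ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c']
ys = [ 1,   1,   0,   0,   0,   1,   0,   1 ]
rule = single_feature_rule(xs, ys)   # {'a': 1, 'b': 0, 'c': 1}
err = empirical_error(rule, xs, ys)  # 2 mistakes out of 8 -> 0.25
```

Ranking features then amounts to comparing, per feature, a generalization estimate built from such rules rather than the raw training error alone.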

With the help of differential geometry, we describe a framework for defining a thin-plate-spline-like energy for maps between arbitrary Riemannian manifolds. The so-called Eells energy depends only on the intrinsic geometry of the input and output manifolds, not on their respective representations. The energy can then be used for regression between manifolds; we present results for cases where the outputs are rotations, sets of angles, or points on 3D surfaces. In the future we plan to also target regression where the output is an element of "shape space", understood as a Riemannian manifold. One could also further explore the meaning of the Eells energy when applied to diffeomorphisms between shapes, especially with regard to its potential use as a distance measure between shapes that does not depend on the embedding or the parametrisation of the shapes.

2002


Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments, and to use this understanding to design future systems.