In this talk I will discuss the problem of quasar target selection. In astronomy, object attributes such as fluxes are often subject to substantial and heterogeneous measurement uncertainties, especially for medium-redshift quasars (redshift between 2.2 and 3.5), which are relatively rare and must be targeted down to g ~ 22 mag. Most previous approaches to quasar target selection, including UV-excess selection, kernel density estimation, a likelihood approach, and artificial neural networks, cannot directly deal with heterogeneous input uncertainties. Recently, extreme deconvolution (XD) has been used to tackle this problem in a well-posed manner. In this work, we present a discriminative approach to quasar target selection that deals with input uncertainties directly. To do so, we represent each object as a Gaussian distribution whose mean is the object's attribute vector and whose covariance is the given flux measurement uncertainty. Given a training set of such Gaussian distributions, the support measure machine (SMM) algorithm is trained and used to build the quasar targeting catalog. Preliminary results will also be presented.
Joint work with Jo Bovy and Bernhard Schölkopf
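As a rough sketch of the key ingredient, the expected RBF kernel between two Gaussian inputs has a closed form, and the resulting Gram matrix can be fed to a standard SVM solver. Everything below (data, bandwidth theta2, diagonal covariances) is a hypothetical stand-in, not the actual targeting pipeline:

    import numpy as np
    from sklearn.svm import SVC

    def expected_rbf_kernel(mu1, cov1, mu2, cov2, theta2=1.0):
        """E[exp(-|x-y|^2/(2*theta2))] for x~N(mu1,cov1), y~N(mu2,cov2)."""
        d = mu1.shape[0]
        S = cov1 + cov2 + theta2 * np.eye(d)   # covariance of x - y, plus kernel width
        diff = mu1 - mu2
        # closed form: |I + (cov1+cov2)/theta2|^(-1/2) * exp(-0.5 diff' S^{-1} diff)
        norm = np.linalg.det(np.eye(d) + (cov1 + cov2) / theta2) ** -0.5
        return norm * np.exp(-0.5 * diff @ np.linalg.solve(S, diff))

    def gram(mus, covs, theta2=1.0):
        n = len(mus)
        K = np.empty((n, n))
        for i in range(n):
            for j in range(i, n):
                K[i, j] = K[j, i] = expected_rbf_kernel(mus[i], covs[i], mus[j], covs[j], theta2)
        return K

    # Toy objects: flux-based attribute vectors with per-object uncertainties
    rng = np.random.default_rng(0)
    mus = [rng.normal(size=4) for _ in range(60)]
    covs = [np.diag(rng.uniform(0.01, 0.5, size=4)) for _ in range(60)]
    labels = rng.integers(0, 2, size=60)       # 1 = quasar, 0 = contaminant (toy)

    clf = SVC(kernel="precomputed").fit(gram(mus, covs), labels)

Prediction on new objects would use the analogous cross-kernel between test and training Gaussians.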

2009

Mini-Symposium on Assistive Machine Learning for People with Disabilities at NIPS (AMD), December 2009 (talk)

Abstract

Brain-computer interfaces (BCI) aim to be the ultimate in assistive technology: decoding a user's intentions directly from brain signals without involving any muscles or peripheral nerves. Thus, some classes of BCI potentially offer hope for users with even the most extreme cases of paralysis, such as late-stage Amyotrophic Lateral Sclerosis, where nothing else currently allows communication of any kind. Other lines of BCI research aim to restore lost motor function in as natural a way as possible, reconnecting and in some cases re-training motor-cortical areas to control prosthetic, or previously paretic, limbs. Research and development are progressing on both invasive and non-invasive fronts, although BCI has yet to make a breakthrough to widespread clinical application.
The high-noise, high-dimensional nature of brain signals, particularly in non-invasive approaches and in patient populations, makes robust decoding techniques a necessity. Generally, the approach has been to use relatively simple feature-extraction techniques, such as template matching and band-power estimation, coupled to simple linear classifiers. This has led to a prevailing view among applied BCI researchers that (sophisticated) machine learning is irrelevant, since "it doesn't matter what classifier you use once you've done your preprocessing right and extracted the right features." I shall show a few examples of how this runs counter to both the empirical reality and the spirit of what needs to be done to bring BCI into clinical application. Along the way I'll highlight some of the interesting problems that remain open for machine learners.
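The criticized baseline itself is easy to state. A minimal sketch, assuming synthetic trial data and the common band-power-plus-linear-classifier recipe (all shapes and parameters here are illustrative):

    import numpy as np
    from scipy.signal import welch
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    def bandpower_features(trials, fs=250, band=(8.0, 12.0)):
        """Log band-power per channel via Welch PSD; trials: (n_trials, n_channels, n_samples)."""
        f, psd = welch(trials, fs=fs, nperseg=fs, axis=-1)
        mask = (f >= band[0]) & (f <= band[1])
        return np.log(psd[..., mask].mean(axis=-1))    # (n_trials, n_channels)

    # Toy "EEG": 100 trials, 8 channels, 2 s at 250 Hz, random labels
    rng = np.random.default_rng(0)
    X_raw = rng.normal(size=(100, 8, 500))
    y = rng.integers(0, 2, size=100)

    clf = LinearDiscriminantAnalysis().fit(bandpower_features(X_raw), y)

The talk's point is that stopping here, rather than adapting the decoder to the noise structure of the signals, is not enough for clinical application.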

Clustering is a widely used tool for exploratory data analysis. However, the theoretical understanding of clustering is very limited. We still do not have a well-founded answer to the seemingly simple question of "how many clusters are present in the data?", and a formal comparison of clusterings based on different optimization objectives is far beyond our abilities. The lack of good theoretical support gives rise to multiple heuristics that confuse practitioners and stall the development of the field. We suggest that the ill-posed nature of clustering problems is caused by the fact that clustering is often taken out of its subsequent application context. We argue that one does not cluster the data just for the sake of clustering it, but rather to facilitate the solution of some higher-level task. By evaluating a clustering's contribution to the solution of the higher-level task, it is possible to compare different clusterings, even those obtained with different optimization objectives. In preceding work it was shown that such an approach can be applied to the evaluation and design of co-clustering solutions. Here we suggest that this approach can be extended to other settings where clustering is applied.

Kernel Canonical Correlation Analysis (KCCA) is a general technique for subspace learning that incorporates principal components analysis (PCA) and Fisher linear discriminant analysis (LDA) as special cases. By finding directions that maximize correlation, KCCA learns representations tied more closely to the underlying process generating the data and can ignore high-variance noise directions. However, for data where acquisition in a given modality is expensive or otherwise limited, KCCA may suffer from small-sample effects. We propose to use semi-supervised Laplacian regularization to utilize data that are present in only one modality. This manifold-learning approach is able to find highly correlated directions that also lie along the data manifold, resulting in a more robust estimate of correlated subspaces. Data acquired by functional magnetic resonance imaging (fMRI) are naturally amenable to subspace techniques, as the data are well aligned, and such data of the human brain are a particularly interesting candidate. In this study we implemented various supervised and semi-supervised versions of KCCA on human fMRI data, with regression to single and multivariate labels (corresponding to the video content subjects viewed during image acquisition). In each condition, Laplacian regularization improved performance, and the semi-supervised variants of KCCA yielded the best performance. We additionally analyze the weights learned by the regression in order to infer brain regions that are important during different types of visual processing.
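One plausible reading of the construction, restricted to paired data for brevity, is a generalized eigenproblem whose auto-covariance blocks are smoothed by graph Laplacians. The regularization form, constants, and graph construction below are assumptions for illustration (kernel centering and the use of unpaired, single-modality data are omitted):

    import numpy as np
    from scipy.linalg import eigh

    def rbf_gram(X, gamma=1.0):
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def graph_laplacian(K):
        W = K - np.diag(np.diag(K))            # affinity graph from the same kernel
        return np.diag(W.sum(1)) - W

    def laplacian_kcca(Kx, Ky, Lx, Ly, eps=1e-2, lam=1e-2, n_comp=2):
        n = Kx.shape[0]
        Z = np.zeros((n, n))
        A = np.block([[Z, Kx @ Ky], [Ky @ Kx, Z]])
        Bx = Kx @ Kx + eps * Kx + lam * (Kx @ Lx @ Kx)   # Laplacian-smoothed regularizer
        By = Ky @ Ky + eps * Ky + lam * (Ky @ Ly @ Ky)
        B = np.block([[Bx, Z], [Z, By]]) + 1e-8 * np.eye(2 * n)
        w, V = eigh(A, B)                       # generalized symmetric eigenproblem
        idx = np.argsort(w)[::-1][:n_comp]      # largest canonical correlations
        return V[:n, idx], V[n:, idx], w[idx]

    # Toy paired views sharing a latent signal
    rng = np.random.default_rng(0)
    S = rng.normal(size=(80, 2))
    X = S + 0.1 * rng.normal(size=(80, 2))
    Y = S @ np.array([[1.0, 2.0], [0.0, 1.0]]) + 0.1 * rng.normal(size=(80, 2))
    Kx, Ky = rbf_gram(X), rbf_gram(Y)
    a, b, corr = laplacian_kcca(Kx, Ky, graph_laplacian(Kx), graph_laplacian(Ky))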

The acquisition and self-improvement of novel motor skills is among the most important problems in robotics. Motor primitives offer one of the most promising frameworks for the application of machine learning techniques in this context. Employing the Dynamic Systems Motor Primitives originally introduced by Ijspeert et al. (2003), we present appropriate learning algorithms for a concerted approach of both imitation and reinforcement learning. Using these algorithms, new motor skills, namely Ball-in-a-Cup, Ball-Paddling and Dart-Throwing, are learned.
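For readers unfamiliar with the representation, here is a minimal one-dimensional rollout of such a primitive; the gains and basis widths are common heuristic choices, and learning the weights w (by imitation, then refining them by reinforcement learning) is what the talk is about:

    import numpy as np

    def dmp_rollout(w, y0, g, tau=1.0, dt=0.002, alpha_z=25.0, beta_z=6.25, alpha_x=8.0):
        """Integrate a 1-D dynamic systems motor primitive (after Ijspeert et al., 2003)."""
        n_basis = len(w)
        c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))  # basis centres in phase space
        h = n_basis ** 1.5 / c                                  # heuristic basis widths
        y, z, x = y0, 0.0, 1.0                                  # position, velocity, phase
        path = []
        for _ in range(int(tau / dt)):
            psi = np.exp(-h * (x - c) ** 2)
            f = x * (g - y0) * (psi @ w) / (psi.sum() + 1e-10)  # learned forcing term
            z += dt / tau * (alpha_z * (beta_z * (g - y) - z) + f)
            y += dt / tau * z
            x += dt / tau * (-alpha_x * x)
            path.append(y)
        return np.array(path)

    # With zero weights the primitive is a pure point attractor pulling y0 -> g
    trajectory = dmp_rollout(np.zeros(10), y0=0.0, g=1.0)

The transformation system guarantees convergence to the goal g; the forcing term, which vanishes with the phase variable x, shapes the trajectory on the way there.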

In this paper we deal with graph classification. We propose a new algorithm for performing sparse logistic regression on graphs, which is comparable in accuracy with other methods of graph classification and in addition produces probabilistic output. Sparsity is required for interpretability, which is often necessary in domains such as bioinformatics or chemoinformatics.
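Assuming the graphs have already been encoded as binary subgraph-occurrence features (the mining of informative subgraphs, which is the heart of such methods, is elided here), the sparse probabilistic classifier can be sketched with an L1-penalized logistic regression:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical encoding: X[i, j] = 1 iff candidate subgraph j occurs in graph i
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(200, 500)).astype(float)
    y = rng.integers(0, 2, size=200)

    # The L1 penalty drives most subgraph weights to exactly zero -> interpretable model
    clf = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)
    print("active subgraph features:", np.count_nonzero(clf.coef_))
    print("class probabilities:", clf.predict_proba(X[:3]))   # probabilistic output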

Box-constrained convex optimization problems are central to several applications in a variety of fields such as statistics, psychometrics, signal processing, medical imaging, and machine learning. Two fundamental examples are the non-negative least squares (NNLS) problem and the non-negative Kullback-Leibler (NNKL) divergence minimization problem. The non-negativity constraints are usually based on an underlying physical restriction: e.g., in applications in astronomy, tomography, statistical estimation, or image restoration, the underlying parameters represent physical quantities such as concentration, weight, intensity, or frequency counts, and are therefore only interpretable with non-negative values. Several modern optimization methods can be inefficient for simple problems such as NNLS and NNKL, as they are really designed to handle far more general and complex problems. In this work we develop two simple quasi-Newton methods for solving box-constrained (differentiable) convex optimization problems that utilize the well-known BFGS and limited-memory BFGS updates. We position our method between projected gradient (Rosen, 1960) and projected Newton (Bertsekas, 1982) methods, and prove its convergence under a simple Armijo step-size rule. We illustrate our method with applications to image deblurring, Positron Emission Tomography (PET) image reconstruction, and Non-negative Matrix Approximation (NMA). On medium-sized data we observe performance competitive with established procedures, while for larger data the results are even better.
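The paper's own two methods are not reproduced here, but the flavor of box-constrained quasi-Newton optimization is easy to demonstrate with the standard L-BFGS-B solver on a synthetic NNLS instance (problem sizes and data are arbitrary):

    import numpy as np
    from scipy.optimize import minimize

    # NNLS: minimize 0.5 * ||Ax - b||^2  subject to  x >= 0
    rng = np.random.default_rng(0)
    A = rng.normal(size=(100, 40))
    b = A @ np.abs(rng.normal(size=40)) + 0.01 * rng.normal(size=100)

    def f(x):
        r = A @ x - b
        return 0.5 * r @ r

    def grad(x):
        return A.T @ (A @ x - b)

    res = minimize(f, x0=np.zeros(40), jac=grad, method="L-BFGS-B",
                   bounds=[(0.0, None)] * 40)   # the box constraint
    print(res.fun, res.x.min())                 # res.x.min() >= 0

The appeal of such methods is exactly what the abstract argues: a quasi-Newton step with a cheap projection handles the box constraint without the machinery of a general-purpose solver.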

Autonomous robots that can assist humans in situations of daily life have been a long-standing vision of robotics, artificial intelligence, and cognitive sciences. A first step towards this goal is to create robots that can learn tasks triggered by environmental context or higher-level instruction. However, learning techniques have yet to live up to this promise, as only few methods manage to scale to high-dimensional manipulators or humanoid robots. In this tutorial, we give a general overview of motor skill learning for cognitive robotics, using research at ATR, USC, CMU and Max-Planck to illustrate the problems in motor skill learning. To this end, we discuss task-appropriate representations and algorithms for learning robot motor skills. Among the topics are learning basic movements or motor primitives by imitation and reinforcement learning, learning rhythmic and discrete movements, fast regression methods for learning inverse dynamics, and setups for learning task-space policies. Examples on various robots, e.g., SARCOS DB, the SARCOS Master Arm, BDI Little Dog and a Barrett WAM, are shown and include Ball-in-a-Cup, T-Ball, Juggling, Devil-Sticking, Operational Space Control and many others.

25th International Conference on Machine Learning (ICML), July 2008 (talk)

Abstract

This tutorial will give an introduction to the recent understanding and methodology of the kernel method: dealing with higher-order statistics by painlessly embedding random variables/probability distributions.
In the early days of kernel machines research, the "kernel trick" was considered a useful way of constructing nonlinear algorithms from linear ones. More recently, however, it has become clear that a potentially more far-reaching use of kernels is as a linear way of dealing with higher-order statistics by embedding distributions in a suitable reproducing kernel Hilbert space (RKHS). Notably, unlike the straightforward expansion into higher-order moments or the conventional characteristic-function approach, the use of kernels or the RKHS provides a painless, tractable way of embedding distributions.
This line of reasoning leads naturally to the questions: What does it mean to embed a distribution in an RKHS? When is this embedding injective (and thus, when do different distributions have unique mappings)? What implications are there for learning algorithms that make use of these embeddings? This tutorial aims at answering these questions.
A great variety of applications in machine learning and computer science require distribution estimation and/or comparison.
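A standard concrete instance of such comparison is the maximum mean discrepancy (MMD), the RKHS distance between two embedded distributions; the sketch below computes the usual unbiased estimate of squared MMD under an RBF kernel on synthetic samples:

    import numpy as np

    def mmd2_unbiased(X, Y, gamma=1.0):
        """Unbiased estimate of squared MMD between samples X and Y (RBF kernel)."""
        def k(A, B):
            d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
            return np.exp(-gamma * d2)
        m, n = len(X), len(Y)
        Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
        return ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
                + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
                - 2.0 * Kxy.mean())

    rng = np.random.default_rng(0)
    P = rng.normal(0.0, 1.0, size=(200, 2))
    Q = rng.normal(0.5, 1.0, size=(200, 2))        # mean-shifted distribution
    print(mmd2_unbiased(P[:100], P[100:]))          # ~ 0: same distribution
    print(mmd2_unbiased(P[:100], Q[:100]))          # clearly > 0: different distributions

Injectivity of the embedding (a "characteristic" kernel) is what licenses reading MMD = 0 as equality of distributions.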

With the help of differential geometry, we describe a framework to define a thin-plate-spline-like energy for maps between arbitrary Riemannian manifolds. The so-called Eells energy depends only on the intrinsic geometry of the input and output manifolds, not on their respective representations. The energy can then be used for regression between manifolds; we present results for cases where the outputs are rotations, sets of angles, or points on 3D surfaces. In the future we plan to also target regression where the output is an element of "shape space", understood as a Riemannian manifold. One could also further explore the meaning of the Eells energy when applied to diffeomorphisms between shapes, especially with regard to its potential use as a distance measure between shapes that does not depend on the embedding or the parametrisation of the shapes.
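To fix notation, the classical thin-plate spline energy and the intrinsic energy that, as described above, generalizes it can be written as follows (the exact norm and notation are my reconstruction of the standard formulation in this line of work):

    % Thin-plate spline energy of f : R^m -> R (extrinsic, coordinate-dependent):
    E_{\mathrm{TPS}}[f] = \int_{\mathbb{R}^m} \sum_{i,j}
        \left( \frac{\partial^2 f}{\partial x_i \, \partial x_j} \right)^2 \mathrm{d}x

    % Eells energy of a map \phi : M -> N, with second partial derivatives replaced
    % by the second covariant derivative (intrinsic in both manifolds):
    E_{\mathrm{Eells}}[\phi] = \int_M
        \left\| \nabla \mathrm{d}\phi \right\|^2_{T^*M \otimes T^*M \otimes \phi^{-1} TN} \, \mathrm{d}V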

Using robots as models of cognitive behaviour has a long tradition in robotics. Parallel to the historical development in cognitive science, one observes two major, subsequent waves in cognitive robotics. The first is based on ideas of classical, cognitivist Artificial Intelligence (AI). According to the AI view of cognition as rule-based symbol manipulation, these robots typically try to extract symbolic descriptions of the environment from their sensors that are used to update a common, global world representation from which, in turn, the next action of the robot is derived. The AI approach has been successful in strongly restricted and controlled environments requiring well-defined tasks, e.g. in industrial assembly lines.
AI-based robots mostly failed, however, in the unpredictable and unstructured environments that have to be faced by mobile robots. This has provoked the second wave in cognitive robotics which tries to achieve cognitive behaviour as an emergent property from the interaction of simple, low-level modules. Robots of the second wave are called animats as their architecture is designed to closely model aspects of real animals. Using only simple reactive mechanisms and Hebbian-type or evolutionary learning, the resulting animats often outperformed the highly complex AI-based robots in tasks such as obstacle avoidance, corridor following etc.
While successful in generating robust, insect-like behaviour, typical animats are limited to stereotyped, fixed stimulus-response associations. If one adopts the view that cognition requires a flexible, goal-dependent choice of behaviours and planning capabilities (H.A. Mallot, Kognitionswissenschaft, 1999, 40-48), then it appears that cognitive behaviour cannot emerge from a collection of purely reactive modules. Rather, it requires environmentally decoupled structures that work without directly engaging the actions they are concerned with. This poses the current challenge to cognitive robotics: How can we build cognitive robots that show the robustness and the learning capabilities of animats without falling back into the representational paradigm of AI?
The speakers of the symposium present their approaches to this question in the context of robot navigation and sensorimotor learning. In the first talk, Prof. Helge Ritter introduces a robot system for imitation learning capable of exploring various alternatives in simulation before actually performing a task. The second speaker, Angelo Arleo, develops a model of spatial memory in rat navigation based on his electrophysiological experiments. He validates the model on a mobile robot which, in some navigation tasks, shows a performance comparable to that of the real rat. A similar model of spatial memory is used to investigate the mechanisms of territory formation in a series of robot experiments presented by Prof. Hanspeter Mallot. In the last talk, we return to the domain of sensorimotor learning, where Ralf Möller introduces his approach to generate anticipatory behaviour by learning forward models of sensorimotor relationships.

Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments, and to use this understanding to design future systems.