Applying reinforcement learning to humanoid robots is challenging because humanoids have a large number of degrees of freedom and their state and action spaces are continuous. Thus, most reinforcement learning algorithms become computationally infeasible and require a prohibitive number of trials to explore such high-dimensional spaces. In this paper, we present a probabilistic reinforcement learning approach derived from the framework of stochastic optimal control and path integrals. The algorithm, called Policy Improvement with Path Integrals (PI2), has a surprisingly simple form, has no open tuning parameters besides the exploration noise, is model-free, and is numerically robust in high-dimensional learning problems. We demonstrate how PI2 is able to learn full-body motor skills on a 34-DOF humanoid robot. To demonstrate the generality of our approach, we also apply PI2 in the context of variable impedance control, where both planned trajectories and gain schedules for each joint are optimized simultaneously.
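The parameter update at the heart of PI2 can be sketched in a few lines: run noisy rollouts, weight each rollout's exploration noise by a softmax over its cost, and average. The quadratic toy cost, noise level, and single-shot (rather than per-time-step) update below are illustrative assumptions, not the paper's full algorithm.

```python
import numpy as np

def pi2_update(theta, rollout_cost, noise_std=0.3, n_rollouts=10, lam=1.0, rng=None):
    """One simplified PI2-style update: probability-weighted averaging of
    exploration noise, where low-cost rollouts receive exponentially more
    weight. The real algorithm applies this per time step along trajectories."""
    rng = np.random.default_rng(rng)
    eps = noise_std * rng.standard_normal((n_rollouts, theta.size))
    costs = np.array([rollout_cost(theta + e) for e in eps])
    w = np.exp(-(costs - costs.min()) / lam)  # softmax over negative costs
    w /= w.sum()
    return theta + w @ eps                    # probability-weighted noise average

# Toy quadratic cost: repeated updates drive theta toward the optimum at zero.
rng = np.random.default_rng(0)
theta = np.array([2.0, -1.5])
for _ in range(50):
    theta = pi2_update(theta, lambda t: float(t @ t), rng=rng)
```

Note that the update involves no gradient of the cost, only cost-weighted averaging of sampled noise, which is what makes the method model-free.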

Table tennis is a sufficiently complex motor task
for studying complete skill learning systems. It consists of several
elementary motions and requires fast movements, accurate
control, and online adaptation. To represent the elementary
movements needed for robot table tennis, we rely on dynamic
systems motor primitives (DMPs). While such DMPs have been
successfully used for learning a variety of simple motor tasks,
they only represent single elementary actions. In order to select
and generalize among different striking movements, we present
a new approach, called Mixture of Motor Primitives, which uses
a gating network to activate appropriate motor primitives. The
resulting policy enables us to select among the appropriate
motor primitives as well as to generalize between them. In
order to obtain a fully learned robot table tennis setup, we
also address the problem of predicting the necessary context
information, i.e., the hitting point in time and space where
we want to hit the ball. We show that the resulting setup
was capable of playing rudimentary table tennis using an
anthropomorphic robot arm.

Many successful applications of computer vision to image or video manipulation are interactive by nature. However, parameters of such systems are often trained neglecting the user. Traditionally, interactive systems have been treated in the same manner as their fully automatic counterparts. Their performance is evaluated by computing the accuracy of their solutions under some fixed set of user interactions. This paper proposes a new evaluation and learning method which brings the user in the loop. It is based on the use of an active robot user -- a simulated model of a human user. We show how this approach can be used to evaluate and learn parameters of state-of-the-art interactive segmentation systems. We also show how simulated user models can be integrated into the popular max-margin method for parameter learning and propose an algorithm to solve the resulting optimisation problem.

We present a method for fully automated selection of treatment beam ensembles for external radiation therapy. We reformulate the beam angle selection problem as a clustering problem of locally ideal beam orientations distributed on the unit sphere. For this purpose we construct an infinite mixture of von Mises-Fisher distributions, which is suited in general for density estimation from data on the D-dimensional sphere. Using a nonparametric Dirichlet process prior, our model infers probability distributions over both the number of clusters and their parameter values. We describe an efficient Markov chain Monte Carlo inference algorithm for posterior inference from experimental data in this model. The performance of the suggested beam angle selection framework is illustrated for one intra-cranial, one pancreas, and one prostate case. The infinite von Mises-Fisher mixture model (iMFMM) creates between 18 and 32 clusters, depending on the patient anatomy. This suggests using the iMFMM directly for beam ensemble selection in robotic radiosurgery, or to generate low-dimensional input for both subsequent optimization of trajectories for arc therapy and beam ensemble selection for conventional radiation therapy.

Robot learning methods that allow autonomous robots to adapt to novel situations have been a long-standing vision of robotics, artificial intelligence, and cognitive science. However, to date, learning techniques have yet to fulfill this promise, as only a few methods manage to scale to the high-dimensional domains of manipulator robotics, let alone the upcoming trend of humanoid robotics. Where scaling was achieved at all, it was usually only in precisely pre-structured domains. In this paper, we investigate the ingredients of a general approach to policy learning, with the goal of applying it to motor skill refinement in order to get one step closer to human-like performance. To do so, we study two major components of such an approach: first, policy learning algorithms that can be applied in the general setting of motor skill learning, and, second, a theoretically well-founded general approach to representing the required control structures for task representation and execution.

Building on recent results for submodular minimization with combinatorial constraints, and on online submodular minimization, we address online approximation
algorithms for submodular minimization with combinatorial constraints. We discuss two types of online algorithms and outline approximation algorithms that can be integrated into them.

We consider the problem of local graph clustering
where the aim is to discover the local cluster corresponding
to a point of interest. The most popular algorithms to solve
this problem start a random walk at the point of interest and
let it run until some stopping criterion is met. The vertices
visited are then considered the local cluster. We suggest a more
powerful alternative, the multi-agent random walk. It consists
of several agents connected by a fixed rope of length l. All
agents move independently like a standard random walk on
the graph, but they are constrained to have distance at most l
from each other. The main insight is that for several agents it is
harder to simultaneously travel over the bottleneck of a graph
than for just one agent. Hence, the multi-agent random walk
has less tendency to mistakenly merge two different clusters
than the original random walk. In our paper we analyze
the multi-agent random walk theoretically and compare it
experimentally to the major local graph clustering algorithms
from the literature. We find that our multi-agent random walk
consistently outperforms these algorithms.
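The mechanism described above can be sketched as a small simulation: on a toy "barbell" graph (two triangles joined by a single bottleneck edge), agents move like ordinary random walkers but any move that would stretch the rope beyond length l is rejected. The per-step scheduling (one uniformly chosen agent attempts a move per step) is an assumption of this sketch, not necessarily the paper's exact process.

```python
import random
from collections import Counter

def bfs_dist(adj, s, t):
    """Shortest-path distance between s and t via breadth-first search."""
    seen, frontier, d = {s}, [s], 0
    while frontier:
        if t in frontier:
            return d
        frontier = [v for u in frontier for v in adj[u] if v not in seen]
        seen.update(frontier)
        d += 1
    return float("inf")

def multi_agent_walk(adj, start, n_agents=3, rope=2, steps=500, seed=0):
    """Multi-agent random walk (sketch): each step, one agent attempts an
    ordinary random-walk move; moves that would put any two agents more than
    `rope` apart are rejected. Returns visit counts and whether the rope
    constraint held throughout."""
    random.seed(seed)
    pos = [start] * n_agents
    visits, ok = Counter(pos), True
    for _ in range(steps):
        i = random.randrange(n_agents)
        cand = random.choice(adj[pos[i]])
        if all(bfs_dist(adj, cand, pos[j]) <= rope
               for j in range(n_agents) if j != i):
            pos[i] = cand
        visits[pos[i]] += 1
        ok &= all(bfs_dist(adj, a, b) <= rope for a in pos for b in pos)
    return visits, ok

# Barbell graph: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
visits, ok = multi_agent_walk(adj, start=0)
```

Crossing the bottleneck edge 2-3 requires all agents to funnel through it while keeping the rope taut, which is exactly why the walk tends to stay inside the starting cluster.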

The GPML toolbox provides a wide range of functionality for Gaussian process (GP) inference and prediction. GPs are specified by mean and covariance functions; we offer a library of simple mean and covariance functions and mechanisms to compose more complex ones. Several likelihood functions are supported, including Gaussian and heavy-tailed for regression, as well as others suitable for classification. Finally, a range of inference methods is provided, including exact and variational inference, Expectation Propagation, and Laplace's method for dealing with non-Gaussian likelihoods, as well as FITC for dealing with large regression tasks.
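The simplest configuration expressible in such a toolbox (exact inference, squared-exponential covariance, Gaussian likelihood) can be written out directly. The numpy version below is a sketch of the underlying math, not the GPML interface; the hyperparameter names ell, sf, sn are our own.

```python
import numpy as np

def gp_predict(X, y, Xs, ell=1.0, sf=1.0, sn=0.1):
    """Exact GP regression with a squared-exponential covariance and Gaussian
    likelihood: posterior mean Ks (K + sn^2 I)^-1 y and the matching
    per-point predictive variance of the latent function."""
    def k(A, B):
        d2 = np.sum(A**2, 1)[:, None] - 2 * A @ B.T + np.sum(B**2, 1)[None, :]
        return sf**2 * np.exp(-0.5 * d2 / ell**2)
    K = k(X, X) + sn**2 * np.eye(len(X))
    Ks = k(Xs, X)
    mean = Ks @ np.linalg.solve(K, y)
    var = sf**2 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, var

# Fit a smooth function; at the training inputs the posterior mean is close
# to the targets and the predictive variance collapses toward the noise level.
X = np.linspace(0, 2 * np.pi, 20)[:, None]
y = np.sin(X[:, 0])
mean, var = gp_predict(X, y, X)
```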

Brain-Computer Interfaces based on electrocorticography (ECoG) or electroencephalography (EEG), in combination with robot-assisted active physical therapy, may support traditional rehabilitation procedures for patients with
severe motor impairment due to cerebrovascular brain damage caused by stroke. In this short report, we briefly review the state-of-the-art in this exciting new field,
give an overview of the work carried out at the Max Planck Institute for Biological Cybernetics and the University of Tübingen, and discuss challenges that need to be addressed in order to move from basic research to clinical studies.

Proceedings of the National Academy of Sciences of the United States of America, 107(46):19748-19753, November 2010 (article)

Abstract

Protein biosynthesis, the translation of the genetic code into polypeptides, occurs on ribonucleoprotein particles called ribosomes. Although X-ray structures of bacterial ribosomes are available, high-resolution structures of eukaryotic 80S ribosomes are lacking. Using cryoelectron microscopy and single-particle reconstruction, we have determined the structure of a translating plant (Triticum aestivum) 80S ribosome at 5.5-Å resolution. This map, together with a 6.1-Å map of a Saccharomyces cerevisiae 80S ribosome, has enabled us to model ∼98% of the rRNA. Accurate assignment of the rRNA expansion segments (ES) and variable regions has revealed unique ES–ES and r-protein–ES interactions, providing insight into the structure and evolution of the eukaryotic ribosome.

Policy gradient methods are a class of reinforcement learning techniques that optimize parametrized policies with respect to the expected return (long-term cumulative reward) by gradient ascent. They do not suffer from many of the problems that mar traditional reinforcement learning approaches, such as the lack of guarantees of a value function, the intractability resulting from uncertain state information, and the complexity arising from continuous states and actions.
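The core idea, ascending along return-weighted log-likelihood gradients of the policy, fits in a few lines. The two-armed bandit, softmax policy, and step size below are illustrative choices, not any particular paper's setup.

```python
import math
import random

def reinforce_bandit(rewards=(0.2, 0.8), lr=0.1, episodes=2000, seed=0):
    """Plain REINFORCE (likelihood-ratio policy gradient) on a two-armed
    bandit with a softmax policy: theta moves along r * grad log pi(a)."""
    random.seed(seed)
    theta = [0.0, 0.0]
    for _ in range(episodes):
        z = [math.exp(t) for t in theta]
        p = [v / sum(z) for v in z]
        a = 0 if random.random() < p[0] else 1
        r = rewards[a]                       # deterministic reward for clarity
        for i in range(2):                   # grad log pi(a) = 1{i=a} - p[i]
            theta[i] += lr * r * ((1.0 if i == a else 0.0) - p[i])
    z = [math.exp(t) for t in theta]
    return [v / sum(z) for v in z]

p = reinforce_bandit()  # probability mass shifts to the better arm
```

Note that only sampled actions and their returns are needed; no value function is estimated.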

Proceedings of the National Academy of Sciences of the United States of America, 107(46):19754-19759, November 2010 (article)

Abstract

Protein synthesis in all living organisms occurs on ribonucleoprotein particles, called ribosomes. Despite the universality of this process, eukaryotic ribosomes are significantly larger in size than their bacterial counterparts due in part to the presence of 80 r proteins rather than 54 in bacteria. Using cryoelectron microscopy reconstructions of a translating plant (Triticum aestivum) 80S ribosome at 5.5-Å resolution, together with a 6.1-Å map of a translating Saccharomyces cerevisiae 80S ribosome, we have localized and modeled 74/80 (92.5%) of the ribosomal proteins, encompassing 12 archaeal/eukaryote-specific small subunit proteins as well as the complete complement of the ribosomal proteins of the eukaryotic large subunit. Near-complete atomic models of the 80S ribosome provide insights into the structure, function, and evolution of the eukaryotic translational apparatus.

This letter presents a graph kernel for spatio-spectral remote sensing image classification with support vector machines (SVMs). The method considers higher order relations in the neighborhood (beyond pairwise spatial relations) to iteratively compute a kernel matrix for SVM learning. The proposed kernel is easy to compute and constitutes a powerful alternative to existing approaches. The capabilities of the method are illustrated in several multi- and hyperspectral remote sensing images acquired over both urban and agricultural areas.

Inferring the causal structure that links $n$ observables is usually based upon detecting statistical dependences and choosing simple graphs that make the joint measure Markovian. Here we argue why causal inference is also possible when the sample size is one. We develop a theory of how to generate causal graphs explaining similarities between single objects. To this end, we replace the notion of conditional stochastic independence in the causal Markov condition with the vanishing of conditional algorithmic mutual information and describe the corresponding causal inference rules. We explain why a consistent reformulation of causal inference in terms of algorithmic complexity implies a new inference principle that takes into account also the complexity of conditional probability densities, making it possible to select among Markov equivalent causal graphs. This insight provides a theoretical foundation for a heuristic principle proposed in earlier work. We also sketch some ideas on how to replace Kolmogorov complexity with decidable complexity criteria. This can be seen as an algorithmic analog of replacing the empirically undecidable question of statistical independence with practical independence tests that are based on implicit or explicit assumptions on the underlying distribution.

Reinforcement learning for partially observable Markov decision problems (POMDPs) is a challenge as it requires policies with an internal state. Traditional approaches suffer significantly from this shortcoming and usually make strong assumptions on the problem domain such as perfect system models, state-estimators and a Markovian hidden system. Recurrent neural networks (RNNs) offer a natural framework for dealing with policy learning using hidden state and require only a few limiting assumptions. As they can be trained well using gradient descent, they are suited for policy gradient approaches.
In this paper, we present a policy gradient method, the Recurrent Policy Gradient, which constitutes a model-free reinforcement learning method. It is aimed at training limited-memory stochastic policies on problems which require long-term memories of past observations. The approach involves approximating a policy gradient for a recurrent neural network by backpropagating return-weighted characteristic eligibilities through time. Using a Long Short-Term Memory RNN architecture, we are able to outperform previous RL methods on three important benchmark tasks. Furthermore, we show that using history-dependent baselines helps reduce estimation variance significantly, thus enabling our approach to tackle more challenging, highly stochastic environments.

In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC 2010), pages: 121-126, IEEE, Piscataway, NJ, USA, IEEE International Conference on Systems, Man and Cybernetics (SMC), October 2010 (inproceedings)

Abstract

Brain-Computer Interfaces (BCIs) in combination with robot-assisted physical therapy may become a valuable tool for neurorehabilitation of patients with
severe hemiparetic syndromes due to cerebrovascular brain damage (stroke) and other neurological conditions. A key aspect of this approach is reestablishing
the disrupted sensorimotor feedback loop, i.e., determining the intended movement using a BCI and helping a human with impaired motor function to move
the arm using a robot. It has not been studied yet, however, how artificially closing the sensorimotor feedback loop affects the BCI decoding performance.
In this article, we investigate this issue in six healthy subjects, and present evidence that haptic feedback facilitates the decoding of arm movement
intention. The results provide evidence of the feasibility of future rehabilitative efforts combining robot-assisted physical therapy with BCIs.

This paper addresses the problem of learning and efficiently representing discriminative probabilistic models of object-specific grasp affordances, particularly when the number of labeled grasps is extremely limited. The proposed method does not require an explicit 3D model but rather learns an implicit manifold on which it defines a probability distribution over grasp affordances. We obtain hypothetical grasp configurations from visual descriptors that are associated with the contours of an object. While these hypothetical configurations are abundant, labeled configurations are very scarce, as these are acquired via time-costly experiments carried out by the robot. Kernel logistic regression (KLR) via joint kernel maps is trained to map the hypothesis space of grasps into continuous class-conditional probability values indicating their achievability. We propose a soft-supervised extension of KLR and a framework to combine the merits of semi-supervised and active learning approaches to tackle the scarcity of labeled grasps. Experimental evaluation shows that combining active and semi-supervised learning is favorable in the presence of an oracle. Furthermore, semi-supervised learning outperforms supervised learning, particularly when the labeled data is very limited.

The goal of frequent subgraph mining is to detect subgraphs that frequently occur in a dataset of graphs. In classification settings, one is often interested in discovering discriminative frequent subgraphs, whose presence or absence is indicative of the class membership of a graph. In this article, we propose an approach to feature selection on frequent subgraphs, called CORK, that combines two central advantages. First, it optimizes a submodular quality criterion, which means that we can yield a near-optimal solution using greedy feature selection. Second, our submodular quality criterion can be integrated into gSpan, the state-of-the-art tool for frequent subgraph mining, and helps to prune the search space for discriminative frequent subgraphs even during frequent subgraph mining.
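The near-optimality guarantee mentioned above comes from generic greedy maximization of a monotone submodular function. The sketch below uses a toy coverage function as a stand-in for the CORK criterion, which is not reproduced here.

```python
def greedy_select(candidates, quality, k):
    """Greedy selection under a submodular quality function: repeatedly add
    the candidate with the largest marginal gain. For monotone submodular
    maximization this achieves a (1 - 1/e) approximation guarantee."""
    selected = []
    for _ in range(k):
        best = max((c for c in candidates if c not in selected),
                   key=lambda c: quality(selected + [c]) - quality(selected))
        selected.append(best)
    return selected

# Toy coverage function: quality(S) = number of items covered by features in S.
covers = {0: {1, 2, 3}, 1: {3, 4}, 2: {5}}
quality = lambda S: len(set().union(*(covers[c] for c in S))) if S else 0
chosen = greedy_select(list(covers), quality, k=2)  # feature 0 is picked first
```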

Although human beings see and move more slowly than table tennis or baseball robots, they manage to outperform such robot systems. One important aspect of this better performance is human movement generation. In this paper, we study trajectory generation for table tennis from a biomimetic, human motor control point of view. Our focus lies on generating efficient stroke movements capable of mastering variations in the environmental conditions, such as changing ball speed, spin and position. To make headway towards this goal, we construct a trajectory generator for a single stroke using the discrete movement stages hypothesis and the virtual hitting point hypothesis to create a model that produces a human-like stroke movement. We verify the functionality of the trajectory generator for a single forehand stroke both in simulation and on a real Barrett WAM.

Grasping an object is a task that inherently needs to be treated in a hybrid fashion. The system must decide both where and how to grasp the object. While selecting where to grasp requires learning about the object as a whole, the execution only needs to reactively adapt to the context close to the grasp's location. We propose a hierarchical controller that reflects the structure of these two sub-problems, and attempts to learn solutions that work for both. A hybrid architecture is employed by the controller to make use of various machine learning methods that can cope with the large amount of uncertainty inherent to the task. The controller's upper level selects where to grasp the object using a reinforcement learner, while the lower level comprises an imitation learner and a vision-based reactive controller to determine appropriate grasping motions. The resulting system is able to quickly learn good grasps of a novel object in an unstructured environment, by executing smooth reaching motions and preshaping the hand depending on the object's geometry. The system was evaluated both in simulation and on a real robot.

We study the problem of multimodal dimensionality reduction assuming that data samples can be missing at training time,
and not all data modalities may be present at application time. Maximum covariance analysis, as a generalization of PCA, has
many desirable properties, but its application to practical problems is limited by its need for perfectly paired data. We
overcome this limitation by a latent variable approach that allows working with weakly paired data and is still able to
efficiently process large datasets using standard numerical routines. The resulting weakly paired maximum covariance analysis
often finds better representations than alternative methods, as we show in two exemplary tasks: texture discrimination and transfer learning.
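For fully paired data, maximum covariance analysis reduces to a singular value decomposition of the empirical cross-covariance matrix. The sketch below shows this classical special case in numpy; the data and dimensions are illustrative, and the paper's latent-variable, weakly paired extension is not reproduced.

```python
import numpy as np

def mca(x, y, k=1):
    """Maximum covariance analysis on fully paired data: the leading singular
    vectors of the cross-covariance matrix give the direction pairs of
    maximal covariance between the two modalities."""
    xc, yc = x - x.mean(0), y - y.mean(0)
    U, s, Vt = np.linalg.svd(xc.T @ yc / (len(x) - 1))
    return U[:, :k], Vt[:k].T, s[:k]

rng = np.random.default_rng(0)
x = rng.standard_normal((500, 3))
# second modality shares only x's first coordinate (plus small noise)
y = np.column_stack([x[:, 0] + 0.01 * rng.standard_normal(500),
                     0.01 * rng.standard_normal(500)])
u, v, s = mca(x, y)  # leading directions align with the shared coordinate
```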

Most current algorithms for blind steganalysis of images are based on a two-stage approach: First, features are extracted in order to reduce dimensionality and to highlight potential manipulations; second, a classifier trained on pairs of clean and stego images finds a decision rule for these features to detect stego images. The vector components might vary significantly in their values, hence normalization of the feature vectors is crucial. Furthermore, most classifiers contain free parameters, and an automatic model selection step has to be carried out to adapt these parameters. However, the commonly used cross-validation destroys some information needed by the classifier because of the arbitrary splitting of image pairs (stego and clean version) in the training set. In this paper, we propose simple modifications of feature normalization and of standard cross-validation. In our experiments, we show that these methods lead to a significant improvement of the standard blind steganalyzer of Lyu and Farid.
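One way to avoid the arbitrary pair splitting is to form cross-validation folds over cover-image ids rather than over individual samples. The sketch below illustrates this idea; it is a generic grouped-folds construction, not the paper's specific modification.

```python
def paired_folds(image_ids, n_folds=5):
    """Cross-validation folds that respect stego/clean pairing: folds are
    formed over unique cover-image ids, so the clean and stego version of
    the same image always land in the same fold and the classifier is never
    trained on the clean counterpart of a test stego image."""
    unique = sorted(set(image_ids))
    id_to_fold = {img: i % n_folds for i, img in enumerate(unique)}
    folds = [[] for _ in range(n_folds)]
    for idx, img in enumerate(image_ids):
        folds[id_to_fold[img]].append(idx)
    return folds

# each cover image appears twice: once clean, once stego
ids = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
folds = paired_folds(ids, n_folds=3)
```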

We study nonparametric regression between Riemannian manifolds based on regularized empirical risk minimization. Regularization functionals for mappings between manifolds should respect the geometry of input and output manifold and be independent of the chosen parametrization of the manifolds. We define and analyze the three most simple regularization functionals with these properties and present a rather general scheme for solving the resulting optimization problem. As application examples we discuss interpolation on the sphere, fingerprint processing, and correspondence computations between three-dimensional surfaces. We conclude with characterizing interesting and sometimes counterintuitive implications and new open problems that are specific to learning between Riemannian manifolds and are not encountered in multivariate regression in Euclidean space.

Remote sensing image segmentation requires multi-category classification, typically with a limited number of labeled training samples. While semi-supervised learning (SSL) has emerged as a sub-field of machine learning to tackle the scarcity of labeled samples, most SSL algorithms to date have had trade-offs in terms of scalability and/or applicability to multi-categorical data. In this paper, we evaluate semi-supervised logistic regression (SLR), a recent information-theoretic semi-supervised algorithm, for remote sensing image classification problems. SLR is a probabilistic discriminative classifier and a specific instance of the generalized maximum entropy framework with a convex loss function. Moreover, the method is inherently multi-class and easy to implement. These characteristics make SLR a strong alternative to the widely used semi-supervised variants of SVM for the segmentation of remote sensing images. We demonstrate the competitiveness of SLR in multispectral, hyperspectral and radar image classification.

Our winning approach to the 2010 MLSP Competition is based on a generative method for P300-based BCI decoding, successfully applied to visual spellers. Here, generative has a double meaning. On the one hand, we work with a probability density model of the data given the target/non target labeling, as opposed to discriminative (e.g. SVM-based) methods. On the other hand, the natural consequence of this approach is a decoding based on comparing the observation to templates generated from the data.

We formulate the multiframe blind deconvolution problem in an incremental
expectation maximization (EM) framework. Beyond deconvolution,
we show how to use the same framework to address: (i)
super-resolution despite noise and unknown blurring; (ii) saturation correction
of overexposed pixels that confound image restoration.
The abundance of data allows us to address both of these without
using explicit image or blur priors. The end result is a simple but effective
algorithm with no hyperparameters. We apply this algorithm
to real-world images from astronomy and to super-resolution tasks:
for both, our algorithm yields increased resolution and deconvolved
images simultaneously.

Density modeling is notoriously difficult for high dimensional data. One approach to the problem is to search for a lower dimensional manifold which captures the main characteristics of the data. Recently, the Gaussian Process Latent Variable Model (GPLVM) has successfully been used to find low dimensional manifolds in a variety of complex data. The GPLVM consists of a set of points in a low dimensional latent space, and a stochastic map to the observed space. We show how it can be interpreted as a density model in the observed space. However, the GPLVM is not trained as a density model and therefore yields bad density estimates. We propose a new training strategy and obtain improved generalisation performance and better density estimates in comparative evaluations on several benchmark data sets.

In Proceedings of the First International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures (ADMS 2010), pages: 1-6, First International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures (ADMS), September 2010 (inproceedings)

Abstract

Nearest neighbor search is a core computational task in
database systems and throughout data analysis. It is also
a major computational bottleneck, and hence an enormous
body of research has been devoted to data structures and
algorithms for accelerating the task. Recent advances in
graphics hardware provide tantalizing speedups on a variety
of tasks and suggest an alternate approach to the problem:
simply run brute force search on a massively parallel system.
In this paper we marry the approaches with a novel
data structure that can effectively make use of parallel systems such as graphics cards. The architectural complexities of graphics hardware - the high degree of parallelism, the small amount of memory relative to instruction throughput, and the single instruction, multiple data design - present significant
challenges for data structure design. Furthermore,
the brute force approach applies perfectly to graphics hardware, leading one to question whether an intelligent algorithm or data structure can even hope to outperform this
basic approach. Despite these challenges and misgivings,
we demonstrate that our data structure - termed a Random Ball Cover - provides significant speedups over the GPU-based brute force approach.
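The brute-force baseline that the Random Ball Cover is measured against is itself instructive: a full distance matrix and one argmin per query, which maps directly onto data-parallel hardware. The numpy sketch below shows the computation; the RBC structure itself (random representatives with per-representative balls) is not reproduced here.

```python
import numpy as np

def brute_force_nn(queries, db):
    """Brute-force nearest neighbor: compute the full squared-distance matrix
    via the expansion |q - x|^2 = |q|^2 - 2 q.x + |x|^2, then take one argmin
    per query row."""
    d2 = (np.sum(queries**2, axis=1)[:, None]
          - 2.0 * queries @ db.T
          + np.sum(db**2, axis=1)[None, :])
    return np.argmin(d2, axis=1)

db = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
q = np.array([[1.0, 1.0], [9.0, 1.0], [-1.0, 9.0]])
nn = brute_force_nn(q, db)  # index of the nearest database point per query
```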

Cayton, L.: A Nearest Neighbor Data Structure for Graphics Hardware

Brain-Computer Interfaces (BCI) that rely upon epidural electrocorticographic signals may become a promising tool for neurorehabilitation of patients with severe hemiparetic syndromes due to cerebrovascular, traumatic or tumor-related brain damage. Here, we show in a patient-based feasibility study that online classification of arm movement intention is possible. The intention to move or to rest can be identified with high accuracy (~90%), which is sufficient for BCI-guided neurorehabilitation. The observed spatial distribution of relevant features on the motor cortex indicates that cortical reorganization has been induced by the brain lesion. Low- and high-frequency components of the electrocorticographic power spectrum provide complementary information towards classification of arm movement intention.

This paper describes the software package libDAI, a free & open source C++ library that provides implementations of various exact and approximate inference methods for graphical models with discrete-valued variables. libDAI supports directed graphical models (Bayesian networks) as well as undirected ones (Markov random fields and factor graphs). It offers various approximations of the partition sum, marginal probability distributions and maximum probability states. Parameter learning is also supported. A feature comparison with other open source software packages for approximate inference is given. libDAI is licensed under the GPL v2+ license and is available at http://www.libdai.org.

Convolutive blind source separation (BSS) usually encounters two difficulties: the filter indeterminacy in the recovered sources and the relatively high computational load. In this paper we propose an efficient method for convolutive BSS that deals with both issues. It consists of two stages, namely, multichannel blind deconvolution (MBD) and learning the post-filters with the minimum filter distortion (MFD) principle. We present a computationally efficient approach to MBD in the first stage: a vector autoregression (VAR) model is first fitted to the data, admitting a closed-form solution and giving temporally independent errors; traditional independent component analysis (ICA) is then applied to these errors to produce the MBD results. In the second stage, the least linear reconstruction error (LLRE) constraint of the separation system, which was previously used to regularize the solutions to nonlinear ICA, enforces an MFD principle on the estimated mixing system for convolutive BSS. One can then easily learn the post-filters to preserve the temporal structure of the sources. We show that with this principle, each recovered source is approximately the principal component of the contributions of this source to all observations. Experimental results on both synthetic data and real room recordings show the good performance of this method.
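The closed-form first stage can be sketched directly: fit the VAR by ordinary least squares and return the prediction errors, which an instantaneous ICA would then unmix. The AR(1) toy data and model order below are illustrative assumptions.

```python
import numpy as np

def var_residuals(x, order=1):
    """First MBD stage (sketch): fit a vector autoregression by ordinary
    least squares (a closed-form solution) and return the prediction errors,
    which are handed to an instantaneous ICA in the second stage."""
    T, d = x.shape
    X = np.hstack([x[order - k - 1:T - k - 1] for k in range(order)])
    Y = x[order:]
    A, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return Y - X @ A

# AR(1) toy data: the VAR absorbs the temporal structure, leaving white errors
# whose variance is much smaller than that of the observations themselves.
rng = np.random.default_rng(0)
e = 0.1 * rng.standard_normal((2000, 2))
x = np.zeros((2000, 2))
for t in range(1, 2000):
    x[t] = 0.9 * x[t - 1] + e[t]
r = var_residuals(x)
```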

Playing table tennis is a difficult motor task which requires
fast movements, accurate control and adaptation to task parameters.
Although human beings see and move more slowly than most robot systems,
they outperform all table tennis robots significantly. In this paper we
study human table tennis and present a robot system that mimics human
striking behavior. To this end, we model the human movements involved
in hitting a table tennis ball using discrete movement stages and the
virtual hitting point hypothesis. The resulting model is implemented on
an anthropomorphic robot arm with 7 degrees of freedom using robotics
methods. We verify the functionality of the model both in a physically realistic
simulation of an anthropomorphic robot arm and on a real Barrett
WAM.

Grasping is one of the most important abilities needed for future service robots. In the task of picking up an object from within clutter, traditional robotics approaches would determine a suitable grasping point and then use a movement planner to reach the goal. The planner would require precise and accurate information about the environment and long computation times, both of which are often not available. Therefore, methods are needed that execute grasps robustly even with imprecise information gathered only from standard stereo vision. We propose techniques that reactively modify the robot's learned motor primitives based on non-parametric potential fields centered on the Early Cognitive Vision descriptors. These allow both obstacle avoidance and the adaptation of finger motions to the object's local geometry. The methods were tested on a real robot, where they led to improved adaptability and quality of grasping actions.

Journal of NeuroEngineering and Rehabilitation, 7(34):1-4, July 2010 (article)

Abstract

Even though feedback is considered to play an important role in learning how to operate a brain-computer interface (BCI), to date no significant influence of feedback design on BCI performance has been reported in the literature. In this work, we adapt a standard motor-imagery BCI paradigm to study how BCI performance is affected by biasing the belief subjects have on their level of control over the BCI system. Our findings indicate that subjects already capable of operating a BCI are impeded by inaccurate feedback, while subjects normally performing on or close to chance level may actually benefit from an incorrect belief on their performance level. Our results imply that optimal feedback design in BCIs should take into account a subject's current skill level.

Information diffusion and virus propagation are fundamental processes taking place in networks. While it is often possible to directly observe when nodes become infected, observing individual transmissions (i.e., who infects whom or who influences whom) is typically very difficult. Furthermore, in many applications, the underlying network over which the diffusions and propagations spread is actually unobserved. We tackle these challenges by developing a method for tracing paths of diffusion and influence through networks and inferring the networks over which contagions propagate. Given the times when nodes adopt pieces of information or become infected, we identify the optimal network that best explains the observed infection times. Since the optimization problem is NP-hard to solve exactly, we develop an efficient approximation algorithm that scales to large datasets and in practice gives provably near-optimal performance. We demonstrate the effectiveness of our approach by tracing information cascades in a set of 170 million blogs and news articles over a one year period to infer how information flows through the online media space. We find that the diffusion network of news tends to have a core-periphery structure with a small set of core media sites that diffuse information to the rest of the Web. These sites tend to have stable circles of influence with more general news media sites acting as connectors between them.
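The core idea of inferring edges from infection times alone can be sketched as follows: score each candidate edge (u, v) by how well u's earlier infections explain v's later ones under an exponential transmission likelihood. The plain per-edge scoring (in place of the submodular greedy optimization used for the NP-hard problem) and the toy cascades are simplifying assumptions.

```python
from math import exp

def score_edges(cascades, horizon=5.0):
    """Score every candidate edge (u, v) by accumulating, over all
    cascades, an exponential transmission likelihood exp(-dt) for each
    time u's infection precedes v's within `horizon`. Keeping the
    top-scoring edges is a drastic simplification of the greedy,
    marginal-gain edge selection in the full method."""
    scores = {}
    for times in cascades:                      # cascade: node -> infection time
        for u in times:
            for v in times:
                dt = times[v] - times[u]
                if u != v and 0 < dt <= horizon:
                    scores[(u, v)] = scores.get((u, v), 0.0) + exp(-dt)
    return scores

# Three toy cascades; shorter, repeated delays yield stronger edges.
cascades = [
    {"a": 0.0, "b": 1.0, "c": 2.0},
    {"a": 0.0, "b": 0.5, "c": 1.5},
    {"b": 0.0, "c": 1.0},
]
scores = score_edges(cascades)
top = max(scores, key=scores.get)               # most strongly supported edge
```

Here the edge ("b", "c") wins because b precedes c in all three cascades with short delays.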

Policy search is a successful approach to reinforcement learning. However, policy improvements often result in the loss of information, and hence policy search has been marred by premature convergence and implausible solutions. As first suggested in the context of covariant policy gradients (Bagnell and Schneider 2003), many of these problems may be addressed by constraining the information loss. In this paper, we continue this path of reasoning and suggest the Relative Entropy Policy Search (REPS) method. The resulting method differs significantly from previous policy gradient approaches and yields an exact update step. It works well on typical reinforcement learning benchmark problems.
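An episodic, parameter-space variant of the information-loss-bounded update can be sketched as follows: sampled policy parameters are reweighted by exponentiated returns, which softly discards information instead of taking a raw gradient step. The fixed temperature `eta` and the toy objective are illustrative assumptions; REPS itself obtains the temperature by solving a dual problem so that the relative-entropy bound holds exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(theta):
    """Toy objective: maximize -||theta - goal||^2 (assumed for illustration)."""
    goal = np.array([1.0, -0.5])
    return -np.sum((theta - goal) ** 2)

def reps_style_update(theta, sigma, eta=1.0, n_samples=50):
    """One episodic update: sample parameters, weight by exponentiated
    return (a soft-max trust region on the search distribution), and
    refit a diagonal Gaussian by weighted moments."""
    samples = rng.normal(theta, sigma, size=(n_samples, theta.size))
    returns = np.array([reward(s) for s in samples])
    # Shift by the max for numerical stability before exponentiating.
    w = np.exp((returns - returns.max()) / eta)
    w /= w.sum()
    new_theta = w @ samples
    new_sigma = np.sqrt(w @ (samples - new_theta) ** 2 + 1e-8)
    return new_theta, new_sigma

theta, sigma = np.zeros(2), np.ones(2)
for _ in range(30):
    theta, sigma = reps_style_update(theta, sigma)
```

The search distribution contracts around high-return parameters; with the bound enforced exactly (as in REPS proper), the contraction rate is controlled rather than hand-tuned.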

Forms of justification for inductive machine learning techniques are discussed and classified into four types. This is done with a view to introducing some of these techniques and their justificatory guarantees to the attention of philosophers, and to initiating a discussion as to whether they must be treated separately or can instead be viewed consistently within a single framework.

Journal of Computer Science and Technology, 25(4):653-664, July 2010 (article)

Abstract

In the Bayesian mixture modeling framework it is possible to infer the necessary number of components to model the data and therefore it is unnecessary to explicitly restrict the number of components. Nonparametric mixture models sidestep the problem of finding the “correct” number of mixture components by assuming infinitely many components. In this paper Dirichlet process mixture (DPM) models are cast as infinite mixture models and inference using Markov chain Monte Carlo is described. The specification of the priors on the model parameters is often guided by mathematical and practical convenience. The primary goal of this paper is to compare the choice of conjugate and non-conjugate base distributions on a particular class of DPM models which is widely used in applications, the Dirichlet process Gaussian mixture model (DPGMM). We compare computational efficiency and modeling performance of DPGMM defined using a conjugate and a conditionally conjugate base distribution. We show that better density models can result from using a wider class of priors with no or only a modest increase in computational effort.
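The "infinitely many components" assumption can be illustrated by sampling a partition from the Chinese restaurant process, the clustering prior implied by a Dirichlet process mixture: each item joins an existing component with probability proportional to its size, or opens a new one with probability proportional to the concentration parameter. This sketch shows only the prior over partitions, not the MCMC inference or base-distribution choices compared in the paper.

```python
import random

def crp_partition(n, alpha, rng):
    """Draw component assignments for n items from the Chinese
    restaurant process with concentration alpha: item i joins
    component k with probability counts[k] / (i + alpha), or a new
    component with probability alpha / (i + alpha)."""
    counts, assignments = [], []
    for i in range(n):
        r = rng.random() * (i + alpha)
        for k, c in enumerate(counts):
            if r < c:
                counts[k] += 1
                assignments.append(k)
                break
            r -= c
        else:                      # leftover mass alpha -> new component
            counts.append(1)
            assignments.append(len(counts) - 1)
    return assignments

rng = random.Random(0)
assignments = crp_partition(100, alpha=2.0, rng=rng)
```

The number of occupied components grows only logarithmically with n (roughly alpha * log n in expectation), which is why the "correct" number of components need not be fixed in advance.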

We consider two variables that are related to each other by an invertible function. While it has previously been shown that the dependence structure of the noise can provide hints to determine which of the two variables is the cause, we presently show that even in the deterministic (noise-free) case, there are asymmetries that can be exploited for causal inference. Our method is based on the idea that if the function and the probability density of the cause are chosen independently, then the distribution of the effect will, in a certain sense, depend on the function. We provide a theoretical analysis of this method, showing that it also works in the low noise regime, and link it to information geometry. We report strong empirical results on various real-world data sets from different domains.
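A minimal version of the resulting inference rule is a slope-based estimator: after rescaling both variables to [0, 1], average log |dy/dx| over consecutive sample pairs sorted by x, and prefer the causal direction with the smaller score. The toy mechanism and sample sizes below are assumptions for illustration.

```python
import numpy as np

def slope_score(x, y):
    """Average log-slope of y with respect to x, after rescaling both
    to [0, 1] and sorting by x. Under an independently chosen function
    and cause density, this score tends to be smaller in the true
    causal direction."""
    x = (x - x.min()) / (x.max() - x.min())
    y = (y - y.min()) / (y.max() - y.min())
    idx = np.argsort(x)
    dx, dy = np.diff(x[idx]), np.diff(y[idx])
    ok = (dx != 0) & (dy != 0)          # skip degenerate pairs
    return float(np.mean(np.log(np.abs(dy[ok] / dx[ok]))))

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 500)          # cause drawn independently of the mechanism
y = np.tanh(4.0 * (x - 0.5))            # invertible, deterministic mechanism
direction = "X->Y" if slope_score(x, y) < slope_score(y, x) else "Y->X"
```

For an invertible mechanism the two scores are exact negatives of each other, so the comparison reduces to the sign of the forward score.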

This paper addresses the recent trends in machine learning methods for the automatic classification of remote sensing (RS) images. In particular, we focus on two new paradigms: semisupervised and active learning. These two paradigms allow one to address classification problems in the critical conditions where the available labeled training samples are limited. These operational conditions are very common in RS problems, due to the high cost and time associated with the collection of labeled samples. Semisupervised and active learning techniques allow one to enrich the initial training set information and to improve classification accuracy by exploiting unlabeled samples or requiring additional labeling phases from the user, respectively. The two aforementioned strategies are theoretically and experimentally analyzed considering SVM-based techniques in order to highlight advantages and disadvantages of both strategies.
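The active-learning side can be illustrated with uncertainty (margin) sampling for a linear, SVM-style classifier: of all unlabeled samples, query the ones closest to the current decision boundary, where the model is least certain. The toy hyperplane and data points are assumptions for illustration.

```python
import numpy as np

def margin_sampling(w, b, unlabeled, k=1):
    """Uncertainty sampling for a linear classifier f(x) = w.x + b:
    return the indices of the k unlabeled points with the smallest
    distance to the decision boundary, i.e., those whose labels the
    current model is least certain about."""
    margins = np.abs(unlabeled @ w + b) / np.linalg.norm(w)
    return np.argsort(margins)[:k]

w, b = np.array([1.0, -1.0]), 0.0          # toy boundary: the line y = x
unlabeled = np.array([
    [0.0, 2.0],    # far on one side -> confident
    [1.0, 0.9],    # nearly on the boundary -> uncertain
    [3.0, -1.0],   # far on the other side -> confident
])
query = margin_sampling(w, b, unlabeled, k=1)
```

Each queried label is then added to the training set and the classifier retrained, so the labeling budget is spent where it changes the decision boundary most.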

Monthly Notices of the Royal Astronomical Society, 405(3):2044-2061, July 2010 (article)

Abstract

We present the results of the GREAT08 Challenge, a blind analysis challenge to infer weak gravitational lensing shear distortions from images. The primary goal was to stimulate new ideas by presenting the problem to researchers outside the shear measurement community. Six GREAT08 Team methods were presented at the launch of the Challenge, and five additional groups submitted results during the 6 month competition. Participants analyzed 30 million simulated galaxies with a range in signal-to-noise ratio, point-spread function ellipticity, galaxy size, and galaxy type. The large quantity of simulations allowed shear measurement methods to be assessed, for the first time, at a level of accuracy suitable for currently planned cosmic shear observations. Different methods perform well in different parts of simulation parameter space and come close to the target level of accuracy in several of them. A number of fresh ideas have emerged as a result of the Challenge, including a re-examination of the process of combining information from different galaxies, which reduces the dependence on realistic galaxy modelling. The image simulations will become increasingly sophisticated in future GREAT challenges; meanwhile, the GREAT08 simulations remain a benchmark for further developments in shear measurement algorithms.

Separation of sources and analysis of their connectivity have been important topics in EEG/MEG analysis. To solve this problem automatically, we propose a two-layer model in which the sources are conditionally uncorrelated with each other, but not independent; the dependence is caused by causality in their time-varying variances (envelopes). The model is identified in two steps. We first propose a new source separation technique that takes into account the autocorrelations (which may be time-varying) and time-varying variances of the sources. The causality in the envelopes is then discovered by exploiting a special kind of multivariate GARCH (generalized autoregressive conditional heteroscedasticity) model. The resulting causal diagram gives the effective connectivity between the separated sources; in our experimental results on MEG data, sources with similar functions are grouped together, with negative influences between groups, and the groups are connected via some interesting sources.

In nonlinear latent variable models or dynamic models, if we consider the latent variables as confounders (common causes), the noise dependencies imply further relations between the observed variables. Such models are then closely related to causal discovery in the presence of nonlinear confounders, which is a challenging problem. Generally, however, the observation noise in such models is assumed to be independent across data dimensions, and consequently the noise dependencies are ignored. In this paper we focus on the Gaussian process latent variable model (GPLVM), from which we develop an extended model called the invariant GPLVM (IGPLVM), which can adapt to arbitrary noise covariances. With the Gaussian process prior put on a particular transformation of the latent nonlinear functions, instead of the original ones, the algorithm for the IGPLVM involves almost the same computational load as that for the original GPLVM. Besides its potential application in causal discovery, the IGPLVM has the advantage that its estimated latent nonlinear manifold is invariant to any nonsingular linear transformation of the data. Experimental results on both synthetic and real-world data show its encouraging performance in nonlinear manifold learning and causal discovery.

Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments and to use this understanding to design future systems