Policy learning approaches are among the best-suited methods for high-dimensional, continuous control systems such
as anthropomorphic robot arms and humanoid robots. In this paper, we make two contributions: firstly, we present a
unified perspective which allows us to derive several policy learning algorithms from a common point of view, i.e., policy
gradient algorithms, natural gradient algorithms, and EM-like policy learning. Secondly, we present several applications
to both robot motor primitive learning and robot control in task space. Results from both simulation and several
different real robots are shown.
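The "vanilla" policy gradient family mentioned above can be illustrated with a minimal sketch. The linear-Gaussian policy, step size, and episodic return used here are illustrative assumptions for exposition, not the paper's actual derivation:

```python
import numpy as np

def reinforce_update(theta, episodes, alpha=0.01, sigma=0.1):
    """One likelihood-ratio (REINFORCE-style) policy gradient update for a
    linear-Gaussian policy a ~ N(theta^T s, sigma^2).

    episodes: list of (states, actions, rewards) trajectories."""
    grad = np.zeros_like(theta)
    for states, actions, rewards in episodes:
        ret = sum(rewards)  # episodic return R(tau)
        for s, a in zip(states, actions):
            # gradient of log pi(a|s) w.r.t. theta for the Gaussian policy
            grad += ((a - theta @ s) / sigma**2) * s * ret
    return theta + alpha * grad / len(episodes)
```

Natural-gradient and EM-like variants differ only in how this raw gradient estimate is reweighted before the parameter update.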

Reinforcement learning is a natural choice for learning complex motor tasks through reward-related self-improvement. As the space of movements is high-dimensional and
continuous, a policy parametrization is needed which can be used in this context. Traditional motor primitive approaches deal largely with open-loop policies which can
only cope with small perturbations. In this paper, we present a new type of motor primitive policy which serves as a closed-loop policy, together with an appropriate learning
algorithm. Our new motor primitives are an augmented version of the dynamic systems motor primitives that incorporates perceptual coupling to external
variables. We show that these motor primitives can perform complex tasks such as the Ball-in-a-Cup (Kendama) task, even with large variances in the initial conditions under which a
human would hardly be able to perform the task. We initialize the open-loop policies by imitation learning and the perceptual coupling with a handcrafted solution. We first
improve the open-loop policies and subsequently the perceptual coupling using a novel reinforcement learning method which is particularly well-suited for motor primitives.
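The augmentation of a dynamic systems motor primitive with perceptual coupling can be sketched as a single integration step. The gains, the omitted learned forcing term, and the coupling form `kappa * (ext_desired - ext)` are illustrative assumptions, not the exact formulation of the paper:

```python
def dmp_step(x, v, goal, ext, ext_desired, kappa,
             dt=0.01, tau=1.0, K=100.0, D=20.0, f=0.0):
    """One Euler step of a dynamic-systems motor primitive augmented with
    perceptual coupling to an external variable `ext`.

    The spring-damper gains K, D, the learned forcing term f (set to zero
    here for brevity), and the coupling term are illustrative choices."""
    coupling = kappa * (ext_desired - ext)        # perceptual coupling term
    dv = (K * (goal - x) - D * v + f + coupling) / tau
    dx = v / tau
    return x + dx * dt, v + dv * dt
```

With `kappa = 0` this reduces to the standard open-loop primitive; a nonzero coupling gain lets perceived deviations of the external variable (e.g. the ball position) modulate the movement online.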

Humans perceive the world by directing the center of gaze from one location to another via rapid eye movements, called saccades. In the period between saccades, the direction of gaze is held fixed for a few hundred milliseconds (fixations). It is primarily during fixations that information enters the visual system. Remarkably, however, after only a few fixations we perceive a coherent, high-resolution scene, even though the visual acuity of the eye decreases quickly away from the center of gaze. This suggests an effective strategy for selecting saccade targets.
Top-down effects, such as the observer's task, thoughts, or intentions, influence saccadic selection. It is equally well known that bottom-up effects (local image structure) influence saccade targeting regardless of top-down effects. However, the question of which visual features are the most salient is still under debate. Here we model the relationship between spatial intensity patterns in natural images and the response of the saccadic system using tools from machine learning. This allows us to identify the most salient image patterns that guide the bottom-up component of the saccadic selection system, which we refer to as perceptive fields. We show that center-surround patterns emerge as the optimal solution to the problem of predicting saccade targets. Using a novel nonlinear system identification technique, we reduce our learned classifier to a one-layer feed-forward network which is surprisingly simple compared to previously suggested models assuming more complex computations such as multi-scale processing, oriented filters, and lateral inhibition. Nevertheless, our model is equally predictive and generalizes better to novel image sets. Furthermore, our findings are consistent with the neurophysiological hardware in the superior colliculus. Bottom-up visual saliency may thus not be computed cortically, as has previously been thought.
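A one-layer feed-forward model of this kind can be sketched as a difference-of-Gaussians "perceptive field" followed by a sigmoid readout. The filter size, standard deviations, and the logistic output are illustrative assumptions, not the parameters identified in the study:

```python
import numpy as np

def center_surround_filter(size=9, sigma_c=1.0, sigma_s=3.0):
    """Difference-of-Gaussians filter: excitatory center, inhibitory
    surround (all parameters here are illustrative)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    center = np.exp(-r2 / (2 * sigma_c**2)) / (2 * np.pi * sigma_c**2)
    surround = np.exp(-r2 / (2 * sigma_s**2)) / (2 * np.pi * sigma_s**2)
    return center - surround

def saliency_score(patch, w):
    """One-layer feed-forward readout: inner product of an image patch
    with the filter, squashed through a sigmoid."""
    z = float(np.sum(patch * w))
    return 1.0 / (1.0 + np.exp(-z))
```

The point of the abstract is that such a single linear filter plus output nonlinearity suffices; no multi-scale pyramid, oriented filter bank, or lateral inhibition stage is needed to match predictive performance.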

AREADNE 2008: Research in Encoding and Decoding of Neural Ensembles, 2, pages: 67, June 2008 (poster)

Abstract

Pattern recognition methods have shown that fMRI data can reveal significant information
about brain activity. For example, in the debate of how object-categories are represented in
the brain, multivariate analysis has been used to provide evidence of distributed encoding
schemes. Many follow-up studies have employed different methods to analyze human fMRI
data with varying degrees of success. In this study we compare four popular pattern recognition
methods: correlation analysis, support-vector machines (SVM), linear discriminant analysis
and Gaussian naïve Bayes (GNB), using data collected at high field (7T) with higher resolution
than usual fMRI studies. We investigate prediction performance on single trials and for averages
across varying numbers of stimulus presentations. The performance of the various algorithms
depends on the nature of the brain activity being categorized: for several tasks,
many of the methods work well, whereas for others, no method performs above chance level.
An important factor in overall classification performance is careful preprocessing of the data,
including dimensionality reduction, voxel selection, and outlier elimination.
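The simplest of the four compared methods, correlation analysis, can be sketched as a classifier that assigns each test pattern to the class whose mean training pattern it correlates with most. The synthetic "voxel" data below is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def correlation_classifier(train_X, train_y, test_X):
    """Correlation analysis as a classifier: predict the class whose mean
    training pattern has the highest correlation with the test pattern."""
    classes = np.unique(train_y)
    means = np.array([train_X[train_y == c].mean(axis=0) for c in classes])
    preds = []
    for x in test_X:
        r = [np.corrcoef(x, m)[0, 1] for m in means]
        preds.append(classes[int(np.argmax(r))])
    return np.array(preds)

# toy data: two conditions with opposite mean activation patterns
X0 = rng.normal(0.0, 1.0, (20, 50)) + np.linspace(0, 1, 50)
X1 = rng.normal(0.0, 1.0, (20, 50)) - np.linspace(0, 1, 50)
X = np.vstack([X0, X1])
y = np.array([0] * 20 + [1] * 20)
preds = correlation_classifier(X[::2], y[::2], X[1::2])
acc = (preds == y[1::2]).mean()
```

SVM, LDA, and GNB replace the correlation score with a margin, a discriminant function, or a per-voxel Gaussian likelihood, respectively; the preprocessing steps named above (voxel selection, outlier removal) matter for all four.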

Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments, and to use this understanding to design future systems.