As most action generation problems of autonomous robots can be phrased in terms of sequential decision problems, robotics offers a tremendously important and interesting application platform for reinforcement learning. Conversely, the challenges of this domain pose a major real-world check for reinforcement learning. Hence, the interplay between both disciplines can be seen as being as promising as the one between physics and mathematics. Nevertheless, only a fraction of the scientists working on reinforcement learning are sufficiently tied to robotics to have an overview of most problems encountered in this context. Thus, we will bring the most important challenges faced by robot reinforcement learning to their attention. To achieve this goal, we will attempt to survey most work that has successfully applied reinforcement learning to behavior generation for real robots. We discuss how the presented successful approaches have been made tractable despite the complexity of the domain and will study how representations or the inclusion of prior knowledge can make a significant difference. A particular focus of our chapter therefore lies on the choice between model-based and model-free as well as between value function-based and policy search methods. As a result, we obtain a fairly complete survey of robot reinforcement learning which should allow a general reinforcement learning researcher to understand this domain.

Humans manage to adapt learned movements very quickly to new situations by generalizing learned behaviors from similar situations. In contrast, robots currently often need to re-learn the complete movement. In this paper, we propose a method that learns to generalize parametrized motor plans by adapting a small set of global parameters, called meta-parameters. We employ reinforcement learning to learn the required meta-parameters to deal with the current situation, described by states. We introduce an appropriate reinforcement learning algorithm based on a kernelized version of the reward-weighted regression. To show its feasibility, we evaluate this algorithm on a toy example and compare it to several previous approaches. Subsequently, we apply the approach to three robot tasks, i.e., the generalization of throwing movements in darts, of hitting movements in table tennis, and of throwing balls where the tasks are learned on several different real physical robots, i.e., a Barrett WAM, a BioRob, the JST-ICORP/SARCOS CBi and a Kuka KR 6.
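
The mapping from a situation (state) to a meta-parameter described above can be illustrated with a minimal sketch of a cost-regularized kernel regression mean. This is an assumed, simplified form for illustration and not the implementation evaluated in the paper: the Gaussian kernel, the names predict_meta_parameter, lam and bandwidth, and the small linear solver are all hypothetical choices. Samples with a low cost (high reward) are regularized less and therefore pull the prediction more strongly towards their meta-parameter.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Squared-exponential kernel between two state vectors (assumed choice)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def predict_meta_parameter(state, states, metas, costs, lam=0.5, bandwidth=1.0):
    """Mean prediction k(s)^T (K + lam * C)^(-1) Gamma for one meta-parameter.

    states: observed situations, metas: meta-parameters tried in them,
    costs: non-negative cost per sample (C is the diagonal matrix of costs).
    """
    n = len(states)
    K = [[gaussian_kernel(states[i], states[j], bandwidth)
          + (lam * costs[i] if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, metas)
    k = [gaussian_kernel(state, s, bandwidth) for s in states]
    return sum(ki * ai for ki, ai in zip(k, alpha))
```

With a single sample, the prediction at the sampled state shrinks from the observed meta-parameter towards zero as the cost of that sample grows, which is the regularizing effect of the cost term in this sketch.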

2011

Many motor skills in humanoid robotics can be learned using parametrized motor primitives. While successful applications to date have been achieved with imitation learning, most of the interesting motor learning problems are high-dimensional reinforcement learning problems. These problems are often beyond the reach of current reinforcement learning methods. In this paper, we study parametrized policy search methods and apply these to benchmark problems of motor primitive learning in robotics. We show that many well-known parametrized policy search methods can be derived from a general, common framework. This framework yields both policy gradient methods and expectation-maximization (EM) inspired algorithms. We introduce a novel EM-inspired algorithm for policy learning that is particularly well-suited for dynamical system motor primitives. We compare this algorithm, both in simulation and on a real robot, to several well-known parametrized policy search methods such as episodic REINFORCE, ‘Vanilla’ Policy Gradients with optimal baselines, episodic Natural Actor Critic, and episodic Reward-Weighted Regression. We show that the proposed method out-performs them on an empirical benchmark of learning dynamical system motor primitives both in simulation and on a real robot. We apply it in the context of motor learning and show that it can learn a complex Ball-in-a-Cup task on a real Barrett WAM™ robot arm.
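
The idea behind EM-inspired policy search can be illustrated with a deliberately simplified, episodic sketch: policy parameters are perturbed, each rollout is scored, and the new parameters are the old ones plus a return-weighted average of the perturbations. This is not the algorithm introduced in the paper, which is more involved; the function name reward_weighted_update and the assumption of non-negative returns are hypothetical choices made for illustration.

```python
def reward_weighted_update(theta, perturbations, returns):
    """One reward-weighted update of policy parameters.

    theta: current parameter vector.
    perturbations: one exploration offset vector per rollout.
    returns: non-negative return of each rollout (assumption of this sketch).
    """
    total = sum(returns)
    if total <= 0.0:
        # No rollout was rewarded, so there is nothing to move towards.
        return list(theta)
    return [
        t + sum(r * eps[d] for eps, r in zip(perturbations, returns)) / total
        for d, t in enumerate(theta)
    ]


# Two rollouts around theta = 0: the better one explored in the +1 direction.
new_theta = reward_weighted_update([0.0], [[1.0], [-1.0]], [3.0, 1.0])
```

Because the update is a weighted average rather than a gradient step, it needs no learning rate, which is one commonly cited attraction of EM-style updates.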

Many complex robot motor skills can be represented using elementary movements, and there exist efficient techniques for learning parametrized motor plans using demonstrations and self-improvement. However, with current techniques, the robot in many cases needs to learn a new elementary movement even if a parametrized motor plan exists that covers a related situation. A method is needed that modulates the elementary movement through the meta-parameters of its representation. In this paper, we describe how to learn such mappings from circumstances to meta-parameters using reinforcement learning. In particular, we use a kernelized version of the reward-weighted regression. We show two robot applications of the presented setup: the generalization of throwing movements in darts and of hitting movements in table tennis. We demonstrate that both tasks can be learned successfully using simulated and real robots.

Learning robots that can acquire new motor skills and refine existing ones have been a long-standing vision of robotics, artificial intelligence, and the cognitive sciences. Early steps towards this goal in the 1980s made clear that reasoning and human insights will not suffice. Instead, new hope has been offered by the rise of modern machine learning approaches. However, it has become increasingly clear that off-the-shelf machine learning approaches will not suffice for motor skill learning, as these methods often do not scale into the high-dimensional domains of manipulator and humanoid robotics, nor do they fulfill the real-time requirement of our domain. As an alternative, we propose to break the generic skill learning problem into parts that we can understand well from a robotics point of view. After designing appropriate learning approaches for these basic components, these will serve as the ingredients of a general approach to motor skill learning. In this paper, we discuss our recent and current progress in this direction. To do so, we present our work on learning to control, on learning elementary movements, and our steps towards learning of complex tasks. We show several evaluations using both real robots and physically realistic simulations.

Many complex robot motor skills can be represented using elementary movements, and there exist efficient techniques for learning parametrized motor plans using demonstrations and self-improvement. However, in many cases, the robot currently needs to learn a new elementary movement even if a parametrized motor plan exists that covers a similar, related situation. Clearly, a method is needed that modulates the elementary movement through the meta-parameters of its representation. In this paper, we show how to learn such mappings from circumstances to meta-parameters using reinforcement learning. We introduce an appropriate reinforcement learning algorithm based on a kernelized version of the reward-weighted regression. We compare this algorithm to several previous methods on a toy example and show that it performs well in comparison to standard algorithms. Subsequently, we show two robot applications of the presented setup, i.e., the generalization of throwing movements in darts, and of hitting movements in table tennis. We show that both tasks can be learned successfully using simulated and real robots.

We consider the problem of imitation learning where the examples, demonstrated by an expert, cover only a small part of a large state space. Inverse Reinforcement Learning (IRL) provides an efficient tool for generalizing the demonstration, based on the assumption that the expert is acting optimally in a Markov Decision Process (MDP). Most of the past work on IRL requires that a (near)-optimal policy can be computed for different reward functions. However, this requirement can hardly be satisfied in systems with a large, or continuous, state space. In this paper, we propose a model-free IRL algorithm, where the relative entropy between the empirical distribution of the state-action trajectories under a uniform policy and their distribution under the learned policy is minimized by stochastic gradient descent. We compare this new approach to well-known IRL algorithms using approximate MDP models. Empirical results on simulated car racing, gridworld and ball-in-a-cup problems show that our approach is able to learn good policies from a small number of demonstrations.

Playing table tennis is a difficult motor task that requires fast movements, accurate control and adaptation to task parameters. Although human beings see and move slower than most robot systems, they significantly outperform all table tennis robots. One important reason for this higher performance is the human movement generation. In this paper, we study human movements during table tennis and present a robot system that mimics human striking behavior. Our focus lies on generating hitting motions capable of adapting to variations in environmental conditions, such as changes in ball speed and position. Therefore, we model the human movements involved in hitting a table tennis ball using discrete movement stages and the virtual hitting point hypothesis. The resulting model was evaluated both in a physically realistic simulation and on a real anthropomorphic seven degrees of freedom Barrett WAM™ robot arm.

Many motor skills consist of many lower level elementary movements that need to be sequenced in order to achieve a task. In order to learn such a task, both the primitive movements and the higher-level strategy need to be acquired at the same time. In contrast, most learning approaches focus either on learning to combine a fixed set of options or on learning just single options. In this paper, we discuss a new approach that allows improving the performance of lower level actions while pursuing a higher level task. The presented approach is applicable to learning a wider range of motor skills, but in this paper, we employ it for learning games where the player wants to improve his performance at the individual actions of the game while still performing well at the strategy level game. We propose to learn the lower level actions using Cost-regularized Kernel Regression and the higher level actions using a form of Policy Iteration. The two approaches are coupled by their transition probabilities. We evaluate the approach on a side-stall-style throwing game both in simulation and with a real BioRob.

Table tennis is a sufficiently complex motor task for studying complete skill learning systems. It consists of several elementary motions and requires fast movements, accurate control, and online adaptation. To represent the elementary movements needed for robot table tennis, we rely on dynamic systems motor primitives (DMP). While such DMPs have been successfully used for learning a variety of simple motor tasks, they only represent single elementary actions. In order to select and generalize among different striking movements, we present a new approach, called Mixture of Motor Primitives that uses a gating network to activate appropriate motor primitives. The resulting policy enables us to select among the appropriate motor primitives as well as to generalize between them. In order to obtain a fully learned robot table tennis setup, we also address the problem of predicting the necessary context information, i.e., the hitting point in time and space where we want to hit the ball. We show that the resulting setup was capable of playing rudimentary table tennis using an anthropomorphic robot arm.
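
The role of a gating network that activates motor primitives can be illustrated with a minimal sketch, assuming a normalized Gaussian gate over the context (for example, the predicted hitting point) and primitives that each map a context to an output vector. The gate shape and the names gating_weights, mixture_output, centers and bandwidth are hypothetical choices for illustration and are not taken from the paper.

```python
import math


def gating_weights(context, centers, bandwidth=1.0):
    """Normalized Gaussian responsibility of each primitive for a context."""
    scores = [
        math.exp(-sum((c - m) ** 2 for c, m in zip(context, center))
                 / (2.0 * bandwidth ** 2))
        for center in centers
    ]
    total = sum(scores)
    if total == 0.0:
        # Context is far from every center: fall back to uniform weights.
        return [1.0 / len(centers)] * len(centers)
    return [s / total for s in scores]


def mixture_output(context, centers, primitives, bandwidth=1.0):
    """Blend the outputs of all primitives using the gating weights."""
    weights = gating_weights(context, centers, bandwidth)
    outputs = [primitive(context) for primitive in primitives]
    dim = len(outputs[0])
    return [sum(w * out[d] for w, out in zip(weights, outputs))
            for d in range(dim)]


# Two toy "primitives" that return constant one-dimensional outputs.
centers = [[0.0], [10.0]]
primitives = [lambda ctx: [1.0], lambda ctx: [3.0]]
near_first = mixture_output([0.0], centers, primitives)
halfway = mixture_output([5.0], centers, primitives)
```

A context close to one center effectively selects that primitive, while a context between centers yields a blend, which mirrors the selection and generalization roles described in the abstract.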
