2015

Anticipation can enhance the capability of a robot in its interaction with humans, as the robot predicts the human's intention when selecting its own action. We present a novel framework of anticipatory action selection for human-robot interaction, which is capable of handling nonlinear and stochastic human behaviors such as table tennis strokes and allows the robot to choose the optimal action based on an uncertain prediction of the human partner's intention. The presented framework is generic and can be used in many human-robot interaction scenarios, for example, in navigation and human-robot co-manipulation. In this article, we conduct a case study on human-robot table tennis. Due to the limited amount of time for executing hitting movements, a robot usually needs to initiate its hitting movement before the opponent hits the ball, which requires the robot to be anticipatory based on visual observation of the opponent's movement. Previous work on Intention-Driven Dynamics Models (IDDM) allowed the robot to predict the intended target of the opponent. In this article, we address the problem of action selection and of optimal timing for initiating a chosen action by formulating the anticipatory action selection as a Partially Observable Markov Decision Process (POMDP), where the transition and observation are modeled by the IDDM framework. We present two approaches to anticipatory action selection based on the POMDP formulation: a model-free policy learning method based on Least-Squares Policy Iteration (LSPI) that employs the IDDM for belief updates, and a model-based Monte-Carlo Planning (MCP) method, which benefits from the transition and observation models provided by the IDDM. Experimental results using real data in a simulated environment show the importance of anticipatory action selection, and that POMDPs are suitable for formulating the anticipatory action selection problem while taking into account the uncertainties in prediction.
We also show that existing algorithms for POMDPs, such as LSPI and MCP, can be applied to substantially improve the robot's performance in its interaction with humans.
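The belief-tracking machinery behind this formulation can be illustrated with a minimal discrete sketch. The matrices below are hypothetical toy placeholders; in the actual system, the transition and observation models come from the learned IDDM, not from fixed tables:

```python
import numpy as np

# Minimal discrete POMDP sketch: Bayes-filter belief update over the
# opponent's intended target, followed by greedy action selection.
# T, O, and R below are toy placeholders standing in for learned models.

def belief_update(belief, observation, T, O):
    """b'(s') proportional to O[s', obs] * sum_s T[s, s'] * b(s)."""
    predicted = T.T @ belief                 # predict next-state distribution
    updated = O[:, observation] * predicted  # weight by observation likelihood
    return updated / updated.sum()           # renormalize

def select_action(belief, R):
    """Pick the action maximizing expected immediate reward under the belief."""
    return int(np.argmax(R.T @ belief))

# Toy example: 2 intentions, 2 observations, 3 robot actions.
T = np.array([[0.9, 0.1], [0.2, 0.8]])   # intention dynamics T[s, s']
O = np.array([[0.7, 0.3], [0.2, 0.8]])   # observation likelihoods O[s, o]
R = np.array([[1.0, -1.0], [-1.0, 1.0], [0.2, 0.2]]).T  # rewards R[s, a]

b = np.array([0.5, 0.5])
b = belief_update(b, observation=0, T=T, O=O)
a = select_action(b, R)
```

The third action here plays the role of a safe "wait" option whose payoff does not depend on the intention; it is chosen only while the belief remains ambiguous.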

Playing table tennis is a difficult task for robots, especially due to their limitations of acceleration. A key bottleneck is the amount of time needed to reach the desired hitting position and velocity of the racket for returning the incoming ball. Here, it often does not suffice to simply extrapolate the ball's trajectory after the opponent returns it but more information is needed. Humans are able to predict the ball's trajectory based on the opponent's moves and, thus, have a considerable advantage. Hence, we propose to incorporate an anticipation system into robot table tennis players, which enables the robot to react earlier while the opponent is performing the striking movement. Based on visual observation of the opponent's racket movement, the robot can predict the aim of the opponent and adjust its movement generation accordingly. The policies for deciding how and when to react are obtained by reinforcement learning. We conduct experiments with an existing robot player to show that the learned reaction policy can significantly improve the performance of the overall system.

Policy search is a successful approach to reinforcement learning. However, policy improvements often result in the loss of information. Hence, policy search has been marred by premature convergence and implausible solutions. As first suggested in the context of covariant or natural policy gradients, many of these problems may be addressed by constraining the information loss. In this paper, we continue this path of reasoning and suggest two reinforcement learning methods, i.e., a model-based and a model-free algorithm, that bound the loss in relative entropy while maximizing their return. The resulting methods differ significantly from previous policy gradient approaches and yield an exact update step. They work well on typical reinforcement learning benchmark problems as well as on novel evaluations in robotics. We also present a motivation of this new approach based on a Bayesian bound [8].

Robots that can acquire new motor skills and refine existing ones have been a long-standing vision of robotics, artificial intelligence, and the cognitive sciences. Early steps towards this goal in the 1980s made clear that reasoning and human insight alone will not suffice. Instead, new hope has been offered by the rise of modern machine learning approaches. However, it has become increasingly clear that off-the-shelf machine learning approaches will not suffice for motor skill learning, as these methods often do not scale to the high-dimensional domains of manipulator and humanoid robotics, nor do they fulfill the real-time requirements of our domain. As an alternative, we propose to break the generic skill learning problem into parts that we understand well from a robotics point of view. After designing appropriate learning approaches for these basic components, they will serve as the ingredients of a general approach to motor skill learning. In this paper, we discuss our recent and current progress in this direction. To do so, we present our work on learning to control, on learning elementary movements, as well as our steps towards learning complex tasks. We show several evaluations using both real robots and physically realistic simulations.

Opponent modeling is a critical mechanism in repeated games. It allows a player to adapt its strategy in order to better respond to the presumed preferences of its opponents. We introduce a new modeling technique that adaptively balances exploitability and risk reduction. An opponent's strategy is modeled with a set of possible strategies that contains the actual strategy with high probability. The algorithm is safe, as its expected payoff is above the minimax payoff with high probability, and it can exploit the opponents' preferences once sufficient observations have been obtained. We apply it to normal-form games and stochastic games with a finite number of stages. The performance of the proposed approach is first demonstrated on repeated rock-paper-scissors games. Subsequently, the approach is evaluated in a human-robot table-tennis setting where the robot player learns to prepare to return a served ball. By modeling the human players, the robot chooses a forehand, backhand, or middle preparation pose before the human serves. The learned strategies can exploit the opponent's preferences, leading to a higher rate of successful returns.
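The safe-exploitation idea can be roughly illustrated on repeated rock-paper-scissors. The sketch below is a simplified stand-in, not the paper's exact construction: it uses a Hoeffding-style confidence radius around the empirical opponent mix, falling back to the minimax (uniform) strategy while the model is still uncertain:

```python
import numpy as np

# Hedged sketch of safe opponent exploitation in rock-paper-scissors:
# play the minimax (uniform) strategy until enough observations shrink
# a confidence set around the opponent's empirical mix, then best-respond.

PAYOFF = np.array([[ 0, -1,  1],   # rows: our move (R, P, S)
                   [ 1,  0, -1],   # cols: opponent move (R, P, S)
                   [-1,  1,  0]])

def choose_strategy(opponent_counts, delta=0.05):
    n = opponent_counts.sum()
    if n == 0:
        return np.ones(3) / 3.0             # no data: safe uniform play
    p_hat = opponent_counts / n
    # Hoeffding-style radius of the confidence set around p_hat.
    radius = np.sqrt(np.log(2.0 / delta) / (2.0 * n))
    if radius > 0.1:                        # set still too large: stay safe
        return np.ones(3) / 3.0
    best = np.argmax(PAYOFF @ p_hat)        # best response to the model
    strategy = np.zeros(3)
    strategy[best] = 1.0
    return strategy

# An opponent who overplays rock: with enough data we best-respond with paper.
counts = np.array([700, 200, 100])
s = choose_strategy(counts)
```

The threshold on the radius is an illustrative knob; the safety property in the abstract hinges on only deviating from minimax play when the confidence set is small enough.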

Playing table tennis is a difficult motor task that requires fast movements, accurate control and adaptation
to task parameters. Although human beings see and move more slowly than most robot systems, they significantly
outperform all table tennis robots. One important reason for this higher performance is the human movement
generation. In this paper, we study human movements during table tennis and present a robot system that mimics
human striking behavior. Our focus lies on generating hitting motions capable of adapting to variations in environmental conditions, such as changes in ball speed and position. Therefore, we model the human movements
involved in hitting a table tennis ball using discrete movement stages and the virtual hitting point hypothesis.
The resulting model was evaluated both in a physically realistic simulation and on a real anthropomorphic seven
degrees of freedom Barrett WAM™ robot arm.
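The virtual hitting point hypothesis used in this model can be illustrated with a toy ballistic prediction. The plane location and ball state below are hypothetical, and air drag and spin are deliberately ignored:

```python
import numpy as np

# Toy virtual-hitting-point sketch: forward-simulate a ballistic ball
# model until it crosses a fixed hitting plane (here x = x_hit), and use
# the crossing point and time to parameterize the stroke. Drag and spin
# are ignored in this simplified model.

G = np.array([0.0, 0.0, -9.81])   # gravity (m/s^2)

def virtual_hitting_point(p0, v0, x_hit, dt=0.001, t_max=2.0):
    """Euler-integrate the ball until it crosses the plane x = x_hit."""
    p, v, t = p0.astype(float), v0.astype(float), 0.0
    while t < t_max:
        p = p + v * dt
        v = v + G * dt
        t += dt
        if p[0] >= x_hit:
            return p, t            # predicted hitting point and time
    return None, None              # ball never reaches the plane

p0 = np.array([-1.0, 0.0, 0.3])   # initial ball position (m)
v0 = np.array([2.0, 0.0, 3.0])    # initial ball velocity (m/s)
hit_p, hit_t = virtual_hitting_point(p0, v0, x_hit=0.0)
```

The predicted time also matters: the discrete movement stages (preparation, stroke, follow-through) must be timed so that the racket reaches the virtual hitting point at exactly that moment.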

Table tennis is a sufficiently complex motor task
for studying complete skill learning systems. It consists of several
elementary motions and requires fast movements, accurate
control, and online adaptation. To represent the elementary
movements needed for robot table tennis, we rely on dynamical
systems motor primitives (DMPs). While such DMPs have been
successfully used for learning a variety of simple motor tasks,
they only represent single elementary actions. In order to select
and generalize among different striking movements, we present
a new approach, called Mixture of Motor Primitives that uses
a gating network to activate appropriate motor primitives. The
resulting policy enables us to select among the appropriate
motor primitives as well as to generalize between them. In
order to obtain a fully learned robot table tennis setup, we
also address the problem of predicting the necessary context
information, i.e., the hitting point in time and space where
we want to hit the ball. We show that the resulting setup
was capable of playing rudimentary table tennis using an
anthropomorphic robot arm.
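The gating idea can be sketched as follows. This is a simplified, hedged illustration rather than the paper's exact formulation: each stored primitive is associated with the context (e.g., hitting point) in which it was demonstrated, a Gaussian gating network weights primitives by context similarity, and the weighted average of their parameters forms the executed movement:

```python
import numpy as np

# Hedged mixture-of-motor-primitives sketch: a Gaussian gating network
# assigns responsibilities to stored primitives based on the query
# context, and the primitive parameters are blended accordingly.

def gate(context, primitive_contexts, bandwidth=0.1):
    """Normalized Gaussian responsibilities over the stored primitives."""
    d2 = np.sum((primitive_contexts - context) ** 2, axis=1)
    w = np.exp(-0.5 * d2 / bandwidth**2)
    return w / w.sum()

def mix_primitives(context, primitive_contexts, primitive_params):
    """Convex combination of primitive parameters under the gating weights."""
    weights = gate(context, primitive_contexts)
    return weights @ primitive_params

# Toy setup: 3 primitives indexed by 2-D hitting points, 4 parameters each.
contexts = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]])
params = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0]])
theta = mix_primitives(np.array([0.5, 0.5]), contexts, params)
```

Querying at a context between two stored hitting points yields an interpolated parameter vector, which is what enables generalization between demonstrated strokes.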

Policy search is a successful approach to reinforcement
learning. However, policy improvements often result
in the loss of information. Hence, it has been marred
by premature convergence and implausible solutions.
As first suggested in the context of covariant policy
gradients (Bagnell and Schneider 2003), many of these
problems may be addressed by constraining the information
loss. In this paper, we continue this path of reasoning
and suggest the Relative Entropy Policy Search
(REPS) method. The resulting method differs significantly
from previous policy gradient approaches and
yields an exact update step. It works well on typical
reinforcement learning benchmark problems.
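The core of the REPS update can be sketched as follows. This is a simplified, hedged illustration (a crude one-dimensional search for the temperature from the dual, applied to sampled returns without a state-dependent baseline): the relative-entropy bound epsilon fixes a temperature eta via the dual problem, and samples are then reweighted with exponentiated returns.

```python
import numpy as np

# Hedged REPS-style reweighting sketch: find the temperature eta that
# minimizes the dual for a given KL bound epsilon, then compute the
# normalized exponential weights used for the policy update.

def reps_weights(returns, epsilon=0.5):
    R = returns - returns.max()               # numerical stabilization
    def dual(eta):
        return eta * epsilon + eta * np.log(np.mean(np.exp(R / eta)))
    etas = np.linspace(0.05, 10.0, 2000)      # crude 1-D search over eta
    eta = etas[np.argmin([dual(e) for e in etas])]
    w = np.exp(R / eta)
    return w / w.sum()                        # weights for the ML update

returns = np.array([1.0, 2.0, 3.0, 4.0])
w = reps_weights(returns)
```

The resulting weights are monotone in the return, and the bound keeps the weighted distribution from collapsing onto the single best sample, which is exactly the premature-convergence failure mode the method is designed to avoid.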

Policy search is a successful approach to reinforcement learning. However, policy
improvements often result in the loss of information. Hence, it has been marred by
premature convergence and implausible solutions. As first suggested in the context of
covariant policy gradients, many of these problems may be addressed by constraining
the information loss. In this book chapter, we continue this path of reasoning and suggest
the Relative Entropy Policy Search (REPS) method. The resulting method differs
significantly from previous policy gradient approaches and yields an exact update step.
It works well on typical reinforcement learning benchmark problems. We also
present a real-world application where a robot employs REPS to learn how to return balls in a game of table tennis.

Playing table tennis is a difficult motor task which requires
fast movements, accurate control and adaptation to task parameters.
Although human beings see and move more slowly than most robot systems,
they significantly outperform all table tennis robots. In this paper, we
study human table tennis and present a robot system that mimics human
striking behavior. Therefore, we model the human movements involved
in hitting a table tennis ball using discrete movement stages and the
virtual hitting point hypothesis. The resulting model is implemented on
an anthropomorphic robot arm with 7 degrees of freedom using robotics
methods. We verify the functionality of the model both in a physically realistic
simulation of an anthropomorphic robot arm and on a real Barrett
WAM.

Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments, and to use this understanding to design future systems.