Current robotics research is largely driven by the vision of creating an intelligent being that can perform dangerous, difficult or unpopular tasks. These can, for example, be exploring the surface of planet Mars or the bottom of the ocean, maintaining a furnace or assembling a car. They can also be more mundane, such as cleaning an apartment or fetching groceries. This vision has been pursued since the 1960s, when the first robots were built. Some of the tasks mentioned above, especially those in industrial manufacturing, are already frequently performed by robots. Others are still completely out of reach. Household robots, in particular, are far from being deployable as general-purpose devices. Although advancements have been made in this research area, robots are not yet able to perform household chores robustly in unstructured and open-ended environments, given unexpected events and uncertainty in perception and execution.

In this thesis, we analyze which perceptual and motor capabilities are necessary for a robot to perform common tasks in a household scenario. In that context, an essential capability is to understand the scene that the robot has to interact with. This involves separating objects from the background, but also from each other. Once this is achieved, many other tasks become much easier: the configuration of objects can be determined; they can be identified or categorized; their pose can be estimated; free and occupied space in the environment can be outlined. This kind of scene model can then inform grasp-planning algorithms to finally pick up objects. However, scene understanding is not a trivial problem, and even state-of-the-art methods may fail.
Given an incomplete, noisy and potentially erroneously segmented scene model, the questions remain of how suitable grasps can be planned and how they can be executed robustly. In this thesis, we propose to equip the robot with a set of prediction mechanisms that allow it to hypothesize about parts of the scene it has not yet observed. Additionally, the robot can quantify how uncertain it is about these predictions, allowing it to plan actions for exploring the scene at especially uncertain places. We consider multiple modalities including monocular and stereo vision, haptic sensing and information obtained through a human-robot dialog system. We also study several scene representations of different complexity and their applicability to a grasping scenario. Given an improved scene model from this multi-modal exploration, grasps can be inferred for each object hypothesis. Depending on whether the objects are known, familiar or unknown, different methodologies for grasp inference apply. In this thesis, we propose novel methods for each of these cases. Furthermore, we demonstrate the execution of these grasps in both a closed- and open-loop manner, showing the effectiveness of the proposed methods in real-world scenarios.

We consider the problem of grasp and manipulation planning when the state of the world is only partially observable. Specifically, we address the task of picking up unknown objects from a table top. The proposed approach to object shape prediction aims at closing the knowledge gaps in the robot's understanding of the world. A completed state estimate of the environment can then be provided to a simulator in which stable grasps and collision-free movements are planned. The proposed approach is based on the observation that many objects commonly used in a service-robotics scenario possess symmetries. We search for the optimal parameters of these symmetries given visibility constraints. Once found, the point cloud is completed and a surface mesh reconstructed. Quantitative experiments show that the predictions are valid approximations of the real object shape. By demonstrating the approach on two very different robotic platforms, we emphasize its generality.
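The core of this shape-completion idea can be sketched as a mirroring operation: reflect the observed partial point cloud across a candidate symmetry plane to hypothesize the unseen back side. This is a minimal NumPy illustration with made-up points; in the actual approach the plane parameters are optimized under visibility constraints, which this sketch omits.

```python
import numpy as np

def mirror_points(points, plane_point, plane_normal):
    """Reflect a partial point cloud across a candidate symmetry plane.

    points: (N, 3) array of observed surface points.
    plane_point, plane_normal: a point on the plane and its normal vector.
    Returns the reflected points, which hypothesize the unseen back side.
    """
    n = plane_normal / np.linalg.norm(plane_normal)
    d = (points - plane_point) @ n          # signed distance of each point to the plane
    return points - 2.0 * d[:, None] * n    # reflect each point through the plane

# Toy example: one face of a box seen from the front, mirrored about x = 0.5.
observed = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
completed = mirror_points(observed, np.array([0.5, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]))
print(completed)  # mirrored copies at x = 1.0
```

The union of `observed` and `completed` would then be handed to surface-mesh reconstruction.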

We propose a novel human-robot interaction framework for robust visual scene understanding. Without any a priori knowledge about the objects, the task of the robot is to correctly enumerate how many of them are in the scene and segment them from the background. Our approach builds on top of state-of-the-art computer vision methods, generating object hypotheses through segmentation. This process is combined with a natural dialog system, thus including a `human in the loop': by exploiting natural conversation with an advanced dialog system, the robot gains knowledge about ambiguous situations. We present an entropy-based system allowing the robot to detect the poorest object hypotheses and query the user for arbitration. Based on the information obtained from the human-robot dialog, the scene segmentation can be re-seeded and thereby improved. We present experimental results on real data that show an improved segmentation performance compared to segmentation without interaction.
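The entropy-based arbitration described above can be illustrated with a small sketch (hypothetical numbers; in the actual system the distributions would come from the segmentation's object hypotheses): compute the Shannon entropy of each hypothesis' label distribution and query the human about the most uncertain one.

```python
import numpy as np

def hypothesis_entropy(probs):
    """Shannon entropy (in nats) of a per-hypothesis label distribution."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]                         # 0 * log(0) is taken as 0
    return float(-(p * np.log(p)).sum())

def pick_query(hypotheses):
    """Return the index of the most ambiguous object hypothesis."""
    return int(np.argmax([hypothesis_entropy(h) for h in hypotheses]))

# Hypothesis 0 is confident, hypothesis 1 is ambiguous -> query the human about 1.
hyps = [[0.95, 0.05], [0.5, 0.5]]
print(pick_query(hyps))  # 1
```

The answer obtained from the dialog would then re-seed the segmentation for the queried region.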

In IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, May 9-13, 2011.

Abstract

We present a new approach to motion planning using a stochastic trajectory optimization framework. The approach relies on generating noisy trajectories to explore the space around an initial (possibly infeasible) trajectory, which are then combined to produce an updated trajectory with lower cost. A cost function based on a combination of obstacle and smoothness costs is optimized in each iteration. No gradient information is required for the particular optimization algorithm that we use, so general costs for which derivatives may not be available (e.g. costs corresponding to constraints and motor torques) can be included in the cost function. We demonstrate the approach both in simulation and on a dual-arm mobile manipulation system for unconstrained and constrained tasks. We experimentally show that the stochastic nature of STOMP allows it to overcome local minima that gradient-based optimizers like CHOMP can get stuck in.
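The gradient-free update at the heart of this scheme can be sketched as follows. This is a simplified, unofficial toy version: the actual STOMP algorithm uses correlated per-timestep noise shaped by a smoothness precision matrix and per-timestep cost weighting, both of which this one-dimensional sketch omits.

```python
import numpy as np

def stomp_update(theta, cost_fn, rng, n_samples=20, noise_std=0.1, h=10.0):
    """One STOMP-style iteration: sample noisy trajectories around theta,
    weight each sample by its exponentiated negative cost, and combine the
    noise into a gradient-free update with lower expected cost."""
    eps = rng.normal(0.0, noise_std, size=(n_samples,) + theta.shape)
    costs = np.array([cost_fn(theta + e) for e in eps])
    w = np.exp(-h * (costs - costs.min()))     # low-cost samples dominate
    w /= w.sum()
    return theta + np.tensordot(w, eps, axes=1)

# Toy cost: squared distance of a 3-waypoint "trajectory" from [1, 2, 3].
target = np.array([1.0, 2.0, 3.0])
cost = lambda th: float(np.sum((th - target) ** 2))
rng = np.random.default_rng(0)
theta = np.zeros(3)
for _ in range(300):
    theta = stomp_update(theta, cost, rng)
print(theta)  # close to [1, 2, 3]
```

Note that `cost` is only ever evaluated, never differentiated, which is what allows non-smooth constraint and torque costs in the full method.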

Developing robots capable of fine manipulation skills is of major importance in order to build truly assistive robots. These robots need to be compliant in their actuation and control in order to operate safely in human environments. Manipulation tasks imply complex contact interactions with the external world, and involve reasoning about the forces and torques to be applied. Planning under contact conditions is usually impractical due to computational complexity and a lack of precise dynamics models of the environment. We present an approach to acquiring manipulation skills on compliant robots through reinforcement learning. The initial position control policy for manipulation is initialized through kinesthetic demonstration. We augment this policy with a force/torque profile to be controlled in combination with the position trajectories. We use the Policy Improvement with Path Integrals (PI2) algorithm to learn these force/torque profiles by optimizing a cost function that measures task success. We demonstrate our approach on the Barrett WAM robot arm equipped with a 6-DOF force/torque sensor on two different manipulation tasks: opening a door with a lever door handle, and picking up a pen off the table. We show that the learned force control policies allow successful, robust execution of the tasks.
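A force/torque profile of this kind can be parameterized, for example, as a weighted sum of basis functions over normalized time, so that the weights become the policy parameters that the learning algorithm perturbs and updates. This is a hypothetical one-dimensional parameterization for illustration, not the paper's exact one.

```python
import numpy as np

def force_profile(weights, t, centers, width=0.05):
    """Desired force as a normalized weighted sum of Gaussian basis functions
    over time t in [0, 1]; the weights are the learnable policy parameters."""
    phi = np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2.0 * width ** 2))
    phi /= phi.sum(axis=1, keepdims=True)   # normalize basis activations per timestep
    return phi @ weights

t = np.linspace(0.0, 1.0, 100)
centers = np.linspace(0.0, 1.0, 10)
weights = np.zeros(10)   # start from zero extra force on top of the demonstrated positions
weights[5] = 4.0         # e.g. press harder mid-task, as learning might discover
f = force_profile(weights, t, centers)
print(f.max())
```

In a learning loop, noisy perturbations of `weights` would be rolled out on the robot and recombined according to the measured task cost.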

Path integral methods [7], [15], [1] have recently been shown to be applicable to a very general class of optimal control problems. Here we examine the path integral formalism from a decision-theoretic point of view, since an optimal controller can always be regarded as an instance of a perfectly rational decision-maker that chooses its actions so as to maximize its expected utility [8]. The problem with perfect rationality is, however, that finding optimal actions is often very difficult due to prohibitive computational resource costs that are not taken into account. In contrast, a bounded rational decision-maker has only limited resources and therefore needs to strike some compromise between the desired utility and the required resource costs [14]. In particular, we suggest an information-theoretic measure of resource costs that can be derived axiomatically [11]. As a consequence we obtain a variational principle for choice probabilities that trades off maximizing a given utility criterion against avoiding resource costs that arise due to deviating from initially given default choice probabilities. The resulting bounded rational policies are in general probabilistic. We show that the solutions found by the path integral formalism are such bounded rational policies. Furthermore, we show that the same formalism generalizes to discrete control problems, leading to linearly solvable bounded rational control policies in the case of Markov systems. Importantly, Bellman's optimality principle is not presupposed by this variational principle, but can be derived as a limit case. This suggests that the information-theoretic formalization of bounded rationality might serve as a general principle in control design that unifies a number of recently reported approximate optimal control methods, both in the continuous and discrete domain.
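In the discrete case, the variational principle described here has a closed-form solution: the choice probabilities are the default policy exponentially tilted by the utility, with an inverse temperature trading off utility against the information-theoretic resource cost. A small sketch with made-up utilities and a uniform default policy:

```python
import numpy as np

def bounded_rational_policy(utilities, prior, beta):
    """Maximize E[U] - (1/beta) * KL(p || prior); the optimum is the prior
    tilted by exp(beta * U), i.e. prior-weighted softmax action selection."""
    p = np.asarray(prior, dtype=float) * np.exp(beta * np.asarray(utilities, dtype=float))
    return p / p.sum()

U = np.array([1.0, 0.5, 0.0])
p0 = np.ones(3) / 3
print(bounded_rational_policy(U, p0, beta=0.0))   # no resources: keep the default policy
print(bounded_rational_policy(U, p0, beta=50.0))  # near-perfect rationality: argmax of U
```

The two limits illustrate the paper's point: perfect rationality (deterministic argmax) is recovered only as the resource constraint vanishes, while intermediate values of beta yield genuinely probabilistic policies.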

In IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, May 9-13, 2011.

Abstract

Learning complex motor skills for real-world tasks is a hard problem in robotic manipulation that often requires painstaking manual tuning and design by a human expert. In this work, we present a Reinforcement Learning based approach to acquiring new motor skills from demonstration. Our approach allows the robot to learn fine manipulation skills and significantly improve its success rate and skill level starting from a possibly coarse demonstration. Our approach aims to incorporate task domain knowledge, where appropriate, by working in a space consistent with the constraints of a specific task. In addition, we also present an approach to using sensor feedback to learn a predictive model of the task outcome. This allows our system to learn the proprioceptive sensor feedback needed to monitor subsequent executions of the task online and abort execution in the event of predicted failure. We illustrate our approach using two example tasks executed with the PR2 dual-arm robot: a straight and accurate pool stroke and a box flipping task using two chopsticks as tools.

In Proceedings of the 18th World Congress of the International Federation of Automatic Control, 2011.

Abstract

Recent work on path integral stochastic optimal control theory (Theodorou et al. (2010a); Theodorou (2011)) has shown promising results in planning and control of nonlinear systems in high-dimensional state spaces. The path integral control framework relies on the transformation of the nonlinear Hamilton-Jacobi-Bellman (HJB) partial differential equation (PDE) into a linear PDE and the approximation of its solution via the Feynman-Kac lemma. In this work, we review the generalized version of the path integral stochastic optimal control formalism (Theodorou et al. (2010a)), used for optimal control and planning of stochastic dynamical systems with state-dependent control and diffusion matrices. Moreover, we present the iterative path integral control approach, the so-called Policy Improvement with Path Integrals (PI2), which is capable of scaling to high-dimensional robotic control problems. Furthermore, we present a convergence analysis of the proposed algorithm and apply the proposed framework to a variety of robotic tasks. Finally, with the goal of performing locomotion, the iterative path integral control is applied to learning nonlinear limit cycle attractors with adjustable landscape.

Segmenting complex movements into a sequence of primitives remains a difficult problem with many applications in the robotics and vision communities. In this work, we show how the movement segmentation problem can be reduced to a sequential movement recognition problem. To this end, we reformulate the original Dynamic Movement Primitive (DMP) formulation as a linear dynamical system with control inputs. Based on this new formulation, we develop an Expectation-Maximization algorithm to estimate the duration and goal position of a partially observed trajectory. With the help of this algorithm and the assumption that a library of movement primitives is present, we present a movement segmentation framework. We illustrate the usefulness of the new DMP formulation on the two applications of online movement recognition and movement segmentation.
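For reference, the transformation system underlying a DMP is a damped spring toward the goal; the reformulation above rewrites exactly this kind of dynamics (plus the learned forcing term and the canonical phase variable) as a linear dynamical system so that EM-based inference over goal and duration becomes tractable. A minimal integration sketch of the spring alone, without the forcing term:

```python
def integrate_dmp(y0, goal, alpha=25.0, beta=6.25, dt=0.001, steps=1000):
    """Euler-integrate a unit-time-constant DMP transformation system:
    ydd = alpha * (beta * (goal - y) - yd), a critically damped spring
    (beta = alpha / 4) that converges to the goal position."""
    y, yd = float(y0), 0.0
    for _ in range(steps):
        ydd = alpha * (beta * (goal - y) - yd)
        yd += ydd * dt
        y += yd * dt
    return y

print(integrate_dmp(0.0, 1.0))  # converges to the goal, ~1.0
```

Because the goal enters these dynamics linearly, observing part of a trajectory constrains it linearly as well, which is what the EM algorithm exploits.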

The development of agile and safe humanoid robots requires controllers that guarantee both high tracking performance and compliance with the environment. More specifically, the control of contact interaction is of crucial importance for robots that will actively interact with their environment. Model-based controllers such as inverse dynamics or operational space control are very appealing as they offer both high tracking performance and compliance. However, while widely used for fully actuated systems such as manipulators, they are not yet standard controllers for legged robots such as humanoids. Indeed, such robots are fundamentally different from manipulators as they are underactuated due to their floating base and subject to switching contact constraints. In this paper we present an inverse dynamics controller for legged robots that uses torque redundancy to create an optimal distribution of contact constraints. The resulting controller is able to minimize, given a desired motion, any quadratic cost of the contact constraints at each instant of time. In particular we show how this can be used to minimize tangential forces during locomotion, therefore significantly improving the locomotion of legged robots on difficult terrains. In addition to the theoretical result, we present simulations of a humanoid and a quadruped robot, as well as experiments on a real quadruped robot that demonstrate the advantages of the controller.

In Proceedings of the American Control Conference (ACC), 2011.

Abstract

With the goal of building robotic hands that can reach the levels of dexterity and robustness of the human hand, the question of which candidate control principles can handle the nonlinearities, the high dimensionality and the internal noise of biomechanical structures of the complexity of the hand is still open. In this work we present the first stochastic optimal feedback controller applied to a fully tendon-driven simulated robotic index finger. In our model we take into account the full tendon structure of the index finger, which consists of 11 tendons based on the underlying physiology, and we consider muscles with typical force-length and force-velocity properties. Our feedback controller shows robustness against noise and perturbations of the dynamics, while it can also successfully handle the nonlinearities and high dimensionality of the robotic index finger. Furthermore, as shown in the evaluations, it provides the complete time history of the tendon excursions and tendon velocities of the index finger for the tasks of tapping with zero and nonzero terminal velocities.

For complex robots such as humanoids, model-based control is highly beneficial for accurate tracking while keeping negative feedback gains low for compliance. However, in such multi degree-of-freedom lightweight systems, conventional identification of rigid body dynamics models using CAD data and actuator models is inaccurate due to unknown nonlinear robot dynamic effects. An alternative method is data-driven parameter estimation, but significant noise in measured and inferred variables affects it adversely. Moreover, standard estimation procedures may give physically inconsistent results due to unmodeled nonlinearities or insufficiently rich data. This paper addresses these problems, proposing a Bayesian system identification technique for linear or piecewise linear systems. Inspired by Factor Analysis regression, we develop a computationally efficient variational Bayesian regression algorithm that is robust to ill-conditioned data, automatically detects relevant features, and identifies input and output noise. We evaluate our approach on rigid body parameter estimation for various robotic systems, achieving errors up to three times lower than those of other state-of-the-art machine learning methods.
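As a much-simplified stand-in for the variational Bayesian regression described above, the following sketch shows the basic mechanism that makes Bayesian estimation robust to ill-conditioned data: a Gaussian prior on the parameters regularizes directions the data does not constrain. The paper's algorithm additionally infers noise levels and feature relevances, which are fixed by hand here.

```python
import numpy as np

def bayesian_ridge(X, y, alpha=1.0, sigma2=0.1):
    """Posterior mean of Bayesian linear regression with a Gaussian weight
    prior N(0, 1/alpha) and observation noise variance sigma2. The prior term
    keeps the system solvable even when X'X is (nearly) singular."""
    d = X.shape[1]
    A = X.T @ X / sigma2 + alpha * np.eye(d)       # posterior precision
    return np.linalg.solve(A, X.T @ y / sigma2)    # posterior mean

# Ill-conditioned design: two nearly identical columns, where ordinary least
# squares would produce huge opposite-signed weights.
X = np.array([[1.0, 1.0], [1.0, 1.0000001], [2.0, 2.0]])
y = np.array([1.0, 1.0, 2.0])
w = bayesian_ridge(X, y)
print(w)  # both weights shrink to roughly equal values near 0.5
```

The full method replaces the fixed `alpha` and `sigma2` with distributions inferred from the data, which is what enables automatic feature detection and noise identification.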

Applying model-free reinforcement learning to manipulation remains challenging for several reasons. First, manipulation involves physical contact, which causes discontinuous cost functions. Second, in manipulation, the end-point of the movement must be chosen carefully, as it represents a grasp which must be adapted to the pose and shape of the object. Finally, there is uncertainty in the object pose, and even the most carefully planned movement may fail if the object is not at the expected position.
To address these challenges we 1) present a simplified, computationally more efficient version of our model-free reinforcement learning algorithm PI2; 2) extend PI2 so that it simultaneously learns shape parameters and goal parameters of motion primitives; 3) use shape and goal learning to acquire motion primitives that are robust to object pose uncertainty. We evaluate these contributions on a manipulation platform consisting of a 7-DOF arm with a 4-DOF hand.

We present an approach that enables robots to learn motion primitives that are robust towards state estimation uncertainties. During reaching and preshaping, the robot learns to use fine manipulation strategies to maneuver the object into a pose at which closing the hand to perform the grasp is more likely to succeed. In contrast, common assumptions in grasp planning and motion planning for reaching are that these tasks can be performed independently, and that the robot has perfect knowledge of the pose of the objects in the environment. We implement our approach using Dynamic Movement Primitives and the probabilistic model-free reinforcement learning algorithm Policy Improvement with Path Integrals (PI2). The cost function that PI2 optimizes is a simple Boolean value that penalizes failed grasps. The key to acquiring robust motion primitives is to sample the actual pose of the object from a distribution that represents the state estimation uncertainty. During learning, the robot will thus optimize the chance of grasping an object from this distribution, rather than at one specific pose. In our empirical evaluation, we demonstrate how the motion primitives become more robust when grasping simple cylindrical objects, as well as more complex, non-convex objects. We also investigate how well the learned motion primitives generalize towards new object positions and other state estimation uncertainty distributions.
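The effect of sampling the object pose from the uncertainty distribution can be quantified with a small Monte Carlo sketch. The numbers and the one-dimensional success model are entirely hypothetical; the real system evaluates actual grasp executions on the robot.

```python
import numpy as np

def success_rate(grasp_center, pose_std, tolerance, n=10000, rng=None):
    """Monte Carlo estimate of grasp success under pose uncertainty: the
    object's true position is drawn from a Gaussian around the estimate, and
    a (hypothetical) grasp succeeds if the object falls within the hand's
    tolerance of the grasp center."""
    rng = rng or np.random.default_rng(0)
    true_pos = rng.normal(0.0, pose_std, size=n)   # sampled object poses (1-D)
    return float(np.mean(np.abs(true_pos - grasp_center) < tolerance))

# Optimizing over the pose distribution favors grasps centered on the estimate
# over grasps tuned to one specific (offset) pose.
print(success_rate(0.0, pose_std=0.02, tolerance=0.03))   # high success
print(success_rate(0.05, pose_std=0.02, tolerance=0.03))  # much lower success
```

Learning against this kind of expectation, rather than against a single assumed pose, is what produces motion primitives that remain successful when the state estimate is wrong.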

In IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, May 9-13, 2011.

Abstract

Inverse dynamics controllers and operational space controllers have proved to be very efficient for compliant control of fully actuated robots such as fixed-base manipulators. However, legged robots such as humanoids are inherently different, as they are underactuated and subject to switching external contact constraints. Recently, several methods have been proposed to create inverse dynamics controllers and operational space controllers for these robots. In an attempt to compare these different approaches, we develop a general framework for inverse dynamics control and show that these methods lead to very similar controllers. We are then able to greatly simplify recent whole-body controllers based on operational space approaches using kinematic projections, bringing them closer to efficient practical implementations. We also generalize these controllers such that they can be optimal under an arbitrary quadratic cost in the commands.

One of the hallmarks of the performance, versatility, and robustness of biological motor control is the ability to adapt the impedance of the overall biomechanical system to different task requirements and stochastic disturbances. A transfer of this principle to robotics is desirable, for instance to enable robots to work robustly and safely in everyday human environments. It is, however, not trivial to derive variable impedance controllers for practical high degree-of-freedom (DOF) robotic tasks. In this contribution, we accomplish such variable impedance control with the reinforcement learning (RL) algorithm PI2 (Policy Improvement with Path Integrals). PI2 is a model-free, sampling-based learning method derived from first principles of stochastic optimal control. The PI2 algorithm requires no tuning of algorithmic parameters besides the exploration noise. The designer can thus fully focus on cost function design to specify the task. From the viewpoint of robotics, a particularly useful property of PI2 is that it can scale to problems of many DOFs, so that reinforcement learning on real robotic systems becomes feasible. We sketch the PI2 algorithm and its theoretical properties, and how it is applied to gain scheduling for variable impedance control. We evaluate our approach by presenting results on several simulated and real robots. We consider tasks involving accurate tracking through via-points, and manipulation tasks requiring physical contact with the environment. In these tasks, the optimal strategy requires both tuning of a reference trajectory and the impedance of the end-effector. The results show that we can use path integral based reinforcement learning not only for planning but also to derive variable gain feedback controllers in realistic scenarios. Thus, the power of variable impedance control is made available to a wide variety of robotic systems and practical applications.

