Social robots or collaborative robots that have to interact with people in a reactive way are difficult to program. This difficulty stems from the different skills required by the programmer: to provide an engaging user experience the behavior must include a sense of aesthetics while robustly operating in a continuously changing environment. The Playful framework allows composing such dynamic behaviors using a basic set of action and perception primitives. Within this framework, a behavior is encoded as a list of declarative statements corresponding to high-level sensory-motor couplings. To facilitate non-expert users to program such behaviors, we propose a Learning from Demonstration (LfD) technique that maps motion capture of humans directly to a Playful script. The approach proceeds by identifying the sensory-motor couplings that are active at each step using the Viterbi path in a Hidden Markov Model (HMM). Given these activation patterns, binary classifiers called evaluations are trained to associate activations to sensory data. Modularity is increased by clustering the sensory-motor couplings, leading to a hierarchical tree structure. The novelty of the proposed approach is that the learned behavior is encoded not in terms of trajectories in a task space, but as couplings between sensory information and high-level motor actions. This provides advantages in terms of behavioral generalization and reactivity displayed by the robot.

Proceedings of the 56th IEEE Annual Conference on Decision and Control (CDC), pages: 5193-5200, IEEE, IEEE Conference on Decision and Control, December 2017 (conference)

Abstract

Finding optimal feedback controllers for nonlinear dynamic systems from data
is hard. Recently, Bayesian optimization (BO) has been proposed as a powerful
framework for direct controller tuning from experimental trials. For selecting
the next query point and finding the global optimum, BO relies on a
probabilistic description of the latent objective function, typically a
Gaussian process (GP). As is shown herein, GPs with a common kernel choice can,
however, lead to poor learning outcomes on standard quadratic control problems.
For a first-order system, we construct two kernels that specifically leverage
the structure of the well-known Linear Quadratic Regulator (LQR), yet retain
the flexibility of Bayesian nonparametric learning. Simulations of uncertain
linear and nonlinear systems demonstrate that the LQR kernels yield superior
learning performance.

Recent approaches in robotics follow the insight that perception is facilitated by interactivity with the environment. These approaches are subsumed under the term of Interactive Perception (IP). We argue that IP provides the following benefits: (i) any type of forceful interaction with the environment creates a new type of informative sensory signal that would otherwise not be present and (ii) any prior knowledge about the nature of the interaction supports the interpretation of the signal. This is facilitated by knowledge of the regularity in the combined space of sensory information and action parameters. The goal of this survey is to postulate this as a principle and collect evidence in support by analyzing and categorizing existing work in this area. We also provide an overview of the most important applications of Interactive Perception. We close this survey by discussing the remaining open questions. Thereby, we hope to define a field and inspire future work.

We propose a novel long-term optimization criterion to improve the robustness of model-based reinforcement learning in real-world scenarios. Learning a dynamics model to derive a solution promises much greater data-efficiency and reusability compared to model-free alternatives. In practice, however, modelbased RL suffers from various imperfections such as noisy input and output data, delays and unmeasured (latent) states. To achieve higher resilience against such effects, we propose to optimize a generative long-term prediction model directly with respect to the likelihood of observed trajectories as opposed to the common approach of optimizing a dynamics model for one-step-ahead predictions. We evaluate the proposed method on several artificial and real-world benchmark problems and compare it to PILCO, a model-based RL framework, in experiments on a manipulation robot. The results show that the proposed method is competitive compared to state-of-the-art model learning methods. In contrast to these more involved models, our model can directly be employed for policy search and outperforms a baseline method in the robot experiment.

Understanding physical phenomena is a key component of human intelligence and enables physical interaction with previously unseen environments. In this paper, we study how an artificial agent can autonomously acquire this intuition through interaction with the environment. We created a synthetic block stacking environment with physics simulation in which the agent can learn a policy end-to-end through trial and error. Thereby, we bypass to explicitly model physical knowledge within the policy. We are specifically interested in tasks that require the agent to reach a given goal state that may be different for every new trial. To this end, we propose a deep reinforcement learning framework that learns policies which are parametrized by a goal. We validated the model on a toy example navigating in a grid world with different target positions and in a block stacking task with different target structures of the final tower. In contrast to prior work, our policies show better generalization across different goals.

The successful execution of complex modern robotic tasks often relies on the correct tuning of a large number of parameters. In this paper we present a methodology for improving the performance of a trotting gait by learning the gait parameters, impedance profile and the gains of the control architecture. We show results on a set of terrains, for various speeds using a realistic simulation of a hydraulically actuated system. Our method achieves a reduction in the gait's mechanical energy consumption during locomotion of up to 26%. The simulation results are validated in experimental trials on the hardware system.

In Proceedings of the IEEE/RSJ International Conference of Intelligent Robots and Systems, September 2017 (inproceedings)Accepted

Abstract

We aim to reliably predict whether a grasp on a known object is successful before it is executed in the real world. There is an entire suite of grasp metrics that has already been developed which rely on precisely known contact points between object and hand. However, it remains unclear whether and how they may be combined into a general purpose grasp stability predictor. In this paper, we analyze these questions by leveraging a large scale database of simulated grasps on a wide variety of objects. For each grasp, we compute the value of seven metrics. Each grasp is annotated by human subjects with ground truth stability labels. Given this data set, we train several classification methods to find out whether there is some underlying, non-trivial structure in the data that is difficult to model manually but can be learned. Quantitative and qualitative results show the complexity of the prediction problem. We found that a good prediction performance critically depends on using a combination of metrics as input features. Furthermore, non-parametric and non-linear classifiers best capture the structure in the data.

An event-based state estimation approach for reducing communication in a networked control system is proposed. Multiple distributed sensor agents observe a dynamic process and sporadically transmit their measurements to estimator agents over a shared bus network. Local event-triggering protocols ensure that data is transmitted only when necessary to meet a desired estimation accuracy. The event-based design is shown to emulate the performance of a centralised state observer design up to guaranteed bounds, but with reduced communication. The stability results for state estimation are extended to the distributed control system that results when the local estimates are used for feedback control. Results from numerical simulations and hardware experiments illustrate the effectiveness of the proposed approach in reducing network communication.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages: 5295-5301, IEEE, Piscataway, NJ, USA, IEEE International Conference on Robotics and Automation (ICRA), May 2017 (inproceedings)

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages: 1557-1563, IEEE, Piscataway, NJ, USA, IEEE International Conference on Robotics and Automation (ICRA), May 2017 (inproceedings)

We propose a probabilistic filtering method which fuses joint measurements with depth images to yield a precise, real-time estimate of the end-effector pose in the camera frame. This avoids the need for frame transformations when using it in combination with visual object tracking methods.
Precision is achieved by modeling and correcting biases in the joint measurements as well as inaccuracies in the robot model, such as poor extrinsic camera calibration. We make our method computationally efficient through a principled combination of Kalman filtering of the joint measurements and asynchronous depth-image updates based on the Coordinate Particle Filter.
We quantitatively evaluate our approach on a dataset recorded from a real robotic platform, annotated with ground truth from a motion capture system. We show that our approach is robust and accurate even under challenging conditions such as fast motion, significant and long-term occlusions, and time-varying biases. We release the dataset along with open-source code of our approach to allow for quantitative comparison with alternative approaches.

Abstract Anticipation can enhance the capability of a robot in its interaction with humans, where the robot predicts the humans' intention for selecting its own action. We present a novel framework of anticipatory action selection for human-robot interaction, which is capable to handle nonlinear and stochastic human behaviors such as table tennis strokes and allows the robot to choose the optimal action based on prediction of the human partner's intention with uncertainty. The presented framework is generic and can be used in many human-robot interaction scenarios, for example, in navigation and human-robot co-manipulation. In this article, we conduct a case study on human-robot table tennis. Due to the limited amount of time for executing hitting movements, a robot usually needs to initiate its hitting movement before the opponent hits the ball, which requires the robot to be anticipatory based on visual observation of the opponent's movement. Previous work on Intention-Driven Dynamics Models (IDDM) allowed the robot to predict the intended target of the opponent. In this article, we address the problem of action selection and optimal timing for initiating a chosen action by formulating the anticipatory action selection as a Partially Observable Markov Decision Process (POMDP), where the transition and observation are modeled by the \{IDDM\} framework. We present two approaches to anticipatory action selection based on the \{POMDP\} formulation, i.e., a model-free policy learning method based on Least-Squares Policy Iteration (LSPI) that employs the \{IDDM\} for belief updates, and a model-based Monte-Carlo Planning (MCP) method, which benefits from the transition and observation model by the IDDM. Experimental results using real data in a simulated environment show the importance of anticipatory action selection, and that \{POMDPs\} are suitable to formulate the anticipatory action selection problem by taking into account the uncertainties in prediction. We also show that existing algorithms for POMDPs, such as \{LSPI\} and MCP, can be applied to substantially improve the robot's performance in its interaction with humans.

Applying reinforcement learning to humanoid robots is challenging because humanoids have a large number of degrees of freedom and state and action spaces are continuous. Thus, most reinforcement learning algorithms would become computationally infeasible and require a prohibitive amount of trials to explore such high-dimensional spaces. In this paper, we present a probabilistic reinforcement learning approach, which is derived from the framework of stochastic optimal control and path integrals. The algorithm, called Policy Improvement with Path Integrals (PI2), has a surprisingly simple form, has no open tuning parameters besides the exploration noise, is model-free, and performs numerically robustly in high dimensional learning problems. We demonstrate how PI2 is able to learn full-body motor skills on a 34-DOF humanoid robot. To demonstrate the generality of our approach, we also apply PI2 in the context of variable impedance control, where both planned trajectories and gain schedules for each joint are optimized simultaneously.

In IROS’10 Workshop on Defining and Solving Realistic Perception Problems in Personal Robotics, October 2010 (inproceedings)

Abstract

Object grasping and manipulation pose major challenges for perception and control and require rich interaction between these two fields. In this paper, we concentrate on the plethora of perceptual problems that have to be solved before a robot can be moved in a controlled way to pick up an object. A vision system is presented that integrates a number of different computational processes, e.g. attention, segmentation, recognition or reconstruction to incrementally build up a representation of the scene suitable for grasping and manipulation of objects. Our vision system is equipped with an active robotic head and a robot arm. This embodiment enables the robot to perform a number of different actions like saccading, fixating, and grasping. By applying these actions, the robot can incrementally build a scene representation and use it for interaction. We demonstrate our system in a scenario for picking up known objects from a table top. We also show the system’s extendibility towards grasping of unknown and familiar objects.

We propose a method for multi-modal scene exploration where initial object hypothesis formed by active visual segmentation are confirmed and augmented through haptic exploration with a robotic arm. We update the current belief about the state of the map with the detection results and predict yet unknown parts of the map with a Gaussian Process. We show that through the integration of different sensor modalities, we achieve a more complete scene model. We also show that the prediction of the scene structure leads to a valid scene representation even if the map is not fully traversed. Furthermore, we propose different exploration strategies and evaluate them both in simulation and on our robotic platform.

In this paper we present a framework for the segmentation of multiple objects from a 3D point cloud. We extend traditional image segmentation techniques into a full 3D representation. The proposed technique relies on a state-of-the-art min-cut framework to perform a fully 3D global multi-class labeling in a principled manner. Thereby, we extend our previous work in which a single object was actively segmented from the background. We also examine several seeding methods to bootstrap the graphical model-based energy minimization and these methods are compared over challenging scenes. All results are generated on real-world data gathered with an active vision robotic head. We present quantitive results over aggregate sets as well as visual results on specific examples.

Policy search is a successful approach to reinforcement
learning. However, policy improvements often result
in the loss of information. Hence, it has been marred
by premature convergence and implausible solutions.
As first suggested in the context of covariant policy
gradients (Bagnell and Schneider 2003), many of these
problems may be addressed by constraining the information
loss. In this paper, we continue this path of reasoning
and suggest the Relative Entropy Policy Search
(REPS) method. The resulting method differs significantly
from previous policy gradient approaches and
yields an exact update step. It works well on typical
reinforcement learning benchmark problems.

Reinforcement learning (RL) is one of the most general approaches to learning control. Its applicability to complex motor systems, however, has been largely impossible so far due to the computational difficulties that reinforcement learning encounters in high dimensional continuous state-action spaces. In this paper, we derive a novel approach to RL for parameterized control policies based on the framework of stochastic optimal control with path integrals. While solidly grounded in optimal control theory and estimation theory, the update equations for learning are surprisingly simple and have no danger of numerical instabilities as neither matrix inversions nor gradient learning rates are required. Empirical evaluations demonstrate significant performance improvements over gradient-based policy learning and scalability to high-dimensional control problems. Finally, a learning experiment on a robot dog illustrates the functionality of our algorithm in a real-world scenario. We believe that our new algorithm, Policy Improvement with Path Integrals (PI2), offers currently one of the most efficient, numerically robust, and easy to implement algorithms for RL in robotics.

Model-based control methods can be used to enable fast, dexterous, and compliant motion of robots without sacrificing control accuracy. However, implementing such techniques on floating base robots, e.g., humanoids and legged systems, is non-trivial due to under-actuation, dynamically changing constraints from the environment, and potentially closed loop kinematics. In this paper, we show how to compute the analytically correct inverse dynamics torques for model-based control of sufficiently constrained floating base rigid-body systems, such as humanoid robots with one or two feet in contact with the environment. While our previous inverse dynamics approach relied on an estimation of contact forces to compute an approximate inverse dynamics solution, here we present an analytically correct solution by using an orthogonal decomposition to project the robot dynamics onto a reduced dimensional space, independent of contact forces. We demonstrate the feasibility and robustness of our approach on a simulated floating base bipedal humanoid robot and an actual robot dog locomoting over rough terrain.

We present a control architecture for fast quadruped locomotion over rough terrain. We approach the problem by decomposing it into many sub-systems, in which we apply state-of-the-art learning, planning, optimization and control techniques to achieve robust, fast locomotion. Unique features of our control strategy include: (1) a system that learns optimal foothold choices from expert demonstration using terrain templates, (2) a body trajectory optimizer based on the Zero-Moment Point (ZMP) stability criterion, and (3) a floating-base inverse dynamics controller that, in conjunction with force control, allows for robust, compliant locomotion over unperceived obstacles. We evaluate the performance of our controller by testing it on the LittleDog quadruped robot, over a wide variety of rough terrain of varying difficulty levels. We demonstrate the generalization ability of this controller by presenting test results from an independent external test team on terrains that have never been shown to us.

This paper presents work on vision based robotic grasping. The proposed method adopts a learning framework where prototypical grasping points are learnt from several examples and then used on novel objects. For representation purposes, we apply the concept of shape context and for learning we use a supervised learning approach in which the classifier is trained with labelled synthetic images. We evaluate and compare the performance of linear and non-linear classifiers. Our results show that a combination of a descriptor based on shape context with a non-linear classification algorithm leads to a stable detection of grasping points for a variety of objects.

Robot learning methods which allow au- tonomous robots to adapt to novel situations have been a long standing vision of robotics, artificial intelligence, and cognitive sciences. However, to date, learning techniques have yet to ful- fill this promise as only few methods manage to scale into the high-dimensional domains of manipulator robotics, or even the new upcoming trend of humanoid robotics. If possible, scaling was usually only achieved in precisely pre-structured domains. In this paper, we investigate the ingredients for a general ap- proach policy learning with the goal of an application to motor skill refinement in order to get one step closer towards human- like performance. For doing so, we study two major components for such an approach, i. e., firstly, we study policy learning algo- rithms which can be applied in the general setting of motor skill learning, and, secondly, we study a theoretically well-founded general approach to representing the required control structu- res for task representation and execution.

For complex robots such as humanoids, model-based control is highly beneficial for accurate tracking
while keeping negative feedback gains low for compliance. However, in such multi degree-of-freedom
lightweight systems, conventional identification of rigid body dynamics models using CAD data and
actuator models is inaccurate due to unknown nonlinear robot dynamic effects. An alternative method
is data-driven parameter estimation, but significant noise in measured and inferred variables affects it
adversely. Moreover, standard estimation procedures may give physically inconsistent results due to
unmodeled nonlinearities or insufficiently rich data. This paper addresses these problems, proposing
a Bayesian system identification technique for linear or piecewise linear systems. Inspired by Factor
Analysis regression, we develop a computationally efficient variational Bayesian regression algorithm
that is robust to ill-conditioned data, automatically detects relevant features, and identifies input and
output noise. We evaluate our approach on rigid body parameter estimation for various robotic systems,
achieving an error of up to three times lower than other state-of-the-art machine learning methods.

In this work we present the ﬁrst constrained stochastic op-
timal feedback controller applied to a fully nonlinear, tendon
driven index ﬁnger model. Our model also takes into account an
extensor mechanism, and muscle force-length and force-velocity
properties. We show this feedback controller is robust to noise
and perturbations to the dynamics, while successfully handling
the nonlinearities and high dimensionality of the system. By ex-
tending prior methods, we are able to approximate physiological
realism by ensuring positivity of neural commands and tendon
tensions at all timesthus can, for the ﬁrst time, use the optimal control framework
to predict biologically plausible tendon tensions for a nonlinear
neuromuscular ﬁnger model.
METHODS
1 Muscle Model
The rigid-body triple pendulum ﬁnger model with slightly
viscous joints is actuated by Hill-type muscle models. Joint
torques are generated by the seven muscles of the index ﬁn-

Whether human reaching movements are planned and optimized in kinematic
(task space) or dynamic (joint or muscle space) coordinates is still an
issue of debate. The first hypothesis implies that a planner produces a
desired end-effector position at each point in time during the reaching
movement, whereas the latter hypothesis includes the dynamics of the
muscular-skeletal control system to produce a continuous end-effector
trajectory.
Previous work by Wolpert et al (1995) showed that when subjects were
led to believe that their straight reaching paths corresponded to curved
paths as shown on a computer screen, participants adapted the true path
of their hand such that they would visually perceive a straight line
in visual space, despite that they actually produced a curved path.
These results were interpreted as supporting the stance that reaching
trajectories are planned in kinematic coordinates. However, this
experiment could only demonstrate that adaptation to altered paths, i.e.
the position of the end-effector, did occur, but not that the precise
timing of end-effector position was equally planned, i.e., the trajectory.
Our current experiment aims at filling this gap by explicitly testing whether
position over time, i.e. velocity, is a property of reaching movements
that is planned in kinematic coordinates.
In the current experiment, the velocity profiles of cursor movements
corresponding to the participant's hand motions were skewed either to
the left or to the right; the path itself was left unaltered.
We developed an adaptation paradigm, where the skew of the velocity profile was
introduced gradually and participants reported no awareness of any
manipulation.
Preliminary results indicate that the true hand motion of participants
did not alter, i.e. there was no adaptation so as to counterbalance the
introduced skew. However, for some participants, peak hand velocities
were lowered for higher skews, which suggests that participants
interpreted the manipulation as mere noise due to variance in their own
movement.
In summary, for a visuomotor transformation task, the hypothesis of a
planned continuous end-effector trajectory predicts adaptation to a
modified velocity profile. The current experiment found no systematic adaptation
under such transformation, but did demonstrate an effect that is more in
accordance that subjects could not perceive the manipulation and rather
interpreted as an increase of noise.

In 32nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2010, clmc (inproceedings)

Abstract

Abstract? We provide an overview of optimal control meth- ods to nonlinear neuromuscular systems and discuss their lim- itations. Moreover we extend current optimal control methods to their application to neuromuscular models with realistically numerous musculotendons; as most prior work is limited to torque-driven systems. Recent work on computational motor control has explored the used of control theory and esti- mation as a conceptual tool to understand the underlying computational principles of neuromuscular systems. After all, successful biological systems regularly meet conditions for stability, robustness and performance for multiple classes of complex tasks. Among a variety of proposed control theory frameworks to explain this, stochastic optimal control has become a dominant framework to the point of being a standard computational technique to reproduce kinematic trajectories of reaching movements (see [12])
In particular, we demonstrate the application of optimal control to a neuromuscular model of the index finger with all seven musculotendons producing a tapping task. Our simu- lations include 1) a muscle model that includes force- length and force-velocity characteristics; 2) an anatomically plausible biomechanical model of the index finger that includes a tendi- nous network for the extensor mechanism and 3) a contact model that is based on a nonlinear spring-damper attached at the end effector of the index finger. We demonstrate that it is feasible to apply optimal control to systems with realistically large state vectors and conclude that, while optimal control is an adequate formalism to create computational models of neuro- musculoskeletal systems, there remain important challenges and limitations that need to be considered and overcome such as contact transitions, curse of dimensionality, and constraints on states and controls.

We present a novel algorithm for efficient learning and feature selection in high-
dimensional regression problems. We arrive at this model through a modification of
the standard regression model, enabling us to derive a probabilistic version of the
well-known statistical regression technique of backfitting. Using the Expectation-
Maximization algorithm, along with variational approximation methods to overcome
intractability, we extend our algorithm to include automatic relevance detection
of the input features. This Variational Bayesian Least Squares (VBLS) approach
retains its simplicity as a linear model, but offers a novel statistically robust â??black-
boxâ? approach to generalized linear regression with high-dimensional inputs. It can
be easily extended to nonlinear regression and classification problems. In particular,
we derive the framework of sparse Bayesian learning, e.g., the Relevance Vector
Machine, with VBLS at its core, offering significant computational and robustness
advantages for this class of methods. We evaluate our algorithm on synthetic and
neurophysiological data sets, as well as on standard regression and classification
benchmark data sets, comparing it with other competitive statistical approaches
and demonstrating its suitability as a drop-in replacement for other generalized
linear regression techniques.

In the proceedings of American Control Conference (ACC 2010) , 2010, clmc (article)

Abstract

We present a generalization of the classic Differential Dynamic Programming algorithm. We assume the existence of state- and control-dependent process noise, and proceed to derive the second-order expansion of the cost-to-go. Despite having quartic and cubic terms in the initial expression, we show that these vanish, leaving us with the same quadratic structure as standard DDP.

In International Conference on Artificial Intelligence and Statistics (AISTATS 2010), 2010, clmc (inproceedings)

Abstract

With the goal to generate more scalable algo- rithms with higher efficiency and fewer open parameters, reinforcement learning (RL) has recently moved towards combining classi- cal techniques from optimal control and dy- namic programming with modern learning techniques from statistical estimation the- ory. In this vein, this paper suggests the framework of stochastic optimal control with path integrals to derive a novel approach to RL with parametrized policies. While solidly grounded in value function estimation and optimal control based on the stochastic Hamilton-Jacobi-Bellman (HJB) equations, policy improvements can be transformed into an approximation problem of a path inte- gral which has no open parameters other than the exploration noise. The resulting algorithm can be conceived of as model- based, semi-model-based, or even model free, depending on how the learning problem is structured. Our new algorithm demon- strates interesting similarities with previous RL research in the framework of proba- bility matching and provides intuition why the slightly heuristically motivated proba- bility matching approach can actually per- form well. Empirical evaluations demon- strate significant performance improvements over gradient-based policy learning and scal- ability to high-dimensional control problems. We believe that Policy Improvement with Path Integrals (PI2) offers currently one of the most efficient, numerically robust, and easy to implement algorithms for RL based on trajectory roll-outs.

Investigating principles of human motor control in the framework of optimal control has had a long tradition in neural control of movement, and has recently experienced a new surge of investigations. Ideally, optimal control problems are addresses as a reinforcement learning (RL) problem, which would allow to investigate both the process of acquiring an optimal control solution as well as the solution itself. Unfortunately, the applicability of RL to complex neural and biomechanics systems has been largely impossible so far due to the computational difficulties that arise in high dimensional continuous state-action spaces. As a way out, research has focussed on computing optimal control solutions based on iterative optimal control methods that are based on linear and quadratic approximations of dynamical models and cost functions. These methods require perfect knowledge of the dynamics and cost functions while they are based on gradient and Newton optimization schemes. Their applicability is also restricted to low dimensional problems due to problematic convergence in high dimensions. Moreover, the process of computing the optimal solution is removed from the learning process that might be plausible in biology. In this work, we present a new reinforcement learning method for learning optimal control solutions or motor control. This method, based on the framework of stochastic optimal control with path integrals, has a very solid theoretical foundation, while resulting in surprisingly simple learning algorithms. It is also possible to apply this approach without knowledge of the system model, and to use a wide variety of complex nonlinear cost functions for optimization. We illustrate the theoretical properties of this approach and its applicability to learning motor control tasks for reaching movements and locomotion studies. We discuss its applicability to learning desired trajectories, variable stiffness control (co-contraction), and parameterized control policies. We also investigate the applicability to signal dependent noise control systems. We believe that the suggested method offers one of the easiest to use approaches to learning optimal control suggested in the literature so far, which makes it ideally suited for computational investigations of biological motor control.

In a not too distant future, robots will be a natural part of
daily life in human society, providing assistance in many
areas ranging from clinical applications, education and care
giving, to normal household environments [1]. It is hard to
imagine that all possible tasks can be preprogrammed in such
robots. Robots need to be able to learn, either by themselves
or with the help of human supervision. Additionally, wear and
tear on robots in daily use needs to be automatically compensated
for, which requires a form of continuous self-calibration,
another form of learning. Finally, robots need to react to stochastic
and dynamic environments, i.e., they need to learn
how to optimally adapt to uncertainty and unforeseen
changes. Robot learning is going to be a key ingredient for the
future of autonomous robots.
While robot learning covers a rather large field, from learning
to perceive, to plan, to make decisions, etc., we will focus
this review on topics of learning control, in particular, as it is
concerned with learning control in simulated or actual physical
robots. In general, learning control refers to the process of
acquiring a control strategy for a particular control system and
a particular task by trial and error. Learning control is usually
distinguished from adaptive control [2] in that the learning system
can have rather general optimization objectivesâ??not just,
e.g., minimal tracking errorâ??and is permitted to fail during
the process of learning, while adaptive control emphasizes fast
convergence without failure. Thus, learning control resembles
the way that humans and animals acquire new movement
strategies, while adaptive control is a special case of learning
control that fulfills stringent performance constraints, e.g., as
needed in life-critical systems like airplanes.
Learning control has been an active topic of research for at
least three decades. However, given the lack of working robots
that actually use learning components, more work needs to be
done before robot learning will make it beyond the laboratory
environment. This article will survey some ongoing and past
activities in robot learning to assess where the field stands and
where it is going. We will largely focus on nonwheeled robots
and less on topics of state estimation, as typically explored in
wheeled robots [3]â??6], and we emphasize learning in continuous
state-action spaces rather than discrete state-action spaces [7], [8].
We will illustrate the different topics of robot learning with
examples from our own research with anthropomorphic and
humanoid robots.

We present a control architecture for fast quadruped locomotion over rough terrain. We approach the problem by decomposing
it into many sub-systems, in which we apply state-of-the-art learning, planning, optimization, and control techniques
to achieve robust, fast locomotion. Unique features of our control strategy include: (1) a system that learns optimal
foothold choices from expert demonstration using terrain templates, (2) a body trajectory optimizer based on the Zero-
Moment Point (ZMP) stability criterion, and (3) a floating-base inverse dynamics controller that, in conjunction with force
control, allows for robust, compliant locomotion over unperceived obstacles. We evaluate the performance of our controller
by testing it on the LittleDog quadruped robot, over a wide variety of rough terrains of varying difficulty levels. The
terrain that the robot was tested on includes rocks, logs, steps, barriers, and gaps, with obstacle sizes up to the leg length
of the robot. We demonstrate the generalization ability of this controller by presenting results from testing performed by
an independent external test team on terrain that has never been shown to us.

Energy-shaping control methods have produced strong theoretical results for asymptotically stable 3D bipedal dynamic walking in the literature. In particular, geometric controlled reduction exploits robot symmetries to control momentum conservation laws that decouple the sagittal-plane dynamics, which are easier to stabilize. However, the associated control laws require high-dimensional matrix inverses multiplied with complicated energy-shaping terms, often making these control theories difficult to apply to highly-redundant humanoid robots. This paper presents a first step towards the application of energy-shaping methods on real robots by casting controlled reduction into a framework of constrained accelerations for inverse dynamics control. By representing momentum conservation laws as constraints in acceleration space, we construct a general expression for desired joint accelerations that render the constraint surface invariant. By appropriately choosing an orthogonal projection, we show that the unconstrained (reduced) dynamics are decoupled from the constrained dynamics. Any acceleration-based controller can then be used to stabilize this planar subsystem, including passivity-based methods. The resulting control law is surprisingly simple and represents a practical way to employ control theoretic stability results in robotic platforms. Simulated walking of a 3D compass-gait biped show correspondence between the new and original controllers, and simulated motions of a 16-DOF humanoid demonstrate the applicability of this method.

One of the hallmarks of the performance, versatility,
and robustness of biological motor control is the ability to adapt
the impedance of the overall biomechanical system to different
task requirements and stochastic disturbances. A transfer of this
principle to robotics is desirable, for instance to enable robots
to work robustly and safely in everyday human environments. It
is, however, not trivial to derive variable impedance controllers
for practical high DOF robotic tasks. In this contribution, we accomplish
such gain scheduling with a reinforcement learning approach
algorithm, PI2 (Policy Improvement with Path Integrals).
PI2 is a model-free, sampling based learning method derived from
first principles of optimal control. The PI2 algorithm requires no
tuning of algorithmic parameters besides the exploration noise.
The designer can thus fully focus on cost function design to
specify the task. From the viewpoint of robotics, a particular
useful property of PI2 is that it can scale to problems of many
DOFs, so that RL on real robotic systems becomes feasible. We
sketch the PI2 algorithm and its theoretical properties, and how
it is applied to gain scheduling. We evaluate our approach by
presenting results on two different simulated robotic systems, a
3-DOF Phantom Premium Robot and a 6-DOF Kuka Lightweight
Robot. We investigate tasks where the optimal strategy requires
both tuning of the impedance of the end-effector, and tuning
of a reference trajectory. The results show that we can use
path integral based RL not only for planning but also to derive
variable gain feedback controllers in realistic scenarios. Thus,
the power of variable impedance control is made available to a
wide variety of robotic systems and practical applications.

In Proceedings of the 13th International Conference on Climbing and Walking Robots (CLAWAR), pages: 580-587, Nagoya, Japan, sep 2010 (inproceedings)

Abstract

Contact interaction with the environment is crucial in the design of locomotion controllers for legged robots, to prevent slipping for example. Therefore, it is of great importance to be able to control the effects of the robots movements on the contact reaction forces. In this contribution, we extend a recent inverse dynamics algorithm for floating base robots to optimize the distribution of contact forces while achieving precise trajectory tracking. The resulting controller is algorithmically simple as compared to other approaches. Numerical simulations show that this result significantly increases the range of possible movements of a humanoid robot as compared to the previous inverse dynamics algorithm. We also present a simplification of the result where no inversion of the inertia matrix is needed which is particularly relevant for practical use on a real robot. Such an algorithm becomes interesting for agile locomotion of robots on difficult terrains where the contacts with the environment are critical, such as walking over rough or slippery terrain.

2003

In The International Symposium on Adaptive Motion of Animals and Machines, Kyoto, Japan, March 4-8, 2003, March 2003, clmc (inproceedings)

Abstract

Sensory-motor integration is one of the key issues in robotics. In this paper, we propose an approach to rhythmic arm movement control that is synchronized with an external signal based on exploiting a simple neural oscillator network. Trajectory generation by the neural oscillator is a biologically inspired method that can allow us to generate a smooth and continuous trajectory. The parameter tuning of the oscillators is used to generate a synchronized movement with wide intervals. We adopted the method for the drumming task as an example task. By using this method, the robot can realize synchronized drumming with wide drumming intervals in real time. The paper also shows the experimental results of drumming by a humanoid robot.

We present an algorithm aimed at addressing both computational and analytical intractability of Bayesian regression models which operate in very high-dimensional, usually underconstrained spaces. Several domains of research frequently provide such datasets, including chemometrics [2], and human movement analysis [1]. The literature in nonparametric statistics provides interesting solutions such as Backfitting [3] and Partial Least Squares [4], which are extremely robust and efficient, yet lack a probabilistic interpretation that could place them in the context of current research in statistical learning algorithms that emphasize the estimation of confidence, posterior distributions, and model complexity. In order to achieve numerical robustness and low computational cost, we first derive a novel Bayesian interpretation of Backfitting (BB) as a computationally efficient regression algorithm. BBÕs learning complexity scales linearly with the input dimensionality by decoupling inference among individual input dimensions. We embed BB in an efficient, locally variational model selection mechanism that automatically grows the number of backfitting experts in a mixture-of-experts regression model. We demonstrate the effectiveness of the algorithm in performing principled regularization of model complexity when fitting nonlinear manifolds while avoiding the numerical hazards associated with highly underconstrained problems. We also note that this algorithm appears applicable in various areas of neural computation, e.g., in abstract models of computational neuroscience, or implementations of statistical learning on artificial systems.

Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments and to use this understanding to design future systems