Proceedings of the 56th IEEE Annual Conference on Decision and Control (CDC), pages: 5193-5200, IEEE, IEEE Conference on Decision and Control, December 2017 (conference)

Abstract

Finding optimal feedback controllers for nonlinear dynamic systems from data
is hard. Recently, Bayesian optimization (BO) has been proposed as a powerful
framework for direct controller tuning from experimental trials. For selecting
the next query point and finding the global optimum, BO relies on a
probabilistic description of the latent objective function, typically a
Gaussian process (GP). As is shown herein, GPs with a common kernel choice can,
however, lead to poor learning outcomes on standard quadratic control problems.
For a first-order system, we construct two kernels that specifically leverage
the structure of the well-known Linear Quadratic Regulator (LQR), yet retain
the flexibility of Bayesian nonparametric learning. Simulations of uncertain
linear and nonlinear systems demonstrate that the LQR kernels yield superior
learning performance.

Recent approaches in robotics follow the insight that perception is facilitated by interactivity with the environment. These approaches are subsumed under the term of Interactive Perception (IP). We argue that IP provides the following benefits: (i) any type of forceful interaction with the environment creates a new type of informative sensory signal that would otherwise not be present and (ii) any prior knowledge about the nature of the interaction supports the interpretation of the signal. This is facilitated by knowledge of the regularity in the combined space of sensory information and action parameters. The goal of this survey is to postulate this as a principle and collect evidence in support by analyzing and categorizing existing work in this area. We also provide an overview of the most important applications of Interactive Perception. We close this survey by discussing the remaining open questions. Thereby, we hope to define a field and inspire future work.

We propose a novel long-term optimization criterion to improve the robustness of model-based reinforcement learning in real-world scenarios. Learning a dynamics model to derive a solution promises much greater data-efficiency and reusability compared to model-free alternatives. In practice, however, modelbased RL suffers from various imperfections such as noisy input and output data, delays and unmeasured (latent) states. To achieve higher resilience against such effects, we propose to optimize a generative long-term prediction model directly with respect to the likelihood of observed trajectories as opposed to the common approach of optimizing a dynamics model for one-step-ahead predictions. We evaluate the proposed method on several artificial and real-world benchmark problems and compare it to PILCO, a model-based RL framework, in experiments on a manipulation robot. The results show that the proposed method is competitive compared to state-of-the-art model learning methods. In contrast to these more involved models, our model can directly be employed for policy search and outperforms a baseline method in the robot experiment.

Understanding physical phenomena is a key component of human intelligence and enables physical interaction with previously unseen environments. In this paper, we study how an artificial agent can autonomously acquire this intuition through interaction with the environment. We created a synthetic block stacking environment with physics simulation in which the agent can learn a policy end-to-end through trial and error. Thereby, we bypass to explicitly model physical knowledge within the policy. We are specifically interested in tasks that require the agent to reach a given goal state that may be different for every new trial. To this end, we propose a deep reinforcement learning framework that learns policies which are parametrized by a goal. We validated the model on a toy example navigating in a grid world with different target positions and in a block stacking task with different target structures of the final tower. In contrast to prior work, our policies show better generalization across different goals.

The successful execution of complex modern robotic tasks often relies on the correct tuning of a large number of parameters. In this paper we present a methodology for improving the performance of a trotting gait by learning the gait parameters, impedance profile and the gains of the control architecture. We show results on a set of terrains, for various speeds using a realistic simulation of a hydraulically actuated system. Our method achieves a reduction in the gait's mechanical energy consumption during locomotion of up to 26%. The simulation results are validated in experimental trials on the hardware system.

In Proceedings of the IEEE/RSJ International Conference of Intelligent Robots and Systems, September 2017 (inproceedings)Accepted

Abstract

We aim to reliably predict whether a grasp on a known object is successful before it is executed in the real world. There is an entire suite of grasp metrics that has already been developed which rely on precisely known contact points between object and hand. However, it remains unclear whether and how they may be combined into a general purpose grasp stability predictor. In this paper, we analyze these questions by leveraging a large scale database of simulated grasps on a wide variety of objects. For each grasp, we compute the value of seven metrics. Each grasp is annotated by human subjects with ground truth stability labels. Given this data set, we train several classification methods to find out whether there is some underlying, non-trivial structure in the data that is difficult to model manually but can be learned. Quantitative and qualitative results show the complexity of the prediction problem. We found that a good prediction performance critically depends on using a combination of metrics as input features. Furthermore, non-parametric and non-linear classifiers best capture the structure in the data.

An event-based state estimation approach for reducing communication in a networked control system is proposed. Multiple distributed sensor agents observe a dynamic process and sporadically transmit their measurements to estimator agents over a shared bus network. Local event-triggering protocols ensure that data is transmitted only when necessary to meet a desired estimation accuracy. The event-based design is shown to emulate the performance of a centralised state observer design up to guaranteed bounds, but with reduced communication. The stability results for state estimation are extended to the distributed control system that results when the local estimates are used for feedback control. Results from numerical simulations and hardware experiments illustrate the effectiveness of the proposed approach in reducing network communication.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages: 5295-5301, IEEE, Piscataway, NJ, USA, IEEE International Conference on Robotics and Automation (ICRA), May 2017 (inproceedings)

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages: 1557-1563, IEEE, Piscataway, NJ, USA, IEEE International Conference on Robotics and Automation (ICRA), May 2017 (inproceedings)

We propose a probabilistic filtering method which fuses joint measurements with depth images to yield a precise, real-time estimate of the end-effector pose in the camera frame. This avoids the need for frame transformations when using it in combination with visual object tracking methods.
Precision is achieved by modeling and correcting biases in the joint measurements as well as inaccuracies in the robot model, such as poor extrinsic camera calibration. We make our method computationally efficient through a principled combination of Kalman filtering of the joint measurements and asynchronous depth-image updates based on the Coordinate Particle Filter.
We quantitatively evaluate our approach on a dataset recorded from a real robotic platform, annotated with ground truth from a motion capture system. We show that our approach is robust and accurate even under challenging conditions such as fast motion, significant and long-term occlusions, and time-varying biases. We release the dataset along with open-source code of our approach to allow for quantitative comparison with alternative approaches.

Abstract Anticipation can enhance the capability of a robot in its interaction with humans, where the robot predicts the humans' intention for selecting its own action. We present a novel framework of anticipatory action selection for human-robot interaction, which is capable to handle nonlinear and stochastic human behaviors such as table tennis strokes and allows the robot to choose the optimal action based on prediction of the human partner's intention with uncertainty. The presented framework is generic and can be used in many human-robot interaction scenarios, for example, in navigation and human-robot co-manipulation. In this article, we conduct a case study on human-robot table tennis. Due to the limited amount of time for executing hitting movements, a robot usually needs to initiate its hitting movement before the opponent hits the ball, which requires the robot to be anticipatory based on visual observation of the opponent's movement. Previous work on Intention-Driven Dynamics Models (IDDM) allowed the robot to predict the intended target of the opponent. In this article, we address the problem of action selection and optimal timing for initiating a chosen action by formulating the anticipatory action selection as a Partially Observable Markov Decision Process (POMDP), where the transition and observation are modeled by the \{IDDM\} framework. We present two approaches to anticipatory action selection based on the \{POMDP\} formulation, i.e., a model-free policy learning method based on Least-Squares Policy Iteration (LSPI) that employs the \{IDDM\} for belief updates, and a model-based Monte-Carlo Planning (MCP) method, which benefits from the transition and observation model by the IDDM. Experimental results using real data in a simulated environment show the importance of anticipatory action selection, and that \{POMDPs\} are suitable to formulate the anticipatory action selection problem by taking into account the uncertainties in prediction. We also show that existing algorithms for POMDPs, such as \{LSPI\} and MCP, can be applied to substantially improve the robot's performance in its interaction with humans.

2013

In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages: 3195-3202, IEEE, November 2013 (inproceedings)

Abstract

We address the problem of tracking the 6-DoF pose of an object while it is being manipulated by a human or a robot. We use a dynamic Bayesian network to perform inference and compute a posterior distribution over the current object pose. Depending on whether a robot or a human manipulates the object, we employ a process model with or without knowledge of control inputs. Observations are obtained from a range camera. As opposed to previous object tracking methods, we explicitly model self-occlusions and occlusions from the environment, e.g, the human or robotic hand. This leads to a strongly non-linear observation model and additional dependencies in the Bayesian network. We employ a Rao-Blackwellised particle filter to compute an estimate of the object pose at every time step. In a set of experiments, we demonstrate the ability of our method to accurately and robustly track the object pose in real-time while it is being manipulated by a human or a robot.

The International Journal of Robotics Research, 33(2):321-341, Sage, October 2013 (article)

Abstract

In this work, we propose to reconstruct a complete 3-D model of an unknown object
by fusion of visual and tactile information while the object is grasped. Assuming the
object is symmetric, a first hypothesis of its complete 3-D shape is generated. A grasp
is executed on the object with a robotic manipulator equipped with tactile sensors.
Given the detected contacts between the fingers and the object, the initial full object
model including the symmetry parameters can be refined. This refined model will then
allow the planning of more complex manipulation tasks.
The main contribution of this work is an optimal estimation approach for the fusion of
visual and tactile data applying the constraint of object symmetry. The fusion is
formulated as a state estimation problem and solved with an iterative extended
Kalman filter. The approach is validated experimentally using both artificial and real
data from two different robotic platforms.

In many naturally occurring optimization problems one needs to ensure that the definition of the optimization problem lends itself to solutions that are tractable to compute. In cases where exact solutions cannot be computed tractably, it is beneficial to have strong guarantees on the tractable approximate solutions. In order operate under these criterion most optimization problems are cast under the umbrella of convexity or submodularity. In this report we will study design and optimization over a common class of functions called submodular functions. Set functions, and specifically submodular set functions, characterize a wide variety of naturally occurring optimization problems, and the property of submodularity of set functions has deep theoretical consequences with wide ranging applications. Informally, the property of submodularity of set functions concerns the intuitive principle of diminishing returns. This property states that adding an element to a smaller set has more value than adding it to a larger set. Common examples of submodular monotone functions are entropies, concave functions of cardinality, and matroid rank functions; non-monotone examples include graph cuts, network flows, and mutual information. In this paper we will review the formal definition of submodularity; the optimization of submodular functions, both maximization and minimization; and finally discuss some applications in relation to learning and reasoning using submodular functions.

In IEEE International Conference on Robotics and Automation (ICRA), May 2013, clmc (inproceedings)

Abstract

One of the central problems in computer vision is the detection of semantically important objects and the estimation of their pose. Most of the work in object detection has been based on single image processing and its performance is limited by occlusions and ambiguity in appearance and geometry. This paper proposes an active approach to object detection by controlling the point of view of a mobile depth camera. When an initial static detection phase identifies an object of interest, several hypotheses are made about its class and orientation. The sensor then plans a sequence of view-points, which balances the amount of energy used to move with the chance of identifying the correct hypothesis. We formulate an active M-ary hypothesis testing problem, which includes sensor mobility, and solve it using a point-based approximate POMDP algorithm. The validity of our approach is verified through simulation and experiments with real scenes captured by a kinect sensor. The results suggest a significant improvement over static object detection.

We investigate adaptation under a reaching task with an acceleration-based force field perturbation designed to alter the nominal straight hand trajectory in a potentially benign manner:pushing the hand of course in one direction before subsequently restoring towards the target. In this particular task, an explicit strategy to reduce motor effort requires a distinct deviation from the nominal rectilinear hand trajectory. Rather, our results display a clear directional preference during learning, as subjects adapted perturbed curved trajectories towards their initial baselines. We model this behavior using the framework of stochastic optimal control theory and an objective function that trades-of the discordant requirements of 1) target accuracy, 2) motor effort, and 3) desired trajectory. Our work addresses the underlying objective of a reaching movement, and we suggest that robustness, particularly against internal model uncertainly, is as essential to the reaching task as terminal accuracy and energy effciency.

In IEEE International Conference on Robotics and Automation (ICRA), pages: 3547-3554, 2013 (inproceedings)

Abstract

In this work, we propose to reconstruct a complete 3-D model of an unknown object by fusion of visual and tactile information while the object is grasped. Assuming the object is symmetric, a first hypothesis of its complete 3-D shape is generated from a single view. This initial model is used to plan a grasp on the object which is then executed with a robotic manipulator equipped with tactile sensors. Given the detected contacts between the fingers and the object, the full object model including the symmetry parameters can be refined. This refined model will then allow the planning of more complex manipulation tasks. The main contribution of this work is an optimal estimation approach for the fusion of visual and tactile data applying the constraint of object symmetry. The fusion is formulated as a state estimation problem and solved with an iterative extended Kalman filter. The approach is validated experimentally using both artificial and real data from two different robotic platforms.

Nonlinear dynamical systems have been used in many disciplines to model complex behaviors, including biological motor control, robotics,
perception, economics, traffic prediction, and neuroscience. While often the unexpected emergent behavior of nonlinear systems is the focus of
investigations, it is of equal importance to create goal-directed behavior (e.g., stable locomotion from a system of coupled oscillators under perceptual guidance). Modeling goal-directed behavior with nonlinear systems is, however, rather difficult due to the parameter sensitivity of these systems, their complex phase transitions in response to subtle parameter changes, and the difficulty of analyzing and predicting their long-term behavior; intuition and time-consuming parameter tuning play a major role. This letter presents and reviews dynamical movement primitives, a line of research for modeling attractor behaviors of autonomous nonlinear dynamical systems with the help of statistical learning techniques. The essence of our approach is to start with a simple dynamical system, such as a set of linear differential equations, and transform those into a weakly nonlinear system with prescribed attractor dynamics by meansof a learnable autonomous forcing term. Both point attractors and limit cycle attractors of almost arbitrary complexity can be generated. We explain the design principle of our approach and evaluate its properties in several example applications in motor control and robotics.

We present an approach to learning objective functions for robotic manipulation based on inverse reinforcement learning. Our path integral inverse reinforcement learning algorithm can deal with high-dimensional continuous state-action spaces, and only requires local optimality of demonstrated trajectories. We use L 1 regularization in order to achieve feature selection, and propose an efficient algorithm to minimize the resulting convex objective function. We demonstrate our approach by applying it to two core problems in robotic manipulation. First, we learn a cost function for redundancy resolution in inverse kinematics. Second, we use our method to learn a cost function over trajectories, which is then used in optimization-based motion planning for grasping and manipulation tasks. Experimental results show that our method outperforms previous algorithms in high-dimensional settings.

The development of legged robots for complex environments requires controllers that guarantee both high tracking performance and compliance with the environment. More specifically the control of contact interaction with the environment is of crucial importance to ensure stable, robust and safe motions. In the following, we present an inverse dynamics controller that exploits torque redundancy to directly and explicitly minimize any combination of linear and quadratic costs in the contact constraints and in the commands. Such a result is particularly relevant for legged robots as it allows to use torque redundancy to directly optimize contact interactions. For example, given a desired locomotion behavior, it can guarantee the minimization of contact forces to reduce slipping on difficult terrains while ensuring high tracking performance of the desired motion. The proposed controller is very simple and computationally efficient, and most importantly it can greatly improve the performance of legged locomotion on difficult terrains as can be seen in the experimental results.

The International Journal of Robotics Research, 32(3):280-298, March 2013 (article)

Abstract

The development of legged robots for complex environments requires controllers that guarantee both high tracking performance and compliance with the environment. More specifically the control of the contact interaction with the environment is of crucial importance to ensure stable, robust and safe motions. In this contribution we develop an inverse-dynamics controller for floating-base robots under contact constraints that can minimize any combination of linear and quadratic costs in the contact constraints and the commands. Our main result is the exact analytical derivation of the controller. Such a result is particularly relevant for legged robots as it allows us to use torque redundancy to directly optimize contact interactions. For example, given a desired locomotion behavior, we can guarantee the minimization of contact forces to reduce slipping on difficult terrains while ensuring high tracking performance of the desired motion. The main advantages of the controller are its simplicity, computational efficiency and robustness to model inaccuracies. We present detailed experimental results on simulated humanoid and quadruped robots as well as a real quadruped robot. The experiments demonstrate that the controller can greatly improve the robustness of locomotion of the robots.1

Precise kinematic forward models are important for robots to successfully perform dexterous grasping and manipulation tasks, especially when visual servoing is rendered infeasible due to occlusions. A lot of research has been conducted to estimate geometric and non-geometric parameters of kinematic chains to minimize reconstruction errors. However, kinematic chains can include non-linearities, e.g. due to cable stretch and motor-side encoders, that result in significantly different errors for different parts of the state space. Previous work either does not consider such non-linearities or proposes to estimate non-geometric parameters of carefully engineered models that are robot specific. We propose a data-driven approach that learns task error models that account for such unmodeled non-linearities. We argue that in the context of grasping and manipulation, it is sufficient to achieve high accuracy in the task relevant state space. We identify this relevant state space using previously executed joint configurations and learn error corrections for those. Therefore, our system is developed to generate subsequent executions that are similar to previous ones. The experiments show that our method successfully captures the non-linearities in the head kinematic chain (due to a counterbalancing spring) and the arm kinematic chains (due to cable stretch) of the considered experimental platform, see Fig. 1. The feasibility of the presented error learning approach has also been evaluated in independent DARPA ARM-S testing contributing to successfully complete 67 out of 72 grasping and manipulation tasks.

2008

One of the most general frameworks for phrasing control problems for
complex, redundant robots is operational space control. However, while
this framework is of essential importance for robotics and well-understood
from an analytical point of view, it can be prohibitively hard to achieve
accurate control in face of modeling errors, which are inevitable in com-
plex robots, e.g., humanoid robots. In this paper, we suggest a learning
approach for opertional space control as a direct inverse model learning
problem. A ï¬rst important insight for this paper is that a physically cor-
rect solution to the inverse problem with redundant degrees-of-freedom
does exist when learning of the inverse map is performed in a suitable
piecewise linear way. The second crucial component for our work is based
on the insight that many operational space controllers can be understood
in terms of a constrained optimal control problem. The cost function as-
sociated with this optimal control problem allows us to formulate a learn-
ing algorithm that automatically synthesizes a globally consistent desired
resolution of redundancy while learning the operational space controller.
From the machine learning point of view, this learning problem corre-
sponds to a reinforcement learning problem that maximizes an immediate
reward. We employ an expectation-maximization policy search algorithm
in order to solve this problem. Evaluations on a three degrees of freedom
robot arm are used to illustrate the suggested approach. The applica-
tion to a physically realistic simulator of the anthropomorphic SARCOS
Master arm demonstrates feasibility for complex high degree-of-freedom
robots. We also show that the proposed method works in the setting of
learning resolved motion rate control on real, physical Mitsubishi PA-10
medical robotics arm.

Dexterous manipulation with a highly redundant movement system is one of the hallmarks of hu-
man motor skills. From numerous behavioral studies, there is strong evidence that humans employ
compliant task space control, i.e., they focus control only on task variables while keeping redundant
degrees-of-freedom as compliant as possible. This strategy is robust towards unknown disturbances
and simultaneously safe for the operator and the environment. The theory of operational space con-
trol in robotics aims to achieve similar performance properties. However, despite various compelling
theoretical lines of research, advanced operational space control is hardly found in actual robotics imple-
mentations, in particular new kinds of robots like humanoids and service robots, which would strongly
profit from compliant dexterous manipulation. To analyze the pros and cons of different approaches
to operational space control, this paper focuses on a theoretical and empirical evaluation of different
methods that have been suggested in the literature, but also some new variants of operational space
controllers. We address formulations at the velocity, acceleration and force levels. First, we formulate
all controllers in a common notational framework, including quaternion-based orientation control, and
discuss some of their theoretical properties. Second, we present experimental comparisons of these
approaches on a seven-degree-of-freedom anthropomorphic robot arm with several benchmark tasks.
As an aside, we also introduce a novel parameter estimation algorithm for rigid body dynamics, which
ensures physical consistency, as this issue was crucial for our successful robot implementations. Our
extensive empirical results demonstrate that one of the simplified acceleration-based approaches can
be advantageous in terms of task performance, ease of parameter tuning, and general robustness and
compliance in face of inevitable modeling errors.

In Abstracts of the Eighteenth Annual Meeting of Neural Control of Movement (NCM), Naples, Florida, April 29-May 4, 2008, clmc (inproceedings)

Abstract

Force field experiments have been a successful paradigm for studying the principles of planning, execution, and learning in human arm movements. Subjects have been shown to cope with the disturbances generated by force fields by learning internal models of the underlying dynamics to predict disturbance effects or by increasing arm impedance (via co-contraction) if a predictive approach becomes infeasible.
Several studies have addressed the issue uncertainty in force field learning. Scheidt et al. demonstrated that subjects exposed to a viscous force field of fixed structure but varying strength (randomly changing from trial to trial), learn to adapt to the mean disturbance, regardless of the statistical distribution. Takahashi et al. additionally show a decrease in strength of after-effects after learning in the randomly varying environment. Thus they suggest that the nervous system adopts a dual strategy: learning an internal model of the mean of the random environment, while simultaneously increasing arm impedance to minimize the consequence of errors.
In this study, we examine what role variance plays in the learning of uncertain force fields. We use a 7 degree-of-freedom exoskeleton robot as a manipulandum (Sarcos Master Arm, Sarcos, Inc.), and apply a 3D viscous force field of fixed structure and strength randomly selected from trial to trial. Additionally, in separate blocks of trials, we alter the variance of the randomly selected strength multiplier (while keeping a constant mean). In each block, after sufficient learning has occurred, we apply catch trials with no force field and measure the strength of after-effects.
As expected in higher variance cases, results show increasingly smaller levels of after-effects as the variance is increased, thus implying subjects choose the robust strategy of increasing arm impedance to cope with higher levels of uncertainty. Interestingly, however, subjects show an increase in after-effect strength with a small amount of variance as compared to the deterministic (zero variance) case. This result implies that a small amount of variability aides in internal model formation, presumably a consequence of the additional amount of exploration conducted in the workspace of the task.

Real-time control of the endeffector of a humanoid robot in external coordinates requires
computationally efficient solutions of the inverse kinematics problem. In this context, this
paper investigates methods of resolved motion rate control (RMRC) that employ optimization
criteria to resolve kinematic redundancies. In particular we focus on two established techniques,
the pseudo inverse with explicit optimization and the extended Jacobian method. We prove that
the extended Jacobian method includes pseudo-inverse methods as a special solution. In terms of
computational complexity, however, pseudo-inverse and extended Jacobian differ significantly in
favor of pseudo-inverse methods. Employing numerical estimation techniques, we introduce a
computationally efficient version of the extended Jacobian with performance comparable to the
original version. Our results are illustrated in simulation studies with a multiple degree-offreedom
robot, and were evaluated on an actual 30 degree-of-freedom full-body humanoid robot.

In Abstracts of the Eighteenth Annual Meeting of Neural Control of Movement (NCM), Naples, Florida, April 29-May 4, 2008, clmc (inproceedings)

Abstract

Reinforcement learning (RL) - learning solely based on reward or cost
feedback - is widespread in robotics control and has been also suggested
as computational model for human motor control. In human motor control,
however, hardly any experiment studied reinforcement learning. Here, we
study learning based on visual cost feedback in a reaching task and did
three experiments: (1) to establish a simple enough experiment for RL,
(2) to study spatial localization of RL, and (3) to study the dependence
of RL on the cost function.
In experiment (1), subjects sit in front of a drawing tablet and look at
a screen onto which the drawing pen's position is projected. Beginning
from a start point, their task is to move with the pen through a target
point presented on screen. Visual feedback about the pen's position is
given only before movement onset. At the end of a movement, subjects get
visual feedback only about the cost of this trial. We choose as cost the
squared distance between target and virtual pen position at the target
line. Above a threshold value, the cost was fixed at this value. In the
mapping of the pen's position onto the screen, we added a bias (unknown
to subject) and Gaussian noise. As result, subjects could learn the
bias, and thus, showed reinforcement learning.
In experiment (2), we randomly altered the target position between three
different locations (three different directions from start point: -45,
0, 45). For each direction, we chose a different bias. As result,
subjects learned all three bias values simultaneously. Thus, RL can be
spatially localized.
In experiment (3), we varied the sensitivity of the cost function by
multiplying the squared distance with a constant value C, while keeping
the same cut-off threshold. As in experiment (2), we had three target
locations. We assigned to each location a different C value (this
assignment was randomized between subjects). Since subjects learned the
three locations simultaneously, we could directly compare the effect of
the different cost functions. As result, we found an optimal C value; if
C was too small (insensitive cost), learning was slow; if C was too
large (narrow cost valley), the exploration time was longer and learning
delayed. Thus, reinforcement learning in human motor control appears to
be sen

In this paper we introduce an improved implementation of locally weighted projection regression
(LWPR), a supervised learning algorithm that is capable of handling high-dimensional input data.
As the key features, our code supports multi-threading, is available for multiple platforms, and
provides wrappers for several programming languages.

Stochastic optimal control is a framework for computing control commands that lead to an optimal behavior under a given cost. Despite the long history of optimal control in engineering, it has been only recently applied to describe human motion. So far, stochastic optimal control has been mainly used in tasks that are already learned, such as reaching to a target. For learning, however, there are only few cases where optimal control has been applied. The main assumptions of stochastic optimal control that restrict its application to tasks after learning are the a priori knowledge of (1) a quadratic cost function (2) a state space model that captures the kinematics and/or dynamics of musculoskeletal system and (3) a measurement equation that models the proprioceptive and/or exteroceptive feedback. Under these assumptions, a sequence of control gains is computed that is optimal with respect to the prespecified cost function.
In our work, we relax the assumption of the a priori known cost function and provide a computational framework for modeling tasks that involve learning. Typically, a cost function consists of two parts: one part that models the task constraints, like squared distance to goal at movement endpoint, and one part that integrates over the squared control commands. In learning a task, the first part of this cost function will be adapted. We use an expectation-maximization scheme for learning: the expectation step optimizes the task constraints through gradient descent of a reward function and the maximizing step optimizes the control commands.
Our computational model is tested and compared with data given from a behavioral experiment. In this experiment, subjects sit in front of a drawing tablet and look at a screen onto which the drawing-pen's position is projected. Beginning from a start point, their task is to move with the pen through a target point presented on screen. Visual feedback about the pen's position is given only before movement onset. At the end of a movement, subjects get visual feedback only about the cost of this trial. In the mapping of the pen's position onto the screen, we added a bias (unknown to subject) and Gaussian noise. Therefore the cost is a function of this bias. The subjects were asked to reach to the target and minimize this cost over trials.
In this behavioral experiment, subjects could learn the bias and thus showed reinforcement learning. With our computational model, we could model the learning process over trials. Particularly, the dependence on parameters of the reward function (Gaussian width) and the modulation of movement variance over time were similar in experiment and model.

Local linearizations are ubiquitous in the control of robotic systems. Analytical methods, if available, can be used to obtain the linearization, but in complex robotics systems where the the dynamics and kinematics are often not faithfully obtainable, empirical linearization may be preferable. In this case, it is important to only use data for the local linearization that lies within a ``reasonable'' linear regime of the system, which can be defined from the Hessian at the point of the linearization -- a quantity that is not available without an analytical model. We introduce a Bayesian approach to solve statistically what constitutes a ``reasonable'' local regime. We approach this problem in the context local linear regression. In contrast to previous locally linear methods, we avoid cross-validation or complex statistical hypothesis testing techniques to find the appropriate local regime. Instead, we treat the parameters of the local regime probabilistically and use approximate Bayesian inference for their estimation. This approach results in an analytical set of iterative update equations that are easily implemented on real robotics systems for real-time applications. As in other locally weighted regressions, our algorithm also lends itself to complete nonlinear function approximation for learning empirical internal models. We sketch the derivation of our Bayesian method and provide evaluations on synthetic data and actual robot data where the analytical linearization was known.

Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments and to use this understanding to design future systems