Teaching machines to autonomously explore their environment and to learn new skills on their own is one of the big promises of machine learning, artificial intelligence and, in particular, reinforcement learning. In reality, the autonomous behavior of current robotic systems is still painfully limited to very specific tasks in mostly controlled environments, or at least in environments that can be simulated easily to obtain vast amounts of data.

In his PhD work, Andreas Doerr addresses the problems encountered when applying reinforcement learning techniques to real-world systems, where learning must be data-efficient in the presence of noisy and incomplete measurements of the system’s true underlying state (a partially observable Markov decision process, POMDP). One particular line of research is concerned with improving probabilistic, model-based reinforcement learning methods and tailoring the model learning techniques to the subsequent task of policy search.

Tuning and designing robotic behavior by combining elementary objective terms is a tedious task that generally consists of finding proper representations for each new skill. Given a set of basis functions (or features), Inverse Optimal Control (IOC) makes it possible to learn the right association of objective terms defining a policy...

Model-based Reinforcement Learning (RL) algorithms make efficient use of the observed system interaction data by constructing a model of the underlying dynamics. Their data-efficiency has been shown to be much higher than that of model-free (e.g. policy gradient) or value-function-based methods. At the s...

Proportional, Integral and Derivative (PID) control architectures cover a significant portion of today’s industrial control applications. The PID control law for a Single-Input Single-Output (SISO) system is given by
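The excerpt breaks off before the equation. For orientation, the textbook continuous-time form is u(t) = K_p e(t) + K_i ∫ e(τ) dτ + K_d de(t)/dt, where e(t) is the error between the desired and the measured output; the notation of the original article may differ. A minimal discrete-time sketch in Python follows. The class name `PID`, the helper `simulate`, the gains and the toy plant dx/dt = -x + u are all assumptions made for illustration and are not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Discrete-time PID controller (illustrative; all names are hypothetical)."""
    kp: float  # proportional gain
    ki: float  # integral gain
    kd: float  # derivative gain
    dt: float  # sampling interval in seconds
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float) -> float:
        """Return the control signal for the current error sample."""
        # Rectangle-rule approximation of the integral of the error.
        self.integral += error * self.dt
        # Backward-difference approximation of the error derivative.
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(pid: PID, setpoint: float, steps: int) -> float:
    """Close the loop on an assumed toy plant dx/dt = -x + u (forward Euler)."""
    x = 0.0
    for _ in range(steps):
        u = pid.step(setpoint - x)
        x += pid.dt * (-x + u)
    return x


if __name__ == "__main__":
    final_x = simulate(PID(kp=2.0, ki=2.0, kd=0.05, dt=0.01), setpoint=1.0, steps=2000)
    print(f"plant output after 20 s: {final_x:.4f}")
```

With these assumed gains the integral term removes the steady-state error, so the plant output settles at the setpoint. Tuning the three gains for a real system is exactly the kind of task that a learning-based approach would aim to automate.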

2018

State-space models (SSMs) are a highly expressive model class for learning patterns in time series data and for system identification. Deterministic versions of SSMs (e.g., LSTMs) proved extremely successful in modeling complex time-series data. Fully probabilistic SSMs, however, often prove hard to train, even for smaller problems. To overcome this limitation, we propose a scalable initialization and training algorithm based on doubly stochastic variational inference and Gaussian processes. In the variational approximation, we propose, in contrast to related approaches, to fully capture the temporal correlations of the latent states to allow for robust training.

We propose a novel long-term optimization criterion to improve the robustness of model-based reinforcement learning in real-world scenarios. Learning a dynamics model to derive a solution promises much greater data-efficiency and reusability compared to model-free alternatives. In practice, however, model-based RL suffers from various imperfections such as noisy input and output data, delays and unmeasured (latent) states. To achieve higher resilience against such effects, we propose to optimize a generative long-term prediction model directly with respect to the likelihood of observed trajectories as opposed to the common approach of optimizing a dynamics model for one-step-ahead predictions. We evaluate the proposed method on several artificial and real-world benchmark problems and compare it to PILCO, a model-based RL framework, in experiments on a manipulation robot. The results show that the proposed method is competitive compared to state-of-the-art model learning methods. In contrast to these more involved models, our model can directly be employed for policy search and outperforms a baseline method in the robot experiment.
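The contrast between one-step-ahead and long-term training criteria can be illustrated on a toy problem. The sketch below is not the method of the paper: it assumes a scalar linear model x_{t+1} = a * x_t and uses mean squared error as a stand-in for the trajectory likelihood (the two coincide, up to constants, only for Gaussian noise with fixed variance). The function names `one_step_loss` and `rollout_loss` are hypothetical.

```python
from typing import Sequence


def one_step_loss(a: float, ys: Sequence[float]) -> float:
    """Mean squared error when every prediction restarts from the previous observation."""
    n = len(ys) - 1
    return sum((ys[t + 1] - a * ys[t]) ** 2 for t in range(n)) / n


def rollout_loss(a: float, ys: Sequence[float]) -> float:
    """Mean squared error when the model is rolled out freely from ys[0].

    Later observations are used only as targets, never as model inputs,
    so model errors accumulate along the trajectory.
    """
    x = ys[0]
    total = 0.0
    for t in range(1, len(ys)):
        x = a * x  # feed the model's own prediction back in
        total += (ys[t] - x) ** 2
    return total / (len(ys) - 1)


if __name__ == "__main__":
    observations = [1.0, 0.5, 0.25]  # generated by a = 0.5 without noise
    for a in (0.5, 1.0):
        print(a, one_step_loss(a, observations), rollout_loss(a, observations))
```

For the correct parameter both losses vanish. For a wrong parameter the rollout loss is larger than the one-step loss in this example, because each prediction error is carried forward into the next step; this compounding is what a long-term criterion penalizes and a one-step criterion does not see.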

Inverse Optimal Control (IOC) has strongly impacted the systems engineering process, enabling automated planner tuning through straightforward and intuitive demonstration. The most successful and established applications, though, have been in lower-dimensional problems such as navigation planning where exact optimal planning or control is feasible. In higher-dimensional systems, such as humanoid robots, research has made substantial progress toward generalizing the ideas to model-free or locally optimal settings, but these systems are complicated to the point where demonstration itself can be difficult. Typically, real-world applications are restricted to at best noisy or even partial or incomplete demonstrations that prove cumbersome in existing frameworks. This work derives a very flexible method of IOC based on a form of Structured Prediction known as Direct Loss Minimization. The resulting algorithm is essentially Policy Search on a reward function that rewards similarity to demonstrated behavior (using Covariance Matrix Adaptation (CMA) in our experiments). Our framework blurs the distinction between IOC, other forms of Imitation Learning, and Reinforcement Learning, enabling us to derive simple, versatile, and practical algorithms that blend imitation and reinforcement signals into a unified framework. Our experiments analyze various aspects of its performance and demonstrate its efficacy in conveying preferences for motion shaping and combined reach and grasp quality optimization.

Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments and to use this understanding to design future systems.