Stochastic optimal control with learned dynamics models

Abstract

The motor control of anthropomorphic robotic systems is a challenging computational
task, mainly because of the high levels of redundancy such systems exhibit. Optimality
principles provide a general strategy for resolving such redundancies in a task-driven
fashion. In particular, closed-loop optimisation, i.e., optimal feedback control (OFC),
has served as a successful motor control model, as it unifies important concepts such
as costs, noise, sensory feedback and internal models into a coherent mathematical
framework.
Realising OFC on realistic anthropomorphic systems, however, is non-trivial. Firstly,
such systems typically have large dimensionality and nonlinear dynamics, in which
case the optimisation problem becomes computationally intractable. Approximate
methods, such as the iterative linear quadratic Gaussian (ILQG), have been proposed to
avoid this; however, the transfer of solutions from idealised simulations to real hardware
systems has proved challenging. Secondly, OFC relies on an accurate description
of the system dynamics, which for many realistic control systems may be unknown,
difficult to estimate, or subject to frequent systematic changes. Thirdly, many (especially
biologically inspired) systems suffer from significant state- or control-dependent
sources of noise, which are difficult to model in a generally valid fashion. This thesis
addresses these issues with the aim of realising efficient OFC for anthropomorphic
manipulators.
First, we investigate the implementation of OFC laws on anthropomorphic hardware.
Using ILQG, we optimally control a high-dimensional anthropomorphic manipulator
without having to specify an explicit inverse kinematics, inverse dynamics
or feedback control law. We achieve this by introducing a novel cost function that
accounts for the physical constraints of the robot and a dynamics formulation that resolves
discontinuities in the dynamics. The experimental hardware results reveal the
benefits of OFC over traditional (open-loop) optimal controllers in terms of energy
efficiency and compliance, properties that are crucial for the control of modern anthropomorphic
manipulators.
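To make the role of ILQG concrete, the sketch below solves the linear-quadratic subproblem at the heart of each ILQG iteration: a backward Riccati-style pass yields time-varying feedback gains, which are then rolled forward through the plant. The double-integrator plant, cost weights, and horizon are illustrative assumptions, not the thesis implementation (which linearises nonlinear manipulator dynamics around a nominal trajectory at every iteration).

```python
import numpy as np

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])   # double integrator: x = [position, velocity]
B = np.array([[0.0], [dt]])
Q = np.eye(2)                            # state cost weight
R = np.array([[1e-2]])                   # control cost weight
T = 100                                  # horizon (10 s)

def lq_backward_pass(A, B, Q, R, T):
    """Riccati recursion for the finite-horizon LQ subproblem of ILQG."""
    P = Q.copy()
    gains = []
    for _ in range(T):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]                   # gains ordered t = 0 .. T-1

# Forward pass: apply the time-varying feedback law to the plant.
Ks = lq_backward_pass(A, B, Q, R, T)
x = np.array([[1.0], [0.0]])             # start 1 m from the target, at rest
for K in Ks:
    u = -K @ x
    x = A @ x + B @ u
```

In full ILQG the backward pass operates on local linearisations of nonlinear dynamics and second-order cost expansions, and the forward pass updates the nominal trajectory until convergence; the LQ core, however, has exactly this shape.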
We then propose a new framework of OFC with learned dynamics (OFC-LD) that,
unlike classic approaches, does not rely on analytic dynamics functions but rather updates
the internal dynamics model continuously from sensorimotor plant feedback. We
demonstrate how this approach can compensate for unknown dynamics and for complex
dynamic perturbations in an online fashion.
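The idea behind OFC-LD can be sketched in a few lines: fit a dynamics model from observed plant transitions, then use that model for control. The scalar plant and plain least-squares fit below are illustrative assumptions, standing in for the nonlinear function approximator and continuous online updating used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Unknown" plant with hidden coefficients: x_next = a*x + b*u.
a_true, b_true = 0.9, 0.5
def plant(x, u):
    return a_true * x + b_true * u

# Collect transitions under exploratory control and fit a dynamics model
# by least squares (a stand-in for the learned model in OFC-LD).
X, y, x = [], [], 1.0
for _ in range(50):
    u = rng.normal()
    x_next = plant(x, u)
    X.append([x, u])
    y.append(x_next)
    x = x_next
(a_hat, b_hat), *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)

# Use the learned model to drive the true plant to the target x = 0.
x = 2.0
for _ in range(5):
    u = -(a_hat / b_hat) * x   # model-based control from learned coefficients
    x = plant(x, u)
```

Because the model is refit from plant data rather than specified analytically, the same loop compensates for dynamics that are unknown in advance or perturbed at run time: the fit simply tracks the data.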
A specific advantage of a learned dynamics model is that it captures the stochastic
information (i.e., noise) in the plant data, which corresponds to the uncertainty in
the system. Consequently, one can exploit this information within OFC-LD in order
to produce control laws that minimise the uncertainty in the system. In the domain of
antagonistically actuated systems, this approach leads to improved motor performance,
which is achieved by co-contracting antagonistic actuators in order to reduce the negative
effects of the noise. Most importantly, the shape and source of the noise are unknown
a priori and are learned solely from plant data. The model is successfully tested on an
antagonistic series elastic actuator (SEA) that we have built for this purpose.
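A toy numerical example illustrates why minimising uncertainty favours co-contraction. All constants below are illustrative assumptions (in the thesis the noise structure is learned from plant data, not specified): an antagonist pair must deliver a fixed net force, co-contraction raises joint stiffness and thereby attenuates disturbance-induced variance, but it also amplifies signal-dependent motor noise. The expected cost trades these off and is minimised at a nonzero co-contraction level.

```python
import numpy as np

F = 1.0  # required net force: u1 - u2 = F, with co-contraction c = u2 >= 0

def expected_cost(c):
    # Stiffness grows with total activation u1 + u2 = F + 2c.
    stiffness = 1.0 + (F + 2.0 * c)
    # Disturbance-induced variance is attenuated by stiffness.
    disturbance_var = (1.0 / stiffness) ** 2
    # Signal-dependent motor noise grows with the activations themselves.
    motor_var = 0.01 * ((F + c) ** 2 + c ** 2)
    return disturbance_var + motor_var

# Grid search over co-contraction levels: the minimum lies at c > 0.
cs = np.linspace(0.0, 3.0, 301)
best_c = cs[np.argmin([expected_cost(c) for c in cs])]
```

Under these assumed constants the optimum sits near c ≈ 1, i.e. both actuators are activated well beyond what the net force alone requires; this is the qualitative co-contraction effect the learned stochastic model produces in OFC-LD.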
The proposed OFC-LD model is not only applicable to robotic systems but also
proves very useful in the modelling of biological motor control phenomena, and
we show how our model can be used to predict a wide range of human impedance
control patterns during both stationary and adaptation tasks.