2003

NIPS Workshop "Planning for the Real World: The promises and challenges of dealing with uncertainty", December 2003 (talk)

Abstract

Learning control and planning in high-dimensional continuous state-action systems, e.g., as needed in a humanoid robot, has so far been a domain beyond the applicability of generic planning techniques like reinforcement learning and dynamic programming. This talk describes an approach we have taken in order to enable complex robotic systems to learn to accomplish control tasks. Adaptive learning controllers equipped with statistical learning techniques can be used to learn tracking controllers -- missing state information and uncertainty in the state estimates are usually addressed by observers or direct adaptive control methods. Imitation learning is used as an ingredient to seed initial control policies whose output is a desired trajectory suitable to accomplish the task at hand. Reinforcement learning with stochastic policy gradients using a natural gradient forms the third component that allows refining the initial control policy until the task is accomplished. In comparison to general learning control, this approach is highly prestructured and thus more domain specific. However, it seems to be a theoretically clean and feasible strategy for control systems of the complexity that we need to address.

Many forms of recurrent neural networks can be understood in terms of
dynamic systems theory of difference equations or differential
equations. Learning in such systems corresponds to adjusting some
internal parameters to obtain a desired time evolution of the network,
which can usually be characterized in terms of point attractor dynamics,
limit cycle dynamics, or, in rarer cases, as strange attractor
or chaotic dynamics. Finding a stable learning process to adjust the
open parameters of the network towards shaping the desired attractor
type and basin of attraction has remained a complex task, as the
parameter trajectories during learning can lead the system through a
variety of undesirable unstable behaviors, such that learning may never
succeed.
In this presentation, we review a recently developed learning framework
for a class of recurrent neural networks that employs a more structured
network approach. We assume that the canonical system behavior is known
a priori, e.g., it is a point attractor or a limit cycle. With either
supervised learning or reinforcement learning, it is possible to
acquire the transformation from a simple representative of this
canonical behavior (e.g., a 2nd order linear point attractor, or a
simple limit cycle oscillator) to the desired highly complex attractor
form. For supervised learning, one shot learning based on locally
weighted regression techniques is possible. For reinforcement learning,
stochastic policy gradient techniques can be employed. In any case, the
recurrent network learned by these methods inherits the stability
properties of the simple dynamic system that underlies the nonlinear
transformation, such that stability of the learning approach is not a
problem. We demonstrate the success of this approach for learning
various skills on a humanoid robot, including tasks that require
incorporating additional sensory signals as coupling terms to modify the
recurrent network evolution on-line.
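
As an illustration of the supervised case described above, the following is a minimal sketch, not the authors' implementation. It assumes one common formulation: a critically damped 2nd order linear point attractor, driven by a phase variable that decays to zero, plus a forcing term whose kernel weights are fitted in one shot by locally weighted regression on a single demonstration. The gains, the kernel placement, the function names, and the synthetic demonstration are all hypothetical choices made for this example.

```python
import math

# Hypothetical gains; BETA_Z = ALPHA_Z / 4 makes the point attractor critically damped.
ALPHA_Z, BETA_Z, ALPHA_X = 25.0, 6.25, 4.0

def phase(t, tau):
    """Canonical system tau * dx/dt = -ALPHA_X * x with x(0) = 1, in closed form."""
    return math.exp(-ALPHA_X * t / tau)

def gradient(values, dt):
    """Finite differences: central in the interior, one-sided at the ends."""
    last = len(values) - 1
    out = []
    for k in range(len(values)):
        lo, hi = max(k - 1, 0), min(k + 1, last)
        out.append((values[hi] - values[lo]) / ((hi - lo) * dt))
    return out

def activations(x, centers, widths):
    """Gaussian kernel activations at phase x."""
    return [math.exp(-h * (x - c) ** 2) for c, h in zip(centers, widths)]

def fit(demo, dt, n_kernels=20):
    """One-shot fit of the forcing-term weights from a single demonstration."""
    tau = (len(demo) - 1) * dt
    y0, goal = demo[0], demo[-1]  # assumes goal != y0
    vel = gradient(demo, dt)
    acc = gradient(vel, dt)
    # Kernel centres equally spaced in time, expressed in phase coordinates.
    centers = [phase(i * tau / (n_kernels - 1), tau) for i in range(n_kernels)]
    widths = [1.0 / (centers[i + 1] - centers[i]) ** 2 for i in range(n_kernels - 1)]
    widths.append(widths[-1])
    num = [0.0] * n_kernels
    den = [1e-10] * n_kernels
    for k, y in enumerate(demo):
        x = phase(k * dt, tau)
        scale = x * (goal - y0)
        # Forcing term that would make the attractor follow the demonstration exactly.
        target = tau ** 2 * acc[k] - ALPHA_Z * (BETA_Z * (goal - y) - tau * vel[k])
        for i, psi in enumerate(activations(x, centers, widths)):
            num[i] += psi * scale * target
            den[i] += psi * scale * scale
    weights = [n / d for n, d in zip(num, den)]
    return {"tau": tau, "y0": y0, "goal": goal,
            "centers": centers, "widths": widths, "weights": weights}

def rollout(model, dt, duration):
    """Explicit Euler integration of the learned system from its start state."""
    y, z = model["y0"], 0.0
    trajectory = [y]
    for k in range(int(round(duration / dt))):
        x = phase(k * dt, model["tau"])
        psi = activations(x, model["centers"], model["widths"])
        mix = sum(p * w for p, w in zip(psi, model["weights"])) / (sum(psi) + 1e-10)
        forcing = mix * x * (model["goal"] - model["y0"])  # vanishes as x -> 0
        dz = (ALPHA_Z * (BETA_Z * (model["goal"] - y) - z) + forcing) / model["tau"]
        dy = z / model["tau"]
        z += dz * dt
        y += dy * dt
        trajectory.append(y)
    return trajectory

# Synthetic demonstration (hypothetical): a smooth reach from 0 to 1 with an overshoot.
DT = 0.001
demo = []
for k in range(1001):
    s = k / 1000.0
    demo.append(s ** 3 * (10 - 15 * s + 6 * s * s) + 0.3 * math.sin(math.pi * s) ** 2)

model = fit(demo, DT)
reproduction = rollout(model, DT, duration=2.0 * model["tau"])
```

Because the forcing term is multiplied by the phase variable, it fades out over time and the learned system settles at the goal of the underlying linear attractor regardless of the fitted weights. That is the sense in which stability is inherited in this sketch.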

Using robots as models of cognitive behaviour has a long tradition in robotics. Parallel to the historical development in cognitive science, one observes two major, subsequent waves in cognitive robotics. The first is based on ideas of classical, cognitivist Artificial Intelligence (AI). According to the AI view of cognition as rule-based symbol manipulation, these robots typically try to extract symbolic descriptions of the environment from their sensors that are used to update a common, global world representation from which, in turn, the next action of the robot is derived. The AI approach has been successful in strongly restricted and controlled environments requiring well-defined tasks, e.g. in industrial assembly lines.
AI-based robots mostly failed, however, in the unpredictable and unstructured environments that have to be faced by mobile robots. This has provoked the second wave in cognitive robotics which tries to achieve cognitive behaviour as an emergent property from the interaction of simple, low-level modules. Robots of the second wave are called animats as their architecture is designed to closely model aspects of real animals. Using only simple reactive mechanisms and Hebbian-type or evolutionary learning, the resulting animats often outperformed the highly complex AI-based robots in tasks such as obstacle avoidance, corridor following etc.
While successful in generating robust, insect-like behaviour, typical animats are limited to stereotyped, fixed stimulus-response associations. If one adopts the view that cognition requires a flexible, goal-dependent choice of behaviours and planning capabilities (H.A. Mallot, Kognitionswissenschaft, 1999, 40-48) then it appears that cognitive behaviour cannot emerge from a collection of purely reactive modules. It rather requires environmentally decoupled structures that work without directly engaging the actions that they are concerned with. This poses the current challenge to cognitive robotics: How can we build cognitive robots that show the robustness and the learning capabilities of animats without falling back into the representational paradigm of AI?
The speakers of the symposium present their approaches to this question in the context of robot navigation and sensorimotor learning. In the first talk, Prof. Helge Ritter introduces a robot system for imitation learning capable of exploring various alternatives in simulation before actually performing a task. The second speaker, Angelo Arleo, develops a model of spatial memory in rat navigation based on his electrophysiological experiments. He validates the model on a mobile robot which, in some navigation tasks, shows a performance comparable to that of the real rat. A similar model of spatial memory is used to investigate the mechanisms of territory formation in a series of robot experiments presented by Prof. Hanspeter Mallot. In the last talk, we return to the domain of sensorimotor learning where Ralf Möller introduces his approach to generate anticipatory behaviour by learning forward models of sensorimotor relationships.

2002

