Reinforcement Learning for Complex Tasks and Robot Navigation

Zero-Shot Sim-to-Real Transfer

Zero-shot Sim-to-Real Transfer with Modular Priors
Robert Lee,
Serena Mou,
Vibhavari Dasagi,
Jake Bruce,
Jürgen Leitner,
Niko Sünderhauf.
arXiv preprint,
2018.
Current end-to-end Reinforcement Learning (RL) approaches are severely limited by restrictively large search spaces and are prone to overfitting to their training environment. This is because in end-to-end RL, perception, decision-making, and low-level control are all learned jointly from very sparse reward signals, with little capability of incorporating prior knowledge or existing algorithms. In this work, we propose a novel framework that effectively decouples RL for high-level decision making from low-level perception and control. This allows us to transfer a learned policy from a highly abstract simulation to a real robot without requiring any transfer learning. We therefore coin the term zero-shot sim-to-real transfer for our approach. We successfully demonstrate our approach on the robot manipulation task of object sorting. A key component of our approach is a deep sets encoder that enables us to learn the high-level policy with reinforcement learning from the variable-length output of a pre-trained object detector, instead of from raw pixels. We show that this method can learn effective policies within mere minutes of highly simplified simulation. The learned policies can be directly deployed on a robot without further training, and generalize to variations of the task unseen during training.
[arXiv]
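
The deep sets encoder mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `DeepSetsEncoder`, the layer sizes, the random untrained weights, and the per-detection feature layout (x, y, class id) are all assumptions made for illustration. The sketch only shows the general idea: each detection is embedded independently, the embeddings are summed, and the result is a fixed-size vector regardless of how many objects were detected or in what order.

```python
import random


def make_linear(n_in, n_out, rng):
    """Return a random weight matrix (n_out rows of n_in values) and a zero bias."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def linear_relu(layer, x):
    """Apply a linear layer followed by a ReLU to one input vector."""
    weights, bias = layer
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


class DeepSetsEncoder:
    """Hypothetical encoder: rho(sum over detections of phi(detection))."""

    def __init__(self, n_features, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        self.phi = make_linear(n_features, n_hidden, rng)  # per-detection embedding
        self.rho = make_linear(n_hidden, n_out, rng)       # applied after pooling

    def encode(self, detections):
        # Sum pooling makes the result independent of detection order
        # and well defined for any number of detections, including zero.
        pooled = [0.0] * self.n_hidden
        for det in detections:
            for i, value in enumerate(linear_relu(self.phi, det)):
                pooled[i] += value
        return linear_relu(self.rho, pooled)


# Assumed detection format: [x, y, class_id]; purely illustrative.
encoder = DeepSetsEncoder(n_features=3, n_hidden=8, n_out=4)
detections = [[0.1, 0.2, 1.0], [0.7, 0.5, 0.0], [0.3, 0.9, 1.0]]
state = encoder.encode(detections)  # fixed-size input for a policy network
```

In a full system the weights would be trained together with the policy; here they are random so that the example stays self-contained.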

Learning to Navigate

Learning Deployable Navigation Policies at Kilometer Scale from a Single Traversal
Jake Bruce,
Niko Sünderhauf,
Piotr Mirowski,
Raia Hadsell,
Michael Milford.
In Proc. of Conference on Robot Learning (CoRL),
2018.
We present an approach for efficiently learning goal-directed navigation policies on a mobile robot, from only a single coverage traversal of recorded data. The navigation agent learns an effective policy over a diverse action space in a large heterogeneous environment consisting of more than 2 km of travel, through buildings and outdoor regions that collectively exhibit large variations in visual appearance, self-similarity, and connectivity. We compare pretrained visual encoders that enable precomputation of visual embeddings to achieve a throughput of tens of thousands of transitions per second at training time on a commodity desktop computer, allowing agents to learn from millions of trajectories of experience in a matter of hours. We propose multiple forms of computationally efficient stochastic augmentation to enable the learned policy to generalise beyond these precomputed embeddings, and demonstrate successful deployment of the learned policy on the real robot without fine-tuning, despite environmental appearance differences at test time.
[arXiv]
[website]
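
The combination of precomputed embeddings and stochastic augmentation described in the abstract above might look roughly like the following sketch. The abstract does not say which augmentations the paper uses, so the Gaussian noise and random feature dropout here are assumptions, as are the names `toy_encoder`, `precompute_embeddings`, `augment`, and `sample_batch`. The point is only that the expensive encoder runs once per recorded frame, while cheap randomised perturbations are applied on every training sample.

```python
import random


def toy_encoder(frame):
    """Stand-in for a pretrained visual encoder (hypothetical, for illustration)."""
    return [sum(frame), max(frame), min(frame)]


def precompute_embeddings(frames, encoder):
    """Run the encoder once per recorded frame and cache the results."""
    return [encoder(frame) for frame in frames]


def augment(embedding, rng, noise_std=0.05, drop_prob=0.1):
    """Return a perturbed copy: each feature is either zeroed or given Gaussian noise."""
    augmented = []
    for value in embedding:
        if rng.random() < drop_prob:
            augmented.append(0.0)  # assumed augmentation: feature dropout
        else:
            augmented.append(value + rng.gauss(0.0, noise_std))  # assumed: additive noise
    return augmented


def sample_batch(cache, batch_size, rng):
    """Sample cached embeddings with replacement and augment each one independently."""
    indices = [rng.randrange(len(cache)) for _ in range(batch_size)]
    return [(i, augment(cache[i], rng)) for i in indices]


rng = random.Random(0)
frames = [[float(i), float(i + 1)] for i in range(5)]  # placeholder "images"
cache = precompute_embeddings(frames, toy_encoder)     # encoder cost paid once
batch = sample_batch(cache, batch_size=8, rng=rng)     # cheap at training time
```

Because `augment` returns a new list, the cached embeddings are never modified and can be reused across the whole training run.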

One-Shot Reinforcement Learning for Robot Navigation with Interactive Replay
Jacob Bruce,
Niko Sünderhauf,
Piotr Mirowski,
Raia Hadsell,
Michael Milford.
In Proc. of NIPS Workshop on Acting and Interacting in the Real World: Challenges in Robot Learning,
2017.
Recently, model-free reinforcement learning algorithms have been shown to solve challenging problems by learning from extensive interaction with the environment. A significant issue with transferring this success to the robotics domain is that interaction with the real world is costly, but training on limited experience is prone to overfitting. We present a method for learning to navigate, to a fixed goal and in a known environment, on a mobile robot. The robot leverages an interactive world model built from a single traversal of the environment, a pre-trained visual feature encoder, and stochastic environmental augmentation, to demonstrate successful zero-shot transfer under real-world environmental variations without fine-tuning.

Multimodal Deep Autoencoders for Control of a Mobile Robot
James Sergeant,
Niko Sünderhauf,
Michael Milford,
Ben Upcroft.
In Proceedings of the Australasian Conference on Robotics and Automation (ACRA),
2015.
Robot navigation systems are typically engineered to suit certain platforms, sensing suites and environment types. In order to deploy a robot in an environment where its existing navigation system is insufficient, the system must be modified manually, often at significant cost. In this paper we address this problem, proposing a system based on multimodal deep autoencoders that enables a robot to learn how to navigate by observing a dataset of sensor input and motor commands collected while being teleoperated by a human. Low-level features and cross modal correlations are learned and used in initialising two different architectures with three operating modes. During operation, these systems exploit the learned correlations in generating suitable control signals based only on the sensor information.