Title: Learning Actionable Representations from Visual Observations

Abstract: In this work we explore a new approach for robots to teach themselves about
the world simply by observing it. In particular, we investigate the
effectiveness of learning task-agnostic representations for continuous control
tasks. We extend Time-Contrastive Networks (TCN), which learn from visual
observations, by jointly embedding multiple consecutive frames rather than a
single frame. We show that doing so allows the embedding to encode both
position and velocity attributes significantly more accurately. We test
the usefulness of this self-supervised approach in a reinforcement learning
setting. We show that the representations learned by agents observing
themselves taking random actions, or observing other agents performing tasks
successfully, enable the learning of continuous control policies with
algorithms such as Proximal Policy Optimization (PPO), using only the learned
embeddings as input.
We also demonstrate significant improvements on the real-world Pouring dataset
with a relative error reduction of 39.4% for motion attributes and 11.1% for
static attributes compared to the single-frame baseline. Video results are
available at this https URL.
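
The abstract does not spell out the architecture, but the multi-frame idea can be illustrated with a minimal sketch: stack several consecutive frames along the channel axis so the encoder can capture velocity as well as position, and train it with a standard time-contrastive triplet loss. The network layers, embedding size, margin, and frame count below are illustrative assumptions for a PyTorch sketch, not the paper's actual settings.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiFrameTCN(nn.Module):
        """Hypothetical multi-frame time-contrastive encoder: stacks
        `frames` consecutive RGB frames on the channel axis and maps the
        stack to a normalized embedding."""
        def __init__(self, frames=2, channels=3, embed_dim=32):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(frames * channels, 32, 5, stride=2), nn.ReLU(),
                nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.fc = nn.Linear(64, embed_dim)

        def forward(self, frame_stack):
            # frame_stack: (batch, frames * channels, H, W)
            h = self.conv(frame_stack).flatten(1)
            return F.normalize(self.fc(h), dim=1)

    def time_contrastive_triplet_loss(anchor, positive, negative, margin=0.2):
        """Triplet loss: pull embeddings of temporally close windows
        together, push temporally distant ones apart."""
        pos_d = (anchor - positive).pow(2).sum(1)
        neg_d = (anchor - negative).pow(2).sum(1)
        return F.relu(pos_d - neg_d + margin).mean()

    # Toy usage: random batches of 2 stacked RGB frames at 64x64, where the
    # positive window is near the anchor in time and the negative is distant.
    net = MultiFrameTCN(frames=2)
    a, p, n = (torch.randn(8, 6, 64, 64) for _ in range(3))
    loss = time_contrastive_triplet_loss(net(a), net(p), net(n))
    loss.backward()

Because the triplets only require temporal ordering within an observed video, this objective is self-supervised and needs no action or reward labels, which is what lets the resulting embeddings serve as input to a downstream policy learner such as PPO.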