arXiv on Feb. 21st

Unlike humans, who are capable of continual learning over their lifetimes,
artificial neural networks have long been known to suffer from a phenomenon
known as catastrophic forgetting, whereby new learning can lead to abrupt
erasure of previously acquired knowledge. Whereas in a neural network the
parameters are typically modelled as scalar values, an individual synapse in
the brain comprises a complex network of interacting biochemical components
that evolve at different timescales. In this paper, we show that by equipping
tabular and deep reinforcement learning agents with a synaptic model that
incorporates this biological complexity (Benna & Fusi, 2016), catastrophic
forgetting can be mitigated at multiple timescales. In particular, we find that
as well as enabling continual learning across sequential training of two simple
tasks, the synaptic model can also be used to overcome within-task forgetting
by reducing the need for an experience replay database.
( https://arxiv.org/abs/1802.07239 , 1794kb)
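
The idea of a synapse whose components evolve at different timescales can be
illustrated with a chain of coupled variables, in the spirit of Benna & Fusi
(2016). The sketch below is only an illustration: the function names, the
number of variables, and the constants (capacities that double and couplings
that halve along the chain) are assumptions, not values taken from the paper.
The first variable plays the role of the visible weight; the deeper variables
change more slowly and pull it back toward older values.

```python
def make_chain(n_vars=4, g0=0.25):
    """Build a chain of hidden variables (all constants are assumptions)."""
    # Capacities grow and couplings shrink along the chain, so deeper
    # variables respond on progressively longer timescales.
    caps = [2.0 ** k for k in range(n_vars)]
    couplings = [g0 * 2.0 ** (-k) for k in range(n_vars - 1)]
    return caps, couplings


def step(u, caps, couplings, external=0.0, dt=1.0):
    """Advance the chain by one Euler step and return the new values."""
    n = len(u)
    flow = [0.0] * n
    for k in range(n - 1):
        # Flow from variable k to k+1 is proportional to their difference.
        f = couplings[k] * (u[k] - u[k + 1])
        flow[k] -= f
        flow[k + 1] += f
    # A learning update (e.g. a gradient step) enters only at u[0].
    flow[0] += external
    return [u[k] + dt * flow[k] / caps[k] for k in range(n)]


caps, couplings = make_chain()
u = [1.0, 0.0, 0.0, 0.0]  # a recent update sits in the fast variable
for _ in range(50):
    u = step(u, caps, couplings)
# u[0] has partly relaxed, while the slower variables have absorbed
# some of the update and will release it only gradually.
```

With no external input, the capacity-weighted sum of the variables is
conserved, so an update is redistributed along the chain rather than erased.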

Exploration is a fundamental challenge in reinforcement learning (RL). Many
of the current exploration methods for deep RL use task-agnostic objectives,
such as information gain or bonuses based on state visitation. However, many
practical applications of RL involve learning more than a single task, and
prior tasks can be used to inform how exploration should be performed in new
tasks. In this work, we explore how prior tasks can inform an agent about how
to explore effectively in new situations. We introduce a novel gradient-based
fast adaptation algorithm, model agnostic exploration with structured noise
(MAESN), to learn exploration strategies from prior experience. The prior
experience is used both to initialize a policy and to acquire a latent
exploration space that can inject structured stochasticity into a policy,
producing exploration strategies that are informed by prior knowledge and are
more effective than random action-space noise. We show that MAESN is more
effective at learning exploration strategies when compared to prior meta-RL
methods, RL without learned exploration strategies, and task-agnostic
exploration methods. We evaluate our method on a variety of simulated tasks:
locomotion with a wheeled robot, locomotion with a quadrupedal walker, and
object manipulation.
( https://arxiv.org/abs/1802.07245 , 6738kb)
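
The contrast between structured stochasticity from a latent space and random
action-space noise can be sketched as follows. This is a toy illustration
under stated assumptions, not the MAESN algorithm itself: the linear `policy`,
the one-dimensional state, and the choice to sample the latent once per
episode from a Gaussian with parameters `mu` and `sigma` are all hypothetical,
and the meta-training that would learn those parameters is omitted.

```python
import random


def policy(state, z, weight=1.0):
    """Hypothetical linear policy conditioned on a latent variable z."""
    return weight * z - 0.1 * state


def rollout_latent(mu, sigma, horizon, rng):
    """Structured noise: draw z once and hold it fixed for the episode."""
    z = rng.gauss(mu, sigma)
    state, actions = 0.0, []
    for _ in range(horizon):
        action = policy(state, z)
        actions.append(action)
        state += action  # toy dynamics: the action moves the state
    return z, actions


def rollout_action_noise(sigma, horizon, rng):
    """Baseline: independent Gaussian noise added to every action."""
    state, actions = 0.0, []
    for _ in range(horizon):
        action = policy(state, 0.0) + rng.gauss(0.0, sigma)
        actions.append(action)
        state += action
    return actions


rng = random.Random(0)
z, coherent = rollout_latent(mu=0.0, sigma=1.0, horizon=5, rng=rng)
jittery = rollout_action_noise(sigma=1.0, horizon=5, rng=rng)
```

Because `z` is fixed within an episode, the whole trajectory commits to one
direction of exploration, whereas per-step noise tends to average out over
time.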