Bellemare et al. (2016) introduced the notion of a pseudo-count to generalize
count-based exploration to non-tabular reinforcement learning. This
pseudo-count is derived from a density model which effectively replaces the
count table used in the tabular setting. An exploration bonus based on
this pseudo-count, combined with a mixed Monte Carlo update applied to a DQN
agent, was sufficient to achieve state-of-the-art performance on the Atari 2600
game Montezuma's Revenge.
In this paper we consider two questions left open by their work: First, how
important is the quality of the density model for exploration? Second, what
role does the Monte Carlo update play in exploration? We answer the first
question by demonstrating the use of PixelCNN, an advanced neural density model
for images, to supply a pseudo-count. In particular, we examine the intrinsic
difficulties in adapting Bellemare et al.'s approach when assumptions about the
model are violated. The result is a more practical and general algorithm
requiring no special apparatus. We combine PixelCNN pseudo-counts with
different agent architectures to dramatically improve the state of the art on
several hard Atari games. One surprising finding is that the mixed Monte Carlo
update is a powerful facilitator of exploration in the sparsest of settings,
including Montezuma's Revenge.
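
As a rough illustration of the quantities named above, the sketch below computes a pseudo-count from a density model's probability of an observation before and after a single training step on it, following the definition in Bellemare et al. (2016), and then forms an exploration bonus and a mixed Monte Carlo target. The function names, the bonus scale, the small constant under the square root, and the mixing coefficient are illustrative assumptions rather than the settings used in this paper, and the density model itself (PixelCNN here) is not shown.

```python
import math


def pseudo_count(rho: float, rho_prime: float) -> float:
    """Pseudo-count of an observation x.

    rho       -- model probability of x before training on x
    rho_prime -- model probability of x after one training step on x
    """
    gain = rho_prime - rho
    if gain <= 0.0:
        # Assumption for this sketch: if the model did not become more
        # confident about x, treat x as fully familiar (infinite count).
        return math.inf
    return rho * (1.0 - rho_prime) / gain


def exploration_bonus(count: float, scale: float = 0.05, eps: float = 0.01) -> float:
    """Bonus that shrinks as the pseudo-count grows (scale and eps are illustrative)."""
    return scale / math.sqrt(count + eps)


def mixed_monte_carlo_target(reward: float, bonus: float, gamma: float,
                             next_q_max: float, mc_return: float,
                             mix: float = 0.1) -> float:
    """Blend the one-step Q-learning target with a Monte Carlo return.

    mc_return is assumed to be the discounted return observed from this
    step onward; mix is a hypothetical mixing coefficient in [0, 1].
    """
    one_step = reward + bonus + gamma * next_q_max
    return (1.0 - mix) * one_step + mix * mc_return


# Sanity check against an exact count table: x seen 3 times in 10 steps,
# so rho = 3/10 and, after one more visit, rho_prime = 4/11.
n_hat = pseudo_count(3 / 10, 4 / 11)
bonus = exploration_bonus(n_hat)
```

With an empirical count table as the density model, the pseudo-count recovers the true visit count (3 in the check above), which is the sense in which the density model replaces the table.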
