Efficient Exploration through Bayesian Deep Q-Networks

Kamyar Azizzadenesheli, Emma Brunskill, Animashree Anandkumar

Abstract: We propose Bayesian Deep Q-Network (BDQN), a practical Thompson sampling based reinforcement learning (RL) algorithm. Thompson sampling allows for targeted exploration in high dimensions through posterior sampling but is usually computationally expensive. We address this limitation by introducing uncertainty only at the output layer of the network through a Bayesian Linear Regression (BLR) model, which can be trained with fast closed-form updates and whose posterior samples can be drawn efficiently from a Gaussian distribution. We apply our method to a wide range of Atari games in the Arcade Learning Environment. Since BDQN carries out more efficient exploration, it reaches higher rewards substantially faster than a key baseline, DDQN.

TL;DR: We use Bayesian regression to estimate the posterior over Q-functions and deploy Thompson sampling as a targeted exploration strategy that efficiently trades off exploration and exploitation.
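
The mechanism described in the abstract, a Bayesian linear regression on top of fixed features with Thompson sampling over the resulting Gaussian posterior, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `blr_posterior` and `thompson_action`, the zero-mean isotropic prior, the fixed noise scale, and the synthetic features standing in for the network's last hidden layer are all assumptions made for illustration.

```python
import numpy as np


def blr_posterior(Phi, y, sigma_prior=1.0, sigma_noise=1.0):
    """Closed-form Gaussian posterior over weights w for y = Phi @ w + noise.

    Assumes a zero-mean isotropic Gaussian prior on w (std sigma_prior) and
    Gaussian observation noise (std sigma_noise). Both are illustrative choices.
    """
    d = Phi.shape[1]
    precision = Phi.T @ Phi / sigma_noise**2 + np.eye(d) / sigma_prior**2
    cov = np.linalg.inv(precision)
    cov = (cov + cov.T) / 2.0  # remove tiny asymmetries from the inversion
    mean = cov @ Phi.T @ y / sigma_noise**2
    return mean, cov


def thompson_action(phi_s, posteriors, rng):
    """Draw one weight sample per action and act greedily on the sampled Q-values."""
    sampled_q = [rng.multivariate_normal(mean, cov) @ phi_s
                 for mean, cov in posteriors]
    return int(np.argmax(sampled_q))


# Synthetic stand-in for features of visited states (hypothetical data).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4))
true_weights = [np.array([1.0, 0.0, 0.0, 0.0]),    # action 0
                np.array([-1.0, 0.0, 0.0, 0.0])]   # action 1

# One regression per action, fit to noisy regression targets.
posteriors = []
for w in true_weights:
    targets = features @ w + 0.1 * rng.normal(size=200)
    posteriors.append(blr_posterior(features, targets, sigma_noise=0.1))

state_features = np.array([1.0, 0.0, 0.0, 0.0])
action = thompson_action(state_features, posteriors, rng)
print("sampled greedy action:", action)
```

In this sketch the exploration comes entirely from the weight samples: when the posterior covariance is wide, the sampled Q-values vary and different actions get tried, and as data accumulates the posterior tightens and the choice settles on the action with the highest estimated value.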
