Reinforcement learning (RL) was originally proposed as a framework to
allow agents to learn in an online fashion as they interact with their
environment. Existing RL algorithms fall short of this goal because the
amount of exploration they require is often too costly or too
time-consuming for online learning. As a result, RL is mostly used for
offline learning in simulated environments. We propose a new algorithm,
called BEETLE, for effective online learning that is computationally
efficient while minimizing the amount of exploration. We take a Bayesian
model-based approach, framing RL as a partially observable Markov decision
process. Our two main contributions are an analytical derivation showing
that the optimal value function is the upper envelope of a set of
multivariate polynomials, and an efficient point-based value iteration
algorithm that exploits this simple parameterization.
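
To make the phrase "upper envelope of a set of multivariate polynomials"
concrete, the following is a minimal sketch of that parameterization
only, not of BEETLE itself. The dictionary encoding of a polynomial, the
helper names eval_poly and upper_envelope, and the two example
polynomials are all assumptions made for illustration; none of them
come from the algorithm's actual implementation.

```python
from math import prod

def eval_poly(poly, theta):
    """Evaluate a multivariate polynomial at the point theta.

    poly maps an exponent tuple to a coefficient, e.g.
    {(1, 0): 2.0, (0, 1): 1.0} encodes 2*theta_1 + theta_2.
    (Hypothetical encoding chosen for this sketch.)
    """
    return sum(
        coeff * prod(t ** e for t, e in zip(theta, exps))
        for exps, coeff in poly.items()
    )

def upper_envelope(polys, theta):
    """Return (value, index) of the polynomial that is largest at theta."""
    values = [eval_poly(p, theta) for p in polys]
    best = max(range(len(polys)), key=values.__getitem__)
    return values[best], best

# Two hypothetical polynomials in theta = (theta_1, theta_2).
polys = [
    {(1, 0): 2.0, (0, 1): 1.0},  # 2*theta_1 + theta_2
    {(1, 1): 4.0, (0, 0): 0.5},  # 4*theta_1*theta_2 + 0.5
]

value, index = upper_envelope(polys, (0.5, 1.0))
print(value, index)
```

The envelope is simply the pointwise maximum over the set, so a
different polynomial can dominate in different regions of the parameter
space. A point-based method would keep only polynomials that are maximal
at some sampled point, which is what keeps the set small.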

This is joint work with Nikos Vlassis (U of Amsterdam), Jesse Hoey (U of
Toronto) and Kevin Regan (U of Waterloo).