Ryan Adams - Harvard

Starts:

4:00 pm on Thursday, April 17, 2014

Ends:

5:00 pm on Thursday, April 17, 2014

Location:

MCS 148

Title:

Accelerating Exact MCMC with Subsets of Data

Abstract:

One of the challenges of building statistical models for large data
sets is balancing the correctness of inference procedures against
computational realities. In the context of Bayesian procedures, the
pain of such computations has been particularly acute, as it has
appeared that algorithms such as Markov chain Monte Carlo must
touch all of the data at each iteration in order to arrive at
a correct answer. Several recent proposals have been made to use
subsets (or "minibatches") of data to perform MCMC in ways analogous
to stochastic gradient descent. Unfortunately, these proposals have
only provided approximations, although in some cases it has been
possible to bound the error of the resulting stationary distribution.
In this talk I will discuss two new, complementary algorithms for
using subsets of data to perform faster MCMC. In both cases, these
procedures yield stationary distributions that are exactly the desired
target posterior distribution. The first of these, "Firefly Monte
Carlo", is an auxiliary variable method that uses randomized subsets
of data to achieve valid transition operators, with connections to
recent developments in pseudo-marginal MCMC. The second approach I
will discuss, parallel predictive prefetching, uses subsets of data to
parallelize Markov chain Monte Carlo across multiple cores, while
still leaving the target distribution intact. Both methods have
yielded significant gains in wall-clock performance when sampling from
posterior distributions over millions of data points.