Invited talk abstracts

We report on our practical experience combining particle MCMC with adaptive Monte Carlo in nonlinear mixed-effect models for longitudinal studies. Our motivating application is a population pharmacokinetic/pharmacodynamic study of glucose/insulin regulation, but the same conditional independence structure arises well beyond pharmacology: forestry, dairy science, waste-water treatment… We show how to introduce adaptive Monte Carlo to improve mixing in these intricate models with strongly nonlinear dynamics. We exploit the conditional independence structure to considerably reduce the dimension of the parameter space on which the proposal covariance matrix is learned, further improving mixing. We also show how this allows for exact proposals on some of the hyperparameters, alternating Particle Metropolis-Hastings and Gibbs sampling. We warn against an easy-to-overlook mistake when using adaptive MC on a constrained domain (e.g. positivity). Finally, we study which parts of the inference can be parallelized, especially on a GPU, and outline some practical caveats concerning random number generation and ease of implementation.

Joint work with Arnaud Doucet and Gareth W. Peters.
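The constrained-domain pitfall mentioned above can be made concrete with a small sketch. This is an editor's illustration, not the speakers' implementation: a minimal adaptive random-walk Metropolis sampler for a positive parameter, run on the log scale so proposals never leave the domain. The target density, function names, and tuning constants are all assumptions chosen for the example; the point is the Jacobian term that is easy to forget when reparameterizing a constrained parameter.

```python
import math
import random

def log_target(theta):
    # Illustrative positive-valued target: Gamma(shape=2, rate=1), up to a constant.
    return math.log(theta) - theta

def adaptive_mh_positive(n_iter=5000, seed=1):
    """Adaptive random-walk Metropolis for a parameter constrained to (0, inf).

    The chain runs on phi = log(theta), so proposals never leave the domain.
    The Jacobian of the transform contributes a +phi term to the log-density;
    forgetting it silently targets the wrong distribution -- exactly the kind
    of easy-to-overlook mistake on constrained domains.
    """
    rng = random.Random(seed)
    phi = 0.0                        # current state on the log scale
    mean, m2, scale = 0.0, 1.0, 1.0  # running moments for adaptation
    samples = []
    for i in range(1, n_iter + 1):
        prop = phi + rng.gauss(0.0, scale)
        cur = log_target(math.exp(phi)) + phi    # "+ phi" is the Jacobian term
        new = log_target(math.exp(prop)) + prop
        if rng.random() < math.exp(min(0.0, new - cur)):
            phi = prop
        samples.append(math.exp(phi))
        # Welford update of the empirical variance of phi, then rescale the
        # proposal (diminishing adaptation; 2.38 * sd is the classic choice).
        delta = phi - mean
        mean += delta / i
        m2 += delta * (phi - mean)
        if i > 100:
            scale = 2.38 * math.sqrt(m2 / i) + 1e-6
    return samples
```

Dropping the `+ phi` terms leaves the code running without error while sampling from a distribution whose positive-scale density has been silently multiplied by 1/theta, which is what makes the mistake hard to spot.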

Natively probabilistic computation: principles and applications

Vikash Mansinghka, Navia Systems

Complex probabilistic models and Bayesian inference are becoming increasingly critical across science and industry, especially in large-scale data analysis. They are also central to our best computational accounts of human cognition, perception and action. However, all these efforts struggle with the infamous curse of dimensionality. Rich probabilistic models can seem hard to write and even harder to solve, as specifying and calculating probabilities often appears to require the manipulation of exponentially (and sometimes infinitely) large tables of numbers.

We argue that these difficulties reflect a basic mismatch between the needs of probabilistic reasoning and the deterministic, functional orientation of our current hardware, programming languages and CS theory. To mitigate these issues, we have been developing a stack of abstractions for natively probabilistic computation, based around stochastic simulators (or samplers) for distributions (especially posterior simulation), rather than evaluators for deterministic functions. Ultimately, our aim is to build and deploy machines for which Bayesian inference by stochastic simulation is as natural as logic and arithmetic are for current microprocessors.

In this talk, I will briefly describe two systems we have built using these new tools:

1. Veritable, a probabilistic table, based on a novel nonparametric Bayesian
probabilistic program and fast stochastic inference.

I will also discuss the challenges involved in analyzing the computational complexity of probabilistic programs and in reliably synthesizing provably tractable programs. I will touch on both new theoretical results and some reinterpretations of recent theory that are, to the best of our knowledge, new to the Bayesian statistics community.

Gaussian processes and compactly supported correlation functions

Derek Bingham, Simon Fraser University

Computer simulators are often used to study real-world processes that are too difficult to observe directly. Experimenters often want to build a statistical surrogate for the computer model to avoid repeatedly running the simulator at different input settings. Building an emulator with standard Gaussian process models can be computationally infeasible when the number of evaluated input values is large. As an alternative, we propose using compactly supported correlation functions, which produce sparse correlation matrices that are easier to manipulate. Following the usual approach of taking the correlation to be a product of correlations in each input dimension, we show how to impose restrictions on the correlation range for each input, giving sparsity, while still allowing the ranges to trade off against one another, thereby giving good predictive performance when the data are anisotropic. Issues related to Bayesian inference and the implementation of MCMC for large data sets will be discussed. As an illustration, the method is used to construct an emulator of photometric redshifts of cosmological objects.

This is joint work with Salman Habib, Katrin Heitman (Los Alamos National Lab)
and Cari Kaufman (UC Berkeley).
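A small sketch may help make the sparsity mechanism concrete. This is an editor's illustration under stated assumptions, not the authors' code: it uses the Wendland phi_{3,1} function as the one-dimensional compactly supported correlation (one valid choice among several) and stores the product-form correlation matrix as a dictionary of nonzero entries.

```python
import math

def wendland1(d, r):
    """Wendland phi_{3,1} correlation: exactly zero once the distance d reaches r."""
    t = d / r
    if t >= 1.0:
        return 0.0
    return (1.0 - t) ** 4 * (4.0 * t + 1.0)

def sparse_correlation(X, ranges):
    """Product-form correlation matrix stored as a dict of nonzero entries.

    With a compactly supported correlation in each input dimension, the
    product vanishes as soon as any coordinate distance exceeds its range,
    so short ranges yield a very sparse matrix.
    """
    n = len(X)
    R = {}
    for i in range(n):
        for j in range(i, n):
            c = 1.0
            for xd, yd, r in zip(X[i], X[j], ranges):
                c *= wendland1(abs(xd - yd), r)
                if c == 0.0:
                    break          # no need to look at the remaining dimensions
            if c != 0.0:
                R[(i, j)] = c
                R[(j, i)] = c
    return R
```

For example, `sparse_correlation([[0.0], [0.1], [0.9]], [0.5])` stores no entry for the pair (0, 2), since the inputs are further apart than the range 0.5; in a real emulator the resulting sparse matrix would be handed to a sparse Cholesky routine.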

MCMC for Bayesian Nonparametric Models: A Bag of Tricks

Markov chain Monte Carlo (MCMC) sampling forms the most popular class of inference methods for Bayesian nonparametric models based on Dirichlet processes, Indian buffet processes, and related priors. There are good reasons why: there is a huge variety of very powerful techniques; they can be used together; they often do not require solving complex equations and integrals; and they are guaranteed to converge to the distribution of interest. In this talk, I will describe the bag of tricks that has helped me design MCMC samplers, and discuss some open questions related to inference in Bayesian nonparametric models.
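One of the standard tricks alluded to, marginalizing out the mixture weights and cluster parameters, can be shown in miniature. The following is an editor's sketch, not the speaker's code: a collapsed Gibbs sampler for a Dirichlet process mixture of Bernoullis with a Beta(a, b) base measure, where each observation is removed from its cluster and reseated according to the Chinese restaurant process weighted by each cluster's posterior predictive.

```python
import random

def crp_gibbs_bernoulli(x, alpha=1.0, a=1.0, b=1.0, n_sweeps=200, seed=0):
    """Collapsed Gibbs sampling for a DP mixture of Bernoullis.

    Mixture weights and per-cluster Bernoulli parameters are integrated out,
    so the state is just the partition z. Reseating probabilities combine the
    CRP prior (cluster size, or alpha for a new cluster) with the Beta-Bernoulli
    posterior predictive of each cluster.
    """
    rng = random.Random(seed)
    z = [0] * len(x)                 # start with all observations in one cluster
    counts = {0: len(x)}             # cluster sizes
    ones = {0: sum(x)}               # number of 1s in each cluster
    for _ in range(n_sweeps):
        for i, xi in enumerate(x):
            k = z[i]                 # remove observation i from its cluster
            counts[k] -= 1
            ones[k] -= xi
            if counts[k] == 0:
                del counts[k]
                del ones[k]
            labels, weights = [], []
            for c in counts:
                # Beta-Bernoulli posterior predictive of cluster c for xi
                p1 = (a + ones[c]) / (a + b + counts[c])
                labels.append(c)
                weights.append(counts[c] * (p1 if xi else 1.0 - p1))
            labels.append(max(counts, default=-1) + 1)   # fresh cluster label
            weights.append(alpha * (a / (a + b) if xi else b / (a + b)))
            k = rng.choices(labels, weights)[0]          # reseat observation i
            z[i] = k
            counts[k] = counts.get(k, 0) + 1
            ones[k] = ones.get(k, 0) + xi
    return z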

Distributed Gibbs sampling for Topic Models and Bayesian Networks

The exponential growth of data has prompted the need for scalable machine learning. We have studied the possibility of performing efficient Gibbs sampling in the face of very large datasets. To achieve that goal we have explored two orthogonal ideas: 1) distributed Gibbs sampling – both synchronous and asynchronous, 2) adaptive Gibbs updates that avoid the need to look at every possible assignment. These techniques are tested on topic models and more general Bayesian networks, both parametric and non-parametric. We show perplexity and speedup results on a number of datasets: NIPS, NEWSGROUPS, NYT, WIKIPEDIA and MEDLINE, the latter containing over 700M tokens. The main conclusions are that 1) distributed inference, while no longer exact, has no noticeable negative effect on perplexity relative to exact Gibbs sampling but may lead to considerable speedups – up to 700 times if we distribute over 1000 processors, 2) both speedup techniques can be easily combined to lead to further speedups, 3) the proposed techniques seem applicable to a wide range of Bayesian network models.

Joint work with Arthur Asuncion, David Newman and Padhraic Smyth.
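The synchronous distributed scheme can be sketched in a few dozen lines. This is an editor's illustration of the general AD-LDA-style idea under stated assumptions, not the authors' system: documents are partitioned over P simulated processors, each processor runs collapsed Gibbs for LDA against a stale local copy of the topic-word counts, and the copies are merged synchronously after every sweep. That merge is exactly the approximation that makes the chain no longer an exact Gibbs sampler.

```python
import random

def adlda(docs, V, T=2, alpha=0.1, beta=0.1, P=2, n_sweeps=20, seed=0):
    """Sketch of synchronous distributed collapsed Gibbs sampling for LDA.

    Each of P simulated processors sweeps its shard of documents using a
    stale copy of the global topic-word counts; the copies' increments are
    then folded back into the global counts.
    """
    rng = random.Random(seed)
    z = [[rng.randrange(T) for _ in doc] for doc in docs]  # topic assignments
    ndk = [[0] * T for _ in docs]           # per-document topic counts
    nkw = [[0] * V for _ in range(T)]       # global topic-word counts
    nk = [0] * T                            # global topic totals
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            k = z[d][n]
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    shards = [list(range(p, len(docs), P)) for p in range(P)]
    for _ in range(n_sweeps):
        # each (simulated) processor gets a stale copy of the global counts
        local = [([row[:] for row in nkw], nk[:]) for _ in range(P)]
        for p, shard in enumerate(shards):
            lkw, lk = local[p]
            for d in shard:
                for n, w in enumerate(docs[d]):
                    k = z[d][n]
                    ndk[d][k] -= 1; lkw[k][w] -= 1; lk[k] -= 1
                    weights = [(ndk[d][t] + alpha) * (lkw[t][w] + beta)
                               / (lk[t] + V * beta) for t in range(T)]
                    k = rng.choices(range(T), weights)[0]
                    z[d][n] = k
                    ndk[d][k] += 1; lkw[k][w] += 1; lk[k] += 1
        # synchronous merge: add every processor's increments back in
        for t in range(T):
            base = nk[t]
            nk[t] += sum(local[p][1][t] - base for p in range(P))
            for w in range(V):
                basew = nkw[t][w]
                nkw[t][w] += sum(local[p][0][t][w] - basew for p in range(P))
    return z, nkw, nk
```

Because each token belongs to exactly one shard, the merged counts are consistent with the final assignments even though each processor sampled against stale counts during the sweep.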

MCMC in Probabilistic Databases and Cluster Computing

Andrew McCallum, UMass Amherst

Incorporating uncertainty and probabilistic inference into large-scale databases has posed many challenges, often leading to systems that sacrifice modeling power or scalability, or that restrict the class of supported queries. We aim to strike a good balance among these considerations with an approach in which the underlying relational database represents a single possible world, an imperatively defined factor graph [1] encodes a distribution over the set of possible worlds, and Markov chain Monte Carlo is used for inference [2]. I will describe experiments that operate on hundreds of thousands of entities and relations from both FreeBase and five years of NYTimes articles. I will also discuss methods for distributed MCMC inference on the problem of entity disambiguation for 25k NYTimes person mentions using a cluster of 50 machines.
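The possible-worlds view of MCMC can be illustrated in miniature. The following is an editor's toy sketch, not the described system: the "world" is a boolean assignment to a handful of facts, a proposal flips one fact, and only the factors touching that fact are re-scored. This locality is what keeps each MCMC step cheap even when the current world lives in a large relational database; the ring of agreement potentials here is purely illustrative.

```python
import math
import random

def mh_over_worlds(n_vars=10, n_steps=2000, seed=0):
    """Metropolis-Hastings over 'possible worlds', in miniature.

    A world is a boolean assignment to n_vars facts; factors are toy
    agreement potentials on a ring (+1 if neighbouring facts agree, -1
    otherwise). Each proposal flips one fact, and only the two factors
    touching it are re-scored to compute the acceptance ratio.
    """
    rng = random.Random(seed)
    world = [False] * n_vars

    def local_score(i, value):
        # score of the two factors that touch fact i, for a candidate value
        left, right = world[(i - 1) % n_vars], world[(i + 1) % n_vars]
        return sum(1.0 if value == v else -1.0 for v in (left, right))

    for _ in range(n_steps):
        i = rng.randrange(n_vars)
        delta = local_score(i, not world[i]) - local_score(i, world[i])
        if rng.random() < math.exp(min(0.0, delta)):
            world[i] = not world[i]   # accept the flip
    return world
```

In the database setting the flip becomes a local update to a few tuples, and the delta-scoring pattern is the same: the acceptance ratio depends only on the factors whose arguments changed.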