Advances in Markov chain Monte Carlo methods

Abstract

Probability distributions over many variables occur frequently in Bayesian inference, statistical physics and simulation studies. Samples from distributions give insight into their typical behavior and can allow approximation of any quantity of interest, such as expectations or normalizing constants. Markov chain Monte Carlo (MCMC), introduced by Metropolis et al. (1953), allows sampling from distributions with intractable normalization, and remains one of most important tools for approximate computation with probability distributions. While not needed by MCMC, normalizers are key quantities: in Bayesian statistics marginal likelihoods are needed for model comparison; in statistical physics many physical quantities relate to the partition function. In this thesis we propose and investigate several new Monte Carlo algorithms, both for evaluating normalizing constants and for improved sampling of distributions. Many MCMC correctness proofs rely on using reversible transition operators; often these operators lead to slow diffusive motion resembling a random walk. After reviewing existing MCMC algorithms, we develop a new framework for constructing non-reversible transition operators that allow more persistent motion. Next we explore and extend MCMC-based algorithms for computing normalizing constants. We compare annealing, multicanonical and nested sampling, giving recommendations for their use. We also develop a new MCMC operator and Nested Sampling approach for the Potts model. This demonstrates that nested sampling is sometimes better than annealing methods at computing normalizing constants and drawing posterior samples. Finally we consider “doubly-intractable” distributions with extra unknown normalizer terms that do not cancel in standard MCMC algorithms. We propose using several deterministic approximations for the unknown terms, and investigate their interaction with sampling algorithms. We then develop novel exact-sampling-based MCMC methods, the Exchange Algorithm and Latent Histories. For the first time these algorithms do not require separate approximation before sampling begins. Moreover, the Exchange Algorithm outperforms the only alternative sampling algorithm for doubly intractable distributions.