Slice sampling

Radford M. Neal

Abstract

Markov chain sampling methods that adapt to characteristics of the distribution being sampled can be constructed using the principle that one can sample from a distribution by sampling uniformly from the region under the plot of its density function. A Markov chain that converges to this uniform distribution can be constructed by alternating uniform sampling in the vertical direction with uniform sampling from the horizontal "slice" defined by the current vertical position, or more generally, with some update that leaves the uniform distribution over this slice invariant. Such "slice sampling" methods are easily implemented for univariate distributions, and can be used to sample from a multivariate distribution by updating each variable in turn. This approach is often easier to implement than Gibbs sampling and more efficient than simple Metropolis updates, due to the ability of slice sampling to adaptively choose the magnitude of changes made. It is therefore attractive for routine and automated use. Slice sampling methods that update all variables simultaneously are also possible. These methods can adaptively choose the magnitudes of changes made to each variable, based on the local properties of the density function. More ambitiously, such methods could potentially adapt to the dependencies between variables by constructing local quadratic approximations. Another approach is to improve sampling efficiency by suppressing random walks. This can be done for univariate slice sampling by "overrelaxation," and for multivariate slice sampling by "reflection" from the edges of the slice.
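The single-variable procedure described above can be written in a few lines. The sketch below is an illustrative implementation, not Neal's reference code: the target f is any unnormalized density, and the bracket width w is an assumed tuning parameter.

```python
import math
import random

def slice_sample(x0, f, w=1.0, n=1000, rng=random):
    """Univariate slice sampling with stepping out and shrinkage.

    f : unnormalized target density (must be positive at x0)
    w : initial bracket width (a tuning guess, not critical)
    """
    samples, x = [], x0
    for _ in range(n):
        y = rng.uniform(0.0, f(x))            # vertical draw: y ~ U(0, f(x))
        # step out: place an interval of width w at random around x,
        # then expand each end until it lies outside the slice
        left = x - rng.uniform(0.0, w)
        right = left + w
        while f(left) > y:
            left -= w
        while f(right) > y:
            right += w
        # shrink: draw uniformly from the bracket, narrowing it on rejection
        while True:
            x1 = rng.uniform(left, right)
            if f(x1) > y:
                x = x1
                break
            if x1 < x:
                left = x1
            else:
                right = x1
        samples.append(x)
    return samples
```

For a standard normal target, `slice_sample(0.0, lambda x: math.exp(-0.5 * x * x), w=2.0)` yields draws whose sample mean and variance approach 0 and 1 as the chain grows.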

and Schmeiser (1998). For single-variable slice sampling, the variation of slice sampling proposed by Neal operates analogously to Gibbs sampling in the sense that to obtain the next point x1, y is generated from the conditional distribution [y|x0] given the current point x0 and then x1 is drawn from [x|y]. Both [y|x0] and [x|y] are uniform distributions. Since the closed form of the support of [x|y] is not available, sampling directly from [x|y] is not possible. A clever development is Neal's sophisticated (but relatively expensive) sampling procedure to generate x1 from the "slice" S = {x : y < f(x)}. In Chen and Schmeiser (1998), we proposed random-direction interior point (RDIP), a general sampler designed to be "black box" in the sense that the user need not tune the sampler to the problem. RDIP samples from the uniform distribution defined over the region U below the surface defined by f(x). Both slice sampling and RDIP require evaluations of f(x). Slice sampling, however, can be more expensive than RDIP because slice sampling requires evaluating f(x) more than once per iteration. The intention of RDIP's design is to use as much free information as possible. For the high-dimensional case, the hyperrectangle idea in slice sampling could be inefficient. For example, suppose f(x) is the bivariate normal density with a high correlation. Then the hyperrectangle idea essentially mimics the Gibbs sampler, which suffers slow convergence; see Chen and Schmeiser (1993) for a detailed discussion. Aligning the hyperrectangles (or ellipses) to the shape of f(x), along the lines of Kaufman and Smith (1998), seems like a good idea. As Neal mentions, the computational efficiency of our "black-box" sampler RDIP depends on the normalization constant. Our goal was to be automatic and reasonably efficient, rather than to tune the sampler to the problem. If, however,
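When the slice S = {x : y < f(x)} does have a closed form, the [y|x0]/[x|y] alternation described above can be carried out exactly, with no stepping out or rejection. A minimal sketch, assuming an Exp(1) target f(x) = exp(-x) for illustration, whose slice is simply the interval (0, -log y):

```python
import math
import random

def exact_slice_gibbs(n=5000, rng=random):
    """Idealized slice sampler for f(x) = exp(-x), x > 0.

    Here the slice {x : y < exp(-x)} is exactly (0, -log y),
    so both conditional draws are uniform and available in closed form.
    """
    xs, x = [], 1.0
    for _ in range(n):
        y = rng.uniform(0.0, math.exp(-x))   # [y | x]: uniform under the curve
        x = rng.uniform(0.0, -math.log(y))   # [x | y]: uniform over the slice
        xs.append(x)
    return xs
```

The resulting draws have sample mean near 1, the mean of Exp(1). For most targets of interest the slice has no such closed form, which is what motivates both Neal's stepping-out procedure and samplers such as RDIP.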

Rosenthal (1999). Generalizing just a little from the setting described in Section 4 of the paper, suppose that our target density can be written as

the methodology of Roberts and Tweedie (2000). As an illustration of these results, it can be shown that, for the case where f0 is constant (i.e., in the single-variable slice sampler) and f1 is a real-valued log-concave function, 525 iterations suffice for convergence from all starting points x with f1(x) ≥ 0.01 sup_y f1(y). Similar results can be deduced for multidimensional log-concave distributions, but the bounds worsen as dimension increases, reflecting a genuine curse of dimensionality in this problem (despite the fact that this is inherently a two-dimensional Gibbs sampler). To counteract this issue, Roberts and Rosenthal (2002) introduce the polar slice sampler in d dimensions, where f0(x) is chosen

can be constructed in the spirit of Kendall and Møller (2000). Reversibility of the slice sampler allows easy simulation of these processes backwards in time to identify the starting point of the maximal and minimal chains. The beauty of the perfect slice sampling construction relies on the possibility of coupling the maximal and minimal chains even on a continuous state space. This is achieved because, thanks to monotonicity, the minimal horizontal slice (i.e., the one defined by the minimal chain) is always a superset of the maximal horizontal slice. If, when sampling over the minimal horizontal slice, a point is selected that belongs to the intersection of the minimal and the maximal horizontal slices, instantaneous coupling happens. Examples of applications given in Mira, Møller and Roberts (2001) include the Ising model on a two-dimensional grid at the critical temperature and various other automodels. In Casella, Mengersen, Robert and Titterington (2002) a further application of the perfect slice sampler construction to mixtures of distributions is studied.
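The superset property described above can be seen in a small forward simulation. The sketch below is not a full perfect sampler (that would run the coupled chains from the past, as in Kendall and Møller's construction); it only illustrates, for a hypothetical target f(x) = exp(-x) with closed-form slices, how a point drawn from the lower chain's larger slice that also lands in the smaller slice couples the two chains instantaneously:

```python
import math
import random

def coupled_slice_step(x1, x2, rng):
    """One coupled slice-sampling update of two chains for f(x) = exp(-x).

    The chain with the lower density value has the larger horizontal
    slice, a superset of the other chain's slice; a point drawn from it
    that also lands in the smaller slice couples the chains at once.
    """
    lo, hi = (x1, x2) if math.exp(-x1) <= math.exp(-x2) else (x2, x1)
    u = rng.random()                        # shared vertical uniform
    s_lo = -math.log(u * math.exp(-lo))     # right end of the larger slice
    s_hi = -math.log(u * math.exp(-hi))     # right end of the smaller slice
    x_new = rng.uniform(0.0, s_lo)          # uniform over the larger slice
    if x_new < s_hi:                        # lands in the intersection:
        return x_new, x_new, True           # instantaneous coupling
    return x_new, rng.uniform(0.0, s_hi), False
```

Iterating this update from two widely separated starting points, the chains typically coalesce within a handful of steps and remain equal thereafter.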

to Fill (1998). A further improvement of the algorithm in Mira, Møller and Roberts (2001) allows the use of read-once random numbers as introduced in

which appear in Damien, Wakefield and Walker (1999). I took Neal's example in Section 8 and used a many-variable slice sampler on it. In fact I took a latent variable for each data point, the idea criticized by Neal. The overall joint density is given by