Other sites

Rejection Sampling

An interesting sampling method that was covered briefly in my Bayesian statistics course was rejection sampling. Since I have nothing better to do, I thought it would be fun to make an acceptance-rejection algorithm using R. FUN!

The Rejection Sampling method is usually used to simulate data from an unknown distribution. To do this one samples from a distribution that covers the suport of the unknown distribution and use certain criteria for accepting/rejecting the sampled values. One way to do this is as follows (Rice, p 92).

Where M(x) is a function such that M(x) ≥ ƒ(x) on [a,b].To keep things simple for myself I will be simulating a Beta distribution with parameters 6 and 3 (ƒ). To do this I will sample T’s from a scaled uniform distribution (M), and reject sampled values where M(T)*U ≥ ƒ(T). In a plot of the beta distribution with parameters 6 and 3 we can see that the ƒ(x) never goes above 3. For this reason I chose to scale the uniform distribution M by multiplying it by 3.Here is the R code to implement rejection sampling for 100,000 observations in this example.sample.x = runif(100000,0,1)accept = c()

We can plot the results along with the true distribution with the following code.hist(T[,1][T$accept=='Yes'], breaks = seq(0,1,0.01), freq = FALSE, main = 'Histogram of X', xlab = 'X')lines(x, dbeta(x,6,3))

With 100,000 observations sampled, the data fit very well.We can look at the densities of both the accepted and rejected values to get an idea of what’s going on.library(ggplot2)print(qplot(sample.x, data = T, geom = 'density', color = accept))

Looking at a stacked histogram of all the sampled values together we can really see how much wasted data there are in this example.print(qplot(sample.x, data = T, geom = 'histogram', fill = accept, binwidth=0.01))

In fact, when I ran this example I got 33,114 accepted values and 66,886 rejected values. I probably could have chosen a better value than 3 to scale the uniform distribution, but ideally rejection sampling uses a known distribution that is only slightly different from the unknown distribution we’re trying to estimate.