Learn more about using open source R for big data analysis, predictive modeling, data science and more from the staff of Revolution Analytics.

November 21, 2013

R and Bayesian Statistics

by Joseph Rickert

Drew Linzer, the Bayesian statistician who attracted considerable attention last year with his spot-on, R-based forecast of the 2012 presidential election, recently gave a tutorial on Bayesian statistics to the Bay Area useR Group (BARUG). Drew covered quite a bit of ground running R code that showed how to make use of WinBugs, JAGS and Stan, the major engines for specifying and solving Bayesian models. With this very helpful introduction to Bayesian thinking fresh in my mind I thought that it would be worth a look at how well R and computational Bayesian statistics are getting along. After all, they practically grew up together. The modern era of Bayesian computation began with the “rediscovery” of MCMC algorithms in the early 1990s, and R became an open source project in 1995. The logical place to start, the CRAN Task View for Bayesian Inference, states that:

Applied researchers interested in Bayesian statistics are increasingly attracted to R because of the ease of which one can code algorithms to sample from posterior distributions as well as the significant number of packages contributed to the Comprehensive R Archive Network (CRAN) that provide tools for Bayesian inference.

My bet is that the reason that Bayesian task view lists 7 packages for general model fitting and over 80 packages for for specific Bayesian models and methods is because there are quite a few Bayesian statisticians are working in R with only a relatively small number are submitting packages to CRAN.

However, even with all of this custom Bayesian coding it is probably safe to say that in the much of the computational heavy lifting for Bayesian analysis is accomplished through specialized Bayesian engines for which Drew provided examples.

WinBUGS is a version of the original BUGS (Bayesian Inference Using Gibbs Sampling) that began as a project MRC Biostatistics Unit, Cambridge in 1989. WinBUGS is free, stable software that may be accessed via the R2WinBUGS R package. Although WinBUGS still sees quite a bit of use, all of the BUGS development these days is centered around the open source (GPL2) OpenBUGS software that runs on Unix and Linux in addition to Windows. The R2OpenBUGS package provides access to OpenBugs in a similar way that R2WinBUGS interfaces to WinBUGS. The BRugs package provides tighter integration to OpenBUGS. Its interactive interface allows for completing some housekeeping tasks such as checking convergence without having to exit the BUGS software.

JAGS (Just Another Gibbs Sampler) is a Bayesian engine that was designed to work with R. It is open source (GPL 2) runs on multiple operating systems, is extensible and allows users to write their own functions. The language JAGS uses to specify Bayesian models is a variation of the basic BUGS language. R users access JAGS through the rjags package and may used the coda package to analyze the MCMC results.

The newest Bayesian engine, Stan (named after Stanislaw Ulam of Monte Carlo fame), is a open source software (GPL 3) that was designe to be faster and handle hierarchical models that are out of reach for both OpenBugs and JAGS. Stan runs on various versions of Windows, Mac OS X and Linux. According to the Stan manual: “A Stan program is first compiled to a C++ program by the Stan compiler stanc, then the C++ program compiled to a self-contained platform-specific executable.” Stan’s language for expressing the Bayesian models and its method computing posteriors (No-U-Turn sampler, a variant of Hamiltonian Monte Carlo) are also different from the BUGS. R users access Stan through the RStan package.

If you are new to Bayesian Computing Drew’s sample code is a good place to start. Here are three lines of code to do a Bayesian regression that may open up an new way of thinking about regression inference for you.

Now instead of a point estimate and a p-value for a coefficient you have an entire distribution to consider.

A very nice expositional touch in Drew’s tutorial is the use of RStudio’s manipulate package to create a slider for allowing the one to see how changing the shape parameters of the Beta prior distribution affect the estimate of the posterior distribution. I think this intuition building tools is very helpful.

Compares JAGS, STAN and LaplacesDemon with code examples for all of the three approaches. LaplacesDemon is actually a really cool package, a pure R approach to Bayesian modeling. http://www.bayesian-inference.com/software