Tuesday, April 28, 2015

Informed priors for Bayesian comparison of two groups

The BEST programs, for Bayesian estimation of two groups, were written with generic vague priors only minimally informed by the scale of the data. Here are new versions of the programs that are better suited for specifying informed priors.

A little background: Suppose we have two groups of metric data. In each group the data seem to be unimodal and symmetrically distributed, but perhaps with outliers relative to a normal distribution. So we chose to describe the groups with possibly heavy-tailed t distributions, but using the same normality (df) parameter for both groups because outliers are rare (and it's the outliers that most influence the normality parameter). Thus there are five parameters to be estimated: μ1, σ1, μ2, σ2, and the normality parameter ν.

The original BEST program put simplistic broad priors on the five parameters. For example, σ1 was given a broad uniform prior extending across a range that far exceeded any plausible estimate for the scale of the data. It was explained how to make basic modifications of the prior in Appendix B of the article describing the programs. But the programs really were not intended for easy expression of informed priors. In the post I present a version of the BEST programs that are much easier for expression of informed priors.

In the new version, each parameter is given its own line of code for expressing its prior, so that each can be individually set.

Each μ is given a normal prior with its own (constant) mean and standard deviation.

Each σ is given a gamma prior with its own (constant) shape and rate. The shape and rate are derived from the desired mode and standard deviation of the gamma distribution.

The normality parameter ν is given a gamma prior with its own (constant) shape and rate. The shape and rate are derived from the desired mean and standard deviation of the gamma distribution.

Pages 236-239 of DBDA2E explain the gamma distribution and formulas for converting the desired mode or mean and standard deviation to corresponding shape and rate values. In particular, it defines the R functions gammaShRaFromModeSD and gammaShRaFromMeanSD that convert desired mean or mode and SD to corresponding shape and rate parameters.

The JAGS model specification is shown below. To understand it, note that the metric data values are y[i] and the corresponding group membership is specified by x[i].

But it doesn't often make sense to just arbitrary put strong constraints in the prior. Instead, and in general, we would like the prior to resemble a posterior from previous data, starting with a vague proto-prior. Instead of relying on intuition to express an informed prior directly on the parameters, we start with a vague proto-prior and enter some reasonable data that instantiate our prior knowledge. Then we run the Bayesian analysis on the data and observe the resulting posterior distribution. It is the posterior distribution that is now the informed prior for subsequent data analysis. From the posterior distribution of the prior data, just "read off" the mode and sd of the marginal posterior on σ and other parameters. In the program, the MCMC chain is in the matrix mcmcChain, so for example, we can get at the mean and sd of the nu parameter using
mean( mcmcChain[,"nu"] ) and sd( mcmcChain[,"nu"] ).

The full scripts are listed below:

BESTgamma.R:

# MODIFIED 4/28/15 WITH GAMMA PRIOR ON SIGMA
# John K. Kruschke
# johnkruschke@gmail.com
# http://www.indiana.edu/~kruschke/BEST/
#
# This program is believed to be free of errors, but it comes with no guarantee!
# The user bears all responsibility for interpreting the results.
# Please check the webpage above for updates or corrections.
#
### ***************************************************************
### ******** SEE FILE BESTgammaExample.R FOR INSTRUCTIONS **************
### ***************************************************************

#==============================================================================
Script to run after above file is sourced:

### Make sure that the following programs are all### in the same folder as this file:### BESTgamma.R### plotPost.R### HDIofMCMC.R### HDIofICDF.R### openGraphSaveGraph.R### Make sure that R's working directory is the folder in which those ### files reside. In RStudio, use menu tabs Tools -> Set Working Directory.### If working in R, use menu tabs File -> Change Dir.

# Get the functions loaded into R's working memory:source("BESTgamma.R")

# Specify data as vectors: # (Replace with your own data as needed. R can read many formats of data files,# see the commands "scan" or "read.table" and its variants.)y1 = c(101,100,102,104,102,97,105,105,98,101,100,123,105,103,100,95,102,106, 109,102,82,102,100,102,102,101,102,102,103,103,97,97,103,101,97,104, 96,103,124,101,101,100,101,101,104,100,101)y2 = c(99,101,100,101,102,100,97,101,104,101,102,102,100,105,88,101,100, 104,100,100,100,101,102,103,97,101,101,100,101,99,101,100,100, 101,100,99,101,100,102,99,100,99)

4 comments:

I am very sorry to be directly asking for help on your blog ... It would be wonderful if you could help.

It's the first time I'm doing MCMC (the first time using R, even ;-)) and I have problems getting it to work.

I want to do a comparison of two groups, so I am using BEST. My likelihood function should be gamma (I'm generating the data, for now), and if I'm correct I should also specify a gamma prior on the mean for the posterior to be gamma (the prior on the standard deviation I am leaving uniform).

Even though the syntax is working now, the results - even for prior_only - look very wrong. And I have difficulties adapting the code for plotting (which comes with BEST) for a gamma prior/posterior.

It would be great if you could have a look at the code below (basically based on BEST but adapted) and tell if it's basically correct - I think it can't really be as the prior mu's are zero. Also I'd like to ask if you have code for plotting the results for MCMC with gamma - but of course this only really matters if I can get it basically correct :-)

I've only glanced at your code and it seems basically okay, but I haven't studied it closely. One suggestion: For debugging, try data N around 100, not 40000! It'll take a long time to churn through 2*40000 data points.

Thank you very much! Yes, I've been using that post for reference too :-)But because of what I'd like to show, I'd prefer an informed prior distribution, not uniform - actually, best would be - like you use in BEST - one common prior for the mean for both groups.

But if you say it looks basically ok, i will try debugging!

For plotting the results, I should probably refer to that blog post and try to adapt it to two groups I guess ...