Saturday, December 6, 2014

Likelihood Ratios from Statistically Significant Studies

In the previous post, I reacted to an old Black Belt Bayesian post about p-values.

Since then, there has been more discussion of that post in the LA LessWrong group. Scott Garrabrant pointed out that the likelihood ratios implied by p-values are far weaker than he had naively intuited. I think I was making the same mistake before reading BBB, and I think it's an important and common one.

How much should we shift our belief when we see a p-value around 0.05 (so, just barely passing the standard for statistical significance)?

The p-value is defined as the probability that a test statistic would be as great as or greater than the one observed, assuming the null hypothesis is true.
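To make the definition concrete, here is a minimal sketch that assumes the test statistic is a z-score whose null distribution is standard normal (an assumption for illustration; real tests use whatever distribution fits the statistic). The "as great or greater" wording corresponds to a one-sided tail; the common two-sided version counts extreme values in both directions.

```python
import math

def p_value_z(z: float, two_sided: bool = True) -> float:
    """p-value for a z-statistic, assuming Z ~ N(0, 1) under the null hypothesis."""
    if two_sided:
        # P(|Z| >= |z|) = erfc(|z| / sqrt(2))
        return math.erfc(abs(z) / math.sqrt(2.0))
    # One-sided upper tail: P(Z >= z) = erfc(z / sqrt(2)) / 2
    return 0.5 * math.erfc(z / math.sqrt(2.0))

print(p_value_z(1.96))                   # two-sided, close to 0.05
print(p_value_z(1.96, two_sided=False))  # one-sided, close to 0.025
```

A z-score of 1.96 is the familiar cutoff that lands a two-sided test right at the 0.05 threshold.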

The most common mistake is to confuse P(observation | hypothesis) with P(hypothesis | observation), naively treating the p-value as the probability that the null hypothesis is true. This is wrong; don't do it. (David Manheim, also from the Los Angeles LessWrong group, pointed us to this article.)
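To see how far apart the two conditionals can be, here is a sketch with made-up numbers: suppose, hypothetically, that 90% of the null hypotheses being tested are true, tests use alpha = 0.05, and power is 0.8. Bayes' theorem then gives the probability that the null is true given a significant result. Conditioning on "p < alpha" rather than on the exact p-value is a simplification chosen for illustration.

```python
def prob_null_given_significant(prior_null: float, alpha: float, power: float) -> float:
    """P(H0 | significant) via Bayes' theorem.

    P(significant | H0) = alpha, P(significant | H1) = power (both assumed inputs).
    """
    numerator = alpha * prior_null
    denominator = numerator + power * (1.0 - prior_null)
    return numerator / denominator

# Hypothetical inputs: 90% of tested nulls are true, alpha = 0.05, power = 0.8.
print(prob_null_given_significant(0.9, 0.05, 0.8))
```

Under these assumed inputs, P(significant | H0) is 0.05 but P(H0 | significant) is 0.36, so reading the former as the latter badly overstates the evidence.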

But if that's not the correct conclusion to draw, what is?

The Bayesian answer is the Bayes factor, which measures the strength of evidence for one hypothesis H1 versus another H2 as the likelihood ratio P(obs | H1) / P(obs | H2). If we combine this with a prior probability for each hypothesis, P(H1) and P(H2), we can compute our posterior P(H1 | obs): the posterior odds equal the prior odds times the Bayes factor. For example, if our prior belief is 50-50 between the two and the likelihood ratio is 1/2, then our posterior should be 1/3 for H1 and 2/3 for H2. (H2 has become comparatively twice as probable.) The advantage of the Bayes factor is that it measures the influence of the evidence on our beliefs objectively, independent of our prior.
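The odds-form update described above takes only a few lines. This minimal sketch assumes H1 and H2 are the only two hypotheses under consideration, so P(H2) = 1 - P(H1); the function name is just for illustration.

```python
def posterior_h1(prior_h1: float, bayes_factor: float) -> float:
    """Posterior P(H1 | obs), given prior P(H1) and BF = P(obs | H1) / P(obs | H2).

    Assumes H1 and H2 are exhaustive, so P(H2) = 1 - P(H1).
    """
    prior_odds = prior_h1 / (1.0 - prior_h1)
    posterior_odds = prior_odds * bayes_factor  # posterior odds = prior odds * BF
    return posterior_odds / (1.0 + posterior_odds)

# The worked example: 50-50 prior, likelihood ratio 1/2.
print(posterior_h1(0.5, 0.5))  # 1/3

# The same Bayes factor applied to different priors gives different posteriors,
# but in every case the odds are multiplied by exactly the same factor.
for prior in (0.1, 0.5, 0.9):
    print(prior, posterior_h1(prior, 0.5))
```

This is why the Bayes factor can be reported on its own: it is the part of the update that doesn't depend on anyone's prior.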

The less common mistake, which both Scott and I had been making, was to treat a p-value as if it were a Bayes factor, so that a statistically significant study (p ≈ 0.05, or 1/20) would shift the odds against the null hypothesis by a ratio of about 1:20.

The formula mentioned by Black Belt Bayesian shows that this is wrong. For a p-value of 0.05, the Bayes factor for the null hypothesis against the alternative is bounded below at about 0.4, which means the odds of the null hypothesis shift by only about 2:5. This is a much smaller shift than the 1:20 I was intuitively applying. (Of course, a lower p-value gives a stronger shift!)
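The formula itself isn't reproduced here, but one published calibration that yields the 0.4 figure is the lower bound of Sellke, Bayarri and Berger (2001): B ≥ -e · p · ln(p) for p < 1/e, which gives about 0.41 at p = 0.05. Assuming that is the formula in question, this sketch compares the bound with the naive "p-value as Bayes factor" reading and shows the resulting lower bound on P(H0 | obs) from a 50-50 prior.

```python
import math

def min_bayes_factor(p: float) -> float:
    """Lower bound on BF01 = P(obs | H0) / P(obs | H1) for a given p-value.

    Uses the -e * p * ln(p) calibration, assumed here to be the formula in question.
    It only applies for p < 1/e; for larger p the bound is capped at 1.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in (0, 1)")
    if p >= 1.0 / math.e:
        return 1.0
    return -math.e * p * math.log(p)

def posterior_null(prior_null: float, bf01: float) -> float:
    # Posterior odds of H0 = prior odds of H0 * Bayes factor (H0 vs H1).
    odds = prior_null / (1.0 - prior_null) * bf01
    return odds / (1.0 + odds)

for p in (0.05, 0.01, 0.001):
    bf = min_bayes_factor(p)
    # Because BF01 >= bf, the posterior of H0 is at least posterior_null(0.5, bf).
    print(f"p={p}: naive factor {p:.3f}, bound {bf:.3f}, "
          f"P(H0 | obs) >= {posterior_null(0.5, bf):.3f}")
```

At p = 0.05 the bound is about 0.41, so even the most favorable reading leaves the null at roughly 29% from a 50-50 start, rather than the 5% the naive 1:20 reading suggests.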

Also notice that this is a minimum: the actual likelihood ratio could be much higher! A higher ratio would be worse news for a scientist trying to reject the null hypothesis. It's even possible that the Bayesian should increase belief in the null hypothesis, if the alternative hypothesis explains the data less well. This can happen when the alternative hypothesis spreads its probability mass very thinly across many possibilities. The Bayes factor is a relative comparison of hypotheses (how well one hypothesis explains the data compared to another), whereas null-hypothesis testing with p-values attempts an absolute measure (rejecting the null hypothesis in absolute terms).
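Here is a sketch of how a thinly spread alternative can end up favoring the null, under assumptions chosen purely for illustration: the observed statistic is z = 1.96 (two-sided p ≈ 0.05), H0 says the effect is exactly 0 so z ~ N(0, 1), and H1 puts a N(0, tau²) prior on the effect measured in standard-error units, so that marginally z ~ N(0, 1 + tau²). The width tau controls how thinly H1 spreads its probability mass. This is the effect behind what is often called Lindley's paradox.

```python
import math

def normal_pdf(x: float, sd: float) -> float:
    # Density of a zero-mean normal with standard deviation sd, evaluated at x.
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def bayes_factor_null(z: float, tau: float) -> float:
    """BF01 = P(z | H0) / P(z | H1) for an observed z-statistic.

    H0: effect = 0, so z ~ N(0, 1).
    H1: effect ~ N(0, tau^2) in standard-error units, so z ~ N(0, 1 + tau^2).
    """
    return normal_pdf(z, 1.0) / normal_pdf(z, math.sqrt(1.0 + tau ** 2))

z = 1.96  # "just significant": two-sided p of about 0.05
for tau in (1.0, 10.0, 100.0):
    print(f"tau={tau:>5}: BF01 = {bayes_factor_null(z, tau):.2f}")
```

With tau = 1 the Bayes factor comes out around 0.54, mildly favoring H1, but with tau = 10 it is already above 1: the same "statistically significant" result now counts as evidence for the null, because a diffuse H1 assigns little probability to any particular outcome, including the one observed.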