A blog on statistics, methods, and open science. Understanding 20% of statistics will improve 80% of your inferences.

Saturday, March 19, 2016

Who do you love most? Your left tail or your right tail?

TL;DR: Don’t like one-sided tests? Distribute your alpha level unequally (e.g., 0.04 vs. 0.01) across the two tails to still benefit from an increase in power.

My two unequal tails in a 0.04/0.01 ratio (picture by my wife).

This is a follow-up to my previous post, where I explained how you can easily become 20% more efficient when you aim for 80% power, by using a one-sided test. The only requirements for this 20% efficiency benefit are that 1) you have a one-sided prediction, and 2) you want to calculate a p-value. It is advisable to pre-register your analysis plan, for many reasons, one being to convince reviewers you planned to do a one-sided test all along. This post is an update for people who responded that they often don't have a one-sided prediction.
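
To see where a figure of roughly 20% can come from, here is a minimal sketch using the normal approximation for a two-group comparison. The function name `n_per_group`, the effect size d = 0.5, and the use of a z-approximation instead of an exact t-test calculation are my assumptions for illustration, so the sample sizes will differ slightly from those of a dedicated power program.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha, power, two_sided):
    """Approximate per-group n for a two-group comparison (z-approximation).

    d is the standardized mean difference (an assumed value for illustration).
    """
    z = NormalDist()
    # A two-sided test splits alpha equally over both tails.
    tail_alpha = alpha / 2 if two_sided else alpha
    z_crit = z.inv_cdf(1 - tail_alpha)
    z_power = z.inv_cdf(power)
    return 2 * ((z_crit + z_power) / d) ** 2

n_two = n_per_group(0.5, 0.05, 0.80, two_sided=True)
n_one = n_per_group(0.5, 0.05, 0.80, two_sided=False)

# Proportion of participants saved by the one-sided test.
saving = 1 - n_one / n_two
print(math.ceil(n_two), math.ceil(n_one), round(saving, 3))
```

Under this approximation the saving does not depend on the effect size, because d cancels out of the ratio; it comes out at about a fifth of the sample.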

First, who would have a negative attitude towards becoming 20% more efficient by using one-sided tests, when appropriate? Neo-Fisherians (e.g., Hurlbert & Lombardi, 2012). These people think error control is bogus, data is data, and p-values are to be interpreted as likelihoods. A p-value of 0.00001 is strong evidence, a p-value of 0.03 is some evidence. If you looked at your data standing on one leg, and then hanging upside down, and because of this you use a Bonferroni-corrected alpha of 0.025 and treat a p-value of 0.03 differently, well, that’s just silly.

I almost fully sympathize with this ‘just let the data speak’ perspective. Obviously, your p-value of 0.03 will sometimes be evidence for the null hypothesis, but I realize the correlation between p-values and evidence is strong enough that it works in practice, even though it is a formally invalid approach to statistical inference.

However, I don’t think you should just let the data speak to
you. You need to use error control as a first line of defense against making a
fool of yourself. If you don’t, you will look at random noise, and think that a
high success rate on erotic pictures, but not on romantic pictures, neutral pictures,
negative pictures, and positive pictures, is evidence of pre-cognition (p = 0.031, see Bem, 2011).
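
A rough calculation shows why uncorrected testing across several categories is risky. This sketch assumes five independent tests, one per picture category listed above, each at an alpha of 0.05. Tests run on the same participants are unlikely to be fully independent, so treat the number as an illustration, not as a reanalysis.

```python
alpha = 0.05
n_tests = 5  # erotic, romantic, neutral, negative, positive

# Probability of at least one false positive when every null is true,
# assuming the tests are independent.
familywise_error = 1 - (1 - alpha) ** n_tests

# Bonferroni correction: test each category at alpha / n_tests instead.
bonferroni_alpha = alpha / n_tests

print(round(familywise_error, 3), bonferroni_alpha)
```

With five chances to find something, the probability of at least one false positive is well above the nominal 5%, and a p-value of 0.031 would not pass the Bonferroni-corrected threshold.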

Now you are free to make an informed choice here. If you think the p = 0.031 is evidence for pre-cognition, multiple comparisons be damned, I’ll happily send you a free neo-Fisherian sticker for your laptop. But I think you care about error control. And it’s not an either-or choice: you can control error rates and then, after you have distinguished the signal from the noise, let the strength of the evidence speak through the likelihood function.

Remember: Type 2 error control, achieved by having high
power, means you will not say there is nothing, when there is something, more than
X% of the time.

Now for the update to my previous post. Even when you want to allow for effects in both directions, you typically care more about missing an effect in one direction than about missing an effect in the opposite direction. That is: you care more about saying there is nothing, when there is something, in one direction than in the other. So if you care about power, you will typically want to distribute your alpha unequally across the two tails.

Rice and Gaines (1994) believe that many researchers would rather deal with an unexpected result in the opposite direction from their original hypothesis by creating a new hypothesis than by ignoring the result as not supporting the original hypothesis. I find this a troublesome approach to theory testing. But their recommendation to distribute alpha levels unevenly across the two tails is valid for anyone who has a two-sided prediction where the importance of effects in the two directions is not equal.

I think in most studies people typically care more about effects in one direction than about effects in the other direction, even when they don't have a directional prediction. Rice and Gaines propose using an alpha of 0.01 for one tail, and an alpha of 0.04 for the other tail, so that the overall alpha level is still 0.05.

I believe that is an excellent recommendation for people who
do not have a directional hypothesis, but would like to benefit from an
increase in power for the result in the direction they care most about.
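
The trade-off can be sketched with a normal approximation. The function name `power_unequal_tails`, the effect size d = 0.5, and the 50 participants per group are my assumptions for illustration; an exact calculation for a t-test would give slightly different numbers.

```python
from statistics import NormalDist

def power_unequal_tails(delta, alpha_lower, alpha_upper):
    """Probability of rejecting the null when the test statistic is
    approximately Normal(delta, 1) and alpha is split over two tails."""
    z = NormalDist()
    lower_crit = z.inv_cdf(alpha_lower)       # reject if statistic < lower_crit
    upper_crit = z.inv_cdf(1 - alpha_upper)   # reject if statistic > upper_crit
    reject_lower = z.cdf(lower_crit - delta)
    reject_upper = 1 - z.cdf(upper_crit - delta)
    return reject_lower + reject_upper

# Assumed scenario: d = 0.5 with 50 participants per group.
delta = 0.5 * (50 / 2) ** 0.5

for label, lo, hi in [("0.025/0.025", 0.025, 0.025), ("0.01/0.04", 0.01, 0.04)]:
    favored = power_unequal_tails(delta, lo, hi)     # effect in the direction you care about
    opposite = power_unequal_tails(-delta, lo, hi)   # same-sized effect the other way
    print(label, round(favored, 3), round(opposite, 3))
```

In this scenario the 0.04 tail gains power compared to an equal split, while the 0.01 tail loses power, and the overall Type 1 error rate stays at 0.05.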