Sunday, July 9, 2017

Statistics Sunday: Null and Alternative Hypotheses

In my writing about statistics, there is one topic - considered basic - that I haven't covered. This is the issue of null and alternative hypotheses, which are key components in any inferential statistical analysis you conduct. The thing is, I've seen it cause nothing but confusion in both new and experienced researchers, who don't seem to understand the difference between these statistical hypotheses and the research hypotheses they are testing in their studies. I've rolled my eyes through doctoral research presentations and wielded my red pen in drafts of grant applications as researchers have informed me of the null and alternative hypotheses (which are implied when you state which statistical analysis you're using) alongside their research hypotheses (which require stating).

Frankly, I've been so frustrated by the lack of understanding that I questioned whether to even address these topics. When I teach, I downplay the significance (pun fully intended) of null and alternative hypotheses. (And in fact, many in the field are trying to move us away from the so-called Null Hypothesis Significance Testing, or NHST, approach, but that's another post for another day.) In any case, I treat this topic as an annoying subject to get through before getting to the fun stuff: analyzing data. Not that I think this is a boring topic, or that I have a problem with boring topics - when you love a field, you have to love the boring stuff too.

Rather, I questioned whether the topic was even necessary. You can conduct statistical analysis without thinking about null and alternative hypotheses - I often do.

I realize now that the topic is important, but the reason it matters is rarely explained. So statistics professors and textbook authors continue to cover the topic without addressing its purpose. Today, I'd like to do both.

First, we need to think about what it means to be a science, or for a line of inquiry to be scientific. Science is about generating knowledge in a specific way - through systematic empirical methods. We don't want just any knowledge. It has to meet very high and specific standards. It has to be testable, and, more importantly, falsifiable. If a hypothesis is wrong, we need to be able to show that. In fact, we set up our studies with specific controls and methods so that if a hypothesis is wrong, the study can show us it's wrong.

If, after all that, we find support for a hypothesis, we accept that... for now. But we keep watching that little supported hypothesis out of the corner of our eye, just in case it shows us its true (or rather, false) colors. See, if we conduct a study to test our research hypothesis, we will use the results of the study to reject it (if it appears to be false) or support it (if it doesn't appear to be false). We don't prove anything, nor do we call hypotheses true. We're still looking for evidence to falsify them. That is the purpose of science: to study something again and again, not to see if we can prove it true, but to see if we can falsify it. It's as though every time we do a study of a hypothesis that's been supported, we're saying, "Oh yeah? Well, what about if I do this?"

This is the nature of scientific skepticism. There could be evidence out there that shows a hypothesis is false; we just haven't found it yet. Karl Popper used the black swan example to illustrate this facet of science. You can do study after study supporting the hypothesis that all swans are white, but it takes only one case - one black swan - to refute that hypothesis.

Boom

So essential is this concept to science that we build it into our statistical analysis. The specifics of the null and alternative hypotheses vary depending on which statistic you're using, but the basic premise is this:

Null hypothesis: There's nothing going on here. There is no difference or relationship in your data. These are not the droids you're looking for.

(Come to think of it, that Jedi mind trick is the perfect demonstration of a Type II error. But pretend for a moment that they really weren't the droids they were looking for.)

This is your basic reminder that we first look for evidence to falsify before we look for evidence to support our research hypothesis. We then run our statistical analysis and look at the results. If we find something - the statistically significant difference or relationship we expected - we reject the null, because it doesn't appear to apply in this situation (although because of the possibility of Type I error, we never lose the null completely). And we have support for our alternative hypothesis. If, on the other hand, we don't find a significant difference or relationship, we fail to reject the null. (Yes, that is the exact language you would use. You don't "accept" or "support" the null, because nonsignificant results could simply mean low power.)
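The decision rule in that paragraph is simple enough to write down as a tiny function. This is only a sketch: the name `nhst_decision` is hypothetical, the conventional cutoff of alpha = .05 is an assumption, and the p-value is assumed to come from whatever test you actually ran. Notice that neither branch ever says "accept the null."

```python
def nhst_decision(p_value: float, alpha: float = 0.05) -> str:
    """Return the NHST wording for a p-value (hypothetical helper, not a library function)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        # The result would be unlikely if the null were true, so we reject it,
        # while remembering that a Type I error is still possible.
        return "reject the null"
    # A nonsignificant result is not evidence FOR the null; it may reflect low power.
    return "fail to reject the null"


print(nhst_decision(0.01))  # reject the null
print(nhst_decision(0.30))  # fail to reject the null
```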

You also use the null and alternative hypotheses to state whether there is an expected direction of the effect. For example, to go back to the caffeine study example, we expect caffeine will improve test performance (this is our research hypothesis). So we would write our null and alternative hypotheses to reflect that direction:

Null: The mean test score of the caffeine group will be less than or equal to the mean test score of the non-caffeine group, or $M_{\text{Caffeine}} \le M_{\text{Decaf}}$

Alternative: The mean test score of the caffeine group will be greater than the mean test score of the non-caffeine group, or $M_{\text{Caffeine}} > M_{\text{Decaf}}$

Notice how the null and alternative hypotheses are both mutually exclusive and exhaustive (together they cover all possible directions). If we conducted our statistical analysis in this way, we would only support our research hypothesis if the caffeine group had a significantly higher test score. If their test score were lower - even significantly lower - we would still fail to reject the null. (In fact, if we follow this strict, directional hypothesis, finding a much lower score when we expected a significantly higher score would simply be treated as chance, because a lower score is consistent with the null.)
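To make the directional logic concrete, here is a minimal sketch of a one-tailed test in Python. I've swapped in a permutation test because it needs only the standard library; the post doesn't name a specific test, and a one-tailed t-test is the more common choice in practice. The function name, the 10,000 shuffles, the seed, and all of the scores are hypothetical, invented for illustration. The key line is the comparison that only counts differences in the predicted direction, which is why a caffeine group that scores much lower still produces a large p-value.

```python
import random
from statistics import fmean


def one_tailed_permutation_p(caffeine, decaf, n_permutations=10_000, seed=1):
    """Approximate p-value for H0: M_Caffeine <= M_Decaf vs. H1: M_Caffeine > M_Decaf.

    If the null is true, the group labels are arbitrary, so we shuffle them and
    count how often chance alone produces a difference at least as large as the
    observed one, in the predicted direction.
    """
    rng = random.Random(seed)
    observed = fmean(caffeine) - fmean(decaf)  # positive if caffeine scored higher
    pooled = list(caffeine) + list(decaf)
    n_caf = len(caffeine)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = fmean(pooled[:n_caf]) - fmean(pooled[n_caf:])
        # Only differences in the predicted direction count against H0.
        if diff >= observed:
            extreme += 1
    # The +1 terms keep the estimate from ever being exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical test scores, made up for this example.
caffeine = [82, 85, 88, 90, 79, 91, 87, 84]
decaf = [75, 78, 80, 72, 77, 81, 74, 79]
caffeine_lower = [65, 68, 70, 62, 66, 71, 64, 69]

print(one_tailed_permutation_p(caffeine, decaf))        # small: reject the null
print(one_tailed_permutation_p(caffeine_lower, decaf))  # near 1: fail to reject
```

Even though the `caffeine_lower` group is dramatically worse than the decaf group, the directional test simply fails to reject the null; it has no way to flag an effect in the unexpected direction.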

If we didn't specify the direction, we would simply state the scores will be equal or unequal:

Null: The mean test score of the caffeine group will be equal to the mean test score of the non-caffeine group, or $M_{\text{Caffeine}} = M_{\text{Decaf}}$

Alternative: The mean test score of the caffeine group will be different from the mean test score of the non-caffeine group, or $M_{\text{Caffeine}} \ne M_{\text{Decaf}}$
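A non-directional (two-tailed) version changes only one thing: it compares the size of the difference and ignores its sign. As before, this is a sketch using a permutation test as a stand-in for whatever test you'd actually run, and the function name and scores are hypothetical.

```python
import random
from statistics import fmean


def two_tailed_permutation_p(caffeine, decaf, n_permutations=10_000, seed=1):
    """Approximate p-value for H0: M_Caffeine = M_Decaf vs. H1: M_Caffeine != M_Decaf."""
    rng = random.Random(seed)
    observed = abs(fmean(caffeine) - fmean(decaf))
    pooled = list(caffeine) + list(decaf)
    n_caf = len(caffeine)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_caf]) - fmean(pooled[n_caf:]))
        # Direction no longer matters: a large gap either way counts against H0.
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical scores in which the caffeine group does much WORSE.
decaf = [75, 78, 80, 72, 77, 81, 74, 79]
caffeine_lower = [65, 68, 70, 62, 66, 71, 64, 69]

print(two_tailed_permutation_p(caffeine_lower, decaf))  # small: reject the null
```

Unlike the directional hypotheses, this version lets a significantly lower caffeine group count as evidence against the null. The trade-off is that, for the same alpha, an effect in the predicted direction has to be somewhat larger to reach significance.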

These hypotheses are implicit when doing statistical analysis - they're for your benefit, but you wouldn't spend time in your dissertation defense, journal article, or grant application stating the null and alternative. (Maybe if you were writing an article on a new statistical analysis.) Readers who know about statistics will understand they're implied. And readers who don't know about statistics will prefer concrete differences - what you hypothesize will happen in your study, and what specific differences you found and what they mean.

As you continue learning and practicing statistics skills, you may find that you don't really think about the null and alternative hypotheses. And that's okay. In fact, I wrote two posts that tie directly into null and alternative hypotheses without once referencing these concepts. Remember alpha? And p-values? I said in these posts that these refer to probabilities of finding an effect of a certain size by chance alone. Specifically, they refer to the probability of finding an effect at least that large if the null hypothesis is actually true - something we could only confirm if we could somehow pull back the curtain of the universe and discover the truth once and for all. We can't do that, of course, but we can build that uncertainty into our analysis. That is what is meant by Null Hypothesis Significance Testing.
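The claim that alpha describes what happens if the null is true can be checked by simulation. This minimal sketch generates thousands of experiments in which both groups come from the same population, so the null is true by construction, and counts how often p falls below alpha. The function names, the 30 participants per group, the seed, and the use of a z-test with a known population standard deviation are all illustrative assumptions, not anything from the earlier posts.

```python
import random
from statistics import NormalDist, fmean


def z_test_p(group_a, group_b, sigma=1.0):
    """Two-tailed p-value for a difference in means, assuming a known population SD."""
    se = sigma * (1 / len(group_a) + 1 / len(group_b)) ** 0.5
    z = (fmean(group_a) - fmean(group_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def type_i_error_rate(n_experiments=10_000, n_per_group=30, alpha=0.05, seed=42):
    """Simulate experiments where the null is TRUE and return the share of false rejections."""
    rng = random.Random(seed)
    false_rejections = 0
    for _ in range(n_experiments):
        # Both groups are drawn from the same population, so any difference is chance.
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if z_test_p(a, b) < alpha:
            false_rejections += 1
    return false_rejections / n_experiments


print(type_i_error_rate())  # should land close to alpha = 0.05
```

The proportion of rejections hovers around .05 even though nothing is going on in any of these simulated studies. That is what alpha means: the rate of Type I errors we agree to tolerate when the null is actually true.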

But, as you saw, I could still describe these topics without even using the phrase "null or alternative hypotheses." As long as you stay a skeptical scientist, who remembers all findings are tentative, pending new evidence, you're doing it right.