Sunday, January 11, 2009

In a valid deductive argument, the conclusions follow necessarily from the premises. This is a proof in the mathematical sense of the word. Provided we know that the premises are true, we can establish with complete certainty that the conclusions are true. For example, identifying a single unicorn would establish without a doubt that unicorns exist.

Unfortunately much of the time this type of certainty isn't possible. Consider another example from the realm of mythology: weapons of mass destruction (WMD) in Iraq. Here's what Donald Rumsfeld had to say on the subject in 2002 (the boldface is my addition):

There's another way to phrase that and that is that the absence of evidence is not evidence of absence. ... Simply because you do not have evidence that something exists does not mean that you have evidence that it doesn't exist.

But surely hunting high and low for WMD month after month and not finding any (absence of evidence) supports the inference that there aren't any there (evidence of absence). Indeed, it turns out that the popular maxim cited by Rumsfeld is simply incorrect.

But didn't he have a point? Absolutely: failing to prove that something exists does not prove that it does not exist. Or, in the words of the English writer William Cowper (1731-1800):

Absence of proof is not proof of absence

Compare this with the version invoked by Rumsfeld:

Absence of evidence is not evidence of absence

The originator of this maxim seems to be the cosmologist Martin Rees, although it has been attributed to many others, including Carl Sagan. By substituting the word evidence for proof it makes a much stronger (and invalid) claim. Evidence, after all, is often uncertain. If I look outside and see that the ground is wet, that is evidence that it has been raining. But perhaps my neighbour was watering her flowers. Seeing someone walk by with an umbrella folded under their arm might strengthen my evidence for the rain hypothesis, but perhaps they are anticipating rain later on. In general, evidence can support an inference, but it won't necessarily prove it. And that's where Rees's formulation of the maxim falls down.
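To make the difference between evidence and proof concrete, here is a minimal sketch of how a fruitless search shifts belief under Bayes' rule. The prior of 0.9 and the 30% chance that any one search finds the weapons if they exist are hypothetical numbers chosen only for illustration, and the searches are assumed to be independent of one another.

```python
def posterior_after_failed_search(prior, p_find_if_present, p_find_if_absent=0.0):
    """P(present | nothing found), by Bayes' rule.

    prior             -- belief that the thing exists before the search
    p_find_if_present -- chance the search finds it, given that it exists
    p_find_if_absent  -- chance of a false "find", given that it does not exist
    """
    miss_and_present = prior * (1 - p_find_if_present)
    miss_and_absent = (1 - prior) * (1 - p_find_if_absent)
    return miss_and_present / (miss_and_present + miss_and_absent)


# Hypothetical numbers: start 90% sure, each search has a 30% chance of success.
belief = 0.9
for search in range(1, 11):
    belief = posterior_after_failed_search(belief, p_find_if_present=0.3)
    print(f"after {search:2d} empty-handed searches: P(present) = {belief:.3f}")
```

Each failed search lowers the probability (absence of evidence is evidence of absence), yet the probability never reaches zero (absence of proof is not proof of absence).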

Black and white thinking about evidence

When evidence is construed as certainty, we get into all kinds of trouble. This is how Rumsfeld turned a simple truism (no WMDs have been found, but they might still be found) into a puzzle of obfuscation (absence of evidence is not evidence of absence).

But Rumsfeld is not the only one. As I noted recently, the term "no evidence" is commonly used to describe situations where an effect is not found to be statistically significant. Now statisticians are wary of people concluding that a lack of statistical significance implies that there is "no effect". (It might be, for example, that the sample size was inadequate.) Hence, it is not at all uncommon for statisticians to declare that absence of evidence is not evidence of absence! As Kim Øyhus has pointed out, even the American Statistical Association buys into it, as the t-shirt they sell attests.

Of course statisticians know well that uncertainty isn't easy to think about or communicate to others. So why have we fallen into this trap?

Well, part of the reason may be philosophical. Statistical reasoning is inescapably inductive—it does not guarantee certainty. Philosophers have been worrying about what is called the problem of induction for a very long time. David Hume (1711-1776) challenged the logical foundations of induction, and ever since, philosophers have sought a way around the problem. The reigning "solution" is known as the hypothetico-deductive method, developed by philosopher of science Karl Popper (1902-1994). Popper argued that induction in science could be avoided by proposing a hypothesis and then seeking evidence that would either prove the hypothesis wrong ("falsify" it) or fail to do so. This is very similar to the frequentist statistical hypothesis testing framework that developed from the work of Fisher, Neyman, and Pearson. Unfortunately, it lends itself to black-and-white thinking. A hypothesis is either proven wrong or it isn't. There's no grey zone.

Popper's formulation, in particular, buries the uncertainty completely, construing the reasoning as entirely deductive. Suppose, for example, that a new biochemical theory predicts that a certain drug will shorten the duration of an illness, whereas the older theory does not. Now duration of illness depends on numerous factors, including differences in patients' immune systems, and we expect to see variation above and beyond any differences due to the drug. A clinical trial may demonstrate that the average duration of illness for patients who are randomly assigned the drug is shorter than that for patients who receive placebo, and that this difference is statistically significant at the 0.05 level. Has the older theory been proven incorrect? Not with absolute certainty. The evidence against it may be strong but it is possible that this is a "type-I" error—rejecting the null hypothesis even though it is true. Indeed, because of the way the statistical test has been designed, when the null hypothesis is true we expect to see such errors 5% of the time. The companion to the type-I error is the type-II error—failing to reject the null hypothesis even though it is false.
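The two error rates can be checked by simulation. The sketch below is an illustration under assumed conditions, not a description of any real trial: illness durations are drawn from a normal distribution with a known standard deviation of 1 day, there are 50 patients per arm, and the test is a simple two-sided z-test. The function names and all numbers are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def two_sided_p_value(drug, placebo, sd=1.0):
    """z-test for a difference in mean duration, assuming a known common sd."""
    standard_error = sd * (1 / len(drug) + 1 / len(placebo)) ** 0.5
    z = (fmean(drug) - fmean(placebo)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(true_reduction, n_per_arm=50, n_trials=2000, alpha=0.05, seed=1):
    """Fraction of simulated trials that reject the null of 'no difference'."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        placebo = [rng.gauss(10.0, 1.0) for _ in range(n_per_arm)]
        drug = [rng.gauss(10.0 - true_reduction, 1.0) for _ in range(n_per_arm)]
        if two_sided_p_value(drug, placebo) < alpha:
            rejections += 1
    return rejections / n_trials


# Null true (drug does nothing): every rejection is a type-I error.
type_1_rate = rejection_rate(true_reduction=0.0)
# Null false (drug shortens illness by 0.3 days): every non-rejection is a type-II error.
type_2_rate = 1 - rejection_rate(true_reduction=0.3)
print(f"type-I error rate:  {type_1_rate:.3f}")
print(f"type-II error rate: {type_2_rate:.3f}")
```

The type-I rate should land close to the nominal 0.05. The type-II rate depends entirely on the assumed effect size and sample size, which is why a non-significant result on its own says little.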

Pretending that type-I and -II errors don't exist is wishful thinking. Just as diagnostic tests produce false positives and false negatives, statistical hypothesis tests can give the wrong answer. The point is that we can study and control the error rates and make inferences while acknowledging their limitations.

It suited Donald Rumsfeld's purposes to be fuzzy about the distinction between evidence and proof. It doesn't suit ours.

Update 03-Jun-2009: I had originally attributed the maxim "Absence of evidence is not evidence of absence" to Carl Sagan in his 1995 book The Demon-Haunted World. Apparently, however, the originator was cosmologist Martin Rees. There is a reference to it in the proceedings of a 1972 symposium titled Life Beyond Earth & The Mind of Man [pdf], jointly sponsored by Boston University and NASA. In his introductory remarks, the chair, Richard Berendzen, stated:

A generation ago almost all scientists would have argued, often "ex cathedra," that there probably is no other life in the universe beside what we know here on Earth. But as Martin Rees, the cosmologist, has succinctly put it, "absence of evidence is not evidence of absence." Beyond that, in the last decade or so the evidence, albeit circumstantial, has become large indeed, so large, in fact, that today many scientists, probably the majority, are convinced that extraterrestrial life surely must exist and possibly in enormous abundance.

(The boldface is mine.) Note that Carl Sagan was one of the panelists at the symposium.

Is it possible that the distinction is one of logical reasoning (and Bayesian probability) versus statistical interpretation (for hypothesis testing)? From a frequentist perspective, there are no priors to consider in a statistical hypothesis test. And we know that we can never prove the absence of a relation using a statistical hypothesis test--instead we seek evidence against it.

The p-value is statistical evidence against the null. It cannot be considered as statistical evidence for the null because it is calculated under the assumption that the null is true. Therefore, in this context, absence of statistical evidence is not statistical evidence of absence (i.e., the statement is summarizing the interpretation of the p-value). This statement should not, however, be generalized outside the context of statistical hypothesis testing.

It would be interesting to find the first use of this maxim in statistics. I wouldn't be surprised to learn that a university professor devised it with tongue in cheek, knowing full well of the logical fallacy if taken out of the context of statistical hypothesis testing.

When evidence is defined as "rejecting the null hypothesis", the claim that absence of evidence is not evidence of absence becomes "failing to reject the null hypothesis is not rejecting the alternative hypothesis", which conveys the point clearly enough. Strictly speaking, however, it's nonsense: "rejecting the alternative hypothesis" is actually a meaningless notion in the hypothesis testing framework! Perhaps more to the point, while it is unwise to automatically interpret failure to reject the null hypothesis as evidence that the null hypothesis is true, it is also unwise to dogmatically insist that we can't say anything about the plausibility of the null hypothesis in the light of the evidence. (Note that with a point null hypothesis, one needs to consider an interval around the null hypothesis, known as a margin of equivalence, a point I alluded to in my previous post.)

Suppose I gave you a coin and told you that I had tested it and found "no statistically significant evidence" that it was biased towards heads or tails. In fact the two-sided p-value was 1. That sounds good, but what does it mean?

It might mean that I tossed the coin 1000 times and got exactly 500 heads. That would seem like pretty strong evidence that the coin is unbiased (in fact a 95% Clopper-Pearson confidence interval for the true probability the coin comes up heads is 0.469 to 0.531).

On the other hand, perhaps I tossed the coin twice and got one head and one tail. This seems like pretty weak evidence that the coin is unbiased (the confidence interval this time is 0.013 to 0.987).

But in both cases, the p-value for the test of the null hypothesis that the coin is unbiased is 1. The p-value is a measure of the strength of evidence against the null hypothesis. As you have pointed out, the asymmetry of the hypothesis testing framework does not allow the notion of evidence for the null hypothesis. Fortunately confidence intervals allow us to overcome this limitation.
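For anyone who wants to reproduce the coin numbers, here is a minimal pure-Python sketch. It assumes an exact two-sided binomial test that sums the probabilities of all outcomes no more likely than the one observed, and it finds the Clopper-Pearson limits by bisection. The helper names are mine, and a statistics package would normally do this for you.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), on the log scale for stability (0 < p < 1)."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))


def two_sided_p_value(heads, tosses, p0=0.5):
    """Exact test: total probability of outcomes no more likely than the observed one."""
    observed = binom_pmf(heads, tosses, p0)
    probs = (binom_pmf(k, tosses, p0) for k in range(tosses + 1))
    return min(1.0, sum(pk for pk in probs if pk <= observed * (1 + 1e-7)))


def _bisect(f, target, increasing):
    """Find p in (0, 1) with f(p) = target, for a monotone function f."""
    lo, hi = 1e-12, 1 - 1e-12
    for _ in range(60):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(heads, tosses, confidence=0.95):
    """Exact (Clopper-Pearson) confidence interval for the probability of heads."""
    tail = (1 - confidence) / 2
    at_least = lambda p: sum(binom_pmf(k, tosses, p) for k in range(heads, tosses + 1))
    at_most = lambda p: sum(binom_pmf(k, tosses, p) for k in range(heads + 1))
    lower = 0.0 if heads == 0 else _bisect(at_least, tail, increasing=True)
    upper = 1.0 if heads == tosses else _bisect(at_most, tail, increasing=False)
    return lower, upper


for heads, tosses in [(500, 1000), (1, 2)]:
    p = two_sided_p_value(heads, tosses)
    low, high = clopper_pearson(heads, tosses)
    print(f"{heads}/{tosses}: p-value = {p:.3f}, 95% CI = {low:.3f} to {high:.3f}")
```

Both experiments give the same p-value, but the widths of the two intervals differ enormously, and that difference is exactly the information the p-value hides.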

Now, why do you have to drag the alternative into this? Next thing you know we'll be discussing Fisher's formulation of hypothesis testing versus the Neyman-Pearson school of frequentist statistics. =)

Maybe I'm wrong, but I interpret "absence of evidence is not evidence of absence" (in the context of hypothesis testing) to mean "failing to reject the null is not equivalent to accepting the null." I'm thinking of the null hypothesis of "no treatment effects". You don't have significant evidence to reject the null, and therefore an absence of evidence of treatment effects, but this is not the same thing as saying you have evidence of no treatment effects (because of the formulation of hypothesis testing, flawed as it may be).

One point, which I believe you are alluding to, is that an equivalence test would be more appropriate. But I've heard some statisticians and researchers try and argue that they could use retrospective power to "prove the null" when they are faced with non-significant results. See Abuse of Power [PDF] (this paper was the nail in the coffin, if you will, in a previous discussion I was having with a group of statisticians).

I believe the maxim is simply trying to emphasize that the p-value is calculated having assumed the null, and therefore can't be used as evidence for the null (as it would be a circular argument). Trying to make more out of the maxim than this may be the sticking point. It's too simple, and therefore flawed when taken out of this limited context.

I agree with your previous post. If I'm not mistaken, one point was that failing to reject the null means the confidence interval contains a value of "no effect". But there could still be differences of practical importance, and so failing to reject the null is not the same as showing there's no effect. The "statistical note" from the BMJ, Absence of evidence is not evidence of absence, seems to be saying the same thing: absence of evidence of a difference is not evidence that there is no difference. Or, absence of evidence of an effect is not evidence of no effect. Because you can't prove the null using a hypothesis test (you instead need an equivalence test). What do you think?

Great bit of writing Nick (we met briefly at Opa! in Halifax with the Parefico meeting years back; Dan put me onto your blog). This is something that seems to need reiteration repeatedly as what we get taught in university is Popper, but what we do in practice (as, say, macro-scale Ecologists) is induction (no experiments here). This should be an entire lecture in every ecology course...

Incidentally I had an article on this recently (Environmental Conservation (2008), 35: 193-196) so it's close to my heart!

Cheers, A.

Aaron, great to hear from you! Yes, I remember that evening. And of course it reminds me how much I miss Ram.

I had a look at your article, and it was a very good read. The journal has made it freely accessible, which is wonderful. I would highly recommend it to anyone who is reading these comments. It's very interesting to see that we're thinking some similar thoughts. Your point about entertaining multiple hypotheses is also an interesting connection.

Yes, I also find it hard to think about this stuff without getting my brain in a knot!

Incidentally, the Abuse of Power paper you cite was written by John Hoenig, who is a friend (and former boss) of mine. He's a super guy. It's very useful to cite his paper whenever people start talking about power after the fact.

You're right that I was thinking about equivalence (and non-inferiority) tests. They are most easily formulated in terms of confidence intervals. Interestingly, confidence intervals have a direct correspondence with null hypothesis statistical testing. So although I think that the results of an ordinary null hypothesis test can sometimes be interpreted in a misleading way, the closely-linked idea of confidence intervals gets around these problems. But the Popperian approach seems wedded to the narrower null hypothesis testing framework.
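As a rough illustration of the confidence-interval formulation, the sketch below classifies an interval against a margin of equivalence. The margin of 0.05 around a fair coin is a hypothetical choice, the function name is mine, and the intervals are the two 95% Clopper-Pearson intervals from the coin example above. Formal equivalence tests at the 5% level usually use a 90% interval, so using 95% intervals here is on the conservative side.

```python
def equivalence_verdict(ci_low, ci_high, null_value, margin):
    """Classify a confidence interval against an equivalence margin around the null value."""
    lower_limit, upper_limit = null_value - margin, null_value + margin
    if lower_limit <= ci_low and ci_high <= upper_limit:
        return "equivalent within margin"   # whole interval inside the margin
    if ci_high < lower_limit or ci_low > upper_limit:
        return "not equivalent"             # whole interval outside the margin
    return "inconclusive"                   # interval straddles a margin limit


# Hypothetical margin: treat a coin as "fair enough" if P(heads) is within 0.05 of 0.5.
print(equivalence_verdict(0.469, 0.531, null_value=0.5, margin=0.05))  # 1000 tosses
print(equivalence_verdict(0.013, 0.987, null_value=0.5, margin=0.05))  # 2 tosses
```

The 1000-toss interval sits inside the margin, while the 2-toss interval is far too wide to say anything either way, even though both tests produced the same p-value.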

I just searched Google and got 46,800 hits for the phrase "absence of evidence is not evidence of absence". It's certainly popular! But does it do a lot of damage? And when is it important (or worth the effort) to challenge it?

I think what Rumsfeld said was and is correct. What is not correct is how that statement translated into action. One should not punish without proving guilt. Civility demands that you prove guilt, not innocence, simply because not doing so will lead to injustice, disorder and chaos.

Now, coming back to the maxim: you yourself state that evidence is often uncertain (though you did not define evidence). Why? I think it is because of our limitations in finding and evaluating it. It is not only the type I or II error; it is also what comes before it: the research methodology of data gathering.

In the end what you can say is that it probably is or probably is not with a certain level of confidence based on statistical testing. But that confidence is based on an assumption that methodology is sound. Uncertainty of the methodology is not incorporated in the measure of confidence (or lack thereof).

Therefore, how truly uncertain we are remains truly uncertain. The confidence interval is not much help given uncertainty about methodology, and assessment of methodology goes well beyond study design: subjectivity in gathering data, pre-analytic and analytic validity of our measurement tools, their accuracy, and even the comprehensiveness of the outcomes investigated. Even the best randomized studies are criticized for their minor flaws. However, the actual magnitude of those flaws is unquantifiable. In the end, evidence usually must come down to a suggestion of either presence or absence, despite a quantifiable confidence interval. And since it's a suggestion, a cautious person would limit it to the mere presence of things and keep silent about their absence.

The maxim helps in exercising that caution, for most of us are born athletes who high-jump to conclusions.

On the other hand, given a perfect methodology, absence of evidence is, I think, evidence of absence.

I agree with you that flaws in methodology can often swamp narrow consideration of statistical uncertainty.

I do think, however, that evidence is almost never black or white. But the decisions we make in the light of evidence may well be. For example, based on the evidence, a government may decide whether or not to implement a certain policy, such as covering the cost of immunizing the public against a certain disease with a new vaccine.

Adding to the irony of attributing the quote to Carl Sagan's "Demon Haunted World", in that book he explicitly criticizes the maxim as an argument from ignorance:

"appeal to ignorance — the claim that whatever has not been proved false must be true, and vice versa (e.g. There is no compelling evidence that UFOs are not visiting the Earth; therefore UFOs exist — and there is intelligent life elsewhere in the universe. Or: There may be seventy kazillion other worlds, but not one is known to have the moral advancement of the Earth, so we're still central to the Universe.) This impatience with ambiguity can be criticized in the phrase: absence of evidence is not evidence of absence." (Sagan, "The Fine Art of Baloney Detection", reprinted in "The Demon Haunted World")

Reducing that to "Absence of evidence is not evidence of absence." is a quote mine, since the rest of that very sentence argues against the maxim.