"Extraordinary claims require extraordinary evidence" is a phrase made popular by Carl Sagan. It is central to the scientific method, and a key principle for critical thinking, rational thought and skepticism everywhere.

The evidence put forth by proponents of such things as gods, ghosts, the paranormal, and UFOs is highly questionable at best and offers little in the way of proof. Even if we accepted what evidence there is as valid (and it is highly debatable if we should), limited and weak evidence is not enough to overcome the extraordinary nature of these claims.

Alice and Bob are two friends talking after school. Alice tells Bob that she watched a movie the previous evening. Bob believes her easily, because he knows that movies exist, that Alice exists, and that Alice is capable and fond of watching movies. If he doubts her, he might ask for a ticket stub or a confirmation from one of her friends. If, however, Alice tells Bob that she flew on a unicorn to a fairy kingdom where she participated in an ambrosia-eating contest, and she produces a professionally-printed contest certificate and a friend who would testify to the events described, Bob would still not be inclined to believe her without strong evidence for the existence of flying unicorns, fairies and ambrosia-eating contests.

While the idea that a sufficiently outlandish claim requires much more compelling evidence is quite intuitive, it can be quantified nicely with probability theory in a Bayesian framework. In short, sufficient evidence must be capable of raising a highly improbable claim to a highly probable one, and the more improbable that evidence would be if the claim were false, the better. By application of Bayes' theorem, it's possible to show this in action mathematically.

Assume, for instance, someone claims to be able to predict almost perfectly which way a coin[1] will land. We know this is an extraordinary claim, so before seeing any evidence we'll put the chance that the person is telling the truth at a million to one. In reality, the claim would be even more improbable, but this figure can be used for illustration. So we ask them to demonstrate the skill. They're almost perfect, so let's assume they guess right about 90% of the time. This allows for their skill to fail once in a while, but still prove to be pretty good. This gives us all the information we need to quantify how extraordinary the evidence must be.

Consider what happens if they guess a single coin toss correctly. The probability of guessing correctly by chance is 50%, or 50:50.
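
The calculation for this case can be reconstructed from the figures given so far. As an assumption about notation, let A stand for "the person has genuine skill" and B for "the observed correct guess", with P(B|A) = 0.9, P(A) = 10^{-6} and P(B) = 0.5. Bayes' theorem then gives roughly:

\[
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} = \frac{0.9 \times 10^{-6}}{0.5} = 1.8 \times 10^{-6}
\]

So after one correct guess, the probability of genuine skill is still under two in a million.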

A single coin toss doesn't improve our odds very dramatically. The evidence just isn't extraordinary enough: you can guess a single coin toss correctly 50% of the time with no special skills involved. It all rests on how improbable our evidence, P(B), actually is, and a 50:50 chance isn't particularly improbable. For two coin tosses P(B) becomes 0.25, and for 10 coin tosses it comes to roughly 0.00097. Plugging the 10-toss figure into Bayes' theorem, with a prior probability of genuine skill, P(A), of one in a million and a 90% chance of seeing the evidence if the skill is real, gives a probability of genuine skill of around 0.0009. Although still small, this is a considerable improvement on the original million-to-one chance. By 20 or so correctly guessed coin tosses, the skill is starting to look a lot more genuine.
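
The arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `posterior_skill` and its parameters are hypothetical, the 0.9 figure is treated as the probability of seeing the whole run of correct guesses if the skill is genuine (which is how the article's figures appear to be derived), and the denominator is written out in full so that the result stays below 1 for long runs.

```python
def posterior_skill(n_correct, prior=1e-6, p_evidence_given_skill=0.9, p_guess=0.5):
    """Probability of genuine skill after n_correct correct guesses in a row.

    Assumptions (not from a library, chosen to mirror the text):
      prior                  -- P(A), the million-to-one starting probability
      p_evidence_given_skill -- P(B|A), taken as 0.9 for the whole run
      p_guess                -- chance of guessing one toss without skill
    """
    # Probability of the run of correct guesses arising by pure chance.
    p_evidence_given_chance = p_guess ** n_correct
    # Bayes' theorem, with P(B) expanded over "skill" and "no skill".
    numerator = p_evidence_given_skill * prior
    p_evidence = numerator + p_evidence_given_chance * (1 - prior)
    return numerator / p_evidence


if __name__ == "__main__":
    for n in (1, 2, 10, 20):
        print(n, posterior_skill(n))
```

With these inputs the result for 10 tosses is about 0.0009, in line with the figure quoted above, and for 20 tosses it is a little under one half. A stricter model would use 0.9 per toss rather than for the whole run, which would give smaller numbers but the same overall pattern.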

This is the basic idea underpinning statistical significance: is it more likely that our evidence is random or due to a real effect, and is the improbability of the evidence presented in proportion to the improbability of the claim being made? But Sagan's quip about extraordinary evidence doesn't just mean that we can take someone's word for it if they managed to call so many coin tosses in a row. Derren Brown can pull off such a feat with some effort and misdirection, as shown in his special The System, so we always need to consider alternative hypotheses and compare how likely they are. When Derren Brown tosses a coin and gets 10 heads in a row, is it more likely that he is psychic, or that he is cheating? Tests such as James Randi's million-dollar challenge therefore control for this potential factor, making sure that the probability of foul play, fraud and cheating is far less than the probability of genuine psychic power.
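
Comparing alternative hypotheses can be sketched the same way. The priors and likelihoods below are hypothetical numbers chosen purely for illustration, not measurements: they assume that trickery is rare but far more common than psychic power, and that either would produce 10 heads in a row with probability 0.9.

```python
def posterior(priors, likelihoods):
    """Normalise prior * likelihood over a set of competing hypotheses."""
    unnormalised = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalised.values())
    return {h: weight / total for h, weight in unnormalised.items()}


# Hypothetical prior probabilities (illustrative only; they sum to 1).
priors = {"chance": 0.989999, "trickery": 0.01, "psychic": 0.000001}

# Assumed probability of seeing 10 heads in a row under each hypothesis.
likelihoods = {"chance": 0.5 ** 10, "trickery": 0.9, "psychic": 0.9}

result = posterior(priors, likelihoods)

if __name__ == "__main__":
    for hypothesis, probability in result.items():
        print(hypothesis, probability)
```

Because trickery and psychic power explain the evidence equally well in this sketch, the evidence cannot separate them, and their relative standing is set entirely by the priors. That is why a controlled test has to drive down the probability of trickery itself.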