One of my absolute favorite episodes of the classic TV show The Twilight Zone is "The Nick of Time". In this episode, a young, newly married couple, Don (played by none other than William Shatner) and Pat, are forced to spend time in a small town diner while they wait for car repairs. In the diner they come across a fortune telling machine, the "Mystic Seer," which accepts yes/no questions and (for a penny) spits out a card with an answer. The episode quickly establishes that Don is very superstitious. He eagerly asks the Mystic Seer a series of questions and soon begins to believe in the supernatural powers of this particular device. Pat remains skeptical even as each assertion of the Seer comes true. Eventually, Pat comes around to believe in the powers of the Seer but, unlike her husband, finds the possibility that the Seer is prophetic terrifying.

Good thing we have lots of pennies left from all those coin tossing experiments!

So how do we determine if the Mystic Seer is truly mystic? How can we understand how Don and Pat are reasoning differently as they each get the same data?

Bayes Factor to the Rescue

Throughout this episode we are faced with two competing hypotheses:

\(\mathbf{H}\) - The Mystic Seer truly can predict the future.

\(\mathbf{\bar{H}}\) - The Mystic Seer is just a lucky gimmick.

Our data \(D\) in this case is very simple: it is a sequence of correct answers. Although Don and Pat debate the results a bit, the major assumption is that the Mystic Seer is correct every time. The question is: is it supernatural or is it merely a coincidence? For us \(D\), our data, always represents a sequence of \(n\) correct answers.

As we debate the nature of the Mystic Seer we are analyzing how well each of the two hypotheses explains the data. The two quantities we need are:

\(P(D|H)\) - the probability of \(n\) correct answers in a row given that the Mystic Seer can predict the future. Interestingly enough this is always 1! If the Mystic Seer is supernatural, then it will always pick the right answer whether it is asked one question or a thousand!

\(P(D|\bar{H})\) - assuming that the Mystic Seer is just randomly spitting out answers, the probability of guessing \(n\) answers correctly in a row is just \(0.5^n\).

For these two hypotheses to compete we can just look at the ratio of how they describe the data we've observed:

$$\frac{P(D|H)}{P(D|\bar{H})}$$

This ratio of likelihoods is called Bayes' Factor. In this case, Bayes' Factor is measuring how many times more likely the data is given \(H\) as opposed to \(\bar{H}\). Now let's see how these ideas compete!

Behold! The mystic art of Bayesian Reasoning!

Measuring Bayes' Factor

We'll assign our Bayes' factor the traditional value of \(k\). The nice thing about our model is that our numerator, \(P(D|H)\), is always 1, so for any given number of questions answered correctly, \(n\), we have:

$$k = \frac{P(D_n|H)}{P(D_n|\bar{H})} = \frac{1}{0.5^n}$$

To get a feel for Bayes' Factor let's consider the point where the Mystic Seer has given 3 correct answers. At this point \(P(D_3|H) = 1\) and \(P(D_3|\bar{H}) = 0.5^3 = 0.125\). Clearly \(H\) explains the data better, but certainly nobody, even Don, would be convinced by 3 correct guesses. Our \(k\) for 3 questions is:

$$k = \frac{1}{0.125} = 8$$
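As a quick check on the arithmetic, here is a minimal Python sketch of the same ratio. The function name `bayes_factor` is just an illustrative choice, and `Fraction` is used only to keep the numbers exact; the model itself (a true seer is always right, a gimmick is right half the time) is the one described above.

```python
from fractions import Fraction


def bayes_factor(n: int) -> Fraction:
    """Ratio P(D_n|H) / P(D_n|not H) for n correct answers in a row."""
    p_data_given_h = Fraction(1)              # a truly psychic seer is always right
    p_data_given_not_h = Fraction(1, 2) ** n  # a lucky gimmick is right half the time
    return p_data_given_h / p_data_given_not_h


# Print k for a few illustrative values of n.
for n in (1, 2, 3, 10):
    print(n, bayes_factor(n))
```

Because the numerator is always 1, \(k\) simply doubles with every additional correct answer.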

Clearly then \(k = 8\) must not mean much, since this is not convincing. The typical guidelines for Bayes' factor are:

Common interpretations for Bayes' Factor

At 3 questions answered correctly and \(k=8\), we should at least be curious about the power of the Mystic Seer, though still very far from convinced.

Wait a second, there's a big problem here. First off I don't know about you but 3 correct questions doesn't convince me that the Mystic Seer is psychic at all. But even worse by 4 questions Don is convinced (at 7 it's pretty much a fact in his mind) and it takes Pat 14 questions, \(k=16,384\), to believe just enough to get nervous. So what is going on?

Null-Null Hypothesis

Before moving on it is worth noting how what we've set up here is different than a traditional statistical test. In classical Null Hypothesis Significance Testing (NHST) we frame our results in terms of what is called the Null Hypothesis. The Null Hypothesis, typically labeled \(H_{0}\), is pretty much the same as our \(\bar{H}\). \(H_{0}\) essentially states "What is the probability of seeing the same results due to random chance?", which is exactly how \(\bar{H}\) explains the results.

In an NHST we are testing to see if we can reject \(H_{0}\). How we reject \(H_{0}\) is where p-values come in. The p-value is just the probability of seeing results as extreme as we have due to chance. For this particular example the p-value is essentially \(P(D|\bar{H})\). So \(H_{0}\) and \(\bar{H}\) are the same then, right?

The big difference here is a philosophical one, but a philosophical difference with very practical implications. \(H_{0}\) is saying "there is nothing special about the Mystic Seer", while \(\bar{H}\) is saying "the Mystic Seer is just lucky, it is NOT psychic." The latter is a stronger assertion about the world. This becomes more obvious when we look at \(H\).

The strongest statement we can make when we run an NHST is "It is unlikely that the Mystic Seer is just random". The statement we are making with Bayes' Factor is "It seems much more likely that the Seer is supernatural than that it's just lucky". In Bayes' Factor we are comparing two positive claims and seeing how well they explain the data relative to one another. Pat's view is not "there's not statistically significant behavior in the Seer", it's the stronger assertion that "The Mystic Seer is BS".

With Bayes' Factor, we are making a positive assertion about how well one theory about how the world works compares against another. In an NHST all we can ever say is "It looks like not-nothing is happening."

Prior Beliefs Matter

The thing missing in our model is Prior Beliefs! Don is extremely superstitious and Pat is a skeptic. Clearly Don and Pat are using extra information in their mental models, because each of them arrives at conclusions of different strengths at very different times relative to how much data they've been given.

We can model this reasoning by simply imagining the initial odds of \(P(H)\) and \(P(\bar{H})\) given no additional information:$$\text{Prior} = O(H) = \frac{P(H)}{P(\bar{H})}.$$

The colloquial expression of belief in an idea works quite intuitively in Bayes Factor. If you and I walk into a diner, and I ask you "What are the odds that Mystic Seer is psychic" you might reply "Uh, 1 in a million, there's no way that thing is supernatural." Mathematically we can express this literally as:$$O(H) = \frac{1}{1,000,000}$$ Now we simply have to combine this Prior belief with our data. To do this we take the product of our initial odds with the results of Bayes Factor to get our personal Odds in the Hypothesis given the data we've observed:

$$k = O(H|D) = O(H)\cdot\frac{P(D|H)}{P(D|\bar{H})}$$
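To see how much the prior matters, the product above can be sketched in a few lines of Python. The name `posterior_odds` is an illustrative choice, and the "even odds" observer is an assumption added purely for contrast; only the one-in-a-million skeptic comes from the text.

```python
from fractions import Fraction


def posterior_odds(prior_odds: Fraction, n: int) -> Fraction:
    """O(H|D_n) = O(H) * (1 / 0.5**n): prior odds times the Bayes factor."""
    return prior_odds * 2 ** n


even_odds = Fraction(1, 1)        # hypothetical observer with no leaning either way
skeptic = Fraction(1, 1_000_000)  # the "1 in a million" skeptic from the text

# Same data (3 correct answers), very different conclusions.
print(posterior_odds(even_odds, 3))       # 8
print(float(posterior_odds(skeptic, 3)))  # 8e-06
```

The data move both observers by the same factor of 8; they just start from very different places.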

Thinking there's a 1 in a million chance the Seer is psychic from the beginning is pretty strong skepticism. The nice thing about our Bayesian approach is that this skepticism is reflected quite well in the model. If you think the hypothesis is extremely unlikely, you are very, very unlikely ever to be convinced by a small amount of data. In a classical NHST 5 correct answers would put our p-value at 0.03125, which would normally be grounds for "Rejecting the Null Hypothesis". But in our Bayes' Factor with Priors we get the following result:

$$O(H|D_5) = \frac{1}{1,000,000}\cdot\frac{1}{0.5^5} = \frac{32}{1,000,000} = 0.000032$$

This result corresponds quite well with our intuition. If you really don't believe in a hypothesis from the start, it's going to take a lot of evidence to convince you otherwise. In fact, with Bayes' Factor we can work out how much evidence we need! At \(k = 2\) you're just barely considering the possibility of the supernatural hypothesis. So if we solve for \(k > 2\) we can determine what it would take to convince you.

$$ 2 < O(H|D_{n}) = \frac{1}{1,000,000}\cdot\frac{1}{0.5^n} $$

If we solve for \(n\) we get \(n > \log_2(2,000,000) \approx 20.93\), so the smallest whole number of correct answers that works is:$$n = 21$$

Which means at around 21 correct answers in a row a strong skeptic should start to question, just a bit, whether or not the seer is, in fact, psychic.
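The same question, "how many correct answers would it take?", can be answered by brute force. This is a minimal sketch: `answers_needed` is a hypothetical helper name, and it simply counts upward until the posterior odds pass the chosen threshold.

```python
from fractions import Fraction


def answers_needed(prior_odds: Fraction, threshold: Fraction) -> int:
    """Smallest n such that prior_odds * 2**n exceeds threshold."""
    if prior_odds <= 0:
        raise ValueError("prior odds must be positive, or no evidence can ever suffice")
    n = 0
    while prior_odds * 2 ** n <= threshold:
        n += 1
    return n


# The one-in-a-million skeptic, asking for posterior odds above 2.
print(answers_needed(Fraction(1, 1_000_000), Fraction(2)))  # 21
```

Swapping in a different prior or threshold shows how quickly (or slowly) a given observer can be moved.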

Developing our own Psychic Powers!!!

Finally, there is one more interesting trick we can do with Bayes' Factor. We've seen how, if we know our prior beliefs and our evidence, we can determine how strongly we believe \(H\) compared to \(\bar{H}\). Then we looked at how much evidence it would take to convince us of \(H\) given our prior belief in \(H\). However, in this episode of The Twilight Zone the big piece of information we're missing is how much Don and Pat believe in the possibility of a Mystic Seer when they first walk into the diner. That is, we would like to be able to quantify Don and Pat's prior beliefs given how they react to the evidence they've seen! In essence, Bayes' Factor can be used to gain our own psychic ability to peer into the unobserved minds of Don and Pat!

It takes Don about 7 correct answers to become very nearly certain of the Mystic Seer's supernatural abilities. We can estimate that Don's \(k \approx 150\) at this point, as 150 is the threshold for "very strong" beliefs in our table above. Now we can write out everything we know except for \(O(H)\), which we'll be solving for:

$$150 = O(H)\cdot\frac{P(D_7|H)}{P(D_7|\bar{H})} = O(H)\cdot\frac{1}{0.5^7}$$

Solving this for \(O(H)\) we get:$$O(H)_{\text{Don}} \approx 1.17$$

Now we have a model for Don's superstitious beliefs! Don walks into the diner being slightly more willing than not to believe that the Mystic Seer is supernatural even before collecting any data at all! This makes sense of course since superstitious people behave the way they do because they naturally have a penchant to believe the supernatural plays a larger hand than not in daily activities.

Now on to Pat. At around 14 questions correct Pat starts to suspect the Mystic Seer, clearly getting nervous and calling it "A stupid piece of junk!" as her fears grow. She begins to believe but clearly is not nearly as certain as Don. I would estimate her \(k \approx 5\), basically moving from strong skepticism to "I guess it's not impossible". Solving the same way as we did for Don:

$$5 = O(H)\cdot\frac{1}{0.5^{14}} \implies O(H)_{\text{Pat}} = \frac{5}{16,384} \approx 0.0003$$

In other words Pat, walking into the diner, would claim that the Seer has about 1 in 3,000 chance of being supernatural.
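Working backward like this is just a division, which a short Python sketch makes concrete. The helper name `implied_prior_odds` is hypothetical, and the posterior values 150 and 5 are the rough estimates made above for Don and Pat, not measured quantities.

```python
from fractions import Fraction


def implied_prior_odds(posterior_odds: int, n: int) -> Fraction:
    """Solve k = O(H) * 2**n for O(H), given the posterior odds k after n correct answers."""
    return Fraction(posterior_odds) / 2 ** n


don = implied_prior_odds(150, 7)  # "very strong" belief after 7 answers
pat = implied_prior_odds(5, 14)   # hesitant belief after 14 answers

print(float(don))      # 1.171875
print(float(1 / pat))  # 3276.8, i.e. roughly "1 in 3,000"
```

Trying other guesses for Don's and Pat's posterior odds shows how sensitive the inferred priors are to those estimates.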

What is your prior belief in the Mystic Seer?

Conclusion

Using Bayesian Reasoning, we can model mathematically quite a bit of complicated and very human reasoning in this episode of the Twilight Zone. Rather than being stuck with the weak claims of typical NHST, using Bayes' Factor we can make confident assertions about differing Hypotheses. By modeling our prior belief in a hypothesis we are also able to frame our questions concerning "What would it take to convince you something is true?".

Reasoning and convincing others of a hypothesis is what statistics should always be about. The trouble with p-values is that they make us think in heuristics, not rational arguments: "If p < 0.05 something is up" (or more accurately "not nothing is probably not the case"). When we focus on these artificial thresholds for "truth" we abandon the ability to reason, and to reason individually, about observed data.

A frequent critique of Bayesian reasoning is that using prior beliefs is too subjective. But as we can see in this example, prior beliefs, irrational or not, are a huge part of how we reason under uncertainty. Bayesian analysis allows us to argue subjective views in a rational manner. There are many things that you and I may be much more or less willing to believe. What it takes to convince you of something might be different from what it takes to convince me. With Bayesian analysis, we can quantify our skepticism to determine how much evidence has convinced us. We can determine how much evidence we would need to convince someone if we know their initial skepticism. Finally, we can even work backward and quantify what are typically viewed as qualitative beliefs.