[After spending the weekend with Doug Drinen, founder of Pro-Football-Reference.com, we decided that Football Perspective needed to revive this fantastic post of his, explaining why “What are the odds of that?” is a much less straightforward question than you might think.]

That may sound like a simple question, but it isn’t. Some would answer that the odds of a roulette wheel landing on the number 19 on seven consecutive spins are a simple math problem. There are 38 numbered pockets on an American roulette wheel, so the odds of a ball landing on 19 in one spin of the wheel would be 1 in 38. The odds of that happening seven straight times would simply be (1/38)^7, or 1 in 114 billion.[1]

An equally plausible response would be that we don’t care that the wheel landed on “19” in seven straight spins, but rather that it landed on the same number for seven straight spins. In that case, what we really want to know is the likelihood that the wheel lands on any number (odds: 38/38, or 100%) and then lands on that same number again on the next six spins (odds: (1/38)^6). The odds of that happening are 1 in 3 billion.

But it’s not that simple, either. The question “What are the odds of that?” can, and often should, be interpreted differently. What are the odds of a roulette wheel, on seven consecutive spins, landing in the following order: 10-34-3-9-18-30-21? Take a second and think about that.

…

The answer is 1 in 114 billion. It is 38 times as likely for a roulette wheel to land on the same number seven times in a row as it is to land in the sequence 10-34-3-9-18-30-21. That’s because 19-19-19-19-19-19-19 is just as likely as 10-34-3-9-18-30-21, and there are 38 same-number sequences, one for each pocket. It may not feel that way to you, because 10-34-3-9-18-30-21 isn’t substantially different to you than 4-22-9-30-2-36-18, or a host of other seemingly random results.
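The three roulette figures above can be checked with a few lines of arithmetic. This is only a sketch, assuming a fair 38-pocket wheel and independent spins; the names `p_specific_sequence` and `p_any_repeated_number` are made up for illustration.

```python
from fractions import Fraction

POCKETS = 38  # American wheel, as stated in the text
SPINS = 7

def p_specific_sequence(pockets: int = POCKETS, spins: int = SPINS) -> Fraction:
    """Chance of one particular sequence, e.g. 19 seven times or 10-34-3-9-18-30-21."""
    return Fraction(1, pockets) ** spins

def p_any_repeated_number(pockets: int = POCKETS, spins: int = SPINS) -> Fraction:
    """Chance that all spins match the first one, whatever number that is."""
    return Fraction(1, pockets) ** (spins - 1)

specific = p_specific_sequence()
repeated = p_any_repeated_number()

print(f"one specific sequence: 1 in {specific.denominator:,}")
print(f"same number 7 times:   1 in {repeated.denominator:,}")
print(f"ratio: {repeated / specific}")
```

Exact fractions are used so that the "1 in N" figures are not affected by floating-point rounding.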

Or, as Doug described, consider what happens if you flip a coin ten times and get THHHTTHHTH. What are the odds of that?

“What are the odds of that” is a question that sounds straightforward but is rather complex. It depends on what you think is the essence of a sequence of coin flips. The odds of getting exactly THHHTTHHTH are 1 in 1024. But that probably isn’t the essence of the sequence that you are curious about: your chance of getting 6 heads and 4 tails is about 20.5%, and the probability of a 6/4 split one way or the other is about 41%. THHHTTHHTH may be a rare outcome, just like 10-34-3-9-18-30-21, but every single sequence is a rare outcome when flipping a coin 10 times. Usually, we’re not interested in the outcome itself. Instead we group outcomes together into the events that we’re interested in. We might be interested in the event “six of one, four of the other,” which consists of a lot of different outcomes. That event is not particularly rare.
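The difference between an outcome and an event is easy to see by counting. Here is a small sketch, assuming a fair coin and independent flips; the variable names are just for illustration.

```python
from math import comb

FLIPS = 10
total_sequences = 2 ** FLIPS  # every sequence of 10 flips is equally likely

# One exact outcome, such as THHHTTHHTH.
p_exact_sequence = 1 / total_sequences

# The event "6 heads and 4 tails" groups together every sequence with 6 heads.
sequences_with_six_heads = comb(FLIPS, 6)
p_six_heads = sequences_with_six_heads / total_sequences

# The event "6/4 split either way" also includes 4 heads and 6 tails.
p_six_four_either_way = 2 * p_six_heads

print(f"exact sequence:       1 in {total_sequences}")
print(f"6 heads, 4 tails:     {p_six_heads:.1%}")
print(f"6/4 split either way: {p_six_four_either_way:.1%}")
```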

Remember my Splits Happen post? What were the odds that Jacoby Jones would gain three times as many receiving yards against the 8 teams at the back of the alphabet as he would against the 8 teams he faced in the front of the alphabet? It depends on your perspective. The actual odds of that specific outcome would be extremely long, much like THHHTTHHTH. But the odds that one receiver in the NFL would have a crazy split like that? Not odd at all, in fact. With a large enough sample size, it’s bound to happen, which is why I was able to find three other examples in that article.

This is a very good reason why it’s often inappropriate to apply standard significance tests to football statistics. Surely Jones’ splits would pass any standard significance test, signaling that his wild split was in fact “real” even though we know it wasn’t. With a large enough sample, you would expect to have false positives, which isn’t a knock on standard significance testing. Statistical significance at the 1% level doesn’t mean false positives won’t happen: if you have 100 different samples where nothing real is going on, you should expect about one of them to pass the test anyway. Scott Kacsmar has written that Aaron Rodgers has a miserable 3-18 record at 4th quarter comeback opportunities, a rate far, far below that of other elite quarterbacks. Does this mean Rodgers is inherently bad at 4th quarter comebacks? Maybe so. You could make some assumptions and run the numbers; maybe you’d find that there is only a 1%[2] chance of Rodgers having such a terrible 4th quarter comeback record if he were as good as other elite quarterbacks in those situations. So you would confidently conclude: the probability of Rodgers’ record having happened due to chance is extremely low. Therefore, it probably isn’t just chance; he must be inherently bad at 4th quarter comebacks.

But that isn’t necessarily appropriate. Even if all elite quarterbacks, Rodgers included, had the exact same odds when it came to 4th quarter comeback opportunities, and therefore chance was completely responsible for any discrepancy in the results, over time, there would undoubtedly still be an elite quarterback like Rodgers who would have that 3-18 record (about 1% of all star quarterbacks, in fact!). Do you see why?
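To see why, it may help to run the numbers. The sketch below takes the 40% success rate from the footnote and treats each of the 21 opportunities as an independent trial, which is a simplification. The pool of 100 equally skilled quarterbacks is a purely hypothetical number chosen for illustration, as are the function and variable names.

```python
from math import comb

def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) when X counts successes in n independent trials with rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

WIN_RATE = 0.40       # assumed comeback success rate (from the footnote)
OPPORTUNITIES = 21    # 3 wins + 18 losses
WINS = 3
QUARTERBACKS = 100    # hypothetical pool size, NOT a figure from the article

# Chance that one particular quarterback wins 3 or fewer of 21 by luck alone.
p_one_qb = binomial_cdf(WINS, OPPORTUNITIES, WIN_RATE)

# Chance that at least one quarterback in the pool ends up with such a record.
p_at_least_one = 1 - (1 - p_one_qb) ** QUARTERBACKS

# Expected number of quarterbacks in the pool with such a record.
expected_unlucky = QUARTERBACKS * p_one_qb

print(f"one given quarterback:    {p_one_qb:.2%}")
print(f"at least one of {QUARTERBACKS}:     {p_at_least_one:.0%}")
print(f"expected number in pool:  {expected_unlucky:.2f}")
```

Under these assumptions, a record that is a roughly 1-in-100 event for any named quarterback is more likely than not to show up somewhere in a pool of that size.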

Some in the statistical community refer to this as the Wyatt Earp Effect. You’ve undoubtedly heard of Wyatt Earp, who is famous precisely because he survived a large number of duels. What are the odds of that? Well, it depends on your perspective. The odds that one person would survive a large number of duels? Given enough time, it becomes a statistical certainty that someone would do just that. Think back to the famous Warren Buffett debate on the efficient market hypothesis. Suppose that 225 million Americans partake in a single-elimination national coin-flipping contest, with one coin flip per day. After 20 days, we would expect 215 people to successfully call their coin flips 20 times out of 20. But that doesn’t mean those 215 people are any better at calling coins than you or I. The Wyatt Earp Effect, the National Coin Flipping Example, and my Splits Happen post all illustrate the same principle. Asking “what are the odds of that?” is often meaningless in retrospect. If you look at enough things, enough players’ splits, enough 4th quarter comeback opportunities, enough coin flips, or enough roulette wheel spins, you will see some things that seem absurdly unlikely.
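The 215 figure in the coin-flipping contest is just the starting field halved 20 times. A minimal sketch, assuming fair coins and that every contestant calls independently:

```python
CONTESTANTS = 225_000_000  # starting field, as given in the text
ROUNDS = 20                # one flip per day for 20 days

# Each contestant has a (1/2)^20 chance of calling all 20 flips correctly.
p_perfect = 0.5 ** ROUNDS
expected_perfect = CONTESTANTS * p_perfect

print(f"chance for one person:    1 in {2 ** ROUNDS:,}")
print(f"expected perfect records: {expected_perfect:.0f}")
```

For any single contestant a perfect record is about a one-in-a-million event, yet a couple hundred such records are expected across the whole field.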

This isn’t to say that Rodgers isn’t particularly bad at 4th quarter comebacks, as the data suggest. It’s just a reminder that the question “What are the odds of that?” is rarely a straightforward math question, and that the Wyatt Earp trap is tempting and difficult to avoid.

What are the odds of a roulette wheel landing on 19 on seven straight spins? One answer is 1 in 114 billion. But with 10,000 roulette wheels across America being played virtually non-stop, we would expect one casino in America to hit the same number on 7 spins in a row about once every 15 months.
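The post does not say what spin rate sits behind the 15-month estimate. As a rough sanity check, the sketch below works backwards to the rate that estimate would imply. It assumes fair wheels running around the clock, and it treats every spin as a chance to complete a run with the six spins before it, so overlapping runs are counted separately. The variable names are illustrative only.

```python
POCKETS = 38
RUN_LENGTH = 7
WHEELS = 10_000           # figure quoted in the text
MONTHS_BETWEEN_RUNS = 15  # figure quoted in the text

# Assumption: each spin matches the previous six with probability (1/38)^6,
# so on average one run is completed per 38^6 spins.
spins_per_run = POCKETS ** (RUN_LENGTH - 1)

# Convert 15 months to days using an average month length.
days_between_runs = MONTHS_BETWEEN_RUNS * 365.25 / 12

spins_per_wheel = spins_per_run / WHEELS
implied_spins_per_hour = spins_per_wheel / days_between_runs / 24

print(f"spins needed per run (all wheels): {spins_per_run:,}")
print(f"implied rate: {implied_spins_per_hour:.1f} spins per wheel per hour")
```

If the implied rate looks like a plausible pace for a busy table, the 15-month figure is at least internally consistent. A different assumed pace would scale the waiting time proportionally.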

[1] I am blurring, and will continue to blur, the distinction between odds and probability. Nothing bad will happen as a result.

[2] If you assume Rodgers has a 40% chance of succeeding at 4th quarter comebacks (arguably a low estimate for star quarterbacks), then there is only a 1% chance that he would win 3 or fewer times in 21 opportunities.

Perhaps the reason he doesn’t have very many fourth quarter comebacks is that he isn’t often down in the fourth quarter, particularly lately? I don’t actually know this, but I would guess most of his opportunities came earlier in his career, when I wouldn’t actually expect anyone to make all that many comebacks, or make them at a very high percentage. I predict that his percentage will increase as time passes and he gets more opportunities.

Chase Stuart

Scott does a good job addressing those arguments in his article, Andrew. It’s also worth remembering that even in his first year as a starter, he was well above average. There is no doubt that his percentage will increase as time passes, just like there’s no doubt that Jacoby Jones’ splits among teams according to the alphabet will converge.

It also points out something that makes probability even more troublesome for use in sports analysis: There is often something else affecting probability beyond what we are seeing.

You have a typo you might want to fix, because it may confuse some readers: “The answer is in 114 billion” should be “The answer is 1 in 114 billion.”

Chase Stuart

Thanks Shattenjager — fixed that.

Tim Truemper

In the recent book “Thinking, Fast and Slow” (recommended by Chris Brown of Smart Football) the author addresses the problems people have thinking “statistically” vs intuitively. He makes the point that people classify seemingly random occurrences of number outcomes (e.g. 15-24-36-11, 99 and others like that) as all one type while the series of the same number is seen automatically as only one object within a group. Thus, when people are told that the outcomes are equivalent in the chance they would occur at all, their intuition resists that. As you know this occurs with coin flipping probabilities too.

In regards to Rodgers’ record in 4th quarter comebacks and the other examples, one is not entirely a chance event (Rodgers) while the other examples, such as coin flipping, are. Are these really comparable examples? Or perhaps I am missing something. I have training in inferential statistics but I’m no expert.

Richie

I hadn’t heard the term Wyatt Earp Effect before, but I have observed it.

On more than one occasion I have heard/read/watched some sort of based-on-truth war story. Sometimes, as I am in the story I am amazed that the person managed to survive. I think, “amazing that this story teller happened to survive all these close calls”, but then I realize I have it backwards. SOMEBODY was bound to survive all these close calls, and this one wrote a book about it. If he hadn’t survived, he wouldn’t have written the book.

OK, we need to rethink the Rodgers 4th quarter stats. How big of a deficit did he have? A one-point comeback and a 20 point comeback are universes apart. Too many variables: defenses, weather, size of deficit, surrounding cast, need for comeback (bad defense? fumbles? career performance by a back-up?) Some situations don’t have odds at all. Jared Goff takes off in Year Two. Rare that a “bust” turns it around. Is his QB rating up 50 points? Can you set odds on how probable a 50 point gain is? And when we use stats like this, we give averages across all the QBs who had these possibilities. That’s a statistical paradigm, which is fine when flipping coins but I’d bet on Aaron Rodgers before I would Jay Cutler, and I always thought of Wyatt Earp as a skilled duelist, so his stats are a product of his skill, not some statistical refinement. Very intricate stuff.