Motivation

Why is probability an appropriate way to represent uncertainty?

Statisticians typically emphasize the need to estimate uncertainty in inference and prediction. Yet although statistics makes heavy use of randomness, we rarely explain why randomness is an appropriate tool for modeling the world. If we would like others to use statistics, I believe we should explain why probability matters. This post contains one explanation I find personally satisfying.

A starting point

For the purposes of this post, I assume that we are interested in making some sort of inferential or predictive statements. I also assume that we are interested in determining how uncertain these statements are.

The world is not random

I don’t have an opinion about whether the world behaves randomly or not.

However, I think many of us hold an implicit belief that if we had enough data about the state of the world, we could model it deterministically. We need to address this belief regardless of whether or not it is true: if the world is inherently deterministic, it doesn’t make sense to treat it as random, and probability distributions are the wrong abstractions to use in our models.

Sweet jesus, not the swans again

Let’s switch gears for a moment and consider a classic example of inductive reasoning:

P1. We have seen \(n\) swans, all of which have been white.
C1. All swans are white.

Here the conclusion that all swans are white is not guaranteed to be correct. For example, we could see a black swan at any point and learn that the conclusion is incorrect.

But most people would agree that our confidence in the statement is somewhat reasonable, and that our confidence in it should increase as we see more white swans.

Still, there is uncertainty associated with the statement “All swans are white.” Since we don’t know the structure of this uncertainty, we call it epistemic.

Next a statistician steps in. Why not make a more precise statement, they say, and assume that some proportion \(p\) of swans is white? Put a \(\mathrm{Beta}(\alpha, \beta)\) prior on \(p\) and treat the count of white swans among \(n\) observed as conditionally \(\mathrm{Binomial}(n, p)\), so that marginally the count is Beta-Binomial. Calculate the posterior distribution and you’ll have an estimate of the distribution of \(p\)!
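Because the Beta prior is conjugate to the Binomial likelihood, this update has a closed form: after observing \(k\) white swans out of \(n\), the prior \(\mathrm{Beta}(\alpha, \beta)\) becomes the posterior \(\mathrm{Beta}(\alpha + k, \beta + n - k)\). A minimal sketch of the swan example (the specific counts and the uniform prior are illustrative, not from the post):

```python
# Beta(alpha, beta) prior on the proportion p of white swans,
# updated after observing n swans, k of them white. Conjugacy
# gives the posterior Beta(alpha + k, beta + n - k) directly.

def beta_binomial_posterior(alpha, beta, n, k):
    """Return the parameters of the posterior Beta distribution."""
    return alpha + k, beta + (n - k)

# Start from a uniform prior, Beta(1, 1), and observe 20 swans,
# all of them white (illustrative numbers).
a, b = beta_binomial_posterior(1.0, 1.0, n=20, k=20)

# Posterior mean of p: 21 / 22, roughly 0.955.
posterior_mean = a / (a + b)
print(a, b, round(posterior_mean, 3))
```

Note that even with 20 out of 20 white swans, the posterior mean stays below 1 — the model never fully commits to “all swans are white,” which is exactly the kind of precise, quantified claim the statistician was after.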

This is appealing – our inferential procedure still allows us to claim that all swans are white, but we can also make a number of other, equally precise claims, all with uncertainty estimates. Even better, the uncertainty is now probabilistic rather than epistemic1.

This move solves one problem but introduces another. Our uncertainty about the proportion of swans that is white is probabilistic, but we haven’t eliminated the epistemic uncertainty from our inference. We’ve just moved it – now we have to wonder if the distribution of swans is truly Beta-Binomial. In other words, is the world random? This uncertainty is epistemic.

But whether or not our uncertainty about the color of swans can be quantified is perhaps not that important. If we specify a flexible enough probability distribution, we may be able to approximate unquantifiable uncertainty with quantifiable uncertainty2.

The more time I spend becoming familiar with probability distributions, the more convinced I become that this is the case. This approximation argument certainly isn’t formal, but I find it compelling.

There are some more traditional explanations of the role of probability in statistics, but these have always felt more pragmatic than fundamental to me3.

Statistical education

This is my personal justification for the use of probability in statistical inference. I’m sure it’s naive in many ways, and that there’s lots of more interesting work in the philosophy of science addressing this.

But correctness is not my goal here. I hope that others find my argument compelling. More broadly, I believe we need to find compelling arguments that randomness matters, and teach them to the people learning to do statistics. What precise argument convinces someone in the end doesn’t bother me much – I just hope they are being exposed to many such arguments.

In particular, I worry that we are not doing a good job at reaching undergrads. Many introductory statistics classes begin with half a semester of probability. A statement like “we want to make probabilistic statements” is little motivation for seven weeks of classes4.

Epistemologically inclined students may presume probabilistic uncertainty is for mathematical convenience only, and dismiss statistics on this basis5.

Summary

I suspect that people have varying intuitions about how meaningful it is to calculate probabilities of real world events. It isn’t immediately clear that probabilistic uncertainty is a good way to think about uncertainty, which may be unquantifiable.

My personal solution to this dilemma is to understand probability models as approximating (potentially) unquantifiable uncertainty. I imagine there are a number of other interesting resolutions in a similar vein.

If you know of other arguments for probabilistic uncertainty, or disagree with mine, please find me on Twitter or leave a comment! My thinking is very much in progress and I’d love to have a discussion about this.

Traditionally, this type of uncertainty (probabilistic) has been referred to as aleatoric. The term comes from the Latin alea, meaning “a game of dice”; an aleator is a dice player. Roughly, we can think of epistemic uncertainty as qualitative, and aleatoric uncertainty as quantitative.↩

Trying to imagine forms of uncertainty other than probability is an interesting exercise. The Science of Conjecture touches on some of these notions.↩

I have come across two other explanations of the role of randomness in statistics.

Probabilistic prediction allows us to maximize utility when incorrect predictions have differing costs.

I find these explanations unsatisfying because they motivate probabilistic thinking only in certain statistical subfields, rather than across all of statistics.↩
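The utility explanation above can be made concrete with a standard decision-theoretic sketch (the cost numbers below are illustrative, not from the post): if a false positive costs \(c_{fp}\) and a false negative costs \(c_{fn}\), expected cost is minimized by predicting positive whenever the predicted probability exceeds \(c_{fp} / (c_{fp} + c_{fn})\).

```python
# Sketch of cost-sensitive prediction: predicting positive incurs
# an expected cost of (1 - p) * cost_fp, predicting negative incurs
# p * cost_fn, so we predict positive when p > cost_fp / (cost_fp + cost_fn).

def optimal_threshold(cost_fp, cost_fn):
    """Probability threshold that minimizes expected misclassification cost."""
    return cost_fp / (cost_fp + cost_fn)

def predict(p, cost_fp, cost_fn):
    """Predict positive iff the probability clears the cost-based threshold."""
    return p > optimal_threshold(cost_fp, cost_fn)

# When false negatives are 9x as costly as false positives,
# we should flag anything with probability above 0.1.
print(optimal_threshold(1.0, 9.0))  # 0.1
print(predict(0.2, 1.0, 9.0))       # True
```

The point of the sketch is that a bare yes/no prediction cannot adapt to the cost structure at all — only a probabilistic prediction lets the threshold move with the costs.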

Discussion of the importance of randomness remained scant long past my introductory classes, and indeed appeared in only three places during my collegiate career.

My sophomore year we discussed exchangeability in an experimental design class, although only in informal terms. It wasn’t until this past spring, when I discovered Miguel Hernan’s book draft, that I started to play with these concepts more formally.

My junior year, in a machine learning course in the computer science department, we briefly went over an exercise about optimal prediction in light of different costs for false positives and false negatives.

Finally, in my senior year, we talked about randomness in a graduate level Bayesian course, although mostly in comparison to frequentism.

Collectively, my education on the importance of randomness has consisted of perhaps 1 - 3 hours of lecture, all in the latter half of my degree. This feels insufficient.↩

I also wish statistical education spent more time discussing the mathematical convenience of statistical assumptions. The more convenient a model, the more time we should spend demonstrating that it is flexible enough to capture real-world phenomena. As a corollary, we should have a stable of models that are generally considered too convenient to be used for anything other than pedagogical purposes.↩