In All Probability

As a teenager, I had the good fortune to meet
Tom Körner. He had written a book,
The Pleasures of Counting, that opens with a story about John Snow.

Snow was a 19th-century English physician who painstakingly collected and
analyzed vast amounts of data to convincingly argue that cholera spreads
through contaminated drinking water, and not, as was once widely believed, from
some kind of air pollution.

This struck me as a powerful example of how mathematics, and in particular,
statistics, can impact our lives. How many more would have succumbed, had
the true cause remained hidden?

How did this happen? I’m sure these famous quotes were composed mainly in jest,
and perhaps referred to shady accounting more than actual calculation. But
these days, even the mathematics itself seems suspect:

Statistics is indeed a troubled subject. It turns out some guy named
R. A. Fisher is to blame.
Fisher had a tragic combination of gifts and flaws that led to today’s
erroneous orthodox statistics. (Despite an ever-growing mountain of
evidence, Fisher steadfastly refused to believe smoking causes lung cancer.
How good could his methods be?)

My undergrad introductory course on probability and statistics followed
Fisher’s dogma. As a result, the methods it taught seemed to me more like
black magic than mathematics. But I was convinced the lecturer only seemed to
be teaching superstitions because my understanding was too shallow, and I
concluded I must have a poor intuition for the subject.

Years later, determined to conquer my weakness in this area, I went back to
my textbook. And some other books. I discovered the shocking truth:
my undergrad textbook is wrong. For once, a crazy conspiracy
theory was true and They really were corrupting us all with Their
false mathematics.

Disclaimer

I heard from Fred Ross that this link got posted to
Hacker News. I’d like to remind visitors that these notes are meant for my
personal use; I’m happy if others read them, but be aware the material is
derived from what little I’ve read.

Fred Ross also supplied the following summary which includes suggestions for
further reading:

The underlying theory that justifies most inference (Bayesian, minimax, etc.) is decision theory, which is a subset of the theory of games. Savage’s book on the foundations of statistics has a very nice discussion of why this should be. I learned it from Kiefer’s book, which is the only book I know of that starts there. Lehmann and Casella both get to it later in their books.

The justification for p-values is actually the Neyman-Pearson theory of hypothesis testing. The p-value is the critical value of alpha in that framework. I wrote a couple of expository articles for clinicians going through this if you’re interested.

Jaynes was a wonderful thinker, but be aware that a lot of the rational actor theory breaks down when you don’t have a single utility function. That is true of using classes of prior (see the material towards the end of Berger), or in sequential decision problems (look at prospect theory in psychology, where the overall strategy may have a single utility function, but local decisions along the way can’t be described with one). So the claims in the middle of the 20th century for naturalness of Bayesian reasoning haven’t held up well.
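To make Ross’s first point concrete for myself: below is a minimal sketch of a no-data decision problem, where a Bayesian and a minimax decision maker face the same loss table and choose differently. The states, actions, losses, and prior are all hypothetical, picked only so the two criteria disagree, and I only consider non-randomized actions (real minimax theory also allows randomized rules).

```python
# Toy no-data decision problem. The states, actions, losses, and prior are
# hypothetical, chosen only so the two criteria disagree.
LOSS = {
    # action -> {state of nature -> loss incurred}
    "treat": {"sick": 0.0, "healthy": 2.0},
    "wait": {"sick": 10.0, "healthy": 0.0},
}


def bayes_action(loss, prior):
    """Return the action with the smallest expected loss under `prior`."""
    return min(loss, key=lambda a: sum(p * loss[a][s] for s, p in prior.items()))


def minimax_action(loss):
    """Return the (non-randomized) action whose worst-case loss is smallest."""
    return min(loss, key=lambda a: max(loss[a].values()))


prior = {"sick": 0.1, "healthy": 0.9}
print(bayes_action(LOSS, prior))  # → wait  (expected losses: treat 1.8, wait 1.0)
print(minimax_action(LOSS))       # → treat (worst cases: treat 2.0, wait 10.0)
```

The Bayes choice depends on the prior: raising the probability of "sick" to 0.5 gives expected losses of 1.0 for "treat" and 5.0 for "wait", so it flips to "treat". The minimax choice ignores the prior entirely and only guards against the worst state.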
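One way to read Ross’s remark that the p-value is “the critical value of alpha”: for a Neyman-Pearson test at level alpha, the p-value is the smallest alpha at which the test would reject. A minimal sketch, assuming a one-sided z-test with known variance and a hypothetical observed statistic z = 2.1; it uses Python’s standard `statistics.NormalDist`.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def rejects(z_obs, alpha):
    """Level-alpha one-sided z-test: reject H0 when z_obs reaches the critical value."""
    critical = STD_NORMAL.inv_cdf(1 - alpha)
    return z_obs >= critical


def p_value(z_obs):
    """Probability under H0 of a statistic at least as large as z_obs."""
    return 1 - STD_NORMAL.cdf(z_obs)


z = 2.1  # hypothetical observed test statistic
p = p_value(z)
print(round(p, 4))
for alpha in (0.01, 0.05, 0.10):
    # The test's verdict agrees with the rule "reject iff p <= alpha".
    print(alpha, rejects(z, alpha), p <= alpha)
```

Any alpha slightly above p leads to rejection and any alpha slightly below does not, which is the sense in which p marks the boundary between the two verdicts.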