Apologies for the long delay since my last post. Christmas is partly to blame, of course. But it's also because I'm currently traveling in northwestern Australia, and the last three (THREE!!!) places we've stayed have claimed that their promised Internet connection is "broken." The place before that didn't even pretend to have Internet.

So I'm living an offline life, which is of course a good thing in some ways. I'm currently reading a book made out of paper, which is sort of fun. It's Daniel Kahneman's Thinking, Fast and Slow. And it's totally mind-blowing. I can't recommend it highly enough, though it's not an easy (i.e. Gladwellian) read.

One of the many, many things relevant to this blog that Kahneman talks about is how and why our statistical intuition is so bad. An example (which he takes from statisticians Howard Wainer and Harris Zwerling):

A study of the incidence of kidney cancer in the 3,141 counties of the United States reveals a remarkable pattern. The counties in which the incidence of kidney cancer is lowest are mostly rural, sparsely populated, and located in traditionally Republican states in the Midwest, the South, and the West. What do you make of this?

As he points out, it's pretty easy to come up with plausible explanations for this, related to pollution, food patterns, lifestyle, or whatever. This is what we do on a daily basis when we hear the results of studies on health or athletic performance - we tell ourselves stories to explain them.

Now, what happens when you consider the counties with the highest incidence of kidney cancer? It turns out that these counties are mostly rural, sparsely populated, and located in traditionally Republican states in the Midwest, the South, and the West. And once again, it's easy to come up with explanations based on lack of access to good medical care, higher alcohol and tobacco use, and so on. But both these "theories" can't be right at the same time!

To understand what's going on, consider another example. A big jar of marbles contains exactly half white ones and half red ones, and two people take turns randomly selecting a handful of marbles and counting their colors:

Jack draws 4 marbles on each trial, Jill draws 7. They both record each time they observe a homogeneous sample - all white or all red. If they go on long enough, Jack will observe such extreme outcomes more often than Jill - by a factor of 8 (the expected percentages are 12.5% and 1.56%).
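Those percentages are easy to check: the chance that a handful of n marbles is all one color is 2 × (1/2)^n. Here's a quick sketch in Python (the draw sizes are the ones from the example; the function name and trial count are mine) that confirms the arithmetic with a simulation:

```python
import random

def homogeneous_rate(draw_size, trials=100_000):
    """Estimate how often a random handful of marbles is all one color,
    drawing each marble as a 50/50 coin flip."""
    count = 0
    for _ in range(trials):
        handful = [random.random() < 0.5 for _ in range(draw_size)]
        if all(handful) or not any(handful):  # all white or all red
            count += 1
    return count / trials

# Exact probabilities: 2 * (1/2)**n
print(2 * 0.5**4)  # Jack: 0.125, i.e. 12.5%
print(2 * 0.5**7)  # Jill: 0.015625, i.e. ~1.56%
print(homogeneous_rate(4), homogeneous_rate(7))  # simulation agrees
```

The ratio of the two exact values is (1/2)^4 / (1/2)^7 = 2^3 = 8, which is where Kahneman's "factor of 8" comes from.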

In other words, the counties with the highest and lowest incidence of kidney cancer have nothing special about them other than small populations. It's an artifact of sample size. Of course, we all know that large samples are more precise than small ones. But we're less likely to realize that small samples yield extreme results (at both ends of the spectrum) more often than large ones, Kahneman notes - even though that's just another way of saying the same thing:
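The artifact is easy to reproduce. Here's a sketch in Python (the county populations and the incidence rate are made up for illustration; every county shares the exact same true rate) showing that the counties at both extremes of observed incidence tend to be the small ones:

```python
import random

random.seed(0)
TRUE_RATE = 0.001  # identical underlying incidence in every county

# Hypothetical counties: many small ones, a few large ones
counties = [("small", random.randint(100, 500)) for _ in range(500)] + \
           [("large", random.randint(50_000, 100_000)) for _ in range(50)]

observed = []
for kind, pop in counties:
    # Each resident independently becomes a case with probability TRUE_RATE
    cases = sum(random.random() < TRUE_RATE for _ in range(pop))
    observed.append((cases / pop, kind))

observed.sort()
lowest, highest = observed[:10], observed[-10:]
print([kind for _, kind in lowest])   # mostly small counties
print([kind for _, kind in highest])  # also mostly small counties
```

Despite the uniform true rate, the small counties dominate both tails: a county of a few hundred people easily lands at zero cases (lowest possible rate) or at one case (a rate well above the large counties' range), while the big counties hover near 0.001.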

The first statement has a clear ring of truth, but until the second version makes intuitive sense, you have not truly understood the first.

And Kahneman isn't just trying to tell us that we're stupid. In fact, his line of research (which eventually led to a Nobel Prize) started more than four decades ago with the observation that he and other statistically trained researchers had consistently and predictably poor statistical intuition, but nonetheless had unjustified confidence in the conclusions their intuitions suggested. It's an inevitable part of being human, and the only possible response is to be humble about what (we think) we know.