Friday, December 16, 2011

"All models are wrong but some are useful"

UPDATE, Feb 9: Kaiser Fung's blog has a useful illustration of the variability point discussed below in a post about representations of LA's annual rainfall, here.

Kaiser Fung, a statistician and blogger (also here - I've written about his excellent site, "Junk Charts," earlier) has written a terrific book about what he calls "the statistical way of thinking." In it, he explains how to think about what happens when numbers don't lie. Here's a screenshot of the cover, from Amazon.

In five easily digested chapters, using events from newspaper reports or even everyday life to illustrate, he makes five basic points about statistical analysis that everyone should know.

1. The average isn't as important as the variability around the average - how large the variation is, how often it occurs, and why it occurs are all more useful information. Fung illustrates this principle by discussing the wait times for rides at Disney theme parks. Think you have a long wait ahead of you? The "Fast Pass" allows you to come back at a set time, when numbers are lower. You feel as if you are not waiting, but of course you've gone and done something else while you waited.

2. Often, variability doesn't have to be explained by causes - correlation can be enough to make the information useful. Fung contrasts an epidemiological search for an outbreak of food-borne disease with consumer credit ratings. In the latter, correlation is a good enough reason to extend or deny credit. Epidemiologists need to be more cautious, and Fung reiterates Bradford Hill's nine aspects of cause and effect.

3. Not everyone or everything can be aggregated, and aggregation can mask differences. Fung points to differences in SAT scores among black and white students, which are masked when all scores are analyzed together. "Black students of high ability score the same as whites; the scores for low ability students of both races are also the same. And yet the average white student score . . . is higher than the average black student score. Due to superior educational resources, 60% of white students have high ability compared to only 20% of black students." The averages mask the weights, that is, the larger share of white students in the high-ability group and of black students in the low-ability group.

4. Making judgments based on statistics can force you to balance two types of error, over-inclusion and under-inclusion. Unfortunately, these two types are inversely related, so by lowering the chances of one you increase the chances of the other. Fung's example in this case is drug testing for athletes, and he shows that by setting the cutoffs for positive tests very high, there are a lot of false negatives among tested athletes. Remember that next time you read a statement saying an athlete has never failed a drug test.

5. Statistical testing can help us decide whether available evidence explains an event. When sellers of Canadian lottery tickets turned up as winners at a rate considerably greater than chance would have us expect, Fung explains, it was the unlikelihood of this happening (one in a VERY large number) that set authorities down a path to correct the wrong and change processes so that it was harder for vendors to cheat.
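
The aggregation effect in point 3 is easy to check with a little arithmetic. Here is a minimal sketch in Python. The 60% and 20% shares come from the passage quoted above; the two score values (600 and 400) are made up purely for illustration and are not from the book.

```python
# Hypothetical scores, identical for both groups at each ability level.
# These two numbers are assumptions for illustration, not figures from the book.
HIGH_SCORE = 600
LOW_SCORE = 400

# Share of students with high ability, as given in the quoted passage.
HIGH_SHARE = {"white": 0.60, "black": 0.20}


def group_average(high_share, high_score=HIGH_SCORE, low_score=LOW_SCORE):
    """Weighted average score for a group, given its share of high-ability students."""
    return high_share * high_score + (1 - high_share) * low_score


averages = {group: group_average(share) for group, share in HIGH_SHARE.items()}

for group, avg in averages.items():
    print(f"{group}: average score {avg:.0f}")
```

With these assumed scores the group averages come out to 520 and 440, even though students at the same ability level score exactly the same in both groups. The whole gap comes from the weights, which is the point of the passage.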

This is a clearly written, lucid book. After carefully setting out and illustrating the five main concepts, Fung demonstrates how statistical tools are often used together, going back to his original illustrations with a deeper explanation. It's very satisfying, and very clear. I highly recommend it.