I just finished Charles Seife's book Proofiness. It's an OK read - the writing leaves a lot to be desired, and some chapters turned out much better than others. In any case, Seife's thesis - that numbers are being abused in many realms of life - is compellingly argued.

One theme that emerges throughout the book is that media and journalists are particularly notorious for abusing any number they can get their hands on. Numbers are assumed to be purely objective, because 2 plus 2 always equals 4; but a statistic is only as valid as the reasoning and methodology behind it. Often, that foundation is missing.

I've had my own experience with this. Last spring I did an analysis of college degree density in America's largest metro areas. The post was divided into four sections. The first section was a simple list of geographies, sorted by college degree holders per square mile, which I calculated by dividing the number of people with a college degree by the land area of the city (or county).
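The calculation itself is trivial; here's a minimal sketch using made-up figures for two hypothetical metro areas (the names and numbers are illustrative, not from the original analysis):

```python
# Degree density: college degree holders divided by land area.
# All figures below are invented for illustration.
metros = [
    # (name, college degree holders, land area in square miles)
    ("Metro A", 500_000, 250.0),
    ("Metro B", 120_000, 400.0),
]

for name, degrees, area in metros:
    density = degrees / area  # degree holders per square mile
    print(f"{name}: {density:,.0f} degree holders per sq. mile")
```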

I knew that such a list, on its own, was bogus, because the number of college degrees per square mile would largely be a function of how many people live in a given geography - that is, of overall population density. So parts 2 and 3 of my analysis used regression and residuals analysis, respectively. In my opinion, those sections contained by far the most interesting findings: when controlling for overall population density, how do various cities and counties stack up?
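The regression-and-residuals approach can be sketched roughly like this. The data below is invented, and the original analysis may have used a different model; this just shows the general idea of ranking geographies by how far they sit above or below the trend line:

```python
import numpy as np

# Hypothetical data: overall population density and degree density
# (both per square mile) for a handful of made-up geographies.
pop_density = np.array([2000.0, 5000.0, 8000.0, 12000.0, 3000.0])
deg_density = np.array([600.0, 1800.0, 2200.0, 4800.0, 700.0])

# Fit degree density as a linear function of population density.
slope, intercept = np.polyfit(pop_density, deg_density, 1)

# Residuals: the gap between each geography's actual degree density
# and what its population density alone would predict.
predicted = slope * pop_density + intercept
residuals = deg_density - predicted

# Positive residual = more degree-dense than predicted;
# negative = less. Rank from most to least over-performing.
for i in np.argsort(residuals)[::-1]:
    print(f"geography {i}: residual {residuals[i]:+.0f}")
```

Sorting by residual, rather than by raw density, is what lets you compare a sprawling county and a compact city on something like equal footing.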

When the post went viral, nearly all of the coverage focused on the first section, with virtually no mention of the rest of the analysis. This was extremely frustrating, especially when dramatic headlines popped up describing one section of my analysis as a ranking of cities "from smartest to dumbest" or something like that. Unfortunately, to discover why the numbers look the way they do, readers had to click through to my original post, because too many media sources weren't giving them the information they needed to draw appropriate conclusions.