Archives for statistics

Official statistics would certainly suggest that crime in China is extremely low. Murder rates in China are roughly one-fifth as high as in the United States. According to the official crime statistics there, all crimes are rare. China certainly feels safe. We walked the streets in rich areas and poor, and not for a moment did I ever feel threatened. Graffiti was completely absent. The one time I thought I had finally found some graffiti, near a train station in the city of Shangrao, the spray-painted message on a bridge turned out not to be graffiti at all, but rather a government warning that anyone caught defecating under the bridge would be severely punished.

Yet, there were all sorts of odd behaviors that made it seem like some crimes were a big problem.

First, there seemed to be an obsession with the risk of counterfeit money. Our tour guides felt the need to teach us how to identify fake money. Whenever I bought something with currency, the shopkeeper went through a variety of tricks to validate the legitimacy of the bills.

In the New Republic, Nate Cohn explores the small but growing role of advanced statistics in football. Projects like Football Freakonomics notwithstanding, the NFL isn’t usually thought of as a realm where stats hold all that much sway, in part because the game is so much more of a complex, dynamic system than, say, baseball. Here’s Cohn on one big change fans might notice if more coaches start relying on statistics:

The one place where fans could see analytics at work is in play calling, which also happens to be the place where analytics could impact the average fan’s experience of the game. The numbers suggest, for instance, that teams should be aggressive on fourth down, and that it’s better to go for first down with a lead in a game’s final minutes than to run the ball on third down to run out the clock. Yet even the teams with well-regarded analytics departments, including San Francisco and Baltimore, largely adhere to a conservative and traditional play calling approach: the coaches “just aren’t listening to them yet,” [Brian] Burke says. And the few coaches with a reputation for following the statistics, like New England Patriots coach Bill Belichick, aren’t even close to as aggressive as the numbers would advise.

The statistician Andrew Gelman has asked us to publicize what sounds like a nifty project: a Year-in-the-Life look at what data hounds and statisticians actually do:

So here’s the plan. 365 of you write vignettes about your statistical lives. Get into the nitty-gritty—tell me what you do, and why you’re doing it. I’ll collect these and then post them at the Statistics Forum, one a day for a year. I think that could be great, truly a unique resource on what statistics and quantitative research is really like. Also it will be perfect for the Statistics Forum: people will want to tune in every day to see what comes next.

In an e-mail, he adds:

I think it would be a great service to the professions of quantitative research to get vignettes from a wide variety of statistical practitioners. (I’d be interested in hearing what empirical economists do during their days too!) So I’d like to spread the net wide and get lots of stories from people.

And yes, for those of you who read the agate type, this post goes in the Bygones Being Bygones file.

Q. Under what circumstances will a voter actually change his/her mind about whom to vote for? I understand that this rarely happens (see this study, for example), and that most of the action involves undecided voters deciding whom to vote for.

Also, if political scientists are right that voters rarely change their minds, how can a large swing in the polls ever occur? A classic example that you briefly mention in your book is that of Michael Dukakis, who was ahead of GHW Bush by 10% at one point in 1988. -Alan T

A. We see more big shifts in the primaries, when voters don’t have that much information about the candidates. Dukakis was a relative unknown at the start of the 1988 race, before the two parties could advance their own narratives. You rarely see big swings from voter conversion in late-stage presidential races, though. If I knew how to cause such a swing, I’d be drawing a big salary from one of the campaigns right now.

As Justin Wolfers pointed out in his post on income inequality last week, the Census Bureau was talking statistical nonsense. I blame the whole idea of statistical significance. For its weasel adjective “statistical” concedes that the significance might not be the kind about which you care. Here, I’ll explain what statistical significance is, and how its use is harmful to society.

To evaluate the statistical significance of an effect, you calculate the so-called p value; if the p value is small enough, the effect is declared statistically significant. For an example to illustrate the calculations, imagine that your two children Alice and Bob play 30 rounds of the card game “War,” and that the results are 20-10 in favor of Bob. Was he cheating?

To calculate the p value, you need an assumption, called the null (or no-effect) hypothesis: here, that the game results are due to chance (i.e. no cheating). The p value is the probability of getting results at least as extreme as the actual results of 20-10. Here, the probability of Bob’s winning at least 20 games is 0.049. (Try it out at Daniel Sloper’s “Cumulative Binomial Probability Calculator.”)
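The 0.049 figure can be reproduced with a few lines of code rather than an online calculator. This is a minimal sketch that sums the binomial tail directly under the null hypothesis of fair 50/50 games; the function name `binomial_p_value` is my own, not from any particular library:

```python
from math import comb

def binomial_p_value(wins, games, p=0.5):
    """P(X >= wins) for X ~ Binomial(games, p): the one-sided tail probability."""
    return sum(comb(games, k) * p**k * (1 - p)**(games - k)
               for k in range(wins, games + 1))

# Probability that Bob wins at least 20 of 30 fair games of War.
p_val = binomial_p_value(20, 30)
print(round(p_val, 3))  # 0.049
```

Since 0.049 falls just under the conventional .05 cutoff, Bob's winning streak would be declared "statistically significant" — which is exactly the kind of borderline verdict the rest of this post questions.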

New data on income inequality in the United States were just released. And they provide a useful teaching moment. The graph below, which comes from the Census Bureau, shows the evolution of the Gini coefficient since 1967. It’s pretty clear that this measure of inequality has been rising pretty much through this whole period.

Based on the Gini index, income inequality increased by 1.6 percent between 2010 and 2011; this represents the first time the Gini index has shown an annual increase since 1993, the earliest year available for comparable measures of income inequality.
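For readers unfamiliar with the measure: the Gini coefficient runs from 0 (everyone has the same income) to 1 (one person has everything). Here is a minimal sketch of computing it from a list of incomes via the standard sorted-rank formula — an illustration only, not the Census Bureau's exact estimation procedure:

```python
def gini(incomes):
    """Gini coefficient: 0 = perfect equality, approaching 1 = total concentration."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    # G = 2 * sum_i(i * x_i) / (n * sum(x)) - (n + 1) / n, ranks i = 1..n
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n

print(gini([50, 50, 50, 50]))   # 0.0  (perfect equality)
print(gini([0, 0, 0, 100]))     # 0.75 (one person holds all income)
```

On this scale, the 1.6 percent rise the Census Bureau reports is a change in the index itself, not in any individual's income.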

Miguel Sancho, a senior producer with ABC’s 20/20, writes in with a question I’ve often wondered about myself but cannot answer. Can you?

A thought – every hurricane season we see headlines ascribing blame for lives lost on a given storm. “Hurricane Irene Blamed for Five Deaths in North Carolina,” etc. Certainly when people drown, are killed by floating debris, or die because they can’t make it to the hospital, the statistic sounds logical. But it occurred to me that perhaps, in the interests of fairness and accuracy, we should also give Hurricanes “credit” for lives not lost thanks to the interruption of normal human activity. How many homicides, vehicular fatalities, or drug overdoses didn’t happen [last] week in New Orleans, for example, because people were otherwise occupied protecting themselves from Hurricane Isaac? Just wondering if anyone has ever studied this, comparing average morbidity rates in hurricane zones to the stats during the times when hurricanes roll through.

This is not to suggest that overall, hurricanes are a social good. Bastiat’s broken-windows fallacy and all that. But perhaps in this one particular metric, we aren’t seeing the whole picture.

Please don’t judge Sancho’s observation as insensitive to the death and destruction caused by the hurricane itself. I can assure you he is not.

A new paper by psychologists E.J. Masicampo and David Lalande finds that an uncanny number of psychology findings just barely qualify as statistically significant. From the abstract:

We examined a large subset of papers from three highly regarded journals. Distributions of p were found to be similar across the different journals. Moreover, p values were much more common immediately below .05 than would be expected based on the number of p values occurring in other ranges. This prevalence of p values just below the arbitrary criterion for significance was observed in all three journals.

The pattern of results could be indicative of dubious research practices, in which researchers nudge their results towards significance, for example by excluding troublesome outliers or adding new participants. Or it could reflect a selective publication bias in the discipline – an obsession with reporting results that have the magic stamp of statistical significance. Most likely it reflects a combination of both these influences.
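As a toy illustration of the pattern Masicampo and Lalande describe, one can bin a set of reported p values and compare the count just under .05 with a neighboring bin of equal width; a large excess in the first bin is the suspicious bump. The helper name and the sample data below are made up for illustration:

```python
def excess_below_threshold(p_values, threshold=0.05, width=0.005):
    """Count p values in [threshold - width, threshold) vs. the bin just below it."""
    just_below = sum(1 for p in p_values
                     if threshold - width <= p < threshold)
    neighbor = sum(1 for p in p_values
                   if threshold - 2 * width <= p < threshold - width)
    return just_below, neighbor

# Hypothetical reported p values from a batch of published studies.
reported = [0.048, 0.049, 0.047, 0.049, 0.041, 0.032, 0.012]
print(excess_below_threshold(reported))  # (4, 1)
```

Under honest reporting there is no reason for the [.045, .05) bin to be so much fuller than its neighbor; that asymmetry is what the authors found across all three journals.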

“[T]he field may benefit from practices aimed at counteracting the single-minded drive toward achieving statistical significance,” say Masicampo and Lalande.