Sunday, December 04, 2011

Polling sins: Don't do this either

Second in an occasional series

OK, Santa isn't exactly a polling issue. He's there to remind you that Fox® brand news is always going to be a little different from grownup news. "Gingrich surges to the lead in Iowa GOP poll," on the other hand, is worth noting because NPR made the same bizarre misrepresentation of some pretty straightforward data this morning. (I heard it around 8:20, shortly after opening the Fox homepage.)

The offense here isn't using notionally "objective" data to cloak a partisan agenda, though that happens. It's pretending that a poll supports a storyline it doesn't support, and that occurs across the news spectrum. It's part of the broader journalistic instinct to make things sound more exciting, or authoritative, or novel, than they are. The editor's job is to remind writers that there is no Santa Claus. Polls say only what they say, and that's rarely as interesting as you think it is. Let's proceed.

Here's the AP lede Fox used:

Newt Gingrich has taken the lead in a poll of Republican voters in Iowa, followed by Ron Paul and Mitt Romney.

And, under the hed "Gingrich surges in new Iowa presidential poll," a Reuters lede from Northern Illinois public radio:

A surge in support for Republican White House hopeful Newt Gingrich has made him the new front-runner in Iowa, which holds the first of next year's presidential nominating contests, according to a closely watched opinion poll published on Saturday.

The poll in question is the Des Moines Register's, and it covered 401 Iowans described as "likely Republican caucusgoers," surveyed Nov. 27-30. It shows Gingrich at 25%, followed by Ron Paul at 18% and Mitt Romney at 16%. All those things are true and worth reporting. Our concern is with the extra baggage -- whether this is a "surge," whether Gingrich has "taken the lead" or "stepped into the breach," whether Romney had an "armor-plated" average to dent, and so forth. For that, we need to look at this poll in the context of other polls.

The Real Clear Politics poll collection is a good place to start. Fair warning: Never, ever use the "Real Clear Politics average." It's a meaningless number; you can't take a simple average of surveys with different sample sizes, and you can't plausibly average samples from different populations ("all adults" vs. "likely voters," for example) at all. It doesn't matter what the Washington Post and the New York Times think. Do not use this number. Feel free, though, to ask what a succession of polls suggests about the broad outlines of public opinion. (Drawing conclusions from snapshots isn't the same thing as pretending random snapshots form a movie.) Let's look at a few plausible ways to do that.
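Why a simple average misleads, in miniature: the sketch below compares an unweighted mean of poll shares with a pooled estimate that weights each poll by how many people it talked to. All three polls are invented for illustration, and the pooling is only defensible under a loud assumption: every poll sampled the same population over the same period. Nothing here fixes the different-populations problem, which no arithmetic can fix.

```python
# A minimal sketch: pooling polls that sampled the SAME population.
# All three polls are hypothetical; the numbers are invented for illustration.

def simple_average(polls):
    """Unweighted mean of the poll shares, ignoring sample size."""
    return sum(share for share, _ in polls) / len(polls)

def pooled_share(polls):
    """Weight each poll by its sample size: total supporters / total respondents.

    Only defensible if every poll drew from the same population
    (say, 'likely Republican caucusgoers') over the same period.
    """
    supporters = sum(share * n for share, n in polls)
    respondents = sum(n for _, n in polls)
    return supporters / respondents

# (share, sample size) for three hypothetical polls of one population
polls = [(0.30, 1000), (0.20, 200), (0.22, 300)]

print(f"simple average:  {simple_average(polls):.1%}")
print(f"pooled estimate: {pooled_share(polls):.1%}")
```

With these made-up numbers the simple average comes out at 24% and the pooled estimate at about 27%, because the 1,000-person poll counts for more than the two small ones put together.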

It helps to keep in mind a couple of things about "poll says" and what it represents. We're interested in two figures: the population value and the sample value. For any group we're interested in -- say, "likely Republican voters" -- there's a number that represents exactly what every member of that group would say if we polled them today.* Because finding that population value would cost far more time and money than it's worth, we take a random sample and use the sample value to estimate the population value.
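To make the two figures concrete, here's a toy simulation. It assumes a hypothetical population value of 25% (in real polling that's exactly the number nobody knows) and a population large enough that each respondent is an independent coin flip. One random "poll" of 401 people then produces a sample value that lands near, but usually not exactly on, the population value.

```python
import random

def draw_sample(population_value, n, rng):
    """Poll n randomly chosen people from a very large population in which
    a fraction `population_value` backs the candidate; return the sample share."""
    supporters = sum(1 for _ in range(n) if rng.random() < population_value)
    return supporters / n

rng = random.Random(2011)  # fixed seed so reruns give the same "poll"
population_value = 0.25    # hypothetical; in real polling this is the unknown
sample_value = draw_sample(population_value, 401, rng)

print(f"population value {population_value:.0%}, sample value {sample_value:.1%}")
```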

The most important thing about sampling is that, when it's done correctly, sample values form a roughly normal distribution -- a bell curve, if you want -- around the population value. If you take 20 random samples, roughly half of them will land above the population value and half below. The more people you talk to in any sample, the closer that sample value is likely to be to the population value. Sample size is the biggest factor in determining the "margin of sampling error," about which more in a moment.
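A quick way to check those claims is to run lots of imaginary polls. The sketch below assumes a hypothetical population value of 25%, simulates 1,000 polls at each of three sample sizes, and reports how many land above and below the true value, how widely they scatter (standard deviation), and the usual 95% margin of sampling error, 1.96 × sqrt(p(1 − p)/n). Because a poll can land exactly on 25%, "above" and "below" can add up to a bit less than 1.

```python
import math
import random
import statistics

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of sampling error for share p with n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

def simulate_polls(population_value, n, trials, rng):
    """Run `trials` independent simulated polls of n people; return the sample shares."""
    return [sum(rng.random() < population_value for _ in range(n)) / n
            for _ in range(trials)]

rng = random.Random(42)
population_value = 0.25  # hypothetical "true" share
summary = {}
for n in (100, 401, 1600):
    shares = simulate_polls(population_value, n, 1000, rng)
    summary[n] = {
        "above": sum(s > population_value for s in shares) / len(shares),
        "below": sum(s < population_value for s in shares) / len(shares),
        "sd": statistics.stdev(shares),              # observed scatter of the polls
        "moe": margin_of_error(population_value, n),  # textbook 95% margin
    }
    print(n, {k: round(v, 3) for k, v in summary[n].items()})
```

The scatter shrinks as n grows, and quadrupling the sample roughly halves it. For a sample of 401, the formula gives a margin of about ±4.9 points for a candidate near 50% and about ±4.2 points for one near 25%.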

In polling of likely Iowa voters since mid-November, we can note that Gingrich's support is consecutively 32%, 28%, 26% and 25%. Does that mean his support is surging, declining, or forming a normal curve around some yet-undetermined point? We don't know, but some guesses are better than others. He's more likely to be going down than going up; if the Register's poll had sampled as many people as the Rasmussen poll of Nov. 15, we wouldn't be off base to say the decline was statistically significant at 95% confidence. But it's also possible that his support has settled in the upper 20s and this series of polls is doing exactly what a normal distribution would do.
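Whether a drop from 32% to 25% is "statistically significant" depends on how many people each poll talked to. One standard tool, swapped in here as an illustration rather than as the method behind the judgment above, is a two-proportion z-test: if |z| exceeds 1.96, the change clears 95% confidence. The sample sizes in the loop are hypothetical, chosen only to show how the verdict moves with n. Checking whether the two polls' margins of error overlap is a cruder rule of thumb, and it is more conservative than this test.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """z statistic for the change between two independent polls' shares,
    using the pooled standard error (null hypothesis: no real change)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def significant_at_95(z):
    """Two-sided test at 95% confidence."""
    return abs(z) > 1.96

# Hypothetical: suppose both polls had n respondents each.
for n in (200, 401, 800):
    z = two_proportion_z(0.32, n, 0.25, n)
    print(f"n={n}: z={z:.2f}, significant at 95%: {significant_at_95(z)}")
```

The point of the loop is the same as the point of the paragraph: an identical seven-point drop can be noise at one sample size and a real decline at another.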

We can be pretty confident, though, in saying this poll doesn't find a "surge" that has propelled Gingrich to the top; he hasn't "taken the lead" in this poll, and he isn't the "new front-runner." A more sober lede might point out that whatever strange implosions have occurred aboard the Cain Train in the past two weeks, Gingrich's support is somewhere between unchanged and slightly lower.

And the other candidates? If Romney actually had an armor-plated average, it was probably dented in mid- to late October. Since then, he's been pretty much where you'd expect a candidate in the upper teens to be. Ron Paul? For a candidate who's been somewhere between 10% and 20% since summer, his support is either (a) fluctuating wildly day by day or (b) somewhere in the middle teens -- exactly what a bunch of frequent small-N polls seem to be showing.
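Option (b) is easy to picture. The sketch assumes, hypothetically, that a candidate's support sits perfectly still at 15% and that 20 pollsters each talk to 250 people. Nothing moves in the electorate, yet the polls still spread out.

```python
import random
import statistics

def simulate_small_polls(population_value, n, count, rng):
    """Simulate `count` independent polls of n respondents each."""
    return [sum(rng.random() < population_value for _ in range(n)) / n
            for _ in range(count)]

rng = random.Random(7)
# Hypothetical: support holds steady at 15%; 20 polls of 250 people each.
results = simulate_small_polls(0.15, 250, 20, rng)

print("polls:", " ".join(f"{r:.0%}" for r in results))
print(f"low {min(results):.0%}, high {max(results):.0%}, "
      f"mean {statistics.mean(results):.1%}")
```

At n = 250 and 15% support, the standard error is about 2.3 points, so a run of small polls bouncing between roughly 10% and 20% is what steady support looks like, not a candidate on a roller coaster.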

Lessons? Iowa is neither America nor New Hampshire (where Republicans seem to be paying consistently more attention to Jon Huntsman, notwithstanding the brief Cain surge; on RCP's crosstab, it's kind of fun to watch the infatuation with the loonies move from right to left). But more broadly, public opinion rarely changes fast enough to be as interesting as political writers want it to be. That's a problem, because competent polling is expensive, and no one can expect to please the bean-counters with a lede that says "Things aren't especially interesting and haven't changed very much."

Still, that's the job. When upper management comes along to ask why the headline doesn't say "SURGES INTO LEAD" or "CLAWS BACK INTO FIGHT," our task is to say "Because that's not what the data say."

* If at this point you're noting that we're still 11 months from the election, you may now have a beer on the house. Any proclamations about what will happen next spring, summer or fall should be treated as the guesses they are.

@Anon 1: That would be true if the reporters/stories were discussing the Iowa polling data in the context of the Iowa caucuses. But since they're all talking breathlessly about the national mood and the national numbers, they're not. They're talking about next November.