It started not long after the sunrise last Wednesday morning. One reporter after another wanted to know: Which poll or pollster was most accurate? Which was worst? Votes were still being counted (they still are in some places), results still unofficial, and yet the rush to crown a pollster champion (and goat) was already in full swing.

I am going to write several posts on pollster accuracy -- this is just the first -- but I want to try to emphasize some common themes: First, leaping to conclusions about "accuracy" without considering random sampling is almost always misleading. Second, most of the pollsters came reasonably close to the final result in most places, so they tend to be bunched up in accuracy ratings and, as such, small differences in the way we choose to measure accuracy can produce different rankings. Third, I want to raise some questions about the polling industry's focus on the "last poll" as the ultimate measure of accuracy.

For today, let's start with something simple: It is foolish to focus on a single poll that "nailed" the result, given the random variation that is an inherent part of polling. Because most surveys involve random sampling (even internet panel surveys randomly sample from their pool of volunteers), they come with a degree of random variability built in, something we know as the "margin of error." If we assume that the final poll's "snapshot" of voter preferences comes close enough to the election to predict the outcome, then the best we should expect a poll to do is capture the actual result within its margin of error (although even then with the caveat that the margin of error is usually based on a 95% level of statistical confidence, so 1 poll in 20 will likely produce a result outside that error margin by chance alone). So, if all polls are as accurate as they can be, the difference between "nailing" the result and being a few points off is a matter of random chance -- or luck.
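To make the "margin of error" concrete, here is a minimal sketch of the textbook formula for a single proportion at the 95% confidence level. The sample size and percentage below are illustrative, not taken from any particular poll:

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error (in percentage points) for a single
    proportion p estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n) * 100

# A candidate polling at 50% in a hypothetical 1,000-person sample:
print(round(margin_of_error(0.50, 1000), 1))  # ≈ 3.1 points
```

Note that the error shrinks only with the square root of the sample size: quadrupling the interviews merely halves the margin of error.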

If we are going to try to compare pollsters, the wisest thing to do is to measure accuracy across as many polls as possible, because the role of random chance will gradually diminish as the number of polls examined increases.
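The reason averaging helps is simple square-root math: the random error of an average of k independent polls shrinks by a factor of the square root of k. A quick sketch, assuming (purely for illustration) identical 1,000-person samples and a true 51.4% share:

```python
import math

# Standard error of a single poll's estimate of a 51.4% share (n = 1,000):
p, n = 0.514, 1000
se_one = math.sqrt(p * (1 - p) / n) * 100   # in percentage points

# Averaging k independent polls shrinks the random error by sqrt(k):
for k in (1, 4, 9, 16):
    print(k, round(se_one / math.sqrt(k), 2))
```

With sixteen polls in the average, the random component of the error is a quarter of what any single poll carries.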

Unfortunately, that observation is not stopping a lot of reporters and observers from scanning the final national polls and trying to identify winners and losers. So before moving on to more elaborate aggregations, let's look at the list of final national polls conducted by 19 different organizations over the final week of the campaign. Looking first at the final survey results (as opposed to "projections" that allocated the undecided), we see that all of the polls had Obama leading by margins of 5 to 11 percentage points. A straight average of these surveys shows Obama leading by 7.6% (51.4% to 43.8%).

How did these polls compare to the actual results? First, let's keep in mind that provisional and late-arriving mail-in ballots are still being counted in some places (and may not be reflected in the "99% of precincts counted" statistics typically provided by the Associated Press). The most current and complete national count I can find now shows Obama with a 6.6% lead in the national popular vote (52.7% to 46.1%). Obama's margin has increased by about half a percentage point over the last week and (if the pattern in 2004 is a guide) may increase slightly more as secretaries of state release their final certified results.

Given that margin, however, just about every national poll can claim to have gotten the result "right" in some respect. Most captured either the individual candidate results or the margin within their reported margin of error (keeping in mind that the margin of error on the margin between two candidates is a little less than double the reported margin of error for each poll). Many that reported more in the undecided category, thus coming in low on individual candidate percentages, offered "projections" that allocated undecided. And remember, the 95% confidence level tells us that one of these polls should have fallen outside of the margin of error by chance alone.
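That point about the margin of error on the margin is worth making concrete. When both candidates' shares come from the same sample, the variance of the difference works out to (p1 + p2 - (p1 - p2)^2)/n, which is a bit less than double the single-candidate margin of error. A sketch with illustrative numbers (not any specific poll's):

```python
import math

def moe_margin(p1, p2, n, z=1.96):
    """95% margin of error (points) on the lead p1 - p2 when both
    shares come from the same multinomial sample of size n."""
    var = (p1 + p2 - (p1 - p2) ** 2) / n
    return z * math.sqrt(var) * 100

def moe_single(p, n, z=1.96):
    """95% margin of error (points) for a single proportion."""
    return z * math.sqrt(p * (1 - p) / n) * 100

# Hypothetical poll: Obama 51%, McCain 44%, n = 1,000:
print(round(moe_margin(0.51, 0.44, 1000), 1))   # MoE on the lead itself
print(round(2 * moe_single(0.51, 1000), 1))     # the naive "double it" rule
```

Under these numbers the margin-of-the-margin is about 6.0 points, versus 6.2 from simply doubling, confirming "a little less than double."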

Of course, if we are hell bent on crowning a champion, we still need to decide which accuracy measurement is best (do we compare the margins, or how close each poll came to predicting the percentage for one or both of the candidates?), and in some cases we would need to decide whether to focus on the survey results or the pollster's projection. For Battleground/GWU, for example, we have three sets of numbers: a final poll showing Obama with a 5-point lead and two projections (one each from the Democratic and Republican pollsters involved) showing Obama with leads of 5 and 2 points, respectively.

I am not devoting much effort here to calculating or charting the accuracy of the individual polls because, again, random chance is such a big player in determining where each pollster ranks. I am working on another post to follow soon, hopefully tomorrow, that will look at how pollsters did in statewide contests, where we can aggregate accuracy calculations across multiple polls.

But before moving on from the national polls, let's look at this issue another way. What if we back up and look at the "snapshot" of polls as of Friday, October 31? After all, we have considerable evidence that virtually all minds were made up by the final week of the campaign. According to the national exit poll, only 7% of voters say they made their decision in the final three days (10% over the course of the final week). Although McCain did slightly better -- running roughly even with Obama -- among the late deciders, my colleague David Moore points out that those final decisions would have had little or no impact on the margins separating the candidates over the final week.

The overall performance is about the same. The average of the results of the polls in this table, all of which concluded between October 26 and October 31, shows an average Obama lead of 7.1 points -- just slightly narrower than the 7.6-point margin on the final round of national polling. What is different, however, is the spread of results. Where the Obama margins on the final polls varied from 5 to 11 points, just three days earlier the spread was from 3 to 15. The standard deviation (a measurement of the spread of results) was 1.8 on the Obama margin on the final polls, but 3.2 on the polls just a few days earlier.
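For readers who want to replicate that spread calculation, here is how the standard deviation comparison works in Python. The margins below are made up for illustration only; they are not the actual values from either table:

```python
import statistics

# Hypothetical Obama-lead margins (in points), illustrative only:
final_week = [5, 6, 7, 7, 7, 8, 8, 9, 11]
week_before = [3, 5, 5, 6, 7, 8, 9, 11, 15]

# The sample standard deviation captures how spread out each set is:
for label, margins in [("final", final_week), ("earlier", week_before)]:
    print(label, round(statistics.stdev(margins), 1))
```

Two sets of polls can share nearly the same average lead while differing sharply in how tightly the individual results cluster, which is exactly the pattern in the tables above.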

I do not want to use this table to beat up on any individual pollster, especially since my October 31 cut-off is arbitrary and the field dates vary considerably (the Pew survey, for example, started and ended earlier than most of the others). A slightly different cut-off date would have produced a different picture. Obama's 5-point margin on the IBD/TIPP 10/27-31 survey, for example, shrank to just 2 points the next day and then expanded back to 8 points on their final release.

We should remember that pollsters hold the details of their "likely voter models" close, a habit that allows many to tinker with their selection and weighting procedures on their last poll. Gallup -- among the most transparent of pollsters in terms of describing their likely voter model -- disclosed a small adjustment in their model made just days before the election (although Gallup's Jeff Jones explained via email that the change did not explain Obama's growing margin over the last few days of their survey).

All of this brings me to the question we ought to keep front and center as we think about the accuracy of state level polls, where we are in a better position to quantify final poll accuracy. How many pollsters were tinkering or adjusting their models on that "last poll" with an eye toward the "final exam" coming on Election Day? And if the final poll results tended to converge around the average on the last round of polls, how much of that convergence was real and how much the result of last minute tinkering with LV models and weighting? And what does all of this say about focusing solely on "the last poll" as a way to rate pollster accuracy? After all, just 19 of the 543 polls displayed on our national poll table were the "last poll." Which surveys had the biggest impact on campaign coverage?

Interesting subject... hard to see how IBD, Battleground, and Zogby were doing anything other than messing around with the numbers for political reasons in the weeks before the election -- ditto for CBS and some of the other outliers... the fact that they closed accurately only highlights the shenanigans.

I think there is a real danger in using the Nov. 6 results to evaluate the efficacy of polling. A true report card of this year's pollster performances should include all of 2008 and not just the November Presidential election. In a two-party presidential race, party identification gives pollsters a huge advantage. Perhaps it is coincidence, but someone using party ID (an 8-point spread according to Pew) to predict the race would have been as close as many of the polls. In the primary races, with no party ID to latch onto, the polls performed abysmally. They were never able to predict turnout, likely voters or late deciders. Early in the primary season it seemed clear that demographics were more predictive than polls. It would be a shame if the Nov. 6 results are seen as vindication for a polling methodology that appears fundamentally flawed. In my opinion, claiming credit for predicting the Presidential race is like bragging that you can spell CAT when you are spotted the C and the T. A more apt test in 4 years will be whether they can spell New Hampshire.

The standard deviation of the last survey results is 1.8. As you mention, the sampling MoE of the margin between the candidates should be almost double the published MoE, so the listed MoEs are just slightly higher than the actual standard deviations in the D-R margin. Those values vary between 2 and 4. Even if all the voter models were the same, and so all the polls were drawing from distributions with the same mean, there is no way you should produce a total s.d. of 1.8 (absent getting lucky, of course). Allowing for different likely voter models, and hence different means, would only increase the s.d. of the sample.

I'm not a statistician, but I'm sure you could work out a probability of such a small s.d., assuming the means are all the same. I would be surprised if it wasn't tiny.

This would of course not be evidence against any particular pollster, but rather evidence that, in general, fingers were on the scale, and possibly quite a few.
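One quick way to put a number on that probability, short of formal chi-square tables, is a Monte Carlo sketch. The assumptions here are illustrative, not established facts: a common true margin for all 19 polls, and a sampling standard deviation of 3 points on the margin (the middle of the 2-to-4 range mentioned above):

```python
import random
import statistics

random.seed(42)

K = 19          # number of final national polls
SIGMA = 3.0     # assumed sampling s.d. of the margin, in points
OBSERVED = 1.8  # observed s.d. across the final polls

# If all 19 polls drew from the same mean with s.d. 3, how often
# would the spread of their margins be as small as 1.8?
trials = 20_000
hits = sum(
    statistics.stdev([random.gauss(0, SIGMA) for _ in range(K)]) <= OBSERVED
    for _ in range(trials)
)
print(hits / trials)
```

Under these assumptions the simulated probability comes out well under 2 percent, which squares with the intuition that such tight clustering is unlikely by sampling chance alone.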

Personally, I'd like to see what would happen if you took all the polls by a given pollster and ran them through Pollster's regression methodology to come up with an adjusted final prediction. And yes, I know I could do this myself with the interactive chart, but I am too lazy.

Unlike the National polls, there are clear winners and losers in the state polls.

For state polls, we typically see a larger error in the non-contested states, and less error in the close states. So (with only a little of the cart-before-the-horse), I have rated the pollsters based on performance in the 21 states where the margin was

...
CLEAR WINNERS
...

SurveyUSA
ARG
PPP

Here's how they did, with + being bias for Obama and - being bias for McCain:

1. The margin of error for a difference of proportions is NOT double the margin of error for a single proportion, given that other choices (aside from McCain and Obama) were available. The correct MOE is a bit less. This is a minor point, but it has been bothering me for a long time, as no one (until now) seemed to mention it.

Question: Why don't pollsters report the MOE for the difference in proportions, as that is the main number of interest?

2. Pollsters like IBD/TIPP and Zogby are clearly "fudging" to do well on the "final exam." When pollsters are evaluated, the transparency of their methods should be examined. Rasmussen and Gallup do well on this point.

3. Multiple observations are needed. Looking at different state polls is good, but accuracy can also be inferred from the time series leading up to the final result. Pollsters that were all over the map relative to the Pollster or 538 aggregate, like Zogby and IBD/TIPP, are clearly wrong much of the time, and if they do well on the final it is a combination of fudging and luck.

Question: Why not combine information from different polls that provide clear internals regarding party ID and important demographics and use "meta-analysis"? The Pollster national trend and the 538 tracker must be doing something like this implicitly. Why not do it explicitly, which would allow for MOEs to be calculated?
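A minimal version of that meta-analysis idea is an inverse-variance weighted average (the standard fixed-effects approach): weight each poll by the reciprocal of its sampling variance, and the pooled estimate comes with its own margin of error. A sketch with hypothetical polls -- this ignores house effects and design effects, which a real analysis would need to model:

```python
import math

def combine(polls):
    """Inverse-variance weighted (fixed-effects) pooling of poll shares.
    polls: list of (share, sample_size) pairs.
    Returns the pooled share and its 95% MoE, both in percent."""
    weights, weighted = [], []
    for p, n in polls:
        w = 1 / (p * (1 - p) / n)   # inverse of the sampling variance
        weights.append(w)
        weighted.append(w * p)
    pooled = sum(weighted) / sum(weights)
    moe = 1.96 * math.sqrt(1 / sum(weights))
    return pooled * 100, moe * 100

# Hypothetical final-week polls (share, n) -- illustrative numbers only:
share, moe = combine([(0.51, 1000), (0.52, 800), (0.50, 1200)])
print(round(share, 1), round(moe, 1))
```

Notice that the pooled margin of error is smaller than any single poll could deliver, which is exactly the payoff of explicit aggregation.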

In any case, it is great to see someone with knowledge and intelligence doing the evaluations.
