Where comparisons exist, meta-analysis by straightforward reduction of poll data outperformed econometric predictive models, poll aggregators (electoral-vote.com, the 3BlueDudes average, and FiveThirtyEight), and the InTrade electronic market. Overall, the results support the conclusion that the collective “market” of pollster methods contains sufficient information to converge extremely well on final outcomes. Adjustments based on undecided-voter assignment, voter demographics, and pollster reliability have the effect of reducing accuracy.

I’m about to fade back into academic mode, which means reduced posting. I hope to write up this whole project for a regular peer-reviewed publication. In the meantime, for one last bit of bloggy goodness, including some pretty, um, robust comments, let’s look at the mailbag.

Overall, the results support the conclusion that the collective “market” of pollster methods contains sufficient information to converge extremely well on final outcomes. Adjustments based on undecided-voter assignment, voter demographics, and pollster reliability have the effect of reducing accuracy.

I’d be willing to go along with that if you replaced ‘have the effect of reducing accuracy’ with ‘haven’t been shown to improve accuracy’.

My bigger problem is that the measurement you use (predictions on the day before the election) has little value to me. I’m hard-pressed to see what value it has to anyone, quite honestly. Did anybody ever think that you couldn’t predict the outcome of a presidential election the day before it takes place?

Congratulations, Sam. You’ve clearly demonstrated the accuracy of the meta-analytical methodology (given that the aggregated polling data are accurate). If I may say so, the Monte Carlo simulations that sites such as Fivethirtyeight.com employ provide comparable accuracy. As I see it, the meta-analysis has its clear advantage in the efficient, closed-form algorithm that it employs to calculate the probability distributions.
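The closed-form algorithm this comment alludes to can be sketched in a few lines (a minimal illustration with made-up numbers, not the site’s actual code): each state contributes a two-point distribution over its electoral votes, and convolving across states yields the exact probability distribution of total EV, with no Monte Carlo sampling. The median and mean can then be read directly off the distribution.

```python
import numpy as np

def ev_distribution(win_probs, ev_counts):
    """Exact probability distribution over total electoral votes.

    Each state contributes a two-point distribution (0 EV with
    probability 1-p, its full EV count with probability p); convolving
    across all states gives the complete distribution in one pass.
    """
    dist = np.array([1.0])  # before any state, P(total EV = 0) = 1
    for p, ev in zip(win_probs, ev_counts):
        state = np.zeros(ev + 1)
        state[0] = 1.0 - p   # candidate loses the state
        state[ev] = p        # candidate wins all of its EVs
        dist = np.convolve(dist, state)
    return dist

# Toy example: three hypothetical "states" worth 10, 20, and 30 EV
dist = ev_distribution([0.9, 0.5, 0.6], [10, 20, 30])
median_ev = int(np.searchsorted(np.cumsum(dist), 0.5))
mean_ev = float(np.dot(np.arange(dist.size), dist))
```

With 51 contests this runs in a fraction of a millisecond, which is what makes a continuously updated tracking index practical.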

(2) I think you’re fudging just a bit by saying that 364 was THE prediction. That was your “personal prediction” out of the few most likely ones, but I don’t think it was either the median or the mode of the final distribution … was it? In fact, it was based on an additional presumption about cell phones — a manual adjustment to the polls — which sort of undercuts the simplicity argument. A more accurate assessment would be to say that the actual result was one of the few your method identified as the likeliest outcomes. Your report doesn’t need any questionable finagling to be vindicated. Correct me if I’m too harsh….

Ockham – If it’s a prediction you want, you should not be watching polls at all, since no poll aggregator makes a prediction. I’ve written about categories of modeling here. For example, FiveThirtyEight did not generate a true prediction, though it was called that. The closest thing you will ever get to a prediction is the type of model constructed by political scientists, which works pretty well but not overwhelmingly so.

The value of the meta-analysis was to provide a tracking index that gave a low-noise measure of where the race really stood. This allowed you to see the history of the campaign unfolding in real time, as I argued in August.

I agree that the election-eve prediction is not terribly surprising, though it was interesting to see what would happen in IN, MO, NC, and ND. Also, it’s a natural thing for people to want from this site after reading it all season. Just imagine an alternate scenario in which I decided not to offer that information. Seems odd somehow.

Paul – It’s a fair point. However, take a look at arguments I presented here on the routes that could arrive at 364 EV.

The problem was that a single-day snapshot wasn’t the best prediction because of the rapid fluctuations that were occurring. The cell-phone adjustment stayed within the 68% CI, and I’m not sure it was a good idea. Although it was supported by the Pew Center data, it undercuts the polls-only argument.

I think the right approach in the future will be to reduce the fluctuations that occur in the last month by identifying the period during which no apparent change occurred, then integrate polls over that period. This was how I broke ties at a state-by-state level.
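Integrating polls over a stable period amounts to pooling them as if they were one large sample. A rough sketch of how that breaks a state-level near-tie (the poll numbers here are hypothetical, and the SEM formula assumes a simple random sample with support near 50-50):

```python
import math

def pooled_margin(polls):
    """Combine several (margin_pct, sample_size) polls taken during a
    stable period into one estimate, weighting by sample size.

    Returns the pooled margin and its standard error, both in
    percentage points.
    """
    total_n = sum(n for _, n in polls)
    margin = sum(m * n for m, n in polls) / total_n
    # SEM of a two-candidate margin near 50-50: 2*sqrt(0.25/n),
    # expressed in percentage points.
    sem = 100.0 / math.sqrt(total_n)
    return margin, sem

# Hypothetical near-tied state: three polls within the stable window
polls = [(0.5, 600), (-0.3, 900), (1.1, 500)]
margin, sem = pooled_margin(polls)
```

Even pooled, 2,000 respondents leave a standard error of roughly 2 points, which is why a fraction-of-a-point state lead cannot be called with confidence from polls alone.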

Given that there is not much difference between the final EV predictions offered here and by the better poll aggregators (and given that differences in the means or medians of the distribution only really matter when the election is really, really close) can you offer a comparative scorecard of the estimated standard deviations (or another preferred measure of dispersion)?

Differences in this measure are equally important, and perhaps adding demographic or other types of variables to the model has the potential to minimize uncertainty. Then again, maybe not.

Steve – Look at the state-by-state results. Any improvement would have to come from resolving the close states, Indiana, Missouri, and North Carolina, which were within a percentage point both in polls and in final outcome. Such small discrepancies were well within sampling error. Any demographic fiddling would have to reduce the SEM of a state’s estimate to 0.5% to be of use. I don’t think this is likely.
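To see why a 0.5% SEM is such a demanding target, here is the back-of-the-envelope arithmetic (the 800-respondent poll size is an assumed typical value, and the formula assumes a simple random sample at roughly 50-50 support):

```python
import math

def margin_sem(n_respondents, p=0.5):
    """Standard error of the two-candidate margin (as a fraction)
    from a simple random sample of n respondents."""
    return 2.0 * math.sqrt(p * (1.0 - p) / n_respondents)

# One typical state poll of ~800 respondents:
single = margin_sem(800)             # roughly 3.5 percentage points

# Respondents needed to drive the margin's SEM down to 0.5%:
target = 0.005
needed = (2.0 * 0.5 / target) ** 2   # solve 1/sqrt(n) = target for n
pooled = margin_sem(int(needed))
```

Hitting 0.5% would require the equivalent of about 40,000 respondents per state, far beyond what demographic reweighting of a handful of polls can deliver.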

However, on the accompanying thread to this post it was pointed out that proxy variables can be used to fill in where data are sparse. Evidently the Democratic primaries presented such a case.

In regard to the meta-analysis being of use mainly when the race is quite close, that’s out of the control of the analyst. Unfortunately, I made a misstep during the 2004 campaign that partially obscured the value of this approach (though I did make the pure-poll result, which turned out to be right, available). Reasons to care about what I did this year would be (a) if you had an interest in the exact EV outcome in advance, say for oddsmaking; (b) if you had an interest in accurate tracking of the swings of the campaign; and (c) if you wanted to know how valuable someone’s vote was on a state-by-state basis.

Are polls in Nebraska and Maine purely state-wide, or are they split by Congressional district? Because it seems like if only state-wide polls are available, then some demographic adjustments could prove useful in attempting to determine the probability of a split vote.

When I look at your plot of median EV, the final prediction looks to me more like 352 for Obama. Is the difference between the expected value and the median that great, and if it is, why did you plot the median? (The expected value should give you the lowest variance). It looks a bit like you are choosing your statistic after the fact.

Sam — I have been a loyal reader for months. I like your site, and I think your system produced good results.

But you are not being honest in presenting your results.

Over the last two weeks, this has drifted from an understandable desire to put your “unofficial”, yet closer-to-correct number in boldface to what looks like outright deceptiveness. This is unfortunate, because it casts doubt on everything else on the site — it raises the possibility that you are retro-fitting your stats and predictions in ways that are less obvious. It also detracts from your real accomplishment, since by pretending your prediction was 364 EVs you lose the opportunity to discuss the error bands around your real, official, meta-analysis-generated prediction of 352, and similar real analysis of the results of the real meta-analysis.

In your post titled “Final Predictions for 2008”, you made the following prediction:

So now you claim “Final polls” as 364. But your real “final polling snapshot” (that’s what you called it in your pre-election prediction post) was 352, and the 364 was in fact *not* the number dictated by your final polling analysis — it was adjusted by your gut guess at a 1% number. This is just dishonest.

Even worse, with respect to the 2004 results, you have now reversed the adjustment you in fact made at the time, and are presenting Bush 286 as your site’s actual prediction — which is false. So in each case, you are cherry picking the number that was *not* your actual prediction and claiming it was.

You could have legitimately compared 286 and 352, and said: “here are the actual numbers the statistics produced, without my thumb on the scale. In 2004 I did, in fact, put my thumb on the scale — and was wrong — and in 2008 I did not, in fact, put my thumb on the scale — and was wrong again. But to see whether and how well the meta-analysis itself works, I am now presenting these both without the adjustments I did (in 2004) and did not (in 2008) add.”

You sort of acknowledged this discrepancy in some older comment threads pointing out your inconsistency, but since then it has only gotten worse.

PS, I also wonder — while you now use your “unofficial” prediction as your prediction for EVs, with the 1% adjustment, do you also put that 1% adjustment into your claimed popular vote prediction? If not, how do you justify using your “unofficial” prediction for EVs, but your “official” for popular vote?

AAF – These are legitimate points. There is a clear need for me to learn from this process, and to identify the proper way to think about the data. Here are my current views.

1) The median EV estimator fluctuated much more in October than in previous months, presumably because of the increased frequency of polls. The obvious solution is to find an appropriate time window for calculating the median.

Obviously, for such a method to be useful, it needs to be applied with total consistency to all years: 2004, 2006, 2008, and 2010/2012.

2) The cell-phone correction was driven by the Pew Center data. I thought it was of only moderate importance because it kept the estimator within the 68% CI. Now that we know that the cell phone effect was statistically indistinguishable from zero, the better lesson is: don’t add adjustments.

I think a more fundamental driver is what I saw in the data for much of October. The median spent most of the time at 352, 364, or 367 EV. This biased me toward thinking 364 was the right prediction to make. So reason 1) was right, but I was unable to articulate it. Instead I gave the cell-phone effect as the reason. (Frankly, I am still amazed that the effect might be zero.)

If I had it to do all over again, I would take the more parsimonious approach of identifying the longest time window over which results appear to be stable, then using that period to calculate a final, low-noise snapshot. National polling margins were unmoving at Obama +7%, also supporting the use of a longer time window.
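The window-finding step above can be sketched as a simple backward scan from election day (a minimal sketch with a hypothetical noise tolerance and made-up daily values, not a calibrated change-detection procedure):

```python
import statistics

def stable_window(daily_medians, tolerance=15):
    """Find the longest trailing run of daily median-EV estimates that
    stays within `tolerance` EV of the final day's value, then return
    the pooled median over that run as the low-noise snapshot.

    `tolerance` is a hypothetical noise band chosen for illustration.
    """
    final = daily_medians[-1]
    window = []
    for value in reversed(daily_medians):
        if abs(value - final) > tolerance:
            break  # apparent real movement; the stable period ends here
        window.append(value)
    return statistics.median(window), len(window)

# Hypothetical late-October series: noisy early, then settling
series = [338, 345, 375, 352, 364, 352, 367, 364, 352]
snapshot, days = stable_window(series)
```

A fully automated version would replace the fixed tolerance with a statistical test for change, but the principle is the same: integrate over the longest period during which no apparent movement occurred.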

In 2010/2012 the challenge will be to fully automate the process, including the time windows. The final step would be to tie me up somewhere so I can’t go dumping extra spices into the soup.