May 22, 2008

How to predict the general election

It's really too bad that the folks behind Five Thirty Eight.com have gone and created such a compelling website based around state-by-state general election polling. It's all really well done and, as such, I can't really bring myself to look away. But this stuff is all really and truly meaningless. Six months ago, no polling showed Barack Obama winning the Democratic race, and no polling showed John McCain winning the Republican race and the general election is about six months away.

Yglesias's comparison in that last sentence isn't valid, however. Presidential primaries are inherently unpredictable for reasons including the lack of clear ideological differences between candidates and the greater importance of perceived viability. General elections, by contrast, can be forecast with a high degree of accuracy.

That doesn't mean that state-by-state polling is the right way to predict outcomes -- previous research has shown that macro-level variables like the state of the economy, job approval of the president, war deaths, and/or the length of the incumbent party's time in office explain most of the variance in the national two-party vote. Yglesias and others should focus on those predictors instead. But UW-Milwaukee's Tom Holbrook did find that spring 2004 polls were reasonably predictive of the eventual outcome. For instance, here's his plot of May polls against state popular vote totals in November:

And here are Holbrook's conclusions about the predictive validity of the data:

[W]hen the polling margin was fairly narrow the outcome was truly up in the air. In fact, across all four months [March-June] the poll result called the wrong winner in 17 of the 36 cases in which Kerry's share of the two-party vote in trial-heat polls was between 47% and 53% (this excludes two cases in which the poll result was tied). These results suggest that we should take the term "toss-up" very seriously. At the same time, the poll result was wrong in only 3 of the 44 cases in which Kerry's poll margin was outside this range.
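To put Holbrook's counts in rate terms, here is a back-of-the-envelope Python calculation that uses only the numbers quoted above. The variable and function names are my own shorthand, not Holbrook's.

```python
# Counts quoted from Holbrook's analysis of March-June 2004 state trial-heat polls.
toss_up_cases = 36   # Kerry's two-party poll share between 47% and 53% (ties excluded)
toss_up_misses = 17  # poll results in that band that called the wrong state winner
clear_cases = 44     # Kerry's two-party poll share outside the 47-53% band
clear_misses = 3     # poll results outside the band that called the wrong winner


def error_rate(misses: int, cases: int) -> float:
    """Share of poll results that called the wrong state winner."""
    return misses / cases


toss_up_rate = error_rate(toss_up_misses, toss_up_cases)
clear_rate = error_rate(clear_misses, clear_cases)

print(f"Inside 47-53% band:  {toss_up_rate:.1%} wrong")
print(f"Outside 47-53% band: {clear_rate:.1%} wrong")
```

By this arithmetic, polls in the toss-up band called the wrong winner about 47% of the time, barely better than a coin flip, while polls outside the band were wrong only about 7% of the time.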

As for me, I think Douglas Hibbs's forecast that the Democrats will get 53-54% of the two-party vote is a reasonable baseline, though I fear that an anti-Obama backlash will reduce that total by 2-3 percentage points. (The Intrade futures market puts the odds of a Democratic win in November at 62%.)

Comments

spring 2004 polls were reasonably predictive of the eventual outcome

I'm skeptical of that sort of analysis - especially for a single election. It's far too easy to find counterexamples like Michael Dukakis' double-digit lead in the polls over George H.W. Bush in the summer of 1988.

General elections, by contrast, can be forecast with a high degree of accuracy.

And yet, they haven't been. Unless I'm misreading your earlier post, the Hibbs model is constantly updated based on the results of the last election and then is validated by "predicting" the outcomes of past elections.

At what point did the Hibbs model (or any other model) begin correctly predicting future races?

(1) My criticism of a frequently updated model was about Ray Fair's model, not the Hibbs one.

(2) To be fair, the graphic (created by Lane Kenworthy) in my earlier post is just a bivariate plot of Hibbs's weighted per-capita income growth variable against the presidential vote, not the forecast of his model as such. It's true, though, that income growth didn't fit well in 1996 and 2000. Here's Kenworthy's discussion:

You can see in the first chart above that 1996 and 2000 are not predicted very well by income growth, and unlike 1952 and 1968 they cannot be explained by war casualties. The model predicts 2004 accurately, but perhaps that is a fluke, due to fear of terrorism and the incumbent president’s at-that-time seemingly successful prosecution of the Afghanistan and Iraq wars.

Why might the model no longer work well? One hypothesis is that as a society gets richer, pocketbook issues recede in importance for voters.

A second consideration is that growth of per capita personal income is no longer a useful indicator of how voters have fared economically. In recent decades the bulk of income gains have gone to a small slice of the population — those at the top of the distribution. Measuring growth via an average misses this.