Friday, March 30, 2012

Nate Silver posted a very long takedown of election predictions, some of which are by economists and some by political scientists. He proves, without a doubt, that some people, or at least some book publicists, have not been at all careful about their claims (I don't mean to be snide, there; being careful about one's claims is actually about the most important thing an academic can do when taking their work public). His conclusion:

The “fundamentals” models, in fact, have had almost no predictive power at all. Over this 16-year period, there has been no relationship between the vote they forecast for the incumbent candidate and how well he actually did — even though some of them claimed to explain as much as 90 percent of voting results.

It's an interesting post, and I certainly agree that calling people on their predictions is a useful thing to do. I started taking notes...and then John Sides beat me to the response. Which is just as well; John knows this part of the world quite a bit better than I do, anyway, and I agree with almost everything he said. So rather than composing a proper post, I'm just going to splatter my notes out here, starting with three general points, all of which John covered better than I do:

1. This can't be said enough times, and I wish Silver had said it up front. Election prediction models are a tiny sliver of what political scientists do. To the extent that they're looked on as anything but a parlor game, it's as a guide to explanation, not prediction.

2. And with that: it's not my field, and I don't keep up with it nearly as closely as I perhaps should, but really political scientists know quite a lot about voter behavior and elections. Almost all of that has nothing to do -- nor, really, should it -- with making the best election prediction models.

3. What's more, the near-consensus among political scientists is pretty simple: the economy plays a major role in elections, but campaign and candidate level effects can also be real.

4. On prediction systems. There are two reasons prediction systems can fail: because the thing they're trying to predict is not predictable, or because they're not very good models. The first would be true if, for example, candidate and campaign effects were very large -- but it also would be true if perfectly predictable effects of economic performance depended on data that were not available until after the event (or even before the event but after the prediction).

5. If the problem was that the models stink, then what we might find are some predictors that do much better than others, even if it just means they stink less.

6. That appears to be the case. Three of the predictors -- Abramowitz, Wlezien & Erikson, and Hibbs -- do quite well. Their average error (not RMSE, just the simple average of the errors Silver reports) for the two-party margin of difference is 3.3, 3.7, and 4.6, respectively. Out of fifteen predictions among them, only a couple are stinkers -- Hibbs on Gore/Bush missed by 8.7 points, and W & E miss that one by 9.5. And they get the winner right every time.

7. Then again, Hibbs is using two variables and no polling. With that, you can get an average miss of under five points? I'll take it!

8. Major warning: it's certainly possible that those three have just been lucky. However, what's reassuring is that they are consistently among the best. So that even in 2000, when none of them do particularly well, they rank 3rd, 4th, and 6th out of 9 predictions.

9. This makes me strongly suspect -- but doesn't prove -- that what we have are good and bad predictors, not an overall failure of prediction-systems-in-general, or something that is impossible to predict. Again, it doesn't prove it!

10. However, treating these three systems as similar to, say, the Lockerbie predictor -- which has missed by 19.3, 12.6, and 8.9 points in its three trials -- doesn't make a whole lot of sense to me. If someone publishes poorly constructed Senate election projections that perform poorly, does it make Nate Silver's Senate projections worse? Which suggests it's not good enough to just look at and average all predictions; you need to look at and critique which ones are well constructed and consistent with what we know generally about elections.

11. Not that I'm doing that in this post!

12. The one disagreement I have with John's response is that he gives the models a pass because they almost invariably have at least picked the right winner, and after all that's what we really care about. I'm not sure that's right. For one thing, it depends a lot on what the point is of doing the prediction. If it's to test what we think we know by projecting out into the future, then our demands are very different than if the reason is to satisfy our curiosity. Both of those are legitimate things to do, but they imply different standards to use in evaluating a predictor.

13. Silver makes much of the distinction between pure fundamentals predictors which include only non-campaign indicators, and those which incorporate polling information. That's reasonable, but I believe (and I haven't looked at all of them) that there's a wide range in how these predictors use polling. Generally, if someone uses horse race numbers from September (and I don't know that any of the models Silver uses do that) then it's not going to be nearly as useful as one that looks at presidential approval many months out.

14. Basically, what we want for an explanatory-type predictor, it seems to me, is something that excludes the influences of the campaign and of non-incumbent candidates. A pre-campaign presidential approval number essentially incorporates the effects of whatever events have happened before the campaign along with any residual popularity of the incumbent. I can see both advantages and disadvantages compared to either ad-hoc dummy variables (for, say, incumbent party while a city was flooded) or ignoring all the events that can't be systematically accounted for (unlike the economy or wars).

15. By the way: if the question is whether the models in general work, then I think Silver makes the wrong choice by including multiple versions by the same author(s). If the models were updates, then only the last one should count; if they were released together...I'd probably just dismiss those altogether. Silver is of course correct that anyone who releases multiple versions and then only touts the winners is misbehaving. But that doesn't really speak to the question about whether the models as a whole are capturing anything.

16. That said, as I eyeball it, I'm not seeing that it matters much; again, just from a very quick glance, it doesn't appear that the highest-number version does (much? any?) better.

17. Silver mentions the issue of revised data. This is, again, a fairly big deal, although how it affects any of these predictions I have no idea. My general memory of these things is that there were significant post-election revisions in three of the five cycles. In 1992 the revisions were improvements, which would have made Bush a more likely winner and, again just eyeballing, tending to hurt the models' performance. In 2000 the revisions were downward, which would have helped W. and made most of the models' performance better; in 2008, the revisions were again downward, which would have helped Obama, helping some predictors and hurting others. Caveat: that's again based on both my memory and on eyeballing -- and of course different predictors use different numbers, so the revisions could easily affect different models differently.

18. Of course, if the purpose is to successfully predict the future, then it's a fairly big deal if it turns out that the economic data just aren't good enough quickly enough to be able to do so. If it's explanatory, then it's no big deal at all to go back and plug in the right numbers after the fact.

19. I'd be interested to see if the separation between "good" and "bad" models I noted above holds up if final economic numbers are plugged in. Not enough interested to do the work myself.

20. Silver makes much of the spread between the models -- not only are their errors large, but they don't vary together. That's a good point -- but again, if some of these models are better than others, then it's not all that interesting to know that the "bad" predictors are all over the place. The three "good" models are relatively tightly bunched in all five cycles, with no more than a six point spread.

21. I think I'm repeating this for the third time, but it's important: I don't know that the best-performing predictors are actually good, or that the worst-performing are bad. Could be luck.

22. And last point. I can't find it, but if I recall correctly someone (Nyhan?[See update below]) showed that a weighted average of the predictors does an excellent job. Silver says the predictors are still mostly useless averaged; is that true with a weighted average, in which the "bad" models would count for much less than the "good" ones?
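To make points 6, 10, and 22 concrete, here is a rough sketch of the arithmetic. It uses only the error figures quoted above (the 3.3, 3.7, and 4.6 averages and Lockerbie's three misses). The inverse-error weighting is my assumption about one simple way to build a weighted average, not whatever method was actually used in the analysis I'm half-remembering, and the forecast margins at the bottom are made-up placeholders, not real predictions.

```python
import math


def mean_abs_error(misses):
    """Simple average of the absolute misses (the figure used in point 6)."""
    return sum(abs(m) for m in misses) / len(misses)


def rmse(misses):
    """Root mean squared error; punishes big misses more than the simple average."""
    return math.sqrt(sum(m * m for m in misses) / len(misses))


def inverse_error_weights(avg_errors):
    """Assumed scheme: weight each model by 1 / (its average past error)."""
    inverse = {name: 1.0 / err for name, err in avg_errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def weighted_forecast(forecasts, weights):
    """Weighted average of the models' forecast margins."""
    return sum(weights[name] * forecasts[name] for name in forecasts)


# Lockerbie's three misses, as quoted in point 10.
lockerbie_misses = [19.3, 12.6, 8.9]

# Average errors from point 6, plus Lockerbie's computed from the misses above.
avg_errors = {
    "Abramowitz": 3.3,
    "W & E": 3.7,
    "Hibbs": 4.6,
    "Lockerbie": mean_abs_error(lockerbie_misses),
}
weights = inverse_error_weights(avg_errors)

# HYPOTHETICAL forecast margins, for illustration only (not real predictions).
hypothetical_forecasts = {
    "Abramowitz": 2.0,
    "W & E": 3.0,
    "Hibbs": 1.0,
    "Lockerbie": 12.0,
}
simple_average = sum(hypothetical_forecasts.values()) / len(hypothetical_forecasts)
weighted_average = weighted_forecast(hypothetical_forecasts, weights)

if __name__ == "__main__":
    print("Lockerbie mean miss:", round(mean_abs_error(lockerbie_misses), 1))
    print("Lockerbie RMSE:", round(rmse(lockerbie_misses), 1))
    for name, weight in weights.items():
        print(name, round(weight, 3))
    print("simple average:", round(simple_average, 2))
    print("weighted average:", round(weighted_average, 2))
```

Under this scheme Lockerbie ends up with less than a tenth of the total weight, so an outlying forecast from that model moves the weighted average far less than it moves the simple one. Whether that rescues the average as a real-time predictor is a separate question, since the weights here are computed from the same past errors being discussed.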

7 comments:

Good points. The point that I would make to Nate is that you have to actually get under the hood of the models. Just looking at predictions and RMSE or R2 or whatever your metric is misses the point. What would be much more interesting is a comparison of the beta weights in models that include both polling and fundamentals. And, methinks that the fundamentals are going to really swamp the polls.

It's not that polls don't add useful details for the prediction. It's that the basic nature of the race partially determines them, too. I liken it to golf. Yes, the short game is very important. But, if you face 10 degrees in the wrong direction off the tee, you're hosed. In golf, knowing which general direction to face off the tee is easy...but it's also fantastically important. Drive to the east on a northerly fairway and you're not going to do very well.

I don't do election forecasts for a couple of reasons: (a) it's not my field and (b) I don't want to look stupid in public. I will, however, say a couple of other things, based mostly on having a little knowledge of models and of statistics. (1) You probably want to look at different levels of elections differently (House of Representatives separately from Senate separately from Presidential separately from state legislatures...). (2) If you're trying to create a model, in most cases you have relatively few observations. Take Presidential elections. This year is, what, number 56? 57? If you want to use any sizable number of explanatory variables, you're going to lose all of the 18th century elections and probably all of the 19th century elections and at least some of the 20th century elections. Good luck developing a useful model from what's left.

But I should really let people who create these models defend themselves...

I have to say, if these predictors are parlor games, and viewed so by the poli sci community, then there is an enormous amount of gassage going on in the press and blogosphere about things that don't amount to a hill of beans (not that such should be a surprise to anyone).

To pick up on doc's point, it would be interesting to know the status of Congressional predictors vs Presidential predictors. There has been a lot of talk the past couple of years, including on this blog in the last couple of weeks, about the inherent illogic of the GOP strategy in the House. Taking tough votes on things that can't pass, backing wildly unpopular policies, following a delusional strategy, etc. Yet, if Abramowitz is right with his Congressional model, this has hurt them ... not at all. He is predicting holding their losses in the House to three and a 6-7 seat pickup in the Senate. That certainly doesn't sound like a party that has been hurt. Is his model flawed (I believe he himself admits that it has a very large error on the Senate side)? Or do all these votes in the House we have discussed on this blog ... actually matter not at all? Does the debt ceiling debacle that we talked about damaging Congress ... actually matter not at all? Is the delusional attachment to unpopular policies actually a delusion on our part (about the impact of that) and not theirs?

Maybe Paul Ryan is right, and there simply isn't a penalty to be paid for going off the deep end. Or maybe the penalty is only paid once the policies actually get passed? If that is so, we are in for a world of hurt, since the Republicans will pay no penalty for their policies until they have actually passed regressive tax cuts and damaged Medicare. Bummer.

The idea is that Abramowitz's predictor doesn't include (except in the generic ballot so far) the effects of GOP self-damage. So, first, if House Republicans were doing more popular things, they might be doing better on the generic ballot and thus predicted to do better. But, second, it's at least possible that the self-damage could wind up making Republicans underperform compared to the predictors.

Thanks for the clarification. In other words, (or maybe pretty much the same words) Abramowitz's model just isn't built in such a way that it can take into account the effect of the things we have been talking about. Fair enough.

Still, the general question remains, is there any consensus, or informed body of opinion at all, with regard to the general accuracy of Congressional models versus Presidential models? It would seem that Congressional models would be a lot trickier, since local factors would presumably loom much larger in the individual races. But maybe that is just a naive view.

Now we're getting deeper into my area. For my money, the model to beat all models is Jacobson's. He gets R squares north of .8 with just a few variables. And the one that does just a TON of heavy lifting is a proxy for a local variable: candidate quality. What Jacobson does is total up the % of Dem challengers who are quality (ie, have won an election for SOMETHING before) and subtract the % of the GOP challengers that are similarly qualified. That quality difference just explains a ton. Now, itself, it's also predicted strongly by the state of the economy (transformed right: so the bad economy in 2010 is good for the Reps, and the bad economy in 2008 is good for the Dems, because voters anchor on the party of the president).

What's really funny is that my own work (unpublished) and that of Eric McGhee (also unpublished, and you'll see why in a second), trying to use this same variable measured better, gets it to NOT work. (This is unpublished because it's a null finding that contradicts accepted theory and what we KNOW to be true: incumbents win their elections. Either we've done something wrong, or we've got the weirdest emergent property ever...one that is emergent only in the way we count it, not in the simpler, less sophisticated ways that OTHERS count it! Until I figure out how to square this circle, this paper remains on the shelf.)

Now, does it matter what happens in Congress? Well, the more popular models don't include that. Rather, what you have are contributions looking at specific election years that often find effects, usually in the form of "people who voted for X were more likely to lose." So, there's a few people that have found that for votes on the ACA, for example. (It's naturally not very practical to do the hypothetical JB often poses: how bad would it have been if it hadn't passed? Just as it will be difficult to prove effects of voting for the Ryan budget because they already voted for it (unless anyone changed their votes, but your leverage is going to be SO TINY on that, that you won't find anything).)

But, to return to the question, does it matter what happens in Congress? My dissertation says no. Sarah Binder's research suggests no. What I found was that passing major legislation helps presidential approval. It has no real impact on congressional approval besides that moderated through presidential approval (when we ask people what they think of Congress, we're doing even more of a "how are things going" than we are with the presidency, and so the "right track/wrong track" element carries through both measures). And, studying the effects on congressional elections (which I modeled using some really cool data that took me MONTHS to collect, so I have more proxies for previous electoral margin than anyone else I'm aware of)...I found no effects. So, that's overall productivity of major laws, and over the entire postwar period. Individual votes on individual bills can still have an effect, even if aggregate productivity doesn't. Which makes sense, since how do you credit individual members for the productivity of the system?

When I read Nate Silver, I often get the impression that he has made significant money on predictive models of how certain sports events turned out.... especially with his emphasis on the importance of point spreads. He expresses no opinion on policy or partisanship. Politics has interest to him because it is a new arena for his tradecraft, and apparently pays well. He's a breath of fresh air.