Medians win, except…Angle Defeats Reid!

November 3rd, 2010, 9:08am by Sam Wang

I will write more later, but overall the simple approach did well…though not as well as expected. Current actual outcome: Senate 52D, House about 243R. The House discrepancy, about 13 seats, is equivalent to about 1.4 percentage points of popular vote. Not ideal, but pretty good. If you’re dissatisfied, consider that this transparent, low-assumption calculation did as well as assumption-laden models such as Pollster and FiveThirtyEight. Furthermore, those models only put a House takeover at ~80% probability, which was obviously wrong.
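The seats-to-votes conversion above can be checked with rough arithmetic. A minimal sketch, assuming a swing ratio of about 2 (each 1-point shift in the national popular vote moving roughly 2% of the 435 House seats); the ratio is my assumption, the post states only the endpoints:

```python
# Rough check of the seats-to-votes conversion, assuming a swing ratio
# of about 2 (each 1-point shift in the national popular vote moves
# roughly 2% of the House's 435 seats).  The ratio is an assumption.
TOTAL_SEATS = 435
SWING_RATIO = 2.1          # percent of seats moved per point of vote
seat_error = 13
vote_equiv = seat_error / (TOTAL_SEATS * SWING_RATIO / 100)
print(f"{seat_error} seats is roughly {vote_equiv:.1f} points of popular vote")
```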

The Senate medians got the outcome of three races correct (WA/IL/WV) but not the closest race, CO. But then there’s this:

[Chart: the Nevada Senate result, which fell far outside the polling median’s error bands.]

This error is so statistically glaring that I am not sure what to think about it. More discussion of this, and the House result, later.

19 Comments so far ↓

You are surprisingly content with these results given your predictions yesterday of the House being 228-232 Republicans with 95% confidence. 245 seems very far outside that range.

I’m also not sure how you can say that you did as well as 538 in this regard (though you have done well in the past). While 538 was predicting very similar numbers for the House, rather than near certainty that the actual result would fall within a few seats of its average, 538 cautioned readers on the amount of uncertainty in the polling data this year. What actually happened fell within 538’s 95% confidence interval, and well outside yours.

I understand the appeal of simpler models, and many of the concerns you have about Mr. Silver as an empiricist, but I don’t think you can use this election, specifically the House results, as evidence to support your claim that your simpler model does as well or better than his.

Is there some mathematical way to prove that something was greater than an 80% probability, after it happened? It seems like the definitiveness of it occurring isn’t particularly good evidence that the probability was higher than that.

Chris: If you examine what I wrote before, I said the 95% CI held given the assumption, and that if the result fell outside it, the assumption was wrong. Evidently it was, in some subtle way (as I wrote above). Truthfully, I was surprised to be that far off. If the benchmark is getting within 2 seats, the calculation didn’t work. If the benchmark is being closest based on poll-based evidence, the calculation did well.

To some extent this is a matter of taste. My taste is for analysis to be transparent so other people can see the moving parts. When more parts are added and I don’t know what they do, I get suspicious. An example is “house effects,” which seem real enough, but do we know how much it helps to correct for them?

Chris and CMo: It’s inarguably true that 233 is greater than 230! In the case of the FiveThirtyEight model, the small improvement in performance may have come from using national polls. However, that’s in the noise compared with the large difference from the true outcome. In that model there are so many other assumptions. I am not sure which ones helped, and which ones didn’t. Generally, lots of moving parts are hard to test together, but starting with a few parts one can build up. Who knows – maybe the only necessary improvement is to use national polls, which are taken more frequently and therefore can capture rapid movement in the closing days.

Think about it: all those assumptions to make a tiny improvement over what I could do in an hour. Also important: the same assumptions blur the “snapshot” so that you no longer have a snapshot. Look at the plot in the right hand column of this blog, showing Obama EV. That’s a story about the 2008 race. You won’t see that anywhere else.

On a related point, think about the claim of “prediction.” Since there’s only one day that voting occurs, it’s hard to test whether a model is predictive. The one way I can think of doing that is to compare two models and ask if, throughout the campaign season, one is consistently closer to the final result than others. Political science models have a good track record this way.

Dave Rutledge and AySz88: The way to test my statement is to make lots of predictions and assign probabilities. After doing a bunch of them, see what fraction you got right. If your %correct rate matches your average probability, then you were estimating probability correctly.
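The calibration check described above can be sketched in a few lines. The (probability, outcome) pairs below are hypothetical, for illustration only:

```python
# Calibration check: if stated probabilities are honest, the fraction
# of predictions that come true should match the average stated
# probability.  These (probability, outcome) pairs are hypothetical.
predictions = [
    (0.80, True), (0.90, True), (0.70, False), (0.95, True),
    (0.85, True), (0.60, True), (0.75, False), (0.90, True),
]

avg_prob = sum(p for p, _ in predictions) / len(predictions)
frac_correct = sum(won for _, won in predictions) / len(predictions)

print(f"average stated probability: {avg_prob:.2f}")
print(f"fraction actually correct:  {frac_correct:.2f}")
# A forecaster whose frac_correct consistently exceeds avg_prob is
# underconfident -- the post's argument about the 80% takeover call.
```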

In this regard the 80% estimate is almost certainly wrong. Silver became well-known in 2008 for doing well in analyzing the Democratic primaries. In the blogosphere, being 80% right doesn’t get you far. And he did better. So his true probability of being correct must be higher than 80%. The real effect of saying 80% is to make things more fun for readers. But it’s not really true.

Take as an example my Senate medians for this campaign. The error bars gave a 68%CI, and two out of five outcomes (West Virginia and Nevada) fell outside the bands. That might seem good, but they fell way outside. So there’s something hidden that I missed. In the case of West Virginia, it was probably movement in the closing weeks. Since it was toward Manchin, I didn’t care. In the case of Nevada…well, what on earth happened in Nevada? I find this result quite odd. That’s a heck of a lot of union buses.
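By raw count, how surprising is it for 2 of 5 outcomes to miss a 68% band? A quick binomial check, treating the five races as independent (itself an assumption):

```python
from math import comb

# If each race independently misses a 68% band with probability 0.32,
# the number of misses among five races is Binomial(5, 0.32).
p_miss, n = 0.32, 5
p_two_or_more = sum(comb(n, k) * p_miss**k * (1 - p_miss)**(n - k)
                    for k in range(2, n + 1))
print(f"P(2 or more of 5 outside the band) = {p_two_or_more:.2f}")
# Roughly a coin flip -- the count alone is unremarkable.  The real
# complaint above is that WV and NV fell far outside the bands.
```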

Sam, keep up the good work! I greatly valued your meta-analysis approach; you filtered out unnecessary noise from the many polls. I discovered your web site during the 2008 election and found it very helpful when I needed a good, reliable “quick glance” at the numbers. What I like most is your simple yet accurate method and your professional, “non-commercial,” unpretentious site. Many sincere thanks!

Dave R/Sam – Silver’s website contains well over 100 races with “percent chance of takeover” figures. It shouldn’t take a young nerd more than an hour to calculate whether he systematically gets his probabilities wrong … It would take only five minutes if the nerd didn’t have to decompose into local vs. national effects … No?

Andrew Gelman at Columbia did just that. The answer is yes, I am correct.

In any event, considering that the outcome is within a dozen seats of both my simple calculation and Silver’s complicated one, and considering the hundreds of polls that went into both…did you really think Tuesday’s outcome was in any doubt at all? The nominal probability, even allowing for several percentage points of systematic error, was near 1 for both chambers.

The number of seats required for a majority in the House is 218, if I recall correctly. On Monday you predicted 230 seats for the Republicans. That estimate would give us reasonable certainty of a Republican majority only if the chance of your guess being off by more than 12 seats was negligible. Considering the Republicans will probably get 243 seats, you were off by 13. Thus the odds of being off by more than 12 seats would seem not to have been negligible after all.

Now this could still have been a once-in-a-lifetime instance of unlucky polling, but the more likely conclusion is that it was not 100% certain the Republicans would take over the House after all.
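The commenter’s reasoning can be made quantitative under an assumed error model. A sketch, assuming seat errors are roughly normal with a standard deviation (6.5 seats, my choice) picked only so that the observed 13-seat miss sits near 2 sigma:

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Normal cumulative distribution via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# Predicted Republican seats: 230; majority: 218; actual: 243.
# Assume seat errors are roughly Normal(0, sigma) and pick sigma so
# that the observed 13-seat miss sits near 2 sigma.  Then the chance
# of the error instead landing below the majority line was not tiny.
predicted, majority, sigma = 230, 218, 6.5
p_no_takeover = normal_cdf(majority, mu=predicted, sigma=sigma)
print(f"P(Republicans below {majority} seats) = {p_no_takeover:.3f}")
```

Under that assumed sigma, the no-takeover probability comes out at a few percent rather than effectively zero, which is the commenter’s point.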

It might still be that 538’s confidence interval was too wide (such a statement is very hard to prove; it is easier to see whether a measurement falls outside an interval), but I would think the results show that retaining some positive chance of the Democrats keeping the majority was not unwarranted.

Thanks Sam, this is very close to what I was looking for. However, boos to the nerd and his nerd-mentor for using crude bucketing rather than maximum likelihood sorta stuff.
But if the claim is that they were authoring for a dumbed-down audience, then I am okay with that.
My intuition is that Intrade’s election-eve figures on Republican capture of the House (95%) were accurate while Silver’s were low (75-80%).
Often I will ascribe 5% as the odds that I am wrong even when I feel certain, as the school of hard knocks has taught me that I often miss things that lie beneath.
Of course, this is a real pity when I try to defend the “Fact of Evolution” vs the “Theory of Evolution” ;-)

On a collaborative project, on one set of measurements my colleague obtained a significance value of P=10^(-9) or something like that. It seemed weird to report that, and a violation of commonsense definitions of significance values.

I told her we should report it as P<10^(-6) because there was about a one in a million chance that she had gone completely off her rocker and faked the data to please me. She thought that was hilarious, and that's what we did.
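The joke encodes a real rule of thumb: a reported significance level shouldn’t be smaller than the probability that something upstream went wrong. A toy sketch using the anecdote’s numbers:

```python
# A reported P-value shouldn't be smaller than the probability that
# something upstream went wrong (fraud, a bug, a mislabeled file).
# Numbers taken from the anecdote above.
p_measured = 1e-9          # significance from the measurement itself
p_upstream_error = 1e-6    # "one in a million chance of faked data"
p_reported = max(p_measured, p_upstream_error)
print(f"report P < {p_reported:.0e}")
```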

Paul, hi. If you want real probabilities, see stochasticdemocracy.blogspot.com. Pretty well done there. My impression of InTrade is that it captures the mean well but is a bit pessimistic relative to what I think is the true probability. Maybe there’s a fancy arbitrage opportunity there.

[…] Sam Wang from the Princeton Election Consortium claims he outperformed our “assumption-laden” (?) model with his Bayesian magic. I have yet to confirm, but it sounds like he only missed about 13 House seats, which is pretty good. […]

As David mentions at Stochastic Democracy, that method of judging the predictions only works if they are independent – one election to the next, for example. For 538, an “80%” probability wasn’t saying that each race independently had a 20% chance of loss – they were saying that there was a 20% chance that national shifts / polling bias / noise / etc. would all combine to produce the unexpected result. Some of those things (e.g. polling bias, national shifts) would be highly correlated across all the races. I’d expect that the vast majority of that (for example) 20% probability was for scenarios with some sort of national polling miss – where, given such a miss, all the underdogs’ chances would improve.
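The correlated-errors point can be illustrated with a small Monte Carlo sketch (all numbers hypothetical, not taken from the 538 model):

```python
import random

random.seed(1)

# Ten races, each with roughly an 80% favorite.  Compare the chance
# that ALL favorites win when polling errors are independent versus
# when they share one national component.  All numbers hypothetical.
N_SIM, N_RACES = 100_000, 10
MARGIN = 0.84  # favorite's lead, in units of total error; chosen so
               # each race is individually close to an 80% favorite

all_win_indep = all_win_corr = 0
for _ in range(N_SIM):
    # independent errors: each race draws its own noise
    if all(random.gauss(0, 1) < MARGIN for _ in range(N_RACES)):
        all_win_indep += 1
    # correlated errors: one shared national shift plus small local
    # noise (same total variance per race: 0.9**2 + 0.45**2 ~ 1)
    national = random.gauss(0, 0.9)
    if all(national + random.gauss(0, 0.45) < MARGIN for _ in range(N_RACES)):
        all_win_corr += 1

print(f"all favorites win (independent): {all_win_indep / N_SIM:.2f}")
print(f"all favorites win (correlated):  {all_win_corr / N_SIM:.2f}")
# With correlated errors the favorites win or lose together, so the
# per-race 80% figures no longer compound toward a near-certain upset
# somewhere -- most of the residual 20% is one shared national miss.
```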