How did our House prediction do?

We offered some predictions about House elections in earlier posts (see here, here, and here). We based our predictions on a model that included some national factors like presidential approval and the state of the economy, plus some district variables like the district presidential vote and incumbency. Now that we have something close to the final House results, how did we do?

The bottom line: our model performed quite well, predicting only 7 fewer seats for the Democrats than they actually won, and miscalling only 21 races out of 435.

Now for the details.

Before we can get any further, we have to decide which model to evaluate. Our first prediction was a purely “fundamentals” model that used only incumbency and the district presidential vote to distinguish one district from another. Nothing about the relative strength of the candidates was included. This model proved to have a large amount of error, suggesting that candidate strength does make a difference. Adding campaign spending to the model brought down this error a lot, though for our forecast it forced us to use fundraising in the summer as an indicator of likely spending in the fall. We couldn’t be certain how well that would work. But since summer fundraising still falls far before the election, this approach stuck to information that was publicly available before the most intense period of the campaign season. That comes close enough to a “fundamentals” model for us, so we’re going to use it as our final prediction.*

There is more than one way to evaluate the model’s performance. The first is to see how close it came to the topline vote and seat share. In this respect, we were a little too hard on the Democrats. Based on current results at the New York Times’s “big board,” the Democrats have won 195 seats for sure, and six of the remaining seven are leaning their way. That’s a total of 201 seats. By contrast, our model predicted 194 Democratic wins,** missing the actual result by 7 seats. Our model also predicted a Democratic two-party vote share of 48.9%, or about 1.9 percentage points below the actual result of 50.7%. The model expected a vote share at least as high as the actual one about 36% of the time, and it expected a seat share that high about 33% of the time. So both fall in a comfortable range for the model’s error.
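To make the "about 36% of the time" kind of statement concrete, here is a minimal sketch of how such a share could be computed. It assumes the model's uncertainty is available as a list of simulated outcomes, which this post does not actually state; the function name and the example draws are hypothetical and are not our forecast output.

```python
def share_at_least(simulated, actual):
    """Fraction of simulated outcomes at or above the actual result."""
    if not simulated:
        raise ValueError("need at least one simulated outcome")
    return sum(1 for s in simulated if s >= actual) / len(simulated)


# Hypothetical simulated Democratic seat totals, for illustration only.
example_draws = [186, 190, 194, 198, 202]

# Share of draws that reached the actual total of 201 seats.
example_share = share_at_least(example_draws, 201)
print(f"{example_share:.0%} of draws were at or above the actual result")
```

A share near 50% would mean the actual result sat in the middle of what the model expected; a share near 0% or 100% would mean the result landed in a tail.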

The second way to look at our model is to see how well it predicted each individual race. Below is a scatter plot of the predicted vote share against the actual vote share for all 435 races. The diagonal line is equivalence: if the predictions were exactly accurate all the points would fall along that line. The red data points are cases where the model missed the winner: there were 21 such cases overall.
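For readers who want to see what "missed the winner" means in practice, the sketch below counts a race as a miss when the predicted and actual Democratic two-party vote shares fall on opposite sides of 50%. The function names and the four example races are hypothetical; this is one reasonable way to do the tally, not necessarily the exact code behind the plot.

```python
def called_wrong(predicted_share, actual_share, threshold=50.0):
    """True when prediction and outcome fall on opposite sides of the threshold."""
    return (predicted_share > threshold) != (actual_share > threshold)


def count_misses(races):
    """Count races, given as (predicted, actual) share pairs, with the wrong winner."""
    return sum(called_wrong(p, a) for p, a in races)


# Hypothetical (predicted, actual) Democratic two-party shares, in percent.
hypothetical_races = [
    (45.0, 52.0),  # predicted Republican win, Democrat won: a miss
    (60.0, 58.0),  # predicted and actual Democratic win
    (51.0, 49.5),  # predicted Democratic win, Republican won: a miss
    (30.0, 35.0),  # predicted and actual Republican win
]

n_missed = count_misses(hypothetical_races)
print(f"missed {n_missed} of {len(hypothetical_races)} races")
```

Note that a race can sit far from the diagonal and still be called correctly, as in the last example, so the miss count and the vote-share error measure different things.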

Perhaps the most striking aspect of this graph is the curved relationship between our prediction and the outcome. The model is too hard on Democrats at the low end and a little too easy on them at the high end, producing a sort of s-curve. This curvature is entirely a function of using early fundraising as one of our predictors. The model without campaign money has a more linear relationship, but it also gets the actual outcome wrong more often. In other words, early money misses some of the dynamics of the race, but it does a good job of discriminating between winners and losers.

So our model was bearish on Democrats, was somewhat off on district vote shares, and missed the actual winner in 21 cases. How does this compare with other predictions?

In terms of total seats, our prediction was closer to the final number than Charlie Cook’s, and two seats worse than Larry Sabato’s. It was also two seats better than Sam Wang’s generic ballot prediction (which considered 201 to be a highly unlikely outcome), and three seats worse than his Bayesian combination of the generic ballot and the handicappers. So on this score, Wang’s hybrid beats all other forecasts by a nose.

What about predictions for individual seats? Here we have to drop Wang, since he didn’t offer such predictions. But we can still look at Sabato and Cook. Compared to these two handicappers, our 21 misses were the most of the bunch. Sabato was the best, miscalling only 13 races, while Cook fell in between at 17.

However, one can think about this a different way. Our model only used information publicly available by the end of the summer (when the last primaries were decided). The handicappers had lots of information (some of it proprietary to the campaigns) up to election day. Yet our miss rate wasn’t that much higher. At most, all that extra information amounted to 8 correctly called seats. The outcome of the rest could have been known far earlier.
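The arithmetic behind that comparison is simple enough to check directly. The sketch below uses only the miss counts reported in this post (21 for our model, 17 for Cook, 13 for Sabato, out of 435 races); the variable and function names are just illustrative.

```python
TOTAL_RACES = 435

# Miscalled races reported in this post.
misses = {"our model": 21, "Cook": 17, "Sabato": 13}


def hit_rate(missed, total=TOTAL_RACES):
    """Share of races called correctly."""
    return (total - missed) / total


rates = {name: hit_rate(m) for name, m in misses.items()}

# Largest possible payoff from the handicappers' extra information, in seats.
gap = misses["our model"] - min(misses.values())

for name, rate in rates.items():
    print(f"{name}: {rate:.1%} of races called correctly")
print(f"gap between our model and the best handicapper: {gap} seats")
```

Put this way, every forecast called more than 95% of races correctly, and the spread between the best and worst is under two percentage points.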

To be fair, the races that distinguish between these forecasts were incredibly close. None of Sabato’s missed predictions were decided by more than 10 percentage points. Only two of our missed predictions were decided by more than that, and only one of Cook’s. Most of the misses were far closer. Many of these races could have gone the other way, meaning the differences between the success rates might be due to chance alone. There should be some credit all around for correctly identifying the races that were likely to be close in the first place.

On balance, there is much that we could improve about our model, and we hope to keep working with it in the future. Its first time out of the gate, it got the lay of the land quite well, and actually came within spitting distance of forecasters who had a lot more information at their disposal. We think that’s pretty good.

*We also did a prediction with Super PAC money, but that was more for explanatory purposes than anything else. It certainly violated our rule of using only the fundamentals.

**Our original forecast was 192 seats, but in conducting this post-mortem, we discovered an error in our code that mis-classified some uncontested races. When the error was fixed, the prediction bumped to 194. Other aspects of the prediction were basically the same.