UK Forecasting Retrospective

Our UK forecasting model, which tried to improve upon the deficiencies inherent in uniform swing, performed underwhelmingly. We went out on something of a limb here, and sometimes when you do that, the limb breaks! But let me distract you with some pretty pictures.

This is a comparison of Labour’s vote in 2005 to their vote in 2010 in each individual constituency. English and Welsh seats are indicated in red, and Scottish seats in blue. Constituencies in Northern Ireland are not graphed at all. Nor is Thirsk & Malton, where the vote was postponed because of the death of the UKIP candidate, nor the constituencies of Buckingham and Glasgow North East, where the current and former Speakers reside and where the election is essentially nonpartisan.

Was the swing against Labour uniform? In one sense it was, in that in the aggregate it was fairly linear. On the other hand, there was quite a bit of variance from constituency to constituency. Scotland — where Labour in fact improved their share of the vote — is the most obvious example of this, but even within England and Wales, the vote was not especially neatly distributed and it was fairly commonplace for Labour to perform 5 or more points better, or 5 or more points worse, than uniform swing would have predicted.

The Conservatives’ swing, on the other hand, was quite uniform and quite well behaved:

But for the Liberal Democrats, it was quite erratic. Although they received about the same share of the vote overall, they did much better in some constituencies and much worse in others, in comparison with last time around. Moreover, these variances tended to be largest in seats where the Liberal Democrats began with enough of the vote to be competitive.

In comparison to other recent elections, the distribution of votes was somewhat less well predicted by the outcome of the previous one, particularly for Labour, as there was more variance according to geography and demography. (Perhaps the UK’s electorate is gradually becoming more like ours in the United States.) As a result of this, the root mean square error (RMSE) associated with uniform swing was higher than in previous elections.
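To make the two concepts concrete, here is a minimal sketch of how a uniform-swing projection and its RMSE are computed. All of the constituency figures below are hypothetical, invented purely for illustration; they are not the actual 2005 or 2010 results.

```python
import math

# Hypothetical 2005 Labour shares in five constituencies (percentage points)
shares_2005 = [55.0, 42.0, 38.0, 61.0, 47.0]
# Hypothetical actual 2010 shares in the same constituencies
shares_2010 = [46.0, 39.0, 30.0, 57.0, 35.0]

# Uniform swing: every constituency is assumed to shift by the same
# number of points as the national (here, unweighted average) vote did.
national_swing = (sum(shares_2010) - sum(shares_2005)) / len(shares_2005)

predicted = [s + national_swing for s in shares_2005]

# RMSE measures the typical constituency-level miss of that assumption.
errors = [p - a for p, a in zip(predicted, shares_2010)]
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
print(round(rmse, 2))
```

The larger the constituency-to-constituency variance around the national trend, the larger this number gets, which is exactly what happened with Labour’s vote this year.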

So, did our fancy, non-uniform model do any better than uniform swing? Nope, although it also did not do any worse. It was somewhat better than uniform swing at forecasting Labour’s vote, but worse for the Conservatives. It was about equal in predicting the vote for the Liberal Democrats, as well as the Labour-to-Conservative swing.

(Note: the statistics below reflect the performance of the models now that we know the distribution of the nationwide vote. Neither uniform swing nor the 538 model is designed to forecast the national vote; instead they translate it into individual constituencies, so as to forecast a seat count.)

Keep in mind, however, that our model did apply a regional adjustment, whereas a naive version of uniform swing does not. Thus, most of the skill that it demonstrated stemmed from its ability to account for the non-uniform voting patterns in Scotland and some other regions. Ignoring Scotland, the model performed about the same as uniform swing on Labour’s vote and somewhat worse for the other two parties. A uniform swing approach with a regional adjustment — like the ones used by our friends/rivals at PoliticsHome — would have outperformed both models and done fairly well.
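A regionally adjusted uniform swing of the sort described above can be sketched as follows. The swing figures here are hypothetical placeholders, not the actual regional polling numbers used by PoliticsHome or anyone else; the idea is simply to substitute a region-specific swing for the national one where regional data exists.

```python
# Hypothetical national change in Labour's vote share, in points
national_swing = -6.2

# Hypothetical region-specific swings implied by regional polling.
# Scotland moving toward Labour while England moves away mirrors the
# qualitative pattern discussed above, but the numbers are invented.
regional_swing = {"Scotland": +2.5, "England": -7.5, "Wales": -6.0}

def project(prev_share, region):
    # Apply the region's own swing where we have one;
    # fall back to the plain national swing otherwise.
    return prev_share + regional_swing.get(region, national_swing)

print(project(40.0, "Scotland"))  # a seat Labour gains ground in
print(project(40.0, "England"))   # the same prior share, falling instead
```

The entire gap between "naive" uniform swing and this version comes from the regional offsets, which is consistent with our model's skill having been concentrated in Scotland.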

At the same time, it’s a bit of a stretch to chalk this election up as a success for uniform swing. On a constituency-by-constituency basis, uniform swing was less accurate than in previous elections and wasn’t able to capture the odd dynamics that governed the performance of the Liberal Democrats. Had the LibDem surge held, it might have done rather poorly in projecting the seat count.

On the other hand, there isn’t an especially obvious replacement for it. Making alternative assumptions (like proportional swing) about the governing function that dictates the shift in votes from one election to the next did not improve upon uniform swing this year, and such assumptions have been only marginally better or marginally worse in previous elections.
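The difference between the two governing functions is easy to state in code. With hypothetical numbers: uniform swing subtracts the same number of points from every constituency, while proportional swing scales every constituency by the national ratio, so it takes more votes away where a party started out stronger.

```python
# Hypothetical national shares for one party, previous and current election
prev_national, new_national = 35.2, 29.0
# Hypothetical previous share in one constituency
prev_local = 50.0

# Uniform swing: add the national point change everywhere
uniform = prev_local + (new_national - prev_national)

# Proportional swing: scale by the national ratio instead
proportional = prev_local * (new_national / prev_national)

print(uniform, round(proportional, 1))
```

Because this constituency starts above the national average, proportional swing docks it more than uniform swing does; in practice, neither rule has reliably beaten the other.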

Ultimately, we are suffering from a real paucity of data. A fundamental problem of psephology is that elections occur only so often and so the sample sizes are not large. But at least in the United States, we have a real abundance of data associated with each election cycle, such as extremely robust polling both before and after elections, which provides much more information on preferences by locality and demographic group. To the extent that swings in electoral preferences are non-uniform, it is almost always possible to explain them robustly after the fact (e.g. “Reagan did especially well with working-class whites”), and quite often possible to anticipate them ahead of time.

At a bare minimum, it is disappointing that the BBC and other organizations do not do American-style exit polling, with detail on voting patterns by race, religion, gender and economic class. Such exit polling would allow the pollsters to weight and calibrate their surveys more effectively, while also making additional tools available to forecasters. If we’d known, for instance, that Labour would lose relatively little of their vote among religious minorities and working-class city dwellers, but more among middle-class suburban whites, we could probably have done a relatively good job of forecasting the election, even without local-level data. Indeed, in an American context, these effects would be discussed and analyzed ad nauseam.

We’ll never know for sure, but my sense is that all the macro-type forecasting models may have narrowly averted a disaster here, not because of any fault of the forecasters (although clearly our approach was overambitious) but because there’s only so much one can do with such limited evidence. Even if the underlying behavior of voters is complex (as it surely is), relatively more complex models of their behavior usually require more data than is available here in order to have much chance of bettering simpler ones.

At the same time, I’m happy that we did this. Not that it wasn’t disappointing — it’s always fun to be right! — but we were pretty explicit about disclaiming that it was a thought-experiment framed as a forecasting model, and it provoked a really good discussion. One of the flaws of academia is that incuriosity or laziness often masquerades as prudence; one of the flaws of punditry is that self-assuredness is often mistaken for actual insight. We try to walk a fine line between those extremes by being bold but showing our work and placing it into context. Kudos to the forecasters — like the folks at PoliticsHome — who made the best of the situation.

Nate Silver is the founder and editor in chief of FiveThirtyEight. @natesilver538