A statistical reaction to Brexit


Author: Dr John Fry

Date: 11 Aug 2016

Copyright: Image appears courtesy of Getty Images

Recent weeks have certainly proved a dramatic time in UK politics. Following Brexit, several high-flying political careers lie in ruins. The UK has a new Prime Minister. Labour leader Jeremy Corbyn lost a vote of confidence amongst his own MPs and faces a new leadership challenge. Even those who successfully campaigned for Brexit were affected amidst failed leadership bids and the resignation of UKIP leader Nigel Farage. Brexit has also left its mark upon the pound, the stock market and the housing market amidst wider fears for the UK economy.

The UK’s vote for Brexit took many people by surprise. Here, we discuss the prediction of the EU referendum and some of the implications of the surprise vote to leave.

Against this backdrop there is enduring academic interest in election prediction. Given the mixed record of recent attempts at scientific election prediction (see e.g. [1-2]) it is interesting to see how accurate a prediction can be made for the simpler problem of predicting UK referenda. This is without the complications of a multi-party system and added individual constituency-level effects. However, even the much simplified problem of referenda prediction is challenging.

From a statistical perspective there appear to be three key take-home messages:

1. Both elections and referenda are very difficult to predict.
2. Be careful not to be fooled by randomness [3].
3. Be sure to include confidence intervals and appropriate uncertainty quantification in any forecasts made.

(Wrongly) Predicting the EU referendum

Buoyed by recent successes in [4-5], and purely as an interesting statistical exercise, we sought to predict the EU referendum two months in advance using data from bookmakers' odds and opinion polls.

As such, we restricted ourselves to opinion-poll data available until the end of April 2016 to forecast the Referendum in June 2016. The results of multiple opinion polls are given on the website http://whatukthinks.org. This data record appears remarkably comprehensive and is one of the key sources cited by major national newspapers in the UK in the lead-up to the EU referendum. Removing "Don't know" responses, a plot of the proportion in favour of leaving the EU, averaged across multiple polls, is shown below in Figure 1 and gives an indication of an increase in the level of support for leaving the EU over time. Though this approach is imperfect and, inter alia, overlooks some dispersion across individual polls, it does represent the simplest way of achieving a systematic group-based forecast [6].

However, the rate of increase over time appears to be relatively slight and does not achieve statistical significance (t=1.378, p=0.217). Extrapolating from the simple linear regression line shown gives a simple and pragmatic, though admittedly imperfect, way of incorporating a time trend into the forecast. This gives an estimate of 48.7% voting for Brexit. This suggests that a vote to remain in the EU appears the most likely result though the result is close – and notably much tighter than the Scottish independence referendum. The available data on bookmakers’ odds paints a similar picture (see below).

Figure 1: Support for leaving the EU according to opinion polls (multiple opinion polls averaged over time).

Estimating Probabilities from bookmakers’ odds

In addition to opinion polls, there is a large amount of readily available political information in the form of bookmakers’ odds. Not only is such information often used in political forecasting (see e.g. [4]) but evolutionary finance tells us that bookmakers have every incentive to set accurate odds in order to ensure their long-term survival [7].

Estimated probabilities from bookmakers' odds can be calculated as follows. One bookmaker quoted odds of 12/5 for a vote to leave. Writing p for the probability of Brexit, this means (1-p)/p=12/5, which can be solved to give p=5/17. The same bookmaker quoted odds of 1/3 for a vote to remain. Since the probability of remain is 1-p, the probability of Brexit now satisfies p/(1-p)=1/3, which can similarly be solved to give p=1/4. Finally, it is standard econometric practice to compute the estimated probability as the average of these two values: p=(1/2)(5/17+1/4)=37/136=0.272.
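The arithmetic above can be checked with a short sketch. This is an illustration in Python rather than the author's own code, and the function name `implied_probability` is a hypothetical helper introduced here. It uses exact fractions so that the values 5/17, 1/4 and 37/136 can be compared directly.

```python
from fractions import Fraction

def implied_probability(odds):
    """Probability implied by fractional odds a/b quoted for an outcome.

    Odds of a/b mean (1 - q) / q = a / b, so q = 1 / (1 + a/b) = b / (a + b).
    (Hypothetical helper, named for this illustration only.)
    """
    return 1 / (1 + odds)

# Odds quoted by the single bookmaker in the worked example.
leave_odds = Fraction(12, 5)
remain_odds = Fraction(1, 3)

# Estimate of P(Brexit) taken directly from the leave odds.
p_from_leave = implied_probability(leave_odds)

# The remain odds imply P(remain); P(Brexit) is its complement.
p_from_remain = 1 - implied_probability(remain_odds)

# Average the two estimates, as described in the text.
p_brexit = (p_from_leave + p_from_remain) / 2

print(p_from_leave, p_from_remain, p_brexit, round(float(p_brexit), 3))
```

Note that the implied probabilities of leave (5/17) and remain (3/4) sum to slightly more than one, so the two routes to the probability of Brexit do not agree exactly; averaging them is one simple way of reconciling the two figures.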

In order to increase the accuracy of our prediction, we can call upon a "wisdom of the crowds effect" and repeat the calculation for multiple bookmakers before taking the average. Doing this across 20 different bookmakers gave an overall estimate of p=0.272 for the probability of a vote in favour of Brexit.

We model the proportion of EU referendum votes in favour of Brexit, X, as a Beta distribution under the Bayesian paradigm [8]. We set the mean value to be equal to the projected proportion from the opinion polls. The parameters a and b of the fitted Beta distribution therefore satisfy E[X]=a/(a+b)=0.487. Averaging the results over different bookmakers as above suggests that the probability that X is greater than 0.5 is 0.272. Though this approach again overlooks some differences of opinion amongst individual bookmakers, the effect is much reduced in comparison to the level of dispersion amongst data from opinion polls.

Figure 2: Estimated probability distribution for the proportion voting for Brexit in the EU referendum

These two statements lead to a set of nonlinear simultaneous equations that, in turn, reduce to a one-dimensional parameter search, which can be solved numerically in R. (In response to a point raised by the reviewer, the approach can also highlight occasions when the two sources of information are discordant: when the opinion polls and the bookmakers' odds disagree with each other, the numerical optimisation routine can run into problems, though this was not the case here.) Omitting the full details, a plot of the estimated probability density is shown in Figure 2. A 95% Highest Density Interval (the Bayesian analogue of a confidence interval) is (0.447, 0.528) [8]. This suggests that, with probability 0.95, the proportion voting in favour of Brexit will lie between roughly 45% and 53%.
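As a rough illustration of the one-dimensional search, the sketch below fits a Beta(a, b) distribution with mean 0.487 and P(X > 0.5) = 0.272. It is not the author's R code, whose details are omitted in the article. It is a minimal Python version built on several assumptions: the mean constraint is imposed by writing a = mean × n and b = (1 − mean) × n, the search is a bisection over the concentration n = a + b, the tail probability is computed by Simpson's rule, and the search bounds and step counts are arbitrary choices. All function names are hypothetical.

```python
import math

def upper_tail(a, b, threshold=0.5, steps=2000):
    """P(X > threshold) for X ~ Beta(a, b), via composite Simpson's rule.

    Assumes b > 1, so that the density is zero at x = 1.
    """
    # Log of the normalising constant 1 / B(a, b).
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def pdf(x):
        if x <= 0.0 or x >= 1.0:
            return 0.0
        return math.exp(log_norm + (a - 1) * math.log(x)
                        + (b - 1) * math.log(1 - x))

    h = (1.0 - threshold) / steps  # steps must be even for Simpson's rule
    total = pdf(threshold) + pdf(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(threshold + i * h)
    return total * h / 3

def fit_beta(mean, tail_prob, lo=4.0, hi=1e5, iters=60):
    """Find (a, b) with a / (a + b) = mean and P(X > 0.5) = tail_prob.

    Bisection over the concentration n = a + b. With mean < 0.5 the
    upper tail shrinks as n grows, so a tail that is too large means
    n must increase. The bounds lo and hi are assumed to bracket the root.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if upper_tail(mean * mid, (1 - mean) * mid) > tail_prob:
            lo = mid
        else:
            hi = mid
    n = (lo + hi) / 2
    return mean * n, (1 - mean) * n

# Inputs taken from the text: poll-based mean and odds-based tail probability.
a, b = fit_beta(mean=0.487, tail_prob=0.272)
print(f"a = {a:.1f}, b = {b:.1f}, P(X > 0.5) = {upper_tail(a, b):.4f}")
```

If the two inputs were discordant, for example a mean below 0.5 paired with a tail probability above 0.5, no concentration in the search range would satisfy both constraints and the bisection would simply run to one of its bounds. This is one way the disagreement between the two sources of information would show up.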

Whilst our prediction ultimately turned out to be wrong, the results are sufficient to show that the vote was always likely to be very close. The results also highlight the importance of uncertainty quantification in applied statistical work. The proportion who finally voted for Brexit, 51.9%, lies within the 95% Highest Density Interval though towards the upper end of what could have been reasonably anticipated in advance of the vote.

In conclusion, Brexit reminds us that elections and referenda are difficult to predict, that we should not be fooled by randomness, and that uncertainty should be quantified in all forecasts made. The short-term consequences of Brexit have been dramatic. The long-term consequences of Brexit remain to be seen. Bets are already being placed upon the outcome of the forthcoming Labour leadership election [9]. However, according to implied probabilities from bookmakers' odds, there is little chance of a new Labour leader being elected (current estimated implied probabilities of winning the Labour leadership election at the time of writing are: Jeremy Corbyn 0.878, Owen Smith 0.122).

Acknowledgements

The author would like to acknowledge helpful comments and criticisms from an anonymous reviewer on an earlier draft of this article. The usual disclaimer applies.

[6] Wright, G. and Rowe, G. (2011) Group-based judgmental forecasting: An integration of extant knowledge and the development of priorities for a new research agenda. International Journal of Forecasting, 27, 1-13.