We should expect the NY Times and 538 to miss 4.25 and 6.02 states on average, respectively.

In 2008, Nate Silver called 49 out of 50 states correctly, and in 2012 he got 50 out of 50 states correct, which basically cemented his status as the go-to authority on election forecasting. However, back in 2012 I argued that Silver was too conservative in his probabilities. In that post, I used FiveThirtyEight's state-by-state predicted probabilities to simulate how many states Silver was expected to get wrong. A very simple simulation showed that we should have expected Silver to get 2 or 3 states wrong, but we actually observed him getting 0 wrong.

This is essentially a hypothesis test where our null hypothesis is that the vector of state win probabilities is 100% true. We can then simulate the election based on those probabilities and count how many states a forecast got incorrect in each simulation. For example, say some forecast projected Clinton to win Pennsylvania with probability 0.75. We could then simulate from a binomial distribution with n=1 and p=0.75; if we draw a 1, the forecast got it correct, and if we draw a 0, the forecast missed (this forecast is predicting Clinton to win Pennsylvania since 0.75 > 0.5). We then simulate all of the other states in the same way and count how many states were incorrectly forecast. That count is stored, and the whole process is repeated a large number of times. This gives us a distribution of the number of states we expect each forecast to miss ASSUMING that their state win probabilities are perfectly true.

Once the election happens, we can see how many states each forecast actually predicted incorrectly and compare that to the distribution under the null. If the actual, observed number of misses is right in the middle of their distribution of misses, there is no evidence that their probabilities were wrong. However, if we expected a forecast to get 5 or 6 states wrong and they actually get 0, or we expected a forecast to get 0 wrong and they actually get 5 or 6, then we can say there is strong evidence that their state win probabilities were probably not correct.
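To make this concrete, here is a minimal sketch of that simulation in Python. The win probabilities below are made-up illustrative numbers, not any forecaster's actual values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Clinton win probabilities for a handful of states
# (illustrative only, not any forecaster's actual numbers).
probs = np.array([0.75, 0.95, 0.40, 0.60, 0.85])

def simulate_misses(probs, n_sims=5000, rng=rng):
    """Simulate elections under the null that `probs` are exactly true,
    assuming independent state outcomes, and count misses per simulation."""
    # The forecast "calls" a state for Clinton whenever p > 0.5.
    calls = probs > 0.5
    # Draw each state's outcome as Bernoulli(p), n_sims times.
    outcomes = rng.random((n_sims, probs.size)) < probs
    # A miss is any state where the call disagrees with the simulated outcome.
    return (outcomes != calls).sum(axis=1)

misses = simulate_misses(probs)
print(misses.mean())  # expected number of missed states under the null
```

Each row of `outcomes` is one parallel-universe election, so `misses` is exactly the distribution described above.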

So, this year I’m going to repeat my experiment from 2012 with six different election forecasts: NY Times (NYT), FiveThirtyEight (538), Huffington Post (HuffPost), PredictWise (PW), Princeton Election Consortium (PEC), and Daily Kos (DK). (State-by-state probabilities are collected from the New York Times’ The Upshot.)

The data that I collected is available in .csv format on my GitHub page here, and the code I used for the simulation is here (that code also contains my evaluation code using log loss and Brier score for tomorrow night). Note: that code is kind of a mess because I’m rushing to get everything done before the election.

So anyway, I simulated each of the six election forecasts 5000 times. The big caveat I need to make here is that I simulated all of these assuming independence between state outcomes, which is definitely not true, but I’m going to make that assumption for now. By making this assumption I am underestimating the variance of these distributions, but the expected values of the distributions are not affected. The six simulated distributions are below.

You can see that most of these are similar to each other, but the NYT and 538 stand out as being different. HuffPost, PW, PEC, and DK all have an expected number of states missed below 3, with 1.88, 2.91, 2.56, and 1.89, respectively. The expected number of misses for NYT is 4.25 and for 538 it is 6.02. What this means for 538, for instance, is that if the state win probabilities they are putting forth were 100% correct and we ran this election in parallel universes thousands of times, the 538 model would, on average, miss over 6 states and the NYT would miss more than 4 states. Also, the farther away what we actually observe (the true number of states missed) is from this expectation, the more evidence we have against the probabilities being true. So 538 and the NYT should be expected to miss a few states. In fact, if they don’t miss any states, that is evidence that they did a bad job with their probabilities.
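For what it’s worth, under the null the expected number of misses doesn’t even require simulation: each state contributes its miss probability, 1 - max(p, 1 - p), and expectations add whether or not the states are independent. A quick sketch, again with hypothetical probabilities:

```python
def expected_misses(probs):
    # A state is called for whoever has p > 0.5, so the miss probability
    # is the underdog's win probability: 1 - max(p, 1 - p).
    # Expectations of sums add even without independence.
    return sum(1 - max(p, 1 - p) for p in probs)

# Illustrative win probabilities, not any forecaster's actual numbers.
print(round(expected_misses([0.75, 0.95, 0.40, 0.60, 0.85]), 2))  # → 1.25
```

This is why the independence assumption only distorts the spread of the simulated distributions, not their centers.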

Assuming the independence assumption is reasonable (it’s definitely not, but it makes things simpler), a perfect night for the NYT or 538 happened only 30 and 5 times, respectively, out of my 5000 simulations (0.6% and 0.1%). HuffPost and DK both get every state correct with around an 11% chance, and PW and PEC are at 3.42% and 4.52%, respectively.
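Under that same (unrealistic) independence assumption, the probability of a perfect sweep also has a closed form: the product over states of the probability that the favored side wins, max(p, 1 - p). A small sketch with made-up numbers:

```python
import math

def prob_perfect_night(probs):
    # Every call is correct only if the favorite wins in every state;
    # under independence those per-state probabilities just multiply.
    return math.prod(max(p, 1 - p) for p in probs)

# Illustrative win probabilities, not any forecaster's actual numbers.
print(round(prob_perfect_night([0.75, 0.95, 0.40, 0.60, 0.85]), 3))  # → 0.218
```

With 50 states, even many fairly confident calls multiply out to a small chance of a perfect night, which is why a 50-for-50 result is itself evidence against the stated probabilities.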

I again think this puts Nate Silver in a tough position to look good. He’s basically predicting that he will miss, on average, about 6 states. If he actually gets everything correct, his state probabilities were likely off by a bit. Further, if he does get all of the states correct, that will be around 320 electoral votes for Clinton, which is right in the range everyone else is predicting, but they are doing so with much less uncertainty. So what Nate Silver needs to look good now is a close Hillary victory where he gets at least a few states wrong, from a mathematical perspective, of course. The public isn’t going to be as enthralled by the headline “Nate Silver gets 44 states correct, but was probabilistically more accurate than everyone else”. All the public will read is “Nate Silver missed 6 states”.

From a public perception standpoint, I think it’s basically impossible for Silver to come away from this looking good, which is unfortunate because he has done so many good things for data journalism and he almost single-handedly made statistics cool (thanks for that, by the way!). This is such a tough spot for Silver, who is far and away the most high-profile forecaster out there, because the media (and he himself, by doing good work) has set the bar so high that missing 6 states, which is what his state win probabilities imply, would be disastrous to the perception of nearly perfect prediction. On the other hand, if he does miss 6 states, I’ll be writing a blog post tomorrow about how Silver really nailed it from a probability standpoint and the other forecasters were way overconfident. And no one will really care, because all they will see is that he missed 6 states.