Thursday, October 27, 2016

A couple of days ago I was surprised that FiveThirtyEight gave Trump a 13.7% chance of winning, which seemed too high to be consistent with the state-by-state breakdowns. After reading their methodology I learned that this was due to them not assuming that state outcomes were independent. In other words, if one swing state breaks for Trump this might increase the likelihood that other swing states also break for Trump.

However, I still wanted to do the exercise to ask: what would be Hillary's chance of winning if each state's probability of winning was truly independent of one another? Let's write a program to find out!

Raw data

A couple of days ago (2016-10-24) I collected the state-by-state data from FiveThirtyEight's website (by hand) and recorded:

Maine apportions two of its electoral votes based on a state-wide vote (i.e. "Maine" in the above list) and then two further electoral votes are apportioned based on two districts (i.e. "Maine - 1" and "Maine - 2". FiveThirtyEight computes the probabilities for each subset of electoral votes, so we just record them separately.

Combinatorial explosion

So how might we compute Hillary's chances of winnings assuming the independence of each state's outcome?

One naïve approach would be to loop through all possible electoral outcomes and compute the probability and electoral vote for each outcome. Unfortunately, that's not very efficient since the number of possible outcomes doubles with each additional entry in the list:

>>>2^ length probabilities
72057594037927936

... or approximately 7.2 * 10^16 outcomes. Even if I only spent a single CPU cycle to compute each outcome (which is unrealistic) on a 2.5 GHz processor that would take almost a year to compute them all. The election is only a couple of weeks away so I don't have that kind of time or computing power!

Distributions

Fortunately, we can do much better than that! We can efficiently solve this using a simple "divide-and-conquer" approach where we subdivide the large problem into smaller problems until the solution is trivial.

The central data structure we'll use is a probability distribution which we'll represent as a Vector of Doubles:

This Vector will always have 539 elements, one element per possible final electoral vote count that Hillary might get. Each element is a Double representing the probability of that corresponding electoral vote count. We will maintain an invariant that all the probabilities (i.e. elements of the Vector) must sum to 1.

For example, if the Distribution were:

[1, 0, 0, 0, 0... ]

... that would represent a 100% chance of Hillary getting 0 electoral votes and a 0% chance of any other outcome. Similarly, if the Distribution were:

[0, 0.5, 0, 0.5, 0, 0, 0... ]

... then that would represent a 50% chance of Hillary getting 1 electoral vote and a 50% chance of Hillary getting 3 electoral votes.

In order to simplify the problem we need to subdivide the problem into smaller problems. For example, if I want to compute the final electoral vote probability distribution for all 50 states perhaps we can break that down into two smaller problems:

Split the 50 states into two sub-groups of 25 states each

Compute an electoral vote probability distribution for each sub-group of 25 states

Combine probability distributions for each sub-group into the final distribution

In order to do that, I need to define a function that combines two smaller distributions into a larger distribution:

The combine function takes two input distributions named xs and ys and generates a new distribution named zs. To compute the probability of getting i electoral votes in our composite distribution, we just add up all the different ways we can get i electoral votes from the two sub-distributions.

For example, to compute the probability of getting 4 electoral votes for the entire group, we add up the probabilities for the following 5 outcomes:

We get 0 votes from our 1st group and 4 votes from our 2nd group

We get 1 votes from our 1st group and 3 votes from our 2nd group

We get 2 votes from our 1st group and 2 votes from our 2nd group

We get 3 votes from our 1st group and 1 votes from our 2nd group

We get 4 votes from our 1st group and 0 votes from our 2nd group

The probabilityOfEachOutcome function computes the probability of each one of these outcomes and then the totalProbability function sums them all up to compute the total probability of getting i electoral votes.

We can also define an empty distribution representing the probability distribution of electoral votes given zero states:

In other words, Clinton has a 99.3% chance of winning if each state's outcome is independent of every other outcome. This is significantly higher than the probability estimated by FiveThirtyEight at that time: 86.3%.

These results differ for the same reason I noted above: FiveThirtyEight assumes that state outcomes are not necessarily independent and that a Trump in one state could correlate with Trump wins in other states. This possibility of correlated victories favors the person who is behind in the race.

As a sanity check, we can also verify that the final probability distribution has probabilities that add up to approximately 1:

This is a significant improvement over a year's worth of running time.

We could even speed this up further using parallelism. Thanks to our divide and conquer approach we can subdivide this problem among up to 53 CPUs to accelerate the solution. However, after a certain point the overhead of splitting up the work might outweigh the benefits of parallelism.

Monoids

People more familiar with Haskell will recognize that this solution fits cleanly into a standard Haskell interface known as the Monoid type class. In fact, many divide-and-conquer solutions tend to be Monoids of some sort.

... and the Monoid class has three rules that every implementation must obey, which are known as the "Monoid laws".

The first rule is that mappend (or the equivalent (<>) operator) must be associative:

x <> (y <> z) = (x <> y) <> z

The second and third rules are that mempty must be the "identity" of mappend, meaning that mempty does nothing when combined with other values:

mempty <> x = x
x <> mempty = x

A simple example of a Monoid is integers under addition, which we can implement like this:

instanceMonoidIntegerwhere
mappend = (+)
mempty =0

... and this implementation satisfies the Monoid laws thanks to the laws of addition:

(x + y) + z = x + (y + z)
0+ x = x
x +0= x

However, Distributions are Monoids, too! Our combine and empty definitions both have the correct types to implement the mappend and mempty methods of the Monoid typeclass, respectively:

instanceMonoidDistributionwhere
mappend = combine
mempty = empty

Both mappend and mempty for Distributions satisfy the Monoid laws:

mappend is associative (Proof omitted)

mempty is the identity of mappend

We can prove the identity law using the following rules for how Vectors behave:

-- These rules assume that all vectors involved have 539 elements-- If you generate a vector by just indexing into another vector, you just get back-- the other vector
Data.Vector.Unboxed.generate 539 (Data.Vector.Unboxed.unsafeIndex xs) = xs
-- If you index into a vector generated by a function, that's equivalent to calling-- that function
Data.Vector.unsafeIndex (DataVector.generate 539 f) i = f i

Equipped with those rules, we can then prove that mappend xs mempty = xs