R Code for Election Posterior Distribution From a Random Sample

I wrote a summary article a couple of years ago discussing some probability aspects of the 2012 Presidential general election, with a particular focus on exit polling. A few people have emailed me asking for the code I used in some of the examples. I have used this code since before the 2008 elections, so I’ve made several changes over the years and now use it for many projects. Here is the basic code that takes the state estimates and computes the posterior distribution of the electoral votes. This code should run “right out of the box”. The approach works for methodologies that can be treated as simple random samples, such as a landline/cell phone poll (e.g. surveying absentee voters by phone). Applying it to an exit poll is more complex, because an exit poll is actually a stratified cluster sample design. For now I am posting the simple random sample code, which can be extended to more complex designs and models.

The following snippet of code simulates the distribution of the vote using the Dirichlet probability distribution. In this instance the Beta distribution would also work well enough, since the Dirichlet is a multivariate generalization of the Beta. Using the Dirichlet allows the distribution to be built from the top two candidates plus all other third-party candidates combined. Here the third-party candidates generally make up an insignificant portion of the vote. The example data is artificially generated but is based on true data, and it can easily be replaced with any other data.
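As an illustration of this kind of simulation, here is a minimal sketch in base R. It is not the code used for the article's examples: the data frame `polls`, its values, and the helper `rdirichlet_base` are all hypothetical, and the helper draws Dirichlet samples by normalizing independent gamma draws so that no extra package is needed.

```r
# Hypothetical poll data; every value here is made up for illustration.
polls <- data.frame(
  state   = c("A", "B", "C"),
  ev      = c(10, 20, 6),       # electoral votes
  size    = c(600, 800, 500),   # number of respondents
  Rep.pct = c(47, 50, 44),
  Dem.pct = c(49, 46, 52)
)

# Hypothetical helper: n draws from Dirichlet(alpha), obtained by
# normalizing independent gamma draws (one column per category).
rdirichlet_base <- function(n, alpha) {
  k <- length(alpha)
  g <- matrix(rgamma(n * k, shape = rep(alpha, each = n)), nrow = n, ncol = k)
  g / rowSums(g)
}

set.seed(1)
n_sims <- 10000

# One column per state: 1 where the Democrat leads in that draw, else 0.
dem_wins <- sapply(seq_len(nrow(polls)), function(i) {
  counts <- polls$size[i] *
    c(polls$Rep.pct[i], polls$Dem.pct[i],
      100 - polls$Rep.pct[i] - polls$Dem.pct[i]) / 100
  # Adding 1 to each count corresponds to a uniform Dirichlet(1, 1, 1) prior.
  p <- rdirichlet_base(n_sims, counts + 1)
  as.numeric(p[, 2] > p[, 1])
})

p_dem_state  <- colMeans(dem_wins)               # per-state win probability
ev_dem       <- as.vector(dem_wins %*% polls$ev) # Democratic electoral votes per draw
ev_posterior <- table(ev_dem) / n_sims           # simulated posterior of the total
```

Each row of `dem_wins` is one simulated election, so summing the electoral votes of the states won in that row gives one draw from the posterior distribution of the electoral vote total. The sketch treats the states as independent, which is an assumption of this illustration.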

In addition to extending this code to more complex sample designs, it can be adjusted to accommodate more complex models or alternate distributions, so that other known variables can be incorporated into the model.
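One simple example of an alternate distribution is the Beta mentioned earlier. Under a Dirichlet posterior, the Democrat's share of the two-party vote follows a Beta distribution, so the head-to-head win probability can be computed directly rather than simulated. The sketch below uses made-up numbers and hypothetical variable names, and it assumes the same uniform prior (counts plus 1).

```r
# Hypothetical single-state poll (made-up numbers).
size <- 600
rep_pct <- 47
dem_pct <- 49

# Posterior parameters with a uniform prior: observed counts plus 1.
a_rep <- size * rep_pct / 100 + 1
a_dem <- size * dem_pct / 100 + 1

# Under Dirichlet(a_rep, a_dem, a_other), the Democrat's share of the
# two-party vote is Beta(a_dem, a_rep); the Democrat leads when that
# share exceeds 0.5.
p_dem_exact <- pbeta(0.5, a_dem, a_rep, lower.tail = FALSE)

# Monte Carlo check drawn from the same Beta distribution.
set.seed(1)
p_dem_sim <- mean(rbeta(100000, a_dem, a_rep) > 0.5)
```

The two values should agree up to simulation error, which is one way to sanity-check a simulation before moving to a more complex model.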

One thought on “R Code for Election Posterior Distribution From a Random Sample”

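p.win = function(state){
  # Dirichlet distribution because there can be multiple candidates.
  # rdirichlet() is not part of base R; packages such as MCMCpack or gtools provide it.
  # c() creates the vector of (Republican, Democrat, other) percentages;
  # multiplying by the sample size and dividing by 100 turns them into counts.
  p = rdirichlet(1000000,
    raw$size[state]*c(raw$Rep.pct[state], raw$Dem.pct[state],
      (100-raw$Rep.pct[state]-raw$Dem.pct[state]))/100+1)
  # Proportion of draws in which the Democrat's share exceeds the Republican's
  mean(p[,2]>p[,1])
}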

I actually want to explain it to a class of sophomore-level college students in the context of probability functions and Bayesian statistics. I would like more of an explanation of what is going on in the numerator. Thanks!
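The vector passed to `rdirichlet` in the snippet above can be unpacked one step at a time. The sketch below uses a made-up poll (500 respondents, 48% Republican, 49% Democrat) rather than data from the article, and the variable names are hypothetical.

```r
size <- 500      # made-up number of respondents
rep_pct <- 48    # made-up Republican percentage
dem_pct <- 49    # made-up Democratic percentage

# Percentages for (Republican, Democrat, everyone else).
pct <- c(rep_pct, dem_pct, 100 - rep_pct - dem_pct)

# size * pct / 100 converts the percentages back into respondent counts.
counts <- size * pct / 100   # 240 245 15

# Adding 1 to each count gives the Dirichlet parameters; the +1 corresponds
# to a uniform Dirichlet(1, 1, 1) prior over the three vote shares.
alpha <- counts + 1          # 241 246 16

# Posterior mean share for each group.
post_mean <- alpha / sum(alpha)
```

In other words, the part above the `/100` is the sample size times the reported percentages, and dividing by 100 recovers how many respondents fell into each group.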