Friday, September 12, 2008

Seat Projections

Since 538 rocks my world, I figured it might be fun to try something similar for the federal campaign, so I've developed a seat projection system that uses a probabilistic approach. Here's the explanation, in as plain English as I could put it. Feel free to e-mail me or ask for clarification in the comments section - I'm open to suggestions for improvements:

1. I've collected all the publicly available polling data released since September 1st (Nanos, Leger, Segma, Environics, Ipsos, Angus, Decima, SC, and CROP). I've only included data where regional splits are available.

2. A weighted average of the data is calculated for each region, assuming a 3 day half life for polling data. What that means is that a 3 day old sample of 500 from Quebec is weighted equally to a 6 day old sample of 1000 from Quebec - both would be counted as 250 completes. The "days old" number is based on the middle night of polling.

3. At this point, each seat is projected based on the change in the region. So if the Liberals are up 10% in Alberta (ha ha ha), they get 10% added to each seat. Sort of. In order to make the model more realistic, I've weighted the "base" for each seat 3/4 from 2006 and 1/4 from 2004, to dampen any "bizarre" fluctuations that may have happened last election due to local candidates, etc. I've also made a correction for incumbents retiring (between 2004 and 2006, the "incumbency factor" was worth 4.1% once regional changes were controlled for). And, in ridings where a by-election has been held, I've given the by-election and last election equal weighting. The important thing to remember is that, even after all this, everything gets projected to the regional numbers. So regardless of the tweaks, the numbers projected in Atlantic Canada will match the polling data.

4. And I could quit at this point and just list the projected wins and losses. But the problem with that is that a projected 2% Liberal win counts as 1 Liberal seat, as does a 30% projected Liberal win. And when you consider all the error associated with these projections, the two are definitely not equally safe. So I decided to go the simulation route.
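For anyone who wants to see steps 2 and 3 spelled out, here is a rough Python sketch. It is not the exact code behind the numbers in this post: the function names are made up for illustration, the swing is a plain additive shift, and the incumbency and by-election corrections are left out entirely.

```python
def effective_sample(n, days_old, half_life=3.0):
    """Discount a poll's sample size by its age (3-day half-life by default)."""
    return n * 0.5 ** (days_old / half_life)


def regional_average(polls, half_life=3.0):
    """Weighted average of party shares for one region.

    polls: list of (sample_size, days_old, {party: share}) tuples.
    Returns ({party: weighted share}, total effective sample size).
    """
    total = 0.0
    sums = {}
    for n, days_old, shares in polls:
        w = effective_sample(n, days_old, half_life)
        total += w
        for party, share in shares.items():
            sums[party] = sums.get(party, 0.0) + w * share
    return {party: s / total for party, s in sums.items()}, total


def riding_base(share_2006, share_2004):
    """Blend the last two results: 3/4 from 2006, 1/4 from 2004."""
    return 0.75 * share_2006 + 0.25 * share_2004


def project_riding(base, regional_base, regional_poll):
    """Assumed simplification: add the regional change to the riding's base."""
    return base + (regional_poll - regional_base)


# The example from step 2: both polls count as 250.0 effective completes.
print(effective_sample(500, 3), effective_sample(1000, 6))
```

The additive shift is the simplest way to guarantee that the riding projections move in line with the regional polling number, but it is only one of several ways to do that.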

So I ran 1000 simulated elections. In each one, the regional numbers were simulated based on the sample size for the region (using the half-life discussed above). And then each riding was given a "random shift" based on the "regional to riding variance" observed when I ran this same model on the 2006 election using 2004 data (standard error of about 4% for each riding).

So in each of these 1000 elections I've got a winner in every riding. That means I can project a "probability of victory" for each seat, and get an average number of seats won per party.
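Here is a minimal sketch of what one of these simulation runs could look like for a single region, using only the Python standard library. Several things in it are assumptions for illustration rather than the actual model: the name `simulate_region`, the normal approximation for the regional sampling error, and the independent draw for each party (which ignores the fact that shares have to add up to 100%). The 4% riding-level standard error is the figure quoted above.

```python
import random


def simulate_region(ridings, regional_share, effective_n,
                    riding_sd=0.04, runs=1000, seed=0):
    """Simulate `runs` elections in one region.

    ridings: {riding name: {party: projected share}}; every riding must
             list the same parties as regional_share.
    regional_share: {party: polled regional share}.
    effective_n: half-life-weighted sample size for the region.
    Returns ({riding: {party: win probability}}, {party: mean seats}).
    """
    rng = random.Random(seed)
    parties = list(regional_share)
    wins = {name: {p: 0 for p in parties} for name in ridings}
    seat_totals = {p: 0 for p in parties}

    for _ in range(runs):
        # Regional polling error, drawn once per simulated election.
        regional_shift = {}
        for p in parties:
            s = regional_share[p]
            se = (s * (1 - s) / effective_n) ** 0.5
            regional_shift[p] = rng.gauss(0, se)

        for name, projected in ridings.items():
            # Riding-level "random shift" on top of the regional error.
            simulated = {
                p: projected[p] + regional_shift[p] + rng.gauss(0, riding_sd)
                for p in parties
            }
            winner = max(simulated, key=simulated.get)
            wins[name][winner] += 1
            seat_totals[winner] += 1

    win_prob = {name: {p: c / runs for p, c in w.items()}
                for name, w in wins.items()}
    mean_seats = {p: total / runs for p, total in seat_totals.items()}
    return win_prob, mean_seats
```

The point of doing it this way is that a 2% projected lead and a 30% projected lead stop counting the same: the first flips in a good share of the runs, the second basically never does.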

It should be noted that this is all assuming the election is held today...I'm not predicting future shifts in popular support. It's also assuming the polling numbers are accurate. And it's not going to take into account a lot of the "unique" riding dynamics (Bill Casey, Lizzie May, etc) or different shifts that might be occurring between, say, Vancouver and rural BC.

So, based on the simulations, here are the results and graphs:

National

Conservatives: 137.9 (95% CI from 132 to 144)
Liberals: 98.5 (95% CI from 93 to 104)
NDP: 28.8 (95% CI from 25 to 32)
Bloc: 42.0 (95% CI from 39 to 45)
Independent: 0.8 (Andre Arthur is the only independent who's going to show up on this model and, regardless of whatever the projections say, he's pretty much a lock since the Tories aren't challenging him)

It should be noted that, up until a few days ago when the Decima and Nanos polls rolled in, this model was projecting a high probability of a Tory majority.

Of course, I'd love to see further breakdowns. For example, saying a party has X% support in province Y doesn't necessarily help you predict seats because there is such an urban/rural split in almost all provinces.

And some of the provinces can be split up into regions with very different voting tendencies (eg: Quebec, Ontario, BC).

This is really interesting. Are you at all correcting for pollster accuracy? While most of that may be politico devotion to Nanos, there are sampling techniques that they do do differently than others.

This model actually has the Liberals picking up seats in every region (when rounded to the nearest whole number) except Ontario. I find that encouraging. Interestingly, if there was some degree of correction for polling accuracy, the Liberals would be doing even better.

naylor - The pollster accuracy thing is tricky, because they may have changed since the last election or fluked into a good result. So, the difference is that large samples and more recent samples get a higher weighting. Since Nanos happened to release today, that did weight his sample heavier for this update, just because it was more recent.

Volman - Yup, and that's the drawback of any seat projection technique using public data unless there's more detailed released splits.

This is really interesting and I hope you continue with it during the entire election.

I'm trying to figure out where the 3.3 in Atlantic Canada would come from for the NDP. I get three ridings, Halifax, Sackville-Eastern Shore and Acadie-Bathurst but I can't figure the one they would have to win to get 4. I suspect it must be Dartmouth but they really are not competitive in New Brunswick (outside of Godin's seat) or Newfoundland.

bailey - The Atlantic numbers are skewed up a bit because the Nanos poll had a somewhat unrealistically high Liberal vote in that region (and, being more recent, it got weighted heavier).

As for Atlantic, the three Tory seats in Newfoundland project out to around 50% Liberal wins (probably higher in reality if you consider the mood there). Dartmouth also has the Libs with a high prob of victory.

But I'm not so confident predicting individual seats with this because this doesn't take into account the sub regional shifts...it really only works at the aggregate regional level in my opinion.

You have the Liberals gaining seats in Quebec. That alone makes me think that taking national polls with small regionals and extrapolating may not be the most accurate way to predict seats. Especially given the Bloc meltdown and the swing to the CPC in Quebec-only polls.

538 also includes regression results based on demographics (and economic conditions) in its model. This approach led to some strikingly accurate primary predictions, and is more stable. You could do similar stuff with past election results.

Additionally this strikes me as something that multilevel/hierarchical modeling could really get at. I think somewhere in my Bayesian notes there was something about doing multilevel Monte Carlo experiments... Essentially we have national data (economic data and national polls, which have a lower margin of error), encompassing regional data (regional polling results) encompassing riding data (riding demographics). I won't pretend I know exactly how to implement something like that (though I do know that three-level models are an absolute bitch).

Finally, it might be interesting to back-cast a model like this on the 2004 or 2006 elections - how well did it predict seat outcomes in those cases?

Oh and in the spirit of being like 538... [insert random dailykos or foxnews talking point]

H2H - I did some backcasting to 2006 based on 2004 to find the variances...that's where some of the randomness comes into play.

The regression stuff makes some sense too...I did a bit of that on my Masters project so it would be easy to incorporate, although I have some doubts that it would actually be more accurate than the 2004/2006 data.

You're right about the hierarchical stuff being a bitch, which is why I went for simulations rather than variances. I think by simulating the regional poll results for each run, and then simulating the region-to-riding shifts based on the previous results, that's been accounted for. The only question is whether or not including national numbers makes sense.

rat - If you look at the big Quebec samples like CROP, it shows the same thing in Quebec. Mainly that a drop in Bloc support helps the Liberals.

Now, the obvious failing in this is that if the Bloc support only drops in rural areas (shift to Tories), that won't win the Liberals any more Montreal seats, but even some Bloc bleeding to the NDP in Montreal could give the Liberals a few extra seats (there were a few they lost closely last time).

Sorry dude, but I think this whole process is frivolous. You have a lofty goal, but statistics is a tricky business, and these numbers are even more meaningless than the individual polls themselves... Nothing personal, love the blog, just calling it as I see it.

I think Outremont is a safe hold for the NDP (I averaged by-elections and general elections together so it probably understates Mulcair's influence). But so is Andre Arthur, especially since the Tories aren't running against him.

If there was an NDP surge in Quebec, this model probably wouldn't catch it, just like the CPC surge in Quebec last time wouldn't have been caught - mainly because the baseline is so low.

But, if I were predicting it, I can't see the NDP winning more than 2 seats in Quebec. Maybe 3 if the stars align.

If these projections hold true, how long before the knives come out for Dion? Can he survive a loss at the polls? Will Iggy and Rae attach themselves to this brick? Or will the brick grow into a balloon?

I really like your approach here, CG - I don't see any alternative to your bootstrap method for variance estimation, and it would probably be the only way to incorporate more information like riding-by-riding demographics. I'm not sure that you necessarily want to go there, though, as you may end up with a certain overfitting.

Wow. What an interesting take on seat projections. I am geeking out bigtime as I enjoy your post. Heads up for link love via the EP blog - thanks for linking to us & I'd love to hear any suggestions to take it to the next level.

Your fan,
Meegs