Thursday, March 3, 2011

Seat projection methodology

When individual polls are reported upon at ThreeHundredEight.com, seat projections are often reported as well. The following is a detailed description of that seat projection methodology. This is the same methodology used during and in the run-up to election campaigns, with some minor differences.

If you are looking for the seat projection methodology employed for the 2015 federal election, please see here.

The results of each poll can be plugged into the seat projection model to give likely results, based on the premise that the results of an election would be exactly the same as the results of the poll. Assuming that to be the case, the seat projection model has a margin of error of only 3.4 seats per party and makes the right call in each riding 85% of the time.

At its core, the seat projection model uses a simple proportional swing method based on the difference between the results of the last election and current polls. Put simply, if a party managed 20% in a given region in the previous election and is now polling at 40% in that same region, their results in each individual riding would be doubled. The image below shows how this method would have estimated the NDP's support in the riding of Trinity-Spadina in the 2011 election.

This swing is applied to every party in each riding. As this will sometimes result in total support of more or less than 100%, the numbers are adjusted upwards or downwards proportionately to equal exactly 100%.
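
The swing and the rescaling step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the party labels and vote shares are made up rather than taken from any real riding, and it is not the site's actual code.

```python
def proportional_swing(prev_riding, prev_region, poll_region):
    """Scale each party's previous riding share by its regional swing ratio."""
    swung = {}
    for party, share in prev_riding.items():
        ratio = poll_region[party] / prev_region[party]
        swung[party] = share * ratio
    return swung


def normalize(shares):
    """Rescale shares proportionately so that they sum to exactly 100."""
    total = sum(shares.values())
    return {party: 100.0 * share / total for party, share in shares.items()}


# Hypothetical figures, in percent (not real election data).
prev_riding = {"A": 60.0, "B": 30.0, "C": 10.0}   # last election, this riding
prev_region = {"A": 40.0, "B": 40.0, "C": 20.0}   # last election, whole region
poll_region = {"A": 30.0, "B": 30.0, "C": 40.0}   # current regional poll

raw = proportional_swing(prev_riding, prev_region, poll_region)
projection = normalize(raw)

for party in projection:
    print(party, round(raw[party], 1), round(projection[party], 1))
```

Party C goes from 20% to 40% in the region, so its riding share doubles, as in the example in the text. The raw numbers here add up to less than 100, so every party is scaled up by the same factor in the final step.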

This model is in contrast to the uniform swing method popular in the United Kingdom. With that method, in the example of Trinity-Spadina the NDP's increase of 7.4 percentage points in Ontario would have simply been added to the NDP's result in 2008 in Trinity-Spadina, estimating that the party would have captured 48.3% of the vote instead of 57.7%, as proportional swing would suggest. In this one case, that would put the error of uniform swing at about double the error using the proportional swing method.
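
The difference between the two methods can be sketched as follows. Two values in this sketch are not stated in the post and are inferred from its figures: the NDP's 2008 result in Trinity-Spadina (48.3 minus 7.4) and the regional ratio (backed out of the 57.7% estimate). The "weak riding" at 10% is hypothetical, and the function names are made up for illustration.

```python
def uniform_swing(prev_riding, regional_change_points):
    """Add the regional change, in percentage points, to the riding result."""
    return prev_riding + regional_change_points


def proportional_swing(prev_riding, regional_ratio):
    """Multiply the riding result by the regional ratio (new share / old share)."""
    return prev_riding * regional_ratio


# Figures from the post: +7.4 points in Ontario, 48.3% (uniform), 57.7% (proportional).
change_points = 7.4
prev_trinity_spadina = 48.3 - change_points    # inferred 2008 result, not stated in the post
implied_ratio = 57.7 / prev_trinity_spadina    # inferred regional ratio, an assumption

strong_uniform = round(uniform_swing(prev_trinity_spadina, change_points), 1)

# Hypothetical weak riding where the party took only 10% last time.
weak_uniform = round(uniform_swing(10.0, change_points), 1)
weak_proportional = round(proportional_swing(10.0, implied_ratio), 1)

print(strong_uniform, weak_uniform, weak_proportional)
```

Under uniform swing the party gains the same 7.4 points everywhere. Under proportional swing the gain is large where the party was already strong and small where it was weak.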

The proportional swing method is a better estimation of how results change between elections, reflecting that a party with a large base of support in a riding is more likely to grow by larger proportions than a party with no real support. It can also perform well when parties make very large gains - with the actual province-wide results plugged into the model, it would have projected 60 seats for the NDP in Quebec to four for the Bloc Québécois, instead of the actual result of 59 to 4.

Taking other factors into account

The swing model alone, however, cannot take into account the individual characteristics of each riding. Other factors need to be taken into account.

Incumbency is the most important factor, as it applies to every riding and can have a significant effect. My own research shows that support for incumbents is far more resilient than for other candidates, and that when parties do not have incumbents on the ballot they suffer a serious loss in support. That drop equals about 10% of what the party managed in the previous election, resulting in a slip of anywhere from four to six points (all else being equal). But the incumbency effect is also determined by how a party is doing overall. My research shows that an incumbent retains more of their vote when their party's support is dropping in the region. It also shows that incumbents who have been re-elected at least once make lesser gains when a party's support is increasing in a region, while incumbents running for re-election for the first time tend to out-perform their own party's gains.

This would seem to be a reflection of the difference between a first-time incumbent and a veteran incumbent. A veteran has a more solid base of support that is harder to move in either direction, whereas a sophomore is now a much safer bet compared to when they first ran for election. They have a record of winning, whereas in the previous election they had none.

Accordingly, when a party is losing support, incumbents are given a "bonus" usually worth three to five points. When a party is gaining, sophomores are given a bonus worth about one to two points, while veterans are penalized by about that much. When the incumbent is not running for re-election, the party is penalized accordingly.
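
The incumbency rules above could be sketched along these lines. The function name, the status labels, and the default magnitudes (midpoints of the ranges quoted in the post) are assumptions made for illustration. The post does not say exactly how the size of each bonus is chosen, or at which stage of the projection it is applied.

```python
def incumbency_adjustment(prev_share, party_gaining, incumbent_status,
                          losing_bonus=4.0, gaining_shift=1.5,
                          open_seat_rate=0.10):
    """Return an adjustment, in percentage points, for one party in one riding.

    incumbent_status is "sophomore", "veteran", or "none" (incumbent not running).
    The default magnitudes are hypothetical midpoints of the ranges in the post.
    """
    if incumbent_status not in ("sophomore", "veteran", "none"):
        raise ValueError(f"unknown incumbent_status: {incumbent_status!r}")
    if incumbent_status == "none":
        # The party loses about 10% of what it managed last time.
        return -open_seat_rate * prev_share
    if not party_gaining:
        # Incumbents hold up better when the party is dropping in the region.
        return losing_bonus
    if incumbent_status == "sophomore":
        # First-time incumbents tend to out-perform the party's gains.
        return gaining_shift
    # Veterans make lesser gains when the party is rising.
    return -gaining_shift


open_seat = incumbency_adjustment(45.0, party_gaining=False, incumbent_status="none")
veteran_rising = incumbency_adjustment(45.0, party_gaining=True, incumbent_status="veteran")
sophomore_rising = incumbency_adjustment(45.0, party_gaining=True, incumbent_status="sophomore")
print(open_seat, veteran_rising, sophomore_rising)
```

For a party that took 45% last time, the open-seat penalty works out to about 4.5 points, which falls inside the four-to-six-point range mentioned earlier.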

New to 2015 is a factor for leaders. These are special incumbents. When a leader is running for re-election in the same riding as before, they lose far less support than other incumbents when the party is dropping overall. At the same time, however, they gain less support than other incumbents when the party is increasing overall.

When an MP is running for the first time as leader, but not for the first time in his or her riding, the bonus is even larger when the party is dropping in support overall. When the party is gaining, new leaders see a larger boost than other incumbents. This also goes for leaders running for the first time both as a leader and in a given riding.

And similarly to when an incumbent MP decides not to run again, there is a steep penalty when a leader vacates a riding, either because they lost it in the previous election or because they are not running for re-election.

Star candidates improve their party's performance in the vast majority of cases. Their classification, however, is one of the purely subjective aspects of the model, as I have to determine whether a candidate should be considered a "star" or not. This is usually quite obvious. One of the biggest determining factors is whether a candidate is widely treated as a star by the media, which has its own effect on how voters perceive the candidate, or by the party itself, in terms of the profile and resources it pours into the candidate's riding. Star candidates are usually former MPs or cabinet ministers, party leaders, or well-known figures from the private sector.

Floor crossing is difficult to take into account, and has been dropped as a factor in 2015. Instead, the floor-crosser causes a no-incumbent penalty to be applied to the party the candidate crossed from, and a star candidate bonus is applied to the crosser.

The presence of independents can also be difficult to model. If an independent politician is running for re-election as an independent, their vote is dropped marginally from the previous election, as has occurred in other cases. The same penalty is applied to popular independent candidates who were never elected. Politicians who left or were forced out of their party caucuses and are running for re-election as independents are treated differently. Based on an analysis of previous cases, these candidates take a proportion of their vote share from the previous election based on the circumstances of their departure from caucus. Those who depart for positive reasons retain much more of their support than those who leave in disgrace. When the circumstances are hard to define, an average proportion is used. A no-incumbent penalty is applied to the party the candidate left.

By-elections are also taken into account. When the result of a by-election was significantly different from the results of the previous general election, the proportional swing is applied to the by-election's results based on how current polling levels differ from where the parties stood in the polls at the time of the by-election.
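
A rough sketch of the by-election adjustment, assuming it works like the regular proportional swing but with the by-election as the baseline. The function name and all figures are hypothetical, and the threshold for what counts as "significantly different" is not specified in the post, so it is left out here.

```python
def byelection_projection(byelection_result, polls_at_byelection, polls_now):
    """Swing by-election shares by the change in polling since the by-election."""
    swung = {
        party: share * polls_now[party] / polls_at_byelection[party]
        for party, share in byelection_result.items()
    }
    total = sum(swung.values())
    # Rescale so the riding adds up to exactly 100.
    return {party: 100.0 * share / total for party, share in swung.items()}


# Hypothetical figures, in percent.
byelection_result = {"A": 45.0, "B": 35.0, "C": 20.0}
polls_at_byelection = {"A": 30.0, "B": 40.0, "C": 20.0}
polls_now = {"A": 36.0, "B": 32.0, "C": 22.0}

projection = byelection_projection(byelection_result, polls_at_byelection, polls_now)
for party, share in projection.items():
    print(party, round(share, 1))
```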

The particularities of an election

When necessary, the projection model takes into account the individual particularities of an election campaign. One common particularity is the presence of a new party, or a formerly fringe party running a full (or almost full) slate of candidates.

When a party is running candidates where they did not have a name on the ballot in the previous election (whether that be limited to a handful of ridings, as often occurs with smaller parties, or in the bulk of ridings, as occurred in the 2012 election in Alberta for Wildrose), the regional vote projection for the party is applied directly to the riding. For example, if a party is polling at 20% in a region it will be projected to have 20% in each riding in that region. However, that number can be adjusted by any of the factors listed above and is always adjusted when the model makes all of the projections add up to 100%. In this example, in ridings where there is little room for the party to have 20% their vote will be adjusted downwards. When there is a lot more room, the vote will be adjusted upwards. This system performed well when the real results of the 2012 Alberta election were applied: Wildrose would have been projected to win 18 seats (instead of the actual result of 17).
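
The treatment of a new party can be sketched as below. The function name and the numbers are hypothetical. The sketch assumes the other parties have already been swung, and it leaves out the incumbency and other adjustments mentioned above.

```python
def project_with_new_party(swung_existing, new_party, regional_poll_share):
    """Give the new party its regional poll share, then rescale the riding to 100."""
    shares = dict(swung_existing)
    shares[new_party] = regional_poll_share
    total = sum(shares.values())
    return {party: 100.0 * share / total for party, share in shares.items()}


# Hypothetical ridings, in percent, after swinging the established parties.
crowded = {"A": 60.0, "B": 35.0}   # little room left for a newcomer
roomy = {"A": 40.0, "B": 30.0}     # plenty of room left

crowded_projection = project_with_new_party(crowded, "New", 20.0)
roomy_projection = project_with_new_party(roomy, "New", 20.0)

print(round(crowded_projection["New"], 1), round(roomy_projection["New"], 1))
```

The new party starts at 20% in both ridings. Rescaling pushes it below 20% in the crowded riding and above 20% in the roomy one, which is the behaviour the paragraph above describes.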

Precision when there needs to be uncertainty
A seat projection for an individual poll will appear precise: a party is projected to win X seats, no more and no less, with what the poll reports. But just as the poll itself is susceptible to any number of sampling errors, the projection is susceptible to the model's own errors as well.

Seat projections need to be read with a good understanding of what we don't know. An error in the poll by one or two points could change the results in dozens of seats if a race is close enough, and that would still be within the margin of error of the poll. And an error in the modelling of a half-dozen seats could mean the difference between winning or losing an election. The seat projection for individual polls is a best guess and the most likely outcome - but is no more foolproof than the poll it is based upon. A grain of salt, which needs to be taken with every poll, also needs to be taken with the seat projections.

36 comments:

This sort of deep projection is always what I hoped threehundredeight would eventually offer. I first found 538 because I had read Nate Silver's work on baseball projection for some years prior to 538's founding, and I thought it was a terrific field to which to apply the same sort of statistical approach.

And from 538, I heard about you. I knew that the level of data available to you wasn't anything like what Nate got to use at 538, but I hoped that you would ultimately create a projection system that would call races riding by riding.

Looking at your current projections, other than a couple of decisions you make that seem odd, I would say you have 3 that seem wrong to me and then there are several that really will be toss ups and can not be accurately projected before the election.

In general I think your model is getting there but with 308 ridings, there is a lot of local tweaking that needs to go on.

Esquimalt Juan de Fuca was held by the MP and not the party - people here forgave him his Liberal connections and voted for him narrowly. With Keith Martin retiring it kills the Liberals. Both the Conservatives and NDP are already in full campaign mode.

North Vancouver, projecting a Liberal win is odd. The 2006 result was low for the Conservatives because the candidate was not popular. How much incumbency bonus do you give Andrew Saxton?

In Kingsway I have real trouble seeing Don Davies of the NDP losing. I assume you assigned him an incumbency bonus. Don Davies is significantly more popular than past MP Ian Waddell.

It wouldn't be quite fair to use the old model to project for 2008, as the model was based on past elections. So it will always be pretty close to past election results, since those results were what determined the seat distribution in the model.

But it would have projected 23 Conservatives, 8 NDP, and 5 Liberal seats in BC.

The new model would have projected 24 Conservatives, 9 NDP, and 3 Liberals.

Esquimalt - Juan de Fuca is a tricky one. Everyone says that it is a Martin riding and not a Liberal one. But when Martin wasn't a Liberal, the Liberals didn't do horribly (22% to 26%).

So, there is a bit of a base there. And after three elections, old habits die hard.

I'm afraid I can't just go on a gut feeling or local perception on this, as there aren't any numbers to back it up. So I have to treat it as I do every other Liberal riding without an incumbent.

As for North Vancouver, it went Liberal in 2004 and 2006 and was a close race in 2008. The Liberals were also at 30%+ in the 90s and in the 2000 election. Saxton, as a sophomore MP, gets a small incumbent bonus.

For Vancouver Kingsway, the Liberals used to hold the seat and it is difficult to know how much of their lost vote in 2008 was due to Emerson jumping ship. Davies gets the bonus as an incumbent, of course, but the NDP is not doing very well in BC. In any case, as you can see, it is a close race. A couple NDP ticks up and Liberal ticks down will put it back in the orange.

London West will almost certainly go Liberal in the next election. Doug Ferguson is an exceptionally strong candidate and very popular locally. He also has an excellent organization online and on the ground. Conservative Ed Holder has been caught misleading constituents with his regular flyers and for the most part, had very little success securing Harper favours for the riding during the stimulus spending: http://www.lfpress.com/news/london/2011/02/09/17216431.html

Can you comment on your plans for projection updates if an election comes around? Since polls will be coming out every few days if not daily, will you be doing a full update very frequently to get the most recent data into your projection?

Just as I was during the New Brunswick election, I expect to be very busy if a federal election is called. Ditto if there will be a BC election in the spring/summer, and of course for the myriad of provincial ones this fall.

My current plan is to do a resume of all the polls released in the preceding 24 hours and a projection update every morning, with the afternoon or evening open to posts on other topics.

Éric - I think you've made the right decision on Esquimalt-Juan de Fuca. Yes, there is reason to believe that the projection's result isn't necessarily accurate, but the point of building a projection model like you have is to stick with it and see how it does.

Based on the next election result, maybe you'll learn something that will allow for a tweak (maybe some adjustment that you make only for retiring MPs who previously crossed the floor - who knows?), but for now I think it's important that you leave the projection as it is until you have more data.

In 2008, the Liberals in BC hit a 26-year low at 19% (Dion + the Green Shift were the main culprits).

Over the previous 5 elections, the Liberals garnered 28% of the vote in BC. I'll wager that the Liberals will rebound in BC compared to 2008.

The Liberals' core support in BC is within the City of Vancouver and neighbouring environs as well as Greater Victoria, to a lesser extent.

The Libs in EJDF have selected a poll topping councillor from Langford and I can see the electoral dynamic in that race to be "Vote Liberal to keep out the Con", since the NDP was so far behind in 2008.

In V-K, the NDP seems to have hit a glass ceiling and the demographics are more Liberal friendly. I can also see Yuen taking that riding back for the Libs.

Same for N-V, where the incumbent MP is a social conservative and the demographics are more Liberal friendly.

IOW, I wouldn't count the Liberals out winning 1 - 3 of the aforementioned seats.

Incidentally, my argument for why you shouldn't make special adjustments for places like Esquimalt is the same argument Nate Silver used for why he shouldn't make special adjustments to his baseball projections for Ichiro Suzuki. Nate knew his projection system (PECOTA) was going to get Ichiro wrong. He knew it would be wrong every year, and he even knew why and how it would be wrong. And while he adjusted the system overall from time to time, he never made specific adjustments for atypical players, because that wouldn't allow for a fair assessment of his projections overall.

Here's an article Nate wrote about PECOTA's specific failures back in 2004:

Once you start modifying individual ridings or subjectively judging who is and isn't a star candidate, it becomes more of an exercise like the Election Prediction Project, than a statistical projection system.

Still, most of these additions, especially incumbency, are welcome. The only danger I'd caution is in over-fitting the model. You're validating the model using the same data it was constructed on, so obviously it will be effective. The real test would be to try and validate it using a different data set (i.e. 2006 election).

I think including a star candidate factor, even if arbitrarily applied, is important. There is no other way to represent the big shifts in support that star candidates bring to the table. My primary concern is with the model being accurate. Being uniform is secondary.

And taking into account individual factors (Elizabeth May, candidates dropping out, prominent independents) is very important if the model is to be taken seriously. It would be unbelievable to have Ms. May at 10% in her riding, or Helena Guergis at 1%.

As to your second point, I've done my best to avoid the issue. All of the factors are based on what happened in both the 2006 and 2008 elections, with a few of them using even older examples as well.

The tests were done only in British Columbia to, in part, mitigate the concern you bring up. The factors applied to each riding were drawn from all parts of the country, so what happened in BC is diluted to a great deal by what happened in the other provinces. That the results are still accurate for British Columbia after applying cabinet minister and incumbency increases, for example, drawn primarily from Ontario and Quebec prove to some degree that the model is sound.

And, as pointed out in the post, I have tested for the 2006 election in terms of seat calls, and the model had the same 34 for 36 result in BC. If the next election turns out to be in October 2012, I'll have time to perhaps go even further back.

Like Bernard, I question the EJF and North Van Liberal picks. With the retiring of Keith Martin, there will be many Conservatives who come home. It will be a ND/Conservative race, with the Libs coming in a very weak third. Langford is a small part of the riding and winning a few polls with the municipal level turnout doesn't mean anything.

In North Van, where I live, the Libs don't even put a candidate in place until tomorrow and none of the potential candidates are close to Don Bell - the only Liberal to ever win this riding with a 30 year history of election wins on School Board, Council, Mayor and twice as MP until Andrew Saxton beat him.

And Johnny Quest, Andrew Saxton the current Conservative MP is not a social conservative.

You said somewhere that your model doesn't have a MOE and that all you can do is compare how the model fared in previous elections. But, actually, I think you could produce an MOE for your model using Monte-Carlo simulations, and that would also be very instructive. Here are some thoughts on how:

You could add noise to each poll according to that poll's margin of error. Even if you do not have the poll's precise error distribution, you could sample from a Gaussian distribution of the same spread. If not a safe bet, at least it's a reasonable assumption. It's called the Bootstrap technique and I've used it in my work before.

Then run your model thousands of times, each time with new noise samples drawn from each poll's error distribution (or Gaussian proxy). What you get ultimately is a *distribution* of outcomes, and from that you can calculate the odds of, say, the Conservatives getting 100 seats, 110 seats, 120, etc. etc. It sounds complicated, but it's really not (get back to me if you would need some programming help!)
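
A minimal Python sketch of the simulation described in this comment, under several loud assumptions: the poll numbers are made up, the same Gaussian spread is used for every party (a simplification), the reported margin of error is treated as a 95% interval, and `toy_seat_model` is a hypothetical stand-in that hands out seats in proportion to votes. It is not the site's seat projection model. Drawing from a Gaussian like this is closer to a parametric Monte Carlo simulation than to a resampling bootstrap.

```python
import random


def toy_seat_model(shares, total_seats=308):
    """Hypothetical stand-in: seats proportional to vote share (not the real model)."""
    return {party: round(total_seats * share / 100.0) for party, share in shares.items()}


def simulate_seats(poll, margin_of_error, seat_model, runs=10_000, seed=0):
    """Run the seat model many times on noisy copies of one poll."""
    rng = random.Random(seed)
    sigma = margin_of_error / 1.96   # assume the reported margin is a 95% interval
    outcomes = []
    for _ in range(runs):
        # Add Gaussian noise to each party, never letting a share go negative.
        noisy = {p: max(0.0, rng.gauss(share, sigma)) for p, share in poll.items()}
        total = sum(noisy.values())
        noisy = {p: 100.0 * share / total for p, share in noisy.items()}
        outcomes.append(seat_model(noisy))
    return outcomes


def probability_at_least(outcomes, party, seats):
    """Share of simulated outcomes in which the party wins at least `seats`."""
    hits = sum(1 for outcome in outcomes if outcome[party] >= seats)
    return hits / len(outcomes)


# Made-up poll, in percent, with a made-up margin of error of 3.1 points.
poll = {"CPC": 38.0, "LPC": 27.0, "NDP": 18.0, "BQ": 10.0, "GPC": 7.0}
outcomes = simulate_seats(poll, 3.1, toy_seat_model, runs=5000, seed=42)

for threshold in (100, 110, 120):
    print(threshold, probability_at_least(outcomes, "CPC", threshold))
```

Swapping `toy_seat_model` for a real riding-by-riding model would give the distribution of seat outcomes the commenter has in mind.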

I think this is more or less what Nate Silver used in his 2008 model of the American presidential elections. It'd be nice to see on here, but I realize you only have 24 hours in a day, some of which should be spent sleeping!

How much weighting does a "Star Candidate" get for each riding? My riding, Sydney-Victoria, has a new Federal candidate (Cecil Clarke) who was previously in the Provincial gov't as house speaker, minister of transport, and Attorney-General.

I agree with Bernard and George on thinking you should take another look at Esquimalt-Juan de Fuca using some of your own methodologies.

I think you could apply New Star Candidate and Individual Oddities to this riding.

1. New Star Candidate: Garrison's name is widely known in the community. As you may know he has run before and beat Conservative candidate Troy DeSouza, and even came very close to defeating incumbent Keith Martin. Lillian Szpak is relatively unknown to those outside of Langford. Added to this is the fact that Jack Layton (another star factor) has come to the riding twice now to promote the NDP, at one of which he held a very large rally.

2. Individual Oddities: As previously mentioned, the history of the riding is very NDP/Conservative split. University of Victoria professor and Victoria's local expert on elections, Denis Pilon, believes the voters in EJDF will revert back to their traditional voting patterns: http://www.youtube.com/watch?v=zjVfOMmLE1A#t=1m07s. Other local experts have also weighed in that this is a contest between Randall Garrison and Troy DeSouza (see: http://tinyurl.com/6bgd8k8 & http://tinyurl.com/3g79lll & http://tinyurl.com/3zp3mya & http://tinyurl.com/3bovwhc )

You've applied similar changes to other ridings, so I would ask you use local knowledge to do the same in this case. Thank you.

Yes there are a few floor-crossers that resulted in similar situations:

-David Emerson, elected to Vancouver Kingsway, crossed the floor from Liberal to Conservative in 2006. This is perhaps the most similar to Esquimalt Juan de Fuca as the riding is considered to be historically very left-leaning. In 2008 he did not seek re-election and the riding reverted back to its historical patterns and elected an NDP candidate.

-Belinda Stronach, of Newmarket-Aurora, crossed the floor from Conservative to Liberal and won re-election after crossing. When she left politics in 2008, the riding reverted back to Conservative.

It appears tooclosetocall.ca, which uses a similar model to yours (i.e. star factor and individual oddity) has manually edited their projection for EJdF. He has a specific blog post outlining how it was done. http://www.tooclosetocall.ca/2011/04/modifications-to-esquimalt-huan-de-fuca.html#idc-container

As you see, my numbers are very close to yours even before I factor in the current sophomore incumbent (actually the differences may be related to unknown independents who were not there 4 years ago and only got 0.25% in 2008). Given that the provincial NDP support is a bit stronger than the 2011 result and that Mrs Blanchette is the incumbent, my guess is that she might increase her lead. Also, the baseline for the Liberals (Mr Patry in 2011) is no longer there; that vote should somehow be penalized to reflect that he was the incumbent and got votes for that reason.

