Andrew Scheer is the new leader of the Conservative Party. His victory is a relatively big surprise. In terms of odds, I had him at around a 20% chance (actually 25% the last time I ran the simulations, but I never published those odds, so that's not fair). That's more than what the BC Liberals had in BC in 2013 (15%) and less than what Nate Silver was giving Trump (around 30%). This should put things in perspective.

As usual after a mistake or error of the projections, the question is: what could I have done differently? In this case, honestly, not much. Some people have called this race unpollable and while they will feel vindicated by the fact Bernier lost, I think this is an overstatement. The fact is that polls (and fundraising data) were able to predict this race fairly well. Except in one province.

So, why the surprise? The short version is that the farmers in rural Quebec managed to gather against Bernier (who wanted to end supply management). I was aware of the movement against him but data was still showing a large and comfortable lead in Quebec for Bernier. We can also mention the social conservatives (So-Cons) whose second choices ultimately helped Scheer. But the impact was much smaller than the under-performance of Bernier in Quebec. Plus, the model was actually factoring in the fact that Trost and Lemieux supporters would go more towards Scheer than Bernier.

Here below you have the results by province (the main ones) for Scheer and Bernier for the first round, compared to the latest Mainstreet polls as well as my own numbers (which were a mix of fundraising and polling data).

Mainstreet was closer (which makes sense given that they had access to data I didn't, like the number of members per riding). Still, you can see that Scheer beat the polls/projections in many places, including Ontario and BC but especially Quebec (and Bernier did worse). By the way, it's surprising that Mainstreet was closer overall but was giving Scheer lower chances of winning than I was (they were giving him 15%). I think their model simply had less uncertainty than mine. Also, remember, my model was free while theirs cost $100 a month just to access (at the same time, I thank them for at least providing some polls).

Let's focus on Quebec alone. In terms of points, the polls and projections (averaged) were predicting Bernier to get 3826 points and Scheer 1275, a lead of over 2500 points (in the first round and in Quebec alone). What actually happened is Bernier got only 3073 points and Scheer got 2161, a lead of only 912!
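The size of the miss is easier to see as arithmetic. Here is a quick sketch using only the four point totals quoted above (the variable names are mine, nothing else is assumed):

```python
# Quebec first-round points, as quoted in the text.
projected = {"Bernier": 3826, "Scheer": 1275}
actual = {"Bernier": 3073, "Scheer": 2161}

projected_lead = projected["Bernier"] - projected["Scheer"]
actual_lead = actual["Bernier"] - actual["Scheer"]
lead_lost = projected_lead - actual_lead

print(projected_lead, actual_lead, lead_lost)
print(f"share of the projected lead that vanished: {lead_lost / projected_lead:.0%}")
```

In other words, close to two thirds of the projected Quebec lead was gone on the first ballot.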

This, right here, is why Andrew Scheer is now the leader and not Bernier. As a matter of fact, Bernier could have won with around 160 more votes in some key rural ridings in Quebec (in some, each vote represented around 2 points given the low number of members!). Things get worse for Bernier when you realize that he actually lost his own riding in Beauce to Scheer! It's literally possible that 100 votes in his riding (plus 100 elsewhere in Quebec) could have decided this race!
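To see why a handful of votes can matter so much, here is a minimal sketch of how a riding's 100 points get split in proportion to the votes cast there. The riding and its vote counts are hypothetical (50 valid ballots, chosen so that one vote is worth 2 points), and `riding_points` is my own helper name:

```python
def riding_points(votes):
    """Split a riding's 100 points in proportion to the votes cast in it."""
    total = sum(votes.values())
    return {candidate: 100 * v / total for candidate, v in votes.items()}

# Hypothetical small riding with 50 valid ballots: each vote is worth 2 points.
before = riding_points({"Scheer": 26, "Bernier": 24})
# One single voter switches sides.
after = riding_points({"Scheer": 25, "Bernier": 25})

margin_before = before["Scheer"] - before["Bernier"]
margin_after = after["Scheer"] - after["Bernier"]
print(margin_before, margin_after)
```

A single switch moves the margin in that riding by 4 points, which is the same effect as dozens of switched votes in a riding with a large membership.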

So again, was it predictable? Hindsight is 20/20 but we simply did not have any indication that Bernier would win Quebec by so little. Not the polls, not the fundraising. I'm sure there are people out there saying "I called it". Well, good for them. But allow me to be skeptical. Yes we knew the farmers didn't like Bernier's plan but to think that this would be enough to cut his lead (in points) by more than half? Never.

So Bernier did worse than expected in the first round and Scheer much better. Overall however, I think the polls and projections did fairly well. Look at the last update I posted on Twitter on Saturday morning:

I think it's pretty close overall. Lemieux is too high and I knew it (I mentioned it) and Trost too low. Scheer is slightly outside of the confidence interval but remember that intervals are at 95%, not 100%.

The second part of the story is how Scheer got more 2nd choices (or 3rd, 4th, etc; I'll use the term 2nd choices or votes from now on). But as predicted, there wasn't a massive and systematic transfers towards Scheer and against Bernier. Look at this table from Andrew Coyne:

There are only really 2-3 cases where the transfers really helped Scheer: from Lemieux and Trost (and O'Toole but at this point, some of his votes were coming from Lemieux and Trost). I tweeted that Bernier's lead was stable at 6-7 points for like 6 rounds.

The model was predicting this. For instance, we had Trost and Scheer as the main second choices of Lemieux supporters. Qualitatively the model was sound. Quantitatively? It was close to the transfers observed in the actual results.

Maybe the one mistake I made was not updating the second choices enough. For instance, once Lemieux was eliminated and a lot of his supporters were with Trost, I could have made it such that even more Trost supporters would have Scheer as second choice. I didn't do that. Why? Well, first of all, these second choices were tricky to estimate and I had very limited data. Secondly, Bernier's lead was supposed to be big enough (thanks to Quebec) to survive a transfer of votes going 60% for Scheer and 40% for Bernier.

The 60-40 split is exactly the split observed yesterday between the first round and the 13th. Some of these transfers can be due to attrition (some ballots were dropped since they only had 1 or 2 choices). My simulations, on average, were increasing Scheer's points total by 9489 points between the 1st and the 13th round while Bernier's total was only increasing by 6243. In percentage, this gives 60% to Scheer and 40% to Bernier! So really, if you are looking for a reason as to why my projections failed, the redistribution of votes isn't where it happened.
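The 60-40 figure follows directly from the two simulated gains quoted above. A short sketch, where only the two gains come from the text and the variable names are mine:

```python
# Average simulated gains between the 1st and the 13th round, as quoted above.
scheer_gain = 9489
bernier_gain = 6243

pool = scheer_gain + bernier_gain     # points picked up by the two finalists
scheer_share = scheer_gain / pool     # Scheer's share of those points
net_swing = scheer_gain - bernier_gain

print(f"Scheer {scheer_share:.1%}, Bernier {1 - scheer_share:.1%}, net swing {net_swing}")
```

The net swing is the useful number: in those simulations, a first-round Bernier lead larger than it survives a 60-40 split, and a smaller one does not.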

At the end of the day, I think the comparison to Trump is an appropriate one as far as the level of surprise of the Scheer victory is concerned (Note: do not read in this statement that I'm comparing Scheer to Trump. I'm not. Because I'm not a moron). As with Trump, the numbers were showing that Scheer could win, it just required a number of things to happen. A number of unlikely things based on the available information. The predictions failed to see the Trump victory because he needed to win a couple of States that were unlikely based on the polls. For Scheer, the fact that he did so well in Quebec (and Bernier so badly, relatively speaking) is by far the biggest reason the upset happened.

Also, let's remember that when my model gives a 20% chance of something happening, it means that, well, such an event should happen 1 out of 5 times. In other words, there is something wrong with a model where the favourite always wins. But obviously it doesn't look (or feel) good when the upset happens.
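A quick way to see this, as a sketch under the assumption that races are independent and that the favourite is given the same 80% each time (the function name and the choice of 10 races are mine):

```python
def chance_favourite_always_wins(p_favourite, n_races):
    """Probability that the favourite wins every one of n independent races."""
    return p_favourite ** n_races

p_no_upset = chance_favourite_always_wins(0.8, 10)
print(f"no upset in 10 races: {p_no_upset:.1%}")
print(f"at least one upset:   {1 - p_no_upset:.1%}")
```

With 80% favourites, a forecaster who never sees an upset over ten races is either lucky or should have been giving the favourites more than 80%.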

Mainstreet just released its latest poll for the CPC leadership that is ending this Saturday. With that said, I'm not sure a last minute poll is that useful given that most members mailed their ballots already. But based on Mainstreet's numbers, even the last minute trend is favourable to Bernier (and not to Andrew Scheer). On the other hand, we finally have a regional breakdown provided for free (i.e: without paying $100). I decided to keep using my model with the fundraising data but it convinced me to update the projections. In particular, I realized I might have overestimated Bernier in Quebec. And since this province is so important, any adjustment made there can have dramatic consequences on the overall picture.

I thus updated my projections from earlier this week (read that post for the detailed methodology). The main result remains: Maxime Bernier will be the next leader of the Conservative Party.

Look, we can do the simulations and keep in mind how tricky a leadership race is and how the electoral system (and the incorrectly filled, and therefore invalid, ballots) could create surprises, but Bernier is by far the favourite here. It'd be shocking to me if he were to lose this weekend.

The number of points is really similar but the chances have changed. Since Scheer is higher in Quebec (and Bernier lower), it makes the race slightly more competitive. Still, make no mistake, for Andrew Scheer to win, he needs the poll and fundraising data to be wrong, very wrong.

As for the distribution of possible results in the first round, here it is:

I said it before and I'll say it again: if you don't believe my projections, simply think about it: Bernier will almost surely get around 50% in Quebec. Unless his vote is incredibly inefficient (remember, each riding is worth 100 points), it means Quebec alone will provide him with around 3500-4000 points in the first round. This alone pretty much guarantees that Bernier will be ahead after the first round. This is, after all, more points than any other candidate is projected to get across the entire country, except for Scheer (and maybe O'Toole and Lemieux, who are both right around 4000 on average).
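The back-of-the-envelope version, as a sketch: the count of 78 Quebec ridings is not stated in the post (it is the figure I am assuming here, and it is consistent with Quebec holding about 23% of the points), and the calculation assumes the share is spread evenly across ridings.

```python
QUEBEC_RIDINGS = 78        # assumption, not stated in the post
POINTS_PER_RIDING = 100    # from the text

def quebec_points(share):
    """Points won in Quebec if a candidate gets `share` of the vote in every riding."""
    return share * QUEBEC_RIDINGS * POINTS_PER_RIDING

for share in (0.45, 0.50):
    print(f"{share:.0%} of the vote -> about {quebec_points(share):.0f} points")
```

An inefficient vote (big wins in a few ridings with many members, losses elsewhere) would bring the total below this simple estimate.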

Bernier will likely not win in the first round, which means that we'll go through a number of "useless" rounds where we redistribute the votes of the candidate who finished last. But there will be so few votes (or points) to redistribute that it won't change the big picture. It will only get interesting around the 7th or 8th round (so when we'll likely be eliminating Trost or Raitt).

Again, if you think I'm too confident in Bernier, remember that I introduced a ton of uncertainty into the simulations. I also made some adjustments to the fundraising data based on the polls, and those adjustments help Scheer and hurt Bernier. I even mostly used the next-to-last Mainstreet poll (instead of this one here) because it was better for Scheer (well, that's not really why: I used the next-to-last poll because I don't think a last-minute poll is useful when members have already voted).

The Mainstreet regional breakdown also shows that Bernier is ahead pretty much everywhere except in Saskatchewan (province of Scheer). Another indication of the fact Bernier is likely to win.

Some will argue that the first round isn't what really matters. Based on the ranked ballots used, second (and third, fourth, etc) choices are really important. This is only partially true. For a candidate to rally from behind, he needs to benefit from a systematic and massive transfer of the votes of other candidates. It likely won't happen here because members had to indicate all their choices at once. And sure, you can start arguing that Chong supporters might be more likely to support O'Toole than Bernier or Leitch based on policies, but remember that real life second choices are seldom that clear and "reasoned". You know, that's the same logic that says an NDP voter can't have the Conservatives as second choice. Except it happens.

I'll say this, I'm absolutely shocked that no candidate officially made a deal with another. Like telling his or her supporters to put another candidate as second choice. But hey, between this race and what Quebec Solidaire chose this weekend, it seems Canadian politicians don't like to make deals.

At the end of the day, I have no reason to believe Bernier isn't a popular second choice. And since he'll finish comfortably ahead after the first round, he will very likely not be caught and ultimately win. In my simulations, the only way Bernier loses is if the polls/fundraising data were wrong and wrong pretty much in every province.

Anyway, we'll see this weekend. Last time I tried to predict a leadership race (the PQ last year), I got the right winner (despite every single poll showing otherwise) but I was quite off in terms of percentages. At the same time, my biggest achievement in electoral prediction was when I had correctly called Stéphane Dion to win the LPC leadership. So far, leadership races have been good to me.

The Conservative Party members are choosing their new leader in 4 days. Polling data has been rare (and expensive to access). Some have even said this race was unpollable because of the closed nature of the race (only members can vote) and the electoral system chosen. This is why I decided to use the same indicator as for when I covered the PQ leadership race last year: fundraising. Mixed with the available polling data, I think Maxime Bernier is by far the number one favourite to win on the 27th.

Predicting leadership races is always tricky and people need to keep that in mind. With that said, I'd go as far as saying that his win is almost guaranteed. There are mostly three reasons for this prediction:

- Maxime Bernier is ahead in both the polling and fundraising data. And the only threat he was really facing (Kevin O'Leary) dropped and officially supported him.

- The electoral system (giving 100 points to every riding no matter how many members it has) is giving Quebec a very important role (6% of the members but 23% of the points!). Bernier will take advantage of it as he'll easily win his home province.

- Second choices are unlikely to allow one candidate trailing to climb back up because members have to indicate all their choices at once and no formal deal was made between candidates. Also, we have no reason to believe Bernier isn't doing well for the 2nd, 3rd, etc, choices. On the contrary.
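The Quebec effect in the second point can be put into a single number. A rough sketch using only the two shares quoted above, and assuming members vote at the same rate inside and outside Quebec (the variable names are mine):

```python
# Shares quoted in the text for Quebec.
quebec_member_share = 0.06
quebec_point_share = 0.23

# Points per unit of membership, inside and outside Quebec.
weight_quebec = quebec_point_share / quebec_member_share
weight_elsewhere = (1 - quebec_point_share) / (1 - quebec_member_share)
relative_weight = weight_quebec / weight_elsewhere

print(f"a Quebec member weighs about {relative_weight:.1f}x a member elsewhere")
```

This is an average: within Quebec, members in the ridings with the fewest members weigh even more.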

Before going into the details, here are the projections for this race. You have the chances of winning as well as the confidence intervals (at 95%) for the first round.

Percentages of points

The entire field is here, in order:

Compared to the Mainstreet numbers, Bernier is higher. His lead in the fundraising has always been bigger than the one he had in the polls. Also, Bernier will benefit from the efficient votes from Quebec (worth a lot of points). Scheer dominates his province (Saskatchewan) but nowhere else. Actually, if I was only going by the fundraising data, Scheer wouldn't even be top 2. But I can't ignore the rising trend observed in the Mainstreet poll for him since early April. Leitch would be the main contender to Bernier based on the fundraising data, but she has been polling consistently low with Mainstreet and she's decreasing over time. Lemieux is here and this is a little bit surprising. But he has more donors than Scheer (although he raised less money). Also, the temporal adjustments based on the trend in the Mainstreet polls (see detailed methodology below) have him increasing since April. Still, I wouldn't be surprised if the model overestimates his result but I won't modify the numbers arbitrarily. Finally O'Toole is currently in the top 5. His performance is similar to Scheer's: polling better than fundraising and on the rise since early April.

Here are the possible results after the first round. This graph should convince you of the lead of Maxime Bernier. Even in his worst case scenario, he's still ahead. And that's with simulations including a ton of uncertainty, way more than what I usually do for a typical election.

If you think the advantage of Bernier isn't as big as what is shown here, think about this: thanks to Quebec, he'll likely receive at least 4000 points from this province alone, in the first round alone (he'll get 50% of the votes if not more). That is more points than all the other candidates are projected to receive Canada-wide, except Andrew Scheer! Quebec gives Maxime Bernier a huge advantage, one that is, quite frankly, most likely enough to give him the leadership. With the redistribution of the votes of the candidates once they are eliminated, we could imagine Bernier reaching 6000 points in Quebec alone, more than a third of what is required to win overall.

Also, O'Leary was polling higher than Bernier and raising just as much. But he officially threw his support behind Bernier and it seems that at least 40% of his supporters are following through. That isn't surprising, since the fundraising data on multi-candidate donors was indeed showing a strong link between Bernier and O'Leary supporters.

Still, Bernier is highly unlikely to win in the first round. His polling/fundraising numbers are too low and there are too many candidates splitting the vote. This means the candidate who finished last will be eliminated and his/her votes redistributed. And so on. So, could Bernier "pull an Ignatieff" and win the first round but lose at the end? It's possible but highly unlikely for multiple reasons. First of all, all the data available shows that there is no "anybody but Bernier" sentiment among the CPC members. Bernier is actually quite popular as a second choice. Fundraising data on multi-candidate donors (people who gave money to more than one candidate) shows that. Polling data too. Secondly, Ignatieff lost to Dion for the Liberal leadership because the LPC was (back then) actually using delegates to vote. They had to physically be in Montreal and vote multiple times. So when Gerard Kennedy threw his support behind Dion, he could literally tell his delegates who to vote for. But the CPC race this year isn't like that at all. Members have to indicate their subsequent choices right away. And we haven't seen any formal alliance between candidates. So I think it's highly unlikely that we'll see the second choices leaning heavily towards one candidate. When Dion won, over 90% of Kennedy's delegates followed through and supported Dion. This is the type of rate you need if you finish 15 points behind and are hoping to climb back. But even if the second votes (by second, I also mean 3rd, 4th, etc) were to favour Scheer 60-40, that would likely not be enough to catch Bernier.
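The kind of transfer rate a trailing candidate needs can be worked out directly. If the trailing candidate receives a share s of the redistributed points and the leader receives the rest, the gap closes by (2s - 1) times the pool, so a tie requires s = 0.5 + lead / (2 x pool). A minimal sketch, ignoring ballots that run out of choices; the pool sizes in the example are hypothetical, only the 15-point gap comes from the text:

```python
def required_transfer_share(lead, pool):
    """Share of the redistributed points the trailing candidate needs to tie.

    lead: the leader's advantage; pool: points redistributed between the two
    (same unit for both, e.g. percentage points). Exhausted ballots are ignored.
    A result above 1 means the gap cannot be closed.
    """
    if pool <= 0:
        raise ValueError("pool must be positive")
    return 0.5 + lead / (2 * pool)

# Trailing by 15 points, with hypothetical pools of 20 and 40 points.
print(required_transfer_share(15, 20))
print(required_transfer_share(15, 40))
```

With a small pool the required rate is the kind of near-unanimous transfer that delegates can deliver and that preferential mail-in ballots rarely do.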

By the way, it seems the ballots were so complicated that as many as 20% of them will not count because they weren't filled correctly. This is absolutely insane and it could potentially create some surprises (maybe one candidate will do better than projected because this candidate will have spent a lot of time making sure his/her supporters were filling the ballots correctly!).

You can see the detailed methodology below but let me simply say this: in order for Scheer to win some of the simulations, I had to make some adjustments. In particular I adjusted my fundraising numbers (so data going up to March 31st) based on the trend observed in the polls between early April and last week. According to Mainstreet, Andrew Scheer has been increasing quite a lot during that time (note: Scheer has always polled higher than what the fundraising data suggested). I used the trend in the polls not only to boost Scheer (well, I adjusted everybody but Scheer is the one who benefited the most, along with Lemieux) in the first round, but in the other rounds as well (since I also adjusted the intentions among the 2nd, 3rd, etc, choices).

For Andrew Scheer to win, here's what needs to happen:

- Scheer to be underestimated in the polls and fundraising data. We are talking of a systematic bias here where Scheer overperforms the polls/fundraising in multiple provinces.

- Scheer to receive way more second votes than what the data suggests (and Bernier way fewer).

The path to victory for Scheer relies entirely on the polls/fundraising data being wrong. And quite wrong actually. So while it's possible, especially when we remember how unpredictable leadership races can be, it remains unlikely.

Methodology

1. Using the fundraising data available from Elections Canada, I looked at the share of donors and the share of amounts for each candidate in each province for the first quarter of 2017. Ideally I'd have looked at the riding level but then most ridings would only have a couple of data points. Province level isn't perfect but it captures some of the disproportions introduced by the electoral system. Based on my research for the various leadership races in Quebec, it's not clear whether the amounts or the number of donors is the best indicator. I thus averaged the two.

2. Given the number of points in each province (number of ridings x 100 points), I then allocate these points proportionally. As mentioned above, we would technically need to do it riding by riding. But unless one candidate has a crazily concentrated support in a few ridings in a province, my method should give us a good idea of where each candidate stands.

3. For the redistribution of the votes of the candidates that are eliminated: I used the information from the data of the donors who contributed to more than one candidate. This gave me a big matrix of cross-voting intentions. For instance, I could see how many donors contributed to both the Bernier and O'Leary campaigns. Again, far from a perfect measure but it's better than nothing.

4. Adjustments based on polls. As I said, the only good polling data was from Mainstreet. But I wasn't gonna pay $110 a month to have access (actually $1000 if I wanted to then use the data on my blog). With that said, Mainstreet published the data publicly on some occasions. One of them was for the period going from April 11 to 13 (thus not too far after the end of the fundraising data going up to March 31st). Another was April 29-30 (thus providing us with a good idea of what happened after O'Leary dropped out). Finally, they published one last week, conducted between May 11 and 14.

Maybe surprisingly, the numbers I got with the fundraising data were actually quite close to the ones from Mainstreet. Bernier would usually be higher using the fundraising data, so would Leitch, while Scheer would be a lot lower for instance. Again though, overall it was quite similar. I thus adjusted my numbers partially to the polling ones. More importantly, in order to account for the trend between early April and now, I adjusted the numbers based on the trend in the polls. I used the same adjustments for the second choices. These adjustments help Scheer quite a lot. As a matter of fact, without them Bernier was winning 100% of the simulations.

5. Simulations: the share of a candidate in a given province is randomized. For instance for Bernier in Quebec, his share is around 50%. In some simulations it's only 40% while in others it'll be 60%. The margins of error used were 7%, thus much wider than for my typical simulations for an election. Why? Because leadership races are more unpredictable.

For the 2nd (and subsequent) choices, I also randomized. So in some simulations many supporters of Leitch would then vote Bernier while in another simulation they'd go to Scheer. Here the margins of error are almost 10% because we really have limited information regarding these second choices.

I repeated this process 10,000 times, each time until one candidate reached 16901 points. I then counted the number of wins to get the probabilities of winning.
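The steps above can be sketched in a few dozen lines. This is an illustration under stated assumptions, not the actual model: the provinces, candidates, fundraising figures and cross-donor counts are all hypothetical; the poll adjustments of step 4 are left out; the noise is Gaussian; and the 7% and 10% margins are read as 95% margins, hence standard deviations of about 0.035 and 0.05. With all 338 ridings, the computed threshold is the 16,901 points mentioned above.

```python
import random

POINTS_PER_RIDING = 100

# Hypothetical inputs (illustration only): ridings per province, and
# (number of donors, dollars raised) per candidate in each province.
RIDINGS = {"P1": 200, "P2": 138}
FUNDRAISING = {
    "P1": {"A": (300, 60_000), "B": (250, 45_000), "C": (150, 30_000)},
    "P2": {"A": (100, 20_000), "B": (200, 50_000), "C": (100, 25_000)},
}
# Hypothetical counts of donors who gave to both campaigns of a pair
# (single-letter candidate names, so frozenset("AB") is the pair A and B).
CROSS_DONORS = {frozenset("AB"): 30, frozenset("AC"): 60, frozenset("BC"): 40}


def normalize(values):
    total = sum(values.values())
    return {k: v / total for k, v in values.items()}


def base_shares(province):
    """Steps 1-2: average the share of donors and the share of dollars."""
    donors = normalize({c: d for c, (d, _) in FUNDRAISING[province].items()})
    dollars = normalize({c: m for c, (_, m) in FUNDRAISING[province].items()})
    return {c: (donors[c] + dollars[c]) / 2 for c in donors}


def noisy(shares, sd, rng):
    """Step 5: perturb shares with Gaussian noise, then renormalize to 1."""
    return normalize({c: max(1e-6, rng.gauss(s, sd)) for c, s in shares.items()})


def simulate_once(rng, share_sd=0.035, transfer_sd=0.05):
    points = dict.fromkeys(FUNDRAISING["P1"], 0.0)
    for province, n_ridings in RIDINGS.items():
        for c, s in noisy(base_shares(province), share_sd, rng).items():
            points[c] += s * n_ridings * POINTS_PER_RIDING
    threshold = sum(RIDINGS.values()) * POINTS_PER_RIDING // 2 + 1
    while True:
        leader = max(points, key=points.get)
        if points[leader] >= threshold:
            return leader
        # Eliminate the last-place candidate and redistribute the points.
        loser = min(points, key=points.get)
        freed = points.pop(loser)
        # Step 3: split the freed points using the cross-donor counts.
        prefs = normalize({c: CROSS_DONORS[frozenset((loser, c))] for c in points})
        for c, frac in noisy(prefs, transfer_sd, rng).items():
            points[c] += freed * frac


def win_probabilities(n_sims=10_000, seed=0):
    rng = random.Random(seed)
    wins = dict.fromkeys(FUNDRAISING["P1"], 0)
    for _ in range(n_sims):
        wins[simulate_once(rng)] += 1
    return {c: w / n_sims for c, w in wins.items()}
```

Calling `win_probabilities()` returns one winning probability per candidate. One simplification worth flagging: every freed point is transferred to someone, whereas real ballots that run out of choices drop out of the count.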

What a night we had on Tuesday. So many close races, including some that will require a recount. As it stands, it seems the BC Liberals have won a minority but they are literally 9 votes away from a majority. On the other hand, nothing guarantees they'll keep Coquitlam-Burke Mountain after the recount.

I just wanted to do a quick comparison of the final projections and the actual (preliminary?) results. Overall, the projections did very well. Sure I projected a Liberal majority but a minority could not have been the most likely scenario, not mathematically. Why? Because there were very few combinations where this could happen. But the final projections did say it was close and it turned out that it really was. Also, some riding polls ultimately caused me to make a couple more mistakes. It's unfortunate but it happens.

Here below are the 10 mistakes made by the final projections. For each one, I tried to provide an explanation.

Explanations

Cariboo North

Former NDP riding where an independent (ex-NDP MLA) caused a split of the left vote in 2013. It means the left got almost 60% of the vote in 2013. Given that Bob Simpson didn't run this time, I adjusted the numbers for the NDP. My bad. I would do it again though as it made complete sense. Maybe this riding just followed a different trend over time (becoming more and more Liberal). I don't regret my adjustments as the estimations were showing me that.

Columbia River-Revelstoke

Technically the biggest surprise of the night as far as winning probabilities are concerned (the Liberals only had a 3% chance). But it seems some weird stuff happened there, with the NDP candidate literally being accused and convicted of defamation. I'll admit that I simply did not follow this story closely enough and it seems very unique. There as well, it could also be because the Liberals got stronger in the Interior.

Courtenay-Comox

I had the Liberals up by 5. Right now this is a
mistake but let's wait for the recount.

Cowichan Valley

This one bugs me. My model was predicting the Greens to take it with 36.5% of the vote. Then we got the riding polls from Oracle, including one in this riding showing the Green candidate in third. And all the riding polls on the Island (including the ones from Mainstreet) were showing the Greens much lower than expected. So I adjusted. It pisses me off to have missed it because I was spot on before the riding polls. This is really the biggest regret of the night as it would have looked very good for my model to predict the 3 Green seats.

Maple Ridge-Mission

I had a very close race leaning Liberals and it
ultimately went NDP (for now) by 120 votes. I don't consider this riding as a
"mistake" but the call was technically wrong.

North Vancouver-Lonsdale

Another close race that went the other way. Honestly, looking at the candidates, I'm still surprised the Liberals didn't keep this one.

Skeena

The Liberals campaigned hard in the North while Horgan didn't. It probably explains this one.

Surrey-Fleetwood

Another one caused by riding polls. Without the
Mainstreet poll in this riding, I had the NDP ahead and it wasn't even a
close race. Then the Mainstreet poll showed the Liberals well ahead so my
adjustments made this riding a close call leaning Liberals.

Surrey-Panorama

I had a close race, the actual result was a relatively easy NDP win. My bad. One of the failures of the night.

Vancouver-Fraserview

My bad as well. I had first hand information
regarding this riding and the NDP campaign and it didn't look good at all. I
would never have projected such an easy win for the NDP. Election Prediction seemed to agree with me.

So, mistakes came mostly for two reasons: a different swing in the Lower Mainland and in the Interior, and some riding polls that were simply off.

For the first part, polls were indeed showing that, albeit with a lot of variation. I knew it was likely that the NDP would do better than expected in Greater Vancouver. I tried to make adjustments but it was giving me an NDP victory and making this party win almost every close race (which is usually unlikely). Also, the riding polls from Mainstreet (and their numbers for the Lower Mainland) in Delta North and Surrey-Fleetwood were showing the Liberals much higher than expected. I had full confidence in Mainstreet and I thought these polls were indicative of a real effect (maybe incumbents would do better). I'm not throwing Mainstreet under the bus but their riding polls were off, both in the Lower Mainland and on the Island. I talked to Quito Maggi (the CEO) and he didn't know why they missed by so much. He suggested that maybe the weights based on the 2011 census were outdated. Surrey is a fast growing city after all.

Bottom line, I thought about boosting the NDP a little bit in the Lower Mainland but also boosting the Liberal incumbents. The net result was a wash compared to my adjusted projections. Therefore I didn't do anything. The riding polls not only made me change some ridings specifically, they convinced me not to do any other regional adjustments. When I saw that the Liberals were doing better in both Surrey-Fleetwood and Delta North, I thought it could be a significant signal that the NDP would not sweep Surrey as much as they would based on the regional breakdown of provincial polls. Also, notice that the model performed well in Burnaby and Coquitlam. It's really in Surrey and Richmond that it failed. Richmond is where the NDP increased the most! Not sure what happened but the NDP definitely got more popular among Chinese and South East Asian citizens.

I regret maybe not paying more attention to some key ridings in the North or the Interior as I could maybe have avoided the mistake in Skeena at least.

Overall, I'd say that 2 mistakes came from the riding polls (Cowichan Valley and Surrey-Fleetwood), 3 were simply close races that went the other way (Courtenay-Comox, Maple Ridge-Mission and North Vancouver-Lonsdale) while the other 5 were due to the different regional swing (along with some weird circumstances in Columbia River-Revelstoke).

Also, advance turnout did identify Surrey as the potential location of surprises. I wasn't sure if this was indicative of an NDP wave since the NDP was actually pretty stable province-wide. I think that in the future, if I see a region where the turnout is strongly increasing, I'll give a boost to the party that is supposed to increase there.

Overall though, as I said, I'm quite happy with the results. The probabilities also worked since 55% of the candidates projected with chances between 50 and 60% ultimately won (as it should be then!). I'll wait for the final results to do a comparison of the projected and actual percentages.
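This kind of check is easy to automate. A minimal sketch of a calibration table: group the calls by the chance given to the projected winner, then compare each group's hit rate with its probability range. The list of calls below is made up for illustration (it is not the BC data), and `calibration_table` is my own helper name:

```python
from collections import defaultdict

def calibration_table(calls, bin_width=10):
    """calls: iterable of (win_chance_in_percent, won) pairs, one per projected winner.

    Returns {bin_start: (number_of_calls, share_that_won)}.
    """
    tally = defaultdict(lambda: [0, 0])  # bin_start -> [calls, wins]
    for chance, won in calls:
        # A 100% call goes into the top bin rather than a bin of its own.
        start = min(int(chance) // bin_width * bin_width, 100 - bin_width)
        tally[start][0] += 1
        tally[start][1] += int(won)
    return {b: (n, wins / n) for b, (n, wins) in sorted(tally.items())}

# Hypothetical calls, for illustration only.
calls = [(55, True), (58, False), (52, True), (57, False),
         (65, True), (68, True), (62, False), (95, True), (99, True)]
table = calibration_table(calls)
for start, (n, hit_rate) in table.items():
    print(f"{start}-{start + 10}%: {n} calls, {hit_rate:.0%} won")
```

A well-calibrated model has hit rates that fall inside (or close to) each bin's range; with only a handful of ridings per bin, some deviation is expected from chance alone.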

Also, the biggest surprises were actually not when I made mistakes. For me the biggest ones were the high NDP vote in Richmond (where they almost won a seat) and in False Creek. In these ridings, I made the right call but it was much closer than expected.