Thursday, April 26, 2012

Margin of Error

I don't think I've ever followed an election where the polls were as horribly off the mark as they were in Alberta.

Last May, when the media jumped on the "pollsters blew it" bandwagon for not projecting a Tory majority, most companies were still within the margin of error on the final vote intent numbers. Even during the 2004 federal election, the case study in pollsters "missing" a late swing, there wasn't a poll in the final week of the campaign that had the Liberals behind (even if seat projections did), and most under-estimated Liberal support by only 3-5 points.

But last night? This wasn't just a case of shanking a field goal "wide right", but booting it in the complete opposite direction of the goal posts. Here's how the final polls stacked up with the results.

As most have commented, Forum's Sunday afternoon poll picked up part of the late swing but, even then, to go from a 2-point Wildrose lead to a 10-point PC win in under 24 hours is shocking. It wasn't just a case of last-second "strategic voting", since most polls in the final week correctly pegged Liberal and NDP support levels.

So what went wrong? I can think of 6 possibilities:

1. The polls made little effort to screen out the 43% of Albertans who didn't bother to vote on election day. Just asking respondents if they were absolutely certain to vote would have been a good start, even if few followed through on those intentions. But there are other attitudes and demographics that can help predict turnout (e.g. older people are more likely to vote), and because of a lack of transparency in how these questions were asked or weighted, we have no way of knowing what steps were taken to screen out unlikely voters.

2. Building on the above point, the Big Blue Machine may have had a superior get-out-the-vote operation to the relatively new Wildrose Party. I suspect this is part of the reason the federal Conservatives have "over-performed" the polls on election day in recent years. Still, the best GOTV operation will only bump you up a few percentage points, and it's not like the Wildrose Party was short of former Tory organizers, money, or volunteers.

3. The PCs had better candidates and more incumbents. Even though local candidates rarely have a big impact on the results (see Quebec, 2011), it's possible Albertans "voted" for the party and leader they wanted when asked that question on the survey, then considered the local candidates when they saw the names on the ballot. Still, once again, I can't imagine this would translate to more than a point or two at the province-wide level.

4. With voters growing increasingly disengaged and disinterested in the political process, it's possible many simply made up their mind in the voting booth. Since most polls only asked vote intent, there was little analysis in terms of strength of support, or where undecideds and soft voters might break before election day.

5. The most popular theory is that there was a "late swing" back to the PCs. This is borne out by the Sunday Forum poll but, even then, a 20-point swing in the margin over the course of 5 days, or a single-day 12-point swing, is almost unheard of in politics. I don't doubt there was a late shift, but from what I hear, the PC Party's internal numbers showed them in much better shape than any of the media polls, suggesting that Smith's lead was never as big as reported.

6. So how could all those polls have been wrong? Well, if you look at that table above, you'll notice that Leger was one of the closest to the final mark, despite leaving the field a week prior to the vote, before any "swing back" to the PCs was fully felt. The pollsters who overshot Wildrose support the most all used robo-diallers and online panels.

Both of those methodologies have inherent problems. You often need to make 50 to 100 robocalls to find one sap willing to complete the survey. So we know the Wildrose Party was popular with shut-ins, but that's about it. Moreover, since robocalls can only ask 5 simple questions before respondents drop off, you rarely have the opportunity to collect enough demographic information to judge how representative the sample is.

You can get those demographics using online panels, but while a national panel will have hundreds of thousands of Canadians on it, you're fishing from a much smaller pool when you get down to the Alberta level. Companies that don't frequently conduct political polling in Alberta might not have a good understanding of the biases inherent to the panel they're using, opening up the risk of skewed results.

The blame doesn't rest solely on the polling companies. The fact is robocalls and online polls are cheap to produce, and that's all the media is willing to pay for. The internal Tory polls used live callers, and asked more demographic and attitudinal questions than just vote intent - this no doubt let them verify the validity of their sample, and provided direction on what levers could cause the public to swing back to the Tory fold. There's something to be said about the old "you get what you pay for" adage, and most newspapers simply don't have the budget to invest in getting the job done right.
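The likely-voter screening issue from point 1 can be sketched numerically. This is a toy illustration with invented numbers, not any pollster's actual method: if one party's supporters report a lower likelihood of voting, weighting by that likelihood pulls the topline down by several points.

```python
# Toy illustration (invented numbers, not any pollster's actual method):
# weighting respondents by self-reported likelihood of voting can move
# the topline by several points.
respondents = [
    # (stated vote intent, self-reported likelihood of voting, 0-1)
    ("Wildrose", 0.5), ("Wildrose", 0.6), ("Wildrose", 0.4),
    ("PC", 0.9), ("PC", 0.8), ("PC", 0.95),
    ("Wildrose", 0.7), ("PC", 0.85), ("Wildrose", 0.5), ("PC", 0.9),
]

def share(party, weighted):
    """Party's share of the sample, optionally weighted by turnout likelihood."""
    total = sum(w if weighted else 1 for _, w in respondents)
    hits = sum(w if weighted else 1 for p, w in respondents if p == party)
    return hits / total

print(f"Raw Wildrose share:      {share('Wildrose', False):.0%}")  # 50%
print(f"Weighted Wildrose share: {share('Wildrose', True):.0%}")   # 38%
```

In this made-up sample, Wildrose leads on raw vote intent but trails once respondents are weighted by how likely they say they are to vote — exactly the kind of gap a poll with no turnout screen would miss.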

We'll probably never know which of the above factors were actually in play. And hell, this being Alberta, it could just be part of the deal with the devil the Alberta PCs signed long ago that ensures PC victory after PC victory.

"I hear, the PC Party's internal numbers showed them in much better shape than any of the media polls, suggesting that Smith's lead was never as big as it was reported."

I wonder what Wildrose internal numbers were showing them?

Is it not possible that there was significant Wildrose bias in the polls simply because Wildrose supporters seemed to be the most angry of the electorate and thus were most likely to bother answering the robocalls? I really wonder who bothers to answer polls anymore, what with all the constant calling regardless of my number supposedly being on a no-call list. It seems to me we're at a point where only those desperately wanting to let the world know how they're voting are going to bother, and I can't think of a more eager group than supporters of an upstart party of disgruntled former members of the ruling empire.

"So we know the Wildrose Party was popular with shut-ins, but that's about it."

"Is it not possible that there was significant Wildrose bias in the polls simply because Wildrose supporters seemed to be the most angry of the electorate and thus were most likely to bother answering the robocalls?"

Self-selected participants in polls never result in proper and significant data.

I received many robocalled polls but, unlike many others, I decided to reply knowing that I would screw them up a bit.

Is it possible that some people who had been contacted by pollsters decided to have some fun?

What I mean is that perhaps some people who would normally hang up or tell the pollster to f-off decided to tell the pollsters and their media clients what they wanted to hear. It wouldn't take that many people to do this, nor would it have to be conspiratorial.

The narrative for quite some time was that this was an election for the ages. So a few people just fed this nonsense by lying to the polling firms.

Anon - I can't imagine there being that much widespread "mischief" (I feel like most would answer "communist party" or something of the sort if they wanted to make a joke of it), but there's something to be said for people getting caught up in the "change" momentum, and then pumping the brakes when it came down to making the actual decision.

So while the Tories over-performed their poll numbers, they did so in Calgary and the rest of Alberta in particular. To me, that suggests a pattern you would expect to see as the result of a very successful GOTV drive. The Tories surged in the places where they needed to surge.

If the polls were wrong because of bad likely voter screens, it isn't clear to me why they would be more wrong in Calgary and rural Alberta than in Edmonton.

That also suggests that strategic voting can't explain the last minute PC surge. Liberal and NDP support was already low, and concentrated in Edmonton, not the places where the Tories gained most.

Personally, I believe that media outlets, through slavishly following EVERY SINGLE polling company press release, have reduced election campaigns to a horse race, which does a disservice to the electorate.

I appreciate that with the consolidation of media ownership, and excessive concern about the "bottom line" (because really, media IS a business, not a public service!), it is tempting to just take the information that one is given by polling companies.

But this in itself perhaps diminishes the importance of elections, and issues within the election. Could that be why voter turnout is lower across the board than 20 or 30 years ago?

Perhaps media outlets should invest more money into "shoe leather" and doing some really thorough and comprehensive analysis of the ISSUES!

CalgaryGrit: The pollsters who overshot Wildrose support the most all used robo-diallers and online panels.

Barely. Compared to how far off the mark all the pollsters were, the difference between types of methodology was minimal.

Don't forget the most accurate poll (Forum) used IVR, and Leger's last phone interview poll actually showed Wildrose support trending up sharply compared to their previous one.

I'm also more than a little skeptical of Bricker/Wright's attack on other polling methods, given that their company's (phone interview) poll was the least accurate predictor out of all the pollsters in last year's Ontario election.

"I think Wildrose was lucky, all things considered. One more week and they'd have been in the single-digits."

Uh, yes.

For future reference, ALL Alberta reporters should remember this: Twenty Questions Journalists Should Ask About Poll Results

1. Who did the poll?
2. Who paid for the poll and why was it done?
3. How many people were interviewed for the survey?
4. How were those people chosen?
5. What area (nation, province, or region) or what group (teachers, lawyers, contractors, PC/NDP/Lib voters, etc.) were these people chosen from?
6. Are the results based on the answers of all the people interviewed?
7. Who should have been interviewed and was not? Or do response rates matter?
8. When was the poll done?
9. How were the interviews conducted?
10. What about polls on the Internet or World Wide Web?
11. What is the sampling error for the poll results?
12. Who's on first?
13. What other kinds of factors can skew poll results?
14. What questions were asked?
15. In what order were the questions asked?
16. What about "push polls"?
17. What other polls have been done on this topic? Do they say the same thing? If they are different, why are they different?
18. What about exit polls?
19. What else needs to be included in the report of the poll?
20. So I've asked all the questions. The answers sound good. Should we report the results?

A recent Alberta poll claims that the Vancouver Canucks will win the 2012 Stanley Cup over the Boston Bruins.

Could partly be luck of the draw. I mean, they're always saying that X poll is right within Y %, 19 times out of 20. So, OK, it's more wrong than that 1 out of 20 times--and that's the old fashioned reliable kind. So the chance of a few polls all being really wrong is admittedly vanishingly tiny, but there are a lot of elections and a lot of polls. This could just be the election where lots of pollsters had bad luck in the sample.
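For reference, the "right within Y %, 19 times out of 20" fine print is the 95% margin of error, which for a simple random sample depends only on the sample size (and the assumed vote share). A quick sketch of the standard formula:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a simple random sample of size n.
    This is the 'Y %, 19 times out of 20' figure quoted with most polls;
    p=0.5 gives the widest (most conservative) band."""
    return z * math.sqrt(p * (1 - p) / n)

# Typical provincial poll sizes (illustrative, not any specific poll):
print(f"n=900: +/- {margin_of_error(900):.1%}")  # about +/- 3.3 points
print(f"n=300: +/- {margin_of_error(300):.1%}")  # about +/- 5.7 points
```

The "1 out of 20" caveat means that even a perfectly conducted poll will land outside that band about 5% of the time, which is the commenter's point: with enough polls and enough elections, the occasional collective miss is expected by chance alone (though a miss this size is another matter).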

On the other hand . . . I know that in some countries, they routinely falsify polls because there's a major party the pollsters back or want to get rid of. So e.g. in Venezuela, before you pay a gram of attention to a poll you have to check the polling company, because many of them are more enthusiastic about Chavez-bashing than they are about professionalism. I'm pretty sure that kind of thing hasn't been widespread in Canada, but significant electoral fraud hasn't been heard of in Canada for a long time either--yet here we are. So maybe a company or two was fudging the numbers for their Harpercon-connected buddies.

I am of the opinion that Allan Gregg was on the right track when he referred to some of these 'pollsters' pretending they had a representative sampling when they didn't.

All through the election published polls were based on a sample size that would be appropriate if there was a uniform response throughout the province, even though the media analysis usually referred to three separate elections in Alberta.

If you look at the post-election results, that analysis of three elections was overly simplistic. Edmonton and Calgary are not monolithic areas, but the 'rest of Alberta' shows even more variation. If you were going to pick this up in a poll, you would have needed appropriate sampling for Edmonton, Calgary, Southern Alberta rural, Northern Alberta rural, Lethbridge and Red Deer.

There has already been extensive discussion on the failures of sample populations for a variety of reasons. If you have a heterogeneous population that you are assuming is a homogeneous population, you are going to have even more problems and distortions in your forecasts.
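The sample-size point above can be made concrete with the standard margin-of-error formula. Splitting a hypothetical 900-person provincial sample across the six sub-regions the comment suggests leaves each region with a far wider error band (the regional allocations here are invented purely for illustration):

```python
import math

def moe(n, p=0.5, z=1.96):
    """95% margin of error for a simple random subsample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical allocation of a 900-person provincial sample across the
# six sub-regions suggested above (numbers are illustrative only):
regions = {"Edmonton": 250, "Calgary": 250, "Southern rural": 120,
           "Northern rural": 120, "Lethbridge": 80, "Red Deer": 80}

print(f"Province-wide   (n=900): +/- {moe(900):.1%}")  # about 3.3 points
for name, n in regions.items():
    print(f"{name:15s} (n={n:3d}): +/- {moe(n):.1%}")
```

A province-wide topline accurate to roughly 3 points can coexist with regional numbers that are only accurate to 6, 9, or 11 points, which is why treating Alberta as one homogeneous electorate hides most of the real uncertainty.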

Purple library guy: So maybe a company or two was fudging the numbers for their Harpercon-connected buddies.

Suuuure they were. Somehow all the polling companies got together and decided to put out fake results to boost Wildrose (even though that most likely hurt them when it came time to actually vote), damaging their professional reputations and future business prospects in the process.

Getting back to reality, Angus Reid has released a survey which appears to support the "late swing" theory. They found that almost 40% of Albertans made up their mind on the final weekend, including 23% on Election Day itself.

The survey also says that 43% of PC voters considered voting Wildrose at some point in the campaign.