Election Update: When To Freak Out About Shocking New Polls

Trump has gained in our forecasts on stronger swing state polling.

At 6 this morning, Quinnipiac University released a set of surveys of Florida, Ohio and Pennsylvania with the best polling news Donald Trump has gotten in a long time. In the version of the polls that includes third-party candidates — that’s the version FiveThirtyEight uses — Trump led Hillary Clinton by 5 percentage points in Florida, 1 percentage point in Ohio and 6 percentage points in Pennsylvania.

Other polls released today were kinder to Clinton: Marist surveys showed her ahead in Ohio and Pennsylvania, and Monmouth had her up 13 points in Colorado. Nonetheless, the bevy of state polls worked strongly to Trump’s benefit overall. His chances of winning the Electoral College are up to 29 percent, from 23 percent on Tuesday, according to our polls-only model. And they’re now 33 percent, up from 27 percent, in our polls-plus model, which also accounts for economic conditions. FiveThirtyEight’s forecasts are generally conservative until late in the race, so those qualify as fairly big changes by our standards.

Ordinarily, this is the point at which I’d urge a little patience. There’s been a lot of news over the past two weeks — the conclusion to the FBI’s investigation into Clinton’s emails and the Dallas shootings of police officers, in particular — and it would be nice to see how the polls settled in after a couple of slow weeks on the campaign trail. However, we’re entering a period of rapidly moving political news. Bernie Sanders endorsed Clinton only Tuesday. Trump is expected to name his VP later this week. And then we’ll have the party conventions. The prospects definitely look better for Trump than they did a week or two ago, but the landscape also looks blurrier, and it may not be until mid-August that we have a chance to catch our breath.

So for the rest of this article, I’m going to focus mostly on the Quinnipiac polls — both to explain why our model reacted relatively strongly to them, when some of the other data wasn’t so bad for Clinton, and as an example of how you might think about “unexpected” polling results as they arise over the next few weeks. If you’re a poll junkie, this situation will seem familiar. You think you have a pretty good idea of where the race stands, but then a couple of splashy polls come out that contradict that impression. You have to figure out how much weight to give the new polls relative to the data you had previously.

The FiveThirtyEight models make that calculation automatically; we just input the new data and press “go.” That’s helpful because when people (us included) rely on their intuition about how to evaluate new polls, they tend to make one of two mistakes. The more common error is to treat every new poll as a “game changer,” inventing an elaborate story about how the whole race has been upended. Often, it turns out, these interpretations don’t hold up to scrutiny: a highly touted new poll won’t move the forecast much at all, or it’s contradicted by a poll that comes out the next day.

But there’s also the potential mistake of dismissing a poll as an “outlier” and ignoring it when it provides important new information. This mistake is probably becoming more common because of the influence of sites like FiveThirtyEight. People have learned to trust the polling average more than individual polls — and that’s a good lesson. However, they sometimes take this a step too far, forgetting that the average is composed of individual polls. The average isn’t an excuse to ignore data you don’t like.

So let’s talk about what the FiveThirtyEight forecast models do when they encounter a new poll. I realize that you probably don’t have a model of your own, but I hope the thought process behind ours is valuable anyway.

Check the pollster’s track record. Has the pollster been around for a while? If so, has it tended to produce fairly accurate results? FiveThirtyEight’s pollster ratings are based on data from hundreds of elections since 1998. Quinnipiac, for example, earns a good grade; its polls have usually been more accurate than average. Thus, Quinnipiac polls are worth taking seriously — and they get more weight in the FiveThirtyEight forecast than an unknown pollster would.

Check the sample size. People underestimate the amount of noise that can be introduced into a poll by sampling error. For example, a 600-person poll has a margin of error of plus-or-minus 4 percentage points. But that number pertains to one candidate’s vote share only. The margin of error for the difference separating the candidates is roughly twice that, or almost 8 percentage points. That means in a state where Clinton is really up by 5 percentage points, about 1 in every 40 polls will show her up by 13 percentage points (!) or more. And about 1 in every 40 polls will show Trump ahead by 3 points or more. Quinnipiac’s samples are on the larger side — about 1,000 people per poll — which makes this slightly less of an issue. But the Monmouth poll in Colorado surveyed only 404 people, which makes an anomalous result more likely.
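The arithmetic above can be checked with a quick back-of-the-envelope sketch. This is a textbook calculation, not FiveThirtyEight’s exact method: it assumes the worst-case p = 0.5, a 1.96 z-score for the 95 percent level, and a lead whose standard error is twice the single-share standard error. The function names are mine.

```python
import math

def moe(n, z=1.96):
    # Worst-case 95% margin of error for a single candidate's share,
    # assuming p = 0.5 (the standard textbook formula).
    return z * math.sqrt(0.25 / n)

def prob_lead_at_least(true_lead, observed_lead, n):
    # Chance that a poll of n respondents shows a lead of at least
    # `observed_lead` when the true lead is `true_lead`. The lead's
    # standard error is roughly twice the single-share standard error.
    se_lead = 2 * math.sqrt(0.25 / n)
    z = (observed_lead - true_lead) / se_lead
    return 0.5 * (1 - math.erf(z / math.sqrt(2)))

print(round(100 * moe(600), 1))                        # 4.0 points
print(round(1 / prob_lead_at_least(0.05, 0.13, 600)))  # about 1 in 40
```

By symmetry, a true Clinton +5 shows up as Trump +3 or worse in roughly 1 poll in 40 as well.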

Check the dates. The order in which polls are released doesn’t always coincide with when they’re conducted. For instance, the polls Quinnipiac published this morning (July 13) aren’t especially recent, having been in the field for almost two weeks, from June 30 to July 11. There isn’t necessarily anything wrong with that — since it’s only July, you shouldn’t get too obsessed about recency. (A poll conducted on July 15 doesn’t inherently tell you much more about the November outcome than one conducted on July 1.) Still, check the dates before making inferences about cause and effect. For example, because many of Quinnipiac’s interviews were conducted before FBI Director James Comey’s critical comments about Clinton on July 5, it’s not clear how much his statement had to do with her poor numbers.

Check the sample population. Was the poll conducted among likely voters, registered voters or all adults? (The Quinnipiac polls were of registered voters.) In some years, such as 2010, when there was a massive “enthusiasm gap” in Republicans’ favor, this explained a lot of the difference from poll to poll. So far this year, it’s not clear whether registered voter or likely voter polls are better for Trump. Be wary, however, of polls of all adults, which can produce outlier-ish results in either direction; the FiveThirtyEight model significantly discounts these.

Check what the pollster said previously. This one’s really important. Does the pollster have a significant house effect, meaning that it consistently shows better results for one party or another? Quinnipiac does have such a house effect; so far this year, its polls have been a net of 3.6 percentage points more favorable to Trump than other surveys of the same states. This is an important mitigating factor for Clinton, although this morning’s polls still had bad news for her even if you adjust for it.1
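As a rough sketch of how such an adjustment might work: only a proportion of the house effect is subtracted, per the footnote. The 2/3 discount here is a guess implied by the footnote’s 2.4-of-3.6 example, and the function is my illustration, not FiveThirtyEight’s actual formula.

```python
def adjust_for_house_effect(margin, house_effect, discount=2/3):
    # margin: the candidate's lead in the raw poll, in points.
    # house_effect: how much this pollster has favored that candidate
    # relative to other polls of the same states.
    # Only a proportion of the house effect is subtracted; the 2/3
    # discount is a hypothetical value inferred from the footnote.
    return margin - discount * house_effect

# Quinnipiac's Florida result: Trump +5, with a 3.6-point pro-Trump house effect.
print(round(adjust_for_house_effect(5.0, 3.6), 1))  # 2.6: Trump still ahead
```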

Check the trend lines. Even better than evaluating the firm’s overall house effect, check to see how the new poll compares with previous editions of the same survey. The Quinnipiac polls all showed worsening numbers for Clinton compared with polls they conducted in the same states last month, for example. What about those Marist polls? The top-line results were fine for Clinton, but the trend lines weren’t so good. Her lead declined from 15 percentage points to 8 percentage points in Marist’s poll of Pennsylvania, for example, and from 6 points to 3 points in Ohio. By contrast, some of Clinton’s stronger polls today — particularly the Fox News and Monmouth polls of Colorado — didn’t come with trend lines, since this was the first time those pollsters had surveyed the state for the general election. The FiveThirtyEight models react strongly to these trend lines.

Check out what other polls say. Most important of all, don’t look at the polls in isolation. Twelve polling firms have released data in Florida since June, with results ranging from Trump up 5 points to Clinton ahead by 13. The polling average, even with the heavily weighted Quinnipiac poll included, suggests that Clinton’s probably still a smidgen ahead there.
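A weighted average like the one described can be sketched in a few lines. The margins and weights below are invented for illustration; FiveThirtyEight’s actual weights combine pollster rating, sample size and recency in ways not spelled out here.

```python
# Hypothetical Florida polls: (Clinton's lead in points, weight).
# A heavily weighted Trump +5 poll pulls the average down, but the
# weighted mean still lands slightly on Clinton's side.
polls = [(-5, 1.5), (2, 1.0), (3, 0.8), (1, 1.0), (4, 0.6)]
weighted_avg = sum(m * w for m, w in polls) / sum(w for _, w in polls)
print(round(weighted_avg, 1))  # 0.1: Clinton a smidgen ahead
```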

Hopefully, this gives you a sense of why the Quinnipiac polls moved the model’s numbers a fair amount while other polls didn’t. On the one hand, Quinnipiac’s polls have been Republican-leaning, and we have a lot of other polls of Florida, Ohio and Pennsylvania. Those factors limit the damage to Clinton. But pretty much everything else about them was bad for her. They came from a highly rated pollster, and they took fairly large samples. The trend lines were negative for Clinton.

And we shouldn’t neglect that the polls came in three really important states. Florida, Pennsylvania and Ohio have 67 electoral votes combined, whereas Virginia and Colorado, where Clinton had stronger polls, have 22.

If that’s the to-do list when evaluating new polls, there are also a few to-don’ts. Here’s what not to do when you see a potential “outlier.”

Don’t throw the poll out. There’s no way to draw a bright line between which polls are “outliers” and which aren’t, making it hard to avoid cherry-picking. And more often than you might think, the so-called “outlier” proves to have the right answer — the Iowa Senate race in 2014 is one such example. Historically, taking an average of polls — “outliers” included — produces a more accurate forecast than alternative measures such as looking at the median.
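To see why the mean and the median treat an “outlier” differently, here is a toy comparison with invented margins: the mean lets the outlier tug the estimate, while the median throws its information away entirely.

```python
# Invented poll margins (Clinton's lead, in points), one apparent outlier.
polls = [3, 2, 4, -5, 3, 13, 1, 2]

mean = sum(polls) / len(polls)
ordered = sorted(polls)
mid = len(ordered) // 2
median = (ordered[mid - 1] + ordered[mid]) / 2  # even-length list

print(mean, median)  # 2.875 2.5
```

Dropping the 13-point poll as an “outlier” would move the mean, too, which is exactly the cherry-picking problem described above.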

Don’t get lost in the crosstabs. Trust us — you don’t want to take the route of scrutinizing the poll’s crosstabs for demographic anomalies, hoping to “prove” that it can’t possibly be right. Before long, you’ll wind up in the Valley Of Unskewed Polls. Sample sizes are one issue. If a 600-person poll breaks out the results for men, women, Hispanics, blacks, Democrats, Republicans, older voters, younger voters and so forth, those subsamples will have extremely high margins of error, pretty much guaranteeing there will be some strange-looking results. Also, these comparisons are often circular. It might be asserted that a poll must be wrong because its demographics don’t match other polls. But no one poll is a gold standard — exit polls certainly aren’t. There are also legitimate disagreements over methodology — some polls weight by partisan identification and some don’t, for example. Although some of these debates may be important in the abstract, our experience has been that they involve a lot of motivated reasoning when raised in the middle of the horse race.

Don’t get mad because the polls disagree. There’s less to fight about when polls show similar results. But that doesn’t necessarily mean they’ll turn out to be more accurate. Instead, that consensus may reflect herding — pollsters suppressing results that they deem to be outliers, out of fear of embarrassment. Sometimes, as in the case of the 2015 U.K. general election, there’s a strong consensus among pollsters about where the race stands, and the consensus turns out to be quite wrong.

In fact, you should trust a pollster more if it’s willing to publish the occasional “outlier.” Clinton probably isn’t winning Colorado by 13 percentage points right now or losing Pennsylvania by 6 points. But the fact that Monmouth and Quinnipiac are willing to publish such results is a sign that they’re letting their data speak for itself. In the long run, that’s what leads to more accurate polling.

Footnotes

1. Specifically, the model subtracts a proportion of the house effect, but not the whole amount. In Quinnipiac’s case, for example, it adds a net of 2.4 percentage points for Clinton. She’s still behind in Florida and Pennsylvania even if you account for that, however.

Nate Silver is the founder and editor in chief of FiveThirtyEight. @natesilver538