None of this deterred reporters and analysts from frequently citing early vote data in the closing weeks of last year’s presidential campaign, very often taking it to be a favorable indicator for Hillary Clinton. On Oct. 23, for instance, The New York Times argued that because Clinton had banked votes in North Carolina and Florida, it might already be too late for Donald Trump to come back in those states:

Hillary Clinton moved aggressively on Sunday to press her advantage in the presidential race, urging black voters in North Carolina to vote early and punish Republican officeholders for supporting Donald J. Trump, even as Mr. Trump’s party increasingly concedes he is unlikely to recover in the polls.

Aiming to turn her edge over Mr. Trump into an unbreakable lead, Mrs. Clinton has been pleading with core Democratic constituencies to get out and vote in states where balloting has already begun. By running up a lead well in advance of the Nov. 8 election in states like North Carolina and Florida, she could virtually eliminate Mr. Trump’s ability to make a late comeback.

Initially, these reports on early voting were at least consistent with the polls: Clinton had led in most polls of North Carolina and Florida in mid-October, for instance. But when the race tightened after James B. Comey’s letter went to Congress on Oct. 28, early voting data was increasingly cited in opposition to the polls, with pundits and reporters criticizing sites such as FiveThirtyEight and RealClearPolitics for not incorporating early voting data into their forecasts. (It can be easy to forget now, but we spent a lot of time arguing with people who thought our forecast was too generous to Trump.)

So what happened? In North Carolina, Clinton won the early vote by 2.5 percentage points, or about 78,000 votes. Furthermore, about two-thirds of votes were cast early. But Trump won the Election Day vote by almost 16 percentage points. That was enough to give him a relatively healthy 3.6-point margin of victory over Clinton overall.

Clinton won early voting, but Trump won North Carolina

| Method                    | Trump votes | Trump share | Clinton votes | Clinton share |
|---------------------------|-------------|-------------|---------------|---------------|
| Early (mail or in-person) | 1,474,296   | 47.1%       | 1,552,203     | 49.6%         |
| Election day              | 888,335     | 55.1%       | 637,113       | 39.5%         |
| Total                     | 2,362,631   | 49.8%       | 2,189,316     | 46.2%         |

Election Day votes include provisional ballots. Shares are out of all votes cast, including third-party candidates.

Source: North Carolina State Board of Elections
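The arithmetic behind those margins is easy to verify directly. This quick sketch, using only the vote counts reported by the North Carolina State Board of Elections above, shows how Trump's Election Day margin more than erased Clinton's early-vote lead:

```python
# Vote counts from the 2016 North Carolina table above
# (N.C. State Board of Elections figures).
trump = {"early": 1_474_296, "election_day": 888_335}
clinton = {"early": 1_552_203, "election_day": 637_113}

# Clinton's raw early-vote lead ("about 78,000 votes")
early_lead = clinton["early"] - trump["early"]

# Trump's raw Election Day margin
eday_margin = trump["election_day"] - clinton["election_day"]

# Netting the two gives Trump's statewide raw-vote margin,
# which matches the Total row: 2,362,631 - 2,189,316 = 173,315.
final_margin = eday_margin - early_lead

print(early_lead)    # 77907
print(eday_margin)   # 251222
print(final_margin)  # 173315
```

The same netting works for the 2012 table that follows: Obama's roughly 129,000-vote early lead, minus Romney's Election Day margin, yields Romney's 92,004-vote statewide win.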

The Election Day surge for the GOP wasn’t anything new in the Tar Heel State, however. In 2012, President Obama had built a 129,000-vote early lead over Mitt Romney — substantially larger than Clinton’s over Trump — but had lost the Election Day vote by a huge margin, costing him the state:

Obama won early voting, but Romney won North Carolina

| Method                    | Romney votes | Romney share | Obama votes | Obama share |
|---------------------------|--------------|--------------|-------------|-------------|
| Early (mail or in-person) | 1,297,067    | 47.2%        | 1,426,129   | 51.9%       |
| Election day              | 973,328      | 55.3%        | 752,262     | 42.8%       |
| Total                     | 2,270,395    | 50.4%        | 2,178,391   | 48.4%       |

Election Day votes include provisional ballots. Shares are out of all votes cast, including third-party candidates.

Source: North Carolina State Board of Elections

So Clinton was running behind Obama’s early voting pace in North Carolina — which obviously wasn’t a good sign, given that Obama had lost the state. Why, then, had people taken the North Carolina numbers as good news for her? Actually, not everybody did. A few news outlets had pointed out that Clinton was running behind Obama’s pace there, and the Clinton campaign itself was worried about its North Carolina numbers.1

Still, early voting data can be easy to misinterpret. Early voting is a relatively new innovation. Traditions and turnout patterns vary from state to state, and they can change whenever new laws are passed, or depending on how much the campaigns emphasize early voting.2 Meanwhile, early voting numbers are reported from lots of different states at once. Many news outlets focused on a supposed turnout surge for Clinton among Hispanic voters while giving less attention to signs of decline in African-American turnout.3 The latter was actually more important than the former because blacks are more likely than Hispanics to be concentrated in swing states.

Furthermore, early voting data doesn’t necessarily provide reason to doubt the polls, because early voting is already accounted for by the polls. For instance, some North Carolina polls had shown Clinton losing the state despite winning among early voters, just as actually occurred.

So there are multiple interpretations of the data, but there’s not much empirical guidance on which one works best. That’s a recipe for confirmation bias. The Times, for instance, was exceptionally confident in Clinton’s chances from the start of the campaign onward, and early voting tended to reinforce its pre-existing views of the race.

There’s also a broader point to be made about the use and abuse of data in campaign coverage. After the election, some of the pundits who had touted Clinton’s early voting numbers as an alternative to polls claimed that “the data” was wrong and had led them astray. And the Times, which had spent a lot of time reassuring its readers that Clinton would win, wrote an article entitled “How Data Failed Us in Calling an Election.”

Whenever I see phrasing like this, I mentally substitute the near-synonym “information” for “data” and reconsider the sentence. Would the Times have published a headline that read “How Information Failed Us in Calling an Election”? Probably not, because that sounds like the ultimate dog-ate-my-homework excuse. Isn’t it the job of journalists to sort through information and uncover the real story behind it?

But the thing is, blaming “the data” usually is a dog-ate-my-homework excuse. The problem is often in assuming that because you’ve cited a number, you’ve relieved yourself of the burden of interpreting the evidence. And as we’ve described in the first few installments of this series, news outlets referenced lots of data during the general election but often misinterpreted it, almost always reading it as good news for Clinton even when there were conflicting signals. They touted early voting as favorable for Clinton, even though it hadn’t been very predictive in the past and showed problems for her in states such as North Carolina. They asserted that the Electoral College was a boon for her, even though the data showed it was Trump’s voters and not Clinton’s who were overrepresented in swing states. They highlighted Clinton’s numbers in Arizona, but downplayed data showing Clinton struggling in Ohio and Iowa, which had traditionally been bellwether states. They mostly ignored data showing an unusually high number of undecided voters, which made Clinton’s polling lead much less secure.

I don’t mean to suggest that one should have gone to the other extreme and confidently predicted a Trump victory.4 Nor do I mean to imply that interpreting election data correctly is easy; it usually isn’t. (This goes for us too: FiveThirtyEight got itself in one heck of a mess in assessing Trump’s chances in the Republican primary.) But political journalism circa 2016 was in a place where there was a lot of fetishization of “data,” but not a lot of experience with or appreciation for the tools needed to interpret it — namely, probability, statistics5 and the empirical method. That made for a high risk of overconfidence in extracting meaning from the data.

Footnotes

2. Also, early voting data is incomplete: Many states report early voting turnout statistics by party before Election Day, but they don’t actually count the votes until election night.

3. To its credit, the Times did publish one excellent article on declining black turnout numbers, although it didn’t figure much into their final analyses of the race.

4. My view, instead — even with the benefit of hindsight — is that the preponderance of the data showed that Clinton was a favorite, just not a particularly heavy favorite.

5. The term “statistics” has two common meanings. There’s statistics as in nuggets of quantified information, e.g., “Tom Brady threw for 28 touchdowns this season” and “there were 17 unprovoked shark attacks in Australia in 2016.” And then there’s statistics as in a branch of science devoted to the analysis and interpretation of data, e.g., “there’s no correlation between shark attacks in Australia and Tom Brady’s passer rating.” At FiveThirtyEight, we’re mostly interested in the latter definition of statistics — that is to say, we’re interested in statistical analysis — since statistical factoids cited without context are mostly just noise.

Nate Silver is the founder and editor in chief of FiveThirtyEight. @natesilver538