With more voter data than ever, why can’t anyone call this race?

“You can build models and you can use data science, but sometimes the conclusion can be: ‘Be careful.’”

By Zach Church
September 23, 2016

Donald Trump can win. Hillary Clinton will probably win. But high numbers of undecided and third-party voters are going to make for a volatile end to a volatile presidential campaign.

So Nate Silver isn’t calling anything yet. As of Sept. 23, Trump has about a 40 percent chance of becoming president, according to Silver’s FiveThirtyEight website. Visit the site’s election forecast for current numbers.

As in other fields, a data revolution is well underway in political polling and political campaigning. In 2012, Barack Obama spent nearly $1 billion on a campaign recognized as the most data-fueled in the history of American politics. Four years later, even mayoral candidates in small towns are using data to direct strategy, voter outreach, and campaign decisions, said Kassia DeVorsey, SB ’04, chief analytics officer at Messina Group Analytics, a data-driven political consulting firm.

Despite a wealth of data from a wide range of sources (Alvarez pointed to voter history files, survey data, social media, blogs, newspapers, and proprietary data such as consumer files, and noted that this is only part of the list), predicting the outcome of elections isn’t getting any easier, Silver said.

“As we are having more and more trouble collecting—this is a problem for society—data from random surveys, where now only about 10 percent of people respond to high-quality telephone polls … that means there’s more modeling that can take place, and modeling can be a good thing, but it means that more assumptions are introduced,” Silver said. “And this year you really see disagreement among the polls. I’d rather have this disagreement out in the open than have everyone herd together because they’re afraid to stand out. But still, it makes the polls a little bit harder to interpret.”

Silver, who correctly predicted the outcome in all 50 states and Washington, D.C., in the 2012 presidential election, doesn’t expect to repeat that feat.

“You can build models and you can use data science, but sometimes the conclusion can be: ‘Be careful,’” he said.

Meanwhile, at campaign headquarters, “I can’t really overemphasize how important modeling has become,” said DeVorsey, whose firm was founded by members of the Obama campaign staff. All the major decisions of Obama’s 2012 campaign relied on predictive modeling, she said.

DeVorsey demonstrated how political strategists can use predictive models to decide which individual voters to pursue and which neighborhoods to campaign in.

“Let’s rank everyone in Ohio from 0 to 100, and that’s the probability of how likely we think they are that they will vote for Hillary Clinton,” DeVorsey explained. “Let’s rank everyone from 0 to 100 as to their probability of going to vote, period. ... This continues on in quite a lot of different ways: we model likelihood to vote early, to vote early in person, to vote early by mail. We model people’s interest in particular ideas. Maybe I’m someone who’s strongly motivated by the idea of stopping climate change. A campaign might say then, ‘OK, we believe this person is particularly interested in one issue or another; when we communicate with them, we can focus on one issue over others.’”

“We also use these scores in the aggregate. For instance, [when] surrogates or the candidates themselves travel, go around, hold rallies, they say, ‘Where is the greatest concentration of persuadable voters? Where should I go?’ We look at the data and we suggest, ‘This particular part of town,’” she said.
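To make the aggregate use of these scores concrete, here is a minimal sketch in Python. It is not the method DeVorsey’s firm uses; it assumes hypothetical voter records that each carry a 0-to-100 support score and a 0-to-100 turnout score, treats “persuadable” as a support score between 40 and 60 (an illustrative threshold, not one from the article), and ranks neighborhoods by the expected number of persuadable voters who actually turn out. The names `Voter`, `is_persuadable`, and `rank_neighborhoods`, and the sample records, are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Voter:
    # Hypothetical record; both scores use the 0-100 scale described above.
    neighborhood: str
    support: float  # modeled likelihood (x100) of voting for the candidate
    turnout: float  # modeled likelihood (x100) of voting at all


def is_persuadable(v: Voter, low: float = 40, high: float = 60) -> bool:
    # Assumption: "persuadable" means a support score near the middle of the scale.
    return low <= v.support <= high


def rank_neighborhoods(voters: list[Voter]) -> list[tuple[str, float]]:
    """Rank neighborhoods by expected turnout among persuadable voters."""
    totals: dict[str, float] = defaultdict(float)
    for v in voters:
        if is_persuadable(v):
            # Weight each persuadable voter by their modeled chance of voting.
            totals[v.neighborhood] += v.turnout / 100
    # Highest concentration first: the "particular part of town" to visit.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up toy records, for illustration only.
voters = [
    Voter("Downtown", 52, 80),
    Voter("Downtown", 90, 95),
    Voter("Riverside", 45, 60),
    Voter("Riverside", 55, 70),
    Voter("Hillcrest", 10, 90),
]
print(rank_neighborhoods(voters))
```

Weighting by turnout rather than simply counting heads reflects the two separate scores DeVorsey describes: a neighborhood full of persuadable voters who rarely vote may be a worse stop for a rally than a smaller group who reliably show up.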

DeVorsey also acknowledged some possible negative implications of modeling by campaigns, including privacy concerns, micro-targeting resulting in unlikely voters being ignored by candidates, or data being used to promote unsavory candidates, policies, or political tactics.