Politics and statistics

March of the nerds

CAN we put the Great Forecasting Debate to rest at last? The 2012 presidential election went exactly as predicted by the leading quantitative analysts. Nate Silver of the New York Times’s FiveThirtyEight blog, Sam Wang of the Princeton Election Consortium and Drew Linzer of Votamatic all got at least 49 states right. They differed only on Florida, which all three listed as a dead heat, and which indeed turned out to be the closest race. (If it goes to Barack Obama, as seems likely, then Mr Silver and Mr Linzer will have run the table, while Mr Wang will have a single blemish on his record). Mr Silver, who has taken the brunt of the backlash over statistical methods in this campaign, has now been vindicated as the finest soothsayer this side of Nostradamus, and is enjoying a nice sales bump for his new book on the art of prediction.

Just as the criticism piled on Mr Silver in recent months was grossly misplaced, so will the praise be for his sterling showing on election night. The fact of the matter is that predicting the 2012 presidential election was hardly rocket science. By the time the voting began, the state and national polls had largely come into alignment, and Mr Obama led the RealClearPolitics polling average in every state he eventually won except Florida. Mr Silver established his reputation in the 2008 presidential primaries, when his forecasts proved impressively accurate despite highly volatile polling and voting. Since then, elections have offered far fewer surprises. As a result, there have been few opportunities to test whether the complexity of his model really adds much value compared with a simpler approach like Mr Wang’s.

But the strong performance of the publicly available polls does offer two lessons for future forecasters. The first is that pollsters’ much-criticised methodology for predicting voter turnout is working just fine. The best argument that the polls overstated Mr Obama’s support, advanced by Dan McLaughlin and Ted Frank and implemented in the “Unskewed Polls” compiled by Dean Chambers, was that they predicted a big advantage in Democratic turnout that was unlikely to materialise. In fact, exit polls show that the makeup of the electorate was almost precisely as the polls foresaw: there were a lot more Democrats than Republicans, but the independent vote went heavily for Mr Romney. This supports the interpretation offered by Josh Marshall, that a lot of voters calling themselves “independents” were really disgruntled former Republicans. (Mr Marshall speculates this group is comprised of tea-partiers who thought the GOP had gone soft; I think it’s more likely they’re moderate business-first Republicans alienated by the party’s newly strident tone). Regardless, although these voters have cast aside their party identification, they remain conservative, and preferred Mitt Romney to Mr Obama by a large margin. The conclusion is that re-weighting polls by party identification as well as demographics is a very bad idea. People can and do change their party affiliation, and if pollsters try to control for that by imposing a different turnout model on their sample, they wind up erasing the very signal—a change in the electorate’s preference—that they are trying to detect.

A second take-away is that despite Mr Silver’s reputation as an evangelist for the accuracy of polls, he probably didn’t trust them enough. The main reason why his forecast had a lower likelihood of Mr Obama being re-elected than Mr Wang’s did was that Mr Silver assigned a higher probability than Mr Wang did to the risk that the polls were simply wrong, underestimating support for Mr Romney across the board. Only once every last vote has been counted will we be able to determine exactly how close the polls were to the final tallies. But their record this year in predicting the winner in each state means there’s a good chance that forecasts four years from now will have more confidence in the polls’ reliability than Mr Silver’s did this year. That would enable forecasters to assign a high probability of victory even to a candidate with a fairly narrow lead.
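The link between trust in the polls and the probability of victory can be sketched numerically. What follows is a minimal illustration, not Mr Silver's or Mr Wang's actual model: it assumes the final margin is normally distributed around the polled lead, and the function name `win_probability`, the two-point lead and the error sizes are all hypothetical.

```python
import math


def win_probability(lead_pts: float, error_sd_pts: float) -> float:
    """Chance the leader wins if the true margin ~ Normal(lead_pts, error_sd_pts).

    Both arguments are in percentage points. This is an illustrative
    assumption, not any forecaster's published method.
    """
    if error_sd_pts <= 0:
        raise ValueError("error_sd_pts must be positive")
    z = lead_pts / error_sd_pts
    # Standard normal CDF evaluated at z, written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical 2-point lead under shrinking assumed poll error.
for sd in (3.0, 2.0, 1.0):
    print(f"assumed error sd = {sd:.1f} pts -> P(win) = {win_probability(2.0, sd):.3f}")
```

Under these assumptions the same two-point lead is worth roughly an 84% chance of victory when the assumed error is two points, and roughly 98% when it is one point. A forecaster who trusts the polls more can therefore be far more confident about a narrow lead.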

Finally, the outcome should leave much of the media eating crow just as much as the Republicans are. As I wrote two days ago, the vast majority of journalists said that the race was “coming down to the wire”, “deadlocked”, “too close to call” or a “toss-up” when it was anything but. Donning my hat as the editor of Game theory, The Economist’s sports blog, I think that most political journalism now is where sportswriting was a decade ago. Starting in the 1980s, outsiders armed with calculators such as Bill James began writing that many long-held beliefs about how to win baseball games could not withstand quantitative scrutiny. In the 1990s and early 2000s, early adopters (most prominently Billy Beane, featured in the film “Moneyball”) began implementing the strategies recommended by the analysts, and were rewarded with success on the field. Only after pretty much every team in the game had hired a staff of in-house number-crunchers did the media stop confining modern statistics to isolated “nerd’s view” sections (like the New York Times’s “Keeping Score” column, to which I am a longtime contributor) and start allowing figures to leach into the bulk of their coverage. For the baseball fans among you, the writers’ vote on the Most Valuable Player of the American League this year will be a good indication of how far this process has come. If Mike Trout, the statisticians’ favourite, is chosen, we can probably declare victory; if Miguel Cabrera, the traditionalists’ preference, is selected instead, we still have a long way to go.

In politics, the stakes are much higher, because the media influence the outcome as well as reporting on it. But the process of replacing fact-free punditry with empirical analysis in the press has barely begun. Mr Silver is perfectly accustomed to getting raked across the coals for daring to inject a dose of objectivity into a discussion—he was part of the original vanguard of quantitative baseball analysts (and frequently quoted in “Keeping Score”) long before he moved on to politics. In the sports world, his methods are no longer controversial, and are broadly accepted at least by most young fans. In politics, however, he remains a lightning rod.

I think it is inevitable that media coverage of politics will eventually follow the path taken by sportswriting, and that traditional pundits will be left out in the cold—just as there are ever-fewer members of the old guard, like the recently retired Joe Morgan, in baseball broadcast booths. After all, the campaigns have already been using advanced statistics for years. But it’s up to individual news outlets to determine the speed of progress. I hope to see many more references to weighted poll averages, quantitative win probabilities and betting-market odds in the pages of The Economist in the years to come.

Correction: An earlier version of this post mis-stated Josh Marshall’s theory regarding the motivations of former Republicans who now identify as independents but still voted for Mr Romney.

The political "experts" were the most obvious victims of the stat geeks in 2012, but the political journalists were equally embarrassing. They gave far too much credence to whispering campaign operatives with their secret polls, and therefore blew the fairly obvious prediction that Obama was a clear favorite. Instead, they mistakenly portrayed the race as a coin flip, and did silly things like show the safe Obama states of Wisconsin, Nevada, and even Pennsylvania as being "in play."

CNN got it wrong. NBC got it wrong. The WSJ Page 1 got it wrong. They all wrote "neck and neck" and "too close to call" rather than the truth: Obama is in a comfortable position; yes, he *could* lose, but he would need A LOT of things to break against him.

Yes, but the experts and political journalists work for broadcast networks, cable networks, news websites etc... whose interests are best served by making the horserace seem as close as possible. It drives viewers to their commercials and eyeballs to their websites. Much like a sports fan watching football on Sunday, people want to see a close game, not a blowout.

There is a valid question of motivations here. Are the political experts and journalists driven to give the most accurate prediction/report, or are they more concerned with generating revenue for their employers? I'll leave you all to grapple with that brain buster.

Not that big of a brain buster, Mr. Goon. There will always be an inherent conflict of interest between the media as purveyors of facts and the media as entertainers. Sometimes, these motives are aligned. More often than not, these motives diverge. And when they do (i.e. when profitability is threatened), the entertainment value will always trump the truth.

You're right. We should all learn to read, speak, and think in binary, only discuss quantifiable things like money, and ignore the value added by all non-mathematical forms of knowledge... or at least subjugate those who study them.

jld314,
I entirely agree with you - Verbal ability is also about communication, such as communicating the facts found via one's math ability.

A common misconception, already seen in this thread, is that the two abilities are mutually exclusive. It is a frequent sour-grapes assertion made by those who harbor the misconception, whichever ability they believe they lack.

In several states the independent vote was roughly a 55/45 split. If you ran the math in Ohio (this is by memory), for example, then 39% Dem plus 45% of the 31% Independent gave a majority.

I assume you are an Independent of long standing, rather than leaving the Republican party around 2010 because you prefer/revile the Tea Party and still identify as conservative, just don't like the brand Republican any more. The much-enlarged group of independents shouldn't be seen (yet) as swing voters.
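The Ohio arithmetic above can be checked in a few lines. This is only a sketch of the commenter's back-of-the-envelope sum: the shares are the ones quoted from memory, and the assumption that every Democrat and no Republican voted for Obama is a simplification added here so the sum works as stated.

```python
# Shares quoted from memory in the comment; treat them as approximate.
dem_share = 0.39       # Democrats as a fraction of the Ohio electorate
ind_share = 0.31       # independents as a fraction of the electorate
ind_for_obama = 0.45   # fraction of independents backing Obama

# Simplifying assumption (not stated in the comment): all Democrats and
# no Republicans vote for Obama.
obama_total = dem_share + ind_for_obama * ind_share

print(f"Obama share of the vote: {obama_total:.4f}")
print("Majority" if obama_total > 0.5 else "No majority")
```

So even while losing independents 55/45, a 39% Democratic base would be enough for a majority of about 53% under these assumptions.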

GIGO: Garbage In, Garbage Out.
Meta-analysis depends on the quality of the original data.
Any data can be included if you have a sufficient margin of error.
And a dead cat will bounce when thrown from a sufficient height.

I'm disappointed that this article did not mention the prediction markets. It would have been interesting to see if the polls and the models lagged the markets. Pundits, take note of the end product of the gut when venturing to make predictions.

I addressed them in my previous post. Aside from Intrade, which is an unreliable, thinly traded outlier, the betting markets all put Obama at around 80% to win, a bit below the forecasters who were over 90. No way to know who's right there.

It makes sense that prediction markets would always be a bit below quantitative forecasts. Prediction markets include people knowingly taking a bet on the long shot, so it's hard to discern how much of the bets going for the underdog is honest money, for lack of a better term.

Michael Gerson demonstrated a very poor understanding of the construction and results of a statistical model. Josh Jordan did better but his analysis boils down to "Nate Silver is lying when he explains his methodology for changing discretionary factors in his model, instead it's all due to partisan bias." One might well have argued that PECOTA was biased towards the Tigers.

Hasn't Nate Silver dropped hints that he doesn't want to work on elections anymore?

The only reason Nate Silver will be considered a genius is because so many others were denying the obvious conclusions to be drawn from the polling data. All of the cable news pundits were claiming the race would be "razor tight" (ugh). However, it was clear in the final weeks that there would need to be some kind of systemic polling flaw for Romney to have more than a long shot. Still, kudos to Mr. Silver for not bowing to the horserace blather.

The point of reporting is to give me a snapshot of what's going on in the world and a bit of insight into each campaign's process. When news channels or websites spin the data and give misleading interpretations, that's a disservice. It's not helping anyone be a better citizen or a well-informed voter, or, more often, both. I'm glad the conversation is moving more toward PolitiFact and 538 and away from bad sources of data or spin.

What are polls for? They do not help voters assess either the character or the policies of the candidates. Instead, they distract voters and, especially, journalists/pundits from doing their duty to democracy. If it's just a spectator sport, and if the result is foregone, why bother to vote? Journalists should make a pact to ignore them in their reporting just as, once upon a time, they ignored the health problems and sexual peccadilloes of politicians.

At what level? Within a campaign they tell you where to concentrate your efforts. (And so Obama did not contest Indiana despite the surprise win four years ago, which was built on a long ground game there.) For voters, beyond the horserace data we demand, they let us know where to concentrate our efforts. Can I safely vote for message-choice Johnson, or is my state to be decided by voters still on line when the polls close as in Florida? Should I volunteer to GOTV here or in a neighboring swing state?

Not really. The bookies don't make odds based on squishy feelings, their odds change based on the level of betting on the respective options (and this is a science that has been refined for centuries). There is a line of investigation that suggests that betting odds make for good predictions because people are only willing to put money on something that they feel confident of (demonstrating that they will be voting that way, and likely the majority of the people that they know).

I, for one, bow down to the awesomeness of Nate Silver. It is not as simple as averaging and weighting all the polls. It's much, much more complex than that. It's akin to Google in election prognostication. A lot of people are interested in his algorithm.

It is complex, perhaps unnecessarily so. That's the point. I could make the odds of coin flipping very complex. I could factor in aerodynamics, the method of the toss, and historical data. Would it matter? It might improve the prediction's accuracy to some negligible degree or it might make it worse. Either way, it wouldn't be significantly different from the most simple prediction.

It is important to note that Nate Silver paired his statistical reporting with clear and cogent explanations and interpretations of the data.
He didn't just 'average the polls', he did a masterful job of putting them into context - something that many old-fashioned political reporters could do a much better job of.

According to some 'Expert Commentators' yesterday:
-today we would have an exact electoral tie at 269-269
-Armies of lawyers examining hanging chads in Fla, VA and Ohio
-a result that may be delayed until the end of November or even as late as Jan 2013
-and a possible Romney-Biden ticket by Legislative Constitutional Decree

A great deal of Silver's prominence has to do with the right labeling him as the evil magician to be defeated, completely ignoring Wang and Linzer standing behind him saying that Silver's probability of an Obama win was way too low, ready to step forward if he fell. (Or the faceless RCP average, but that lacks the element of human drama.)

For the past few weeks it truly seemed a segment on the right believed that if they could get Silver to unscrew his polls and declare Romney the winner, then Romney would BECOME the winner by magic. The existence of other poll aggregators does not work with this belief system, and so they vanish.

If this as yet unnamed field of Political Elections Management Science (of which polling is just a part) keeps on advancing, elections themselves will become redundant or at least meaningless.

Obama's team had a deeper grasp of this art turned science and acted on it from the very first day of his presidency, defeating by fine-tuning the broad predictions of the old, blunt GDP/unemployment/wrong track models.

For a group that doesn't blush to consider Evolution "just a theory", to deny credence to far more esoteric and just off-the-shelf science is a natural instinct.

I wonder whether that is why Romney took so long to concede defeat last night. With 76% of Ohio's votes counted it still took him two hours to give his concession speech. The numbers were clear unless one didn't really want to see them.

In support of the view that "Political Elections Management Science" is coming of age: 3 of the last 4 elections have been decided by margins under 4% of the popular vote (75%). In the previous 100 years only 4 were (16%).
And it is not just an American phenomenon.

I think it was something they needed to do, because such strong predictions two days before the election had the danger of dispiriting and discouraging those who would vote for Romney. I would be more understanding.

There were other hot races for Senate, House, local whatever where voter turn-out is under 50% (or even less than 30%); hence, the need to fire up the GOP base to get out and vote for Romney and the rest of the GOP runners.

Maybe they have a different version of reality... or simply used the wrong methodology to arrive at their optimistic predictions... or they are lacking intellectually.
I hope they are better than all the above and just lied; it's only politics and money.

Good for Nate. He set out his methodology in June as I recall and kept it unchanged when some other polls were tweaking their methodology as they went along.

I had more enjoyment than I should have watching talky heads on Fox spout forth about Romney landslides. I had even more merriment this morning reading the frothier commentators on Red State complaining about poll inaccuracy.

The truth was there if you bothered to look. Their problem was (and is) they refuse to look outside their own echo chamber to the facts on the ground.

This is both fascinating and scary at the same time. If one can predict the results of an election, why bother to go vote?
Is our opinion nothing more than a multi-dimensional Gaussian distribution with a mean value of, say, (fiscal liberal, social libertarian, 'insert variable here') and a given standard deviation value? If our opinion truly is that, why bother with anything at all? Can we change our minds? Or will we just be within 1-sigma, and on a really bad year within 2-sigma, of our mean?
Makes me wonder

I'm not saying one could end the whole voting process. I am more worried whether people are actually so predictable. Or at least their opinion.

We could actually think of people whose Gaussians would intersect in some kind of dividing line across the two main parties. These would be the swingers... there is an uncertainty there, but it would be quantifiable.

It reminds me of the Isaac Asimov short story, Franchise, where computers determine the election through the use of a single voter.

I think the challenge could lie in keeping folks believing that their vote matters. If an election is called at a 98% chance of victory for Candidate A, does that make his supporters less likely to go vote (since it's a lock)? Does it make his opponent's supporters more or less likely to vote?

If the prediction is fully trusted, what effect will that have on the outcome?