The Folly of N.F.L. Predictions

By Judd Olanoff

Sept. 10, 2015

Super Bowl 50 will be played in five months. The game is so far in the future that The San Jose Mercury News has nothing better to do than speculate about whether Taylor Swift might sing at the halftime show. A long list of events needs to unfold at Levi’s Stadium in Santa Clara, Calif., before it hosts the big game on Feb. 7: like an entire N.F.L. season, for instance.

But with the season opener approaching tonight between the Steelers and the Patriots, Paul Bessire’s N.F.L. “Prediction Machine” doesn’t need to wait five months. It knows a lot about Super Bowl 50 already. Including which teams will participate (the Green Bay Packers and the Indianapolis Colts). And the final score (31-28, Packers).

So, everyone can stay home — no need to play the games.

This is not to impugn the statistical rigor embedded in Bessire’s model. The “Prediction Machine” runs 50,000 simulations of the season. It sounds thoroughly sophisticated.

Image: The odds are worse than 50-50 that you’ll be able to predict a team’s record within a margin of two games. (Credit: Jonathan Bachman/Associated Press)

But the question isn’t whether Bessire’s model employs a mathematically sound approach. The question is whether the whole idea of predicting — in August — five months’ worth of individual game scores and season records is, inherently, a bit ridiculous.

A review of recent predictions suggests that the answer is yes. I took the last five years of win predictions from ESPN, CBS Sports, Bleacher Report and Las Vegas oddsmakers (a combination of the Westgate Superbook and MGM Grand), and compared them with the actual results. Two patterns emerged. First, the predictions tend to look markedly similar to the prior season’s final standings, despite the fact that N.F.L. rosters sometimes undergo major off-season changes. Second, the experts proved to be only slightly more accurate than if they had taken the prior year’s results and changed nothing. Let’s call that “same as last year” alternative the Groundhog Day approach.

A good way to measure a prediction’s accuracy is its Mean Absolute Error (M.A.E.): the average difference between the predicted and actual win totals for each team. The higher the measure, the bigger the error and the worse the prediction. From 2010 to 2014, ESPN, Bleacher Report and CBS Sports produced a 2.54 M.A.E., on average. In other words, they missed the mark by 2.54 wins per team over a 16-game season. The predictions barely outperformed the Groundhog Day strategy over the same period (2.84).
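The arithmetic behind M.A.E. is simple enough to sketch in a few lines. The win totals below are made up for illustration, not the actual data behind the 2.54 figure:

```python
# Mean Absolute Error: the average absolute gap between predicted
# and actual win totals across all teams.
# Hypothetical predictions for four teams (not real data).
predicted = [10, 10, 4, 9]
actual = [2, 4, 10, 9]

mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)
print(mae)  # (8 + 6 + 6 + 0) / 4 = 5.0
```

A 5.0 M.A.E. would mean missing each team's record by five wins on average, roughly twice as bad as the experts' 2.54.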

A different method called Root Mean Squared Error (R.M.S.E.) punishes egregiously poor projections more severely. This method tells the same story as M.A.E. — the experts have, on the whole, demonstrated only marginally superior predictive accuracy than the Groundhog Day benchmark.
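To see why R.M.S.E. punishes big misses harder, note that it squares each error before averaging. A sketch with the same kind of hypothetical win totals:

```python
import math

# Root Mean Squared Error: squaring each error before averaging means
# one ten-win miss hurts far more than several one-win misses.
# Hypothetical predictions for four teams (not real data).
predicted = [10, 10, 4, 9]
actual = [2, 4, 10, 9]

mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)
rmse = math.sqrt(mse)
print(round(rmse, 2))  # sqrt((64 + 36 + 36 + 0) / 4) ≈ 5.83
```

The same predictions that score a 5.0 under M.A.E. score roughly 5.83 under R.M.S.E., because the three large errors dominate the average once squared.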

In 2010, the CBS Sports predictions yielded a notably poor 3.19 M.A.E. CBS projected that the Carolina Panthers, coming off an eight-win season, would win 10 games. They won two. CBS said the Cincinnati Bengals, coming off a 10-win season, would win 10 games. They won four. CBS said the Tampa Bay Buccaneers, coming off a three-win season, would win four games. They won 10. CBS made 12 such errors of four or more wins in 2010.

In 2012, ESPN’s 3.66 M.A.E. actually exceeded (as in, was worse than) the Groundhog Day M.A.E. of 3.09 wins. That means that ESPN’s 2012 win predictions were less accurate than if it had visited its own website, lifted the final standings from the prior season and changed nothing.

Image: It probably makes more sense to enjoy the season rather than try to forecast it. It starts Thursday night, with the Steelers facing the Patriots. (Credit: Duomo/Corbis)

Others have observed these trends before. The Boston Globe noted that in the 2014 season, experts failed to measurably beat the Groundhog Day method. Brian Burke of ESPN, the founder of Advanced Football Analytics and former New York Times contributor, similarly found that Football Outsiders’ 2009 win predictions were no more accurate than a mindless 8-8 prediction for every team (Burke calls this “Constant Median Approximation,” or “CoMA,” as in the prediction could have been made by a person in a coma).

Burke also discovered that the Football Outsiders 2009 predictions proved less accurate than an iteration of the Groundhog Day strategy (simply assuming that each team would win six games plus a quarter of its prior season’s wins). Burke calls that approach “Koko the Monkey.” It’s unclear whether Burke has greater respect for the predictive powers of Koko the Monkey or the coma patient. In any case, both competed favorably with Football Outsiders that season.
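The Koko the Monkey rule described above (six wins plus a quarter of the prior season's wins) amounts to a crude regression toward the league's 8-8 middle, and it takes one line to implement:

```python
# Burke's "Koko the Monkey" baseline: six wins plus a quarter of last
# season's win total, pulling every team toward the 8-8 middle.
def koko_prediction(prior_wins):
    return 6 + prior_wins / 4

print(koko_prediction(12))  # 9.0 — a 12-win team regresses downward
print(koko_prediction(4))   # 7.0 — a 4-win team climbs upward
```

Note that an 8-8 team is predicted to go 8-8 again, so Koko agrees with both the coma patient and the Groundhog Day approach at the league average and only disagrees at the extremes.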

By measuring the predictive accuracy of glaringly flawed or basic systems, such as the “same as last year” and “8-8 for every team” strategies, we can establish a baseline error margin that any decent prediction system should beat. Chase Stuart, creator of the Football Perspective website and a Times contributor, used formulas accounting for those two simple systems and calculated that baseline to be around 2.30 wins. From 2010 to 2014, ESPN, CBS Sports and Bleacher Report all produced error margins worse than that.

But the experts’ inability to convincingly beat mindless guessing strategies doesn’t necessarily mean that they are bad predictors. As Stuart points out, predicting wins and losses in the N.F.L. may just be a hopeless endeavor, even for advanced models. “I think there is just too much randomness and unpredictability in the N.F.L. from season to season, and too much in-season variance, to make the predictions much more accurate,” he said.

So if you’re a Packers or Colts fan, don’t plan on those Super Bowl tickets just yet. Unless you’re a big fan of Taylor Swift.