Sabermetric Research

Phil Birnbaum

Sunday, September 29, 2013

Do NFL underdogs consistently beat the spread?

I learned two things from investment blogger Eddy Elfenbein this week. First, I learned that if you were invested in the S&P 500 from 1932 to 2009, you'd have made a total return of 14,000% (not including dividends). But if you were invested only during the middle 2/3 of each month, you'd have LOST money. Wow.

Second, I learned that, in the NFL, heavy favorites consistently fail to beat the spread.

------

Since 1978, teams favored by 12 points or more were 220-275-9 against the spread. Ignoring the nine pushes, that's a winning percentage of only .444. The effect was easily large enough to turn a profit, even after the bookie's vigorish.

I wondered if, maybe, this was an anomaly that existed in earlier years, but that bookies eventually caught on to and erased. But, since 2005, those heavy favorites have gone 64-94-2.

Does the effect disappear for "less heavy" favorites? Again going back to 1978, and looking at teams favored by 6 to 11.5 points ... they were 1361-1475-57 (.480 excluding pushes). Teams favored by 0.5 to 5.5 points were 2197-2330-156 (.485). So, yes, it seems the effect is more pronounced for the heavy favorites.

I broke it down a bit further, into "one point" buckets from 8.5 up. Teams favored by 8.5 or 9 points were 163-187 (.466). Teams favored by 9.5 or 10 were 186-193-12 (.491). And so on.

For every one of those groups, except one, the favorites had a losing record. (The exception was teams favored by 15.5 or 16, who went 19-15 against the spread.) No favorite above 20 points has ever covered (0-7).

------

Is this a known anomaly? Maybe it's just my ignorance, but I've never heard that this happens. Well, actually, I should have known ... it was obvious in the numbers for the "home underdog" effect. But it seems to apply equally to home and road favorites. Home favorites (12+) were 194-236-8, while road favorites were 26-39-1.

I'm very, very surprised.

Add this to the list of arguments against NCAA basketball point shaving.
If favorites failing to cover is evidence of point shaving in the NCAA, then it must also be evidence of point shaving in the NFL, right? But hardly anyone argues that. I still think it's just a case of bookies shading their lines to make the favorites less attractive to bet. (P.S. Good discussion of bookies' lines in some of MGL's comments here.)
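To check the claim about beating the vigorish, here's a minimal Python sketch. It assumes standard -110 pricing (risk 110 to win 100), which the post doesn't specify, and the function and variable names are made up for illustration. For each group of favorites above, it computes the favorites' winning percentage against the spread and the expected return, per dollar risked, from betting every underdog.

```python
from fractions import Fraction

# Assumption: standard -110 pricing, i.e. risk 110 to win 100.
RISK, TO_WIN = 110, 100
BREAK_EVEN = Fraction(RISK, RISK + TO_WIN)  # 11/21, about .524


def underdog_roi(fav_wins, fav_losses):
    """Expected profit per dollar risked when betting every underdog,
    computed from the favorites' record against the spread (pushes excluded)."""
    dog_pct = Fraction(fav_losses, fav_wins + fav_losses)
    return (dog_pct * TO_WIN - (1 - dog_pct) * RISK) / RISK


# Favorites' records against the spread, as quoted in the post.
records = {
    "favored by 12+, since 1978": (220, 275),
    "favored by 12+, since 2005": (64, 94),
    "favored by 6 to 11.5": (1361, 1475),
    "favored by 0.5 to 5.5": (2197, 2330),
}

if __name__ == "__main__":
    for label, (wins, losses) in records.items():
        fav_pct = wins / (wins + losses)
        print(f"{label}: favorites {fav_pct:.3f}, "
              f"betting the underdog returns {float(underdog_roi(wins, losses)):+.1%} "
              f"(break-even {float(BREAK_EVEN):.3f})")
```

Under that pricing assumption, betting against the 12-point favorites since 1978 returns exactly 2/33 of the amount risked (about 6 percent), while the two smaller-favorite groups fall just short of break-even.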

Sunday, September 22, 2013

Selective sampling could explain point-shaving "evidence"

Remember, a few years ago, when a couple of studies came out that claimed to have found evidence for point shaving in NCAA basketball? There was one by Jonathan Gibbs (which I reviewed here), and another by Justin Wolfers (.pdf). I also reviewed a study, from Dan Bernhardt and Steven Heston, that disagreed. Here's a picture stolen from Wolfers' study that illustrates his evidence.

It's the distribution of winning margins, relative to the point spread. The top is teams that were favored by 12 points or less, and the bottom is teams that were favored by 12.5 points or more. The top one is roughly as expected, but the bottom one is shifted to the left of zero. That means heavy favorites do worse than expected, based on the betting line. And, heavy favorites have the most incentive to shave points, because they can do so while still winning the game.

After quantifying the leftward shift, Wolfers argues,

"These data suggest that point shaving may be quite widespread, with an indicative, albeit rough, estimate suggesting that around 6 percent of strong favorites have been willing to manipulate their performance."

But ... I think it's all just an artifact of selective sampling.

Bookmakers aren't always trying to be perfectly accurate in their handicapping. They may have to shade the line to get the betting equal on both sides, in order to minimize their risk.

It seems plausible to me that the shading is more likely to be in the direction consistent with the results -- making the favorites less attractive. Heavy favorites are better teams, and better teams have more fans and followers who would, presumably, want to bet that side. I don't know whether that's actually true, but it isn't necessary for the argument. Even if the shading is just as likely to go towards the underdog as towards the favorite, we'd still get a selective-sampling effect.

Suppose the bookies always shade the line by half a point, in a random direction. And, suppose we do what Wolfers did, and look at games where a team is favored by 12 points or more. What happens? Well, that sample includes every team with a "true talent" of 12 points or more -- with one exception. It doesn't include 12-point teams where the bookies shaded down (for whom they set the line at 11.5).

However, the sample DOES include the set of teams the bookies shaded *up* -- 11.5-point teams the bookies rated at 12.

Therefore, in the entire sample of favorites, you're looking at more "shaded up" lines than "shaded down" lines. That means the favorites, overall, are a little bit worse than the line suggests. And that's why they cover less than half the time.

You don't need point shaving for this to happen. You just need the bookies to be sufficiently inaccurate. That's true even if the inaccuracy is on purpose, and -- most importantly -- even if the inaccuracy is as likely to go one way as the other.

------

To get a feel for the size of the anomaly, I ran a simulation. I created random games with true-talent spreads of 8 points to 20 points.
I ran 200,000 of the 8-point games, scaling down linearly to 20,000 of the 20-point games. For each game, I shaded the line with a random error, mean zero and standard deviation of 2 points. I rounded the resulting line to the nearest half point. Then, I threw out all games where the adjusted line was less than 12 points. (Oops! I realized afterwards that Wolfers used 12.5 points as the cutoff, where I used 12 ... but I didn't bother redoing my study.) I simulated the remaining games as 100 possessions for each team, two-point field goals only.

The results were consistent with what Wolfers found. Excluding pushes, my favorites went 325578-355909, which is a .478 winning percentage. Wolfers' real-life sample was .483.

--------

So, there you go. It's not proof that selective sampling is the explanation, but it sounds a lot more plausible than widespread point shaving. Especially in light of the other evidence:

-- If you look again at the graph of results, it looks like the entire curve moves left. That's not what you'd expect if there were point shaving -- in that case, you'd expect to see an extraordinarily large number of "near misses".

-- As the Bernhardt/Heston study showed, the effect was the same for games that weren't heavily bet; that is, cases where you'd expect point shaving to be much less likely.

-- And, here's something interesting. In their rebuttal, Bernhardt and Heston estimated point spreads for games that had no betting line, and found a similar left shift. Wolfers criticized that, and I agreed, since you can't really know what the betting line would be. However: that part of the Bernhardt/Heston study perfectly illustrates this selective sampling point! That's because, whatever method they used to estimate the betting line, it's probably not perfect, and probably has random errors!
So, even though that experiment isn't a legitimate comparison to the original Wolfers study, it IS a legitimate illustration of the selective sampling effect.

---------

So, after I did all this work, I found that what I did isn't actually original. Someone else had come up with this explanation first, some five years ago. In 2009, Neal Johnson published a paper in the Journal of Sports Economics called "NCAA Point Shaving as an Artifact of the Regression Effect and the Lack of Tie Games." Johnson identified the selective sampling issue, which he refers to as the "regression effect." (They're different ways to look at the same situation.) Using actual NCAA data, he comes up with the result that, in order to get the same effect that Wolfers found, the bookmakers' errors would have to have had a standard deviation of 1.35 points.

I'd quibble with that study on a couple of small points.

First, Johnson assumed that the absolute value of the spread was normally distributed around the observed mean of 7.92 points. That's not the case -- since you're taking absolute values, you'd expect it to look like the right half of a normal distribution. The assumption of normality, I think, means that the 1.35 points is an overestimate of the amount of inaccuracy needed to produce the effect.

Second, Johnson assumes the discrepancies are actual errors on the part of the bookmakers, rather than deliberate line shadings. He may be right, but I'm not so sure. It looks like there's an easy winning strategy for NCAA basketball -- just bet on mismatched underdogs, and you'll win 51 to 53 percent of the time. That seems like something the bookies would have noticed, and corrected, if they wanted to, just by regressing the betting lines to the mean.

Those are minor points, though. I wish I had seen Johnson's paper before I did all this, because it would have saved me a lot of trouble ... and because I think he nailed it.
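Both the half-point thought experiment and the simulation described above can be sketched in a few lines of Python. This is a minimal sketch, not the code I actually ran: the function names are hypothetical, true talents are assumed to be whole points from 8 to 20, the game counts are scaled down by 100 for speed, and the possession model (each two-point try succeeds with probability 0.5 plus or minus talent/400, so the expected margin equals the talent) is an assumption the post doesn't specify.

```python
import random


def half_point_shading(cutoff=12.0):
    """Thought experiment: true talents every half point from 8 to 20, each
    line shaded half a point up or down (both cases enumerated). Returns how
    many kept games (line >= cutoff) were shaded up and how many shaded down."""
    up = down = 0
    for i in range(25):
        talent = 8 + 0.5 * i
        if talent + 0.5 >= cutoff:
            up += 1
        if talent - 0.5 >= cutoff:
            down += 1
    return up, down


def round_to_half(x):
    """Round to the nearest half point, the way betting lines are quoted."""
    return round(x * 2) / 2


def simulate_margin(talent, rng, possessions=100):
    """Favorite's winning margin. Assumed model (not from the post): every
    possession is a two-point try that succeeds with probability
    0.5 +/- talent/400, so the expected margin equals `talent`."""
    p_fav, p_dog = 0.5 + talent / 400, 0.5 - talent / 400
    fav = sum(2 for _ in range(possessions) if rng.random() < p_fav)
    dog = sum(2 for _ in range(possessions) if rng.random() < p_dog)
    return fav - dog


def run(scale=0.01, shade_sd=2.0, cutoff=12.0, seed=1):
    """Shade lines at random, keep only lines >= cutoff, and record how the
    favorites did against the line. Returns (wins, losses, pushes,
    mean of line minus true talent among kept games)."""
    rng = random.Random(seed)
    wins = losses = pushes = kept = 0
    total_error = 0.0
    for talent in range(8, 21):
        # 200,000 games at 8 points, scaling down linearly to 20,000 at 20.
        n_games = round((200_000 - 15_000 * (talent - 8)) * scale)
        for _ in range(n_games):
            line = round_to_half(talent + rng.gauss(0, shade_sd))
            if line < cutoff:
                continue  # the selective-sampling step
            kept += 1
            total_error += line - talent
            margin = simulate_margin(talent, rng)
            if margin > line:
                wins += 1
            elif margin < line:
                losses += 1
            else:
                pushes += 1
    return wins, losses, pushes, total_error / kept


if __name__ == "__main__":
    print("half-point example, shaded up vs. down:", half_point_shading())
    w, l, p, err = run()
    print(f"favorites {w}-{l}-{p} ATS ({w / (w + l):.3f}), "
          f"mean line minus talent {err:+.2f}")
```

The half-point enumeration keeps 18 shaded-up lines but only 16 shaded-down ones, which is exactly the imbalance described above. In the random version, the kept lines overstate the favorites on average, so they should cover somewhat less than half the time; the exact record will differ from mine, since the possession model and the scale are guesses.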

Monday, September 09, 2013

Acknowledging incorrect facts but not incorrect logic

What is the chance of seeing an "Original Six" NHL Final, like we did this past season with Chicago facing Boston?

Well, the Final consists of one team from each conference. After realignment, five of the Original Six are in the East, and one (Chicago) is in the West. The East has sixteen teams, and the West has fourteen.

So, if all teams have an equal chance, the probability is 5/16 (one of the five reaching the Final from the East) multiplied by 1/14 (Chicago reaching it from the West). That's 5/224, or about 2.2 percent.

-----

This came up in a recent article in Sports Illustrated:

"In the future, with only Chicago in the West, the random odds, calculated by David Madigan, chair of the statistics department at Columbia, shrink to 2.2%."

That made me a little sad, that an expert had to be quoted. I guess I can't really complain, because the writer had to do it ... even if he understood how to calculate the number -- which I suspect he did -- skeptical readers would believe he pulled the number out of his butt, if it suited their preconceptions.

That got me wondering what the criterion is, for when you quote an expert as opposed to just stating a fact.

"Over 82 games, an NHL team can accumulate as many as 164 standings points, according to Dr. Mary Doe, of the Mathematics department at Harvard."

That wouldn't happen, right? Too simple: 82 games, times two points a game. How about,

"The chance of a fair coin landing heads twice in a row is 25%, according to John Smith, mathematics professor at Yale and author of several texts on probability theory."

Probably not: the reporter would probably just explain how that 25% is calculated. But what about more coins?

"The chance of a fair coin landing heads six times in a row is less than 2%, according to ..."

That one would probably happen.

This is just from my gut ... it almost seems like there's a rule, that if it's not something simple enough for most readers to understand, you're not allowed to just state it without having a source. And that seems to be the case even if you know it for yourself. It's almost like ... if it turns out to be wrong, it's important that it not be the reporter's fault.

------

In that regard, it always seemed strange to me how journalism is so careful about correcting "facts," but so lax about acknowledging bad logic. Here's an example I'm making up (but based on actual articles I've seen):

"Speeding on Anytown roads is at its highest level ever. At any given time, 60 percent of city drivers are exceeding the limit, according to researchers at the Anystate Insurance Institute. And the lack of enforcement exacts a hefty toll. The Institute reports that 77 percent of all fatal multi-car collisions involved at least one speeding driver.
"However, at a press conference yesterday, Mayor Doe downplayed this evidence that speeding kills, and resisted calls for additional enforcement."

Now, if the reporter accidentally misquoted the numbers, there would be a correction in the next day's paper:

"Yesterday, we reported an incorrect Insurance Institute statistic on the proportion of speeding drivers. The correct figure is 70 percent, not 60 percent as reported. The Anytown Daily News regrets the error."

But ... even if the facts were correct, the conclusion doesn't follow.

If 60 percent of drivers speed, then only 40 percent of drivers aren't speeding. In that case, all things being equal, only 16 percent of two-car collisions -- 40 percent of 40 percent -- would involve only non-speeders. That means 84 percent would involve at least one speeder. But the actual number is only 77 percent. At face value, those numbers actually suggest that speeding *prevents* accidents!

So, you'd expect a correction:

"Yesterday, we incorrectly reported that Insurance Institute data proved that speeding is dangerous. However, the quoted facts actually show no evidence that speeding kills, and could even be interpreted as evidence that speeding saves lives. The Anytown Daily News regrets the error."

That would never happen, right? You have to correct facts, but not logic.

-----

From that same SI article:

"So there is at least a small chance that the 2013 finals ... is not last call for the Original Six. But ... the bartender is checking his watch."

Well, it's not that small a chance. Over the next 30 years, say, there's almost a 50-50 chance of at least one Original Six final. (That's 100 percent minus 97.8 percent to the 30th power.) But that's the author doing his own logic, not quoting the expert. It seems like that's another rule: you have to quote a source for the raw facts, but you can say anything you want about what those facts mean.

------

What's my point? Um ... I'm not completely sure I have one. Well, I guess I find it frustrating how these things work. Because, usually, if a conclusion is wrong, whether in journalism or academia or blogs or conversation, it's because of the logic, not the facts. So the emphasis is backwards. You get in deep trouble if you accidentally omit part of your dataset, even when it doesn't change your conclusion ... but if you get the data right and badly misinterpret what it means, it's not a big deal.

I'm all in favor of getting the facts right, but not at the expense of pretending that the reasoning doesn't matter.
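For what it's worth, the arithmetic in this post is simple enough to check directly. Here's a minimal Python sketch; the function names are made up for illustration. It assumes every team in a conference is equally likely to reach the Final, that seasons are independent, and, for the speeding example, that drivers end up in two-car collisions at random regardless of whether they speed. It runs the per-season calculation for both an even 15/15 split and the 16/14 East/West split the NHL adopted in 2013.

```python
from fractions import Fraction


def original_six_final_chance(east_teams, west_teams):
    """Per-season chance of an all-Original-Six Final, assuming every team in a
    conference is equally likely to win it (five Original Six teams in the
    East, Chicago alone in the West)."""
    return Fraction(5, east_teams) * Fraction(1, west_teams)


def chance_at_least_once(p_per_season, seasons):
    """Chance of at least one occurrence over independent seasons."""
    return 1 - (1 - p_per_season) ** seasons


def share_with_a_speeder(p_speeding, cars=2):
    """Share of collisions expected to involve at least one speeder if
    speeders are no more likely than anyone else to crash."""
    return 1 - (1 - p_speeding) ** cars


if __name__ == "__main__":
    for east, west in [(15, 15), (16, 14)]:
        p = original_six_final_chance(east, west)
        print(f"{east}/{west} split: {p} = {float(p):.1%} per season, "
              f"{float(chance_at_least_once(p, 30)):.1%} over 30 seasons")
    print(f"60% speeders: {share_with_a_speeder(0.60):.0%} of two-car collisions")
    print(f"70% speeders: {share_with_a_speeder(0.70):.0%} of two-car collisions")
```

Both splits come out at about 2.2 percent per season and about 49 percent over 30 seasons. The speeding function gives 84 percent at the reported 60 percent figure, and 91 percent at the "corrected" 70 percent, which makes the 77 percent look even less like evidence that speeding kills.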