Sunday, September 22, 2013

Selective sampling could explain point-shaving "evidence"

Remember, a few years ago, when a couple of studies came out that claimed to have found evidence for point shaving in NCAA basketball? There was one by Jonathan Gibbs (which I reviewed here), and another by Justin Wolfers (.pdf). I also reviewed a study, from Dan Bernhardt and Steven Heston, that disagreed. Here's a picture stolen from Wolfers' study that illustrates his evidence.

It's the distribution of winning margins, relative to the point spread. The top panel shows teams that were favored by 12 points or less, and the bottom panel shows teams that were favored by 12.5 points or more. The top one is roughly as expected, but the bottom one is shifted to the left of zero. That means heavy favorites do worse than expected, based on the betting line. And heavy favorites have the most incentive to shave points, because they can do so while still winning the game.

After quantifying the leftward shift, Wolfers argues,

"These data suggest that point shaving may be quite widespread, with an indicative, albeit rough, estimate suggesting that around 6 percent of strong favorites have been willing to manipulate their performance."

But ... I think it's all just an artifact of selective sampling.

Bookmakers aren't always trying to be perfectly accurate in their handicapping. They may have to shade the line to get the betting equal on both sides, in order to minimize their risk.

It seems plausible to me that the shading is more likely to go in the direction consistent with the results, making the favorites less attractive. Heavy favorites are better teams, and better teams have more fans and followers who would, presumably, want to bet on that side. I don't know whether that's actually true, but it doesn't matter. Even if the shading is just as likely to go toward the underdog as toward the favorite, we'd still get a selective-sampling effect.

Suppose the bookies always shade the line by half a point, in a random direction. And suppose we do what Wolfers did, and look at games where a team is favored by 12 points or more. What happens? Well, that sample includes every team with a "true talent" of 12 points or more, with one exception: it doesn't include 12-point teams where the bookies shaded down (setting the line at 11.5).

However, the sample DOES include the teams the bookies shaded *up*: 11.5-point teams the bookies rated at 12.

Therefore, in the entire sample of favorites, you're looking at more "shaded up" lines than "shaded down" lines. That means the favorites, overall, are a little bit worse than the line suggests, and that's why they cover less than half the time. You don't need point shaving for this to happen. You just need the bookies to be sufficiently inaccurate. That's true even if the inaccuracy is on purpose and, most importantly, even if the inaccuracy is as likely to go one way as the other.

------

To get a feel for the size of the anomaly, I ran a simulation. I created random games with true-talent spreads of 8 points to 20 points.
I ran 200,000 of the 8-point games, scaling down linearly to 20,000 of the 20-point games. For each game, I shaded the line with a random error, with mean zero and a standard deviation of 2 points, and rounded the resulting line to the nearest half point. Then I threw out all games where the adjusted line was less than 12 points. (Oops! I realized afterwards that Wolfers used 12.5 points as the cutoff, where I used 12 ... but I didn't bother redoing my study.) I simulated the remaining games as 100 possessions for each team, two-point field goals only.

The results were consistent with what Wolfers found. Excluding pushes, my favorites covered 325,578 times and failed to cover 355,909 times, a .478 cover rate. Wolfers' real-life sample was .483.

--------

So, there you go. It's not proof that selective sampling is the explanation, but it sounds a lot more plausible than widespread point shaving, especially in light of the other evidence:

-- If you look again at the graph of results, it looks like the entire curve moves left. That's not what you'd expect if there were point shaving; in that case, you'd expect to see an extraordinarily large number of "near misses."

-- As the Bernhardt/Heston study showed, the effect was the same for games that weren't heavily bet, that is, cases where you'd expect point shaving to be much less likely.

-- And here's something interesting. In their rebuttal, Bernhardt and Heston estimated point spreads for games that had no betting line, and found a similar left shift. Wolfers criticized that, and I agreed, since you can't really know what the betting line would have been. However, that part of the Bernhardt/Heston study perfectly illustrates this selective sampling point! That's because, whatever method they used to estimate the betting line, it's probably not perfect, and probably has random errors!
So, even though that experiment isn't a legitimate comparison to the original Wolfers study, it IS a legitimate illustration of the selective sampling effect.

---------

So, after I did all this work, I found that what I did isn't actually original. Someone else had come up with this explanation first, some five years ago. In 2009, Neal Johnson published a paper in the Journal of Sports Economics called "NCAA Point Shaving as an Artifact of the Regression Effect and the Lack of Tie Games." Johnson identified the selective sampling issue, which he refers to as the "regression effect." (They're different ways to look at the same situation.) Using actual NCAA data, he finds that, to get the same effect Wolfers found, the bookmakers' errors would have to have a standard deviation of 1.35 points.

I'd quibble with that study on a couple of small points. First, Johnson assumed that the absolute value of the spread was normally distributed around the observed mean of 7.92 points. That's not the case: you'd expect it to be the right half of a normal distribution, since you're taking absolute values. The assumption of normality, I think, means that 1.35 points is an overestimate of the amount of inaccuracy needed to produce the effect.

Second, Johnson assumes the discrepancies are actual errors on the part of the bookmakers, rather than deliberate line shading. He may be right, but I'm not so sure. It looks like there's an easy winning strategy for NCAA basketball: just bet on mismatched underdogs, and you'll win 51 to 53 percent of the time. That seems like something the bookies would have noticed, and corrected if they wanted to, just by regressing the betting lines to the mean.

Those are minor points, though. I wish I had seen Johnson's paper before I did all this, because it would have saved me a lot of trouble ... and because I think he nailed it.
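For anyone who wants to play with the idea, the simulation described earlier can be sketched in Python roughly as follows. This is a minimal sketch, not the code behind the numbers above. The integer true-talent spreads, the per-possession success probabilities (chosen so that the expected margin equals the true spread), the `scale` factor that shrinks the game counts so pure Python runs quickly, and the function name `simulate_selective_sampling` are all assumptions made for illustration.

```python
import random


def simulate_selective_sampling(scale=0.02, shade_sd=2.0, cutoff=12.0,
                                possessions=100, seed=2013):
    """Hypothetical sketch of the selective-sampling simulation.

    Returns (covers, non_covers) for favorites whose shaded, rounded line
    is at least `cutoff`. Pushes are excluded from both counts.
    """
    rng = random.Random(seed)
    covers = non_covers = 0
    for spread in range(8, 21):  # integer true-talent spreads 8..20 (assumption)
        # Linear scale-down from 200,000 games at 8 points to 20,000 at
        # 20 points, multiplied by `scale` to keep the run short.
        n_games = round(scale * (200_000 - 15_000 * (spread - 8)))
        # Assumed game model: each team gets `possessions` two-point shots,
        # with success probabilities set so the expected margin equals
        # the true-talent spread: 2 * possessions * (p_fav - p_dog) == spread.
        p_fav = 0.5 + spread / (4 * possessions)
        p_dog = 0.5 - spread / (4 * possessions)
        for _ in range(n_games):
            # Shade the line with mean-zero normal noise, then round to the
            # nearest half point.
            line = round(2 * (spread + rng.gauss(0.0, shade_sd))) / 2
            if line < cutoff:
                continue  # selective sampling: keep only "heavy favorites"
            fav_pts = 2 * sum(rng.random() < p_fav for _ in range(possessions))
            dog_pts = 2 * sum(rng.random() < p_dog for _ in range(possessions))
            margin = fav_pts - dog_pts
            if margin > line:
                covers += 1
            elif margin < line:
                non_covers += 1
            # margin == line is a push and is not counted
    return covers, non_covers
```

Calling `simulate_selective_sampling()` returns the favorites' cover/non-cover counts for the selected games. Setting `shade_sd=0.0` removes the shading entirely, so the line equals the true spread and no selection bias remains; the cover rate should then come out close to .500, which is a useful check that the leftward shift comes from the shading plus the cutoff rather than from the game model.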
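To make the quibble about Johnson's normality assumption concrete, here's the arithmetic under one illustrative assumption: that the signed spread $X$ is normal with mean zero and some standard deviation $\sigma$. The absolute spread $|X|$ is then half-normal, and its observed mean of 7.92 points pins down $\sigma$:

$$
\mathbb{E}|X| = \sigma\sqrt{2/\pi} = 7.92
\;\Rightarrow\;
\sigma = 7.92\sqrt{\pi/2} \approx 9.93,
\qquad
\operatorname{SD}|X| = \sigma\sqrt{1 - 2/\pi} \approx 5.98 .
$$

The resulting density is largest at zero and declines steadily from there, rather than peaking at 7.92 the way a normal curve centered on the observed mean would.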

13 Comments:

If I followed correctly, I think there should be two results that fall out of your/Johnson's hypothesis. First, you should see that leftward shift at *any* line selection. For example, if Wolfers had picked 8 points as his cutoff, he would have been including teams with a 'true talent' of 7.5, which would bring the win percentage down.

Second, the other part of the sample (the teams favored by 12 or less for Wolfers, or 7.5 or less in my example) should have a rightward shift because they include teams that were underestimated by the line. Wolfers should have teams with a 'true talent' of 12 but given a line of 11.5, as you point out, meaning that group overall should cover more often than expected. If the lines are correct except for random error, this distribution would have to have a rightward shift so that when combined with the 'big winner' distribution's leftward shift you would be centered at 0 in the overall sample.

It looks like Wolfers didn't find that second effect, going by the figure you presented. Did you or Johnson find either of these effects? Or am I barking up the wrong tree?

Alex, here's a paper that shows the effect at two different heavy-favorite cutoffs, but not at a cutoff of 4.

http://www.cla.temple.edu/RePEc/documents/detu_10_09.pdf

I don't think you'd necessarily see a rightward shift in the other part of the sample because it's so large compared to the +12 part. That is, there might be a rightward shift, but too small to notice.

Or, no shift at all. There's nothing that says the overall curve has to be centered. It could be that the low-spread games are accurate, and the high-spread games are deliberately shaded.

The rightward shift would be smaller, but if you know how many games are involved and the size of the leftward shift, you should be able to calculate what it should be. If lines are generally correct (the efficient market idea in that paper), then you know there should be a symmetric curve in all games. If a subsample is systematically shifted to one side, the other games must counter it. Your simulation might be a good place to look since you have full control over the numbers.

I think it would be odd if there was a division in the curve. Firstly, that paper makes it sound like it isn't observed empirically. Secondly, it would mean that bookies are knowingly putting an actionable trend into their lines. I would be shocked if something relatively simple like 'take all 15+ point underdogs' was a profitable move, but that's what the claim would have to be.

I thought the playoff game analysis was particularly interesting. Is there any reason to think the shading/regression should stop happening in the playoffs?

"I would be shocked if something relatively simple like 'take all 15+ point underdogs' was a profitable move, but that's what the claim would have to be."

I am guessing that Phil was not including the juice.

It is a well-known fact that back in the day at least (not too long ago), betting on big dogs could get you close to even in almost all major sports.

That is simply because the general public likes to bet favorites and the sports books can exploit that by shading the line towards the favorite and away from an accurate line.

That is a perfect (and simple) explanation for Wolfers' findings. It is certainly more likely than, "Point shaving flourished in NCAA basketball."

As Carl Sagan used to say, paraphrasing David Hume I think, "Extraordinary claims require extraordinary evidence."

Didn't Wolfers even consider that? Did he look at NBA basketball or other sports like NFL football? If he found the same effect, which he should if I and Phil (and others) are correct, wouldn't that sort of put the kibosh on his theory, since there likely is NO widespread point shaving in professional sports?

I might be misunderstanding the graphs, but I don't think so. The dashed line in Figure 2 has a bump, which is also its maximum, to the left of 0 (the text says -4.3, I think). Then there's also a bump, a local maximum, to the right (the text says around 5). There's also a bimodal distribution in figure 4, and the text says that it's in figure 3 although it's hard to make out with the error bar band. The dashed lines in figures 6 and 7 don't have any bimodal features that I can make out.

I'm curious about whether and how to account for the phenomenon of the favorite playing down to the level of its competition. As a former player and coach, I was involved in many games, on both sides of the equation, where the heavily favored team struggled even to win, much less cover the "spread". Oftentimes, whether due to overconfidence, lack of preparation, or whatever, the heavy favorite struggles and is only able to prevail because of a gross mismatch of ability. This might account for margins of victory being less than expected vis-a-vis the spread.

Trey, the spread takes that into consideration (in basketball it is mostly because of garbage time: when you have a big lead, as tends to happen in mismatched games, you take out your regulars at the end and the point differential shrinks). Anyway, we are not interested in how they make the spread, only in whether it is accurate and/or biased. The researchers assumed that the lines were accurate and unbiased, and that was a really dumb assumption, one they could easily have checked in a variety of ways. They could perhaps even have asked someone involved in handicapping.