Thursday, May 03, 2007

Racial bias and NBA referees -- Part 1

A new study says that NBA referees are racially biased.

The study came to light Wednesday in an article by Alan Schwarz on the front page of the New York Times. To his credit, Schwarz did a bit of investigating before running the article -- he had obtained an advance copy of the study, and he sent it to three prominent economists (Ian Ayres, David Berri, and Larry Katz), who pronounced it valid.

Here's the study, courtesy of the NYT website. It's called "Racial Discrimination Among NBA Referees," by Joseph Price and Justin Wolfers.

Price and Wolfers looked at every regular season NBA game from 1991-92 to 2003-04. First, they noted that white players took more fouls than black players (after normalizing to 48 minutes on the court):

4.330 fouls/game – black players
4.970 fouls/game – white players

The authors are quick to point out that this statistic, by itself, is not evidence of discrimination; the two groups are probably just different. "The difference in foul rates," they write, "largely reflect the fact that white players tend to be taller, heavier, and more likely to play center than black players."

(By the way, the Times article gets this wrong; in its second paragraph, it says "white referees called fouls at a greater rate against black players than against white players." That isn't true – both white and black referees called significantly more fouls against white players.)

But even if white players take more fouls than black players for non-racial reasons, you can still check for referee bias. There are three referees per game, and you can measure the difference in black/white fouls with more black referees, and compare it to the same stats with fewer black referees. The results:

0 white refs: whites take 0.827 more fouls than blacks
1 white ref: whites take 0.675 more fouls than blacks
2 white refs: whites take 0.654 more fouls than blacks
3 white refs: whites take 0.574 more fouls than blacks
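To put a number on how much the gap shrinks across crew types, here is a minimal Python sketch using only the four figures quoted above. The names `gap_by_white_refs` and `gap_change` are made up for illustration, and the per-referee figure is a plain average of the raw gaps, not the regression estimate from the paper.

```python
# Foul-rate gaps (white minus black, per 48 minutes) quoted in the post,
# keyed by the number of white referees on the three-man crew.
gap_by_white_refs = {0: 0.827, 1: 0.675, 2: 0.654, 3: 0.574}


def gap_change(gaps, start, end):
    """Change in the white-minus-black foul gap between two crew types."""
    return gaps[end] - gaps[start]


# Going from an all-black crew to an all-white crew.
total_change = gap_change(gap_by_white_refs, 0, 3)
# Simple average over the three referee slots (an illustrative simplification).
per_ref_change = total_change / 3

print(f"all-black to all-white crew: {total_change:+.3f} fouls per 48 minutes")
print(f"average per added white ref: {per_ref_change:+.3f}")
```

On these raw numbers, the white-minus-black gap narrows by about a quarter of a foul per 48 minutes between the two extreme crews.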

Now, if you're a skeptic like me, the first thing you usually do, after a result like this, is to think of other reasons that the effect shows up. For instance, in baseball, if someone found that player X hits better on weekends than weekdays, you'd say, "Aha! That's because weekday games are played during the day, and weekend games at night. So what's really being measured is day/night effects, not weekend/weekday effects!" And you'd probably be right.

Is there a similar argument that can be made in this study? Nope. The equivalent to weekend/weekday in this study is white refs/black refs. So to make the day/night argument, you have to find something that correlates with the race of the referees. But you probably won't – because the NBA assigns the referees almost completely at random!

So it doesn't matter what internal correlations there are within the rest of the data. Maybe games in certain arenas are more aggressive, and should have more fouls. It doesn't matter, because black and white referees are equally likely to be assigned to those games. Maybe different players have different styles. It doesn't matter, because black and white referees are equally likely to be assigned to those players. And so on, for anything you can think of.
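A toy simulation can illustrate the argument. The sketch below uses entirely synthetic data, not anything from the study: each game gets a hypothetical "aggressiveness" score, and each of the three referee slots is filled by a white referee with probability 0.67, independently of the game. The function name `simulate_games` and all parameters are assumptions for illustration.

```python
import random
from collections import defaultdict


def simulate_games(n_games=20000, white_ref_share=0.67, seed=0):
    """Return mean game aggressiveness grouped by number of white refs.

    Synthetic data only: aggressiveness is a standard normal draw, and the
    crew is drawn independently of it, mimicking random assignment.
    """
    rng = random.Random(seed)
    by_crew = defaultdict(list)
    for _ in range(n_games):
        aggressiveness = rng.gauss(0.0, 1.0)
        # Each of the three referee slots is white with the given probability.
        white_refs = sum(rng.random() < white_ref_share for _ in range(3))
        by_crew[white_refs].append(aggressiveness)
    return {k: sum(v) / len(v) for k, v in sorted(by_crew.items())}


crew_means = simulate_games()
for white_refs, mean_aggr in crew_means.items():
    print(f"{white_refs} white refs: mean aggressiveness {mean_aggr:+.3f}")
```

Because the crew is drawn without regard to the game, the four group means all land near zero. Whatever differences remain are chance, which is the part the significance tests are meant to handle.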

Now, it *is* possible that despite the referees being assigned randomly, whites were assigned to more high-foul games just by chance. But that's what the significance tests are for, and it turns out the probability of that is very small. And, just to make sure, the authors ran a couple of other tests, and couldn't find any significant differences between the white-referee games and the black-referee games.

And, just to make completely sure, the authors ran a bunch of regressions that corrected for all kinds of factors: player characteristics, home/road, stadium, and so on. They even ran one mega-huge test, where they used dummy variables for every different player, and for every different referee! And for all those tests, the measured effect of the race of the referee was pretty much the same. And that's as expected, because, no matter how many factors you correct for, the referees were assigned randomly to those factors anyway. The only possibility is that some weird random things happened, like white referees were assigned to all the games with the basketball equivalent of Donald Brashear – but the authors' extra tests show that didn't happen.

So the differences are real. The next question is, what do they mean? And do they prove racial bias against blacks? I'm thinking about some of this stuff, and I'll probably discuss some of it in another post, which is why this is "part 1".

For now, I'll list a few more of the findings from the study:

1. The effect is mostly limited to the frequency with which referees call fouls on whites.

Blacks are called pretty consistently regardless of the race of the referee. This is kind of interesting, but I'm not sure it necessarily means anything (more next post).

2. There is a similar race effect based on the victim of the foul, not just the perpetrator. It's roughly equal in size. This means that in some ways, the effect is double what it appears to be from the above results.

Since box scores don't identify the victim, his race had to be approximated by the "average race" of the team. This increases the variance of the estimates, but they were still all statistically significant.

3. If I understand the regression coefficients properly, the real-life effect is this: Going from the extreme case of "all black players and all white referees" to "all white players and all white referees" would save a team about 2 extra foul calls a game (one less as perp and one more as victim). It also leads to a 5 point difference. (This assumes that the new white players are exactly as likely to commit the same foul-related plays as the black players. In real life, white players foul more.)

However, this 5 points is an unrealistic measure of the effect. In real life, 67% of referees are white, and there are five players on the court. A more realistic example: if you replace one black player with a white player, you improve by 67% of 1/5 of 5 points, which is 2/3 of a point. That's still pretty significant, but doesn't really affect strategy much at all.
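The scaling in that example can be written out as a short Python sketch. The function name `realistic_point_effect` is hypothetical, and it assumes the effect scales linearly with both the share of white referees and the share of the lineup that changes, which is the same assumption the back-of-envelope figure makes.

```python
def realistic_point_effect(extreme_points, white_ref_share, players_on_court):
    """Scale the extreme-case point swing down to swapping a single player.

    Assumes the effect is linear in the white-referee share and in the
    fraction of the lineup that changes (one player out of five).
    """
    return extreme_points * white_ref_share / players_on_court


effect = realistic_point_effect(
    extreme_points=5.0,    # all-black vs. all-white lineup, all-white crew
    white_ref_share=0.67,  # share of referees who are white
    players_on_court=5,
)
print(f"{effect:.2f} points per game")
```

With the post's inputs this comes to roughly two-thirds of a point per game.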

And keep in mind, too, that if two teams have an equal percentage of minutes played by whites, they'll cancel out and neither team has the advantage.

Also, note that if referee bias gives whites fewer fouls, that will show up in their statistics (fewer turnovers, for instance, since an offensive foul counts as a turnover). So the "Moneyball" effect is not 100% of the race effect – some of the advantage should be priced into white players' salaries already, at least that's true for the part of the advantage that shows up in stats. For the other part – for instance, if white players are more likely to draw opposition fouls, but teams don’t keep track of that -- maybe there's a slight opportunity to exploit there.

Overall, I think the race effect doesn't really affect the outcome of the games very much. But, obviously, the study raises more important issues than just basketball strategy.

17 Comments:

I'll try to comment substantively for the next post. But for now, here's a link to the discussion on the APBRmetrics board. My thoughts: the study was well conceptualised and performed, but the reported effect sizes are very small.

Phil (or anyone who knows): Are ref assignments random at the individual level, or the squad level? i.e. is an all-white (or all-black) squad the same 3 guys working together for a season, or are the squads constantly scrambled? I think that might matter, especially given how different the single-race squads look on some dimensions.

But how often are they scrambled? That is, three black refs may be randomly assigned to one crew, and their game assignments are essentially random. But do these 3 guys work together for one game? A full season? Multiple seasons? If they stay together, then all the results for "0 white" crews may reflect the actions of just 8-9 black refs over these years. Even the all-white crew results might be impacted by 1 or 2 "rogue" crews, if they stayed together a long time.

Absolutely, and the authors of the study are very careful to check the randomness assumption. They write,

"... groups of three referees [work] together for only a couple of games before being regrouped. [E]ach referee ... is not allowed to officiate more than nine games for any team or referee twice in a city within a 14 day period."

They also compare "number of white referees per game" to various other game factors (attendance, team, etc.) and find no significance.

Got it, thanks. Sounds like very good randomization. But it also makes the results more surprising: I can believe that three white refs working together for a while would change their behavior, but it's harder to see how it makes a difference given the rotation that occurs.

BTW I uploaded my NBA officials dataset to Swivel about a month ago. (The timing of this paper was just a coincidence.) If anyone wants to take a look at it. Officials are identified by name, not race.

1. I'm looking forward to your thoughts in Part II about why white player foul delta (relative to # white refs) is 10 times greater in magnitude than black player foul delta. I imagine your response will touch on the question of "who is doing the discriminating?"

2. While the NBA wouldn't share the data from their evaluation model, we know the results of that model (which refs work the post season). Wouldn't it be interesting to pool games into groups where these "top" refs were present vs. not? Would we see the level of bias decrease as a function of NBA-perceived accuracy?

3. Would the introduction of a latin/asian player group act as a control to test how the discrimination is occurring? (maybe sample size 10 years ago too small...)

1. Yes, exactly ... suppose black referees call tighter games in general than white referees. Then the fact that black refs don't call more fouls against black players than white refs could be taken as evidence of (favorable) discrimination by black refs against black players.

That is, there are many different hypothetical scenarios that are consistent with the data.

Geez, there goes part 2. :)

2. That would make an excellent study.

3. That would be good too. I'm also thinking American white vs. non-American white would also be helpful. What if it turns out that the entire effect is Europeans, and that American whites are treated no differently from American blacks? Then it would be that referees discriminate differently by style of play, or something, rather than race.

Phil: I posted the following over at Sports Economist in response to you on the issue of whether different player "styles" could account for some/all of the report's findings. Interested in your thoughts.

"You may be right, but what if white players disproportionately commit certain types of fouls more than blacks, and other fouls less? And what if black and white refs, for cultural/historical reasons, also employ different standards for different types of fouls -- blacks quicker to call X foul, but slower to call Y? And suppose these both reflect differences in the "black" and "white" game, such that white refs are more sensitive to the kind of fouls committed more by black players. Wouldn't that then create the result the authors found? I know the models had various controls for players and position, but since they are always dealing with "fouls" as a combined category, I'm not sure the controls would catch this kind of correlation."

Phil: I agree it could also be a North American/European thing, not black/white. (Those of us who follow hockey could agree here.) Or it could be a rogue official. We know it's something.

Going back to the officials in hockey, when the 1987 Canada Cup (Canada v USSR in the Finals) happened, the Russians *insisted* on having a Canadian referee, not American (all referees were from the NHL). Imagine that. The Russians so distrusted the Americans that they'd rather have the home referee. I think it was Don Koharski. No better compliment to him, I think.

Interesting: The authors appear to have a foreign-born identifier in their dataset, as it's included in the table 2 comparison of white and black players. But it's not clear if that was included in the regression model. Could certainly play a role.....

I agree with you 100% ... I was actually going to say the same thing, in almost the same words. Now I'll just quote you.

In my comment at The Sports Economist, I should have said that player styles can't matter unless there is also variation in referee styles. I think the argument I was rebutting didn't talk about referee styles.