Sabermetric Research

Sunday, November 25, 2012

Another NBA race study

A recent academic paper claims to prove that NBA coaches discriminate in favor of players of their own race, by giving them extra playing time. (It's been mentioned in the press here and here.)

Unlike some of the other race studies I've written about, where the problems were subtle, the problem with this one is obvious.

------

The authors start by showing differences between white and black players. From 1996-97 to 2004-05 -- the seasons covered -- the black players performed better than the white players. In the average of their previous 20 games, they scored 1.4 more points per 48 minutes, and had more assists and steals. The white players, on the other hand, committed fewer turnovers, and grabbed more rebounds.

The white players' advantages seem smaller, and that's borne out by the fact that the black players got 4.8 minutes more playing time per game. So, the black players seem generally better than the white players.

Having shown that, the authors now run a regression on a whole bunch of stuff -- including performance stats -- to predict minutes played.

Before the regression, the black players got 4.8 minutes more on the floor than the white players. After the controls, though, it goes the other way: being black *decreased* playing time by 2.9 minutes. Clearly, the regression doesn't do a particularly good job predicting minutes played.

Remember, the most basic comparison possible showed that the black players played 4.8 minutes more than the whites. After controlling for everything in the regression, the difference is still 2.9 minutes. Even after all that regressing, there still appear to be large unexplained differences between whites and blacks.

It's pretty obvious why this might happen: playing time isn't linear. If you have twice the per-minute stats of Kobe Bryant, you're not going to play 80 minutes a game. And if you have only 1/10 the stats, you're not going to play 4 minutes: you're going to be out of basketball.

So the model is very poor. And, since whites and blacks appear to be very different in their statistical characteristics, the model is inaccurate for them in different ways.

So if the black players get 2.9 minutes less playing time than the regression thinks they should, it's probably that the model overweights the things black players do, and underweights the things white players do.

In summary, the model the authors used overpredicts for black players -- even ignoring the race of the coach.

-------

So, what happens when the authors include a dummy for the player and coach being of a different race?

Well, most coaches are white, and most players are black. So, "white-coach/black-player" is going to be much more frequent than "black-coach/white-player". If the ratios are 70/30 in both cases, the "different race" bucket is going to be 84 percent black players.
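To see where the 84 percent comes from, here's a minimal Python sketch. The variable names are mine, and it assumes (as the 70/30 example implicitly does) that a coach's race is independent of his players' race.

```python
# Assumed shares from the paragraph above: 70% of coaches are white,
# 70% of players are black, and coach race is independent of player race.
p_white_coach = 0.70
p_black_player = 0.70

# The two ways a player and coach can be of different races.
white_coach_black_player = p_white_coach * p_black_player
black_coach_white_player = (1 - p_white_coach) * (1 - p_black_player)

# Share of the "different race" bucket made up of black players.
different_race = white_coach_black_player + black_coach_white_player
black_share = white_coach_black_player / different_race

print(f"different-race share of all pairs: {different_race:.0%}")
print(f"black players within that bucket:  {black_share:.0%}")
```

So roughly 49 of every 58 "different race" observations are black players under a white coach.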

And we know the model overpredicts for black players. And that's why, when the player is of a different race than the coach, he gets less playing time than the model thinks he should. Because, 84 percent of the time, he's black, and the model is biased too high for black players.

It's not necessarily race bias at all; it's just a consequence of having a bad model.

Monday, November 19, 2012

Bicycle helmets III

Last post about bicycle helmets, I mentioned that their defenders talked about risk reduction, but never brought out any actual risk numbers. So let me do that now.

Before I start, let me ask you: how low would the risk have to be for you to withdraw your recommendation that I wear a helmet? You can't insist on zero, because then you'd have to wear a helmet as a driver and pedestrian. So you must have some minimum risk level in mind. Maybe not a real number, but an intuitive idea.

So, let me ask you: suppose a helmet saved 100 lives, per million people, per year. Is that low enough that you'd realize it's not worth it? What about 50 lives? 10 lives? 1 life? Half a life? One-tenth of a life?

Think about it, before you read on and I look at some actual data.

-------

During a Google search, I found this Coroner's Report from the Province of Ontario, that looked at all bicycling deaths over a five year period, 2006 to 2010. I'm sure there are other sources of data, and I expect the numbers to be roughly the same. (In any case, I haven't analyzed many multiples of sources and picked the one most favorable to my argument. And, indeed, even if the numbers were off by double, or half, I don't think it would make that much difference.)

In that five year period, there were 129 cycling deaths in Ontario. The report breaks them down in a number of ways, including by helmet use. It says:

"In this Review, only 34 of 129 cyclists (26%) sustaining a fatal injury were wearing a helmet. Of particular concern was that observation that, despite existing legislation, only 1 of 16 cyclists (6.25%) under the age of 18 who died were wearing a helmet.

"In 71 of the 129 cases (55%), the cyclist sustained a head injury which caused or contributed to their death. In 43 of those 71 (60%), a head injury alone (with no other significant injuries) caused the death. Those whose cause of death included a head injury were three times less likely to be wearing a helmet as those who died of other types of injuries."

As an aside, I find the presentation a bit biased ... it seems more an attempt to promote helmets than to give us the actual data. I'd expect that in an op-ed or an internet comment, but not in a coroner's report. (And, what's with "three times less likely"? Taken literally, that means that some people were brought back from the dead. I'm going to assume they mean "one-third as likely.")

Anyway, I had to solve a couple of equations to get what must be the real numbers:

-- of those who died (at least partially) of head injuries, 10 of 71 (14%) were wearing helmets.
-- of those who died of other causes, 24 of 58 (41%) were wearing helmets.

That makes the numbers work out: 41% is around three times 14%.
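For anyone who wants to check the "couple of equations," here's one way to set them up in Python. It's a sketch under the same reading of the report as above ("three times less likely" taken as "one-third as likely"); the variable names are mine.

```python
# Figures quoted from the coroner's report.
total_deaths = 129
helmeted = 34          # helmet wearers among all deaths
head_injury = 71       # deaths where a head injury caused or contributed
other = total_deaths - head_injury   # 58 deaths from other causes

# Let h = helmet wearers among the head-injury deaths.
# "One-third as likely" means:  (helmeted - h) / other = 3 * h / head_injury
# Solving that linear equation for h:
h = helmeted * head_injury / (head_injury + 3 * other)

h_rounded = round(h)                     # whole cyclists
other_helmeted = helmeted - h_rounded    # helmet wearers among the rest

print(f"h = {h:.2f}, rounded to {h_rounded}")
print(f"head-injury deaths: {h_rounded} of {head_injury} "
      f"({h_rounded / head_injury:.0%}) wore helmets")
print(f"other deaths: {other_helmeted} of {other} "
      f"({other_helmeted / other:.0%}) wore helmets")
```

The exact solution isn't a whole number, which is why the "three times" only works out approximately.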

So, how many lives would helmets have saved?

Well, if helmets had nothing to do with head injuries, we'd find 41% of riders who died from head injuries had been wearing helmets. We'd still have 10 deaths from the helmet wearers, but now we'd have only 14 non-helmet wearers (that is, we'd have 10 of 24, rather than 10 of 71). So, the extra 47 deaths would have been averted by helmets.

(This is consistent with other reports that helmets cut deaths by 30 percent: 47 deaths out of 129 works out to 36 percent, which is close.)

Those 47 deaths were over five years, so around 10 deaths per year.

The population of Ontario is about 13 million. So, the annual death risk from not wearing a helmet is around 1 in a million. Not everyone is a bike rider, though, so let's maybe call it 1 in 250,000.
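Here's that chain of arithmetic as a short Python sketch, from "deaths averted" down to the per-person annual risk. The variable names are mine, and the 13 million population is the round number used above. The sketch doesn't round at the intermediate steps, so it comes out slightly under the "10 deaths per year" figure.

```python
# Counts derived earlier in the post.
head_helmeted = 10                 # helmeted cyclists who died of head injuries
head_total = 71                    # all head-injury deaths
baseline_helmet_rate = 24 / 58     # helmet rate among non-head-injury deaths

# If helmets did nothing for head injuries, those 10 helmeted deaths would
# make up about 41% of the head-injury deaths, not 14%.
expected_head_total = head_helmeted / baseline_helmet_rate
averted = head_total - expected_head_total

years = 5
ontario_population = 13_000_000    # rough figure used in the post
averted_per_year = averted / years
one_in = ontario_population / averted_per_year

print(f"deaths attributable to helmetlessness: {averted:.1f} over {years} years")
print(f"per year: {averted_per_year:.1f}")
print(f"annual risk across the whole population: about 1 in {one_in:,.0f}")
```

That lands in the neighborhood of 1 in a million before adjusting for the fact that not everyone rides.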

-------

To put that in perspective: for driving, the death rate in Canada is 9.2 people per 100,000, which is around 1 in 11,000. You're around 20 times more likely to die in a car crash than you are to die a non-helmet-related death in a bike crash.

Therefore, the increase in safety from wearing a bicycle helmet is the equivalent of improving your driving safety by 5%.

A more intuitive way to equate the risk is by driving distance. There are 8.2 fatalities per billion kilometers driven. To get the equivalent risk of riding helmetless for a year -- 1 in 250,000 -- you only need to drive 487 kilometers -- once. That's 300 miles. One five hour drive.
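The distance conversion can be checked with a few lines of Python. This sketch takes the 8.2 fatality figure as per billion kilometres, which is an assumption on my part, but it's the reading that reproduces the 487 km number. The variable names are mine.

```python
# Assumption: 8.2 road fatalities per billion vehicle-kilometres.
fatalities_per_billion_km = 8.2
risk_per_km = fatalities_per_billion_km / 1e9

# Annual risk attributed to riding without a helmet (from the post).
helmetless_risk = 1 / 250_000

# Distance you'd have to drive, once, to take on the same risk.
equivalent_km = helmetless_risk / risk_per_km
equivalent_miles = equivalent_km / 1.609344   # kilometres per mile

print(f"{equivalent_km:.0f} km, or about {equivalent_miles:.0f} miles")
```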

I drive from Ottawa to Toronto around once a month, for about 500 miles. To those of you who are scared for my helmetless head: why aren't you scared about my monthly trip? Why aren't you also demanding that I take a plane, or a train?

For those of you who worry about my helmetless head costing tax money in medical care ... if I eliminate one trip to Toronto, will you now be happy, since I'm actually saving you more money than if I wore a helmet all year?

-------

Now, you might say, a 5 percent increase in driving safety is a lot! Well, you want to know an easy way to increase driving safety by 5%? Don't drink.

The legal breathalyzer limit, most places, is .08%. In Ontario, they take your license for a day if you blow over .05%.

But, .03% is perfectly fine. Aside from the most rabid MADD activists, nobody's going to demand you take a cab at .03%, which is just a couple of drinks. They'll just say, "be careful".

Well, those two drinks increase your risk by more than 5 percent. You can see why, if you criticize me for not wearing a helmet, when you happily drive home at .03%, I'm going to be irritated at your double standard.

Oh, and when I say "more than 5%", I mean, according to this study (.pdf), 190%. In that light, I'm sure even ONE beer will increase your risk more than 5%. If you assume a constant risk increase (like compound interest), a level of .01% raises the risk 43%. That's not even a whole beer ... well, maybe a whole beer in Oklahoma supermarkets.
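The "compound interest" step works like this, as a minimal Python sketch. It assumes, as the paragraph does, that each additional .01% multiplies the risk by the same factor; the variable names are mine.

```python
# A 190% increase at .03% means the risk is 2.9 times the sober baseline.
multiplier_at_03 = 1 + 1.90

# Constant compounding: three equal steps of .01% each multiply to 2.9,
# so one step is the cube root.
multiplier_per_01 = multiplier_at_03 ** (1 / 3)
increase_at_01 = multiplier_per_01 - 1

print(f"risk increase at .01%: {increase_at_01:.0%}")
```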

-------

The Ontario report also notes that in 21 of the 129 deaths, the cyclist was encumbered by unwieldy cargo, like shopping bags hanging from the handlebars. That's 16 percent of deaths.

My own observation is that far, far fewer than 16 percent of cyclists are doing that. Let's suppose it's 8 percent, which is probably still too high. That would suggest that shopping bags increase the risk by around 100%. Actually, it's probably much more than that: if you're wobbling about with groceries, you probably aren't out for a 25 mile ride. I bet the risk is at least triple.

Helmetlessness, on the other hand, is only around 40%.

Oh, and, by the way ... I don't ride encumbered. I have panniers. So you can reduce my risk by 16 percent right there.

It looks like cars are the main danger, doesn't it? Suppose you don't ride on roads much, just on bike paths (like I do). You'd think the risk would drop by 80 percent. That is: riding on roads is 400 percent riskier than riding off road.

I see lots of cyclists using the road, when there's a perfectly good bike path right beside them, that goes roughly the same place. Why aren't you upset about their risk?

More important for me, you have to discount *my* risk. Instead of 1 death in 200,000 per year, it's probably ... well, it's probably around 1 death in a million per year. Generally, I'm only on major roads when I'm crossing them, or using them to get from one bike path to another.

So, even if I had been riding my bike, without a helmet, since the birth of Jesus ... I'd still only have had a 1 in 500 chance of being killed, in that time, by my bare head.

-------

Last post, commenter Swoods made me realize that we can quantify the risk pretty well. Economists have studied the value of a life, in terms of how much extra money people are willing to give up, or earn, for a given probability of death. (Like, for instance, coal miners, who have a higher risk than, say, gardeners.)

A typical number is $6,000,000. That sounds reasonable.

(For instance: there are around 47,000 underground coal mining jobs in the US. Thirty deaths a year is typical. At $6 million per death, miners would have to earn an extra $4,000 or so to compensate them for the risk. That doesn't seem out of line.

Other such calculations come up with similar, reasonable numbers.)

Suppose my helmetless risk is 1 in 500,000. That's $12 a year.

But, to be honest, I probably value my life at more than $6MM. So let's double it, to $24. Let's double it again, to take injury into account, instead of just death. Now we're at $48.
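The dollar figures are just risk times the value of a life. Here's a short Python sketch of that, including the coal miner sanity check from a few paragraphs up. The variable names are mine; the inputs are the ones quoted in the post.

```python
value_of_life = 6_000_000          # typical economists' figure, in dollars

# Coal miner check: 30 deaths a year spread over 47,000 jobs.
miner_premium = 30 * value_of_life / 47_000

# Helmetless riding at an assumed 1-in-500,000 annual risk.
helmetless_odds = 500_000
base_cost = value_of_life / helmetless_odds

# Double once for valuing my own life higher, once more for injuries.
doubled_for_own_value = base_cost * 2
doubled_for_injury = doubled_for_own_value * 2

print(f"miner risk premium: ${miner_premium:,.0f} a year")
print(f"helmetless: ${base_cost:.0f} -> ${doubled_for_own_value:.0f} "
      f"-> ${doubled_for_injury:.0f} a year")
```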

I'm willing to accept that. It's not an unreasonable risk I'm taking, at all, $48 worth to not wear a helmet. In fact, I bet if someone invented an invisible "force field" helmet, that let you get all the safety of a helmet without having to wear one -- and it was $48 a year -- many of you would buy it.

------

It occurs to me, as I write this, that this is perhaps a reasonable framework for deciding what risks are OK and which aren't. Look at the cost to the risk-taker, and the benefit. $48 a year for helmetlessness is cheap. $4800 wouldn't be. I'd probably be willing to consider helmet laws if the risk were 100 times as high.

According to this story, not wearing a seat belt makes you 47 times as likely to die in a car crash. That gives you a 1 in 100 chance of dying per year, assuming you drive 25,000 km. That's huge, much higher than I would have expected. It's $60,000 worth of risk, compared to only $1,230 for the rest of us.
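Here's a rough Python sketch of how those seat belt numbers fall out. It reuses the 8.2 fatalities per billion figure from earlier, taken as per kilometre (an assumption, but it's the one that reproduces the $1,230), and it treats that average rate as the belted driver's rate, which is a simplification. The variable names are mine.

```python
value_of_life = 6_000_000
fatalities_per_billion_km = 8.2    # assumed to be per vehicle-kilometre
km_per_year = 25_000

# Annual risk and dollar cost for a driver wearing a seat belt.
belted_risk = fatalities_per_billion_km / 1e9 * km_per_year
belted_cost = belted_risk * value_of_life

# The story's claim: 47 times as likely to die without a belt.
unbelted_risk = 47 * belted_risk
unbelted_cost = unbelted_risk * value_of_life

print(f"belted:   1 in {1 / belted_risk:,.0f} per year, ${belted_cost:,.0f}")
print(f"unbelted: 1 in {1 / unbelted_risk:,.0f} per year, ${unbelted_cost:,.0f}")
```

The unbelted cost comes out a bit under the rounded $60,000, close enough for this kind of comparison.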

So, I can see why you might want to nag a loved one who won't use his seat belt, and even pass a law. It's over a thousand times different.

------

The numbers show that the risk of riding without a helmet is quite low.

In terms of added risk, it's 8 times as "riskier" to have a beer before driving. It's more than twice as "riskier" to ride a bike at all (compared to staying home). It's probably twice as "riskier" to speed in a car by 5 mph. It's ten times as "riskier" to let your teenager drive you to the mall, instead of you driving yourself (same .pdf as before). And it's more than 100 times as "riskier" to not wear a seat belt, which is the activity that it's most compared to.

Even if you never do any of these things -- and, by the way, I won't believe you if you tell me you don't -- I'm sure I can find other things in your life that are riskier than riding without a helmet. Your diet, your sex life, your hobbies, your vacations ... you take comparable risks all the time.

So, again, why are you picking on bicycle helmets? Why are your risks OK, but not mine?

Saturday, November 10, 2012

Bicycle helmets II

There's a lot of bicycle helmet activism going on these days ... it's a big issue, trying to force riders, by law or moral suasion, to wear a helmet.

I don't get it. What's going on? Why are my friends so horrified when I ride bare-headed, especially when there are lots of other things that don't horrify them? In fact, I bet they'd laugh at me if I decided to wear a helmet when driving, even though, to me, it doesn't seem that much different.

Last month, I asked for a "framework" for how you decided that helmets should be worn (for those of you who believe that). I figured this was a good audience to ask, since we're a group that has repeatedly tried to set frameworks for other questions, like Hall of Fame credentials. We criticize those who want Jack Morris in the Hall, because we don't see any principle by which Morris belongs but, say, Rick Reuschel doesn't. If you consider that helmetless riding belongs in the societal "Hall of Shame," then, I asked, what's the principle by which helmetless driving doesn't?

I didn't really get any serious answers, either here or at Tango's blog. I did get a few responses that tried a justification, but that don't hold up.

One answer was, riders should be forced to wear helmets because, when they get injured, it imposes costs on the rest of society -- such as, for instance, the cost of medical care. But ... almost all activities impose costs on others. People get hurt riding even with a helmet, and people get head injuries in car accidents. Why are those things OK for society to pay for, but not riding without a helmet?

In fact, I bet just one baseball pitcher costs society more than ten helmetless riders. Arm injuries are pretty much inevitable for pitchers. How many extra medical appointments do you think the average pitcher uses up? But we don't ban pitching.

At best, the "it imposes costs" explanation is just question begging. At worst, it's an attempt to deflect the question by asserting a principle that we're not actually willing to follow.

Another answer I got was, "because that's what society determines is unacceptable." Which, again, is begging the question. No religious activist would accept that abortion should be legal because "society" has "determined" it's "acceptable". And no gay person would accept those reasons for why he shouldn't be allowed to marry his partner.

This answer is actually indistinguishable from "because we say so." It differs only in that it uses bigger words that make it sound like it means something meaningful and important.

A third answer I've heard is, "I had a really bad accident and a helmet saved my life." A good friend here in Ottawa actually told me that -- he wiped out and broke his collarbone, among other injuries, and it took him a long time to recover. The crash broke his helmet into two pieces, he said.

But ... that's not really a principle. It's an anecdote. I think everyone should understand that one person's experience shouldn't bind the rest of us.

Just to hammer home the point ... here's an actual news story about a child that was shot while riding in a car -- but she survived because a bible slowed the bullet down. Her sister was spared because the bullet later lodged in a watermelon on the child's lap.

Should we legislate that we all carry watermelons on our laps from now on?

-------

With respect, these rationalizations are pretty weak. It does seem to me like many respondents started with the conclusion -- *of course* you should wear a helmet! -- and then tried to build a set of principles around it, which didn't test out.

I think it also goes to show that the issue is sensitive and emotional, when a community that's so used to objectively analyzing conventional wisdom still comes up with these kinds of responses. If someone had made these arguments about wearing a helmet when *walking*, I think the flaws would be pointed out in seconds.

What I was expecting was something to do with risk, with costs and benefits. And that did come up. A few people pointed out, reasonably, that the risk of helmetlessness was higher for bicycling than for driving. From there, I expected the discussion to start talking about the actual risks. Maybe someone would look up some numbers, and say, "helmets would save X lives for bicycling, but only Y lives for driving, and I believe the cutoff for forcing people to wear them should be between X and Y." And then, we could go from there, trying to figure out what the threshold should be, just like we try to figure out the threshold for Hall of Fame admission.

But that didn't happen. I don't think *anyone* talked about how big the risk has to be, even in broad terms. I don't think there was any serious attempt to set a principle.

As for benefits ... I think someone mentioned that helmets in cars were a bigger inconvenience -- and therefore bore a bigger cost -- than the inconvenience of helmets for cyclists. I'm not sure about that ... why would it be different? It's the same helmet! Regardless, if you acknowledge that comfort and convenience are factors, then, shouldn't you try to figure out how high the costs are for cyclists? Clearly, riders without helmets at least appear to have different preferences than other riders. So the costs aren't obvious.

And people are different. If I love cucumbers and you don't, and I want to pass a law forcing everyone to eat cucumbers because they have health benefits ... it'll take you about ten milliseconds to tell me, emphatically, that you hate cucumbers and who am I, who loves them, to force my preferences on you? But, for bicycle helmets, the issue never even came up.

-------

As for the costs ... on almost any scale, the risk of helmetless cycling is small. Here's an article that claims a helmet can save around 30 percent of cycling deaths. (It's not clear from the article, but I believe that's comparing 0% helmets to 100% helmets.)

Is 30 percent big? No way. 30 percent is nothing. I'm pretty sure that riding at night increases risk by at least 30 percent. I'd bet you that riding in rush hour traffic increases risk by at least 30 percent. I'd bet you that riding fast increases risk by at least 30 percent. In fact, I'd bet riding on the roads, instead of bicycle paths, at least doubles the risk.

Do you want to ban those, too? If you don't, then, since the cost is higher, you must believe the benefit is higher, too. Fine. But it's different for different people. I don't like riding at night, and I seldom do it. Banning night riding costs me almost nothing. Banning helmetless riding costs me a lot.

If you want to allow night riding but not helmetless riding, logic forces you to admit that you are weighing someone else's benefit more than mine. You're not tallying the actual cost to me, but, rather what you think the cost to me *should* be. I *should* be willing to ride at night, because, that's worth the risk! But not riding without a helmet, because that's not!

In fact, it's the very definition of unfair: banning something for no reason except that we believe you shouldn't want to do it.

Now, you could come back with this response: OK, we acknowledge that you hate wearing helmets, and the risk isn't that big. But, Phil, you're just being obstinate. Once you wore a helmet, you'd get used to it, and your cost would drop to zero! Long term, the benefit way outweighs the cost. The law is our way of forcing you to give it a try!

That's the most reasonable argument I can think of, and I was expecting to get that from my original post. And, you know, it may be true, that once I started wearing a helmet, I'd get used to it.

But I might not. And even if I do, who's to say that the benefit even outweighs the temporary unpleasantness?

More importantly, do you want to live by that principle for everything? Are you willing to submit to it yourself?

"Candy is unhealthy, so we're going to ban it. You may not want us to ban it, but you'll get used to it and be glad you're not eating it." "McDonald's is icky, so we're going to ban it. You might not like eating at nicer restaurants at first, because they cost more and they're slower, but eventually you'll get used to it and not mind the extra time and money." "PBS political specials may be boring, but they broaden your mind and make the population better informed, so we'll fine you if you don't watch them. Don't worry, eventually you'll actually start to enjoy them."

There are literally hundreds of things you do in your life that increase your risk of something bad happening, and many of those you'd eventually get used to not doing. That doesn't mean it's right for me to ban them.

-------

And what about just riding less? A safety factor of 30 percent means that seven trips without a helmet are just as risky as ten trips with a helmet.

Suppose your spouse or child is the one commuting without a helmet, and that concerns you. You want him to wear a helmet, so you don't worry. You know he really, really hates helmets, and he'd never get used to one, but you want him to wear one anyway.

He thinks about it, and finally he says,

"You're right, the risk of riding without a helmet is higher, and I know you're concerned about my safety. But I hate helmets, so I'm going to reduce the danger another way. Instead of commuting to work by bike five days a week, I'm going to commute to work only three days a week, and take the bus the other two days. So I'll reduce my risk, not just the 30 percent you wanted, but a whole 40 percent! That's even though I'd rather ride all five days. See how much I love you?"

Would you be happy? Remember: the overall risk is LOWER than if he commuted daily with a helmet.

I'm betting you wouldn't be much happier at all. Can you say why?
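To make the comparison concrete, here's a small Python sketch of the trip arithmetic. It works in arbitrary "risk units" (100 per bare-headed trip, 70 per helmeted trip, per the 30 percent figure). The units and variable names are my own illustration, not anything from the article.

```python
# Arbitrary risk units per trip: a helmet cuts risk by 30 percent.
risk_bare = 100
risk_helmeted = 70

# Seven bare-headed trips versus ten helmeted trips.
seven_bare = 7 * risk_bare
ten_helmeted = 10 * risk_helmeted
print(seven_bare, ten_helmeted)

# The commuter's offer: three bare-headed days instead of five.
five_days_bare = 5 * risk_bare
five_days_helmeted = 5 * risk_helmeted
three_days_bare = 3 * risk_bare

# Percentage reduction relative to riding bare-headed all five days.
reduction_pct = 100 * (five_days_bare - three_days_bare) // five_days_bare
print(f"three bare days: {three_days_bare} units, "
      f"five helmeted days: {five_days_helmeted} units, "
      f"reduction: {reduction_pct}%")
```

Three bare-headed days carry less total risk than five helmeted ones, which is the point of the thought experiment.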

-------

Here's another test. How fast do you drive? Faster than the speed limit?

If you drive 10 km/h (6 mph) over the limit, then, according to an Australian study, you increase your chances of having an accident -- not by 30 percent, but by over 400 percent. Now, that sounds too high. Another estimate (on the same web page) suggests it's only 30 percent, which sounds too low.

I bet the real number is somewhere in between ... but never mind. The point is, speeding is riskier than riding without a helmet, relatively speaking.

And, yes, I know speeding is already illegal. But it doesn't carry the same social stigma. I have had friends berate me for not wearing a helmet. If I berated them for driving 10 km/h over the limit, they'd think I was a creep. They'd tell me to mind my own business, and they'd probably avoid me. And they'd be right ... nagging someone for speeding is socially unacceptable.

That's true even though other people have a vested interest in your highway speed, since you're putting the rest of us at risk, not just yourself. And, even though that risk is higher than for riding a bike without a helmet.

My friend feels like he's doing a good deed by nagging me ... but I'm a creep if I nag back.

Clearly, it's not about the risk. It's something specific about the helmet. It's arbitrary.

Monday, November 05, 2012

Arguments vs. studies

Last post, I argued that if Toyota had raised the price of a Camry by one cent, it would have sold two or three fewer cars last decade. My argument went something like this:

"If Toyota raised the price by $20,000, it would have sold almost no cars, which is four million fewer than it did. That works out to two cars per penny, on average. You could argue that certain pennies had a larger effect, and certain pennies had a smaller effect, but the average has to be two cars per one cent increase. And there has to be at least ONE penny of increase that changes the expected number of cars sold, otherwise you'd never get from four million to zero."

I was inspired by a post from Bryan Caplan, that had come to the same conclusion with a different argument. Caplan's post was an example intended as a follow-up to one of his tweets:

"In social science, the best arguments prove more than the best studies. Hands down."

Absolutely right.

The argument I made convinced some of you. If I had tried to do a *study* to prove that ... well, I couldn't, really. Toyota doesn't vary their price by pennies, and, even if they did, there would certainly not be enough data. And there would be all kinds of other factors that you'd have to worry about. What if rich people were willing to pay more? What if Toyota raises prices on rainy days, and that accounted for the lower traffic? What about advertising campaigns, and recalls?

It just couldn't be done. If we insisted on a study, rather than an argument, we'd never have an answer.

------

On the same EconLib blog, David Henderson took Caplan one step further. Studies aren't just worse than arguments, Henderson said; they're almost useless!

"Economist Jeff Hummel said he couldn't think of even one controversial issue that had been resolved with econometrics. The other 4 economists present, including me, immediately started trying to think of counterexamples. The first one that came to my mind was Milton Friedman's consumption function. Jeff agreed that this had resolved an issue but pointed out that Friedman did it simply with data, not with econometrics. The other examples that the other economists came up with were similar: data had resolved the issue but it didn't require econometrics."

This echoes something I've been saying for a long time, for sabermetrics: complicated studies aren't needed. There are those who defend academic studies in sabermetrics, claiming that they're more rigorous and better evidence than what the "amateur" community has come up with. To them, I have issued a challenge -- show me just *one* academic study, or a study with a complicated methodology, that discovered something that couldn't be found using simpler methods. To date, nobody has replied.

Henderson's choice of words is interesting: the issue was resolved with "data" rather than "econometrics". I assume that's the same as "simple methods" and "complex methods".

If I'm interpreting it right, it goes a long way to explaining why academic journals won't publish studies that don't include regressions. They consider other methods to be just "data"!

------

I think that's a stunning admission, that fancy methods don't resolve issues. This is economics, a serious academic subject. But almost 100 percent of what gets published in academic journals -- even the most prestigious ones -- cannot resolve any issues! On the other hand, a simple argument in a single blog post can be totally convincing. And so can a simple study, one that's not deemed "rigorous" enough for publication.

But, I think it's true.

For you to decide an issue is "resolved", you need to understand it. Complex statistical studies are very, very difficult to understand, even for people who have been reading them for a long time. Some of the studies I've critiqued on this blog are like that ... it's taken me hours to figure out what's really going on, and what the regression really means.

Take something you believed for a long time, or something that seems intuitively obvious. Like, say, whether you'll sell fewer cars if you raise the price by a penny. And someone comes along and says, I did this really complicated study, and I've proved that, on average, two buyers quit over a single penny!

Are you going to change your mind? I bet none of you would. The study might be just plain wrong, and it's too complicated for you to get your head around to tell. In the best case, you might start to have a bit of doubt, and think, well, if a study shows it, maybe there's something to it. But, probably not.

But other people come along -- me, and Bryan Caplan -- and give you our arguments. Now do you change your mind? Some of you have!

Arguments can change minds. Complicated studies can't.

------

And this one particular argument, the one about the Camrys, is pretty simple. Even a child, I think, would understand the logic behind it.

Yet ... intelligent people disagree about it, and strongly. I'm absolutely sure it's right. You might be absolutely sure it's wrong. And we might both be of above-average intelligence, with no political stake in the argument, perfectly capable of understanding fairly complex mathematical principles, and both of us well-versed in analytics and sabermetrics.

But this simple argument, and we can't agree.

If that's the case, how is any complex sabermetric or econometric study going to be convincing?