The authors went through the Mitchell Report page by page, identifying every player alleged to have used steroids, and the specific seasons in which the use was supposed to have occurred. Then they checked whether, in those seasons, the accused batters showed any evidence of better hitting, compared to the large population of all other players not accused.

They adjusted each season for the age of the player. To get that adjustment, here's what they did:

-- calculate the average RC27 for hitters at each age
-- subtract that from the overall mean to get an age adjustment
-- correct each player-season RC27 by the amount of the age adjustment.
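If I'm reading the procedure right, it's a simple additive correction. Here's a minimal sketch; the player records and all the numbers are made up for illustration:

```python
# Sketch of an SSK-style additive age adjustment (my reading of it).
# Each record is (player, age, RC27); all values are made up.
seasons = [
    ("A", 27, 5.2), ("A", 28, 5.6),
    ("B", 27, 4.1), ("B", 28, 3.9),
    ("C", 28, 6.0),
]

# League-wide mean RC27 across all player-seasons.
overall_mean = sum(rc for _, _, rc in seasons) / len(seasons)

# Average RC27 at each age.
by_age = {}
for _, age, rc in seasons:
    by_age.setdefault(age, []).append(rc)
age_mean = {age: sum(v) / len(v) for age, v in by_age.items()}

# Age adjustment = overall mean minus that age's mean;
# add it to each player-season RC27.
adjusted = [(p, age, rc + (overall_mean - age_mean[age]))
            for p, age, rc in seasons]
```

After the correction, every age group averages out to the league mean, so a player's adjusted RC27 can be compared across ages.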

Their conclusions: the accused players outperformed expectations by somewhere between 6 and 12 percent (and even more when Barry Bonds was included in the sample). When the player was compared to his own career only, as opposed to all other players in MLB, the effect was smaller – 3 to 8 percent.

A couple of months ago, J.C. Bradbury criticized the study on the grounds that the authors had not properly separated out the steroid seasons from the non-steroids seasons. His criticism is here, and is followed by SSK's rebuttal, and J.C.'s response to the rebuttal.

My criticisms of the study are a little different. I have two main objections.

First, comparing the accused players to all the others is meaningless. In general, the players accused of juicing tend to be better than average. There are various reasons this could be the case. It could be that power hitters gain the most benefit from steroids, and are therefore most likely to be users. Or it could be that power hitters are more likely to be accused of using, even if they're innocent of the charges.

In any case, suppose that you find that David Segui hit better than, say, Jose Vizcaino, in the years when Segui was said to be on the juice. Why does that qualify as evidence of anything?

The other regressions, the ones where the players are compared only to their own career trajectories, are better – but that brings up my second objection, that the aging adjustments are flawed.

SSK created their aging adjustments by simply observing MLB-wide performance levels at each age. As Bill James pointed out back in the 1982 Abstract, that doesn't work – it severely underestimates the effects of aging because it ignores players who decline so much that they drop out of the league.

Suppose that, after the age of 30, players lose one "unit" of productivity per year. And suppose that once you drop below 2, you're out of the league. And suppose there are five 30-year-olds in the league, with productivities of 8, 6, 4, 2, and 0, respectively.

The first year, they perform at 8, 6, 4, and 2, for an average of 5.
The second year, the last guy is released. The other three players are at 7, 5, and 3, for an average of 5.
The third year, those three are at 6, 4, and 2, for an average of 4.
The fourth year, another player drops out, and the remaining two are at 5 and 3, for an average of 4.
The fifth year, they're at 4 and 2, for an average of 3.
The sixth year, the remaining player is at 3.
The seventh and last year, he's at 2.

If you look at these numbers, the average decline is half a unit per season (it fluctuates between a decline of 1 and a decline of 0). But the real decline is 1 unit per year. By ignoring the retired players, you wind up thinking the effects of aging are much smaller than they actually are.
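The toy example is easy to simulate. A quick sketch, using the same assumptions as above (decline of 1 unit per year, release once productivity falls below 2):

```python
# Survivor-bias toy model: players decline 1 unit per year;
# anyone whose productivity falls below 2 is out of the league.
players = [8, 6, 4, 2, 0]

league_averages = []
while True:
    active = [p for p in players if p >= 2]   # survivors only
    if not active:
        break
    league_averages.append(sum(active) / len(active))
    players = [p - 1 for p in active]          # everyone ages one year

# Observed decline, averaged across the league's year-to-year changes.
drops = [a - b for a, b in zip(league_averages, league_averages[1:])]
observed_decline = sum(drops) / len(drops)
# The league-wide averages come out 5, 5, 4, 4, 3, 3, 2 -- an apparent
# decline of 0.5 per year, even though every player declines a full 1.
```

The league average only falls half a unit per season because the worst survivor keeps dropping out of the sample.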

What does this mean for the SSK study? It means that the authors would be too conservative in projecting the effects of steroids. If juiced player X went from 5 units one year to 5.5 units the next, SSK would figure they're 1 unit above where they should be (0.5 unit gain, plus 0.5 units of staying put against the aging current). But, really, X should be pegged at 1.5 units (because the current is really 1 unit, not 0.5 units).

This bias means the paper's results are probably underestimates. The accused players actually did even better, relative to expectations, than the authors think they did.

And there's another bias.

If I understand the paper correctly, the authors applied the league values to individual players arithmetically. That is, if the average hitter declined 0.3 runs (per 27 outs) between age 32 and 33, that figure is used to adjust all players. But that number should be higher for better players and lower for worse players, shouldn't it? If the average player drops (say) 0.3 runs, and Barry Bonds is (say) twice as good, shouldn't his expected drop be 0.6? Shouldn't the decline be a percentage of performance, rather than a fixed number? In fact, since RC27 has increasing returns (instead of being linear), shouldn't Bonds drop even more than twice what the average player drops? Maybe he drops 0.7, or 0.8, or even more.
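To make the two kinds of adjustment concrete, here's a small sketch; the league figures are made up, and the "additive" function is my reading of what SSK did, not their actual code:

```python
# Additive vs. proportional aging adjustment, with made-up numbers.
league_rc27 = 5.0      # hypothetical league-average RC27
league_decline = 0.3   # hypothetical average drop from age 32 to age 33

def additive_drop(rc27):
    """SSK-style (as I read them): every player is docked the same 0.3."""
    return league_decline

def proportional_drop(rc27):
    """Alternative: the expected decline scales with how good the player is."""
    return rc27 * (league_decline / league_rc27)

bonds_like = 10.0  # a hitter twice as good as the league average
# additive_drop(bonds_like) is 0.3; proportional_drop(bonds_like) is 0.6 --
# and with RC27's increasing returns, the true figure might be higher still.
```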

So if you expect Barry Bonds to drop 0.3, but he really should be dropping 0.7, you're again underestimating the benefit he gets from steroids. Of course, this might only apply to older players, but my impression is that the accused individuals were mostly in the declining phase of their careers. So the study might again underestimate the outperformance of the players accused in the Mitchell Report.

But there's also a third bias, and this one goes the other way – it might lead to overestimates of any outperformance.

The authors assumed that aging curves are the same for all hitters. But, as I think Bill James pointed out a long time ago, power hitters tend to stay active longer, as power and walks are skills that tend to increase well into a player's 30s. Those hitters are less affected by aging than the average player.

So even though the average drop in MLB may be 0.3 runs, that figure might be a combination of 0.1 runs for the power hitters and 0.5 runs for everyone else. In that case, if the batters mentioned in the Mitchell Report tend to be power hitters – and I think they do – applying league-wide aging patterns will overestimate what their decline "should have" been, and thus exaggerate how far their actual performance exceeded the age-adjusted expectation. That would look like evidence for the hypothesis that the players are users – but it would be false evidence.

That's the third of the three biases. Here they are again, in summary:

-- league-wide aging rates ignore the players who decline right out of the league, so they understate how fast players really age (understating the accused players' outperformance)
-- applying the league's decline as a fixed amount understates the expected decline of better-than-average players (again understating their outperformance)
-- assuming one aging curve for everyone overstates the expected decline of power hitters, who age more gracefully (overstating their outperformance).

Because of these biases, I'd argue that when the regressions find statistically significant coefficients, that does not constitute good evidence of steroid use. It does, perhaps, indicate that the accused players are different from the general population in some way. But that way could simply be that they tend to be power hitters, the kind of player whose aging the adjustments handle worst.

The aging adjustments are just too biased, and too rough, to isolate any effect attributable to steroid use.

Of course, we don't have a perfect method of making aging adjustments. We don't even have an excellent one, or a very good one. That means that fixing the study would be a lot of work – you'd have to come up with a model of how players age, and show that it applies, without bias, to various types of players, including those types of players who happen to be over-represented in the Mitchell Report.

That's not likely to happen. Is there any way to get a study like this to work?

I think there might be. For every accused player in the Mitchell Report, use the Bill James "paired players" method to find the most similar player not accused, where "similar" includes age, position, era, and recent performance. Then, compare the two players in the alleged steroid year, and see whether the accused player outperformed the innocent one.
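A rough sketch of that matching step follows. The similarity function here is a made-up stand-in for James's actual similarity scores, and all the player records are hypothetical:

```python
# Paired-players sketch: match each accused player to the most similar
# unaccused player, then compare the two in the alleged steroid year.
# The similarity weights below are invented, not James's real formula.

def similarity(a, b):
    """Smaller = more similar; heavy penalties for age/position mismatch."""
    score = 0.0
    score += 10 * abs(a["age"] - b["age"])
    score += 25 * (a["position"] != b["position"])
    score += abs(a["prior_rc27"] - b["prior_rc27"])  # recent performance
    return score

def best_match(accused, pool):
    """Pick the unaccused player most similar to the accused one."""
    return min(pool, key=lambda cand: similarity(accused, cand))

accused = {"name": "X", "age": 33, "position": "1B", "prior_rc27": 5.5,
           "steroid_year_rc27": 6.4}
pool = [
    {"name": "Y", "age": 33, "position": "1B", "prior_rc27": 5.3,
     "steroid_year_rc27": 5.1},
    {"name": "Z", "age": 28, "position": "SS", "prior_rc27": 5.5,
     "steroid_year_rc27": 5.6},
]

twin = best_match(accused, pool)
gap = accused["steroid_year_rc27"] - twin["steroid_year_rc27"]
# Positive gap = the accused player outperformed his matched twin that season.
```

Repeat over every accused player, and the distribution of gaps is the quantity of interest, with no aging model needed.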

If you do that, you can ignore the issue of aging completely, since the matched players are the same age and fit the same profile. And even if some of your assumptions aren't completely accurate, there's probably no reason for them to be biased against the Mitchell players in particular but not against their near-identical comparables. So your results are more likely to be meaningful.

Of course, then you're not using regression, and it might be harder to get confidence intervals and such. But, I think, you'd be more likely to get closer to the real answer.