Rocket Science: Clemens and ‘Roids

Another brilliant comment from economist Steve Walters of Loyola College:

I plead guilty to being uncharitably suspicious. Watching Roger Clemens go off on a young sportswriter after his last minor-league tune-up in Scranton, I wondered: “’roid rage?”

The tantrum was, at one level, merely classic Egotistical A-hole Athlete. Clemens escalated an angry denunciation of the writer into a larger rant against negativity by the press in general, playing the usual jock card about how sportswriters have never been “in the arena” and so are unqualified to ponder weighty, complex topics like… pitching. Translation: I’m way better than you; shuddup.

But the tone and direction of the tirade were so utterly inappropriate, so disconnected from the question asked, that the writer would later describe it as “downright scary.” It was—and not with any reasonable basis that I could see. And so my ungenerous imagination took over.

Which is why MLB must do far more to assure us that the guys who are performing such miraculous feats these days—and at such advanced ages—are doing so legitimately. The current testing program is a joke, and until it’s toughened up we’ll continue to have our suspicions, fairly or not. And since the program is in its infancy, there’s even greater suspicion about the legitimacy of records set a few years ago.

Economist Ray Fair took a cut at the question Bill James first asked back in the ‘80s and Jim Albert pursued in ’02 (discussed here on April 21 in “It Ain’t Necessarily So”): When do ballplayers reach their peak? He used some sophisticated statistical tools to get the basic answers (28 for hitters, 27 for pitchers) for his sample of 441 hitters and 144 pitchers who played at least 10 full seasons between 1921 and 2004.

Then he asked this follow-up: Which players have exhibited the most unusual age-performance profiles? Specifically, are there any players who got better with age?

Over the entire 1921-2004 period, Fair found only 18 hitters who appear to have defied Mother Nature, logging four or more seasons after the age of 28 in which their OPS (on-base plus slugging average) exceeded their age-specific expected level by more than one standard error. Here’s the list, ranked by the size of the largest “over-performance residual” (with the year and player’s age at the time of his greatest “outlier season” in parentheses):

1. Barry Bonds (2004, 40)

2. Sammy Sosa (2001, 33)

3. Luis Gonzalez (2001, 34)

4. Mark McGwire (1998, 35)

5. Ken Caminiti (1996, 33)

6. Albert Belle (1994, 28)

7. Larry Walker (1999, 33)

8. Dwight Evans (1987, 36)

9. Gary Gaetti (1998, 40)

10. Rafael Palmeiro (1999, 35)

11. Andres Galarraga (1998, 37)

12. Chili Davis (1994, 34)

13. Julio Franco (2004, 46)

14. Paul Molitor (1987, 31)

15. Bob Boone (1988, 41)

16. Steve Finley (2000, 35)

17. B.J. Surhoff (1995, 31)

18. Charlie Gehringer (1939, 36)
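To make the rule concrete, here is a toy sketch of the filter just described. Everything numeric here is invented for illustration (the stand-in aging curve, its decline rate, the sample seasons); only the "4+ post-peak seasons more than one standard error above expectation" rule mirrors Fair's criterion, and the 0.075 standard error is the figure cited later in the comments.

```python
# Toy version of Fair's outlier filter. The aging curve is a crude
# INVENTED stand-in (flat to a peak age, then linear decline); only
# the 4-season / one-standard-error rule mirrors Fair's criterion.

SE = 0.075   # the OPS standard error figure cited in the comments below

def expected_ops(peak_ops, age, peak_age=28, decline=0.006):
    """Hypothetical age-specific expected OPS for a hitter."""
    return peak_ops - decline * max(0, age - peak_age)

def crosses_fair_threshold(seasons, peak_ops):
    """True if 4+ seasons after age 28 beat expectation by > 1 SE."""
    over = [ops for age, ops in seasons
            if age > 28 and ops - expected_ops(peak_ops, age) > SE]
    return len(over) >= 4

# A made-up late-career surge: four seasons well above the curve.
surge = [(29, 0.980), (31, 0.995), (33, 1.010), (35, 1.020)]
print(crosses_fair_threshold(surge, peak_ops=0.900))  # True
```

A hitter with only one or two such seasons would return False, which is exactly the "one season shy" situation described below for Clemens.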

Notice anything? According to Fair’s analysis, Mother Nature apparently lost her grip starting in the ‘90s. Only three of the guys on this list (Boone, Evans, and Gehringer) played mostly before the ‘90s, and Fair observes that their performance residuals really don’t show the same pattern as the rest of the players on the list; in other words, they’re more likely to be just late bloomers.

But the rest might be, if not suspects, “persons of interest” in an investigation of the effect of performance-enhancing drugs on baseball since the early ‘90s. One of the players on the list has tested positive for ‘roids; another (according to illegally leaked grand jury testimony) admits to having briefly used them, though unwittingly. Fair concludes that “since there is no direct information about drug use in the data…, [these findings] can only be interpreted as showing patterns for some players that are consistent with such use, not confirming such use.”

What about Clemens? Since the pitchers’ sample was much smaller, Fair didn’t examine it for unusual aging patterns. Given the greater variance in pitching performance, he probably thought no pitchers would satisfy his criteria for “suspect residuals.”

But I punched the appropriate parameters for Clemens into a spreadsheet nevertheless, and found that the Rocket is just a single year of dramatic over-performance shy of crossing the “Fair threshold.” By my calculations, Clemens’s performance during his presumed “decline phase” exceeded the Fair model’s age-specific prediction by more than one standard error in 1997 (at age 34), and in both the ‘05 and ’06 seasons (when he was 42-43). If he puts up an ERA below 3.15 in what’s left of ’07, he’ll be in the company of statistical outliers like Bonds, Sosa, Caminiti, and Palmeiro.

Of course, this is just statistical doodling. One of the attractions of sport is the possibility that we’ll see something that the laws of probability say is extremely unlikely. Outliers make the games fun. And extreme outliers are rarities, not impossibilities. Their occurrence need not be considered evidence of cheating; they might simply be a manifestation of excellence.

So let’s all fervently hope that the historic accomplishments we’re witnessing in big-league ballparks these days are perfectly legit—that these guys are just freaks of nature who, coincidentally, started getting freakish at just about the same time the market for steroids got out of control.

If so, however, we’d still have to face at least one fact: Roger Clemens frequently acts like a jerk for no good reason.

47 Responses to "Rocket Science: Clemens and ‘Roids"

I’ve been curious for a while about why Clemens has gotten such a pass in the speculation about possible steroid users. I’ve been suspicious of Clemens for the past several seasons, but of course there’s no evidence. This doesn’t constitute evidence either, but it’s at least more than a little interesting.

Fair’s analysis is compelling, but am I reading it right that it’s looking for a pattern of better-than-age-appropriate play over multiple seasons? Isn’t it fair (no pun intended) to say that today’s player is better conditioned than players in the past, which should logically alter the aging/performance trade-off?

I don’t know if Fair looked at it this way, but what about looking at the data within “vintages,” which would help control for changes in conditioning over time?

Of course, if a large percentage of today’s players are using steroids, then what might look like a conditioning effect could really be a steroids effect… probably no easy answers.

The impact of age on performance is an important and interesting topic, and it would be great to have someone of Fair’s talent provide a good analysis of it. However, he has failed to account for two factors — one that he acknowledges, the other not — that explain most of the post-peak over-performers.

First, there are a few extreme ballparks that have such a large impact that you can’t ignore it. One is Colorado’s, and another is Arizona’s. Four of your 15 “suspects” posted their suspicious seasons in these parks. Fair notes the possibility of a park impact, but with the easy availability of park adjustments there’s really no reason to ignore this.

More importantly — and unfortunately, this really destroys the whole paper — Fair has not adjusted for scoring levels by year. There was a huge increase in scoring that began in 1993. Any player who played both before and after 1993 will therefore tend to over-perform in the later part of his career, simply because his later years came in a high-OPS period. Not coincidentally, nearly all of the “suspects” fall into this category.

Similarly, Fair’s ranking of pitchers is dominated by pitchers with the good fortune to have pitched in low-scoring periods, while pitchers like Clemens and Randy Johnson should be ranked higher.

Someone should suggest to Fair that he rerun his analysis using Baseball-Reference’s OPS+ and ERA+ metrics. These conveniently adjust for run scoring context and park. Then, we might learn something.

(Fair also treats a percentage change in all sports metrics as equivalent. So, because pitchers decline 9.5% in ERA at age 37, but hitters decline 5.6% in OPS, he concludes pitchers decline more quickly. However, a 5.6% drop in OPS is actually equivalent to about a 10% drop in runs scored, so the declines are comparable. He really should convert everything to SDs.)
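The OPS-to-runs conversion in that parenthetical can be sanity-checked with the common sabermetric rule of thumb that team run scoring scales roughly with OPS raised to a power of about 1.8. The exponent is my assumption, not the commenter's stated method, but it reproduces the claimed figure:

```python
# Sanity check: a 5.6% OPS drop implies roughly a 10% runs drop,
# ASSUMING runs scale with OPS to the ~1.8 power (the exponent is
# a rule-of-thumb assumption, not stated in the comment above).

OPS_EXPONENT = 1.8

def implied_runs_drop(ops_drop):
    """Fractional runs decline implied by a fractional OPS decline."""
    return 1 - (1 - ops_drop) ** OPS_EXPONENT

print(f"{implied_runs_drop(0.056):.1%}")  # close to a 10% drop in runs
```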

Guy,
I would be inclined to take your comments more seriously if you could eliminate phrases like “this really destroys the whole paper.” How about “my observations raise a few questions” or “my observations cast a bit of doubt on the conclusions.”

That’s a fair point, Dave: “destroys the whole paper” was harsher than necessary.

On the other hand, the point of the paper is to try to measure the impact of aging with some precision (after all, lots of people have looked at aging and come up with similar overall results). And changes in run scoring levels across time are enormous relative to the age-related changes Fair is trying to measure. For example, from 1992 to 1994, ERAs in the N.L. increased by 20.6%, more than twice the rise Fair says a pitcher will experience from peak to age 37. So you simply can’t produce good aging estimates without taking league context into account.

Also, Fair chooses to identify 18 “over-performers” and point out the high incidence of such performances in the 1990s. He knows this could be considered relevant to the question of whether some players used PEDs (that’s why he includes Palmeiro). So, I think he has an obligation to be careful here, and to consider alternative explanations for why players since 1990 might have a different aging profile.

In any case, there’s a good lesson here for economists writing about sports: run your papers by some fans of the sport in question. They will spot types of errors that others will not.

I’ve got to say, when I first saw this paper my first thought was: has Fair adjusted for booming offense in the 1990s (and other eras)? I haven’t read the paper yet, but I assume from the comments that he hasn’t.

I agree with Guy that the conclusions are suspect unless we see the league-adjusted totals. As I remember, the Albert paper had the same issue.

Anyway, I’d be keen to hear Dave’s thoughts on the aging issue for sure …

Guy,
My comment was really a general statement, rather than something specific about Fair.

So often non-economists commenting on research think they have discovered something that only a “true fan” of the sport could have seen. More often than not, though, the non-economist simply failed to understand something. Although this might not always be the case, often it is. Hence, I would urge caution in your statements. You might see something that Fair missed. Or you might be missing something yourself. Until you know, words like “destroy” should not be used. In fact, even if you did know, words like “destroy” are probably not too useful and should be avoided.

Given Guy’s earlier reply I think he realizes that his use of “destroy” was a little strong. Point taken. Let’s get to the point at hand.

I think the bigger question is can Fair’s conclusions still hold without his adjusting for league improvement over time?

On pp. 25-27 Fair even acknowledges changes over time. Indeed, he even finds the adjustments are different. The most commonly accepted view of the offensive explosion in the 1990s is some combination of a juiced ball, parks and, possibly, steroids. Without controlling for the first two it is impossible to judge the third.

In your capacity as a professional economist/statistician I was wondering to what extent you think that devalues the Fair work.

For me, Fair’s conclusions are interesting but, given the structure of the study, far from conclusive.

“So often non-economists commenting on research think they have discovered something that only a “true fan” of the sport could have seen. More often than not, though, the non-economist simply failed to understand something.”

I’m not sure what you mean by “true fan” — I’d certainly agree that devotion does not necessarily equal insight. But knowledge of a sport and its history, as well as of prior statistical analysis conducted by non-economists, can be extremely helpful.

What matters, I would hope we could all agree, is not people’s credentials, but rather the merits of their arguments. In this case, Fair has made mistakes that would be obvious to any good sabermetrician. If he had known about metrics such as ERA+ and OPS+ (for example), he could have done far more valuable research.

More generally, if you’re saying that at least half of all blog comments critical of economists’ work are wrong, I suppose that’s probably true (who knows). But if you’re saying that the criticisms of sports economics work that come from non-academics knowledgeable about statistical analysis in that sport are wrong more often than not, I think you’d have a hard time substantiating that.

Re: Guy’s criticism of Fair’s failure to adjust for (a) park effects and (b) changes in run-scoring environment over time.

These are legitimate criticisms of Fair’s historical “Ranking of Players” in Section 8, Guy–which is why I didn’t even bring that part of the paper up. But they’re not a big deal with regard to Fair’s main task, which is constructing a typical “aging profile” for hitters and pitchers, and then identifying some extreme outliers from that profile (the elements of the paper which I found most interesting and useful).

Effectively, Fair did adjust for historical context in constructing his aging profile. He did this by using a dummy variable for each player in his sample (estimating what econometricians call a “fixed effects” model). It was the estimated player-specific coefficient (“CNST” for “constant term” in his tables) that was ultimately used for his rankings. It’s quite correct to note that this approach involves some error: e.g., hitters from a scoring-rich era (or who just spent their whole careers in good hitters’ parks) will have higher constant terms and move up the rankings. Fair does note this deficiency on pp. 28-9. (And I doubt he even knew he could’ve avoided the problem with better data, like OPS+ or ERA+.)
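The dummy-variable setup described above can be sketched in a few lines. The data here are invented for two hypothetical hitters, and a shared quadratic in age stands in for Fair's actual aging terms; only the mechanics (one constant per player plus shared aging covariates) mirror the fixed-effects approach:

```python
# Minimal sketch of a fixed-effects ("player dummy") regression.
# Data are INVENTED: two hypothetical hitters observed at three ages.
# Fair's real model uses hundreds of players and different aging
# terms; a shared quadratic in age stands in for them here.
import numpy as np

rows = [(0, 25, 0.80), (0, 28, 0.85), (0, 31, 0.82),   # hitter 0
        (1, 25, 0.90), (1, 28, 0.96), (1, 31, 0.93)]   # hitter 1
n_players = 2

X, y = [], []
for pid, age, ops in rows:
    dummies = [1.0 if k == pid else 0.0 for k in range(n_players)]
    X.append(dummies + [age, age ** 2])   # shared aging covariates
    y.append(ops)

beta, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
consts = beta[:n_players]   # the player-specific "CNST" terms
print(consts)               # hitter 1's constant exceeds hitter 0's
```

With both hitters observed at the same ages, the gap between the two constants is exactly the gap in their average OPS, which is why the constants end up being used for rankings.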

But this deficiency isn’t relevant to the question of aging and outliers, unless one believes that over time the shape of the aging profile has dramatically changed (thanks, perhaps, to better training methods). Fair takes on that question in Section 7 (“Possible Changes Over Time”), and concludes that “there is some evidence of slightly smaller decline rates in the 2nd half of the 84-year period.” It’s noteworthy, however, that the statistical test for a smaller decline rate for hitters was sensitive to the inclusion of Bonds and McGwire in the sample!

Anyway, one might argue that if the decline phase of the typical aging profile is less steep these days, then some of the outliers in Fair’s group of 18 might be “unfairly accused,” to pick up on TDDG’s caution. Point taken–but the residual sizes for these guys tend to be so large that they’d likely remain outliers even if “slightly smaller decline rates” were built into that test.

Bottom line: I’d take Fair’s rankings with a grain of salt, but the aging profile is pretty good work. One fairly novel thing he did is allow for differing rates of improvement and decline over time. As to whether the extreme outliers are strong or weak evidence of anything, that’s up for debate. My view is that statistical methods like this (and people can differ about how to define an outlier) are merely a filter, and that once you’ve identified some “persons of interest,” you can use other means to raise or lower your suspicion level (as, e.g., Guy does by pointing to specific ballparks or other environmental considerations).

Remember a little while ago I said that I was asked to peer review a paper? This was one of them.

Unfortunately, judging by Guy’s comments, some of my review comments were not applied. The run environment is a critical component.

I’ll have a review of this paper later this week.

I guess I have a followup to our talk a while ago regarding peer review: once I submit a comment, shouldn’t the researcher then be obliged to address the comment? Otherwise, how do I know whether it was dismissed, or considered and rejected on merit?

(There’s nothing that prevents the economist from being a “fan of the sport”.)

I do think that, while interesting and suggestive, Fair’s work suffers from not accounting for the overall increase in offense that began in 1993. That correlates well with expansion bringing in the Marlins and the Rockies–the latter providing an extreme hitter’s park. There was another bump in offense in 1998 when the D-backs and Tampa Bay joined MLB.

I would imagine, though this is speculation, that were steroids entirely responsible, the change wouldn’t appear so punctuated; it seems more reasonable that use would be a matter of gradual adoption. That is, unless some technological change made steroid use easier and/or more readily available. My quick glance at the medical literature could not locate one, but it is possible that one exists.

It’s interesting that the ERA of the best pitcher in each season did not show the same pattern, suggesting that dilution at the bottom may indeed be responsible for much of this. The best pitchers are still just as good and seem to be every bit as able to get out major league hitters. But the bottom quartile is not as good as it was 2 decades ago resulting in an overall increase in offense. Selectively beating up on the worst in the game would likely raise doubles and home runs both across the board and for the league leaders.

I am not doubting that steroids have had an impact on performance (not that it bothers me, to tell the truth–it’s their health that’s at risk, not mine). However, there are many variables at work here and I’d like to see a better attempt to control for those variables.

Does Fair’s paper adjust for changes in the cross-sectional variability of player statistics? I have no specific reason to think Fair doesn’t adjust for it, or that it even necessarily impacts the results.

But how is a one-standard-error outlier defined? If the dispersion of player statistics in a given year is much greater than normal, then too many players may be picked up as one-standard-error outliers by the age-projection model. And if that extra dispersion is symmetrical, it wouldn’t say anything about steroid usage, because there would also be more outliers reflecting unusually bad performances.

Tango: Your comments must have had some effect, since Fair put this out as a working paper in ’05, and it’s still an unpublished working paper as of ’07. If you said “this paper should not be published unless X, Y, and Z are done,” it appears your evaluation carried the day. Whether Fair will bother to improve the product in order to get it into a journal, who knows.

Again (per my earlier comment on Guy’s points), it’s the rankings in this paper that are problematic. But the fixed-effects approach is a good way of controlling for each player’s context in constructing the “aging curve.” There’s still some measurement error as players move to different teams/parks, but that’ll wash out via the regression methodology (i.e., becomes part of the error term).

So the paper has its uses–but it’s definitely not likely to get into a reputable journal until Fair either fixes those player rankings or jettisons that part of the paper.

Steve:
I don’t see how the fixed-effects method handles the problem with his outlier analysis. Doesn’t each hitter get a single constant value? If so, a hitter who played both before and after 1993 will get a value that splits the difference between the two eras. The model will still expect him to perform better than he really should when young, and worse than he really should when he’s older.

It’s really no different than a park adjustment. It’s like ALL of these hitters moved to Coors starting in 1994. That’s why nearly all of your suspects come from this time period.

On a more technical note, it also seems to me that using the model’s standard error (.075) to identify the outliers is problematic. Won’t this tend to identify the highest-OPS hitters as overperformers, since their true random fluctuations will tend to be larger?

I also wonder if you agree with Fair that variations in all athletic performances can be measured as simple percentages? Is it necessarily true that a 4% drop in high jump performance (3 inches) = a 4% increase in 100M dash times (half a second) = a 4% increase in ERA (.16 runs)? The ERA increase is barely measurable, while the 100M change removes you from elite competition. Seems like an odd approach to me…..

We might be obsessing too much about the recent run-scoring environment, Guy. On close inspection, there really IS NO TREND over Fair’s chosen period.

If you run this simple regression:

Runs/Game = a + b*YEAR,

for the AL over 1921-2007, your estimated b = -0.00236, with a t-stat of -1.17. So b is statistically indistinguishable from zero; no overall time trend.

What there is is a lot of year-to-year variation, and scattered bunches of years with above- or below-average scoring (within which there’s still some variation).

In this context, you’re looking at hundreds of players whose careers spread out over this span, with different start and end points and different ages corresponding to these years. The stuff that affects the run environment is captured by the error term in Fair’s regression(s); it comes out in the wash. To argue that all the outliers are in the ’90s because it was a run-rich era is to ignore that the ’30s were, too–but only Gehringer popped onto the radar screen from that era (and his residuals were barely across the threshold). IOW, the outliers are not the result of some sort of variance-correlated-with-mean problem.
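For anyone who wants to replicate that trend test, here is the slope and t-stat computed by hand. The series below is synthetic, trendless noise standing in for the actual AL 1921-2007 runs-per-game data, so only the mechanics (not the numbers) match the regression quoted above:

```python
# By-hand OLS slope and t-stat for Runs/Game = a + b*YEAR.
# The rpg series is SYNTHETIC trendless noise (mean 4.5, sd 0.4),
# a stand-in for the real AL 1921-2007 data discussed above.
import numpy as np

years = np.arange(1921, 2008, dtype=float)   # 87 seasons
rng = np.random.default_rng(0)
rpg = 4.5 + rng.normal(0.0, 0.4, size=years.size)

X = np.column_stack([np.ones_like(years), years])
beta, *_ = np.linalg.lstsq(X, rpg, rcond=None)   # beta = [a, b]
resid = rpg - X @ beta
s2 = resid @ resid / (years.size - 2)            # residual variance
se_b = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
t_stat = beta[1] / se_b
print(f"b = {beta[1]:.5f}, t = {t_stat:.2f}")    # small |t|: no trend
```

On real data you would simply replace the synthetic `rpg` array with the actual runs-per-game figures.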

As to the rest, I’ll have to think about it–though some of this is ranging rather far afield from the question of whether we can look at “extreme outlier performances” (however ID’d) as some sort of evidence about better living through chemistry–or not.

There is no “linear” time component. You need a dummy variable for 1994-2007, and you need another one (the other way) for 1963-1968, etc. It’s like your heart rate: it’s up-and-down, and part of Bonds’s career was played while you were running, the other part while you were sleeping. A linear regression will show him playing in an era where you were, on average, walking.

That’s why you need a RPG (runs per game) parameter.

***

I see that the paper is labelled Mar, 2007. That may have been the paper I reviewed. I’ll have to double-check my notes.

Steve: you’re making this too complicated. The question is, are there abrupt discontinuities in scoring, such that a player could play the first portion of his career in a substantially different run environment than the later part? The late 80s and 90s are indeed such a time. OPS from 1994-1997 is about .050 higher than from 1990-1993. That difference alone is about 2/3 of the error that qualifies a hitter as an over-performer in Fair’s analysis.

A constant value for each hitter can’t adjust for this. Chili Davis gets a .450 constant, applied both before and after 1994. Pre-1994, Davis meets or exceeds the model in just 2 of 12 years; post-1994 it’s 5-for-5. Julio Franco is 2-of-10 before 1994, but 5-for-6 thereafter. And so on.

The 1930s were NOT a similar period — OPS is not generally much higher than in the 1920s (1929 and 1930 are outliers, but otherwise R/G seems pretty steady). The dropoff in offense from 1962 to 1963, however, DOES look substantial. I suspect if you tried to identify hitters who underperformed after age 27, using Fair’s model, a huge proportion would have played both before and after 1963.

Again, you may be right that the general aging curve will be close to correct. But you can’t use it to identify late career surges without correcting for changes in scoring. (And I think your standard of overperformance also needs to be sensitive to historical era and player performance level.)
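Guy's single-constant point can be seen in a toy example: give a hitter the same fixed edge over league average in two different run environments, fit one constant, and the residuals split into exactly the before/after pattern he describes for Chili Davis and Julio Franco. The league OPS levels below are illustrative, not actual 1990s figures:

```python
# Toy illustration of the single-constant problem. A hitter with a
# FIXED talent edge over league average plays four seasons in a low-
# offense era and four in a high-offense era. League OPS levels are
# invented, not actual pre/post-1993 figures.

early_league, late_league = 0.720, 0.770   # two run environments
talent_edge = 0.050                        # constant skill over league

seasons = [early_league + talent_edge] * 4 + [late_league + talent_edge] * 4
const = sum(seasons) / len(seasons)        # the model's single "CNST"
residuals = [ops - const for ops in seasons]

print([round(r, 3) for r in residuals])
# negative residuals in every early season, positive in every late one
```

Even though the hitter's talent never changes, the model sees an under-performer who suddenly became an over-performer in mid-career.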

The choice of “10 years” has its own selective-sampling issues, as only players of a certain quality will be in that pool. That pool of players is hardly representative of the population of MLB players. See this mini-study for more information: http://www.tangotiger.net/archives/artAging.shtml#1013

The effect is for the curve to be flatter than it should be.

You acknowledge injury as a possible outcome, but ignore it afterwards. It’s a big deal, especially for pitchers.

A 9% increase in ERA from age 26 to 37 (3.50 to 3.81) is pretty much impossible to believe. That is extremely flat. The sampling issues noted above need to be addressed.

You talk about the 90s (the steroid era), but you also acknowledge that you did not adjust for the change in run environment. The 1993-2006 time period is a huge offensive era. Without adjusting for the context, it’s going to look like hitters are doing better in those years. As well, the new parks are more offense-friendly these days.

While the overall model by age won’t change if you adjust for parks and year, you can’t then ignore these issues when looking at 1990s players in particular. They need to be adjusted, if you are going to focus on them.

While you choose 10+ year periods for each pitcher, you don’t have the same pitchers in each age group. You can have a 10yr period from ages 23-32 or 25-34, or a 15yr period from 22-36, etc. As noted in one of my linked articles, you need to pair the age groups to have the same pitchers and the same weights in each age group. So, select the same pitchers at age 23 and 24, and weight them equally. Then, select the same pitchers at age 24 and 25, etc. The pitchers of 23/24 do not need to be the same as 24/25.
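The pairing scheme described in that paragraph (sometimes called the "delta method") can be sketched as follows. The ERA values are invented; the point is that each adjacent age pair uses only the pitchers who appear at both ages, equally weighted, and the pool is allowed to change from one pair to the next:

```python
# Sketch of the paired-seasons ("delta method") comparison: each
# adjacent age pair uses only pitchers who appear at BOTH ages,
# equally weighted. ERA values are invented for illustration.

careers = {                    # careers[pitcher] = {age: ERA}
    "A": {23: 3.20, 24: 3.10, 25: 3.30},
    "B": {23: 4.00, 24: 3.90},             # no age-25 season
    "C": {24: 3.60, 25: 3.80},             # no age-23 season
}

def delta(age):
    """Average ERA change from `age` to `age + 1` over the matched pool."""
    diffs = [c[age + 1] - c[age] for c in careers.values()
             if age in c and age + 1 in c]
    return sum(diffs) / len(diffs)

print(round(delta(23), 2))   # pool is A and B only: -0.1
print(round(delta(24), 2))   # pool is A and C only: 0.2
```

Chaining the deltas across ages then yields an aging curve that never compares a season from one pool of pitchers against a season from a different pool.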

And the comparison of ERA to chess is not appropriate. Losing 15% in ERA would be the equivalent of losing 7.5% in OBP. It’s just not a 1:1 comparison.

OPS is not a good measure to use for ranking players, in addition to the other problems I noted earlier. Linear Weights would have been a better choice.

ERA also has its own problems (being fielder-dependent as well as the sequence-of-events dependent), and a “component ERA” similar to Linear Weights, would have been more appropriate.
======================

I have now read the paper in full and here are some comments. In line with what Guy and Tango have said I think this paper has some question marks (apologies if I repeat some of what Tango/ Guy/ Steve have said).

Okay — I’ll start at the top.

First, in section 1 Fair states that there have been no rigorous studies estimating age effects in baseball! Where has this guy been? I can name a half dozen at least: Albert, James, Woolner, and Tango has done a few. For me this is an egregious omission. How can you try to drive thinking forward on an issue when you haven’t read the prevailing literature?

I must say, I quite like Fair’s approach of fitting two equations to the data. That is interesting and isn’t an approach that has been taken before (I don’t think). However, some of Tango’s work (e.g., where he compares year by year with equal PA weights) is at least as good, if not more accurate.

Fair assumes that all players have the same coefficients. That’s probably an assumption you have to make in a regression, but Albert found that players can have wildly different shapes of aging curve. It would have been good to see a reference to this at least. That players have the same improve/decline rates is open to a lot of questions — there is a segment of the baseball population where this will not apply.

Tango pointed out the 10-year cut-off creating selection bias, and I think that is right. IIRC the Albert study suffered from the same problem. What Fair is doing is estimating aging effects for the elite of the elite — not your average MLB player.

Model 2 for ERA gave a peak age of 24 vs. 26.5 for model 1. The only difference is that the selection criterion is 8 years and not 10. This has the fingerprints of selection all over it. I don’t see anywhere in the paper where Fair discusses this result; the difference between 26.5 and 24 is massive in baseball terms.

Of the 18 players that Fair picks out as “potential steroid users,” it is uncanny that in the first half of their careers all of them underperform their expected line (negative residuals), while in their later years they outperform it. This further supports the change-in-offense argument. Why would Bonds be below the line in the first half of his career? Pretty much all the hitters show the same profile. In fact, Bob Boone (who wasn’t around in the 1990s) is the only one to buck this trend.

Also, using OPS for this will inflate the effect, because factors such as park and, say, the juiced ball (if that is true) will more likely affect SLG than OBP. This is one of the reasons why he finds a time effect for OPS and not OBP (p. 26).

To then go on and speculate that such prowess is linked to drugs is wrong. Control for changing offense levels and then we can have the drugs conversation. As I (and others) noted earlier, changes in offense and park can both explain these results — the author adjusts for neither.

The fact that Fair finds hurlers declining more rapidly could likewise be explained by changing levels of offense and parks.

I don’t think we can put a huge amount of weight on the decline percentages for the various athletic events, because Fair has put in place an arbitrary 35-year cut-off on age for all sports (bar baseball). The comparisons are just not like for like (and suffer from the problem that Tango pointed out above).

Also some of the comparisons don’t make sense. What’s the point of comparing baseball to chess?

I’m not convinced by the argument that players can influence their decline phase. Okay, maybe a bit, depending on whether they work out and keep in shape, and they might be able to play a little less aggressively, but I’d imagine that the effect, if it exists at all, will be relatively small — although I could be wrong.

An interesting read for sure, which uses a potentially interesting methodology. Unfortunately the execution is very flawed.

Thanks for the reviews and comments, John, Tango, and Guy. Many good points raised.

On the main criticism (that the failure to “environment-adjust” makes the parameters of the aging profile suspect), I guess one thing Fair could have done is show us a residual plot.

On the idea that the inflated run environment post-’93 is (unlike that in the ’30s) biasing the identification of “persons of interest,” I’d just note again that Fair’s approach is to design a filter. That narrows things down; then you’re perfectly entitled to use judgment and more refined analysis to assess whether a particular outlier is due to, e.g., a park change, a different run environment, or whatever.

Better filters are also more costly to construct. What’s more, it’s possible to quibble even with the context-adjusted data that could have been dug out and used (e.g., does using a generic park factor really properly adjust for the advantage a left-handed pull hitter had in the old Yankee Stadium?). It’s also true that this filter had some arbitrary “thresholds” in it–e.g., why 4 years of unusual performance? Why not 3 or 5? Why 1 standard error? Why not 1.5?

So, yeah, Fair’s execution was flawed. But, as John noted, there were some ingenious approaches in the paper. Apart from the steroid-outlier discussion I used it for, I suspect some intrepid sabermetricians could, e.g., adopt the basic model but do the heavy lifting of getting refined, context-adjusted data to do some interesting new rankings using the fixed-effects constants, a la Fair.

I think we can at least agree that the framework is good, and that the implementation needs some work.

Steve is dead right about park factors, as I’ve mentioned many times on my blog. I’ve got a summary here: http://www.tangotiger.net/parks.html
In short, it’s impossible that Coors can possibly affect Juan Pierre, Larry Walker, and Dante Bichette the same way. This is one of those things that baseball nuts would readily know, but that a non-baseball fan would simply fold into a single “park” parameter, if they considered the park at all.

My general concern with papers like this one, or the NBA ref-race study, is that the authors extrapolate beyond what the study supports. There are unaccounted-for variables (which is fine), but then conclusions are drawn as if there were no unaccounted-for variables except the one the author wants to discuss.

When it comes to “peer review,” the “peer” should be a peer not just of the researcher but of the topic. To review the Fair paper, you need reviewers who are peers in both baseball and academia. If you can’t find a few who are expert in both, find several in each.

Steve: Thanks for your thoughtful responses to these comments. I don’t think we’re really that far apart. We’ll just have to agree to disagree on the idea that Fair has just “designed a filter” and we can then decide what it tells us. I think the filter should tell us which hitters have performed best after age 27, controlling for their core talent, and it’s pretty clear that this filter doesn’t actually do that.

Tango’s point that the pitcher curve is too flat (9% decline at age 37) is important, and raises questions about the utility of the age curve. Expanding the sample to pitchers with 6+ seasons would help a lot. And as a check on the model, it would be nice for Fair to show us average ERA at each age, as a percentage +/- the player’s career ERA. I just took a quick look at the top 10 ERA pitchers in Fair’s sample, and of the 7 still pitching at age 37, their ERA had increased 29% on average.
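As a sketch of how that “ERA at each age vs. career ERA” check could be computed — the pitcher-seasons below are purely hypothetical stand-ins, not Fair’s data:

```python
from collections import defaultdict

def era_vs_career(seasons):
    """seasons: (age, era, career_era) tuples; returns mean % ERA change vs. career, by age."""
    by_age = defaultdict(list)
    for age, era, career in seasons:
        by_age[age].append(100.0 * (era - career) / career)
    return {age: sum(v) / len(v) for age, v in sorted(by_age.items())}

# Hypothetical seasons for two pitchers (career ERAs 3.00 and 3.50),
# each better than career at 27 and worse at 37.
seasons = [
    (27, 2.70, 3.00), (32, 3.00, 3.00), (37, 3.90, 3.00),
    (27, 3.15, 3.50), (32, 3.50, 3.50), (37, 4.50, 3.50),
]
for age, pct in era_vs_career(seasons).items():
    print(f"age {age}: {pct:+.0f}% vs. career ERA")
```

Run over Fair’s actual sample, a table like this would make it easy to see whether the fitted 9% decline at 37 is anywhere near the raw data.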

The durability issue is also hugely important, especially for pitchers. The top 10 ERA pitchers averaged 270 IP at age 27, and 115 IP at age 37 (3 of the 10 were out of baseball). Just using rate stats doesn’t capture the players’ real contributions.

Jason: It seems like expansion could explain the post-1993 offensive explosion, but that isn’t the case. If you look at pitchers who pitched both before and after 1993, you will find that their ERAs generally increased as much as the league overall. So we know for sure it was not the infusion of new pitchers, or expanded innings for weaker pitchers, that caused the change.

Also remember:
* Expansion adds weak pitchers as well as weak hitters, so the impact is offset (and replacement hitters and pitchers are both about 20% worse than an average player);
* The # of IP thrown by new pitchers just wasn’t enough to produce a big shift — even if every new pitcher had an ERA of 6.00 (not true), it would only raise total ERA by 0.10;
* There was no significant increase in scoring in 1998.
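The second bullet is easy to verify as an innings-weighted average. The league-wide base ERA of 4.00 and the 5% innings share for genuinely new pitchers below are illustrative assumptions, not figures from the post:

```python
def league_era_shift(base_era, new_era, new_share):
    """ERA of the expanded league, as an IP-weighted average of
    incumbent and newly added pitchers."""
    return new_share * new_era + (1 - new_share) * base_era

# Assume ~5% of league innings go to genuinely new pitchers after expansion.
shift = league_era_shift(4.00, 6.00, 0.05) - 4.00
print(f"league ERA rises by {shift:.2f}")
```

Even spotting the new pitchers a 6.00 ERA, a modest innings share caps the league-wide effect at roughly a tenth of a run — far short of the post-1993 offensive explosion.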

Though it focuses on the wrong stats (like W-L record), it reaches a similar conclusion to this post–i.e., that the second half of Rocket’s career has enough statistical outliers to raise questions about how he did it.

“Which is why MLB must do far more to assure us that the guys who are performing such miraculous feats these days—and at such advanced ages—are doing so legitimately.”

Actually, it’s your responsibility to separate the steroid-information wheat from the chaff (I’ll give you two clues: 1) there has only been one scientific, peer-reviewed and accepted test performed on steroids relative to healthy males 25 years old and older; 2) check HBO “Real Sports” for late June or early July 2005 and watch the segment on steroids)….

In your list of batters, where does Ted Williams from age 35 forward stack up?

But wait. We have the expected (mean) peak performance age (27/28) and an expected rate of decline. But what’s the variation around that? What’s the s.d. of peak performance age? Looking at Fair’s paper, I had immense difficulty extracting information on the variation in peak age or in performance decline.

Fair has data on 441 hitters. Let’s suppose the s.d. of peak performance age is 2.5 years. Then about 16% of hitters (roughly 70) would peak after age 30.5; about 2% (roughly 10) would peak after age 33; and one or two would peak after age 35 or 36. So we have some late peaks. Isn’t that just what a normal distribution would tell us? Why are we surprised that some players have late peaks? Why do we immediately attribute that to performance-enhancing drugs?
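Those tail counts are easy to check with the normal upper-tail probability. The mean of 28 and the s.d. of 2.5 are the hypothetical values from the comment above, not numbers from Fair’s paper:

```python
import math

def tail_count(mean, sd, cutoff, n):
    """Fraction and count of players expected to peak after `cutoff`,
    assuming peak age is normally distributed with the given mean and s.d."""
    z = (cutoff - mean) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > z), the upper-tail probability
    return p, p * n

for cutoff in (30.5, 33, 35):
    p, count = tail_count(28, 2.5, cutoff, 441)
    print(f"peak after age {cutoff}: {p:.1%} of hitters, ~{count:.0f} of 441")
```

So even under ordinary normal variation, a sample of 441 hitters should contain dozens of late-20s/early-30s peaks and a handful past 33 — late peaks alone prove nothing.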

Can someone run Hank Aaron’s numbers through the spreadsheet to see how close he was to appearing on this list? Putting up 5 straight .900+ OPS seasons from ages 35-39, two over .950 and another two over 1.000 (the two highest of his career, including his peak HR total at age 37), even given his established level of performance, almost certainly puts him close to making this list. And yes, I know that moving from Milwaukee to Atlanta helped his numbers, but we’ve already established that the analysis doesn’t take park effects into account.

And I have a lot of anecdotal evidence that people who are not on steroids are “stronger word than jerks” for no good reason. There have certainly been times when I myself have acted like a jerk for no good reason, and I don’t take steroids. It’s just that I’m not important enough for it to happen on camera.

Honus Wagner was a late bloomer, having his greatest season in 1908 at age 34, but he was one of the first athletes to lift weights in the off-season. Similarly, Babe Ruth had remarkable seasons after his hedonism wrecked his 1925 season, but that was because he hired a personal trainer and started working out all winter. So, in an era when most players had to take jobs in the winter, the handful of stars who could afford to work out all winter, and did, could rack up statistics in their 30s that would look like steroid use to us today.

Steve Sailer sometimes makes some good points (e.g. Gladwell’s status as a guru is very undeserved), but Sailer once criticized David Romer’s 4th down strategy paper as an example of an economist not understanding football and writing about it. Bill Belichick disagrees with Sailer and liked the paper.