The right replacement level

I’m down here at the SABR Analytics Conference, having a blast and learning some cool new things. There was a really nice presentation yesterday on how batters and pitchers perform based on how deep in the strike zone a pitch travels before the bat makes contact. Based on what I saw, we may have found a way to quantify “sneaky fast.” Nice job.

Meanwhile, there’s a rumor in the air that the Fangraphs and Baseball Reference folks are going to sit down here and hash out a common replacement level for their Wins Above Replacement stats. There are lots of differences between the FG and BRef versions of WAR, but replacement level seems like the easiest one to iron out.

I guess that’s a good thing, though I’ve never been bothered by having different replacement levels. In fact, I think having different levels is a good thing. I guess I’m in the minority on that, however. So, let’s entertain the question of the day: what should the new consensus replacement level be?

You know, figuring out the proper replacement level would be relatively easy if talent were evenly distributed across all teams. The 26th guy in Arizona would be worth about the same as the 26th guy in Chicago. But baseball doesn’t work that way. For a couple of decades, the 26th guy on the Yankees was much better than the 26th guy on the A’s. So, I think the basic question is, which team should you use to set your replacement level? The best? The worst? The average?

While you ponder that, I want to thank all of you who voted for my article, The most critical at-bat, in the SABR historical analysis and commentary competition. That article got the most votes of all, and I’ve got a nice plaque to show for it. The fun part was that I was sitting right next to Bill James when they made the announcement. There’s some bragging rights there.

Back to the subject. I’m saying this off the top of my head—always a dangerous thing to do—but I believe that Fangraphs’ replacement level is around .280, while Baseball Reference’s is around .320. That can make a difference of perhaps 20 wins over an entire career, and that’s a lot. In a way, it doesn’t matter, because both systems are internally consistent. But, again, this seems to bother some folks.
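To put rough numbers on that gap, here’s a back-of-envelope sketch. This is my own illustration, not either site’s actual method: it treats WAR as a player’s winning percentage above the replacement level, times his share of team games. The 1/9 playing-time share and the .500-level player are assumptions made purely for illustration.

```python
# Back-of-envelope: how much a 0.040 gap in replacement level
# (e.g., .280 vs. .320) shifts one player's career WAR.
# Assumes a full-time position player is "responsible" for roughly
# 1/9 of his team's 162 games -- a crude allocation, for illustration.

GAMES = 162
SHARE = 1 / 9  # assumed share of team outcomes for a full-time regular

def career_war(player_pct, repl_pct, seasons):
    """Crude WAR estimate: win pct above replacement, over full seasons."""
    return (player_pct - repl_pct) * GAMES * SHARE * seasons

SEASONS = 18  # a long career
print(career_war(0.500, 0.280, SEASONS))  # vs. the lower level
print(career_war(0.500, 0.320, SEASONS))  # vs. the higher level
print(career_war(0.500, 0.280, SEASONS) - career_war(0.500, 0.320, SEASONS))
```

Under these assumptions the gap works out to about 13 wins over 18 full seasons; push the career length or playing-time share a bit higher and you land in the neighborhood of the "perhaps 20 wins" figure.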

An argument I’ve heard for a lower replacement level, such as Fangraphs’, is that a higher level creates more players with career totals below replacement level. The feeling is that, if a player managed to have a lengthy career as a regular, even if it was spent with the A’s in the 1950s, he was probably better than replacement level.

I’m not so sure about that. Remember, WAR is an estimate of value, an imperfect one. There are margins of error in them numbers, particularly for those players who provide the most value with their gloves. You’d expect some significant negative variances even over full careers. Plus, there are players who have managed to survive on reputation more than performance. One of the fun aspects of sabermetrics is trying to determine who those players were. We’ll never be 100% sure who they were, but I think we can say with certainty that they existed.

So, are there going to be some players who racked up decent careers, perhaps enhanced by unsupported reputations, with negative variances in our WAR estimates, who hung on as regulars for really bad teams for a while?

Yes, I’m quite sure there were. How many were there? No clue. The smart guys at Fangraphs and Baseball Reference will try to figure that out. All I can say is, don’t be afraid of those negative WAR totals.

About Dave Studeman

Comments

The real value of replacement level is as a metaphor, not a number. Anything you can do mathematically with replacement level you can do with difference from league average, if you wanted, but the idea that there are cheap, good enough players out there is really important for building a team.

The “problem”, if you will, is that this should be an inherently iterative scientific process as the best experts in the field argue and discuss what should go where and how it should work, but that this is now playing out in the full view of the public, who are not as expert.

I agree that we should not be afraid of negative WAR totals. But as more and more studies show (plug for THT annuals which had them), teams generally know what they got and what they are letting go of, so we should be afraid of a lot of negative WAR totals.

In particular, I would point out Matt Swartz’s great studies on prospect trading, as well as free agent signings, which conclude that teams know very well what they got and will keep the good players around (by not trading them or not letting them go into free agency). For players who are traded or hit free agency, you have to strongly wonder, as part of the decision process, why their old team let them go. (But as it is in pro sports, sometimes you need a player at that position, and you can’t tell the fans there’s nobody out there who can help the team so give up on the season, so you kiss the frog and hope a prince pops up. Fans forget that when evaluating GMs.)

I have to say that I never knew that replacement level was down so low. Nor that there is a good gap between Fangraphs and Baseball-Reference.com.

For some reason, I thought replacement level was closer to a rubric I’ve heard about the baseball season: for any team, 54 games are losses and 54 games are wins, so it is the last 54 games that decide whether a team is a big winner or a big loser. That would set, in my mind, replacement level at .333, but that is much higher than either Fangraphs or Baseball-Reference.com.

I think I would lean more towards a lower replacement level than a higher one, as a gut reaction. As the saying goes, to be a major leaguer you have to be pretty good in the first place, so a lower level would spread more wins across players.

I would also think it would be helpful for us, as outside observers, to know the logic behind the .280 and .320 as replacement levels. Why those numbers? What’s the logic? Even if you can’t agree, if we know the logic behind them, we can make our own judgments better on which we prefer.

And as much as I am loath to mention them because I think that they are biased against the Giants, I think you need to include BP in the conversation on replacement level, as well as Bill James, who I’m sure will have some opinion on this. You don’t necessarily have to have a joint agreement with them, but to at least have an understanding of their thoughts on the topic has to be informative to your and Baseball-Reference’s discussions.

Sorry, never finished my thought about teams and negative WAR players (I’m still a bit jet-lagged, got home 3AM on Weds…).

Given that teams generally seem to know what they got with players, any methodology that assigns negative value to a lot of veteran players who played a lot of years has to at least entertain the idea that perhaps it is missing something.

It is true that some players will be kept on beyond their shelf life, particularly pre-free agency, when owners were like plantation owners favoring their favorites. But there should not be a lot of these around; they should be mostly outliers, like Lasorda is appearing to be. He had a long career as a manager, but a number of analyses of managers have found that he wasn’t all that good a manager, more that he was given a great set of talents that he underutilized.

Also, a sudden thought: I think there should be two versions of WAR, which Fangraphs and Baseball-Reference basically capture between them. There should be a WAR that shows exactly what the player produced, which Baseball-Reference does, and a WAR based on a more neutral, even field, as Fangraphs provides. pWAR and rWAR? (I do think that there are too many variants of acronyms out there :^)

And until WAR can capture that there are pitchers who defy DIPS (as Tom Tippett showed in his study of baseball history in the aftermath of Voros and DIPS) and properly value their contributions to their teams, I don’t think WAR is the end-all and be-all that many fans seem to think.

The problem with WAR is that some sabermetricians confuse a useful simplifying assumption with a description of objective reality—the classical social science error of reification. Differences in farm systems, differences in franchise resources, all mean that “freely available” doesn’t mean the same thing for every team in the league.

Thank you George. Mathematics is just a tool. You should never mistake the tool for reality.

In order to apply mathematics, you should use a statistical model with a distribution approximation for each statistic, because each statistic does not follow the same distribution.

Batting average is fairly normal, but home runs and RBIs follow heavily skewed, inverse-square-like distributions.

For instance, roughly 50% of players will beat the league mean batting average, because the data are approximately normally distributed. But home runs and RBIs are different. If you apply the same logic to the league average number of home runs, fewer than 30% of hitters exceed that average, because the data are heavily skewed and not centrally distributed about the mean.
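A quick sketch of that mean-vs-skew point, using invented home-run totals:

```python
# With right-skewed data, the mean sits above the median, so well under
# half the players clear the league average. The totals below are made up.
hr = [0, 0, 0, 1, 1, 2, 2, 3, 4, 6, 9, 14, 22, 31, 44]

mean = sum(hr) / len(hr)
share_above = sum(1 for x in hr if x > mean) / len(hr)
print(round(mean, 2), share_above)  # the share lands under 30% here
```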

This is why you can’t just compute a z-score for each relevant statistic and sum the z-scores. However, you could attach a score based upon the percentiles of the data (percentage less than or equal to), with a score of zero at the 50th percentile. You could then set the other bands based upon the empirical rule (0.3, 5, 32, 50, 68, 95, 99.7) or other percentage categories (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95) of your choosing.

Weights for park factors could also be applied to this model, by multiplying by the inverse of the park factor.
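Putting the last few comments together, here’s a minimal sketch of the percentile-scoring idea. The league data, the park factor, and the way I map percentiles to band scores via `bisect` are all my own illustrative assumptions, not part of any published system:

```python
# Sketch: score each stat by percentile band rather than by z-score.
# Zero lands at the 50th percentile; the cutoffs are the empirical-rule
# set from the comment above. Data and park factor are invented.
from bisect import bisect_right

CUTOFFS = [0.3, 5, 32, 50, 68, 95, 99.7]  # percentile band edges

def percentile_rank(value, population):
    """Percentage of the population less than or equal to value."""
    return 100 * sum(1 for x in population if x <= value) / len(population)

def band_score(value, population):
    """Signed band distance from the median band: 0 at the 50th percentile."""
    pr = percentile_rank(value, population)
    return bisect_right(CUTOFFS, pr) - bisect_right(CUTOFFS, 50)

# A right-skewed, home-run-like league distribution (invented numbers).
hr = [0, 0, 1, 2, 2, 3, 5, 8, 12, 20, 35, 45]

park_factor = 1.10  # hypothetical hitter-friendly park
raw = 35
adjusted = raw * (1 / park_factor)  # the inverse-park-factor weighting

print(band_score(raw, hr), band_score(adjusted, hr))
```

Summing these band scores across statistics gives a cross-stat total like the z-score sum, without assuming every stat is normally distributed.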