Friday, June 20, 2008

Part 6: Accounting for league differences

This is more of a methods post. It's fairly trivial and probably won't interest 99% of you, but I wanted to post it to show my work. I needed this adjustment to produce the MLB-wide, year-to-date total-value player rankings that will go up in a few minutes.

One of the problems with how I've been calculating runs above replacement (RAR) is that I haven't accounted for differences in the quality of the leagues. At Tom Tango's blog, they use a difference of ~5 runs per season to represent the gap between the leagues. Until now, I've opted to ignore this, mostly because I'm lazy. However, I've noticed that my estimated value for the top NL players this year has been consistently higher than that for the top AL players. So I was concerned that, by not accounting for league differences, I was overestimating NL player value, at least to some degree.

I decided that I needed to adjust my methods to account for this. However, this is easier said than done, because my methods differ slightly from those Tango and others have been using. In particular, he starts with linear weights (or WPA/LI) that are expressed relative to average, while I prefer to start with absolute linear weights (some people call these "lwts_rc" because they resemble the output of Bill James' Runs Created). That means the math works out slightly differently. What follows is how I'm reconciling the two methods.

At issue here is the constant I use to set replacement level. I've been using 73% for all players, which means that replacement level is 73% of the production of the average big-league hitter (MLBAvgR/G above). To match Tango's work, a 5.0 R/G hitter in the AL should be worth ~25 RAR over a full season, while a 5.0 R/G hitter in the NL should be worth just 20 RAR. This is because the AL is the better league, so it takes a better hitter to produce 5.0 R/G in the AL than in the NL.
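To make the role of the constant concrete, here is a minimal Python sketch of a RAR calculation of this general shape. It is not necessarily my exact formula: it assumes that outs = PA * (1 - OBP), that a "game" is 27 outs, and that RAR = (player R/G - constant * MLBAvgR/G) * games. The function names are hypothetical.

```python
# Hypothetical sketch of a hitter RAR calculation.
# ASSUMPTIONS (not spelled out in the post): outs = PA * (1 - OBP),
# and one "game" = 27 outs.

OUTS_PER_GAME = 27.0  # assumed


def games_from_pa(pa: float, obp: float) -> float:
    """Convert plate appearances into 27-out 'games' (assumed method)."""
    outs = pa * (1.0 - obp)
    return outs / OUTS_PER_GAME


def hitter_rar(pa: float, obp: float, player_rpg: float,
               mlb_avg_rpg: float, constant: float) -> float:
    """Runs above replacement, where replacement level is
    `constant` times the MLB-average runs per game."""
    games = games_from_pa(pa, obp)
    replacement_rpg = constant * mlb_avg_rpg
    return (player_rpg - replacement_rpg) * games


# The old one-size-fits-all 73% constant, applied to a league-average hitter.
print(hitter_rar(700, 0.335, 5.0, 5.0, 0.73))
```

Under those assumptions, the old 73% constant credits a 5.0 R/G hitter with about 23 RAR over 700 PA in either league, in between the ~25 (AL) and ~20 (NL) targets.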

So I took a roughly MLB-average hitter over a full season (700 PA, .335 OBP, 5.0 R/G) and solved for the constant that yields 25 RAR (the AL value). That constant came out to 72%, very close to the 73% I had been using. However, when I solved for the constant that yields 20 RAR (the NL value), I got 77%. That's quite a bit higher than what I've been using, and it indicates that I've been overestimating the value of NL players (including the Reds, unfortunately...and they're not exactly tearing the cover off the ball).
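The solve step can be sketched the same way. Setting (R/G - c * MLBAvgR/G) * games equal to the target RAR and rearranging gives c = (R/G - target / games) / MLBAvgR/G. This is a hypothetical sketch under the same assumed conversion (outs = PA * (1 - OBP), 27 outs per game), which may differ from the one behind the figures above.

```python
# Hypothetical sketch: back out the replacement constant that yields a
# target RAR. ASSUMPTIONS (not from the post): outs = PA * (1 - OBP),
# 27 outs per game.

OUTS_PER_GAME = 27.0  # assumed


def solve_constant(target_rar: float, pa: float, obp: float,
                   player_rpg: float, mlb_avg_rpg: float) -> float:
    """Solve (player_rpg - c * mlb_avg_rpg) * games = target_rar for c."""
    games = pa * (1.0 - obp) / OUTS_PER_GAME
    return (player_rpg - target_rar / games) / mlb_avg_rpg


# League-average hitter (5.0 R/G in a 5.0 R/G environment), full season.
al_constant = solve_constant(25.0, pa=700, obp=0.335, player_rpg=5.0, mlb_avg_rpg=5.0)
nl_constant = solve_constant(20.0, pa=700, obp=0.335, player_rpg=5.0, mlb_avg_rpg=5.0)
print(f"AL: {al_constant:.1%}  NL: {nl_constant:.1%}")
```

With 27 outs per game, this returns about 71.0% for the AL target and 76.8% for the NL target. The NL figure matches the 77% above; the one-point gap on the AL side presumably comes from a conversion detail not shown here, so treat the sketch as illustrating the algebra rather than reproducing the exact figures.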

The difference is not huge...we're talking 5 runs per 700 PA, and most hitters never reach 700 PA. But from this point on (at least until the NL catches up with the AL), I'm using 72% as the baseline for AL hitters and 77% as the baseline for NL hitters. As a result, NL hitters will be devalued by up to 5 runs per season. That's not a huge difference in value, but money-wise, it works out to about $2 million less in estimated salary per season...
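Here's a rough sketch of applying the league-specific constants, showing that the league gap scales with playing time. It reuses the same assumed outs-to-games conversion, and the dollars-per-run figure ($400K) is just the "$2 million per 5 runs" above restated, not an independent estimate. The names are hypothetical.

```python
# Hypothetical sketch of league-specific replacement constants for hitters.
# ASSUMPTIONS (not from the post): outs = PA * (1 - OBP), 27 outs per game.
# DOLLARS_PER_RUN just restates the post's ~$2M per 5 runs.

OUTS_PER_GAME = 27.0       # assumed
DOLLARS_PER_RUN = 400_000  # implied by ~$2M per 5 runs

REPLACEMENT_CONSTANT = {"AL": 0.72, "NL": 0.77}  # from the post


def hitter_rar(league: str, pa: float, obp: float,
               player_rpg: float, mlb_avg_rpg: float) -> float:
    """RAR using the league-specific replacement constant."""
    games = pa * (1.0 - obp) / OUTS_PER_GAME
    constant = REPLACEMENT_CONSTANT[league]
    return (player_rpg - constant * mlb_avg_rpg) * games


# Same league-average hitter in each league, at full and half playing time.
for pa in (700, 350):
    al = hitter_rar("AL", pa, 0.335, 5.0, 5.0)
    nl = hitter_rar("NL", pa, 0.335, 5.0, 5.0)
    gap = al - nl
    print(f"{pa} PA: AL {al:.1f}, NL {nl:.1f}, gap {gap:.1f} runs "
          f"(~${gap * DOLLARS_PER_RUN / 1e6:.1f}M)")
```

Because the gap works out to (0.77 - 0.72) * MLBAvgR/G * games, a hitter with half the playing time loses half as many runs, so part-time players take a proportionally smaller hit.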

Also, I followed a similar procedure to adjust pitcher numbers (though I won't bore you with the math). From this point on, AL replacement level for pitchers will be defined as 129% of MLB average for starters and 108% of MLB average for relievers. In the NL, which is an easier place to play, I will define replacement level as 124% of MLB average for starters and 103% of MLB average for relievers. The AL figures match up well with those I've been using, while the NL figures set a higher bar (because it's an easier league).
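A similar sketch for pitchers. The multipliers come from the paragraph above, but the rest is assumed: I'm treating "MLB average" as MLB-average runs allowed per nine innings (RA/9) and computing RAR as (replacement RA/9 - pitcher RA/9) * IP / 9. The 5.00 RA/9 league average in the example is a made-up round number.

```python
# Hypothetical sketch of league-specific pitcher replacement levels.
# ASSUMPTIONS (not from the post): the rate stat is runs allowed per
# 9 innings (RA/9), and RAR = (replacement RA/9 - pitcher RA/9) * IP / 9.

REPLACEMENT_MULTIPLIER = {
    ("AL", "SP"): 1.29,
    ("AL", "RP"): 1.08,
    ("NL", "SP"): 1.24,
    ("NL", "RP"): 1.03,
}  # multipliers from the post


def pitcher_rar(league: str, role: str, ip: float,
                ra9: float, mlb_avg_ra9: float) -> float:
    """RAR for a starter ("SP") or reliever ("RP") in the given league."""
    replacement_ra9 = REPLACEMENT_MULTIPLIER[(league, role)] * mlb_avg_ra9
    return (replacement_ra9 - ra9) * ip / 9.0


# A league-average starter over 200 IP in each league (made-up inputs).
for league in ("AL", "NL"):
    print(league, round(pitcher_rar(league, "SP", 200, 5.0, 5.0), 1))
```

For the same 200 IP of league-average run prevention, the NL starter comes out with fewer RAR than the AL starter (about 26.7 vs. 32.2 here), which is the higher NL bar in action.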

I'm not arguing that these adjustments are perfect by any means, but I do think they're an improvement over the numbers I had been using.

FWIW, the best NL players are still doing quite a bit better than the best AL players after this adjustment. You'll see that in the next post...