Wednesday, June 03, 2009

Defensive Replacement Level Defined

Okay, this is a “look what I found” post, so I’m not drawing conclusions, but looking at data and asking, “What does this mean?” in hopes that the peer-review world of sabermetrics can take this type of information and run with it. One key to further advancement is open-source work, which enables others who are better at database work than I am - like Joe Arthur, Sean Smith, and Colin Wyers - to see where they can take the information.

At any rate, a few years ago, when I was doing the 20 years’ worth of analysis (thanks again, SG!!), Sean and Mike and I were discussing the limits of what a team would put on the field. I said at the time that it could be valuable information, but I never really did anything about it. Then, the other night, in a discussion with Colin, I popped in the spreadsheet and did a few pivot tables, and something jumped out at me - defensive replacement level is similar to offensive replacement level. Yes, that is the conclusion, and it comes first in this piece, so you don’t have to read all the drivel.

The old discussion essentially asked, “Is there a floor below which a player isn’t allowed to play defense any longer, and where is that floor?” Now, I will point out there is a very small bias, because if you don’t catch anything, you never make it to any reasonable number of innings played. Some players who were so awful in very few innings could have lowered the numbers, but the fact that they weren’t allowed to play a minimal number of innings supports the argument that MLB teams won’t allow a player who fields below the named level to play there for an extended time.

So what I did was take the 1987-2008 Defensive Runs Saved (DRS) database and look at the 20-year Zone Rating (ZR) data by position. It included several looks. Do I draw the cutoff at 500 IP (innings played)? Is it 1000 IP? Somewhere in between? For this argument, I am using 500 IP. That is more than a third of a season, and gives a player about 60 games to show that he stinks. This is where teams will say “That’s enough.” Some teams do it earlier, but 200 IP just isn’t enough to draw any conclusions…I think. In this piece, we are going to claim that a team had to give a player 500 IP to show he could or could not handle a defensive position. I did a cursory glance at 200 IP, and the ZR drops markedly, but very few players got 200 IP without also reaching 500 IP, so I am going to make the initial claims based on 500 IP. That’s assumption #1.
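The pivot described above can be sketched with a few lines of pandas. This is only an illustration of the method - the column names and the sample rows are invented, not the actual DRS database schema:

```python
# Sketch of the pivot: for each position, find the worst Zone Rating among
# players who were given at least 500 innings. All data below is made up.
import pandas as pd

fielding = pd.DataFrame({
    "player":  ["A", "B", "C", "D", "E", "F"],
    "pos":     ["ss", "ss", "lf", "lf", "2b", "2b"],
    "innings": [1200, 450, 900, 700, 1400, 300],
    "zr":      [0.850, 0.700, 0.810, 0.760, 0.830, 0.650],
})

qualified = fielding[fielding["innings"] >= 500]      # assumption #1: 500 IP cutoff
floor_by_pos = qualified.groupby("pos")["zr"].min()   # worst tolerated ZR = defensive RL
print(floor_by_pos)
```

The key point is that the minimum is taken only over players who cleared the innings bar, which is exactly the selection bias noted above.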

So here’s what we are looking at with 500 IP (and I did round the Replacement Level (RL) Zone Rating) compared to average:

Roughly, offensive RL is 80% of average. Defensive RL is approximately 85% of average, but slightly higher for Middle Infield.

What happens when you start expecting a player to play a full season? Well, he’s a poor fielder, so you sub him out in the 7th inning, and he gets only 1000 IP (minimum). What does that do to the RL?

Boy, Davey Johnson really didn’t care much for defensive prowess. Note that several positions - 1B, LF, and RF - didn’t change. And the middle positions really moved up. Teams have a strong aversion to putting a poor fielder at the positions that get the most chances. It also has to make Yankee fans smile to see that there’s a defensive measure that doesn’t name Derek Jeter as the worst (he was second worst).

Another assumption: I eliminated Manny Ramirez in Boston from this. If you want to know how poorly Manny played LF in Boston, over these 22 years of data, no other Boston LF showed up in this analysis. No Troy O’Leary, No Mike Greenwell. Sure, if I scrolled up far enough, they would be on the list somewhere, but Manny has five seasons of poor defense in Boston that are unrivaled by any other Boston LF. That’s a real individual effect.

So there you have it - defensive replacement level, defined with the same parameters as offensive replacement level: the level at which, if your performance falls to it, you tend to be replaced.

As a product of this analysis, we can get a good idea of how many runs a replacement-level player will cost you defensively, if permitted to play out the season:

Pos   Worst Fielder RS
1b    27.7
2b    34.0
3b    35.8
cf    29.8
lf    39.1
rf    39.7
ss    32.8

You can see from this that the worst player (that an MLB team will run out there for an entire season) will only cost you about 40 runs. More importantly, while a SS/2B/CF gets the most BIP, teams do not risk as much at those positions, so they aren’t really giving up as many runs as at the corner outfield spots or third base. It’s also interesting to see that first base really is the lowest-risk position.
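A back-of-the-envelope version of how a ZR shortfall turns into full-season runs: multiply the gap between average and replacement-level ZR by the number of chances at the position and by a run value per missed play. Every number below is an illustrative assumption, not a figure from the article:

```python
# Rough sketch: runs lost = (avg ZR - replacement ZR) * chances * runs per play.
# The ~0.8 runs-per-missed-play value and the chance counts are assumptions.
runs_per_play = 0.8

positions = {
    # pos: (avg ZR, replacement ZR, full-season chances) -- all assumed
    "ss": (0.850, 0.780, 500),
    "lf": (0.840, 0.720, 350),
}

results = {}
for pos, (avg_zr, rl_zr, chances) in positions.items():
    results[pos] = (avg_zr - rl_zr) * chances * runs_per_play
    print(f"{pos}: ~{results[pos]:.1f} runs below average")
```

Note how a corner outfielder with a bigger ZR gap can cost more runs than a shortstop despite seeing fewer chances, which matches the pattern in the table.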

So now, when we say that defensive RL is higher than offensive RL, we know by how much.

Reader Comments and Retorts


Another assumption: I eliminated Manny Ramirez in Boston from this. If you want to know how poorly Manny played LF in Boston, over these 22 years of data, no other Boston LF showed up in this analysis. No Troy O’Leary, No Mike Greenwell. Sure, if I scrolled up far enough, they would be on the list somewhere, but Manny has five seasons of poor defense in Boston that are unrivaled by any other Boston LF. That’s a real individual effect.

I wouldn't do that. I am sure that you're right about Manny, but is he the only case? What about Griffey? Jeter? What's the cutoff?

It's a truism to say that any given player represents "an individual effect" and I think it's best not to do too much finessing. Unless the player is in the outfield in a wheelchair and they play him there because the team is afraid of being sued for discrimination, I say leave 'im in.

I think the case for leaving Manny out is that the ballpark effect is extreme, and his zone rating greatly overstates how bad he is. I think Chris did the logical thing here.

Looking at the list of replacement level fielders, I think all were at least 10-15 runs above average hitting. Except for Womack. Overall replacement level is 20-25 runs below average so these guys were mostly above replacement level overall. You can tolerate a bad fielder to the extent that his bat will play. It's rare (or at least should be) to see a guy like Griffey. If he can't handle cf then he should move to a corner, 1b, or dh. Sometimes denial gets in the way of such a move.

Andere, yes, I should have been clearer. AROM is right. Boston LF is difficult to nail down.

I'm somewhat confused by this point too. It seemed like you were saying that O'Leary and Greenwell didn't have the same kind of issues with LF in Fenway - it was just Manny. Which seems like a point for including him, not leaving him out.

I suppose it could be that all the other Boston LFs were just better defenders in general and therefore even though the Monster hurt them, it doesn't make them look that bad.

Basically that Manny and Fenway together make a perfect storm of bad defensive ratings.

I think they rate bad, but not Manny bad. Say Greenwell and O'Leary were average defenders who look like -10 defenders in Fenway. Then Manny comes along, and is a -15 defender who looks like -30 because of the wall. That badness is not all him, so if you're looking at how bad a defender can be before the team has to make him a DH, it's best to look elsewhere.

Looking at the list of replacement level fielders, I think all were at least 10-15 runs above average hitting. Except for Womack. Overall replacement level is 20-25 runs below average so these guys were mostly above replacement level overall. You can tolerate a bad fielder to the extent that his bat will play.

I was going to ask a question along those lines. There are replacement level hitters and replacement level fielders; I'd have to guess that those two groups are more or less mutually exclusive. I'd be curious to know what the overall batting runs are for the replacement-level fielders.

That's not quite true because of position switches, which makes the concept of defensive replacement level interesting on some level. Is the +20/-40 guy playing third like Ryan Braun? Maybe he should move to left field. Is he -40 playing left field or first base? If so he's an average defensive DH.

I disagree. If you learn that replacement level is -20 (or whatever) through some other means (minor league free agents, 26th men, etc.), then replacement level for both offense and defense would be by definition -20. No?

I disagree. If you learn that replacement level is -20 (or whatever) through some other means (minor league free agents, 26th men, etc.), then replacement level for both offense and defense would be by definition -20. No?

What I mean is that replacement level D is always dependent on the guy's bat, and vice versa. So, if replacement level is -20, then replacement level D is only -20 if the guy has an average bat, and so on.

That's not quite true because of position switches, which makes the concept of defensive replacement level interesting on some level. Is the +20/-40 guy playing third like Ryan Braun?

This only works if the player's specific defensive skills make him a better defender at his new position than the pool of other -40 third basemen. It's a, what, 10 run positional adjustment between INF and COF, right? That implies that a -40 3B would be a -30 LF, in general. So e.g. Braun would have to be at least a -30 LF in order to make the position switch neutral. If the scouting report indicated that his particular skills were well suited to COF (maybe he's got footspeed that doesn't really help him at 3B), then we're in business. That's why Tango's fan scouting report is so important - it breaks fielding skill down into categories so we can begin to guess which guys will benefit from a position switch and which ones won't.
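The arithmetic in that comment can be made explicit. The ~10-run adjustment between the infield and a corner-outfield spot is the comment's own estimate; the function name and values here are only illustrative:

```python
# Sketch of the positional-adjustment logic: moving down the defensive
# spectrum buys back roughly the positional adjustment, all else equal.
def projected_defense(runs_at_old_pos, positional_adjustment=10):
    """Expected defensive runs at the easier position (assumed ~10-run gap)."""
    return runs_at_old_pos + positional_adjustment

# A -40 3B projects to roughly -30 in LF, so the switch is only a win
# if his particular skills make him better than that in the outfield.
print(projected_defense(-40))
```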

What I mean is that replacement level D is always dependent on the guy's bat, and vice versa. So, if replacement level is -20, than replacement level D is only -20 if the guy has an average bat, and so on.

I'm starting to think that maybe replacement level only really exists at a combined level.

I'm not sure, maybe. I'd guess that replacement level has a different 'number' in different organizations, and that some would rate the balance between the components of hitting replacement level and fielding replacement level differently even if they used the same overall number.

This is the major drawback of replacement level — it is too generalized a concept to be useful in specific cases. It works better as a theory. 'Why give $2.9 million to Scrappy McClutch, when we've got Joe Exprospect in AAA who is just as good and will work for the minimum?' Great idea but, when it comes down to the specifics, you find out Scrappy McClutch's balance between batting and glovework fits better with the rest of your bench, and might also help out with the coaching, and he's good with the press, keeping them away from Sorley McSlugger, who is always getting you in trouble with the commissioner. Suddenly, theory doesn't look so comprehensive.

I think you can use a theory to isolate a pool of candidates, but actually choosing one means looking at the components more 'granularly'.

I'm not sure, maybe. I'd guess that replacement level has a different 'number' in different organizations, and that some would rate the balance between the components of hitting replacement level and fielding replacement level differently even if they used the same overall number.

Well that could be a market inefficiency, i.e., the -30 O/+20 D or +20 O/-30D guys look so bad at one aspect of the game that teams still choose the -10 O/-5 D guys over them.

It seems to jibe with the extreme cases being undervalued, e.g. Adam Everett, Adam Dunn, etc.

Have you tried breaking down the worst defenders into sub-sets based on hitting ability, i.e. the worst defender among the +20 or better hitters, the +10-to-20 guys, the 0-to-+10 guys, etc., to see if the "combined replacement level" (O+D) is consistent?

The replacement level D for a good hitter should be much lower than for an avg. hitter.

I'm not exactly clear on what you did, but maybe try looking at players who moved down the defensive spectrum only, since you have the time series? It's not a good method for defining the replacement level, but it should confirm it.

It seems to me from various studies I have read (or even made myself) that the boundaries of 'fielding runs' are a lot narrower than those for 'batting runs'. That is, as this evidence seems to indicate, that teams will tolerate poor hitting more readily than they'll tolerate poor fielding.* And at the opposite end, it's harder to find exceptional fielders than it is to find exceptional hitters.

So, to push this in a slightly different direction than perhaps was intended, I'd suggest looking at the range between top and bottom of the spectrum. Are exceptional fielders undervalued? Do they make more of a relative difference?

I agree with the basic concept people have been expressing about replacement level being a combination of bat and glove.

And I think you can get some sense of where to draw the line by doing the same work you did for last year's team OPD reports and seeing how the very worst stack up in terms of offense and defense. (If that doesn't make sense, sorry. Basically, how do the historically bad team positions break down by bat and glove? I suspect it'll be rare to find terrible hitting combined with a really bad glove.)

It seems to me from various studies I have read (or even made myself) that the boundaries of 'fielding runs' are a lot narrower than those for 'batting runs'.

I think they have to be. On a team level, the ranges from best offense to worst offense are similar to the ranges from best defense to worst defense. But the 9 batters in the lineup are responsible for 100% of that range. On defense, the pitcher is responsible for, well, nobody knows for certain, but a lot of the defensive range. So the variance between the best and worst fielder has to be less than that between the best and worst hitter.
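That argument can be put into a tiny numerical illustration. The team-level spread and the pitchers' share below are assumed round numbers, only there to show the shape of the reasoning:

```python
# If the team-level spread is the same on both sides of the ball, but
# pitchers absorb most of the defensive side, the range left for the nine
# fielders is much smaller than the range owned by the nine hitters.
team_range = 100          # best-to-worst team spread in runs (assumed)
pitcher_share = 0.8       # portion of the defensive spread owned by pitchers (assumed)

hitter_range = team_range                       # batters own all of the offensive spread
fielder_range = team_range * (1 - pitcher_share)  # fielders split what pitchers leave behind
print(hitter_range, round(fielder_range, 1))
```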

One of my hopes with this type of data piece, is that you readers will say "Have you looked at it this way?"

Something that might be interesting would be to see if particular managers have different tolerances for inferior defense. You mentioned Davey Johnson. How does he compare to Joe Torre, Tony La Russa, Bobby Cox, etc? Obviously it only makes sense to look at long-time managers who have managed 5-10 or more seasons (not exactly sure what the minimum sample size would be).

But a better understanding of the biases of individual managers would seem to be useful information for better predicting future position battles and whatnot.

On defense, the pitcher is responsible for, well, nobody knows for certain, but a lot of the defensive range.

Okay, here's my next assertion:
The lowest value, as it exists for a position, is the definition of pitcher-defender responsibility.

As Mike Emeigh correctly points out, the data has the bias of pre-selected players. But that's a given, and if every player that plays shortstop has a ZR *above* X, then X represents the percentage of GBs that are really "pitcher-driven".

I think the RL is slightly above that mark, and we'd need to look at the 200 IP (or lower) to define the "pitcher-driven" outs.

I think this is where I was looking to go: what percentage of defense is pitcher-driven? It's the point where every MLB player at that position can make the play (and often many who never reach MLB can, too). It is a *little* lower than the 500 IP floor, but not much. I'd say 75-80% of fielding defense is pitching. That would drive defense's 50% of the game to a 40:10 split, and put "fielding" at about 10% of the game *once one reaches an MLB position.*
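The 40:10 split is just arithmetic on the two estimates in that paragraph, shown here with the upper end of the 75-80% range:

```python
# If defense is 50% of the game and ~80% of balls-in-play outcomes are
# pitcher-driven, the defensive half splits roughly 40% pitching / 10% fielding.
defense_share = 0.50      # defense's share of the game
pitcher_driven = 0.80     # upper end of the 75-80% estimate above

pitching = defense_share * pitcher_driven
fielding = defense_share * (1 - pitcher_driven)
print(round(pitching, 2), round(fielding, 2))
```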

I was moving in this direction, and I think this is a critical point we are getting to.

I left my jumpdrive at home, but I'll continue this when I get home this evening.

But a better understanding of the biases of individual managers would seem to be useful information for better predicting future position battles and whatnot.

Very good idea. I can add those to the db and sort based on that. That's a long term thing, as I don't have that done. Perhaps I can create a vlookup chart for teams/years/managers. It's only the last 22 seasons - how hard could it be?
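The team/year-to-manager lookup mentioned above is a natural pandas merge rather than a spreadsheet VLOOKUP. The manager rows here are a made-up fragment for illustration only:

```python
# Sketch: attach a manager to each team-season, then look at the worst ZR
# each manager tolerated. Data is invented, not from the actual database.
import pandas as pd

managers = pd.DataFrame({
    "team":    ["NYM", "NYM", "BAL"],
    "year":    [1987, 1991, 1996],
    "manager": ["Davey Johnson", "Bud Harrelson", "Davey Johnson"],
})

fielding = pd.DataFrame({
    "player": ["X", "Y", "Z"],
    "team":   ["NYM", "NYM", "BAL"],
    "year":   [1987, 1991, 1996],
    "zr":     [0.78, 0.81, 0.75],
})

merged = fielding.merge(managers, on=["team", "year"], how="left")
worst_by_manager = merged.groupby("manager")["zr"].min()
print(worst_by_manager)
```

With 22 seasons of data, the same two-column join handles mid-season manager changes only if the manager table is keyed more finely than team/year, which would be the real work.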

I did a sort based on players that played at two positions, to get the relative defense spread - that is, when a SS moves to 2B how many runs does that turn into? That's another piece.

First, great work. Second, I sometimes wonder to what extent players have a "natural position." On one hand you hear all the time that so-and-so is a natural 2B playing out of position at SS, or whatever. On the other hand is the rule of thumb that when a guy switches positions, he gains in batting runs roughly the same as what he loses in fielding runs, both relative to positional average. Clearly the rule of thumb is not 100% accurate, but how close is it?

Looking at a sample of position switches might be problematic for a few reasons:

a) Many players with substantial innings at multiple positions are those who were moved because they couldn't hack the old position any longer. There's a selection bias there, as the first position includes good years and some decline phase, while the second position is all decline phase. If it's an age effect you don't want to associate it mistakenly with a positional effect.

b) Small sample size, otherwise.

c) The attributes that make someone a good player at one position might not line up with those of another position. Arm strength is important for 3B, not 2B; range, for 2B much more so than 3B. Those who are selected for a switch are likely not representative of the general population for a given position.

A couple of years ago I started down the path of sorting through Tango's "by the fans, for the fans" scouting reports to see what I could glean. The path I'd gone down - before life got in the way - was:

- For each skill (first step, speed, hands, accuracy, etc.), establish a mark of proficiency for each position. I simply picked the 10th-best rating at the position for each skill.

- Calculate a rate of proficiency for each skill for each player, as the ratio of their rating to the proficient rating, capped at 100%.

Then...

1. Line up these with defensive metrics to see what skills are important to each position. Establish proper weights.

2. Using the weights, calculate an overall proficiency rate for each player at their position. Rank players accordingly for their given position. (Yawn.)

3. Calculate proficiency rates for each player for every position, including positions they've never played. Assess expected value (in DRS or other metrics) of position switches.

4. Revisit the defensive spectrum, to see how big the gaps are from position to position, based on the general change in proficiency.
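The proficiency-rate step described above is simple to express in code. The skill names, the benchmark values (standing in for the 10th-best rating at the position), and the player ratings are all invented for illustration:

```python
# Sketch of the proficiency rate: each skill rating is scaled against a
# positional benchmark and capped at 100%. All names and numbers assumed.
benchmarks = {"first_step": 70, "hands": 80, "arm_accuracy": 65}   # 10th-best at the position

player_ratings = {"first_step": 77, "hands": 60, "arm_accuracy": 65}

proficiency = {
    skill: min(player_ratings[skill] / benchmarks[skill], 1.0)     # cap at 100%
    for skill in benchmarks
}
print(proficiency)
```

Step 3 in the list would then rerun this same calculation against every position's benchmarks, including positions the player has never played.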

I got stuck on 1 - actually I made it through 4, but had to revisit everything - partly because the skills that seemed most important in determining best vs. worst at a position were the skills not typically associated with the position. For example, "speed" for 1B was a big factor; "hands", not so much. There could be selection bias in that if you have bad hands and can't scoop the low throw, you're going to be in OF rather than 1B; consequently "hands" isn't a big differentiator among the players already selected to play 1B. But since most 1B are slow, speed makes a big difference.

There's no way I have time to take this further than I have, so I'm bringing it up here in case it sparks some thought for your work or in the hope that someone else pursues it.

I spent some time at SABR Analytics discussing this, both with Tango and Voros. We are aware that Replacement Level for a hitter is tied to "Freely Available Talent" (FAT), and when people complain about WAR (I was formerly guilty of this) it is because it used to be stated that "Replacement Level" was 80% of a regular player. That number has been verified by scrounging around FAT players (by me, and by Clay Davenport in a public forum). Probably by Colin Wyers and Nate Silver among others.

People currently assume RL for defense is that as well because FAT players on defense are *at least* average. However, the FAT hitters can sit on the bench and play DH/1B/LF, but FAT fielders cannot. Not since Rafael Belliard anyway. That excludes catchers, because they will claim they are good at pitch-framing.

The reason we don't use average in a historical context, and use RL is because "average" performance has a lot of value. This article shows that to be true as well - average fielders are not getting enough credit in current WAR - they should be getting 15% more. I think.

There's a second reason. RL is also the general construct within your 40-man roster or other FAT, but it has to be FAT players who can hit. So now, RL has to be more sophisticated in selecting those players - or at least RL hitters' defense should be verified. This may need to be "low-value roster players" - not budding stars, but the guys who give the starters a rest. What is their defensive level?

Maybe instead of bumping this thread, I write a different piece, but nah. Because I need the details in here, and the old commentary.

Statcast will verify the "catch rates" that I produced above, and I now have 8 more years of data to re-assess that 20 years.