
Reader Comments and Retorts

Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

I'm not saying it actually has been. But it's no more inherently absurd than thinking that Carlos Peña in 2010 (.196 BA, 1.7 WAR) might have been a better player than Dmitri Young in 2007 (.320 BA, 0.1 WAR). Nothing in the logic of the calculations, or the nature of baseball, prevents it.

Unless you cite OPS+ for Pena and Young, your point has absolutely no meaning.

The last time I relied on batting average to tell me anything about offense I was 15.

Ray, the point is simply that a single number, whatever the number, always has a context, and is subject to illusion. It's as true of OPS+ or batting average as it is of any other metric. If you want OPS+ as that number, substitute Rey Sanchez in 2001 (64 OPS+, 3.2 WAR) and Kevin Reimer in 1992 (119 OPS+, -0.8 WAR). Maybe that's wrong too. But such things are clearly possible, which is to say not inherently absurd.

I just don't see the point of pointing to one side of the equation, or not even doing the math, and then just saying "that's absurd!"

See, e.g., Sheffield, who has a 76 career oWAR that is gutted by defense such that he ends up with just a 56 WAR. That seems highly implausible to me. No, I can't point to a specific error.

When I read that I think you have pointed to a specific error. You don't think Sheffield was 200 runs below average as a fielder. Well, maybe not error, as I watched Gary Sheffield and happen to think -200 runs over 20 years is very plausible for him. But it's a specific point of disagreement.

Theoretically, Hamilton could certainly be bad enough in CF to offset being a far better hitter than Barney, but as many here have said, it's all in the calculations. There's nothing inherently absurd about the proposition.

Yes, there is.

No there isn't.

Let's say Hamilton is worth 75 more runs on offense than Darwin
Let's say a typical CF generates 15 more runs on offense than your typical SS (or typical replacement-level CF versus SS, whatever; I think we all agree that CFs tend to outhit SSs, and Hamilton played nearly half his games in LF)
So Hamilton is up by 60-65
Can Darwin SAVE that many runs on defense relative to Hamilton? Yes, I think so. Hamilton would have to be extremely bad and Darwin extremely good, but it is certainly possible.

Whether we can accurately measure that defense is a separate question.
Personally I don't think we can, and if you regress each man's fielding stats 1/3 of the way to the mean, Hamilton easily flips ahead of Barney...
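The arithmetic in the post above can be written out directly. A minimal sketch using the post's illustrative figures (a 75-run offensive edge for Hamilton and a 15-run CF-over-SS offensive baseline); the fielding values below are hypothetical placeholders, not measured ratings for either player, and `regress=1/3` mirrors "regress 1/3 to the mean" with league average taken as 0.

```python
def net_gap(offense_edge, positional_baseline, field_a, field_b, regress=0.0):
    """Runs by which player A (the hitter) leads player B (the fielder).

    offense_edge: A's offensive runs minus B's.
    positional_baseline: how much more a typical player at A's position
        out-hits a typical player at B's position (subtracted from A's edge).
    field_a, field_b: fielding runs relative to average (hypothetical inputs).
    regress: fraction of each fielding rating pulled back toward 0 (average).
    """
    keep = 1.0 - regress
    return offense_edge - positional_baseline + keep * (field_a - field_b)

# 75-run bat edge minus a 15-run positional baseline leaves Hamilton up 60.
print(net_gap(75, 15, 0, 0))                            # 60.0
# Hypothetical fielding: hitter -25, fielder +40 -> the fielder edges ahead...
print(round(net_gap(75, 15, -25, 40), 1))               # -5.0
# ...but regressing both ratings 1/3 toward average flips it back.
print(round(net_gap(75, 15, -25, 40, regress=1/3), 1))  # 16.7
```

Regressing shrinks the fielding gap rather than the offensive one, which is why a modest regression is enough to flip a close comparison.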

That doesn't mean that Barney CAN'T be the better player than Hamilton this year. In my mind it's unlikely, but yes, if he really is an elite defensive SS (near-peak Ozzie) and if Hamilton was awful...

When I read that I think you have pointed to a specific error. You don't think Sheffield was 200 runs below average as a fielder. Well, maybe not error, as I watched Gary Sheffield and happen to think -200 runs over 20 years is very plausible for him. But it's a specific point of disagreement.

Your system has him as a better outfielder at 33 and 34 than he was in his late 20s. Then he reverted to being a bad outfielder again.

Actually, the specific issue I see is that he was magically a better outfielder both years in Atlanta (ages 33 and 34) than he was for the Marlins, Dodgers, or Yankees.

Nothing magical. Just random variation, and not even an extreme case at that. From LA to ATL to NY his ratings from 2000-05 are -3, -15, -1, -2, -10, -14. Those are results that would be consistent with a -10 true talent and random variation around that base.
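A quick check of the claim above, using the six ratings quoted in the post: how far each season sits from a -10 true talent, measured in the seasons' own spread. This is a rough sketch; the -10 is the poster's hypothesis, and a standard deviation estimated from only six seasons is itself noisy.

```python
from statistics import mean, stdev

# Sheffield's yearly fielding ratings in runs, 2000-05, as quoted in the post.
seasons = [2000, 2001, 2002, 2003, 2004, 2005]
ratings = [-3, -15, -1, -2, -10, -14]
TRUE_TALENT = -10  # the poster's hypothesized underlying level

def z_scores(values, center):
    """Distance of each value from `center`, in sample standard deviations."""
    spread = stdev(values)
    return [(v - center) / spread for v in values]

print("mean:", mean(ratings))  # mean: -7.5
for year, r, z in zip(seasons, ratings, z_scores(ratings, TRUE_TALENT)):
    print(year, r, round(z, 2))
```

The two Atlanta seasons (2002-03) land roughly 1.3 and 1.4 standard deviations above -10, which is unremarkable for six draws.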

How much of that credit should go to the manager instead of the player?

With the additional caveat that some goes to coaches and scouts, some credit also must go to the pitcher. A pitcher who knows his infield is playing a batter the other way, and throws outside, is aiding his defense a lot more than the bozo who comes inside with a pullable pitch. It just points to the difficulty of separating credit on defense.

AROM: Fair enough. Random variation could well explain it. (But, of course, that's my point: there are reasonable explanations for all of this. But that doesn't really get us very far.)

DL: It was hard enough to evaluate defense before, and people were just starting to get a handle on it. But the shifting throws everything out of whack. There is no baseline to compare a shifted 2B to. There's no data. Especially when the SS is shifted as well. "2B" in a shift becomes a meaningless construct having nothing to do with a traditional 2B.

" ... I’ve been slowly coming to some realizations about defensive metrics in general, and they aren’t encouraging.

The short version: I’m not really sure that we’ve gotten any further than where we were when Zone Rating and Defensive Average were proposed in the '80s. And if we have gotten further, I’m not sure how we would really tell ..."

DL: It was hard enough to evaluate defense before, and people were just starting to get a handle on it. But the shifting throws everything out of whack. There is no baseline to compare a shifted 2B to. There's no data. Especially when the SS is shifted as well. "2B" in a shift becomes a meaningless construct having nothing to do with a traditional 2B.

As I understand it, DRS doesn't really even account for positioning other than eliminating plays from the data where 3 IFs are on one side of the infield. You could field a ball in a "tough" position in a vector just because you're positioned right in front of it, and DRS will still credit you with making a tough play.

This is starting to feel like the debate over NeL statistics, where people were claiming that we had accurate MLEs for Josh Gibson and the like and that the MLEs should be accepted simply because a lot of hard work had been done.

My argument in one respect boils down to WAR treating its defense component with just as much certainty as its offense component.

Once you combine offense and defense there is no other way to do it. Having a number closer to 0 does not magically mean it has less weight. It takes as much confidence to say Miguel Cabrera is 2 runs below average as it does to say Adrian Beltre is 14 above. Cabrera could be -20, and Beltre could be +25. Defense is an equally important aspect of both players' value, and needs to be weighted equally.

This is starting to feel like the debate over NeL statistics, where people were claiming that we had accurate MLEs for Josh Gibson and the like and that the MLEs should be accepted simply because a lot of hard work had been done.

That's about where we are for defense and WAR.

Well as I said, if you want to ignore defense, then ignore defense. That's entirely up to you. But the point of WAR is to combine the values of all aspects of baseball play into one measure. Once you have all but nominally eliminated defense, what you have left is not WAR.

This is starting to feel like the debate over NeL statistics, where people were claiming that we had accurate MLEs for Josh Gibson and the like and that the MLEs should be accepted simply because a lot of hard work had been done.

Of course the main difference being if you question one, you are a racist, and if you question the other you are a Luddite.

Answered twice, but unsatisfactorily both times (and now three times with your answer), as Play Index doesn't allow a search by oWAR, which was my point.

I thought (and I would assume the other two people who answered you thought) you wanted to see the oWAR for use in discussing who the MVP is this year, which that list would do fine. You apparently want it for something not really related to this.

I wish BBRef would give me a banana split. Can anyone help me find a banana split using baseball ref?

Defense is an equally important aspect of both players' value, and needs to be weighted equally.

Not necessarily, particularly if there isn't as much opportunity to distinguish yourself on defense as on offense and, as with middle IFs, when you do distinguish yourself, it's only worth the equivalent of "hitting" a single.

But the point of WAR is to combine the values of all aspects of baseball play into one measure. Once you have all but nominally eliminated defense, what you have left is not WAR.

A better way to do that would be to model replacement-caliber players at all positions in all facets of the game (perhaps equally "good" on offense as defense) for a .320-caliber team and measure players against them. That's really what WAR should be.

I thought (and I would assume the other two people who answered you thought) you wanted to see the oWAR for use in discussing who the MVP is this year, which that list would do fine. You apparently want it for something not really related to this.

? This is a discussion about WAR. I want it for something related to WAR.

I wish BBRef would give me a banana split. Can anyone help me find a banana split using baseball ref?

My argument in one respect boils down to WAR treating its defense component with just as much certainty as its offense component.

That's a valid argument. Two potential solutions, when evaluating single seasons:

1. Stat-based. Use multi-year fielding ratings. For Josh Hamilton in 2012 this would mean maybe taking his average fielding rating from 2009-2012, and combining that with his actual offensive numbers.
2. Subjective based. Grade every fielder from terrible, poor, average, good, great. Award runs on those categories -10, -5, 0, 5, 10. We can all see Adrian Beltre or Brendan Ryan play great defense, so give them the top rating. Someone is a complete butcher? Give him the bottom rating.

Either solution is preferable to burying your heads in the sand and pretending defense doesn't matter.
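Both proposals are easy to express once offense and fielding are in the same units. A minimal sketch, assuming both are runs relative to average; the function names and the fielding history are hypothetical (not Hamilton's actual ratings), and the grade scale is the one proposed in the post.

```python
# Solution 2: the post's five-step subjective scale, in runs.
GRADE_RUNS = {"terrible": -10, "poor": -5, "average": 0, "good": 5, "great": 10}

def multi_year_fielding(ratings_by_year, first, last):
    """Solution 1: average a player's fielding runs over a span of seasons."""
    vals = [ratings_by_year[y] for y in range(first, last + 1) if y in ratings_by_year]
    return sum(vals) / len(vals) if vals else 0.0

def season_runs(offense_runs, fielding_runs):
    """Actual offense for the season plus whichever fielding estimate is chosen."""
    return offense_runs + fielding_runs

# Hypothetical fielding history, not any real player's ratings.
history = {2009: -4, 2010: 2, 2011: -8, 2012: -20}
print(season_runs(30, multi_year_fielding(history, 2009, 2012)))  # 22.5
print(season_runs(30, GRADE_RUNS["poor"]))                         # 25
```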

@262: is that some kind of appeal to authority (a quote more than two years old) in a stats thread?

Moving on to other points...

You can't have WAR without Defense, that's the whole point of it. I do wonder if the difficulty of measuring fielding replacement value means we should switch to WAA with a correction for position. You would lose the concept of an average player having value... but I guess you could just add back a constant at the end.

Your system has him as a better outfielder at 33 and 34 than he was in his late 20s. Then he reverted to being a bad outfielder again.

So what, this happens with offense and pitchers All. The. Time. Just not in the aggregate.

I had an idea to fix both FIP and fielding stats by statistically partitioning credit between fielder and pitcher in an iterative way. Because one way or another those outs are getting recorded. But I never quite worked it out or implemented it.

The idea is to use N-fold cross-validation for each fielder, with the N folds being the pitchers he plays behind. For each set of fielder plays (in this case a subset of a season) you get a ratio of outs/chances, Rf. You also get the league-wide (park-corrected?) aggregate as well. For each pitcher you also get (for each fielding position) a ratio of outs/chances, Rp. (I am simplifying by ignoring GB/FB: assume we are only talking about GBs for infielders and FBs for outfielders... and of course "chances" depends on the BIP data you are using.)

for pitcher i: Rpi = weighted sum of Rf for all fielders
for fielder j, Rfj = weighted sum of Rp for all pitchers

What we wish to impute are the difficulties of the BIP; or really the average difficulties of the BIP for each cell of the matrices.
The difficulty, Dij, is the chance that an average fielder converts a given BIP to an out, from 0 to 0.999 or something. Dij is the pitcher's component of defense (and also probably positioning/coaching). Fij is the fielding skill; an average fielder has an Fij of 1.

Rij = Dij*Fij

If we assume (for a given iteration) that Fij is constant for a fielder, we have the Rij values from the data and Dij[Fs,1] = Rij/Fij, where Fij(1) is just Rfj.
Then we assume that Dij (again we are splitting GB/FB) is constant and we get Fij[Ps,1] = Rij/Dij == Rpi/Dij

Now we have Rij(true) and Rij(est) = Dij[Fs,1]*Fij[Ps,1] - we now iterate changing Fij[Ps,2] and Dij[Fs,2] to minimize the difference.
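The alternating scheme described above can be sketched as a weighted rank-one factorization: model each pitcher/fielder out rate as D[i] * F[j], then alternately hold F fixed and solve for D, and hold D fixed and solve for F, by weighted least squares. A minimal sketch, assuming D depends only on the pitcher and F only on the fielder (a simplification of the per-cell Dij and Fij in the post), that chances serve as weights, and that F is rescaled each pass so the chance-weighted average fielder is 1.0. The function name and input layout are hypothetical; no real BIP data is involved.

```python
def partition_credit(outs, chances, iterations=200):
    """Split out rates into pitcher difficulty D[i] and fielder skill F[j].

    outs[i][j], chances[i][j]: balls in play handled with pitcher i on the mound
    and fielder j at the position (hypothetical input layout).
    Model: outs/chances ~= D[i] * F[j], fitted by alternating weighted least squares.
    """
    n_p, n_f = len(chances), len(chances[0])
    rate = [[outs[i][j] / chances[i][j] if chances[i][j] else 0.0
             for j in range(n_f)] for i in range(n_p)]
    total = sum(map(sum, chances))
    col_chances = [sum(chances[i][j] for i in range(n_p)) for j in range(n_f)]
    D, F = [0.5] * n_p, [1.0] * n_f

    for _ in range(iterations):
        # Step 1: hold fielder skills fixed, solve each pitcher's difficulty.
        for i in range(n_p):
            num = sum(chances[i][j] * F[j] * rate[i][j] for j in range(n_f))
            den = sum(chances[i][j] * F[j] ** 2 for j in range(n_f))
            D[i] = num / den if den else 0.0
        # Step 2: hold difficulties fixed, solve each fielder's skill.
        for j in range(n_f):
            num = sum(chances[i][j] * D[i] * rate[i][j] for i in range(n_p))
            den = sum(chances[i][j] * D[i] ** 2 for i in range(n_p))
            F[j] = num / den if den else 1.0
        # Step 3: pin the scale so the chance-weighted average fielder is 1.0;
        # the product D[i] * F[j] is unchanged by this rescaling.
        scale = sum(c * f for c, f in zip(col_chances, F)) / total
        F = [f / scale for f in F]
        D = [d * scale for d in D]
    return D, F
```

Only the products D[i] * F[j] are identified by the data, so some normalization like step 3 is needed; pinning the average fielder at 1.0 matches the post's convention that "an average fielder has an Fij of 1."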

I ran this past Dial a couple of years ago, but we never did anything with it... possibly because it's a dumb idea that won't work, definitely because I am too busy and don't have BIP data.

1. Stat-based. Use multi-year fielding ratings. For Josh Hamilton in 2012 this would mean maybe taking his average fielding rating from 2009-2012, and combining that with his actual offensive numbers.

Doesn't really work. Fielders can lose it pretty quickly, just as on offense. We wouldn't credit Adam Lind with a portion of the 141 OPS+ he put up in 2009, or Joe Mauer with a portion of his 170 OPS+ that he put up in 2009, or ARod with a portion of his 138 OPS+ that he put up in 2009. To do so would be flatly absurd.

2. Subjective based. Grade every fielder from terrible, poor, average, good, great. Award runs on those categories -10, -5, 0, 5, 10. We can all see Adrian Beltre or Brendan Ryan play great defense, so give them the top rating. Someone is a complete butcher? Give him the bottom rating.

But this is no better than evaluating defense based on a combination of scouting and fielding percentage, which was done for decades before ZR or DA.

Either solution is preferable to burying your heads in the sand and pretending defense doesn't matter.

I of course don't pretend it doesn't matter. I just don't pretend that we can get close enough to it that we can have nearly as much confidence in it as we do in the offense evaluations, which is what WAR does.

But the point of WAR is to combine the values of all aspects of baseball play into one measure.

Which is great, but this extended discussion demonstrates why using it as the sole or main arbiter when voting for the MVP and the HoF (and even the HoM and the MMP) is fraught with issues that need consideration.

For another thing, if you don't like dWAR, you can always plug in a different run-based value, like UZR or DRS.

I've just made up four lists of AL 3Bs with at least 600 innings for 2012 using BB-ref Rtot, DRS, UZR and RZR. Miguel Cabrera is always at or near the bottom of them. Brett Lawrie is always at or near the top.

Now it seems that no matter how well Cabrera hits, his defence is probably the worst of any AL 3B. The worst 3Bs historically cost something approaching a single win, unless we are talking about a historically bad season, which all indications suggest Cabrera has avoided. Does a precise calculation of how bad this season was really matter?
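The cross-check described above, seeing whether several systems agree on who sits at the top and bottom, can be sketched as an average rank across metrics. The numbers below are placeholders for five unnamed players, not real 2012 ratings; ranking within each system is a design choice that sidesteps the systems' different scales (RZR is a rate, the others are runs).

```python
from statistics import mean

# Placeholder numbers for five unnamed 3Bs; NOT real 2012 ratings.
# Rtot, DRS and UZR are runs above average; RZR is a rate (share of zone balls converted).
METRICS = {
    "Rtot": {"A": 12, "B": 3, "C": -1, "D": -6, "E": -14},
    "DRS":  {"A": 15, "B": -2, "C": 4, "D": -5, "E": -16},
    "UZR":  {"A": 9, "B": 1, "C": -3, "D": 0, "E": -11},
    "RZR":  {"A": 0.74, "B": 0.70, "C": 0.71, "D": 0.68, "E": 0.63},
}

def average_rank(metrics):
    """Rank players within each system (1 = best), then average across systems."""
    ranks = {}
    for scores in metrics.values():
        ordered = sorted(scores, key=scores.get, reverse=True)
        for position, player in enumerate(ordered, start=1):
            ranks.setdefault(player, []).append(position)
    return {player: mean(r) for player, r in ranks.items()}

for player, r in sorted(average_rank(METRICS).items(), key=lambda kv: kv[1]):
    print(player, r)
```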

Either solution is preferable to burying your heads in the sand and pretending defense doesn't matter.

Of course it matters.

Does anyone have any idea how many hits Darwin Barney turned into outs that a replacement-level 2B wouldn't have (*)? This is really the fundamental question regarding defense that has to be answered for WAR to work. So how many was it, even approximately? Fifty? More? Less?

(*) Note: Not a replacement-level fielding 2B; a replacement-level 2B. A 0.0 WAR player can have a wide range of both oWAR and dWAR.

To make my above point more clearly: The 141 OPS+ Adam Lind put up in 2009 has absolutely no value to the Blue Jays in 2012. And so to credit him in 2012 with a portion of a defense evaluation from 2009 would be similarly ludicrous. Doing so does not improve WAR so much as it shows its limitations.

Ray, you're not thinking about this very clearly. As you have noted, there is uncertainty in the defensive metrics. We can reduce that uncertainty by looking at prior performance. If Ozzie Smith posts a +25 TZ, we believe it's probably close to right. If Dan Uggla does it, we suspect there's a lot of noise there. So it would make a great deal of sense to regress the current year's rating toward a player's recent-career average. You don't do that to give him extra credit for what he did in prior years, but to make the best possible estimate you can of what he did this year.

And that should be our goal -- to make the best estimate we can. And that estimate should be weighted equal to offense. BUT, getting our best estimate on defense requires us to regress the current year's stats.
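One common way to do that regression is a weighted blend of this season's rating and a prior. A minimal sketch, assuming the prior is the player's recent-seasons average (league average, taken as 0, when he has no track record) and that the weight on the current season rises with innings; `k_innings` is a hypothetical constant, not a published reliability figure, and the track records below are made up.

```python
def regressed_fielding(current, history, innings, k_innings=1500.0, league_avg=0.0):
    """Best estimate of this season's fielding runs, shrunk toward a prior.

    prior: mean of the player's recent seasons, or league average if he has none.
    The weight on this season grows with innings; k_innings is a hypothetical
    tuning constant (the innings at which this season and the prior count equally).
    """
    prior = sum(history) / len(history) if history else league_avg
    weight = innings / (innings + k_innings)
    return weight * current + (1 - weight) * prior

# The same +25 season, with hypothetical track records behind it.
print(regressed_fielding(25, [20, 22, 18], 1500))  # established plus fielder: 22.5
print(regressed_fielding(25, [-5, -3, -8], 1500))  # poorly rated history: about 9.8
print(regressed_fielding(25, [], 1500))            # no history: 12.5
```

The point of the blend is the one made above: the prior is not extra credit for past years, it is evidence about how much of this year's number is noise.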

Then you also have to decide what defensive metric(s) you have most confidence in....

But what do you do for someone like Trout who has one season in the big leagues? Regressing him to average isn't really fair to him, but you don't want to wait two years before you have an estimate of his true talent.

The uncertainty in WAR is something like the sum in quadrature (the square root of the sum of the squares) of all the uncertainties in the measurements. There is uncertainty in positional replacement, batting value, baserunning and overall replacement too, but defensive uncertainty will dominate a sum in quadrature.

To loop back on the Trout v Cabrera argument there is probably some part of the error bar where they overlap. It would be great if I could state the confidence level that Trout is more valuable than Cabrera this season. Is it 90%?
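Both steps, combining component uncertainties and turning the gap between two players into a confidence level, can be sketched under strong simplifying assumptions: independent, normally distributed errors. All numbers below are hypothetical, including the 10.0 and 7.0 WAR figures, which are not either player's actual totals.

```python
import math

def in_quadrature(*sds):
    """Combine independent uncertainties: square root of the sum of squares."""
    return math.sqrt(sum(s * s for s in sds))

def prob_a_ahead(war_a, sd_a, war_b, sd_b):
    """P(A's true value exceeds B's), assuming independent, normally distributed errors."""
    z = (war_a - war_b) / math.hypot(sd_a, sd_b)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical component uncertainties, in wins: batting, baserunning,
# positional adjustment, fielding. Fielding dominates the total.
sd = in_quadrature(0.3, 0.2, 0.2, 1.0)
print(round(sd, 2))  # 1.08
print(round(prob_a_ahead(10.0, sd, 7.0, sd), 3))
```

In practice the errors are not independent (both players share league baselines and replacement level), so this is only a rough way to put a number on "90%?".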

Does anyone have any idea how many hits Darwin Barney turned into outs that a replacement-level 2B wouldn't have (*)? This is really the fundamental question regarding defense that has to be answered for WAR to work. So how many was it, even approximately? Fifty? More? Less?

Comparing Barney's range factor per 9 innings (RF9) with the league average of that (lgRF9), I calculated that Barney has made about 55 more plays than league average (you should be able to find a "replacement-level 2B" who can field the position averagely, I'd think). Obviously, you'd want to account for K-rates and GB tendency of the Cubs' staff and various other what-nots to refine that.
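The range-factor comparison above amounts to actual plays minus the plays a league-average rate would produce over the same innings. A minimal sketch with hypothetical inputs (not Barney's actual totals); `opportunity_factor` is an assumed, crude stand-in for the staff adjustments (strikeout rate, ground-ball tendency) the post says a refined version would need.

```python
def extra_plays(plays, innings, lg_rf9, opportunity_factor=1.0):
    """Plays made beyond what a league-average fielder would make in the same innings.

    plays: putouts + assists at the position (or assists only, to focus on grounders).
    lg_rf9: league range factor per 9 innings at the position.
    opportunity_factor: hypothetical multiplier for the staff's chances relative to
        league (e.g. 1.06 for a staff allowing 6% more grounders); 1.0 = no adjustment.
    """
    expected = lg_rf9 * innings / 9 * opportunity_factor
    return plays - expected

# Hypothetical inputs, not Barney's actual totals.
print(round(extra_plays(700, 1260, 4.6)))        # 56
print(round(extra_plays(700, 1260, 4.6, 1.06)))  # 17
```

Even a modest opportunity adjustment moves the answer a lot, which is why the refinements mentioned in the post matter.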

But what do you do for someone like Trout who has one season in the big leagues? Regressing him to average isn't really fair to him, but you don't want to wait two years before you have an estimate of his true talent.

Agreed. So cases like Trout and Barney are hard. Personally, I'd start by looking at plays made above/below average, adjusted for number of BIP his pitchers gave up (airballs for OFs, GBs for infielders). If a player doesn't excel there, then a good DRS is based entirely on the idea he had an unusually challenging set of BIP -- could be true, but I wouldn't give that very much weight for a single season. I'd look at minor league defensive stats. I would see what scouts and fans say about his fielding, and for an OF consider whether he's fast. It's a judgment call for young players. But why should we expect any stat to remove judgment from an MVP choice?

1. Assume that experienced players performed similarly on defense this year as they did in recent years
2. Regress non-experienced players back to league average since there's no other option
3. Make an educated guess about how good a guy is that can incorporate the principles of #1 and 2, but doesn't have to

Ray, you're not thinking about this very clearly. As you have noted, there is uncertainty in the defensive metrics. We can reduce that uncertainty by looking at prior performance. If Ozzie Smith posts a +25 TZ, we believe it's probably close to right. If Dan Uggla does it, we suspect there's a lot of noise there. So it would make a great deal of sense to regress the current year's rating toward a player's recent-career average.

It might improve the evaluation, but that highlights the problem. Perhaps the player really did have a great defense year. We see that a player's OPS+ can bounce around - so too can his defense. Uggla's OPS+ went from 110 to 130 to 110 to 130 to 110 to 100.

How can we tell when a player's defense bounces whether it is a flaw in the system or if he really did turn in that kind of a year? A player's OPS+ might bounce due to dumb luck - e.g., BABIP - but we still know that it bounced, and we still know it had value to the team. We _don't_ know that with defense.

You don't do that to give him extra credit for what he did in prior years, but to make the best possible estimate you can of what he did this year.

At which point you may be double counting, just as you would be by extra-counting Lind's 141 OPS+ for no reason.

My complaint is that people recognize the flaws in the defensive systems, but rather than admitting that such means that we can only take defensive evaluations so far, they want to make full use of them anyway.

Comparing Barney's range factor per 9 innings (RF9) with the league average of that (lgRF9), I calculated that Barney has made about 55 more plays than league average (you should be able to find a "replacement-level 2B" who can field the position averagely, I'd think). Obviously, you'd want to account for K-rates and GB tendency of the Cubs' staff and various other what-nots to refine that.

I saw that too. But here's the odd thing: Barney seems to be about league average in assists/9, but way above average on putouts. I'm skeptical about a good defensive rating for an infielder that depends on putouts. Catching linedrives is almost entirely random. Catching shallow flies to the OF can be a real (and valuable) skill, but is that what we're seeing here? Or is he like Orlando Hudson, someone who takes a large number of discretionary flyballs that other players could also have caught? Or some of both?

My complaint is that people recognize the flaws in the defensive systems, but rather than admitting that such means that we can only take defensive evaluations so far, they want to make full use of them anyway.

All you are saying -- again and again, in various permutations -- is that there is uncertainty. And this obviously frustrates you greatly. But that's life -- our estimates will be uncertain. All we can do is make the best estimates possible with the data we have. By radically underweighting fielding you will simply trade one set of errors for another. Instead of sometimes giving too much credit to players with good fielding stats, you will instead systematically undervalue good fielders (and systematically overrate bad fielders). If that's the type of error you personally prefer to make, fine. But don't pretend you've made the problem go away....

Comparing Barney's range factor per 9 innings (RF9) with the league average of that (lgRF9), I calculated that Barney has made about 55 more plays than league average (you should be able to find a "replacement-level 2B" who can field the position averagely, I'd think). Obviously, you'd want to account for K-rates and GB tendency of the Cubs' staff and various other what-nots to refine that.

That seems about right, or certainly not absurd. If Barney were to go 55 for his next 55, all singles, his OPS would still be ca. 100 points short of Josh Hamilton's.

I saw that too. But here's the odd thing: Barney seems to be about league average in assists/9, but way above average on putouts. I'm skeptical about a good defensive rating for an infielder that depends on putouts.

As you should be. Assists alone, without putouts, would be the better measurement of what we're trying to get at. That would reduce Barney's distance from league average (in terms of extra plays made).

How can we tell when a player's defense bounces whether it is a flaw in the system or if he really did turn in that kind of a year? A player's OPS+ might bounce due to dumb luck - e.g., BABIP - but we still know that it bounced, and we still know it had value to the team. We _don't_ know that with defense.

Making plays in the field is just as real as hitting a single. What we don't know is how difficult the play was to make. But that's true for hitters too. Maybe pitchers hung a lot of curves to Trout this year. Maybe outfielders took bad routes to a lot of Miggy's line drives, turning them into hits. We assume this averages out, but for a single season that's certainly not true.

That's a valid argument. Two potential solutions, when evaluating single seasons:

1. Stat-based. Use multi-year fielding ratings. For Josh Hamilton in 2012 this would mean maybe taking his average fielding rating from 2009-2012, and combining that with his actual offensive numbers.
2. Subjective based. Grade every fielder from terrible, poor, average, good, great. Award runs on those categories -10, -5, 0, 5, 10. We can all see Adrian Beltre or Brendan Ryan play great defense, so give them the top rating. Someone is a complete butcher? Give him the bottom rating.

Either solution is preferable to burying your heads in the sand and pretending defense doesn't matter.

Agreed. I like both solutions; I just sometimes question the accuracy of the positional adjustments, the fact that WAR has no way to properly value a player who gives you positional flexibility, the obvious discrepancies in catcher defense and first-base defense, and of course the run value of defense. (In response to my own complaints: I know that WAR isn't designed to grade positional flexibility, as that is a concern with potential tactics that is beyond the scope of WAR. I also understand that the math shows the positional adjustments are correct, so it's just my personal worry.)

Barney is 70 assists behind Aaron Hill and only third in the NL among 2B. Starlin Castro, the Cubs' SS, is first in NL assists, 41 ahead of Reyes in second place (in practically identical innings).

Castro's getting to more grounders than other NL shortstops and Barney isn't getting to more grounders than other NL 2B (or really even close). Hmmm. That can only make us highly skeptical about Barney's defensive WAR -- the overwhelming amount of which should be found in his range. (Yes, Barney's very sure-handed.)

We assume this averages out, but for a single season that's certainly not true.

While remaining an agnostic on the specific question of Darwin Barney, I am very ready to accept that a fielder can have large year-to-year swings in value. Maybe one year he's in superb shape, and the next he's got some problem with his knees that doesn't reduce his hitting or even his straight-ahead speed, but cuts his range measurably – and on top of that, other factors (positioning, opportunities) kick in, and pretty soon you're talking real runs. I think one problem with confidence in defensive stats is that people tend to think that defense doesn't slump, that it's a skill that doesn't have its ups and downs.

Aaron Hill only has 6 errors at 2B (Barney has 1), he's blowing Barney away in assists, and is ahead of Barney in Total Zone Runs. Hill's RF/9 is 0.24 higher than Barney's. Barney gets to fewer balls vs. league than his fellow middle infielder, Starlin Castro, and Hill gets to more than his (though the D-Back SS is a statue).

Yet, Hill has 0.2 dWAR and Barney has 3.6. That simply can't be right. Doesn't pass the laugh test.

It might improve the evaluation, but that highlights the problem. Perhaps the player really did have a great defense year. We see that a player's OPS+ can bounce around - so too can his defense. Uggla's OPS+ went from 110 to 130 to 110 to 130 to 110 to 100.

How can we tell when a player's defense bounces whether it is a flaw in the system or if he really did turn in that kind of a year? A player's OPS+ might bounce due to dumb luck - e.g., BABIP - but we still know that it bounced, and we still know it had value to the team. We _don't_ know that with defense.

The main difference is that defense is estimated while OPS is known. We know exactly how many homers Uggla hit, but we are guessing how many runs he saved. It's perfectly fair to regress defense but not offense, because you expect some measurement error.

That seems about right, or certainly not absurd. If Barney were to go 55 for his next 55, all singles, his OPS would still be ca. 100 points short of Josh Hamilton's.

This is the wrong way to do it. If UZR/RF/whatever says Barney made 55 more plays than league average, that means he turned 55 would-be hits into outs. In order to balance that on offense, you'd have to take 55 of Barney's *already accumulated outs* and turn them into singles. You don't add 55 singles and 55 PA to his line. Your method drastically underrates the impact of Barney's defense. Please recalculate.
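The difference between the two methods is easy to see on a made-up stat line. A minimal sketch; the season line is hypothetical (not Barney's actual stats), and OBP and SLG use the standard counting-stat formulas.

```python
def ops(ab, h, bb, hbp, sf, tb):
    """On-base plus slugging from counting stats."""
    obp = (h + bb + hbp) / (ab + bb + hbp + sf)
    slg = tb / ab
    return obp + slg

# Hypothetical season line, not Barney's actual stats.
base = dict(ab=550, h=140, bb=40, hbp=5, sf=5, tb=200)

# Method in the earlier post: tack 55 singles (and 55 at-bats) onto the line.
added = dict(base, ab=base["ab"] + 55, h=base["h"] + 55, tb=base["tb"] + 55)

# Method argued here: 55 of his existing outs become singles; at-bats unchanged.
converted = dict(base, h=base["h"] + 55, tb=base["tb"] + 55)

for label, line in [("base", base), ("added", added), ("converted", converted)]:
    print(label, round(ops(**line), 3))
# base 0.672
# added 0.788
# converted 0.864
```

Adding plate appearances dilutes the 55 singles; converting existing outs does not, which is the gap the post is pointing at.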

Also fWAR has Barney's defense at +11 runs and his WAR at 2.5. No one says you have to put complete faith in defensive metrics. You're allowed to use your head. WAR just gives you the starting point.

#296: Raw range factor has some huge illusions. Just to start with, Arizona pitchers gave up 6% more ground balls. (You'd want to break this down in more detail; clearly it doesn't matter much how many balls were hit to the left side or down the line, but this is an indicator.)

Cubs pitchers gave up 9% more line drives, but the Cubs were better than the DBacks at turning line drives into outs (and if some of that is Barney shifting, well that counts. A lot). And yes, some of that may be the way stringers score certain balls in play. It'd be nice to break things down home and road. It'd give us a first order park effect. (They've had 6 more Line outs into DP as well. Only one by Barney which is one more than Hill)

The infields are about equal in turning the DP. This is tricky though. Takes two to turn many DPs. We can tell roughly how good Barney/Castro are compared to Hill/Crowd but it's really tough to separate out the specific contribution of any one player. A WOWY type analysis isn't going to work with the Cubs. When Castro isn't at short, Barney is.

This is the wrong way to do it. If UZR/RF/whatever says Barney made 55 more plays than league average, that means he turned 55 would-be hits into outs. In order to balance that on offense, you'd have to take 55 of Barney's *already accumulated outs* and turn them into singles. You don't add 55 singles and 55 PA to his line. Your method drastically underrates the impact of Barney's defense. Please recalculate.

Ron (298) -- Those are factors, and matter, but they don't (and can't) come close to making Hill a replacement-level defender and Barney a superstar. Whatever impacts they have are measured to a degree by seeing whether they help/hurt other IFs, too. Castro's getting to more balls vs. league than Barney is. The extra ground balls AZ pitchers are giving up aren't being fielded by the D-Back SS.

OK, Arizona pitchers gave up 6% more grounders than Cub pitchers. Aaron Hill has 16.8% more assists as a 2B than Darwin Barney. Barney does, it appears, have a marginally higher RF/9 as a 2B, but RF includes popups and liners caught.
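The adjustment in that comparison is a simple ratio of the two percentages from the post. A crude sketch, assuming a second baseman's assists scale in proportion to his staff's grounders; it ignores innings played, batter handedness, and where the grounders are hit.

```python
def assist_edge_per_grounder(assist_ratio, grounder_ratio):
    """Relative assists per staff grounder, given two ratios (A over B)."""
    return assist_ratio / grounder_ratio

# From the post: Hill has 16.8% more assists; Arizona pitchers allowed 6% more grounders.
edge = assist_edge_per_grounder(1.168, 1.06)
print(f"{edge:.3f}")  # 1.102, i.e. roughly 10% more assists per grounder
```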

The primary measure of prowess for a 2B, what separates them from each other, is their ability to get to ground balls. There's no indication Darwin Barney does this any better than Aaron Hill, much less in a way that could explain the huge dWAR discrepancy.

Barney could be a better fielder than Hill. There's no way Barney's a superstar if Hill's replacement level.

Up until recently Bill James had Win Shares on his pay site. [I see he still does, I should have noted that "up until recently" meant "up until recently, when I stopped paying"]

Probably a big reason WAR dominates now is that it's freely available.

I remember just four or five years ago this time of the year was super fun time plugging numbers into a spreadsheet to spit out my own deeply flawed version of WAR for the season, inputting some version of linear weights and a witch's brew of various Hardball Times defensive stats. That was super fun, though now with freely available WAR my laziness takes over.

Man, remember when B-Ref didn't have minor league stats, or wasn't updated daily? Seems like twenty years ago to me, but I guess that's just my naturally awful memory talking.

The solution for me is to go back to something that Bill James said in the 1980s - don't combine everything into one big number.

Even leaving aside James' later change of mind (Win Shares), this is a classic of the genre of statements that sound profound but are in fact ridiculous. The nice thing about WAR is that its components are readily transparent, so you can, for example, substitute your own defensive assessment if you aren't a fan of DRS. You could even, I suppose, demonstrate your boldly independent thinking by refusing to add up the components! But since they are added up in real life -- real players do hit, run and field, and teams actually hire individual players to do all of these things in a single season, and even in single games -- I'm not sure what the objection could be to arriving at an overall assessment.

Even leaving aside James' later change of mind (Win Shares), this is a classic of the genre of statements that sound profound but are in fact ridiculous. The nice thing about WAR is that its components are readily transparent, so you can, for example, substitute your own defensive assessment if you aren't a fan of DRS.

But you can't easily do this for each player in a list of 15 (e.g., if you're trying to determine a player's HOF case). Or if you are going to do it, you might as well start from a baseline where all the players in the list are average defensively. Because in a list ranking players by WAR, the needle is all over the bleeping map with defense. At least with VORP or oWAR, you understand that quality of defense isn't included, and it's FAR easier to eyeball the adjustments.
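Here is a quick sketch of the "start from average defense" idea: rank the same list once with fielding included and once with fielding set to zero (average), roughly in the spirit of an oWAR-style first cut. The players, their run totals, and the 10 runs-per-win conversion are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical career run totals: (offense incl. positional/replacement runs, fielding runs).
PLAYERS: Dict[str, Tuple[float, float]] = {
    "Player A": (560.0, -150.0),
    "Player B": (480.0, 40.0),
    "Player C": (500.0, -10.0),
}
RUNS_PER_WIN = 10.0  # hypothetical conversion factor


def rank(players: Dict[str, Tuple[float, float]], include_fielding: bool) -> List[str]:
    """Return player names sorted by wins, optionally treating everyone as an average fielder."""
    def wins(name: str) -> float:
        offense, fielding = players[name]
        return (offense + (fielding if include_fielding else 0.0)) / RUNS_PER_WIN

    return sorted(players, key=wins, reverse=True)


if __name__ == "__main__":
    print("with fielding:   ", rank(PLAYERS, include_fielding=True))
    print("average defense: ", rank(PLAYERS, include_fielding=False))
```

With these invented numbers the defensive component alone flips the list end to end (B, C, A versus A, C, B), which is the kind of swing the comment is complaining about when the fielding inputs are shaky.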

You could even, I suppose, demonstrate your boldly independent thinking by refusing to add up the components! But since they are added up in real life -- real players do hit, run and field, and teams actually hire individual players to do all of these things in a single season, and even in single games -- I'm not sure what the objection could be to arriving at an overall assessment.

The objection could be that a list where the players are ranked by WAR is more off base than a list where the players are ranked by VORP or oWAR.

Yes, which actually happens to be similar to my objection to Play Index not allowing us to pull up a list of players ranked by oWAR.

I would use oWAR to get a first cut of ranked players much more than I use WAR, but I'm shunted into WAR instead.

(Actually, I would first use VORP, but Baseball Prospectus's stats page is bizarrely hard to work with. If any BP guys are lurking here: it's a big problem for your business, and the pay aspect compounds it.)

But you can't easily do this for each player in a list of 15 (e.g., if you're trying to determine a player's HOF case). Or if you are going to do it, you might as well start from a baseline where all the players in the list are average defensively. Because in a list ranking players by WAR, the needle is all over the bleeping map with defense.

That's not really true for a HOF assessment. At the career level, the main problem with defensive stats is that they are artificially regressed -- everyone is relatively close to zero -- not that they are "all over the map."

For a single year assessment, yes, evaluating defense is challenging and there is a case for heavily regressing the fielding component. But at the career level, the story is very different.
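The single-season versus career distinction can be illustrated with generic shrinkage toward the mean: weight the observed fielding runs by how much data stands behind them. This is a textbook regression-to-the-mean sketch, not what B-Ref, DRS, or FRAA actually do; the `k_innings = 3000` constant and the sample figures are hypothetical.

```python
def regress_fielding(observed_runs: float, innings: float, k_innings: float = 3000.0) -> float:
    """Shrink an observed fielding-runs total toward zero (league average).

    weight = innings / (innings + k_innings). k_innings is a hypothetical
    constant: the larger it is, the more innings it takes before the
    observed number is trusted.
    """
    if innings < 0 or k_innings <= 0:
        raise ValueError("innings must be >= 0 and k_innings must be > 0")
    weight = innings / (innings + k_innings)
    return weight * observed_runs


if __name__ == "__main__":
    # Hypothetical: +20 runs in one full season vs. +60 runs over a long career.
    print(f"single season: {regress_fielding(20.0, 1300.0):+.1f} runs")
    print(f"career:        {regress_fielding(60.0, 18000.0):+.1f} runs")
```

Under these assumptions a +20 single season shrinks to about +6 runs, while a +60 career over 18,000 innings keeps about +51, so heavy regression bites hard on one year and much less on a career.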

But you can't easily do this for each player in a list of 15 (e.g., if you're trying to determine a player's HOF case).

Well, you can if you put the time into it. [Edit: which is where "easily" comes in, I imagine. But I think this is the crux of the debate. You want a version of WAR that does all the legwork of triangulating a bunch of different defensive systems for you, for increased certainty. Which would be great. If someone had a site that did that, I'd probably visit it often.]

Not to me. 3B is quite possibly an inherently more valuable position to a team than CF. In other words, Cabrera hasn't been bad enough to detract from the real value that having a 3B represents.

I mean, what compels you to have a player standing at that place in the field? Why not have an inner outfielder instead?

The solution for me is to go back to something that Bill James said in the 1980s - don't combine everything into one big number.

Another option would be not to treat research that is still a work in progress as definitive.

Sean does a great job with the new WAR. Unfortunately, the underlying data and the nature of statistical analysis means that it's impossible to get an answer with pinpoint, definitive accuracy. (Yes, impossible.)

That doesn't mean it's not a task worth pursuing. It also doesn't mean that the numbers we have now don't have value and don't provide us useful information. It just means that the results shouldn't be taken, or used, as the end of discussion. Instead, they should be used as the beginning.

That doesn't mean it's not a task worth pursuing. It also doesn't mean that the numbers we have now don't have value and don't provide us useful information. It just means that the results shouldn't be taken, or used, as the end of discussion. Instead, they should be used as the beginning.

What I don't get is that it seems like EVERYONE on this site understands this ... and yet the same thing keeps happening with people lazily treating the WAR numbers like gospel, and the same debate springs back up.

Isn't that what happened here? Darwin Barney's bb-ref dWAR was used as the starting point, examined, and found wanting. It's out of proportion by scale and magnitude to oWAR, not confirmed by plays made as compared with other NL 2Bs and Cub shortstops, and not confirmed by metrics that use play-by-play data with sabermetrically sensible adjustments, like FRAA.

What I don't get is that it seems like EVERYONE on this site understands this ... and yet the same thing keeps happening with people lazily pretending other people are treating the WAR numbers like gospel, and the same debate springs back up.

The biggest difference between Fielding Runs Above Average and similar defensive metrics comes in the data and philosophy used. Whereas other metrics use zone-based fielding data, Fielding Runs Above Average ignores that data due to the numerous biases present. Fielding Runs Above Average instead focuses on play-by-play data, taking a step back and focusing on the number of plays made compared to the average number of plays made by a player at said position. The pitcher's groundball tendencies, batter handedness, park, and base-out state all go into figuring out how many plays an average player at a position would make.
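The plays-made-versus-expected idea described above can be sketched roughly as follows: for each ball in play while the fielder is on the field, look up how often an average fielder at that position converts it in that context, sum those to get expected plays, and compare with plays actually made. The context buckets, the out rates, and the 0.75 runs-per-play conversion are hypothetical placeholders; BP's real FRAA model is more involved and is not reproduced here.

```python
from typing import Dict, List, Tuple

# Context: (pitcher groundball tendency, batter hand, park, base-out state).
Context = Tuple[str, str, str, str]

# Hypothetical rates at which an average fielder at this position turns a
# ball in play into an out, by context. Real values would be estimated
# from league play-by-play data.
AVG_OUT_RATE: Dict[Context, float] = {
    ("high_gb", "R", "neutral", "empty_0out"): 0.30,
    ("high_gb", "L", "neutral", "empty_0out"): 0.22,
    ("low_gb", "R", "neutral", "on1st_1out"): 0.18,
}

# Hypothetical run value of one play made versus not made.
RUNS_PER_PLAY = 0.75


def plays_above_average(balls_in_play: List[Tuple[Context, bool]]) -> float:
    """Plays actually made minus the plays an average fielder would be expected to make."""
    expected = sum(AVG_OUT_RATE[ctx] for ctx, _ in balls_in_play)
    actual = sum(1 for _, made in balls_in_play if made)
    return actual - expected


def fielding_runs_above_average(balls_in_play: List[Tuple[Context, bool]]) -> float:
    """Convert plays above average into runs with the hypothetical per-play value."""
    return plays_above_average(balls_in_play) * RUNS_PER_PLAY


if __name__ == "__main__":
    sample = [
        (("high_gb", "R", "neutral", "empty_0out"), True),
        (("high_gb", "R", "neutral", "empty_0out"), False),
        (("high_gb", "L", "neutral", "empty_0out"), True),
        (("low_gb", "R", "neutral", "on1st_1out"), False),
    ]
    print(f"plays above average: {plays_above_average(sample):+.2f}")
    print(f"runs above average:  {fielding_runs_above_average(sample):+.2f}")
```

In the four-ball sample an average fielder would be expected to make 1.00 plays; this fielder made 2, so he comes out one play (0.75 runs, under the assumed conversion) above average. The context adjustments matter because the same two plays would be worth less behind an extreme groundball pitcher.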

Here is an example of the Fielding Runs Above Average spectrum based upon the 2011 season; for the sake of consistency, the players featured below all play the same position (center field):

Honestly, I'm really curious how the whole argument got reduced to WAR vs. Triple Crown. Objectively, all that separates the two players offensively is about 10 HR by Cabrera, which are mitigated by a ton of stolen bases by Trout. Anybody who has seen the two guys play should have some idea that Trout more than makes up the difference with his fielding. The debate always should have been over whether or not the MVP voting should consider defense, but then there are a lot of writers who like to pretend it does while ignoring it so...

Honestly, I'm really curious how the whole argument got reduced to WAR vs. Triple Crown. Objectively, all that separates the two players offensively is about 10 HR by Cabrera, which are mitigated by a ton of stolen bases by Trout.

How much is 10 home runs - it's actually 14 - "mitigated" by 48-4 on the bases? (And FWIW Cabrera is 4-1 on the bases.)

How much is 10 home runs - it's actually 14 - "mitigated" by 48-4 on the bases? (And FWIW Cabrera is 4-1 on the bases.)

Didn't say completely mitigated, and I figured around 10 HRs better because Cabrera also has 20 more games. That gives Trout 41 extra net steals. I don't mean to say Trout has been as good offensively as Miggy, but that the gap isn't so big that Trout's defense can't bridge it (or at the very least make it a legitimate question between the two of them).
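One way to put numbers on the "how much is it mitigated" question is a back-of-the-envelope linear-weights calculation. The run values below (about +1.4 runs per home run relative to an average plate appearance, +0.2 per stolen base, -0.4 per caught stealing) are commonly cited approximations and should be treated as assumptions; the sketch also ignores park, era, and everything else that happens on the bases.

```python
# Approximate linear-weight run values (assumptions, not precise 2012 figures).
HR_RUNS = 1.4   # runs above an average plate appearance per home run
SB_RUNS = 0.2   # runs gained per stolen base
CS_RUNS = -0.4  # runs lost per caught stealing


def steal_runs(sb: int, cs: int) -> float:
    """Net runs from stolen-base attempts only."""
    return sb * SB_RUNS + cs * CS_RUNS


def hr_gap_runs(extra_hr: int) -> float:
    """Run value of a home-run lead, treating each extra HR as replacing an average PA."""
    return extra_hr * HR_RUNS


if __name__ == "__main__":
    # SB-CS lines from the thread: Trout 48-4, Cabrera 4-1.
    steal_edge = steal_runs(48, 4) - steal_runs(4, 1)
    for extra_hr in (10, 14):
        gap = hr_gap_runs(extra_hr)
        print(f"{extra_hr} HR gap ~ {gap:.1f} runs; steals offset {steal_edge:.1f} "
              f"runs ({steal_edge / gap:.0%})")
```

Under these assumptions Trout's steals are worth roughly 7.6 runs more than Cabrera's, against about 14 runs for a 10-homer gap or about 19.6 runs for a 14-homer gap. That prices only the stolen bases; the claim being made in the comment is that Trout's defense covers the rest.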

Identifying and explaining the outliers is an important part of the process. First, it can sometimes help to point out the flaws in the system. Second, after inspection, it allows us to appreciate special seasons from unexpected players. What it doesn't do is invalidate the whole system.

yet the same thing keeps happening with people lazily pretending other people are treating the WAR numbers like gospel,

FTFY, etcetera

There are those who do treat WAR numbers like gospel. They aren't a tremendous percentage of visitors to this site, but they are out there.

The Miggy defenders and those attacking WAR (many of them the same people) are making a good case. But at the risk of going all SBB, I'll say this. If WAR is so flawed as to be off by ~40%, if Miggy really is better/more valuable than Trout despite a 40% lower total, then it is totally useless, even as a conversation starter, and I question how it ever came to be accepted as a tool in the first place. IOW, those claiming Miggy is better but WAR can be useful for some things remind me of the writer who in 2001 cited Ichiro's prior experience, claiming it should make him ineligible for ROY, as justification for voting for C.C. first, and then put Ichiro second on his ballot.

Either Miggy is better and WAR, with at least a 40% error bar, is completely useless for anything, or there just might be something else going on.

He had a structure fairly similar to WAR in the 1983 and 1984 Abstracts. And the intro to the player rankings section of the 1984 Abstract is much more nuanced (and fairly lengthy), talking about the pluses and minuses of a "great statistic".

#315 Somebody (Mr. High Standards IIRC) also has supplied a spreadsheet to calculate Win Shares yourself. Kind of neat in that if you want to recalculate them using (say) EQR for the offensive component you can.

Also, WAR analysis can be interesting, but it's strange that it's come up in this debate at all. Trout was leading the batting title race for a long while and will likely finish a close second, hit 30 HR, maybe 50 steals at a phenomenal rate, 129 runs (way ahead of anyone else), despite missing the first month of the season, plus he's a Gold Glove contender in CF with several memorable highlight-reel plays. If anybody might be 40% better than Cabrera, it would be this guy.

If God has a perfect omniscient metric for judging on-field baseball value, I would be shocked if Trout's 2012 wasn't at least equal to Cabrera's, and in all likelihood it's probably better. Guess I won't know until the afterlife (unless God is watching this thread and cares to comment).

And for narrative, doing all this as a rookie in his age-20 season while reversing the in-season fortunes of his team ain't too shabby either. Frankly, Trout's narrative is a lot more interesting to me than Cabrera's triple crown, which is a neat bit of history/trivia but it essentially just boils down to simultaneously having a high average and being a slugger.