11 September 2014

The Problem With WAR

Jeff Passan recently wrote an article about WAR that sparked some major debates on Twitter. In response, Dave Cameron wrote this article. Reading these articles got me thinking about WAR and one of the major issues with it.

When I was a kid I didn’t know about WAR, wRC+, ISO, wOBA or
other advanced stats that quantify a player’s offensive contributions. I’m not
even sure I was familiar with OBP, SLG or OPS. I judged a player’s offensive
ability based on his batting average, home runs and RBIs. If I really wanted to
look closely at a player I would maybe consider how many doubles and triples he
hit as well as how many times he walked and struck out.

By themselves these stats have limited utility but they are
still useful. Batting average does give a basic idea of how often a player gets
on base even if it's not as good as OBP. Home runs and RBIs do indicate how much power a player has even if they're not as useful as ISO or even SLG. Most people
would have agreed back then that a home run is more valuable than a triple, a triple
more valuable than a double, a double more valuable than a single and a single
more valuable than a walk.

Because these basic statistics exist, we understand
the need to compare different facets of offense. For example, is a
player batting .260/30/80 more valuable than a player batting .310/15/50? You can't really answer the question with those tools, and in fact the answer doesn't really matter. What’s important is that these basic statistics allow
us to ask the question. It becomes clear that both the number of hits and the
quality of those hits matter.

The fact that these basic offensive statistics exist means
that it’s easier for us to understand more advanced offensive statistics. Take
SLG for example. It makes sense that certain hits are more valuable than
others. And it makes logical sense that a home run that is worth four bases is
four times the value of a single that is worth one base. Basic offensive
statistics allow us to build that model. Once we start assigning arbitrary values
to home runs, triples, doubles, singles, walks, etc., it becomes easier to
understand how we can instead assign values to them based on historical data.
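
To make that concrete, here is a minimal Python sketch of the idea. The season line is made up, and the function and variable names are my own for illustration. SLG is just a weighted sum of hits with weights of 1, 2, 3 and 4, divided by at-bats.

```python
def weighted_value(events, weights):
    """Sum each event count times the weight assigned to that event."""
    return sum(weights[name] * count for name, count in events.items())


def slugging(events, at_bats):
    """SLG: total bases (1/2/3/4 per hit type) divided by at-bats."""
    base_weights = {"single": 1, "double": 2, "triple": 3, "home_run": 4}
    return weighted_value(events, base_weights) / at_bats


# Hypothetical season line, invented for this example.
season = {"single": 100, "double": 30, "triple": 5, "home_run": 20}

total_bases = weighted_value(
    season, {"single": 1, "double": 2, "triple": 3, "home_run": 4}
)
slg = slugging(season, at_bats=500)
print(total_bases, round(slg, 3))
```

The same `weighted_value` function would work unchanged if the 1/2/3/4 weights were swapped for weights estimated from historical run data, which is roughly the step from SLG to the more advanced offensive stats.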

Furthermore, most basic offensive statistics are easy to
define. There’s little difficulty in defining when a player hits a double. It’s
reasonably straightforward because either the batter is on second or he isn’t. One
can argue about the value of a double but in the vast majority of cases it’s
hard to argue whether or not a double occurred.

Decades of statistics have accustomed us to quantifying the
value of offensive production and have given us easily defined and understood
tools to do so.

But basic statistics aren’t nearly as helpful in quantifying
defensive contributions. The only basic defensive statistics are errors, putouts,
assists and fielding percentage. Putouts and assists are fairly straightforward, but errors are often subjective. As a result, basic defensive statistics aren’t as objective as basic offensive statistics.

Furthermore, you can’t really use fielding percentage to
compare two players playing the same position, let alone compare players playing
at different positions. Fielding percentage simply doesn't quantify range. It doesn’t
quantify whether one fielder made a lot of excellent plays or a few excellent plays. I would say that it’s similar to batting average. Batting average is a helpful basic statistic but one
wouldn’t use it by itself to quantify offensive contributions.
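
Here is a small Python sketch of why fielding percentage can mislead. Fielding percentage is putouts plus assists divided by total chances (putouts plus assists plus errors). The two shortstops below are invented, and I am assuming they played the same number of innings behind similar pitching staffs.

```python
def fielding_percentage(putouts, assists, errors):
    """(PO + A) / (PO + A + E), the standard fielding percentage."""
    chances = putouts + assists + errors
    return (putouts + assists) / chances


# Two made-up shortstops, assumed to have had equal playing time.
sure_handed = {"putouts": 200, "assists": 391, "errors": 9}
rangy = {"putouts": 230, "assists": 452, "errors": 18}

sure_pct = fielding_percentage(**sure_handed)
rangy_pct = fielding_percentage(**rangy)

# Outs each fielder actually recorded.
sure_outs = sure_handed["putouts"] + sure_handed["assists"]
rangy_outs = rangy["putouts"] + rangy["assists"]

print(round(sure_pct, 3), round(rangy_pct, 3), rangy_outs - sure_outs)
```

The second fielder has the lower fielding percentage and twice the errors, yet he recorded 91 more outs. Fielding percentage has no way to say which of the two was more valuable, because it never sees the balls a fielder didn't reach.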

All I’m saying is that I’d feel far more comfortable claiming
that a player with a .330/30/120 line is better offensively than a player with
a .210/15/50 line than I would claiming a player with a .985 fielding
percentage is better defensively than a player with a .975 fielding percentage.

Basic defensive statistics tell us very little
about defense. Prior to advanced defensive statistics, it was pretty much impossible
for the casual fan to objectively quantify defense for large populations of players.
After all, most casual fans simply aren’t watching a majority of the games for
a majority of teams. It would be pretty time-consuming.

This means that advanced statistics like UZR were really the
first attempts to actually objectively quantify defense. The problem is that
basic defensive statistics don’t give people a frame of reference. It’s hard to explain why defense is as valuable as UZR says it is
because it has never been quantified before. People watching games probably
would agree that Cal Ripken was a good defender. No one watching games would
have claimed that Cal Ripken’s glove was worth 15 runs a season and seriously
meant exactly 15 runs. And without using an advanced statistic like UZR it
would be hard to tell whether being worth 15 runs defensively is good or bad.

This makes explaining defensive statistics a challenge because people don't really have the background to understand them. The
way to explain them is by acknowledging that the concept is difficult and by openly
disclosing all of the data and the formulas. The concept behind UZR has been explained
as comparing the play that actually happened (hit/out/error) to data on
similarly hit balls in the past to determine how much better or worse the
fielder did than the "average" player.
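
The concept can be sketched in a few lines of Python. To be clear, this is not the actual UZR formula. It is only the general "compare each play to what an average fielder does on similar balls" idea, and the out probabilities below are numbers I invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class BattedBall:
    # Hypothetical share of similar balls an average fielder turns into outs.
    out_probability: float
    # Whether this fielder actually recorded the out.
    converted: bool


def plays_above_average(balls):
    """Credit (1 - p) for each out made, debit p for each play not made."""
    return sum((1.0 if b.converted else 0.0) - b.out_probability for b in balls)


# Three made-up balls hit toward one fielder.
balls = [
    BattedBall(out_probability=0.90, converted=True),   # routine play made
    BattedBall(out_probability=0.25, converted=True),   # tough play made
    BattedBall(out_probability=0.60, converted=False),  # makeable play missed
]

paa = plays_above_average(balls)
print(round(paa, 2))
```

Turning plays above average into runs would take one more step, a run value for each kind of batted ball, which this sketch leaves out. Everything depends on the `out_probability` numbers, so anyone who can't see them can't check the result.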

But the public isn’t informed of how likely it is for a
fielder to successfully field a certain play. For a given player I can see
whether he was successful in any given at-bat. I can’t tell whether a
player was successful defensively on any given play. If a ball is hit to the outfield I don't know how likely it is that a different fielder would have made the play according to UZR. Unlike with offense, there is no play index for UZR. It is impossible to determine a
fielder’s UZR when at home or away. It is impossible to determine a fielder’s
UZR for a given month. In short, UZR and most other advanced defensive
metrics pretty much boil down to one number for a given player and you can either
take it or leave it. They may be the best numbers that we have available. But
we’re pretty much taking their creators’ word that they’re accurate. It shouldn’t come as a surprise that many people simply choose
to leave it.

The problem with WAR is that it relies on defensive statistics that haven't been adequately explained. Unlike with statistics like FIP and WAR, it is impossible to derive UZR numbers on our own. It doesn't really matter whether UZR is accurate or not. The point is that we have no background to determine its accuracy and it is impossible to test it ourselves.
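
For contrast, here is how little it takes to reproduce FIP from publicly available counting stats. This is a minimal Python sketch using the commonly published FIP formula. The pitcher's line is made up, and the constant of 3.10 is a placeholder assumption, since the real constant is recalculated for each season.

```python
def fip(home_runs, walks, hit_by_pitch, strikeouts, innings_pitched,
        fip_constant=3.10):
    """FIP = (13*HR + 3*(BB + HBP) - 2*K) / IP + constant.

    fip_constant is an assumed placeholder; the published value varies
    by season. innings_pitched must be a true decimal (200 1/3 innings
    is 200.333..., not the box-score notation 200.1).
    """
    numerator = 13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts
    return numerator / innings_pitched + fip_constant


# Hypothetical pitcher season, invented for this example.
example_fip = fip(home_runs=20, walks=50, hit_by_pitch=5,
                  strikeouts=200, innings_pitched=200.0)
print(round(example_fip, 3))
```

Every input here is something a fan can look up for any pitcher, any month, home or away. There is no equivalent set of public inputs for UZR.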

The discussion about WAR has nothing to do with whether position players deserve 57% of WAR or 52% of WAR. It isn't about whether pitchers are responsible for 93% of run prevention. It's about the fact that defensive metrics aren't adequately available to the public for study.

Without further disclosure, people are going to resist
accepting advanced defensive metrics and, as a result, are going to question WAR.
If the public is unable to repeat the methodology, then these metrics are
not much different from opinion. People aren’t going to fully trust a metric that can’t
be fully understood or duplicated, and they simply shouldn't be asked to do so.

6 comments:

Anonymous
said...

The population of baseball fans who understand WAR is in the minority. The population of those same individuals who have a basic understanding of how UZR, DRS, etc. are measured is even smaller. That doesn't mean that WAR is flawed. I don't understand the entire science behind how every piece of my car works, but that doesn't make my car defective. Currently, there is no conceivable means of measuring defensive impact with great statistical precision. Sources citing defensive metrics often caution that the numbers are subject to some controversy. To discredit WAR, in its entirety, as a result of this shortcoming is naive.

I don't think the car analogy fairly represents Matt's point. His point isn't that he doesn't understand UZR, as you say you don't about your car, but that the information to fully understand UZR isn't available to us.

Whether the aspect of advanced defensive metrics you describe can be called a "flaw" depends on what you consider the function of those metrics. If they're intended (1) primarily as a way for professional insiders (GMs, managers, journalists) to objectively assess player value, it's not a flaw (as long as the metrics are accurate). If they're intended to also (2) appeal to hardcore fans willing and able to understand how they're derived (like many of this blog's readers and most/all of its writers), and as a signalling mechanism to build a sense of community among those hardcore fans, then it's not a flaw. If they're intended to also (3) enhance the casual baseball fan's appreciation of the game, then it is a flaw. In this last case, the fans in question probably need to not only know how the metrics are derived, but be both able AND WILLING to do it themselves (at least a few times) to buy into the reality and importance of the metrics. Frankly, that doesn't seem all that likely. How much of a problem is that, though? Not every fan engages the game in the same way, and advanced metrics don't have to make sense to everyone in order to be valuable for purposes (1) and (2). (I speak as a fan with only a marginal interest in such metrics myself; I'm interested enough to read and enjoy the posts here, but not enough to want to learn how to write one.)

The word "flaw" was not used in this post. It is a problem that UZR and other advanced defensive metrics are "closed-source". I agree that doesn't mean they're flawed. Jon had a good summary of my point.

It is possible to understand the theory behind UZR. It is impossible to understand how it works practically.

The importance is huge. It is possible to explain stats like OPS or wRC+ to the average fan who is willing to listen. How are you possibly supposed to explain why a player has such and such UZR?

You can discuss the theory but in the final analysis you have to say that we use UZR numbers or whatever because they're the best numbers available even though they could have been created in a random-number generator for all we can prove. And if that's the case, how can you possibly know whether they're right? Or whether there isn't something like chaining for fielding?
