As it turns out, WAR is good for evaluating player value and comparing production among MLB players.


No, I’m not talking about the violence-and-weapons kind of war; I’m talking about Wins Above Replacement.

Even the most casual baseball fans have been exposed to WAR, even though it is a fairly new invention: FanGraphs did not publish the first WAR leaderboards until 2008. For all its popularity, though, its calculation is nowhere near as intuitive as that of batting average or ERA. So if you’ve ever wondered how WAR is calculated, what it means, and how to use it, here’s the article for you.

What is WAR?

The name “Wins Above Replacement” alone deserves an explanation.

When the 2016 Mets lost first baseman Lucas Duda to injury, they desperately needed a player to step in and take Duda’s slot. All the big-name free agents had already signed, so the Mets traded for James Loney. Until then, Loney had spent the first half of the season at Triple-A El Paso.

In 100 games for the Mets, Loney was worth -0.1 rWAR and -0.2 fWAR, which makes him an example of a “replacement level player.” When discussing WAR, we are discussing the value that a player provides over a player such as Loney: a minor league player who usually only sees major league playing time as a replacement for a better player.

What about the "win" part? If you took 25 players like Loney and put them all on a major league team together, you would expect that team to win about 48 games. But if you took one of those players off the team and replaced him with Mike Trout, that team would win about eight to 10 more games: Trout is usually worth about eight to 10 wins more than a replacement player like James Loney, or 8-10 WAR.
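The arithmetic in that thought experiment can be sketched in a few lines of Python. This is only an illustration of the idea, not how any site projects team records: the 48-win baseline is the article's estimate, and the function name and the 9-WAR star are made up for the example.

```python
# The article's estimate for a roster made up entirely of replacement-level players.
REPLACEMENT_TEAM_WINS = 48


def expected_team_wins(player_wars, baseline_wins=REPLACEMENT_TEAM_WINS):
    """Estimate season wins as the replacement baseline plus each player's WAR."""
    return baseline_wins + sum(player_wars)


# 25 replacement-level players, each worth 0 WAR.
replacement_roster = [0.0] * 25
# Swap one of them for a hypothetical 9-WAR star.
roster_with_star = [9.0] + [0.0] * 24

print(expected_team_wins(replacement_roster))  # 48.0
print(expected_team_wins(roster_with_star))    # 57.0
```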

Though the scales may vary, WAR can also give us a notion of what role a player is best suited to fill.

Over a full season, a player who produces 0-2 WAR might be best suited for a bench role, such as pinch hitting or coming in as a defensive substitute: guys like Eric Sogard or Eric Young Jr.

Players worth 2-4 WAR are generally considered good enough to start regularly, but aren’t really anyone special: think of Neil Walker or Ryan Zimmerman.

Players worth 4-6 WAR are thought of as being All-Star caliber, like Michael Conforto or Justin Upton.

Players worth more than 6 WAR are guys in the MVP discussion: the Mike Trouts and Aaron Judges of that season.
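For readers who think in code, the tiers above amount to a simple lookup. The sketch below is one way to write it, with some assumptions: the article's ranges share their endpoints, so the choice of which tier gets exactly 2 or 4 WAR is arbitrary here, and the label for zero or negative totals is added for completeness.

```python
def role_for_war(war):
    """Map a full-season WAR total to the rough role tiers described above."""
    if war <= 0:
        # Assumption: the article's tiers start at 0, so this label is added here.
        return "replacement level or below"
    if war < 2:
        return "bench"
    if war < 4:
        return "regular starter"
    if war <= 6:
        return "All-Star caliber"
    return "MVP candidate"


for war in (-0.2, 1.0, 3.0, 5.0, 8.2):
    print(war, "->", role_for_war(war))
```

As the article says, the scales vary, so treat the cutoffs as rules of thumb rather than hard lines.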

Here are the MLB leaders in fWAR compared to the MVPs of each season for the past six years. They align fairly closely, in part because voters are starting to use WAR more frequently in MVP voting, and in part because players who are productive in most other metrics are also very productive in WAR.

| Year | AL fWAR leader | NL fWAR leader | AL MVP | NL MVP |
| --- | --- | --- | --- | --- |
| 2017 | Aaron Judge (8.2) | Anthony Rendon (6.9) | Jose Altuve (7.5) | Giancarlo Stanton (6.9) |
| 2016 | Mike Trout (9.2) | Kris Bryant (8.3) | Mike Trout (9.2) | Kris Bryant (8.3) |
| 2015 | Mike Trout (8.9) | Bryce Harper (9.5) | Josh Donaldson (8.8) | Bryce Harper (9.5) |
| 2014 | Mike Trout (7.9) | Clayton Kershaw (7.6) | Mike Trout (7.9) | Clayton Kershaw (7.6) |
| 2013 | Mike Trout (10.5) | Andrew McCutchen (8.4) | Miguel Cabrera (7.5) | Andrew McCutchen (8.4) |
| 2012 | Mike Trout (10.3) | Buster Posey (7.7) | Miguel Cabrera (6.4) | Buster Posey (7.7) |
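To put a number on "fairly closely," here is a small Python sketch that counts how often the listed fWAR leader and the MVP are the same player. The names are copied from the table above; the function name is just for illustration.

```python
# (year, AL fWAR leader, NL fWAR leader, AL MVP, NL MVP), copied from the table above.
seasons = [
    (2017, "Aaron Judge", "Anthony Rendon", "Jose Altuve", "Giancarlo Stanton"),
    (2016, "Mike Trout", "Kris Bryant", "Mike Trout", "Kris Bryant"),
    (2015, "Mike Trout", "Bryce Harper", "Josh Donaldson", "Bryce Harper"),
    (2014, "Mike Trout", "Clayton Kershaw", "Mike Trout", "Clayton Kershaw"),
    (2013, "Mike Trout", "Andrew McCutchen", "Miguel Cabrera", "Andrew McCutchen"),
    (2012, "Mike Trout", "Buster Posey", "Miguel Cabrera", "Buster Posey"),
]


def count_matches(rows):
    """Count awards where the league's listed fWAR leader also won the MVP."""
    matches = 0
    for _year, al_leader, nl_leader, al_mvp, nl_mvp in rows:
        matches += (al_leader == al_mvp) + (nl_leader == nl_mvp)
    return matches


total_awards = 2 * len(seasons)
print(f"{count_matches(seasons)} of {total_awards} MVPs were also the listed fWAR leader")
# 7 of 12 MVPs were also the listed fWAR leader
```

Matching by name slightly undercounts: in 2017 the table shows Stanton with the same 6.9 fWAR as Rendon, so he was effectively tied for the NL lead.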

Calculating WAR

Of course, Mike Trout doesn’t single-handedly win games for his team: he can’t play every position, take every at-bat and pitch every inning. But based on empirical research, we know that a win is worth approximately 10 runs, and we can quantify run production. Note that this means creating runs, not simply scoring them: for example, when a player hits a single, the number of runs scored in that inning increases by an average of 0.47. If a player hits a single, regardless of the context of that single, he receives credit for creating that many runs.
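Here is a minimal sketch of that run-crediting idea in Python. Only the single's value (0.47) comes from this article; the function name is hypothetical, and the run values for any other event would have to be supplied by whoever uses it.

```python
# Average run value per event. Only the single's value (0.47) comes from the
# article; weights for other events must be supplied by the caller.
RUN_VALUES = {"single": 0.47}


def runs_created(event_counts, run_values=RUN_VALUES):
    """Sum context-neutral run credit: each event count times its average run value."""
    total = 0.0
    for event, count in event_counts.items():
        if event not in run_values:
            raise KeyError(f"no run value supplied for event {event!r}")
        total += count * run_values[event]
    return total


# A hypothetical hitter with 100 singles is credited with about 47 runs,
# no matter when those singles happened.
print(round(runs_created({"single": 100}), 2))  # 47.0
```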

Over the course of a season, a player accumulates credit for a certain number of runs. To find WAR, we subtract the number of runs we would expect a replacement level player to produce in the same playing time, and then divide the difference by the number of runs that are equivalent to a win.
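In code, that generalized recipe is a one-liner. The sketch below uses the article's figure of roughly 10 runs per win; the function name and the example run totals are invented for illustration and do not describe any real player.

```python
RUNS_PER_WIN = 10.0  # the article's approximate conversion


def wins_above_replacement(player_runs, replacement_runs, runs_per_win=RUNS_PER_WIN):
    """Runs above a replacement-level player, converted to wins."""
    return (player_runs - replacement_runs) / runs_per_win


# Hypothetical season: a player credited with 95 runs, where a replacement-level
# player would have been expected to produce 20 in the same playing time.
print(wins_above_replacement(95, 20))  # 7.5
```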

This is a very generalized description of how to calculate WAR, and the specifics are much trickier.

There exist different versions of WAR: FanGraphs has one version (fWAR), Baseball Reference has another (rWAR), and Baseball Prospectus has a third version (WARP). Each of them calculates it differently, so you’ll sometimes see big differences in these values. Julio Teheran posted 4.8 rWAR in 2016, which makes it seem like he was a really good pitcher, but FanGraphs pegged him for only 3.2 fWAR and BP put Teheran at 3.8 WARP.
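One simple way to handle the disagreement is to look at the three numbers side by side rather than trusting any one of them. The Python sketch below does that for Teheran's 2016 values as quoted above; averaging the versions is just a convenience for illustration, not a method any of the sites endorses.

```python
from statistics import mean

# Julio Teheran's 2016 values as quoted in the article.
teheran_2016 = {"rWAR": 4.8, "fWAR": 3.2, "WARP": 3.8}


def summarize(values):
    """Return the average of the WAR versions and the gap between highest and lowest."""
    numbers = list(values.values())
    return {
        "mean": round(mean(numbers), 2),
        "spread": round(max(numbers) - min(numbers), 2),
    }


print(summarize(teheran_2016))  # {'mean': 3.93, 'spread': 1.6}
```

A spread of 1.6 wins covers nearly a full tier on the scale described earlier, which is why it matters which version of WAR someone is quoting.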

WAR across all the sites is calculated consistently, however, in the sense that each version uses offensive, base running and defensive components for position players and run prevention for pitchers. Each of these components relies upon the same principle: each play has a quantifiable value in terms of runs created or prevented, be it stealing a base, knocking a double, or striking out a hitter. By awarding players credit for making these plays and adjusting for replacement level, we can see how valuable a player is.

Stop the WAR?

Given how frequently it pops up in sabermetric discussions, WAR might sound like the pinnacle of sabermetrics. But no stat is perfect, and WAR is far from it.

“Aaron Judge was nowhere near as valuable as Jose Altuve. Why? Because he didn’t do nearly as much to win games for his team as Altuve did. It is NOT close. The belief that it is close is fueled by bad statistical analysis — not as bad as the 1974 statistical analysis, I grant, but flawed nonetheless. It is based essentially on a misleading statistic, which is WAR. Baseball-Reference WAR shows the little guy at 8.3, and the big guy at 8.1.”

That argument, from sabermetrics pioneer Bill James, represents the principal criticism of WAR: it is context neutral. If a player recorded only one single per game for 162 games, but that one single knocked in the game-winning run every game, WAR would credit that player with exactly as much value as a player who hit the same 162 singles in situations that never affected the outcome.

As a result, it would seem as though WAR is undervaluing the first player and overvaluing the second. Judge consistently performed poorly in high-leverage situations: he recorded the worst Clutch score of any player in 2017 per FanGraphs, despite leading MLB in fWAR.

| Player | fWAR | Clutch | AL MVP vote points |
| --- | --- | --- | --- |
| Jose Altuve | 7.5 | -0.56 | 405 |
| Aaron Judge | 8.2 | -3.64 | 279 |
| Jose Ramirez | 6.6 | -2.57 | 237 |
| Mike Trout | 6.9 | -1.04 | 197 |
| Francisco Lindor | 5.9 | 1.05 | 142 |
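Sorting those five players three different ways shows where the voters and fWAR part company. The Python sketch below uses only the numbers from the table above; the helper name is made up for the example.

```python
# (player, fWAR, Clutch, AL MVP vote points), copied from the table above.
rows = [
    ("Jose Altuve", 7.5, -0.56, 405),
    ("Aaron Judge", 8.2, -3.64, 279),
    ("Jose Ramirez", 6.6, -2.57, 237),
    ("Mike Trout", 6.9, -1.04, 197),
    ("Francisco Lindor", 5.9, 1.05, 142),
]


def ranking(table, column):
    """Return player names ordered from highest to lowest value in the given column."""
    return [row[0] for row in sorted(table, key=lambda row: row[column], reverse=True)]


by_fwar = ranking(rows, 1)
by_clutch = ranking(rows, 2)
by_votes = ranking(rows, 3)

print("fWAR:  ", by_fwar)
print("Clutch:", by_clutch)
print("Votes: ", by_votes)
```

Judge comes out first by fWAR, last by Clutch, and second in the voting, while Altuve is second in both fWAR and Clutch and first in the voting. Five players is far too small a sample to prove anything, but it fits the argument that voters gave context some weight.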

At the same time, proponents of WAR consider its context neutrality important and necessary. If we look at Judge’s performance and remove the context of that performance, Judge was stellar in 2017. Judge can’t control how frequently he comes up to the plate with runners on and the game on the line, so if WAR were adjusted to take that into account, Judge would be punished for factors out of his control.

Is it a feature or a bug? The jury is still out. Still, as a context-neutral stat, WAR is one of the best available for evaluating players and player performance. Given its current trajectory, it might not be long before WAR completely dominates MVP and Hall of Fame discussions.

But in the words of Bertrand Russell, “WAR does not determine who is right…” so don’t think that WAR is the be-all and end-all of every statistical discussion.

(OK, Russell was talking about the shooty-shooty war, but the point still stands.)