Baseball Therapy

WARP for People Who Didn't Like Math Class

Over the weekend, there were plenty of end-of-season retrospectives from columnists who cast non-existent ballots for the MVPs, Cy Young award winners, and Rookies of the Year. As might be expected, many of the columnists brought up the WARP (Mike Trout) vs. Triple Crown (Miguel Cabrera) angle. There was a common theme running through the pieces that argued for Cabrera: WARP is a complicated and math-heavy stat, and because it is so complicated, how can we be sure that Trout was actually the better player?

WARP (Wins Above Replacement Player) does take a little bit of math to arrive at, and not everyone enjoyed math class in high school, but it's actually a pretty simple theory. In the spirit of fairness, I will lay out the basic idea behind WARP. You can make up your own mind from there.

I promise, there won't be many gory mathematical details.

W is for Wins
We're trying to get an idea of a player's complete value, including his hitting, base running, and defense, expressed in the common currency of baseball: runs. Not runs scored, mind you. But by getting hits, a batter increases his team's chances of scoring runs. If he makes a defensive play, he’s decreased the other team's chances of scoring runs. If he strikes out, he's decreased his own team's chances of scoring. It's the increase and decrease in these chances that we're interested in.

The first step in generating WARP is figuring out how many runs each player has contributed to his team or taken away from the other team.

Hitting
Hitting has the most well-known suite of statistics from which to draw. It's not as simple as "a double is worth half a run." That would be nice, but baseball doesn't quite work like that.

It's true that a double brings you halfway around the circuit, but what a double really does is give a team a better chance to score a run than it had before. If, before the double, the bases were empty and there were two outs, the chances of the batting team scoring a run were low (say 10 percent—I'm making numbers up for illustration). By reaching second, a batter improves his team's chances of scoring in the inning to 40 percent (again, fake number). Just by getting to second, he’s added 30 percent of a run (0.3 runs). This gets credited to his account.

If there were runners on base, a double is even better, because any runners on second or third go from being potential runs to actual, scored runs! Maybe the guy on first scores too. The batter didn't put those ducks on the pond, so he doesn't get full credit for them. But if a runner on second had a 40 percent chance of scoring before the double, he has a 100 percent chance of scoring after it. The batter who hit the double added 60 percent of a run.
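For anyone who would rather see the bookkeeping than read about it, here is a minimal sketch in Python of the two double examples above. The function name `run_credit` is made up for this illustration, and the percentages are the same fake numbers used in the text, not real run-expectancy values.

```python
def run_credit(chance_before, chance_after):
    """Credit a batter with the change in his team's chance of scoring.

    Both arguments are probabilities between 0 and 1. The result is in
    fractions of a run, and it is negative when the play hurt the team.
    """
    return chance_after - chance_before


# Bases empty, two outs: the double lifts the team from 10% to 40%.
empty_bases_double = run_credit(0.10, 0.40)

# Runner on second: the double turns his 40% chance into a sure run.
runner_on_second_double = run_credit(0.40, 1.00)

print(round(empty_bases_double, 2))       # 0.3
print(round(runner_on_second_double, 2))  # 0.6
```

A strikeout would go through the same function and come out negative, because the team's chances after the play are lower than they were before it.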

Here's an important thing to note. Let's say that there are two fictional players, Smith and Jones. Smith plays on a team with a bunch of guys who can't hit. Smith hits a lot of doubles, but there's never anyone on to drive in, and no one hitting behind him who can drive him in. Jones is lucky and hits behind a couple of guys who are always on base. Jones' team scores more than Smith's. But Smith and Jones both hit the same double. Should we penalize Smith for the fact that his teammates are terrible? WARP says no.

As far as WARP is concerned, Smith and Jones get the same amount of credit for their doubles (usually the average value around the league that a double adds to a team's chances of scoring). In this way, we can compare apples to apples, and Smiths to Joneses.
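A rough sketch of that idea in Python follows. Everything in it is hypothetical: the five-double "league," the credit attached to each double, and the 40 doubles apiece for Smith and Jones are invented for illustration, and real systems work from far more data than this.

```python
from statistics import mean

# Hypothetical run credits for every double hit in a made-up league,
# one entry per double, each measured as the change in scoring chances.
league_double_credits = [0.3, 0.6, 0.9, 0.4, 0.8]

# The league-average value of a double, regardless of who was on base.
AVERAGE_DOUBLE_VALUE = mean(league_double_credits)


def doubles_value(num_doubles, value_per_double=AVERAGE_DOUBLE_VALUE):
    """Runs credited for a player's doubles at the league-average rate."""
    return num_doubles * value_per_double


# Smith (bad teammates) and Jones (good teammates) each hit 40 doubles.
smith = doubles_value(40)
jones = doubles_value(40)
print(smith == jones)  # True
```

Because the credit depends only on how many doubles each man hit, the quality of his teammates never enters the calculation.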

Baserunning
We can look at baserunning in the same way that we look at the value of hitting events. Stealing second means that you've taken yourself from first to second, and again, increased your team's chances of scoring. You get credit for the increase.

There are other ways to add value on the bases. Going from first to third on a single is like "stealing" an extra base. So is going from second to third on a groundout. Then again, you might be thrown out on the basepaths and take away a chance for your team to score.

When evaluating baserunning, we usually compare a player's performance to the rest of the league's. If about 70 percent of runners across baseball go from first to third on a single, and you get to third 80 percent of the time, you have added value above what the average player would have done.
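The comparison to the league can be sketched like this. The 70 and 80 percent rates come from the example above; the 50 opportunities and the function name `advances_above_average` are assumptions added only to make the arithmetic concrete.

```python
def advances_above_average(opportunities, advances, league_rate):
    """Extra bases taken beyond what a league-average runner would take.

    opportunities: times the runner was on first when a single was hit
    advances:      times he reached third on those singles
    league_rate:   share of all runners who make it to third (0 to 1)
    """
    expected = opportunities * league_rate
    return advances - expected


# League rate 70%, our runner 80% (40 of 50). The 50 opportunities
# are a made-up sample size.
extra = advances_above_average(opportunities=50, advances=40, league_rate=0.70)
print(round(extra, 1))  # 5.0
```

Turning those five extra bases into runs would take one more step: multiplying by how much a trip from second to third improves a team's chances of scoring, a number this sketch does not try to supply.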

Defense
It's not easy to measure defense in baseball, but we have a decent idea how to do it. Suppose you’re a shortstop, and there's a ball bounding up the middle in your general area. If you get to the ball and throw out the runner, you've decreased the other team's chances of scoring. There's now an extra out on the board, and there's no runner at first. If you can't get to the ball because you are slow, the ball trickles into center field. There's a way to measure how that affects the chances of the batting team scoring a run, much like the methods we summarized in the sections above.

No fielder will get to every ball. But there seem to be a lot more balls that trickle into center field with some shortstops than with others. There are some center fielders who seem to have a lot of putouts, rather than just being the guy who fielded the base hit. Every time you throw a guy out, you get the credit that comes with stopping the other team from scoring. Every time you let a ball through or make an error, your account gets docked. Usually, fielders get compared to what we would expect from the league-average defender.

Summing it all up
When you add up the positives (and subtract the negatives) that each player has given his team over the course of a season, you get his value in terms of runs.

Often, the number of runs that a player is responsible for is converted into wins. The rough rule of thumb is that 10 runs equal one win. The exact rate changes a little from year to year, for reasons that we won't get into here. Converting to wins lets us compare players across years. If you are comparing two players from the same year (say Miguel Cabrera in 2012 to Mike Trout in 2012), the conversion doesn't matter much. But that's why you'll often see wins above replacement, rather than runs above replacement.
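Here is a small sketch of the adding-up and the runs-to-wins conversion. The 10-runs-per-win figure is the rule of thumb from the text; the season totals and the function names are hypothetical.

```python
RUNS_PER_WIN = 10.0  # rough rule of thumb; the real rate varies by year


def total_runs(batting, baserunning, fielding):
    """Add up a player's run contributions (negatives count against him)."""
    return batting + baserunning + fielding


def runs_to_wins(runs, runs_per_win=RUNS_PER_WIN):
    """Convert a run total into wins at the given exchange rate."""
    return runs / runs_per_win


# Hypothetical season: strong bat, good legs, slightly poor glove.
season_runs = total_runs(batting=35.0, baserunning=5.0, fielding=-4.0)
print(season_runs)                # 36.0
print(runs_to_wins(season_runs))  # 3.6
```

To compare players from different years, you would pass each year's own exchange rate as `runs_per_win` instead of relying on the default.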

ARP is for Above Replacement Player
What would happen if Player X were removed from the lineup? Say he decided just before Opening Day that he should spend the year pursuing an advanced degree in civil engineering rather than playing baseball.

The team would find the next-best player it had to play that position. He might be the team's utility infielder/fourth outfielder. He might be a hot-shot prospect (or an "insurance" veteran) from Triple-A. He might be a guy on the waiver wire trying to catch on. He won't give you zero production, but there's a reason that he's either on the bench or a journeyman. This is a "replacement" player. The nice thing in baseball is that these fourth outfielders and utility guys do get to play sometimes, and we can see how well they produce.

The important thing to note here is that position matters. It's a lot easier to find a guy who can play first base than one who can play shortstop (and not embarrass himself). Brendan Ryan can hit below .200 and still have a job because he's that good on defense and he plays short. No first baseman would ever be allowed to do the same.

Each player is compared to the average backup player in baseball who plays the same position. So, at the end, we can say that Smith is X number of runs (and wins) better than some backup who also plays his spot.
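As a sketch of that last comparison, assuming made-up replacement levels for two positions (the numbers in `REPLACEMENT_RUNS` are invented, and real systems set them in their own ways):

```python
# Hypothetical season run totals for a typical backup at each position.
# The first-base backup is set higher because he is easier to find.
REPLACEMENT_RUNS = {"SS": -25.0, "1B": -10.0}


def wins_above_replacement(player_runs, position, runs_per_win=10.0):
    """Wins a player adds beyond a backup at the same position.

    player_runs must be on the same scale, and cover the same playing
    time, as the replacement figures above.
    """
    runs_above = player_runs - REPLACEMENT_RUNS[position]
    return runs_above / runs_per_win


# The same +20 runs is worth more at shortstop than at first base.
print(wins_above_replacement(20.0, "SS"))  # 4.5
print(wins_above_replacement(20.0, "1B"))  # 3.0
```

The gap between the two results is the "position matters" point in numbers: the weaker the available backup, the more a given amount of production is worth.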

I've never been a fan of WARP (although I do appreciate what it is trying to do and I do think it can be a rough guide to value). I don't have anything better in a single number to offer though.

But I was curious about two things.

Since teams seem to be deploying more extreme defensive shifts against batters, are defenders getting too much (or too little) credit in the defensive part of the WARP calculation? Or is the assumption that it all washes out in the end?

For baserunning, do you just look at how many times a runner takes an extra base, without regard to where or how hard the ball was hit? It would seem to me that going from first to third depends a good deal on which field the ball was hit to... just curious how this is taken into account.

Can anyone explain to me why the Offensive and Defensive WARP figures do not add up to the overall WARP figure? Sometimes the sum of the O and D WARP is substantially greater than the overall, and sometimes it is less....

On the way into work today, I was listening to yesterday's podcast where Ben talked about how several old-school writers (I wish people wouldn't use the term "mainstream media"--it seems to assume that all newspaper writers are old-fashioned Pct/HR/RBI guys) have written articles against WAR, without even having made an effort to understand what it is. It did re-awaken a couple of questions I have about it.

1. Why are there so many different versions? I keep reading about how Trout has a WAR above 10 this year, yet the BP statistics page shows him at only 8.7. Why does every site have a different number? It's no wonder some people won't trust a new statistic when even its supporters can't agree on how it's calculated.

2. I was hoping you'd go into more detail on the ARP part. How is the actual level of this baseline set? Specifically, I was wondering if it depends on the actual performance of a player's position peers that season, or is the replacement level set before the season begins? If it is somehow related to the average production level at the position, it seems to me that a player could be over- or under-credited depending on whether the performance of regulars at his position that year is unusually weak or strong, respectively.

The reasons that there are several different versions of WARP come down to the different methodologies that are used to determine the run value of various events. There are also different philosophies on where to set replacement level. They're all roughly the same from 30,000 feet, which is why you don't see a guy with 10 wins on one measure and 2 on another, but when you get down into the gritty details, they do differ.

question: is baserunning weighted equally with defense and hitting? i believe it is, but am not 100%. wouldn't that be somewhat flawed though, considering even the best players are rarely in a position to be baserunners more than 40% of the time, while they have a chance to hit 100% of the time?

though this would also raise questions about how much defense should be valued, as players probably have chances to make defensive plays more frequently than they get ab's.

Baserunning is weighted at being as valuable as we think baserunning actually is -- in other words, one run in terms of baserunning is equal to one run obtained any other way. It's on us to be correct (or at least, reasonably so) in how we convert baserunning events into runs, of course. But once you do so, a run is a run is a run. Now, there are fewer baserunning runs than batting runs, in an absolute sense -- it's harder to be a +10 baserunner than it is to be a +10 hitter. But there's no reason to explicitly dock the contributions of baserunners any further than that.

For all the people who knock WAR, I'd just once like to hear them explain how they account for baserunning and defense. Most of the problems people like to complain about in WAR are still there in the metrics they prefer; they just feel that ignoring those issues entirely, or accounting for them in a completely subjective, ad hoc manner, somehow produces better results.