The great run estimator shootout (part 1)

You have a lot of options, of course. Most baseball websites will give you one—maybe more!—of an alphabet soup of offensive measures. You’re left to pick and choose between them as you please. How are you supposed to know which is the best?

So let’s put ten different methods of measuring a player’s offensive production through the wringer, and see which one comes out on top. I am going to start from the presumption that you believe in something a little more modern than RBIs and batting average to evaluate how good a hitter is. I am also not going to spend a lot of time on the finer points of run estimators; this is essentially a spin-off of my work for this year’s THT Annual, and so anyone looking for a lot of background will be best served looking there.

So, for this week, let’s take a gander at the contestants and take them for a quick spin through some typical accuracy tests. First, let’s talk about the types of run estimators we are testing this week.

Dynamic run estimators

Typically we’re interested in the question of how many runs a player contributes to a typical team. Without a great deal of additional work, dynamic run estimators like BaseRuns and Runs Created measure how many runs a player would contribute if he took all of a team’s plate appearances. This isn’t, to me at least, particularly useful. But dynamic run estimators have their uses, and so they’re included.

Rate stats

Most of these are not actually run estimators in the traditional sense. But we can convert them to run estimators if we like. We like.

Linear weights

They’re not as accurate as their dynamic cousins (that’s the idea, at least, though sometimes the linear weights can surprise you). But they’re dirt simple to use and can be applied directly to an individual player’s batting line.

We’ll look at some popular ones, some oldies but goodies, and some of my personal favorites (as well as one or two I’ve created myself). Then everything goes through the wringer.

Unless otherwise noted, the categories under consideration are:

Singles

Doubles

Triples

Home runs

Walks (including hit-by-pitch and intentional walks)

Stolen bases

Caught stealing

Outs, defined as at-bats minus hits

In a player value metric, it may be useful to remove intentional walks; for an accuracy study I thought it was prudent to include them. The period under study is 1993-2008, the “modern era” of baseball offense. Limiting the scope accomplishes two things – it lets us evaluate run estimators as they pertain to evaluating players right now, and it lets me get away with using less computer processing power to run some of these tests.

Dynamic Run Estimators

This is not meant to be a comprehensive (or even fair) assessment of dynamic run estimators; mostly we’re interested in linear run estimators when it comes to player evaluation (with one key exception).

Runs Created

Probably the oldest run estimator still in use, Basic Runs Created follows this simple formula:

OBP*SLG*AB

And that’s the version we’ll test here.
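For concreteness, here is a minimal Python sketch of that formula. The function and argument names are mine, and it assumes OBP is computed as (H + BB) / (AB + BB), with walks including hit-by-pitch as in the category list above and sacrifice flies ignored. The batting line is made up.

```python
def basic_runs_created(hits, walks, total_bases, at_bats):
    """Basic Runs Created: OBP * SLG * AB.

    Assumes OBP = (H + BB) / (AB + BB); sacrifice flies are ignored,
    which is a simplification made for this sketch.
    """
    obp = (hits + walks) / (at_bats + walks)
    slg = total_bases / at_bats
    return obp * slg * at_bats


# Hypothetical batting line, for illustration only.
example_rc = basic_runs_created(hits=150, walks=50, total_bases=250, at_bats=500)
print(round(example_rc, 1))  # 90.9
```

Under that OBP assumption the at-bats cancel, so the same number falls out of (H + BB) * TB / (AB + BB).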

There are increasingly complex versions of RC available; I don’t care for any of them (again, see the Annual for more detail on this point). So why test RC (yet again)? And why the simplest version available?

The answer is VORP.

The core of VORP is Marginal Lineup Value, which was a groundbreaking way of applying a dynamic run estimator to a player’s performance.

But it’s quite possibly the most popular use of a run estimator in existence, and so in the test it goes.

BaseRuns

The absolute king of battle when it comes to dynamic run estimators. Created by Dave Smyth, BaseRuns follows this simple concept:

Runs scored = Baserunners * % of runners who score + Home Runs

Where the percentage of runners who score is estimated based upon the hitting performance in question. That’s all I really have to say about BaseRuns right now; it’s included mostly because I like it a lot and am trying to spread the word.
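The skeleton of that concept can be sketched in a few lines of Python. This is an illustration rather than the version tested here: it assumes the share of runners who score is written as advancement / (advancement + outs), which is the form BaseRuns usually takes, and it leaves the advancement factor to the caller because the way it is built from a batting line differs from version to version. All names are mine.

```python
def base_runs(baserunners, advancement, outs, home_runs):
    """BaseRuns skeleton: baserunners * share who score + home runs.

    baserunners -- runners reaching base, not counting home runs
    advancement -- caller-supplied advancement factor (assumed input;
                   its exact weights vary by BaseRuns version)
    outs        -- batting outs
    home_runs   -- home runs, which always score the batter
    """
    score_rate = advancement / (advancement + outs)
    return baserunners * score_rate + home_runs


# Toy numbers chosen only to make the arithmetic easy to follow:
# 10 runners, a quarter of whom score, plus 2 home runs.
print(base_runs(baserunners=10, advancement=5, outs=15, home_runs=2))  # 4.5
```

Because the score rate rises as more runners reach base and advance, the estimate is not a fixed value per event, which is what makes it dynamic.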

Rate stats

These are things that were, for the most part, not conceived of expressly as run estimators. In fact, most of them don’t expressly measure anything at all: they correlate with run scoring, but on their own do not directly measure anything, be it runs, bases or otherwise.

All of them require the expenditure of additional effort to put them on the scale of runs scored. This requires careful attention; doing this incorrectly can make the rate appear less accurate than it really is.

OPS

OPS, as you are probably aware, stands for On-Base Plus Slugging. The formula, as you can probably surmise from the name, is:

OBP+SLG

It is very easy to calculate on your own, and ubiquitous enough that you don’t have to. And it does seem to correlate pretty well with team run scoring, as you can see in this graph:

The important feature of this graph to note is the slope of the line: the graph tends to increase more on the horizontal axis than the vertical axis. The relationship between runs scored and OPS relative to average is about 2:1, and this is why a lot of run estimator studies underrate OPS; they don’t take that into account when translating OPS into runs.
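To make the 2:1 point concrete, here is a small Python sketch. It assumes one particular reading of that ratio: each 1 percent of OPS above the league average is worth about 2 percent more runs. The function names and the default slope are mine, not a published conversion.

```python
def relative_runs_from_ops(ops, league_ops, slope=2.0):
    """Runs relative to league average (1.0 = average) implied by OPS.

    slope=2.0 encodes the roughly 2:1 relationship; treat it as an
    approximation, not a fitted constant.
    """
    return 1.0 + slope * (ops / league_ops - 1.0)


def naive_relative_runs(ops, league_ops):
    """The 1:1 translation that makes OPS look worse than it is."""
    return ops / league_ops


# An offense with an OPS 10 percent better than a .750 league:
print(round(relative_runs_from_ops(0.825, 0.750), 2))  # 1.2
print(round(naive_relative_runs(0.825, 0.750), 2))     # 1.1
```

The naive version compresses every team toward the average, so its errors are larger at both ends even though the ranking of teams is identical.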

OPS+

A popular derivation of OPS is OPS+, popularized by Baseball Reference. Many people mistakenly believe that OPS+ is OPS divided by the league average OPS. The actual formula is:

OBP/LgOBP + SLG/LgSLG – 1

where LgOBP and LgSLG stand for the league average OBP and SLG. This bothers a great many people, although it really shouldn’t; this provides OPS+ with a more intuitive 1:1 scale with runs scored, and does a better job of capturing the relative importance of OBP and SLG in run scoring.
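A short Python sketch shows how the actual formula differs from the mistaken one. It stays on the scale used in the formula above, where 1.0 is league average, and the league figures and hitter are invented for the example.

```python
def ops_plus(obp, slg, league_obp, league_slg):
    """OPS+ as given above: OBP/LgOBP + SLG/LgSLG - 1 (1.0 = average)."""
    return obp / league_obp + slg / league_slg - 1


def ops_over_league_ops(obp, slg, league_obp, league_slg):
    """The common misreading: OPS divided by league OPS."""
    return (obp + slg) / (league_obp + league_slg)


# Hypothetical hitter who is 20 percent better than league in both parts.
league_obp, league_slg = 0.330, 0.420
obp, slg = 0.396, 0.504
print(round(ops_plus(obp, slg, league_obp, league_slg), 2))             # 1.4
print(round(ops_over_league_ops(obp, slg, league_obp, league_slg), 2))  # 1.2
```

The real formula doubles the distance from average relative to the simple ratio, which is where the 1:1 scale with runs comes from.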

Gross Production Average

Another close relative of OPS is GPA, created by THT’s own Aaron Gleeman. The formula:

(1.8*OBP+SLG)/4

This also corrects for the relative importance of OBP and SLG in run scoring; dividing by four puts it on a scale similar to batting average, which may be more intuitive for some. Like OPS, and unlike OPS+, its relationship to runs scored is close to 2:1.
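As a quick sketch in Python (the function name and the sample hitter are mine):

```python
def gross_production_average(obp, slg):
    """GPA as given above: (1.8 * OBP + SLG) / 4."""
    return (1.8 * obp + slg) / 4


# A hypothetical .350 OBP / .450 SLG hitter lands on a batting-average-like scale.
print(round(gross_production_average(0.350, 0.450), 3))  # 0.27
```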

Equivalent Runs

Nominally a run estimator, it shares a lot of similarity with the OPS-based measures above. (Clay Davenport even says so.) The basic formula is:

The numbers produced by this formula tend to look a lot like OPS numbers; this raw measure even has the same basic relationship to runs scored that OPS has. This is then further translated into Equivalent Runs and Equivalent Average.

Total Average

Total Average is the most popular of innumerable bases-per-out measures. Created by Washington Post scribe Tom Boswell, the basic formula is:

(TB + BB + HBP + SB)/(AB – H + CS)

It comes pretty close to a 1:1 ratio with run scoring.
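In Python, the formula above looks something like this. The argument names are mine and the batting line is invented.

```python
def total_average(total_bases, walks, hit_by_pitch, stolen_bases,
                  at_bats, hits, caught_stealing):
    """Total Average: bases gained per out made.

    (TB + BB + HBP + SB) / (AB - H + CS)
    """
    bases = total_bases + walks + hit_by_pitch + stolen_bases
    outs = at_bats - hits + caught_stealing
    return bases / outs


# Hypothetical season: 315 bases gained against 355 outs made.
ta = total_average(total_bases=250, walks=50, hit_by_pitch=5, stolen_bases=10,
                   at_bats=500, hits=150, caught_stealing=5)
print(round(ta, 3))  # 0.887
```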

The main reason I bring it up is the same reason that you studied World War II in high school: Those who cannot remember the past are condemned to repeat it. Somewhere, right now, on a message board or website, there is a young man who is proposing this as the Next Big Thing that will change the way we look at baseball players.

Please, do not be that young man.

Linear weights

The difference between these formulas and the rates above is that direct attention is paid to the value of each event in runs, rather than valuing events in relation to each other and then assigning a run value to the result.

There are countless linear weights formulas, and I have no real desire to drag them all into this. I have chosen three formulas to stand in for the group as a whole.

wOBA

Developed by Tom Tango, wOBA is an ingenious recasting of a linear weights formula as a rate, on the scale of OBP. It has gained even more popularity with its use on Fangraphs. A sample wOBA formula:

(0.72*BB + 0.75*HBP + 0.90*1B + 1.24*2B + 1.56*3B + 1.95*HR) / PA

To convert this figure to runs, use:

((wOBA-lgwOBA)/1.15+.18)*PA

The beauty of any linear weights framework is that you can change the weights to reflect the particular run environment in question; Fangraphs uses separate weights for each season.
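Here is a rough Python sketch of the two wOBA steps above, using the sample weights and conversion constants exactly as printed. The function names, the hitter and the league wOBA are invented, and the weights are passed in as a dictionary so that a different season's values can be swapped in.

```python
# Sample weights from the formula above; other seasons would differ.
SAMPLE_WEIGHTS = {"BB": 0.72, "HBP": 0.75, "1B": 0.90,
                  "2B": 1.24, "3B": 1.56, "HR": 1.95}


def woba(events, plate_appearances, weights=SAMPLE_WEIGHTS):
    """Weighted sum of events per plate appearance."""
    return sum(weights[name] * count for name, count in events.items()) / plate_appearances


def woba_to_runs(player_woba, league_woba, plate_appearances,
                 scale=1.15, league_runs_per_pa=0.18):
    """Conversion given above: ((wOBA - lgwOBA) / 1.15 + .18) * PA."""
    return ((player_woba - league_woba) / scale + league_runs_per_pa) * plate_appearances


# Hypothetical hitter and league average, for illustration only.
hitter = {"BB": 60, "HBP": 5, "1B": 100, "2B": 30, "3B": 3, "HR": 25}
hitter_woba = woba(hitter, plate_appearances=650)
print(round(hitter_woba, 3))
print(round(woba_to_runs(hitter_woba, league_woba=0.330, plate_appearances=650), 1))
```

A hitter with exactly the league wOBA comes out at the constant rate per plate appearance, so everything above or below that is the player's contribution relative to average.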

Regression

Some people like to estimate linear weights using a process called multiple linear regression. I have done this here, using team runs scored from 1993 to 2008. The formula looks like this:

I have no reason to think this is a very good linear weights formula. I do not recommend its use at all. It is in this evaluation as a stand-in for regression-based LWTS in general. Please do not use this formula. Thank you.
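For readers who have not seen the technique, here is a bare-bones Python sketch of how regression-based weights get estimated. It is not the procedure or the data behind the formula discussed above: the rows are made-up toy numbers with only three event types, the runs are generated from weights I chose so the fit can be checked, and there is no intercept term. It solves the least-squares normal equations with a small hand-written elimination routine so that nothing beyond the standard library is needed.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_linear_weights(event_rows, runs):
    """Least-squares run value per event (no intercept): solve X'X w = X'y."""
    k = len(event_rows[0])
    xtx = [[sum(row[i] * row[j] for row in event_rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(event_rows, runs)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Toy rows of (singles, home runs, outs); NOT real team data.
rows = [(5, 1, 20), (8, 0, 22), (6, 2, 25), (9, 3, 21), (4, 1, 24)]
true_weights = [0.5, 1.4, -0.1]  # invented so the fit can be checked
runs = [sum(w * x for w, x in zip(true_weights, row)) for row in rows]

fitted = fit_linear_weights(rows, runs)
print([round(w, 3) for w in fitted])
```

With noise-free toy data the fit hands back the weights that generated it. Real team totals are noisy and the event counts move together, which is part of why regression weights deserve the suspicion expressed above.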

House

These are my own linear weights, developed primarily for my personal use. I use an approach similar to one used by Tom Ruane, which looks at the change in base/out state after an event. As with wOBA, these weights are tuned by season; an average set of these weights looks like:

Why do I call them my “house” weights? Because the house always wins. And in every run estimation study I’ve read by the creator of a run estimation formula, the creator’s own formula always wins. There are generally two reasons for this:

People rarely publish studies that show that their own run estimators suck eggs; this is called “publication bias.”

Generally, these tests are run against the same data used in creating the estimator in question, meaning that their estimator is best in that particular sample, but may not be better when taken outside the sample.

So for the purposes of this study, these are my house weights. I’m trying to paint myself into a bit of a corner, by publishing this before I actually run all of my run estimation tests. So no publication bias here, hopefully.

Qualifying trials

There are three traditional tests that are staples of the genre. We may as well take a look at them here.

Correlation

This measures how closely two variables track each other. This is more useful for things expressed in different units; correlation doesn’t care about how close things are to each other, just the relationship between them. The larger, the better; a correlation of 1 is a perfect positive correlation and a correlation of -1 is a perfect negative correlation. Correlation can also be referred to as R.

Mean Absolute Error

This measures the average distance between two values, expressed in absolute terms; it doesn’t matter if a number is higher or lower than what it’s being compared to, just how much higher or lower it is. Smaller is better.

Root Mean Square Error

Similar to the above, except that it places a larger penalty on large errors, because each miss is squared before averaging. Smaller is better.
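The three tests are simple enough to write out. Here is a minimal Python sketch using only the standard library; the function names and the tiny sample lists are mine, and the sample numbers are not drawn from the study.

```python
import math


def correlation(estimated, actual):
    """Pearson correlation (R) between estimated and actual runs."""
    n = len(estimated)
    mean_e = sum(estimated) / n
    mean_a = sum(actual) / n
    covariance = sum((e - mean_e) * (a - mean_a) for e, a in zip(estimated, actual))
    spread_e = sum((e - mean_e) ** 2 for e in estimated)
    spread_a = sum((a - mean_a) ** 2 for a in actual)
    return covariance / math.sqrt(spread_e * spread_a)


def mean_absolute_error(estimated, actual):
    """Average size of the miss, ignoring direction."""
    return sum(abs(e - a) for e, a in zip(estimated, actual)) / len(actual)


def root_mean_square_error(estimated, actual):
    """Like MAE, but squaring first makes big misses count for more."""
    return math.sqrt(sum((e - a) ** 2 for e, a in zip(estimated, actual)) / len(actual))


# Made-up example: two perfect estimates and one miss of two runs.
estimated = [1, 2, 3]
actual = [1, 2, 5]
print(round(correlation(estimated, actual), 3))
print(round(mean_absolute_error(estimated, actual), 3))
print(round(root_mean_square_error(estimated, actual), 3))
```

In the sample, the single two-run miss pulls RMSE well above MAE, which is the extra penalty described above. Correlation stays high because it only cares whether the two lists rise and fall together.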

So here’s how we’re testing these. With the exception of the dynamic run estimators (which don’t need the help) and the regression-based weights, all of these run estimators are being tuned to the specific year these stats are coming from. Again, we’re looking at the years 1993-2008. But unlike most tests, which look at runs scored by team, these results are by half-inning:

Test    RC    BsR   EqR   OPS   OPS+  GPA   TA    wOBA  Reg   House
R       0.87  0.90  0.86  0.87  0.87  0.87  0.85  0.85  0.86  0.86
MAE     0.36  0.30  0.42  0.44  0.44  0.42  0.42  0.46  0.42  0.40
RMSE    0.65  0.49  0.54  0.56  0.56  0.54  0.83  0.57  0.54  0.59

Looking only at the linear run estimators, there is very little to differentiate them from one another in these tests once each is tuned to the particular run environment (as all of these are). This would be fine, if we could count on these measures to agree at the individual player level. But we can’t.