Methods and Accuracy in Run Estimation Tools

Measuring Accuracy

How do all the offensive statistics measure up in an accuracy comparison? To
answer that question, I put together a study to examine how well each statistic
relates to run scoring. Unfortunately, setting up the study wasn't a
straightforward proposition. That's because there isn't a general agreement
among sabermetricians about how accuracy is best calculated.

As a matter of fact, the two greatest minds in sabermetrics disagree about
this. Bill James uses standard deviation (really Root Mean Square Error or
RMSE) and absolute errors to compare accuracy. Pete Palmer uses regression
equations to answer the question.

How did I decide to answer the question? Rather than restrict my study to
either Mr. James' or Mr. Palmer's accuracy standard, I decided to study the
matter employing both of their standards. By doing so, I figured, anyone
looking it over could choose whichever standard they prefer.

Because this is a baseball book and not a math book, I won't take up a bunch
of time explaining the entire process in detail. If you don't already have an
understanding of standard deviations, regression equations, correlation and the
like, I regret to tell you that you won't find an explanation of them here.

The reasons are simple. First, there aren't an infinite number of pages that
this book can hold. If I had to explain every math term and concept here, it
would have to be at the expense of other material. Other material that is a
heck of a lot more interesting to read than mathematical techniques.

Second, I'm not properly qualified to teach them to you. I'm a baseball
analyst, not a math professor. Rather than trying to learn it from me, you'd be
better off taking a good introductory college statistics class. Of course, you
might not have the time or desire to do that. If that's true, but you'd still
like to get a basic understanding of the methodology, I suggest you pick up a
copy of Baseball by the NUMBERS: How Statistics Are Collected, What They
Mean, and How They Reveal the Game by Willie Runquist. In this book the
author examines many baseball statistics with a focus on the underlying
mathematics. Even though reading it won't replace a good stats class, if you
are comfortable with math, it will help you understand the methods used in this
essay.

How did I generate run totals for the different stats?

As I said in the Why Do We Need
Another Player Evaluation Method essay, rate statistics need an
additional step to express the measure in runs. How did I do this? I divided
the team measure by the league average and then multiplied the result by the
league runs per out and by the number of team outs (which I calculated
as AB-H=SH+SF+CS+GIDP).

Here's an example using batting average and the 1955 Boston Red Sox. That
season, the Red Sox hit .264 and consumed 4145 outs. The league
averaged a .258 batting average and scored .170 runs per out. To generate the
Red Sox run total, I put all the numbers together: .264/.258 x .170 x
4145=721.037 Runs. Using this method, I produced run estimates for every stat
included in the study. This was done for every team from 1955-1997 (1002 team
seasons).
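
For readers who want to see the arithmetic spelled out, here is a minimal Python sketch of the conversion described above. The function and parameter names are my own labels for illustration, not anything from the study; the inputs are the 1955 Red Sox figures quoted in the text.

```python
def rate_stat_to_runs(team_rate, league_rate, league_runs_per_out, team_outs):
    """Express a team rate statistic as an estimated run total.

    Steps, as described in the text: divide the team measure by the
    league average, then multiply by league runs per out and team outs.
    """
    relative_rate = team_rate / league_rate
    return relative_rate * league_runs_per_out * team_outs


# 1955 Boston Red Sox, using batting average as the rate statistic
red_sox_runs = rate_stat_to_runs(
    team_rate=0.264,
    league_rate=0.258,
    league_runs_per_out=0.170,
    team_outs=4145,
)
print(round(red_sox_runs, 3))  # 721.037
```

The same function should work for any rate statistic (on-base percentage, slugging, OPS and so on), as long as the team and league figures are measured the same way.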

For the statistics that are already expressed in runs, things were much
simpler to calculate. I directly compared estimated runs with actual runs.

What numbers did I calculate?

I calculated figures that correspond to the methodologies used by Bill James
and Pete Palmer.

Bill James Methodology

I applied the standard that Bill James used in his Baseball
Abstracts.

Gross E - Gross Error or the sum of the absolute differences
between the projected and actual runs

AAE - Average Absolute Error - the average of the absolute differences
between the projected and actual runs

% Off - Gross E divided by the sum of actual runs

MAE - Median Absolute Error or the error located at the halfway point
of all errors
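
The four measures above, plus the RMSE mentioned earlier, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and dictionary keys are mine, RMSE is computed by dividing the squared errors by the number of team seasons (the text does not say which divisor was used), and the sample numbers are made up rather than taken from the study.

```python
import math
import statistics


def james_error_measures(estimated, actual):
    """Error measures comparing estimated runs with actual runs."""
    errors = [abs(e - a) for e, a in zip(estimated, actual)]
    gross_e = sum(errors)
    return {
        "gross_e": gross_e,                    # sum of absolute errors
        "aae": gross_e / len(errors),          # average absolute error
        "pct_off": gross_e / sum(actual),      # as a fraction; x100 for percent
        "mae": statistics.median(errors),      # median absolute error
        # RMSE with an n divisor: an assumption, not confirmed by the text
        "rmse": math.sqrt(sum(err ** 2 for err in errors) / len(errors)),
    }


# Made-up team seasons for illustration only
estimated = [700.0, 650.0, 720.0, 610.0]
actual = [710.0, 640.0, 700.0, 615.0]
measures = james_error_measures(estimated, actual)
print(measures["gross_e"], measures["aae"], measures["mae"], measures["rmse"])
# 45.0 11.25 10.0 12.5
```

Note how RMSE (12.5) comes out larger than AAE (11.25) here: squaring the errors gives the one 20-run miss extra weight.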

Pete Palmer Methodology

In The Hidden Game of Baseball, Pete Palmer ranked a few stats using
what he called "linear curve fitting". Linear curve fitting, in this
context, is nothing other than the use of a regression equation to measure the
linear relationship between estimated runs and actual runs.

SE regr - Standard Error of the Regression using the standard formula
y=mx+b where y=actual runs, m=slope of the regression line, x=estimated
runs, and b=intercept point of the regression line

R - correlation coefficient or how closely the estimated runs and
actual runs conform to a linear relationship (If there is a perfect correlation
between the two, this number would equal 1.)

R^2 - Coefficient of determination or the proportion of the variation
in runs that can be explained by relating actual runs to estimated runs
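
As a rough sketch of how these three figures fall out of a simple regression of actual runs on estimated runs, here is a plain Python version. The function name and keys are hypothetical, the standard error uses the usual n-2 divisor for a one-variable regression (an assumption, since the text does not give the formula), and the data are invented for illustration.

```python
import math


def regression_measures(estimated, actual):
    """Fit actual = m * estimated + b by least squares and summarize the fit."""
    n = len(estimated)
    mean_x = sum(estimated) / n
    mean_y = sum(actual) / n
    sxx = sum((x - mean_x) ** 2 for x in estimated)
    syy = sum((y - mean_y) ** 2 for y in actual)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(estimated, actual))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Sum of squared residuals around the fitted line
    sse = sum((y - (slope * x + intercept)) ** 2
              for x, y in zip(estimated, actual))
    r = sxy / math.sqrt(sxx * syy)
    return {
        "slope": slope,
        "intercept": intercept,
        "se_regr": math.sqrt(sse / (n - 2)),  # n-2 divisor is an assumption
        "r": r,
        "r_squared": r * r,
    }


# Invented numbers, not data from the study
fit = regression_measures([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0])
print(round(fit["slope"], 3), round(fit["intercept"], 3), round(fit["r_squared"], 3))
# 0.6 2.2 0.6
```

One thing worth keeping in mind: because the regression is free to rescale the estimates through m and b, a stat can post a good SE regr or R while its raw run estimates are still consistently too high or too low. That is one reason the two standards can rank methods differently.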

What are the numbers?

Rate Stat Scorecard

Run Stat Scorecard

Although most of the statistics' abbreviations are defined elsewhere, a few
aren't. Included are a couple of RC spin-offs: LinearRC is the linear component
of the new RC calculation that I explained in the Deciphering the New Runs
Created essay.
RC-H23-24-Player is the number of runs that the player context formula
generates if team stats are placed into the formula. BR(.081fixed)
is Palmer's BR using a set value (-.081) for outs for the entire period.
Grab is David Grabiner's suggested modification to OPS (1.2*OBP+SLG).

What should you make of these numbers?

I'd prefer that you draw your own conclusions, but since I realize that you
may want my opinion, here are a few things to keep in mind. (Feel free to stop
reading here, if my opinion doesn't really interest you. :)

XR was developed with the same data used in the validation study. This
means XR gets something of a helping hand. Because of that I also supply a few
other comparisons. The first is a decade-by-decade comparison (with only RMSE
and SE regr). The numbers indicate that XR holds its accuracy advantage across
different periods.

Decade Match-up

On the team level, there really isn't that much of a
difference between most of the run estimation methods. Although XR comes out on
top, the gap isn't earth shattering. Of course, as Jay and I pointed out in the
Deciphering the New Runs Created essay, the numbers generated on
the team level don't necessarily equal the numbers used for player comparisons.
This means that for most methods, the accuracy on the player level is different
than the accuracy on the team level. This is true even for Palmer's Batting
Runs. Since BR contains a term for Outs On Base for team calculations, but
excludes the term for player calculations, accuracy for individual players is
worse than this study indicates.

EQA's accuracy lies in the eyes of the beholder. If you figure things out
the way Clay Davenport tells you to, it ranks at the top; if
you don't, it doesn't. As a matter of fact, if you compare EQA to Grab, you
won't find much difference. This indicates to me that EQA isn't really much
different than OPS.

OPS doesn't possess the accuracy that Pete Palmer's study implies. David
Grabiner pointed out a possible explanation. David explained that the reason
OPS is considered to have better accuracy than my study shows is that Palmer's
OPS accuracy claims are based on OPS's correlation with OTS (on-base times
slugging). What this means is
that if OPS had a perfect linear relationship with OTS, the accuracy of OPS
would be the same as the accuracy of OTS. I investigated this and found OPS's
correlation with OTS was indeed very high (.998556). I then took a look at
David's suggested modification 1.2*OBP+SLG (GRAB). GRAB had a correlation
coefficient (R) figure of .999173. Although this correlation figure is pretty
high, it's still not a perfect correlation. To generate a better correlation
figure, a much more involved formula is required. David sent me a formula [
OTS = .333*.400 + (OBP-.333)*.400 + .333*(SLG-.400) + (OBP-.333)*(SLG-.400)
= -.333*.400 + (1.2*OBP+SLG)/3 + (OBP-.333)*(SLG-.400) ] which produces an almost
perfect correlation figure (1.00000000).

You might be thinking, "OK, what does this really mean?" Well, what
it means is that although OPS is a very good quick and dirty method, it's not
as accurate as some of its proponents claim. So if you want to get a good quick
estimate, OPS works fine; if you want a more accurate assessment, you're better
off using one of the other methods.
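
To see why the longer formula tracks OTS so closely, it may help to check the algebra numerically. The sketch below assumes OTS is simply OBP times SLG and reads David's formula as a constant, plus the GRAB term divided by 3, plus a cross term centered on a .333 OBP and a .400 SLG. The function names and the sample team lines are hypothetical, not data from the study.

```python
def ots(obp, slg):
    """OTS, taken here as on-base percentage times slugging percentage."""
    return obp * slg


def grab_plus_cross_term(obp, slg, obp0=0.333, slg0=0.400):
    """Constant + linear GRAB piece + cross term, centered on (obp0, slg0)."""
    constant = -obp0 * slg0
    linear_part = (1.2 * obp + slg) / 3       # the GRAB term, scaled
    cross_term = (obp - obp0) * (slg - slg0)  # the piece GRAB alone leaves out
    return constant + linear_part + cross_term


# Hypothetical team lines (OBP, SLG) for illustration only
teams = [(0.310, 0.370), (0.333, 0.400), (0.350, 0.440)]
for obp, slg in teams:
    gap = grab_plus_cross_term(obp, slg) - ots(obp, slg)
    print(f"OBP={obp:.3f} SLG={slg:.3f} gap={gap:.6f}")
```

The tiny leftover gap comes only from rounding one third to .333; pass obp0=1/3 and the two functions agree to within floating-point error. The cross term is small for teams near the league average, which would explain why GRAB by itself already correlates so highly with OTS.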

My study shows that with the proper selection of event values, a linear
formula is more accurate for the different run scoring periods.

Linear vs. non-Linear Match-up

Although the other formulas move around in ranking, XR stays near
the top for both run scoring environments. This finding directly contradicts
James' assertion in the Historical Baseball Abstract that linear
formulas cannot accurately estimate runs because "run scoring is not
linear." Although I agree with Mr. James that run scoring is not linear, I
don't believe that this fact prevents the use of a linear formula to
estimate runs created. That's because as long as the frequency and distribution
of events are pretty stable (and they have been), run scoring can be looked at as
linear: the more positive events that a team packs into a game, the more runs it
scores. My numbers confirm this assertion.

XR is designed for use from 1955 onward. Although you can use the
formula for seasons prior to 1955, I haven't confirmed its validity for that
period. Having said that, I've already begun work on creating other versions
for seasons prior to 1955.

Closing thoughts

As I mentioned at the top of the article, I'm a baseball analyst, not a math
professor. With that in mind, I'd like to encourage any and all input from all
the math experts reading this article. Although I'm confident that I've got a
good handle on the topic, I'm open to other ideas about how to examine the
accuracy question.

Also, if any of you math experts have written (or can write) something that
explains all the underlying math in a simple, easy to understand manner, and if
you're willing to share your knowledge, please contact me. I'd really like to
include a nice explanation in future versions of this article.

Speaking of the web, I encourage everyone with Internet access to check out
the web version of this article. Since my web site does not suffer from the
space constraints that the printed word does, I'm free to include more, more,
more data. You can find the webified version at
http://www.baseballstuff.com/btf/scholars/furtado/accuracy.htm.