Baseball performances are an imperfect
measure of baseball abilities, and consequently exaggerate differences in abilities.
Predictions of relative batting averages and earned run averages can be improved
substantially by using correlation coefficients estimated from earlier seasons
to shrink performances toward the mean.

Key words: least squares, sports, predictions.

Do Baseball Players
Regress Toward the Mean?

1. INTRODUCTION

Baseball players who win coveted
awards and sign breathtaking contracts often disappoint fans, managers, and
owners. These disappointments might be explained by regression toward the mean,
which occurs when real phenomena are measured imperfectly, causing extreme measurements
to exaggerate differences among the underlying phenomena. The degree of exaggeration
depends on the correlation between the measurements and the real phenomena.
In baseball, the correlation between performance and skill is far from perfect
and, as a consequence, observed performance differences substantially overstate
skill differences. Players who do exceptionally well in any particular season typically do not do as well the subsequent season; they regress toward the mean. We can improve our forecasts by adjusting our predictions accordingly.

2. REGRESSION TOWARD THE MEAN

Galton (1886) observed regression
toward the mean in his seminal study of the relationship between the heights
of parents and their adult children. Because heights are affected by diet, exercise,
and other environmental factors, observed heights are an imperfect measure of
the genetic influences that we inherit from our parents and pass on to our children.
A person who is 78 inches tall may have been pulled above a somewhat shorter
genetically predicted height by positive environmental influences or may have
been pulled below a somewhat taller genetic height by negative environmental
factors. The former is more likely because there are many more people with genetically
predicted heights below 78 inches than with genetic heights above 78 inches.
Thus the observed heights of unusually tall parents usually overstate the genetic
heights that they pass on to their children.
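The asymmetry behind Galton's observation can be illustrated with a small simulation (the distributional numbers below are invented for illustration, not taken from Galton's data): give each person a "genetic" height plus independent environmental noise, then look at the average genetic height of people observed near 78 inches.

```python
import random

random.seed(1)

# Illustrative assumption: genetic height ~ N(69, 3), environmental noise ~ N(0, 2).
people = [(random.gauss(69, 3), random.gauss(0, 2)) for _ in range(200_000)]
observed = [(g + e, g) for g, e in people]

# Among people whose observed height is about 78 inches, the average
# genetic height is pulled back toward the population mean of 69.
tall = [g for h, g in observed if 77.5 <= h <= 78.5]
avg_genetic = sum(tall) / len(tall)
print(round(avg_genetic, 1))  # noticeably less than 78
```

Because far more people have genetic heights below 78 than above it, the conditional average lands well short of 78, exactly the overstatement described in the text.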

Regression toward the mean can
be seen in sports, where observed performance is an imperfect measure of skill.
Of the last twenty major league baseball champions from 1979 through 1998, only
two repeated the following year. Of those major league baseball teams that win
more than 100 games in a season, 90% do not do as well the next season (James
1981). Outcomes depend on luck as well as skill, and those teams that do unusually
well are more likely to have experienced good luck than bad. Few teams are so
far superior to their opponents that they can win a championship in an off year.
Thus the performance of most champions exaggerates their skill and, because
good luck cannot be counted on repeatedly, most champions regress to the mean.

The same is true of individual
players. In 1989, Sports Illustrated reported that of those baseball players
who hit more than 20 home runs in the first half of the season, 90% hit fewer
than 20 during the second half. The writer concluded that there was a "second-half
power outage" (Gammons 1989, p. 68). The regression-toward-the-mean explanation
is that their skills did not deteriorate, but rather that their unusually good
performances during the first half exaggerated their skills. Regression toward
the mean can also explain such clichés as the Cy Young jinx, sophomore slump,
rookie-of-the-year jinx, and Sports Illustrated cover jinx. Baseball players
have good and bad years, and it would be extraordinary for a player to be the
best in the league while having a bad year by his own standards. Most players
who do much better than their peers are also performing better than their own
career averages.

Tables 1 and 2 show the players who had the ten best batting averages (BAs) and earned run averages (ERAs) in 1998 among those players who had at least 50 at bats or 25 innings pitched in
1997, 1998, and 1999. The mean batting average was approximately 30 points lower
in the adjacent seasons than in the top-10 season; the mean earned run average
was roughly 1 run higher. Thirteen of these twenty players had worse records
in both 1997 and 1999 than in 1998.

More generally, Figures 1 and
2 compare the 1998 and 1999 BAs and ERAs of all major league players who had
at least 50 at bats or 25 innings pitched in each of these years. (These figures
also show the locally weighted scatter plot smoothing (LOWESS) lines with a
bandwidth of 0.5. Ramsey regression specification error tests for second-, third-,
or fourth-order terms have p values of 0.305 for batting averages and 0.172
for earned run averages.)

These graphs have two striking
characteristics. First, although the relationships are highly statistically
significant, the correlations are modest. For the 381 batters, the two-sided p value is 5.3 × 10⁻¹³ and the correlation coefficient is 0.36; for the 300 pitchers, the two-sided p value is 4.0 × 10⁻⁸ and the correlation coefficient is 0.31. Although the 1998 performances are statistically helpful in predicting
1999 performance, the predictions are far from perfect.

Second, the slopes of the least-squares
lines are less than 1, indicating that performance regresses to the mean. Because
the least-squares line goes through the average values of both variables, the
0.378 slope for batters means that a player whose BA was 0.050 above (or below)
the 1998 mean is predicted to have a 1999 batting average that is only 0.019
from the 1999 mean. The 0.340 slope for pitchers implies that a player whose ERA was 1.000 from the 1998 mean ERA is predicted to have a 1999 ERA that is only 0.340 from the 1999 mean.

For another suggestive indicator
of regression toward the mean, we looked at all major league players since 1901
who had at least 50 at bats or 25 innings pitched in two consecutive seasons.
Of 4026 players who had BAs of .300 or higher in any season, 3210 (79.7%) did
worse the following season. Of 3849 players who had ERAs of 3.00 or lower in
any season, 3077 (79.9%) did worse the following season. Clearly, baseball players
regress toward the mean. To formalize these observations, we use the following
model.

3. A MODEL

Let Y be a statistical measure of a player's performance (batting average for batters, earned run average for pitchers). To compare players from different seasons, we standardize performance by taking the difference between the player's performance in any given season and the mean performance for all players that season, and dividing this difference by the standard deviation of performance across players that season. Thus Ted Williams's .406 batting average in 1941 was a standardized batting average of Z = 3.14; that is, 3.14 standard deviations above the mean that year. Sandy Koufax's 1.73 earned run average in 1966 was standardized to Z = −1.93, or 1.93 standard deviations below the mean ERA that year.
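The standardization can be sketched in a few lines of code (the league mean and standard deviation below are illustrative values chosen so the example reproduces Z = 3.14 for a .406 batting average; the actual 1941 figures are not given in the text):

```python
def standardize(performance, league_mean, league_sd):
    """Convert a raw season statistic into a Z value:
    standard deviations above (+) or below (-) that season's mean."""
    return (performance - league_mean) / league_sd

# Hypothetical 1941 league figures, for illustration only.
z = standardize(0.406, league_mean=0.267, league_sd=0.0443)
print(round(z, 2))  # 3.14
```

For ERAs the same function applies, but a negative Z is good: it means the pitcher allowed fewer runs than the league average.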

A players performance
Z in any year is related to an expected value m,
which we can think of as this players true ability (what his average performance
would be over many similar seasons). The players actual performance in
any particular season differs from his ability by a random term e
that we assume has an expected value of zero and is independent of ability and
of the value of the random term in other seasons:

Z = m
+ e

(1)

There is a distribution of abilities across players. If ε is independent of μ, then the variance of Z is equal to the variance of μ plus the variance of ε, and is therefore larger than the variance of μ: var[Z] = var[μ] + var[ε]. Thus the variation of actual batting averages (or earned run averages) across players in a season is larger than the variation of abilities. An extreme performance typically overstates how far this player's ability is from the mean ability.
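A quick simulation (with made-up variances, not estimates from baseball data) confirms the decomposition var[Z] = var[μ] + var[ε]:

```python
import random
import statistics

random.seed(0)

# Assumed, illustrative spreads: ability sd = 1.0, luck sd = 1.5.
ability = [random.gauss(0, 1.0) for _ in range(100_000)]
luck = [random.gauss(0, 1.5) for _ in range(100_000)]
performance = [m + e for m, e in zip(ability, luck)]

var_z = statistics.pvariance(performance)
# var[Z] should be close to 1.0**2 + 1.5**2 = 3.25,
# and in particular larger than var[mu] = 1.0.
print(round(var_z, 2))
```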

If we knew the value of a player's ability μ, we could use μ to make an unbiased prediction of his performance in a season. However, players, fans, managers, and owners are interested in the reverse question: using performance to predict ability, and unbiased predictions in that direction require an adjustment for regression toward the mean.

Imagine that we had data on each person's ability μ and his performance Z in a season. The least-squares equation for predicting performance from ability is

(Ẑ − mean[Z])/sd[Z] = r(μ − mean[μ])/sd[μ]

(2)

where r is the correlation between Z and μ. The least-squares equation for predicting ability from performance is

(μ̂ − mean[μ])/sd[μ] = r(Z − mean[Z])/sd[Z]

(3)

Unless the correlation coefficient is 1 or −1, these two equations are not simply the reverse of each other.

Because they cannot use the unknown value of μ to predict performance Z in any given season, fans, managers, and owners might use the value Z(−1) in the preceding season:

Z = α + βZ(−1) + γ

Using Equation 1 and assuming abilities are constant in adjacent seasons, the true relationship is

Z = Z(−1) + (ε − ε(−1))

If var[e(1)]
= var[e], then the correlation coefficient between
Z and Z(1) is a simple function of the variation in abilities and the
variation in performance about ability:

The greater the variation in performance
about abilities, the smaller is the correlation between adjacent-season performances
and the more we should shrink predicted performances toward the mean.

4. DATA

We looked at season batting averages and earned run averages for all major league baseball batters and pitchers from 1901 through 1999 (Thorn et al. 1999). To reduce the influence of outliers, we excluded players who had fewer than 50 official times at bat or 25 innings pitched. This gave us a database of 29,310 seasons for 5262 batters and 20,385 seasons for 4212 pitchers. To compare players from different years, each player's performance each year was standardized by subtracting the mean for all players that year and dividing this difference by the standard deviation of performance across players that year. In a competitive sport like baseball, relative performance is what matters: not whether a player bats .290, but how many standard deviations the player is above or below the mean that year.

5. PERFORMANCE CORRELATIONS

We calculated correlation coefficients in two ways. For the most-recent-season calculation, we compared a player's performance in a given year with his performance in the most recent preceding season; for the more restrictive adjacent-season comparison, we only considered cases where the most recent season immediately preceded the current season.

For batting averages, a total
of 24,047 most-recent-season observations have a correlation coefficient of
0.37 and 22,243 adjacent-season observations have a correlation coefficient
of 0.38. For earned run averages, a total of 16,172 most-recent-season observations
have a correlation coefficient of 0.22 and 14,714 adjacent-season observations
have a correlation coefficient of 0.24. In all cases, these correlations are
highly statistically significant, decisively demonstrating that performance
is not uncorrelated across seasons. However, the correlations are far from perfect.

These results are not an artifact
of a few outliers who did extraordinarily well or poorly one season and more
nearly average the next season. Excluding batters who were more than 2 standard
deviations from the mean in the most recent or adjacent season, the correlation
coefficients drop slightly, to 0.34 and 0.35. Excluding pitchers who were more
than 2 standard deviations from the mean, the respective correlation coefficients
remain 0.22 and 0.24.

Looking at individual seasons,
the BA correlation coefficients for adjacent seasons ranged from 0.18 in 1991
and 0.25 in 1940 to 0.63 in 1906 and 0.59 in 1931; the average was 0.39. Since
World War II, the highest correlation coefficient was 0.46 in 1957. The ERA
correlation coefficients for adjacent seasons ranged from 0.00 in 1913 and 1948
to 0.55 in 1939 and 0.48 in 1932; the average was 0.25. Since World War II,
the lowest correlation coefficient was 0.05 in 1981 and the highest was 0.38
in 1977.

6. PREDICTING RELATIVE PERFORMANCE

Our model suggests that instead of using this year's relative performance to predict next season's relative performance, more accurate predictions can be made by shrinking each player's performance toward the mean; that is, predicting a player's Z(+1) from rZ rather than Z. To see whether this is so, we considered batters and pitchers who played in two adjacent seasons, say 1998 and 1999. The unadjusted prediction of each player's 1999 Z value is simply his 1998 Z value. For the adjusted predictions, we used the data for all persons who played in both 1997 and 1998 to estimate the correlation coefficient between adjacent-season performance; we then predicted the 1999 Z value of each person who played in 1998 and 1999 by multiplying his 1998 Z value by this correlation coefficient.

We did this for every season
and measured the overall accuracy of the predictions by the root mean squared
error (RMSE) each year and for the entire time period. The first rows of Tables 3 and 4 show these results. For both batters and pitchers, shrinking the performances toward the mean reduced the RMSE in every one of the 96 years. For the entire time period, the RMSE was reduced 16% for batters and 18% for pitchers.

For those who have played more than one year, more accurate predictions might be expected from an averaging of these previous seasons' performances. Our model suggests that these averages, too, should be shrunk toward the mean. To investigate this question, we considered batters and pitchers who had played in 2n adjacent seasons, using the average Z value for the n earlier years to predict the average Z value for the n subsequent years. For example, with n = 2, we looked at persons who had played in 1995, 1996, 1997, and 1998. The unadjusted prediction of each player's average Z value for 1997 and 1998 is his average Z value for 1995 and 1996. For the adjusted predictions, we calculated the correlation coefficient between average 1993-1994 Z values and average 1995-1996 Z values for all persons who played these four seasons. We then predicted each player's 1997-1998 Z by multiplying his 1995-1996 Z by this correlation coefficient. We only made predictions when data for at least 25 players were available for estimating the correlation coefficient. There were consequently no BA predictions for horizons longer than 7 years and no ERA predictions for horizons longer than 6 years.

Tables 3 and 4 show the results. As the horizon lengthened and the Z values were averaged over more seasons, there was typically an increase in the correlation coefficient and a decline in the RMSE. For every horizon, the predictions were substantially improved by shrinking each player's relative performance toward the mean.

These results are robust with
respect to the minimum number of at bats and innings pitched. Tables 5 and 6
show the predictive accuracy for next-season forecasts with the minimum number
of at bats ranging from 50 to 400 and the minimum number of innings pitched
ranging from 25 to 200. These increases in the minimum numbers raise the average
correlation coefficient, but do not alter our conclusion that shrinkage gives
more accurate predictions of relative performance.

At the suggestion of a referee, we also looked at this performance measure for batters: on base average plus slugging average, where a player's on base average is the sum of his hits, bases on balls, and times hit by pitches divided by the sum of his times at bat, bases on balls, times hit by pitches, and sacrifice flies, and a player's slugging average is his total bases (one base for a single, two for a double, three for a triple, and four for a home run) divided by his times at bat. Table 7 shows that the correlations between performances in different years are somewhat higher, but that the results are otherwise very similar to those for batting averages.
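The two components of that measure follow directly from the counting stats; the season line in the example below is invented for illustration.

```python
def on_base_average(hits, walks, hbp, at_bats, sac_flies):
    """(H + BB + HBP) / (AB + BB + HBP + SF), as defined in the text."""
    return (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)

def slugging_average(singles, doubles, triples, home_runs, at_bats):
    """Total bases (1B + 2*2B + 3*3B + 4*HR) divided by at bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats

# Invented season line: 500 AB, 150 H (100 1B, 30 2B, 5 3B, 15 HR),
# 60 BB, 5 HBP, 5 SF.
oba = on_base_average(hits=150, walks=60, hbp=5, at_bats=500, sac_flies=5)
slg = slugging_average(singles=100, doubles=30, triples=5, home_runs=15, at_bats=500)
ops = oba + slg
print(round(ops, 3))
```

For season-to-season comparisons, this combined measure would then be standardized per season exactly as batting averages are in Section 4.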

7. CONCLUSION

Because baseball performances are
an imperfect measure of underlying abilities, batting averages and earned run
averages regress toward the mean. Outstanding performances exaggerate player
skills and are typically followed by more ordinary performances. The average
correlation coefficient for adjacent-season performance is 0.39 for batting
averages and 0.25 for earned run averages. Predictions of standardized batting
averages and earned run averages can be improved consistently and substantially
by using correlation coefficients estimated from earlier seasons to shrink performances
toward the mean.