The Accuracy of Component ERA

(Note: I added new information below on June 17, 2005. Using regression
analysis, I made an improvement in the model. This is explained at the end of
the article.)

Please let me know if someone has done this already. I looked at the 70
pitchers who pitched at least 3000 innings between 1920 and 2000. The
correlation between their actual ERA and the Component ERA (CERA) that Bill
James invented is .918. So the r-squared is .843, meaning that 84.3% of the
variation in ERA across pitchers can be explained by their Component ERA.
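
As a quick check on that step, the r-squared is just the square of the correlation. The short sketch below uses only the .918 figure reported above; the variable names are mine.

```python
# Correlation between actual ERA and Component ERA, as reported in the article.
r = 0.918

# r-squared is the share of the variation in ERA accounted for by CERA.
r_squared = r ** 2

print(f"r-squared = {r_squared:.3f}")  # prints "r-squared = 0.843"
```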

The formula is complicated. It uses a pitcher’s number of batters faced,
hits allowed, HRs allowed, walks allowed, and HBP. It can be found at

Only 8 of the 70 pitchers had a CERA that differed from their actual ERA by
more than .25. Even if a pitcher pitched 40 complete games in a season (360
innings), a differential of .25 or less means no more than 10 runs in a
season, or about 1 win.
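
The arithmetic behind that estimate can be sketched as follows. The function name is mine, and two things are assumptions rather than anything stated in the article: that every complete game goes exactly 9 innings, and the common rule of thumb of roughly 10 runs per win.

```python
def runs_from_era_gap(era_gap: float, innings: float) -> float:
    """Convert an ERA difference into runs over a given number of innings."""
    return era_gap * innings / 9.0


INNINGS_PER_COMPLETE_GAME = 9   # assumption: every complete game goes 9 innings
RUNS_PER_WIN = 10.0             # assumption: rough rule of thumb, not from the article

innings = 40 * INNINGS_PER_COMPLETE_GAME      # 360 innings
runs = runs_from_era_gap(0.25, innings)       # 0.25 * 360 / 9
wins = runs / RUNS_PER_WIN

print(runs, wins)  # 10.0 1.0
```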

One interesting thing is that, by the table below, only 19 pitchers had
CERAs higher than their actual ERA, so 51 had CERAs lower than their actual
ERA. It seems like it should be close to half for each. I have not tried to
figure out how the formula could be made better. The average differential
(ERA minus CERA) is .066 (including negatives). The average absolute
differential is .116.
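
For anyone who wants to repeat the summary, here is a minimal sketch of how the average differential, the average absolute differential, and the over/under counts can be computed. The function and key names are mine, and the five values are just a handful of differentials copied from the table below, not the full set of 70, so the numbers it produces will not match the figures above.

```python
from statistics import mean


def summarize_differentials(diffs):
    """Summarize a list of (ERA minus CERA) differentials."""
    return {
        "mean": mean(diffs),                                  # negatives included
        "mean_abs": mean(abs(d) for d in diffs),              # size of the miss only
        "era_below_cera": sum(1 for d in diffs if d < 0),     # pitcher beat his CERA
        "era_above_cera": sum(1 for d in diffs if d > 0),     # CERA was too low
    }


# Illustrative subset only: Zachary, Ford, Friend, Seaver, Sutton.
sample_diffs = [-0.271, -0.225, 0.003, 0.192, 0.381]
summary = summarize_differentials(sample_diffs)
print(summary)
```

Running the same function on the whole Diff column would reproduce the averages for all 70 pitchers.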

The table below shows the results: first each pitcher's actual ERA, then his
CERA, then the differential (ERA minus CERA). I listed the most negative
values first, indicating the pitchers who went the most below their CERAs. If
anyone has any theories as to why the pitchers at the top of the list did the
best relative to their CERAs, I would be happy to hear them.

Pitcher                ERA   CERA     Diff
Tom Zachary           3.74   4.01   -0.271
Whitey Ford           2.74   2.97   -0.225
Mike Torrez           3.96   4.15   -0.188
Claude Osteen         3.30   3.44   -0.141
Tommy John            3.34   3.48   -0.138
Murry Dickson         3.66   3.79   -0.135
Phil Niekro           3.35   3.46   -0.109
Larry French          3.44   3.55   -0.106
Jim Kaat              3.45   3.54   -0.087
Lefty Grove           3.06   3.12   -0.061
Bucky Walters         3.30   3.36   -0.057
Jim Palmer            2.86   2.91   -0.054
Jesse Haines          3.65   3.69   -0.045
Rick Reuschel         3.37   3.41   -0.037
Mel Harder            3.80   3.83   -0.028
Joe Niekro            3.59   3.61   -0.017
Jim Perry             3.45   3.46   -0.015
Don Drysdale          2.95   2.96   -0.012
Bob Welch             3.47   3.47   -0.003
Bob Friend            3.58   3.58    0.003
Frank Tanana          3.66   3.65    0.007
Curt Simmons          3.54   3.53    0.013
Steve Carlton         3.22   3.20    0.015
Dutch Leonard         3.25   3.22    0.028
Jerry Koosman         3.36   3.33    0.029
Billy Pierce          3.27   3.24    0.030
Jerry Reuss           3.64   3.60    0.037
Freddie Fitzsimmons   3.51   3.47    0.038
Milt Pappas           3.40   3.36    0.038
Bobo Newsom           3.98   3.94    0.043
Bob Gibson            2.91   2.86    0.054
Warren Spahn          3.09   3.03    0.055
Dolf Luque            3.25   3.19    0.058
Ted Lyons             3.67   3.61    0.058
Orel Hershiser        3.25   3.19    0.060
Early Wynn            3.54   3.48    0.060
Bob Feller            3.25   3.19    0.064
Burleigh Grimes       3.65   3.58    0.067
Doyle Alexander       3.76   3.69    0.068
Earl Whitehill        4.36   4.29    0.070
Jim Bunning           3.27   3.20    0.070
Bert Blyleven         3.31   3.23    0.084
Rick Wise             3.69   3.60    0.087
Waite Hoyt            3.60   3.51    0.088
Larry Jackson         3.40   3.31    0.092
Charlie Hough         3.75   3.65    0.096
Mickey Lolich         3.44   3.34    0.097
Carl Hubbell          2.98   2.88    0.097
Gaylord Perry         3.11   3.00    0.105
Dennis Martinez       3.68   3.56    0.120
Lew Burdette          3.66   3.53    0.126
Vida Blue             3.26   3.13    0.130
Luis Tiant            3.30   3.16    0.144
Paul Derringer        3.46   3.31    0.148
Charlie Root          3.59   3.43    0.156
Juan Marichal         2.89   2.73    0.160
Tom Seaver            2.86   2.67    0.192
Roger Clemens         3.07   2.88    0.193
Robin Roberts         3.40   3.21    0.195
Greg Maddux           2.83   2.63    0.197
Sam Jones             3.94   3.74    0.201
Danny Darwin          3.75   3.54    0.210
Eppa Rixey            3.34   3.12    0.220
Jack Morris           3.90   3.61    0.290
Ferguson Jenkins      3.34   3.04    0.298
Nolan Ryan            3.19   2.89    0.303
Catfish Hunter        3.26   2.94    0.317
Red Ruffing           3.80   3.48    0.319
Dennis Eckersley      3.49   3.11    0.380
Don Sutton            3.26   2.88    0.381

Source: The STATS, INC. All-Time Major League Handbook,
their 2001 Major League Handbook and Lee Sinins' Sabermetric
Encyclopedia.

New Information

Bob Clark of Louisville suggested that I take lefties into account in looking
at component ERA (ERC, the same measure called CERA above). The big reason,
to me, might be that lefties face more right-handed batters, so they might
get more GIDPs. So lefties might have lower ERAs than projected from hits,
BBs, and HRs than RHPs do. Also, the more strikeouts a pitcher has, the fewer
GIDPs he will probably have. That might increase runs allowed.

So I ran a regression in which the dependent variable was ERA and the
independent variables were ERC, strikeouts per IP, and a dummy variable for
being a left-handed pitcher (1 if yes, 0 if no). The regression equation was

ERA = .427 + .881*CERA + .1188*SO/IP - .0986*DUM
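
To make the equation concrete, here is a small sketch that applies it to one pitcher. The coefficients are the ones in the equation above; the function name, the argument names, and the example inputs (a CERA of 3.50 and 0.60 strikeouts per inning) are made up for illustration and do not describe any pitcher in the study.

```python
def predicted_era(cera: float, so_per_ip: float, is_lefty: bool) -> float:
    """Apply the regression equation: ERA = .427 + .881*CERA + .1188*SO/IP - .0986*DUM."""
    dum = 1 if is_lefty else 0  # dummy variable: 1 for a left-handed pitcher, else 0
    return 0.427 + 0.881 * cera + 0.1188 * so_per_ip - 0.0986 * dum


# Hypothetical inputs, for illustration only.
righty = predicted_era(cera=3.50, so_per_ip=0.60, is_lefty=False)
lefty = predicted_era(cera=3.50, so_per_ip=0.60, is_lefty=True)

print(round(righty, 3), round(lefty, 3))
print(round(righty - lefty, 4))  # the lefty adjustment, 0.0986
```

With everything else equal, the equation predicts an ERA about .10 lower for a lefty than for a righty.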

DUM had a t-value of about -3. The r-squared was .866 and
the standard error was .116. The formula predicts every pitcher's ERA to
within .299. 63 were off by less than .2 and 48 were within .1. The average
difference between the predicted value and the actual ERA is basically zero
(.0006 again), probably because the over- and underpredictions cancel out. The
average absolute deviation was .062. 36 were overpredicted and 34 were
underpredicted. ERC by itself underpredicted 51 of the 70. So adding in the
dummy variable made things just a little more accurate than the ERC +
strikeouts model.
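
The "within .1" and "within .2" counts, and the split between over- and underpredictions, can be tallied as in the sketch below. The function names are mine and the four ERA pairs are invented purely to show the mechanics; they are not pitchers from the study.

```python
def count_within(actual, predicted, tolerance):
    """Count predictions that miss the actual ERA by less than `tolerance`."""
    return sum(1 for a, p in zip(actual, predicted) if abs(a - p) < tolerance)


def count_over_under(actual, predicted):
    """Return (overpredicted, underpredicted) counts."""
    over = sum(1 for a, p in zip(actual, predicted) if p > a)
    under = sum(1 for a, p in zip(actual, predicted) if p < a)
    return over, under


# Invented values, for illustration only.
actual_era = [3.10, 3.50, 3.90, 4.20]
predicted = [3.15, 3.35, 3.95, 3.95]

within_01 = count_within(actual_era, predicted, 0.1)
within_02 = count_within(actual_era, predicted, 0.2)
over, under = count_over_under(actual_era, predicted)
print(within_01, within_02, over, under)  # 2 3 2 2
```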

I used all pitchers with 3000+ IP from 1920-2000.

I also tried the regression with interaction or slope
terms for the dummy variable. The results were similar.

Below are the results. The Predicted ERA is the one
predicted by the equation above.