Tuesday, November 05, 2002

And the Beat Goes On: Derek Jeter and the State of Fielding Analysis in Sabermetrics - Part 2

First up on the chopping block: Pete Palmer.

In the Beginning - Fielding Runs

In 1984, Pete Palmer and John Thorn wrote The Hidden Game of Baseball. This book provided the first detailed look at Palmer’s Total Player
Rating (TPR) system, which attempts to rank players based upon the number of
runs that they produced (on offense) or saved (on defense) beyond those
produced or saved by a league-average player. All aspects of a player’s game
are considered, and the components are intended to be additive, so that Batting Runs, Stolen
Base Runs, Fielding Runs, and (for pitchers) Pitching Runs could be added
together and converted to wins using a season-specific value for Runs per Win.
Theoretically, a player who played in a high-run environment could be compared
to a player who played in a low-run environment, since the number of runs per
win would be used to convert the numbers to the same scale.
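
As a rough sketch of the arithmetic described above (not Palmer's published formula in full), the run components are summed and divided by a season-specific Runs per Win value. The function name, parameter names, and the sample numbers below are hypothetical and chosen only for illustration.

```python
def runs_to_wins(batting_runs, stolen_base_runs, fielding_runs,
                 runs_per_win, pitching_runs=0.0):
    """Sum the additive run components and convert the total to wins.

    Assumption: every component is already expressed as runs above
    (positive) or below (negative) a league-average player.
    """
    total_runs = batting_runs + stolen_base_runs + fielding_runs + pitching_runs
    return total_runs / runs_per_win


# Hypothetical players with the same +30 run total in different run environments.
low_scoring = runs_to_wins(20.0, 2.0, 8.0, runs_per_win=9.0)
high_scoring = runs_to_wins(20.0, 2.0, 8.0, runs_per_win=11.0)
print(round(low_scoring, 2), round(high_scoring, 2))
```

The same run total is worth fewer wins when runs are plentiful, which is the scaling that allows players from different run environments to be compared.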

Palmer’s defensive measurement system, Fielding Runs, has been widely
critiqued and (mostly) criticized. In fairness to Palmer, the system was
developed long before we had the kind of data collection that Project
Scoresheet/Baseball Workshop and STATS, Inc. have brought to the table since
the late 1980s, and the system must be seen - as all pathfinding systems tend
to be - as a first cut at making sense of the data.

I have written an analysis of Fielding Runs in which I
describe how Fielding Runs are calculated, so I won’t repeat that information
here. I will note that instead of using Palmer’s estimator of playing time in
the FR formula, I used actual innings played from the Palmer/Gillette
play-by-play database.

Table 1. AL SS Fielding Runs 1998-2000 (min 800 innings)

1998

Player         TEAM    G   GS     INN      FR
M Bordick      BAL   150  144  1238.3   18.26
D Cruz         DET   135  132  1163.3   16.10
K Stocker      TB    110  108   940.0   11.63
O Vizquel      CLE   151  149  1316.0    7.44
M Tejada       OAK   104  104   915.0    2.16
A Rodriguez    SEA   160  160  1389.3    2.06
G DiSarcina    ANA   157  155  1370.7   -3.63
M Caruso       CWS   131  129  1121.3   -7.33
P Meares       MIN   149  145  1270.0   -7.68
A Gonzalez     TOR   158  157  1398.3   -9.40
N Garciaparra  BOS   143  143  1255.3  -15.26
D Jeter        NYY   148  148  1304.7  -20.02

1999

Player         TEAM    G   GS     INN      FR
M Bordick      BAL   159  155  1357.3   35.38
R Sanchez      KC    134  131  1131.7   31.94
T Batista      TOR    98   98   860.7   10.18
A Rodriguez    SEA   129  129  1116.0    7.23
M Tejada       OAK   159  156  1377.3    7.13
D Cruz         DET   155  151  1302.3    4.68
R Clayton      TEX   133  133  1149.0    3.10
O Vizquel      CLE   143  140  1214.3    1.57
C Guzman       MIN   131  126  1069.0   -7.20
N Garciaparra  BOS   134  133  1173.7   -7.53
M Caruso       CWS   131  125  1114.7  -19.92
D Jeter        NYY   158  158  1395.7  -33.55

2000

Player         TEAM    G   GS     INN      FR
F Martinez     TB    106  103   887.7   31.20
J Valentin     CWS   141  136  1212.3   20.59
R Sanchez      KC    143  140  1198.0   15.18
A Rodriguez    SEA   148  148  1285.0    8.05
D Cruz         DET   156  154  1355.3    3.95
M Tejada       OAK   160  159  1400.3    3.17
O Vizquel      CLE   156  154  1328.7   -0.54
N Garciaparra  BOS   136  135  1185.0   -2.55
R Clayton      TEX   148  144  1237.0   -2.59
A Gonzalez     TOR   141  140  1225.3   -6.82
C Guzman       MIN   151  148  1307.0  -14.15
M Bordick      BAL   100  100   865.0  -14.38
D Jeter        NYY   148  148  1278.7  -36.47

I touch on some of the limitations of Fielding Runs as a measurement
of defensive performance in the article referenced above, and Bill James
also touches on them in his Win Shares book. The biggest issue with
FR, in my opinion, is the implicit assumption that the only factor that affects
the number of plays that a shortstop (or any other fielder) makes is the total
number of balls put into play against a team. But when we look at the play-by-play
data we can see that this assumption is erroneous. There are two other factors
that are known to affect this distribution - the ground ball/fly ball characteristics
of the pitching staff, and the L/R distribution of batters faced (which can be
estimated by using the L/R distribution of the pitching staff). Batters as a group
tend to pull grounders and hit fly balls to the opposite field, so if a team faces
proportionately more RHB than the norm, the 3B, SS, and RF will tend to have more plays
than the norm.

We can see these team-to-team differences in the play-by-play data. Gillette and
Palmer use the zones defined on the Project Scoresheet/Retrosheet ball location diagram to identify where
balls are put into play. If we assume (for the moment) that a shortstop,
regardless of his exact positioning, could conceivably field ground balls hit into the
‘56’, ‘6’, and ‘6M’ zones, we can use that as an upper bound on the number of
opportunities he could have. In this part of the analysis, I used the ‘56’, ‘56D’,
‘6’, ‘6D’, ‘6M’ and ‘6MD’ zones as providing the bounds for the area where a
shortstop could be expected to have a reasonable chance to field a ground ball,
because my initial run-through of the data suggests that either the pitcher or the
3B will field the vast majority of grounders in the ‘56S’, ‘6S’, and ‘6MS’ zones.
Also, in this part of the analysis, I didn’t concern myself with who actually
fielded the ball, but only where the ball was hit, since I was interested
in establishing the boundaries of what might have happened given positioning
variances between teams.
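
The counting step described above can be sketched roughly as follows. This is a minimal illustration, not the actual processing applied to the Gillette/Palmer data: the zone labels come from the text, but the function names, the input format (a plain list of zone codes for ground balls), and the grouping of each ‘D’ zone with its base zone are my assumptions.

```python
# Assumed grouping: each deep ('D') zone is counted with its base zone.
SS_ZONE_AREAS = {
    "56": "Hole", "56D": "Hole",
    "6": "Direct", "6D": "Direct",
    "6M": "Middle", "6MD": "Middle",
}


def shortstop_opportunities(ground_ball_zones):
    """Count ground balls hit into the shortstop zones, by area.

    Only where the ball was hit matters here, not who fielded it.
    """
    counts = {"Hole": 0, "Direct": 0, "Middle": 0}
    for zone in ground_ball_zones:
        area = SS_ZONE_AREAS.get(zone)
        if area is not None:  # shallow ('S') and other zones are skipped
            counts[area] += 1
    return counts


def per_nine(opportunities, innings):
    """Scale an opportunity count to a per-nine-innings rate."""
    return 9.0 * opportunities / innings


sample = ["56", "6", "6M", "6S", "56D", "4", "6MS"]
print(shortstop_opportunities(sample))
# 652 opportunities in 1163 1/3 innings
print(round(per_nine(652, 1163 + 1 / 3), 2))  # → 5.04
```

Excluding the ‘S’ zones mirrors the choice described in the text; a different boundary would only require editing the mapping.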

Here’s what the play-by-play data shows for the years 1998-2000. “SSF” is
the number of shortstop fielding opportunities, and “Hole”, “Direct”, and
“Middle” show the distribution of balls in the ‘56’, ‘6’, and ‘6M’ areas
respectively:

Table 2. AL SS BIP, 1998-2000 (min 800 innings)

1998

Player          Team    G   GS     Inn   BIP    GB    FB    %GB  SSF  SSF/9  Hole  Direct  Middle      FR
Cruz, D         DET   135  132  1163.3  3605  1783  1822  49.5%  652   5.04   262     280     110   16.10
Caruso, M       CHA   131  129  1121.3  3504  1574  1930  44.9%  560   4.49   245     244      71   -7.33
Bordick, M      BAL   150  144  1238.3  3764  1787  1977  47.5%  608   4.42   230     289      89   18.26
Stocker, K      TBA   110  108   940.0  2848  1272  1576  44.7%  461   4.41   222     181      58   11.63
Tejada, M       OAK   104  104   915.0  2898  1286  1612  44.4%  447   4.40   164     210      73    2.16
Rodriguez, A    SEA   160  160  1389.3  4167  1873  2294  44.9%  672   4.35   260     290     122    2.06
Meares, P       MIN   149  145  1270.0  4096  1756  2340  42.9%  580   4.11   221     268      91   -7.68
Vizquel, O      CLE   151  149  1316.0  4084  1894  2190  46.4%  596   4.08   245     260      91    7.44
DiSarcina, G    ANA   157  155  1370.7  4138  1901  2237  45.9%  608   3.99   288     245      75   -3.63
Jeter, D        NYA   148  148  1304.7  3876  1789  2087  46.2%  578   3.99   252     251      75  -20.02
Garciaparra, N  BOS   143  143  1255.3  3835  1694  2141  44.2%  535   3.84   195     260      80  -15.26
Gonzalez, A     TOR   158  157  1398.3  4158  1837  2321  44.2%  573   3.69   175     269     129   -9.40

1999

Player          Team    G   GS     Inn   BIP    GB    FB    %GB  SSF  SSF/9  Hole  Direct  Middle      FR
Sanchez, R      KCA   134  131  1128.7  3636  1714  1922  47.1%  605   4.82   252     157     196   31.94
Clayton, R      TEX   133  133  1149.3  3611  1773  1838  49.1%  593   4.64   215     145     233    3.10
Rodriguez, A    SEA   129  129  1114.7  3463  1578  1885  45.6%  547   4.42   170     166     211    7.23
Bordick, M      BAL   159  155  1355.0  4097  1997  2100  48.7%  663   4.40   210     188     265   35.38
Cruz, D         DET   155  151  1300.3  4043  1873  2170  46.3%  623   4.31   206     190     227    4.68
Batista, T      TOR    98   98   860.7  2654  1258  1396  47.4%  409   4.28   128     113     168   10.18
Caruso, M       CHA   132  125  1114.7  3547  1588  1959  44.8%  526   4.25   197     121     208  -19.92
Tejada, M       OAK   159  156  1377.3  4333  2067  2266  47.7%  642   4.20   243     176     223    7.13
Guzman, C       MIN   131  126  1069.0  3454  1527  1927  44.2%  497   4.18   150     155     192   -7.20
Vizquel, O      CLE   143  140  1214.3  3635  1737  1898  47.8%  552   4.09   209     157     186    1.57
Garciaparra, N  BOS   134  133  1171.7  3489  1602  1887  45.9%  526   4.04   192     166     168   -7.53
Jeter, D        NYA   158  158  1395.7  4143  1942  2201  46.9%  565   3.64   205     174     186  -33.55

2000

Player          Team    G   GS     Inn   BIP    GB    FB    %GB  SSF  SSF/9  Hole  Direct  Middle      FR
Martinez, F     TBA   106  103   887.7  2801  1351  1450  48.2%  467   4.73   161     143     163   31.20
Tejada, M       OAK   160  159  1400.3  4405  2158  2247  49.0%  694   4.46   276     189     229    3.17
Sanchez, R      KCA   143  140  1198.0  3751  1783  1968  47.5%  592   4.45   223     158     211   15.18
Garciaparra, N  BOS   136  135  1185.0  3519  1695  1824  48.2%  576   4.37   195     186     195   -2.55
Gonzalez, A     TOR   141  140  1225.3  3852  1765  2087  45.8%  593   4.36   220     139     234   -6.82
Cruz, D         DET   156  154  1355.3  4294  2072  2222  48.3%  653   4.34   215     237     201    3.95
Valentin, J     CHA   141  136  1212.3  3645  1681  1964  46.1%  579   4.30   180     193     206   20.59
Clayton, R      TEX   148  144  1237.0  4051  1718  2333  42.4%  570   4.15   228     159     183   -2.59
Rodriguez, A    SEA   148  148  1285.0  3929  1731  2198  44.1%  568   3.98   169     193     206    8.05
Vizquel, O      CLE   156  154  1328.7  3921  1894  2027  48.3%  587   3.98   217     152     218   -0.54
Bordick, M      BAL   100  100   865.0  2709  1245  1464  46.0%  379   3.94   137      84     158  -14.38
Guzman, C       MIN   151  148  1307.0  4080  1753  2327  43.0%  549   3.78   159     165     225  -14.15
Jeter, D        NYA   148  148  1278.7  3934  1693  2241  43.0%  497   3.50   185     144     168  -36.47

The correlation coefficient r for the number of fieldable shortstop chances
per nine innings (SSF/9) and Fielding Runs is +0.74. The correlation
coefficient for the percentage of ground balls hit and Palmer’s FR is +0.51.
This strongly suggests that Fielding Runs for SS are highly dependent on the
number of balls hit in the vicinity of the SS. Jeter was at the very bottom of
the list in both 1999 and 2000, and very close to the bottom of the list in
1998.
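
For readers who want to check a correlation of this kind themselves, here is a minimal sketch of the Pearson coefficient in plain Python. The function name `pearson_r` is mine, and the sample input is only the twelve 1998 rows of Table 2, so its result should not be expected to match figures computed over a larger sample.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of the deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# 1998 rows of Table 2, in table order: SSF/9 and Fielding Runs.
ssf_per_9_1998 = [5.04, 4.49, 4.42, 4.41, 4.40, 4.35,
                  4.11, 4.08, 3.99, 3.99, 3.84, 3.69]
fr_1998 = [16.10, -7.33, 18.26, 11.63, 2.16, 2.06,
           -7.68, 7.44, -3.63, -20.02, -15.26, -9.40]

r_1998 = pearson_r(ssf_per_9_1998, fr_1998)
print(round(r_1998, 2))
```

Swapping in the %GB column for SSF/9 gives the second kind of comparison mentioned above.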

Fielding Runs falls short on a number of counts. The method, as noted above,
fails to account for the effects of team defensive context; players who see more
balls will generally rank higher than players who see fewer balls in their vicinity.
Because it ignores hits, Fielding Runs also fails to consider defensive failures other
than errors. Finally, Fielding Runs penalizes fielders for plays
made by other fielders on their team. Charles Saeger, in his 1999 BBBA article in
which he details Context-Adjusted Defense, demonstrates this point nicely:

“Suppose a team allows an average number of hits, but strikes out 100 fewer
batters than average. Also suppose that the left fielder recorded 40 of those
extra outs and the second baseman recorded the other 60. Now, the shortstop
recorded an average number of outs. ... the Palmer method ... would show him
to be a below-average fielder, since (it) posits that he had 100 more chances
to field a ball.”
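
Saeger's point can be illustrated with a deliberately simplified model in which a shortstop's expected plays are a fixed share of his team's balls in play. This is a stand-in for the proportional logic being criticized, not Palmer's actual Fielding Runs formula, and every number below (the 4000 league-average balls in play, the 0.125 shortstop share) is hypothetical.

```python
def plays_above_expected(actual_plays, team_bip, league_ss_share):
    """Plays made minus the plays a proportional model would expect.

    Assumption: expected plays = league-average shortstop share of
    balls in play, multiplied by the team's balls in play.
    """
    expected_plays = league_ss_share * team_bip
    return actual_plays - expected_plays


LEAGUE_AVG_BIP = 4000      # hypothetical league-average team balls in play
LEAGUE_SS_SHARE = 0.125    # hypothetical share of those handled by the SS

# Average team: an average shortstop rates exactly average.
print(plays_above_expected(500, LEAGUE_AVG_BIP, LEAGUE_SS_SHARE))        # → 0.0

# Same shortstop, same 500 plays, but the staff struck out 100 fewer
# batters, so the team allowed 100 more balls in play.
print(plays_above_expected(500, LEAGUE_AVG_BIP + 100, LEAGUE_SS_SHARE))  # → -12.5
```

The shortstop's own performance did not change between the two calls; only the team total did, and the proportional model charges him for it.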

Reader Comments and Retorts

We could dance about many of these things for hours. I really appreciate and enjoy the work you are putting out here.

I looked at the GB/FB ratio in your charts. Why is it 0.85 instead of 1.16? Is that a GDP thing?

I looked at the zone % (z56, z6, z6M). There is a scoring change between 1998 and 1999/00. The average z6M% in 1998 is 0.154. In 1999 and 2000 the average z6M% is 0.366 and 0.358. There is a shift from the 98 z6% to the 99/00 z6M%. The z56% goes .408=>0.351/0.351 (same % in 99 & 00).

Probably will, too :) Maybe we can host a Primer Chat at some point down the road after I finish a few more installments of this series.

I looked at the zone % (z56, z6, z6M). There is a scoring change between 1998 and 1999/00. The average z6M% in 1998 is 0.154. In 1999 and 2000 the average z6M% is 0.366 and 0.358. There is a shift from the 98 z6% to the 99/00 z6M%. The z56% goes .408=>0.351/0.351 (same % in 99 & 00).

I noticed that, too. I think the zone assignment system used for balls in play was refined between 1998 and 1999. I don't know whether Pete and Gary had access to STATS' stuff; IIRC STATS made some zone changes after 1998, and Pete and Gary might have followed the STATS assignments. The net effect is small, I think, because there's not a whole lot of difference between z6 and z6M in terms of conversion of opportunities - but in light of what the data shows about Jeter's fielding skills in each area (which is coming in later installments) I should split 1998 out and look at 1999-2000 separately.

Note that these are fielding opportunities, not balls actually fielded by the SS.

What I'm most interested in seeing (and hopefully this will come in a later installment) is how well the estimated number of chances a SS would see (based on team balls in play, pitcher handedness and ground-fly ball ratio) correlates with the SSF/9 stat above...

It's coming, in part 4. Context-Adjusted Defense is the first method (and AFAIK the only method) that directly attempts to place SS chances in this context. DFTs (part 3) don't do this directly, and Win Shares doesn't do it at all.

I suppose I should lay out the rest of the installments, just so
people have the plan:

Part 3: Davenport Fielding Translations (DFTs)
Part 4: Context-Adjusted Defense (CAD)
Part 5: Win Shares
Part 6: ZR/UZR
Part 7: What we can learn from the play-by-play data
Part 8: Summary and conclusions

Dan S has Part 3, and I assume he'll post it later in the week. I need Charlie to review Part 4, which is about 90% done (I didn't talk about CAD in Boston because the revisions were still in process, and I wanted to wait until Charlie's article appeared here). Part 5 was written for the SABR32 presentation, but I wanted to present DFTs and CAD first to demonstrate the extent to which James used things that Clay and Charlie had already done. I have outlines for the rest of the series.

I have STATS zones/assignments. There weren't any at that time - they changed from double-counting DPs then. The zones wouldn't change wrt BIP anyway.

They did something in the outfield, I know - OF zone ratings went way up between 1998 and 1999. But I suspect you're right and that there was a transcription error in 1998 that caused a number of z6M balls to be labeled as z6. Have to remember to ask Gary why that is.

I have a little free time this weekend, so I am going to redo my UZR methodology, incorporating some "positioning" and L/R factors. When I'm done, I'll write a little article describing the methodology, and including the results. Hopefully they'll put it up on this web site. Unfortunately I don't have my 2002 data (I will soon), so the results will just be a rehash, with the "tweaked" methodology, of what was already presented in my Superlwts articles...

Mike, I was going to ask the same question as Doug. I'm sure you'll go back and see this in Part 1, anyway, but I wanted to re-ask the question for those that won't. So, here's Doug's question:

November 5, 2002 - Doug

-- How many of these balls would you estimate there are per team/season? -- Which teams give up more of them? Which teams give up fewer? -- What kind of pitchers give up more of them? What kind of pitchers give up fewer of them? -- What are the techniques that you would use to estimate the number of these balls that are put into play? -- How would you verify that your estimate was of the right order of magnitude?

OK Mike, good questions, but I think there should be ways to answer them in a reasonable way. And, gut feel, I just can't imagine that the unfieldable BIPs are negligible.

You alluded in your piece to the fact that your study makes use of Play-by-Play data. I haven't ever seen these data but, from what I've read in other pieces, included are, among other things, a location code to which every BIP is hit - and it's pretty precise, breaking the field down into several dozen distinct zones. So, if you've got that data, you can tell, pretty much straight-away, in which zones BIPs are seldom turned into outs, and in which zones outs come much more readily. Seems like a reasonable basis for coming up with estimates of how many BIPs really should be considered fieldable.

To take it a step further, if the play-by-play data tells you what zone a BIP was hit to, it probably also tells you who fielded it. So that should also give you a start on coming up with estimates on expectations for which fielder is more likely to field a ball in a certain location (or, to put it another way, to come up with a fairer way to establish expectations on which fielder should handle which BIPs).

If I'm wrong about the play-by-play data, and none of the stuff I've talked about exists, then please disregard my points. But I do think it exists somewhere, based on other pieces I've read on this site. And, obviously, I'm only talking conceptually here and it would undoubtedly be a mountain of work to account for all the nuances, but my point is I think it's quite possible to come up with reasonable answers to your questions. I guess you can tell I've got no problem with doing estimates. My view is I think it's utterly impractical to assume that methods used to try to answer meaningful questions about the game will ever be 100% accurate 100% of the time. Just isn't going to happen. So, instead, why not try to make the best insights you can based on whatever data you've got? Having said that, I completely respect that you may have a different philosophical bent on this.

So, if you've got that data, you can tell, pretty much straight-away, in which zones BIPs are seldom turned into outs, and in which zones outs come much more readily. Seems like a reasonable basis for coming up with estimates of how many BIPs really should be considered fieldable.

I took a pretty difficult fielding situation, a line drive hit just over the infield into one of the xD zones, where x is an infield position. There are eight of these zones - 3D, 34D, 4D, 4MD, 6MD, 6D, 56D, and 5D. I looked at these using the 2000 PBP data.

I suppose those who want to make the uncatchable argument would note that the plays in the zones nearest the lines were *rarely* made - but I note that players manage to catch some balls even in those areas of the field, and thus one can't claim that the balls hit in that area were totally uncatchable. This is about the worst case fielding situation that I can imagine, and there aren't any places where plays are never made.

I myself have cast a trout eye on Fielding Runs ever since I perused my spankin' new 1989 edition of Total Baseball (plucked out of a bookstore's albatross bin in mid-1990). I have always been taken by Jerry Martin's curious 1976 season, where (as Greg Luzinski's Designated Glove Caddy(TM)) he had 130 games and only 129 PAs. That has to be some kind of record! But I digress.

Anyway, I look up in my 1989 Total Baseball Jerry Martin's Fielding Runs for that year. Imagine my deep and abiding shock to see attached some historically large negatives thereunto. "That ain't right," said I, and so I still do say. And I bet Danny Ozark would say it, too.