The Ball in Play

By Bill James

April 28, 2016

I am being peppered in "Hey, Bill" with questions about Batting Averages on Balls in Play. They are good questions, but I did a couple of short studies related to the issue, and I thought I would report those here. I’ll answer the "Hey, Bill" questions as well, some of them by referencing this study.

One of the core functions of Sabermetrics is to distinguish between what there is in the statistics which is real and can be relied upon, and what there is that is merely a manifestation of luck. The first meaningful illustration of this in our field had to do with pitchers’ won-lost records. Up through the 1970s, it was almost universally accepted by baseball men that a pitcher’s won-lost record was the truest indicator of how well he had pitched, because "luck evens out over the course of a season." If a pitcher went 14-10 but with a 4.38 ERA, while a teammate went 9-15 but with an ERA a run better, people would say that the 14-10 pitcher had pitched well when it mattered, that he was a winner, that he knew how to win, that he knew how to close out a victory, etc. etc. Which, by the way, is the actual data from Joe Niekro and Ken Holtzman with the 1968 Cubs; Niekro was 14-10 with a 4.38 ERA, while Holtzman was 9-15 but with a 3.35 ERA. The next year Holtzman went 17-13, while Niekro went 8-18, so apparently over the winter he forgot how to win, although he later remembered.

In 1978 Jim Slaton went 17-11 with a 4.12 ERA, while a teammate, Dave Rozema, went 9-12 with a 3.14 ERA. The same year Chris Knapp went 14-8 with a 4.21 ERA while a teammate, Paul Hartzell, pitched almost as many innings but went 6-10 with a 3.44 ERA. In 1977 Bob Forsch went 20-7 with a 3.48 ERA, while a teammate, Harry Rasmussen, posted the exact same ERA in more innings, but went 11-17. The explanation that prevailed at the time was that Forsch was a winner. The next year Forsch went 11-17. Harry Rasmussen changed his name to Eric.

What we saw, in our field, was that these things were not actually a result of a pitcher being a "winner" or a "loser", but of luck not evening out. When you actually looked at the number of runs scored for the "winners" and the number of runs scored for the losers, this was pretty obvious. All we were trying to say was that the won-lost record, in some outlying cases, some relatively unusual cases, was misleading because the element of luck reflected in it was imbalanced.

For ten years after we began making this argument, people pitched silly arguments at us in an effort to defend the traditional view of the issue. People took what we had said as if we had said that a pitcher’s won-lost record was meaningless. Well, they would say, but the pitcher himself is a hitter in the National League. Isn’t one of the reasons that one pitcher gets more runs than another just because he is a better hitter? And what about park effects... doesn’t the pitcher who gets more runs usually pitch in the high-run parks? And won’t he continue to pitch in the high-run parks the next year? And... here’s one I heard a thousand times... "That’s what I’m saying, Bill. Blyleven pitched just well enough to lose."

Eventually, almost everyone who works in baseball came to understand what we were saying. Nobody who makes decisions for a baseball team in 2016 really believes that the won-lost record is a reliable indicator of how well a pitcher has pitched, although the old-line thinking survives among a few announcers and the oddball scout. This is what we do, in our field. If a team scores 670 runs and allows 700 runs but finishes 90-72, that isn’t a special, hidden ability; it’s luck. If a team goes 30-15 in one-run games, that’s luck. I don’t care how many clever ways you come up with to explain why it might not be luck; it’s luck. If 16% of the fly balls allowed by a pitcher turn into home runs, that’s bad luck. I don’t care how many ways you come up with to explain the subtleties of pitching to me; that’s just bad luck.
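To put a rough number on the 670/700 example: one common way to estimate how many games a team "should" win from its runs is the Pythagorean expectation. That formula is not spelled out in this article, so treat the sketch below as an illustration under assumptions: the classic exponent of 2, a 162-game schedule, and a function name invented for the example.

```python
def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Estimate wins from runs scored and allowed (assumed exponent of 2)."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return games * scored / (scored + allowed)


# The team from the paragraph above: 670 runs scored, 700 allowed, 90-72 record.
expected = pythagorean_wins(670, 700)
print(f"Expected wins: {expected:.1f}")
print(f"Actual minus expected: {90 - expected:+.1f}")
```

Under those assumptions the team projects to a losing record, so a 90-72 finish sits a dozen or so wins above what its runs would suggest.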

Allen Craig hit .400 with runners in scoring position in 2012, and .454 in 2013. That’s luck. Nobody actually has an ability to hit with runners in scoring position. The ability is to hit; there is no special ability which arises only in limited circumstances. When it happens, it’s luck. If it happens twice in a row, it’s just a LARGE pile of luck.

This Ball in Play issue is structurally identical to the run support/won-lost record debate. What we are trying to say is that a batting average in many cases involves an element of luck, and that we can actually see what that element of luck is by looking at the player’s batting average on balls in play. If he has hit .370 on balls in play, he’s exactly like a pitcher who got 6 runs a game to work with last year: he was lucky, and he’ll be less lucky next year, and you can bet on it.

Apparently, based on my mail, this message hasn’t been widely understood yet. Aren’t there some hitters who hit .340 or .345 on balls in play over the course of a career? Well, yes, of course there are a few hitters who have better-than-normal ability to hit the open spots in the defense; no one ever suggested that there were not. That’s like the occasional pitcher who hits .250 and has a little power, so he bumps up his offensive support a little bit. If a player hits a lot of line drives, doesn’t that increase his batting average on balls in play? Well, first of all, if a player hits a lot of line drives, that’s luck. 21% of balls in play are Line Drives. Miguel Cabrera hits 22% (Fangraphs). Derek Jeter in his career was 20%. David Ortiz in his career has been 20%. If you hit 25%, you’ve just been lucky.

I took all players in history who had 6000 or more plate appearances, and who were not active either in 1912 or before (because strikeout data is spotty from before 1912) and are not active now (meaning that they did not play in 2015, except that I included Torii Hunter and Michael Cuddyer because they have announced their retirements). This came to 499 players...

You use long-career data for studies of aging so that you don’t get people popping into and out of the study. If you don’t use long-career data you get a constantly changing pool of players.

So, these 499 players... these are their Batting Averages on Balls in Play, by age:

AGE   IPAvg     AGE   IPAvg     AGE   IPAvg
21    .301      27    .305      33    .297
22    .301      28    .304      34    .297
23    .305      29    .304      35    .295
24    .305      30    .303      36    .293
25    .306      31    .302      37    .293
26    .308      32    .300      38    .288

This is the area in which we have at least 50,000 plate appearances at each age. The batting average on balls in play starts at .301 at age 21, goes up until age 26, and declines after age 26.
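As a quick check on the age pattern, here is a minimal sketch that reads the peak and the year-to-year changes straight off the table. The figures are the ones listed above; the variable names are only for illustration.

```python
# In-play batting average by age, copied from the table above.
ipavg_by_age = {
    21: .301, 22: .301, 23: .305, 24: .305, 25: .306, 26: .308,
    27: .305, 28: .304, 29: .304, 30: .303, 31: .302, 32: .300,
    33: .297, 34: .297, 35: .295, 36: .293, 37: .293, 38: .288,
}

# Age with the highest in-play average.
peak_age = max(ipavg_by_age, key=ipavg_by_age.get)
print(f"Peak age: {peak_age} ({ipavg_by_age[peak_age]:.3f})")

# Change from each age to the next, in points of batting average.
ages = sorted(ipavg_by_age)
for younger, older in zip(ages, ages[1:]):
    change = round((ipavg_by_age[older] - ipavg_by_age[younger]) * 1000)
    print(f"{younger} -> {older}: {change:+d} points")
```

After the peak, no age in the table shows an increase over the age before it.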

Well, that’s interesting... there is an age-related pattern. OK, now that I’ve got the data together, let’s look at some other things.

In my data, these are the highest single-season batting averages on balls in play (300 or more plate appearances):

First    Last       YEAR   AGE   BIP   Hits   IPAvg
Babe     Ruth       1923   28    388   164    .423
Rogers   Hornsby    1924   28    479   202    .422
Harry    Heilmann   1923   28    466   193    .414
Rod      Carew      1977   31    547   225    .411
Rogers   Hornsby    1921   25    523   214    .409

From this we learn that the highest batting averages on balls in play were in the early 1920s. These aren’t the highest ever, possibly; these are just the highest in my data, which covers only players who had long careers. In the dead ball era the outfielders played very shallow, much more shallow than they do today. When the lively ball era arrived (1920) you had outfielders playing shallow with batters who were capable of hitting the ball over their heads. These are the lowest in my data:

First     Last        YEAR   AGE   BIP   Hits   IPAvg
Ted       Simmons     1981   31    334   68     .204
Dick      McAuliffe   1971   31    392   81     .207
Al        Cowens      1983   31    311   66     .212
Everett   Scott       1915   22    338   72     .213
Willie    Jones       1953   27    417   89     .213
Frank     Bolling     1964   32    303   65     .215
Ed        Brinkman    1965   23    357   77     .216
Ed        Brinkman    1972   30    459   99     .216
Carlton   Fisk        1985   37    425   92     .216

So you can hit anywhere from .200 to .400 on balls in play, basically.
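The IPAvg column in both tables is simply Hits divided by BIP. The sketch below reproduces a few rows to show the arithmetic. The article does not spell out exactly how balls in play and hits are counted (for instance, how home runs are treated), so the code only divides the two published columns; the function name is made up for the example.

```python
def in_play_average(hits, balls_in_play):
    """Hits on balls in play divided by balls in play."""
    if balls_in_play <= 0:
        raise ValueError("balls_in_play must be positive")
    return hits / balls_in_play


# (player, year, BIP, Hits) taken from the two tables above.
seasons = [
    ("Babe Ruth", 1923, 388, 164),
    ("Rod Carew", 1977, 547, 225),
    ("Ted Simmons", 1981, 334, 68),
    ("Carlton Fisk", 1985, 425, 92),
]

for player, year, bip, hits in seasons:
    print(f"{player} {year}: {in_play_average(hits, bip):.3f}")
```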

Next thing I looked at was batting averages on balls in play of players up to age 28, and after age 28. Up to age 28, the highest (career) batting averages on balls in play were these ten guys:

First    Last      IPAvg to age 28
Rogers   Hornsby   .370
Wade     Boggs     .370
Kiki     Cuyler    .370
Rod      Carew     .367
George   Sisler    .361
Paul     Waner     .360
Derek    Jeter     .357
Jeff     Conine    .355
Al       Simmons   .353
Earle    Combs     .353

Conine is kind of a fluke there; he didn’t have many plate appearances up to age 28. Anyway, one of my questions in "Hey, Bill" was to the effect that Wade Boggs was consistent in having very high Batting Averages on Balls in Play. Actually, Boggs’ batting average on balls in play was .370 up to age 28, but .337 after that—the largest slippage on this list, other than Conine.

First    Last      IPAvg to age 28   IPAvg after 28
Rogers   Hornsby   .370              .357
Wade     Boggs     .370              .337
Kiki     Cuyler    .370              .338
Rod      Carew     .367              .358
George   Sisler    .361              .337
Paul     Waner     .360              .329
Derek    Jeter     .357              .344
Jeff     Conine    .355              .309
Al       Simmons   .353              .323
Earle    Combs     .353              .328

But everybody regresses toward the mean, over time. Players who hit .340 up to age 28 will hit less than .340 after 28, although they will stay higher-than-normal. These are the players who have the lowest in-play averages up to age 28:

First    Last        IPAvg to age 28   IPAvg after 28
Mark     McGwire     .228              .272
Graig    Nettles     .232              .250
Harmon   Killebrew   .237              .254
Dave     Kingman     .239              .253
Rocky    Colavito    .239              .258
Ralph    Kiner       .243              .264
Tom      Brunansky   .245              .275
Willie   Jones       .247              .261
John     Mayberry    .248              .249
Matt     Williams    .249              .303

The guys with the lowest in-play averages are the guys with the uppercuts, the home run swings. Speed also has something to do with it. If you have a very low in-play average up to age 28 it will go up after age 28, but it will remain low.
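The regression in the two before-and-after tables can be summarized in a few lines. This is only a sketch: it uses simple unweighted averages of the ten players in each group (an assumption made for brevity, since the tables do not give plate appearances), and the variable names are illustrative.

```python
# (player, in-play average up to age 28, in-play average after 28),
# copied from the two tables above.
highest = [
    ("Rogers Hornsby", .370, .357), ("Wade Boggs", .370, .337),
    ("Kiki Cuyler", .370, .338), ("Rod Carew", .367, .358),
    ("George Sisler", .361, .337), ("Paul Waner", .360, .329),
    ("Derek Jeter", .357, .344), ("Jeff Conine", .355, .309),
    ("Al Simmons", .353, .323), ("Earle Combs", .353, .328),
]
lowest = [
    ("Mark McGwire", .228, .272), ("Graig Nettles", .232, .250),
    ("Harmon Killebrew", .237, .254), ("Dave Kingman", .239, .253),
    ("Rocky Colavito", .239, .258), ("Ralph Kiner", .243, .264),
    ("Tom Brunansky", .245, .275), ("Willie Jones", .247, .261),
    ("John Mayberry", .248, .249), ("Matt Williams", .249, .303),
]


def mean(values):
    """Unweighted average of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


for label, group in (("Highest ten", highest), ("Lowest ten", lowest)):
    before = mean(b for _, b, _ in group)
    after = mean(a for _, _, a in group)
    print(f"{label}: {before:.3f} up to 28, {after:.3f} after ({after - before:+.3f})")
```

Every player in the high group drops and every player in the low group rises, yet the two groups stay well apart after age 28.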

Anyway, I did one more study, and this is really the critical one; this is the money shot. The essence of the issue is:

a) whether the player has a high in-play average in the base season, and

b) whether his batting average goes up or down in the next season.

If a player hits .385 on balls in play in one season, how likely is it that his batting average will go down the next season? Not his in-play batting average; his OVERALL batting average. Is it 60%? 70%?

It’s over 80%. If we had more data, it might be 90%.

I sorted the players by their in-play batting averages, what people have taken to calling BABIP. Hate that expression; it’s like scraping chalk on the blackboard. Anyway, Group 1 was players who had an in-play average of .380 or above, Group 2 was .360 to .379, Group 3 was .340 to .359, Group 4 was .320 to .339, etc. Group 10 was below .220.
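The grouping can be sketched as a small function. It assumes in-play averages rounded to three decimals and follows the 20-point bands described here, with Group 9 covering .220 to .239 and Group 10 taken as anything below .220; the function name is hypothetical.

```python
def in_play_group(in_play_avg):
    """Return the group number (1-10) for a three-decimal in-play average."""
    points = round(in_play_avg * 1000)  # work in whole points to avoid float edge cases
    if points >= 380:
        return 1
    if points < 220:
        return 10
    # Groups 2 through 9 are 20-point bands: .360-.379, .340-.359, ..., .220-.239.
    return 2 + (379 - points) // 20


for avg in (.385, .370, .330, .270, .220, .219):
    print(f"{avg:.3f} -> Group {in_play_group(avg)}")
```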

Of those who had an in-play average of .360 to .379, 85% had a drop in their overall batting average in the following season (if they had 300 or more plate appearances in the following season.) Of those who had an in-play average of .220 to .239, 90% had an INCREASE in their overall batting average in the following season:

Group   In-Play Average   Count   Up    Down   Majority Pct
1       .380 and up       68      12    56     82%
2       .360 to .379      175     27    148    85%
3       .340 to .359      402     102   300    75%
4       .320 to .339      844     286   558    66%
5       .300 to .319      1202    532   670    56%
6       .280 to .299      1125    611   513    54%
7       .260 to .279      684     475   209    69%
8       .240 to .259      283     220   63     78%
9       .220 to .239      77      69    8      90%
10      Below .220        13      12    1      92%

At the ends of the chart, we approach predictions with near-100% reliability, 90% or thereabouts. But what is more surprising, to me, is the strength of the indicator even when the deviance from normal is relatively modest. If a player hits just .330 on balls in play, a fairly modest average that is really still in the fat part of the chart, the odds that his batting average will decline the following season (if he has 300 or more plate appearances) are two to one. If he hits just .270 on balls in play, the odds that his batting average will increase are two to one. So even in the fat part of the chart, the Ball in Play average provides a very strong indicator of whether the player has been living on luck.
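The percentages and the "two to one" odds follow directly from the Up and Down counts in the chart. A minimal sketch, using only the published counts (the helper name is illustrative):

```python
# (group, in-play range, count, up, down) copied from the chart above.
rows = [
    (1, ".380 and up", 68, 12, 56),
    (2, ".360 to .379", 175, 27, 148),
    (3, ".340 to .359", 402, 102, 300),
    (4, ".320 to .339", 844, 286, 558),
    (5, ".300 to .319", 1202, 532, 670),
    (6, ".280 to .299", 1125, 611, 513),
    (7, ".260 to .279", 684, 475, 209),
    (8, ".240 to .259", 283, 220, 63),
    (9, ".220 to .239", 77, 69, 8),
    (10, "Below .220", 13, 12, 1),
]


def majority_pct(count, up, down):
    """Share of the group that moved in the more common direction, in percent."""
    return 100 * max(up, down) / count


for group, label, count, up, down in rows:
    direction = "down" if down > up else "up"
    odds = max(up, down) / min(up, down)
    print(f"Group {group:>2} ({label}): {majority_pct(count, up, down):.0f}% {direction}, "
          f"odds about {odds:.1f} to 1")
```

Groups 4 and 7, which contain the .330 and .270 hitters, both come out at roughly two to one.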

Thanks for reading.

COMMENTS (12 Comments, most recent shown first)


Steven Goldleaf

The part that I still have trouble with, after reading this patient explanation is: aren't hitters TRYING to hit line drives? If that's so, and I think it is, why wouldn't some batters be better at it than others? If you threw a hundred MLB fastballs to me, I'd be lucky to hit more than one line drive. If you throw 100 to a minor-leaguer, he'll hit maybe 10 line drives. If you throw to a AAAA batter, he'll hit fifteen, and to a MLB regular 20. Isn't that because they're skilled at hitting line drives?

1:36 PM May 2nd

shthar

So is BABIP real and can be relied upon, or is it merely a manifestation of luck?

1:09 PM May 2nd

tangotiger

Not to get too much into the weeds: I define the point at which the stat you observe is 50% luck and 50% talent the point at which the observed standard deviation is 1.41 (or square root of 2) times the random variation standard deviation (luck).

For example, that group of pitchers that I mentioned: they averaged 3064 balls in play (BIP). When you allow that many BIP, the amount of random variation you will get is one standard deviation = .0083, or sqrt(.3*.7/3064). That is, had we observed the spread in in-play batting average of these pitchers was .0083, we'd conclude EVERYTHING was luck. But since we deal with humans, nothing can be COMPLETELY luck.

What we actually observed among these pitchers was one SD = .0125, or 1.50 x random variation. So, at 3064 balls in play, the spread we observe is a bit more talent than luck. At about 2400 balls in play, we'd find that what we observe is an equal amount of luck and talent.

For things like strikeouts, the number of PA needed falls DRASTICALLY, somewhere around 150 PA or less.

12:14 AM Apr 29th
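The arithmetic in the comment above can be checked with a short sketch. It assumes, as the comment's numbers imply, that luck is binomial with a .300 in-play average and that luck and talent variances simply add up to the observed variance. The function names are invented for the example.

```python
import math


def luck_sd(p, n):
    """Standard deviation from random variation alone over n balls in play."""
    return math.sqrt(p * (1 - p) / n)


def talent_sd(observed_sd, random_sd):
    """Back out the talent spread, assuming variances add."""
    return math.sqrt(observed_sd ** 2 - random_sd ** 2)


def even_split_sample(p, true_talent_sd):
    """Balls in play at which luck variance equals talent variance."""
    return p * (1 - p) / true_talent_sd ** 2


p, n, observed = 0.3, 3064, 0.0125    # figures quoted in the comment
luck = luck_sd(p, n)
talent = talent_sd(observed, luck)

print(f"Luck SD at {n} BIP: {luck:.4f}")
print(f"Observed / luck: {observed / luck:.2f}")
print(f"Talent SD: {talent:.4f}")
print(f"50/50 point: about {even_split_sample(p, talent):.0f} balls in play")
```

Under those assumptions the numbers land close to the ones quoted: a luck SD near .0083, a ratio near 1.5, and a 50/50 point in the neighborhood of 2400 balls in play.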

studes

You hit the ball out of the park, that's not luck.

True if you're just defining the fielding part of the equation as luck. But there is some luck in home run rates (admittedly not nearly as much as in BABIP). Pitchers, ballparks, wind blowing, gameday temperature, humidity can vary from game-to-game--even from PA to PA. These all impact BABIP too.

Guess it all depends on what you're trying to measure.

7:37 PM Apr 28th

studes

Pizza Cutter, for one example, found that batter BABIP stabilizes at 820 balls in play, while it takes 2,000 balls in play for pitcher BABIP to stabilize.

He defines stable as the point at which future outcomes can be predicted equally well by either player-specific data or by general randomness. I hope I said that correctly. Tango can do better--I know he has some issues with Pizza's definition.

If I'm thinking of my proportions right, at 200 balls in play, the batter's performance would be 80% "luck" and the pitcher's would be 90% "luck". But the difference would get wider as the data increases, maybe maxing out at around a 20 point difference.

At every step, Voros would remove one component, so that each metric is independent of the others, a very binary tree approach.

Of course, there's no real reason to have to follow that form. For example, in the third line, you can instead say:

$H = H/(PA-BB-SO) or BACON

$HR = HR/H

And so on. The idea is to try to isolate events in a way that makes sense.

There's a dozen other combinations you can try.

6:19 PM Apr 28th

bjames

I disagree with the point about BACON being more useful than BABIP, although it is certainly a better acronym... I mean, who doesn't like Bacon. If you offered me BABIP with my breakfast, I'd say, "No, thanks; I threw up yesterday." But BACON isn't useful because it doesn't measure what is truly relevant, which is the extent to which the batter has just been lucky. You hit the ball out of the park, that's not luck.

Anyway, while it is true that the "skill" element for hitters is larger than it is for pitchers, this is more misleading than instructive. For pitchers, it is 90% luck and 10% skill. For hitters, it is 80% luck and 20% skill. But it's still mostly a transient phenomenon. And thank God it's not a transient epiphenomenon.

1:16 PM Apr 28th

Scott_Ross

I'd like to add that one of the things that gets lost in a lot of BABIP talk is that each player has their own baseline BABIP, though they all hover around .300. And so where I find BABIP particularly useful is when I see a guy whose average is way down or way up, e.g. Daniel Murphy is batting .391 right now, looking at his career BABIP heading into this year, it was .314, and so far this year it's .439, and so we can see that he's been very lucky. Conversely, Kyle Seager is batting .143, but his 2016 BABIP is .121, well down from his career BABIP of .288. It can be an unsatisfying statistic for a lot of people because its real value is confirming or debunking what other stats are telling us.

1:13 PM Apr 28th

schoolshrink

Great article. A cool study might be pre-steroid, steroid, and post-steroid era comparisons. McGwire obviously sticks out, but can BABIP be used to set apart steroid-era hitters from other eras, as if we need more evidence to suggest the impact of steroids? Just a thought.

11:53 AM Apr 28th

tangotiger

I sent a note to Bill the other day. I should have copy/pasted it. Anyway, it was something like I took all players born since 1971, looked at their stats age 23-29, min 3000 PA. I got something like 185 or 188 pitchers and batters each.

The spread in BABIP was one SD = .012 for pitchers, .020 for hitters.

If it was pure random, we'd have expected one SD = .008 or something. So, backing out that number, that leaves us with a true talent level of one SD = .009 pitchers .018 hitters

Essentially, the spread in talent is twice as wide for hitters than pitchers.

Then again, if you look at HR and SO and BB rates, you'll get wider for hitters than pitchers.

The point is that for BABIP, the spread in talent and the spread in luck, for pitchers is around the same amount, when PA = 3000 to 6000 or so.

For batters, it's not as bad.

***

And I agree with Studes that, generally speaking for HITTERS, wobaCON (wOBA on Contact) is more useful than removing HR from the numerator and denominator.

Makes much more sense for pitchers.

11:34 AM Apr 28th

studes

For hitters, I think I like BACON (Batting Average on Contact) better than BABIP or IPAvg. BACON includes home runs. I get why we look at BABIP (or IPAvg) for pitchers, but I don't really get why it's useful to do so for batters. I mean, I've done it lots of times but I don't know why.

I also want to mention that every study I've seen shows that there is much more luck involved with pitcher BABIP than hitter BABIP. Hitters have more consistent ability to influence their batted ball outcomes.

Pop-up rate is a better predictor of future hitter BABIP than line drive rate. There's lots of variance in line drive rates, but pop-up rates tend more to be an intrinsic "skill" of a batter.

10:57 AM Apr 28th

mrbryan

Thanks, Bill! I've always found your work to be an antidote to the run of the mill thoughtless comments that pass for discussion in sports. I don't think a day goes by in which one does not hear reference to a player who is "showing signs of emerging from a slump" or "very dangerous right now, because he is hot." One repeatedly hears reference to players gaining or losing skills which cannot be measured - "he came in to the game 0 for 10 against this pitcher, but he must have learned something because he has two hits today," or the never-ending discussions of "clutch hitters" and the importance of batting order. Articles like this one are a glass of cool fresh water set apart from a great salty sea.

10:03 AM Apr 28th