The sabermetric blog of Cyril "Cy" Morong, professor of Economics at San Antonio College

Thursday, August 19, 2010

Batting Average On Balls In Play Over Time

I used data from Baseball Reference and broke things down by league for 1901-2010. Balls in play (BIP) = BFP - HR - SO - BB - HBP, where BFP is batters faced by pitchers. Hits on BIP = H - HR, so BABIP is hits on BIP divided by BIP. Here is the graph for the AL.
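To make the definitions concrete, here is a minimal Python sketch of the calculation. The function name and the league totals are made up for illustration; they are not the Baseball Reference numbers behind the graph.

```python
def babip(bfp, h, hr, so, bb, hbp):
    """Batting average on balls in play, using the definitions in the post."""
    balls_in_play = bfp - hr - so - bb - hbp  # BIP = BFP - HR - SO - BB - HBP
    hits_on_bip = h - hr                      # home runs are not balls in play
    return hits_on_bip / balls_in_play


# Hypothetical league totals for one season (illustrative only).
example = babip(bfp=90000, h=21000, hr=2200, so=15000, bb=7500, hbp=800)
print(round(example, 3))
```

Running the same function on each league-season row would give the year-by-year series that gets plotted.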

I don't know why it has changed over time and I am not even sure if it is important. But I was curious to see what it was. Now the NL, which has a similar pattern.

The next graph shows the difference between BABIP and overall AVG for the AL, year-by-year. Again, I don't know if it is important or why it changes.
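The difference series can be sketched the same way. I am assuming here that overall AVG is the usual H / AB; the post does not say which at-bat figure was used, so the `ab` field, the function name, and the season totals below are all hypothetical.

```python
def babip_minus_avg(seasons):
    """Return {year: BABIP - AVG} for a dict of per-season league totals."""
    diffs = {}
    for year, s in seasons.items():
        bip = s["bfp"] - s["hr"] - s["so"] - s["bb"] - s["hbp"]
        babip = (s["h"] - s["hr"]) / bip
        avg = s["h"] / s["ab"]  # assumption: AVG = H / AB
        diffs[year] = babip - avg
    return diffs


# Made-up totals for two seasons (not real league data).
seasons = {
    2009: {"bfp": 90000, "ab": 80000, "h": 21000, "hr": 2200,
           "so": 15000, "bb": 7500, "hbp": 800},
    2010: {"bfp": 90000, "ab": 80000, "h": 20600, "hr": 2000,
           "so": 15600, "bb": 7300, "hbp": 800},
}
for year, diff in sorted(babip_minus_avg(seasons).items()):
    print(year, round(diff, 4))
```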

Now for the NL.

A few years ago I ran a regression using league-wide data for the AL from 1920-2002. BABIP was the dependent variable and the frequencies of HRs, SOs, and BBs were the independent variables. The equation was:

BABIP = .338 + 2.19*HR - .467*SO - .48*BB

T-values

HR 5.28
SO -56.1
BB -2.89
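For anyone who wants to run this kind of regression, here is a rough sketch of ordinary least squares with t-values using NumPy. The data are synthetic, generated from the equation above plus random noise, so the output will not reproduce the t-values listed here. The function name is hypothetical, and the ranges chosen for the HR, SO, and BB rates are my own assumption (roughly per batter faced).

```python
import numpy as np


def ols_with_t_values(X, y):
    """Fit y = b0 + X @ b by least squares; return coefficients and t-values."""
    X = np.column_stack([np.ones(len(y)), X])     # add the intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares coefficients
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]                 # observations minus parameters
    sigma2 = resid @ resid / dof                  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)         # coefficient covariance matrix
    se = np.sqrt(np.diag(cov))                    # standard errors
    return beta, beta / se


# Synthetic league-seasons (NOT real data): 83 seasons, like 1920-2002.
rng = np.random.default_rng(0)
n = 83
hr = rng.uniform(0.005, 0.030, n)  # assumed HR rate range
so = rng.uniform(0.07, 0.17, n)    # assumed SO rate range
bb = rng.uniform(0.07, 0.11, n)    # assumed BB rate range
noise = rng.normal(0.0, 0.0097, n)
y = 0.338 + 2.19 * hr - 0.467 * so - 0.48 * bb + noise

coefs, t_values = ols_with_t_values(np.column_stack([hr, so, bb]), y)
for name, b, t in zip(["const", "HR", "SO", "BB"], coefs, t_values):
    print(f"{name:5s} coef={b: .3f}  t={t: .2f}")
```

Each t-value is just the coefficient divided by its standard error, which is why a small coefficient can still be significant if it is estimated precisely.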

So it looks like all the variables were significant. The r-squared was .287 and the standard error of the regression was .0097. So it seems like when HR frequency rises, BABIP also rises. Maybe this means that balls are being hit harder, so they are harder to catch, or it is just a general lack of good pitching. If SO rate rises, BABIP falls. If it is harder to make contact, it might be harder to make solid contact. But if BB rate rises, BABIP again falls. Maybe it means that there are more pitches outside the strike zone, making it harder to get good wood on the ball.
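To see the direction and rough size of each effect, the fitted equation can be plugged in directly. This is only a sketch: the baseline rates are hypothetical, and I am assuming the frequencies are expressed per batter faced, which the post does not spell out.

```python
def fitted_babip(hr_rate, so_rate, bb_rate):
    """Evaluate BABIP = .338 + 2.19*HR - .467*SO - .48*BB from the post."""
    return 0.338 + 2.19 * hr_rate - 0.467 * so_rate - 0.48 * bb_rate


# Hypothetical baseline rates (assumed to be per batter faced).
base = fitted_babip(0.025, 0.15, 0.09)
more_hr = fitted_babip(0.030, 0.15, 0.09)  # HR rate up half a percentage point
more_so = fitted_babip(0.025, 0.16, 0.09)  # SO rate up one percentage point
more_bb = fitted_babip(0.025, 0.15, 0.10)  # BB rate up one percentage point

print(f"baseline: {base:.4f}")
print(f"more HR:  {more_hr - base:+.4f}")
print(f"more SO:  {more_so - base:+.4f}")
print(f"more BB:  {more_bb - base:+.4f}")
```

The signs match the discussion above: the HR change pushes the fitted BABIP up, while the SO and BB changes push it down.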

Thanks for dropping by. I took doubles out and the pattern is very similar. I just looked at the AL and only from 1913 onward. I had to switch to using batting data instead of pitching data since Baseball Reference does not show doubles allowed by the pitchers for the whole league in a given year. But even so, the timeline charts parallel each other very closely, with the one without doubles being lower, of course.