Ahead in the Count: Home Runs, Fly Balls, and Popups

I have always loved pitchers’ duels. One of my favorite childhood baseball memories is watching Curt Schilling throw a complete game shutout for the Phillies in a 2-0 win against the Blue Jays in Game Five of the 1993 World Series, with the Phillies facing elimination. I was only 12 years old at the time, and I did not know anything about sabermetrics, but Schilling appeared majestic as he pitched yet another brilliant start in what would become a magnificent playoff career. He only surrendered five singles that night and extended the series one more day.

When I finally did learn about Defense Independent Pitching Statistics (DIPS) and Voros McCracken’s discovery that pitchers have little control over batting average on balls in play, I found it mildly disappointing to look back at that box score and discover Schilling actually had rather mediocre DIPS numbers that night: just six strikeouts and three walks. However, I still felt confident in my memory that I had seen a great game as a youngster, and I convinced myself that the lack of home runs that Schilling allowed was evidence that he was unhittable that night. Unfortunately, I later discovered that pitchers have little control over the rate of home runs allowed per fly ball, and that Schilling actually generated just 10 ground balls that night on 24 balls in play. Though DIPS Theory finds Schilling’s entire career amazing, his numbers from that alleged gem appeared pedestrian even on that front.

Some fans probably dislike DIPS Theory because it takes the glamour away from memories like these, but I find myself appreciating individual pitching performances more now that I have a better understanding of the game. Seeing a pitcher like Tim Lincecum strike out 14 hitters in his playoff debut adds to the majesty of the 2010 postseason rather than taking away from Roy Halladay’s playoff no-hitter the day before. The knowledge of what good pitchers do and do not control has given me an ability to watch a brilliant pitching performance critically and appreciate the masterful performances more.

Elementary DIPS theory is well established. Pitchers have a lot of control over strikeout, walk, and ground-ball rates, as evidenced by the year-to-year correlations of these statistics, but they have poor control over batting average on balls in play (BABIP). While we have known for some time that pitchers also have little control over HR/FB, statistics like FIP remove all of the variance from BABIP and reproduce an ERA with a league-average BABIP for all pitchers, but leave in all of the variance in HR/FB. This treats HR/FB as a skill and hides the fact that it is a statistic that pitchers show even less ability to maintain than their BABIP. Looking at home runs per outfield fly ball is important, because some pitchers do have some tendency to allow more popups than others, even controlling for team.

Below I list the year-to-year correlation of different statistics for 549 pitchers with at least 300 balls in play in consecutive seasons from 2003-10. For BABIP and HR/FB, I give numbers net of team performance (because defense and parks affect these numbers similarly for pitchers on the same team).

Pitchers actually appear to have more control over the rate of hits allowed on balls in play than over the rate at which outfield fly balls go for home runs. Yet statistics like FIP remove the luck from BABIP but not from HR/FB, on the grounds that HR/FB is known to be a little persistent. Since BABIP is even more persistent, this does not make much sense. Statistics like SIERA and xFIP, which neutralize HR/FB, do a better job of explaining reality when you have only a limited amount of information about a pitcher. Tom Tango has often argued that several seasons of data will show patterns in HR/FB, and that FIP will outperform xFIP when predicting ERA for pitchers with several years of data available. The problem is that using FIP over several years of data, long enough to reveal HR/FB skill, still ignores the control that pitchers have over BABIP, and ERA itself may be better at showing a pitcher’s skill level with several years of data than FIP will.
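To make the FIP versus xFIP distinction concrete, here is a minimal sketch using the commonly published forms of the two formulas. The additive constant, the league HR/FB rate, and the pitcher line are illustrative assumptions, not values from this article, and the function names are hypothetical.

```python
# Sketch only: constant and league_hr_per_fb are assumed placeholder values.

def fip(hr, bb, hbp, so, ip, constant=3.10):
    """FIP: observed HR, BB, HBP, and SO scaled to look like ERA."""
    return (13 * hr + 3 * (bb + hbp) - 2 * so) / ip + constant


def xfip(fb, bb, hbp, so, ip, league_hr_per_fb=0.10, constant=3.10):
    """xFIP: like FIP, but observed HR is replaced by fly balls times a
    league-average HR/FB rate, so variance in HR/FB is neutralized."""
    expected_hr = fb * league_hr_per_fb
    return (13 * expected_hr + 3 * (bb + hbp) - 2 * so) / ip + constant


# Hypothetical pitcher line, made up for illustration.
line = dict(bb=50, hbp=5, so=180, ip=200.0)

few_hr = fip(hr=12, **line)      # same pitcher, few fly balls left the park
many_hr = fip(hr=28, **line)     # same pitcher, many fly balls left the park
neutral = xfip(fb=200, **line)   # identical in both cases: HR count is ignored

print(round(few_hr, 3), round(neutral, 3), round(many_hr, 3))
```

With the same walks, strikeouts, and fly balls, FIP moves with the home run total while xFIP does not, which is the sense in which FIP leaves HR/FB variance in.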

Somewhat interesting was that pop-up rate has a high year-to-year correlation itself (.582). But what was surprising was that even after adjusting for team (and therefore park and foul ground area), the rate of popups per fly ball also had a pretty high year-to-year correlation (.328). This suggests that inducing popups is a skill beyond being a product of being a fly-ball pitcher. Therefore, SIERA’s inclusion of outfield fly balls and infield popups together may be something worth revisiting as we learn more about pitching.

All numbers in the below table are computed relative to total team numbers, and are for all pitchers with at least 100 balls in play in consecutive seasons between 2003 and 2010 (a sample size of 1459).

| Correlation | Pop-ups per ball in play (same year) | Pop-ups per ball in play (next year) | Fly balls per ball in play (same year) | Fly balls per ball in play (next year) | Pop-ups/(Pop-ups + Outfield Fly balls) (same year) | Pop-ups/(Pop-ups + Outfield Fly balls) (next year) |
| --- | --- | --- | --- | --- | --- | --- |
| Pop-ups per ball in play (same year) | 1.00 | | | | | |
| Pop-ups per ball in play (next year) | .582 | 1.00 | | | | |
| Fly balls per ball in play (same year) | .542 | .494 | 1.00 | | | |
| Fly balls per ball in play (next year) | .571 | .494 | .635 | 1.00 | | |
| Pop-ups/(Pop-ups + Outfield Fly balls) (same year) | .838 | .356 | .044 | .293 | 1.00 | |
| Pop-ups/(Pop-ups + Outfield Fly balls) (next year) | .317 | .058 | .085 | .072 | .328 | 1.00 |
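For readers who want to reproduce this kind of number, the sketch below shows one way a team-relative, year-to-year correlation could be computed. It assumes "net of team" means the pitcher's rate minus his team's rate, which the article does not spell out, and the season pairs are invented placeholders rather than real data.

```python
from math import sqrt


def net_rate(pitcher_events, pitcher_opps, team_events, team_opps):
    """Pitcher's rate minus his team's rate (assumed meaning of 'net')."""
    return pitcher_events / pitcher_opps - team_events / team_opps


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical (this year, next year) net pop-up rates, one pair per pitcher.
season_pairs = [
    (0.010, 0.006),
    (-0.004, 0.001),
    (0.002, -0.003),
    (-0.008, -0.005),
    (0.000, 0.001),
]
this_year = [a for a, _ in season_pairs]
next_year = [b for _, b in season_pairs]

year_to_year_r = pearson(this_year, next_year)
print(round(year_to_year_r, 3))
```

In practice each pair would come from a pitcher who met the balls-in-play minimum in both seasons.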

However, the rate of popups per ball in play does not have a significant correlation with the rate of home runs per outfield fly ball (-.043). An adjustment to SIERA therefore seems minimally important, as the correlation between next year’s ERA and this year’s pop-up rate is low as well (-.028). On the other hand, regressing next year’s ERA on this year’s pop-up rate and this year’s outfield fly-ball rate shows an interesting effect.

The net pop-up rate coefficient is significant, with p=.022. This suggests that differentiating popups and fly balls may be more useful and could be used to enhance SIERA.

Although the rate of popups per ball in play does not correlate with the rate of HR/FB, this does not mean that we cannot learn something important about this statistic. In fact, part of the strength of SIERA comes from the fact that it implicitly models the rate of HR/FB similarly to how it implicitly models BABIP.

I checked the correlation of HR/OFB with DIPS statistics:

| Statistic | Correlation with HR/FB (100 balls in play) | Correlation with HR/FB (300 balls in play) |
| --- | --- | --- |
| SO/PA | -.116 | -.083 |
| BB/PA | -.001 | .018 |
| GB/BIP | .028 | .003 |

The reason that the strikeout coefficient in SIERA is so large is that not only do strikeouts lead to lower ERAs in and of themselves, but they also correlate with lower BABIPs and lower HR/OFB rates, which likewise correlate with lower ERAs.

Running a regression (on all pitchers with 100 balls in play or more) of HR/OFB rate on all three statistics does not show a statistically significant coefficient for anything but strikeout rate:

| Variable | Coefficient | P-value |
| --- | --- | --- |
| SO/PA | -.100 | .000 |
| BB/PA | .014 | .653 |
| GB/BIP | .003 | .747 |
| Constant | .013 | .037 |

The equation of the regression with only strikeout rate is:

Net home runs per outfield fly ball = .016 – .100*(SO/PA)

In this strikeout-only regression, the p-value on strikeout rate is also less than .001.
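As a quick worked example, the strikeout-only equation can be evaluated at a few strikeout rates. The rates below are hypothetical inputs chosen for illustration, and the sketch assumes SO/PA enters as a plain fraction of plate appearances, as the equation is written.

```python
def predicted_net_hr_per_ofb(so_per_pa, intercept=0.016, slope=-0.100):
    """Evaluate: net HR per outfield fly ball = .016 - .100 * (SO/PA)."""
    return intercept + slope * so_per_pa


# Hypothetical strikeout rates (fraction of plate appearances).
for so_rate in (0.15, 0.20, 0.25):
    net = predicted_net_hr_per_ofb(so_rate)
    print(f"SO/PA = {so_rate:.2f} -> predicted net HR/OFB = {net:+.3f}")
```

Under this equation, each additional 10 percentage points of strikeout rate lowers the predicted net HR/OFB by .010, and the prediction crosses zero at a strikeout rate of .16.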

Overall, this article and yesterday’s article both show the benefit of using regression analysis to study ERA. Pitchers clearly exhibit far more control over defense-independent pitching statistics, but they still do have some control over BABIP and HR/FB. While statistics like FIP remove the noise inherent in these two metrics, they also remove the skill. Since pitchers’ BABIP and HR/FB skills are significantly correlated with their DIPS skills, running a regression actually controls for these effects and gives an accurate reading of pitchers’ true skill levels with a unique methodology that picks up on factors that other measures do not.

I love this stuff, Matt; thanks.
With respect to BABIP and ground-balls, I found Jamie Moyer's season last year very interesting. There was certainly a difference between those games when he got good results vs. bad. I bring him up because I watched a lot of his games last year and have been trying to figure how to square what I *thought* I saw with what the advanced metrics told us about his value to the Phillies.
We talk about a very low BABIP as being unsustainable, and rightly so, I think, but it seems to me that we make a mistake if we look at a low BABIP and simply conclude "he wasn't *really* pitching that well". In those games in which Moyer posted the best results last season, he did indeed (somehow) induce a lot of weak contact, weak grounders and flies that are easy to field. Obviously weakly hit balls can find holes and be misplayed too, so luck remains a factor. Still, it seems to me that DIPS theories don't tell us much about any one game, but rather about the persistence of certain tendencies. With a pitcher like Moyer, he needs to have very good command to be successful (and to "get" the outside strike), but when he does have that command he *really* does pitch that well--weak contact matters--but if he's slightly off, he gets hit much harder.
Am I making any sense? If so, is there anything to what I'm suggesting?

I think that any detailed statistical analysis of one particular game needs to have a big asterisk on it. That said, I think that pitchers (particularly ones like Moyer) are great at having plans for games. They are really good at mixing pitches and finding hitters' weaknesses. As a result, they can often have good BABIP in a fleeting moment, but those strategies don't tend to be persistent. It's tough to keep getting the same hitters out if you are not striking them out or getting them to hit ground balls. Moyer will surprise a lot of hitters with a fastball down the middle on an 0-2 count, and then they'll hit a pop-up when they suddenly realize that it is actually a fastball. I don't think that Moyer will get the same guy to do that again, so I wouldn't say his BABIP is repeatable (as evidenced by his ever so slightly good but not amazing career BABIP), but he's definitely able to make a good pitch to get an out in an individual game. One thing I have also noticed with Moyer is that when his location is off, his BABIP is terrible, and no pitcher can always have their location on. It's a fleeting thing-- even Cliff Lee and Greg Maddux have games where the ball isn't going where they want it to be within the strike zone. So I would say it's possible to have a good game with mediocre K/BB, but probably it's rare, and the times it does happen, it's probably when the pitcher gets ahead in the count a lot.

I've mentioned this to Matt on my blog, but I'll repeat it here.
FIP takes no position on the amount of skill level captured in any metric. The single and sole purpose of FIP is to capture the HR, BB, HB, SO observed results (a subset of a pitcher's results) and express them as a single number. (And, for ease of use, scale it to ERA.) This is no different than OBP treating BB and HR identically.
That FIP includes persistent results like SO and less-than-persistent results like HR is irrelevant. That's exactly the case with OBP as well, as it includes a persistent result like BB and less-than-persistent results like singles.
If you want to know how the observed results, things that actually happened, are associated with runs, then FIP gives you that. SIERA will not give you that. That's not a knock on SIERA, but neither are we going to hold it to the standard that it doesn't tell us what actually happened, when it doesn't purport to do that to begin with.
Observed results are a combination of the true talent level, other biases, and random variation. If you want to know the pitcher's true talent level inside that observed result, then clearly you have to do something to FIP in order to use it for that.
Back to Matt's article now...

Can't the answer be that FIP does both, depending on how it is used? As an ERA predictor, it acts like Matt describes--ignores noise and skill. As a representation of specific skills, it acts like Tango describes-- just observed results.
"If you want to know the pitcher's true talent level inside that observed result, then clearly you have to do something to FIP in order to use it for that."
Matt isn't trying to find a pitcher's true FIP talent. He's trying to find a pitcher's true ERA talent, and these articles are about the adjustments made to SIERA that may allow it to be used for that.

I don't want my discussion of FIP to be mischaracterized. When I say "FIP assumes no BABIP control," I do not mean "Tom Tango assumes no BABIP control." I'm ONLY saying that his model is designed to neutralize BABIP, and one of the advantages SIERA has is that it neutralizes BABIP luck without neutralizing the whole thing. One of the model assumptions of SIERA is that all we know about BABIP can be explained by K, BB, and GB%. That's not a 100% true assumption either. But it's a feature of the model.

What about asymmetric ballparks? I seem to recall one of the Yankee pitchers (I think it was Sparky Lyle) claim that he pitched so that the ball would go to left-center if it were drilled. Would that affect HR/FB?

I believe that is accounted for when they compare a given pitcher's data to the team data. Matt explicitly mentions popups WRT infield foul territory, I would have to imagine it acts similarly for OF asymmetry.
It's possible that there are pitchers who adjust their style to fit a ballpark. But that sounds suspiciously similar to the old claim, "a good pitcher throws to the score" (cough cough Jack Morris). In that case, the onus would be on the person making the assertion to prove it. I just doubt that there's much of an effect from it.

Yeah, it's probably very unlikely that many pitchers can really control where hard hit contact is made. If you're going to have that much control, you should probably aim to get the guy to make weak contact or miss. But a few pitchers probably do have the skill and intelligence to get that job done, which is why we do see somewhat positive correlations for a lot of the numbers above, even the ones close to zero.

This is very interesting, but I am confused on one point. Matt wrote near the end of the article:
"Pitchers clearly exhibit far more control over defense-independent pitching statistics, but they still do have some control over BABIP and HR/FB."
I didn't see the evidence for the latter part of that sentence, nor any data on how much control the most effective pitchers might consistently show over BABIP. Thank you.

Well, the proof mostly comes at the beginning. The correlations of BABIP and HR/FB are positive and small. They are also significantly negatively correlated with K%, and BABIP is significantly correlated with GB%. So we do see evidence of that small amount of control in that respect. Pitchers who miss bats also tend to induce weaker contact.

I think Kaiser's point is one that permeates everywhere: if you see a correlation of r=.10, then, that's it. It's r=.10. But, it's only r=.10 when BIP=400 (or whatever Matt used). Indeed, the r numbers are useless unless you also know the number of opps.
If r=.10 when BIP=400, then r=.50 when BIP=3600.
Unless you have a systematic bias, you can get r to approach 1 on almost anything, as the number of trials approaches infinity.
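The r=.10 to r=.50 example above is consistent with the usual regression-to-the-mean form r = n / (n + k), where k is the number of opportunities at which r reaches .50. The comment does not state that formula, so treat the sketch below as an assumption that happens to reproduce its numbers; the function names are hypothetical.

```python
def implied_k(r, n):
    """Solve r = n / (n + k) for k, given an observed r at n opportunities."""
    return n * (1 - r) / r


def expected_r(n, k):
    """Expected correlation at n opportunities under r = n / (n + k)."""
    return n / (n + k)


k = implied_k(0.10, 400)          # opportunities needed for r = .50
r_at_3600 = expected_r(3600, k)   # matches the r=.50 figure in the comment

print(round(k), round(r_at_3600, 2))

# r keeps climbing toward 1 as the number of opportunities grows.
for bip in (400, 3600, 36000, 360000):
    print(bip, round(expected_r(bip, k), 3))
```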

I think what he is saying is that any variable that has any slight persistence at all will do that. That's true, I think. But that is a hypothetical: suppose every pitcher gave up infinitely many fly balls in 2010 and infinitely many in 2011, all against the same competition; then the correlation of HR/FB between those two years would be 1.0. That's technically true.
The sample size issue that Tango is talking about with HR/FB is cancelled out somewhat by the fact that HR/FB is lower than BABIP. HR/ofFB is around .130 on average. BABIP is around .300. About 28% of BIP are ofFB. So if you take a pitcher with N BIPs and .28*N ofFBs, then your standard deviation due to randomness is:
For BABIP: sqrt(.300*(1-.300)/N) = .46/sqrt(N)
For HR/ofFB: sqrt(.130*(1-.130)/(.28*N)) = .64/sqrt(N)
So there is a slightly larger standard deviation due to randomness, about 1.39 times as large for HR/ofFB. But the correlation for BABIP is .122 vs. .075 for HR/ofFB, which is 63% larger. So we're still looking at BABIP having higher variance in skill level (***assuming I did all that correctly***)
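The arithmetic in this comment can be checked with a short script. It uses only the figures quoted above (.300, .130, 28%, .122, .075) and the binomial standard deviation sqrt(p*(1-p)/n); the function name is hypothetical.

```python
from math import sqrt


def random_sd_coefficient(p, share_of_bip=1.0):
    """Return c such that the binomial SD of a rate is c / sqrt(N), where N is
    balls in play and the rate is measured over share_of_bip * N chances."""
    return sqrt(p * (1 - p) / share_of_bip)


babip_c = random_sd_coefficient(0.300)                       # about .46
hr_offb_c = random_sd_coefficient(0.130, share_of_bip=0.28)  # about .64

sd_ratio = hr_offb_c / babip_c   # how much noisier HR/ofFB is than BABIP
corr_ratio = 0.122 / 0.075       # BABIP correlation relative to HR/ofFB

print(round(babip_c, 2), round(hr_offb_c, 2))
print(round(sd_ratio, 2), round(corr_ratio, 2))
```

The ratio of random standard deviations comes out near 1.39 and the ratio of correlations near 1.63, matching the figures in the comment.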