Wednesday, January 29, 2003

Game-Calling Revisited

Chris looks at catchers and DIPS.

In the 1999 Baseball Prospectus, Keith Woolner wrote what has become one of the most often cited works regarding the value of a catcher and his ability to affect the game with his handling of pitchers. Woolner's conclusion: "Looking at these results, though we would colloquially say that game-calling doesn't exist, it's more accurate to say that if there is a true game-calling ability, it lies below the threshold of detection."

It is a fantastic article and appeared to take a thorough approach to the problem and resolve it.

But then...

In 2000, a young unknown named Voros McCracken strolled into rec.sport.baseball and dropped a bomb called Defense-Independent Pitching Statistics (DIPS). The ensuing discussions were intense and thorough themselves. The theory behind DIPS is that, subtracting HRs, Ks, and BBs, a pitcher does not largely control the rate of hits on balls in play. Voros had done his homework, and he turned out to be right. DIPS made its way to BaseballStuff and then to Rob Neyer and Bill James. It was groundbreaking research, much as everyone had thought of Woolner's work the previous year.

You may say, what does one have to do with the other?

If the pitcher does not largely control hits on balls in play, then how can an analysis of catcher input that uses run values of balls in play be used? Let me be more clear: if the pitcher doesn't control hits on balls in play, then analysis of a catcher's game-calling ability cannot be related to hits on balls in play.

Woolner's work, as near as I can tell, is interesting, but it uses the wrong data. The methodology is sound and should be used, but with a twist. In order to evaluate a catcher's game-calling ability, one must study the only things that the battery does impact: HRs, Ks, and BBs.

I am not a mathemagician, so I'll leave the Z-scores to someone like Voros or Keith.

Instead, I took all the pitcher-catcher pairs for 1999-2001 that faced approximately 200 batters in both season N and season N+1. I used Woolner's study as a guide for that calculation. Keith did a great job of explaining what he did (although I still couldn't understand it). Dan Szymborski explained that I wouldn't be able to use Z-scores because of the way Keith used them for all hit events. I only have one hit event.

Ideally, one would take these pairs and calculate the catcher's DIPS ERA for each, rather than using Pitching Runs. I started to do that, but in addition to not knowing Z-scores, I'm sort of lazy - I'm just an idea man. I know someone will be right behind this article running the numbers, like Joe Dimino or Tangotiger, who are wizards with statistics and spreadsheets.

I calculated each battery's HR/PA, K/PA, and BB/PA rates and compared each to the rates for the same pitcher with a different catcher. I then subtracted the "not catcher" rate from the catcher's rate to get the difference between the catcher and the "not catcher". I then ran correlations comparing season N with season N+1. If game-calling is an ability, the numbers should have a decent r and r^2 - that is, if a pitcher-catcher pair has a lower HR rate in both season N and season N+1, then it may be a skill that has this battery keeping the ball in the park. To be clear, I am not looking at a single catcher, but rather at the set of 100 battery sets. If a pitcher-catcher pair has a higher HR rate in both season N and season N+1, that also counts.

The HR/PA and BB/PA differences are "catcher - w/o catcher", while the K/PA difference is "w/o catcher - catcher". This keeps the sign meaning the same for every stat - negative is good for the catcher, positive is bad.

Here you can see that Piazza's relationship with Leiter, in 1999, resulted in fewer HRs, BBs and Ks. In 2000, his work behind the plate got Leiter more HRs, but fewer walks and more strikeouts. All of this is compared to the Mets' other catchers: Todd Pratt and Vance Wilson.
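For anyone who wants to run the numbers, the rate differences and the season-to-season correlation described above could be sketched roughly as follows. This is only an illustration: the function names, the dictionary layout, and the sample values are assumptions for the example, not the spreadsheet or the data actually used in the study.

```python
from math import sqrt

def rate_diffs(with_c, without_c):
    """Per-PA rate differences for one pitcher with and without a given catcher.

    Each argument is a dict of counts with keys "PA", "HR", "BB", "K"
    (a hypothetical layout chosen for this sketch).
    """
    def rate(line, stat):
        return line[stat] / line["PA"]

    return {
        # HR and BB: catcher minus not-catcher, so negative favors the catcher
        "HR": rate(with_c, "HR") - rate(without_c, "HR"),
        "BB": rate(with_c, "BB") - rate(without_c, "BB"),
        # K: not-catcher minus catcher, so negative again favors the catcher
        "K": rate(without_c, "K") - rate(with_c, "K"),
    }

def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up HR/PA differences for four batteries, for illustration only.
season_n = [-0.010, 0.004, 0.002, -0.006]
season_n1 = [0.003, -0.002, 0.005, -0.001]
r = pearson_r(season_n, season_n1)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

Each battery contributes one difference per stat per season; the correlation is then taken across batteries between the season N list and the season N+1 list.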

The correlations came up as follows:

Stat   HR/PA diff   BB/PA diff   K/PA diff
R      -0.17237     -0.09412     0.0912
R^2    0.02971      0.00886      0.0083

All of these correlations are very weak to non-existent. This supports the idea that the catcher's game-calling skills are not affecting the success of the pitcher.

After working this up, I also gathered all the data for catchers with and without pitchers. This data was treated in the same manner as the pitcher-catcher data, using the same pairs.

I knew what I thought, but I turned to smarter people for interpretation. After discussing it with Ron Johnson, who knows regression, and a pass by Mike Emeigh, the data seem to say that the strikeout and walk rates show some significance in correlation, but the meaning here is the opposite of the above. The K rate, with a good R^2, indicates that the pitcher controls the rate, with no apparent effect from the catcher. The walk rate has good correlations, and the regressions indicate some impact from the catcher - in the neighborhood of 10%. The relationship isn't strong, but there does appear to be some effect. The HR rate still has no significant correlation - pitcher or catcher - in these data sets. This may not be enough data, given the small number of HRs allowed at all.

Conclusion:

This DIPS-related look at catchers' effect on game-calling supports Woolner's assertion that catchers do not largely impact the game with work behind the plate, running game excluded.

A more detailed study using an actual DIPS calculation may reveal more, including home/road statistics and ballpark effects. A larger group of battery sets would improve the robustness of the study.

Reader Comments and Retorts


Chris, excellent study. The problem that Keith, who had a great study on his own as well, faced was that there is so much noise in ERA, that to try to find some correlation or causation based on the catcher was almost a hopeless battle.

By breaking things down piecemeal, by trying to see *where* the catcher may have an influence, that seems easier (though still hard) to do. As for game-calling, it's not only about setting up the pitch, but about pitch sequencing as well.

What would be interesting is to look at the other events, namely SB, CS, catcherPO, PitcherPO, BK, WP, PB. Where is the influence of the catcher most felt, between the pitcher/catcher dynamic?

we must assume that pitch-calling in general has no effect on any pitching stat. If that is the case, then Pedro throwing fastballs right down the middle

No, we can say that there is no variation between MLB catchers, which is far different than what you are suggesting.

As for DIPS, there are differences between GB and FB pitchers (big ones too!), just as there are differences between any trait you can think of. The point of DIPS is that these differences, while real, have so little effect or have so much noise inherent in them, that you could ignore them altogether, apply league averages, and still be able to make a good evaluation of a pitcher's true talent level.

(Note: while FB out% are higher, there are also more XBH/H on FB, almost, but not quite, cancelling the effects out.)

If you argue that some pitchers are better at picking the next pitch to throw...

I really see no reason to separate a pitcher's physical tools from his mental tools. The combination of the two determines the effectiveness of the pitcher. Maybe Maddux doesn't have the same physical tools as Randy Johnson, but the result is that they are equally effective (more or less).

Minks, what you're suggesting is that catchers have a significant impact on pitch-calling effectiveness. At least, significant enough to show up in the home run/strikeout or walk rate of a pitcher.

Pitchers and catchers study a lot together, including the backup catcher. So in the case of Pedro, I actually doubt that the catcher has that much of an impact. He knows what he wants to throw and when, and I'm sure he lets the catcher know it, too.

As someone (you?) stated, it may be valuable, as a next step, to study this data for relatively inexperienced catchers.

DIPS is hard to get hold of. Essentially, though, DIPS research says that MLB pitchers have very little effect on the percentage of hits off the total number of balls hit into play off them.

It certainly doesn't say anything about any particular ball hit into play. I used this before, but we all know that when a ball is hit into play off a belt-high mid-80s fastball, it's likely to be a double into the alley. When a ball is hit into play off a mid-80s slider breaking low and outside, it's likely to go 4-3 or 3-1.

DIPS says that MLB pitchers, as a general rule (and there are exceptions), don't differ in their ability to prevent hits (generally) on BIP (generally). You shouldn't draw any other conclusions about this. The discussion above about groundball vs. flyball pitchers is instructive... it's not that there's no difference between groundball and flyball pitchers, but rather they tend towards the same result.

"if the pitcher doesn't control hits on balls in play, then analysis of a catcher's game-calling ability cannot be related to hits on balls in play."

This may be true, but it is not obviously true. Variations on hits on balls in play may be affected by the catcher without being affected by the pitcher. In fact -- although it is very unlikely -- the numbers might show that variations are entirely due to the catcher, and since most pitchers on the same team throw the same percentage of innings to each catcher, that is why BABIP tends to be the same.

Of course, the above quote is probably right, but since we're breaking out the R squareds, I'd like to see what the R squared is for BABIP.

On the same team, with the same manager and pitching coach, it's not surprising that there are similarities between starting and backup catchers. I would be surprised if Vance Wilson and Mike Piazza had significantly different approaches to pitch sequences and pitch locations, since they study film and go over opposing lineups together.

This is close to a statistical certainty (or maybe is one). A C's walk rates are an aggregate of pitcher walk rates, and whenever you aggregate, you reduce variability (as a % of mean or when dealing with a rate stat). I won't guarantee that it could never happen -- for example I suppose it might happen if C's were the dominant force in determining walk rates.

So aside from HR, groundball pitchers have no advantage over flyball pitchers? Increased double plays, fewer gappers and doubles off the wall, etc.?

It's a mix. GB pitchers give up a higher BA/BIP, but give up fewer 2Bs and 3Bs and get more double plays, than FB pitchers. As luck would have it, these things happen to pretty much balance out, so unless you're getting really picky, you can usually ignore them.

I suppose we should switch from "results from BIP are pretty much the same across pitchers" to "results from BIP are equivalent across pitchers".

What about pitch location? Don't the same skills that lead to more Ks and fewer homers also lead to more popups and dribblers to the mound or first? The logic would seem to be that even though Pedro is much, much harder to hit than Albie Lopez, when you do manage to put the ball in play against him, although you're usually lunging at a brilliant curve or way behind a high-90's fastball, your chances are roughly as good as when you're taking BP off Albie. That just doesn't make sense. You can run all the numbers you want, but I just can't get past the irrationality of it all.

This is my biggest source of cognitive dissonance with regard to DIPS as well. I mainly think about it in the following 4 ways.

(1) When lunging at that curve, even if you make contact, you usually foul it off.

(2) Remember, we took HR out of the equation because we're looking at DEFENSE-independent stats. That's different than asking "how hard is it to hit this pitcher (when you can hit him at all)?" For those purposes, HRs are still "in play" and you do see substantial differences among pitchers. Albie does groove more pitches than Pedro (actually I've never looked at Lopez's HR rates).

(3) It's my understanding that recent play-by-play data suggests pitchers don't have much control over the direction the ball is hit (though someone demonstrated that Glavine apparently can) either. This suggests (to me at least) that whether a batter is "late" on a pitch (put in play!) has more to do with the batter. If that's so, I find it easier to believe that they're responsible for whether it gets dribbled down the 1B line too.

(4) In a sense, what we're really "measuring" is the number of hittable pitches per PA; Pedro may have a much lower rate of hittable pitches per pitches thrown, but if his PAs go longer or he throws more strikes per pitch, he may throw as many per PA.

For example, try the following extreme thought experiment. Every pitch a pitcher throws is a strike. There are two kinds of strikes, hittable ones (i.e. can be put in play) and unhittable ones (not put in play). Batters always put hittable strikes into play, never put unhittable ones into play.

Now we've got a pitcher, we'll call him Livan. 50% of Livan's pitches are hittable. Truly oddly, he alternates -- the first pitch is unhittable, the second one is hittable. Consequently, all of Livan's PAs last 2 pitches and result in a BIP.

Then we've got Moose. Only 1/3 of his pitches are hittable, but like Livan he's amazingly consistent, 2 unhittable ones followed by a hittable one. Result? Every PA lasts 3 pitches, but still results in a BIP.

So in this world, although Moose is obviously a much better pitcher than Livan, his results would be exactly the same.

Then we've got Pedro. Only 1/4 of his pitches are hittable, but again consistent: 3 unhittables followed by a hittable. Result? A strikeout, BIP, strikeout, BIP ....

So here Pedro is twice the pitcher Livan is, and he's got the Ks to prove it (unlike poor Moose). But his results on BIP? The same. Where does Pedro's unhittability show up? In his Ks.

Let's complicate things a bit more by not requiring that hittable/unhittable alternate. But we'll simplify by dropping Moose. Again Livan throws hittable pitches 1/2 the time, Pedro 1/4 the time. We still assume batters always put the ball in play. Here are the probabilities of the different outcomes for each (U=unhittable, H=hittable).

For Livan:
1/8 - UUU
1/8 - UUH
1/4 - UH
1/2 - H

On average, Livan throws 1.75 pitches per PA. Over 64 PA (you'll see why shortly), he would have 8 Ks, 56 BIP, and throw 112 pitches. Out of those 112 pitches, 56 would be hittable.

For Pedro:
27/64 - UUU
9/64 - UUH
3/16 = 12/64 - UH
1/4 = 16/64 - H

On average, Pedro throws 2.31 pitches per PA. Over 64 PA, he would have 27 Ks, 37 BIP, and throw 148 pitches. Out of those 148 pitches, only 37 would be hittable. Clearly he's a better pitcher, but still no reason to expect him to have better results on BIP.

And note the "paradox". Livan throws hittable pitches at twice the rate Pedro does. But Livan throws 56 hittables in 64 PA while Pedro throws 37 per 64 -- so Livan throws only about 1.5 times as many hittable pitches, not twice as many.
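If you'd rather check the arithmetic than trust me, here is a small sketch of the all-strikes model above. The function names are made up for the example, and it assumes exactly what the thought experiment assumes: every pitch is a strike, each pitch is independently hittable with a fixed probability, a hittable pitch is always put in play, and three unhittable pitches make a strikeout.

```python
from fractions import Fraction

def pa_outcomes(p_hittable, strikes_for_k=3):
    """Map each pitch sequence (U = unhittable, H = hittable) to its probability."""
    p = Fraction(p_hittable)
    q = 1 - p
    outcomes = {}
    # Ball in play after 0, 1, or 2 unhittable strikes
    for u in range(strikes_for_k):
        outcomes["U" * u + "H"] = q**u * p
    # Strikeout: three unhittable strikes in a row
    outcomes["U" * strikes_for_k] = q**strikes_for_k
    return outcomes

def summarize(p_hittable, pa=64):
    """Expected Ks, balls in play, pitches, and hittable pitches over `pa` PAs."""
    outcomes = pa_outcomes(p_hittable)
    ks = outcomes["UUU"] * pa
    bip = pa - ks
    pitches = sum(len(seq) * prob for seq, prob in outcomes.items()) * pa
    # Every ball in play comes from exactly one hittable pitch
    return {"K": ks, "BIP": bip, "pitches": pitches, "hittable": bip}

livan = summarize(Fraction(1, 2))   # 8 K, 56 BIP, 112 pitches
pedro = summarize(Fraction(1, 4))   # 27 K, 37 BIP, 148 pitches
print(livan, pedro)
print(float(livan["hittable"] / pedro["hittable"]))  # about 1.5, not 2
```

Using exact fractions keeps the 27/64 and 9/64 figures from turning into rounding noise.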

Now let's complicate things a bit further by (kinda) bringing balls into the picture. Assume each pitcher lasts an average of 32 PA per start. Assume each pitcher averages 100 pitches per 32 PA. Assume the other assumptions above hold. But let's assume there's no such thing as walks.

From the above, we know that Livan would throw 56 strikes per 32 PA (remember he threw 112 per 64 PA back when all pitches were strikes). So we assume the other 44 pitches are all balls.

Pedro would throw 74 strikes per 32 PA, so that leaves him with just 26 balls.

Let's look at some stats again. 74% of Pedro's pitches are strikes, compared to 56% for Livan. Half of Livan's strikes are hittable while only 1/4 of Pedro's are hittable. Pedro would strike out 13.5 guys while Livan would strike out 4. These of course are huge differences and would lead to substantially different outcomes (on average). And we've done this without having to assume that Pedro had a better BA/BIP.
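The same toy model with balls added can be sketched like this. Again, the function name and the output keys are just labels for this example, and the assumptions are the ones stated above: 32 PA and 100 pitches per start, no walks, and whatever isn't a strike is a ball.

```python
from fractions import Fraction

def start_line(p_hittable, pa=32, pitches=100):
    """Expected line for one start under the toy model (no walks)."""
    p = Fraction(p_hittable)
    q = 1 - p
    k_prob = q**3                              # three unhittable strikes
    # Strikes per PA: 1 (H), 2 (UH), or 3 (UUH or UUU)
    strikes_per_pa = p * 1 + q * p * 2 + q * q * 3
    strikes = strikes_per_pa * pa
    hittable = (1 - k_prob) * pa               # one hittable pitch per ball in play
    return {
        "strikes": strikes,
        "balls": pitches - strikes,            # everything else is a ball
        "K": k_prob * pa,
        "strike_pct": strikes / pitches,
        "hittable_share_of_strikes": hittable / strikes,
    }

livan = start_line(Fraction(1, 2))   # 56 strikes, 44 balls, 4 K
pedro = start_line(Fraction(1, 4))   # 74 strikes, 26 balls, 13.5 K
for name, line in (("Livan", livan), ("Pedro", pedro)):
    print(name, {key: float(value) for key, value in line.items()})
```

Nothing in here gives Pedro a better BA/BIP; all of the difference comes from the strike and strikeout columns.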

Obviously the real world is more complicated, but hopefully you get where I'm going.

I hope I haven't confused the picture more. And I know I haven't proven anything. But the points are: (1) yes Pedro throws more unhittable pitches, but that's where his Ks come from; and (2) there are a lot of variables here that aren't controlled for in our "common sense" scenarios (or in my extreme examples either). And believe me, I thought (and still think to an extent) along the same lines as our mystery poster.

No offense intended, but how hard is pitch-calling anyway? Up and in, low and away, a little down and in to mix it up. Fastball, fastball, slop ... but mix it up a bit. When there's two strikes, you either throw it a little more low and away or you throw it a little more up and in, preparing to throw it a little more low and away with the next one.

Take that, plus some basic things like "this guy likes fastballs in" and "hmmm...curveball's not too good today, let's stick mainly to fastball and changeup" and "it's Rey Ordonez, throw it wherever you want" and I'd think you've got about 95% of game-calling.

For what it's worth, the best evidence I've seen of a pitch-calling ability is Steve Stone. Regularly, during Cubs broadcasts, he'd say something like "if he throws him a curve here, he's gone" or "if he throws him a fastball here, it's gonna get hammered" and he was frequently right. Of course Steve was a pitcher, and a pretty average one at that, so this suggests it's either not that big a deal or he's no better at it than anyone else.

Batman: Pitchers who give up tons of line drives don't tend to last too long.

The thing to remember about DIPS is that it does not say that pitchers do not affect how often balls in play become hits. It does not say that there are no differences between pitchers, or even that those differences are not statistically significant.

It says that, for the vast majority of successful major league pitchers, those differences are not nearly as significant as K Rate, BB Rate and HR Rate, and not nearly as important as we assume.

We know that line drives tend to be hits much more often than fly balls or grounders. So those pitchers who tend to give up a lot of line drives probably don't make it to the majors in the first place.

Most of the questions have been answered internally, and they were really for Voros, so I'll move on.

Phillybooster - you are right.

In fact, Ron Johnson mentioned this exact thing to me before we posted the article. The possibility does exist. As I told Ron, how could you possibly separate that? I plan on crunching as much more of the data as I can, but it's very tedious - I've been working on this for a while.

Common sense says that if pitch A is more likely than pitch B to be swung on and missed, the same pitch would also be more likely, when it is put into play, to be put into play poorly--a bloop single every now and then, but mostly dribblers and popups--while pitch B gets hit on a line--sometimes right at someone, but more likely for a hit.

The analogy doesn't quite hold. In general, a fastball is more likely to be swung on and missed than is a curve ball (because the hitter has more time to react to the curve). However, a curve ball is more likely to be hit *poorly* than a fastball - because the hitter has to gauge not only the speed of the pitch but also the extent of the break. Generally, curve balls that don't hang aren't hit well. Chart pitches at a game some time and see if that isn't so.

Sean,
no, not yet. I really wanted to get the concept out there with some data for indications. I have a lot of data still to sort through.

I'll have to flip through the WR to see how it works out. In terms of predictive value, I was always looking at season N and season N+1, so something may appear.

David,
you are right. There have to be some differences. The question is, do those differences impact HBIP? If they did, they should show up in these three categories. As Phillybooster says, there could be other effects that I haven't controlled for - one of the good things about Keith's study is that it is complementary to mine - Keith covered HBIP, while mine covered the DIPS slots. Both Keith and I offer the disclaimer - this doesn't mean there is no skill, but that it possibly is smaller than the noise.

Fortunately, the Primer staff has the right stats guys around - and further studies on this will happen.

"It seems to me the biggest effect will be the home plate umpire. Each one has a different strike zone--smaller zones will result in more walks, fewer K's and more hits. This could completely swamp the effect of the catcher. Has anyone ever calculated umpire ERA's?"

It's a very good point. Something that Chris' idea about examining things on a PA by PA basis will help to deal with.

You can get umpire data with Ray Kerby's Retrosheet parser (though umpire names aren't consistently entered) and BP has the current data. As you suspect, the differences are dramatic.

It may be that if we can get umpire and park effects out, we can locate a meaningful catcher effect.

I agree with Chris' sentiment, that it's the search for truth that motivates many researchers, and the "practical application" is a secondary issue. And, while you're doing the work, you will usually end up discovering something (even completely unrelated) that would be practical.

The most important point brought up so far, in my view, is that even if Chris and Keith will end up finding no differences among MLB catchers, this doesn't mean that the catchers have no impact on game-calling. It just means that, of the guys already selected to play MLB (and that may be based on game-calling), there is no difference (similar to DIPS). But there could be a world of difference between a typical MLB catcher and a typical Double-A catcher.

Yes, tango, a friend of mine who pitched in college made a similar observation: yes, game-calling is a skill, but the difference isn't much at this level. He also suggested looking at a particular subset: less-experienced pitchers with experienced catchers and inexperienced catchers. Say Wade Miller and Roy Oswalt with and without Ausmus.