The Stats Go Marching In

Catcher Framing Before PITCHf/x

Analysis of framing has intensified over the past couple of years. Joe Maddon has talked about it on the radio, and (via Ben Lindbergh) Clubhouse Confidential and MLB Network’s Diamond Demo series have featured discussions of the issue with guests like Jonathan Lucroy. Ben has been running a weekly column on the subject since the start of the season. In the first installment (as well as in this piece for Grantland) he provided some background on the research so far, so you may want to have a look at that article before reading the rest of this one.

Framing evaluation is one of those research subjects that has been made possible by PITCHf/x data, which means that we’re now into the sixth full season for which catcher framing can be measured. However, for quite some time, I’ve been thinking about this: if one could get a good approximation of the framing numbers just using Retrosheet pitch sequences, 20 years of catcher framing could be added to the discussion. When Ben jogged my memory recently, I decided it was time to stop thinking about it and start doing some number-crunching.

The Method
Going back to 1988, Retrosheet has data with a fair degree of completeness for pitch sequences, indicating the outcome (ball, called strike, swinging strike, foul, and so on) of every pitch thrown.

For each plate appearance, I counted the number of pitches not featuring a swing by the batter (basically balls and called strikes), with the useful Chadwick Tools saving me a lot of time and work.
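
As a rough illustration of that counting step (a sketch only, not the Chadwick-based workflow used for the article), the function below tallies taken pitches from a single Retrosheet pitch-sequence string. The function name is hypothetical, and the choice to keep only plain balls ("B") and called strikes ("C"), leaving out intentional balls, pitchouts, and hit-by-pitches, is an assumption about what "basically balls and called strikes" covers.

```python
# Retrosheet pitch codes for pitches the batter did not offer at.
# Assumption: only plain balls ("B") and called strikes ("C") are kept;
# intentional balls ("I"), pitchouts ("P") and hit-by-pitches ("H") are left out.
TAKEN_BALL = "B"
CALLED_STRIKE = "C"


def count_taken_pitches(pitch_sequence):
    """Return (taken, called_strikes) for one plate appearance.

    Every character other than "B" or "C" (swings, fouls, balls in play,
    pickoff throws, and markers such as "+", "*", "." and ">") is skipped.
    """
    balls = pitch_sequence.count(TAKEN_BALL)
    called = pitch_sequence.count(CALLED_STRIKE)
    return balls + called, called


# Example: ball, called strike, foul, ball, then a swinging strike
# with the runner going (">" is a marker, not a pitch).
taken, called = count_taken_pitches("BCFB>S")
print(taken, called)  # 3 1
```

Summing these two counts over a plate appearance gives the numerator (called strikes) and denominator (taken pitches) that the rest of the method relies on.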

In the original model I created with PITCHf/x data, in addition to using the location coordinates as measured by the camera system and the pitch type as classified by the MLBAM algorithm, I controlled for the effect of the ball/strike count, the home plate umpire, the pitcher, and the batter—plus, obviously, the catcher.

Since that model requires a lot of computing time, in order to update my numbers once in a while, I switched to a simpler but quicker model in which the pitcher and the batter are not accounted for. In fact, once the location and pitch type are factored in, the batter has very little effect on the call by the umpire (mostly due to his stance and proximity to the plate, I suppose). The effect of the pitcher is also reduced, and I decided that the tradeoff between accuracy and computing time was worth the exclusion. However, with Retrosheet data, we have no information on pitch location and type, so throwing the pitcher and the batter back into the model was necessary.

In short, for every plate appearance I have the percentage of strikes on pitches not swung at as the outcome variable and the four actors involved (pitcher, catcher, umpire, batter) as the predictors. As I have done many other times in my baseball analysis, I have used a Cross-Classified Multilevel Mixed Model, which for saber-oriented people I’ll call WOWY-on-steroids.
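
To make the "four actors as predictors" idea concrete, here is a deliberately simplified stand-in. It is not the cross-classified multilevel mixed model itself: it fits additive, shrunken effects for each pitcher, catcher, umpire, and batter by repeated sweeps over the residuals (backfitting), weighting each plate appearance by its taken pitches. The function name, the dictionary layout, and the fixed `shrinkage` constant are all assumptions for illustration; a real mixed model would estimate the amount of shrinkage from the data.

```python
from collections import defaultdict

ROLES = ("pitcher", "catcher", "umpire", "batter")


def fit_shrunken_effects(plate_appearances, shrinkage=50.0, sweeps=200):
    """Estimate an additive effect on called-strike rate for every actor.

    plate_appearances: list of dicts with the four role keys plus
    "taken" (pitches not swung at) and "called" (called strikes).
    shrinkage: hypothetical prior weight, in pitches, pulling each
    effect toward zero (an assumed constant, not an estimated one).
    """
    rows = [pa for pa in plate_appearances if pa["taken"] > 0]
    total = sum(pa["taken"] for pa in rows)
    mean_rate = sum(pa["called"] for pa in rows) / total
    effects = {role: defaultdict(float) for role in ROLES}

    for _ in range(sweeps):
        for role in ROLES:
            num = defaultdict(float)
            den = defaultdict(float)
            for pa in rows:
                # Residual strike rate after removing the league mean
                # and the current estimates for the other three roles.
                others = sum(effects[r][pa[r]] for r in ROLES if r != role)
                resid = pa["called"] / pa["taken"] - mean_rate - others
                num[pa[role]] += pa["taken"] * resid
                den[pa[role]] += pa["taken"]
            for name in num:
                effects[role][name] = num[name] / (den[name] + shrinkage)
    return mean_rate, effects


# Made-up data: two catchers each working with the same two pitchers.
toy = [
    {"pitcher": "P1", "catcher": "C1", "umpire": "U1", "batter": "B1", "taken": 100, "called": 40},
    {"pitcher": "P1", "catcher": "C2", "umpire": "U1", "batter": "B1", "taken": 100, "called": 30},
    {"pitcher": "P2", "catcher": "C1", "umpire": "U1", "batter": "B1", "taken": 100, "called": 36},
    {"pitcher": "P2", "catcher": "C2", "umpire": "U1", "batter": "B1", "taken": 100, "called": 26},
]
toy_rate, toy_effects = fit_shrunken_effects(toy)
print(round(toy_effects["catcher"]["C1"], 3), round(toy_effects["catcher"]["C2"], 3))
```

In this toy data C1 gets ten more called strikes per 100 taken pitches than C2 with either pitcher, and the shrunken estimates land somewhat inside that raw gap, which is the qualitative behavior one expects from a model with partial pooling.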

Note that when using PITCHf/x data, an extra strike is more or less attributable to something framing-related, be it a good reception by the catcher, the pitcher hitting the target, or the umpire being deceived (or, more likely, a combination of the three). However, when no information is available about location, several other factors come into play: among the called strikes are, for example, pitches thrown right down Broadway that may not have been swung at because of the batter’s tendencies (partly accounted for, as the batter is in the model) or because great sequencing has fooled the batter. Thus, this version of framing might include at least some pitch-sequencing effect as well.

Comparing Retrosheet and PITCHf/x numbers
Obviously, the first thing to do before calculating and showing numbers going back to 1988 is to test how the rankings based on Retrosheet-only data compare with the PITCHf/x version for the years that have the more detailed data.

Let’s start by showing a scatterplot featuring framing runs saved (prorated to 5,000 pitches caught*) by catchers in the seasons from 2008 to 2012. The darker dots denote a higher number of pitches caught, signifying more reliable estimates.

* Keep in mind that from here on, when I write “pitches caught” I really mean “pitches caught with no swing attempt by the batter.”

Not a bad start. The chart displays a good agreement between the two different models; the Pearson correlation coefficient, weighted for the number of pitches caught, is a healthy 0.72.
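
For readers who want to see what "weighted for the number of pitches caught" means in practice, a weighted Pearson coefficient can be computed along these lines. This is a generic sketch of the standard formula; the function name and the numbers in the example are made up and are not the article's data.

```python
import math


def weighted_pearson(x, y, w):
    """Pearson correlation of x and y with non-negative weights w."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(var_x * var_y)


# Made-up catcher-seasons: two framing estimates and the pitches caught.
retro_runs = [12.0, -5.0, 3.0, -9.0]
pitchfx_runs = [20.0, -11.0, 2.0, -14.0]
pitches_caught = [6000, 4500, 1200, 5200]
print(weighted_pearson(retro_runs, pitchfx_runs, pitches_caught))
```

Seasons with few pitches caught contribute little to the coefficient, so noisy small-sample ratings do not drag the agreement down as much as they would in an unweighted correlation.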

One important difference between the two methods is the distribution of ratings. The PITCHf/x-based numbers are more dispersed: when one considers catcher-seasons with at least 1,500 pitches caught, the standard deviation is close to 13 runs for the PITCHf/x numbers and about 7.5 for the Retrosheet ones. That means the Retrosheet-based values (I’ll call them “RetroFraming”) will yield more conservative results.

Given the good agreement of RetroFraming with the PITCHf/x-based numbers, we can move on to showing some numbers going back to 1988, keeping in mind that we’re less likely to see extreme values with this metric.

Single-season achievements
The best catcher-framing season of the last quarter century belongs to Brad Ausmus, with 36 runs saved for the 2000 Detroit Tigers.

Here a note is due. In the previous section, I warned that RetroFraming numbers give more conservative results: in fact, there is no trace of a 50-run season. A recent revision of my algorithm has changed Jose Molina’s PITCHf/x framing value for 2012 to 41 runs, but that would still make it higher than Ausmus’ 2000. RetroFraming has Molina’s 2012 at 25 runs saved, which is quite a difference.

I know such discrepancies can be enough for some people to turn away altogether from this article and others on framing, as they often do when two play-by-play-based fielding metrics disagree on an evaluation of any position player. However, what I make of these numbers is this:

There are two metrics that strongly agree: no catcher over the past five years is rated above average by one and below average by the other.

According to either method, a good framing catcher can be expected to bring his team a handful of extra wins in a single season.

The PITCHf/x-based method is more precise and less likely to be pulling in other aspects of a catcher’s defensive performance, so for seasons where both methods are available, I would tend to trust its output over the Retrosheet estimate. If you’re skeptical that the big numbers associated with the PITCHf/x approach could be accurate, Mitchel Lichtman’s testing from last year might lay some of your concerns to rest.

Teams with analytically minded front offices are already making seven-figure decisions based on numbers like these.

Career framers
Ausmus also gets the career laurel as the cumulative king of framing for the past quarter century. In an 18-year career behind the plate, he added roughly one win per season through his ability to earn extra strike calls. Once more, the purported divide between scouting and statistical analysis is revealed to be a false one: long before numbers-based discussions of framing began, teams were willing to give playing time to weak-hitting catchers like Ausmus because of their defensive ability.

Jose Molina is a solid second, despite much more limited playing time. In fact, over the same amount of playing time, we’d estimate Molina to be close to twice as valuable as Ausmus. Below is the Top 10 list for prorated (to 5,000 pitches caught) values, minimum 25,000 pitches.

At the bottom of the list, depending on whether you prefer the counting stat or the prorated version, are either Charles Johnson (costing more than a win per year for 12 seasons) or, once more, Ryan Doumit.

Year-to-year correlation
So what do we do with 25 seasons of ratings? The first thing I thought of was running a year-to-year correlation. I did the usual matching of every catcher with his previous-year self and produced the following plot, which shows the year-to-year correlation for runs saved per 5,000 pitches caught. Again, the shading of dots indicates the underlying number of pitches (the minimum between the two seasons considered). The weighted Pearson correlation coefficient is 0.52.
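
The matching step described above can be sketched as follows. The data layout, the function name, and the sample values are hypothetical; the sketch only shows how each catcher-season is paired with the same catcher's previous season, how runs are prorated to 5,000 pitches, and how the smaller of the two pitch totals becomes the weight for the pair.

```python
def year_to_year_pairs(seasons, per=5000):
    """Match each catcher-season with the same catcher's previous season.

    seasons: dict mapping (catcher, year) -> (runs_saved, pitches_caught).
    Returns a list of (previous_rate, current_rate, weight) tuples, where
    rates are runs per `per` pitches and weight is the smaller of the two
    pitch totals.
    """
    pairs = []
    for (catcher, year), (runs, pitches) in sorted(seasons.items()):
        previous = seasons.get((catcher, year - 1))
        if previous is None or pitches == 0 or previous[1] == 0:
            continue  # no adjacent season, or nothing to prorate
        prev_runs, prev_pitches = previous
        pairs.append((
            prev_runs * per / prev_pitches,
            runs * per / pitches,
            min(prev_pitches, pitches),
        ))
    return pairs


# Made-up numbers for illustration only.
example = {
    ("Catcher A", 1999): (10.0, 5000),
    ("Catcher A", 2000): (12.0, 6000),
    ("Catcher B", 2000): (-4.0, 2500),
}
print(year_to_year_pairs(example))  # [(10.0, 10.0, 5000)]
```

The resulting triples are exactly what a weighted correlation needs: two rate columns and one weight column. Using the minimum of the two pitch totals keeps a pair from counting heavily when either of its seasons is a small sample.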

A look at aging
The second analysis it made sense to perform with 25 available seasons is an exploration of aging. I looked at the subject through a few different statistical lenses, but the results were fairly consistent. Basically, the aging effect is very small, with no more than two runs separating the prime from the career nadir. Below is a chart showing an estimated career curve, featuring a slight improvement until age 25, followed by a gentle decline.

Below are charts for a few interesting careers. In each one of them, the dots indicate the seasonal ratings, the thinner line is a smooth curve through the data points based on the displayed catcher’s data only, and the thicker line makes use of data coming from the other catchers as well (sort of regressing the curve).

Here’s Jose Molina, who just keeps getting better:

Ausmus also improved throughout his career:

Posada, on the other hand, displayed a declining trend:

Finally, Piazza’s numbers were consistent throughout his career:

What’s next?
So far I’ve been reluctant to combine game-calling numbers with PITCHf/x-based framing ratings because they’re derived from different sources, with different levels of granularity. But with the framing approach presented here, I now feel more comfortable subtracting framing from what I termed game-calling, which was actually more of a sum of framing plus calling. Thus, in the future I plan to explore the quantification of game-calling further.

In this article I’ve used pitch-by-pitch data without PITCHf/x information to generate historical leaderboards. However, this kind of data is also available for Minor League Baseball going back a handful of years, so numbers like those shown above can be calculated for lower levels of baseball as well. In that way, good framing catchers might be identified before they reach The Show. And while it might be a long time before we see ubiquitous pitch-tracking technology in the college game, recording pitch outcomes is much more feasible, meaning that teams might even use this information for drafting purposes.

Incidentally, while refining this article, I mentioned its contents to a baseball insider (who obviously will go unnamed here), and he stated, “It's an idea potentially worth millions of dollars.” So, clubs with college pitch-by-pitch data: feel free to knock at my door.

Amazing, Max, this is fantastic. You talk about 'game calling', which is a combination of calling a pitch, calling for a pitch location, and then framing that pitch. Is all this attention on pitch framing worth little if a good framing catcher is a terrible pitch caller / locator? Shouldn't more attention be on the other two pieces of this puzzle before we start lobbying for Jose Molina / Brad Ausmus for the Hall of Fame? I recall from a previous article of yours that there are good framers who ultimately lose that value as a result of poor game calling. Or have we dismissed 'calling' as less of a skill and more of a manager function? Thanks again.

I plan on looking more into the 'game calling' issue.
From the explorations of the data I have done, the composite value is largely driven by framing, but there are exceptions.
And the exceptions seem to be consistent year to year: for example, I have A.J. Pierzynski nowhere close to the top in framing, but he seems to be one who really improves his pitchers.

So, yes, you'd want everything correctly rated.
I prefer having separate numbers because maybe one skill can be trained more than the other, or someone else might take charge of it (you may be OK with a good framer / bad caller by having all the calls coming from the bench, for example).

This is extremely fascinating, but I have never been able to come to grips with how values are derived from these extra framed strikes. Can someone kindly lead me to a concise explanation of how we get from extra strikes to extra runs?

Interesting. So if the value of a ball/strike varies per the count at the time, should we not also be assigning value accrued to receivers based on when they gained or lost a call?
To those who would say that it evens out, I would reply that catchers have the ability to call that type of pitch when they want to, thereby putting their framing skills to the test.
Am I wrong?

This is fantastic work, I really enjoyed this. Pitch framing is actually one area where I would have expected increased performance as catchers age as opposed to a decline. I would have guessed calling games and hitting would have been higher on the list than pitch framing, especially further back in time, so that catchers would have gradually improved at this skill as they aged. I guess I'm surprised that the peak is around age 25 and not older.

I think there's a fairly big split between good framers and bad in how they age. A top-notch framer (Ausmus, Molina, Lucroy...) is going to refine his craft and get better at it (though there's probably a practical upper bound), even as physical skills may decline. A bad framer most likely doesn't care much, and may even develop and cement bad habits that make him worse over time. This is even more likely given that there hasn't historically been all that much attention paid to this subject. Piazza probably had a pretty decent natural feel for it, but without much incentive or opportunity to work on getting better.