Framing and Blocking Pitches: A Regressed, Probabilistic Model

A New Method for Measuring Catcher Defense

[T]he expected runs produced from each plate appearance starting with a strike decreases by .029 runs and increases by .040 for every ball thrown on a first pitch. In other words, having as many of those 0-0 'striballs' called strikes can greatly impact the outcome of the game.

Introduction

The mechanics of framing pitches are simple enough to explain: Quiet, mechanically sound catchers with a knack for good receiving help their pitchers by getting favorable calls from the home plate umpire. This effect has been known ever since umpires started calling balls and strikes. Although it wasn't always called framing, it has long been a source of speculation and commentary about prominent catchers.

Since the beginning of the PITCHf/x era, researchers have calculated framing in several different ways. We are presenting a new method that we will call the "Regressed Probabilistic Model" of framing (RPM for short). In brief, RPM works by calculating the combined probability (and associated run value) that each pitch will be called a strike; summing those probabilities (and run values) across opportunities; attributing those values to a player (catcher or pitcher); and regressing "career" values to the mean.

We will freely admit: If you haven't seen the results of previous framing studies, it can be tough to wrap your mind around the size of the impact of a good or bad framing catcher. These effect sizes are not out of line with what has been reported in the past, but they're still obscenely large. Everyone agrees that Mike Trout was either a deserving MVP or a deserving runner-up in each of the past two seasons, which the stats say were worth close to 10 wins apiece. Our data suggest that over the past five years, the teams that have employed good framers like Jonathan Lucroy, Brian McCann, and Jose Molina have each received the equivalent of a "free" MVP-caliber season from framing alone. (Each of those catchers has been worth about two extra wins per season over that span, or roughly 10 wins in total.) This is a staggering amount of value. Add in the fact that these wins are almost assuredly not properly priced into the free-agent market, and the difference between having a good framing catcher and a bad one can make or break a cost-conscious team.

Method

Calculating Probability for Each Pitch

Rather than identifying a single strike zone and giving binary credit for each pitch relative to that strike zone's borders (i.e., strike or no strike), our model gives partial credit for each pitch based on that pitch's likelihood of being called a ball or a strike. To determine that, we created a probability map of likely calls.
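To make the contrast concrete, here is a minimal Python sketch of binary credit versus partial credit. The function names are our own, and the partial-credit rule assumed here (a called strike earns 1 minus the strike probability, a called ball costs the strike probability) is the residual scheme described in the run-value discussion, shown before any run weighting or pitcher and umpire adjustments.

```python
def binary_credit(called_strike: bool, in_zone: bool) -> int:
    """Binary scheme: +1 for a strike outside a fixed zone, -1 for a ball inside it."""
    if called_strike and not in_zone:
        return 1
    if not called_strike and in_zone:
        return -1
    return 0


def partial_credit(called_strike: bool, p_strike: float) -> float:
    """Probabilistic scheme: a called strike earns 1 - p, a called ball costs p."""
    return (1.0 - p_strike) if called_strike else -p_strike


def expected_partial_credit(p_strike: float) -> float:
    """Average partial credit if the pitch is called a strike p_strike of the time."""
    return (p_strike * partial_credit(True, p_strike)
            + (1.0 - p_strike) * partial_credit(False, p_strike))
```

For an edge pitch that is called a strike 80 percent of the time, the binary scheme gives nothing for a strike if the pitch sits just inside its fixed border, while the partial scheme gives 0.2 of a strike. Note also that a catcher who gets exactly the typical calls nets zero expected credit at every location, so only calls that beat or fall short of the probability map move his total.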

To create this map from the raw data, we used a generalized additive model, or GAM (via the mgcv package in R), which creates a smoothed "surface." Although there are other alternatives for creating smoothed surfaces (Dave Allen popularized the LOESS method), Brian Mills, Carson Sievert, and others have recently adopted the GAM alternative, which has the benefit of estimating the smoothing parameter from the data rather than setting it by hand (as in LOESS). The package also has a function, bam(), designed specifically for fitting large models. And crucially, it supports multi-core processing, without which the processing would have taken so long that we'd be writing this article next year.

To reflect what is best known about the way the size and position of the strike zone shifts from count to count and batter to batter, we ran individual models for each set of batter and pitcher handedness as well as "pitch group" (see Table 1). The smoothing parameters of each model were allowed to vary by count, so that while the general shape of the strike zone derived for each variable combination did not change, the width and height of it did (reflecting, for example, a larger strike zone on 3-0 counts than on 1-2 or 0-2 counts). We also accounted for the changing size of the strike zone from season to season (although these yearly changes are much smaller than the other changes we measured).

Table 1: Pitch Groups

Group | Members
Fastball | Four- and two-seam fastballs/sinkers
Curveball | Standard, spike ("knuckle"), and "slow/eephus"
Slider | Sliders and cutters
Offspeed | Changeup, splitter, screwball
Knuckleball | Knuckleball
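One way to express Table 1 and the per-model split is a simple lookup, sketched below in Python. The pitch labels are illustrative plain-English names, not Pitch Info's actual classification codes, and the model key deliberately leaves out count and season, which, per the text, were handled within each model rather than by splitting models.

```python
from typing import List, Tuple

# Table 1 as a lookup: pitch label (hypothetical names) -> pitch group.
PITCH_GROUPS = {
    "four-seam": "Fastball",
    "two-seam": "Fastball",
    "sinker": "Fastball",
    "curveball": "Curveball",
    "knuckle curve": "Curveball",   # the "spike" curve
    "slow curve": "Curveball",
    "eephus": "Curveball",
    "slider": "Slider",
    "cutter": "Slider",
    "changeup": "Offspeed",
    "splitter": "Offspeed",
    "screwball": "Offspeed",
    "knuckleball": "Knuckleball",
}
HANDS = ("L", "R")

ModelKey = Tuple[str, str, str]  # (batter hand, pitcher hand, pitch group)


def model_key(batter_hand: str, pitcher_hand: str, pitch_label: str) -> ModelKey:
    """Which probability model a pitch belongs to."""
    return (batter_hand, pitcher_hand, PITCH_GROUPS[pitch_label])


def all_model_keys() -> List[ModelKey]:
    """Every handedness / pitch-group combination."""
    groups = sorted(set(PITCH_GROUPS.values()))
    return [(b, p, g) for b in HANDS for p in HANDS for g in groups]
```

With two batter hands, two pitcher hands, and five groups, this yields 20 handedness/pitch-group combinations.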

We also corrected the data in several ways before running these models. First, all pitch classifications were hand-labeled by Pitch Info to eliminate variability in pitch labels. (This is the same improved dataset that powers the BrooksBaseball.net player card pages.) Second, to account for differences in batter height, we normalized the height of each pitch by the batter's height using what is now the standard formula (first published by Mike Fast). Third, we used the correction scheme that Mike published at BP for correcting the X and Y location of each pitch based on the likely distribution of pitch locations each pitcher would use against left-handed and right-handed hitters. The one difference is that we used the LOESS smoothing algorithm rather than a moving average, and tuned it to correct more aggressively for outliers.
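The height adjustment can be illustrated with a deliberately simple stand-in. The sketch below is not Mike Fast's published formula, which we don't reproduce here; it assumes a plain proportional rescaling of the pitch's height (PITCHf/x `pz`, in feet) to a hypothetical 72-inch reference batter, just to show the idea of putting every batter on a common vertical scale.

```python
REFERENCE_HEIGHT_IN = 72.0  # hypothetical reference batter height, in inches


def normalize_pz(pz_ft: float, batter_height_in: float,
                 reference_height_in: float = REFERENCE_HEIGHT_IN) -> float:
    """Rescale a pitch's height as if it had been thrown to the reference batter.

    Illustrative proportional stand-in only; NOT Mike Fast's published formula.
    """
    if batter_height_in <= 0:
        raise ValueError("batter height must be positive")
    return pz_ft * reference_height_in / batter_height_in
```

Under this assumption, a pitch 2.6 feet high to a 6-foot-6 (78-inch) hitter maps to 2.4 feet on the reference scale, so it is compared against other pitches at the same relative height.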

After we created these probability maps, we assigned each pitch a strike probability based on its location, at half-inch resolution.
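A sketch of the half-inch lookup, assuming pitch coordinates in inches (horizontal distance from the center of the plate and height, after the corrections above) and a precomputed probability map stored as a dictionary keyed by half-inch grid points. The map values below are made-up placeholders, not model output, and treating cells missing from the map as near-certain balls is our own assumption.

```python
import math
from typing import Dict, Tuple

GridKey = Tuple[float, float]


def snap_half_inch(value_in: float) -> float:
    """Round a coordinate in inches to the nearest half inch (ties round up)."""
    return math.floor(value_in * 2 + 0.5) / 2


def strike_probability(px_in: float, pz_in: float,
                       prob_map: Dict[GridKey, float],
                       default: float = 0.0) -> float:
    """Look up the called-strike probability for a pitch's half-inch cell.

    Cells missing from the map fall back to `default` (assumed: a near-certain ball).
    """
    return prob_map.get((snap_half_inch(px_in), snap_half_inch(pz_in)), default)


# Hypothetical map fragment for one model (one handedness/pitch-group combination, one count).
example_map = {
    (0.0, 30.0): 0.97,   # middle of the zone
    (9.5, 30.0): 0.55,   # edge of the plate
    (12.0, 30.0): 0.08,  # a few inches off the plate
}
```

A pitch at (9.6, 29.8) snaps to the (9.5, 30.0) cell and picks up that cell's probability; `math.floor(x * 2 + 0.5)` is used instead of `round()` so that exact half-way values don't fall victim to Python's round-half-to-even rule.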

Run Values

Rather than simply give a single credit for each pitch (~.14 runs), as has been done in many previous models, we looked at the count in which each pitch was framed and gave credit equal to the difference in run value between the pitch being called a strike and being called a ball. For example, a frame in an 0-2 count was counted as more valuable than a frame in an 0-0 count, because the call on an 0-2 pitch can produce a large change in run expectancy while the call on an 0-0 pitch does not have quite the same impact.

To be clear, both the positive and negative frames were calculated with that larger difference in an 0-2 count (or in any other count, depending on the run value for that count). We should note that this decision may be somewhat controversial, because counts may be unequally distributed depending on the pitching talent on the catcher's team. However, see the following section on pitcher adjustments, and note that we also provide count-neutral calls above average, which should allow the interested reader to create unbiased estimates if they wish.

The run value for a framed pitch is the run value differential for that count (see Table 2) multiplied by the residual of the probability. In other words, if an 0-0 pitch is called a strike in a spot where it's normally called a strike just 80 percent of the time, the catcher will get 20 percent of the available value (.08), for a total of .016 runs credited (which will later be adjusted based on the pitcher and umpire impact). Failing to get a strike on the same pitch would result in a .064 run deduction (80 percent of .08).

As you can see, a framed pitch on a 3-2 take is worth a lot. How is that number derived? Think of it this way: a strikeout costs the batting team 0.28 expected runs, while a walk earns them 0.31 expected runs. The difference is .59 runs. These 3-2 takes don't happen often (0-0 pitches contribute the most, which is not surprising), and when they do, the catcher just gets credit proportionally as described above. RPM isn't doling out half-runs in a single shot with any kind of regularity.
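The per-pitch run credit can be sketched in a few lines of Python. Only the two counts worked through in the text (0-0 and 3-2) are filled in, since Table 2 isn't reproduced here; the dictionary and function names are our own, and the result is the unadjusted credit before the pitcher and umpire corrections.

```python
# Run-value gap between a called strike and a called ball, by count.
# Only the counts discussed in the text are filled in; the full set is in Table 2.
COUNT_RUN_DIFF = {
    "0-0": 0.08,
    "3-2": 0.31 - (-0.28),  # walk (+0.31) minus strikeout (-0.28) = .59
}
CONSTANT_RUN_DIFF = 0.14  # the single count-neutral value used by earlier models


def framing_runs(called_strike: bool, p_strike: float, run_diff: float) -> float:
    """Runs credited (+) or debited (-) for one taken pitch, before pitcher and
    umpire adjustments: (actual call - strike probability) x run differential."""
    actual = 1.0 if called_strike else 0.0
    return (actual - p_strike) * run_diff
```

For the 0-0 example (run value .08, strike probability .8), a called strike yields 0.2 x .08 = .016 runs and a called ball costs 0.8 x .08 = .064 runs. Passing `CONSTANT_RUN_DIFF` instead of a count-specific value gives the count-neutral ~.14 version for comparison.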

Player Attribution

Because catching necessarily involves pitching, and because pitching talent is not equally distributed across the league, it can be difficult to correctly assign credit for each catcher's contribution to a framing total. For example, if Mariano Rivera, Brian Wilson, or Derek Lowe is your batterymate, you are likely to get more favorable calls than if your batterymate is Andrew Miller, Brandon League, or Micah Owings.

We empirically determined each pitcher's value—to isolate it from each catcher's value—by performing a WOWY ("With or Without You") analysis. We note that we also compared these values to a linear regression model that included pitcher and catcher as separate factors; the high correlation between these measures suggested a good degree of ability to correctly assign credit (or blame) to individual players. The WOWY adjustments provide a viable and modular means of assessing the impact of pitchers on framing.

The adjustments derived from the WOWY analysis reflect two aspects of our approach. First, pitchers who throw a pitch that may not fit the norm for a given pitch group may show some difference in the WOWY results (such as hard cutters in the slider/cutter group). Second, pitchers with better command of a pitch than their peers (or the unqualified respect of the umpire) will seem easier to frame.
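The WOWY idea can be sketched as follows. This is our own simplified version for illustration, not the exact RPM procedure: it assumes per pitcher-catcher totals of framing opportunities and called strikes above expectation (the residuals from the probability model), measures each pitcher's effect as the opportunity-weighted gap between his catchers' rates with him and without him, and then removes that effect from each catcher's total.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Row = Tuple[str, str, int, float]  # (pitcher, catcher, opportunities, extra_strikes)


def wowy_pitcher_effects(rows: Iterable[Row]) -> Dict[str, float]:
    """Opportunity-weighted 'with minus without' gap for each pitcher."""
    pair = defaultdict(lambda: [0, 0.0])     # (pitcher, catcher) -> [opps, extra]
    catcher = defaultdict(lambda: [0, 0.0])  # catcher -> [opps, extra]
    for p, c, n, x in rows:
        pair[(p, c)][0] += n
        pair[(p, c)][1] += x
        catcher[c][0] += n
        catcher[c][1] += x

    weighted_gap = defaultdict(float)
    weight = defaultdict(float)
    for (p, c), (n, x) in pair.items():
        n_without = catcher[c][0] - n
        if n == 0 or n_without == 0:
            continue  # no "with" or no "without" sample for this catcher
        rate_with = x / n
        rate_without = (catcher[c][1] - x) / n_without
        weighted_gap[p] += n * (rate_with - rate_without)
        weight[p] += n
    return {p: weighted_gap[p] / weight[p] for p in weight}


def adjust_catchers(rows: Iterable[Row], effects: Dict[str, float]) -> Dict[str, float]:
    """Remove each pitcher's estimated contribution from his catchers' totals."""
    adjusted = defaultdict(float)
    for p, c, n, x in rows:
        adjusted[c] += x - n * effects.get(p, 0.0)
    return dict(adjusted)


# Hypothetical pitcher/catcher totals, for illustration only.
rows = [
    ("P1", "A", 1000, 50.0),
    ("P2", "A", 1000, 10.0),
    ("P1", "B", 500, 30.0),
    ("P2", "B", 500, 0.0),
]
effects = wowy_pitcher_effects(rows)
catcher_totals = adjust_catchers(rows, effects)
```

With these hypothetical rows, P1 comes out easier to frame than P2, but because both catchers split the two pitchers in the same proportion, the adjustments cancel in each catcher's total. The adjustment only moves a catcher's number when his workload tilts toward easier- or harder-to-frame pitchers.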

The WOWY analysis produced adjustments ranging between -.1 and +.1 called strikes per opportunity and between -.01 and +.01 runs per opportunity. The largest gross beneficiary of easy-to-frame pitchers was Yadier Molina. The perennial Gold Glove winner started the analysis with 127 runs added before giving 60 back to his pitchers. This reflects the command of teammates of the caliber of Chris Carpenter and Adam Wainwright and is no knock on Molina, who still ranks high overall.

The difficulty of framing a particular pitch—the difference between a fastball and a knuckleball—is already accounted for in the probabilistic model. R.A. Dickey's catchers may earn an adjustment above and beyond the credit already given to them for handling a knuckleball if Dickey is harder to catch than his peer group. Dickey actually outperforms the model by a bit, so his catchers get a small deduction.

According to the RPM method, Tom Glavine was a wizard at getting extra strikes, which supports his reputation.

Umpire Adjustments

We also made small, systematic changes to the data based on the umpire who was calling each game. Because umpires are distributed essentially at random throughout the data, they tend to have a very small effect on a measure of framing, even though they can seem to have a large effect within any individual game. For example, if a particularly generous umpire calls a Jose Molina game on Monday and a particularly conservative umpire calls a Jose Molina game on Thursday, each umpire will have shaped the calls in his own game, but Molina's skill will still come through in the aggregate.

To return to our (not) randomly selected Molina: the umpire adjustment cost Yadier just three runs relative to his tally after the pitcher (WOWY) adjustment.

One More Thing: Blocking

But wait, there's more.

The RPM concept can be applied to another catching skill: the prevention of passed balls and wild pitches. For lack of a better term, we'll call this blocking. Sometimes a block is simply catching a pitch. This model uses the spatial location of the pitch (where it struck, or would have struck, the ground) to determine the probability of a passed ball or wild pitch. Pitch types are accounted for, and the model is further adjusted (also via WOWY) for the impact of the pitcher. Each PB or WP prevented is worth a generic .28 runs, allocated proportionally.

This blocking skill is quite real, but not as spectacular as framing. Instead of the top-to-bottom difference in the league being on the order of 50 runs, as it is with framing, the blocking skill range is closer to 10 or 15 runs. It also takes a heavier dose of regression.

You'll find blocking on both the player cards and the sortable stat pages (see the "New Site Features" section below).
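The blocking calculation described in the sidebar above follows the same residual logic as framing. A minimal sketch, assuming the modeled probability that each pitch becomes a passed ball or wild pitch is already available (the location and pitch-type model that produces it, and the pitcher WOWY step, are not shown), and using the generic .28 runs from the text:

```python
from typing import Iterable, Tuple

RUNS_PER_PB_WP = 0.28  # generic run value of a passed ball or wild pitch (from the text)


def blocking_runs(pitches: Iterable[Tuple[float, bool]]) -> float:
    """Runs saved (+) or cost (-) relative to expectation.

    Each item is (p_escape, escaped): p_escape is the modeled probability that the
    pitch becomes a PB or WP; escaped records whether it actually did.
    """
    total = 0.0
    for p_escape, escaped in pitches:
        actual = 1.0 if escaped else 0.0
        total += (p_escape - actual) * RUNS_PER_PB_WP
    return total


# Hypothetical pitches: a tough spiked pitch that was blocked, and a routine
# pitch that got away.
example = [(0.30, False), (0.05, True)]
```

Blocking the tough pitch earns 0.30 x .28 = .084 runs, while letting the routine one get away costs 0.95 x .28 = .266 runs, so this two-pitch sample nets out to about -.18 runs.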

Regression to the Mean

Like other skills, catching involves not only some amount of talent, but also some amount of luck. We've dealt with some of that luck by attempting to correctly attribute runs to catchers (who don't ordinarily get to choose their batterymates), but there are also other sources of luck and inexplicable variability.

To control for this luck, we have regressed career totals to the league average. The amount that we regressed each catcher was based on a measurement of stability for both framing calls and framing runs determined by the intraclass correlation ("ICC") of each measurement. See the "Results" section below for a description of how these correlations were computed and the determination of stability. Because seasonal variability is different from career variability, we also regressed seasonal totals to career totals based on a similar formula.

ICC consistency and agreement both showed that a 50/50 point (where a player's regressed data would consist of 50 percent his own and 50 percent of the mean values) occurred after ~290 framing opportunities (a pitch that isn't swung at and has a called strike probability >0 in our model) for the number of called strikes, and ~430 opportunities for the associated run values. Even the busiest catchers in our sample were regressed to at least .06 percent and .09 percent of the mean for their called strikes added and run values, respectively.
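The regression step can be sketched with the usual shrinkage form, in which the 50/50 point k is the number of opportunities at which a player's own data and the mean get equal weight. The text gives the 50/50 points but not the exact formula, so the weight n / (n + k) below is an assumption, as are the function names.

```python
K_CALLS = 290  # opportunities at the 50/50 point for called strikes added (from the text)
K_RUNS = 430   # opportunities at the 50/50 point for framing runs (from the text)


def own_data_weight(n: int, k: float) -> float:
    """Share of the regressed estimate that comes from the player's own data."""
    return n / (n + k)


def regress(observed_rate: float, n: int, target_rate: float, k: float) -> float:
    """Shrink a per-opportunity rate toward a target rate.

    target_rate is the league mean for career totals; per the text, seasonal totals
    are instead regressed toward the player's career rate (with its own k).
    """
    w = own_data_weight(n, k)
    return w * observed_rate + (1.0 - w) * target_rate
```

At exactly 290 opportunities, a catcher's called-strike rate is pulled halfway to the mean; as opportunities grow, the weight on his own data approaches, but never reaches, 1. The k used for the seasonal-to-career step isn't given in the article, so any value plugged in there would be a placeholder.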

Results

Framing Runs

The big winners in total framing runs are the good receivers who provide enough offense, combined with durability, to pile up innings behind the plate. But you'll also find the likes of Jose Molina, a catcher who wasn't a perennial no. 1 guy until he met a team that properly valued what his glove could do.

Slicing runs into a rate stat, we can see some less-used catchers who stand out, even with their numbers regressed more than those of the everyday guys. Seven thousand opportunities is roughly a full season's workload (some catchers handle over 8,000, but those are unusual seasons), so we've set that as our standard for comparison. Jose Molina distinguishes himself further in this view, and Yasmani Grandal also appears among the elite.
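Here is a sketch of the rate stat, assuming the league mean is 0 runs above average and the same n / (n + k) shrinkage with the 430-opportunity 50/50 point for framing runs; both are our assumptions about how the pieces fit together, not a description of the published pipeline.

```python
K_RUNS = 430       # 50/50 point for framing runs, from the Regression to the Mean section
SEASON_OPPS = 7000  # roughly a full season's workload


def regressed_runs_per_season(runs_above_avg: float, opportunities: int,
                              k: float = K_RUNS) -> float:
    """Regress framing runs toward the league mean (assumed 0 runs above average),
    then express them per 7,000 opportunities."""
    raw_rate = runs_above_avg / opportunities
    weight = opportunities / (opportunities + k)
    return weight * raw_rate * SEASON_OPPS
```

Because the mean is zero, the regressed rate simplifies to runs / (opportunities + k). A hypothetical backup with 2,000 opportunities keeps about 82 percent of his raw rate, while a starter with 7,000 keeps about 94 percent, which is why part-timers who still stand out in this view are notable.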

In 2011, Jonathan Lucroy contributed a net difference of 39 framing runs to his team compared to Carlos Santana. That's somewhere in the area of four wins if we simply assume that "average" is a sufficient proxy for "replacement level" when it comes to this skill (a true analysis of that question waits for another day). If we include blocking, the difference grows.

With the combined tools of RPM framing and blocking (see sidebar), we begin to get a more complete picture of a catcher's value. Carlos Santana has an elite bat, one that's worth getting into the lineup. But beware the hidden costs: factor Santana's receiving and blocking into the equation, and Lucroy looks like the more valuable player, despite his somewhat weaker bat.

Santana's value is not found behind the plate. He is an extreme case where even his loud offensive skills are nearly washed out by his receiving deficiency. Reducing Santana's time behind the dish (see 2013 and his conversion to third base this offseason) brings him back into the value range you're used to seeing next to his name.

External Validation

It's always good to get some validation of so-called advanced metrics. In this case, we contacted two professional catching coaches (former major leaguer Rob Bowen of Red Alert Baseball, and Kevin Wheeler) to get their ratings for close to 30 current catchers. We didn't share our data with the raters, so they weren't influenced by RPM's ratings.

While the agreement isn't perfect (nor should we expect it to be), the correlation between RPM and the combined catching coach ratings is a satisfying .771 (r² = .595).
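The agreement statistic itself is a plain Pearson correlation. The coaches' individual ratings aren't reproduced here, so the sketch below only shows the computation, assuming the two coaches' ratings were combined by simple averaging (the text doesn't say how they were combined) and compared with RPM values for the same catchers.

```python
from math import sqrt
from statistics import fmean
from typing import List, Sequence


def combined_ratings(*raters: Sequence[float]) -> List[float]:
    """Average each catcher's rating across coaches (assumed combination rule)."""
    return [fmean(scores) for scores in zip(*raters)]


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)
```

Usage would look like `r = pearson_r(rpm_values, combined_ratings(coach_a, coach_b))`, with `r * r` giving the r² reported above; `rpm_values`, `coach_a`, and `coach_b` are hypothetical lists aligned catcher by catcher.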

New Site Features

You can find all of this new framing and blocking information in a couple of places on the Baseball Prospectus site. Since our information is based on PITCHf/x, you'll find it only back to 2008, when the system was installed in all parks.

For any catcher who has played since 2008, you'll find a new tab on his player card called "Catching."

Under this new "Catching" section you'll find numbers for framing, blocking, and a combined value for both.

More detailed numbers can be found in a new section in our Sortable Statistics called "Advanced Catching Metrics." Mouse over the "Statistics" tab on the navbar at the top of any BP page, then click on "Sortable Statistics."

Then select the proper report. The advanced catching stats in the sortables and on the player cards will be updated daily during the 2014 season.

Acknowledgments

Kevin Wheeler for his catcher framing ratings. Listen to Kevin on WXOS 101 St. Louis.
Max Marchi for inspiration and guidance at the inception of this process.
Mike Fast for his previous research on this subject.
Russell Carleton for a review of an early draft and our methodology.
Rob McQuown and Bill Skelton for their assistance with website data integration.
And various unnamed analysts who helped with our "Did Bayes Weigh in on Time Travel" inquiry on regressing seasons toward careers.

In table 5b, five of the bottom ten catchers have caught for the Mariners; Mariner (or ex-Mariner) pitchers also show up three times in the bottom 10 in tables 3a and 3b. Is there a possible park effect here?

Interesting question. It's possible. The strike zone adjustments would address calibration issues, but not some other factor that could make things challenging. That's something we should explore as we work on version 2.0.

All of the location data is park-corrected, so any calibration offsets in the cameras are taken care of.

But, the numbers are not park-factor adjusted (as home runs or doubles or whatever might be in a context-neutral stat). It's an interesting idea. Of course, some of those guys are also probably pretty bad at framing pitches, so...

Incredible work. Does the WARP in the chart below for catchers already include base-stealing runs? Also, has there been any study to show the effects of pitch calling, and whether some good pitch framers actually lose this value because they are poor pitch 'callers'?

I'm sorry, but the whole pitch-framing-adds-value argument is like adjusting your personal financial budget based on the likelihood of a cashier giving you the wrong change, or your employer paying you the wrong amount.

I am not discounting the fact that pitch framing is a valuable skill for catchers to have, but it's based on the subjective error of an umpire.

To use your change example, it's more like being a good or bad negotiator at a flea market. Sure, there's a price for everything, and your overall rate will vary based on who you're negotiating with, but above and beyond that, there are people who have a knack for getting a good price.

Catcher framing is a way to assign that skill to each catcher, independent of the umpire. Also, if you read the article, we account for umpires. So...

I may have some of what I'm about to say wrong, and tell me if I do (seriously), but:

One of the principal points of the offensive component of WARP is making everything context-neutral. You don't get extra credit for hitting a home run when it makes the most difference for your team; that sort of thing is accounted for by WPA instead. It strikes me as strange, then, to use count-dependent run values in this work. To my reading, including these numbers in a WARP calculation adds context-dependence that was strictly avoided previously. If 0.14 runs is the average run value of a changed ball/strike (and I understand that it is), why not just use the probability against that value across all counts?

This is a topic of debate. I was on the "use a constant value" side until we realized the importance of resizing the zone by count. So the model is context-aware at its core. Our decision (and we could be wrong) was to retain that context for the run values.
If you look at our sortables, you can see the .14 based values for comparison. The field is "FR_RUNS_ADDED_BY_CALL" which reminds me I need to update the glossary before Ben notices I haven't.

Great work guys. I think this is an interesting question from John. I've always thought it would be wonderful to use count-specific run values, both for descriptive purposes of past seasons but also to get an idea if there are any catchers who show repeatable skill in framing in certain situations (say with two strikes, for example).

I can see how it's a tricky call whether to use these for WARP though, as they do seem to not follow the philosophy used otherwise.

In any case, having all these numbers is wonderful, and how they get treated at the last stage will be up to user preference, I suppose.

Love that you guys had the Pitch Info tags and x,y corrections in place - great additions.

Other factors that I've found to have small impacts on the zone size are out state, base state, and league (NL slightly lower). These would be minor compared to basically everything you've controlled for, and only worth thinking about if you're going to let the model run for months in the background to see if they make any difference at this level.

You *could* use the re-sized strike zone by count, but remove the leverage of the count so that each pitch has the same potential swing of value.

I'm on the fence about this, too. On the one hand, I generally like to have things context-neutral, but pitchers, catchers, and hitters seem to have much more control over what they do in a particular count than they do in a particular base-out situation or WP situation.

My first reaction was that using context-dependent values was a bad decision; however, if the zone changes in certain counts as much as you claim (which I am certainly willing to believe), it makes sense.

We use context-neutral values for things like "the value of hitting a home run" because, as I understand it at least, hitting a grand slam isn't any harder than hitting a solo home run; it's just less likely because the bases need to be loaded. However, framing pitches in certain counts DOES become harder or easier if the size of the zone changes.

Great work, very interesting article; can't wait to see how this develops.

Excellent work! Although I'm going to need to re-read this a time or two because I don't fully absorb the methodology yet.

I do find it heartening that several catcher studies have looked at this in a few different ways now, and the player lists from each study seem to have the same names popping up at the top and bottom.

Fingers crossed that PITCHf/x is not eventually scrapped in favor of a tracking system that is not made public. It would be very unfortunate if MLB makes a series of decisions that result in analyses like this being impossible to undertake in the public domain.

I read this article before reading Ben's (sorry Ben, I started at the top of the main page and worked my way down). Ben covers this topic very well.

Personally, I'm skeptical that the public will get to see much of these data. I know that MLB has a long and storied history of taking things of value and providing them for free as a public service, but something tells me that they will want to monetize this one.

In regard to the pitcher extra strikes/balls analysis (Tom Glavine example), I'm trying to grasp how this should be added to their value. In a sense, it's already at least partially accounted for in strikeouts, walks, and balls in play, right?

Which brings me to a larger point. Since all runs are accounted for somewhere on the field, if we were to add pitch framing to WARP calcs, we would have to subtract it from someone else, no?

Basically, whatever credit we give (or blame we assign) to the catcher is taken away from the pitcher and defense. How that's shared is influenced by, in this example, Glavine's ownership likely being more than the norm. We shall see.
But yes, you have to take it away from someone else if we add it to the catcher, and the first victims will be the dudes on the bump.
Also, if a pitcher had bad framers working for him, whether or not he was a good one, he'll get some credit back.

I think a good compromise for the "count" thing would be to do it the way you are doing it, just in case there is some "skill" at being more or less of a good framer at the various counts, and then normalize the results to an average count distribution. That is pretty much what I do with UZR, to some extent.

Also have to be careful with the regression thing which translates the observed into "skill." Most of the components of WAR and WARP do NOT do that, so if you are adding this or comparing it to other WARP components, you are adding or comparing apples and oranges.

So, are the other catching components on the player card and stats pages regressed also? Such as the catcher blocking?

Are you planning on updating these on a regular basis as the season goes on?

Yes, we will be updating in-season. We'll use the 2013 model and hold off on xy corrections until we have enough to establish a reliable version for each. We'll also have to deal with new pitcher/umpire corrections, so the #s will be squishy day-to-day.

And, yes, the block data is the same RPM approach with fewer factors (we ignore count and season).

1. How repeatable a skill is catcher framing? Can we (or, say, PECOTA) use it to predict future results?

2. If the same catcher tends to catch the same set of pitchers (and those pitchers tend to be caught by that catcher), we don't really have much ability to attribute the skill of getting the calls to one or the other, right? Are those catchers regressed to the mean? Is the extent of that problem quantified?

3. It's very difficult to get a sense of how good this information really is. Do you feel like you've got this skill nailed 99% for every catcher in baseball (as strong as, say, the ability to quantify offensive contributions), something less than that but still fairly strong (like some of the quantitative measures of ordinary defense), or somewhere less than that?

Simple 3-2-1 weighted projections correlated at something like .85 with next-season performance (we'll re-run that; it's been a while), so the skill is stable. Dan did some more analysis on that, and maybe he'll chime in; in short, we haven't found the aging curve yet.

The WOWY analysis attempts to tease apart the pitchers and catchers. We have a fair amount of confidence in that particular aspect, but it's something that we intend to tune and explore in future versions.

I think we have a lot more to learn about framing, but we're happy with the direction we're moving with this model (warts and all).

This seems to be a very persistent skill, one not likely to change in any large way. Which makes me wonder if it is a skill that can be learned and improved through coaching, if it improves with age and similarly declines, or if it is just Born Catcher Magic/Jedi Mind Trick.

I wouldn't be surprised if a certain portion of pitch framing is actually just adaptive pitch calling. That is, knowing what location and pitch type a specific umpire is most likely to call a strike erroneously and trying to get your pitching staff to locate there more often. That would be a skill, obviously, but a very different kind of skill.

This seems fairly unlikely. Umpires do not differ so significantly, pitchers are not able to hit targets so reliably, and all the available scouting evidence suggests that framing has more to do with catching mechanics than pitch calling.