Baseball ProGUESTus

Explaining Mistake Splits

Most of our writers didn't enter the world sporting an @baseballprospectus.com address; with a few exceptions, they started out somewhere else. In an effort to up your reading pleasure while tipping our caps to some of the most illuminating work being done elsewhere on the internet, we'll be yielding the stage once a week to the best and brightest baseball writers, researchers, and thinkers from outside of the BP umbrella. If you'd like to nominate a guest contributor (including yourself), please drop us a line.

Evan Petty is a 22-year-old lifelong student of the game who’s studying Magazine Journalism and Applied Statistics at Syracuse University. Raised on the North Shore of Massachusetts, Evan remains an ardent J.D. Drew defender.

As a player, Evan was a catcher who was more Molina than Mauer. He moved to Singapore in High School and became the first Massachusetts native to win a baseball championship on two continents and be named to an All-Southeast Asia Slow-Pitch Softball Team in the same season.

At some point in a given baseball game, there’s a good chance that the broadcaster will mention a “mistake pitch.” A two-seamer might run over the heart of the plate, or an errant fastball might miss up in the zone. Perhaps a slider doesn’t bite and ends up in the upper deck.

“The pitcher made a mistake and he didn’t miss it!”

The fear of making a mistake is what makes pitching one of the more stressful jobs in sports. It’s what makes pitching inside so terrifying. And it’s what keeps fans on the edge of their seats even when the opposing pitcher appears unhittable.

Every pitcher makes mistakes. Some stress making as few as possible and others aren’t as fastidious. Everyone knows mistakes happen, but nobody knows when. There will always be uncertainty about the quality of every pitch, a value I’m surprised analytics ignores. Mistake analysis aims to fuse the two.

In May of 2013, a friend and I watched the Red Sox and chatted about Will Middlebrooks. For a player who seemingly had so much talent, his production seemed to come and go. We rationalized that Middlebrooks is a streaky hitter, which is not exactly earth-shattering. But the more I thought about it, the more I realized that most of Middlebrooks’ production came on similar types of pitches. He’s not a hitter who turns a tough pitch around. But pitchers know not to leave one out over the middle to him.

And therein lies the question: Do Middlebrooks’ ups and downs have more to do with what he does at the plate or the pitches he gets? And are some hitters more dependent on the pitch they get than others?

I started watching the game with these questions on my mind, looking for differences in styles. It became evident that considering the specific pitch when examining a batter’s production is important. People are quick to attribute results to what the batter does, but perhaps the pitches he sees have something to do with these results.

I began developing a split that works just like any other split in baseball by dividing a player’s total production into different categorical variables. Instead of analyzing how much of a player’s production comes at home versus on the road, or during the day versus night, mistake splits divide a player’s total production into whether it came against a mistake pitch or not. I started watching tape, taking data and spitting out thoughts in hopes of finding anything interesting. Here’s where I try to explain some of my thinking, discoveries, and further questions derived from my analysis of mistakes.

How to Record Mistake Splits
The intuitive first question asks what defines a mistake. A mistake is defined as the following:

A pitch to an unintended area of the hitting zone that clearly gives the batter a better opportunity to produce.

The two italicized words, “unintended” and “clearly,” are key in defining a mistake pitch. The intended location potentially differentiates two identical pitches. Grooving a 3-0 fastball down the middle isn’t a mistake if it’s what the pitcher meant to do. Missing there 0-2 almost certainly is. The definition refers to location only, which rules out scouting or decision-making. Practically, some consider a pitch a mistake to Miguel Cabrera but not to Dustin Ackley. This notion stems from Cabrera’s ability to consistently hit certain non-mistakes that Ackley doesn’t.

“Clearly” refers to the grader’s binary opinion and not the degree of mistake. The assessment is only a yes or no and doesn’t take the degree of mistake into account; “clearly” doesn’t mean the mistake must be extreme. It’s not the graders’ responsibility to differentiate mistake from mistake; it’s to distinguish mistake from non-mistake.

It’s important to note that a pitch can also miss badly without necessarily being a mistake. If a pitcher misses enough on the inside corner with a pitch that he wanted outside, it may not be a mistake. Identifying a mistake involves knowledge of the situation, of the definition of a mistake, and of the game of baseball.

Yes, mistake splits are subjective. They aren’t the first subjective figure in sports and they won’t be the last. That’s because subjective statistics can still be precise.

Accuracy vs. precision is probably best explained with the diagram below.

Accuracy refers to closeness about a fixed or universal point. Precision refers to closeness about the other points. This applies to data, too.

The working definition of a mistake isn’t completely accurate, and different graders surely conflict in spots. The grading is already very accurate, though, and will continue to improve. In a couple of comparisons, graders assessed pitches with about 90 percent overlap. But unless there’s perfect science, perfect accuracy can’t exist. Mistake data are precise, which helps validate them to a large extent. So long as the criteria remain the same through an entire data set, the set will be precise. The key is consistency.
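The 90 percent overlap figure can be read as simple percent agreement between two graders who watched the same pitches. The sketch below is only an illustration under that assumption; the function name and the ten grades are hypothetical and are not drawn from the actual comparisons.

```python
def percent_agreement(grades_a, grades_b):
    """Share of pitches on which two graders made the same mistake / non-mistake call."""
    if len(grades_a) != len(grades_b):
        raise ValueError("both graders must grade the same pitches")
    if not grades_a:
        raise ValueError("need at least one graded pitch")
    matches = sum(a == b for a, b in zip(grades_a, grades_b))
    return matches / len(grades_a)


# Hypothetical grades for ten at-bat-ending pitches (True = mistake).
grader_1 = [True, False, False, True, False, False, True, False, False, False]
grader_2 = [True, False, False, True, False, True, True, False, False, False]

overlap = percent_agreement(grader_1, grader_2)
print(f"Graders agree on {overlap:.0%} of pitches")
```

Percent agreement says nothing about which grader is right on the pitches where they disagree; it only measures how consistently the same criteria are being applied.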

Consider an umpire calling balls and strikes. Guidelines determine strikes versus balls, similar to defining mistakes versus non-mistakes. All umpires have slightly different strike zones, even though the separation shrinks every year. As long as pitches are graded consistently, there’s validity. And there’s no reason why grading accuracy won’t improve with repetition, like any other subjective data.

Walks are the most obvious omission from mistake analysis. Mistake splits include only at-bats that end with a swing—strikeouts or balls in play. Walks count only when the bases are loaded—when they’re considered mistakes. Batters hit by pitch and wild pitches follow the same guidelines.

Other than walks and hit batters, mistake splits ignore baserunning, which mostly affects pitchers’ splits. Mistake splits don’t reflect a third of an inning pitched when a runner is caught stealing. This explains why some pitchers’ splits don’t reflect their exact innings pitched total. Pitchers also fail to receive credit for an out when a batter reaches on an error. The batter gets credited with an 0-1 in such a situation, but the pitcher doesn’t, which means that mistake splits potentially inflate a pitcher’s BAA slightly.

Other oddities include inherited runners. The pitcher who allows an inherited runner to reach base gets hit with the earned run, as per usual. If an inherited runner reaches base on a mistake and scores on a non-mistake, the run is a mistake earned run charged to the pitcher who put him on.

With the exception of Earned Production (which is defined in the glossary below), stats recorded are common baseball metrics. For a batter, hits, outs, and strikeouts are split into either mistake or non-mistake. Splits show the percentage of a batter’s production that comes against a mistake versus a non-mistake as well as his production rate against each. Below is partial data for Matt Holliday of the St. Louis Cardinals through 231 at-bats of the 2013 MLB season.

His total numbers are split based on whether they came off a mistake or a non-mistake, and the split also shows the percentage of his home runs, RBIs, and strikeouts that coincide with a mistake or non-mistake pitch. The “rate” numbers at the far right indicate how many at-bats Holliday averages between each hit, home run, RBI and strikeout. For instance, Holliday averages a home run every 11 at-bats that end on a mistake pitch. He averages a homer for every 35.2 at-bats that end on a non-mistake.
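To make the “rate” and percentage columns concrete, here is a minimal sketch of the arithmetic. The at-bat and home run counts are assumptions, chosen only because they add up to 231 at-bats and reproduce the two rates quoted above; they are not read from Holliday’s actual table, and the function names are hypothetical.

```python
def at_bats_per_event(at_bats, events):
    """Rate as described in the article: average at-bats between events."""
    if events == 0:
        return float("inf")  # the event never happened in this split
    return at_bats / events


def share_of_total(part, total):
    """Fraction of a player's total production that falls in one split."""
    return part / total if total else 0.0


# Assumed counts (illustrative only).
splits = {
    "mistake": {"at_bats": 55, "home_runs": 5},
    "non-mistake": {"at_bats": 176, "home_runs": 5},
}

total_hr = sum(s["home_runs"] for s in splits.values())
for name, s in splits.items():
    rate = at_bats_per_event(s["at_bats"], s["home_runs"])
    pct = share_of_total(s["home_runs"], total_hr)
    print(f"{name}: one HR every {rate:.1f} AB, {pct:.0%} of all HR")
```

Flipping the rate to events per at-bat is just the reciprocal, so either presentation can be produced from the same counts.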

Pitchers’ numbers work the same way. Lance Lynn’s splits through 85 innings of the 2013 MLB year are below.

Each out Lynn records marks a third of an inning. The split identifies whether hits, earned runs, and home runs came off of mistakes or non-mistakes and shows how his production and production allowed break down. Strikeout rate and home run allowed rate indicate how many batters on average Lynn faces between each strikeout or homer allowed.

Here is some more sample data, so you can get a feel for what exactly mistake splits measure. You can see each player’s effectiveness against both a mistake and non-mistake as well as how their total production is divided between mistake pitches and non-mistake pitches.

The column labeled “Quest” indicates how many grades were questionable. Below is an example of a mistake box score, which displays one full game. The column furthest to the right indicates the inning in which the questionable grade came. The top row for each player represents his mistake splits, and the bottom, his non-mistake splits.

What Mistake Splits Measure
Mistake data has conceptual and practical application to the game. This section focuses on both applications and their worth. The conceptual application distinguishes styles and helps explain under/overachievers.

Consider the classic power hitter who strikes out often and hits a lot of home runs. Against this type, pitchers work meticulously to avoid mistakes because these batters prioritize driving a pitch over just putting it in play. Often described as “all or nothing” hitters, mistake hitters crank pitches misplaced in their wheelhouse but prove to be pretty easy outs when the pitcher hits his spot due to limited plate coverage. Think of guys like Adam Dunn, Jay Bruce, and Mike Napoli.

Some batters aren’t as dangerous against mistakes but distribute production across a wider range of pitches. While Adam Dunn’s production might be heavily clustered among hittable pitches down the middle and up, Dustin Pedroia’s production comes off of a flatter distribution of pitches across the entire strike zone. His production isn’t as reliant on pitch location.

Mistake data still applies to pitchers but, at least initially, it’s more applicable to hitters. Pitchers typically either make more mistakes with stuff that’s harder to hit or seldom make mistakes with more hittable stuff. The latter would be a Bruce Chen, Bronson Arroyo type, while the former might describe a prototypical reliever who has nasty stuff but makes too many mistakes over the middle.

Distinguishing players’ styles is the first conceptual application of mistake splits. It also highlights the most important concept of this paper: that different batters depend on what the pitcher does to varying degrees.

One of the most frequent questions in baseball has to be why a player is over/underachieving for a given stretch of time. Why is Jose Iglesias hitting .440 since being called up? Ray Lankford had a 143 OPS+ last year. Why is it under 100 this year? The baseball world has identified a million explanations for why production strays from expectation. But the pitches themselves might be more overlooked than anything else.

The final section entertains where this trend of quantifying and rating each pitch might go. Mistake splits do a so-so job of explaining stretches of unusually high or low production through pitch quality, but they’re also very basic. For now, noting that a player’s awesome production comes with an abnormally high number of mistake pitches, or that another’s struggles coincide with a stretch of tough pitching, has to do. A batter’s fortune depends a lot more on what the pitcher does than we often credit. And mistake splits provide access to information that might be otherwise ignored. This application remains conceptual because of how underdeveloped it is (covered below in the “What it Means—The Future” section); the practical application has a lower ceiling but is more valuable at this time.

Mistake splits naturally fall into a five-part model that gives a manager practical, game-to-game application. The splits identify a hitter’s effectiveness against a mistake and a non-mistake and the pitcher’s effectiveness when he throws a mistake and a non-mistake, as well as mistake frequency. Those five variables are synthesized to project a result. The projection already makes mistake splits more valuable than the established numbers managers and other forecasters still use to assess matchups. Like any other platoon, inserting a mistake hitter into as many situations as possible against a pitcher who makes many mistakes gives him an opportunity to maximize efficiency. Of course, many other variables combine to forecast a matchup: handedness, career numbers, hot zones, ability to hit specific pitches. Mistake splits are just another tool.
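The article does not spell out how the five variables are synthesized, so the following is only one possible sketch, not the actual model. It assumes the hitter’s and pitcher’s averages in each bucket are combined with a plain average, and that “mistake frequency” means the share of the pitcher’s at-bats that end on a mistake. All names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Matchup:
    hitter_avg_mistake: float       # hitter's average on at-bats ending on a mistake
    hitter_avg_non_mistake: float   # hitter's average on at-bats ending on a non-mistake
    pitcher_baa_mistake: float      # average allowed by the pitcher on mistakes
    pitcher_baa_non_mistake: float  # average allowed by the pitcher on non-mistakes
    mistake_rate: float             # assumed: share of at-bats ending on a mistake


def projected_average(m: Matchup) -> float:
    """Weight each bucket's blended average by how often that bucket occurs."""
    if not 0.0 <= m.mistake_rate <= 1.0:
        raise ValueError("mistake_rate must be between 0 and 1")
    # Assumption: a simple average blends the hitter and pitcher in each bucket.
    on_mistake = (m.hitter_avg_mistake + m.pitcher_baa_mistake) / 2
    on_non_mistake = (m.hitter_avg_non_mistake + m.pitcher_baa_non_mistake) / 2
    return m.mistake_rate * on_mistake + (1 - m.mistake_rate) * on_non_mistake


# Hypothetical mistake hitter against a mistake-prone pitcher.
matchup = Matchup(0.400, 0.220, 0.380, 0.200, 0.25)
projection = projected_average(matchup)
print(f"Projected average: {projection:.3f}")
```

Raising or lowering `mistake_rate` in this sketch is also a quick way to see how a projection might shift in a setting where mistakes are scarcer.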

The same knowledge allows for projection based on a change in mistake rate. This applies to a postseason atmosphere where mistakes may be scarce. The notion that non-mistake hitters tend to thrive in the playoffs while mistake hitters suffer has yet to be confirmed or rejected. But it’s an interesting angle in an era very much dominated by “Moneyball” ideals. Nothing productive would come from opposing the spirit of Moneyball, but looking at another side of the game with the same type of methodology can be productive. That’s mistake analysis’ focus.

The pitch a batter hits heavily influences the play’s result. Another interest is figuring out the extent to which a batter controls the pitch he hits. The pitcher surely has a big hand in this, but does the batter? Why do some batters see more mistakes than others? While the pitcher controls the ball’s location, the batter determines which pitch he swings at. Both randomness and batting approach dictate the batter’s determination.

Although it’s unusual, batters experience extreme fluctuations in opportunity. Of course, extreme patterns get less common as the sample size grows. Many times, the batter does not have much, if any, control over the pitch that ends an at-bat, but sometimes he does, thanks to his batting approach.

An “approach” at the plate encompasses many aspects of hitting. The most prominent aspects are skills like patience, pitch recognition, and anticipation. But the best word might be discipline.

Improving mistake rate with patience works much as on-base percentage generates offensive production. Whenever a batter gets on base, he doesn’t make an out, and therefore extends the inning. The longer a lineup extends an inning, the more expected runs it scores. Similarly, patient batters see more mistake pitches. Free-swingers tend to end at-bats on non-mistake pitches more often than patient batters with knowledge of the strike zone.

Disciplined hitters don’t only hit more mistakes due to patience. A sense of what to swing at and when limits bad swings against a “pitcher’s pitch.” Pitchers make many mistakes that go unnoticed throughout an outing, because mistake splits record only pitches that end at-bats. A batter’s mistake frequency increases if he hits a higher rate of the mistakes available to him. The more refined a batter’s pitch recognition, the higher his mistake rate will be. Conversely, a batter’s mistake rate suffers when he chases pitches. Numbers like O-Swing Percentage and Z-Swing Percentage help quantify a batter’s discipline.
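For readers unfamiliar with the two discipline numbers, O-Swing Percentage is the share of pitches outside the strike zone that a batter swings at, and Z-Swing Percentage is the same share for pitches inside the zone. A minimal sketch of the calculation follows; the pitch list is hypothetical and the function name is my own.

```python
def swing_rates(pitches):
    """Return (o_swing, z_swing) from (in_zone, swung) boolean pairs."""
    zone_swings = [swung for in_zone, swung in pitches if in_zone]
    chase_swings = [swung for in_zone, swung in pitches if not in_zone]
    z_swing = sum(zone_swings) / len(zone_swings) if zone_swings else 0.0
    o_swing = sum(chase_swings) / len(chase_swings) if chase_swings else 0.0
    return o_swing, z_swing


# Hypothetical sample: four pitches in the zone, four outside it.
pitches = [
    (True, True), (True, True), (True, True), (True, False),
    (False, True), (False, False), (False, False), (False, False),
]

o_swing, z_swing = swing_rates(pitches)
print(f"O-Swing: {o_swing:.0%}, Z-Swing: {z_swing:.0%}")
```

A low O-Swing paired with a healthy Z-Swing is the profile one would expect from the disciplined hitter described above.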

If a batter’s approach dictates the pitches he sees, which dictates his results, the next question attempts to lengthen the chain. What dictates approach? And does it fluctuate? The answers piece together an explanation of how streaks manifest.

What it Means—Streakiness
At every level of baseball, hitters with low confidence levels tend to struggle. Hitters who press at the plate chase bad pitches and react late to good ones, while a confident batter often benefits from a more relaxed mentality, rarely chasing and often capitalizing on mistakes. So to answer the first question posed at the end of the previous section, confidence seems to influence at least some of approach. And if confidence impacts approach, it seems likely that at least some approaches fluctuate through the season (and from season to season), because confidence is rarely constant. Hypothetically, confidence could be responsible for some of a batter’s variation in approach over time.

An unproven but believable assumption is that confidence relates directly to recent production. In other words, confidence is higher when a player is hitting well, and lower when he isn’t. The relationship between production and confidence, explained by approach, helps outline the way in which a streak develops.

Production impacts confidence, which impacts approach, which impacts production. This hypothesis doesn’t disclose the degree of each variable’s impact on the others, but it outlines the structure. With it, we expect a hitter’s production to increase with confidence, and his confidence to increase with production. But does a player hit well when he’s confident or is he confident when he hits well? Even more likely is that confidence and production spiral together to build an element of streakiness.

Many smart people have spent a lot of valuable time scientifically testing whether streakiness exists. And both sides have compelling mathematical arguments. Streakiness is often explained as a stretch of time when a batter experiences unnaturally low or high levels of production. This is true, but under this definition, “streaks” also manifest due to variation alone. Just because a batter has eight hits in his last 17 at-bats doesn’t necessarily mean he’s hot. The true notion of streakiness suggests that something going on actually makes him a better hitter in those 17 at-bats. Being “hot” means a batter’s expected production is influenced by something that’s already happened or is happening. Whatever that may be spirals back and forth in a cycle with production to create a streak. The “cycle” in the example of production, confidence, and approach looks like the following:

The production cycle suggests, generally, that production leads to even more production while struggles lead to more struggles. This assumes that the player’s confidence is volatile to an extent.

Something has to kick-start this cycle, though. Something that initially affects production, which will then affect confidence, which will affect approach, which will cycle back to affect production. There are a ton of nuances of baseball that could suffice. Facing a familiar pitcher or having a couple weak grounders find a hole might get a hitter going. Getting a couple mistakes to hit might also do the trick.

Replace the initial “production” with “mistakes,” and an intriguing causal relationship applicable to mistake splits exists. Mistakes lead to production, then production seems to lead back to mistakes, at least to a degree. The quality of pitches a batter has to hit might well touch off this spiral. According to the relationship, a good pitch to hit leads to success, and success leads to more good pitches to hit because of production’s impact on confidence and confidence’s impact on approach. Hitting bad pitches leads to struggles, which lead to hitting more bad pitches due to poor approach.

The quality of pitches that a batter has to hit doesn’t completely dictate streakiness. But it does seem to potentially explain a small portion of what is an extremely complex phenomenon.

What it Means—The Future
I write about what mistake analysis can do in theory, but in practice, it’s difficult to do much at present. The current rewards of mistake data are not ripe enough for the time-consuming process involved. But mistake splits are part of something bigger. They start the movement toward attempting to quantify the value of every pitch thrown. And if there’s one thing to take away from mistake analysis, it’s that mistake splits are a primitive form of a future trend in the game.

Mistake splits quantify a pitch in the simplest form by categorizing it as either a mistake or a non-mistake. Baseball people have categorized pitches throughout history, starting with names. Fastball, slider, curveball and changeup are nominal data. One fastball is different from another fastball, but there’s no way to distinguish them by name alone. Velocity and movement are interval data, but “98 MPH” and “98 MPH” aren’t necessarily practically equivalent despite being physically the same. Not all 98 MPH fastballs are created equal, for a variety of reasons. The end game of mistake analysis looks to data that give free rein to distinguishing one pitch from another, where a “98” is entirely equivalent to another “98” and “97” is definitely better than “96.”

Mistake analysis primitively displays the start of this process. Categorizing a pitch as either a mistake or a non-mistake is one of the few ways to assess the pitch itself, and the only way that sorts production accordingly. The next step is to distinguish good from good and bad from bad. To quote myself above:

It’s not the graders’ responsibility to differentiate mistake from mistake; it’s to distinguish mistake from non-mistake.

The long-term goal differentiates mistake from mistake. This task gets harder because of all the variables involved: exact location, trajectory, and nastiness, among others. Future graders of mistake analysis can consider all relevant variables to better quantify each pitch. They will also certainly not be humans. Once a more in-depth means of quantifying a pitch is set, a computer could calculate and store the quality of every pitch thrown.

The process is long, and the end lies somewhere far in the future, but it must start somewhere. Mistake analysis is a movement more than anything else—the tip of the iceberg. Only time will tell how big the iceberg actually is.

Glossary

Earned Production (EP)
The ratio of the proportion of non-mistake production in a specific statistical category to the proportion of total non-mistake at-bats. This is calculated by taking the proportion of non-mistake production in a single category and dividing it by the proportion of non-mistake at-bats.

Theoretically, a player’s EP always lands somewhere between 0 and 1.00 because of the literal definition of “mistake.” The working definition of “mistake” reads that the batter “clearly gets a better opportunity to produce.” So greater production off of mistakes is an assumption. Hypothetically, a player with an EP > 1.00 can exist, however.

In the EP columns, the Earned Production is the number that coincides with the non-mistake split. Under the EP column, the number that coincides with the mistake split represents the same ratio, but for mistakes. It takes the proportion of mistake production and divides it by the proportion of mistake at-bats. Theoretically, this number should always be greater than 1.00 for the same reason EP of non-mistakes is always between 0 and 1.
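As a worked illustration of the EP definition, the sketch below computes the ratio for both splits in one category. The counts are hypothetical (10 home runs in 231 at-bats, half of them on non-mistakes) and the function name is an assumption, not part of the original spreadsheet.

```python
def production_ratio(split_events, total_events, split_at_bats, total_at_bats):
    """Proportion of production in a split divided by proportion of at-bats in it."""
    if total_events == 0 or split_at_bats == 0 or total_at_bats == 0:
        raise ValueError("need nonzero totals and at least one at-bat in the split")
    production_share = split_events / total_events
    at_bat_share = split_at_bats / total_at_bats
    return production_share / at_bat_share


# Hypothetical home run counts for one hitter.
total_ab, total_hr = 231, 10
non_mistake_ab, non_mistake_hr = 176, 5
mistake_ab, mistake_hr = 55, 5

# Earned Production: the ratio for the non-mistake split.
ep = production_ratio(non_mistake_hr, total_hr, non_mistake_ab, total_ab)
# The counterpart shown next to the mistake split.
mistake_ratio = production_ratio(mistake_hr, total_hr, mistake_ab, total_ab)

print(f"EP (non-mistake): {ep:.2f}")
print(f"Mistake counterpart: {mistake_ratio:.2f}")
```

With these counts the non-mistake figure falls below 1.00 and the mistake figure lands above it, which is the pattern the definition expects.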

Grader
The person who assesses whether the pitch was a mistake or not.

Mistake
A pitch to an unintended area of the hitting zone that clearly gives the batter a better opportunity to produce.

Mistake Hitter
A batter whose total production depends heavily on mistake pitches.

Reading this feels like reading the early writing on catcher framing. In the early years of BP we could regularly expect to learn about surprising findings on a grand scale, but that's gotten much harder to pull off.

You've pulled it off as far as I'm concerned. Thank you. Please write more, and publish it here!

Very nice thinking. I agree with you completely that the direction analysis will go is in per-pitch evaluation.

The problem is that before becoming useful it needs to be scalable. You've created a definition of a mistake that's precise, but it's too much work to collect it to really test if it's accurate. The breakthrough will be to define a mistake in terms of location, count, and sequence so that it can be examined on a large scale.

Each pitch will already have a worth based on type, count, and location. The trick will be to add in sequencing filters and find which ones tilt the league-wide average toward the batter. The stronger the tilt, the better the definition of a mistake.

Thanks for the reaction. And I agree with everything you're saying here. My biggest goal going into this was to introduce the concept of splitting a player's numbers based on something that quantifies the pitch itself. And then talk a little bit about how it can be valuable. Like I talk about, the "mistake" and "non-mistake" categories are simply the easiest way I could find that answered the question: "is the pitch good or bad?" Like you're saying, as you add variables to this means of quantifying each pitch you begin to really have something. At least I believe.

A team I was a part of a couple weeks ago pitched this idea at an MLB.com college challenge. We had to introduce a product that used big data from PITCHf/x. We created the idea of this statistic called IPI - isolated pitch index. We didn't introduce a formula but rather this idea and some of the variables that would be involved. It would allow us to rate each pitch on a 1-100 scale based on its overall quality. Obviously, computers become necessary here and it goes from a subjective grade to a hard science.

Very interesting stuff. I think you've hit the nail on the head with the basic premise that the pitcher is often as much responsible for the outcome of an at-bat as the batter. The most patient hitter in the world still needs a pitcher to actually throw him four balls to draw a walk.

As draysbay mentioned above, the trick will be refining this to a level where it becomes more broadly usable. But to me, I think this is a fascinating start and might eventually have the potential to clarify a lot of what we currently view as luck and uncertainty.

This is great stuff, Evan. Reminds me of when we first wrote about hang time at the Hardball Times. Now it's being tracked by all the major stats companies.

I worry that the classification of pitches might be influenced by the outcome. If this happens a lot, I don't agree that precision is good enough because I wouldn't have faith that precision is a proxy for accuracy.

If I may make a couple of suggestions about your stats, I think your rate stats are backwards. I believe they show at-bats per thing, when they should be showing thing per at-bat. Also, instead of pegging ERA to when the ball is hit, I suggest using FIP or some other ERA estimator to better capture the true impact of mistakes.

Thanks for the feedback! The outcome of the at-bat skewing the grading of the pitch was the most challenging bias I faced when I graded myself. I was aware of it and tried to be wary. Still, I'm sure it played a role. One of the very many limitations to this.

As for the comment about rate stats, are you suggesting to flip them so it will be a number between 0 and 1? Home runs per every one at-bat? Just want to be sure.

Evan, I said this on Twitter, but I will repeat it here. I know that this work is preliminary, but it is very strong and a major conceptual step forward. The ability to mix methods and data sources, blending the quantitative with the qualitative will drive the field forward in some very interesting ways. Thanks for conceiving of this and sharing with us.

This is very cool analysis. I'm interested to see where it goes from here since this is obviously far from being fully developed. I'm assuming one thing you looked at in trying to identify mistake pitches is where the catcher sets up and I'm wondering if in the future PITCHf/x will be able to look at the catcher to better determine mistakes from non mistakes or if it's only possible to gather mistake data by a bunch of people watching video of every pitch.

I think a better way to determine a hitter's approach and ability to hit mistake pitches would be to look at things like whiff percentage and swing percentage on mistakes as well as maybe hard hit average, just to eliminate the defensive variable.