When a pitcher meets a hitter

Brad Lidge throws two pitches, a fastball and a slider. The fastball comes in at 92 miles per hour with a nasty amount of rise, while the slider breaks down sharply, making many a hitter look very silly when it’s on. A couple of months ago, Justin Inaz of Beyond The Box Score asked a very simple question: How often should Lidge throw each pitch?

The answer to that question lies in game theory, a topic which sounds a lot scarier than it actually is. Let’s think about this from Lidge’s perspective. What is his goal? Well, optimally Lidge wants to keep the hitter guessing; he doesn’t want the batter to feel that he can “cheat” by looking for one pitch or another and be better off doing so.

If the batter felt he could successfully cheat, say by looking for the slider, Lidge would want to throw fastballs until the batter no longer felt he was better off sitting on the slider. And conversely if the batter was sitting on his fastball, Lidge would want to throw sliders until the hitter realized that looking fastball was not such a good idea. Either way, we would eventually get to an equilibrium where the hitter no longer feels he can benefit from looking for a specific pitch, and at the same time, it would be an equilibrium where Lidge no longer feels that he can benefit from throwing a specific pitch every time.

In other words, both Lidge and the hitter would be indifferent. Now, if you don’t know game theory I suggest you re-read those previous two paragraphs and get comfortable with the logic because otherwise you might find the next step kind of kooky: To determine how often Lidge should throw each of his two pitches, we need to look at things not from his perspective, but from the hitter’s. How can that be? Well, Lidge’s goal is to make the hitter indifferent between looking for a slider and a fastball, so to know at what point the hitter is indifferent we need to look at his payoffs.

I’m going to use an example lifted from Justin’s article to try to illustrate that point. Let’s say that when Lidge throws his fastball and the hitter expects a fastball, the hitter generally “wins” the confrontation. In fact, his expected payoff in that scenario is 3 (a meaningless number Justin grabbed out of thin air, but which you can imagine as runs per 100 pitches or something if you’d like).

If, on the other hand, the batter expects a fastball but Lidge throws a slider instead, the hitter does quite poorly and gets a payoff of -3. If the hitter is looking slider and Lidge throws a slider, he does well, though not quite as well as in the fastball/fastball situation (since the slider is Lidge’s best pitch, and therefore still harder to hit) and gets a payoff of 1.5. If he is looking slider and gets a fastball, the hitter suffers, though again not quite as badly as when he was looking fastball and Lidge threw a slider, receiving on average a payoff of -1.5.

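Here are all these outcomes summarized in chart form (the hitter’s payoffs):

Looking fastball, Lidge throws fastball: 3
Looking fastball, Lidge throws slider: -3
Looking slider, Lidge throws fastball: -1.5
Looking slider, Lidge throws slider: 1.5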

So how often will Lidge throw each pitch? Remember that Lidge wants the hitter to be indifferent between looking for a fastball or a slider (otherwise, the hitter can “cheat”). If Lidge throws his fastball with some probability, p, and his slider therefore with the probability, 1 – p, then the hitter’s payoff from looking for a fastball will be:

3*p + -3*(1 – p)

The hitter’s payoff from looking for a slider, meanwhile, will be:

-1.5*p + 1.5*(1 – p)

In equilibrium, those payoffs must be equalized. In other words:

3*p + -3*(1 – p) = -1.5*p + 1.5*(1 – p)

If you do the math, you find that p = 0.5, so Lidge will throw his fastball 50 percent of the time (and his slider, of course, the other 50). What if he doesn’t? Well, to see the consequences of that, we first have to calculate the hitter’s equilibrium actions.
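Because the indifference condition is linear in p, it can be solved in closed form. The sketch below is only an illustration under my own naming (`indifference_mix` is a hypothetical helper, not something from Justin’s article): it solves a*x + b*(1 – x) = c*x + d*(1 – x) for x, and uses exact fractions so the answer comes out as exactly 1/2 rather than a rounded float.

```python
from fractions import Fraction


def indifference_mix(a, b, c, d):
    """Solve a*x + b*(1 - x) = c*x + d*(1 - x) for x.

    The left side is the payoff of one response when the opponent picks his
    first option with probability x; the right side is the other response.
    Collecting terms gives x * ((a - b) - (c - d)) = d - b.
    """
    denom = (a - b) - (c - d)
    if denom == 0:
        raise ValueError("no unique mixing probability")
    return (d - b) / denom


# Hitter's payoffs from Justin's example:
# looking fastball: 3 vs. a fastball, -3 vs. a slider
# looking slider: -1.5 vs. a fastball, 1.5 vs. a slider
p = indifference_mix(Fraction(3), Fraction(-3), Fraction(-3, 2), Fraction(3, 2))
print(p)  # 1/2
```

If the solution ever falls outside 0 to 1, the responding player always prefers one option and there is no mixed equilibrium to find.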

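That’s actually very easy now that we’ve done the work for Lidge. All we have to do is the exact opposite; that is, while Lidge wants to make the hitter indifferent between looking fastball and slider, the hitter wants to make Lidge indifferent between throwing his fastball or his slider. So what the hitter cares about are Lidge’s payoffs, which can be summarized in the following chart:

Lidge throws fastball, hitter looking fastball: -3
Lidge throws fastball, hitter looking slider: 1.5
Lidge throws slider, hitter looking fastball: 3
Lidge throws slider, hitter looking slider: -1.5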

Note that since baseball is a zero-sum game, Lidge’s payoffs are simply the opposite of the hitter’s. He does well when he throws one pitch and the hitter expects the other, and he does poorly when the hitter guesses right. If the hitter looks fastball with some probability q, and he guesses slider with the probability 1 – q, then Lidge’s payoff from throwing a fastball will be:

-3*q + 1.5*(1 – q)

Lidge’s payoff from throwing a slider, meanwhile, will be:

3*q – 1.5*(1 – q)

So, since in equilibrium the payoffs from throwing the two pitches must be equalized:

-3*q + 1.5*(1 – q) = 3*q – 1.5*(1 – q)
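Collecting terms turns that equation into -4.5q + 1.5 = 4.5q – 1.5. The sketch below (with `lidge_payoffs` as a hypothetical helper that just encodes Lidge’s two payoff expressions) plugs the solution back in to confirm that it leaves Lidge indifferent between his two pitches.

```python
from fractions import Fraction


def lidge_payoffs(q):
    """Lidge's expected payoff from each pitch when the hitter looks fastball with probability q."""
    fastball = -3 * q + Fraction(3, 2) * (1 - q)
    slider = 3 * q - Fraction(3, 2) * (1 - q)
    return fastball, slider


# -4.5q + 1.5 = 4.5q - 1.5  =>  9q = 3  =>  q = 1/3
q = Fraction(3, 9)
fb, sl = lidge_payoffs(q)
print(q, fb, sl)  # 1/3 0 0
```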

Doing the math, we find that q = 1/3, so the hitter will look fastball one-third of the time, while guessing slider two-thirds of the time. Now that we know how often the hitter will look for each pitch and how often Lidge will throw each pitch, we can cross multiply and figure out the hitter’s average payoff across all situations. Specifically, it will be:

1/2*1/3*3 + 1/2*2/3*-1.5 + 1/2*1/3*-3 + 1/2*2/3*1.5 = 0

In case it’s unclear what I’m doing here, half the time Lidge throws a fastball, one-third of the time the hitter is looking for a fastball, and when those two things happen, he gets a payoff of 3. Two-thirds of the time when Lidge throws a fastball, the hitter is looking slider and gets -1.5. And so forth. In sum, he ends up with a payoff of 0.
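The cross-multiplication is a sum over the four cells of the payoff chart, weighting each cell by the probability that Lidge throws that pitch times the probability that the hitter looks for that pitch. A minimal sketch, assuming (as the model does) that the two choices are made independently; the dictionary layout and the name `expected_payoff` are my own.

```python
from fractions import Fraction

# Hitter's payoff keyed by (pitch thrown, pitch looked for), from Justin's example.
HITTER_PAYOFF = {
    ("fastball", "fastball"): Fraction(3),
    ("fastball", "slider"): Fraction(-3, 2),
    ("slider", "fastball"): Fraction(-3),
    ("slider", "slider"): Fraction(3, 2),
}


def expected_payoff(p_fastball, q_fastball):
    """Average hitter payoff when Lidge throws fastballs with probability p_fastball
    and the hitter independently looks fastball with probability q_fastball."""
    throw = {"fastball": p_fastball, "slider": 1 - p_fastball}
    look = {"fastball": q_fastball, "slider": 1 - q_fastball}
    return sum(throw[t] * look[l] * v for (t, l), v in HITTER_PAYOFF.items())


print(expected_payoff(Fraction(1, 2), Fraction(1, 3)))  # 0
```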

Now let’s say that Lidge does not believe in game theory, and he in fact throws his fastball 60 percent of the time. In that case, the hitter can always look for a fastball, and his payoff will be:

0.6*3 + 0.4*-3 = +0.6

So by always looking fastball, the hitter will do better than he did in equilibrium. What if Lidge instead falls in love with his slider and throws it 60 percent of the time? Well, then it would behoove the hitter to always look slider, which would lead to a payoff of:

0.4*-1.5 + 0.6*1.5 = +0.3

Again, the hitter does better than in equilibrium. You could go through the same exercises with Lidge. For example, in equilibrium Lidge’s average payoff is 0; however, what if the hitter looks fastball more than he should, say 40 percent of the time instead of 33? In that case, Lidge would always want to throw his slider, and he would get:

0.4*3 + 0.6*-1.5 = +0.3

If, on the other hand, the hitter guessed fastball too rarely, looking for the slider say 80 percent of the time, Lidge would want to throw all fastballs, and he would get:

0.2*-3 + 0.8*1.5 = +0.6

Again, in both cases, Lidge does better than he can at equilibrium. You can try any number of combinations and you’ll find that if either player deviates from the equilibrium, the other will always be able to exploit it. So, to answer Justin’s question, if Lidge’s payoffs are what Justin outlined in his article, he’ll throw his fastball and his slider each half the time.
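One way to check the exploitation argument for any mix, not just the four examples above, is to compute each player’s best-response payoff against a given mix by the other. The sketch below uses the payoff numbers from Justin’s example; the function names are hypothetical, and “best response” here simply means picking whichever pure strategy pays more.

```python
from fractions import Fraction

# Hitter's payoffs as (vs. fastball, vs. slider) for each pitch he looks for.
LOOK_FB = (Fraction(3), Fraction(-3))
LOOK_SL = (Fraction(-3, 2), Fraction(3, 2))


def hitter_best_response(p):
    """Hitter's payoff from his better pure strategy when Lidge throws fastballs with probability p."""
    look_fb = LOOK_FB[0] * p + LOOK_FB[1] * (1 - p)
    look_sl = LOOK_SL[0] * p + LOOK_SL[1] * (1 - p)
    return max(look_fb, look_sl)


def lidge_best_response(q):
    """Lidge's payoff from his better pitch when the hitter looks fastball with probability q.

    Zero-sum: Lidge's payoff is the negative of the hitter's.
    """
    throw_fb = -(LOOK_FB[0] * q + LOOK_SL[0] * (1 - q))
    throw_sl = -(LOOK_FB[1] * q + LOOK_SL[1] * (1 - q))
    return max(throw_fb, throw_sl)


for p in (Fraction(1, 2), Fraction(3, 5), Fraction(2, 5)):
    print("Lidge fastball rate", p, "-> hitter gets", hitter_best_response(p))
# 1/2 -> 0, 3/5 -> 3/5, 2/5 -> 3/10

for q in (Fraction(1, 3), Fraction(2, 5), Fraction(1, 5)):
    print("hitter looks fastball", q, "-> Lidge gets", lidge_best_response(q))
# 1/3 -> 0, 2/5 -> 3/10, 1/5 -> 3/5
```

Both functions bottom out at 0 exactly at the equilibrium mixes (p = 1/2, q = 1/3); every other mix hands the opponent a positive payoff.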

An empirical study, part I

So how can we test whether baseball players actually behave in the way predicted by game theory? If they did, that would obviously be a very important result: It would serve as a data point confirming the efficacy of game theory, and it would help us better understand the mechanics of the batter/pitcher match-up.

Well, the model we explored in the first part of the article makes some pretty important predictions. For example, if the above model is true, the values a pitcher derives from each of his pitches (and conversely, the value a hitter gets on each pitch type) should be equal. For Lidge, for example, even though his slider in this example is a much better pitch than his fastball, the average slider he throws will be worth:

1/3*3 + 2/3*-1.5 = 0

Similarly, his average fastball will be worth:

1/3*-3 + 2/3*1.5 = 0

By the way, just to be clear, there’s no reason the pitches have to be worth 0. They could be worth 1, -1, or 78.3 depending on their quality. What matters is that their values are equalized. This is really a pretty logical result: If Lidge’s slider had on average better outcomes than his fastball, he would want to throw it all the time. Eventually, hitters would come around and start sitting on his slider. An equilibrium only exists when neither side wants to change its behavior, and that can only happen when the two pitches are of equal value.

Therefore, if MLB players are indeed playing optimally, we should find that for every pitcher, the outcomes on his various pitches are of equal value, and that for every hitter, the outcomes on the various pitch types he faces are of equal value. Of course, over the course of a season random variation will ensure that this is not exactly the case, but what we can do is look to see if pitch values stay consistent from year-to-year. If players are behaving optimally, the correlation from one year to the next should be near zero.

First let’s test the hitters. Remember that to do that we actually want to look at the pitchers. If hitters are behaving as game theory would have us assume, then it is pitchers who should see no year-to-year correlation in their pitch values. Is that the case?

We can test this by taking every qualifying pitcher, finding the average pitch value for each (using numbers from Fangraphs, based on BIS data), and then subtracting that from the value of each of his specific pitches to find the relative value of each pitch. So if Johan Santana’s average pitch in 2008 was worth 0.91 runs (actually that’s per 100 pitches, but don’t worry about it) and his average fastball was worth 0.56 runs, then his fastball was -0.35 runs relative to his average pitch. If that effect persisted in 2009, that would be evidence that hitters do not behave optimally as they sit on Santana’s fastball too much and he would be better off throwing his other pitches. In fact it didn’t, as in 2009 Santana’s fastball was actually 0.26 runs better than his average pitch.
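The relative-value step and the year-to-year check can be sketched as follows. This assumes the pitch values have already been loaded into plain Python lists (the article used Fangraphs numbers based on BIS data; loading them is not shown), and the helper names are hypothetical. The correlation is an ordinary Pearson r, written out by hand so it has no dependencies.

```python
from math import sqrt


def relative_value(pitch_value, average_value):
    """Run value of one pitch type minus the pitcher's average pitch value (runs per 100 pitches)."""
    return pitch_value - average_value


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists.

    For the year-to-year test, xs[i] and ys[i] are the same pitcher's relative
    value for one pitch type in consecutive seasons. Under the equilibrium
    model the result should be near zero.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two entries")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Johan Santana, 2008, using the figures quoted above.
santana_fb_2008 = relative_value(0.56, 0.91)
print(round(santana_fb_2008, 2))  # -0.35
```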

The following table summarizes the year-to-year correlations for a variety of pitch types among qualifying pitchers:

As you can see, the correlations are all very low, meaning that pitch values for pitchers show very little consistency from year-to-year. That’s exactly what would happen if hitters were behaving optimally according to game theory!

Now what happens if we test the pitchers (which of course means that we now have to look at hitter numbers)? Here, a slightly different picture emerges:

The correlations are mostly very low, except for one: Fastballs. Hitters who do well on fastballs relative to how they perform on other pitches in one year tend to do well the next, and vice-versa. If pitchers were optimizing according to our model, this would not be the case. Instead, pitchers would feed more off-speed stuff to fastball hitters until those hitters had to adjust by looking for the fastball less often, while cranking up the heat on poor fastball hitters, who in turn would have to adjust by protecting themselves against off-speed stuff a little less.

That doesn’t appear to be what happens, though. Why would that be? I have a few theories, but before I share them with you, I’d like to do another mini-study.

An empirical study, part II

In the above section we tried to test our simple model of the batter/pitcher interaction by testing some of its implications. The results were mixed, but the test was also indirect. We weren’t testing the model itself, but rather a specific prediction that it makes. What happens if we test the model itself?

Soon after Justin published his article and I started working on this one, Craig Glaeser published a wonderfully fascinating article here on The Hardball Times examining what happens when a plate appearance reaches a 3-2 count. In that article, Craig shared some useful numbers, breaking down what happened depending on whether the pitcher threw a strike or a ball and whether the hitter swung or held back.

If the pitcher threw a strike and the batter went after it, for example, the hitter ended up with a .330 wOBA, on average. If the hitter didn’t offer, his wOBA was just .273. If the pitcher threw a ball and the hitter nonetheless took a swing, his wOBA was really pitiful (.188), but if he kept his bat on his shoulder, it was a hefty .688. All these outcomes can be summarized in the following chart (hitter’s wOBA relative to an average 3-2 count):

Strike, hitter swings: -.078
Strike, hitter takes: -.135
Ball, hitter swings: -.220
Ball, hitter takes: +.280

Here’s how to read the chart: If the batter swings and it’s a strike, his wOBA is .078 lower than on an average 3-2 count (meanwhile, the pitcher is obviously .078 better off). If he takes a strike, though, his wOBA is .135 lower. If the hitter swings at a ball, he does .220 worse relative to an average 3-2 count. If he doesn’t, however, he does .280 better.
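The chart’s deviations and the raw wOBA figures fit together: all four are consistent with a single average 3-2 count wOBA of .408. That baseline is derived here from the quoted numbers, not quoted from Craig’s article; the quick check below just subtracts each deviation from its raw wOBA.

```python
# 3-2 count wOBA by (pitch, hitter action), as quoted from Craig's article.
woba = {
    ("strike", "swing"): 0.330,
    ("strike", "take"): 0.273,
    ("ball", "swing"): 0.188,
    ("ball", "take"): 0.688,
}
# Deviations from an average 3-2 count, hitter's perspective, from the chart.
deviation = {
    ("strike", "swing"): -0.078,
    ("strike", "take"): -0.135,
    ("ball", "swing"): -0.220,
    ("ball", "take"): 0.280,
}

# Each cell implies a baseline of (raw wOBA - deviation); rounding absorbs float noise.
baselines = {cell: round(woba[cell] - deviation[cell], 3) for cell in woba}
print(set(baselines.values()))  # {0.408}
```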

Assuming that each player makes his decision (for the pitcher, whether to throw a strike; for the hitter, whether to swing) independently (an imperfect assumption for the hitter, since he can adjust his behavior to some extent after the pitch is released, but it’s an assumption I can live with), we can easily calculate the equilibrium outcomes just as we did before.

The hitter wants to make the pitcher indifferent between throwing a strike or a ball, so if he swings with the probability s:

.078*s + .135*(1 – s) = .220*s – .280*(1 – s)

If you do the math, you find that hitters want to swing 74.5 percent of the time, which makes sense since pitchers will obviously just about always want to throw a strike given the high cost of throwing a ball when the batter is taking. In fact, hitters swing 73.2 percent of the time in 3-2 counts, which is just about what game theory would predict.
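The swing probability follows from the same kind of closed-form solution used for Lidge. A sketch using the chart’s deviations from the pitcher’s point of view (positive is good for the pitcher); the constant and function names are hypothetical.

```python
# Pitcher's payoff relative to an average 3-2 count (positive is good for the pitcher).
STRIKE_SWING, STRIKE_TAKE = 0.078, 0.135
BALL_SWING, BALL_TAKE = 0.220, -0.280


def pitcher_payoffs(s):
    """Pitcher's expected payoff from a strike and from a ball when the hitter swings with probability s."""
    strike = STRIKE_SWING * s + STRIKE_TAKE * (1 - s)
    ball = BALL_SWING * s + BALL_TAKE * (1 - s)
    return strike, ball


# Setting strike == ball and collecting terms: s * ((a - b) - (c - d)) = d - b
s = (BALL_TAKE - STRIKE_TAKE) / ((STRIKE_SWING - STRIKE_TAKE) - (BALL_SWING - BALL_TAKE))
print(round(s, 3))  # 0.745
```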

Now what about pitchers? They want to make hitters indifferent between swinging and taking, so if a pitcher throws a strike with the probability t:

-.078*t – .220*(1 – t) = -.135*t + .280*(1 – t)

Do the math and you find that pitchers want to throw a strike 89.8 percent of the time. In actuality, however, they throw strikes on only 59.2 percent of their 3-2 pitches. Again, the hitters confirm our model’s predictions while the pitchers do not! What is going on?
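The pitcher’s side works the same way, using the hitter’s payoffs from the chart: swinging at a strike costs him .078, swinging at a ball .220, taking a strike .135, and taking a ball gains him .280. The sketch solves for the strike rate that makes swinging and taking equally good, then plugs in the observed 59.2 percent strike rate to see which action the hitter would prefer there. Names are hypothetical.

```python
# Hitter's payoff relative to an average 3-2 count (negative is bad for the hitter).
SWING_STRIKE, SWING_BALL = -0.078, -0.220
TAKE_STRIKE, TAKE_BALL = -0.135, 0.280


def hitter_payoffs(t):
    """Hitter's expected payoff from swinging and from taking when the pitcher throws a strike with probability t."""
    swing = SWING_STRIKE * t + SWING_BALL * (1 - t)
    take = TAKE_STRIKE * t + TAKE_BALL * (1 - t)
    return swing, take


# Setting swing == take and collecting terms: t * ((a - b) - (c - d)) = d - b
t = (TAKE_BALL - SWING_BALL) / ((SWING_STRIKE - SWING_BALL) - (TAKE_STRIKE - TAKE_BALL))
print(round(t, 3))  # 0.898

# At the observed 59.2 percent strike rate, taking beats swinging.
swing, take = hitter_payoffs(0.592)
print(round(swing, 3), round(take, 3))  # -0.136 0.034
```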

Obviously, pitchers are throwing way more balls than our model predicts they would. Perhaps they simply can’t locate the ball in the strike zone as consistently as they would like to. In that case, however, our model predicts that hitters would always take; instead, they usually swing. I can propose two possibilities here. Either pitchers have some sort of advantage that forces hitters to adhere to our model even when the pitchers do not, or psychological factors make the hitters want to swing even when they shouldn’t, rendering the model more or less useless on 3-2 counts.

You can decide which explanation you like better (or propose an alternative), but that still leaves us with the issues from the first study to explain. I promised a few theories, so here they are:

The first is that pitchers are forced to throw a sub-optimally large number of fastballs to good fastball hitters because off-speed stuff simply puts too much wear on their arms. It’s a possibility, but one which I approach with doubt since it would seemingly make more sense for a pitcher to throw more off-speed pitches but fewer total pitches (i.e. go five innings instead of six) if it would help improve his performance.

It is also possible that the fastball helps establish other pitches, either by helping the pitcher establish control or by setting up his off-speed stuff (i.e., making it harder for the batter to hit a change-up or curveball after seeing a fastball). In that case, the fastball could have “hidden value” which would not show up in our numbers. (Jeremy Greenhouse recently wrote a great article on this topic, focusing on who else but Brad Lidge.)

Maybe it makes sense to throw Alfonso Soriano (a very good fastball hitter) more fastballs than our model would recommend if it makes him that much worse on off-speed stuff. Of course, if that were the case, then Soriano should adjust by looking for the fastball less and we would get back to equilibrium; since that doesn’t happen, it’s possible that it is the hitters, not the pitchers, who aren’t optimizing. There isn’t really any way of knowing, though, at least not as far as I can tell.

A third possibility was suggested to me by our own Jonathan Halket, and actually this is my favorite explanation of the three. Imagine that we have a hitter who is really good at hitting fastballs, about average on change-ups, but really bad when it comes to curveballs. Imagine moreover that there are two types of pitchers. One type throws a fastball and a change-up, while the other has just a fastball and a curveball in his arsenal.

The batter’s payoff matrix against the fastball/change-up pitchers might look something like this:

Doing the math, we find that the hitter will look for a fastball 25 percent of the time while looking for a change-up the other 75 percent. His average payoff on each pitch will then be +1.

Against the fastball/curveball pitchers, the payoffs are different. If we calculate, we find that the hitter will now look for a fastball 37.5 percent of the time, while looking for a curveball 62.5 percent of the time, making his average payoff on each pitch -1.

So, if half of all pitchers are of the first type and half of the second, overall, the batter will have an average run value of +1 on change-ups, -1 on curveballs, and 0 on fastballs. Suddenly, the equilibrium no longer requires him to have equal values on all pitches!
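The pooling argument is just a weighted average by pitch type. A minimal sketch, assuming (as the example implicitly does) that the two pitcher types throw equal numbers of fastballs, since the article does not give their actual pitch mixes; the weights and the `pooled_value` helper are hypothetical.

```python
from fractions import Fraction


def pooled_value(cells):
    """Weighted average run value for one pitch type across pitcher types.

    cells: list of (weight, value) pairs, one per pitcher type that throws
    the pitch; weight is that type's share of the pitch (hypothetical here).
    """
    total_weight = sum(w for w, _ in cells)
    return sum(w * v for w, v in cells) / total_weight


# Hitter's equilibrium value on every pitch: +1 vs. fastball/change-up pitchers,
# -1 vs. fastball/curveball pitchers (from the example above).
VS_CHANGEUP_TYPE, VS_CURVEBALL_TYPE = Fraction(1), Fraction(-1)

fastball = pooled_value([(1, VS_CHANGEUP_TYPE), (1, VS_CURVEBALL_TYPE)])  # equal fastball counts assumed
changeup = pooled_value([(1, VS_CHANGEUP_TYPE)])    # only the first type throws change-ups
curveball = pooled_value([(1, VS_CURVEBALL_TYPE)])  # only the second type throws curveballs
print(fastball, changeup, curveball)  # 0 1 -1
```

If one type threw more fastballs than the other, the pooled fastball value would tilt toward that type’s value rather than landing at 0.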

It’s a beautiful explanation, though I’m not sure it jibes with our results. First of all, such an explanation would predict that a hitter’s results on all types of pitches would correlate from year to year, whereas we observed a large correlation only for fastballs. You could argue that the correlation for fastballs is higher because that category has the greatest sample size, but there is another, much more serious, issue: The only category in which we observe a large positive correlation is actually the only category in which we would not expect to observe any.

The problem is that every pitcher throws a fastball. Therefore, while hitters could theoretically do better on certain pitch types year after year while still adhering to our game theoretic model, their performance on fastballs should still be equal to their performance on any other pitch. Instead, we see the exact opposite conclusion in the data. So beautiful as this explanation is, I don’t know that the data bear it out.

A short conclusion

Since this article has run over 3,000 words now, I figure I owe you a conclusion to wrap it all up as tidily as possible. What we’ve done here is try to explore the mechanics of the batter/pitcher match-up by employing game theory.

We tested a simple model, one in which each player bases his behavior on the other player’s payoffs, and derived some important predictions, namely that if players are optimizing, their performance should not vary by pitch type. For every pitcher, the outcome he gets on his fastball should be the same as the outcome he gets on his change-up (even if his fastball is the far superior pitch), and for every hitter, the outcome he gets when he sees a fastball should be the same as the outcome he gets when he sees a slider, on average (even if he much prefers to hit fastballs).

Our empirical tests were inconclusive; they seemed to show that hitters were behaving as the model would predict, but pitchers were not. Of course, in the complicated world of game theory, it is possible that the reason for this is that hitters are making sub-optimal decisions while pitchers take advantage of them. We don’t really know.

Still, it is my hope that this piece will be an important step on our way to better understanding what happens when a hitter and a pitcher face off. One small step for sabermetrics … well, that’s all.

Comments

I’m dubious of your “Empirical study, part 1”. For the second test (the one which didn’t give you the results you wanted), what you should actually be checking is that hitters get the same result no matter what pitch they’re _expecting_. This is symmetric with the first test, where you (correctly) check that pitchers get the same result no matter what pitch they’re _throwing_.

Of course, there’s no obvious way to tell which pitch a batter was expecting; so we can’t really check this at all.

First, “Hitters who do well on fastballs relative to how they perform on other pitches in one year tend to do well the next, and vice-versa. If pitchers were optimizing according to our model, this would not be the case.” I think it’s been ingrained into baseball culture to throw first-pitch strikes, and get ahead with the fastball. While the fastball has been proven to be the best setup pitch (and therefore should be thrown more often than a model that looks at pitch results as opposed to plate appearance results would suggest), I still think pitchers throw first pitch fastballs too often, especially on the very first pitch of the game. Derek Jeter would make for a great case study.

Also, this point “psychological factors make the hitters want to swing even when they shouldn’t, rendering the model more or less useless on 3-2 counts,” has been addressed before by Dave Allen. He found that hitters swing more often at balls on 3-2 than they do 2-2. It’s befuddling.

That’s not really correct, I don’t think. Hitters should get the same result on each pitch, since their expectations should be such that they are indifferent between seeing any given pitch. So it doesn’t really matter what they’re expecting on any given pitch; the aggregate values should all be equivalent.

Nice job, David! I never got around to re-doing that article, and now I probably don’t have to. Great job extending it to empirical stuff as well. I’ve seen so much of these pitch run value studies floating around, and I think there’s not enough appreciation for how much those run values depend on pitch selection.
-j

Dave, really cool article. You really nailed the game theory stuff very well. Very nice to see it explained so clearly.

One thought I had about run values (and I’m not totally sure how this would jibe with the fastball run values correlating more highly than other pitches) is that game theory says that run values should be the same on a given pitch—but not on all pitches. Some pitchers are able to strike hitters out more than others, meaning that those pitchers should have higher overall run values for two-strike pitches than zero- and one-strike pitches, because they are particularly good at getting swing-and-misses. Since different pitches have different goals (0-strike and 1-strike counts have foul balls equal in value to whiffs, and 2-strike counts do not), there should not necessarily be equal run values across pitches for all pitches, just for pitches in each count.

Say that a pitcher has a fastball, cutter, and slider. Say the cutter is not much of a swing-and-miss pitch but the slider gets a lot of whiffs. If the pitcher mixes between the fastball and cutter primarily early in counts and fastball and slider late in counts, those pitches should have higher run values for those pitchers. I’m not totally sure how this might work, but if the fastball is mixed with the K-pitch for all pitchers, the fastball could be correlating with K-rate and end up with that high autocorrelation. I’m not sure that the other pitches wouldn’t have that result, but it could be a clue. This could also work if differences in walk rates are really what differentiate pitchers and pitchers overwhelmingly throw fastballs in 3-ball counts.

I’d need to think about this all more, though. This is just me throwing what I was thinking out there about the equal-run-value equilibrium existing on a per-count basis rather than on an overall basis. Great article.

Matt is exactly right about the role of the count. Game theory requires pitches have equal value for any given count, but not overall. In fact, what’s really required is that each pitch type have equal value for any given count AND pitcher-hitter matchup. We can’t possibly measure that, of course. But you can look at pitcher and batter handedness, which definitely plays a role: the curveball will have a different 2-1 count value for LHH-LHP than for RHH-RHP. And you also have to take hitter quality into account: pitchers throw many more non-fastballs to good hitters, especially in hitters’ counts.

By the way, I think you’d be better off looking at just FB and non-FB, and the difference of those 2 run values. With bigger samples, and not comparing pitches to themselves by using the pitcher average, I think you’d find much higher y-to-y correlations.

David: I think the assumption of pitcher:hitter independence is reasonable when considering pitch type selection. But it clearly doesn’t apply to the 3-2 swing-take data, as Craig’s data shows: batters swing at 91% of strikes, but only 48% of balls. This is a huge correlation, and I imagine impacts the equilibrium a lot.

BTW, this same issue accounts for a major flaw in Craig’s argument that hitters would necessarily improve by taking more 3-2 pitches. He assumes these pitches will be 60% strikes, the overall average. But each marginal pitch taken comes from a pool of pitches (those currently swung at) that are 73% strikes, not 60%, which means each extra take will be much worse for the hitter unless he can do much better than choose randomly.

Another issue to consider is whether this kind of analysis really needs 3 location categories: obvious ball, borderline, and obvious strike. A lot of 3-2 balls are borderline pitches, with outcomes much better for pitchers than that of the average ball. So when we’re trying to decide if pitchers or hitters are acting optimally, it’s really about changing the distribution among those 3 zones.