Author Archive

I have the mixed blessing of living in New England, so I unavoidably run into local sports radio once in a while. They’re already looking ahead to the Red Sox’ inevitable World Series appearance, and of course given David Ortiz’s unprecedented combination of offensive skills and just incredible foot pain/immobility, there’s a legitimate question of whether the Red Sox should play him in the field when they lose the DH in World Series road games. Just a quick-hit here on some of the relevant numbers.

I’m not going to address in this article the extent to which playing the field might limit his ability to hit or run. I don’t dispute that that could be significant, but I have no idea how to value that.

The way I see it, there are three options:

1. Move Hanley Ramirez over to 3B, play Ortiz at 1B, take Travis Shaw out of the lineup.
2. Replace Ramirez with Ortiz at 1B, keep Shaw.
3. Don’t start Ortiz, but do pinch-hit with him in a high-leverage situation.

We’ll make the following assumptions: I’m going to estimate that Ramirez will have the same defensive value at 3B that he had last year in LF (-22.9 runs), and I’m going to further estimate that Ortiz will have the defensive rating of some of the worst 1Bs in baseball over the past few seasons (-25 runs). Travis Shaw was worth 6.6 runs of defense playing mostly 3B this year. Ortiz was worth 27.6 runs of offense + baserunning, Ramirez 17.1, and Shaw -10.3. We’ll estimate that playing Ortiz would get him 4.4 plate appearances per start, whereas if he doesn’t start, he gets one plate appearance in a situation that has a leverage index of 2.

Seems like the best option here by a decently wide margin is to use him as a pinch-hitter, and I’m surprised at how much of the value comes from just the pinch-hit appearance. Fairly robust to my assumptions, too — if you assume a leverage index of 1 in his one plate appearance, you still get the highest total with Ortiz as a PH. You could also give him a defensive rating of -15 (which would be incredibly generous) and the PH scenario comes out on top. Anything else I’m missing? Other lineup options for the Red Sox?

As a Red Sox fan, I got very excited on Opening Day when Dustin Pedroia hit two home runs. One of the big questions of this offseason is whether he has upper-single-digit homer power, or upper-teens homer power. Of course, as a thinking baseball fan, my head tells me to avoid getting overly excited about a small sample size. But does the two-HR outbreak actually tell us nothing? I think the expectations going into the season, combined with Pedroia’s performance in his first game, make a perfect situation to use Bayes’ Theorem.

To elaborate, I think Pedroia’s expectations going into this season have a bimodal distribution. If you look at his 2008-2012 seasons, he averaged 16 HR per year. His last two seasons averaged 8 HR per year. Was this due to a real decline, or due to injuries that sapped his power? While someone like Mike Trout might have a nice normally-distributed expectation around 35 HR, I expected Pedroia to have an either/or season: he’d either get back to 2008-2012 production, or continue as an 8-HR guy.

Now for a review of Bayes’ Theorem: it tells you how to update your prior beliefs given an observation. The formula for this is P(A|B) = P(B|A)*P(A)/P(B), where A and B are events, P(A) and P(B) are the probabilities of those events, and P(A|B) or P(B|A) should be read as “Probability of A given B,” or “Probability of B given A,” respectively. Specifically, in this case, A is “Dustin Pedroia is a 16-HR guy”, and B is “Dustin Pedroia hit 2 HR in his first game of the season”. I had a preseason belief about P(A), but I want to update it given that event B has occurred.

As implied above, I’m going to simplify Pedroia’s season outcomes into two possible outcomes: He is an 8-HR guy, or a 16-HR guy. Before the season, I’m going to guess that I had about a 50-50 belief that he was either one. Another assumption I’m going to make, to make the math easier, is that a season will see 640 plate appearances. You can make your own assumptions, but this is a demonstration of how much Bayes’ Theorem helps us update beliefs based on just one observation.

We need to determine three quantities to do our calculation now:
1. P(A)—probability that Pedroia is a 16-HR guy
2. P(B|A)—probability that we would see Pedroia hit 2 HR in his first 5 plate appearances, given that he is a 16-HR guy
3. P(B)—probability that we would see Pedroia hit 2 HR in his first 5 plate appearances (taking our 50-50 chance that he’s a 16 or 8-HR guy as a given)

1. Probability that Pedroia is a 16-HR guy

Easy. By assumption, P(A) is 50%.

2. Probability that we would see Pedroia hit 2 HR in his first 5 plate appearances, given that he’s a 16-HR guy

Tougher, but we can use a binomial probability model. That is 5C2*P(HR)^2*(1-P(HR))^3. When we have 16 HR in 640 plate appearances, P(HR) is 1/40, and 1-P(HR) is 39/40. This turns out to be .00579. P(B|A)= 0.579%.

3. Probability that we would see Pedroia hit 2 HR in his first 5 plate appearances, with preseason assumptions

This is the weighted average over his possible season outcomes: the probability of 2 HR in 5 PA given that he is a 16-HR guy, times the chance that he’s a 16-HR guy, PLUS the probability of 2 HR in 5 PA given that he is an 8-HR guy, times the chance that he’s an 8-HR guy. The same calculation as in number 2 can be done for the case where he’s an 8-HR guy (P(HR) is 8/640, or 1/80), yielding a 0.150% chance that he’d hit 2 HR in 5 PA. Given our calculation in the above paragraph, and our preseason assumption that it’s 50-50 that he’s an 8 or 16-HR guy, that gives us a weighted average P(B) = 0.365%.

So now we can mash all of those numbers into Bayes’ equation, and we find that .50*.00579/.00365 = .794, or 79.4%! Turns out that my Red Sox-loving lizard brain was not wrong! If you believed preseason that there was a 50%-50% chance that Pedroia would return to his 2008-2012 form, you should rationally update your beliefs to 80%-20% on the minuscule sample size of just two home runs in five plate appearances! Another note is that we should be forward-looking: since he has nearly a full season of plate appearances remaining, it might be rational to think that he’s likely to be an 18-HR guy, now that he has 2 in the bag.
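For anyone who wants to check the arithmetic, here is a minimal Python sketch of the two-hypothesis update. It assumes exactly what the text assumes (640 PA per season, a 50-50 prior, a binomial model for 5 PA); the function and variable names are mine, chosen only for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

SEASON_PA = 640   # assumed season length, from the text
PRIOR_16 = 0.5    # assumed 50-50 preseason belief, from the text

# Likelihood of 2 HR in 5 PA under each hypothesis
p_b_given_16 = binom_pmf(2, 5, 16 / SEASON_PA)
p_b_given_8 = binom_pmf(2, 5, 8 / SEASON_PA)

# Total probability of the observation, then Bayes' Theorem
p_b = PRIOR_16 * p_b_given_16 + (1 - PRIOR_16) * p_b_given_8
posterior_16 = PRIOR_16 * p_b_given_16 / p_b

print(f"P(B|16-HR guy) = {p_b_given_16:.5f}")  # 0.00579
print(f"P(B|8-HR guy)  = {p_b_given_8:.5f}")   # 0.00150
print(f"P(B)           = {p_b:.5f}")           # 0.00365
print(f"P(16-HR guy|B) = {posterior_16:.3f}")  # 0.794
```

Changing `PRIOR_16` is the quickest way to see how much the conclusion depends on the preseason belief.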

This method could be adapted to a continuous expectation of outcomes, allowing a chance that Pedroia might be something besides an 8HR guy or a 16HR guy (although you and I know that that is clearly absurd).
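As a rough sketch of what that adaptation might look like, the snippet below replaces the two hypotheses with a grid of possible true-talent season totals. The grid (4 through 24 HR) and the flat prior are purely my assumptions for illustration, not something from the analysis above; a realistic prior would put more weight near 8 and 16.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

SEASON_PA = 640                      # assumed season length, from the text
hr_totals = list(range(4, 25))       # hypothetical grid of true-talent HR totals
prior = [1 / len(hr_totals)] * len(hr_totals)  # flat prior (an assumption)

# Likelihood of 2 HR in 5 PA for each candidate talent level
likelihood = [binom_pmf(2, 5, hr / SEASON_PA) for hr in hr_totals]

# Bayes' Theorem on the grid: posterior is prior times likelihood, normalized
unnormalized = [pr * lk for pr, lk in zip(prior, likelihood)]
evidence = sum(unnormalized)
posterior = [u / evidence for u in unnormalized]

prior_mean = sum(hr * pr for hr, pr in zip(hr_totals, prior))
posterior_mean = sum(hr * po for hr, po in zip(hr_totals, posterior))
print(f"Prior mean HR talent:     {prior_mean:.1f}")
print(f"Posterior mean HR talent: {posterior_mean:.1f}")
```

The posterior mean moves above the prior mean because the two-homer game is more likely under the higher-power hypotheses, which is the same effect as in the two-outcome version.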

This post is going to examine the value of the opt-out clause in both the Clayton Kershaw and Masahiro Tanaka contracts. I think this is interesting because the Yankees gave Tanaka an opt-out one year earlier, and gave that option to a commodity with a much more uncertain value. As we will see, the opt-out clause for Tanaka is going to be a lot more costly to the Yankees than the clause was for Kershaw and the Dodgers.

Let’s start with the projections for each player. ZIPS and STEAMER don’t have anything for Tanaka, but we can make a guess based on the contract he was given that he’s at least expected to be worth a lot of wins over the next several years. Since he’s the same age, it seems approximately fair to start with 5 wins, and reduce in the same pattern that Kershaw got. I’ll use the values from Dave Cameron’s excellent article the other day for Clayton Kershaw, and I’ll also take the $/WAR from his projected inflation. Excess value is the value of that player’s WAR, minus salary.

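Projections for Tanaka and Kershaw (WAR in wins; $/WAR, Salary, and Excess Value in $M)

| Age | $/WAR | Tanaka WAR | Tanaka Salary | Tanaka Excess Value | Kershaw WAR | Kershaw Salary | Kershaw Excess Value |
|---|---|---|---|---|---|---|---|
| 25 | 6.0 | 5.0 | 22 | 8.0 | 5.5 | 30 | 3.0 |
| 26 | 6.3 | 5.0 | 22 | 9.5 | 5.5 | 30 | 4.7 |
| 27 | 6.6 | 4.5 | 22 | 7.7 | 5.0 | 30 | 3.0 |
| 28 | 6.9 | 4.5 | 22 | 9.1 | 5.0 | 30 | 4.5 |
| 29 | 7.3 | 4.5 | 22 | 10.9 | 5.0 | 30 | 6.5 |
| 30 | 7.7 | 4.0 | 22 | 8.8 | 4.5 | 30 | 4.7 |
| 31 | 8.0 | 4.0 | 22 | 10.0 | 4.5 | 30 | 6.0 |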

The key here is not going to be the expected value; it’s going to be the possible variation. Kershaw is projected for only 5.5 wins next year because of the ever-present risk of injury. There are probably Dodgers fans going nuts over that projection because they know that a healthy Kershaw, pitching like he can, is going to be worth closer to 7 wins. There are certainly scenarios where he manages that, but also scenarios where he tears his rotator cuff and is worthless. While there is a continuum of possibilities, let’s break the world into two scenarios for each pitcher, an up and a down. The only requirement is that the probability-weighted average of the two scenarios has to equal the pitcher’s projection. I’ve made up some basic numbers here, and you might think they’re reasonable, you might think they’re not, but the point of this article is to illustrate how one extra year and some extra volatility can affect the value of an opt-out clause.

In each scenario, I make the downside a mirror image of the upside. For Tanaka, because he is an unproven commodity, I’ve added 2 WAR to the upside, and subtracted 2 for the downside. For Kershaw, I’ve just added/subtracted 1 for each. I gave each scenario a 50-50 chance of happening.

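GOOD scenario, 50% chance for each pitcher (WAR in wins; $/WAR, Salary, and Excess Value in $M)

| Age | $/WAR | Tanaka WAR | Tanaka Salary | Tanaka Excess Value | Kershaw WAR | Kershaw Salary | Kershaw Excess Value |
|---|---|---|---|---|---|---|---|
| 25 | 6.0 | 7.0 | 22 | 20.0 | 6.5 | 30 | 9.0 |
| 26 | 6.3 | 7.0 | 22 | 22.1 | 6.5 | 30 | 11.0 |
| 27 | 6.6 | 6.5 | 22 | 20.9 | 6.0 | 30 | 9.6 |
| 28 | 6.9 | 6.5 | 22 | 22.9 | 6.0 | 30 | 11.4 |
| 29 | 7.3 | 6.5 | 22 | 25.5 | 6.0 | 30 | 13.8 |
| 30 | 7.7 | 6.0 | 22 | 24.2 | 5.5 | 30 | 12.4 |
| 31 | 8.0 | 6.0 | 22 | 26.0 | 5.5 | 30 | 14.0 |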

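BAD scenario, 50% chance for each pitcher (WAR in wins; $/WAR, Salary, and Excess Value in $M)

| Age | $/WAR | Tanaka WAR | Tanaka Salary | Tanaka Excess Value | Kershaw WAR | Kershaw Salary | Kershaw Excess Value |
|---|---|---|---|---|---|---|---|
| 25 | 6.0 | 3.0 | 22 | -4.0 | 4.5 | 30 | -3.0 |
| 26 | 6.3 | 3.0 | 22 | -3.1 | 4.5 | 30 | -1.7 |
| 27 | 6.6 | 2.5 | 22 | -5.5 | 4.0 | 30 | -3.6 |
| 28 | 6.9 | 2.5 | 22 | -4.8 | 4.0 | 30 | -2.4 |
| 29 | 7.3 | 2.5 | 22 | -3.8 | 4.0 | 30 | -0.8 |
| 30 | 7.7 | 2.0 | 22 | -6.6 | 3.5 | 30 | -3.1 |
| 31 | 8.0 | 2.0 | 22 | -6.0 | 3.5 | 30 | -2.0 |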

Let’s think about what happens in each scenario when it comes time to exercise the opt-out clause. Shockingly, GOOD Kershaw and GOOD Tanaka each exercise the clause. We can see why in the “excess value” column of each GOOD chart, which is still positive from the first post-option season onward: age 29 for Tanaka and age 30 for Kershaw. They could get more on the free market, so they will. BAD Kershaw and BAD Tanaka both stick with their contracts, because they’re being paid more than market value. Let’s re-do the charts from the teams’ perspectives, reflecting the opt-out clauses now:

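GOOD scenario with the opt-out exercised, 50% chance for each pitcher (WAR in wins; $/WAR, Salary, and Excess Value in $M)

| Age | $/WAR | Tanaka WAR | Tanaka Salary | Tanaka Excess Value | Kershaw WAR | Kershaw Salary | Kershaw Excess Value |
|---|---|---|---|---|---|---|---|
| 25 | 6.0 | 7.0 | 22 | 20.0 | 6.5 | 30 | 9.0 |
| 26 | 6.3 | 7.0 | 22 | 22.1 | 6.5 | 30 | 11.0 |
| 27 | 6.6 | 6.5 | 22 | 20.9 | 6.0 | 30 | 9.6 |
| 28 | 6.9 | 6.5 | 22 | 22.9 | 6.0 | 30 | 11.4 |
| 29 | 7.3 | 0.0 | 0 | 0.0 | 6.0 | 30 | 13.8 |
| 30 | 7.7 | 0.0 | 0 | 0.0 | 0.0 | 0 | 0.0 |
| 31 | 8.0 | 0.0 | 0 | 0.0 | 0.0 | 0 | 0.0 |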

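BAD scenario with the opt-out declined, 50% chance for each pitcher (WAR in wins; $/WAR, Salary, and Excess Value in $M)

| Age | $/WAR | Tanaka WAR | Tanaka Salary | Tanaka Excess Value | Kershaw WAR | Kershaw Salary | Kershaw Excess Value |
|---|---|---|---|---|---|---|---|
| 25 | 6.0 | 3.0 | 22 | -4.0 | 4.5 | 30 | -3.0 |
| 26 | 6.3 | 3.0 | 22 | -3.1 | 4.5 | 30 | -1.7 |
| 27 | 6.6 | 2.5 | 22 | -5.5 | 4.0 | 30 | -3.6 |
| 28 | 6.9 | 2.5 | 22 | -4.8 | 4.0 | 30 | -2.4 |
| 29 | 7.3 | 2.5 | 22 | -3.8 | 4.0 | 30 | -0.8 |
| 30 | 7.7 | 2.0 | 22 | -6.6 | 3.5 | 30 | -3.1 |
| 31 | 8.0 | 2.0 | 22 | -6.0 | 3.5 | 30 | -2.0 |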

Now let’s take the expected value of these two scenarios, which is in this case a simple average:

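Expected value with the opt-out, for Tanaka and Kershaw (WAR in wins; $/WAR, Salary, and Excess Value in $M)

| Age | $/WAR | Tanaka WAR | Tanaka Salary | Tanaka Excess Value | Kershaw WAR | Kershaw Salary | Kershaw Excess Value |
|---|---|---|---|---|---|---|---|
| 25 | 6.0 | 5.0 | 22 | 8.0 | 5.5 | 30 | 3.0 |
| 26 | 6.3 | 5.0 | 22 | 9.5 | 5.5 | 30 | 4.7 |
| 27 | 6.6 | 4.5 | 22 | 7.7 | 5.0 | 30 | 3.0 |
| 28 | 6.9 | 4.5 | 22 | 9.1 | 5.0 | 30 | 4.5 |
| 29 | 7.3 | 1.3 | 11 | -1.9 | 5.0 | 30 | 6.5 |
| 30 | 7.7 | 1.0 | 11 | -3.3 | 1.75 | 15 | -1.5 |
| 31 | 8.0 | 1.0 | 11 | -3.0 | 1.75 | 15 | -1.0 |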

We can see that in both cases, the post-option years of the contract become negative propositions for the teams — in fact, they would have to be, by how we’ve implicitly stated the conditions under which the players opt out: if the player were expected to provide positive value to his team, he would opt out.

So how much is the option worth? Ignoring the $20 million posting fee, the Tanaka contract, sans opt-out, was expected to produce $63.9M in excess value for the Yankees. With the option, the expected excess value drops down to $26.1M. That’s a drop of $37.8M. This could be thought of as the extra money Tanaka puts into his pocket from years 5 onward, if he comes into the league and becomes Justin Verlander. Kershaw, on the other hand, would be expected to generate $32.3M for the Dodgers, without the opt-out. Now his contract is only worth $19.1M to them. That’s a reduction in value, but because we’ve made him less uncertain, and because the option occurs after year 5, not year 4, the reduction is only $13.2M. So the extra year and the double variability make Tanaka’s option worth $24.6M more than Kershaw’s.
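The whole calculation fits in a few lines of Python, sketched below using the WAR, salary, and $/WAR figures from the charts. The function names are mine, and the opt-out rule (the player leaves whenever the remaining years would be a net positive for the team) is my simplified reading of the logic described above, so treat this as an illustration rather than the exact spreadsheet.

```python
# $M per win for ages 25-31, copied from the charts in the post
DOLLARS_PER_WAR = [6.0, 6.3, 6.6, 6.9, 7.3, 7.7, 8.0]

def excess_values(base_war, salary, war_shift):
    """Yearly excess value ($M): value of the shifted WAR minus salary."""
    return [(war + war_shift) * rate - salary
            for war, rate in zip(base_war, DOLLARS_PER_WAR)]

def apply_opt_out(values, years_before_opt_out):
    """Assumed rule: the player leaves if the remaining years favor the team."""
    if sum(values[years_before_opt_out:]) > 0:
        return values[:years_before_opt_out]
    return values

def expected_value(base_war, salary, swing, years_before_opt_out=None):
    """Team's expected excess value ($M) over a 50-50 good/bad split."""
    total = 0.0
    for shift in (+swing, -swing):
        values = excess_values(base_war, salary, shift)
        if years_before_opt_out is not None:
            values = apply_opt_out(values, years_before_opt_out)
        total += 0.5 * sum(values)
    return total

tanaka_war = [5.0, 5.0, 4.5, 4.5, 4.5, 4.0, 4.0]
kershaw_war = [5.5, 5.5, 5.0, 5.0, 5.0, 4.5, 4.5]

# Tanaka: $22M salary, +/-2 WAR swing, opt-out after year 4
tanaka_cost = expected_value(tanaka_war, 22, 2) - expected_value(tanaka_war, 22, 2, 4)
# Kershaw: $30M salary, +/-1 WAR swing, opt-out after year 5
kershaw_cost = expected_value(kershaw_war, 30, 1) - expected_value(kershaw_war, 30, 1, 5)

print(f"Tanaka opt-out cost:  ${tanaka_cost:.1f}M")
print(f"Kershaw opt-out cost: ${kershaw_cost:.1f}M")
```

Playing with the `swing` and `years_before_opt_out` arguments shows how much of the gap comes from the extra volatility and how much from the extra year.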

Again, this depends largely on the choices I’ve made for the range of possible outcomes, and I’ve kind of picked Tanaka’s projection out of thin air (since the excess value of the contract with the opt-out is only $6.1M, considering the $20M posting fee, I would argue that I’m not that far off). I could have made more possible outcomes, or maybe even defined a probability distribution function and integrated over that, if I knew how to do that sort of thing. The only lesson we’re going to be able to take from this is how one year and some extra variability affect the value of the opt-out clause.

I think one of the most fun parts of baseball is this part of the year; as we wind down, you can start to root for unlikely things to happen. For example, I’m kind of hoping the Pirates manage to lose at an .800+ clip and keep their sub-.500 streak alive. I’d love to see the Royals make the playoffs. Finally, I’d love to see Yasiel Puig win the NL batting title.

The rules of the game are that you have to have 502 plate appearances to win a batting title. If you’re short, you’re given an 0-fer for the rest. So if Puig finished with 492 PAs, he’d take an 0-for-10 for the purposes of the batting title. Right now, Puig is projected by STEAMER to finish the year with 435 PAs. We’ll accept that number for now, but given that number, let’s think about how likely it is that he has a high enough batting average to win the title.

The first step is to figure out the mark he needs. Let’s go with STEAMER again, and we see Michael Cuddyer, Joey Votto, Yadier Molina, and Chris Johnson all projected to finish at about .320. Let’s assume that one of those four players finishes right at his 87.5th-percentile projection (the middle of the highest quartile). I’ll say Joey Votto, who is projected to go .302 for the rest of the year (the highest of the bunch). Using the binomial distribution, there’s a 16.2% chance Votto finishes 51/149 or better given his “true” .302 batting average. We’ll say that that is the target Puig has to reach: Votto (or one of the others) adds something like 51/149 to his current stats, for a .329 batting average.

What are the chances Puig reaches that clip? To keep it simple, let’s assume STEAMER is right on the number of PAs, ABs, and Puig’s true chance of getting a hit, and then figure out Puig’s chance of getting enough hits to finish at .329 or better. He’s going to end the year with 435 PAs and 390 ABs, if he keeps up his current pace. To that, add an 0-for-67 to get him up to 502 PAs. So he needs enough hits to have a .329 batting average in 457 ABs. That number is 150. He currently has 85 hits in 224 ABs, so for the rest of the year he needs 65 hits in 166 ABs.

Given that STEAMER projects a .293 batting average for the rest of the year, it’s pretty unlikely that he’ll hit at a .392 clip. In fact, his chances of doing so are only about 0.4%, using the binomial model.

What could help his chances? First, there’s no guarantee Votto/Johnson/Molina will get hot enough to make the mark .329. If we drop the required average to .320, using the same method as above, he’d only need 146 hits, which raises his chance to about 2.3%.

Another possibility is that he’s a better hitter than STEAMER projects. If he only regresses to .310, which would make him one of the better hitters in the league admittedly, he has about a 1.6% chance of winning the batting title. And if he is truly a .310 hitter, AND none of the other players near the top of the leaderboard stay hot enough to beat .320, Puig has a whopping 6.6% chance of winning the batting title.
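Here is a small Python sketch of the binomial tail calculation behind those four percentages, assuming (as above) 166 remaining at-bats, 85 hits so far, and hit targets of 150 and 146. The helper name and the scenario labels are mine, and I treat every at-bat as an independent trial with a fixed hit probability, which is the same simplification the binomial model makes.

```python
from math import comb

def binom_tail(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

REMAINING_AB = 166   # 390 projected AB minus the 224 he already has
CURRENT_HITS = 85

# (total hits needed, assumed true batting average for the rest of the year)
scenarios = {
    ".293 hitter, .329 mark": (150, 0.293),
    ".293 hitter, .320 mark": (146, 0.293),
    ".310 hitter, .329 mark": (150, 0.310),
    ".310 hitter, .320 mark": (146, 0.310),
}

chances = {}
for label, (total_hits_needed, true_avg) in scenarios.items():
    hits_needed = total_hits_needed - CURRENT_HITS
    chances[label] = binom_tail(hits_needed, REMAINING_AB, true_avg)
    print(f"{label}: needs {hits_needed}/{REMAINING_AB}, chance {chances[label]:.1%}")
```

The results should land close to the percentages quoted above; small differences would come down to rounding in the hit targets.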

Yeah, I know batting average is stupid. And I know this is a minuscule chance. But isn’t it amazing that Puig has a chance to do something like this at all, after making his debut in June? Baseball!