Whenever I was in a math class where probability was being discussed, the question often in the back of my mind was, “How can this be applied to baseball?” One of the things I love the most about baseball is how well it lends itself to situations of probability, compared to most sports. I’m not sure what that says about me. Anyway, I figured this would be the perfect opportunity to refresh my memory (and hopefully some of yours) on how to crunch the numbers on situations like this. Don’t worry — the principles work on useful things other than just calculating the odds of that gimmicky achievement we call the cycle.

OK, let’s get right down to the math. This won’t be too hard, really. Kind of long, but hopefully worth learning.

First example: say a batter gets a hit 40% of the time overall, and makes an out the remaining 60% of the time. Let’s break down the odds of how 2 plate appearances of his will turn out:

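| Possibility | Result, PA #1 | Result, PA #2 | Odds, PA #1 | Odds, PA #2 | Combined Odds |
|---|---|---|---|---|---|
| Possibility #1 | Hit | Hit | 40% | 40% | 16% |
| Possibility #2 | Hit | Out | 40% | 60% | 24% |
| Possibility #3 | Out | Hit | 60% | 40% | 24% |
| Possibility #4 | Out | Out | 60% | 60% | 36% |
| Total | | | | | 100% |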

For example, the odds of both PAs resulting in hits is 40% multiplied by 40%, which equals 16%, or 0.16. But there are two ways (“permutations”) that can result in getting a hit and an out between the two PAs, and each has a 24% chance of occurring… so, together, there’s a 48% chance this player will bat 0.500 over his two PAs. The remainder is the 36% chance of going 0-for-2.
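If you’d rather let a computer fill in that table, here is a minimal Python sketch of the same two-PA example. The function names are my own, and it assumes (as the table does) that each PA is independent of the last.

```python
from itertools import product

# Per-PA probabilities from the example: 40% hit, 60% out.
p = {"Hit": 0.40, "Out": 0.60}

def sequence_odds(num_pa):
    """Map every ordered sequence of results to its probability."""
    odds = {}
    for seq in product(p, repeat=num_pa):
        prob = 1.0
        for result in seq:
            prob *= p[result]  # independent PAs, so probabilities multiply
        odds[seq] = prob
    return odds

def hit_count_odds(num_pa):
    """Collapse the sequences down to odds for each number of hits."""
    totals = {}
    for seq, prob in sequence_odds(num_pa).items():
        hits = seq.count("Hit")
        totals[hits] = totals.get(hits, 0.0) + prob
    return totals

two_pa = hit_count_odds(2)
for hits in sorted(two_pa):
    print(f"{hits} hit(s): {two_pa[hits]:.0%}")
```

The two 24% rows of the table land in the same "1 hit" bucket, which is where the 48% comes from.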

The example is a really simple one… it gets a lot more complicated when you’re dealing with, say, 7 PAs, and considering the odds of a single, double, triple, etc. This being math and all, of course there are formulas you can use as shortcuts for coming up with the number of permutations. The formula for coming up with the total number of permutations is: n^r (n to the power of r), where n is how many types of things we’re considering (in the simple example, it’s 2 — hits and outs) and r is how many events we’re looking at (2 PAs in the example). 2^2 = 4 total permutations here. If we were considering singles, doubles, triples, homers, and outs as the only possible outcomes (there are 5 of them), and were analyzing the possible ways these could be arranged in a span of 7 PAs, the answer would be 5^7 = 78,125. So, yeah, that wouldn’t be fun to calculate by hand.
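A quick way to convince yourself of the n^r shortcut is to compare it against a brute-force count. This is only a sketch; `count_sequences` is a name I made up for illustration.

```python
from itertools import product

def count_sequences(num_outcome_types, num_pa):
    """Total ordered sequences when repeats are allowed: n ** r."""
    return num_outcome_types ** num_pa

print(count_sequences(2, 2))  # hits and outs over 2 PAs
print(count_sequences(5, 7))  # 1B, 2B, 3B, HR, and outs over 7 PAs

# Brute-force cross-check: generate every 7-PA sequence and count them.
outcomes = ["1B", "2B", "3B", "HR", "Out"]
brute_force = sum(1 for _ in product(outcomes, repeat=7))
print(brute_force == count_sequences(5, 7))
```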

That formula, by the way, is specifically for situations where repeats are allowed (a.k.a. “with replacement”); since there’s nothing really making it impossible for a hitter to get several outs in a row, we can use this here. However, when it comes to breaking down the number of specific types of permutations (e.g. 1 hit and 1 out over 2 PAs), there’s another formula we should consider: r!/(r1! * r2! * … *rn!) . By the way, I saw this formula written with n’s instead of r’s, but I think that’s just confusing, since the variable here is the number of events. The exclamation mark stands for factorial, which tells you to multiply that number by all the positive integers that come before it; e.g. 4! = 1 * 2 * 3 * 4 = 24 … in Excel, =FACT(4) will do the trick. All the different r’s in the denominator represent how many instances there are of each type of event. I think that could use an example:

So if we’re talking about a cycle happening over the course of six plate appearances, since the cycle is achieved in only four of those PAs, we have two “spare” PAs to consider. Let’s simplify the possible outcomes to 1B, 2B, 3B, HR, and non-hits. Possibilities for those two spares include:

2 singles

1 single, 1 double

1 single, 1 non-hit

2 non-hits

… and you can imagine the rest. But let’s look at each of those. If the two spares are both singles, then there are a total of three singles in the six-PA sample. There are only one each of doubles, triples, and HR, and no non-hits in that situation.

1! and 0! both equal 1, which means we can ignore everything but singles in the denominator. If we wrote it all out, though, it’d look like 6!/(3! * 1! * 1! * 1! * 0!) … notice the different r’s in the denominator add up to the big r in the numerator. Simplifying down, the formula we end up with is 6!/3! = 120 permutations. That means there are 120 possible sequences of 6 PAs that could result in 3 singles, 1 double, 1 triple, and 1 HR. You’ll see why that’s relevant in a second.
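The r!/(r1! * r2! * … * rn!) formula is easy to wrap in a small helper. Here is a sketch in Python; `arrangements` is my own name for it, and you pass in the count of each type of event.

```python
from math import factorial

def arrangements(*counts):
    """Number of distinct orderings: r! / (r1! * r2! * ... * rn!)."""
    total = factorial(sum(counts))  # the big r on top
    for count in counts:
        total //= factorial(count)  # divide out each group of repeats
    return total

# 3 singles, 1 double, 1 triple, 1 HR, 0 non-hits over 6 PAs
print(arrangements(3, 1, 1, 1, 0))
# 1 each of single, double, triple, HR, plus 2 non-hits
print(arrangements(1, 1, 1, 1, 2))
```

Zero counts are harmless here because 0! = 1, so you can always pass all five event types.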

OK, let’s say we’re dealing with a hitter who singles in 20% of his PAs, doubles in 5%, triples in 1%, and homers in 9%. We start finding the odds of him hitting for the aforementioned combination by doing: .2 * .2 * .2 * .05 * .01 * .09 = 0.00000036. Not very likely, right? Well, that’s really the probability of each individual arrangement of that combination, and we found there are 120 of them. So multiply that result by 120 to show that he has an overall 0.0000432 chance (or 0.00432%) of hitting 3 singles, 1 double, 1 triple, and 1 HR over the span of 6 PA.

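Finally, let’s consider the case where the 2 PAs other than the cycle are both non-hits. With the rates above, a non-hit happens in the remaining 65% of his PAs (100% minus 20%, 5%, 1%, and 9%). The non-hits are the repeat this time, so it’s 6!/2! = 360 possibilities. Now, .2 * .05 * .01 * .09 * .65 * .65 * 360 = 0.0013689. That’s our likeliest way to get a cycle, by far.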
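Both of those six-PA calculations follow the same recipe: count the arrangements, then multiply by the odds of any one arrangement. A sketch of that recipe in Python, using the example hitter’s rates (the dictionary keys and function name are just my labels):

```python
from math import factorial, prod

# Example hitter: 20% singles, 5% doubles, 1% triples, 9% homers.
RATES = {"1B": 0.20, "2B": 0.05, "3B": 0.01, "HR": 0.09}
RATES["N"] = 1 - sum(RATES.values())  # non-hits get whatever is left (65%)

def combo_probability(counts):
    """Odds of a given mix of results, in any order, over sum(counts) PAs."""
    ways = factorial(sum(counts.values()))
    for count in counts.values():
        ways //= factorial(count)
    one_arrangement = prod(RATES[kind] ** count for kind, count in counts.items())
    return ways * one_arrangement

three_singles = combo_probability({"1B": 3, "2B": 1, "3B": 1, "HR": 1})
two_non_hits = combo_probability({"1B": 1, "2B": 1, "3B": 1, "HR": 1, "N": 2})
print(f"{three_singles:.7f}")
print(f"{two_non_hits:.7f}")
```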

Now you have to repeat the process for combos involving things like 3 triples (not likely), 2 doubles & 2 HRs, etc., but hopefully you get the point. The permutations will follow the same patterns, but the odds calculations will differ. But after you figure them all out, you add them all up and it gives you the total odds of that player hitting for the cycle given that many PAs.
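Repeating the process for every combo is tedious by hand but trivial for a loop. The sketch below walks through every possible mix of the five outcomes over a given number of PAs, keeps the ones with at least one of each hit type, and adds up their odds. It uses the example hitter’s rates from above and assumes every PA is independent; it is an illustration of the procedure, not the spreadsheet’s actual formulas.

```python
from itertools import product
from math import factorial, prod

RATES = {"1B": 0.20, "2B": 0.05, "3B": 0.01, "HR": 0.09}
RATES["N"] = 1 - sum(RATES.values())  # non-hits: 65%

def cycle_probability(num_pa, rates=RATES):
    """Total odds of at least one 1B, 2B, 3B, and HR in num_pa PAs."""
    kinds = list(rates)
    hit_kinds = [kind for kind in kinds if kind != "N"]
    total = 0.0
    # Try every way of splitting num_pa PAs among the five outcomes.
    for counts in product(range(num_pa + 1), repeat=len(kinds)):
        if sum(counts) != num_pa:
            continue
        combo = dict(zip(kinds, counts))
        if any(combo[kind] == 0 for kind in hit_kinds):
            continue  # missing a piece of the cycle
        ways = factorial(num_pa)
        for count in counts:
            ways //= factorial(count)
        total += ways * prod(rates[kind] ** count for kind, count in combo.items())
    return total

for num_pa in range(3, 8):
    print(num_pa, f"{cycle_probability(num_pa):.6f}")
```

The loop is wasteful (most splits don’t add up to the right number of PAs), but at 7 PAs it is still only a few tens of thousands of checks.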

The next step is doing the same sort of procedure for different given PA levels. You know that a cycle is impossible if you only get 3 PA in a game, so you can skip that. At 4 PAs, it’s really simple — there’s only one combination that allows you to get the cycle, and there are no repeats among them. There are 4! = 24 permutations of {1B, 2B, 3B, HR}. At 5 PAs, we’ll either have 1 non-hit in the mix — 5! = 120 — or there will be 1 repeat of a 1B, 2B, 3B, or HR — 5!/2! = 60. And so on. The next step is finding out how likely a player is to get 4 PAs, 5 PAs, 6 PAs, etc. in a game. I just did an analysis on 2012 Retrosheet data and found (by lineup position):

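| # of PAs in Game | Leadoff | 2nd | 3rd | 4th | 5th | 6th | 7th | 8th | 9th | Average |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.1% | 0.1% | 0.1% | 0.0% |
| 3 | 0.2% | 0.5% | 1.4% | 3.2% | 5.8% | 10.7% | 17.0% | 25.7% | 34.8% | 11.0% |
| 4 | 44.5% | 52.6% | 60.1% | 65.7% | 69.6% | 69.9% | 67.7% | 62.7% | 56.2% | 61.0% |
| 5 | 48.6% | 41.5% | 34.2% | 27.8% | 21.8% | 17.0% | 13.3% | 9.8% | 7.4% | 24.6% |
| 6 | 5.5% | 4.3% | 3.4% | 2.7% | 2.3% | 1.9% | 1.6% | 1.4% | 1.2% | 2.7% |
| 7 | 0.9% | 0.7% | 0.6% | 0.5% | 0.4% | 0.4% | 0.3% | 0.3% | 0.3% | 0.5% |
| 8 | 0.3% | 0.2% | 0.2% | 0.1% | 0.2% | 0.1% | 0.1% | 0.1% | 0.1% | 0.2% |
| 9 | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| 10 | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |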

As you can see, hitting towards the top of a lineup can make a huge difference in how often a player will get those crucial 5+ PA games.
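To turn the table into per-game odds, weight the cycle odds at each PA count by how often that PA count comes up. Here is a sketch using the Leadoff and Average columns for 4 through 7 PAs. Two caveats: the hit rates are the made-up example hitter from earlier (not MLB averages), so the output will not match the figures in the results section, and for brevity I compute the cycle odds with inclusion-exclusion rather than by adding up combos one at a time. The two approaches should agree.

```python
from itertools import product

# Share of games with each PA count, from the 2012 table (4 to 7 PAs only).
AVERAGE = {4: 0.610, 5: 0.246, 6: 0.027, 7: 0.005}
LEADOFF = {4: 0.445, 5: 0.486, 6: 0.055, 7: 0.009}

# Example hitter's per-PA rates (illustrative, not MLB averages).
HIT_RATES = {"1B": 0.20, "2B": 0.05, "3B": 0.01, "HR": 0.09}

def cycle_probability(num_pa, hit_rates=HIT_RATES):
    """P(at least one of each hit type in num_pa PAs), by inclusion-exclusion."""
    rates = list(hit_rates.values())
    total = 0.0
    for missing in product([False, True], repeat=len(rates)):
        # Odds that every PA avoids the hit types flagged as missing.
        excluded = sum(rate for rate, flag in zip(rates, missing) if flag)
        sign = -1 if sum(missing) % 2 else 1
        total += sign * (1 - excluded) ** num_pa
    return total

def per_game_cycle_odds(pa_share):
    """Weight the cycle odds at each PA count by how often it occurs."""
    return sum(share * cycle_probability(num_pa) for num_pa, share in pa_share.items())

print(f"average spot: {per_game_cycle_odds(AVERAGE):.4%}")
print(f"leadoff:      {per_game_cycle_odds(LEADOFF):.4%}")
```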

The Results

Using the 2012 PA breakdowns from above and MLB averages for 2010 through last week, I found the average odds of hitting for the cycle for a hitter with an “average” lineup spot to be about 0.0044% per game, or about once every 23,000 games. Well, maybe you can bump those odds up a little bit, because I didn’t consider the results of 8 PAs in a game and beyond. For a leadoff hitter (with the same MLB average stats, not with typical leadoff-hitter stats), it would be about 0.0071% per game, or close to once every 14,000 games.

But it turns out that Trout appears to indeed be the likeliest batter in the majors to hit for the cycle, with a neutral lineup position. He’s been hitting 2nd in the lineup recently, but if he were hitting first, you might figure him for about a whopping (relatively) 0.0375% chance of the cycle per game, or better than once per 2,700 games, based on his career rates. Add in the fact that he’s in a good lineup — and should therefore get more PAs than average — and things look even better for him. But even if we optimistically put him at a cycle per 2,500 games, that’s about once every 16 seasons, on average. Since triples are the hardest part of hitting for the cycle, we have to wonder how easy it will be for a bulky Trout to leg out a triple as he advances in age. It’s not going to get any easier — that’s for sure. So, sure, there’s a pretty good chance he’ll have another cycle, relative to most players, but it’s probably a 50-50 shot, at best.
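The "once every N games" figures are just 1 divided by the per-game odds, and the "50-50 shot" guess comes from asking how likely at least one cycle is over many games. A small sketch of both follows. The 1,500 remaining games is purely my placeholder (roughly ten seasons of 150 games), not a projection, and it treats every game as independent with the same odds.

```python
def games_per_cycle(per_game_odds):
    """Average number of games between cycles."""
    return 1 / per_game_odds

def chance_of_at_least_one(per_game_odds, games):
    """Odds of one or more cycles over a stretch of independent games."""
    return 1 - (1 - per_game_odds) ** games

print(round(games_per_cycle(0.000044)))  # average lineup spot, 0.0044% per game
print(round(games_per_cycle(0.000071)))  # leadoff, 0.0071% per game

remaining_games = 1500  # placeholder assumption, not a projection
print(f"{chance_of_at_least_one(1 / 2500, remaining_games):.0%}")
```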

Oh, 4th place on that list, by the way — Bryce Harper, at better than once per 3,600 games. The projections see him hitting fewer triples than he showed us last year, though. Here are your top 25 by chance of a cycle per game (based on 2010-present historical numbers, signified by “H”, or by the average updated Steamer and ZiPS 2013 projected numbers “P”, with a 400 PA minimum):

And a downloadable spreadsheet for you, if you want to see all of my calculations (watch out — it’s not pretty, and the numbers are from last week):

Some Caveats:

We can’t really be sure how relevant a player’s past rates are to their future odds (especially triples, since there are so few of them). You can try out Steamer or ZiPS projections in place of past performance, if you download the spreadsheet.

Park and lineup effects can make a big difference (changing teams can change the odds)

I’m sure PA frequency breakdowns aren’t entirely consistent between years, yet mine are based on only 2012 data

I only worked this out through 7 PAs in a game. Obviously, if you get 8 or 9 PAs in a game, your odds of a cycle go up considerably… but that rarely happens, especially over 9 innings.

Triples rates are basically the deciding factor here, and they are hard to predict.

Awesome piece… Just the kind of math I was in the mood for after a long weekend.

Would it be fair to say, based partially on your last sentence (Triples rates are the deciding factor) that the 3 things that matter most, in order, would be 1) Speed, 2) in-play% and 3) better-than-non-existent power?

As I see it, if a guy has wheels, puts the ball in play often enough, and can fathomably put one over the wall, he’s a candidate for the cycle.

Thanks Mike (and everybody). Yeah, I’d say that just about sums it up. For cycle purposes, a hitter is only as strong as his weakest link in the chain — be it triple or HR rates.

One thing I’d add — a hacker (who doesn’t take a lot of walks) will have better odds, since if you walked once in a 4-PA game, you killed your chances of a cycle. OK, unless walking instead of making an out allows you an extra PA later on…

A walk is really no different than a ground out with respect to hitting for the cycle. A hacker will only have a greater chance of hitting for the cycle if he is trading walks for hits. If he is trading walks for outs, it is not going to help him.

True, which is why I only considered non-hits as the alternative to hits. But since we’re talking about hits per plate appearance, free-swingers tend to have higher rates. Not necessarily, of course, but BABIP tendencies generally make it so.

My point at the end of that last comment, if it wasn’t clear, was that the fewer outs you (and your team) make, the more PAs you’ll have… which can effectively grant you a do-over opportunity at a cycle.

Calculating the odds of a perfect game is actually pretty straightforward. All you need to know is the odds of any hitter reaching base and raise it to the 27th power. I tried this just for fun and found a few interesting things to consider.

1) You need to subtract IBB% from OBP, because a perfect game assumes there are no IBBs.
2) You need to add the percentage of plate appearances that end in an error to OBP.
3) An error can be recorded on a pop foul and a pitcher can still throw a perfect game (yes?)
4) Throwing errors due to players attempting to throw an advancing runner out are hard to remove from base data.

I came up with roughly 0.00000000001% chance of a perfect game occurring (on average).

And check my math, because I totally did that backwards. You need to determine the odds of a plate appearance ending in an out. I recalculated and came up with 0.002%. That means that there should be roughly one perfect game per decade.

A fifth thing to consider: These odds assume that the team who throws the perfect game also scores a run before there are extra innings (I’m looking at you ’95 Expos).
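For anyone who wants to check that math, the corrected version of the calculation is just the per-PA out rate raised to the 27th power. A sketch follows; the 0.67 out rate is a placeholder I picked for illustration (it is not taken from the comment), and the adjustments for IBBs and errors listed above are left out.

```python
def perfect_game_odds(out_rate, batters=27):
    """Odds that every one of `batters` consecutive PAs ends in an out."""
    return out_rate ** batters

# Placeholder out rate for illustration only; plug in your own adjusted figure.
out_rate = 0.67
odds = perfect_game_odds(out_rate)
print(f"{odds:.4%} per start, or about 1 in {round(1 / odds):,}")
```

Using the reach-base rate instead of the out rate by mistake is what produces the vanishingly small first answer.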

In response to #3, yes, an error that does not allow a baserunner (i.e. a dropped foul ball) does not spoil a perfect game. As far as I’m aware that’s never actually happened in a perfecto at the major league level, but it could.

This all assumes that the likelihoods of batter outcomes are the same in each plate appearance. But of course, they vary depending upon lots of things, especially the pitcher they are facing. In actual individual games, the odds must go up and down. It would be interesting to know the magnitude of the variance. It would also be interesting to know how the odds change during a game depending on the events of the game. For example, if you hit your HR, 3B and 2B early, there is a good chance your team has scored a lot of runs, and you might be facing mop up relievers later in the game, etc.

This is true. This analysis should only work on a large scale, that is, something along the lines of: if there have been x player games this year, what is the likelihood of there having been greater than 1 cycle hit? Percentages on the order of these, near a hundredth of a percent, are kind of meaningless for dealing with one or two games.

Works both ways though. You might be facing mop up relievers; you might be facing Mariano Rivera. He did quite a bit of math to come to the conclusion he did. There will always be questions with this stuff. That’s part of what makes it fun.

Isn’t it easier, and at least as accurate, to take the number of times a hitter has hit for the cycle divided by the total number of games played to determine its probability? Using the play index at baseball-reference, I found that players have hit for the cycle 239 times since 1916. Dividing that by the number of games played since 1916 gives one a pretty strong sample with which to work.

If you wanted to get a sense of the probability that any player will hit for the cycle in a given season, this may be useful. But the analysis here is better for dealing with specific players. Additionally, the raw number of cycles over a 100 year period could be misleading since it is not adjusted for context (i.e., cycles are more common in some eras than in others).

Chuckb, I did provide that chart towards the end for some historical context. It looks to me the 0.0044% per lineup spot per game I calculated matches up pretty well with recent historical rates. As you can see, though, mainly coinciding with the deadball eras there have been dips in the rates. Besides having lower hit rates, they had fewer PAs in a game to work with during those times.

And finding the average odds wouldn’t tell you much about a specific player’s odds. Mike Trout is about 8-9 times likelier than an average player to hit for the cycle… how else could you know that other than by this method?

Another, less accurate, way you could estimate this number is by bootstrapping their careers. Take all of their batter outcomes and place them in a bucket. Then choose six such outcomes repeatedly with replacement and record the proportion of times that the player hits for the cycle.
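A sketch of that bootstrap idea in Python is below. The "bucket" here is invented: 100 PAs that happen to match the article’s example hitter (20 singles, 5 doubles, 1 triple, 9 homers, 65 non-hits). With a real player you would fill it with his actual career outcomes. The function name and defaults are my own.

```python
import random

# Hypothetical bucket of 100 career PAs, mirroring the example hitter's rates.
bucket = ["1B"] * 20 + ["2B"] * 5 + ["3B"] * 1 + ["HR"] * 9 + ["N"] * 65

def bootstrap_cycle_rate(outcomes, num_pa=6, trials=200_000, seed=0):
    """Share of resampled games (drawn with replacement) containing a cycle."""
    rng = random.Random(seed)  # fixed seed so reruns give the same estimate
    needed = {"1B", "2B", "3B", "HR"}
    cycles = 0
    for _ in range(trials):
        game = rng.choices(outcomes, k=num_pa)  # draw num_pa PAs with replacement
        if needed <= set(game):
            cycles += 1
    return cycles / trials

estimate = bootstrap_cycle_rate(bucket)
print(f"{estimate:.3%}")
```

Because cycles are so rare, the estimate stays noisy even with a couple hundred thousand resampled games, which is one reason this route is less accurate than working out the odds directly.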

Makes sense. Thanks. I guess I was thinking about the answer to the question, “what are the odds that a player will hit for the cycle in a game?”; not, “what are the odds that (insert player’s name) will hit for the cycle in a game?”

Congratulations on all that crazy math, which impresses me, though I’m sad to say I didn’t understand much of it. The reason it seems to me that Trout is likely to hit for another cycle sometime is that he gets quite a few triples, the hardest part of the cycle. He has 6 already this season.

On a related note: it irks me when announcers say a player is “a triple short of the cycle”. While technically true, it’s just not that notable of an accomplishment, since it’s the rarest/hardest part that they’re lacking. “Well gee, we almost got the weather cycle this September day: we had sunshine, high winds, and rain, we were just lacking the snow!” or “I almost completed the cycle: I played little league, high school, and college ball, just didn’t make the majors!” No crap! You’re not special!

One other interesting variable is that players look to cycle. Once you have that HR and triple it gets a lot easier, and players will sit at first on a double to complete it. I don’t know how this would affect things, but I have to assume that when your random distribution gives you the hardest parts, the odds spike. I would wonder how many times last season Trout hit a HR and a 3B in his first 4 PA so that he was hunting for hits that were easier for him to complete. If, for instance, he had 5 hard-side cycle starters last season, then I would put the odds at like 5% he completes one. I would maybe put it at 1/4 the original odds you proposed, since he will get at least 3-5 games this season with HR/3B, and since he will want to cycle if they fall early, he will try for it whenever the distribution puts him in a position to. If you have the other 3, you’re hitting for the fourth part. One other curiosity: has anyone hit for the cycle with an inside-the-park HR?

Well, the main thing is you’ll have to guess how many more PAs he’s going to get in the game. If you say he’ll get 2 more, then it’s pretty easy — multiply his singles/PA and doubles/PA rate, and multiply that by two, for the two ways that can occur (single then double, or double then single). You can tweak those rates up or down if you think he’ll be facing an easy or a difficult pitcher. But on average, rounding off his singles and doubles rates to 16% and 5%, we could guess something like 16% * 5% * 2 = 1.6%.

Like Tim A says, players probably try for a cycle, at least when it’s close at hand. Maybe if it’s a blowout and he needs a single to complete the cycle, he slows down a bit and doesn’t go for a double… so perhaps it’s slightly higher than that.

If he has 3 more PA, then possibilities for completing the cycle are (S=single, D=Double, N=non-hit):

SDN, SND, SDD, SSD, SDS
DSN, DNS, DSS, DDS, DSD
NSD, NDS

So there are 6 situations that involve 1 single, 1 double, and 1 non-hit (with the formula, that’s just 3!, or 3*2*1=6, since there are no repeats of S, D, or N to divide it by). Say his non-hit chances are 73%. Then we do 16%*5%*73%*6 = 3.5% probability of getting a single, a double, and a non-hit over his final 3 PA. Now we have to figure the likelihood of him getting 2 singles and 1 double, and also 2 doubles and 1 single, and add both of those to 3.5% for his total odds. So, for 2 singles and a double, there are 3 ways to do that (3!/2!)… that makes the chance = 16% * 16% * 5% * 3 = about 0.4%. For 2 doubles and a single, it’s 16% * 5% * 5% * 3 = 0.12%. So, in total, 3.5% + 0.4% + 0.12% = 4.02%.
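The same answer falls out of listing every possible sequence and keeping the ones that contain both a single and a double. A sketch, using the rounded 16% / 5% / 73% rates from this example (function names are mine):

```python
from itertools import product
from math import prod

# Rounded rates from the example: S = single, D = double, N = non-hit.
RATES = {"S": 0.16, "D": 0.05, "N": 0.73}

def winning_sequences(num_pa, needed=("S", "D")):
    """Every ordered sequence of results that contains all the needed hits."""
    return [seq for seq in product(RATES, repeat=num_pa)
            if all(hit in seq for hit in needed)]

def completion_odds(num_pa):
    """Odds of finishing the cycle in the remaining PAs."""
    return sum(prod(RATES[result] for result in seq)
               for seq in winning_sequences(num_pa))

print(["".join(seq) for seq in winning_sequences(3)])
print(f"2 PAs left: {completion_odds(2):.2%}")
print(f"3 PAs left: {completion_odds(3):.2%}")
```

Note that these three rates only add up to 94%. The leftover 6% is extra triples and homers, which wouldn’t spoil the cycle either, so counting them as harmless spares would nudge the 3-PA figure up a little.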

I’m not sure that you can calculate the odds of a perfect game that simply. I understand the concept, but why not use the starting pitcher’s on base percentage against instead of the average hitter’s OBP? Or perhaps finding the percentage of Quality starts and multiplying by the OBP against of the quality starts? I think there are several approaches to making this determination, and I don’t know if they are self consistent.

Seems that someone has looked at probability using a random distribution here as opposed to the actual one. With the streakiness of players, it should help here but hurt things like extra-long hitting streaks. Just like the odds of striking out 20 batters are pretty thin if teams had an even distribution of talent, but more likely with the existence of the Astros.

Well, you can’t assume even distributions of talent within teams, for that matter. You’d have to come up with probabilities for every batter vs. every pitcher, adjusted for park and defense effects, and then weight by frequencies. In other words, a ton of extra effort and complexity.

Ever consider that the probabilities of two or more hits are not independent of each other? Meaning, a player gets a hit or two, generates momentum (gets “hot”), and the probability of doing the cycle increases. Basically capitalizing on early game success. In this way, you could think about it as a conditional probability (not just multiplying the two probabilities together). Great post, great read, cheers.

I wouldn’t say the issue is completely settled, though — there may be some different ways to approach the research. One thing jumps out in the research, though — if you’re “hot,” you’re likelier to be intentionally walked.

Yea, I remember reading about how the “hot hand” is a fallacy in basketball back in a college stats class. Just thought it would be an interesting approach. And that’s a good point about being walked when “hot”. Cheers.

Steve, this is a great piece. I’m a trained history, geography, and theology guy, so most of the math is lost on me but I enjoyed the article!

A (very shallow) thought occurred to me as I’ve read the article and the comments: since, say, 1990, being a Blue Jay at some point in your career significantly increases the probability of hitting for the cycle. I’m thinking of Jeff Frye (a cycle I witnessed!), Aaron Hill (x2), Dave Winfield, John Olerud (x2), Bengie Molina, Paul Molitor, Tony Fernandez, Jeff Kent, Jose Reyes, Fred Lewis, Melky Cabrera, Orlando Hudson, and Kelly Johnson. 81 cycles since 1990, 15 by players who played with the Jays at some point in their career.

Or maybe I’m looking at it backwards: if you’ve hit for the cycle, the probability that you have been/will become a Blue Jay is significantly increased. That makes it a management issue. :)

If only there was some genuine correlative factor. Well, perhaps I shouldn’t swim in the deep end of the pool…