Analyzing the Four HR Game

Can History Repeat Itself?

By Tavis Bregel and Paul Bessire, WhatIfSports.com

July 10th, 2008

In today’s article I want to start by exploring the rarity of a player hitting four home runs in a game. In Major League Baseball history there have been fifteen four home run games. The list ranges from five Hall of Famers (Mike Schmidt, Willie Mays, Ed Delahanty, Chuck Klein and Lou Gehrig) to three players who never played in an All-Star game (Bobby Lowe, Pat Seerey and Mark Whiten). My apologies to Bobby Lowe, whose career (late 1800s to early 1900s) preceded the All-Star game and who probably would have played in one had it existed. Thus, no defined set of characteristics marks a player as capable of a four home run game. Players like Bobby Lowe and Mark Whiten contradict the myth that only home run kings like Barry Bonds and Babe Ruth can accomplish this feat. Only Schmidt and Mays hit both 500 career home runs and four home runs in a game. Below, I list the fifteen players with their career home runs and home runs per plate appearance, followed by the historical totals, to show how common a home run is.

Fifteen "Four Home Run" Players: Career HR and HR/PA

Name              HR     HR/PA
Bobby Lowe        71     0.917%
Ed Delahanty      101    1.204%
Lou Gehrig        493    5.104%
Chuck Klein       300    4.185%
Pat Seerey        86     4.107%
Gil Hodges        370    4.566%
Joe Adcock        336    4.600%
Rocky Colavito    374    6.184%
Willie Mays       660    5.283%
Mike Schmidt      548    5.446%
Bob Horner        218    5.174%
Mark Whiten       105    2.982%
Mike Cameron      228    3.620%
Shawn Green       328    4.566%
Carlos Delgado    442    5.624%

Baseball History (1871-2007):
Total Plate Appearances: 14,665,075
Total Home Runs: 245,898
Home Runs per Plate Appearance: 1.677%

(What is home run per plate appearance? Simply put, it is the number of home runs a player hit divided by his total plate appearances. The resulting percentage shows the share of plate appearances in which you would expect the player to hit a home run. It is the inverse of the familiar at-bats per home run, but based on plate appearances instead of at-bats.)
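As a minimal sketch of the definition above, the calculation can be written in a few lines of Python. The function names are illustrative only; the inputs are the historical totals and Willie Mays' rate from the charts above.

```python
def hr_per_pa(home_runs, plate_appearances):
    """Share of plate appearances that ended in a home run."""
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    return home_runs / plate_appearances


def pa_per_hr(rate):
    """Inverse view: average plate appearances between home runs."""
    return 1 / rate


# Historical totals (1871-2007) from the chart above.
historical_rate = hr_per_pa(245_898, 14_665_075)
print(f"{historical_rate:.3%}")  # 1.677%

# Willie Mays' career HR/PA from the chart above.
print(round(pa_per_hr(0.05283), 1))  # 18.9
```

Read this way, the historical rate works out to roughly one home run every sixty plate appearances.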

As you can see, a player with almost any home run history could hit four home runs in a game, from Bobby Lowe (71 career home runs) to Willie Mays (660 career home runs); the fifteen players average 311 career home runs. Just because a player is Babe Ruth, Hank Aaron, or Barry Bonds doesn’t mean that he will tie (or even break) the single game home run record. For all we know, the next four home run hitter could be Albert Pujols, Lance Berkman, Ian Kinsler or even Carl Crawford. Predicting the next one is difficult, if not impossible, because a home run in any given plate appearance is rare (and imagine that happening four times in a game).

By looking at home runs per plate appearance you can see that a home run is not as common as the media portray. Look at the career of Willie Mays. He hit an impressive total of 660 career home runs, but on average he hit a home run in just over five percent of his plate appearances, which works out to roughly one home run every nineteen plate appearances. That makes a four home run game all the more amazing, since the same unlikely event has to happen three additional times within a handful of plate appearances.

Referring to the second chart above, you probably noticed the gap between the Baseball History rate and the HR/PA of the fifteen “Four Home Run” players. Earlier it was noted that the difference in pedigree between these fifteen players and other hitters is minimal, so why do the numbers differ? The historical data include the plate appearances of pitchers and substitutes, who are far less likely to accomplish this feat.

In the chart below, I calculated the rarity of a four home run game when a player has four, five, six and seven plate appearances during a game.

Now how did I come up with these numbers? I multiplied the probability of a single home run by itself four times to get the probability of four straight home runs. Then, for games with additional plate appearances that did not result in home runs, I counted the number of ways the four home runs could be arranged among the plate appearances. The probability increases because the player can make an out, walk or get a non-home run hit and still accomplish the feat.

Thus, these statistics show why only fifteen players have hit four home runs in a game over the 136-year history of this sport. Look at it this way: the chance of hitting four home runs, even with three extra plate appearances, is still less than three ten-thousandths of a percent. In that situation, there is a 1 in 361,011 chance of hitting four home runs in seven tries. And hitting four straight home runs happens only once in every 12,642,225 attempts, which is four times the odds of picking five correct numbers in the lottery. Again, these numbers will be slightly better for our fifteen home run hitters because the historical home runs per plate appearance include pitchers and reserves as well.
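The calculation described above can be sketched in Python under stated assumptions. The first function is one reading of the article's method (the number of ways to place four home runs among the plate appearances, times the single home run probability to the fourth power), which appears to match the quoted figures up to rounding. The second is the standard binomial probability of exactly four home runs, shown for comparison. Both treat every plate appearance as independent with the same historical HR/PA, and the function names are illustrative only.

```python
from math import comb

# Historical HR/PA (1871-2007) from the totals above, about 1.677%.
HISTORICAL_HR_RATE = 245_898 / 14_665_075


def four_hr_approx(p, plate_appearances):
    """Ways to place four homers among the PAs, times p**4.

    Assumed reading of the article's method: it ignores the probability
    of the non-home-run plate appearances.
    """
    return comb(plate_appearances, 4) * p ** 4


def four_hr_exact(p, plate_appearances):
    """Binomial probability of exactly four homers in independent PAs."""
    n = plate_appearances
    return comb(n, 4) * p ** 4 * (1 - p) ** (n - 4)


for n in (4, 5, 6, 7):
    approx = four_hr_approx(HISTORICAL_HR_RATE, n)
    exact = four_hr_exact(HISTORICAL_HR_RATE, n)
    print(f"{n} PA: 1 in {1 / approx:,.0f} (approx), "
          f"1 in {1 / exact:,.0f} (exactly four)")
```

With the unrounded rate, the four plate appearance case comes out near 1 in 12.65 million and the seven plate appearance case near 1 in 361,000, close to the figures quoted above. The binomial version gives slightly longer odds because it also requires the remaining plate appearances not to be home runs.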

After calculating these figures, my next objective was to compare the historical results with the WhatIfSports simulation. Due to time constraints and the amount of data, I believed it was best to focus on only three players from the list of fifteen. I chose to explore the performances of Mark Whiten, Carlos Delgado and Willie Mays. It was important to me to include an average (non-All-Star) player, a perennial All-Star and a Hall of Famer to show the contrast based on talent level in achieving this feat. The only weakness is that I didn’t include a player from the earlier era of Lowe, Delahanty and Gehrig, but I prefer to focus on more recent years, which better reflect the nature of today’s game. After deciding on these three players, I had the simulation run 10,000 times with the same starting pitcher, roster and location as on the date of the actual feat. Now, it is important to note that just because I am specifically trying to repeat a previous event, it doesn’t mean the situations will play out the same. The simulation is quite involved, and I cannot control the bullpen and in-game situations. For example, intentional walks are more likely to occur if runners are on second and third base, which can take away a home run opportunity that existed in the real game. Nevertheless, this is a good starting point in trying to see how the mathematical calculations are represented in the simulation. Please see below for the simulation results:

Home Run Totals    0       1       2      3     4    5
Mark Whiten        7701    2034    238    26    1    0
Carlos Delgado     7485    2217    278    19    0    1
Willie Mays        7558    2166    258    18    0    0

Before we work in some additional math, let’s explore the results of the simulation. The first result that stands out to me is that just because the event happened once (in reality), you should not expect it to happen every time (in the simulation). This is important to keep in mind when working with WhatIfSports simulators, but it should not take away from the accomplishments of these athletes. If anything, it makes their feat more impressive.

The second result is that a five home run game is not impossible, even though fifteen players have hit four in a game and none has hit five. Will it happen? Probably so. When will it happen? Who will accomplish it? I could hardly start to guess.

So now that we have these totals, how do they compare to the home runs per plate appearance calculated earlier? I calculated the simulated rate the same way, but for plate appearances I used the player’s average plate appearances per game during the season of the feat. It is important to realize that this is still an estimate and will result in slight variance. Also, I include park factors (as determined by WhatIfSports) to normalize the simulation results when comparing with the historical figures. For example, Willie Mays hit his four home runs in Milwaukee at County Stadium, which was not considered a home run park. This may help explain why he did not repeat the feat in the simulation. Please see below to compare the results of the simulation with the historical data:

Yes, this is a lot of data to search through, but I want to make it clear how the final analysis came to be. First, I calculated the HR/PA of the simulation: I estimated total plate appearances from the player’s average plate appearances per game during the season, then divided the simulated home run total by that estimate. As mentioned earlier, the plate appearances are estimates, which causes a slight variance in the results. From there, I compared the simulation results to the real statistics to find the difference between the simulation and reality. As mentioned earlier, it is important to factor in the stadium to normalize the historical data. With all of the results in hand, the final analysis was quite impressive and showed some interesting factors among the players.
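The simulated HR/PA calculation described above might look like the following sketch. The game counts come from the simulation results table earlier in the article. The plate appearances per game value is a placeholder assumption (the article's actual per-season averages are not reproduced here), and park factor normalization is left out because the article does not say how WhatIfSports applies it.

```python
# Simulated games by home run count (0 through 5), from the table above.
SIM_RESULTS = {
    "Mark Whiten":    [7701, 2034, 238, 26, 1, 0],
    "Carlos Delgado": [7485, 2217, 278, 19, 0, 1],
    "Willie Mays":    [7558, 2166, 258, 18, 0, 0],
}

# ASSUMPTION: hypothetical placeholder, not a figure from the article.
# Substitute each player's real average PA per game for his season.
ASSUMED_PA_PER_GAME = 4.2


def total_home_runs(games_by_hr_count):
    """Total homers across all simulated games (index = homers in a game)."""
    return sum(hr * games for hr, games in enumerate(games_by_hr_count))


def simulated_hr_per_pa(games_by_hr_count, pa_per_game):
    """Simulated homers divided by the estimated plate appearances."""
    estimated_pa = sum(games_by_hr_count) * pa_per_game
    return total_home_runs(games_by_hr_count) / estimated_pa


for name, counts in SIM_RESULTS.items():
    rate = simulated_hr_per_pa(counts, ASSUMED_PA_PER_GAME)
    print(f"{name}: {total_home_runs(counts)} HR, {rate:.2%} HR/PA")
```

Because the plate appearance figure is only an estimate, the resulting rate shifts with it, which is the slight variance mentioned above.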

Mark Whiten Factor

There seems to be an outsized difference between Mark Whiten historically and Mark Whiten in the simulation. Before you start to doubt the simulation, realize that for a good portion of his career Mark Whiten was not a full-time starter but mostly a reserve. This accounts for part of the difference between real life and the simulation. Hitting home runs off the bench is harder because of the difficulty of staying fresh and knowing what matchup to expect. If we calculate Whiten’s HR/PA for the season of the feat rather than for his career, it comes closer to the simulation. Remember, WhatIfSports sets up the simulation to value players per season and not per game or per career. His rate for the season was about 4 percent, which is closer to the simulation, but a significant variance remains. Could it be because the system overestimated Whiten’s power? It is possible, but more likely the simulated rate was high because this season was by far the best of his career, which in turn produces the highest numbers for Mark Whiten. Some variance remains unexplained, but I do not believe it undermines the process.

Willie Mays Factor

Willie Mays had the smallest difference compared to the simulation. This does not surprise me because Willie Mays was the most consistent of the three throughout his career. The interesting thing about Mays is that his difference was smaller before applying the park factor than afterwards. The first question is whether this could again be due to comparing against career totals instead of the individual season. For that season, Willie Mays had a 6.07 percent home run rate. Using this number, unlike with Whiten, actually increases the difference. Truthfully, the simulated results for Mays are quite accurate and relate closely to both his season and his career.

Before getting to the Carlos Delgado factor, I want to explore him further. My aim was to compare Delgado at Toronto against different levels of pitching talent. One thing that hasn’t been discussed much so far is the variation caused by the ability of the pitcher. I also wanted to look at the difference between parks by repeating his matchup, but playing at Tampa Bay (his four home run performance was against Jorge Sosa and the Tampa Bay Devil Rays). Then I took this further by comparing him to a similar player at the same age (as determined by Baseball Reference), who turned out to be Fred McGriff of the Atlanta Braves. In these results, I am more concerned with how the distribution of games (0, 1, 2, 3, 4, or 5 home runs) varies from the previous simulation than with the calculated HR/PA, so we will not focus any further on the math.

Carlos Delgado Home Runs Revisited

Home Run Totals         0       1       2      3     4    5
vs. Sosa @ Tampa Bay    7469    2203    300    25    3    0
vs. Benoit @ Toronto    6697    2790    458    48    7    0
vs. Pedro @ Toronto     8640    1286    71     3     0    0

Fred McGriff Home Runs

Home Run Totals           0       1       2      3     4    5
vs. Kile @ Atlanta        8947    1005    47     1     0    0
vs. Bautista @ Atlanta    7640    2085    253    22    0    0

Carlos Delgado Factors Reviewed

1. The impact of the pitcher: The quality of the pitcher has a large impact. During his 10,000 games against Joaquin Benoit, Delgado hit 3,878 home runs, and during his 10,000 games against Pedro Martinez he hit 1,437, compared with 2,835 against Jorge Sosa in the earlier simulation. Without diving in too deep, it is easy to see that the number of home runs (as well as multiple home run games) varies with the HR/9 and talent level of the pitcher. Of course, a player is expected to hit more home runs against Jorge Sosa than against Pedro Martinez. This simulation shows that pitchers are valued right down to their ability to prevent home runs.

2. The impact of the park factor: The park factor plays a role in the results, but in this analysis it is not as large as you might think. Delgado hit similar totals of home runs against Sosa at both Tampa Bay and Toronto. He did have more multi home run games at Tampa Bay, but not by a huge margin. Ironically, Tampa Bay is less of a power park than Toronto. All this really suggests is that park factor plays a role in the short term, but over the longer term it seems to become negligible.

3. The comparison player – Fred McGriff: There is one key point that I would like to make about Fred McGriff’s performance. Yes, he again demonstrates the pitcher factor (Darryl Kile had the lowest HR/9, Jose Bautista the highest). However, one thing stands out in my mind: he could not accomplish the feat in 20,000 games. Yes, he compared favorably with Delgado, but he could not do something that Delgado did: hit four home runs in a game. Does this mean that the simulation only allows players who actually hit four home runs in a game to accomplish the feat? No, even Willie Mays didn’t repeat it in the earlier simulation. What this shows again is how rare the four home run game is, and what an accomplishment it should be treated as (as long as the next five games aren’t 0 for 25 with 15 strikeouts).
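The home run totals quoted in point 1 and the multi home run comparison in point 2 can be rechecked from the simulation tables with a short sketch like the one below. The dictionary keys and function names are illustrative; the game counts are copied from the tables above.

```python
# Delgado's simulated games by home run count (0 through 5).
DELGADO = {
    "Sosa @ Toronto":   [7485, 2217, 278, 19, 0, 1],  # earlier simulation
    "Sosa @ Tampa Bay": [7469, 2203, 300, 25, 3, 0],
    "Benoit @ Toronto": [6697, 2790, 458, 48, 7, 0],
    "Pedro @ Toronto":  [8640, 1286, 71, 3, 0, 0],
}


def total_home_runs(games_by_hr_count):
    """Total homers across all simulated games (index = homers in a game)."""
    return sum(hr * games for hr, games in enumerate(games_by_hr_count))


def multi_hr_games(games_by_hr_count):
    """Number of simulated games with two or more home runs."""
    return sum(games_by_hr_count[2:])


for matchup, counts in DELGADO.items():
    print(f"{matchup}: {total_home_runs(counts)} HR, "
          f"{multi_hr_games(counts)} multi-HR games")
```

This reproduces the 2,835, 3,878 and 1,437 totals cited above, and gives 2,890 home runs with 328 multi home run games at Tampa Bay against 298 multi home run games at Toronto versus Sosa.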

I hope you enjoyed this in-depth analysis of the four home run game. If you have any questions about this article or would like to share your thoughts and opinions, please send me an email at bregeltn@email.uc.edu. I will be happy to discuss it with you!