Checking the Numbers

Two Out of Three Ain't Bad

Forgive me a lapse of obviousness, but Albert Pujols is one of the greatest players of all time, the type of all-around talent I will take pride in declaring incomparable when describing his career to my future children. He makes an ample amount of contact, knocks the ball out of the yard at least 30 times a year, drives plenty of his teammates in, plays Gold Glove-caliber defense, and makes up for a lack of raw baserunning speed with smarts on the basepaths. This confluence of characteristics makes Pujols the perfect specimen, sort of like the baseball equivalent of the comic book character Deadpool. It also makes Pujols a virtually unanimous choice to be a plausible Triple Crown heir apparent to Carl Yastrzemski, the man who last accomplished the near-impossible feat back in 1967. Sticking solely to the senior circuit, nobody has topped the leaderboards in batting average, home runs, and RBI in the same season since Joe Medwick did it for the Cardinals in the 1937 campaign.

Pujols is no stranger to the idea of a Triple Crown either, routinely finishing in the top five or ten in each category; however, as odd as it may sound, he has only actually led the league in one of the three categories once, when he led the National League with a .359 average back in 2003. This mere fact should serve as a testament to the difficulty of attaining a Triple Crown, as the premier talent of the league, one of the more balanced and tremendous players in the history of the sport, has only led in a category on that one solitary occasion. Tim Kurkjian attributed some of this to talent specialization, wherein players sacrifice some of their power output for an increase in batting average, or vice versa. Still, if anyone in the game were to realize a Triple Crown in the next few seasons, you would think that it would be Pujols.

What are the odds that he actually wins it this year, given the fervent speculation pointed in that direction? About a month ago he led in both dingers and ribbies, and sat just a few points behind Hanley Ramirez on the batting-average front. In the relatively brief span from then up until Monday, however, Mark Reynolds has jimmy-jacked his way into contention for a Dave Kingman-like home-run title, Prince Fielder has knocked in teammates with reckless abandon, and Ramirez has increased the batting average gap between himself and the reigning most valuable player. While Pujols might be the odds-on favorite to win a Triple Crown out of all active players, it certainly stands to reason that his projected end-of-season line (a .326 average, 53 home runs, and 148 RBI) might not win in any of the categories, let alone all three.

Fat Albert has a solid shot at taking home the top prize in both the homer and RBI contests, but will likely have to settle for silver in the batting average department. Assuming Ramirez lives up to his adjusted projection and finishes the year at .342, Pujols would need to hit roughly .384 to jump from .326 to .343. Suffice to say, this isn't terribly likely, and translating the above text into a definitive odds ratio proves much tougher than our earlier look at Joe Mauer's chances of hitting .400, primarily because we have to evaluate not just Pujols' performance, but the relative performance of other players as well in this case. We can calculate the probability that Pujols hits .384 over his final 221 PA and ends up at .343, but that ignores how Hanley or others perform. For all we know, Hanley could continue his torrid pace and finish the year at .365, with Pablo Sandoval catching equal fire and ending his campaign at .345. In short, the methodology proposed here is going to produce ballpark results at best, as the underlying assumption will be that all players within reach of a categorical lead will play to their projection from here on out.

Essentially, to determine the overall probability of a Triple Crown using the aforementioned methodology, we need to calculate either the likelihood that Pujols surpasses or achieves a PECOTA-projected threshold in an area in which he trails, or, in categories he leads, the likelihood that his closest competitors reach the title-worthy mark. The competitors' probabilities are then added up and subtracted from one, and the result is Pujols's probability of winning that category.

For instance, PECOTA projects Pujols to end the season at 53 home runs, which would lead Reynolds by four, and Adrian Gonzalez and Adam Dunn by eight. Find the HR/PA rate for each of the players, the number of probable PAs remaining to them, and the number of home runs each needs to reach 52, one shy of Pujols's projected total. Mark Reynolds projects to step to the dish 200 more times this year, and would have a HR/PA of right around 0.070. With 38 long balls already in the books, he needs 14 to reach 52, so the Excel formula would be 1-BINOMDIST(14,200,0.070,TRUE).

The TRUE argument requests the cumulative probability, that is, the chance that the player will experience at most 14 successes in 200 chances given a 0.070 rate; subtracting that result from one provides the probability that the player exceeds that threshold, in this case the likelihood that Reynolds hits more than 52 home runs. For Reynolds, the resulting percentage stays strong at 42.9 percent, with Gonzalez at 3.9 percent, Dunn at 2.1 percent, and Ryan Howard at a mere 0.8 percent. Added together and subtracted from one, Pujols has a 50.3 percent shot at winning the home-run title if 53 home runs will, in fact, win him the title, and everyone within striking distance plays to their projection.
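The spreadsheet calculation can also be reproduced outside Excel. The following is a minimal Python sketch, assuming only the figures quoted above (200 remaining PA, a 0.070 HR/PA rate, 14 more home runs to reach 52); the helper names `binom_cdf` and `prob_exceeds` are illustrative and not part of any library.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), the same quantity as BINOMDIST(k, n, p, TRUE)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def prob_exceeds(k, n, p):
    """P(X > k): the chance of more than k successes in n trials."""
    return 1.0 - binom_cdf(k, n, p)

# Reynolds: 38 HR so far, so more than 14 in 200 PA puts him past 52.
reynolds = prob_exceeds(14, 200, 0.070)
print(f"Reynolds tops 52 HR: {reynolds:.3f}")

# The article's additive step, using the rounded percentages quoted in the text.
rivals = [0.429, 0.039, 0.021, 0.008]  # Reynolds, Gonzalez, Dunn, Howard
pujols_hr_title = 1 - sum(rivals)
print(f"Pujols HR title (additive method): {pujols_hr_title:.3f}")
```

The Reynolds figure should land close to the 42.9 percent quoted above; small differences would come from rounding in the inputs.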

This process is then rinsed and repeated for the RBI title, but the only players with a realistic shot of winning the RBI title are Prince Fielder and Ryan Howard. (Note that when these probabilities were calculated, Pujols and Fielder were tied at 104 RBI.) The projections peg both Pujols and Fielder as 148 RBI men, with Howard a fair distance behind at 134. Utilizing the binomial distribution once more, Fielder emerges with a 56.3 percent shot at exceeding 148 RBI, with Howard again at a mere 0.2 percent chance. This leaves Pujols with a 43.5 percent probability of winning the RBI title.

Multiplying both of the probabilities together (0.503*0.435 = 0.2188) gives Pujols a 21.9 percent chance of winning both the home-run and RBI titles. The third leg, batting average, is bound to produce a much lower probability, given the vast gap between himself and Hanley, and the relatively limited time in which ground can be made up. Given his rate of walking and the batting average he would need to realize in the playing time remaining, Pujols basically would have to go 70-for-183 (with a .326 established talent level) in order to finish the season at .343, a mark that would best Ramirez's projection by a lone point of batting average.

The resulting binomial, 3.4 percent, agrees that such an occurrence is very unlikely. Multiply the 0.034 by the other two legs of the Triple Crown, 0.503 and 0.435, and Pujols ends up with a minuscule 0.74 percent chance of winning the Triple Crown this season, or odds of 134 to 1. That is an astounding number and odds ratio given that he may very well finish the year with a final line of .326-53-148. If the rest of the current season were replayed over and over again, on average Pujols would win the Triple Crown once every 134 replays.
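The arithmetic behind the final figure is short enough to check directly. This sketch simply multiplies the three per-category probabilities quoted in the text, which treats the three titles as independent events; the variable names are illustrative.

```python
# Per-category probabilities as quoted in the article.
hr_title = 0.503
rbi_title = 0.435
avg_title = 0.034

# Multiplying assumes the three outcomes are independent of one another.
two_legs = hr_title * rbi_title
triple_crown = two_legs * avg_title
replays_per_win = 1 / triple_crown

print(f"HR and RBI titles: {two_legs:.4f}")
print(f"Triple Crown:      {triple_crown:.4f}")
print(f"About one win every {replays_per_win:.0f} replays")
```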

Again, this is not an exact probability given the number of assumptions made based on the in-season projection and how the variables, the other players, throw wrenches into the probabilistic machine. Regardless, it is hard to fathom that the Triple Crown probability would increase past, say, two percent with a more accurate and time-consuming methodology. I feel like a broken record in stating that the very low probability of achieving an historical feat should not take away from the season for the player in question, but that would be the understatement of the century for Pujols. Albert may win his third MVP award this season and, Triple Crown or not, he will remain one of the best players in history, one with another tremendous season to add to the back of his baseball card.

Eric -- You've implicitly assumed (by multiplying the probabilities together) that winning the HR title and winning the RBI title are independent events. Since Pujols' HR total & RBI total are clearly positively correlated, that assumption isn't correct. Similarly, there would be some positive correlation (albeit very mild) between Pujols' HR/RBI totals and his final BA.

I think your ultimate conclusion remains valid, but it would be interesting to see (e.g., via a Monte Carlo simulation) a better estimate of the probabilities.
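A toy Monte Carlo run can illustrate why the independence assumption matters. The sketch below is not a model of Pujols: every parameter except the 0.07 HR/PA rate and the 200 PA (the 0.5 chance of an extra RBI on a homer, the 0.12 RBI chance on other plate appearances, and the targets of 15 HR and 45 RBI) is a made-up assumption chosen only to show the effect.

```python
import random

def simulate(trials=5000, pa=200, p_hr=0.07, p_extra=0.5, p_other_rbi=0.12,
             hr_target=15, rbi_target=45, seed=1):
    """Toy model with hypothetical parameters: a homer is worth 1 RBI plus
    1 more with probability p_extra; any other PA yields 1 RBI with
    probability p_other_rbi."""
    rng = random.Random(seed)
    hr_hits = rbi_hits = both = 0
    for _ in range(trials):
        hr = rbi = 0
        for _ in range(pa):
            if rng.random() < p_hr:
                hr += 1
                rbi += 1 + (rng.random() < p_extra)
            elif rng.random() < p_other_rbi:
                rbi += 1
        reached_hr = hr >= hr_target
        reached_rbi = rbi >= rbi_target
        hr_hits += reached_hr
        rbi_hits += reached_rbi
        both += reached_hr and reached_rbi
    return hr_hits / trials, rbi_hits / trials, both / trials

p_hr_goal, p_rbi_goal, p_both = simulate()
print(f"P(HR goal)  = {p_hr_goal:.3f}")
print(f"P(RBI goal) = {p_rbi_goal:.3f}")
print(f"P(both), simulated = {p_both:.3f}")
print(f"P(both), product   = {p_hr_goal * p_rbi_goal:.3f}")
```

Because every homer also adds to the RBI count in this toy model, the simulated joint probability comes out higher than the product of the two separate probabilities, which is the direction of the bias described in the comment above.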

I think the effect could be much bigger than Rowan suggests. Sure, the chance that Pujols goes on a tear and wins the batting title is remote, but if he does he's almost sure to win the RBI crown as well. Similarly, in the unlikely event that he hit 20 more homers, the extra RBIs will almost certainly give him the RBI title.

The interdependence between Pujols' average and home run rate is less clear.

Instead of subtracting from 1 the probability of each of Pujols' competitors passing him, shouldn't we multiply the probabilities of each competitor NOT passing him? In the case of HRs, this would be: .571*.961*.979*.992 = .533 or 53.3%.

Yes, that's exactly right. You can't just add the probabilities of the different competitors catching him, because there's a chance that 2 of them would catch him, which also means that there is a greater chance than (1-sum) that no one does. To illustrate, what if 4 different guys each had a 30% chance of catching him (adds to 120%)? Would that mean Pujols has no chance of winning the title?
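The two ways of combining the rivals' chances can be compared side by side. This is a small sketch using the rounded percentages from the article; the function names are illustrative, and the multiplicative version assumes the rivals' outcomes are independent of one another.

```python
from math import prod

def title_prob_additive(rival_probs):
    """The article's method: one minus the sum of the rivals' chances."""
    return 1 - sum(rival_probs)

def title_prob_multiplicative(rival_probs):
    """The commenters' method: the chance that no rival passes the leader,
    assuming the rivals are independent."""
    return prod(1 - p for p in rival_probs)

hr_rivals = [0.429, 0.039, 0.021, 0.008]  # Reynolds, Gonzalez, Dunn, Howard
hr_additive = title_prob_additive(hr_rivals)
hr_multiplicative = title_prob_multiplicative(hr_rivals)
print(f"additive:       {hr_additive:.3f}")
print(f"multiplicative: {hr_multiplicative:.3f}")

# The hypothetical from the comment: four rivals at 30 percent each.
four_rivals = [0.30] * 4
four_additive = title_prob_additive(four_rivals)
four_multiplicative = title_prob_multiplicative(four_rivals)
print(f"four at 30%: additive {four_additive:.2f}, multiplicative {four_multiplicative:.4f}")
```

With four rivals at 30 percent each, the additive method returns a negative number, while the multiplicative method still leaves the leader roughly a 24 percent chance.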

Regarding the interdependency of HR and AVG, I think we should consider them likely to be positively correlated in this case. Assuming a fixed contact rate, which we can do because we are looking at 1 player in isolation, HRs have the highest likelihood of being base hits (100%).

So given a fixed number of PA, and a fixed contact rate, more HR = more AVG. And furthermore, if he truly goes on a tear, it's quite likely that his BB rate will increase as well, thus increasing the marginal contribution of each hit to his end of season batting average.

Multiple people already commented on this, but treating these three crowns as independent events is clearly incorrect - not only are HRs and RBIs positively (and obviously) correlated, but if he goes on a hitting tear enough to win the AVG title, he's probably got the other two by default.

It wouldn't surprise me that the actual % is much closer to the 3.4% you specified for AVG alone.

If you're hitting that torridly, and you assume that you have a constant % of fly balls turning into HRs, you win the HR title. If you're hitting that torridly, and getting that many HRs, you're going to drive in enough people to win the RBI title.

Using an Excel function that I created, which measures the odds of Hanley getting exactly 0 hits * the odds that Pujols will have a higher BA than Hanley if Hanley gets 0 hits ... for all the possibilities, it gave Pujols just a 0.13% chance at the BA crown, without even figuring any other players. With this, I doubt he has better than a 1 in 2000 chance.

Several readers have commented on the interdependence of the three events. There's also an implicit error in the way Eric calculates the probabilities for the individual outcomes. As it stands now, the analysis takes Pujols' PECOTA-implied results as given, then computes the probability of Reynolds matching this number given his HR/PA rate. However, what one really wants to know is how likely Reynolds is to catch Pujols however many home runs Albert hits. This is the JOINT distribution of outcomes of two binomial distributions.
As of this morning (Sept. 1), Pujols had hit 41 HRs in 566 PAs, a rate of 7.24%. Reynolds now has 40 in 540 appearances for a 7.41% rate. Projected over another 200 plate appearances, those rates yield 55.49 expected home runs for Pujols but only 54.81 for Reynolds.
But those are just the expected values. There's clearly a distinct possibility that Mark could hit more and that Albert could hit fewer. For example, there's a 9.84% chance that Reynolds could hit 56 or more home runs, and a 51.7% chance that Pujols could hit 53 or fewer. Summing across all these combinations (i.e., Reynolds hits X & Pujols hits fewer), there's a 41.05% chance of Reynolds passing Pujols (as well as a 7.61% chance they end up tied).
Clearly these results are highly contingent upon the rate at which each player is expected to hit dingers for the rest of the season. Using the 7.0% rate cited in the article for Pujols would raise the probability for Reynolds winning the HR title to just under 45%.
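The joint calculation described in this comment can be sketched by enumerating every pair of outcomes for the two hitters. The code below assumes, as the comment does, 200 more plate appearances for each player, home-run rates fixed at their season-to-date values, and independence between the two players; the function names are illustrative.

```python
from math import comb

def binom_pmf(n, p):
    """List of P(X = k) for k = 0..n, with X ~ Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def race(leader_hr, leader_rate, chaser_hr, chaser_rate, pa=200):
    """Return (P(chaser finishes ahead), P(tie)), treating the two players'
    remaining home runs as independent binomial draws over `pa` PA each."""
    leader = binom_pmf(pa, leader_rate)
    chaser = binom_pmf(pa, chaser_rate)
    p_pass = p_tie = 0.0
    for x, px in enumerate(leader):
        for y, py in enumerate(chaser):
            if chaser_hr + y > leader_hr + x:
                p_pass += px * py
            elif chaser_hr + y == leader_hr + x:
                p_tie += px * py
    return p_pass, p_tie

# Figures from the comment: Pujols 41 HR in 566 PA, Reynolds 40 HR in 540 PA.
p_pass, p_tie = race(41, 41 / 566, 40, 40 / 540)
print(f"Reynolds passes Pujols: {p_pass:.4f}, tie: {p_tie:.4f}")

# Same race with a 7.0 percent rate assumed for Pujols.
p_pass_alt, _ = race(41, 0.070, 40, 40 / 540)
print(f"With a 7.0% rate for Pujols: {p_pass_alt:.4f}")
```

Under these assumptions the results should come out near the roughly 41 percent and 7.6 percent figures reported in the comment.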