Pythagorean Theorem of Baseball

From BR Bullpen

The Pythagorean Theorem of Baseball is a formula created by Bill James that relates the number of runs a team has scored and allowed to its winning percentage. It is based on the idea that a team's runs scored compared to its runs allowed is a better indicator of its (future) performance than its actual winning percentage. The winning percentage the formula produces is referred to as Pythagorean Winning Percentage.
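In its original form, the formula squares both run totals. The sum of squares in the denominator recalls the a² + b² of the Pythagorean theorem, which is where the name comes from. With RS for runs scored and RA for runs allowed:

$$
\text{Pythagorean W\%} = \frac{RS^2}{RS^2 + RA^2}
$$

For example, a hypothetical team that scores 800 runs and allows 700 has an expected winning percentage of 640,000 / 1,130,000 ≈ .566, or about 92 wins over a 162-game season.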

The rationale behind Pythagorean Winning Percentage is that, while winning as many games as possible is still the ultimate goal of a baseball team, a team's run differential (once a sufficient number of games have been played) gives a better idea of how well the team is actually playing. Therefore, barring personnel changes (injuries, trades), a team's actual W-L record will tend to approach its Pythagorean expected W-L record over time, not the other way around. The average difference between the actual and the expected W-L is a bit more than 3 games at the end of a season, although there are notable recent exceptions: the 2005 and 2007 Arizona Diamondbacks both beat their expected W-L by 11 games, as did the 2012 Baltimore Orioles. Deviations from expected W-L are often attributed to the quality of a team's bullpen or, more dubiously, to "clutch play"; many advocates of sabermetrics believe the deviations are the result of luck and random chance.
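The comparison between actual and expected W-L can be sketched in Python as below. This is only an illustration: the function names and the 88-74 team with 750 runs scored and 700 allowed are hypothetical, and the sketch uses Bill James's original exponent of 2.

```python
def pythagorean_wpct(runs_scored, runs_allowed):
    """Bill James's original Pythagorean winning percentage (exponent 2)."""
    rs2 = runs_scored ** 2
    ra2 = runs_allowed ** 2
    return rs2 / (rs2 + ra2)


def expected_record(runs_scored, runs_allowed, games):
    """Expected (wins, losses) over the given number of games played."""
    wins = pythagorean_wpct(runs_scored, runs_allowed) * games
    return wins, games - wins


def wins_above_expected(actual_wins, runs_scored, runs_allowed, games):
    """Actual wins minus expected wins; positive means the team beat its expected W-L."""
    expected_wins, _ = expected_record(runs_scored, runs_allowed, games)
    return actual_wins - expected_wins


if __name__ == "__main__":
    # Hypothetical team: 88-74 with 750 runs scored and 700 allowed.
    exp_w, exp_l = expected_record(750, 700, 162)
    print(f"expected W-L: {exp_w:.1f}-{exp_l:.1f}")  # expected W-L: 86.6-75.4
    diff = wins_above_expected(88, 750, 700, 162)
    print(f"beat expected W-L by {diff:.1f} games")  # beat expected W-L by 1.4 games
```

A positive difference like this one is the kind of deviation that is credited to a strong bullpen, to "clutch play", or to luck.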

Because of this, expected W-L is a good predictor of performance in mid-season. If a team has a 40-25 record but a Pythagorean winning percentage at or below .500, it should not be surprising when its record declines as it starts losing close games. In fact, expected W-L correctly predicted the fate of the 2005 Washington Nationals. On July 5 of that season, Washington was 19 games over .500 and 4 1/2 games ahead of the second-place team, but had a Pythagorean W% of exactly .500. They went 30-49 the rest of the season to finish at .500 (four games ahead of their final expected W-L).
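One simple way to turn this into a mid-season projection is to play out the remaining schedule at the team's Pythagorean W% and compare that with extending its current pace. This method is an assumption for illustration, not one described in the source. The function names and the run totals are hypothetical, and a 162-game season is assumed.

```python
def pythagorean_wpct(runs_scored, runs_allowed):
    """Bill James's original Pythagorean winning percentage (exponent 2)."""
    rs2 = runs_scored ** 2
    ra2 = runs_allowed ** 2
    return rs2 / (rs2 + ra2)


def project_final_record(wins, losses, runs_scored, runs_allowed, season_games=162):
    """Project a final W-L, assuming the remaining games are played at the Pythagorean W%."""
    remaining = season_games - wins - losses
    if remaining < 0:
        raise ValueError("more games played than the season length")
    future_wins = pythagorean_wpct(runs_scored, runs_allowed) * remaining
    return wins + future_wins, losses + remaining - future_wins


def pace_projection(wins, losses, season_games=162):
    """Naive projection: the current actual W% continued over the full season."""
    return wins / (wins + losses) * season_games


if __name__ == "__main__":
    # Hypothetical 40-25 team that has scored exactly as many runs as it allowed,
    # so its Pythagorean W% is .500.
    print(f"pace: {pace_projection(40, 25):.1f} wins")  # pace: 99.7 wins
    w, l = project_final_record(40, 25, runs_scored=300, runs_allowed=300)
    print(f"Pythagorean projection: {w:.1f}-{l:.1f}")  # Pythagorean projection: 88.5-73.5
```

The gap of roughly 11 wins between the two projections is the drop the paragraph above warns about.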

Nevertheless, since advocates of the theorem attribute a team's exceeding its predicted number of wins to nothing more than random chance, it is questionable whether the theorem says anything meaningful about an individual team during a given season, as opposed to describing the general relationship between scoring runs, preventing runs and winning baseball games. In the Nationals example above, for instance, a different season might have seen the Nationals improve their run differential in the second half, which would have made their first-half run differential the outlier.

While the strength of Bill James's formula is its simplicity, other sabermetricians have tried to refine it, for example by adjusting the exponent, which in the original formula is simply 2 (both run totals are squared). Others have proposed different formulas altogether. One such alternative was developed by mathematician Stanley Rothman and reads as follows:

EXP(W%) = m * (RS - RA) + b

Here RS and RA are a team's runs scored and runs allowed, and the formula includes two constants. The first, b, is 0.50, because the aggregate winning percentage of all teams in any given season is .500. The second, m (the "slope"), is calculated from the results of all games played from 1998 to 2013, i.e. from the season Major League Baseball reached its current number of 30 teams through the final season for which statistics were available when Prof. Rothman devised his formula; he arrived at a value of 0.000683. The reasoning behind this number is that most games are not decided by exactly one run, so not every extra run scored or allowed has an equal impact on winning percentage.
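The sketch below compares Rothman's linear estimate with the Pythagorean formula for one hypothetical team. It assumes RS and RA are full-season totals, since a slope as small as m only makes sense at that scale. It also shows the exponent-tweaking approach. Here 1.83 is used purely as an illustration of an often-cited exponent below 2. It is not part of Rothman's work. The function names are hypothetical.

```python
def pythagorean_wpct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean W%; exponent=2 is Bill James's original, other values are refinements."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def rothman_wpct(runs_scored, runs_allowed, m=0.000683, b=0.50):
    """Rothman's linear estimate m * (RS - RA) + b, with RS and RA as season totals."""
    return m * (runs_scored - runs_allowed) + b


if __name__ == "__main__":
    rs, ra, games = 800, 700, 162  # hypothetical full-season totals
    estimates = {
        "Pythagorean, exponent 2": pythagorean_wpct(rs, ra),
        "Pythagorean, exponent 1.83": pythagorean_wpct(rs, ra, exponent=1.83),
        "Rothman linear": rothman_wpct(rs, ra),
    }
    for name, wpct in estimates.items():
        print(f"{name}: W% {wpct:.3f}, {wpct * games:.1f} expected wins")
```

For a +100 run differential, the linear formula gives .5683, about 92 wins. Because the linear formula is unbounded, it would return values below 0 or above 1 for extreme run differentials, which the Pythagorean form cannot.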

While this formula is more precise than the original, its application is limited to a specific era of baseball: the slope (m) would need to be recalculated if the scoring environment changed (for example, if baseball were to enter a third "deadball era").
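The source does not say exactly how the slope was calculated. One plausible way to recalculate m for a new scoring environment is an ordinary least-squares fit with the intercept held at .500, and this sketch assumes that approach. The function name `fit_slope` is hypothetical. The data is synthetic, generated to lie exactly on a line with slope 0.0007, so the fit should recover that value.

```python
def fit_slope(run_diffs, win_pcts, b=0.50):
    """Least-squares slope m for W% = m * (RS - RA) + b with the intercept b held fixed.

    Minimizing sum((w - b - m * d) ** 2) over m gives m = sum(d * (w - b)) / sum(d * d).
    """
    if len(run_diffs) != len(win_pcts):
        raise ValueError("run_diffs and win_pcts must have the same length")
    numerator = sum(d * (w - b) for d, w in zip(run_diffs, win_pcts))
    denominator = sum(d * d for d in run_diffs)
    if denominator == 0:
        raise ValueError("need at least one nonzero run differential")
    return numerator / denominator


if __name__ == "__main__":
    # Synthetic season totals (not real data), placed exactly on a line with slope 0.0007.
    run_diffs = [-150, -50, 0, 60, 120]
    win_pcts = [0.50 + 0.0007 * d for d in run_diffs]
    print(f"fitted slope m = {fit_slope(run_diffs, win_pcts):.6f}")  # fitted slope m = 0.000700
```

In practice, each data point would be one team-season from the era being modeled, and the fitted m would replace 0.000683.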