Baseball

From a Statistical Perspective

This post represents my first attempt at blogging. As such, the topic is rather obvious and possibly uninteresting. Nevertheless, several reporters suggest that the St. Louis Cardinals must sign the 31-year-old Albert Pujols. Generally, these "analysts" argue that a team simply can't let a player of his caliber leave. In other words, the Cardinals should spend top dollar to retain their star player.

While I think the Cardinals should try to sign Albert Pujols, I'm hesitant to let this contract exceed 5 or 6 years. All baseball players decline as they grow older. Albert Pujols, while starting from a higher level of offensive production, will still enter a decline. To be sure, Albert still (most likely) has several years of quality play remaining. The question is what happens later. This blog post shows the well-known (but often forgotten) negative correlation between age and offensive production.

The hypothesis is that offensive production diminishes as MLB players grow older. To test it, I gathered data (from Baseball Guru) on all MLB players (with more than 50 ABs) between 2005 and 2010. With these data, I use OLS regression (with fixed effects for each player) to estimate the effect of age on offensive production. Because this relationship should be curvilinear, I include age squared in these models.
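For readers who want to see the mechanics, here is a minimal sketch of a player fixed-effects regression with age and age squared. It is not the code behind the figure below. It assumes the model outcome = a_player + b1*age + b2*age^2 + error and estimates b1 and b2 by subtracting each player's own mean from every variable (the "within" transformation) before solving the two-variable least-squares problem. The function names and the sample numbers in the demo are made up for illustration.

```python
from collections import defaultdict


def within_demean(values, groups):
    """Subtract each group's mean from its members (the 'within' transform)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def fit_age_quadratic_fe(player_ids, ages, outcomes):
    """Estimate (b1, b2) in outcome = a_player + b1*age + b2*age^2 + error.

    Demeaning by player removes the player intercepts, so only the
    within-player variation in age identifies the coefficients.
    """
    x1 = within_demean([float(a) for a in ages], player_ids)
    x2 = within_demean([float(a) ** 2 for a in ages], player_ids)
    y = within_demean([float(o) for o in outcomes], player_ids)

    # Normal equations for two regressors and no intercept.
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))

    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("age and age^2 are collinear within players")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


def peak_age(b1, b2):
    """Age at which the fitted curve peaks (meaningful only when b2 < 0)."""
    return -b1 / (2.0 * b2)


if __name__ == "__main__":
    # Hypothetical toy data: two players, three seasons each (not real stats).
    ids = ["A", "A", "A", "B", "B", "B"]
    ages = [27, 28, 29, 31, 32, 33]
    ops = [0.800, 0.810, 0.815, 0.760, 0.750, 0.735]
    b1, b2 = fit_age_quadratic_fe(ids, ages, ops)
    print("b1 =", b1, "b2 =", b2, "peak age =", peak_age(b1, b2))
```

A negative b2 is what produces the curve that rises and then falls. This sketch returns point estimates only. Confidence intervals like the ones in the figure would also need standard errors, which a statistics package normally provides.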

This figure displays the effect of age on both a player's OPS and average number of home runs per at bat. The black line represents the predicted offensive productivity for a given player at each age. The dashed red lines represent ninety-five percent confidence intervals. The histogram behind each plot displays the proportion of individuals in each age category. The regressions using both measures of offensive productivity suggest that a player's numbers decline as they age.

These graphs demonstrate that the effect of age on offensive production is negative and that the magnitude of that effect grows as players get older. What these graphs do not demonstrate is the exact age at which a player declines. Unfortunately, due to the competitiveness of Major League Baseball, bad players do not last. As a result, when we estimate the effect of age on offensive production, the players remaining after age 30 represent the good players from younger generations. Most authors call this the "survivor problem."

What is the effect of the survivor problem on the estimates presented in figure 1? If only good players remain in the sample past age 30 (or even younger), then the estimated production for older players is based on good players. This means that the estimated effect of age is less negative than it should be. In other words, if the bad players from age 20 were allowed to stay in the league, then the observed effect of age would be more negative, because we can reasonably assume that these players would decline faster than the players remaining in the league. (It's worth noting, however, that since Pujols is a "good" player, the estimates for older players may be appropriate.)
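The direction of this bias can be illustrated with a small simulation. This is a sketch under invented assumptions, not an analysis of the real data. Every number in it (the OPS levels, the decline rates, the .650 cutoff below which a player leaves the league) is hypothetical, and it builds in the assumption stated above that weaker players decline faster.

```python
import random


def simulate_survivor_bias(n_players=5000, cutoff=0.650, seed=0):
    """Compare average yearly OPS change (age 30 to 35) for everyone
    versus only those players still above the cutoff at age 35.

    All parameters are made up for illustration.
    """
    rng = random.Random(seed)
    all_changes = []
    survivor_changes = []
    for _ in range(n_players):
        talent = rng.gauss(0.0, 1.0)
        ops_30 = 0.750 + 0.060 * talent
        # Assumption: weaker players lose production faster each year.
        yearly_decline = max(0.0, 0.015 - 0.005 * talent + rng.gauss(0.0, 0.003))
        ops_35 = ops_30 - 5 * yearly_decline
        change = -yearly_decline  # average OPS change per year
        all_changes.append(change)
        if ops_35 >= cutoff:  # only these players would appear in the data
            survivor_changes.append(change)
    mean_all = sum(all_changes) / len(all_changes)
    mean_survivors = sum(survivor_changes) / len(survivor_changes)
    return mean_all, mean_survivors, len(survivor_changes)


if __name__ == "__main__":
    mean_all, mean_survivors, n_left = simulate_survivor_bias()
    print("yearly change, all players:", mean_all)
    print("yearly change, survivors only:", mean_survivors)
    print("players remaining at 35:", n_left)
```

Under these assumptions the survivors' average yearly change is less negative than the average for all players, which is the understatement described above.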

The Cardinals should not award Albert Pujols a decade-long contract that requires a high salary in years 6 through 10. Reports suggest that Pujols desired a 10-year, 300-million-dollar contract last season. Given his current age (31), the Cardinals probably should not sign that contract. To be fair, it appears unlikely that the Cardinals' current management would offer such a contract. But that does not stop some fans from asking.