Schrodinger's Bat

The Biggest Booms and Busts?

"The present is never our goal: the past and present are our means: the future alone is our goal. Thus, we never live but we hope to live; and always hoping to be happy, it is inevitable that we will never be so."--Blaise Pascal

Well, it's that time of year again. The season that was is past, and plans are being laid for the season to come. From a performance analyst's perspective this can only mean that a part of our attention turns to player forecasts, as Joe Sheehan's recap of "ShandlerFest" reminded me. I also received The Bill James Handbook this week, which has a "Hitter Projections" chapter written by James himself. In that chapter, we find a list of projections that went very well for Baseball Info Solutions (Brad Ausmus, Rob Mackowiak, Jeff Conine, Pedro Feliz, Carlos Beltran, and Adrian Beltre) as well as those that didn't (Carlos Quentin, Carlos Pena, Prince Fielder, B.J. Upton, Marcus Giles, and Richie Sexson). In thinking about cases like Quentin (a bust for the Diamondbacks) and Pena (a boom for the Devil Rays) I started to wonder about the biggest booms and busts of all time in terms of a reasonable projection of performance. Which teams were most let down by a player's performance, and which were most surprised, assuming of course that they had reasonable expectations in mind? (I grant you, that's a huge and demonstrably false assumption that bursts the entire bubble, but play along).

Projections on the Cheap

We'll start by examining almost 17,000 batter seasons from 1903 through 2006 in an effort to sort out the booms and busts. The first task here is to create some "projections." I say "projections" because the projections we're creating are for seasons that have already occurred, so there really is no future aspect to this. We can estimate a performance level for a given season and league based on a player's performance prior to the season in question. Since we're not trying to replicate something as complex as PECOTA or even the BIS projections, we'll base ours on normalized park-adjusted OPS (NOPS/PF). I've used this measure in the past to boil offensive production down into a single number on a rate basis, where 100 is a perfectly league average OPS when adjusted for the home park of the player. Although very simple to calculate, OPS is a great proxy for run production and matches up very well with more complex run estimators, including BaseRuns and Runs Created.

So the methodology for creating the projections goes like this:

1. Select all players with 300 or more plate appearances for a major league team from 1903 through 2006, excluding the Federal League.

2. Calculate the NOPS/PF, plate appearances, and age for the set of players from step one. This is the actual performance against which we'll measure our projection.

3. Weight the previous three years' NOPS/PF using a system where the previous year is weighted at seven, year minus two is weighted at four, and year minus three gets a two.

4. Take the value from step three and regress it to the mean by calculating a weighted average of the plate appearances for the previous three years and combining it with enough league-average plate appearances to total 2,000. In other words, if a player had accumulated 2,000 weighted plate appearances in the previous three years, his calculated NOPS/PF from step three would not be regressed at all. However, if he had, for example, a weighted average number of plate appearances of 312 (as Tadahito Iguchi did coming into the 2006 season) in his previous three seasons, his NOPS/PF would be regressed to the mean with 1,064 additional league average plate appearances, since (312*3)+1064 = 2000.

5. Given the NOPS/PF calculated in step four, apply an aging adjustment. This curve is similar to the kinds of aging curves that Nate Silver has talked about in the past, although I adjusted it somewhat to be less severe, since I'm pre-selecting players with 300 or more plate appearances, and in this population declines are not so precipitous.

6. Finally, apply a league adjustment using the Level Indexes that I discussed in a previous column. This allows us to account both for players who switch leagues and for the natural decline that happens as the league improves over time. In the former adjustment, for example, both Frank Howard in 1965 and Frank Robinson in 1966 have their projections increased slightly by moving from the National to the American League.
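For readers who want to tinker, steps three and four can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the column: the function names are made up, the plate-appearance average is assumed to use the same 7/4/2 weights as the rate, a season with no plate appearances is assumed to drop out of the rate average, and the aging and league adjustments of steps five and six are left out because their values aren't given here.

```python
WEIGHTS = (7, 4, 2)      # year-1, year-2, year-3 weights from step three
TARGET_PA = 2000         # total plate appearances after regression (step four)
LEAGUE_AVERAGE = 100.0   # NOPS/PF of a league-average hitter


def weighted_rate(nops, pa):
    """Weighted NOPS/PF over the previous three seasons, most recent first.

    Assumption: a season with no plate appearances is skipped entirely.
    """
    pairs = [(w, n) for w, n, p in zip(WEIGHTS, nops, pa) if p > 0]
    if not pairs:
        return LEAGUE_AVERAGE
    return sum(w * n for w, n in pairs) / sum(w for w, _ in pairs)


def weighted_average_pa(pa):
    """Weighted average plate appearances (assumed to reuse the 7/4/2 weights)."""
    return sum(w * p for w, p in zip(WEIGHTS, pa)) / sum(WEIGHTS)


def league_pa_needed(avg_pa):
    """League-average plate appearances added so the total reaches TARGET_PA."""
    return max(TARGET_PA - 3 * avg_pa, 0)


def regress_to_mean(rate, avg_pa):
    """Blend the player's weighted rate with league-average plate appearances."""
    league_pa = league_pa_needed(avg_pa)
    player_pa = TARGET_PA - league_pa
    return (rate * player_pa + LEAGUE_AVERAGE * league_pa) / TARGET_PA


def project(nops, pa):
    """Steps three and four only; no aging or league adjustment is applied."""
    return regress_to_mean(weighted_rate(nops, pa), weighted_average_pa(pa))
```

With a weighted average of 312 plate appearances, `league_pa_needed(312)` comes to 1,064, matching the Iguchi example, and a hitter with almost no prior playing time lands very close to 100.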

We end up with 16,900 projected NOPS/PF values to set against the actual NOPS/PF values, and the two correlate at r=0.64. Compared with how other projection systems perform, that doesn't seem bad at all. In fairness, though, our correlation should be pretty good, since we're using a backwards-looking approach and have selected only players who accumulated 300 or more plate appearances in the season we're trying to project. In short, we have a selection bias in play: players who suffered injuries that wiped out their seasons, who retired, or who simply performed so poorly that they didn't garner the minimum playing time of a regular are all excluded from the projections. In 2007 terms, think Nick Johnson.

On the other hand, we haven't set minimums on the number of plate appearances for previous seasons; even if a player has one plate appearance in the previous three seasons, we'll create a projection, as we did for Hanley Ramirez in 2006 (who not surprisingly came in with a projection of 100). When we remove all players who had fewer than 100 projected plate appearances, the correlation coefficient goes up to 0.66.

To get a feel for the actual shape of the distribution, the following chart shows the histogram of the differences between the actual and projected NOPS/PF values, along with the cumulative frequency (the orange line).

From this you can see that almost equal numbers of projections fall on either side of zero, with the average difference being -0.25 points of OPS. Altogether, 6,630--or 39 percent--of the projections fall within five points.
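The summary numbers quoted so far (the correlation coefficient, the average difference, and the share of projections within five points) are simple to compute once the projected and actual values sit side by side. The sketch below shows one way to do it; the five seasons in it are made-up values for illustration only, the function names are hypothetical, and treating "within five points" as inclusive of exactly five is an assumption.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def summarize(projected, actual, band=5):
    """Correlation, mean (actual - projected), and share of misses within band."""
    diffs = [a - p for p, a in zip(projected, actual)]
    within = sum(1 for d in diffs if abs(d) <= band)  # inclusive: an assumption
    return {
        "r": pearson_r(projected, actual),
        "mean_diff": sum(diffs) / len(diffs),
        "share_within_band": within / len(diffs),
    }


# Made-up NOPS/PF values, not taken from the study
projected = [95, 108, 100, 119, 103]
actual = [100, 104, 97, 125, 90]
stats = summarize(projected, actual)
```

Run over the full set of 16,900 seasons, the same three calculations would yield the kind of figures reported above.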

As an aside, one of the paragraphs that caught my eye in The Handbook chapter on projections was this one relating to the nature of projections:

We project, basically, that every player will continue to do in the future whatever he has done in the past. If a player has hit .250 in the past, we project that he will continue to hit .250. If he has hit .350 in the past, we project that he will continue to hit .350. If he hit .350 in 2006 and .250 in 2007, we project that he will hit .300 in 2008. We're pretty close to right most of the time, because most players in any season will continue to do about what they have done in the past.

I wholeheartedly agree with the last sentence; after all, past performance is what we use to project the future. But still, I may be reading the previous sentences too literally. I found that when doing these projections it was essential to employ regression to the mean and specifically not assume that a .350 hitter one year (or a player with an NOPS/PF of say 135) was necessarily going to be a .350 hitter the next. As has been shown time and time again, it turns out that when a player hits .350--which is near the far right end of the distribution for a major leaguer--it is much more likely that he'll hit something less than that the next season, simply because staying out on that tail is extremely difficult and more likely achieved with a little push from lady luck. In any case, of steps four through six discussed above, the regression to the mean in step four had the greatest impact on improving the correlation coefficient, moving it from 0.53 to very nearly its final value.

As you might expect, there were a few players that the system nailed both in terms of projected NOPS/PF and in plate appearances, some of whom are shown in the following table:

There were many more players (over 600) for whom the system projected the NOPS/PF exactly but missed on plate appearances, and around 300 for whom it had the correct number of plate appearances. However, the players in the previous table aren't the interesting ones, nor are they the focus of this column: let's focus on the booms and busts.

A Question of Measures

Before we actually come up with the top ten booms and busts, we need to determine what we should use to rank them. At first glance it would seem the simplest thing to do would be to take the difference between the actual NOPS/PF and the projected NOPS/PF. If we do that, we'll end up with some extreme differences. However, that won't necessarily give us the biggest differences in terms of the expected performance relative to our projection. So instead, we'll measure the percentage by which the projection missed, and go from there. Without further ado, here are the top ten booms and busts:

The biggest boom of the past 103 years turns out to be Javy Lopez in 2003. The reason this ranks at the top is that Lopez was entering his age-32 season, and having been a regular for nine seasons had never eclipsed a .317 average (1999), 34 home runs (1998), 28 doubles (1997), or 106 RBI (1998). As a result, his projection came in at an NOPS/PF of 95, actually below league average, and heavily weighted on his 2002 campaign, where he hit .233/.299/.372 in almost 400 plate appearances; in 2003 he wound up at 144, a 51 percent trouncing of the projection. In real terms his projected NOPS/PF of 95 would have translated into 51 runs contributed, but his actual performance was more like 88 runs, a difference of 37.
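To see why the percentage measure can rank seasons differently than the raw gap, here is a small sketch using Lopez's figures from above and Barry Bonds' 2001 figures (projected 139, actual 191) quoted later in this column. The function name is made up for illustration, and because the inputs are the rounded values printed here, the results can differ slightly from the percentages in the text.

```python
def percent_miss(actual, projected):
    """Percentage by which the actual NOPS/PF differed from the projection."""
    return (actual - projected) / projected * 100


# (actual, projected) NOPS/PF, rounded values as quoted in the column
seasons = {
    "Javy Lopez 2003": (144, 95),
    "Barry Bonds 2001": (191, 139),
}

# Rank by raw difference in NOPS/PF points
by_raw_gap = sorted(
    seasons, key=lambda name: seasons[name][0] - seasons[name][1], reverse=True
)

# Rank by percentage miss relative to the projection
by_percent = sorted(
    seasons, key=lambda name: percent_miss(*seasons[name]), reverse=True
)
```

Bonds has the larger raw gap (52 points to Lopez's 49), but Lopez tops the list by percentage, because his miss is measured against a much lower projection.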

A few notes on the rest of the booms:

Norm Cash, in just his second full season, put up a 157 when projected to be around 108 based on a solid 1960 rookie season. The fact that he would never come close to matching that 1961 performance makes it fitting that this age-26 season feat would be seen as an outlier even at the outset of his career.

At the age of 21, Shoeless Joe Jackson hit .408 after not collecting more than 75 at-bats in any of his previous three seasons. As a result, his projection of 107 was heavily regressed to the mean, despite a .387/.446/.587 line in 1910 in 20 games played for the Indians. In real terms his NOPS/PF of 153 was good for 124 runs, when his projection called for a mere seven. To a lesser extent, the same thing can be said about the "Hit Man," Mike Easler, who in his age-29 season hit .338/.396/.583 in just over 400 plate appearances; he had never been given a real shot in six previous seasons spent with the Astros and Pirates.

In the only year of his career where he hit more than nine home runs, Lonnie Smith smacked 21 in 1989 and put up a .315/.415/.533 performance for the Braves. Although a fine player for the Phillies and Cardinals earlier in the decade, he became involved in the scandal of that era which diminished his performance, eventually leading to his release by the Royals in 1987. Teetering on the brink he was offered a minor league contract during spring training in 1988, and played well enough at Richmond to get some at-bats that season, and a look in 1989. His below league-average projection is largely based on those 1987 and 1988 seasons. The rest, as they say, is history, including his Comeback Player of the Year Award.

Mark McGwire came back from two injury-plagued seasons in 1995 at the age of 31 to beat the projection by 42 percent.

In tenth place, Ted Williams' 1954 season is an example of where regression to the mean doesn't serve us particularly well. He had the highest projection of any in the top ten (111), but that was based primarily on his 1951 season (not a great one by his standards) and then heavily regressed to the mean, as he had only 120 plate appearances combined in 1952 and 1953 due to the Korean War. Even coming into his age-35 season, Williams would have been a good bet to put up a number more like 130 or 140.

Other seasons of note that didn't make the top ten include Tito Francona in 1959, who hit .363 (an NOPS/PF of 142) but was projected to be league average (102), good enough to rank 11th overall; Barry Bonds' 2001 campaign, projected at a healthy 139 but actually reaching a ridiculous 191 (ranked 15th); Rico Carty in 1964, where reality beat the projection 137 to 100 (17th); and Tony Clark's 2005 season with the Diamondbacks (ranked 26th), where he put up a 132 in 393 plate appearances but was projected at 97 in 284.

On the busts side, George "Boomer" Scott burst onto the scene in 1966 as a 22-year-old rookie, playing in all 162 games and hitting 27 home runs. He bested that performance during the "impossible dream" season of 1967 (wonderfully told, I might add, by our own Jay Jaffe in It Ain't Over) by hitting .303/.373/.465. But the year of the pitcher would prove to be more than a little rough for Boomer, as he put up a .171/.236/.237 line in 387 plate appearances, good for an NOPS/PF of just 73 against a projection of 119. That's 38.5 percent off the mark (or -38.5, as represented); in real terms, a difference of more than 30 runs in production. Boomer would rebound somewhat in 1969 through 1971, but he didn't return to his 1967 level until he was with the Brewers in 1973.

Other notes from busts:

One wouldn't think that light-hitting Doug Flynn would have much room to disappoint, but that he did in 1977. After putting up league-average numbers in limited playing time with Cincinnati in 1975 and 1976, the projection was for a league-average season in 1977, in limited playing time. After just a handful of plate appearances he was dealt in June of 1977 to the Mets as a part of the Tom Seaver deal, and he hit just .191 for them en route to a .197/.223/.232 season. His actual NOPS/PF of 63 was -37 percent off of the projected mark.

Homer Bush was a one-hit wonder with the Blue Jays in 1999 when at the age of 26 he hit .320/.353/.421. That, combined with his age and good marks in limited time in 1997 and 1998, translated to a projection of 103 for 2000. He couldn't recapture the magic, however, and wound up at 65, which is fourth from the bottom on our list.

Sort of like Ted Williams in reverse, Mario Mendoza's expectations should never have been that high in 1979, but limited playing time regressed to the mean will do that to a guy.

Pat Listach won the Rookie of the Year Award in 1992 at the age of 24, but he struggled the next two seasons in playing time limited by injuries. In 1995, the projection was to jump back to league average at 102, but he struggled mightily and finished at .219/.276/.254 and an NOPS/PF of 66.

If you're wondering where Super Joe Charboneau is on the list, he's not. After batting more than 500 times in 1980, he didn't get back to 300 plate appearances in either 1981 or 1982.

Although not in the bottom ten, Jimmy Wynn's 1971 season with the Astros is similar to Boomer's 1968. Although older at 29, he was coming off three very solid seasons, including a phenomenal 1969 where he walked 148 times and hit 33 home runs. His projection for 1971 was 133, but he ended up at 89, a real difference of 49 runs, and "good" for 12th from the bottom at -33 percent.

Most recently, Jason Giambi's injury-riddled 2004 season ranks 32nd, where a projected value of 135 turned into a 95, a real difference of an astounding 82 runs.

While measuring the difference from the projection in terms of percentage is certainly adequate, there is also another equally valid way to look at this. Given that we know the actual number of plate appearances and have calculated a projected number of plate appearances, we can--as hinted at above--use these facts to calculate the number of runs contributed given the NOPS/PF values for both reality and the projection. So, the following top and bottom ten are calculated on this basis:

The top of this table is dominated by players who had excellent rookie seasons after garnering scattered playing time in one or more previous seasons. The bottom of the list primarily includes established stars who were the victims of injuries, such as Williams fracturing his elbow in the 1950 All-Star Game, or the Bambino's famous intestinal problems in 1925, or the Babe's other more self-imposed problems in drawing three suspensions in 1922. As you can see, at the maximum a team might lose 60 to 70 runs, which can be translated to six or seven wins when a superstar doesn't perform as expected. On the other end of the spectrum, players who contributed much more than expected were typically counted on to produce prior to the season and so these gains are not wholly unexpected.

Moving Forward

One of the great things about baseball is its variation from season to season within a larger stable structure. Part of that variation is our fascination with projections and what they ultimately translate to in terms of wins and losses for our favorite teams. On that note, I'll leave you with one other quote from Blaise Pascal to ponder through this season of projections:

The reason people find it so hard to be happy is that they always see the past better than it was, the present worse than it is, and the future less resolved than it will be.