Friday, February 28, 2003

Peak Projections

Prospects and long-term potential.

The purpose of the Peak Projection system is to project what a prospect will achieve in his prime - his best three consecutive seasons - given his production, age, level of competition, and the standard baseball player’s development curve.

In the 80’s, Bill James derived his Major League Equivalency formula with the purpose of translating the Double-A and Triple-A performances of Minor League players to the Major League level. These translations proved useful to show if an underappreciated Triple-A veteran would make for a quality Major League player.

However, these equivalencies have little stand-alone value when analyzing younger, less developed prospects. The Peak Projection system adds the age adjustment necessary to make the statistics a valuable tool in prospect analysis.

The system takes the Major League equivalency concept and builds on it by projecting what the player is on target to achieve in his prime if he develops like the average Major League baseball player.

It is true that not every player develops the same way. Magglio Ordonez is one of the best hitters in the game today yet put up unimpressive Minor League numbers. Similarly, Sammy Sosa has taken his game to the next level by evolving from an undisciplined hacker to one of the more patient sluggers in the game.

However, for the majority of players, the standard baseball player’s development curve is a fair representation of their growth.

To test the Peak Projection system and the assumptions it makes, I took Spring Training Magazine’s Top 100 prospect list and calculated the projections for all the hitters who accumulated at least 60 plate appearances in Low A or higher prior to 1994. It is important to note that while I normally adjust the minor league statistics for offensive context and league factors, I do not have that data for early-90’s statistics. Therefore, these are presumably less accurate than the current projections.

The top prospect list was pared down to 39 batters who fit the criterion - ranging in quality from Carlos Delgado to Howard Battle. The average player was 21 years old. Of these 39 batters, Arquimedez Pozo, Brooks Kieschnick, Howard Battle, D.J. Boston, Chad Mottola, Michael Moore and Steve Gibralter never totaled 150 Major League at bats over a three-year period (rendering their Major League statistics nearly meaningless) - lowering our sample to 32 batters. Of those seven, Pozo (.889 OPS) and Kieschnick (.854 OPS) are the only players to have projections north of D.J. Boston’s .753 OPS. It is safe to say that these two would not have met their projections. For reference’s sake, Pozo was born in the Dominican Republic, and the current stringent requirements for proving one’s age upon entering the U.S. did not exist at the time.

Next, the actual Major League peak performances were calculated for the prospects who accumulated enough Major League playing time. Each player’s best three consecutive years were averaged. Since the statistics were not park adjusted, seasons in Coors Field were ignored. This came into play only for Todd Hollandsworth and Jeffrey Hammonds.
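The peak calculation described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name and the sample seasons are hypothetical, the three years are combined with a simple unweighted mean of OPS (the article does not say whether the average was weighted by playing time), and "consecutive" is taken to mean consecutive calendar years.

```python
def best_three_year_peak(seasons):
    """Return (start_year, mean_ops) for the best 3 consecutive seasons.

    `seasons` is a list of (year, ops) pairs. Seasons to be ignored
    (for example, Coors Field years) should be left out by the caller.
    Returns None if no three consecutive calendar years are present.
    """
    by_year = dict(seasons)
    best = None
    for year in sorted(by_year):
        window = [year, year + 1, year + 2]
        if all(y in by_year for y in window):
            # Assumption: simple unweighted mean of the three OPS figures.
            mean_ops = sum(by_year[y] for y in window) / 3
            if best is None or mean_ops > best[1]:
                best = (year, mean_ops)
    return best


# Hypothetical career line, not any real player's statistics.
career = [(1996, 0.700), (1997, 0.800), (1998, 0.900),
          (1999, 0.850), (2000, 0.750)]
peak = best_three_year_peak(career)  # best window starts in 1997, about .850
```

A playing-time-weighted average would be a reasonable alternative; the unweighted mean is used here only because it is the simplest reading of "averaged."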

In comparison, Baseball Prospectus’s annual projections scored a correlation of .704 for OPS in 2001, according to Voros McCracken. Now, as with any projection, the bigger the sample the projection is based on, the more accurate the projection. Each projection is accompanied by a theoretical number of plate appearances. Each stat line is weighted based on whether it was produced in the lower or upper minors (Single-A or Double-A and Triple-A) and the year it was achieved. A performance in Double-A in 1993 is a more reliable indicator of ability than an equal performance in Single-A in 1990. If these stat lines were created by the same player, the former would get more weight in the projection than the latter.

Therefore, the larger the number of theoretical plate appearances, the more likely the player is to achieve his projection. Eighteen players had 600 plate appearances - a “full season” of these theoretical PA’s - or more in 1994.

For an abbreviated look:

        BA    OBP   SLG   OPS
Corr.   0.76  0.84  0.86  0.86
R^2     0.58  0.71  0.74  0.75
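For readers who want to run the same kind of check on their own projections, here is a minimal sketch of the correlation and R^2 calculation. The function name and the sample numbers are hypothetical; the figures in the table come from the article's own data, which is not reproduced here.

```python
import math


def pearson_r(projected, actual):
    """Pearson correlation between projected and actual values."""
    n = len(projected)
    mean_p = sum(projected) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(projected, actual))
    var_p = sum((p - mean_p) ** 2 for p in projected)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov / math.sqrt(var_p * var_a)


# Hypothetical projected vs. actual peak OPS for five players.
projected_ops = [0.950, 0.820, 0.780, 0.740, 0.690]
actual_ops = [0.990, 0.800, 0.815, 0.700, 0.705]

r = pearson_r(projected_ops, actual_ops)
r_squared = r ** 2  # share of variance in actual OPS explained by the projection
```

R^2 is simply the square of the correlation, which is why the two rows of the table track each other (0.76 squared is about 0.58, for example).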

In a less scientific test, Shawn Green was the only above-average-to-great player in the group whose projection (.276 BA/ .329 OBP/ .364 SLG) missed the boat completely. 19-year-olds Derek Jeter and Johnny Damon, along with an unheralded 20-year-old named Edgardo Alfonzo, hit for considerably more power in reality than projected. However, Jeter and Alfonzo, much like today’s Joe Mauer, still projected to be above-average Major Leaguers even without the power surges. Damon projected to be a typical light-hitting leadoff hitter. Also of note, after Alfonzo’s next Minor League season, his projected SLG rose considerably.

The tests had limited samples of 32 and 18 batters, which does introduce a margin of error. As I continue to improve the system, I intend to test it on bigger samples. It is also important to note that these players’ statistics were not a part of the sample used to derive the Peak Projection formulas.

Reader Comments and Retorts

Ah, I should have noted that all of these projections were based solely on Minor League data... So Pujols' peak projection was based on a few hundred plate appearances in Low A-ball a couple years back.

Whoops. I forgot to note that these projections were based solely on Minor League data, as they were meant for Minor Leaguers. These are the Peak Projections without Major League statistics for players with Major League experience - Alfonso Soriano, Albert Pujols, Adam Dunn, etc.

I may eventually do all hitters under 25 or so in the majors and minors so you can compare Josh Phelps to Victor Martinez, etc.

Not saying you are wrong about Soriano, and that might be a good idea... however, it is important to remember that Peak Projections are the player's best three consecutive seasons, not their lone BEST season.

1) Not too important, but in the 1994 projections I believe the actual peak PA column is actual AB. For instance, Ramirez 1999-2001 peak had 1490 AB, not PA. And you defined peak by OPS? Or RAR, or what?
2) The OPS correlation of .65 for the 32 players seems leveraged on the two extremes on the high OPS side (Ramirez, Delgado) and the low OPS side (Goodwin, Alexander). The OPS correlation for the remaining 28 players is just below 0.36, which is just over half as much as with those extreme players.
3) The quoted age doesn't seem to be the "July 1" age. For instance, both Manny Ramirez and Shawn Green are listed as 21 years old, but Ramirez was born in the first half of 1972 and Green in the second half of 1972. Were you consistent with the age system? How did you define "age"?
4) You said that the peak projection system formulas were not derived from these 32 players' statistics. What was the data set you used in creating your projections? How good are the projections for those players?
5) In your response to a comment regarding Soriano you stressed that these were projections of their 3-yr peak. However, there were many 21 and under (and some 18 year olds) in the 1994 peak projections who might now be, what, 28-30 years old? Wouldn't many of the 32 players not have completed their 3-yr peak?
6) What aging patterns did you use? How did you calculate *PA?

I am not crazy (to say the least) about "peak" season or "peak" multiple season statistics. What exactly does a "peak" season or multiple season OPS, for example, tell you? By definition, a player's peak OPS season is his luckiest season. Obviously, it corresponds to his overall average OPS. The higher a player's average OPS, the higher his luckiest season will be. In fact, there might be a one-to-one correspondence, but I'm not sure. If there isn't, then it doesn't tell you a whole lot about that player (other than what his luckiest season looked like).

More importantly, there seems to be a belief that a player's peak season or seasons tells you a lot about his peak ability - namely when that peak ability occurs. I don't think this is true. In fact, I know it isn't.

I ran some theoretical calculations on the computer to determine the following:

1) If a hypothetical player had a normal aging pattern, with a peak OPS ability at around 26-28, how often would his sample OPS peak at the various ages, by chance alone?

2) How much higher would the average sample peak OPS be than his "real" peak OPS, again by chance alone (a player's best year will always be much better than his ability, by definition)?

I used my own aging pattern chart, which is similar to Tango Tiger's. Here are the average RELATIVE OPS's for an average player at each age, from 22 to 34:

Remember that the above chart was generated by simulating the results of a hypothetical player over 13 seasons, with 500 PA per season. The hypothetical player had "true" stats at each age reflective of the first chart. I ran 1000 such player careers. As you can see from the above chart, only 42.5% of the time did the player's sample peak season occur at ages 26-28, even though we know that his peak ability occurred during that time. In fact, almost 19% of the time, the player's peak season occurred after age 29. Remember that any peak season other than at age 26 (which is the "true" peak season for our hypothetical player) was due to fluctuation alone.

Basically, what this tells us is that if we assumed that our player's peak sample season was also his peak ability, we would be wrong almost 85% of the time, and very wrong (we thought it wasn't at age 26-28) almost 58% of the time!!

Also, from the first chart, you can see that the average peak OPS of our player is .749 at age 26. What do you think the average peak sample OPS was? .825, 76 points higher than his true peak ability!
So when we look at a player's peak season, not only are we overwhelmingly likely to misjudge the age of his peak ability, we will also overshoot the value of his peak ability by 76 points!

What about if we look at the peak 3 years, as this study did? We should come closer to identifying his true peak ability age(s) and we shouldn't overshoot the value of his true ability by as much.

Here is the same chart as the last one, but this time each age represents the middle age of a 3-year peak:

Since 27 is the "correct" peak age (26-28), we are still "wrong" 82.6% of the time, and if we take 25, 26, and 27 as "close to" being right, we are "very wrong" almost 58% of the time again!

Finally, the average 3-year peak OPS from the samples is .775, which is still 32 points too high (the average "true" 3-year peak OPS is .743).
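The kind of simulation described above can be sketched as follows. This is not the commenter's actual program or aging chart: the quadratic aging curve, the `decline` rate, and the `noise_sd` value (an assumed season-to-season standard deviation of OPS over roughly 500 PA, modeled as normal noise) are all hypothetical stand-ins chosen for illustration, so the percentages it produces will not match the figures quoted in the comment.

```python
import random

AGES = list(range(22, 35))  # ages 22 through 34, 13 seasons


def true_ops(age, peak_age=26, peak_ops=0.749, decline=0.0008):
    """Hypothetical aging curve: true OPS peaks at 26, falls off quadratically."""
    return peak_ops - decline * (age - peak_age) ** 2


def simulate_peaks(n_careers=1000, window=1, noise_sd=0.045, seed=0):
    """Simulate careers and locate each one's best `window`-season stretch.

    Returns (shares, mean_peak): the share of careers whose sample peak falls
    at each age (the middle age when window=3) and the average sample peak OPS.
    """
    rng = random.Random(seed)
    counts = {}
    total_peak = 0.0
    for _ in range(n_careers):
        # Observed OPS = true ability plus random fluctuation (assumed normal).
        seasons = [rng.gauss(true_ops(a), noise_sd) for a in AGES]
        best_avg, best_start = max(
            (sum(seasons[i:i + window]) / window, i)
            for i in range(len(seasons) - window + 1)
        )
        peak_age = AGES[best_start + window // 2]
        counts[peak_age] = counts.get(peak_age, 0) + 1
        total_peak += best_avg
    shares = {age: n / n_careers for age, n in sorted(counts.items())}
    return shares, total_peak / n_careers


one_year_shares, one_year_mean = simulate_peaks(window=1)
three_year_shares, three_year_mean = simulate_peaks(window=3)
```

Under these assumptions the same qualitative pattern appears: the average sample peak sits above the true peak, and averaging three seasons narrows that gap without closing it.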

The last point I want to make is that if we correlate a projection (a "peak" projection or otherwise), with a player's true peak year or years, I'm not sure, but I think we will automatically get an artificially high correlation coefficient, since by definition, a player's sample peak OPS is not randomly drawn (it is always going to be high) and is within a narrow range. So I don't think that you can fairly compare that kind of an "r" with the "r" that you get from regressing a projected OPS on a player's actual sample OPS from any given year (presumably the year you are projecting).

IOW, if you simply take a player's minor league MLE's, adjust for age by making a player look like he is age 27, and then "up that" value some more to make it look like a peak (lucky) year, I'm not sure that you wouldn't automatically get a very high r, if you correlate that number with the player's peak major league year or 3 running years, as was done in this study.

A lot of good commentary, which is why I wrote this for the great minds of Primer. :-)

I'll start with the easier ones and then hopefully tackle MGL tomorrow ;-)

" Question: if you're using MLE-based data to do peak projections, why don't you use major league data? It just wasn't in the database, you're projecting solely from minor league stats on principle, or some other reason? "

What happened was that I added a few Major Leaguers to the projection data so I could see how Peak Projections worked for a couple fast rising stars (like Pujols and Soriano) as well as guys like Brad Wilkerson and Eric Hinske.

"If you're not using major league data, then I'd echo the suggestion that players with significant MLB PT be at least grouped together if not tossed entirely. Pujols -- and, to pick a better example, Dunn -- might not quite fit in with the 1994 group in that both are still a long ways from his peak assuming a normal career progression (not a safe bet with either of these guys, and ignoring the age question for Pujols); they're probably closer to that group than to the bulk of the 2003 list, however."

Good point, I will consider it and possibly take out the Major Leaguers.

"1) Not too important, but in the 1994 projections I believe the actual peak PA column is actual AB. For instance, Ramirez 1999-2001 peak had 1490 AB, not PA. And you defined peak by OPS? Or RAR, or what?"

Uuuuuugh. Yes, you are correct. Good catch... The *PA for the 1994 batch is correct though, which is the important part. I defined peak by OPS.

"The OPS correlation of .65 for the 32 players seems leveraged on the two extremes on the high OPS side (Ramirez, Delgado) and the low OPS side (Goodwin, Alexander). The OPS correlation for the remaining 28 players is just below 0.36"

Interesting, I wonder how this compares with annual projection systems... Is the middle the hardest to project?

"3) The quoted age doesn't seem to be the "July 1" age. For instance, both Manny Ramirez and Shawn Green are listed as 21 years old, but Ramirez was born in the first half of 1972 and Green in the second half of 1972. Were you consistent with the age system? How did you define "age"?"

No worries, I was consistent (I took the D.O.B. and subtracted it from 6/30 of the year)... I believe I rounded the ages, which could explain the Green/Ramirez ages.
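A rough sketch of that age calculation, showing how rounding can give two players born in different halves of the same year the same listed age. The function name, the reference season, and the two birth dates are hypothetical illustrations, not the actual dates used in the study.

```python
from datetime import date


def season_age(birth_date, season_year, rounded=True):
    """Age as of June 30 of the season, either rounded or truncated."""
    days = (date(season_year, 6, 30) - birth_date).days
    years = days / 365.25  # approximate length of a year in days
    return round(years) if rounded else int(years)


# Hypothetical birth dates, one in each half of 1972.
first_half = date(1972, 3, 1)
second_half = date(1972, 11, 1)

rounded_ages = (season_age(first_half, 1993), season_age(second_half, 1993))
truncated_ages = (season_age(first_half, 1993, rounded=False),
                  season_age(second_half, 1993, rounded=False))
```

With rounding, both hypothetical players come out as 21; with truncation, the second would be listed as 20, which is the discrepancy the question was getting at.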

"4) You said that the peak projection system formulas were not derived from these 32 players statistics. What was the data set you used in creating your projections? How good are the projections for those players?"

Amazingly enough, this Peak Projection formula (version 1.1 after last year's version) was not made by computing the peak projections for a set of players and then tinkering with the formulas based on how they correlated with the real results. Rather, I used a slightly altered aging chart of Tango's as well as my own Major League equivalency formula (I derived the level factors by comparing the performance of players from one level to another level in the same year.)

"5) In your response to a comment regarding Soriano you stressed that these were projections of their 3-yr peak. However, there were many 21 and under (and some 18 year olds) in the 1994 peak projections who might now be, what, 28-30 years old? Wouldn't many of the 32 players not have completed their 3-yr peak?"

Yes. Very true. And if anything, I think this hurt this particular sample, as I feel Todd Hollandsworth and Karim Garcia will have better peak performances in another 1-3 years.

" 6) What aging patterns did you use? How did you calculate *PA? "

See above... *PA is the least statistically proven part of this system (IMO), though its significance is obvious (IMO). Eyeballing the numbers, I came up with a simple formula to adjust plate appearances to give you an idea of how much confidence you can have in the numbers.

Take Plate Appearances and multiply by: 1 if in the upper minors, .5 if in the lower minors, 1 if last year, 1/2 if the year before, 1/3 if the year before that...
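The weighting rule above can be sketched like this. The function name, the level labels, and the sample stat lines are hypothetical, and two points are assumptions: that the recency weight keeps going as 1/n for older seasons (the description trails off after 1/3), and that the weighted plate appearances from each stat line are summed.

```python
UPPER_MINORS = {"AA", "AAA"}  # assumed labels for Double-A and Triple-A


def theoretical_pa(stat_lines, projection_year):
    """Sum plate appearances weighted by level and recency.

    `stat_lines` is a list of (year, level, plate_appearances) tuples.
    Level weight: 1 for the upper minors, 0.5 for the lower minors.
    Recency weight: 1 for last year, 1/2 for the year before, and so on
    (assumed to continue as 1/n).
    """
    total = 0.0
    for year, level, pa in stat_lines:
        years_ago = projection_year - year
        if years_ago < 1:
            raise ValueError("stat lines must predate the projection year")
        level_weight = 1.0 if level in UPPER_MINORS else 0.5
        total += pa * level_weight / years_ago
    return total


# Hypothetical player: 500 PA in Double-A in 1993, 400 PA in Single-A in 1990.
lines = [(1993, "AA", 500), (1990, "A", 400)]
star_pa = theoretical_pa(lines, projection_year=1994)  # 500*1*1 + 400*0.5*(1/4)
```

In this example the 1993 Double-A line contributes 500 *PA and the 1990 Single-A line only 50, which matches the idea that the recent upper-minors performance should carry far more weight.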

Well, I don't know if there is enough data to really say one way or another... but Myrtle Beach is a big-time pitcher's park, which would depress hitters' numbers, and Peak Projections aren't *park adjusted* (hopefully I can get park factors for Low A through Triple A for the future), so that could be a factor.