Making Changes

A New Look at Minor League EqA

As some of you may have noticed, there have been some changes in the Minor League EqA page.

Let's start with the simple. When you go there now, you'll get a short, simple, fast download, with what is essentially a page of links. The long list of every player in the minors? Not gone, but moved under its own link--so that only the people who really want it have to wait for it to download.

The main feature on the page is a list of all the leagues, along with their stats, sorted by offensive level. I'm always trying to remind people of the context of minor league statistics, and this is one more heavy-handed way to remind people that some leagues (near the top) favor the hitters, while others (near the bottom) favor the pitchers. Click on the league, and you'll get the information that was on the old minor league page: a top-10 list for each league, a breakdown of league statistics by position (approximated by games played at each position), and a list of all players in that league, sorted by team.

Completely new to the page are the links below the list of leagues--the Future DTs and PDTs. These are the projections that one would reasonably make about how hitters (DTs) and pitchers (PDTs) will perform at their peak in the majors. These numbers are based only on 2004 and, as the link says, they use the player's performance, age, and level to generate the projection.

First of all, what do I mean by "their peak"? It does not mean their best season; I would expect most players, in their best season, to do better than what is shown. I am talking about the expected level of performance we would get from this player when he is between 27 and 31. This is a new projection scheme, developed by comparing players' entire minor and major league careers (not just individual seasons). The resulting routine is substantially different from the 'MjEqA' found on the league pages. That value is pretty simple: equivalent average in the minor league, times a difficulty adjustment for the league, equals major league EqA. But the Future DTs calculate the future EqA by using:

Age. As most BP readers surely know, the age of a prospect can be as important as his actual performance. The more time a player has until he reaches 27, the more he is likely to improve. Moreover, the improvement is not linear; the average amount of improvement goes down as you get closer to age 27. A successful 20-year-old, on average, improves by more in one year than a successful 21-year-old does, and so on. There is an important caveat to that, however: while the youngest players, on average, improve by more than the older ones, the spread of their performance is also wider. The "peak" we are searching for is a combination of established ability plus expected improvement: the bigger the share of established ability (i.e., the closer to 27), the more certain the results.

Age compared to league. Players who are old for their league--a 25-year-old in A-ball, for instance--have a strong tendency to do worse than expected when and if they get promoted. There appears to be something--I am guessing experience, something knowledge-based rather than physical, what you might call "guile"--that gives them an advantage when playing against younger players, beyond what the normal improvements from being older would create. More to the point, the advantage disappears when they face more age-appropriate competition. Surprisingly, though, I haven't been able to document any effect from being unusually young for one's league; the average improvement for those players appears to depend simply on their age.

Differential changes in statistics. EqA essentially breaks "offense" down into four components: hitting for average, hitting for power, drawing walks, and stealing bases. What we call "normal aging" does not affect these components equally, and in fact most of it comes from improvement in power. The Future DTs treat the components separately; as a result, some players that we have gone a little too crazy over in the past based on higher minor league walk rates (ahem, Jackie Rexrode) don't project nearly so favorably under this system.

Size matters. This ties in with the previous point, that the change in power is the single biggest driver in age-related improvements. Short players, under 6'0", rarely develop as much power as their taller teammates (the Giles brothers being a major, current exception), and so a height adjustment is built into the projection. There is another issue related to size that is not yet included, because I haven't yet studied it properly, but the logic goes like this: If improvement is largely driven by power, which in turn comes from the fairly common increase in strength ("filling out") of players in their 20s, then it stands to reason that players who, at age 20, have already filled out--players like Calvin Pickering, Jack Cust, Prince Fielder--are not going to see the same sort of gains as their skinnier cohorts. Like I said, I haven't built this factor in, since I first need to build a reliable database of player weights at different stages of their careers, assuming that "reliable database of player weights" isn't a complete oxymoron.

Strikeouts matter. Players whose translations suggest that they will strike out 130+ times tend not to develop well, although the data here can be misleading. If you look at players with at least 650 plate appearances in both the minors and the majors who struck out a lot in the minors, you'll find that they actually did better than expected, which would make you think that strikeouts don't matter. The problem is that by selecting only those who have had 650-PA major league careers, you've created a selection bias: you are looking only at the ones who learned to cut down on their strikeouts and made it to the majors. Most strikeout-prone players don't. This is true even though, as in the majors, a strikeout isn't any more damaging than any other out. In a projection setting like the Future DTs, it isn't about the value; strikeouts are a proxy for a skill set, the skill of "making contact," which is a very valuable skill for a successful major league career.

Extreme performances tend not to be repeated. While an extreme performance might, in fact, reflect skill, it is more likely to be the combination of a good skill level plus some random chance in the player's favor. That is why, in addition to the regular adjustments for league and park effects, the Future DTs incorporate a regression to the mean on the player's rate statistics. For some performances, like Jeremy Reed's 2003 singles rate, this creates a substantial tempering of enthusiasm.
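For readers who want to see what regressing a rate statistic toward the mean looks like in practice, here is a minimal Python sketch. It is not the actual Future DT routine, which isn't spelled out in this article; it is a common textbook form of shrinkage in which the observed rate is blended with the league rate. The function name, the `stabilizer` parameter, and every number in the example are assumptions chosen purely for illustration, not figures from the database.

```python
def regress_rate(observed_rate, opportunities, league_rate, stabilizer):
    """Shrink an observed rate toward the league rate.

    `stabilizer` is a hypothetical count of league-average opportunities
    mixed into the player's sample. The larger it is relative to
    `opportunities`, the harder the observed rate is pulled to the mean.
    """
    if opportunities < 0 or stabilizer < 0:
        raise ValueError("opportunities and stabilizer must be non-negative")
    total = opportunities + stabilizer
    if total == 0:
        # No information at all: fall back to the league rate.
        return league_rate
    return (observed_rate * opportunities + league_rate * stabilizer) / total


# Illustrative (made-up) numbers: a hitter singles in 25% of 400 plate
# appearances in a league where the typical rate is 17%.
regressed = regress_rate(0.250, 400, 0.170, stabilizer=400)
print(f"{regressed:.3f}")  # prints 0.210
```

With equal weight on the player's sample and the league, the regressed rate lands halfway between the two. A player with a longer track record at the same rate would keep more of his observed performance.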

The effect of all these changes is, I believe, a better set of completely objective projections than I've had before, although still not without its share of misses. I've supplied lists of the top 20 minor league players in runs above replacement for each season from 1996-2003 (I'm missing strikeout and height data on a lot of players in the database before 1996). To test the system, I used a set of 654 players with at least 650 PA in both the minors and the majors since 1970. Yes, I'm introducing a selection bias, and no, this isn't every player, just every player in my database who meets these criteria. I applied the regression to the mean adjustments only to the minor league data (or else I could regress completely to the mean and generate perfect results). The improvement between the test I would have used six months ago and today is remarkable (the errors compare minor league numbers per 650 PA with major league performance per 650 PA).

Of the 654 players, 272 of them, 42%, had an EqA difference of 10 points or less. A total of 495 of them, 76%, were within 20 points. (The pitching DTs are a completely different story, again using performance, level, and age in combination, but we'll cover that in a separate article.)
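The accuracy figures above are simple shares of players whose projection error falls within a threshold. A small Python sketch of that tally follows; the function name and the sample error list are hypothetical, and errors are expressed in EqA points (thousandths) as whole numbers. The last two lines only re-derive the percentages from the counts quoted in the text.

```python
def share_within(errors_in_points, threshold):
    """Return (count, rounded percent) of errors with |error| <= threshold."""
    if not errors_in_points:
        raise ValueError("need at least one error value")
    count = sum(1 for err in errors_in_points if abs(err) <= threshold)
    return count, round(100 * count / len(errors_in_points))


# Made-up projection errors (projected minus actual EqA, in points).
sample_errors = [-4, 12, 25, -9, 18, 3, -31, 10]
print(share_within(sample_errors, 10))  # prints (4, 50)
print(share_within(sample_errors, 20))  # prints (6, 75)

# Re-deriving the article's percentages from its own counts.
print(round(100 * 272 / 654))  # prints 42
print(round(100 * 495 / 654))  # prints 76
```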

Note: Players with XX's in their names are players who have not had their biographical information entered into the database, or who have changed organizations and haven't been updated. The start of short-season play creates quite the hammer for data entry. Note that these players will all show an age of "23" because of a default entry in the program. Player positions and games played have been entered, but because we don't have defensive information yet, all players will show a fielding level of "0."