The Kemp Speed Theory: Do bigger players slow down earlier?

Will Matt Kemp’s size cause him to lose his effectiveness on the basepaths in 2010? (Icon/SMI)

A few weeks ago, Eriq Gardner noted how many fantasy analysts are already penciling Matt Kemp in as a top-five pick for 2010. Eriq played devil’s advocate a bit and discussed some of the reasons why Kemp might not be such a slam dunk. One such reason was his speed. To quote:

Speed: As mentioned above, Kemp is on a path toward surpassing 35 SB this season, an extraordinary achievement for a player who is 6-foot-3 and approximately 225 pounds. Players measuring those dimensions aren’t typically speed demons and when they do surpass 30 SB, as Alex Rodriguez did in 1998, it tends to be followed by a few years of more moderate steals production. In 2006, Baseball Prospectus writer Kevin Goldstein wrote this about the then-prospect outfielder: “At 230 pounds, Kemp’s plus speed could dissipate quickly.” Reportedly, Kemp showed up to spring training this year in excellent condition, and his success rate on the base-paths this year (81%) shows no cause for concern, yet we’ve likely seen the best from Kemp in the steals department.

This theory intrigued me, and I wanted to take a deeper look into it. Is this actually the case? And if it is, what’s the extent of it?

Age curves

To start, let’s look at an age curve for three groups of players: league average (all players), players 6-3 or taller, and players 5-10 or shorter. These groups will be known as “average,” “tall,” and “short,” respectively, from this point forward. The stat we’ll examine will be SB/SBO (steals divided by opportunities to steal), or the rate at which a player both attempts a steal and succeeds given that he reaches first base. We’ll use data from 1919 to 2008. To form the graph, we’ll look at year-to-year changes and display them as a percentage of Year 1 so that all three groups of players will start at the same place and will be easier to compare.
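For readers who want to see the mechanics, here is a minimal Python sketch of how year-to-year changes can be chained into a curve indexed to Year 1. It is an illustration under assumptions, not the exact procedure behind the graph: the player-seasons are invented, and pooling steals and opportunities across players before taking the ratio is just one reasonable weighting choice.

```python
from collections import defaultdict

# Hypothetical player-seasons: (player, age, steals, steal_opportunities).
# These numbers are invented for illustration only.
SEASONS = [
    ("a", 21, 20, 100), ("a", 22, 18, 100), ("a", 23, 18, 100),
    ("b", 21, 10, 100), ("b", 22, 12, 100),
    ("c", 22, 30, 100), ("c", 23, 24, 100),
]


def age_curve(seasons, start_age):
    """Chain pooled year-to-year SB/SBO ratios into a curve with start_age = 100."""
    by_player = defaultdict(dict)
    for player, age, sb, sbo in seasons:
        by_player[player][age] = (sb, sbo)

    # For each age, pool Year 1 and Year 2 totals over players seen at both ages.
    pooled = defaultdict(lambda: [0, 0, 0, 0])  # age -> [sb1, sbo1, sb2, sbo2]
    for ages in by_player.values():
        for age, (sb1, sbo1) in ages.items():
            if age + 1 in ages:
                sb2, sbo2 = ages[age + 1]
                totals = pooled[age]
                totals[0] += sb1
                totals[1] += sbo1
                totals[2] += sb2
                totals[3] += sbo2

    # Multiply the ratios together, starting from 100 at start_age.
    # (Assumes nonzero pooled steals and opportunities at every age.)
    curve = {start_age: 100.0}
    age = start_age
    while age in pooled:
        sb1, sbo1, sb2, sbo2 = pooled[age]
        ratio = (sb2 / sbo2) / (sb1 / sbo1)
        curve[age + 1] = curve[age] * ratio
        age += 1
    return curve


curve = age_curve(SEASONS, start_age=21)
for age, level in sorted(curve.items()):
    print(age, round(level, 1))
```

With these made-up rows, the curve is flat from 21 to 22 and falls to 87.5 at age 23. Note that player "b" contributes to the 21-to-22 step but not the 22-to-23 step, which is exactly how different players end up informing different parts of the curve.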

The main takeaway here is that “tall” and “average” players maintain the speed they had at age 21 longer than “short” players, who start trending downward at age 23. Tall players start that downward trend at age 24, but it’s much less pronounced as they’re able to keep at least 93 percent of their Year 1 speed all the way until age 28. Once those tall players start their decline, however, they face a steeper drop than the short players.

To illustrate this a bit better, here’s a chart showing raw year-to-year changes as opposed to the gradual aging approach we just took. We’ll also condense our age range to 24-37, both to use ages with somewhat larger samples and to home in on what we’re looking at.

In this light, we see that short and average players behave very similarly. The short players show some wider swings, but that’s simply a sample size issue. The pattern is essentially the same. Tall players, however, follow a much different pattern, as we started to see in the initial age curve. Hopefully this graph makes it a little clearer. Each year from age 27 through 32, tall players unfailingly see a drop in their speed. Then there’s a bit of a resurgence at age 34 (almost certainly a sample-size issue; in all likelihood, the true pattern is a plateau from ages 33 to 35) and then some more decline.

To circle back on the short players for a moment, there is one noticeable difference between them and average players. At age 33, notice that their line begins to slope upward. This doesn’t mean that they gain speed, but rather that they lose it at a progressively slower rate. In fact, from age 33 to 37, short players lose a total of just 6 percent of their speed. After that, of course, they decline.

Summing it all up

Essentially, short and average players see their skills decline at a pretty steady rate, with short players easing up a bit from 33 to 37. They seem to lose roughly 5 percent of their speed per year until they reach 33. Tall players behave differently, seeing little overall change from 21 to 25, dropping a bit and leveling off until 27, then taking a nosedive until 33. They level off again from 33 to 35, then plummet until the end of their careers.

Application to Matt Kemp

It looks like Kemp’s big body won’t hinder his speed much, at least in 2010. (Icon/SMI)

Now let’s turn our attention back to Kemp. The poster boy for Eriq’s theory has fit our age curve pretty well up to this point in his career. He’s posted SB/SBO rates (including MLEs) of 18%, 14%, 22% and 20% at ages 21, 22, 23 and 24. Aside from that outlier at age 22 (which was accumulated during a somewhat small sample of 455 at-bats), he’s been pretty consistent, just as the age curve tells us (especially if we were to regress each of his rates to the mean).
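To put those four rates on the same scale as the age curves (percentage of the age-21 rate), here is a quick Python sketch. It uses only the rates quoted above and applies no regression to the mean, so the age-22 outlier shows up at full size.

```python
# SB/SBO rates quoted above for Kemp (including MLEs), keyed by age.
kemp_rates = {21: 0.18, 22: 0.14, 23: 0.22, 24: 0.20}

# Express each season as a percentage of the age-21 rate.
base = kemp_rates[21]
indexed = {age: round(100 * rate / base, 1) for age, rate in kemp_rates.items()}

for age, pct in indexed.items():
    print(f"age {age}: {pct}% of age-21 rate")
```

On that scale, Kemp sits at roughly 78 percent at age 22, then 122 and 111 percent at ages 23 and 24.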

So what can we deduce about Kemp (who turned 25 at the end of last month) going forward? Well, I think it’s relatively safe to say that his speed will stay intact, for the most part, next year. Unless he puts on some weight, he should remain in that “initial plateau” area for tall players (lasting from age 21 through 25). After 2010, these age curves tell us to expect a small dip until age 27, then a precipitous falloff.

Overall, the Kemp Speed Theory seems to hold some real credence; it’s just that Kemp himself hasn’t reached the point where he’s likely to be affected.

Side-note on caveats and bias

You probably noticed that I didn’t use weight as a parameter, as Eriq’s theory suggested. While I think this would be an important variable, unfortunately the data we have available to us doesn’t allow it. You see, a database doesn’t seem to exist (at least publicly) that assigns a weight to a player for each individual season. Instead, we only get something like career-to-date or end-of-career weight data. This will create problems if we try to use it for age curves.

For example, when Barry Bonds was 25 years old and stealing 40 or 50 bases per year, he probably weighed around 150 pounds. At the end of his career, he weighed around 240 pounds. If we were to create a weight parameter in our age curve, Bonds would not be lumped in with the 6-2, 150-pound guys at the age when he actually was 6-2, 150 pounds. Instead, he would fall into the 6-2, 240-pound bucket at every age, even though that’s not who he was at age 25. This creates lots of problems and bias.

Using only height does introduce some problems, but not nearly as many, and it’s mostly just an offshoot of not having weight. For example, we have no idea which players are gaining weight and slowing as a result. If we’re predicting the future for a modern-day player, we’ll know that he’s maintained his weight, so ideally we’d want to eliminate guys who added weight from our study, but we simply aren’t able to do that. Instead, we’ll have “tall players who gain weight” and “tall players who maintain weight” all lumped together, despite the fact that “tall players who gain weight” will likely be skewing our results a bit. Overall, though, using just height is much sounder than including weight.

At some point I may run these age curves again, including a weight parameter, using data from just the past four years or so to eliminate some of the issues with weight, although that might just lead to a small-sample-size issue.

There’s also some selection bias inherent in age curves in general. I’ve taken some precautions to limit it, but some of it just can’t be completely eliminated, so I wanted to make note of it.

Finally, because we’re using stolen base opportunities as our denominator, our sample is much smaller than if we were using something like at-bats or plate appearances. I included 90 years’ worth of data to compensate, but the sample sizes are still less than ideal, especially for ages on the extremes. The general points should probably hold, though.

Concluding thoughts

I’m not yet ready to say that I’m drafting Kemp in the top five, but I’m not nearly as worried about his speed as I might have been a few weeks ago.

Comments

When you calculate the age curves, you mention some “sample size issues” between 32 and 34. This tells me that you’re using all players available for a given year to look at the curves. This introduces “selection bias” into your analysis because different types of players may be exiting/entering the league at different times, and this entrance/exit may have to do with the variable you’re looking at (size).

For more accurate age curves, you need to look at a sample of players who played every year in the same window of years (say between years 24 and 36). Otherwise you’re biasing your results when people leave the league. It would be interesting to see how this affects your results.

Jonathan,
Height can certainly change, but much less than weight, and very little over two years, I’d think.

As for adding a speed parameter, I remember trying this at one point but got a smaller sample size than I was comfortable with (after all, there aren’t that many 6’3 burners). I don’t remember exactly what I was requiring, though, so I’ll take a look again and play with how lax I’m being with the requirement.

Will,
I am using most players available in a given year, but not all. I am excluding players who play the final season of their career in Year 2 to eliminate some bias, although players whose first season is in Year 1 were included.
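As a rough illustration of that filter, here is a minimal Python sketch. The careers are hypothetical, and treating the last listed age as the final season of a career is an assumption that only makes sense for retired players.

```python
# Hypothetical careers: player -> sorted list of ages with a major league season.
careers = {
    "long_career": [24, 25, 26, 27],
    "short_career": [24, 25],
}


def usable_pairs(ages):
    """Return consecutive (year1_age, year2_age) pairs, skipping any pair
    whose Year 2 is the player's final season."""
    final_age = ages[-1]
    pairs = []
    for year1, year2 in zip(ages, ages[1:]):
        if year2 == year1 + 1 and year2 != final_age:
            pairs.append((year1, year2))
    return pairs


for player, ages in careers.items():
    print(player, usable_pairs(ages))
```

Here the long career keeps its 24-25 and 25-26 pairs but drops 26-27, and the two-season career contributes nothing. A debut season is still allowed to serve as Year 1.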

I’m not sure if I’m understanding you correctly, but wouldn’t only including players who play in the majors from age 24 to 36 be *creating* bias? Players who enter the league at 24 are probably a little different from players who enter at 27 or 28, and a player will necessarily be good (at least to some degree) to have a 12-year career. Not to mention that this would probably lead to some small samples.

One would think that most teams have kept some sort of records of players’ weights over the years. Now that Bill James is an insider (and since THT has some connection with him), maybe he can use that access to get all the teams to donate their players’ weights to Retrosheet (or SABR, whatever), as far back as they got them.

Obviously, some teams would be anal and have them back to Aught-nine of the 20th Century and others will only have them for the past decade, but at least that could be made available, though perhaps HIPAA might interfere with that today, don’t know if weight is considered a personal bit of health data.

Lastly, I was wondering how hard it would be to split your data into eras and compare. Clearly, the fitness of players is dramatically different from today to 100 years ago.

Maybe you can compare pre- and post-WWII: I know that vets brought back amphetamines from the war, so perhaps some of them brought back fitness too (I’m thinking Ted Williams here as the prime example). Or even split post-WWII into two or more eras, choose your splits. I would also wonder how things have changed since the game changed in the 1993 timeframe, when, as Eric Walker calls it, the Silly-ball era started.

obsessivegiantscompulsive,
Here’s a link to a graph breaking things down by era. I looked at 6’3 players in the following eras, broken down by sample size proportion:
All years: 100%
1946-1993: 59%
1993-2008: 33%
1871-1946: 8%

So keep in mind that the earliest and latest eras have lower sample sizes.

All years and 1946-1993 were pretty similar. 1993+ was somewhat similar but with more gains in the 20s, and 1871-1946 was all over the place with some large gains in the mid-20s. Of course, some of this will be random fluctuation, especially for the latter group.

Say you have 2 guys. One has 20 steals each year between ages 20 and 30 and 10 steals each year between ages 30 and 40. The other guy has 20 steals each year between ages 20 and 30, but was forced out of the league because his speed declined dramatically and he would have had 0 steals between 30 and 40.

The actual age curve should be 20 from ages 20 to 30 and 5 from ages 30 to 40, but if the curve is only calculated with players in the league, the curve is biased upwards in the 30-40 sample to 10 instead of where it should be at 5.

Put more simply, by including players with short careers, you are probably flattening the age curve. Your age curve doesn’t represent the decline in speed as a player ages; it shows the decline in speed CONDITIONAL on still playing. Those are two very different things.
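The two-player example can be checked in a few lines of Python. This is only the arithmetic from the example, with hypothetical variable names, not a general correction method.

```python
# Steals per season for the two hypothetical players in the example.
stays = {"20s": 20, "30s": 10}       # remains in the league through his 30s
forced_out = {"20s": 20, "30s": 0}   # would have stolen 0, but is out of the league


def mean(values):
    return sum(values) / len(values)


# Average over both players, including the one we never get to observe.
true_30s = mean([stays["30s"], forced_out["30s"]])

# Average over only the player who is still in the league.
observed_30s = mean([stays["30s"]])

print("true average in 30s:", true_30s)
print("observed average in 30s:", observed_30s)
```

The observed figure (10) is double the true figure (5) simply because the player who lost his speed is no longer in the sample.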

Matt Kemp may have a very long career, and the age curve one year ahead probably has almost no bias given Kemp’s current age. This means that your analysis on him is probably right on. However, you can’t use your methods to project 10 years out because you’re not doing it quite right.

Ah, I see what you’re saying now, Will. I should have some more time to think about this later on tonight or tomorrow, but quickly: do we care about those players who fall out of baseball? Any player we’re looking to project, at least for fantasy purposes, will still be playing next season. If our age curve limits us to just those playing, doesn’t that match the sample set we’re trying to project? Again, maybe I’m not thinking this through clearly enough. I’m kind of swamped right now and can’t devote the brain power to it at the moment.

Another quick note is that there is some definite bias inherent in age curves, which I mentioned at the end of the article. What you bring up is one such bias. Even if we did want to have an age curve that properly accounted for players whose skills deteriorated to the point of significantly lost playing time, the solution you suggest would just introduce another form of selection bias, no?

Yeah, I guess the age curve you need has everything to do with how you’re going to apply it. If the question is how do people age, then you should only use people who are in the league for every year of the age curve. If we ask the same question, but feel certain that the guy we care about will be in baseball 10 years from now, then your implicit conditioning assumption is satisfied and it’s fine to use the conditional age curves you’ve created here.

We have now seen Kemp exhibit solid skills for 3 straight years, so I think he’s a relatively safe pick at this point even though he’s still pretty young.

To me, the first round for next year is interesting in that it appears there is a clear top choice at each infield position: Pujols, Hanley, A-Rod, Utley, and Mauer. However, I don’t think I personally could ever take a catcher in the first round. Taking Mauer out of the equation then, I’d be very content with a top 4 pick next year knowing I’d be getting one of those infield studs, thereby emphasizing positional scarcity (except at 1B with Pujols, but of course he’s worth it).

Anyway, right after those top 4, I think it’s an open debate between a number of players including Kemp, Braun, Crawford, Cabrera, Teixeira, and maybe even Lincecum (again though I would never take an SP in the first round). It’s worth pointing out that there may not be any other 2B, 3B, or SS worthy of the first round.

All in all, it seems the strange year we just witnessed among some players who were considered elite going into the year (Reyes, Hamilton, Sizemore and to a lesser extent Wright) is going to set up for a wide open back end of the first round in drafts next year.

For now, my top 5 is Pujols, Hanley, A-Rod, Utley, and then Braun just edging out Kemp.