Speed and defense

Part I

In this two-part article, I am going to look at two things. Part I examines the relationship between speed, as estimated by my own rough version of Bill James’ speed score, and defense, as measured by Ultimate Zone Rating (UZR).

Part II looks at whether there is also a relationship between speed and defensive value (for outfielders only) in a small versus a large outfield. In other words, is it an advantage to have a fast player patrolling a large outfield (and a detriment to have a slow one), over and above his overall defensive value, and can you leverage an otherwise poor defender by putting him in a small outfield?

It is conventional wisdom that the answer to these two questions is “yes,” though I am not aware of any studies or research looking into them. I have thought those assumptions were not necessarily true, even though they appear to be intuitively obvious and have been trumpeted by sabermetric analysts.

I used data from 2003-2006 to calculate every player’s speed score—sort of a 2007 speed projection. First, I rated each player on a scale of 1 to 5, based upon four criteria:

I then averaged all of those ratings and came up with a single number which represents each player’s total speed score. Suffice it to say that if you saw the list you would agree with most, if not all, of the ratings. The only real problems arise with older players who, within those four years of data, lost much of their speed; e.g., Barry Bonds. Basically, the algorithm comes up with an average speed across all four years.
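For readers who like to see the arithmetic, here is a minimal Python sketch of the averaging step. The criterion names and the ratings are placeholders invented for illustration; only the 1-to-5 scale and the simple average come from the description above.

```python
def speed_score(ratings):
    """Average a player's 1-to-5 ratings into a single speed score.

    `ratings` maps a criterion name to that player's rating. The names
    used below are placeholders, not the real criteria.
    """
    values = list(ratings.values())
    if not values:
        raise ValueError("need at least one rating")
    for name, value in ratings.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{name}: rating {value} is outside the 1-5 scale")
    return sum(values) / len(values)


# Made-up ratings for one hypothetical player.
example = {"criterion_a": 5, "criterion_b": 4, "criterion_c": 4, "criterion_d": 3}
print(speed_score(example))  # 4.0
```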

Now, we can move on to defense, as measured by UZR. If you are not familiar with UZR, here is a quick tutorial:

UZR is a complex defensive rating based on play-by-play data, expressed as runs saved or cost compared to an average player at each defensive position. The range of values depends on the position (for example, since the shortstop gets quite a few more chances than the first baseman, the range for SS is larger than for 1B). Roughly speaking, an average fielder is around -5 to +5 runs per season, a good or poor one ±5 to ±10, a very good or very poor one ±10 to ±15, and a great or terrible fielder between +15 and +20 or between -15 and -20, respectively.

These are true talent levels, or what you would see in many seasons combined. In any one season, because of the fluctuations associated with limited sample sizes, you will sometimes see values of plus or minus 20 or 30 something. That does not usually represent a fielder’s true talent level or what you could expect in multiple seasons or in the future.

The UZR data I used were two-fold: One, 2007 UZR for all fielders. Two, a 2007 UZR projection for all fielders, based on their 2003-2006 numbers. The algorithm for the projection is basically a weighted average of the four prior years, regressed toward some mean. The projection is also age adjusted. I did not use speed in any way (namely, for the regression) to come up with these projections, even though technically I should have. That way, when I look at the relationship between speed and UZR projection, I won’t be “cheating.”
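The general shape of such a projection (a weighted average of prior seasons, pulled toward a mean) can be sketched in Python as below. This is only an illustration: the season weights, the amount of regression (`ballast_games`), the mean of zero and the `age_adjustment` parameter are all assumptions, not the values behind the projections used in this article.

```python
def project_uzr(seasons, weights=(1.0, 2.0, 3.0, 4.0),
                mean_uzr=0.0, ballast_games=300.0, age_adjustment=0.0):
    """Project UZR per 150 games from prior seasons.

    seasons: list of (uzr_per_150, games) tuples, oldest season first.
    weights: hypothetical season weights, most recent season heaviest.
    ballast_games: hypothetical number of league-average games added to
        regress the result toward `mean_uzr`.
    age_adjustment: hypothetical runs added or subtracted for aging.
    """
    weighted_runs = 0.0
    weighted_games = 0.0
    for (uzr, games), weight in zip(seasons, weights):
        weighted_runs += weight * games * uzr
        weighted_games += weight * games
    # Regression toward the mean: blend in `ballast_games` of average play.
    regressed = (weighted_runs + ballast_games * mean_uzr) / (weighted_games + ballast_games)
    return regressed + age_adjustment


# A made-up fielder who was +10 per 150 in each of four full seasons.
print(round(project_uzr([(10.0, 150)] * 4), 2))  # 8.33
```

Note how the fewer games a player has, the more the ballast dominates and the closer the projection sits to the mean.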

Now the interesting part. For each defensive position, I broke up all players into two groups—fast and slow. The slow group contained players with speed scores of 1 or 2, and the fast players were those with speed scores of either 4 or 5. I then computed the average UZR per 150 defensive games for each group, slow and fast, for each defensive position. The first table is for the actual 2007 UZR numbers and the second is for the 2007 projections, based on the 03-06 UZR data, with a minimum of 100 total games played.
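The grouping step can be sketched in Python as follows. The record fields (`pos`, `speed`, `games`, `uzr_runs`) and the sample players are invented for illustration, and weighting each player by games played is an assumption about how the group averages were formed.

```python
from collections import defaultdict


def speed_group(score):
    """Map a 1-5 speed score to 'slow', 'fast', or None (average, excluded)."""
    if score <= 2:
        return "slow"
    if score >= 4:
        return "fast"
    return None


def group_averages(players, min_games=100):
    """Return {(position, group): UZR per 150 games}, weighted by games."""
    totals = defaultdict(lambda: [0.0, 0.0])  # (pos, group) -> [runs, games]
    for p in players:
        group = speed_group(p["speed"])
        if group is None or p["games"] < min_games:
            continue
        bucket = totals[(p["pos"], group)]
        bucket[0] += p["uzr_runs"]
        bucket[1] += p["games"]
    return {key: 150.0 * runs / games for key, (runs, games) in totals.items()}


# Made-up players, not real data.
players = [
    {"pos": "CF", "speed": 5, "games": 150, "uzr_runs": 6.0},
    {"pos": "CF", "speed": 4, "games": 100, "uzr_runs": 1.5},
    {"pos": "CF", "speed": 2, "games": 120, "uzr_runs": -4.0},
    {"pos": "CF", "speed": 3, "games": 140, "uzr_runs": 2.0},   # average speed, excluded
    {"pos": "CF", "speed": 5, "games": 40, "uzr_runs": 10.0},   # too few games, excluded
]
averages = group_averages(players)
advantage = averages[("CF", "fast")] - averages[("CF", "slow")]
print(averages, advantage)
```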

As you can see, the relationship between speed and defensive talent, as measured by UZR, is quite strong, at least if we ignore players with average speed ratings, which I did. The bold numbers (last entry in the column) in column four in each table represent the average advantage for all fielders. The bold numbers in the parentheses are the speed advantage for the infield and outfield, respectively. As you would expect, speed is more important in the OF than in the IF.

The relationship is most pronounced for center field and left. I am not sure why it is not as strong for right field, as fielding in right is essentially the same as in LF. First base is a little surprising too, as I would not have thought that speed would be much of an asset. Of course, the faster players tend to be the more agile ones as well, and agility is important at first base. Speed does not seem to be much of a factor at third base, which is not too surprising, as quick hands, good reflexes and good size (which correlates with lack of speed) are generally more important than pure speed at the hot corner.

I was also a little surprised at the magnitude of the relationship at SS. I expected it to be larger. I suppose that so many other skills are important at short (as opposed to, say, the outfield), such as a strong arm, agility and positioning, that speed becomes relatively less important to overall good defense.

As you can also see from the above numbers, most first basemen are slow, most 2B are fast, a small majority of 3B are slow, most SS are fast, most LF are fast, almost all CF are fast, and RF is split about evenly, with a small edge to fast players. That is probably because you need strong arms in RF, so you cannot pay as much attention to speed as you would in LF. And remember that UZR is measured in relation to the average player at each position. So a RF with a UZR of zero could be (and probably is) a poorer fielder than an average left fielder.

While the numbers in the chart and a little bit of logic and common sense suggest a linear relationship between speed and defensive ability at all positions other than third base, that is not necessarily so. There could be a non-linear relationship or there could be a weak relationship with a few outliers making it look like there is a strong relationship.

To test how much speed can predict UZR at each defensive position, I ran a linear regression between speed score (1-5) and UZR per 150, for players with a minimum of 100 games over the last four years, and looked at the resultant correlation coefficient, or “r”.
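For anyone who wants to reproduce this kind of test, the correlation coefficient from a simple linear regression can be computed with nothing but the standard library. The sketch below is a generic Pearson "r" calculation; the sample numbers are invented and are not the data behind this article.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Made-up speed scores (1-5) and UZR per 150 for six hypothetical players.
speed = [1, 2, 3, 4, 5, 5]
uzr_per_150 = [-8.0, -1.0, 2.0, 1.0, 6.0, 3.0]
print(round(pearson_r(speed, uzr_per_150), 3))
```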

Remember that the magnitude of “r” depends not only on the quality of the relationship but on the sample sizes as well. The UZR projections are subject to sample error, as are the data that go into calculating the speed scores, and we have a limited number of observations. Values in the .2 to .3 range are pretty strong considering the sample sizes of the data used. The .464 is quite strong.

The numbers above pretty much conform to the numbers in the previous tables with one major difference. While the original numbers (speed advantage) suggested that speed was somewhat of a determinant of defensive value or talent at SS, the above correlation (-.004 or essentially zero) suggests that there is no relationship between speed and defensive talent at SS. Since there is a wide confidence interval surrounding the above correlation coefficients (in other words, they are not that reliable), I would submit that there is probably some relationship between speed and defensive talent at SS, although perhaps not as much as we would have thought.
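To get a feel for how wide such a confidence interval can be, here is a sketch using the standard Fisher z-transformation. This is a textbook method, not a calculation done for this article, and the sample size of 40 players is a made-up number chosen only for illustration.

```python
from math import atanh, sqrt, tanh


def correlation_interval(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a correlation coefficient.

    Uses the Fisher z-transformation: atanh(r) is roughly normal with
    standard error 1 / sqrt(n - 3).
    """
    if n <= 3:
        raise ValueError("need more than 3 observations")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    z = atanh(r)
    half_width = z_crit / sqrt(n - 3)
    return tanh(z - half_width), tanh(z + half_width)


# r = -.004 as reported for SS; n = 40 is a hypothetical sample size.
low, high = correlation_interval(-0.004, 40)
print(round(low, 2), round(high, 2))
```

With a sample that small, the interval comfortably includes both moderately negative and moderately positive correlations, which is why a near-zero "r" does not rule out a real relationship.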

There is one other interesting thing about the numbers in the previous charts. It appears that at third base, right field and especially shortstop, both the slow and the fast players are below-average fielders. This would imply that the average-speed players at those positions are above-average fielders.

I am going to go out on a limb and posit a reason for this. I think that some otherwise fast (and probably agile) players, maybe utility players, are put into defensive positions in which they are not capable of performing well. Slow players, we know, are not likely to be proficient at defense, especially in the OF. However, the average-speed players probably need to be extremely proficient in their defensive skills to play “speed” positions (like the OF and second base). That may be why the average-speed players are better than the faster ones at certain positions. Or it may just be a fluke.

Part II

To investigate the relationship between an outfielder’s speed and his defensive value in various sized ballparks, first I computed the approximate size (in fair territory) in square feet for each major league park in its current (2007) configuration. For that, I used the to-scale (hopefully) diagrams of each ballpark from this website (http://www.andrewclem.com/Baseball/) and a computer screen tracing program (Iconico Screen Tracing Paper).

Fair Territory area in square feet of each MLB ballpark (in thousands)

(As you can see, the size of a park by no means determines whether it is a pitcher’s or hitter’s park. Other factors, such as fence height and configuration, size of foul territory, altitude, ambient temperature and wind patterns are also determining factors.)

Using the above numbers, I classified any park of 106,000 square feet or less as small, and any park of 116,000 square feet or more as large. I excluded Boston and Colorado from either group for obvious reasons.
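The classification rule is simple enough to state as a few lines of Python. The cutoffs and the two excluded parks come from the text; the function name, the "medium" and "excluded" labels, and the sample parks are made up for illustration.

```python
SMALL_MAX_SQ_FT = 106_000   # at or below this, a park counts as small
LARGE_MIN_SQ_FT = 116_000   # at or above this, a park counts as large
EXCLUDED_PARKS = {"Fenway Park", "Coors Field"}


def classify_park(name, fair_area_sq_ft):
    """Return 'small', 'large', 'medium', or 'excluded' for a park."""
    if name in EXCLUDED_PARKS:
        return "excluded"
    if fair_area_sq_ft <= SMALL_MAX_SQ_FT:
        return "small"
    if fair_area_sq_ft >= LARGE_MIN_SQ_FT:
        return "large"
    return "medium"


# Hypothetical parks and areas, not measurements from the article.
for park, area in [("Park A", 104_500), ("Park B", 110_000), ("Park C", 117_200)]:
    print(park, classify_park(park, area))
```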

Obviously, all parks have the same size infield, so the relative size of a park reflects the size of its outfield. Also, some parks are not symmetrical, so for any given overall size, left, right or center field could be small, medium or large. I could have split each field into three equal-sized sections and treated each outfield position separately, but I didn’t (that would be a lot of tracing and protracting).

Once I established the two groups of parks, small and large (again, excluding Coors Field and Fenway Park), I computed every outfielder’s 2007 UZR in seven separate categories. Keep in mind that all UZRs are park-adjusted.

1. The small parks only.
2. All other parks (besides the small ones).
3. Large parks only.
4. All other parks (besides the large ones).
5. All parks other than the small ones, large ones, Coors, and Fenway (all medium-sized parks).
6. All parks.
7. All parks, a 2007 UZR projection, based on 03-06 data, as described in Part I of this article.

As described in Part I, for each player in the UZR database, I used a (Bill James-type) speed score, from 1 to 5, with 1 being the slowest. From those scores, each player was given one of two ratings—either a 1 (slow), for players whose speed score was 1, 2, or 3 (where 3 is actually an average-speed player overall), or 2 (fast), for players whose speed score was either 4 or 5. This divided all outfielders into two roughly equal-sized groups, since outfielders tend to be faster than infielders, on the average.

For 2007, all slow outfielders had an average UZR per 150 of -2 runs. The fast ones averaged +1.7 runs per 150. Even though these numbers appear intuitively reasonable, the true difference between fast and slow outfielders is much greater. The reason these numbers are deceptive is that I did not break them down by position. Many more of the fast players play CF, and even though the average center fielder is a much better defender than the average corner outfielder, the average UZR at each position is, by definition, exactly zero. If that went over your head, don’t worry about it. Suffice it to say that, as I found in Part I, the average fast outfielder (speed score of 4 or 5) is a much better defender than the average slow outfielder (speed score of 1, 2, or 3), as you would expect.

Let’s now look at the average UZR of all outfielders in each of the seven categories above.

I don’t know why the average OF UZR in small and large parks is so high. I can venture a few guesses, though. Keep in mind that the way UZR is figured, it should not matter how large or small the park is. A catch or non-catch in a certain section of the field, defined by a 4.091 degree slice of fair territory, and x amount of feet from home plate (which is how the UZR sections are defined) should be the same in each park.

Here is my speculation as to why the UZRs are so disparate in the various-sized parks:

One, scorer bias. Even though the scorers are supposed to record each batted ball as caught or landing in a certain part of the field defined as above (slice and distance), they may be “fooled” by the size of the park. For example, in a small park, a fly ball caught near the wall may be recorded as 370 feet when, in fact, it is only 365 feet from home plate, making it appear as if it were a “better” catch than it actually was. Of course, that would suggest that outfielders in large parks should be underrated.

However, in large parks, maybe an overriding factor is that players can run full speed to areas in which they would have to slow down (because they are near a wall) in a smaller park. Of course that would suggest that players would be underrated in small parks.

Another reason for the disparity may be that pitchers and batters have different approaches in small and large parks, which creates a different distribution of balls in play than in the average park. Since UZR assumes a static distribution of batted balls in all parks, that could wreak havoc on the numbers in small and large parks. In what direction and of what magnitude, I have no idea.

Since we are dealing with only one year’s worth of UZR data, it could also be a fluke (random fluctuation). I have not calculated the confidence intervals for (and the difference between) the various numbers above. It could also be that for whatever reasons, players in the small and large parks were better fielders, at least in 2007. Any other explanations from the peanut gallery are more than welcome.

Anyway, let’s move on to how the various-sized outfields affect the UZRs of fast versus slow fielders. After all, that was the thesis of Part II. Here is the table above, with the UZRs split between slow and fast fielders:

There seems to be a difference between fast and slow outfielders of between five and seven runs per 150 games in all categories of parks (categories 1-5), while overall there is only around a four-run difference (categories 6 and 7). In addition, there does not appear to be a clear-cut advantage for fast or slow players in either small or large parks. Fast players have a six-run advantage in small parks and a 6.9-run advantage in large parks, not much of a difference considering the sample sizes.

One problem with these numbers is that we are not comparing apples to apples. The players in the small parks are not necessarily the players in the large parks, and even if they were, they are certainly not weighted the same. For example, what if the best (UZR-wise) fast players’ home field were a large (or small) park? It would (falsely) appear as if fast players had an advantage in either small or large parks.

The correct way to do the comparison is with “matched sets” of players, weighting each set by the lesser of the two sample sizes (in this case, number of games played). Here is what I did:

I compared each fast player who played both in a large park and in all other parks (categories 3 and 4 above), took the difference between his UZRs in the two groups of parks, and weighted that difference by the lesser of the two games played. For example, if player A played 10 games in a large park with a UZR of +3 per 150, and five games in any other park with a UZR of zero, that data point would be +3 minus zero (the difference in the UZRs) times five (the lesser of the two games played), or +15. I did the same thing for all players, added up all the data points and divided by the total number of lesser games played. This gives us a weighted average of the difference between large and all other parks for all fast players.
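The matched-set calculation just described can be sketched in Python as follows. Player A's numbers are the worked example from the text; player B and the tuple layout are invented for illustration.

```python
def matched_set_difference(players):
    """Weighted average UZR difference between two groups of parks.

    players: iterable of (uzr_a, games_a, uzr_b, games_b) tuples, one per
    player, where A and B are the two park groups being compared. Each
    player's difference is weighted by the lesser of his two game counts.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for uzr_a, games_a, uzr_b, games_b in players:
        weight = min(games_a, games_b)
        if weight <= 0:
            continue  # player did not appear in both groups of parks
        weighted_sum += (uzr_a - uzr_b) * weight
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no player appeared in both groups of parks")
    return weighted_sum / total_weight


player_a = (3.0, 10, 0.0, 5)    # from the text: data point +15, weight 5
player_b = (-1.0, 20, 1.0, 40)  # made up: data point -40, weight 20
print(matched_set_difference([player_a]))            # 3.0
print(matched_set_difference([player_a, player_b]))  # -1.0
```

Weighting by the lesser of the two game counts keeps a player with a handful of games in one type of park from dominating the comparison just because he played a full season everywhere else.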

I then did the same thing for slow players. Then I repeated the process for small parks and all other parks (besides the small ones). Here are the results of that little exercise:

Average difference in UZR per 150 between a player in a small park and all other parks

Slow players: +5.0
Fast players: +.9

Average difference in UZR per 150 between a player in a large park and all other parks

Slow players: +3.3
Fast players: +7.6

As you can see, fast players do indeed have a much bigger advantage in large parks and slower players have an advantage in small parks, just as conventional wisdom would suggest. In a small park, a fast player basically achieves the same UZR score (1 run better) as when he plays in any other park. For a slow player, his UZR score improves by five runs, or four runs more than the fast player.

In a large park, while both types of players “improve” over all other parks, the slow player improves only by a little more than three runs, while the fast player improves by almost eight runs, or four runs more than the slow player.
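The "four runs" in each comparison is just the gap between the two groups' park effects. A quick check in Python, using only the four numbers reported above (the variable names are mine):

```python
# UZR per 150 change versus all other parks, as reported above.
small_park = {"slow": 5.0, "fast": 0.9}
large_park = {"slow": 3.3, "fast": 7.6}

# How much more slow players gain than fast players in small parks.
slow_edge_in_small = round(small_park["slow"] - small_park["fast"], 1)
# How much more fast players gain than slow players in large parks.
fast_edge_in_large = round(large_park["fast"] - large_park["slow"], 1)

print(slow_edge_in_small)  # 4.1
print(fast_edge_in_large)  # 4.3
```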

Now, because of sample size issues and the potential problems described above (possible scorer bias in large and small parks, etc.), it is not an ironclad conclusion that a team with a large outfield can benefit from fast outfielders and that a team with a small outfield can “hide” poorer (slower) defenders, but the data seem to suggest that that is the case.