Season similarity scores

Wingo had an interesting career. An alumnus of Oglethorpe University, he appeared in 15 games for Philadelphia as a 21-year-old in 1919. He acquitted himself well, with an OPS+ of 115, but he would make his next appearances as an outfielder in 78 games for Detroit five years later. As a 27-year-old in 130 games, the most playing time he would ever see, he batted .370/.456/.527, good for an OPS+ of 150 and a 12th-place MVP finish.

Together, those three seasons make Wingo into Ichiro’s most similar age-27 player on the incomparable Baseball Reference. The problem is, the two players aren’t particularly similar:

In a season’s worth of at bats, Ichiro had 18 more hits, 13 fewer doubles, seven fewer triples, 64 fewer walks and 40 more stolen bases. The two players aren’t actually similar at all.

The alert reader probably saw this coming. Ichiro, after all, was an internationally famous superstar the first day he stepped onto a major league playing field. That day occurred when he was 27, because he had spent much of the previous decade playing in Japan. Paradoxically, any player truly comparable to Ichiro should have so much playing time by age 27 that his career numbers wouldn’t be comparable to Ichiro’s at all. It’s not really a coincidence that his most similar player at age 27 was an outfielder who had a career year in his first real opportunity for playing time at precisely the correct age.

And yet, the question remains. When was the last time America saw a player similar to the 2001 Ichiro? Has any player ever had a truly similar year?

Deconstructing Bill

Similarity scores were introduced by Bill James in The Politics of Glory, a book examining the Hall of Fame selection process. James sought to bring order to a common Hall of Fame argument: If Player A is similar to Player B, who is in the Hall of Fame, then Player A should also be elected. In a characteristically insightful approach, James realized that what was needed was a way to fairly compare a player to every other player, find the most similar players, and describe how similar they were. If you can say that Player A is similar to Players B, C, D and E, all of whom are in the Hall of Fame, you’re starting to make a very strong case for Player A’s election.

Aside from their original purpose, Similarity Scores give an element of vivid detail to baseball statistics. Whenever I want to learn about a player I’ve never heard of, the first thing I do is look at his list of most similar players. Finding someone I already know about makes the player I’m investigating come to life. The point of re-examining Similarity Scores isn’t that the current system doesn’t work. It’s that the idea of Similarity Scores is such a good one that it’s worth improving as much as we can.

As employed on baseballreference.com, similarity scores are calculated by starting at 1,000 points and subtracting…
One point for each difference of 20 games played.
One point for each difference of 75 at bats.
One point for each difference of 10 runs scored.
One point for each difference of 15 hits.
One point for each difference of 5 doubles.
One point for each difference of 4 triples.
One point for each difference of 2 home runs.
One point for each difference of 10 RBI.
One point for each difference of 25 walks.
One point for each difference of 150 strikeouts.
One point for each difference of 20 stolen bases.
One point for each difference of .001 in batting average.
One point for each difference of .002 in slugging percentage.

In addition, there’s a positional adjustment to account for players who spent their careers at different positions. In this essay, I will focus on batting similarity scores only.
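The batting deductions listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code Baseball Reference runs: the dictionary keys, the function names, the choice to store batting average and slugging percentage as whole thousandths, and the choice to drop partial points are all mine. The positional adjustment is left out.

```python
# Size of the difference that costs one point, taken from the list above.
# Assumption: AVG and SLG are stored as whole thousandths (.350 -> 350),
# so "one point per .001" becomes 1 and "one point per .002" becomes 2.
POINT_UNITS = {
    "G": 20, "AB": 75, "R": 10, "H": 15, "2B": 5, "3B": 4, "HR": 2,
    "RBI": 10, "BB": 25, "SO": 150, "SB": 20, "AVG": 1, "SLG": 2,
}

def points_of_difference(a, b):
    """Total points separating two batting lines (dicts of integer stats)."""
    # Assumption: partial units are dropped, so a gap of 29 hits
    # costs one point rather than 1.93.
    return sum(abs(a[stat] - b[stat]) // unit
               for stat, unit in POINT_UNITS.items())

def similarity_score(a, b):
    """Start at 1,000 points and subtract the total difference."""
    return 1000 - points_of_difference(a, b)
```

Two identical batting lines score 1,000, and every deduction moves the score down from there.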

So what happens if we use James’ system, but look at individual seasons instead of entire careers? That’s easy enough to program. Instead of starting at 1,000 and subtracting points, though, let’s calculate a “similarity distance” by starting at zero and adding points according to James’ system. If we do this, the 10 most similar seasons to Ichiro’s 2001 are:

Kind of an unexciting list, isn’t it? None of these seasons leaps out and strikes you as a great match for Ichiro. Looking through the list, we see that Ichiro stole 56 bases in 2001. In what’s supposedly the most comparable season in history, Sam Rice stole only 13, and drew 55 walks compared to Ichiro’s 30! Looking through these columns, we can see that there’s a lot of variation in every column except two: All of the top 10 seasons are near-perfect matches in batting average and slugging percentage.

The problem here is that Similarity Scores are designed to compare long careers to one another—the kind of careers that might make it into a discussion about the Hall of Fame. For a career like that, it might be reasonable to give the same number of points to a single point of batting average as you do to five doubles. But over one season, five doubles is a lot, and a single point of batting average is nothing. For finding similar seasons, James’ system is unbalanced toward batting average and slugging percentage. It almost always will find seasons which are perfect matches in these categories, with large variations in all other criteria. If we want to devise a similarity score that works well for a single season, we’ll have to do something new.

What’s the point?

If we want to devise a new system for similarity scores, we have to look at the idea of a point with a critical eye. In the last section, we saw that James’ point system becomes unbalanced if we dramatically alter the length of the periods we’re comparing. Presumably, the same kinds of problems would arise if we were trying to find comparable players to a player whose career was very short. Can we develop a system that works for any length of time?

Another aspect to consider is that the field of sabermetrics is a lot larger now than it was when James devised his original system. There are many more people using sabermetrics to answer many more questions. It would be nice to have a system that connects to the rest of what we know about sabermetrics. It’s a less obvious problem than having bad matches for single seasons, but I have to admit that, as long as I’ve enjoyed using them, I have no idea what a point means in a Similarity Score. Are Similarity Scores consistent with the rest of sabermetrics?

At the level of a single batting event, they don’t match up very well. Using Pete Palmer’s linear weights formula, the runs created by a particular batter can be estimated by

We can now take the ratio of the run value of a single (.47 runs) and a double (.78 runs) to find that a single is roughly 60 percent as valuable as a double. But in James’ Similarity Scores formula, a single counts for 1/15 of a point, while a double counts for four times as much (1/5 point as an extra double plus 1/15 point as an extra hit).
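The arithmetic behind that comparison can be checked with a short Python sketch. It uses only the two run values quoted above and the hit and double deductions from James’ list; the variable names are mine, and the indirect effect of an extra hit on batting average and slugging percentage is ignored, as it is in the paragraph above.

```python
from fractions import Fraction

# Run values quoted in the text for a single and a double.
LW_SINGLE, LW_DOUBLE = 0.47, 0.78

# Similarity Score points per extra event, read off James' deductions:
# every hit costs 1/15 of a point, and a double costs a further 1/5.
SS_SINGLE = Fraction(1, 15)
SS_DOUBLE = Fraction(1, 5) + Fraction(1, 15)

lw_ratio = LW_SINGLE / LW_DOUBLE  # single relative to double, in runs
ss_ratio = SS_SINGLE / SS_DOUBLE  # single relative to double, in points

print(f"Linear Weights:    {lw_ratio:.2f}")
print(f"Similarity Scores: {float(ss_ratio):.2f}")
```

The first ratio comes out near 0.60 and the second is exactly one quarter, which is the mismatch described above.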

Let’s make a table of the relative weights for the different batting events as compared to a single in Linear Weights vs Similarity scores:

The agreement here is much better, but still not great. It appears that Similarity Scores match up with the square of the linear weights run value of particular offensive events. I suspect a lot of the disagreement comes from a desire on James’ part for a system that could be worked out easily by hand. In this age of ubiquitous computing, that’s no longer an important consideration.

Run distance

We would like to construct a new system of Similarity Scores that weights different offensive events in a way that is consistent with Linear Weights. Ideally, we would like this system to be easily adjustable to the different offensive contexts seen at different points in the history of baseball. Fortunately, nothing could be easier. To find the distance between two points, you simply take the square of the difference in each dimension, add them up, and take the square root.

The only catch is that we have to use the same units for distance in every dimension we use in the calculation. It doesn’t make any sense to add inches to seconds, even if inches are a perfectly reasonable unit of distance in space and seconds are a perfectly reasonable unit of distance in time. Similarly, it doesn’t really make any sense to add singles in one dimension to doubles in another. We’d like to use some common system of units in which both a single and a double can be expressed in a meaningful way. This is exactly what Linear Weights does.
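Here is a minimal Python sketch of the idea: convert each category to runs, then apply the ordinary distance formula. The function name, the dictionary keys and the two stat lines are made up for illustration, and the table of run values contains only the figures quoted in this article, so it is deliberately incomplete.

```python
import math

def run_distance(season_a, season_b, run_values):
    """Euclidean distance between two seasons, measured in runs."""
    total = 0.0
    for event, value in run_values.items():
        runs_a = season_a[event] * value  # counts become runs first,
        runs_b = season_b[event] * value  # so every dimension shares a unit
        total += (runs_a - runs_b) ** 2
    return math.sqrt(total)

# Only run values quoted in this article; a full calculation would
# cover every batting event in the linear weights formula.
RUN_VALUES = {"1B": 0.47, "2B": 0.78, "OUT": 0.26}

# Made-up stat lines, for illustration only.
season_x = {"1B": 150, "2B": 30, "OUT": 400}
season_y = {"1B": 140, "2B": 40, "OUT": 420}
print(round(run_distance(season_x, season_y, RUN_VALUES), 2))
```

Because every term is squared, it makes no difference whether outs are entered as positive or negative run values.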

If we use Linear Weights to convert Ichiro’s 2001 season to the number of runs he contributed with singles, doubles, etc., we find that he produced…

14*.52=7.28 runs from being caught stealing
53*.26=13.78 runs from strikeouts
3*.72=2.16 runs from grounding into double plays
394*.26=102.44 runs from all other outs

It’s now easy to calculate a “run distance” using the distance formula given above. Because strikeouts, caught stealing, and grounding into double plays were not official statistics for all of baseball history, we’ll leave those categories out of the calculation. Although it would be easy to adjust for different levels of scoring in different years, at the moment we’ll just use the linear weights formula given above.

We can now search for the player seasons with the smallest distance from Ichiro, as measured in runs. The new top 10 seasons are:

These seasons match up much better with Ichiro’s 2001 than the earlier list. Ichiro himself even appears in a later incarnation. It’s a little distressing that all of these players drew more walks than Ichiro’s 2001, but that’s due more to Ichiro’s own unusualness than anything else—there just aren’t many 242-hit, 30-walk seasons to choose from. A related issue is that all of these comparable seasons are distinctly worse than Ichiro’s 2001; this is more a list of “poor man’s Ichiro” seasons than true equals to Ichiro’s 2001.
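The search behind a list like this one can be sketched as follows. It is a hedged illustration rather than the program I actually ran: the function names, the category keys and the layout of the `candidates` dictionary are assumptions, and the table of run values has to be supplied by the caller.

```python
import heapq
import math

# Categories left out because they were not official statistics
# for all of baseball history.
EXCLUDED = {"SO", "CS", "GIDP"}

def run_distance(a, b, run_values):
    """Distance in runs between two seasons, skipping excluded categories."""
    return math.sqrt(sum(
        ((a[event] - b[event]) * value) ** 2
        for event, value in run_values.items()
        if event not in EXCLUDED
    ))

def most_similar(target, candidates, run_values, n=10):
    """Return the n (label, season) pairs closest to target in run distance."""
    return heapq.nsmallest(
        n,
        candidates.items(),
        key=lambda item: run_distance(target, item[1], run_values),
    )
```

Calling `most_similar` with one target season and a dictionary of every other player season returns the closest matches first.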

Conclusions

Finding similarities between different players is one of the most interesting aspects of sabermetrics, but it has been sorely neglected as an area of research. In this essay, I have tried to put the concept of player similarity on more solid ground by introducing the idea of “run distance” between two different statistical records.

One advantage of looking at player similarity in this new way is that problems which were previously very difficult to address now become simple. For instance, a common complaint about Similarity Scores is that a mediocre player in a high offense era can show a superficial similarity to a much better player in a low offense era. It’s not at all clear how this problem could be corrected using the traditional formula, but simply dividing the run value in each category by the number of runs per game scored in a particular park or league naturally produces a historically corrected Similarity Score. It would be similarly easy to construct a rate-based Similarity Score, where each category is divided by plate appearances, to account for seasons with differing amounts of playing time.
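Both adjustments amount to one extra division before the distance is taken. The sketch below is an assumption-laden illustration, not a finished system: the function names, the `"PA"` key, and the choice to pass a runs-per-game figure for each season are mine.

```python
import math

def scaled_runs(season, run_values, runs_per_game=None, per_pa=False):
    """Convert a stat line to runs, optionally era- and rate-adjusted."""
    scaled = {}
    for event, value in run_values.items():
        runs = season[event] * value
        if runs_per_game is not None:
            runs /= runs_per_game  # put different scoring contexts on one scale
        if per_pa:
            runs /= season["PA"]   # remove differences in playing time
        scaled[event] = runs
    return scaled

def adjusted_distance(a, b, run_values, rpg_a=None, rpg_b=None, per_pa=False):
    """Run distance between two seasons after the optional adjustments."""
    runs_a = scaled_runs(a, run_values, rpg_a, per_pa)
    runs_b = scaled_runs(b, run_values, rpg_b, per_pa)
    return math.sqrt(sum((runs_a[e] - runs_b[e]) ** 2 for e in run_values))
```

With `per_pa=True`, a half season and a full season with the same per-plate-appearance production come out at a distance of zero.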

Improved Similarity Scores can help sharpen Hall of Fame debates by pointing out when a season is truly unique, or comparable to the greats of the past. Mostly, though, I hope that the improved Similarity Scores presented in this article will add to the enjoyment of baseball statistics by pointing out the unexpected similarities and parallels in baseball history.