Thursday, August 14, 2008

The Arbitrarian column is written weekly by David Sparks. You can read more of his work at his own blog. This week's head asploding column is on player positions versus styles, and what they're all about. Note to feed subscribers: The graphics in today's post won't come across in the feed, so you might want to click through to the original at HP.

What does it mean to be a Point Guard? Typically, point guards are expected to carry the ball up the court, set up the offense, make passes, and take few shots, at least relative to other players on the court. But how much can the term "point guard" actually mean if it applies to both Jason Kidd and Baron Davis? Further, what does it mean to be a "small forward" if Dominique Wilkins, LeBron James and Shane Battier all fall into that category? What do Vlade Divac and Amare Stoudemire have in common, aside from both being called "centers"?

The obvious point is that traditional position classifications, while they mean something, still convey relatively little information about a player's function on the court. As observers of the game, we attempt to compensate for this by adding any number of modifiers to these position descriptions: combo guard, pure point guard, defensive center, swingman, etc. Each of these is used to more accurately specify a player's style or role on the team, yet each is still somewhat definitionally ambiguous and subjective by design. One Tom Ziller has done some work in attempting to statistically classify guards on a continuum between "small two-guards" and "pure points," but this is only a small first step in the right direction. I present here a generalized methodology for structuring a playing style spectrum, and identifying each player's position within the continuum. By looking at actual statistics produced, we may eschew fuzzy descriptors of position and style in favor of a very specific, yet still highly flexible system of style identification--which provides us with an improved vocabulary with which to describe, among many other things, player types and team styles.

Very rudimentary factor and cluster analysis that I performed a long time ago indicated that the data distinguish between players who tend to try to score a lot, those who play a “smaller” game, and those who play like “big men.” In terms of the NBA’s tracked counting statistics, this translates to a differentiation between those who specialize in points and field goal attempts, those who specialize in assists and steals, and those who specialize in rebounds and blocks. I have chosen to call these three tendencies Scorer, Perimeter, and Interior, respectively, and collectively they form the SPI Style Trichotomy.

Calculation

Identifying each player’s style is conceptually simple, but computationally somewhat more complex. Essentially, one sums each player’s field goal attempts (fga), total rebounds (tr), blocks (bk), assists (as), and steals (st), and determines what percentage of that total each SPI factor constitutes:

Scorer percentage = fga / (fga + tr + bk + as + st)

Perimeter percentage = (as + st) / (fga + tr + bk + as + st)

Interior percentage = (tr + bk) / (fga + tr + bk + as + st)
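
For readers who want to try this at home, here is a minimal Python sketch of the three percentages above. The function name and the sample stat line are invented for illustration and do not describe any real player.

```python
def spi_percentages(fga, tr, bk, ast, st):
    """Return (scorer, perimeter, interior) shares of a player's counted production.

    `ast` stands in for the article's `as`, which is a reserved word in Python.
    """
    total = fga + tr + bk + ast + st
    if total == 0:
        raise ValueError("player has no counted production")
    scorer = fga / total
    perimeter = (ast + st) / total
    interior = (tr + bk) / total
    return scorer, perimeter, interior


# Hypothetical season totals, made up for this example.
s_pct, p_pct, i_pct = spi_percentages(fga=1000, tr=400, bk=100, ast=400, st=100)
print(s_pct, p_pct, i_pct)  # 0.5 0.25 0.25
```

By construction the three shares always sum to 1, which is why they can be read as percentages of a single total.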

These numbers are interesting on their own, but for the calculation of an index of style, they require further manipulation. In the league as a whole, the Scorer percentage is around 50%, the Perimeter percentage around 20%, and the Interior percentage around 30%. Thus, if we used these raw percentages directly, the vast majority of players would appear to be very scoring-centered. My concern here, in constructing a useful index, is to identify player propensities relative to other players, and for that, I calculate the percentile of each player’s percentages.

Scorer index = percentile(Scorer percentage)

Perimeter index = percentile(Perimeter percentage)

Interior index = percentile(Interior percentage)

Thus, even though the maximum Scorer percentage in a season might be close to 75% while the maximum Perimeter percentage is closer to 25%, the players with the highest percentages in the sample under consideration will be assigned an index value of 1. Players with median values on a percentage will have an index value of 0.5, and so on. The percentilization normalizes across style tendencies and player subpopulations, and has the added virtue of scaling from 0 to 1.
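
The exact percentile formula is not spelled out here, so the following Python sketch makes an assumption: it uses a rank-based percentile, (number of players with a strictly lower value) / (n - 1), which reproduces the properties described above (highest value gets 1, median gets 0.5, lowest gets 0). The function name and sample values are hypothetical.

```python
def percentile_index(values):
    """Rank-based percentile of each value within its own sample.

    Assumed formula: count of strictly smaller values divided by (n - 1).
    Tied values receive the same index.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two players to rank")
    return [sum(v < x for v in values) / (n - 1) for x in values]


# Made-up Scorer percentages for five players.
scorer_pcts = [0.35, 0.50, 0.75, 0.42, 0.61]
scorer_index = percentile_index(scorer_pcts)
print(scorer_index)  # [0.0, 0.5, 1.0, 0.25, 0.75]
```

The same function would be applied separately to the Perimeter and Interior percentages, so each index is ranked only against its own kind.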

Interpretation

Thus we have a set of three numbers for each player which can be used to characterize his playing style. The numbers easily translate to more qualitative descriptions. A player with an SPI triple of (0.8, 0.2, 0.7) is an interior scorer, without much perimeter production. A player with the triple (0.1, 0.7, 0.75) is anything but a scorer, sometimes called a “glue” guy. Someone at (0.5, 0.5, 0.5) produces the league median of each type, which is different from a player whose percentages are 33%, 33% and 33%. Because the league-wide Scorer percentage is around 50%, such a player would have a relatively lower Scorer index, for example.

Since each individual is characterized by three variables, their SPI type can be plotted in three dimensions. Unfortunately, three dimensions are difficult to convey on a computer screen, so here is a plot which depicts Perimeter indices along the X-axis, Interior indices on the vertical axis, and Scorer indices as the size of the point.

Historical application note: Since steals and blocks have not been recorded for the entirety of the history of professional basketball, players from earlier eras may have slightly skewed SPI values. Percentages and indices can still be calculated from fga, tr, and as alone, but leaving out blocks and steals, which count toward the Interior and Perimeter totals in later eras, will tend to skew players from an earlier era more toward the Scorer type. Unfortunately, without substantial era-specific correction, this effect is unavoidable. However, the sorting still manages to work well, especially if this detail is kept in mind when making certain cross-temporal comparisons.

Presentation

One of the advantages of using three sub-indices to construct the overall SPI Trichotomy is the convenient translation of index values to color. The three primary colors of light are Red, Green and Blue, and combining them in varying proportions can generate a practically unlimited range of colors (see Wikipedia). This means that each player’s SPI triplet can be represented as a single color. This aids understanding and comparison, as it is much easier to keep in mind that a certain player is a deep red than that his SPI triplet is (0.9, 0.1, 0.2), or that a player is a medium grey than that his triplet is (0.45, 0.53, 0.55). Further, a greenish-blue player is easily identified with another greenish-blue player, without having to specifically compare each of the players’ three index values. The human eye is capable of extremely high-resolution discernment, and using a single color to represent three numerical values takes advantage of this.
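
As a rough Python sketch of that translation, assume the Scorer, Perimeter, and Interior indices map to red, green, and blue in that order (consistent with the "deep red" example above) and that each index is scaled linearly onto the usual 0 to 255 channel range. The function name is hypothetical.

```python
def spi_to_hex(s, p, i):
    """Convert an SPI triple (each index in 0..1) to a #rrggbb color string.

    Assumed mapping: S -> red, P -> green, I -> blue, scaled to 0..255.
    """
    for x in (s, p, i):
        if not 0.0 <= x <= 1.0:
            raise ValueError("SPI indices must lie between 0 and 1")
    r, g, b = (round(x * 255) for x in (s, p, i))
    return f"#{r:02x}{g:02x}{b:02x}"


pure_scorer = spi_to_hex(1.0, 0.0, 0.0)         # "#ff0000", pure red
perimeter_interior = spi_to_hex(0.0, 1.0, 1.0)  # "#00ffff", cyan
deep_red = spi_to_hex(0.9, 0.1, 0.2)            # the deep red from the text
print(pure_scorer, perimeter_interior, deep_red)
```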

Here is the above plot, with color added according to RGB values derived from each player’s SPI indices. As you can see, “blueness” increases from bottom to top, “greenness” from left to right, and “redness” varies with the size of the point. The top-right corner is aqua or cyan, while the bottom left is mostly reddish, due to an absence of green and blue.

Unfortunately, this presentational format leaves a lot to be desired. Since each player can be represented by just one color, can we do better than a pseudo-3-dimensional plot? The answer is yes and no: No, because to ensure that the hue, saturation, and value of each color are captured, we still require three variables (see Wikipedia); yes, because most of what we are interested in here is hue–the underlying color for each player, red, yellow, green, aquamarine, vivid tangerine, indigo, etc. The other two components of HSV color space, saturation and value, allow us to see how “pure” the hue is, which in our basketball application, translates to how “pure” an individual’s playing style is.

Playing style as a continuous spectrum

Using polar coordinates, we can plot each player's position in a continuous spectrum of playing styles. Each individual may be represented as a vector, with Hue translating to direction/angle and Saturation+Value translating to magnitude/distance. The angle of the vector indicates the player's style, and the magnitude of the vector indicates the "fit" of that player to that style. Since it is unlikely that any given player's statistical profile will assign him perfectly to a single category, the fit captures the extent to which it does. Very rarely will a player have some assists and steals, but no blocks, rebounds or field goal attempts, which would give them a P index of 1, but S and I indices of 0. Because of this, rarely will any player be a pure green, or pure blue or red. The degree to which they are a mixture of styles/colors is captured somewhat by their fit.
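
Here is one possible Python sketch of that conversion, using the standard library's colorsys module. Two things are assumptions rather than confirmed formulas: that S, P, and I serve as the red, green, and blue channels, and that "Saturation+Value" means a plain sum. The function name is hypothetical.

```python
import colorsys


def spi_to_polar(s, p, i):
    """Map an SPI triple to (angle in degrees, fit).

    Assumptions: S, P, I are used as R, G, B, and fit is read literally
    as saturation + value. Swap in another combination if preferred.
    """
    hue, sat, val = colorsys.rgb_to_hsv(s, p, i)  # each component in 0..1
    angle = hue * 360.0
    fit = sat + val
    return angle, fit


green_angle, green_fit = spi_to_polar(0.0, 1.0, 0.0)  # pure Perimeter player
grey_angle, grey_fit = spi_to_polar(0.5, 0.5, 0.5)    # league-median player
print(green_angle, green_fit)
print(grey_angle, grey_fit)
```

Under these assumptions a pure type gets the largest fit, while a perfectly grey player has zero saturation, so his angle carries no real information and his fit is correspondingly small.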

We can describe a player's style by their SPI indices, or by their color, but we can also describe them according to their angle, which is most easily communicated by referring to positions on a clock. In the graphic below, the top of the circle can be thought of as 12 o'clock, the far right translates to 3:00, the bottom is 6 o'clock, etc. This is yet another way to describe style more easily than by referring to the player's SPI triple, but more accurately and consistently than by describing color. Finally, I have assigned arbitrary descriptive names to each of six major "spokes" on the diagram, which should help the uninitiated translate commonly-used adjectives into positions on the clock. Here is a listing of SPI indices, fit, clock positions, and shorthand labels for each player in the 07-08 season, as well as 500 all-time greats.
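
Turning an angle into a clock reading is simple arithmetic, since 360 degrees cover 12 hours (30 degrees per hour, or 2 minutes per degree). The Python sketch below assumes 0 degrees sits at 12 o'clock and angles increase clockwise. Which hue actually lands at 12 o'clock in the graphic, and in which direction hue runs, is not stated in the text, so `offset_deg` and `clockwise` are hypothetical knobs for matching the real diagram.

```python
def angle_to_clock(angle_deg, offset_deg=0.0, clockwise=True):
    """Format an angle as an H:MM clock position.

    Assumes 0 degrees is 12 o'clock; offset_deg and clockwise are
    illustrative parameters, not values taken from the article.
    """
    a = (angle_deg + offset_deg) % 360.0
    if not clockwise:
        a = (-a) % 360.0
    total_minutes = round(a * 2) % 720  # 360 degrees == 720 clock minutes
    hours, minutes = divmod(total_minutes, 60)
    if hours == 0:
        hours = 12
    return f"{hours}:{minutes:02d}"


print(angle_to_clock(0))    # 12:00
print(angle_to_clock(90))   # 3:00
print(angle_to_clock(345))  # 11:30
```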

Graphical Display

Below is a graphical depiction of the SPI Playing Style Spectrum, with the positions of 250 of the NBA's all-time best.

As you can see, the SPI typology encompasses Mr. Ziller's point guard continuum, and much more. "Small two-guards" (exemplified by Barbosa, Ellis, Terry and Iverson) line up at about 1 o'clock; "Combo guards" mostly fall between 11:30 and 12:30; "Pass-first points" even more to the left; "Pure point guards" are seen at about 11 o'clock. The spectrum continues, however, to more defensive/bigger guards, more well-rounded perimeter players, point-forwards, glue guys, defensive stoppers, big men, widebodies, power forwards, pure scorers, and back to shooting guards.

One interesting use of the spectrum graphic is to make comparisons. Unsurprisingly, Kevin Johnson and Steve Nash have similar styles; Kobe Bryant and Michael Jordan are in close proximity; and Tim Duncan and David Robinson filled almost exactly the same role for the same team. It's also interesting to make comparisons across eras: Dennis Rodman/Bill Russell, Vince Carter/Rick Barry, Michael Jordan/Jerry West, Magic Johnson/Jason Kidd, etc. It's also possible to identify stylistic opposites: Chris Paul-David West, Shaquille O'Neal-Kobe Bryant, Allen Iverson-Marcus Camby, etc.

Here is a SPI plot for just the 2007-08 Season (note that player names are represented in abbreviated form):

Thus far, the SPI typology is useful mostly as a classification system, but if you're interested, I've spent some time looking into the relative value of certain types, as well as their interactions. There's much more to be done in this vein, but some of the initial findings have been interesting. (APBRmetrics discussion)

Conclusions

Evidently, it's possible to develop a comprehensive classification system of playing styles using statistics alone. Now that the SPI color scheme has been introduced, you might find it interesting to refer back to the graphics I presented last week, in which I've applied the scheme. It adds a dimension of information to the season and team history graphics. I'd be very interested in hearing your thoughts in the comments, as well as in the obligatory survey below.