Analytics, Models and the NBA Draft Part 2

In our last article, we featured a number of different analytic models to project the NBA draft. While each model presented is independently strong, they all suffer from some level of noise. By looking at seven different models collectively in the form of a composite ranking, we are able to widen our perspective and see where the models find consensus. To make this ranking even more robust, we have created what is known as a 'truncated mean' (which is how figure skating judging works) by taking out the highest and lowest ranking for each player to better establish where the models agree on players and avoid outliers.

Once we have our truncated mean composite ranking, we then want to take one more step and add a subjective component to our model. This is necessary as statistic-based models can only tell us so much. Issues such as whether a player has a good attitude, is likely to continue growing, has injury problems, is a defensive star, etc. play a huge role in determining where a player should be drafted and are not adequately measured by statistics. As a proxy for subjective scouting, we have taken the DraftExpress ranking for each prospect and blended that with our composite ranking (we are choosing to use a 75% analytic, 25% mock blend).
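The two steps described so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the player names and ranks are hypothetical, and blending the average model rank directly with the mock-draft rank is one plausible reading of the 75/25 blend, not a confirmed description of how the published ranking was computed.

```python
def truncated_mean(ranks):
    """Average the ranks after dropping the single highest and lowest value."""
    if len(ranks) < 3:
        raise ValueError("need at least three rankings to trim both ends")
    trimmed = sorted(ranks)[1:-1]
    return sum(trimmed) / len(trimmed)


def blended_score(model_ranks, mock_rank, analytic_weight=0.75):
    """Blend the truncated-mean composite with a mock-draft rank.

    Assumption: the blend is a weighted average of the two rank values.
    """
    composite = truncated_mean(model_ranks)
    return analytic_weight * composite + (1 - analytic_weight) * mock_rank


# Hypothetical prospects: seven model ranks each, plus a mock-draft rank.
prospects = {
    "Player A": ([1, 2, 2, 3, 3, 4, 20], 5),
    "Player B": ([4, 5, 5, 6, 6, 7, 1], 2),
}
scores = {
    name: blended_score(ranks, mock) for name, (ranks, mock) in prospects.items()
}
# Lower score means a higher spot on the board.
board = sorted(scores, key=scores.get)
```

Note how the single outlier rank of 20 for Player A is discarded before averaging, so one dissenting model does not drag the composite down.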

Now that we have our ranking, it is important to understand why these rankings matter. Simply put, many NBA teams are bad at drafting. Most of this has to do with how difficult it is to project future NBA performance, but there are still plenty of stubborn general managers fantasizing about 14-foot wingspans, international wonders, and high scorers from big college programs. Analytic models help teams see through some of the aforementioned red herrings.

I selected an NBA draft (the 2009 draft) at random, and compared the actual order of the draft to a ranking of how advanced stats rate each player's career. I found a .28 correlation between the actual 2009 NBA first round draft order and career success (what statistics defines as being a weak correlation). For comparison, the 2009 DraftExpress mock draft was 11% better with a .33 correlation (moderate). I chose a random analytics model to compare it to and it had a .63 correlation (strong). A model-mock blend had a correlation of .66. In every back-test I have performed, analytic models have consistently outperformed NBA teams in projecting career performance.
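For readers who want to try this kind of back-test themselves, here is a minimal sketch. The article does not say which correlation coefficient was used; since both inputs are rankings, the sketch assumes Spearman's rank correlation with no tied ranks. The five-player data set is hypothetical.

```python
def spearman_rho(order_a, order_b):
    """Spearman rank correlation between two rankings of the same players.

    Assumes each list holds the ranks 1..n with no ties.
    """
    n = len(order_a)
    if n != len(order_b) or n < 2:
        raise ValueError("rankings must have the same length of at least 2")
    d_squared = sum((a - b) ** 2 for a, b in zip(order_a, order_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical example: draft slot versus rank by career value.
draft_slot = [1, 2, 3, 4, 5]
career_rank = [2, 1, 5, 3, 4]
rho = spearman_rho(draft_slot, career_rank)
```

A value near 1 means the draft order closely matched career outcomes, a value near 0 means the order carried little information, and a negative value means the order was worse than random.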

It is worth noting that the rankings provided in this article are likely more advanced than what many NBA teams are using. While the statistics used inside NBA organizations are far more advanced than those in the public sphere, this is a unique situation as there is no proprietary college data that only NBA teams have access to. Further, only a handful of teams actually create their own analytic models, and for those that do, they will not have as many different perspectives as represented here.

When an NBA team makes their draft selection, they are not just selecting a player, but an asset. Positions are not created equal in the NBA. A serviceable center is much more difficult to find, and consequently much more valuable than a good guard. If a team thinks that they might be trading their draft choice in the near future, they might prefer a player with a better trade value than actual basketball value. Conversely, teams may prioritize potentially great players who lack traditional box score stats, as they are likely going to be cheaper to re-sign in the future.

Although these rankings are objective, they are not universal. Each organization has to determine what they are looking for in the NBA draft. While teams should never draft an inferior player due to a positional need, there are many team-specific factors that should be accounted for in a draft pick. For example, each team has their own cultural and stylistic fit, which makes certain players more or less desirable.

Alex Rucker, stat guru for the Toronto Raptors, has stated that teams in the second round should be looking for players with niche skills, so they can have a better chance of impacting an NBA roster. Very few second rounders play in the NBA, and for those that do, most only do so after leaving the team that drafted them. If a player has a niche skill, such as spot up three point shooting, they will likely be able to contribute to an NBA team from day one.

Another team-specific issue is potential and how a team balances risk versus reward. The more seasons a player has played in the NCAA, the more accurate their forecasts will be. Each additional season provides more data, painting a more detailed picture of a player's ability, while removing a year of potential growth. This forces teams to question what point in time they want their NBA draft selection to have an impact on their team. Younger players and big men take longer to develop their game, which might be good for a rebuilding team. Older players can more easily make an impact in their rookie year, which is good for competitive teams. This does not mean younger players are any better, just that their projection is more hazy and distant.

One thing many people have commented on about our rankings is how seniors such as T.J. McConnell, Seth Tuttle and Wesley Saunders appear underrated in comparison to mock drafts. While most amateur scouts have dismissed these college stars, they are an undervalued asset. None of these players will likely ever be elite NBA players; in fact, they will likely never be a whole lot better than they are right now. However, they are all likely good enough to play some role in the NBA as a rookie. As these players can be signed to a league minimum contract, they are a great bargain for competitive teams near the salary cap.

Despite the obvious benefits that statistics provide, they also lie. These models are based on an amalgam of college statistical data, and if a player's college setting deviates from the norm, it skews the results. The more atypical one's college experience is, the less clarity you should expect from their ranking. Playing at Kentucky and sharing the ball with five or more NBA caliber players can impact a player's rating. The same goes for playing in a unique system, playing incredibly weak competition, being the only serviceable player on a team, etc.

NBA teams have proven to overvalue youth, hand size, vertical jump, speed and wingspan while undervaluing steals and assists in making their draft selections (http://www.tothemean.com/2014/07/06/predicting-future-market-inefficiencies-in-the-nba.html). Stats such as steals, rebounding and low foul rates are important for projecting NBA success, while scoring and high usage shooting percentage seem to be less important. These rankings place Delon Wright, Kevon Looney, Wesley Saunders and Seth Tuttle as the most underrated, and Jonathan Holmes, Willie Cauley-Stein, Devin Booker and Anthony Brown as some of the most overrated in comparison to the DraftExpress rankings. While these tools provide guidance, they are incomplete without further investigation. Do the models underrate Cauley-Stein because they fail to capture his defensive abilities? Or do scouts overvalue his defense due to his athletic ability or his pairing with Karl Towns, by far the best defender in college basketball?

To summarize, these rankings are not perfect and will have a lot of misses. However, they are likely the best public tool available to assess the NBA draft, and a great starting point in making an analysis. If a player fares poorly in this ranking, it does not mean that they will be bad, but that you should be asking more questions about their game. These rankings should be used as a supplement to all of the other information available about these prospects, and as one feature in an open debate.

I produce college ratings/rankings for all D1 players, adjusted for team, strength of schedule, pace, etc. Here's some past complete rankings. To project players, I first take the adjusted ratings, and break them down into every possible statistical subset. These adjusted ratings are then each multiplied by their respective college to NBA weight, garnered from the rating changes of every college to NBA player the last 20 seasons. I use the last two weighted college seasons (if not one and done), and meld a projection for the player's first NBA season. From there, I use weights from the last 35 NBA seasons at every age grouping to project every season in a player's potential future, as seen here.

Using 28 different rating breakdowns/combos for each player's projected future, I found the max possible playing time for each player based on the last four NBA seasons, with the lowest of the grouping being the projected minutes. For example, if the highest percentage team minutes for someone with a -13.24 foul rating or worse in the NBA during the last 4 years was 31.2%, then the best possible projected minutes for a guy with a projected -13.24 foul rating is 31.2%.
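The capping logic above can be sketched as follows. This is a simplified illustration, not the author's implementation: only two rating breakdowns are shown instead of 28, every historical data point other than the -13.24 / 31.2% pair is hypothetical, and treating a rating with no historical comparables as uncapped is an assumption.

```python
def minutes_cap(projected_rating, history):
    """Highest share of team minutes earned by any past NBA player whose
    rating was at or below the projected rating."""
    comparable = [pct for rating, pct in history if rating <= projected_rating]
    # Assumption: with no comparable players on record, apply no cap.
    return max(comparable) if comparable else 100.0


def projected_minutes(projected, history_by_stat):
    """Projected minutes share is the lowest cap across all rating breakdowns."""
    return min(
        minutes_cap(projected[stat], history_by_stat[stat]) for stat in projected
    )


# Hypothetical (rating, % of team minutes) pairs from recent NBA seasons.
history_by_stat = {
    "foul": [(-15.0, 22.5), (-13.24, 31.2), (-5.0, 60.0), (2.0, 78.0)],
    "steal_block": [(-8.0, 45.0), (0.0, 70.0), (4.0, 82.0)],
}
prospect = {"foul": -13.24, "steal_block": 1.0}
share = projected_minutes(prospect, history_by_stat)
```

Here the foul rating is the binding constraint: the steal+block breakdown would allow up to 70% of team minutes, but the foul breakdown caps the projection at 31.2%.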

Players with obvious statistical flaws might see low projected minutes (for example, Towns this year because of his high foul rate or McDermott last year due to his low steal+block rate) despite positive overall per minute projections.

The idea isn't that a guy with a bad statistical outlier won't play because of that stat per se, but what that stat may "say" about him as a prospect. High foul rate traditionally means poor future NBA minutes. This rating might mean that the player is a poor defender, has poor movement, poor instincts, is out of shape, etc. For Towns, it might just mean that he knew he would only be playing 21 minutes per game due to his team situation.

I then create a Wins Above Replacement rating for each player, by combining the production (HN/48, 100 is NBA average, 80 is "replacement level" player) and projected minutes (%Min, 50% would have played exactly half of the available seasonal minutes). Players who never project above zero are ranked by projected peak minutes up through their "best" production season (almost always around age 25 or 26).

There is a high likelihood that if I have a guy ranked quite low, he probably shouldn't be drafted in the lottery, as historically speaking, he is a risk. Conversely, the majority of past NBA 2nd round "sleepers" (Boozer, Arenas, Blair, etc.) all had top-10 (or better) level projections in their draft years. I only include actual basketball production in my model. The idea is that all things considered, a highly productive player at one level has a much better chance of being a highly productive player at the next level.

My P-AWS draft model is built using box score statistics, age, competition level, and high school rankings. Of the outliers between P-AWS and the consensus rankings, Kevon Looney, a freshman who was highly recruited entering college, may be the one I would point any organization to re-evaluate. Looney scored above average compared to the rest of the draft prospects in every statistical category except for scoring, which just happens to be the least predictable measure going from college to the pros.

In a sense I would say that statistical analysis is the worst way to select draft picks, except all of the others. In reality though, best practices are to combine scouting information and statistical analysis. For example, character is a tremendous variable in a young player's development. Some of that shows up through the stat sheet: poor decision makers will have worse shot selection and more turnovers, and stats like rebounds and steals show hustle.

But many aspects of ability and willingness to learn, put in gym time or deal with adversity are less easy to infer. Defense also tends to be obscured in draft models; players like Willie Cauley-Stein and Rondae Hollis-Jefferson may be undervalued because their contributions on the court are less likely to end up in the box score.

It is well known that player stats are less stable when they change teams; with different teammates around them, they get different opportunities and fill different roles. Taking situation into account can help adjust the numbers players put up, at least at the margins. However, it is dangerous to over-imagine how a player would blossom in a different environment, and the grounding that an objective model brings is one of the more important contributions analytic draft models can make to a team's drafting process.

In comparing out-of-sample retrodictions to actual draft order, EWP does about as well as NBA decision-makers, while my "HUMBLE" model (which integrates scouting consensus) actually does a bit better than either. This is good support for the use of draft models, but I would advise a scientific approach to prospect evaluation. For example, none of the models account for progression across the season that might benefit players with late-season surges such as Justise Winslow. Most models account for net strength-of-schedule, but they don't give special weight to individual performances against the stiffest competition. If they did, it might hurt D'Angelo Russell's rating.

If Coach K's system historically depresses defensive rebounding for big men (it does) it should probably be taken into account when evaluating Okafor. Speaking of Okafor, he and Towns offer contrasting Old school vs New school styles. Identifying whether and how much this matters might be helped through an analytic approach, but it definitely demands the subjective counsel of scouts who understand the complexity of NBA game-planning.

CPR ratings are not a draft in this order list. Simplicity in a model greatly enhances interpretation, which is why CPR is simple. It measures how excellent a player's best performances were in regards to box score metrics. The only caveat is that CPR adjusts based on the player's year in school. Otherwise, there are no adjustments for the player's height, his rating out of high school, or the perceived quality of his competition. There aren't even weights on the statistics themselves. A great performance in rebounds is the same as a great performance in steals. All of this allows for an easy interpretation of the results. CPR simply measures excellence.

While inconsistent play means season average statistics blur the value of a player, focusing only on the player's best performances has provided a clear view of the player's potential. CPR ratings have accurately identified high picks that should have been drafted in the second round, mid-first round picks that should have gone in the top five, and second round picks that should have been drafted in the first.

At the very least, teams should use CPR as a call for more evidence in support of a team's selection. If a team plans to draft a player that rates particularly low (say below 2.5) in CPR over a player that rates particularly high (say above 7.5), that team should have a good explanation for why the prospect they are choosing never put up the kind of excellent box score production that is consistent with the prospects that turn into excellent NBA players.

This leads nicely to this year's best example of a player who rates higher by the eye test than CPR. Noticeably missing from CPR's top 14 is Willie Cauley-Stein. In an NBA where rim protection is far more important than post scoring ability, Cauley-Stein appears to be a gem. His low CPR rating does not preclude that possibility. CPR relies on box score data and much of what Cauley-Stein brings to the game is not recorded in the box score. The real value of CPR in the case of Cauley-Stein is that it raises an interesting question. If Cauley-Stein is going to be a successful pro in the mold of a Tyson Chandler or Joakim Noah, why didn't this junior put up spectacular numbers at least on occasion in offensive rebounds or blocked shots? There could be very good explanations for this that justify Cauley-Stein's value and his selection in the top five of the NBA draft. CPR doesn't say don't draft Cauley-Stein in the lottery. Instead, it calls for an explanation of why he's worth that pick when his numbers in the college box score were not consistent with the numbers that we have seen for players that have had a high success rate in the NBA.

Since my models include high school ranking while several other models do not, those players with strong high school rankings will rate significantly better, and those with weaker or, more importantly, no rankings at all will rate significantly worse. Myles Turner, Cliff Alexander, and Frank Kaminsky are all examples of this phenomenon.

In addition, my model covers data that only goes back to 2002, so the weights and importance of each feature will only be reflective of those players that have been drafted since 2002. Karl Towns, for example, may suffer from my model only being trained on data going back to 2002 and missing all-star big men such as Tim Duncan or Shaq. Further, stars have a high amount of leverage in my draft models (as they should), and the recent draft history of stars is somewhat guard heavy (Chris Paul, Stephen Curry, James Harden, Dwyane Wade, etc.). As I do not model my players by position, my system may favor guards.

My model's final output heavily leans on neural networks, and thus overfitting is a fair criticism. Overfitting is a statistical phenomenon that occurs when models are greedy and essentially memorize the training data in the pursuit of accuracy, rather than discover actual trends. To alleviate some of these overfitting concerns, I also take input from more stable regression based models, as well as perform a technique known as bootstrap aggregating on some of my neural networks. Bootstrap aggregating, or bagging, is a technique that is designed to alleviate overfitting and increase predictive stability by sampling (with replacement) from the original data set and training several neural networks on several different samples, and then averaging the outputs together. Despite the flaws, I utilize neural networks in my draft model because neural networks can be powerfully accurate.
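To make the bagging idea concrete, here is a minimal sketch in plain Python. To keep it short and dependency-free, the base learner is a simple least-squares line rather than a neural network; that swap is purely for illustration and is not how the draft model itself is built. The data points, function names, and parameter defaults are all hypothetical.

```python
import random


def fit_line(points):
    """Fit a least-squares line to (x, y) points and return a predict function."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:
        slope = 0.0  # degenerate sample: fall back to predicting the mean
    else:
        slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / var_x
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def bagged_predictor(points, n_models=25, seed=0):
    """Train one model per bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: same size as the data, drawn with replacement.
        sample = rng.choices(points, k=len(points))
        models.append(fit_line(sample))
    return lambda x: sum(model(x) for model in models) / len(models)


# Hypothetical training data: (college rating, NBA outcome) pairs.
training = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1), (6.0, 12.0)]
predict = bagged_predictor(training)
estimate = predict(7.0)
```

Because each model sees a slightly different resampling of the data, no single unusual observation dominates every model, and averaging the outputs smooths out their individual quirks.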

Jesse Fischer (@jessefischer33)

Like most other models, the "Longevity" model favors athletic players who have shown a high steal rate in college (D'Angelo Russell, Justise Winslow, and Stanley Johnson). Two players where the "Longevity" model diverges from the other analytic models are Frank Kaminsky and Delon Wright. While most analytics standouts (i.e. D'Angelo Russell and Justise Winslow) are well agreed upon, Kaminsky and Wright are not. Other models rank them lower likely because they were not highly recruited out of high school, nor did they blossom until the very end of their college careers. In "Longevity," college stats from more recent years are weighted higher and recruiting rankings aren't factored in, both of which may explain differences when compared to other models.

Many of the models in the public domain are generated using linear techniques. On the other hand, the "Longevity" model is more complex and also uses some non-linear based modeling techniques. For example, height (which highly correlates with success in the NBA) doesn't necessarily scale linearly with NBA success. Models (and even non-analytics based rankings) have typically failed when outliers don't follow this linear trend, such as Isaiah Thomas (5'9" - 3.1 Max VORP) and Darko Milicic (7' - 1.1 Max VORP). By incorporating techniques that are not strictly linear, the "Longevity" model is less prone to overvaluing nonlinear relationships between attributes. Kevon Looney is a player who is ranked highly by most models (good pace adjusted stats and combine numbers), but "Longevity" ranks him slightly lower. One explanation might be that his analytics edge is inflated by his strong 7'1" wingspan.

The "Longevity" model is built from a blend of 60+ different models. One of the advantages of the model is its complexity; however, this does have drawbacks, one of which is that it is difficult to understand "why" the model favors certain players. That being said, one can gain insight by looking at the results of the smaller sub-models. Cameron Payne is one example of a player who is interesting to analyze because his rankings are inconsistent, but he is highly ranked in "Longevity." In the sub-models, which don't account for age, he stands out even more. This is likely because he is an "old" sophomore, which could also explain his lower ranking in other linearly based models. Another player who might benefit from this is Jahlil Okafor, who is "young" for his age and has exceptional size/length (7'5" wingspan). While other models rank him highly, in the "Longevity" model out of sample results, he would have the lowest "Longevity" expectation of all top 3 draft picks ever (just surpassing Raef LaFrentz and Adam Morrison).

In a draft class lauded for its guards, three exceptionally talented, and wildly different, forward prospects sit in the top six of our mock draft, each taking a very different path to the top, and demonstrating wildly contrasting strengths and weaknesses. So who is the best prospect among the three?