Understanding Trends in the NBA: How NNMF Works

Over the past decade, a true emphasis on three-point shooting has emerged. This can be credited to the understanding that effective field goal percentage (eFG%) is a better indicator of points per field goal attempt than raw field goal percentage. And it’s become an emphasis for some top-tier teams such as the Houston Rockets. Need proof? Simply compare their recent shot charts to any other team in the league:

Distribution of FGA for the 2017-18 Houston Rockets.

Distribution of FGA for the 2017-18 Boston Celtics.

Above, we compare the Rockets and the Celtics, both teams that ended their seasons in Game 7 of their respective conference finals. The Rockets only posted a .460 FG% for the season (15th in the league, 11th among playoff teams), while the Celtics were a playoff-worst .450 FG% for the season (22nd in the league, 16th among playoff teams). While the Celtics made their mark defensively by holding opponents to .440 FG% and, more importantly, a .495 eFG% (2nd best in the league), the Rockets aired out their offense to the tune of .551 eFG% (2nd best in the league). Houston accomplished this by taking 42.3 3PA per game compared to their 41.9 2PA per game. At 50.2%, that is the highest share of field goal attempts taken from three in the history of the league!

That said, the Rockets’ 2PA come primarily from high-percentage ranges with roughly 60% success, as opposed to mid-range locations that tend to hover closer to 40% success. It’s for this reason that their eFG% holds up at .551.
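The arithmetic behind these figures can be sketched in a few lines. The eFG% formula is the standard one; the function names are hypothetical, and the 60% and 40% success rates are the rough values quoted above rather than measured numbers.

```python
def three_point_attempt_rate(three_pa, two_pa):
    """Share of field goal attempts that are three-point attempts."""
    return three_pa / (three_pa + two_pa)


def effective_fg_pct(fgm, three_pm, fga):
    """eFG% = (FGM + 0.5 * 3PM) / FGA: a made three counts as 1.5 makes."""
    return (fgm + 0.5 * three_pm) / fga


def expected_points(fg_pct, shot_value):
    """Expected points per attempt for a shot worth `shot_value` points."""
    return fg_pct * shot_value


# Per-game attempt figures quoted above for the 2017-18 Rockets.
rockets_3pa_rate = three_point_attempt_rate(42.3, 41.9)
print(f"Rockets 3PA rate: {rockets_3pa_rate:.1%}")

# Rough success rates from the text: ~60% near the rim, ~40% from mid-range.
print(f"Rim two:       {expected_points(0.60, 2):.2f} points per attempt")
print(f"Mid-range two: {expected_points(0.40, 2):.2f} points per attempt")
```

The gap between the two expected values is the reason a team can post a middling FG% and still sit near the top of the league in eFG%.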

Given this, how do we compare these shooting distributions? A naive approach would be to “bin” the shots in categories such as “paint”, “mid-range”, “rim”, and “three” and compare the distributions. This is naive because the game is not symmetric. An offense that overloads to the right will have a distribution that tends to be right-dominant. We saw this very clearly in the league last year.

In fact, taking a look at the Rockets chart above shows a bit of right-dominance. So why would we just lump everything as a “non-paint 2”? We lose critical information on teams and, at an individual level, their players.

To this effect, we compare shooting distributions by understanding the underlying structure. The structure of these distributions will help us identify different types, or styles, of shooting for a team. And the most common way to identify structure is to look for an underlying basis or motif representation.

Singular Value Decomposition

The most common way to break down the structure of a distribution is to perform the well-known dimensionality reduction technique called Singular Value Decomposition (SVD). This process takes a matrix of values and assumes that the values can be reduced into three parts: a rotation of the input space, a stretching of the input space, and a rotation to the output space. Pictorially, if we take a matrix of values A, we can write it as a matrix product USV, where V is the rotation of the input space, S is a diagonal stretching along each direction, and U is the rotation to the output space:

The SVD process. Deal with my terrible artistry.
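A minimal NumPy sketch of the decomposition described above. The matrix here is random toy data (an assumption for illustration, not shot data). Note that `np.linalg.svd` returns the last factor already transposed, so `Vt` below plays the role of V in the product USV.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 4))  # toy data matrix, not real shot data

# full_matrices=False gives the "thin" SVD: U is 5x4, s has 4 values, Vt is 4x4.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rebuild A from the three factors; np.diag(s) is the stretching matrix S.
A_rebuilt = U @ np.diag(s) @ Vt
print("reconstruction matches:", np.allclose(A, A_rebuilt))

# Unlike shot counts, the rotation factors are free to contain negative entries.
print("negative entries present:", bool((U < 0).any() or (Vt < 0).any()))
```

The second check previews the interpretability problem raised later: the rotations are orthogonal, so they generally mix positive and negative values even when every entry of A is nonnegative.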

Application to Shot Charts

Through this process, we can start to visualize how to capture structure in NBA shooting distributions. To start, we take the two-dimensional shot chart and bin the shot frequencies into pre-specified bin locations, as performed by Andrew Miller in his 2014 paper. In this case, it’s common to set up the matrix P as a 25×24 matrix, as the half-court is divided into 2′ by 2′ bins. Therefore, the maximum rank possible for this matrix is 24.

Note that we used the matrix P and not the matrix A, which is our data matrix. This is because we have not constructed our data matrix the way we’d like just yet. Recall that our goal is to identify the trends of teams and, more importantly, players. Therefore, we can build the data matrix as an Nx600 matrix by unrolling the 25×24 shot chart for each of the N players into a single row of 600 values. In this case, the input space is the player and the output space is the player’s preferred locations to shoot.

The matrix V then identifies the orthogonal basis to compare shooters. The matrix S identifies the differences of shooters relative to their spatial locations. And finally the matrix U rotates these spatial differences to the actual spatial locations of preferred shooting. But in order to get here, we must impose more rigor, as we will not have an interpretable solution using only the SVD.

More Rigor: Nonnegative Matrix Factorization (NNMF)

Given the set-up above, it is not enough to proceed with analyzing the data. Simply applying an SVD could produce undesirable results. For instance, we may obtain negative counts at specific locations! To combat these types of results, we impose a nonnegative matrix factorization (NNMF) structure.

The SVD process in the previous section is a matrix factorization technique that can be summarized as A = USV = HW, where H is a weighting matrix and W is the structural matrix. When moving from an SVD set-up to another matrix factorization set-up in a similar vein, we ideally identify the “additional requirements” we impose on the factorization.

For instance, Principal Component Analysis (PCA) assumes that the matrix A is mean-centered (typically columnwise). This is why it is common to refer to the SVD as being equivalent to PCA.
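The PCA connection can be checked numerically. This is a small sketch on random toy data (an assumption for illustration): after centering each column, the squared singular values divided by n - 1 should match the eigenvalues of the sample covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((50, 4))  # toy data: 50 observations, 4 features

# Center each column, then take the SVD of the centered matrix.
A_centered = A - A.mean(axis=0)
U, s, Vt = np.linalg.svd(A_centered, full_matrices=False)

# Squared singular values over (n - 1) are the variances along each principal axis.
variances_from_svd = s**2 / (A.shape[0] - 1)

# Cross-check against the eigenvalues of the sample covariance matrix.
# eigvalsh returns them in ascending order, so reverse to match the SVD ordering.
eigvals = np.linalg.eigvalsh(np.cov(A_centered, rowvar=False))[::-1]
print("PCA variances match:", np.allclose(variances_from_svd, eigvals))
```

The rows of `Vt` are the principal directions, which is the sense in which SVD on mean-centered data and PCA coincide.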

Another example is if we require the components in W to be statistically independent, which is a stronger condition than being uncorrelated. We then obtain Independent Component Analysis (ICA).

And if we impose that the elements of W and H are non-negative, then we obtain nonnegative matrix factorization (NNMF). It is this requirement we impose to ensure that we avoid obtaining negative shot attempts. NNMF is completed by choosing a number of structural elements, r, and minimizing the Frobenius norm of the reconstruction error, ||A − HW||_F, subject to every element of H and W being nonnegative.
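One common way to carry out this minimization is with Lee-Seung multiplicative updates. The sketch below uses that technique on toy data; it is an illustration under stated assumptions, not necessarily the solver used for the results in this post. The function name `nnmf`, the iteration count, and the `eps` guard against division by zero are all choices made for the example.

```python
import numpy as np


def nnmf(A, r, n_iter=500, eps=1e-9, seed=0):
    """Approximate nonnegative A (N x M) as H @ W with H (N x r) and W (r x M).

    Uses Lee-Seung multiplicative updates for the Frobenius-norm objective.
    Each update multiplies by a nonnegative ratio, so H and W stay nonnegative.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = A.shape
    H = rng.random((n_rows, r)) + eps  # weights (one row per player)
    W = rng.random((r, n_cols)) + eps  # structural components (one row each)
    for _ in range(n_iter):
        W *= (H.T @ A) / (H.T @ H @ W + eps)
        H *= (A @ W.T) / (H @ W @ W.T + eps)
    return H, W


# Toy example: 6 "players" built from 2 hidden nonnegative patterns.
rng = np.random.default_rng(1)
A = rng.random((6, 2)) @ rng.random((2, 8))

H, W = nnmf(A, r=2)
rel_error = np.linalg.norm(A - H @ W) / np.linalg.norm(A)
print(f"relative error: {rel_error:.4f}")
print("all nonnegative:", bool((H >= 0).all() and (W >= 0).all()))
```

For the shot data, A would be the Nx600 matrix, each row of W would reshape into a 25×24 spatial component, and each row of H would hold one player's weights.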

Application to the 2017-18 NBA Season

By writing some basic code to trawl through play-by-play, we can build a dictionary that identifies every player with their binned shot-chart:
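A minimal sketch of such a dictionary builder. It assumes each shot has already been pulled from play-by-play as a `(player, x, y)` tuple with coordinates in feet, x measured from the left sideline and y from the baseline. That event format, the function names, and the sample players are hypothetical.

```python
from collections import defaultdict

BIN_FT = 2          # 2' x 2' bins, as described above
N_X, N_Y = 25, 24   # 50' wide half-court; 47' long, rounded up to 24 bins


def bin_index(x_ft, y_ft):
    """Map a shot location in feet to its (col, row) bin, or None if off the grid."""
    col, row = int(x_ft // BIN_FT), int(y_ft // BIN_FT)
    if 0 <= col < N_X and 0 <= row < N_Y:
        return col, row
    return None


def build_shot_dict(shot_events):
    """Return {player: 25 x 24 nested list of shot-attempt counts}."""
    shots = defaultdict(lambda: [[0] * N_Y for _ in range(N_X)])
    for player, x_ft, y_ft in shot_events:
        idx = bin_index(x_ft, y_ft)
        if idx is not None:  # skip shots outside the half-court grid
            col, row = idx
            shots[player][col][row] += 1
    return dict(shots)


# Hypothetical events: (player, x in feet, y in feet).
events = [
    ("Player A", 25.0, 5.0),
    ("Player A", 25.5, 5.9),
    ("Player A", 3.0, 1.0),
    ("Player B", 47.0, 2.0),
]
shots = build_shot_dict(events)
print(shots["Player A"][12][2])  # count in the bin covering x 24-26 ft, y 4-6 ft
```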

This results in a shots dictionary that is player : court location counts. We can then unravel the N players and stack them in an Nx600 matrix as such:
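The unrolling step can be sketched with NumPy as follows. The two-player `shots` dictionary is a hypothetical stand-in for the real one, and sorting the player names is just one way to keep a fixed row order.

```python
import numpy as np

# Hypothetical stand-in for the shots dictionary: player -> 25 x 24 count grid.
shots = {
    "Player A": np.zeros((25, 24), dtype=int),
    "Player B": np.zeros((25, 24), dtype=int),
}
shots["Player A"][12, 2] = 7
shots["Player B"][23, 1] = 3

# Fix the row order so weights can be mapped back to player names later.
players = sorted(shots)

# Unroll each 25 x 24 grid into a row of 600 values and stack the rows.
A = np.vstack([np.asarray(shots[p]).ravel() for p in players])
print(A.shape)

# Row-major unrolling: bin (col, row) lands at flat index col * 24 + row.
print(A[0, 12 * 24 + 2])
```

Keeping the unrolling order fixed matters because each spatial component must later be reshaped back to 25×24 with the same convention to be drawn on the court.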

To show how uneventful and effectively unhelpful the shot matrix is, we printed it out:

The Nx600 shot matrix. The bright lines are shots around the rim.

However, applying the NNMF, we obtain special structure. Let’s look at the first three components:

First spatial component: Threes and rim attacks, with emphasis on the right side.

Second spatial component: Rim and mid-range towards the baseline.

Third spatial component: Rim only.

Here, the first component identifies right-hand dominant players who either shoot the three from the wing or drive to the basket. The second component identifies mid-range baseline shooters who also attack the rim. The third component identifies rim-only shooters. The idea is simple. To discriminate between players in spatial components 2 and 3, we look at the weights, which indicate how strongly each player draws on each component. Therefore, a rim-dominant player will have a higher weight for component 3 than for all other components. But a LaMarcus Aldridge type player will have a higher weight on component 2 than on component 3.
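Reading the weights can be sketched as follows. The weight matrix and the player labels are made up for illustration (they are not values from the fitted model); the point is only how a dominant component and its share of a player's total weight can be pulled out of a row of H.

```python
import numpy as np

# Hypothetical weights for three players over three components (not real values).
players = ["Rim finisher", "Baseline mid-range big", "Wing shooter"]
H = np.array([
    [0.1, 0.2, 1.5],
    [0.0, 1.2, 0.6],
    [1.4, 0.3, 0.3],
])

# Share of each player's total weight carried by each component.
# Assumes every player has at least one nonzero weight.
shares = H / H.sum(axis=1, keepdims=True)

# Component numbers are 1-based in the text, so add 1 to the argmax index.
dominant = H.argmax(axis=1) + 1
for name, comp, row in zip(players, dominant, shares):
    print(f"{name}: component {comp} ({row.max():.0%} of weight)")
```

The same share calculation, summed over several components, gives the kind of percentage quoted for individual players further down.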

Fourth spatial component: Right-dominant paint 2’s.

Fifth spatial component: Perimeter three point shooter.

Sixth spatial component: Corner three specialist.

As we include more structure components, we start to see the different roles pop out. Here, we find our paint shooters such as Kosta Koufos and Marcin Gortat (component 4), our perimeter shooters (component 5), and our corner three shooters (component 6). The remaining four components are:

We start to see similar components rear their ugly heads, such as components 3 and 10, but they are subtly separating shooters from front-on rim scoring (three) and left/right rim scoring (ten).

Using these components, we’ve effectively uncovered basic structure in shooting distributions. So let’s go back to our introductory example: Boston and Houston.

2017-18 Boston Celtics

If we take a look at the core of the Celtics roster from this past NBA season, we are able to extract their respective weights for the spatial components defined above:

Let’s take a look at some of these players:

Kyrie Irving

Here, we see that Kyrie Irving is a “jack-of-all-trades” kind of player, as he draws on all spatial components in his game. Taking a closer look, we find that his shot selection rests most strongly on components 5 and 2. These are perimeter threes from the wing and mid-range jumpers from the baseline with attacks at the rim. A key takeaway is that Irving is least likely to take jumpers from the left elbow, as the eighth spatial component weight is small.

Given the volume of shots, if we take a look at the Celtics’ shot chart from the intro, we find that a healthy dose of those mid-range shots that Houston doesn’t take are courtesy of Irving.

Al Horford

Horford is another mid-range shooter on the Celtics’ roster, as his weights stress spatial components 2 and 9. Spatial component 2 suggests that Horford takes many mid-range jumpers with the ability to score at the rim, while spatial component 9 suggests that Horford can knock down the top-of-the-key jumper.

This is indeed Horford’s style of play, as he is a consistent top-of-the-key shooter, typically as a second or third option on a play that kicks the ball out towards the top of the key.

Jaylen Brown

Jaylen Brown is a complementary player to Irving and Horford, as his weights focus on spatial components 3, 5, and 6. This indicates Brown to either be a rim-attacker or a three point shooter. As Brown doubled up his minutes towards 30 minutes per game this past season, we saw an associated jump in his FGA per game. More importantly, 73.7% of Brown’s attempts were from these three spatial components.

Furthermore, Brown de-emphasizes the mid-range game, making him an invaluable player on offense: an eFG%-maximizing type of player who can stretch the court and attack the rim.

Jayson Tatum

Break-out rookie Jayson Tatum had himself a fantastic year for the Celtics and provided much offense to help boost the team into the playoffs and ultimately the conference finals. His distribution reads as a mix of Rozier and Irving, presenting him not only as a strong complementary player, but also as a number one scorer. His weights push at Rozier levels (indicating volume) but spread across spatial components 1, 2, and 5, indicating that he is a perimeter shooter with an ability to get to the rim.

However, Tatum also pops up with a team-high weight on component seven. This is the “dreaded” mid-range shot from the free throw line. In this region, he took roughly 15% of his field goal attempts at a .450 FG%, giving an expected value of 0.90 points per attempt (0.450 × 2), well below a desired 1.10 per possession. And given Tatum’s volume in shooting, this colors in the remainder of the Celtics’ shot chart above.

2017-18 Houston Rockets

So let’s compare the Celtics to the Rockets. From the shot charts above, we see there is a distinguishable difference in mid-range attempts between the Rockets and Celtics. However, to gain insight on the player traits, we can look at the resulting weights for each Houston player.

The fact that so many zeros litter this table shows the discipline taken by the Rockets roster when it comes to shooting. A significant number of zeros occur in spatial component 2 (midrange), spatial component 7 (midrange), and spatial component 8 (midrange). Usually, a small nonzero weight shows up from randomness. But not here. This shows almost complete discipline instilled by D’Antoni and staff on this roster.

We also begin to see which players are given the green light for midrange attempts. These are James Harden and Chris Paul. As we creep closer into the paint, secondary players get their midrange attempts in: PJ Tucker, Eric Gordon, and Clint Capela. But these numbers are very small, with the exception of one of the best mid-range shooters in the game: Chris Paul.

James Harden

Harden posted the highest weights across the team this past year, which should come as no surprise as he is the team’s leading scorer and volume shooter. His focus this past season was heavily influenced by spatial components 5, 1, 2, and 9, all of which carry a high weight of 1.0 or more. Spatial component five is the perimeter three attempt. Houston valued this shot more than any other team in the league, and it accounts for 33.6% of his weight.

Combine this with his ability to drive and hit shots along the way, and components 1 and 2 spike: component 1 identifies players who shoot threes and attack the rim, and component 2 identifies baseline mid-range shooters. On top of this, Harden pops on component nine, the top-of-the-key jumper, where he and Chris Paul are the go-to shooters.

Chris Paul

Chris Paul follows a very similar strategy to Harden, as components 2, 5, and 9 are significant to Paul’s shooting strategy. Missing, however, is component 1, which indicates that Paul is not as much of a rim-attacking guard as Harden is. In fact, Paul tends to trade rim shots for mid-range paint floaters and jumpers, as components 7 and 8 flare up.

Component 8 is a typical location for guards to shoot floaters, a shot that Paul is known for picking up after making an attacking dribble towards the rim from the perimeter. More importantly, it’s a left-side dominant play, which falls in line with right-handed scorers, whereas Harden picks up the right-side dominant shots as he is a left-handed player.

Trevor Ariza / P.J. Tucker

Trevor Ariza and P.J. Tucker were Houston’s corner three specialists. We see that both Ariza and Tucker have their highest weights on spatial component 6, the corner three. Houston did not value the corner three as much as other teams did, but with these two players dutifully serving their roles in position, defending teams had to keep track of them.

The difference between these two players rested in the fact that Ariza was more likely to attack the rim than Tucker. This gave Ariza a different wrinkle than Tucker. That said, Tucker’s go-to move for driving was to attack the center of the lane, as evidenced by his spike in component 8, where Ariza weighs in at zero.

Next Steps

Given this introduction to non-negative matrix factorization, we are able to begin characterizing shooting trends for teams around the league and to break down tendencies of specific players. However, this is just one of the first steps in breaking down an offense.

From here, we may look at year-to-year trends, or use this characterization to develop a spatio-temporal analysis. Or, we may use this as an underlying dimension reduction technique to understand if a player can change their shooting strategy.

How would you employ this data science capability to understand offenses better?

9 thoughts on “Understanding Trends in the NBA: How NNMF Works”

very interesting approach 🙂
as a basketball fan & data scientist (in neuroscience) myself, I never expected to see NMF in basketball related blog post.
btw, do you have any plan to share the code you used to collect the data? or Nx600 dataset?
