Data Science and Sports Analytics

How to create NBA shot charts in R

A while ago I found this fantastic post about NBA shot charts built in Python. Since my Python skills are quite basic, I decided to reproduce the charts in R, using ggplot2 and data scraped from the internet.

Getting the Data

First we need the shot data from stats.nba.com. This blog post from Greg Reda does a great job explaining how to find the underlying API and extract data from a web app (in this case, stats.nba.com).

To get shot data for Stephen Curry we will use this URL. It returns the shots taken by Curry during the 2014-15 regular season as a JSON structure. Note also that Season, SeasonType and PlayerID are parameters in the URL. Stephen Curry’s PlayerID is 201939.

Time to get this data into R. For that I use the package rjson, replacing the PlayerID parameter in the URL with the R object PlayerID.

UPDATE: the NBA stats website has changed the JSON structure of its shot detail data. In this code, I added the new argument PlayerPosition and it should work just fine.
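A minimal sketch of how such a request URL could be assembled, for readers who want to see the parameter substitution spelled out. The endpoint below is a placeholder, not the real address (use the URL linked above), and only the parameters named in this post are included; the real request carries more.

```r
# Placeholder endpoint (an assumption): substitute the shot chart URL linked in the post.
base_url <- "https://example.com/stats/shotchartdetail"

playerID <- 201939  # Stephen Curry

# Parameters named in the post; values are coerced to character by c().
params <- c(
  PlayerID       = playerID,
  Season         = "2014-15",
  SeasonType     = "Regular Season",
  PlayerPosition = ""
)

# Percent-encode each value and join the name=value pairs with "&".
query <- paste0(
  names(params), "=",
  vapply(params, URLencode, character(1), reserved = TRUE),
  collapse = "&"
)
shotURL <- paste0(base_url, "?", query)

# With the real endpoint, the JSON would then be read with rjson:
# library(rjson)
# shotData <- fromJSON(file = shotURL)
```

Keeping the player ID in its own object means a different player only requires changing one line.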

Now we have the JSON data as an R list object with 3 elements. The element that matters for the chart is resultSets, which contains the coordinates of each shot, shot type, range, made/missed flag and more. But first the data needs to be unlisted and saved as a data frame.
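A minimal sketch of that conversion, assuming the first entry of resultSets holds a headers vector and a rowSet list with one entry per shot. The mock list below stands in for the parsed JSON; its values and its four columns are made up for illustration, and the real response has many more columns.

```r
# Mock of the parsed JSON (made-up values, assumed structure).
shotData <- list(
  resultSets = list(list(
    name    = "Shot_Chart_Detail",
    headers = c("EVENT_TYPE", "SHOT_DISTANCE", "LOC_X", "LOC_Y"),
    rowSet  = list(
      list("Made Shot",   24, -120, 210),
      list("Missed Shot",  2,   10,  15),
      list("Made Shot",   26,  230,  40)
    )
  ))
)

shots_raw <- shotData$resultSets[[1]]

# Flatten every row into one long vector, then refill a matrix row by row.
shotDataf <- data.frame(
  matrix(unlist(shots_raw$rowSet),
         ncol = length(shots_raw$headers), byrow = TRUE),
  stringsAsFactors = FALSE
)
colnames(shotDataf) <- shots_raw$headers

# unlist() coerced everything to character, so restore the numeric columns.
num_cols <- c("SHOT_DISTANCE", "LOC_X", "LOC_Y")
shotDataf[num_cols] <- lapply(shotDataf[num_cols], as.numeric)
```

Taking the column count from the headers, rather than hard-coding it, keeps the code working if the number of columns in the response changes.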

Basic Chart

This plot surely looks familiar. But it can be improved by overlaying a basketball half court and fixing the aspect ratio of our court/plot. To solve the basketball court problem I simply googled “NBA half court” and found this. (EDIT: the jpg court file is no longer there. Instead, use this)

Shot Charts

Let's plot the data again, but this time using the image overlay. For that I will use the packages grid and jpeg. The image is overlaid by using the ggplot2 function annotation_custom. For the axis limits I use -250 to 250 on the x-axis and -50 to 420 on the y-axis (I found these to be a good fit after a few hit-and-misses). These limits span 500 by 470 units, the same 50:47 width-to-length ratio as an official NBA half court (50 ft by 47 ft), but they might differ if you use a different half court image.
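A minimal sketch of the overlay, under a few assumptions: the shot coordinates are made up, the column names LOC_X, LOC_Y and SHOT_ZONE_BASIC are assumed, and a plain grey raster stands in for the court image so the example runs without a file. With a real image, the raster would come from jpeg::readJPEG instead.

```r
library(ggplot2)
library(grid)

# Made-up shots; the last one lies beyond the y-axis limit on purpose.
shots <- data.frame(
  LOC_X = c(225, 0, -150, 10),
  LOC_Y = c(10, 250, 200, 600),
  SHOT_ZONE_BASIC = c("Left Corner 3", "Above the Break 3",
                      "Mid-Range", "Backcourt")
)

# Stand-in for the court image: a 1x1 grey raster stretched over its area.
# With a real file (hypothetical name):
#   court <- rasterGrob(jpeg::readJPEG("court.jpg"),
#                       width = unit(1, "npc"), height = unit(1, "npc"))
court <- rasterGrob(matrix("grey90", nrow = 1, ncol = 1),
                    width = unit(1, "npc"), height = unit(1, "npc"))

p <- ggplot(shots, aes(x = LOC_X, y = LOC_Y)) +
  # Draw the image first so the points are plotted on top of it.
  annotation_custom(court, xmin = -250, xmax = 250, ymin = -50, ymax = 420) +
  geom_point(aes(colour = SHOT_ZONE_BASIC)) +
  xlim(-250, 250) +
  ylim(-50, 420)
```

Giving annotation_custom the same bounds as the axis limits makes the image fill exactly the plotted area.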

There are a few things to note here. First, you may see a warning that reads “Removed 7 rows containing missing values (geom_point)”. In this case, Stephen Curry attempted 7 backcourt shots during the final seconds of a quarter. I am not interested in these shots and, because of my y-axis limits, they are not displayed. Secondly, note how shots labeled as “Left Corner 3” in green are actually located on the right side of the court. I will solve this problem by flipping the x-axis from left to right. One more thing: the aspect ratio is not fixed, so the plot becomes distorted as we resize it. This can be solved by using the coord_fixed function.

This is a much improved shot chart. The x-axis is now flipped, so right corner shots appear on the right of the court and left corner shots appear on the left. Coordinates have been fixed, meaning that no matter how the chart is resized, the court maintains its true aspect ratio. The axis and legend titles have disappeared and a title for the plot, containing the name of the player, has been added. One cool feature of this plot, both aesthetic and informative, is the rug on each axis created by geom_rug. It works like a density plot and serves as a guide to the “hot zones” of each player.
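The changes described above can be sketched as follows. The data is made up and the column names are assumed; the left corner sits on positive x, as the raw data behaves in this post. Reversed limits in xlim are one way to flip the axis.

```r
library(ggplot2)

# Made-up shots (assumed column names).
shots <- data.frame(
  LOC_X = c(225, -228, 0, -150),
  LOC_Y = c(10, 15, 250, 200),
  SHOT_ZONE_BASIC = c("Left Corner 3", "Right Corner 3",
                      "Above the Break 3", "Mid-Range")
)

p <- ggplot(shots, aes(x = LOC_X, y = LOC_Y)) +
  geom_point(aes(colour = SHOT_ZONE_BASIC)) +
  geom_rug(alpha = 0.2) +      # marginal ticks showing where shots cluster
  xlim(250, -250) +            # reversed limits flip the x-axis
  ylim(-50, 420) +
  coord_fixed() +              # one unit on x equals one unit on y
  ggtitle("Shot Chart\nStephen Curry") +
  theme(axis.title = element_blank(),
        legend.title = element_blank())
```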

This time I highlighted shots made in green and shots missed in red. I also added transparency to each point by using alpha = 0.8. The player photo and the footnote were added using functions from the package grid.
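A minimal sketch of the made/missed colouring, assuming the outcome is stored in a column named EVENT_TYPE with the labels “Made Shot” and “Missed Shot”. The labels and the data are assumptions for illustration, and the photo and footnote are left out.

```r
library(ggplot2)

# Made-up shots; the EVENT_TYPE labels are assumed.
shots <- data.frame(
  LOC_X = c(-120, 10, 230, -40),
  LOC_Y = c(210, 15, 40, 180),
  EVENT_TYPE = c("Made Shot", "Missed Shot", "Made Shot", "Missed Shot")
)

p <- ggplot(shots, aes(x = LOC_X, y = LOC_Y)) +
  geom_point(aes(colour = EVENT_TYPE), alpha = 0.8) +
  # Named values tie each colour to its label regardless of factor order.
  scale_colour_manual(values = c("Made Shot" = "green",
                                 "Missed Shot" = "red")) +
  xlim(250, -250) +
  ylim(-50, 420) +
  coord_fixed()
```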

Hexbin Shot Charts

Another cool way to display data with ggplot2 is to use hexbin instead of geom_point. You will need to install and load the package hexbin and use the function stat_binhex (which replaces geom_point and its components).
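A minimal sketch of a hexbin chart on randomly generated coordinates, since the real data is not reproduced here. The number of bins and the colours are arbitrary choices. The hexbin package must be installed, because ggplot2 uses it when the plot is drawn.

```r
library(ggplot2)

# Random coordinates standing in for real shot locations.
set.seed(1)
shots <- data.frame(
  LOC_X = runif(200, -250, 250),
  LOC_Y = runif(200, -50, 420)
)

p <- ggplot(shots, aes(x = LOC_X, y = LOC_Y)) +
  # Each hexagon is filled according to the number of shots inside it.
  stat_binhex(bins = 25, colour = "gray", alpha = 0.7) +
  scale_fill_gradientn(colours = c("yellow", "orange", "red")) +
  xlim(250, -250) +
  ylim(-50, 420) +
  coord_fixed()
```

Fewer bins give larger hexagons and a smoother picture; more bins show finer detail but leave many hexagons with a single shot.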

We know that Stephen Curry is an excellent 3-point shooter. In fact, he took 639 of his 1,341 shots from beyond the 3-point line (left, right and centre). But this chart also reveals how active he is under the rim: 284 shots were attempted deep inside the paint, most of them driving lay-ups that originated from Curry’s lightning-fast transitions from defence all the way to the basket.

Accuracy Charts

Now I will have a look at shot accuracy for each of the 6 zones in the data (excluding backcourt shots). After excluding these shots, the data is summarised by shot zone using ddply. X and Y locations are averaged, shots made are summed up and attempted shots are counted. I also create a column for accuracy labels. Again, I use ggplot along with geom_point for the point locations and geom_text for the label locations.
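A minimal sketch of that summary with plyr, on a handful of made-up shots. The input column names (SHOT_ZONE_BASIC, SHOT_MADE_FLAG, LOC_X, LOC_Y) and the output names other than MLOC_Y are assumptions, and only two zones are shown.

```r
library(plyr)

# Made-up shots (assumed column names); 1 = made, 0 = missed.
shots <- data.frame(
  SHOT_ZONE_BASIC = c(rep("Above the Break 3", 4),
                      rep("Left Corner 3", 2), "Backcourt"),
  SHOT_MADE_FLAG  = c(1, 0, 1, 0, 1, 1, 0),
  LOC_X = c(-100, 0, 100, 40, 230, 226, 5),
  LOC_Y = c(240, 260, 250, 250, 20, 30, 600)
)

# Drop backcourt shots before summarising.
shotDataS <- shots[shots$SHOT_ZONE_BASIC != "Backcourt", ]

# One row per zone: attempts, makes and the mean shot location.
shotS <- ddply(shotDataS, .(SHOT_ZONE_BASIC), summarise,
  SHOTS_ATTEMPTED = length(SHOT_MADE_FLAG),
  SHOTS_MADE      = sum(SHOT_MADE_FLAG),
  MLOC_X          = mean(LOC_X),
  MLOC_Y          = mean(LOC_Y)
)

# Accuracy as a fraction, plus a percentage label for plotting.
shotS$SHOT_ACCURACY     <- shotS$SHOTS_MADE / shotS$SHOTS_ATTEMPTED
shotS$SHOT_ACCURACY_LAB <- paste0(round(100 * shotS$SHOT_ACCURACY, 1), "%")
```

The summary can then be passed to ggplot, with geom_point placed at MLOC_X and MLOC_Y and geom_text using the label column.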

Note how the “Above the Break 3” point is located inside the 3-point line area. This is because the 3-point shots attempted from the corners drive the y-axis average location down close to the basket. You can adjust the y-axis by adding, let's say, 20 to shotS$MLOC_Y for “Above the Break 3”. But I will leave it as is.

Now, the same accuracy chart for James Harden from the Houston Rockets.

Curry isn’t the 2014-15 MVP by chance. He made 48.7% of field goals attempted during the regular season. From the left 3-point corner, he converted 63.2% of shots attempted (almost 2 in every 3 attempts). Under the rim Curry is very effective with 66.5% accuracy when going for those quick lay-ups and finger rolls.

James Harden, the other MVP contender, is also a great 3-point shooter, but not as accurate as Curry. Harden is slightly better from the right 3-point corner but Curry is better from every other zone in the court.

You can find the code on my GitHub page. I also uploaded a list of 490 players, with their IDs, who have shot location data available in the NBA stats web app. All you need to do is replace the object PlayerID with the ID of the player you would like to plot.

Great post – it would be really interesting to see the hexbin chart with accuracy as well as number of shots – probably you will want to adjust the size of the hexbins so that you see a relatively smooth output.

I enjoyed this tutorial, as I am a big fan of the NBA and am currently starting to learn R. I have one question though. How did you find the URL that contains all the data about Curry’s shots? I’ve been trying to scoop around NBA.com’s website but I only managed to find the game logs for each player; I didn’t find statistics such as the ones you used. Is there a link somewhere on NBA.com’s website so that I can fetch data about every player easily, or do I have to type in a URL as some sort of query to retrieve the stats?

I went to Steph Curry’s shot tracking page. Then I started the developer tool in Chrome (More Tools -> Developer). Now you have to refresh the page. Then click on the “Network” tab and select “XHR”. At this point you have a few options and it is a “hit and miss” process.
Note that this data is for 2015-16 season. But if you change this parameter for last season, you will get all shots for last regular season.

Thanks for your post. Very cool stuff. I’ve been trying to read in some data off of the shot logs page, but some values are recorded as nulls when I take the data from the webpage. As a result, when I try to turn the data into a dataframe, my columns don’t line up because of the missing values in the raw data. Any idea on how to remedy this?

Looking at the shot log page. http://stats.nba.com/player/#!/1938/tracking/shotslogs/ The blanks in the shot clock column are represented as nulls when you look at the data in the developer tab. When I read it into R, it skips over that null value instead of inserting a blank or NA. Everything is fine until I get to that first blank.
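One possible remedy, sketched on made-up rows: a JSON null arrives in R as NULL, and unlist silently drops NULL entries, which is what shifts the columns. Replacing each NULL with NA before unlisting keeps every row at full width. The column names here are assumptions for illustration.

```r
# Made-up rows; the third field of the second row was a JSON null.
rowSet <- list(
  list("Made Shot",   24, 12.5),
  list("Missed Shot",  3, NULL),
  list("Made Shot",   26, 8.1)
)

# unlist() drops the NULL, leaving 8 values where 9 are expected.
n_values_raw <- length(unlist(rowSet))

# Replace every NULL with NA so each row keeps all of its fields.
fill_nulls <- function(row) {
  lapply(row, function(x) if (is.null(x)) NA else x)
}
rowSet_clean <- lapply(rowSet, fill_nulls)

logs <- data.frame(
  matrix(unlist(rowSet_clean), ncol = 3, byrow = TRUE),
  stringsAsFactors = FALSE
)
colnames(logs) <- c("EVENT_TYPE", "SHOT_DISTANCE", "SHOT_CLOCK")

# Values came back as character; convert the numeric columns.
logs$SHOT_DISTANCE <- as.numeric(logs$SHOT_DISTANCE)
logs$SHOT_CLOCK    <- as.numeric(logs$SHOT_CLOCK)
```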

Hi Ed, this has been very helpful. Though I have a question – when recreating the shot charts the hexagons and plot points look very pixelated and wonky unlike yours which seem pretty smooth. Any idea why this could be happening?

I ran these plots using the latest version of RStudio and ggplot2, on a Mac. I noticed that when I ran the same plot in Windows, it looked a little more pixelated (not a lot!). Have a go at updating RStudio and ggplot2.

Try this: scale_fill_gradientn(colours = c("yellow","orange","red"), values = shotDataf$SHOT_DISTANCE)
Where values is the value used to colour the hexes. If you use FG% then you first have to calculate the % per region.

Hi Ed, is it possible to have the hex bins display accuracy for the player compared to a league standard? For example, if a particular player shoots above the league average for a certain hex bin area, this hex bin will be shaded red; and if a player shoots below the league average for a certain hex bin area, this hex bin will be shaded blue.

I suppose a starting point would be determining the accuracy for each Hex bin area, and then also obtaining the data for the whole league?

Yes, the tricky part here is to calculate the accuracy for each hex bin area. The size of the hex bin area is variable as it depends on the number of bins you use.
But once you have the value you can use the following: scale_fill_gradientn(colours = c("yellow","orange","red"), values = shotDataf$SHOT_DISTANCE)
Where values is the value used to colour the hex bins. Here I use shot distance as an example.
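An alternative sketch, not the method from the post: ggplot2 can compute a per-hexagon summary itself with stat_summary_hex, so the mean of a 0/1 made flag gives the accuracy in each hex bin. A diverging scale centred on a reference accuracy then shades bins above it red and below it blue. The data is randomly generated, the column name SHOT_MADE_FLAG is assumed, and the single reference value of 0.45 is a made-up stand-in for a league average, which in practice would vary by location. Note also that in ggplot2 the values argument of scale_fill_gradientn sets where each colour sits on a 0 to 1 scale; it does not choose the variable that drives the fill.

```r
library(ggplot2)

# Random shots standing in for real data; 1 = made, 0 = missed.
set.seed(42)
shots <- data.frame(
  LOC_X = runif(300, -250, 250),
  LOC_Y = runif(300, -50, 420),
  SHOT_MADE_FLAG = rbinom(300, 1, 0.45)
)

# Made-up reference accuracy (an assumption, not a real league figure).
reference_fg <- 0.45

p <- ggplot(shots, aes(x = LOC_X, y = LOC_Y, z = SHOT_MADE_FLAG)) +
  # The mean of the made flag within each hexagon is its accuracy.
  stat_summary_hex(fun = mean, bins = 15) +
  # Blue below the reference, white at it, red above it.
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = reference_fg, name = "FG%") +
  coord_fixed()
```

Larger hexagons (fewer bins) put more shots in each bin, which makes the per-bin accuracy less noisy.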

I do have an issue I’m encountering. I am trying to follow your steps for scraping the data, but whenever I try to load the stats.nba.com page for a given player’s ‘shotslogs’ (e.g. “http://stats.nba.com/player/#!/201939/tracking/shotslogs/”), it loads the rest of the page briefly and then just gets stuck with a loading widget in the middle of the screen (on both Firefox and Chrome). Since it won’t load any of that player’s data, the only thing I’m left to scrape from is a page that lists all NBA players in the database, and that’s not useful because it doesn’t come with any of the shooting data for any of the players anyhow. I’m hoping that this is just something that’s happening on NBA.com’s end. I’d like to get 2015-2016 data for the players I’m interested in. Right now there’s a wrench in my project. Any thoughts?

Hi Jorge,
The URL you are referring to does not exist. Instead, try this one: http://stats.nba.com/player/#!/201939/tracking/shots/
Anyhow, you don’t need to go into this specific URL to scrape the data. Running the following code will get you all 15-16 regular season shots of Steph Curry.
library(rjson)
# shot data for Stephen Curry, regular season 2015-16
playerID <- 201939

I am new to R, was able to obtain a similar dataset, and I have somewhat followed along with this but am stuck on the part where you use ggplot. When I use colour=EVENT_TYPE, I get an error message that says:

object ‘EVENT_TYPE’ not found

Is this part of some package I don’t have? How do I get the graph to color the made and missed shots differently?

Never mind, I think I figured it out. “Event_Type” must have been part of the original dataset, right? I changed it to “made” on mine, which is the column that contains 0’s and 1’s for missed and made shots, and it seems to have worked