Visualizing Arkansas traffic fatalities, Part 2

A couple of weeks ago, I posted a map of the traffic fatalities in Arkansas in 2015. The data came from the NHTSA, and that graphic only scratched the surface. I’ve since sliced the data a couple of different ways and created three more sets of visualizations from it. For the next three posts, I’ll show the visualizations, then my interpretation, and then the code, so that non-programmers will get the goods on the front end.

The Visualizations

The first set of visualizations maps the raw number of traffic fatalities in the US. Each band represents a single year between 2000 and 2015. Each row within the band is a day of the week. From left to right (or top to bottom on small devices), you have drunk driving fatalities, non-drunk driving fatalities, and total fatalities. We’ll repeat this comparison a number of times, and the color coding in each set of graphs uses the same scale (so we compare apples to apples across the three visualizations).

For me, two things stand out in this set of visualizations. First, drunk driving fatalities are heavily weighted towards weekends. Second, New Year’s Day (the left- and top-most block) is an especially dangerous time to be on the road.

As I will in the remaining posts, I repeated the same analysis on Arkansas-specific wreck information (this requires a single line of R code given below). The data is a little noisier, but the same results appear to hold. Note that the color scale here is different from the one used for the nationwide set. It is skewed a little towards lighter colors by one extremely bad Saturday on Arkansas roads in the fall of 2004.

The Code

Now to the code. First, we’ll load in all the data from the NHTSA (available at ftp.nhtsa.dot.gov/fars/). Because the yearly files gained new fields over time, their columns don’t all match, so we’ll need to select just the columns we need for the visualization before we can combine them all.
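Here is a minimal sketch of that select-then-combine step, not the original code from this post. It uses two tiny made-up tables in place of the real yearly files, and the column names (STATE, MONTH, DAY, YEAR, FATALS, DRUNK_DR) are my assumption about what the FARS accident files look like. The last line is a guess at the kind of one-liner that pulls out Arkansas, assuming Arkansas is coded as state 5.

```r
# Toy stand-ins for two yearly accident tables (values are invented).
# In practice each would come from something like read.csv() on a yearly file.
acc_2000 <- data.frame(
  STATE = c(5, 48), ST_CASE = c(50001, 480001),
  MONTH = c(1, 7), DAY = c(1, 4), YEAR = c(2000, 2000),
  FATALS = c(2, 1), DRUNK_DR = c(1, 0)
)
# The later table carries extra columns the earlier one lacks.
acc_2015 <- data.frame(
  STATE = c(5, 6), ST_CASE = c(50002, 60001),
  MONTH = c(12, 3), DAY = c(25, 14), YEAR = c(2015, 2015),
  FATALS = c(1, 3), DRUNK_DR = c(0, 2),
  LATITUDE = c(34.7, 36.1), LONGITUD = c(-92.3, -119.8)
)

# Keep only the columns every year shares and that the plots need.
keep <- c("STATE", "MONTH", "DAY", "YEAR", "FATALS", "DRUNK_DR")

# Subset each table to those columns, then stack them into one data frame.
yearly <- list(acc_2000, acc_2015)
wrecks <- do.call(rbind, lapply(yearly, function(df) df[, keep]))

# Assumption: Arkansas is state code 5 in this data.
arkansas <- wrecks[wrecks$STATE == 5, ]
```

Because every table is cut down to the same columns first, rbind() never has to reconcile mismatched names.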

Next, we need to write the code that will let us split the data by whether or not any driver involved in the wreck was drunk. We do this by creating R vectors that flag each wreck. The vectors allow us to aggregate the raw wreck data by day and then subset it by whether or not it is alcohol-related.
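One way this could look, as a sketch on invented data rather than the code I actually ran: build a date column, build a logical vector that is TRUE when at least one driver was drunk, and then sum fatalities per day for each subset. The DRUNK_DR column (a count of drunk drivers per wreck) and the other names are assumptions.

```r
# Four made-up wrecks over two days.
wrecks <- data.frame(
  YEAR = c(2015, 2015, 2015, 2015),
  MONTH = c(1, 1, 1, 1),
  DAY = c(1, 1, 2, 2),
  FATALS = c(2, 1, 1, 3),
  DRUNK_DR = c(1, 0, 0, 2)
)

# Turn the year/month/day columns into a proper Date.
wrecks$date <- as.Date(ISOdate(wrecks$YEAR, wrecks$MONTH, wrecks$DAY))

# Logical vector: TRUE when any driver in the wreck was drunk.
drunk <- wrecks$DRUNK_DR > 0

# Sum fatalities per day for each subset and for everything.
drunk_daily <- aggregate(FATALS ~ date, data = wrecks[drunk, ], FUN = sum)
sober_daily <- aggregate(FATALS ~ date, data = wrecks[!drunk, ], FUN = sum)
total_daily <- aggregate(FATALS ~ date, data = wrecks, FUN = sum)
```

The three resulting data frames line up with the three panels in the visualizations: drunk driving, non-drunk driving, and total fatalities.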

Now, we need to create empty entries for those dates that don’t have any fatal wrecks (not necessary for nationwide plots, but necessary for Arkansas-specific ones). We’ll add this to the data we created earlier.
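A small sketch of the idea, again on invented data: build a data frame holding every date in the range, left-join the daily counts onto it, and replace the resulting NAs with zeros. The names daily, all_days, and filled are placeholders of mine.

```r
# Daily counts with gaps: nothing recorded on Jan 2, 3, or 5.
daily <- data.frame(
  date = as.Date(c("2015-01-01", "2015-01-04")),
  FATALS = c(2, 1)
)

# Every calendar day in the range we want to plot.
all_days <- data.frame(
  date = seq(as.Date("2015-01-01"), as.Date("2015-01-05"), by = "day")
)

# Left join keeps every calendar day; days without wrecks get NA.
filled <- merge(all_days, daily, by = "date", all.x = TRUE)

# A day with no fatal wrecks is a zero, not a missing value.
filled$FATALS[is.na(filled$FATALS)] <- 0
```

Without this step, days with no fatal wrecks would simply be absent from the data and leave holes in the Arkansas heat maps.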

Next, we’ll create some additional columns to give us the week of the year, which is how we plot the bands. This code is adapted from the very helpful r-bloggers post I mentioned in the last visualization post. As an aside, if you’re interested in data visualization, you should subscribe to r-bloggers, as those guys are always posting fascinating stuff.
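As a rough illustration (my own sketch, not the adapted r-bloggers code), the calendar columns can be derived from the date with format(). Here I assume weeks start on Sunday, which is what the "%U" format code gives you; the column names are placeholders.

```r
cal <- data.frame(
  date = as.Date(c("2015-01-01", "2015-01-04", "2015-12-31"))
)

# Year: one band per year in the plot.
cal$year <- as.integer(format(cal$date, "%Y"))

# Week of the year, counting from the first Sunday; days before it are week 0.
cal$week <- as.integer(format(cal$date, "%U"))

# Day of the week as a number, 0 = Sunday through 6 = Saturday.
cal$wday <- as.integer(format(cal$date, "%w"))
```

Using the numeric "%w" code instead of weekday names keeps the rows in calendar order and avoids depending on the machine’s locale.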

We’re finally done with our data processing. The next step is to set up a theme for the visualization. To do this, I installed the development version of ggplot2 from GitHub so I could use some of its newer features, like subtitles and moving the facets (year labels) around.
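To give a flavor of what such a theme might look like, here is a sketch on toy data, not the theme used for the plots in this post. It assumes a ggplot2 release that supports subtitles and the strip.placement theme setting; as far as I know, both have since landed in released versions, so the development install may no longer be needed. The names heat_theme, toy, and p are placeholders.

```r
library(ggplot2)

# A stripped-down theme: no grid or ticks, year labels outside the panels.
heat_theme <- theme_minimal() +
  theme(
    panel.grid = element_blank(),
    axis.ticks = element_blank(),
    strip.placement = "outside",
    legend.position = "bottom"
  )

# Toy data: four weeks by seven weekdays for two years, with invented counts.
toy <- expand.grid(week = 0:3, wday = 0:6, year = c(2014, 2015))
toy$fatalities <- (toy$week + toy$wday) %% 5

# One band (facet row) per year, with the year labels switched to the left.
p <- ggplot(toy, aes(x = week, y = wday, fill = fatalities)) +
  geom_tile(colour = "white") +
  facet_grid(year ~ ., switch = "y") +
  scale_fill_gradient(low = "white", high = "darkred") +
  labs(
    title = "Traffic fatalities by day",
    subtitle = "Toy data, one band per year",
    x = NULL, y = NULL
  ) +
  heat_theme
```

Printing p draws the heat map; keeping the theme in its own object means the same look can be reused across the drunk, non-drunk, and total plots.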