
Analyzing Public Data with D3

“Most of us need to listen to the music to understand how beautiful it is. But often that’s how we present statistics: we just show the notes, we don’t play the music.” - Hans Rosling

When presenting data, visualizations are a powerful tool in making information easily understood and quickly digestible. They bring insight to areas that may otherwise be overlooked, help people grasp difficult concepts, identify new patterns and trends in information, and add intrigue and interest for the reader. Visualization is often seen as an essential and valuable step in an organization’s overall data analytics strategy. From straightforward charts to complex knowledge graphs, Enigma’s data scientists and developers have vast and comprehensive experience in helping clients explore and make sense of our datasets.

A recent project I completed for Enigma Public focused on the gender wage gap using earnings data from the American Community Survey. I wanted to show the difference in wages between men and women for some of the most common occupations in the US. A visualization is a great way to help an audience conceptualize relationships between two data points. A connected dot plot, with its minimalist style and clear readability, seemed like the best chart to present the information. If you want to learn more about exploring social issues through public data, see my previous blog post, here.

I used the D3 library to create the dot plot. There are many JavaScript charting libraries, but D3 is widely considered the gold standard for data visualization in JavaScript, allowing for the most customization and control over the end product. Although it can be intimidating and its syntax confusing, knowing a few basic concepts makes the library much more accessible. The tutorial below covers how to make a connected dot plot in D3, along with basic D3 charting principles. This is what we’re going to make:

Getting the Data

First we have to get the data into our project from Enigma Public. We can use the API, which allows programmatic access to all of Enigma Public’s data. To learn more about the API and how to integrate Enigma’s data into your development projects, view the docs here.

We’ll use the Fetch API, a promise-based interface for getting resources on the web. There are two ways we can go about fetching the dataset:

1. Make an API call using search parameters to return only the data we want.

2. Import the whole dataset into our project, then filter for selected fields.

The first option is useful if the dataset is large or if you only need a small, specific subset of the data. For the chart above, we want only the selected occupations shown on the y-axis, so we will formulate a query that returns just the rows whose Occupational_Category column matches the occupations we specify. See Enigma Public’s API documentation for help formulating search queries. Because spaces within the quoted string must be URL-encoded (%20), we’ll use encodeURIComponent() to encode the occupation names and then interpolate them into the fetch query.
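A minimal sketch of the first option might look like the following. The base URL, the `query` parameter name, the query syntax, and the sample occupation names are all placeholders for illustration, not Enigma Public's documented API; only fetch(), encodeURIComponent(), and the Occupational_Category column come from the description above.

```javascript
// Hypothetical endpoint: substitute the real URL from the API docs.
const BASE_URL = "https://example.com/api/dataset";

// Sample occupation names, for illustration only.
const occupations = ["Registered nurses", "Chief executives"];

// Encode each name so spaces become %20, wrap it in encoded quotes (%22),
// and interpolate the result into the query string.
function buildQueryUrl(baseUrl, column, names) {
  const terms = names
    .map((name) => `%22${encodeURIComponent(name)}%22`)
    .join("%20OR%20");
  // The `query=column:(...)` syntax is an assumption, not the documented one.
  return `${baseUrl}?query=${column}:(${terms})`;
}

// Request only the matching rows and parse the JSON body.
function fetchOccupations(names) {
  return fetch(buildQueryUrl(BASE_URL, "Occupational_Category", names)).then(
    (response) => {
      if (!response.ok) throw new Error(`Request failed: ${response.status}`);
      return response.json();
    }
  );
}
```

Calling `fetchOccupations(occupations)` would return a promise for the parsed response, which can then be passed down the promise chain.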

For the second option, we can make a request for the entire dataset, then filter for the fields we want. Notice that the promise chain has an additional filterData() function. Since the dataset is 560 rows, we need to set the row_limit high enough to return all the data. It is set to 600 here, but you can request up to 10,000 rows.
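As a rough sketch of the second option, the extra filterData() step could be written as below. It assumes the response body parses to an array of row objects keyed by column name, which may not match the real response shape; the URL argument and the sample occupation names are hypothetical.

```javascript
// Sample occupation names, for illustration only.
const selectedOccupations = ["Registered nurses", "Chief executives"];

// Keep only the rows whose occupation is in the selected list.
// Assumes each row is an object keyed by column name.
function filterData(rows, selected = selectedOccupations) {
  const wanted = new Set(selected);
  return rows.filter((row) => wanted.has(row.Occupational_Category));
}

// Request the whole dataset (row_limit raised above the 560 rows),
// then filter it on the client. `datasetUrl` is a placeholder.
function fetchAllRows(datasetUrl) {
  return fetch(`${datasetUrl}?row_limit=600`)
    .then((response) => response.json())
    .then((rows) => filterData(rows));
}
```

Using a Set keeps the membership check cheap even if the list of selected occupations grows.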

Formatting the Data

Now that we have the data we want, we’ll need to transform it into an array that can be passed to our D3 function. The function below maps each row to an object specifying the name of the field, the ‘max’ value (men’s earnings), and ‘min’ value (women’s earnings). For the fields we selected, the men’s earnings were all greater than women’s earnings.

We’ll also sort the data by men’s earnings so the higher paid professions will appear first on the chart.
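The mapping and sorting described above could be sketched as follows. The two earnings column names are hypothetical stand-ins, since the real field names are not given here, and the function name formatData() is likewise an assumption.

```javascript
// Hypothetical column names for the two earnings fields.
const MEN_FIELD = "Median_Earnings_Men";
const WOMEN_FIELD = "Median_Earnings_Women";

// Map each row to { name, max, min } and sort by men's earnings, descending.
function formatData(rows) {
  return rows
    .map((row) => ({
      name: row.Occupational_Category,
      max: Number(row[MEN_FIELD]), // men's earnings
      min: Number(row[WOMEN_FIELD]), // women's earnings
    }))
    .sort((a, b) => b.max - a.max); // higher-paid professions first
}
```

Number() is applied in case the API returns the earnings as strings; if they already arrive as numbers, the conversion is harmless.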

Building the Chart

Now that our data is in the correct format, we can start building the chart. For the purposes of this program, all the D3 code is wrapped in a drawSVG() function.

One thing to keep in mind about D3 is that building a visualization is like painting on a canvas. The bottom layer of the visualization is the code you write first, then each piece builds on top of that. If you are making a D3 bar chart and, say, your axis lines appear on top of your bars, you need to rewrite so that you create the axes first then the bars.

1. First let’s make a container div in the html where we will append the visualization:

<div id="container"></div>

2. Next we will set the margins (leaving a wide margin on the left for occupation names), the width, and the height, then create an svg and append it to #container.
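One common way to write this step is sketched below. The pixel values are illustrative rather than the ones used for the original chart, and the code assumes D3 (v4 or later) is loaded as the global `d3`.

```javascript
// Illustrative sizes; the wide left margin leaves room for occupation names.
const margin = { top: 20, right: 40, bottom: 40, left: 220 };
const fullWidth = 900;
const fullHeight = 500;
const width = fullWidth - margin.left - margin.right; // inner drawing width
const height = fullHeight - margin.top - margin.bottom; // inner drawing height

// Append an svg to #container, then a group shifted by the margins so that
// everything drawn later uses (0, 0) as the top-left of the plotting area.
function createSvg() {
  return d3
    .select("#container")
    .append("svg")
    .attr("width", fullWidth)
    .attr("height", fullHeight)
    .append("g")
    .attr("transform", `translate(${margin.left},${margin.top})`);
}
```

Returning the inner group rather than the svg itself means later steps never have to add the margins by hand.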

Domain in the context of D3 refers to your data and the boundaries in which your data lies. If my data is an array of numbers no smaller than 1 and no larger than 10,000, my domain would be 1 to 10,000.

Range refers to the set of output values that the domain is mapped onto. For example, if you have data points that go from 1 to 10,000, you likely will not have a chart that is 10,000 pixels in width. You will need to transform the domain into a workable range to accurately size the chart, while keeping proportions between data points.
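To make the domain-to-range idea concrete, here is a plain-JavaScript sketch of the proportional arithmetic. It is not D3's implementation, and the name linearMap() and the 500-pixel range are chosen only for illustration.

```javascript
// Build a function that maps a value in [d0, d1] onto [r0, r1],
// keeping the proportions between data points.
function linearMap([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Domain 1 to 10,000 mapped onto a 500-pixel-wide range.
const toPixels = linearMap([1, 10000], [0, 500]);

toPixels(1); // 0 (left edge)
toPixels(10000); // 500 (right edge)
toPixels(5000.5); // 250 (midpoint of the domain lands mid-chart)
```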

d3.scaleBand() and d3.scaleLinear() are functions that map values across coordinate systems and put the data in the right place on the screen.

scaleBand() splits the range into bands, computes the position and width of the bands, and applies any specified padding.

scaleLinear() constructs a continuous linear scale with the specified domain and range, preserving proportional differences between the data points.
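A sketch of how the two scales might be set up for this chart is shown below. It assumes D3 (v4 or later) is available as the global `d3` and that `data` holds the { name, max, min } objects described earlier; the padding value and the choice to start the x domain at 0 are illustrative assumptions.

```javascript
// Create a band scale for occupation names (y) and a linear scale
// for earnings (x).
function createScales(data, width, height) {
  const y = d3
    .scaleBand()
    .domain(data.map((d) => d.name)) // one band per occupation
    .range([0, height])
    .padding(0.5); // illustrative spacing between bands

  const x = d3
    .scaleLinear()
    .domain([0, d3.max(data, (d) => d.max)]) // 0 up to the largest earnings value
    .range([0, width]);

  return { x, y };
}
```

With these in place, `x(d.max)` gives a horizontal pixel position and `y(d.name)` gives the top of that occupation's band.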

.tickFormat formats the ticks manually. We passed it a function that displays the tick values in a human-readable format: two significant digits with an SI prefix (d3.format(".2s")).
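The axes could be drawn along these lines. This is a sketch assuming D3 (v4 or later) as the global `d3`, with `svg` being the margin-shifted group and `x` and `y` the scales from the previous step; the function name drawAxes() and the class names are hypothetical.

```javascript
// Draw a bottom axis for earnings and a left axis for occupation names.
function drawAxes(svg, x, y, height) {
  // ".2s" means two significant digits with an SI prefix.
  const xAxis = d3.axisBottom(x).tickFormat(d3.format(".2s"));
  const yAxis = d3.axisLeft(y);

  svg
    .append("g")
    .attr("class", "x axis")
    .attr("transform", `translate(0,${height})`) // move to the bottom edge
    .call(xAxis);

  svg.append("g").attr("class", "y axis").call(yAxis);
}
```

Because of the painting order described earlier, calling this before drawing the circles keeps the axis lines underneath the data points.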

6. Lastly, let’s make our circles (lollipops) representing each data point and append them to the chart. startcircles refers to the minimum number (women’s earnings) in each occupational category, while endcircles is the maximum number (men’s earnings).
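The circles might be appended roughly as follows. This sketch assumes `svg` is a D3 selection, `x` and `y` are the linear and band scales, and `data` holds { name, max, min } objects; the class names and the radius are illustrative, and drawCircles() is a hypothetical wrapper.

```javascript
// Append one circle per occupation for each end of the wage gap.
function drawCircles(svg, data, x, y) {
  // Vertical center of each occupation's band.
  const cy = (d) => y(d.name) + y.bandwidth() / 2;

  // Minimum value in each category (women's earnings).
  const startcircles = svg
    .selectAll(".startcircle")
    .data(data)
    .enter()
    .append("circle")
    .attr("class", "startcircle")
    .attr("cx", (d) => x(d.min))
    .attr("cy", cy)
    .attr("r", 6);

  // Maximum value in each category (men's earnings).
  const endcircles = svg
    .selectAll(".endcircle")
    .data(data)
    .enter()
    .append("circle")
    .attr("class", "endcircle")
    .attr("cx", (d) => x(d.max))
    .attr("cy", cy)
    .attr("r", 6);

  return { startcircles, endcircles };
}
```

Giving the two sets different class names lets the stylesheet color them separately.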

And our chart is now complete. You can also add a legend (necessary for a chart like this) along with some tooltips and styles. I won’t cover how to do that here, but the code for those features is in the codepen.