D3 for Data Scientists, Part I: A re-usable template for combining R and D3 to build interactive visualizations

The path to D3 mastery is dark and full of terrors. D3 itself is a JavaScript (JS) library and on top of that, you’ll need a basic understanding of HTML (Hypertext Markup Language) and CSS (Cascading Style Sheets) to get the most out of it. If you’re a data scientist, chances are JavaScript, at best, ranks as your fourth language, after R, Python, and SQL. In this three-part series of blog posts, I will show you step-by-step how you can combine R with D3, HTML, and CSS to create a fully interactive data visualization from scratch.

As a guiding example, we will build the following interactive data visualization(available here):

For all movie franchises with at least four movies, we show the Rotten Tomatoes score on the Y-axis, plotted against the movie release dates on the X-axis. The size of each bubble is proportional to a movie’s worldwide box office gross. This will hopefully show us the relationship between a movie’s box office and its quality. Which franchises got better or worse over time? Which movies were the high and low points for each franchise? How are critical darlings faring at the box office and which franchises keep raking in the cash despite poor reviews?

The inspiration for this work comes from the Economist’s excellent visualization of TV shows, which you can find here:“TV’s golden age is real”, published November 24th, 2018. To give it my own twist, I combined data on the most popular franchises fromBy the Numberswith movie ratings data fromOMDb. For convenience, we’ve made the combined data set availablehere.

In this first part, we’ll create a template that we can use for this and any future D3 data visualization you’d like to build. We’ll start by reviewing some basic HTML and CSS. Then we’ll set up our canvas, including a title, subtitle, and caption, sizing, and margins. Finally, I’ll show how you can inject data from R into your HTML code and display your visualization — for now just a blank canvas — in your browser or in Rstudio’s Viewer Pane. In Part 2, we’ll build on this template by plotting a static version of the visualization. We’ll draw all circles, lines, and labels, to get you familiar with binding graphical objects to data, before moving on to adding interactivity (i.e., dropdown menus, tooltips, and dynamic highlighting) in Part 3.

If you want to play around with the fully interactive version you can do sohere, or download the full code for all three partshere.

Let’s get started!

1. The basic components of an HTML document

Most of our code will use JS, but our final product will be an HTML document that a browser can display. HTML uses opening and closing tags (for example, <html> and </html>) to mark the beginning and end of different code sections (note the backslash ‘/’ in the closing tag). Every HTML document has a head section and a body section, both wrapped within a pair of <html></html> tags:

The <!DOCTYPE html> and <meta charset=”utf-8"> tags let your browser know that this is an HTML document and that it should use the utf-8 character set, capable of encoding all characters on the web. They’re not strictly necessary, as your browser will assume these as its default settings, but if you ever come across them in the future, you’ll know what they’re for.

We will use the head section to import JS libraries and add custom stylings using a style section written in CSS. The body section will form the bulk of our code. Here we will define the overall layout of the page (using HTML) and build the actual visualization (using D3 and any other JS libraries we imported).

Create a new document called ‘index.html’ and add the code above. Double clicking it should open a new browser tab, showing a blank page.

2. Importing JS libraries

Add the following lines to the head section of your new index.html file (within the <head></head> tags):

The first line imports the current version ofD3 (v5). The ‘min’ in the name indicates that this is a minimized version, meaning that it is optimized to take up less memory.

The second line importsjQuery, a JS library for navigating and manipulating the Document Object Model (DOM). Think of the DOM as a hierarchical tree containing all the objects on your page. For example, <div> tags segment sections into containers, while <svg> tags hold shapes like circles and lines. We won’t be using jQuery just yet, but we’re including it here in preparation of Part 3.

Our third imported library isUnderscore. This library is great for extracting a column as an array from a JSON object (JS’s version of a data table) and for many other handy tricks like sorting or filtering an array or calculating its minimum or maximum.

Our fourth and final imported library isBootstrap. This is another very popular library, this time for defining a lot of nice custom stylings for free.

3. Setting up the canvas

Let’s put something on our page! We’ll use some basic HTML to add a title, subtitle, and caption. We’ll also create a space for our main visualization in between the subtitle and the caption.

Add the following code to your body section (within the <body></body> tags):

After loading this into your browser, you should see something like this:

Wow, what a great start! (If you still see a blank page, read the Troubleshooting section at the end of this article.)

Let’s review the code to see what we just did. We created three <div> containers: one for the title and subtitle, one for the visualization, and another one for the caption. As the name implies, a <div> container is a place where you can store different objects. It’s a convenient way to separate one part of your page from another. By default, these <div> containers will be stacked vertically one on top of the other, but you can also organize them side by side.

We assign each <div> an ID in its opening tag (e.g. id = ‘title’). IDs must be unique. Each <div> can only have one ID and each ID can only refer to one <div>. We also define the width of each <div> (here set to 1,366 pixels).

Within our title <div>, I’ve added two header blocks: a level 1 header (<h1></h1>) for the main title and a level 2 header (<h2></h2>) for the subtitle. These are just placeholders; feel free to edit the text within these tags to the titles of your choosing. In the caption <div>, I’ve added a paragraph of basic text (<p></p>). Note the use of style=’text-align:right’ in the opening tag of the paragraph. This aligns the end of the caption with the right edge of the container.

Finally, I’ve added an <svg> element to the vis container and a <g> element to the <svg>. SVG stands for Scalable Vector Graphic, which is a fancy way of saying “shapes,” like rectangles, circles, lines, and so on. ) The <g> stands for “group,” which implies that there will be a grouping of multiple SVGs coming. Think of the <svg> as your base canvas and the <g> group as all the elements within that canvas. If you were to translate or rotate the canvas, all its <g> sub-elements will move in sync. We’ll cover both of these in much more detail in Part 2.

Note that the <svg> and <g> elements didn’t get an ID. Instead, I assigned them a class. As opposed to IDs, each object can have multiple classes and each class can refer to multiple objects. These IDs and classes will come in handy soon. They allow us to set custom styling for objects of a specific ID and or class and offer a convenient way to select and manipulate all objects with a given ID or class. In all subsequent code, we’ll use IDs for <div> containers only and classes for everything else.

4. Adding custom styling with CSS

Let’s add some flavor to those titles!

Add this next code block to your head section (inside the <head></head> tags), and after your imported <script> and <link> libraries:

Our first order of business is to import a custom font. This is optional, and you can always use any web-safe font (e.g., Arial, Times New Roman, Courier. Seeherefor a full list). These come pre-installed with modern browsers, so you can use these freely without worrying about importing new fonts. But where’s the fun in that?!?

Google Fontsis a great place to start experimenting with typography. It has a wide selection of free fonts to choose from. Given that our goal is to visualize data, I highly recommend you choose a font in the ‘Sans Serif’ category. These fonts don’t have the little dangly end-bits, which makes chart labels easier to read. Once you’ve decided on the font you’d like to use (I went withBaloo Thambihere), you can easily import it with the following line:

Just make sure you replace <font name> with the name of your chosen font. If the font name has multiple words, combine them using a ‘+’ sign (e.g., family=Baloo+Thambi). If you decide to use an imported font, do so at the very start of your style section, to make it available for the subsequent stylings.

If something goes wrong and the browser doesn’t recognize the font family, it will still display the text, but using the browser’s default font (this can vary depending on your browser and settings, but it’s often Times New Roman).

Next, we’ll take a closer look at the styling blocks. The basic format of these blocks is as follows:

The selector allows us to specify which objects the style should be applied to. This can be one of three types:

To style an HTML tag, such as h1, simply use the name of the tag. This type of style block defines the styling for anything encapsulated within <h1></h1> tags (in our example, this is just the main title).

To style tags with a specific ID, we use ‘#<ID name>’ as the selector, such as #caption

To style tags with a specific class, we use ‘.<class name>’ as the selector, such as .chart

It’s a common mistake to forget the leading ‘.’ period or ‘#’ hashtag. It’s also possible to create selectors with multiple elements, separated by spaces, to create a more narrow focus. For example, ‘.chart p’ would change the styling for all paragraphs within .chart objects, but would leave paragraphs outside of .chart objects unaffected. You can use a single style block for multiple objects by including multiple selectors separated by commas. Goherefor even more options,

There is a longlist of CSS propertiesto choose from. The ones I’ve used here are self-explanatory, with the exception of fill and line-height. Fill controls the font color and line-height controls the height of a line of text. Setting line-height to 0.2em prints the title and subtitle closer to each other.

With these fancy new stylings added to your index.html file, your page should now look like this:

We’ll introduce more custom stylings (and new style properties) in Parts 2 and 3 as we add the circles and curves that make up the visualization, but these basic stylings should serve you well for now and form a solid basis for any other visualizations.

5. Set the dimensions and margins of your canvas

You’ve gotten a taste of HTML and CSS, and now it’s time to dive into JS!

Add the following code block to your body section, after your <div> containers:

The first thing to notice here are the <script></script> tags. These let your browser know that the code within the tags is JS.

We start by defining some variables. In JS, this is done using

var <name> = <value>;

The var keyword lets JS know we’re defining a new variable. The semicolon ‘;’ at the end terminates the statement. The vis_width and vis_height variables define the width and height of the canvas. These values will soon be used by the draw() function.

The draw() function is a bit of a personal preference. You could take its contents out of the function and it will still run fine. I prefer to use a function for this because it allows me to redraw the chart based on an input (e.g., from a dropdown menu). For example, you could add a dropdown menu that allows users to select an input data set. Then whenever they select a different data set, you can pass that along to the draw() function to redraw the chart. We’ll show an example of how you can use this mechanism in Part 3, where we’ll add a range slider. As users adjust the slider, we’ll redraw the chart such that the date range of the plot reflects the slider inputs.

In draw(), we define a margin variable that we’ll use to control the margins between the plot area (where the circles and curves go) and the outer frame of the chart. We add these margins to make sure there is room for things like axes, axis titles, tick marks, and tick labels. Without margins, these elements will be cut off.

With d3.select(‘.chart-outer’), we select the object with class ‘chart-outer’ (note the leading period!). Once we have selected this object, we can chain additional commands to manipulate it. Here we set the width attribute to vis_width (1366px) and the height attribute to vis_height (650px).

The var again indicates we want to create a new variable, this time called svg. In other words, we first select the object with class .chart (earlier we defined this as a group <g>); we then append a new SVG object, setting its width and height to vis_width and vis_height respectively. We append a new group <g> and translate this entire group to the left by margin.left pixels, and push it to the bottom by margin.top pixels.

This seems like a lot, but all we’ve done is create a useful short-cut. Later, in Part 2, when we define new graphical elements (like a circle), we will only need to call svg, instead of having to repeat the code above. This ensures that any new element added to this svg attaches to the right parent object in the page hierarchy and that it inherits the margin translations. Later on, if we decide to translate or rotate svg, all its child elements will translate and rotate along with it, which saves us a lot of hassle.

I’ve also added an empty params variable here that gets passed on to the draw() function. Later, in Parts 2 and 3, I will use this to customize the chart and add interactivity, but you can ignore it for now.

Fire off your new index.html and you will see the following:

Huh … ok, that’s … not quite right. The canvas section should be 650px high — the full height of the screen — but it looks like it’s much shorter. It’s time for …drumroll …

6. Troubleshooting

How do you know you did everything right? Debugging your D3 code can be difficult because loading your new page won’t show you any error messages, often you’ll just be staring at a blank page.

To check for errors, run your code in a browser, right-click anywhere on the page and select ‘Inspect’:

This will open your browser’s Developer Console:

I am using Google Chrome. If you’re using a different browser, your Developer Console may look differently.

The ‘elements’ tab shows us the DOM, the hierarchical tree of all elements that are on your page. If certain objects are not being displayed, check the hierarchy to make sure they actually exist or whether they were created but not visible. This can happen quite easily. You may have set the fill to white on a white background, or opacity to 0 or set the object to be invisible, or the position of the object may lie outside of the plotted area due to faulty scaling. Looking through the hierarchy is a good sanity check and allows you to narrow down the root causes of your errors.

The console tab is another great place for troubleshooting. One way to check for errors is to print log messages to the console using the console.log() command. You can either print a message to check whether the correct code is being executed (eg. console.log(‘in draw function’)) or check the value of a parameter or a table (e.g. console.log(vis_height)). You can put a break in your code by adding the phrase debugger;. Make sure you have the Developer Console open and trigger the code (e.g., refresh the page or hover over an object); the browser will pause when it encounters the debugger command, allowing you to investigate the current state of variables, data, parameters, etc. This is especially handy to look inside if-else conditions, functions, and for-loops, to make sure all input are within scope. Finally, the console is a great place for some trial and error whenever you’re unsure about the correct JS code. Just slap a debugger in your code to pause the code where you want and try out some code until you get the desired effect (for example, selecting and highlighting a group of dots or calculating the average of an array).

Returning to our example, it appears we forgot to input our data. We’ll fix that in the next section!

7. Injecting data into D3

After all this HTML, CSS and JS, we’re returning to what is hopefully more familiar territory: R! R nicely complements D3, because it excels at data wrangling, something that is rather painful to do in JS. So you can use all your favorite R tools (dplyr! tidyr! lubridate!) to get your data ready for plotting and then inject it into your JS code. We will be using a trick described by James Thompson in his 2014 blog postIntroducing R2D3. Until now, you’ve been writing all of your code directly into your index.html file. We will now split our code over three files:

header.txt (first part of your index.html code)

footer.txt (second part of your index.html code)

main.R (the R script that ties it all together)

Make sure these three scripts are located in the same directory.

The trick is to define a new div container with ID data in the <head></head> section:

<script type='application/json' id='data'>

The header.txt file will just consist of the imported libraries and this new line. Note that there is no </script> closing tag. This is intentional!

Everything else goes into the footer.txt file. Now add a </script> closing tag at the very beginning (before the <style></style> section) and add

var data = JSON.parse(document.getElementById('data').innerHTML);

to the <script></script> section (where you defined the var_width and var_height parameters).

I promise this will make sense soon. In our new main.R script, we can now inject the data by concatenating the header, data, and footer. This line

var data = JSON.parse(document.getElementById('data').innerHTML);

will check the data <div> and grab its innerHTML, where we just printed our data! Through this little detour, we’ve transferred our data from R into the JS code and we’re now ready to visualize it.

9. Next Steps

Congratulations for making it this far! We’ve covered a lot of material across four different programming languages! You now have a template that you can use again and again to build beautiful and fully interactive data visualizations with D3. If you just want to grab the final code, you can do so fromour GitHub repo. I’ve included additional comments with that code that are not in the code shown above. I hope to see you again in Part 2, where I will show you step-by-step how you can use D3 to define scales and bind graphical elements (circles, lines, labels, etc.) to your data.