Web designer, developer, business intelligence specialist, and all-around nerdy tech guy.

Jan 21, 2017

Crushing Data in JavaScript (Part 1)

The breadth of data accessible to us in the modern world is astounding. Data, however, is generally useless in its raw form. Although we clamor to get our hands on sweet, sweet datasets, the real money is in the analysis of that data: taking it from its raw, tabular form and compressing it into pictorial images that convey a message visually to readers. That data can range from petabytes analyzed on multi-server Hadoop architectures to your personal finances in a simple Excel spreadsheet.

No matter what your tool of choice, you will likely be looking to do one of the following with your data:

Summarize (reduce): As I mentioned above, large tabular datasets cannot be easily interpreted by the human brain. Data needs to be aggregated in a way that makes it more concise.

Subset (filter): In order to make decisions based on data, you will likely need to limit the scope of the data you are looking at. This could be to eliminate outliers or to focus your analysis on a subset of the data.

Transform (map): Transforming data is a critical step in making it more accessible to your users. An example would be taking poorly named categories and giving them friendlier names.
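As a minimal sketch of these three operations, here they are applied to a small made-up array using JavaScript’s built-in array methods. The records and the field names (category, amount) are hypothetical and are not part of the dataset used later in this post.

```javascript
// Hypothetical sample records; the field names are illustrative only.
const records = [
  { category: 'grc', amount: 10 },
  { category: 'grc', amount: 250 },
  { category: 'ent', amount: 40 },
  { category: 'ent', amount: 15 }
];

// Subset (filter): drop an outlier by keeping amounts under 100.
const withoutOutliers = records.filter(function (r) {
  return r.amount < 100;
});

// Transform (map): swap terse category codes for friendlier labels.
const labels = { grc: 'Groceries', ent: 'Entertainment' };
const friendly = withoutOutliers.map(function (r) {
  return { category: labels[r.category], amount: r.amount };
});

// Summarize (reduce): total the amount for each category.
const totals = friendly.reduce(function (acc, r) {
  acc[r.category] = (acc[r.category] || 0) + r.amount;
  return acc;
}, {});

console.log(totals); // { Groceries: 10, Entertainment: 55 }
```

The order here (filter, then map, then reduce) is just one reasonable choice; the three steps can be combined in whatever order suits the question you are asking.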

Just a few other definitions I want to get out of the way as I am going to be using these pretty extensively:

Dimensions: These are the different values that we use to categorize our data. The most common examples would be month and year.

Metrics: These are the values you would aggregate across your dataset: numeric values like revenue or counts.

Dimensions & Metrics

Hierarchies: When you have two or more dimensions that are subsets of each other, they can be organized as a hierarchy. Make and model for automobiles is a prime example: every model of car (Mustang, Charger) belongs to one and only one make (Ford, Dodge), and every make can have one or more models.

An Example Hierarchy

Detailed Dimensions: These are the lowest level of detail in the dataset. Typically these are reserved for low-level analysis and wouldn’t appear in a data visualization, though you might have the ability to drill through to this level of data.
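To make these definitions concrete, here is a small sketch using made-up car sales records. The data, the field names, and the sumBy helper are all hypothetical; the point is only to show a metric (units) being aggregated at two levels of a make/model hierarchy.

```javascript
// Hypothetical sales records. `make` and `model` are dimensions that form a
// hierarchy (each model belongs to one make); `units` is a metric.
const sales = [
  { make: 'Ford', model: 'Mustang', units: 3 },
  { make: 'Ford', model: 'Focus', units: 5 },
  { make: 'Dodge', model: 'Charger', units: 2 },
  { make: 'Ford', model: 'Mustang', units: 4 }
];

// Sum a metric for each distinct value of a dimension.
function sumBy(rows, dimension, metric) {
  return rows.reduce(function (acc, row) {
    const key = row[dimension];
    acc[key] = (acc[key] || 0) + row[metric];
    return acc;
  }, {});
}

const unitsByMake = sumBy(sales, 'make', 'units');   // top of the hierarchy
const unitsByModel = sumBy(sales, 'model', 'units'); // one level down

console.log(unitsByMake);  // { Ford: 12, Dodge: 2 }
console.log(unitsByModel); // { Mustang: 7, Focus: 5, Charger: 2 }
```

The individual rows in sales play the role of the detailed level: you would rarely chart them directly, but they are what you drill through to.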

With that out of the way, let’s talk about what you all came here for: taking data from these datasets and manipulating it in JavaScript. For this exercise I am going to be using D3 (https://d3js.org/). D3, or Data-Driven Documents, is a fantastic JavaScript library that was released several years ago. It uses DOM manipulation to create visual representations of data in your HTML documents. We’ll be using it here to create some SVGs that present a dataset read in from a CSV (comma-separated values) file. We’ll also be using a dataset from BuzzFeed from last year on fake news because, come on, who doesn’t love to scrutinize fake news (https://github.com/BuzzFeedNews/2016-12-fake-news-survey).

A quick word of warning: I am going to be covering mainly the technical aspects of visualizing data with JavaScript. I will not be discussing the nuances of design, analytics, data quality, or anything of the like. The visualizations we will be making will, in essence, be pretty useless from an analytics standpoint but will serve as a foundation for building more meaningful visualizations yourself.

Getting Started

Before anything else, we are going to want to include D3 in our project. So go ahead and create an HTML document and add D3 with a script tag. You can link to a local copy or pull it from a CDN.

Once you have D3, we can start pulling in our data. As I mentioned above, we are using the CSV from the BuzzFeed GitHub repo. Save that CSV into the same directory as your HTML document and we’ll pull it in as such:

d3.csv('responses.csv', function(data) {
  // data processing
});

where responses.csv is the dataset we downloaded. The next thing you want to do is familiarize yourself with the data we are working with here. Put a breakpoint inside the callback we just added and let’s examine it.

An object in our dataset

First of all, it is worth noting that the dataset we are working with is 18K records. Obviously this fits our criteria for data that should be reduced to a more manageable size. We have a few fields that can serve as dimensions. Let’s start with headline: a letter indicating which article was shown to a particular individual to determine if they recalled the article. In order to summarize this we are going to use JavaScript’s reduce function for arrays.
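A minimal sketch of this kind of reduce follows. It assumes each record exposes the letter in a headline property; countByHeadline and sampleRows are names made up for illustration, and in the real page you would pass in the data array handed to the d3.csv callback.

```javascript
// Count how many records were collected for each headline.
// `a` is the accumulator object; `row` is one record from the dataset.
function countByHeadline(data) {
  return data.reduce(function (a, row) {
    if (a[row.headline]) {
      a[row.headline].count += 1;      // headline already seen: increment
    } else {
      a[row.headline] = { count: 1 };  // first occurrence: start at 1
    }
    return a;
  }, {});
}

// Stand-in rows for illustration; the real rows come from responses.csv.
const sampleRows = [{ headline: 'A' }, { headline: 'B' }, { headline: 'A' }];
console.log(countByHeadline(sampleRows)); // { A: { count: 2 }, B: { count: 1 } }
```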

What we have done here is run through the entire dataset checking for headlines. If the headline is already part of our object (a), we increment the metric (count) for that object (headline). If it is not, we create a new object and initialize its count to 1. Let’s add another breakpoint into our code and examine the object that gets created.

Our reduced dataset

Now we have a much simpler dataset to work with. We have 11 objects that each have their own count that corresponds to the given headline.
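If you would rather work with an array, for example to sort the headlines by count, one possible approach is to flatten the reduced object. This is only a sketch: toSortedArray and the sample object are hypothetical, and the counts shown are made up rather than taken from the dataset.

```javascript
// Flatten { A: { count: 2 }, ... } into [{ headline: 'A', count: 2 }, ...]
// and sort it from the highest count to the lowest.
function toSortedArray(reduced) {
  return Object.keys(reduced)
    .map(function (key) {
      return { headline: key, count: reduced[key].count };
    })
    .sort(function (x, y) {
      return y.count - x.count;
    });
}

// Stand-in for the reduced dataset; the counts are made up.
const reducedSample = { A: { count: 2 }, B: { count: 5 }, C: { count: 1 } };
console.log(toSortedArray(reducedSample));
// [ { headline: 'B', count: 5 }, { headline: 'A', count: 2 }, { headline: 'C', count: 1 } ]
```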

That’s it for now. Please let me know in the comments if you are interested in this topic and if you would like to see more. If you are interested in seeing a working example, it is available on GitHub: https://github.com/ignoreintuition/Crushing-Data