Blog

I am constantly amazed by the energy and momentum around data science. Only a few years ago, I would be met with a blank stare when I told someone I planned on going to grad school for machine learning. Today, there is no need for my “it's like computer science, linear algebra, and statistics had a combined love child” analogy as most people instantly respond with “Oh, like AI!”

Faster is different. It sounds strange at first because we expect faster to be better. We expect faster to be more. If we can analyze data faster, we can analyze more data. If we can network faster, we can network with more people. Faster is more, which is better, but more is different.

The rise of Big Data created the need for data applications to consume data residing in disparate databases with wildly differing schemas. The traditional approach to analyzing this sort of data has been to warehouse it: to move all the data into one place under a common schema so it can be analyzed.
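To make the warehousing idea concrete, here is a minimal Python sketch. It assumes two hypothetical sources (`crm_rows` and `billing_rows`) that describe the same kind of entity with different field names and units. Each source needs its own hand-written mapping onto the common schema, which hints at why the approach gets harder as sources multiply.

```python
# Hypothetical source records: two databases describe customers
# with different field names and different units.
crm_rows = [{"cust_name": "Ada", "spend_usd": 120.0}]
billing_rows = [{"customer": "Grace", "spend_cents": 4550}]


def to_common_schema(crm, billing):
    """Map rows from both hypothetical sources onto one shared schema."""
    unified = []
    for row in crm:
        unified.append({"name": row["cust_name"], "spend": row["spend_usd"]})
    for row in billing:
        # Billing stores cents, so convert to dollars to match the common schema.
        unified.append({"name": row["customer"], "spend": row["spend_cents"] / 100})
    return unified


warehouse = to_common_schema(crm_rows, billing_rows)
print(warehouse)  # [{'name': 'Ada', 'spend': 120.0}, {'name': 'Grace', 'spend': 45.5}]
```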

This approach is no longer feasible given the volume of data being produced, the variety of data requiring specific optimized schemas, and the velocity at which new data is created. A much more promising approach is based on semantically linked data, which models data as a graph (a network of nodes and edges) instead of as a series of relational tables.
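The graph model can be illustrated with a few lines of plain Python. This is a sketch under simplifying assumptions: facts are stored as hypothetical (subject, predicate, object) triples in a list, not in RDF or any particular triple store, and the node and predicate names are made up. The point it shows is that facts from different sources can be added as-is and then joined by walking edges, with no shared table schema.

```python
from collections import defaultdict

# A tiny, hypothetical graph stored as (subject, predicate, object) triples.
triples = [
    ("ada", "works_at", "acme"),         # e.g. from an HR database
    ("acme", "located_in", "oklahoma"),  # e.g. from a company registry
    ("ada", "wrote", "paper_1"),         # e.g. from a publications index
]

# Index outgoing edges by node so the graph can be walked.
edges = defaultdict(list)
for subject, predicate, obj in triples:
    edges[subject].append((predicate, obj))


def follow(node, *predicates):
    """Follow a chain of predicates from a node and return the nodes reached."""
    frontier = [node]
    for predicate in predicates:
        frontier = [o for n in frontier for p, o in edges[n] if p == predicate]
    return frontier


# Where is Ada's employer located? This joins facts from two sources.
print(follow("ada", "works_at", "located_in"))  # ['oklahoma']
```

Production linked-data systems typically use standards such as RDF for the triples and SPARQL for queries; the list-and-dict version above only shows the shape of the idea.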

One thing we love doing at Exaptive – aside from creating tools that facilitate innovation – is hiring intelligent, creative, and compassionate people to fill our ranks. Frank Evans is one of our data scientists. He was invited to present at the TEDxOU event on January 26, 2018.

A crucial aspect that sets a data application apart from an ordinary visualization is interactivity. In an application, visualizations can interact with each other; for example, clicking on a point in a scatterplot may send the corresponding data to a table. Visualizations in an application are also enhanced with simple filtering tools, e.g., selections in a list can update the results shown in a heat map.
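One common way to link views is a simple listener (observer) pattern: the source view keeps a list of callbacks and calls each one when the user interacts with it. The sketch below assumes hypothetical `Scatterplot` and `Table` classes that stand in for real chart components; it is not how the Exaptive Studio wires components, only an illustration of "clicking a point sends its data to a table."

```python
class Scatterplot:
    """Hypothetical stand-in for a scatterplot that emits selection events."""

    def __init__(self, points):
        self.points = points
        self.listeners = []

    def on_select(self, callback):
        # Register a linked view that wants to hear about clicks.
        self.listeners.append(callback)

    def click(self, index):
        # Send the clicked point's data to every linked view.
        for callback in self.listeners:
            callback(self.points[index])


class Table:
    """Hypothetical table view that displays whatever row it receives."""

    def __init__(self):
        self.rows = []

    def show(self, row):
        self.rows = [row]


plot = Scatterplot([{"x": 1, "y": 2, "label": "A"}, {"x": 3, "y": 4, "label": "B"}])
table = Table()
plot.on_select(table.show)  # link the two views
plot.click(1)
print(table.rows)  # [{'x': 3, 'y': 4, 'label': 'B'}]
```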

You can already try some linked visualizations to find the perfect taco. Now, we'll look at how some simple filtering elements enhance visualization, using a tech stock exploration xap I built over a couple of days. (A xap is what we call a data application built with the Exaptive Studio.) A few simple but flexible interactive elements can help transform ordinary visualizations into powerful, insightful data applications. Humble checkboxes and lists help produce extra value from charts and plots.
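Under the hood, a checkbox group and a list selection usually reduce to set-membership tests on the rows a chart receives. The sketch below assumes hypothetical stock rows (the tickers and prices are placeholders, not real market data) and treats an empty selection as "no filter" for that control, which is one reasonable design choice among several.

```python
# Hypothetical rows; tickers and prices are placeholders, not real market data.
rows = [
    {"ticker": "AAA", "sector": "hardware", "year": 2016, "close": 10.0},
    {"ticker": "AAA", "sector": "hardware", "year": 2017, "close": 12.0},
    {"ticker": "BBB", "sector": "software", "year": 2016, "close": 20.0},
    {"ticker": "BBB", "sector": "software", "year": 2017, "close": 18.0},
]


def apply_filters(rows, checked_sectors, selected_years):
    """Keep rows whose sector is checked and whose year is selected in the list.

    An empty selection means that control does not filter anything.
    """
    return [
        r
        for r in rows
        if (not checked_sectors or r["sector"] in checked_sectors)
        and (not selected_years or r["year"] in selected_years)
    ]


visible = apply_filters(rows, checked_sectors={"software"}, selected_years={2017})
print([(r["ticker"], r["year"]) for r in visible])  # [('BBB', 2017)]
```

The filtered list is what would be passed on to the chart, so every redraw reflects the current checkbox and list state.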

Parallel coordinates is one way to visually compare many variables at once and to see the correlations between them. Each variable is given a vertical axis, and the axes are placed parallel to each other. Each sample is drawn as a line that crosses every axis at that sample's value for the variable, showing how the sample compares across the variables.
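The geometry is straightforward to compute. A minimal sketch, assuming each axis is scaled independently to the range [0, 1] using that variable's minimum and maximum (min-max scaling is a common choice, not the only one): each sample becomes a list of (axis position, height) vertices, and drawing a polyline through them gives the sample's line. The sample values here are made up for illustration.

```python
def axis_ranges(samples, axes):
    """Min and max of each variable, used to scale its axis."""
    return {v: (min(s[v] for s in samples), max(s[v] for s in samples)) for v in axes}


def polyline(sample, axes, ranges):
    """Return (axis_index, height) vertices for one sample.

    Each variable sits on its own vertical axis at x = axis_index, and the
    sample's value is rescaled to [0, 1] so variables with very different
    units can share one plot.
    """
    points = []
    for i, var in enumerate(axes):
        lo, hi = ranges[var]
        # If every sample has the same value, place the point mid-axis.
        height = (sample[var] - lo) / (hi - lo) if hi != lo else 0.5
        points.append((i, height))
    return points


# Made-up values for illustration.
samples = [
    {"mpg": 10, "cyl": 8, "hp": 200},
    {"mpg": 30, "cyl": 4, "hp": 100},
    {"mpg": 20, "cyl": 6, "hp": 150},
]
axes = ["mpg", "cyl", "hp"]
ranges = axis_ranges(samples, axes)
print(polyline(samples[0], axes, ranges))  # [(0, 0.0), (1, 1.0), (2, 1.0)]
```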

Previously, I wrote about how it's possible to create a basic network diagram application from just three components in the Exaptive Studio. Many users will require something more scalable from a data application, and fortunately the Studio allows for the creation of something like our Parallel Coordinates Explorer. A parallel coordinates diagram can often become cluttered, but our Parallel Coordinates component lets users rearrange axes and highlight samples in the data to filter the view.
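The two decluttering interactions can be sketched as plain functions. This assumes "highlighting" works like a range brush on one axis (samples inside the range are highlighted, the rest are dimmed) and that reordering means moving one axis to a new position; the real component's behavior may differ, and the sample values are made up.

```python
def brush(samples, var, low, high):
    """Split samples by whether their value on axis `var` lies in [low, high].

    Returns (highlighted, dimmed) so the chart can draw the dimmed lines faintly.
    """
    highlighted = [s for s in samples if low <= s[var] <= high]
    dimmed = [s for s in samples if not low <= s[var] <= high]
    return highlighted, dimmed


def move_axis(axes, var, new_index):
    """Return a new axis order with `var` moved to position `new_index`."""
    order = [a for a in axes if a != var]
    order.insert(new_index, var)
    return order


samples = [
    {"name": "A", "mpg": 12, "cyl": 8},
    {"name": "B", "mpg": 31, "cyl": 4},
    {"name": "C", "mpg": 22, "cyl": 4},
]
highlighted, dimmed = brush(samples, "mpg", 20, 35)
print([s["name"] for s in highlighted])  # ['B', 'C']
print(move_axis(["mpg", "cyl", "hp"], "hp", 0))  # ['hp', 'mpg', 'cyl']
```

Moving two axes next to each other matters because a parallel coordinates plot only shows the relationship between adjacent axes directly.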

It helps to use some real data to illustrate. One dataset that many R aficionados may be familiar with is the mtcars dataset. It's a list of 32 different cars, or samples, with 11 variables for each car. The list is derived from a 1974 issue of Motor Trend magazine, which compared a number of stats across cars of the era, including the number of cylinders in the engine, displacement (the size of the engine, in cubic inches), economy (in miles per gallon of fuel), and power output.

Let's say we're interested in fuel economy and want to find out which characteristics could signify a car with good fuel economy. Anecdotally, you may have heard that larger engines generate more power, but that smaller engines deliver better fuel economy. You may also have heard that four-cylinder engines are typically smaller than engines with more cylinders. Does this hold true for Motor Trend's mtcars data?
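The same two questions can also be checked numerically, which is a useful sanity check on what the chart shows. A minimal sketch, assuming the data has been loaded as a list of dicts keyed by the mtcars column names `cyl`, `disp`, and `mpg`: it averages a column per cylinder count and computes the Pearson correlation between displacement and fuel economy. The rows below are placeholders in the shape of mtcars, not actual values from the dataset.

```python
def mean_by(rows, key, field):
    """Average `field` within each group of rows sharing the same `key`."""
    groups = {}
    for r in rows:
        groups.setdefault(r[key], []).append(r[field])
    return {k: sum(v) / len(v) for k, v in groups.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Placeholder rows in the shape of mtcars; substitute the full 32-row dataset.
rows = [
    {"cyl": 4, "disp": 100, "mpg": 30},
    {"cyl": 4, "disp": 120, "mpg": 26},
    {"cyl": 8, "disp": 350, "mpg": 15},
    {"cyl": 8, "disp": 400, "mpg": 13},
]
print(mean_by(rows, "cyl", "mpg"))   # {4: 28.0, 8: 14.0}
print(mean_by(rows, "cyl", "disp"))  # {4: 110.0, 8: 375.0}
r = pearson([row["disp"] for row in rows], [row["mpg"] for row in rows])
print(r < 0)  # True: larger displacement goes with lower mpg in these rows
```

A negative correlation supports the anecdote, while the parallel coordinates view shows which individual cars break the pattern.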

To find out, we'll use a xap (what we call a data application made with Exaptive) that lets a user upload either a CSV or Excel file and generates a parallel coordinates visualization from the data. But a data application is more than a data visualization. We're going to make a data application that selects and filters the data for rich exploration.
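Before a parallel coordinates chart can be drawn from an uploaded file, the app has to decide which columns become axes. A minimal sketch, handling CSV only (Excel would need a separate reader library) and assuming that a column becomes an axis when every cell in it parses as a number; text columns such as car names are kept as labels. The function names are hypothetical.

```python
import csv
import io


def load_rows(text):
    """Parse CSV text into dicts, converting numeric-looking cells to float.

    Cells that are not numbers (for example, car names) stay as strings.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {}
        for key, value in record.items():
            try:
                row[key] = float(value)
            except (TypeError, ValueError):  # None for missing cells, or text
                row[key] = value
        rows.append(row)
    return rows


def numeric_axes(rows):
    """Columns that are numeric in every row become parallel-coordinates axes."""
    if not rows:
        return []
    return [k for k in rows[0] if all(isinstance(r[k], float) for r in rows)]


sample = "model,mpg,cyl\nCar A,21.0,6\nCar B,30.4,4\n"
rows = load_rows(sample)
print(numeric_axes(rows))  # ['mpg', 'cyl']
```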

In our dataflow programming environment, we use a few components to ingest the data and send a duffle of data to the visualization. Then a handful of helper components come together to make an application with which an end user can explore the data.
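The dataflow idea, where each component takes data in, transforms it, and passes it on, can be sketched with ordinary functions. This is a simplified illustration, not the Exaptive Studio API: the "duffle" is just a Python dict, the component names are hypothetical, and the components are chained in a straight line rather than wired as a general graph.

```python
def ingest(duffle):
    """Hypothetical ingest component: split raw CSV lines into records."""
    header, *lines = duffle["raw"].strip().splitlines()
    keys = header.split(",")
    duffle["records"] = [dict(zip(keys, line.split(","))) for line in lines]
    return duffle


def select_columns(duffle):
    """Hypothetical helper component: keep only the columns the user picked."""
    cols = duffle["selected"]
    duffle["records"] = [{c: r[c] for c in cols} for r in duffle["records"]]
    return duffle


def visualize(duffle):
    """Stand-in for the visualization: record how many lines it would draw."""
    duffle["drawn"] = len(duffle["records"])
    return duffle


def run_pipeline(components, duffle):
    # Each component's output becomes the next component's input.
    for component in components:
        duffle = component(duffle)
    return duffle


result = run_pipeline(
    [ingest, select_columns, visualize],
    {"raw": "model,mpg,cyl\nA,21,6\nB,30,4", "selected": ["mpg"]},
)
print(result["records"], result["drawn"])  # [{'mpg': '21'}, {'mpg': '30'}] 2
```

Keeping each step as a separate component means a helper (say, a different filter) can be swapped in without touching the ingest or visualization steps.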

Often, when we're looking at a mass of data, we're trying to get a sense of the relationships within that data. Who is the leader in this social group? What is the common thread between different groups of people? Such relationships can be represented graphically in hundreds of ways, but few are as powerful as the classic network diagram.
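A question like "who is the leader in this social group?" can be approximated by counting connections, a measure known as degree centrality. The sketch below uses a hypothetical edge list of friendships; degree is only one of several centrality measures, and treating the best-connected person as the "leader" is a simplification. Graph libraries such as NetworkX provide a normalized version as `networkx.degree_centrality`.

```python
from collections import Counter

# Hypothetical friendships in a small social group (undirected edges).
edges = [("ana", "ben"), ("ana", "cruz"), ("ana", "dee"), ("ben", "cruz"), ("dee", "eli")]


def degree_centrality(edges):
    """Count each person's connections, a simple proxy for who is central."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


degree = degree_centrality(edges)
leader, links = degree.most_common(1)[0]
print(leader, links)  # ana 3
```

In a network diagram, the same counts are often mapped to node size so the most connected people stand out visually.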