Let’s agree that interaction is essential to exploratory data analysis. But big data makes real-time response rates technically difficult to achieve. This is a problem for programmers.

Visualizing every data point leads to over-plotting and visual clutter, which can overwhelm users’ cognitive capacity. This is a problem for the designer. What are the options for resolving it?

Render each data point as a single pixel, which maximizes screen real estate.

Where data points overlap, use a ‘jittering’ algorithm to offset them slightly.

Use transparency (aka alpha blending) to denote density.
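Jittering, for example, can be sketched in a few lines. This is a minimal illustration using the standard library; the `spread` and `seed` parameters are illustrative choices, not from the post:

```python
import random

def jitter(values, spread=0.3, seed=0):
    """Offset each value by a small uniform random amount.

    `spread` bounds the offset; `seed` makes the result reproducible.
    (Both parameter names are illustrative.)
    """
    rng = random.Random(seed)
    return [v + rng.uniform(-spread, spread) for v in values]

# Five points that would over-plot at x = 1.0 and x = 2.0.
x = [1.0, 1.0, 1.0, 2.0, 2.0]
jx = jitter(x)
# Each jittered point stays within `spread` of its original position,
# but the exact overlaps are broken up.
```

In a plotting library, the jittered coordinates would then be drawn with a low alpha value so that dense regions appear darker.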

All of these options have inherent scalability limits.

Reducing the data to a smaller, derived dataset makes it more manageable. Strategies include:

filtering: A subset of the data is selected, whether at random or by some algorithm. The drawback to this approach is that the subset may still be too large to visualize, and important outliers and patterns can be lost.
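As a minimal sketch of the two selection styles just described (the dataset and thresholds here are made up for illustration):

```python
import random

# Toy dataset: 100,000 numeric measurements.
random.seed(42)
data = [random.gauss(0, 1) for _ in range(100_000)]

# Algorithmic filter: keep only values inside a window of interest.
# Anything outside the window, including extreme outliers, is dropped.
subset = [v for v in data if -0.5 <= v <= 0.5]

# Random filter: keep each record with 1% probability.
thinned = [v for v in data if random.random() < 0.01]
```

Note that even the filtered `subset` can still hold tens of thousands of points, which is exactly the scalability caveat above.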

sampling: The data is sorted, and data points are then selected at a regular interval. This requires significant cleanup work, and here again outliers and patterns can get lost.
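The sort-then-select-at-an-interval idea can be sketched like this (the helper name and interval are illustrative):

```python
def systematic_sample(values, k):
    """Sort the data, then keep every k-th point (illustrative helper)."""
    return sorted(values)[::k]

data = [9, 3, 7, 1, 5, 8, 2, 6, 4, 0]
sample = systematic_sample(data, 3)
# The sorted data is [0, 1, ..., 9]; taking every 3rd element
# keeps [0, 3, 6, 9] and drops everything in between.
```

Any point that falls between the selected intervals simply disappears, which is how outliers get lost.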

binned aggregation: An interval is chosen, and within each interval, all data values are replaced by a representative value, often the mean.
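Binned aggregation with the mean as the representative can be sketched as follows (a minimal illustration; the function and parameter names are ours, not from the post):

```python
from collections import defaultdict

def binned_mean(values, width):
    """Group values into intervals of `width` and replace each
    group by its mean (illustrative helper)."""
    bins = defaultdict(list)
    for v in values:
        bins[int(v // width)].append(v)
    # Map each bin's left edge to the mean of the values it contains.
    return {b * width: sum(vs) / len(vs) for b, vs in sorted(bins.items())}

values = [0.1, 0.4, 0.9, 1.2, 1.8, 2.5]
result = binned_mean(values, 1.0)
# Three bins: [0, 1) holds 0.1, 0.4, 0.9; [1, 2) holds 1.2, 1.8;
# [2, 3) holds 2.5. Each is collapsed to a single mean value.
```

Unlike filtering or sampling, every point contributes to the output, so density information is preserved even though individual points are not.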