Big Data Visualization: 3 Errors To Avoid

Avoid common visualization mistakes. Here's advice on how to clarify goals and get better results.

There has been a lot of talk about data visualization lately -- almost as much as there has been about big data. We're told that visualization is the best way (or the only way) to understand data, and that if we're not visualizing it, we're missing out.

Visualization is a great way to gain and share insight, but many big data teams are doing it the wrong way. How can it be done wrong? It turns out there are several ways to undermine data visualizations. Let's look at a few of the most common mistakes.

Error 1: Displaying all the data
Despite what you were told in school, most people don't care about seeing your work. They don't care about how much data you can process every day or how big your Hadoop cluster is. Customers and internal users want specific, relevant answers, and the sooner they can get those answers, the better. The closer you can come to giving them exactly what they want, the less effort they have to expend looking for answers. Any irrelevant data on the page makes finding the relevant information more difficult; irrelevant data (no matter how valid) is noise.

Noise is particularly prevalent in dashboards, where the guiding philosophy is often "Show the status of everything." But most performance measures are normal (and boring), not noteworthy. Showing all the normal conditions gives the abnormal measures a lot of places to hide.

A better dashboard approach is to show only what's interesting or important. Prioritize what matters, what's unexpected, and what's actionable, and deemphasize everything else. Deep dives into data can be important, but dashboards aren't the place for that. Broad overviews of non-actionable data are better handled as reports.
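To make the "show only what's interesting" idea concrete, here is a minimal Python sketch of a dashboard filter. It assumes each metric comes with a hypothetical normal range (the `Metric` class and its `low` and `high` fields are illustrative, not part of any particular dashboard tool). Metrics inside their range are dropped, and the rest are ranked by how far outside the range they fall, so the most unusual item appears first.

```python
from dataclasses import dataclass

@dataclass
class Metric:
    name: str
    value: float
    low: float   # lower bound of a hypothetical "normal" range
    high: float  # upper bound of a hypothetical "normal" range

    def __post_init__(self):
        if self.high <= self.low:
            raise ValueError(f"{self.name}: high must be greater than low")

    def deviation(self) -> float:
        """Distance outside the normal range as a fraction of its width; 0.0 if normal."""
        width = self.high - self.low
        if self.value < self.low:
            return (self.low - self.value) / width
        if self.value > self.high:
            return (self.value - self.high) / width
        return 0.0

def dashboard_items(metrics):
    """Keep only abnormal metrics, most unusual first; normal ones are left for reports."""
    abnormal = [m for m in metrics if m.deviation() > 0.0]
    return sorted(abnormal, key=lambda m: m.deviation(), reverse=True)

# Illustrative metrics; the names and ranges are made up for this example.
metrics = [
    Metric("checkout_latency_ms", 950, 100, 400),
    Metric("error_rate_pct", 0.4, 0.0, 1.0),
    Metric("orders_per_min", 30, 40, 120),
]

for m in dashboard_items(metrics):
    print(m.name, round(m.deviation(), 2))
```

Here the error rate is within its range and never reaches the dashboard; only the slow checkout and the low order volume are shown.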

Error 2: Displaying the wrong data
This error is as dangerous as the first one. Showing subsets of information is fine, as long as the data relationships are relevant. If you care about sales, for example, you may also care about sales per region or sales over time. Consider how the data will be used to make decisions.
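Sales per region and sales over time are two simple aggregations of the same records. A minimal Python sketch, assuming hypothetical records of the form (region, month, amount); each result could feed its own small, focused graph rather than one combined chart.

```python
from collections import defaultdict

# Hypothetical sales records: (region, month as "YYYY-MM", amount)
sales = [
    ("East", "2024-01", 120.0),
    ("West", "2024-01", 80.0),
    ("East", "2024-02", 150.0),
    ("West", "2024-02", 95.0),
]

def total_by(records, key_index):
    """Sum the amount field (index 2), grouped by the field at key_index."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[2]
    return dict(totals)

by_region = total_by(sales, 0)                   # input for a sales-per-region bar graph
by_month = dict(sorted(total_by(sales, 1).items()))  # chronological, for a line graph

print(by_region)
print(by_month)
```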

Showing several closely related graphs can be a nice compromise between showing too much in one graph and not showing enough overall. A few clean, clear graphs are usually better than a single complicated data visualization.

Error 3: Representing data poorly
Even when you're graphing the right data, you can still get it wrong. Most exotic graph types are seldom seen, because they don't work very well. The vast majority of visualization needs are well addressed with bar and line graphs, scatter plots, and (if done well) pie graphs.

Think about the key relationships among data fields, and consider putting those fields on the axes. Group by category, and then order the data by time or magnitude or importance. (Alphabetization is most useful when nothing else matters.) Use color for category, not magnitude; you can use brightness or saturation to illustrate magnitude. Use labels and other marks selectively to call attention without cluttering.
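The grouping, ordering, and color advice above can be sketched in Python using only the standard library's `colorsys` module. The data, the choice to rank categories by their totals, and the specific lightness range (0.85 down to 0.35) are illustrative assumptions; the point is that hue separates categories while lightness alone carries magnitude.

```python
import colorsys

# Hypothetical data: (category, item, value)
rows = [
    ("Hardware", "Servers", 40.0),
    ("Software", "Licenses", 55.0),
    ("Hardware", "Storage", 70.0),
    ("Software", "Support", 20.0),
]

def order_rows(rows):
    """Group rows by category, then order by magnitude, largest first.

    Categories are ranked by their total value, items within a category by
    their own value; names are used only to break ties.
    """
    totals = {}
    for category, _, value in rows:
        totals[category] = totals.get(category, 0.0) + value
    return sorted(rows, key=lambda r: (-totals[r[0]], r[0], -r[2], r[1]))

def lightness(value, vmin, vmax):
    """Map a magnitude to HLS lightness: pale (0.85) at the minimum, dark (0.35) at the maximum."""
    spread = (vmax - vmin) or 1.0  # avoid dividing by zero when all values are equal
    return 0.85 - 0.5 * (value - vmin) / spread

def colors_for(rows):
    """Hue encodes category, lightness encodes magnitude. Returns {item: '#rrggbb'}."""
    categories = sorted({r[0] for r in rows})
    hues = {c: i / len(categories) for i, c in enumerate(categories)}  # evenly spaced hues
    values = [r[2] for r in rows]
    vmin, vmax = min(values), max(values)
    colors = {}
    for category, item, value in rows:
        r, g, b = colorsys.hls_to_rgb(hues[category], lightness(value, vmin, vmax), 0.6)
        colors[item] = "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
    return colors

for category, item, value in order_rows(rows):
    print(category, item, value, colors_for(rows)[item])
```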

Good design: Think and plan first
The best way to avoid all these errors is to focus on your goals first. Before considering how your visualizations should look, think about the following questions, in this order.

What actions do you need to enable (or what do we care about)?

What decisions do you need to inform (and what are we going to do about it)?

What questions do you need to ask?

What data do you need to see?

What is the best structure for revealing the important relationships in the data?

What data do you need to highlight?

As you answer these questions, you can begin to design and implement the right visualizations using the right data. It's likely that you'll have to make changes. This is a good thing. Iterate, test, try different approaches, test some more, and iterate again. A deliberate, user-oriented design approach will yield effective, efficient, and useful data visualizations.

Noah Iliinsky is a visualization expert at IBM. He is coauthor of Designing Data Visualizations and technical editor of and a contributor to Beautiful Visualization, both published by O'Reilly Media.


Tips are useful when they help people avoid some common mistakes, and this article includes some good ones. But tips-based approaches take shortcuts that bypass a rock-solid understanding of fundamental principles. In data visualization, those principles are clearly and comprehensively explained by Edward Tufte in his book, The Visual Display of Quantitative Information. His grounding assumption is that statisticians lack an understanding of graphics, and graphic artists don't understand statistics. It is unlikely that one can present statistics well without understanding the fundamentals, just as it is unlikely that urban planners can succeed without understanding the work of William H. Whyte.

I'm really glad the author made this point, asking: "What decisions do you need to inform (and what are we going to do about it)?"

Does the data help people make the decision at hand? And can your data experts grasp the nuances of the decision? This is another example of why having data science/analytics experts from varying backgrounds proves valuable.

I subscribe to the Stephen Few school of thought: good data visualizations are simple, clear visualizations. Noah has it right that showing too much information and using overly complicated formats is a rookie mistake. Avoid extraneous eye candy, like 3D effects and coloration without meaning. For more on effective visualization, check out our "Top 15 Data Visualization Tips."
