5 misconceptions about visualization

Last month, I had the pleasure of spending a week at the Census Bureau as a “visiting scholar.” They’re looking to boost their visualization efforts across all departments, and I put in my two cents on how to go about doing it. For being a place where there is so much data, the visual side of things is still in the early stages, generally speaking.

During all the meetings, there were recurring themes about what visualization is and what it is used for. Some people really got it, but others were new to the subject, and we ran into a few misconceptions that I think are worth repeating.

Here we go, in no particular order.

Visualization is for making data flashy

This is probably the most common one. It’s easy to look at a lot of the best visualization projects and want your data to look and feel the same way. So people ask, “I have such and such data. Is there a visualization technique that I can use to make it look cooler?”

Well, maybe. Not if you only have five data points though. You can spend a lot of time with icons or fancy print, but the graphics are interesting because the data that the visuals represent is interesting.

For example, I mapped the growth of Walmart a while back (it’s amazing how much mileage I get out of this graphic), and people seem to like it because of the organic growth pattern. It starts in one area and spreads outward like a virus.

Okay, compared to Toby Segaran’s original, I did add some interactive flourishes, but even without, the growth pattern is what makes the animation interesting.

By comparison, here’s a map in the same style as my Walmart one, but it shows the spread of Target. It’s not nearly as fun to watch, because Target took a more opportunistic approach to expansion. Locations pop up kind of randomly at times. It’s mostly interesting as a contrast to the Walmart map.

It should always be data first. Certain graphics get eyeballs because they show something that wouldn’t be seen in a table.

Software does everything

There are a lot of options for visualization, and the “best” one will change depending on who you ask.

Personally, I use a lot of R and have a lot of fun in Illustrator. More recently, I’ve been working with JavaScript. Flexibility is a huge plus for me, and I like to have full control over how my graphics look and how the interactive ones work. Most of what I do, though, is present data to a wider audience. If I were an analyst tasked with digging through a large dataset, I might take a different route before I make something custom.

My main point is that there is no one piece of software that will do everything for you.

Some software is good for analysis, some is good for specific types of analysis, and some is good for storytelling.

The more information in a single graphic, the better

A misstep a lot of people take when they’re trying to advance “beyond Excel” is to layer too much information on top of their basic graphic. I’m all for providing context and highlighting interesting spots in your data, but at some point it’s better to split your one chart into two or three charts.

Some people try to be clever by using multiple axes on a single plot or multiple visual cues in a single chart to save space. Again, this works sometimes. A lot of the time it doesn’t. Oftentimes, simple and clear is better than clever and compact.

My favorite test is to show a graphic to someone who doesn’t know the data and isn’t a visualization expert and see what they take away from the visual.

Visualization is too biased to be useful

There’s a certain amount of subjectivity that goes into any visualization as you choose what data to show and how to show it. By focusing on one part of the data, you might inadvertently obscure another. However, if you’re careful, get to know the data that you’re dealing with, and stay true to what’s there, then it should be easier to overcome bias.

After all, statistics is somewhat subjective, too. You choose what to analyze, what methods to use, and what to point out in reports.

News organizations, for example, have to do this all the time. They get a dataset and decide what story they want to tell (or find what story the data has to tell). Browse through graphics by The New York Times, and you can see how to add a layer of information that objectively describes what the data is about.

It has to be exact

If you’re using visualization to show the exact value of every single data point, along with every standard error, you’re probably using it wrong. Accuracy is important. Yes. But visualization is less about the individual values and more about the distribution of them over time and space. You’re looking for (or showing) patterns. You’re comparing and contrasting.

If all you care about are individual data points, you might as well put them in a table.
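
To make the table-versus-pattern point a little more concrete, here is a rough Python sketch. The values are invented for illustration (they are not Census data), and `text_histogram` is just a hypothetical helper name. It assumes integer values and a fixed bin width. Printed as a plain list, the numbers are exact but shapeless; binned into bars, the clustering and the lone outlier show up right away.

```python
from collections import Counter


def text_histogram(values, bin_width=10):
    """Bin integer values and return one text bar per bin, lowest bin first."""
    # Map each value to the lower edge of its bin, then count per bin.
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    # Walk every bin from the lowest to the highest, including empty ones,
    # so gaps in the data stay visible.
    for start in range(min(counts), max(counts) + bin_width, bin_width):
        label = f"{start:>3}-{start + bin_width - 1:<3}"
        lines.append(f"{label} {'#' * counts[start]}".rstrip())
    return lines


# Invented values, for illustration only.
values = [12, 15, 21, 23, 24, 27, 28, 31, 33, 34, 35, 36, 38, 42, 44, 47, 55, 71]

# The "table" view: every exact value, no visible shape.
print(values)

# The "visual" view: less exact, but the pattern is obvious.
for line in text_histogram(values):
    print(line)
```

The bars throw away the exact values, and that is the trade: you give up precision on individual points to see where the data piles up and where it doesn’t.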

13 Comments

The first issue is the most prevalent. People want to make their data “cool”, without understanding that what makes visuals “cool” generally makes them harder to read. Faux 3D, advanced color and shading effects, and the like. And the software makers are all too happy to provide such effects.

Data visualization seems like it would take a while to become good at. I see your book is quite comprehensive. Are there more abbreviated resources out there to get a person started? Do you do infographics commercially for clients?

Error #1(b) Using visualization to make the analyst look cool. (Note that I said “analyst” and not “analysis.”) As you say, the focus should be on the data, not on the person or software or technique that visualizes the data.

This, in fact, is one of the problems I have with animations, including the Walmart growth and Hans Rosling’s TED example (and animated bubble plots, in general). These animations are fun to watch, and they give the viewer an important summary of the “big picture,” but not everyone understands that these flashy graphics need to be supplemented by other tables/graphs when someone wants to roll up his sleeves and really take a close look at the data prior to modeling it.

Another point I want to make is the importance of conventions and standard graphic displays. At FlowingData, we celebrate the new, the unique, and the different. But when you are trying to communicate data, there is real value in mundane graphs like bar charts, scatter plots, choropleth maps, and the like. Because these graphs are familiar, the viewers can instantly focus on the data with fewer distractions. I hope the Census Bureau starts with good standard graphics, even as they explore more flashy alternatives.

For scientific communication, visualization is an essential part of the writing process, and requires as much editing and attention as the text. I see a common myth that it is somehow separate or secondary to other parts of paper writing.

I would heartily agree with Raphael that visualization is considered separate or secondary to the scientific process. I’d say that many scientists think that data analysis happens first, then visualization. When folks ask for data analysis help, I often find out that they’ve made a basic chart and have run a statistical test. Rarely does their visualization enable their data analysis (e.g., bar charts for correlation data). Many times, the statistical tests don’t match their data. From my perspective, visualization is part-and-parcel of data analysis, not merely presentation of data analysis.

These are great comments, and most fall directly in line with the work of Edward Tufte (www.edwardtufte.com), who has produced probably the best introductory material about visualization, as it covers not just aesthetics but the philosophy of visualization as a moral act. Also, let’s not forget that it’s not just JavaScript that you used for that example, but Protovis + JavaScript.

Personally, here’s my takeaway: “But visualization is less about the individual values and more about the distribution of them over time and space. You’re looking for (or showing) patterns. You’re comparing and contrasting.”

Two comments on the 4th point:
* (Intentionally) misleading visualizations are the result of the fact that there are “lies, damn lies and *politics*”
* It’s harder to lie with a chart than without it. It at least takes some thought, skill and actually working with related data.