Stephen E. Arnold: Using Real Data to Mislead

Viewers of graphs, beware! Data visualization has been around for a very long time, but it has become ubiquitous since the onset of Big Data. Now the Heap Data Blog warns us to pay closer attention in “How to Lie with Data Visualization.” Illustrating his explanation with clear examples, writer Ravi Parikh outlines three common ways a graphic can be manipulated to present a picture that actually contradicts the data used to build it. The first is the truncated Y-axis. Parikh writes:

“One of the easiest ways to misrepresent your data is by messing with the y-axis of a bar graph, line graph, or scatter plot. In most cases, the y-axis ranges from 0 to a maximum value that encompasses the range of the data. However, sometimes we change the range to better highlight the differences. Taken to an extreme, this technique can make differences in data seem much larger than they are.”

The example here presents two charts on rising interest rates. On the first, the Y-axis ranges from 3.140% to 3.154%, a narrow range that makes the rise from 2008 to 2012 look quite dramatic. On the next chart, however, the rise seems almost non-existent; this one presents a more relevant span of 0.00% to 3.50% on the Y-axis.
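To put a number on the effect, here is a minimal Python sketch. The two axis ranges come from the charts described above; the start and end rates (3.141% and 3.153%) are hypothetical values picked to fall inside the truncated range, and `apparent_height` is an illustrative helper, not something from Parikh's article.

```python
def apparent_height(value, y_min, y_max):
    """Fraction of the plot's height that a point at `value` reaches."""
    return (value - y_min) / (y_max - y_min)

# Hypothetical start and end rates, in percent (not taken from the article).
start, end = 3.141, 3.153

axes = {
    "truncated axis": (3.140, 3.154),   # range of the first chart
    "zero-based axis": (0.00, 3.50),    # range of the second chart
}

for label, (y_min, y_max) in axes.items():
    h_start = apparent_height(start, y_min, y_max)
    h_end = apparent_height(end, y_min, y_max)
    # Ratio of drawn heights: how much taller the end point looks than the start.
    print(f"{label}: end point drawn {h_end / h_start:.2f}x as high as the start")
```

With these assumed values, the end point sits about 13 times as high as the start point on the truncated axis, while on the zero-based axis the two are nearly level, even though the underlying numbers are identical.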

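Another method of misrepresentation is to present numbers, particularly revenue, cumulatively instead of year by year or quarter by quarter. Because a cumulative total of sales can never fall, such a chart keeps climbing even when the per-period figures are flat or shrinking. Parikh notes that Apple’s iPhone sales graph from last September is a prominent example of this tactic.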
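A short Python sketch shows why the cumulative view flatters the data. The quarterly figures are hypothetical (they are not Apple's numbers) and `is_rising` is an illustrative helper.

```python
from itertools import accumulate

# Hypothetical quarterly unit sales (not Apple's figures): falling every quarter.
quarterly = [10, 8, 6, 4]

# Running total of the same data, as a cumulative chart would plot it.
cumulative = list(accumulate(quarterly))

def is_rising(series):
    """True when every value is strictly greater than the one before it."""
    return all(later > earlier for earlier, later in zip(series, series[1:]))

print("quarterly: ", quarterly, "rising?", is_rising(quarterly))
print("cumulative:", cumulative, "rising?", is_rising(cumulative))
```

The per-quarter series declines throughout, yet the running total rises at every step, so a viewer who glances only at the cumulative line sees growth.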

Finally, one can mislead one’s audience by violating conventions. The real-world example here presents a pie chart in which the slices add up to 193%. The network that created it had to know that cursory viewers would pay more attention to the bright colors than to the numbers. The write-up observes:

“The three slices of the pie don’t add up to 100%. The survey presumably allowed for multiple responses, in which case a bar chart would be more appropriate. Instead, we get the impression that each of the three candidates have about a third of the support, which isn’t the case.”
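A sanity check for this kind of chart is easy to sketch in Python. The candidate names and individual percentages below are hypothetical, chosen only so that they total 193% as in the example; `check_pie` and `drawn_shares` are illustrative helpers.

```python
def check_pie(slices, tolerance=0.5):
    """Return (total, ok); ok is True when the slices sum to 100% within tolerance."""
    total = sum(slices.values())
    return total, abs(total - 100) <= tolerance

def drawn_shares(slices):
    """Share of the circle each slice occupies once a pie chart rescales it."""
    total = sum(slices.values())
    return {name: 100 * value / total for name, value in slices.items()}

# Hypothetical multiple-response survey; values chosen to total 193%.
survey = {"Candidate A": 70, "Candidate B": 63, "Candidate C": 60}

total, ok = check_pie(survey)
print(f"slices sum to {total}% -> valid pie chart? {ok}")
for name, share in drawn_shares(survey).items():
    print(f"{name}: labeled {survey[name]}%, drawn as {share:.1f}% of the circle")
```

Under these assumed figures, each candidate is labeled with 60% or more but drawn as roughly a third of the circle, which is the distortion the quote describes.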

See the article for more examples, but the upshot is clear. Parikh concludes:

“Be careful when designing visualizations, and be extra careful when interpreting graphs created by others. We’ve covered three common techniques, but it’s just the surface of how people use data visualization to mislead.”