Oh man oh man, we’re at the half way point of the class! Can you believe it? Yup, it’s Week 6, and this week we’re going to talk about data visualization. Data visualization is an interesting topic because good data with no visualization can be pretty inaccessible, but a misleading visualization can render good data totally irrelevant. Quite the conundrum. [Update: a sentence that was originally here has been removed. See bottom of the post for the original sentence and the explanation] It’s easy to think of graphics as “decorations” for the main story, but as we saw last week with the “age at death graph”, sometimes those decorations get far more views than the story itself.

Much like last week, there’s a lot of ground to cover here, so I’ve put together a few highlights:

Edward Tufte The first reading is the (unfortunately not publicly available) Visual Display of Quantitative Information by the godfather of all data viz Edward Tufte. Since I actually own this book I went and took a look at the chapter, and was struck by how much of his criticism was really a complaint about the same sort of “unclarifiable unclarity” we discussed in Week 1 and 2. Bad charts can arise because of ignorance of course, but frequently they exist for the same reason verbal or written bullshit does. Sometimes people don’t care how they’re presenting data as long as it makes their point, and sometimes they don’t care how confusing it is as long as they look impressive. Visual bullshit, if you will. Anything from Tufte is always worth a read, and this book is no exception.

Next up are the “Tools and Tricks” readings which are (thankfully) quite publicly available. These cover a lot of good ground themselves, so I suggest you read them.

Misleading axes The first reading goes through the infamous but still-surprisingly-commonly-used case of the chopped y-axis. Bergstrom and West put forth a very straightforward rule that I’d encourage the FCC to make standard in broadcasting: bar charts should have a y-axis that starts at zero, line charts don’t have to. Their reasoning is simple: bar charts are designed to show magnitude, line charts are designed to show variation, therefore they should have different requirements. A chart designed to show magnitude needs to show the whole picture, whereas one designed to show variation can just show variation. There’s probably a bit of room to quibble about this in certain circumstances, but most of the time I’d let this bar chart be your guide:

They give several examples of charts, sometimes published or endorsed by fairly official sources screwing this up, just to show us that no one’s immune. While the y-axis gets most of the attention, it’s worth noting the x-axis should be double check too. After all, even the CDC has been known to screw that up. Also covered are the problems with multiple y-axes, which can give impressions about correlations that aren’t there or have been scaled-for-drama. Finally, they cover what happens when people invert axes and just confuse everybody.

Proportional Ink The next tool and trick reading comes with a focus on “proportional ink” and is similar to the “make sure your bar chart axis includes zero” rule the first reading covered. The proportional ink rule is taken from the Tufte book and it says: “The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented”.

[Added for clarity: While Tufte’s rule here can refer to all sorts of design choices, the proportional ink rule hones in on just one aspect: the shaded area of the graph.] This rule is pretty handy because it gives some credence to the assertion made in the misleading axes case study: bar charts need to start at zero, line charts don’t. The idea is that since bar charts are filled in, not starting them at zero violates the proportional ink rule and is misleading visually. To show they are fair about this, the case study also asserts that if you fill in the space under a line graph you should be starting at zero. It’s all about the ink.

Next, we dive in to the land of bubble charts, and then things get really murky. One interesting problem they highlight is that in this case following the proportional ink rule can actually lead to some visual confusion, as people are pretty terrible at comparing the sizes of circles. Additionally, there are two different ways to scale circles: area and radius. Area is probably the fairer one, but there’s no governing body enforcing one way or the other. Basically, if you see a graph using circles, make sure you read it carefully. This goes double for doughnut charts. New rule of thumb: if your average person can’t remember how to calculate the area of a shape, any graph made with said shape will probably be hard to interpret. Highly suspect shapes include:

Circles

Anything 3-D

Pie charts (yeah, circles with angles)

Anything that’s a little too clever

To that last point, they also cover some of the more dense infographics that have started popping up in recent years, and how carefully you must read what they are actually saying in order to judge them accurately. While I generally applaud designers who take on large data sets and try to make them accessible, sometimes the results are harder to wade through than a table might have been. My dislike for infographics is pretty well documented, so I feel compelled to remind everyone of this one from Think Brilliant:

Lots of good stuff here, and every high school math class would be better off if they taught a little bit more of this right from the start. Getting good numbers is one thing, but if they’re presented in a deceptive or difficult to interpret way, people can still be left with the wrong impression.

Three things I would add:

Track down the source if possible One of the weird side effects of social media is that pictures are much easier to share now, and very easy to detach from their originators. As we saw last week with the “age at death” graph, sometimes graphs are created to accompany nuanced discussions and then the graph gets separated from the text and all context is lost. One of the first posts I ever had go somewhat viral had a graph in it, and man did that thing travel. At some point people stopped linking to my original article and started reporting that the graph was from published research. Argh! It was something I threw together in 20 minutes one morning! It even had axis/scale problems that I pointed out in the post and asked for more feedback! I gave people the links to the raw data! I’ve been kind of touchy about this ever since….and I DEFINITELY watermark all my graphs now. Anyway, my personal irritation aside, this happens to others as well. In my birthday post last year I linked to a post by Matt Stiles who had put together what he thought was a fun visual (now updated) of the most common birthdays. It went viral and quite a few people misinterpreted it, so he had to put up multiple amendments. The point is it’s a good idea find the original post for any graph you find, as frequently the authors do try to give context to their choices and may provide other helpful information.

Beware misleading non-graph pictures too I talk about this more in this post, but it’s worth noting that pictures that are there just to “help the narrative” can skew perception as well. For example, one study showed that news stories that carry headlines like “MAN MURDERS NEIGHBOR” while showing a picture of the victim cause people to feel less sympathy for the victim than headlines that say “LOCAL MAN MURDERED”. It seems subconsciously people match the picture to the headline, even if the text is clear that the picture isn’t of the murderer. My favorite example (and the one that the high school students I talk to always love) is when the news broke that only .2% of Tennessee welfare applicants tested under a mandatory drug testing program tested positive for drug use. Quite a few news outlets published stories talking about how low the positive rate was, and most of them illustrated the story with a picture of a urine sample or blood vial. The problem? The .2% positive rate came from a written drug test. The courts in Tennessee had ruled that taking blood or urine would violate the civil rights of welfare applicants, and since lawmakers wouldn’t repeal the law, they had to test them somehow. More on that here. I will guarantee you NO ONE walked away from those articles realizing what kind of drug testing was actually being referenced.

A daily dose of bad charts is good for you Okay, I have no evidence for that statement, I just like looking at bad charts. Junk Charts by Kaiser Fung and the WTF VIZ tumblr and Twitter feed are pretty great.

Okay, that’s all for Week 6! We’re headed in to the home stretch now, hang in there kids.

2 thoughts on “Calling BS Read-Along Week 6: Data Visualization”

I hadn’t made the connection until this reading that news stories which rely heavily on anecdotes are misleading in a similar way to a news story with lotsa pictures and infographics. Evening NPR can make me crazy with their translated interviews with a bicycle-shop owner in Thailand and whatever joys/difficulties/memories he has, with no indication whether he is at all representative of other Thais. You can make anything sound true. It’s analogous to not starting from the zero point on your graph.