On Whether Y-axis Labels Are Always Necessary

The infamous @albertocairo [blogged about](http://www.thefunctionalart.com/2016/06/propublica-visualizes-seasonality-in.html) a [nice interactive piece on German company tax avoidance](https://projects.propublica.org/graphics/dividend) by @ProPublica. Here’s a snapshot of their interactive chart:

>_Isn’t it weird that the chart doesn’t have a scale on the Y-axis? It’s not the first time I see this, and it makes me feel uneasy._

I jumped over to the interactive piece to see if the authors used interactive tooltips since viewers can get a good idea for the scale limits if they do that and it _kinda sorta_ makes not having Y-axis label mostly OK if they compensate with said interactive notations. The interactive had no tooltips and the Y-axis was completely unlabeled.

Now, they used D3, so there _are_ built-in ways to create and add a Y-axis, so I don’t think this was an “oops…we forgot” moment. The Y values are “Short Interest Quantity” which is the quantity of stock shares that investors have sold short but not yet covered or closed out. It’s definitely a “1%-er” term and the authors already took time to explain some technical financial details and probably would have had to add even more text to explain this term properly (since that short definition is really not enough for most of us 99%-ers). It seems that they felt the the arrowed-annotations on the right hand side of the plot made up for the lack of actual Y-axis detail.

Should we _always_ have labels on a given axis? Would knowing that the Y-axis on this chart went from 0 to 800 million have aided in the decoding or groking the overall message? Here’s another example to help frame that question. This is the seminal `ggplot2::geom_density()` demo chart:

Given that folks outside the realm of statistics/datasci really don’t grok what that Y-axis is saying, would it be _horribad_ to just leave it with a _”density”_ Y-label (sans unit marks) and then explain it in text (or talk to/around it in text but not go into detail)? Or should we keep the full annotations and spend a precious paragraph of text talking about measuring the area under a curve? (Another argument is to choose the right vis for the right audience but that’s another post entirely).

To further illustrate the posit, I recently made a series of what I call a “rank ordered segment plot” for a [report](https://information.rapid7.com/rs/495-KNT-277/images/rapid7-research-report-national-exposure-index-060716.pdf) that we did at @Rapid7:

There are text annotations for countries at either end of the spectrum on the X-axis but they aren’t individually labeled cuz…ewwww that’d be messy. The interactive version (coming this week over at `community.rapid7.com`) has the full table and light hover popup-annotations. But the point wasn’t to really focus on the countries as it was to depict the sad state of the ratio of unencrypted vs encrypted for a given service type within a country.

So, _should_ the ProPublica authors have tried to be more discrete w/r/t their Y-axis or is it fine the way it is? Does there _always_ need to be discrete axes annotations or is there some wiggle room? Opines are welcome in the comments since I honestly don’t think there is “one answer to rule them all” for this.

And for those that really want to see more discrete info on the ProPublica Y-axis labels, here’s a static, faceted chart (you may need to click/select/tap the chart to make it big enough to view):

### Don’tTry This At Home!

ProPublica made that data available via two CSV files and the crosswalk org translation table via their main D3 javascript file (use Developer Tools “Inspect Element” to see such things). I ended up having to use `Sys.setlocale(‘LC_ALL’,’C’)` and expand the translation table a bit due to some of the mixed encodings in the data sets. Code to make the chart is below.

NEXT POST

6 Comments →On Whether Y-axis Labels Are Always Necessary

It’s not entirely uncommon in theory physics to have “Quantity [arbitrary units]” on a y-axis, particularly when plotting a ratio or when the plotted thing has been transformed to within an inch of its life (common) to illustrate some feature (third derivative, etc…). It’s critical that the axis itself is labelled qualitatively (I don’t count the title as acceptable via inference) but sometimes the quantitative feature is apparent without an absolute scale, e.g. peak at x, relative heights (still maintaining relative scale to some baseline), relative slopes. As usual, the answer is probably “it depends.”

As silly as this sounds: I don’t demand labels from those I trust, I trust very few, therefore I always demand axis labels. But outside of the scientific world I think that data are used to tell a story, which should have less stringent rules, so I’m more lenient on it, especially if it’s telling. The enormous annual temporal deviations in those graphs need not to be scrutinized by my eyes if their point is that that pattern exists. And as an aside, back into the scientific world, and of special circumstance, I recently gave a collaborator a graph with neither x nor y axes, that was a 4×4 panel illustrating the change of some parameters of an equation. I thought that the figure didn’t need them because it captured everything within qualitative range of behaviours. Despite that, we decided that they should be labeled.