Data Visualization: Comparing multiple measures on the same plot

We’ve talked a lot about the relationship between two variables these past few weeks. Last week we talked about correlation and this week we’ve spent a few days on the calculations and plotting of linear regression results. In order to visualize the relationship between two variables that are measured on different scales, you’ll frequently see a technique called a dual-scaled axes chart.

A dual-scaled axes chart plots two sets of data, each with its own scale, on the same chart in order to show their relationship. For example, the chart below has time (measured in months) on the x-axis, with total monthly revenue as the green line (scaled on the left y-axis) and total number of customers as the blue line (scaled on the right y-axis).

Even though dual-scaled axes charts are commonly used, I can’t think of any use case where they are appropriate. They are problematic because the choice of scaling on each y-axis can manipulate the reader. For instance, the chart designer decides how the two lines overlay each other by adjusting the scale of each y-axis. Additionally, because the scaling is arbitrary, with enough zoom or focus a series of data points can be made to look very jagged or very smooth, potentially misleading the reader into thinking that it has the same shape as another data set, regardless of what the data says. Furthermore, a strong visual cue in line charts is the point of intersection. With dual-scaled axes charts, however, intersection points are entirely meaningless because they are created at the discretion of the chart designer, not by the data itself.
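To make the intersection point concrete, here is a minimal sketch in Python. The monthly values, the axis limits, and the helper names (axis_position, crossing_intervals) are all hypothetical, made up for illustration; they are not the data behind the chart above. The sketch converts each value to its vertical position on its own axis and then checks where the two drawn lines would cross.

```python
def axis_position(value, axis_min, axis_max):
    """Vertical position of a value as a fraction of the axis height (0 = bottom, 1 = top)."""
    return (value - axis_min) / (axis_max - axis_min)


def crossing_intervals(left, left_limits, right, right_limits):
    """Return each index i where the two drawn lines cross between points i and i + 1."""
    gaps = [
        axis_position(a, *left_limits) - axis_position(b, *right_limits)
        for a, b in zip(left, right)
    ]
    # A sign change in the vertical gap means the lines cross on the page.
    return [i for i in range(len(gaps) - 1) if gaps[i] * gaps[i + 1] < 0]


# Hypothetical monthly values, made up for illustration only.
revenue = [26000, 27000, 25500, 28000]
customers = [500, 490, 510, 505]

# Same left axis both times; only the right (customer) axis limits change.
revenue_limits = (25000, 29000)
tight = crossing_intervals(revenue, revenue_limits, customers, (480, 520))
loose = crossing_intervals(revenue, revenue_limits, customers, (0, 600))

print(tight)  # [0, 1, 2]
print(loose)  # []
```

The data is identical in both calls. With a tight right axis the lines cross three times, and with a right axis that starts at zero they never cross at all, so the crossings come from the axis limits rather than from the data.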

This article by Stephen Few, a data visualization expert, is a good read that goes more in-depth on the topic.

What do I do?

To address the limitations of dual-scaled axes charts, I recommend plotting the data on a single normalized scale in percentage terms¹. To do this, you choose a reference point in time against which all other data is compared, and then compute the percentage change of every data point in each data set relative to its value at that reference point². For example, I’ll use January as the reference month with the data I plotted above. The numbers of customers in January and February were 507 and 494, respectively. The percentage change of January compared to itself is of course zero, but the percentage change from January to February is (494 – 507) / 507, or about -3%. I calculate the rest of the months in reference to January’s value of 507 and follow a similar process for the revenue data, with the calculations made relative to its January value ($26,348). It doesn’t matter which month you use as your reference point, as long as you apply the same reference month to each data set, since each series is then expressed as a percentage change from that same month.
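The calculation can be sketched in a few lines of Python. This is only an illustration: the function names pct_change_from_reference and index_to_reference are my own for this example, and the only data used is the January and February customer counts quoted above. The second function shows the variant described in footnote 2, where the reference month starts at 100% instead of 0%.

```python
def pct_change_from_reference(values, ref_index=0):
    """Percentage change of each value relative to values[ref_index]."""
    ref = values[ref_index]
    if ref == 0:
        raise ValueError("reference value must be non-zero")
    return [(v - ref) / ref * 100 for v in values]


def index_to_reference(values, ref_index=0):
    """Each value as a percentage of values[ref_index] (reference month = 100%)."""
    ref = values[ref_index]
    if ref == 0:
        raise ValueError("reference value must be non-zero")
    return [v / ref * 100 for v in values]


# Customer counts for January and February, taken from the text.
customers = [507, 494]

customers_pct = pct_change_from_reference(customers)
customers_idx = index_to_reference(customers)

print([round(p, 1) for p in customers_pct])  # [0.0, -2.6]
print([round(p, 1) for p in customers_idx])  # [100.0, 97.4]
```

The same function would be applied to the revenue series with January ($26,348) as its reference value, so that both series start at 0% and can share one y-axis.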

Questions? Send any questions on data analytics or pricing strategy to doug@outlier.ai and I’ll answer them in future issues!

[1] Few also offers the suggestion of aligning individual plots closely to each other. I don’t prefer this option because you are still susceptible to some of the same scaling and comparison issues as with a single dual-scaled chart.

[2] Few’s version of this plot just divides each number by the reference value, making the initial value 100%. I prefer to calculate the percentage change from the reference value, making the initial value 0%.