st: Axis rules made to be broken

Apart from the joy of writing that subject line, I feel able to comment
because I have looked at the literature, come to a wider conclusion, and
for some years have taught a module in data graphics.
Let me also quote Constantine Daskalakis ([CD] statalist 14 May): "First,
I don't want to plot on the log scale. Why would I? Second, I don't want
to waste three quarters of my graph area by using the full scale of the Y
axis (i.e., 0-100%), when all my measurements are around 85-95%." I'm not
making any personal criticism, but this is the type of language that
implies graphic design is about style, whim or individual preference. If
graphics are to be genuine tools of communication, we need to adopt and
understand a common language. I reserve my most scathing comments for the
software adverts that boast "use our program and with a few clicks within
minutes you will produce impressive professional graphs." Imagine anyone
making such a claim for a word processor!
Axes are misunderstood, I think, because we usually encounter them in
school as the skeleton on which data values are then measured and plotted.
When software is doing the plotting, this function disappears and the axis
must become an informative part of the plot - or why is it needed at all?
My second point is even more obvious: a graph is used to portray either a
magnitude or a relationship. A bar graph representing 500 units must have
a bar whose length is proportional to 500, or what is the point? A
moment's thought should convince you that a bar graph actually conveys
information only if there is more than one bar, so that the relative
lengths convey the relative sizes of the data. Anyone who doubts this can
find a counter-example each Saturday in the (London) Daily Telegraph
Review supplement (I can't find this on their web site). It is a small
feature that, I assume, is a post-modern spoof: "How tall is ...?" each
week gives the height of a film character accompanied by a picture drawn
against a scale - a pictogram of one datum.
My third point is an explanation of the subject line: if graphics are to
be a fertile and useful means of communication, there must be underlying
rules BUT designers will bend or break the rules for creative effect. This
is what we do with words.
I too have seen books that state explicitly "any axis must include zero"
(e.g., Schmid, 1983, Statistical Graphics), but then distort the meaning by
allowing broken axes with a zig-zag. Rules learned by rote are
misleading. I require students to identify their design decisions and
give reasons for their choices. If a choice has been reasoned and
justified, then it is "correct", even if I might have made a different
choice.
An axis should contain zero *if the statement being made is one of
absolute magnitude.* I have seen bar charts in The Economist (who should
know better) in which the axis, and hence the bars, start at some
arbitrary non-zero value. If they convey any information, the bars convey
a lie. The artist drew such a graph (and the editor accepted it) because
bars drawn in true ratio would not have shown any visible variation. The
intended message was to show year-on-year change - so the choice of a bar
chart was inappropriate and no tinkering with the axis range could correct
this. I wonder also what readers gain in practice from bar charts where
one category absolutely dominates, and a convention is to draw all other
bars to scale but put a break (and broken axis) for the largest. The
visual message has, at best, been diluted.
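
To put a number on the "lie", here is a minimal sketch in Python (not Stata, and not taken from any of the charts mentioned). The function name apparent_ratio and the axis start of 80 are my own illustrative assumptions; the values 85 and 95 echo the range in CD's example.

```python
def apparent_ratio(a, b, axis_start=0.0):
    """Ratio of the drawn bar lengths for values a and b
    when both bars start at axis_start instead of at zero."""
    if min(a, b) <= axis_start:
        raise ValueError("both values must lie above the axis start")
    return (b - axis_start) / (a - axis_start)


# Bars drawn from zero: lengths are in the true ratio 95/85, about 1.12.
true_ratio = apparent_ratio(85, 95)

# Bars drawn from a hypothetical axis start of 80: lengths 5 and 15, ratio 3.
drawn_ratio = apparent_ratio(85, 95, axis_start=80)

print(f"true ratio {true_ratio:.2f}, drawn ratio {drawn_ratio:.2f}")
```

The eye reads the second bar as three times the first, although the data differ by about 12%.
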
A graph that focuses on *change* need not include the origin of the
absolute figures, because change implies "change from what?" and it is
this reference point that becomes the de facto centre of attention. If
bars are drawn, they should be from the reference point, but a line-plot
may be more effective in emphasizing the direction of change.
The choice of the axis range can be determined from the range in the data,
pragmatically, or by an algorithm quoted by Cleveland, "banking to 45
degrees". What is helpful (essential?) is for the designer to have a
clear articulation of what they are trying to describe: "this x goes up",
"this x goes up at an accelerating rate", "this x is going up faster than
that y" etc. (Read Tufte for discussion of which values to label along the
axis.)
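
For readers who have not met banking, the following is a rough Python sketch of one simple variant, often called median-absolute-slope banking. I am assuming this variant for illustration; it is not necessarily the exact procedure Cleveland gives, and the function name bank_to_45 is hypothetical. The idea is to choose the plot's width-to-height ratio so that the median line segment is drawn at 45 degrees on the page.

```python
from statistics import median


def bank_to_45(xs, ys):
    """Return a width/height ratio for the plotting region such that
    the median absolute slope of the line segments, as drawn, is 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) points")
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    if x_range == 0 or y_range == 0:
        raise ValueError("data must vary in both x and y")

    # Absolute slope of each segment in data units (vertical segments skipped).
    slopes = [
        abs((y1 - y0) / (x1 - x0))
        for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:])
        if x1 != x0
    ]
    m = median(slopes)
    if m == 0:
        raise ValueError("median slope is zero; banking is undefined")

    # Drawn slope = data slope * (x_range / y_range) * (height / width).
    # Setting the median drawn slope to 1 gives width / height:
    return m * x_range / y_range


# A straight line banks to a square plot.
print(bank_to_45([0, 1, 2, 3], [0, 10, 20, 30]))    # 1.0
# A zig-zag wants a plot four times as wide as it is tall.
print(bank_to_45([0, 1, 2, 3, 4], [0, 1, 0, 1, 0]))  # 4.0
```

The result fixes only the shape of the plotting region; the designer still has to decide what statement the graph is meant to make.
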
CD gives an example of plotting percentages which is particularly
informative. In many situations the interest lies not in x but in 100-x.
"Use of PCs has gone up from 70 to 80%" is equivalent to "one third of
those who did NOT use PCs now do so".
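
The arithmetic behind that equivalence, as a small Python sketch. The function name is my own, and the calculation assumes the figures describe a net change, with nobody giving up PCs in the meantime.

```python
def share_of_complement_converted(before_pct, after_pct):
    """Change in a percentage x expressed as a fraction of the
    group that was previously in the complement, 100 - x."""
    complement_before = 100 - before_pct
    if complement_before <= 0:
        raise ValueError("no complement group to convert")
    return (after_pct - before_pct) / complement_before


# PC use rises from 70% to 80%: 10 points out of the 30% of non-users.
converted = share_of_complement_converted(70, 80)

# The same change relative to the original users is much less striking.
growth_in_users = (80 - 70) / 70

print(f"non-users converted: {converted:.3f}")      # 0.333
print(f"growth among users:  {growth_in_users:.3f}")  # 0.143
```
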
To relate this to the original question, I have illustrated above that a
broken axis is, at best, a compromise introduced in the final stages of
presentation, one that will never make a graph more informative. Stata has,
up to version 7, concentrated on graphics as an analytical tool. It
eschewed the "gee whizz" emphasis on style characteristic of "presentation
graphics software". By comparison, Stata output therefore lacked the
impact for audiences more impressed by style than substance. Version 8
offers various instant styles that remove the "sackcloth and ashes"
asceticism. Adding a variety of gimmicks (and controls) that invite
distortion rather than clarification seems to me a dangerous step. On the
other hand, I still regret the failure to maintain Stage (the Stata
Graphics Editor) which was just such a tool for making user-chosen
changes to a basic graph.
R. Allan Reese                       Email: r.a.reese@gri.hull.ac.uk
Associate Manager GRI                Direct voice:   +44 1482 466845
Graduate School                      Voice messages: +44 1482 466844
Hull University, Hull HU6 7RX, UK.   Fax:            +44 1482 466436
====================================================================
The management here were SO impressed with W. Edwards Deming's "Out of
the crisis" that they are working flat out to create THE BEST crisis;
then they can start implementing the 13 obligations.