(1) Wed Jul 26 2006 16:50 Best-Loved Ruby Cookbook Recipes of the American People #2: "Generating graphs with Gruff":
After a long wait the Ruby
Cookbook is now available almost
everywhere, even on BookFinder. I urge you to purchase what I hope is the strangest O'Reilly book ever published (suggestions for
competitors welcome; the only one I can think of is Steven
Feuerstein's Oracle PL/SQL Programming). In addition to
hundreds of tired and frankly predictable geek in-jokes (Star Trek,
Rogue, Cryptonomicon, the GNU Virtual
Fridge, ad nauseam), it features talking frogs, coffinfish,
two-faced politicians, corpses in freezers, dispute resolution through
ritual combat, Orwellian doublecode, a heart-pounding Novel on Rails,
and Dr. Bronner's Peppermint Soap. Special guest star: T-Rex.

It also features graphs! Recipe 12.4, "Graphing Data", introduces
Geoffrey Grosenbach's Gruff library, by amazing chance also the subject of today's
promotional tutorial. Gruff makes it
easy to turn data structures into graphs and write them to PNG
files. So the old clock on the wall says it's time for part two of the Book Sales Trilogy:

The hardest part of Gruff is installing the dang
ImageMagick or RMagick libraries and their dependencies in the first
place. It's easy on Debian and other systems with a good packaging
system, but otherwise it can be a real pain. The second hardest part
is working around Gruff when its simplifying assumptions don't apply
to you. I glossed over these in the book but I'll tackle the second
one a little bit in this tutorial.

Anyway, yesterday I showed you code to take
periodic readings of books' Amazon sales rank. And then I showed it to
you again today because the code I wrote yesterday was crap. So read
that entry even if you read it earlier. The new code makes the rest of
the trilogy much easier to present.

You'll recall (from like ten seconds ago when you read it) that we
have a SalesReport class that encapsulates sales rank
information from a book. Yesterday, though, I didn't show you anything interesting to do with this information. But the night is ours! Tonight, we graph!

Let's open up the SalesReport class again and make a
sales report capable of expressing itself as a line graph:

This is mostly self-explanatory setup code. In the book I claim
that most of the Gruff themes are ugly, but that theme_37signals is
okay. Well, that's just, like, my opinion, man, but the
incontrovertible fact is -- and I should have mentioned this in the book -- that theme_37signals's idea of a good time is
to graph the first dataset with a yellow line on what's basically a white background.

That's a really bad idea. I believe there's a UI maxim to the effect of: "Some other color
than yellow on white, graph-reader's delight. White under yellow,
dangerously confuse a fellow." So I go with theme_37signals but tell
Gruff to draw the data line in black: the original high-contrast color
for white backgrounds.

The other thing I do is hide the legend, because this graph is only
for one product. I would like to graph sales for all of my books on a
single graph, but I haven't figured out a good way to do it yet. One
of Gruff's simplifying assumptions is that all your data points are
spaced evenly along the X-axis starting at X=0. I'd have to insert a bunch of bogus data points for books that came out later; worse, most of my timestamps don't line up precisely, so I'd have to write code to group multiple times into a single data point. So right now I just do one book per graph.
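
One way the grouping step could look, as a rough sketch rather than a solved problem: it assumes each report is an array of [time, rank] pairs, and daily_ranks and align_series are made-up names for illustration. Bucketing by calendar day and keeping the best rank of the day is just one arbitrary choice. A book with no reading on a given day gets nil instead of a bogus number.

```ruby
require 'date'

# Hypothetical helper: collapse [time, rank] readings into one rank per
# calendar day, keeping the best (lowest) rank seen that day.
def daily_ranks(readings)
  readings.group_by { |time, _rank| time.to_date }
          .transform_values { |day| day.map { |_time, rank| rank }.min }
end

# Hypothetical helper: line several books up on one shared run of days.
# Takes a hash of book name => readings (assumed non-empty) and returns
# [days, series], where every series has one slot per day, nil if unknown.
def align_series(reports)
  by_day = reports.transform_values { |readings| daily_ranks(readings) }
  all_days = by_day.values.flat_map(&:keys)
  days = (all_days.min..all_days.max).to_a
  series = by_day.transform_values { |ranks| days.map { |day| ranks[day] } }
  [days, series]
end
```

Every array in series then has the same length and starts at the same day, which is the shape Gruff's evenly-spaced X-axis wants. What to do with the nil slots (skip them, or carry the previous value forward) is still an open question.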

Now we've got one line of code that's very important, because it's
where I decide how the data will be represented.

g.data(@name, collect { |date, rank| 1.0 / rank })

The data method takes an array, and adds it to the
graph as a data set. SalesReport is an array of dates and
ranks, so I could just pass in the ranks, but that would yield a graph like this:

This is a lousy graph. Unimportant details (long stretches early on
where no one bought the book) are the most obvious features, and you
can't even see the release of the book. But the data isn't useless; it's just not presented well. We're accustomed to seeing charts go up when the numbers go up (see: any TV commercial featuring a chart), but a good sales rank is very small. Also, as all Web 2.0 types know, book sales follow a power law distribution. A book at
400K sells one copy and jumps to 200K, but you have to sell a mess of
books to go from #100 to #90. Displaying the sales rank as though it were linear distorts the data.

I don't know the exact distribution for book sales, but simply
taking the inverse of the sales rank gives the graph the right
shape. In this graph, the release of the book is obvious, and the time
leading up to it makes sense:
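
To put rough numbers on that shape, here is a small sketch using the sample ranks from the paragraph above. The inverse_ranks helper is a made-up name for illustration. Note the float division: with Integer ranks, 1 / rank truncates to 0 for every rank above 1.

```ruby
# Hypothetical helper: turn sales ranks into values that go up as sales improve.
# Float division matters here: with Integer ranks, 1 / rank is 0 for any
# rank above 1, which would flatten the whole graph.
def inverse_ranks(ranks)
  ranks.map { |rank| 1.0 / rank }
end

# The sample ranks from the discussion above.
ranks = [400_000, 200_000, 100, 90]
ranks.zip(inverse_ranks(ranks)).each do |rank, inverse|
  puts format("rank %6d -> %.7f", rank, inverse)
end
```

On the inverted scale, the jump from 400K to 200K barely moves the line, while the jump from #100 to #90 moves it hundreds of times further. That matches the intuition that the second jump takes far more sales.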

What about those labels on the X-axis? Where do they come from? They come
from this code:

The graph's labels are a hash that maps positions on
the X-axis to strings. Remember, the positions on the X-axis are the
indices to the array(s) you passed into the data
method. You don't get to choose these values. The X axis starts at
zero, and ends at the maximum index of the largest array you passed
into data. I choose three labels: one at the beginning,
one at the end, and one halfway between. Here's a graph with a lot
more history than the Ruby Cookbook one:
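
The three-label scheme can be sketched like this. It is a minimal illustration, not necessarily the code behind these graphs: it assumes the report is an array of [date, rank] pairs, and the three_labels name and the date format are made up.

```ruby
require 'date'

# Hypothetical helper: build a labels hash (X-axis index => string) with
# labels at the first, middle, and last data points.
def three_labels(readings, date_format = "%b %d")
  return {} if readings.empty?
  last = readings.size - 1
  # uniq avoids duplicate keys when there are fewer than three readings.
  [0, last / 2, last].uniq.each_with_object({}) do |index, labels|
    date, _rank = readings[index]
    labels[index] = date.strftime(date_format)
  end
end
```

The keys are array indices, not dates, because the X-axis positions are just the indices of the array passed to data. The resulting hash is what would be assigned to the graph's labels.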

Finally, having created the graph, we write it to disk:

    g.write(File.join(graph_path, "#{asin}-salesrank.png"))
  end
end

Well, not quite finally. I sneakily referenced a class called
SalesRankGraph a while back, and never defined it. That
class derives from Gruff::Line, but if you do this graph
with a Gruff::Line it'll have weird numbers on the
Y-axis:

Those labels are just what you'd think: they're the numbers being
graphed. This mighty graph stretches from about zero to about
0.001. Of course, the graph is "really" measuring the inverses of
those numbers, but there's no way to put that in the labels. It's
another of Gruff's simplifying assumptions. You can choose your X-axis
labels but not your X-axis points; you can choose your Y-axis points
but not your Y-axis labels. I couldn't find an easy way to fix this,
so I just hacked the draw_line_markers method to not draw
them. You can shut off both the X- and Y-axis labels by setting
hide_line_markers, but I like the X-axis labels.

class SalesRankGraph < Gruff::Line
  # Do nothing: skip the Y-axis markers Gruff would normally draw.
  def draw_line_markers
  end
end

So now I've got some pretty nice-looking graphs to track my sales
rank. But I don't have time to look at graphs! I should have spent
today working on my new project, but instead I wasted the morning
fixing problems with the previous entry in this series, and then spent
the afternoon making pizza sauce and writing this entry! What to do?
If only there were some post-literate infographic that would convey
sales rank information at a glance! Something like the graphics I
laboriously put on the crummy.com homepage this afternoon! Stay tuned
for tomorrow's episode, The Spark of Line!