Software development and daily life in Seattle

Visualizing time-series data

I’ve been fiddling around a bit with visualizing time-series data: a record of events, usually in a horrifically long list, that I would like to filter or otherwise aggregate to get a sense of how things are working. Most often, these are a series of events with a timestamp, sometimes with a duration, sometimes with just a working value.

I’ve done quite a bit with RRDtool, and periodically pushed its limits. That’s a great tool (and used in a large number of projects) for data with values attached, where you’d like to see how the values are flowing: amount of resources used over time, number of people coming to a web site, etc.

Where it doesn’t do so well is with specific point events that carry tags and labels, and maybe a duration. Say, for example, you’d like to visualize when a service is unavailable on a larger timeline, like over several months. You *can* do it with RRDtool, but it can get a tad unwieldy. I’ve done that myself, setting the value to 1 for any time the service was “not operational” and to 0 for everything else, which gives you a pulse-looking sort of plot. It also requires some interesting/awkward processing if you’re starting with just an event list.
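To give a feel for that processing, here is a minimal sketch of turning an event list into the 0/1 pulse values described above. It is not the script I actually used: the function name `outages_to_pulse`, the sample outage, and the ten-minute step are all made up for illustration, and getting the samples into RRDtool is left out entirely.

```python
from datetime import datetime, timedelta


def outages_to_pulse(outages, start, end, step):
    """Sample a list of (outage_start, outage_end) pairs at a fixed step.

    Returns (timestamp, value) pairs where value is 1 while any outage
    is active at that timestamp and 0 otherwise. Outage ends are treated
    as exclusive.
    """
    samples = []
    t = start
    while t < end:
        down = any(o_start <= t < o_end for o_start, o_end in outages)
        samples.append((t, 1 if down else 0))
        t += step
    return samples


# Hypothetical sample data: one twenty-minute outage.
outages = [
    (datetime(2008, 3, 1, 10, 0), datetime(2008, 3, 1, 10, 20)),
]

pulse = outages_to_pulse(
    outages,
    start=datetime(2008, 3, 1, 9, 50),
    end=datetime(2008, 3, 1, 10, 40),
    step=timedelta(minutes=10),
)

for timestamp, value in pulse:
    print(timestamp.isoformat(), value)
```

The awkward part is visible even here: the event list has to be re-sampled onto a fixed grid, and anything shorter than the step can fall between samples and disappear.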

Most time-series visualizations take the form of graphs. Many Eyes, for example, has three widgets that all work with time/value data: line graphs, stack graphs, and a variation of the stack with categories. (Many Eyes is an exceptionally cool site.) That said, I recently re-found a visualization technique that is far more appropriate for events on a timeline: Simile’s Timeline project.

The only real trouble I had was getting the dates into the right format for the XML data source that this thing wanted. My data source was, er, tricky – in that I was screen scraping some pages to get what I wanted (thank you Beautiful Soup!), and then converting the dates into something that Timeline would eat was a touch finicky.
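For a rough idea of what that conversion step looks like, here is a sketch that writes a list of events out as an XML event source. Treat the details as assumptions rather than a reference: the `data`/`event` element names, the `start`, `end`, `isDuration` and `title` attributes, and the "Mar 01 2008 10:00:00 GMT" date style are how I recall Timeline's XML source working, so check them against the Timeline documentation. The function name and the sample events are invented for the example, and the month abbreviation from `%b` assumes the default (English) locale.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Assumed date style for the event source; verify against the Timeline docs.
TIMELINE_DATE_FORMAT = "%b %d %Y %H:%M:%S GMT"


def events_to_timeline_xml(events):
    """Serialize event dicts into an XML string.

    Each event needs 'start' (a datetime in UTC) and 'title'; 'end' and
    'description' are optional. Events with an 'end' are marked as
    durations, the rest are left as point events.
    """
    root = ET.Element("data")
    for ev in events:
        node = ET.SubElement(root, "event")
        node.set("start", ev["start"].strftime(TIMELINE_DATE_FORMAT))
        if ev.get("end") is not None:
            node.set("end", ev["end"].strftime(TIMELINE_DATE_FORMAT))
            node.set("isDuration", "true")
        node.set("title", ev["title"])
        node.text = ev.get("description", "")
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample events: one with a duration, one point event.
events = [
    {
        "start": datetime(2008, 3, 1, 10, 0),
        "end": datetime(2008, 3, 1, 10, 20),
        "title": "Service unavailable",
        "description": "Example outage",
    },
    {
        "start": datetime(2008, 3, 2, 8, 30),
        "title": "Deploy",
    },
]

xml_text = events_to_timeline_xml(events)
print(xml_text)
```

Building the file with an XML library rather than string formatting means titles and descriptions scraped from a page get escaped properly, which is one less finicky thing to worry about.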

Given the nature of the data, I can’t really display it here, but I was very pleased with the result as a thought experiment in looking at time-series data a little differently. The screen scraping took a few hours to put together, and the Simile Timeline display component took another 2-3 hours. When all was said and done, I had a few files (Python scripts, output, etc.) that grabbed the data from the web and shoved it up into a display. Five hours for a thought experiment: very worthwhile.

One of the interesting side effects was that once I could scroll about the dates so easily, I wanted more data. I often feel like I’m drowning in details, but in this case the display compacted the information so much that I felt like I was missing something because I could only see a month at a time. My initial cut was very simple. Some enhancements I can immediately see using: color to differentiate some values of the events, the hotspots function to highlight time selections, different images for the point display of the events, and more bands, segregating events by category into different bands. While I whipped up the first example in 5 or so hours, I could easily see spending another 20 or 30 adding subtlety and depth to the display.

If you’re looking for a timeline display of events, I recommend checking out Timeline.