How many ways are there of looking at a series of data? Consider this rainfall data:

We have rainfall for every district in Tamil Nadu for every month over the last 5 years. That’s 60 data points (12 months × 5 years) per district. How many ways are there of plotting it?

In this post, we’ll look at 10 ways you can represent a simple series in a straight line, using a single row per series.

Data Bars

These are a quick way of plotting bar graphs within the cells. The eye is naturally drawn to large values, so data bars make it easy to locate big numbers and, in particular, to compare data across series. But they don’t make it easy to find trends within a series.
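Here is a minimal text sketch of the idea, assuming non-negative values and bars that scale from zero to the largest value in the column (spreadsheet tools offer other scaling options). The `data_bars` function and the rainfall figures are made up for illustration.

```python
def data_bars(values, width=20):
    """Return one text bar per value, scaled so the largest value fills `width`.

    Assumes non-negative values; every bar starts from zero.
    """
    top = max(values)
    bars = []
    for value in values:
        length = round(width * value / top) if top > 0 else 0
        bars.append("#" * length)
    return bars


# Made-up monthly rainfall (mm) for one district, for illustration only.
rainfall = [0, 12, 45, 180, 90]
for value, bar in zip(rainfall, data_bars(rainfall)):
    print(f"{value:>4} {bar}")
```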

Colour scales

These shade each cell with a colour gradient: red for low, green for high. While they’re much worse at exact comparisons, they’re much better at helping identify trends, both within a series and across series.
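A minimal sketch of how a colour scale turns a number into a colour, assuming a three-point red-yellow-green gradient with linear interpolation in RGB. The yellow midpoint and the `colour_scale` name are our assumptions, not a specific tool’s behaviour.

```python
def colour_scale(value, lo, hi):
    """Map `value` in [lo, hi] to a hex colour: red at lo, yellow midway, green at hi."""
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)                            # clamp values outside the range
    if t < 0.5:
        red, green = 255, round(255 * t / 0.5)           # red -> yellow
    else:
        red, green = round(255 * (1 - t) / 0.5), 255     # yellow -> green
    return f"#{red:02x}{green:02x}00"


monthly = [0, 30, 60, 120]  # made-up values for one row
print([colour_scale(v, min(monthly), max(monthly)) for v in monthly])
```

Each row is scaled to its own minimum and maximum here, which is what makes colour scales good at trends but poor at exact comparisons.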

Heatmap

The colour scales can be shrunk without much loss of information if we’re more interested in the trend than in the numbers.

This heatmap is a compact way of comparing information over time and across districts. Reading left to right, patterns of growth, decline or seasonality can be observed. Reading top to bottom, patches of high or low values that cut across data series become evident.

This is a simplified, one-dimensional version of the traditional heatmap, which typically shows data in two dimensions.

Bar chart

If being able to compare quantities within a series becomes important, bar charts can be used instead.

The bar chart shown here is a variant of the traditional bar chart. It does away with the horizontal and vertical axes, as well as the labels, and just shows the bars.

This is an example of a micro-chart, the most classic example of which is the sparkline. Microsoft introduced a number of these micro-charts in Excel 2010, one of the significant upgrades to Excel’s charting capabilities in that release.

Sparkline

Sparklines are among the earliest micro-charts, introduced by Edward Tufte. They are the equivalent of line graphs, but without the labels and axes.

These make it very easy to see trends within a series. However, comparing across series is harder. In fact, it isn’t possible at all unless the sparklines are drawn to a common scale.
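To see why a common scale matters, here is a minimal text sparkline built from Unicode block characters, an illustrative stand-in for a real chart (`sparkline` is a hypothetical helper). By default each series is scaled to its own minimum and maximum; passing a shared `lo` and `hi` puts several series on the same scale.

```python
BLOCKS = "▁▂▃▄▅▆▇█"


def sparkline(values, lo=None, hi=None):
    """Render values as block characters.

    Without lo/hi each series is scaled to its own min and max; pass a shared
    lo/hi to draw several series to the same scale.
    """
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    span = (hi - lo) or 1                      # avoid dividing by zero for flat series
    top = len(BLOCKS) - 1
    chars = []
    for v in values:
        level = round((v - lo) / span * top)
        chars.append(BLOCKS[min(max(level, 0), top)])  # clamp out-of-range values
    return "".join(chars)


small, large = [0, 1], [0, 7]
print(sparkline(small), sparkline(large))              # each on its own scale
print(sparkline(small, 0, 7), sparkline(large, 0, 7))  # on a common scale
```

On their own scales, `[0, 1]` and `[0, 7]` both render as a full-height jump; on a shared 0-to-7 scale the first barely rises.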

Trendline

A trendline overlays a sparkline with a trend. This may be a moving average, a best-fit line (e.g. linear regression), etc.

Trendlines smooth out the high variability of sparklines, making it slightly easier to spot long-term trends.

This is particularly useful when the data shows multi-seasonal patterns (e.g. a weekly as well as a monthly pattern) and we want to bring out both effects in the same chart.
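Both kinds of trend mentioned above take only a few lines. This is a minimal sketch: a trailing moving average and an ordinary least-squares line fitted against the point index. The function names are ours.

```python
def moving_average(values, window):
    """Trailing moving average; the first window - 1 points have no value (None)."""
    averages = []
    for i in range(len(values)):
        if i + 1 < window:
            averages.append(None)
        else:
            averages.append(sum(values[i + 1 - window:i + 1]) / window)
    return averages


def linear_trend(values):
    """Ordinary least-squares line through (0, v0), (1, v1), ...; needs 2+ points."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * x for x in range(n)]
```

For monthly rainfall, `moving_average(rainfall, 12)` would average out the within-year seasonality, leaving the year-on-year movement; the 12-month window is our choice for this data.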

Streamgraph

A streamgraph is like a sparkline, except that instead of the height of the line representing the value, it is the thickness of the band, drawn symmetrically around a central baseline, that represents the value.

These are a kind of stacked graph, and are particularly effective when visualising multiple series one on top of another. See Lee Byron’s Last.fm listening history for an example of effective use of this graph.

These are most effective at identifying which series is dominant at a given point in time, and how each series grows or fades around that point.
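A minimal sketch of how the layers of a streamgraph can be positioned. It assumes the simple centred ("silhouette") baseline, which shifts the whole stack down by half its total so that it stays symmetric around zero. Lee Byron’s streamgraphs use a more elaborate baseline that minimises wiggle; this sketch does not.

```python
def stream_layers(series):
    """Stack equal-length series around a centred baseline.

    Returns one (bottom, top) pair of lists per series. With a single series,
    the band runs from -v/2 to +v/2, so its thickness equals the value.
    """
    n = len(series[0])
    totals = [sum(s[i] for s in series) for i in range(n)]
    bottom = [-t / 2 for t in totals]          # centre the whole stack on zero
    layers = []
    for s in series:
        top = [b + v for b, v in zip(bottom, s)]
        layers.append((bottom, top))
        bottom = top                           # the next layer sits on this one
    return layers
```

Filling the area between each `bottom` and `top` pair, one colour per series, gives the stream.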

Horizon graph

The horizon graph increases the resolution of sparklines without taking up more height. First, it uses an absolute scale, differentiating between positives and negatives: negatives are coloured red and positives green. The negative values are then folded up above the baseline.

The chart is then sliced into bands that are folded on top of one another, using colour intensity in conjunction with height to show the value. Panopticon, which created horizon graphs, has a good introduction to their use and construction.

Like heatmaps, these are useful for spotting horizontal and vertical trends, but they use an absolute rather than a relative scale.
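A minimal sketch of the folding step, assuming a fixed band height and three bands (both are our choices, not Panopticon’s specification). Each value is reduced to a colour, picked by its sign, and the fraction of each band it fills.

```python
def horizon_bands(values, band_size, bands=3):
    """Split each value into a colour (by sign) and per-band fill fractions.

    Negative values are mirrored above the baseline and coloured red. Band k
    covers magnitudes from k * band_size to (k + 1) * band_size; every band is
    drawn from the same baseline, darker for higher k, so the chart keeps the
    height of a single band.
    """
    result = []
    for v in values:
        colour = "green" if v >= 0 else "red"
        magnitude = abs(v)                     # fold negatives upwards
        fills = [min(max((magnitude - k * band_size) / band_size, 0.0), 1.0)
                 for k in range(bands)]
        result.append((colour, fills))
    return result
```

With a band height of 10, a value of 25 fills the first two bands completely and half of the third, all overlaid in one band’s worth of height.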

Jitter plot

Jitter plots are a useful way of visualising the density and frequency of a data series. They plot the values horizontally rather than vertically: the x-axis carries the value, and the y-axis just spreads the points around randomly to minimise overlap.

This is useful for comparing frequency data. For example, here it is clear that no rainfall is the most frequent state. It can also be seen that Cuddalore typically has many months with little rainfall.
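Jittering itself is simple. This sketch assumes a uniform random offset (the `spread` of 0.4 is arbitrary): each value stays as its x-coordinate, and only the y-coordinate is random. A fixed seed makes the layout reproducible.

```python
import random


def jitter(values, spread=0.4, seed=0):
    """Return (x, y) points: x is the value, y a random offset in [-spread, spread].

    The y-coordinate carries no information; it only separates overlapping points.
    """
    rng = random.Random(seed)                 # fixed seed -> same layout every run
    return [(v, rng.uniform(-spread, spread)) for v in values]
```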

When the data density becomes too high, however, the points start to overlap and jitter plots are less effective.

Box plot

In such cases, box plots make for a better display. Popularised by John Tukey in 1977, they summarise a data series using just five numbers: the minimum, the lower quartile, the median, the upper quartile and the maximum.

The box spans the middle 50% of the observations, from the lower to the upper quartile. The horizontal line spans the full range of values in the series. The vertical line is the median: half the values lie to its left, and half to its right.

While this plot appears simplistic, it is often much more robust: the median and quartiles are not thrown off by a few extreme values, so it is safe to use for a wide variety of datasets.
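The five numbers can be computed with Python’s standard `statistics` module (Python 3.8 or later). Quartiles have several competing definitions; this sketch assumes the `inclusive` method of `statistics.quantiles`, which interpolates linearly between data points, so other tools may report slightly different quartiles.

```python
import statistics


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile and maximum (2+ values)."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]
```

The box is drawn from `q1` to `q3`, the line inside it at `median`, and the horizontal line from the minimum to the maximum.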

We often wonder what songs would look like. Here’s our take on what Bobby McFerrin’s Don’t Worry Be Happy looks like.

This picture is a spectrogram of the song. It starts at the 12 o’clock position and moves clockwise, ending at about the 4:00 mark. The intensity of colour indicates the volume at different frequencies: blue for high volume, red for medium, yellow for low and white for zero. The outer radius represents the lower frequencies and the inner radius the higher frequencies.

This sort of picture almost gives you a “fingerprint” of the song, and a feel for its ups and downs. For example, if you look at Bryan Adams’ Everything I Do, you can clearly see the light beginning and the somewhat stronger middle, then a pause before the 3:00 mark, a strong section again, and finally the fade-out.
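The circular layout described above can be sketched as a mapping from a spectrogram cell (time, frequency) to a point on a circle. We assume that one full turn equals the song’s duration, that y points up, an inner radius of 0.2 and a top frequency of 8000 Hz; all of these are illustrative choices, and computing the spectrogram itself (e.g. with a short-time Fourier transform) is not shown.

```python
import math


def polar_position(t, duration, freq, max_freq, inner=0.2, outer=1.0):
    """Place a spectrogram cell (t seconds, freq Hz) on a circle of unit radius.

    Time runs clockwise from 12 o'clock, one full turn per `duration`; low
    frequencies sit on the outer rim, high frequencies towards the centre.
    """
    angle = 2 * math.pi * t / duration                  # 0 = 12 o'clock
    radius = outer - (outer - inner) * freq / max_freq  # low freq -> outer radius
    # sin for x and cos for y make angle 0 point up and increase clockwise
    return radius * math.sin(angle), radius * math.cos(angle)


# A song of about 4 minutes (240 s): the cell at 1:00 and 0 Hz sits at 3 o'clock.
print(polar_position(60, 240, 0, 8000))
```

Each cell would then be painted at its position in the colour for its volume.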

For your amusement, here is what a few more songs would look like: a mix of Bollywood, old and new.

If you were wondering how the world’s securities move against each other, the picture below is the answer.

This picture shows the correlation between various currencies, indices and commodity prices by linking together three powerful types of visualisation.

The first is a coloured correlation matrix. In this picture, we have three securities: the British Pound (GBP), the Gold Price (XAU) and the Dow Jones Index (^DJI). The prices of GBP and gold tend to move together slightly, with a correlation of 0.36 (36%), so the cell between GBP and XAU is marked 36. Similarly, the cell between XAU and ^DJI shows -64 because gold and the Dow Jones index are moderately negatively correlated (-0.64).

The colour coding is based on the correlation: red is -1, green is +1 and yellow is 0.
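A minimal sketch of how one cell of the matrix could be produced: the Pearson correlation of two price series, labelled as a whole number (0.36 becomes 36) and coloured on a red-yellow-green scale from -1 to +1. The linear RGB interpolation and the function names are our assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def matrix_cell(xs, ys):
    """Label (correlation x 100, rounded) and hex colour: red -1, yellow 0, green +1."""
    r = pearson(xs, ys)
    t = min(max((r + 1) / 2, 0.0), 1.0)               # rescale [-1, 1] to [0, 1]
    if t < 0.5:
        red, green = 255, round(255 * t / 0.5)        # red -> yellow
    else:
        red, green = round(255 * (1 - t) / 0.5), 255  # yellow -> green
    return round(100 * r), f"#{red:02x}{green:02x}00"
```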

The second is a scatterplot matrix. The cells that mirror the correlations contain a series of dots. Each dot represents a particular day, plotting the price of one security against the price of the other on that day.

For example, compare gold and the Dow Jones. It isn’t a straightforward negative correlation. In fact, it almost looks like there were two periods: one in which gold was high and the Dow Jones low, and another in which the reverse held. But within each of those periods, there appears to have been a mild positive correlation.

The third is the hierarchical cluster. The securities are grouped with similar ones based on their correlation. For example, GBP and Silver (XAG) are reasonably close to each other and form one group. This group is most closely related to the Euro (EUR), and the three of them are closest to the Australian Dollar.

Arranging the securities by the hierarchy makes it easy to spot groups of securities that tend to move together.

For example, in the original visualisation, there appear to be a set of logical blocks.

At the centre, four securities tend to move together: the Pakistani Rupee (PKR), the Sensex (^BSES), the FTSE (^FTSE) and the S&P (^GSPC). They move in the opposite direction to the next group of securities: the Singapore Dollar (SGD), the Japanese Yen (JPY), Gold (XAU), the Swiss Franc (CHF) and the Chinese Yuan (CNY).

Similarly, the Swedish Krona, Canadian Dollar, Indian Rupee, Hong Kong Dollar and Mexican Peso form yet another group that moves together, but in the opposite direction to the strong Asian currencies in the block above.
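One way to obtain such a hierarchical ordering is agglomerative clustering, sketched here under our own assumptions: the distance between two securities is 1 minus their correlation, and clusters are merged by average linkage. We don’t know which distance or linkage the original visualisation used, and the correlations in the example are made up.

```python
def cluster_order(names, corr):
    """Order names by average-linkage agglomerative clustering.

    `corr` maps frozenset({a, b}) to the correlation of a and b; the distance
    between two securities is 1 - correlation. Returns the leaf order, so
    securities that move together end up next to each other.
    """
    def distance(a, b):
        return 1 - corr[frozenset((a, b))]

    clusters = [[name] for name in names]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(distance(a, b) for a, b in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]       # keep members of a group adjacent
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]


# Made-up correlations: A and B move together, C and D move together,
# and the two pairs move against each other.
corr = {frozenset(p): r for p, r in [
    (("A", "B"), 0.9), (("C", "D"), 0.8),
    (("A", "C"), -0.5), (("A", "D"), -0.5),
    (("B", "C"), -0.5), (("B", "D"), -0.5),
]}
print(cluster_order(["A", "C", "B", "D"], corr))
```

Reordering the rows and columns of the correlation matrix by this leaf order is what brings blocks of co-moving securities together.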

We at Gramener have named this visualisation a clusterplot. It’s a powerful technique when applied to time series of multiple (typically 5 to 50) variables.

Here are some cases where you might consider using them:

Group your products based on consumer behaviour. Which products tend to sell together? Which ones cannibalise each other’s sales? Is there a way of rationalising the product base to reduce complexity without losing customers?

Group retailers based on sales. Which retailers tend to cannibalise each other’s sales? Which ones complement each other? Where would you need to rationalise to avoid duplication or overlap?

Analyse process quality drivers. For example, if temperature, pressure and salinity affect your product quality, what impact will increasing one parameter have on the others?