communicating with data

treemaps

Tableau can do many things natively, but a couple of basic primitives are not built in because they behave somewhat differently from its overall logic. Treemaps are one of them. Then again, treemaps are arguably one of the best ways to express complex hierarchical information, i.e. to show the proportions in a large dataset.
Fortunately, thanks to Tableau's flexibility, there are ways to do that. In this tutorial I'm going to cover 2 cases. First, we'll create a somewhat complex treemap from data which will not change at runtime. Then, we'll create mini-treemaps which can change dynamically.

A complex treemap

What we want (and what we'll get) is a dataset that can be directly imported in Tableau and -boom- makes a treemap in a few clicks.
To make this dataset we can use d3. The treemap I am making is directly inspired by the d3 treemap example. d3 already computes all of the node positions, so what we'll do is modify the program slightly so that it outputs them in a form that can be used directly in Tableau.
Here is the modified file, which you can download and run on your computer. To work, it needs to be in the same folder as a data file called data.js, which holds your hierarchical data and has the same structure as the one linked here.
You can just copy/paste the table that's displayed below the treemap and put it in Tableau or save it in a file for good measure. Here is the output of the data file linked above.
Let's take a look at a few rows:

| Id | Path | Top-level category | Name | Value | Corner | x | y |
|---|---|---|---|---|---|---|---|
| 0 | flare>analytics>cluster | flare | AgglomerativeCluster | 3938 | 0 | 89 | 167 |
| 0 | flare>analytics>cluster | flare | AgglomerativeCluster | 3938 | 1 | 167 | 167 |
| 0 | flare>analytics>cluster | flare | AgglomerativeCluster | 3938 | 2 | 167 | 192 |
| 0 | flare>analytics>cluster | flare | AgglomerativeCluster | 3938 | 3 | 89 | 192 |
| 1 | flare>analytics>cluster | flare | CommunityStructure | 3812 | 0 | 102 | 138 |
| 1 | flare>analytics>cluster | flare | CommunityStructure | 3812 | 1 | 167 | 138 |
| 1 | flare>analytics>cluster | flare | CommunityStructure | 3812 | 2 | 167 | 167 |
| 1 | flare>analytics>cluster | flare | CommunityStructure | 3812 | 3 | 102 | 167 |
| 2 | flare>analytics>cluster | flare | HierarchicalCluster | 6714 | 0 | 89 | 192 |
| 2 | flare>analytics>cluster | flare | HierarchicalCluster | 6714 | 1 | 167 | 192 |
| 2 | flare>analytics>cluster | flare | HierarchicalCluster | 6714 | 2 | 167 | 236 |
| 2 | flare>analytics>cluster | flare | HierarchicalCluster | 6714 | 3 | 89 | 236 |

I'm creating 4 lines per "leaf" node, so this example, which has 220 nodes, amounts to 880 lines. Why 4? Because to draw a rectangle in Tableau you need to define its 4 corners. This is why there is a "Corner" column, which takes the values 0, 1, 2 and 3. We will use it to tell Tableau to read our corners in bottom left, bottom right, top right, top left order, which produces a nice convex rectangle and not an hourglass shape with crossing edges.
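The expansion from one leaf to four corner rows can be sketched as follows. This is a minimal illustration and not the code of the modified file: it assumes each leaf carries `x`, `y`, `dx` and `dy` (one corner plus width and height, as I understand the d3 treemap layout of that time to provide), and the helper name `cornerRows` and the `dx`/`dy` values are my own.

```javascript
// Hypothetical helper: expand one leaf rectangle into the 4 rows Tableau needs.
// Assumption: the layout gives each leaf x, y (one corner) plus dx, dy (size).
function cornerRows(leaf) {
  const { id, path, category, name, value, x, y, dx, dy } = leaf;
  // Walk around the rectangle so consecutive corners share an edge;
  // the array index becomes the "Corner" column (0, 1, 2, 3).
  const corners = [
    [x, y],
    [x + dx, y],
    [x + dx, y + dy],
    [x, y + dy],
  ];
  return corners.map(([cx, cy], corner) => ({
    id, path, category, name, value, corner, x: cx, y: cy,
  }));
}

// Example input chosen to reproduce the first four rows of the table above.
const rows = cornerRows({
  id: 0, path: "flare>analytics>cluster", category: "flare",
  name: "AgglomerativeCluster", value: 3938, x: 89, y: 167, dx: 78, dy: 25,
});
```

Walking around the perimeter is what keeps the polygon from crossing itself: swapping corners 2 and 3 would give the hourglass.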
Now off to Tableau with this data.
Now it's just a matter of reproducing the setup in this screenshot. Unsurprisingly, the columns and rows are determined by x and y. You want a polygon mark, and you absolutely must put your Corner measure on the path. For color, you have a choice: you can use the top-level category column (as I have) or the full path, which will divide your treemap into finer parts. Finally, level of detail: you must use the Id and not the name, in case several of your nodes have the same name. It's quite important at this point to uncheck Aggregate Measures in the Analysis menu. You do NOT want aggregated measures (though the result is quite pretty). To be able to use the name, you must first make a measure out of it. And finally, you'll want to update your infotip slightly.
All of this you can see if you download the tableau file.
And voilà! Treemaps for your Tableau workbooks.
Caveat: the polygon mark doesn't support labels, so you can't write on top of the small rectangles what they are. But that's not the point of a treemap, which is to give an immediate first impression of the relative size of large groups of your data and then let you explore them. To that end, the infotip function works just fine.

Simpler but dynamic treemaps

This is fine and dandy if your data doesn't change but it won't scale if you need to make many treemaps based on selections. What to do? You could use pie charts, but let's not.
To that end I've tried to emulate the Congress speaks visualization by Periscopic. I really like it. When you've selected representatives at the end of the process you are taken to a screen which shows the following mini-treemap:
There are just 5 rectangles. But they will change for any representative that we choose. Can this be done with Tableau? Obviously.
Now the Tableau part of this is slightly trickier than above. The idea is to use formulas to generate the coordinates of all 20 corners of the rectangles; in other words, we are going to let Tableau calculate the layout. We can do it because the way the rectangles are arranged is quite predictable: there is one on the left, then 4 on the right, stacked one on top of the other. Again, we could compute all of these coordinates outside of Tableau, but that would be a hassle, so for a large number of cases it is easier and more reliable to do this inside Tableau.

Data

For this I have used completely random data. I have generated 20 names, and for each I have generated values in a plausible range: the number of possible votes, the number of votes the representative actually cast, the number of times they voted yes, the number of times they voted yes with their party, and the same two counts for no (or nay, technically).
At the end of the day I need 20 records per representative (5 rectangles of 4 corners each), so I can either replicate the line 20 times or use linked tables. The idea is to get something like this, for all of the representatives, into Tableau.

Id

representative

corner

rectangle

possible votes

total votes

voted yes

yes with party

voted no

no with party

16

Nelson Thiede

0

no against party

888

784

320

274

464

373

16

Nelson Thiede

1

no against party

888

784

320

274

464

373

16

Nelson Thiede

2

no against party

888

784

320

274

464

373

16

Nelson Thiede

3

no against party

888

784

320

274

464

373

16

Nelson Thiede

0

no vote

888

784

320

274

464

373

16

Nelson Thiede

1

no vote

888

784

320

274

464

373

16

Nelson Thiede

2

no vote

888

784

320

274

464

373

16

Nelson Thiede

3

no vote

888

784

320

274

464

373

16

Nelson Thiede

0

no with party

888

784

320

274

464

373

16

Nelson Thiede

1

no with party

888

784

320

274

464

373

16

Nelson Thiede

2

no with party

888

784

320

274

464

373

16

Nelson Thiede

3

no with party

888

784

320

274

464

373

16

Nelson Thiede

0

yes against party

888

784

320

274

464

373

16

Nelson Thiede

1

yes against party

888

784

320

274

464

373

16

Nelson Thiede

2

yes against party

888

784

320

274

464

373

16

Nelson Thiede

3

yes against party

888

784

320

274

464

373

16

Nelson Thiede

0

yes with party

888

784

320

274

464

373

16

Nelson Thiede

1

yes with party

888

784

320

274

464

373

16

Nelson Thiede

2

yes with party

888

784

320

274

464

373

16

Nelson Thiede

3

yes with party

888

784

320

274

464

373
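
If you would rather replicate the line with a script than by hand, a sketch like the following would do it. It is only an illustration under my own assumptions: the function name `expandRepresentative` and the camelCase field names are hypothetical, and the rectangle names are simply the five that appear in the table.

```javascript
// The five rectangles of the mini-treemap, in the order used in the table.
const RECTANGLES = [
  "no against party", "no vote", "no with party",
  "yes against party", "yes with party",
];

// Hypothetical helper: repeat one representative's record for every
// rectangle and every corner (5 rectangles x 4 corners = 20 rows).
function expandRepresentative(rep) {
  const rows = [];
  for (const rectangle of RECTANGLES) {
    for (let corner = 0; corner < 4; corner++) {
      rows.push({ ...rep, rectangle, corner });
    }
  }
  return rows;
}

const nelson = {
  id: 16, representative: "Nelson Thiede",
  possibleVotes: 888, totalVotes: 784,
  votedYes: 320, yesWithParty: 274, votedNo: 464, noWithParty: 373,
};
const nelsonRows = expandRepresentative(nelson);
```

Running this over the whole list of representatives and writing the result out as a delimited file gives a table that can be loaded as is.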

In Tableau

In Tableau we are going to use the same idea as above: polygon mark, disable aggregate measures, and use x and y for columns and rows.
Only, x and y are going to be much more complex. Sorry about that. Well, not that complex but definitely longer.
Here's x:

case [rectangle]
  when "no vote" then
    case [corner]
      when 0 then 0
      when 1 then (([possible votes]-[total votes])/[possible votes])
      when 2 then (([possible votes]-[total votes])/[possible votes])
      when 3 then 0
    end
  else
    case [corner]
      when 0 then (([possible votes]-[total votes])/[possible votes])
      when 1 then 1
      when 2 then 1
      when 3 then (([possible votes]-[total votes])/[possible votes])
    end
end

Depending on the rectangle we are trying to draw we can find ourselves in one of two cases (hence the use of case).
If we draw "no vote" then we are on the left of our vis. The left corners are on the leftmost side of the vis (hence value: 0) and the right corners correspond to the proportion of possible votes which were not cast by this representative, which we can compute as ([possible votes]-[total votes])/[possible votes].
In the other case, we are drawing one of the 4 stacked rectangles, so the right corners are on the rightmost side of the vis (hence value: 1) and the left corners correspond to the value we just computed.
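To check the logic outside Tableau, the same rule can be written as a small function. This is a sketch for verification only: the function name `cornerX` and the camelCase field names are my own, standing in for the Tableau fields.

```javascript
// x coordinate of one corner, mirroring the calculated field above.
// Corners 0 and 3 are the left corners; 1 and 2 are the right corners.
function cornerX(row) {
  // Share of possible votes that were not cast: the width of "no vote".
  const noVoteShare = (row.possibleVotes - row.totalVotes) / row.possibleVotes;
  const isLeftCorner = row.corner === 0 || row.corner === 3;
  if (row.rectangle === "no vote") {
    return isLeftCorner ? 0 : noVoteShare;
  }
  // The four stacked rectangles all start where "no vote" ends.
  return isLeftCorner ? noVoteShare : 1;
}

const sample = { possibleVotes: 888, totalVotes: 784 };
const noVoteRight = cornerX({ ...sample, rectangle: "no vote", corner: 1 });
const stackedLeft = cornerX({ ...sample, rectangle: "yes with party", corner: 0 });
```

For the sample record, both values come out as 104/888, so the left rectangle and the stack on the right meet without a gap.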
And now, y:

case [rectangle]
  when "no vote" then
    case [corner]
      when 0 then 0
      when 1 then 0
      when 2 then 1
      when 3 then 1
    end
  // bottom of the stack: height is ([voted yes]-[yes with party])/[total votes]
  when "yes against party" then
    case [corner]
      when 0 then 0
      when 1 then 0
      when 2 then (([voted yes]-[yes with party])/[total votes])
      when 3 then (([voted yes]-[yes with party])/[total votes])
    end
  // adds [yes with party], so the top edge sits at [voted yes]/[total votes]
  when "yes with party" then
    case [corner]
      when 0 then (([voted yes]-[yes with party])/[total votes])
      when 1 then (([voted yes]-[yes with party])/[total votes])
      when 2 then ([voted yes]/[total votes])
      when 3 then ([voted yes]/[total votes])
    end
  // adds [no with party] on top of all the yes votes
  when "no with party" then
    case [corner]
      when 0 then ([voted yes]/[total votes])
      when 1 then ([voted yes]/[total votes])
      when 2 then (([voted yes]+[no with party])/[total votes])
      when 3 then (([voted yes]+[no with party])/[total votes])
    end
  // whatever remains up to the top of the vis
  when "no against party" then
    case [corner]
      when 0 then (([voted yes]+[no with party])/[total votes])
      when 1 then (([voted yes]+[no with party])/[total votes])
      when 2 then 1
      when 3 then 1
    end
end

y is longer but this is the same general idea. For the "no vote" rectangle, the corners are either at the top or at the bottom of the vis. For the others, we can predict where each rectangle will start and where it will end, as a proportion of the [total votes] field. The values we want correspond to these proportions, plus those of all the rectangles below, so that we achieve the stacked effect (as opposed to having all rectangles superimposed at the bottom of the vis). This is why I am entering the rectangles in stacking order: each time, the bottom corners get the value of the top corners of the previous rectangle.
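The stacking rule can also be expressed as a running sum, which makes it easy to check. The sketch below is my own restatement and not the Tableau field: it assumes each stacked rectangle's height is its own count divided by the total votes, and the names `STACK`, `counts` and `cornerY` are hypothetical.

```javascript
// Stacking order, bottom to top, as the rectangles are entered in the field.
const STACK = [
  "yes against party", "yes with party", "no with party", "no against party",
];

// Number of votes represented by each stacked rectangle (assumption:
// "against party" is simply the yes or no votes not cast with the party).
function counts(r) {
  return {
    "yes against party": r.votedYes - r.yesWithParty,
    "yes with party": r.yesWithParty,
    "no with party": r.noWithParty,
    "no against party": r.votedNo - r.noWithParty,
  };
}

// y coordinate of one corner. Corners 0 and 1 are the bottom corners.
function cornerY(row) {
  const isBottomCorner = row.corner === 0 || row.corner === 1;
  if (row.rectangle === "no vote") return isBottomCorner ? 0 : 1;
  const c = counts(row);
  let below = 0; // votes accounted for by the rectangles underneath
  for (const name of STACK) {
    if (name === row.rectangle) {
      const edge = isBottomCorner ? below : below + c[name];
      return edge / row.totalVotes;
    }
    below += c[name];
  }
  throw new Error("unknown rectangle: " + row.rectangle);
}

const rep = { totalVotes: 784, votedYes: 320, yesWithParty: 274, votedNo: 464, noWithParty: 373 };
const topOfYes = cornerY({ ...rep, rectangle: "yes with party", corner: 2 });
const topOfStack = cornerY({ ...rep, rectangle: "no against party", corner: 2 });
```

With the sample record the four counts are 46, 274, 373 and 91, which add up to the 784 votes cast, so the last rectangle ends exactly at 1.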
Here is the final result:

This week, I was made aware of a new set of maps by the French ministry of Foreign Trade, called cartographie de la France qui exporte (map of France exports) (link). Since I’m interested in the topic and I know that French public services have killer cartographers, I was eager to see what was so exciting about the first set of online maps on French exports.

I was a little underwhelmed to be honest. Online here meant static pdf files, although this is a dataset that just begs to be explored and manipulated.
On top of that, those were basic choropleth maps with markers such as this one here:

Now this map has two problems. First, it’s a choropleth with a discrete scale, but the values of adjacent areas can vary a lot. So, if you look at this portion of the map, what can be deduced about the values? Not much, I’m afraid.

Second, it’s difficult to compare the marks on the map. Which region has the biggest? The smallest? How do two specific regions compare? With this representation, this type of question is even more difficult to answer than with a table.

Also, each of these charts answers one partial question. This one, here, shows which region exports the most food products. But to where? And how about the imports and the balance? If one given view were the most relevant and illustrated some important finding, it could be highlighted, but here the website gives us collections of many such maps. As a citizen I’m leaving no more informed than I was.

Not being the one to criticize without proposing an alternative, I whipped out an interactive exploratory tool of France trade flows.
(The interactive vis is too wide to conveniently fit in a blog, but clicking on the image will open it in a new tab).

I don’t have access to the same dataset so I can’t show a strict equivalent. My data comes from COMTRADE, the UN database of trade flows, and shows all exports and imports to France in 2009. They are not broken down by region or by type of company, but I got the flows by partner country and product category.
The idea is that one can select something on one treemap to update the other. Also, it’s possible to alternate between a categorical view (where all groups of products and continents look neatly separated) and a view of the balance, which quickly shows which products or which countries get the bulk of French trade.

(technical explanation follows for those interested in the code proper)
Now following last week’s tutorial, of course it had to be done in protovis.
Actually it illustrates some interesting principles of working with arrays, trees, maps etc.

First, I want to do as much data manipulation as possible in protovis as opposed to manually. So, my source data for the treemap is stored as an array of associative arrays, which is probably the preferred form in protovis. This is no different than, for instance, Protovis’s barley example.
Now how do you get something of the shape -

where com is commodity code, cat is product category, cou is country, con is continent, imp is imports and exp is exports.

For any country + commodity combination, there will be only one record.
What I want in the tree I will use for the treemap are the exports. That is what will determine the size of the leaves of the tree.

Once I have written this, I could follow up with an .entries() statement, which would return a nested array, or with rollup(), which gives me the tree I need.
Since, again, there is only one record for each combination of country (cou) and commodity (com), I can use any aggregation I want.

This creates a tree, nested by country, then by product category, then by commodity. The corresponding values are the exports.
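To make the shape of that tree concrete, here is a plain JavaScript sketch of the same nesting, written without Protovis so that it runs anywhere. The records are made up for illustration (the codes and amounts are not from the real dataset) and the helper name `nestExports` is hypothetical; in Protovis itself the equivalent would be a pv.nest() chain with three keys and a rollup.

```javascript
// Made-up records in the shape described above:
// com = commodity, cat = category, cou = country, con = continent,
// imp = imports, exp = exports.
const records = [
  { com: "c01", cat: "A", cou: "DE", con: "EU", imp: 5, exp: 10 },
  { com: "c02", cat: "A", cou: "DE", con: "EU", imp: 1, exp: 4 },
  { com: "c07", cat: "B", cou: "DE", con: "EU", imp: 2, exp: 3 },
  { com: "c01", cat: "A", cou: "US", con: "NA", imp: 2, exp: 7 },
];

// Nest by country, then category, then commodity; leaves hold exports.
// Summing is one possible aggregation: with a single record per
// country + commodity pair, any aggregation gives the same leaf.
function nestExports(rows) {
  const tree = {};
  for (const r of rows) {
    const byCategory = (tree[r.cou] = tree[r.cou] || {});
    const byCommodity = (byCategory[r.cat] = byCategory[r.cat] || {});
    byCommodity[r.com] = (byCommodity[r.com] || 0) + r.exp;
  }
  return tree;
}

const byProduct = nestExports(records);
// byProduct.DE is { A: { c01: 10, c02: 4 }, B: { c07: 3 } }
```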

Now, creating my treemap data dynamically saves me a ton of hassle compared to trying to come up with a data file of the right shape and size, not to mention the calculation errors which creep in with each manual manipulation!

Another point of interestingness: how I computed the data to create the bar charts on the side.
For the left treemap (and left bar chart) the user has selected a country. (For the right ones, it’s a given product, but let’s focus on the left side; the reasoning is the same for the other side anyway.)

So first I am going to take the tree we made earlier and just look at the selected country. We can do that with a statement like:

myProductTree=byProduct[selCountry];

(so now we have a tree with just 2 levels, product category and commodity).

Now I can’t run pv.nest and all that on a tree. I need an array! So I have to use flatten to turn that section of the tree into a bona fide array which I will be able to process further.

Here, note that the arguments: “cat”, “com”, “exp” are completely arbitrary. But, since I’m recreating the array almost as it originally was, I might as well use the same names for the keys.
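What flattening does to that two-level tree can be sketched in plain JavaScript. This is an illustration of the idea and not Protovis's implementation: the function name `flattenTree` and the sample subtree are my own, and the keys are hard-coded to the three names used in the text.

```javascript
// Turn a { category: { commodity: exports } } tree back into an array
// of records, one per leaf, using the keys "cat", "com" and "exp".
function flattenTree(subtree) {
  const out = [];
  for (const cat of Object.keys(subtree)) {
    for (const com of Object.keys(subtree[cat])) {
      out.push({ cat, com, exp: subtree[cat][com] });
    }
  }
  return out;
}

// Made-up subtree for one selected country.
const myProductTree = { A: { c01: 10, c02: 4 }, B: { c07: 3 } };
const flat = flattenTree(myProductTree);
// flat[0] is { cat: "A", com: "c01", exp: 10 }
```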

So now I have a little subset of my original dataset: only the records of the selected country.
I can now proceed to sum exports by categories using a standard rollup method, just as we’ve seen here.

Conveniently, the rollup function that I defined earlier sums the records! And here I do need summing, not just any aggregation.

The problem is that the rollup() method creates an associative array, and to use that in a bar chart I need a proper array! So I use pv.values(), which does just that: it creates an array out of the values of an associative array.

catsByCountry = pv.values(catsByCountry);
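In plain JavaScript, the two steps (sum exports by category, then keep only the values) can be sketched like this. The helper name `sumByCategory` and the sample records are hypothetical, and `Object.values` stands in for pv.values().

```javascript
// Sum exports per category; the result is an associative array
// (a plain object keyed by category), like the output of a rollup.
function sumByCategory(rows) {
  const totals = {};
  for (const r of rows) {
    totals[r.cat] = (totals[r.cat] || 0) + r.exp;
  }
  return totals;
}

// Made-up records for one selected country.
const countryRecords = [
  { cat: "A", com: "c01", exp: 10 },
  { cat: "A", com: "c02", exp: 4 },
  { cat: "B", com: "c07", exp: 3 },
];

const totalsByCategory = sumByCategory(countryRecords); // { A: 14, B: 3 }
// Keep only the values, as a proper array that a bar chart can consume.
const barValues = Object.values(totalsByCategory); // [14, 3]
```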

Now the values can vary a lot in absolute terms depending on the selected country. This is why in the actual bar chart, I use pv.normalize() to have only values from 0 to 1 which are much more convenient to plot.

vis.add(pv.Bar)
.data(function() { return pv.normalize(catsByCountry); })
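As a rough picture of what the normalization does, here is a plain JavaScript sketch. It assumes, as I understand pv.normalize() to work, that each value is divided by the sum of all values, so that the results add up to 1; the function name `normalize` is my own.

```javascript
// Divide every value by the total so the results lie between 0 and 1
// and sum to 1, whatever the absolute size of the original numbers.
function normalize(values) {
  const total = values.reduce((sum, v) => sum + v, 0);
  return values.map((v) => v / total);
}

const shares = normalize([1, 3]); // [0.25, 0.75]
```

A small country and a large one then produce bars on the same scale, which is what makes the chart reusable across selections.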

One last thing: to save space in the data set (which means bandwidth and loading time), I’ve used short keys in my data file, and I’ve used codes for countries, commodities and the like.

Working with layouts

In this final part, we’re going to look at how we can shape our data to use the protovis built-in layouts such as stacked areas, treemaps or force-directed graphs.
This is not a tutorial on how to use layouts stricto sensu, and I advise anyone interested to first look at the protovis documentation to see what can be done with this and to understand the underlying concepts.

But if there is one thing to know about layouts, it’s that they allow you to create non-trivial visualizations in even less code than regular protovis, provided that you pass them data in a form they can use, and this is precisely where we come in.

Three great categories of layouts

Currently, there are no fewer than 13 types of layouts in Protovis. Fortunately, there are examples for all of them in the gallery.
There are layouts for:

Arrays of data

In order to work with this kind of layout, the simplest thing is to put your data in a 2-dimensional array:

var data=[
[8,3,7,2,5],
[9,6,1,7,4],
...
[7,4,3,6,8]
];

For the grid layout, this gives you an array of cells divided in columns (number of elements in each line) and rows (number of lines).
The idea of the grid layout is that your cells are automatically positioned and sized, so afaik the only thing you can do is add a mark such as a pv.Bar which would fill them completely, but which you could still style with fillStyle or strokeStyle. You can’t really access the underlying data with functions but you can use methods that rely on default values, like adding labels.

On line 29, I’m using a map function to turn this array of strings, which is easier and shorter to type, into a bona fide 2-dimensional array.
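The kind of conversion meant here can be sketched as follows. The exact format of the strings in the original example is not shown, so this assumes a hypothetical one where each string is a row and each character is a one-digit cell value; the function name `parseGrid` is also my own.

```javascript
// Turn an array of strings into a 2-dimensional array of numbers.
// Assumption: one string per row, one digit per cell.
function parseGrid(lines) {
  return lines.map((line) => line.split("").map(Number));
}

const grid = parseGrid(["83725", "96174"]);
// grid is [[8, 3, 7, 2, 5], [9, 6, 1, 7, 4]]
```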

That’s all there is to grids. Of all the layouts, they are among the easiest to reproduce with regular protovis.

Now, stacks.
The easiest way to use them is to pass them 2-dimensional arrays. Now it doesn’t have to be arrays of numbers, it can be arrays of associative arrays in case you need to do something exotic. But for the following examples let’s just assume you don’t. Here is how you’d do a stacked area, stacked columns and stacked bars respectively:

For bars, there is a little trick here. I specify that the layer orientation is horizontal (“left”) and I change the height instead of the width of the added pv.Bar.
And that’s all there is. You can create various streamgraphs by playing with the order and offset properties of the stack, but this doesn’t change anything in the data structure, so we’re done here.
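To see why a 2-dimensional array is all a stack needs, here is a plain JavaScript sketch of the bookkeeping a stack performs in the simplest case (every layer sitting directly on the previous one, starting from zero). It illustrates the idea and is not the Protovis implementation; the function name `stackBaselines` is hypothetical.

```javascript
// For each layer (one inner array) and each position, compute where the
// value starts (y0) and ends (y1) once the layers below are piled up.
function stackBaselines(layers) {
  const running = new Array(layers[0].length).fill(0);
  return layers.map((layer) =>
    layer.map((value, i) => {
      const y0 = running[i];
      running[i] += value;
      return { y0, y1: running[i] };
    })
  );
}

const stacked = stackBaselines([
  [8, 3, 7],
  [9, 6, 1],
]);
// stacked[1][0] is { y0: 8, y1: 17 }: the second layer starts where the
// first one ends.
```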

Representing networks

Protovis provides 3 cool layouts to easily exhibit relationships between nodes: arc diagrams, matrix diagrams and force-directed layouts.
The good news is that the shape of the data required by those three layouts is identical.

They require an array that corresponds to the nodes. This can be as simple as a pv.range(), or as sophisticated as an array of associative arrays if you want to style your network graph according to several potential attributes of the node.

And they also require an array for the links. This array has a more rigid form, it must be an array of associative arrays of the shape: {source: #, target: #, value: #} where the values for source and target correspond to the position of a node in the node array, and value indicates the strength of the link.
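As an illustration of the two arrays, here is a small made-up network in plain JavaScript, with a check that every link points at an existing node. The data and the helper name `invalidLinks` are hypothetical.

```javascript
// Six nodes, here nothing more than their own index (like a pv.range(6)).
const nodes = [0, 1, 2, 3, 4, 5];

// Links refer to nodes by position in the nodes array;
// value is the strength of the link.
const links = [
  { source: 0, target: 1, value: 1 },
  { source: 0, target: 2, value: 3 },
  { source: 3, target: 5, value: 2 },
];

// Return the links whose source or target is not a valid node position.
function invalidLinks(nodeList, linkList) {
  const ok = (i) => Number.isInteger(i) && i >= 0 && i < nodeList.length;
  return linkList.filter((l) => !(ok(l.source) && ok(l.target)));
}

const bad = invalidLinks(nodes, links); // [] for the data above
```

A link pointing past the end of the nodes array is an easy mistake to make when the nodes are filtered, so a check like this is cheap insurance.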

Here, by varying the strength of the link, the thickness of the arcs changes accordingly. The nodes are left unstyled; had we passed a more complicated dataset to the nodes array, we could have changed their properties (fillStyle, size, strokeStyle, labels etc.) with appropriate accessor functions.

With a few modifications we can create a force-directed layout and a matrix diagram.

For the matrix, things are slightly more complex than for the previous 2. Here I opted for a directed matrix, as opposed to a bidirectional one: this means that each link is shown once, from its source to its target, and not twice (i.e. also from its target back to its source), which is the default.
I chose to color the bar attached to my links (which are cells of the matrix) according to the strength of my links. Again, if my nodes array carried more attributes, I could have used them for styling.
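The difference between the two kinds of matrix can be sketched in plain JavaScript by building the grid of cell values by hand. This is an illustration of the concept and not how Protovis stores things internally; the function name `adjacency` is hypothetical.

```javascript
// Build an n x n grid of link strengths. In the directed case a link only
// fills the cell (source, target); otherwise it also fills its mirror.
function adjacency(nodeCount, links, directed) {
  const cells = Array.from({ length: nodeCount }, () =>
    new Array(nodeCount).fill(0)
  );
  for (const { source, target, value } of links) {
    cells[source][target] += value;
    if (!directed && source !== target) {
      cells[target][source] += value;
    }
  }
  return cells;
}

const sampleLinks = [{ source: 0, target: 1, value: 2 }];
const directedCells = adjacency(3, sampleLinks, true);
const mirroredCells = adjacency(3, sampleLinks, false);
// directedCells[1][0] is 0, mirroredCells[1][0] is 2
```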

Finally, we’ve added labels to the custom property Matrix.label. Only, the labels are numbered from 0 to 11 so to get numbers from 0 to 5 for both rows and columns I used Math.floor(this.index/2) (integer part of half of this number).
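The index arithmetic is easy to verify on its own. The snippet below is a plain JavaScript check, with an ordinary array index standing in for this.index:

```javascript
// 12 label positions (0 to 11) mapped to node numbers 0 to 5:
// each node number appears twice in a row.
const labelNumbers = Array.from({ length: 12 }, (_, index) =>
  Math.floor(index / 2)
);
// labelNumbers is [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```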

Hierarchized data

As with networks, the shape of the data we can feed to treemaps, icicles and other hierarchical representations doesn’t change. So once you have your data in order, you can easily switch representations.

The protovis examples use the hierarchy of the flare source code, which really shows what can be done with a treemap and other tree representations.

For our purpose we are going for a simpler tree, inspired by the work of Periscopic on congressspeaks.com which Kim Rees showed at Strata.
Kim’s presentation featured tiny treemaps that showed the voting record for a congressperson, and whether they had voted for or against their party.
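A tree for one such voting record can be written as nested associative arrays, with numbers at the leaves. The sketch below is plain JavaScript with made-up figures; the function names `votingTree` and `leafTotal` and the field names are hypothetical, and the grouping into "yes" and "no" branches is just one possible choice.

```javascript
// Build a small hierarchy from one representative's counts.
function votingTree(r) {
  return {
    "no vote": r.possibleVotes - r.totalVotes,
    yes: {
      "with party": r.yesWithParty,
      "against party": r.votedYes - r.yesWithParty,
    },
    no: {
      "with party": r.noWithParty,
      "against party": r.votedNo - r.noWithParty,
    },
  };
}

// Sum of all leaves under a node: this is the size a treemap would give it.
function leafTotal(node) {
  if (typeof node === "number") return node;
  return Object.values(node).reduce((sum, child) => sum + leafTotal(child), 0);
}

const tree = votingTree({
  possibleVotes: 888, totalVotes: 784,
  votedYes: 320, yesWithParty: 274, votedNo: 464, noWithParty: 373,
});
const total = leafTotal(tree); // 888, the number of possible votes
```

Because the leaves partition the possible votes, the root's total equals that number, which is a handy sanity check before handing the tree to a layout.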

There are many styling possibilities obviously left unexplored in this simple example (you can control properties of the tree.link, tree.node, tree.labels which we didn’t use here, etc.), but this won’t change much as far as data are concerned.