An Easy Way to Make a Treemap

If your data is a hierarchy, a treemap is a good way to show all the values at once and keep the structure in the visual. This is a quick way to make a treemap in R.

Back in 1990, Ben Shneiderman, of the University of Maryland, wanted to visualize what was going on in his always-full hard drive. He wanted to know what was taking up so much space. Given the hierarchical structure of directories and files, he first tried a tree diagram. It got too big too fast to be useful though. Too many nodes. Too many branches.

The treemap was his solution. It’s an area-based visualization where the size of each rectangle represents a metric. It has since been made popular by Martin Wattenberg’s Map of the Market and Marcos Weskamp’s newsmap.

Here’s a really easy way to make your own treemap in just a couple lines of code. We’re looking to make something like the above.

Step 0. Download R

Like before, we’re going to use R, so you’ll want to get it before going any further. Download it for Windows, Mac, or Linux. Don’t let the outdated site fool you. You can get a lot done with the free software.

Step 1. Load the Data

We’ll use data covering a hundred popular posts on FlowingData. Here it is in CSV format. You don’t have to download it though. We’ll just load it directly into R. The main thing to take note of is what is there. There’s post id, number of views, number of comments, and category.

Okay, let’s load it into R using read.csv():

data <- read.csv("http://datasets.flowingdata.com/post-data.txt")

Loading data in CSV format into R.

Easy enough. We just used the read.csv() function to load data from a URL. If your data is on your computer, you could also do something like data <- read.csv("post-data.txt"). Just make sure the data file is in your current working directory, which you can change via the “Miscellaneous” menu.
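Here is a minimal sketch of the local-file route. It writes a tiny stand-in file first so it runs anywhere; the two rows and the column names (id, views, comments, category) are assumptions based on the description above, not the real dataset.

```r
# Write a small stand-in file to a temporary directory so the example is
# self-contained. The rows and column names are hypothetical.
tmp_dir <- tempdir()
writeLines(c("id,views,comments,category",
             "101,5000,12,Statistics",
             "102,12000,40,Mapping"),
           file.path(tmp_dir, "post-data.txt"))

old_wd <- setwd(tmp_dir)   # setwd() returns the previous working directory
getwd()                    # confirm where R is looking for files

data <- read.csv("post-data.txt")
str(data)                  # check column names and types before plotting

setwd(old_wd)              # restore the original working directory
```

Using setwd() in the console does the same job as changing the working directory through the menu, and str() is a quick check that the columns came in the way you expect.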

Step 2. Load the Portfolio package

Only a few more lines of code, and you’ve got a treemap. It’s so easy, because we’re going to use the portfolio library in R. First, you have to install it. You can either install the library via the “Package Installer” or you can do it through the command line. Let’s do the latter. Type this in the console to install portfolio:

install.packages("portfolio")

Once installed, load it into R:

library(portfolio)

Step 3. Make the Treemap

It’s time to make the treemap with map.market(). Type this in the console:
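A sketch of what the call might look like, assuming the data frame from Step 1 has columns named id, views, comments, and category. Those column names, the stand-in rows, and the title are assumptions for illustration; id, area, group, color, and main are arguments of map.market().

```r
# Hypothetical stand-in rows; swap in the data frame loaded in Step 1.
data <- data.frame(
  id       = 1:4,
  views    = c(5000, 12000, 3000, 8000),
  comments = c(12, 40, 5, 22),
  category = c("Statistics", "Mapping", "Statistics", "Design")
)

# Only draw if the portfolio package is installed.
if (requireNamespace("portfolio", quietly = TRUE)) {
  portfolio::map.market(
    id    = data$id,        # one rectangle per post
    area  = data$views,     # rectangle size
    group = data$category,  # rectangles are grouped by category
    color = data$comments,  # rectangle shading
    main  = "FlowingData Map"
  )
}
```

Here views drive the size of each rectangle and comments drive the shading, so the two metrics can be compared at a glance within each category.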

Step 4. Customize

Now maybe you want to modify something like color. The cool thing about R is that you can see the code for all the functions, edit it, and then use your customized version. If the green and red scheme isn’t for you or you don’t care about the positive/negative cutoff, then you can change the code to do that. I won’t go into detail, but if you type map.market in the console, you’ll see the function. You can change color or cutoff around lines 36-46.
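One way to go about that, as a rough sketch: work on a private copy of the function and search its source for the color-related lines. The name my.map.market is made up here, and the line numbers you see will depend on the installed version of portfolio, so they may not match the range mentioned above.

```r
if (requireNamespace("portfolio", quietly = TRUE)) {
  # Take a private copy so the packaged function stays untouched.
  my.map.market <- portfolio::map.market

  # Turn the function into lines of text and list the ones mentioning color.
  src <- deparse(my.map.market)
  hits <- grep("col", src, ignore.case = TRUE)
  print(data.frame(line = hits, code = trimws(src[hits])))

  # In an interactive session, open the copy in an editor and save changes:
  # my.map.market <- edit(my.map.market)
}
```

After editing, call my.map.market() with the same arguments you would pass to map.market().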

For example, you can do a black and white color scheme:

You don’t have to stick to the default color scale though.

I was alright with the green for this, so I saved it as a PDF and then loaded it into Illustrator as usual. I numbed the green some, cleaned up the labels with a new font and layout, and updated the legend.

Touched up version of treemap with black-green color scale.

And there you go – a treemap with just a few lines of code in our all-trusty R. Rinse and repeat with your own data.


About the Author

Nathan Yau is a statistician who works primarily with visualization. He earned his PhD in statistics from UCLA, is the author of two best-selling books — Data Points and Visualize This — and runs FlowingData. Introvert. Likes food. Likes beer. Follow him @flowingdata.

Thanks for another simple tutorial, Nathan. In your example, I find it interesting that the number of views doesn’t necessarily correlate with the number of comments. What’s even more striking is how topics like Ugly Visualization and Mistaken Data get more comments than some of the arguably more interesting topics. I guess people love to criticise?

This is great! I love simple tutorials that lead to tangible results, and this is one of the best.
I might have liked a short explanation of how to change the colors, but that would have definitely been icing–the cake was already there.
Again, good job.

I downloaded & started learning R after the NBA heatmap tutorial. I had no idea there were straight-forward, powerful, free tools available. So, thanks for the introduction!

With regard to the treemap — I’m guessing putting things like mouse-over titles onto the squares can’t be done in R (or is easier to do elsewhere). Is there a good, inexpensive, OS X -compatible tool for doing that?

Have you seen any implementation in which it can show more than 2 levels?

Note: neither the Excel Add-In nor the Macrofocus product allows you to modify the graphic in another editor; each only exports it as a “picture”. So if you don’t like exactly what you see … tough. Graphics in R can be copied as a metafile, making it a snap to make changes in Illustrator or even PowerPoint.

Any ideas for how to get CSV info on your file structure? We use an AFS implementation, and so if there’s a nice way to query all of the substructure of a given directory for size information and dump that into a CSV, I’d be set, but I’m not sure how to go about actually collecting the data. I can get at the directory structure as both a mounted drive in Windows or Mac, or I can get at it via a Unix box over SSH. Thanks!
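One possible approach from within R, sketched with base functions only: list.files() walks a directory recursively and file.info() reports sizes in bytes. The throwaway directory, file names, and output columns below are made up for illustration; point root at a mounted drive or any real directory instead.

```r
# Build a small throwaway directory tree so the sketch runs anywhere.
root <- file.path(tempdir(), "treemap-demo")
dir.create(file.path(root, "docs"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "music"), showWarnings = FALSE)
writeLines("hello", file.path(root, "docs", "a.txt"))
writeLines(rep("x", 100), file.path(root, "music", "b.txt"))

# List every file below root (paths relative to root), then look up sizes.
paths <- list.files(root, recursive = TRUE)
sizes <- file.info(file.path(root, paths))$size

files <- data.frame(
  id    = paths,
  group = dirname(paths),  # parent directory, relative to root
  size  = sizes            # bytes
)

# Save outside root so the CSV does not show up in a later listing.
out_file <- file.path(tempdir(), "file-sizes.csv")
write.csv(files, out_file, row.names = FALSE)
```

The id, group, and size columns line up with the id, group, and area inputs a treemap needs. On a very large file system this may be slow, so it could be worth trying on one subdirectory first.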

Thanks, great tutorial! Simple and efficient. I guess the font used for your last result is Adobe Avenir. I managed to change colours, and I will manage to move the gradient bar below, but… How do you make the text go on a new line? For long sentences, it goes off the square too easily… Cheers

@Stephane – Avenir, correct. I did the text stuff in Illustrator, but if you wanted to do that in R, you’d have to find the word length, find the width of the rectangle it is a label for, and then split the word accordingly, by space or hyphen.
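A simplified sketch of the splitting step, using base R’s strwrap(), which breaks text at spaces. It takes a fixed character count as a stand-in for the rectangle width, which is an assumption; matching the real rectangle size would need the plot dimensions as well.

```r
# Hypothetical label; `width` is measured in characters, not plot units.
label <- "Ugly Visualization and Mistaken Data"

# strwrap() returns one element per line; join them with newlines so the
# result can be used as a single multi-line label.
wrapped <- paste(strwrap(label, width = 15), collapse = "\n")
cat(wrapped, "\n")
```

strwrap() only breaks at whitespace, so a single word longer than the width stays on one line.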

It is disingenuous to say that this is only “four lines of code”. This is correct for calling out to a predefined function, but it misrepresents the amount of time required to develop said functionality in R.

@Michael – absolutely. there’s a lot of stuff going on under the hood, but for the purpose of this tutorial and for the group of people who will most likely use this tutorial, it’s four lines of code :).

of course, i encourage people to look under the hood once they’re comfortable.

I’m a complete R newbie, but have heard about it before so thought I’d follow your tutorial to see what it’s about. I followed your steps, but when I submitted the last command, the map.market() function, I received the following error:
Error in data$id : object of type ‘closure’ is not subsettable

@Nathan, I did indeed, and it even echoed back the contents of the file on the screen. I just tried again, followed immediately by another try of the map.market function and I still get the same message.
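One plausible cause, offered as a guess rather than a diagnosis: R ships with a built-in function called data(), so if read.csv() is run without assigning its result (which prints the file to the screen), the name data still points to that function. The sketch below reproduces the message and shows the assignment that avoids it; the inline CSV is a made-up stand-in.

```r
# With no data frame assigned to the name `data`, the name refers to the
# built-in function utils::data(), and `$` cannot index a function.
msg <- tryCatch(utils::data$id, error = function(e) conditionMessage(e))
print(msg)

# Assigning the result of read.csv() makes `data` a data frame, so `$` works.
# A small inline CSV is used here for illustration.
data <- read.csv(text = "id,views\n1,100\n2,250")
data$id
```

In other words, make sure the line starts with data <- and not just read.csv(...).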

I’ve noticed that R crashes when calling map.market on large data sets. For example, I took a CSV file with about 52,000 items and tried to call map.market on it. On a machine with 4 gigs of RAM, R began using more and more RAM until it got to about 1.5 gigs. Then it crashed.

The dataset I was using included individual orders. I’m guessing I need to aggregate the data in some way, perhaps using tiers for things like the amount of the order ($0.01-$5.00, $5.01-$10.00, etc). But since a treemap involves multiple dimensions, I’m guessing I would need to aggregate the data in multiple dimensions.

I have an idea of how to do this in SQL with subqueries and group by statements. Can someone give me an idea of how to do this in R with a dataset that starts out looking like the following csv file?
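A rough sketch of one way to do this in base R, using cut() to build the tiers and aggregate() to play the role of GROUP BY. The orders data frame, its column names, and the tier boundaries are all invented for illustration.

```r
# Hypothetical order-level data; column names are illustrative.
orders <- data.frame(
  category = c("books", "books", "books", "toys", "toys", "toys", "toys", "games"),
  amount   = c(3.50, 4.00, 7.25, 12.00, 4.10, 9.99, 8.00, 15.00)
)

# Bin each order into a price tier. Intervals are closed on the right by
# default, so 5.00 falls in the first tier and 10.00 in the second.
orders$tier <- cut(orders$amount,
                   breaks = c(0, 5, 10, Inf),
                   labels = c("$0.01-$5.00", "$5.01-$10.00", "$10.01+"))

# One row per category/tier pair: total amount and number of orders.
totals <- aggregate(amount ~ category + tier, data = orders, FUN = sum)
counts <- aggregate(amount ~ category + tier, data = orders, FUN = length)
names(counts)[3] <- "orders"

# merge() joins on the shared columns, category and tier.
summary_df <- merge(totals, counts)
print(summary_df)
```

The aggregated table has far fewer rows than the raw orders, and its columns could be passed along as group (category), id (tier), and area (amount or orders).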