How to Make Bubble Charts

Ever since Hans Rosling presented a motion chart to tell his story of the wealth and health of nations, there has been an affinity for proportional bubbles on an x-y axis. This tutorial is for the static version of the motion chart: the bubble chart.

A bubble chart can also just be straight up proportionally sized bubbles, but here we’re going to cover how to create the variety that is like a scatterplot with a third, bubbly dimension.

The advantage of this chart type is that it lets you compare three variables at once. One is on the x-axis, one is on the y-axis, and the third is represented by the area of each bubble. Have a look at the final chart to see what we’re making.

Step 0. Download R

We’re going to use R to do this, so download that before moving on. It’s free and open-source, so you have nothing to lose. Plus it’s a need-to-know-name of 2011, so you might as well get to know it now. You can thank me later.

Step 1. Load the data

Assuming you already have R open, the first thing we’ll do is load the data. We’re examining the same crime data that we used in our last tutorial. I’ve added state population this time around. One note about the data: the crime numbers are actually for 2005, while the populations are for 2008. This isn’t a huge deal since we’re more interested in relative populations than in the raw values, but keep that in mind.

Okay, moving on. You can download the tab-delimited file here and keep it local, but the easiest way is to load it directly into R with the below line of code:
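A minimal sketch of such a load, not necessarily the exact line this tutorial used. It assumes a tab-delimited file with a header row and columns named state, murder, burglary, and population (the first three names appear later in this tutorial; population is an assumed name). The rows written to the temporary file are made-up placeholders so the snippet runs on its own. They are not the real crime figures.

```r
# Placeholder rows, NOT real crime data; they only mimic the assumed layout.
tsv_lines <- c(
  "state\tmurder\tburglary\tpopulation",
  "StateA\t5.0\t700\t6000000",
  "StateB\t2.0\t400\t1500000",
  "StateC\t8.0\t950\t20000000"
)

# Stand-in for the downloaded file (hypothetical local copy).
crime_file <- tempfile(fileext = ".tsv")
writeLines(tsv_lines, crime_file)

# read.csv() reads tab-delimited files when sep is set to a tab.
crime <- read.csv(crime_file, header = TRUE, sep = "\t")

# Quick look at the column names and types that were read.
str(crime)
```

To read the real file, pass its local path or URL in place of crime_file.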

Circles correctly sized by area, but the range of sizes is too much. The chart is cluttered and unreadable.

Yay. Properly scaled circles. They’re way too big though for this chart to be useful. By default, symbols() sizes the largest bubble to one inch, and then scales the rest accordingly. We can change that by using the inches argument. Whatever value you put will take the place of the one-inch default. While we’re at it, let’s add color and change the x- and y-axis labels.
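Here is a rough sketch of what such a call could look like. The data frame holds made-up placeholder values, the population column name is an assumption, and the inches value, colors, and axis labels are illustrative choices rather than confirmed settings from this tutorial.

```r
# Placeholder values, NOT real crime data.
crime <- data.frame(
  state = c("StateA", "StateB", "StateC"),
  murder = c(5.0, 2.0, 8.0),
  burglary = c(700, 400, 950),
  population = c(6e6, 1.5e6, 2e7)
)

# Size by area: area = pi * r^2, so r = sqrt(value / pi).
radius <- sqrt(crime$population / pi)

pdf(NULL)  # null device so nothing is written; drop this line to draw on screen
symbols(crime$murder, crime$burglary, circles = radius,
        inches = 0.35,            # scale the largest circle to 0.35 in, not 1 in
        fg = "white",             # border color
        bg = "red",               # fill color
        xlab = "Murder Rate", ylab = "Burglary Rate")
dev.off()
```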

Notice we use fg to change border color, bg to change fill color. Here’s what we get:

Scale the circles to make the chart more readable, and use the fg and bg arguments to change colors.

Now we’re getting somewhere.

By the way, you can make a chart with other shapes too with symbols(). You can make squares, rectangles, thermometers, boxplots, and stars. They take different arguments than the circle. The squares, for example, are sized by the length of a side. Again, make sure you size them appropriately.
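As a small sketch of the squares case: because squares are sized by side length, taking the square root of the value keeps the area proportional to it. The data are made-up placeholders, the population column name is an assumption, and the inches value and colors are arbitrary.

```r
# Placeholder values, NOT real crime data.
crime <- data.frame(
  state = c("StateA", "StateB", "StateC"),
  murder = c(5.0, 2.0, 8.0),
  burglary = c(700, 400, 950),
  population = c(6e6, 1.5e6, 2e7)
)

# area = side^2, so side = sqrt(value) keeps area proportional to population.
side <- sqrt(crime$population)

pdf(NULL)  # null device so nothing is written; drop this line to draw on screen
symbols(crime$murder, crime$burglary, squares = side,
        inches = 0.5, fg = "white", bg = "red")
dev.off()
```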

Step 4. Add labels

As it is, the chart shows some sense of distribution, but we don’t know which circle represents each state. So let’s add labels. We do this with text(), whose arguments are x-coordinates, y-coordinates, and the actual text to print. We have all of these. Like the bubbles, the x is murders and the y is burglaries. The actual labels are state names, which is the first column in our data frame.

With that in mind, we do this:

text(crime$murder, crime$burglary, crime$state, cex=0.5)

The cex argument controls text size. It is 1 by default. Values greater than one will make the labels bigger and the opposite for less than one. The labels will center on the x- and y-coordinates.
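If centered labels collide with the bubbles, text() also accepts pos and offset arguments. The sketch below uses made-up placeholder data and arbitrary settings. It places each label just above its point instead of on top of it.

```r
# Placeholder values, NOT real crime data.
crime <- data.frame(
  state = c("StateA", "StateB", "StateC"),
  murder = c(5.0, 2.0, 8.0),
  burglary = c(700, 400, 950),
  population = c(6e6, 1.5e6, 2e7)
)

pdf(NULL)  # null device so nothing is written; drop this line to draw on screen
symbols(crime$murder, crime$burglary,
        circles = sqrt(crime$population / pi),
        inches = 0.35, fg = "white", bg = "red")

# pos = 3 puts the label above the point (1 below, 2 left, 4 right);
# offset is the gap between point and label, in fractions of a character width.
text(crime$murder, crime$burglary, crime$state,
     cex = 0.5, pos = 3, offset = 0.4)
dev.off()
```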

Here’s what it looks like.

Add labels so you know what each circle represents.

Step 5. Clean up

Finally, as per usual, I clean up in Adobe Illustrator. You can mess around with this in R, if you like, but I’ve found it’s way easier to save my file as a PDF and do what I want with Illustrator. I uncluttered the state labels to make them more readable, rotated the y-axis labels, so that they’re horizontal, added a legend for population, and removed the outside border. I also brought Georgia to the front, because most of it was hidden by Texas.

About the Author

Nathan Yau is a statistician who works primarily with visualization. He earned his PhD in statistics from UCLA, is the author of two best-selling books — Data Points and Visualize This — and runs FlowingData. Introvert. Likes food. Likes beer. Follow him @flowingdata.

This graph is no better visually and, worse, makes the populations appear wrong. My dad, who does not read English, for example, found it odd that American states had circle diameters implying populations of about one billion (in fact, the small circle in the legend says “5.0e+08”, which is five hundred million).

On the other hand, isn’t there any way for the plotting routine to unclutter the labels automatically, perhaps adding small line segments to connect them to the symbols where ambiguity can arise?

@drio – A lot of little things. When you open a PDF in Inkscape or Illustrator, you can manipulate the individual elements by clicking and dragging. For example, you could click on an individual label and move it where you wanted, or you could change the color of a single circle. So it’s a lot easier to edit small parts of the graphic in there than in R, for me at least.

@Nathan,
Very helpful tutorial. Thanks!
Question for anyone who has used Inkscape for editing the PDF output from R. Do you encounter situations where some of the really tiny graphics are replaced by a symbol that looks like a lowercase ‘q’? How do you make sure that Inkscape imports even the smallest graphic created by R?

Good tutorial, but I think your axes are incorrect. Ten murders per 1,000 people in Louisiana would be 1% of the population being murdered per year! A quick check of the data on Wolfram Alpha suggests the axes should be “per 100,000 population”, for example:

Great post, I’m definitely looking forward to more R tutorials in the future as well. On a side note, I’ve noticed how rough and pixelated all of my plots look (see: http://j.drhu.me/bubblePlots.png). Is this just due to you cleaning up your images in Illustrator before you post them, or can I edit the resolution of the plotting tool?

You might have better luck if you export the graphs as PDF, and then find a way to make your final image from that (PDFs are much easier to manipulate in Illustrator or Inkscape). You could also specify the resolution of the PNG you want to save in R.
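As a rough sketch of that last option, assuming base R’s png() device is available on your system: setting units to inches and giving a res value (pixels per inch) produces a sharper image than the defaults. The size and resolution below are arbitrary, and the output goes to a temporary file.

```r
# Temporary output path for illustration; use your own file name in practice.
out_file <- tempfile(fileext = ".png")

# 6 x 4 inches at 300 pixels per inch gives an 1800 x 1200 pixel image.
png(out_file, width = 6, height = 4, units = "in", res = 300)
plot(1:10)
dev.off()
```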

It’s not possible to play animations in the browser-based Tableau, so you can either move the slider or download the workbook and then use the free Tableau Reader to view it and hit the play button. If you click on a particular bubble you can track it over time.

“Ever since Hans Rosling presented a motion chart to tell his story of the wealth and health of nations, there has been an affinity for proportional bubbles on an x-y axis.”… SERIOUSLY?? You’ve got to be kidding. Excel has had bubble charts since v4.0 in 1992 (maybe earlier, but I’m positive about that version), and any consultant uses them as the daily breakfast. Good tutorial for R, though.

Nice graph, especially after the cleanup in Illustrator. I agree, that is often the best way to put the polish on the final product. One note of interest. Research on how people perceive proportional symbols on maps suggests that readers don’t estimate circle areas well (one explanation being that we tend to live in a linear world). As a result, good cartographers will adjust the area calculation to “trick” the eye so that the reader’s interpretation is closer to the truth. A discussion of this issue and an R solution is found at http://www.jstatsoft.org/v15/i05/paper . Are there graphing packages that offer this option for bubble charts?

Nathan, thanks for promoting bubble plots, they are one of my favorite tools. I used your crimeRatesByState2008.csv data to show how easy it is to generate static and dynamic bubble plots using JMP. You can check it out here http://bit.ly/gacVkY. Thanks.

Great tutorial.
Thanks for the insight into using Inkscape to open up pdfs and modifying the graphics.
Just a note to people using Inkscape: the open pdf function appears to have font issues (known bug). Following the above tutorial, many of the small circles came out as q’s. From Philippe Joyez at the bug forum at Inkscape, some pdf programs allow you to save as svg. In Ubuntu, one can open up the pdf in Evince, print it to svg, and then open up the svg in Inkscape without problems.

If you wanted to get REALLY close to the resulting figure without using an image editor then you could use the following R-code. One might improve the state name positioning with an offset vector (and following the advice in ?text about interactive positioning).

Much better is to plot on the device directly, not copy from the screen to the device. E.g.:
pdf("foo.pdf")
plot(1:10)
dev.off()
Aspect ratios on screen might be different to the device you copy to. With the above you get fine control on the PDF produced by setting arguments to the `pdf()` function.
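To illustrate the kind of control meant here, a small sketch that sets the page size through pdf()’s width and height arguments (in inches) and draws a bubble chart straight to the file. The data are made-up placeholders, the population column name is an assumption, and the output goes to a temporary file.

```r
# Placeholder values, NOT real crime data.
crime <- data.frame(
  murder = c(5.0, 2.0, 8.0),
  burglary = c(700, 400, 950),
  population = c(6e6, 1.5e6, 2e7)
)

# Temporary output path for illustration; use your own file name in practice.
out_file <- tempfile(fileext = ".pdf")

# Open the PDF device at 8 x 6 inches, draw, then close to flush the file.
pdf(out_file, width = 8, height = 6)
symbols(crime$murder, crime$burglary,
        circles = sqrt(crime$population / pi),
        inches = 0.35, fg = "white", bg = "red")
dev.off()
```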

Hi Nathan,
I came to your site today and have been exploring and doing some tutorials. I have a problem with this one, maybe a noob one, as I’m not an expert on Illustrator. I export the R bubble graph to PDF and it opens fine in Acrobat, but when I open it in Illustrator some bubbles are missing and replaced by a [X] image. What am I doing wrong? Thanks for answering. Btw, your site is great: lots of info and fun with data and graphics.

Love the logic and the visual of the chart. I do have one question: is it possible to add a z-axis to the graph, so that the bubbles are plotted (or floating) in a 3D space? For example, suppose a z-axis shows the average income per state. Perhaps more burglaries occur in states with high average income, and more murders occur in states with lower income [obviously I have no idea, I am just guessing here]. In a 3D chart this would show up. An alternative may be to color the bubbles differently according to their income, yet to me that makes more sense for a nominal variable. I am very curious which result would offer the best visual.