What Visualization Tool/Software Should You Use? – Getting Started

Are you looking to get into data visualization, but don’t quite know where to begin?

With all of the available tools to help you visualize data, it can be confusing where to start. The good news is, well, that there are a lot of (free) available tools out there to help you get started. It’s just a matter of deciding which one suits you best. This is a guide to help you figure that out.

But before we get into what you should use, a couple of questions.

What data are you looking at?

Hopefully you already have a dataset that you’re interested in. If not, go find one. It’s important to have actual data when you’re learning, because the visualization tool that you use will depend on it.

There are lots of places on the Web to find data. Here are a few worth checking out:

The above is a very small subset of what’s available. Oh, and let’s not forget all the government organizations that have departments dedicated to putting together datasets. Just pick one you’re interested in.

Got your data? Ok, good, on to the next step.

What’s the purpose of your visualization?

The next step is to figure out what you’re trying to do with your visualization. Are you working on a Web application that has some graphs? Is it an interactive tool? Do you want to use better-looking graphs in your slide presentation? Is the visualization for a publication? Do you just need it for analysis?

Again, what you decide here will affect what tool you should use.

What Visualization Software to Use

Now that you have the answers to those two questions in mind, we can make a decision on what will work best for you.

For Publication

This means graphics like what you see in the newspaper. Most people use Adobe Illustrator. It gives you control over all the elements in your graphic – color, stroke, font, orientation, etc.

If you want to do something more complicated than your traditional graphs, you can design it by hand in Illustrator, or you can do it in R, a software environment for statistical computing and graphics (either programmatically or with one of the add-on libraries). From R, you can save your graphic as a PDF and open it in Illustrator. That’s usually what I do.
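
The general idea of generating a vector file in code and then polishing it by hand is not specific to R. As a rough illustration only, here is a minimal Python sketch (standard library only) that writes a small bar chart as SVG, a vector format that Illustrator can open. This is not the R workflow described above: the function name bar_chart_svg, the sizes, the color, and the sample data are all made up for the example.

```python
from xml.sax.saxutils import escape


def bar_chart_svg(values, labels, width=400, height=200, pad=20):
    """Return a simple vertical bar chart as an SVG string (hypothetical helper)."""
    if not values or len(values) != len(labels):
        raise ValueError("values and labels must be non-empty and the same length")
    if min(values) < 0 or max(values) <= 0:
        raise ValueError("values must be non-negative with at least one positive")
    top = max(values)
    slot = (width - 2 * pad) / len(values)  # horizontal space reserved per bar
    plot_h = height - 2 * pad               # vertical space available for bars
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
    ]
    for i, (v, label) in enumerate(zip(values, labels)):
        bar_h = plot_h * v / top            # tallest bar fills the plot area
        x = pad + i * slot
        y = height - pad - bar_h            # SVG y grows downward
        parts.append(
            f'<rect x="{x:.1f}" y="{y:.1f}" width="{slot * 0.8:.1f}" '
            f'height="{bar_h:.1f}" fill="#4682b4"/>'
        )
        parts.append(
            f'<text x="{x + slot * 0.4:.1f}" y="{height - 5}" font-size="10" '
            f'text-anchor="middle">{escape(label)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Example use (writes chart.svg in the current folder):
#   with open("chart.svg", "w", encoding="utf-8") as f:
#       f.write(bar_chart_svg([3, 7, 5], ["A", "B", "C"]))
```

Each bar and label ends up as its own editable object in the vector file, which is what makes the hand-finishing step practical.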

Illustrator is kind of pricey however. Some have suggested using the open-source alternative Inkscape. I’ve never tried it though.

For Presentations

Many want to add some spice to their presentations. You can use the same software as the above, but there’s also not much harm in using Microsoft Excel despite the stigma. The key here is not to use the default settings. You can actually do a lot in Microsoft Excel and make it look good. Plus, you don’t need to include many details in a graphic made for presentation slides, because people can’t see them from far away.

Personally, I don’t use it much for graphics since I’m comfortable with R and Illustrator.

For Analysis

There are a lot of analysis tools, and the preferred one will change depending on who you ask. I use R, which requires some programming skills. Most people use Excel. I’ve also heard a lot of good things about Tableau Software.

For Web Applications

I’m going to assume you have a programming background if you’re looking to do visualization for a Web application. If you don’t know anything about computer code, you can try Many Eyes or Fusion Charts. You’ll be limited to their offerings though.

Now, if you’re developing for the Web, there are two main options here. The first is Processing, which was designed to make coding easier and to give you more bang for the buck. Check out the site and Processing forums for plenty of tutorials and tips. The end result is a Java applet.

The second, more popular option is Flash. You can either do stuff in the actual Flash program, or you can use ActionScript for a pure coding solution. Either way, the end result is something that runs in the Flash environment. The Flare visualization toolkit is a good place to start.

The upside of Flash is that it tends to load faster than Java, and more people have Flash than Java installed on their computer. You might also be able to get away with just a little bit of code if you use just Flash, although, if you really want to get serious with visualization, you’ll need to learn ActionScript.

To that end, Processing is a lot easier to learn coding-wise. Plus it’s free and open source.

For Art

Processing definitely seems to be the software of choice for artists and designers. Again, it goes back to how easy it is to learn and how much you can do with it. Illustrator is the most common choice for non-interactive graphics since it gives you drag-and-drop control over all the elements.

I’d say that if you’re looking for nice integration with web standards, the JavaScript InfoVis Toolkit is a good option. For larger datasets, Flare seems more suitable. Also, Flare uses Flash, which means that your visualization will work in 95% of browsers and machines (which is a good thing).

I heart Processing, especially offline, because Java applets take annoyingly long to start. processing.js is getting much better though.
At work, most of the 10k charts we publish each year are done in Excel, then imported into Illustrator. Some are directly generated from Fame. Our online tools are usually done with Flash or Flex.

Hi. For vector graphics I use Inkscape. It’s free, far more usable than GIMP (another open source graphics program) and suits my needs very well. I love Processing, but something that it cannot beat is Flash interactivity and its graphics rendering tree object model. It makes Flash so much better for GUIs and interactive datavis. It can also be free if you use the open-source Flex SDK for ActionScript programming. I wouldn’t call it beginner’s level though.

Great post Nathan. I have been to each of the web-based data sources you mention but had forgotten about 2 of the 3.

Many of the other tools for publication on the web are new to me. I’m pretty heavily into Tableau, Excel and SQL and have looked at R, but frankly it looks like the learning curve is too long to be worth the investment when tools like Tableau are available (if you have the funds) which do a very good job.

The R learning curve is very much worth it if you ever need to do anything beyond simple data manipulation that you are comfortable with in other programs. If you need to do any statistical analysis (which in my opinion would help a lot of visualizations be a lot more informative), you should really look into R.

The data I work with is rarely clean or structured in a way ready to be vized. I use Lyza from http://Lyzasoft.com to not only connect to many data sources and transform, but to figure out what data to look at with profiling and verify it is valid. I then viz it with Tableau so it is interactive. Both apps let me do most of the complex stuff with drag and drop, and create many iterations and approaches to the data quickly.

Interesting thread, Nathan. For me, usability for the analyst and understandability for the end user of the analysis are crucial, beyond all the bells and whistles of nice viz gadgets.

If the people that ordered the analysis don’t understand the viz the work is useless.
This is an issue that does not pop up too much here, because statisticians love their tools. But it is the one that ordered that analysis who counts.

I use OpenOffice for table analysis, because it does the same as Excel, and it has comparable graphing capacity, but costs nothing.
And a crosstab is often the most complicated tool a non-statistician can understand.

For anything more complicated than table analysis, spreadsheets are not a good idea; you have to go for real stats packages. SPSS is a good tool that does not demand the steep learning curve of R.

The only product that we’ve come across that can cover all three of the traditional, single visualization techniques – spatial, temporal, and link analysis – is GeoTime. The others only paint a picture. GeoTime combines all three to tell the full story.

A very cool SVG-based alternative is Protovis (http://protovis.org/), in alpha stage and being written by Mike Bostock, a Stanford CS grad student of Flare’s author Jeff Heer. I like that you can write very concise JavaScript expressions to make quite sophisticated mappings from data to marks on screen.

Thanks for this post. I’ve been wrestling with Mozilla’s HTML pages lately, and it’s just terrible. Java and HTML parsing don’t seem to go together at all. I like the earlier user’s suggestion of using Python for the data parse and then Processing for the actual viz step.

Not only is Processing pretty easy, but it has such a great user community. Even if I feel my questions are lame, I always get a useful answer.

Google Spreadsheets has a Motion Gadget widget that is very nice to show time-based data. I first learned about it from watching Hans Rosling’s TED talks. Check out what he’s done at http://www.gapminder.org as well.

I guess I need to check out R. I’ve dabbled with Processing, but as a Flash+PHP developer, I typically stick to what I know best when trying to do a complex task.

Probably not the most ideal method, but I’ve actually used PHP (+GD2) when drawing images derived from large data sets that reside in MySQL databases. It was an easy way to go from query -> image manipulation without having to create an API or anything.
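
For readers who want to see the query -> image idea in miniature, here is a hedged Python sketch using only the standard library. It is not the PHP + GD2 setup from the comment above: sqlite3 stands in for MySQL, a bare-bones grayscale PGM image stands in for GD output, and the function name query_to_pgm and the one-numeric-column query shape are assumptions made for the example.

```python
import sqlite3


def query_to_pgm(conn, sql, bar_width=10, height=50):
    """Run a query that returns one numeric column and draw the values
    as black bars on a white background, returned as binary PGM bytes.
    Hypothetical helper for illustration only."""
    values = [row[0] for row in conn.execute(sql)]
    if not values or max(values) <= 0:
        raise ValueError("query must return at least one positive number")
    top = max(values)
    width = bar_width * len(values)
    # Bar heights in pixels; the largest value fills the full image height.
    heights = [round(height * max(v, 0) / top) for v in values]
    pixels = bytearray()
    for y in range(height):                 # y = 0 is the top row of the image
        needed = height - y                 # a bar reaches this row if it is this tall
        for h in heights:
            shade = 0 if h >= needed else 255   # 0 = black bar, 255 = white
            pixels.extend([shade] * bar_width)
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(pixels)


# Example use with a throwaway in-memory table:
#   conn = sqlite3.connect(":memory:")
#   conn.execute("CREATE TABLE sales (amount INTEGER)")
#   conn.executemany("INSERT INTO sales VALUES (?)", [(1,), (2,), (4,)])
#   with open("sales.pgm", "wb") as f:
#       f.write(query_to_pgm(conn, "SELECT amount FROM sales"))
```

The appeal is the same as in the PHP version: the rows go straight from the query into pixels, with no intermediate export step or API in between.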

I use Mathematica for pretty much anything where more than a couple of curves are required. Downsides are cost and a very steep learning curve. Upsides are programmatic graph generation, quality, and a wide variety of plot types.

As a note, John Resig (the principal author of jQuery, among other things) ported the Processing framework to JavaScript, which makes it much more web-friendly. It’s compatible with Firefox, Safari, Chrome, Opera, and (with the Explorer Canvas script), Internet Explorer, and it works quite nicely. It’s available at http://processingjs.org/.

Great post, Nathan. We use Tableau for brainstorming ideas (with whatever db is needed). Then our designers use Illustrator to mock up the viz. Then our programmers take it all into Flash. We’ve built a number of our own ActionScript libraries for dataviz.

We use JuiceKit (http://juicekit.org), our open-source, lightly patched version of the Flare library with added features that make integration with Flex easier. In particular, we want to make it easy to build Flare visualizations that work with Flex data binding.