This tutorial will show you how to take your material and transform it into a tablet-friendly, rich experience for readers. We will use open-source software called sStory, created by EJ Fox.

This tutorial was inspired by my Australian journalist friends, who came to a workshop I taught hosted by The Walkley Foundation. (The Walkleys are also Australia’s most prestigious journalism awards, akin to what the Pulitzer Prizes are to American journalism). The participants in the workshop were unfailingly bright and willing, and so we skipped through several data visualization lessons early and spent the last half of the last day of the workshop trying to pull together our work into a data-driven, immersive story. We didn’t quite get done, so these step-by-step instructions are what you need to know if you want to try it out.

Tutorial

0. Ask yourself: is this the right story for this format?

As part of our process, we had a story meeting where everyone pitched story ideas. We quickly came to the conclusion that not every story really “fit” this format. In particular, we were looking for stories that had:

A data element — charts, graphs, maps (not all immersive stories would require this, but it was a data-visualization workshop)

An audio-visual element (video, audio)

A narrative approach

Compelling photos that would look good at larger sizes

Not all of the stories had all these elements. A story on taxation, for example, might simply not be visual enough for this kind of treatment.

The story covers the trial of a colorful criminal who ran much of Boston’s organized crime and then went underground and managed to elude capture for decades. Embedded animated GIFs, audio, and a strong narrative make this an excellent example of this form.

They are relatively easily acquired, with no need for a four-year CS degree;

They provide journalistically relevant and useful results

They are reusable in a journalistic context

In my opinion, there are three skills that meet these criteria:

Mapping. Most news happens in a place. Maps are ancient precisely because they are such an expressive and powerful form of data visualization.

Grabbing. Thousands of websites have data gateways called APIs (Application Programming Interfaces) that allow you free access to some or all of that site’s data — as long as you can write a relatively simple program that can grab it and return it to you in a format you can use.

Scraping. This is how you get the data out of all those sites that don’t offer APIs (read: “crappy government websites”).
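To make the difference between grabbing and scraping concrete, here is a minimal Python sketch. It is not taken from any of the tutorials mentioned here: the sample API response, the sample HTML page, the town names, and the function names are all invented for illustration, and nothing is fetched over the network. The idea is only to show that an API hands you records directly, while a scraper has to dig the same records out of page markup.

```python
import json
from html.parser import HTMLParser

# Hypothetical API response: structured data, ready to use (invented values).
API_RESPONSE = '[{"town": "Anytown", "permits": 12}, {"town": "Sampleville", "permits": 5}]'

# Hypothetical page from a site with no API: the same data buried in HTML.
HTML_PAGE = """
<table>
  <tr><th>town</th><th>permits</th></tr>
  <tr><td>Anytown</td><td>12</td></tr>
  <tr><td>Sampleville</td><td>5</td></tr>
</table>
"""


def grab(response_text):
    """Grabbing: the API already returns records, so we only decode them."""
    return json.loads(response_text)


class TableScraper(HTMLParser):
    """Scraping: collect the text of every table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # start a new row
        elif tag in ("td", "th"):
            self._in_cell = True
            self.rows[-1].append("")      # start a new cell in the current row

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def scrape(html_text):
    """Turn the first row into column names and the rest into records."""
    parser = TableScraper()
    parser.feed(html_text)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


grabbed = grab(API_RESPONSE)
scraped = scrape(HTML_PAGE)
print(grabbed)
print(scraped)
```

Notice that the scraped values all come back as text (the permit counts are strings, not numbers), and that the scraper only works for pages shaped like this one. That is a small taste of why scraping scripts tend to be less generalizable than API calls.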

I’ve done quite a bit of mapping, and some grabbing, but my experiences with scraping have been less successful, primarily because they proved to be less generalizable. I’d be able to pick my way (often slowly and with much frustration) through a scraping tutorial and get results. But at the end I did not feel that I could write a script on my own to scrape other things.

After taking a class on scraping at Journalism Interactive with Michelle Minkoff, I decided to buy Paul Bradshaw’s book “Scraping for Journalists,” and take another run at it. If you would like to read along, my notes as I pick through the book are after the jump. I would also like to thank Michelle and Paul for giving me the inspiration to restart this blog. I have been very busy with my new duties at INN, a network of 90+ investigative and community newsrooms, so I have not been devoting much time to adding to my own store of code-knowledge or developing tutorials to pass on what I’ve learned to others. But it’s something that I enjoy and believe gives back something of value to my peers in the field, so I welcome the chance to begin anew.

Not too long ago I was able to attend a demo of the data visualization toolkit Weave. The person giving the demo was Georges Grinstein, one of the tool’s creators. Georges hails from the University of Massachusetts at Lowell.

He showed a really amazing demo of foreclosure data from Lowell, MA. For those of you who aren’t from Massachusetts, Lowell was one of America’s first industrial cities; massive textile mills once dominated the town. The mill buildings are there — but the kind of jobs they once provided are long gone. I have a lot of affection for Lowell because my grandmother lived there and my mother was raised there; my dad graduated from the University of Massachusetts at Lowell at the age of 40 with a degree in computer science.

Give it a while to load. I also recommend loading it in either Firefox or Chrome.

Run your mouse over anything. Anything at all. Everything here is highly interactive; you can drill down on almost anything. That’s pretty exciting all by itself, but now take a look at the menus in the upper right.

It’s not just interactive; it’s generative. You can remix this, create your own visualizations, change what the dashboard looks like and what it displays, add your own data!

When I think about teaching beginners data visualization, one of the primary questions I ask myself is: What am I teaching students that they can’t do more easily and quickly in Excel or PowerPoint? What new vista am I helping them to see?

To me, one of the primary ways to depart from the “Excel box” is interactivity, but also generativity. That’s why this is exciting. 🙂

Last week I was lucky enough to meet with three folks who work in the newsroom of a daily newspaper. That’s a big deal to me, because if my work isn’t useful to people who work in a newsroom or a mission-driven nonprofit, or doesn’t work for folks who want to change the world (or even just their little piece of it), I’m wasting my time.

I asked them: “What should my next step-by-step tutorial be? What would be really useful for a beat reporter who doesn’t think of themselves as a techie to pick up?” Remember, it wasn’t all that long ago that shooting video was considered a specialty task that print reporters didn’t do — and now everyone just points their iPhone at it and calls it a day. (Okay, some do a great deal more than that! But you get my point).

The folks who were kind enough to spend some time with me gave me the following hints:

The Absurdly Illustrated Guide To Your First Tableau Public ChartsnGraphs: Tableau Public is a downloadable app that lets users transform datasets into classic data visualizations (bar charts, pie charts, scatterplots, time series, and more). The end results are embeddable in a web page the way that a YouTube video is, and a few are interactive.

The Absurdly Illustrated Guide To Your First Survey with Crosstabs — There are lots of survey tools out there, but only a few of them do “crosstabs” — that’s the ability to compare one survey answer against another one. For instance, a survey that asked folks what their favorite flavor ice cream was but also asked their gender and had crosstabs could tell you that 47% of women liked black raspberry ice cream, but only 12% of men. My job would be to pick the best and most web/mobile friendly tool out there and produce a tutorial on how to use it and serve it up on the web and mobile devices.

The Absurdly Illustrated Guide To Your First ArcGIS Online Map. ArcGIS is a “geographical information system,” or GIS. GIS predates web-based mapping systems like Google Maps by a couple of decades. GIS packages used to be very, very expensive software used by specialists, and to some extent they still are. But to get with the times, ArcGIS now has an online service too. I’ve never used it, but hey, before I wrote my TileMill tutorial I had never used that either! Writing tutorials is a spectacularly effective form of learning: if I understand it well enough to explain it to a total beginner, I probably understand it pretty well.
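Since “crosstabs” may be a new word for some readers, here is a small Python sketch of what the survey idea above boils down to: count each answer within each group, then turn the counts into percentages. The responses below are made up purely for illustration (they are not real survey results), and the `crosstab` function is my own hypothetical helper, not part of any survey tool.

```python
from collections import Counter

# Made-up survey responses (gender, favorite flavor), for illustration only.
responses = [
    ("woman", "black raspberry"),
    ("woman", "vanilla"),
    ("woman", "black raspberry"),
    ("woman", "chocolate"),
    ("man", "chocolate"),
    ("man", "vanilla"),
    ("man", "black raspberry"),
    ("man", "chocolate"),
]


def crosstab(rows):
    """Return {group: {answer: percent of that group giving that answer}}."""
    pair_counts = Counter(rows)                          # (group, answer) -> count
    group_totals = Counter(group for group, _ in rows)   # group -> respondents
    table = {}
    for (group, answer), count in pair_counts.items():
        percent = round(100 * count / group_totals[group], 1)
        table.setdefault(group, {})[answer] = percent
    return table


table = crosstab(responses)
for group, answers in table.items():
    print(group, answers)
```

The key design point is that each percentage is taken out of the group’s own total, not out of all respondents. That is what lets you compare one group’s answers against another’s.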

One thing I’ve been thinking about lately that really made an impression on me is how important it will be for me to focus on “zero install” tools. Many folks have computers and servers that their corporate IT department doesn’t allow them to install anything on for security reasons. So my TileMill tutorial, which requires you to download an app, won’t work for folks in that situation. But a tutorial on the data visualization platform ManyEyes would — you don’t have to download or install anything, you just work with the application from your web browser.

That’s super-useful information for me as I go forward and write more data visualization tutorials.

So, two things:

If you’re reading this and you guys are the folks who were kind enough to meet with me, you know who you are 🙂 I didn’t use your names here because I forgot to ask if I could, and not asking would be really rude! But if you’d like the credit for giving me so many smart ideas please let me know 🙂

What about more tutorials? If you’re reading this, wherever and whoever you are, and you have a burning desire to learn a specific data visualization or mapping tool or technique, please let me know. I’m a noob like you (if you’re a noob), so I can’t guarantee I’ll do every one, but I do want to know!

Recently I searched for the name of my current project, “Data for Radicals,” and through that magic we know as Serendipity on the Internet, up popped:

Ten Rules for Radicals

by none other than Carl Malamud. To be honest, before I read “Ten Rules for Radicals,” all I knew about Carl was that my friends who were investigative journalists — particularly those who did the deep data and document dives through FOIA and other means — talked about him in hushed tones of awe.

Reading the title, I could not help but think that the essay, originally an address to the WWW2010 conference, represented one of those strange messages that happen between people of ideas, even if those people are separated by centuries, or thousands of miles, or other barriers, and even if they have never met. Haven’t you ever had that feeling of picking up a book and feeling that the author is, in an uncanny and spooky way, speaking to you, directly to you? I felt that way about this essay. You can find it in full here, but interspersed within it are his “Ten Rules for Radicals,” which in the essay he illustrates with stories from his career. You should do yourself the favor of reading the whole thing, but for my own edification, I am reprinting the ten rules below.

Just as I do when I am learning new code, I did not copy and paste these. As I sit here, I am typing them word by word with my own ten fingers on my MacBook Air, sitting at my dining room table at 2:21 AM on Saturday, May 18. (What can I say? A dream woke me up and I couldn’t get back to sleep.)

Rule 1: Call everything an experiment.

Rule 2: When the starting gun goes off, run really fast. As a small player, the elephant can step on you, but you can outrun the elephant.

Rule 3: Eyeballs rule. If a million people use your service, and on the Internet you can do that, you’ve got a lot more credibility than if you’re just issuing position papers and flaming The Man.

Rule 4: When the time comes, be nice.

Rule 5: Keep asking until they say yes. Gordon Bell, the inventor of the VAX, once said that you should keep your vision, but modify your plan.

Rule 6: When you get the microphone, get to the point. Be clear about what you want.

Rule 7: Get standing. Have some skin in the game, some reason you’re at the table.

Rule 8: Get them to threaten you.

Rule 9: Look for overreaching, things that are just blatantly, obviously wrong or silly.

Rule 10: Don’t be afraid to fail. It took Thomas Edison 10,000 times before he got the lightbulb right, and when he was asked about those failures, he said, “I have not failed, I’ve just found 10,000 ways that won’t work.” Fail. Fail often. And don’t forget, you can question authority.

If I could put these on stone tablets, or better yet for our era, put them on plastic tablets extruded by a 3D printer, I’d do it. They’re a little long for a tattoo, though 🙂

Visual approaches to data are great — they can allow us to grasp complex issues at a glance, just the way this map from Clear Health Costs shows us the dramatic differences between what different hospitals charge for the same procedure.

As always, click on any image in this blog to see it full size. I leave helpful annotations in the illustrations to these tutorials — so if ya can’t see em, click ’em!

One of the nice things about this way of presenting data is that even though it’s simple, it can be quite revealing. Click here to go to the live data table and use the up/down arrows to sort by the center column, which displays how many full-time officers per 1,000 people served there are.

Seven of the top ten are colleges and universities. Tiny Lasell College has 7.8 officers per 1,000 students, though with only 1,800 students, its high ratio may have more to do with the minimum number of officers needed to staff three shifts so that officers are on duty 24 hours a day. But scroll down to the bottom: two of the departments with the lowest ratio are also colleges.

Public colleges.

Surprising? Maybe not, but even so, it was a fact I did not immediately pick out when I downloaded the original data from UCRStats.com. And the differences are dramatic; state colleges have among the lowest numbers of full-time officers per 1,000 people served of all of the departments listed. So one of the things that higher tuition cost buys? Bigger campus police forces.
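If you want to check the “officers per 1,000 people served” column yourself, the arithmetic is simple enough to sketch in a few lines of Python. Everything in this example is an assumption for illustration: the agency names and headcounts are invented, not taken from the UCRStats.com data. The one figure chosen on purpose is 14 officers for 1,800 people, because that is roughly what a rate of 7.8 per 1,000 implies for a campus of that size.

```python
def officers_per_1000(officers, population):
    """Full-time officers per 1,000 people served, rounded to one decimal."""
    return round(officers / population * 1000, 1)


# Invented agencies and headcounts, for illustration only.
agencies = [
    {"agency": "Example Town PD", "officers": 60, "population": 32000},
    {"agency": "Example Private College", "officers": 14, "population": 1800},
    {"agency": "Example State College", "officers": 10, "population": 12000},
]

for agency in agencies:
    agency["per_1000"] = officers_per_1000(agency["officers"], agency["population"])

# Sort from highest ratio to lowest, like clicking the column header in the table.
ranked = sorted(agencies, key=lambda a: a["per_1000"], reverse=True)
for agency in ranked:
    print(agency["agency"], agency["per_1000"])
```

Dividing by population before comparing is the whole point: a small campus with a modest force can top the list, while a much larger institution with a similar headcount lands near the bottom.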

Now I will teach you how to make your own sortable, searchable web data tables!

Tabletop.js is a Javascript library that lets you use Google spreadsheets as the data source for web apps. It’s pretty neat — especially since we know there are so many simple but useful web and mobile apps we can create where setting up a full-on database is overkill. What if you want to make a sortable, searchable list of craft breweries? Or a schedule for a music festival? Do you really have to bust out MySQL for that?

Well, with Tabletop.js, you don’t. The other great thing about using Tabletop.js is that a lot more people know how to use a Google spreadsheet than know how to enter records into a database. You can share responsibility for updating your web app with anyone who knows how to use Google Docs, or use Google Forms to let members of the public add new records to a list.
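To show why a spreadsheet is often enough, here is a rough Python sketch of the underlying idea: treat each spreadsheet row as a record, then search and sort the records in memory. To be clear, this is not Tabletop.js and does not use its API (Tabletop.js is a JavaScript library). The CSV text stands in for a spreadsheet export, the brewery names are invented, and the helper functions are hypothetical.

```python
import csv
import io

# Stand-in for a spreadsheet exported as CSV; the rows are invented.
SHEET_CSV = """name,town,style
Hypothetical Hops,Anytown,IPA
Sample Suds,Sampleville,Stout
Placeholder Pints,Anytown,Lager
"""


def load_rows(csv_text):
    """Read the sheet into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def search(rows, term):
    """Keep rows where any cell contains the search term (case-insensitive)."""
    term = term.lower()
    return [row for row in rows if any(term in value.lower() for value in row.values())]


def sort_by(rows, column):
    """Sort rows alphabetically by one column, ignoring case."""
    return sorted(rows, key=lambda row: row[column].lower())


rows = load_rows(SHEET_CSV)
print([row["name"] for row in search(rows, "anytown")])
print([row["name"] for row in sort_by(rows, "name")])
```

For a list of a few hundred breweries or festival acts, scanning every row like this is plenty fast, which is why a full database can be overkill.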

Chris Keller wired up Tabletop.js to a nifty sortable, searchable, and nicely styled web-based data table, which I use here to create a sortable, searchable table of law enforcement agencies in Middlesex County, Massachusetts, where I live. I decided to do that because I live in Watertown, Massachusetts. Until recently, very few people outside of eastern Massachusetts knew about Watertown — where it was, what it looked like, what happened there. That was before the Tsarnaev brothers led police on a car chase through my town, culminating in a gunfight and explosions.

So I thought I’d do my latest plunge into civic data by creating a data table showing all the law enforcement agencies in my county, how many full-time police officers per 1,000 residents each police department has, and the population of each city or town. (College police departments are also represented in this table. I thought about taking them out, until I remembered that Sean Collier, the 26-year-old police officer who died, was a campus police officer at the Massachusetts Institute of Technology (MIT).) You can see the table here: Sortable, searchable table of law enforcement agencies, Middlesex County, MA.

I’m not sure I ever heard Jay define the term “pressthink,” but I’ll try:

Pressthink: A set of shared, embedded and often unspoken notions that guide the actions of a group of people creating news media for public consumption.

In other words, the “pressthink” of a publication and the people who work at it helps those people decide what is good work or bad work, what’s worth doing and not doing, and whether coverage is fair or not. (Writing that makes me think that pressthink is inherently moral; perhaps it is the moral philosophy of a news organization.)