Presenting Data to the Public

There are lots of different ways to present your data to the public — from publishing raw datasets with stories, to creating beautiful visualizations and interactive web applications. We asked leading data journalists for tips on how to present data to the public.

To Visualize or Not to Visualize?

There are times when data can tell a story better than words or photos, and this is why terms like “news application” and “data visualization” have attained buzzword status in so many newsrooms of late. Also fuelling interest is the bumper crop of (often free) new tools and technologies designed to help even the most technically challenged journalist turn data into a piece of visual storytelling.

Tools like Google Fusion Tables, Many Eyes, Tableau, Dipity and others make it easier than ever to create maps, charts, graphs or even full-blown data applications that heretofore were the domain of specialists. But with the barrier to entry now barely a speed bump, the question facing journalists is now less whether you can turn your dataset into a visualization than whether you should. Bad data visualization is worse in many respects than none at all.

Using Motion Graphics

With a tight script, well-timed animations and clear explanations, motion graphics can serve to bring complex numbers or ideas to life, guiding your audience through the story. Hans Rosling’s video lectures are a good example of how data can come to life to tell a story on the screen. Whether or not you agree with their methodology, I also think the Economist’s Shoe-throwers’ index is a good example of using video to tell a numbers-based story. You wouldn’t, or shouldn’t, present this graphic as a static image; there’s far too much going on. But having built up to it step by step, you’re left with an understanding of how and why they got to this index. With motion graphics and animated shorts, reinforcing what your audience is hearing from a voice-over with explanatory visuals provides a very powerful and memorable way of telling a story.

Telling the World

Our workflow usually starts in Excel. It is such an easy way to quickly work out if there’s something interesting in the data. If we have a sense that there is something in it, then we go to the news desk. We’re really lucky, as we sit right next to the main news desk at the Guardian. Then we look at how we should visualize it or show it on the page. Then we write the post that goes with it. When I’m writing I usually have a cut-down version of the spreadsheet next to the text editor. Often I’ll do bits of analysis while I’m writing to pick out interesting things. Then I’ll publish the post and spend a bit of time Tweeting about it, writing to different people and making sure that it is linked to from all the right places.
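The first pass described above happens in Excel; a rough stand-in for that kind of quick check in Python might look like the sketch below. The dataset and column names here are invented for illustration, not real Guardian data.

```python
import csv
from io import StringIO

# Invented spending figures, standing in for a dataset a reporter receives
raw = """department,spend_millions
Health,104.2
Education,89.1
Defence,45.3
Transport,12.7
"""

rows = list(csv.DictReader(StringIO(raw)))
for row in rows:
    row["spend_millions"] = float(row["spend_millions"])

# The same quick checks you would do in a spreadsheet:
# is there something interesting here before going to the news desk?
total = sum(row["spend_millions"] for row in rows)
top = max(rows, key=lambda row: row["spend_millions"])

print(f"Total spend: {total:.1f}m")
print(f"Largest: {top['department']} ({top['spend_millions']}m)")
```

The sort/sum/max checks take seconds either way; the point is simply to see whether the numbers hold a story before pitching it.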

Half of the traffic from some of our posts will come from Twitter and Facebook. We’re pretty proud that the average amount of time spent on a Datablog article is 6 minutes, compared to an average of 1 minute for the rest of the Guardian website. Six minutes is a pretty good number, and time spent on the page is one of the key metrics when analyzing our traffic.

This also helps to convince our colleagues about the value of what we’re doing. That, and the big data-driven stories that we’ve worked on that everyone else in the newsroom knows: COINS, Wikileaks and the UK riots. For the COINS spending data, we had 5-6 specialist reporters at the Guardian working to give their views about the data when it was released by the UK Government. We also had another team of 5-6 when the data on UK government spending over £25k was released — including well known reporters like Polly Curtis. Wikileaks was also obviously very big, with lots of stories about Iraq and Afghanistan. The riots story was also pretty big, with over 550k hits in two days.

But it is not just about the short term hits: it is also about being a reliable source of useful information. We try to be the place where you can get good, meaningful information on topics that we cover.

Publishing the Data

We will often embed our data on our site in a visualization, and in a form that allows easy download of the dataset. Our readers can explore the data behind the stories by interacting with the visualization, or by using the data themselves in other ways. Why is this important? It increases the transparency of The Seattle Times. We are showing the readers the same data that we used to draw powerful conclusions. And who uses it? Our critics for sure, as well as those just interested in the story and all of its ramifications. By making the data available we can also enlist tips from these same critics and general readers on what we may have missed and what more we could explore — all valuable in the pursuit of journalism that matters.

Opening up your Data

Giving news consumers easy access to the data we use for our work is the right thing to do for several reasons. Readers can assure themselves that we aren’t torturing the data to reach unfair conclusions. Opening up our data is in the social science tradition of allowing researchers to replicate our work. Encouraging readers to study the data can generate tips that may lead to follow-up stories. Finally, engaged readers interested in your data are likely to return again and again.

Starting an Open Data Platform

At La Nación, publishing open data is an integral part of our data journalistic activities. In Argentina there is no Freedom of Information Act and no national data portal, so we feel strongly about providing our readers with access to the data that we use in our stories.

Hence we publish raw structured data through our integrated Junar platform as well as in Google Spreadsheets. We explicitly enable and encourage others to reuse our data, and we explain a bit about how to do this with documentation and video tutorials.
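As a sketch of the reuse this enables: a spreadsheet published to the web via Google Sheets can typically be fetched as CSV from an export URL of the form below. The sheet key here is a placeholder and the sample rows are hypothetical, not a real La Nación dataset.

```python
import csv
import urllib.request
from io import StringIO

# Placeholder key: substitute the key of a spreadsheet that has been
# published to the web ("File > Share > Publish to web")
SHEET_KEY = "YOUR_SHEET_KEY"
EXPORT_URL = f"https://docs.google.com/spreadsheets/d/{SHEET_KEY}/export?format=csv"

def parse_rows(csv_text):
    """Turn exported CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(StringIO(csv_text)))

def fetch_rows(url=EXPORT_URL):
    """Download a published sheet and parse it (requires network access)."""
    with urllib.request.urlopen(url) as resp:
        return parse_rows(resp.read().decode("utf-8"))

# Offline demonstration with invented sample text:
sample = "province,population\nBuenos Aires,15625084\nCordoba,3308876\n"
rows = parse_rows(sample)
print(rows[0]["province"], rows[0]["population"])
```

A reader who wants to re-analyze a published dataset only needs the sheet’s key and a few lines like these; no special tooling is required.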

Furthermore we’re presenting some of these datasets and visualizations in our NACION Data blog. We’re doing this in order to evangelise about data and data publishing tools in Argentina, and show others how we gathered our data, how we use it and how they can reuse it.

Since we opened the platform in February 2012, we’ve received suggestions and ideas for datasets, mostly from academics and researchers, as well as university students who are very thankful every time we reply with a solution or a specific dataset. People are also engaging with and commenting on our data on Tableau, and several times we have been the most commented and top viewed item on the service. In 2011 we had 7 of the top 100 most viewed visualizations.

Making Data Human

As the discussion around big data bounds into the broader consciousness, one important part has been conspicuously missing — the human element. While many of us think about data as disassociated, free-floating numbers, they are in fact measurements of tangible (and very often human) things. Data are tethered to the real lives of real people, and when we engage with the numbers, we must consider the real-world systems from which they came.

Take, for example, location data, which is being collected right now on hundreds of millions of phones and mobile devices. It’s easy to think of these data (numbers that represent latitude, longitude, and time) as ‘digital exhaust’, but they are in fact distilled moments from our personal narratives. While they may seem dry and clinical when read in a spreadsheet, when we allow people to put their own data on a map and replay it, they experience a kind of memory replay that is powerful and human.
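As a sketch of how such a replay might work: given a trace of (timestamp, latitude, longitude) rows, consecutive points can be turned into human-readable movements using the haversine great-circle formula. The coordinates below are invented for illustration and are not from any real trace.

```python
import math

# Hypothetical one-day location trace: (timestamp, latitude, longitude)
trace = [
    ("2012-03-01T08:00:00", 40.7580, -73.9855),  # morning
    ("2012-03-01T12:30:00", 40.7527, -73.9772),  # midday
    ("2012-03-01T18:45:00", 40.7061, -73.9969),  # evening
]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Replay the day as a sequence of movements rather than raw coordinates
for (t1, la1, lo1), (t2, la2, lo2) in zip(trace, trace[1:]):
    d = haversine_km(la1, lo1, la2, lo2)
    print(f"{t1[11:16]} -> {t2[11:16]}: moved {d:.1f} km")
```

Even this tiny transformation, from coordinate triples to “you moved about a kilometre at midday”, is the difference between digital exhaust and a personal narrative.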

At the moment, location data is used by a lot of ‘third parties’ — application developers, big brands, and advertisers. While the ‘second parties’ (telecoms & device managers) own and hold the data, the ‘first party’ in this equation — you — has neither access to nor control over this information. At the NYTimes R&D group, we have launched a prototype project called OpenPaths (openpaths.cc) to both allow the public to explore their own location data, and to experience the concept of data ownership. After all, people should have control of these numbers that are so closely connected to their own lives and experiences.

Journalists have a very important role in bringing this inherent humanity of data to light. By doing so, they have the power to change public understanding — both of data and of the systems from which the numbers emerged.

Open Data, Open Source, Open News

2012 may well be the year of open news. It’s at the heart of our editorial ideology and a key message in our current branding. Amidst all this, it’s clear that we need an open process for data-driven journalism. This process must not only be fuelled by open data, but also be enabled by open tools. By the end of the year, we hope to be able to accompany every visualization we publish with access to both the data behind it and the code that powers it.

Many of the tools used in visualization today are closed source. Others come with restrictive licences that prohibit the use of derivative data. The open source libraries that do exist often solve a single problem well but fail to offer a wider methodology. Altogether, this makes it difficult for people to build on each other’s work. It closes conversations rather than opening them up. To this end, we are developing a stack of open tools for interactive storytelling — the Miso Project (@themisoproject).

We are discussing this work with a number of other media organizations. It takes community engagement to realise the full potential of open source software. If we’re successful, it will introduce a fundamentally different dynamic with our readers. Contributions can move beyond commenting to forking our work, fixing bugs or reusing data in unexpected ways.

Add A Download Link

In the past few years, I’ve worked with a few gigabytes of data for projects or articles, from scans of typewritten tables from the 1960s to the 1.5 gigabytes of cables released by Wikileaks. It’s always been hard to convince editors to systematically publish source data in an open and accessible format. Bypassing the problem, I added “Download the Data” links within articles, pointing to the archives containing the files or the relevant Google Docs. The interest from potential re-users was in line with what we see in government-sponsored programs (i.e., very, very low). However, the few instances of reuse provided new insights or spurred conversations that are well worth the few extra minutes per project!

Know Your Scope

There’s a big difference between hacking for fun and engineering for scale and performance. Make sure you’ve partnered with people who have the appropriate skill set for your project. And don’t forget design: usability, user experience and presentation design can greatly affect the success of your project.