Data journalism - is it worth it?

Whether it is the desire to replicate the enormous sales successes of the MPs’ expenses and WikiLeaks revelations, or publishers wanting to expand into selling data services, it seems everyone wants to do something with data. The only question, writes Paul Bradshaw, is: where to start?

When Simon Rogers first asked to publish data on the Guardian website, someone asked: “Who on earth would want to look at a spreadsheet online?” It turned out that over 100,000 people would regularly hit the website to do just that. One person's audit, it seemed, was another's sticky content. And the past few years have seen data transformed from conversation killer to hot topic - in both newsroom and boardroom.

Tapping into development talent

For some publishers, the advantage of a data-driven approach to news production is that it allows them to tap into latent development talent within the readership. The Guardian and the New York Times are among an increasing number of media organisations to publish APIs - Application Programming Interfaces - that allow web developers to build new products with their content and - equally importantly - the data surrounding it. In return, the new services can carry advertising sold by the publisher, drive new traffic to the original site, or act as market research to demonstrate demand for a more developed proposition (as happened, for example, with the Guardian’s mobile app).

To stimulate this development, publishers run 'Hack Days' where developers are invited to spend a day or a weekend creating quick editorial 'hacks'. The investment is minimal when compared to the cost of doing everything in-house: a small amount of staff time, and a lot of pizza.

Hack day events have led to all sorts of outcomes, from personalised mobile editions and applications which alert people to events and route them to the location, to a tool which suggests recipes based on an image uploaded by the user. The Guardian say they benefit from “being able to reach new markets that we might not otherwise find. We grow our vertical ad network through high quality partners [taking part in hack days]. We're also able to offer our end users innovative, clever and useful interactive services provided by experts outside of our domain.”

The age of big data

As content providers, meanwhile, publishers are having to quickly get to grips with what has been described as the age of ‘Big Data’, accelerated by a government transparency agenda which has demonstrated both the appetite of users for data and the challenges facing publishers in satisfying it.

Trinity Mirror have been faster than most in adapting to the opportunities in data to drive stories and traffic. When Greater Manchester Police decided to use Twitter to 'tweet' every crime reported over 24 hours, the Manchester Evening News used free online tools to produce constantly updated visualisations throughout the day, receiving 12,000 page views on just one page. When they followed up with school application data, the central page received over 22,000 page views.

MEN's Paul Gallagher notes that their journalists are now far more likely to see the potential news value of data. “They understand that datasets which are too large to appear in print can be of interest to readers online if they are able to browse the data and select information relevant to them, or if the data is visualised in a way which provides a compelling narrative.”

The Financial Times’ Rob Minto agrees that "the value is the way we've interpreted the data", but adds that a dataset has lasting value beyond the stories it generates. “There may be big sets of data that passed through reporters’ inboxes that we need to put into repositories. Obviously with that comes the curation of data and making things up to date. But that’s a problem that’s nice to have - the first thing is making people aware of the data that is washing around the organisation that they might otherwise just bin. We’re getting much better at that.”

Engagement and driving the news agenda

The evidence suggests that data-driven journalism can prove incredibly sticky. Data Blog Editor Simon Rogers notes that users spend five times longer looking at articles related to data than the site average. At a time when publishers are seeking engagement over ‘window shoppers’, these figures stand out.

Databases also stand out when it comes to site metrics. The Texas Tribune, for example, finds that its three dozen databases collectively draw three times as many page views as the site's stories.

In some cases, data journalism itself can drive the news agenda. The Financial Times’ ‘Deficit Buster’ interactive - which allowed users to see the impacts of different parties' deficit-tackling policies - not only proved enormously popular with users, it also formed the basis of an argument between Peter Mandelson and Sky's Adam Boulton at a news conference.

Commercial opportunities

Aside from increased page views and engagement metrics, the commercial opportunities in online data remain largely unexplored. Reuters have been one of those breaking new ground with the Open Calais platform. This project adds 'semantic data' to content, allowing computers to understand key locations, people and organisations that are being referred to. The semantic information allows publishers to map stories or aggregate articles based on a common feature.
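To illustrate the idea of aggregating articles on a common feature, here is a minimal Python sketch. It assumes each story already carries a set of entity tags; the article titles, the tag format and the function name are all made up for illustration and do not reflect the actual output or API of Open Calais.

```python
from collections import defaultdict

# Hypothetical stories with entity tags already attached.
# Titles, places and the (type, name) tag format are invented for this sketch.
ARTICLES = [
    {"title": "Story A",
     "entities": {("location", "Town X"), ("organisation", "Org 1")}},
    {"title": "Story B",
     "entities": {("location", "Town X"), ("organisation", "Org 2")}},
    {"title": "Story C",
     "entities": {("organisation", "Org 1")}},
]


def group_by_entity(articles):
    """Map each (type, name) entity to the titles of the stories that mention it."""
    index = defaultdict(list)
    for article in articles:
        for entity in article["entities"]:
            index[entity].append(article["title"])
    return dict(index)


index = group_by_entity(ARTICLES)

# All stories that refer to the same location can now be listed or mapped together.
town_x_stories = index[("location", "Town X")]
print(town_x_stories)
```

Once stories are indexed this way, a map of stories by location or a page collecting everything about one organisation is a simple lookup rather than a manual editorial task.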

Notably, Reuters have made the technology available free to all publishers, including bloggers. This allows Reuters to gather a vast database of information on content across a wide range of news sites, both professional and non-profit. Among the many news organisations using the technology is financial news service Citywire, whose data journalist Mary Hamilton says it allows them to “mash up, syndicate and use both data and editorial content in new ways”.

What Reuters - and Citywire - recognise is that in a world of information overload, value increasingly lies outside the content itself. Speed, convenience, and 'meta data' - information about the content - are taking on an ever more important role.

In the B2B field, many publishers have spent the past few months working on plans to exploit the commercial opportunities in all three of these. Future’s Stevie Spring has noted the importance of “data-driven, must-have information on your desk on your PC [which] is obviously paid for - and we’re trying to stretch as close to that as we can.” And Haymarket’s Rupert Heseltine recently mentioned selling data as one of the possible solutions to problems in the same market.

They will no doubt be looking at existing successful models such as Reed Business Information’s ICIS Chemical Business, where a website and magazine are designed to act as a ‘funnel’ driving readers towards the more lucrative online data services, including weekly and daily price reports, price histories, and instant price alerts.

The service has value because there is no major price market where this information is available; RBI have built an operation where dedicated price reporters use solid methodologies, and the data has real financial value to the user. Meanwhile, the data also helps support the magazine, which uses the price reporting as the basis for its own comment and analysis.

This is a rare combination. When it comes to the consumer market, things are more difficult. One area that has seen particular expansion, for example, is using data to create personalised products. The New York Times, Washington Post and Gannett have together thrown $12m of funding behind one such product: Ongo, an app which aims to be the Netflix of news. Users of the service will be expected to pay $6.99 per month for a basic package, and extra if they want particular content.

But personalisation is notoriously problematic, especially for publishers of content that changes regularly. One founder of a previous personalised news service, Yaron Galai, warns against falling for the Netflix comparison, because, “The corpus of documents that can be recommended is pretty much swapped every 24 hours. The system has to get smart about over and over.”

Instead, the more intelligent - and cheaper - route may be to delegate personalisation to existing platforms such as Facebook, as the Independent did recently with its innovative decision to allow users to ‘Like’ individual columnists, football teams, politicians or countries and receive news relating to those in their Facebook feed. Sporting News, which experimented with a similar project, experienced a 500% spike in traffic as a result. Again, when it comes to demonstrating user engagement to advertisers, this is powerful material.

Back at Reuters, Mark Jones has been working on a project that focuses on analysing the activity of micro communities in the financial markets. "We use network metrics to see who's active, who's most active, who is most connected, and how they use the system - and then feeding that data back into our processes."
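As a rough illustration of what such network metrics might look like, the Python sketch below counts how many messages each user sends (who is most active) and how many distinct contacts each user has (who is most connected). The interaction log, the user names and the function names are invented for this example; nothing here describes the actual Reuters system.

```python
from collections import Counter

# Hypothetical interaction log of (sender, recipient) pairs. All names are made up.
MESSAGES = [
    ("ana", "ben"), ("ana", "cat"), ("ben", "ana"),
    ("cat", "ana"), ("ana", "ben"), ("dev", "ana"),
]


def activity(messages):
    """Count messages sent by each user: a simple measure of who is most active."""
    return Counter(sender for sender, _ in messages)


def connections(messages):
    """Count distinct contacts per user, ignoring direction: who is most connected."""
    contacts = {}
    for sender, recipient in messages:
        contacts.setdefault(sender, set()).add(recipient)
        contacts.setdefault(recipient, set()).add(sender)
    return {user: len(peers) for user, peers in contacts.items()}


sent = activity(MESSAGES)
degree = connections(MESSAGES)

most_active = sent.most_common(1)[0][0]
most_connected = max(degree, key=degree.get)
print(most_active, most_connected)
```

Figures like these could then be fed back into editorial or commercial decisions, for instance by identifying which members of a community are worth approaching first.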

Aside from personalisation of content, there are clear commercial possibilities for personalised advertising, especially where users are navigating data-driven interactive features. Imagine offering a user the choice to navigate through information on various cars while saying to an advertiser: “Your advert will only be shown to users who are interested in safety features, or who want a family car” or, “This rate card shows you different rates depending on the level of targeting you require.” The technology for these propositions already exists - the only thing lacking is the first publisher to seriously throw their weight behind it. Who would bet against them?

Paul Bradshaw is a freelance media trainer and consultant who publishes the Online Journalism Blog. He is Visiting Professor at City University and leader of the MA Online Journalism course at Birmingham City University.