Start With the Data, Finish With a Story

Written by:

To draw your readers in, you have to be able to hit them with a headline figure that makes them sit up and take notice. You should almost be able to read the story without having to know that it comes from a dataset. Make it exciting, and remember who your audience is as you go.

One example of this can be found in a project carried out by the Bureau of Investigative Journalism using the EU Commission’s Financial Transparency System. The story was constructed by approaching the dataset with specific queries in mind.

We looked through the data for key terms like ‘cocktail’, ‘golf’ and ‘away days’. This allowed us to determine what the Commission had spent on these items and raised plenty of questions and story lines to follow up.
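A key-term search of this kind can be sketched in a few lines of Python. This is only an illustration, not the code used on the project: the column names (`beneficiary`, `description`, `amount_eur`) and the three sample rows are invented, and a real export from the Financial Transparency System will be laid out differently.

```python
import csv
import io

# Hypothetical extract: the column names and rows below are invented for
# illustration and do not come from the Financial Transparency System.
SAMPLE_CSV = """beneficiary,description,amount_eur
Example Catering Ltd,Cocktail reception for delegates,1200.50
Example Print Co,Printing of annual report,800.00
Example Leisure SA,Golf day and away days for staff,3400.00
"""

KEY_TERMS = ["cocktail", "golf", "away day"]


def find_key_terms(rows, terms, field="description"):
    """Return (row, matched_terms) pairs for rows whose field mentions any term."""
    hits = []
    for row in rows:
        text = row[field].lower()  # case-insensitive substring match
        matched = [t for t in terms if t in text]
        if matched:
            hits.append((row, matched))
    return hits


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
hits = find_key_terms(rows, KEY_TERMS)
total = sum(float(row["amount_eur"]) for row, _ in hits)

for row, matched in hits:
    print(row["beneficiary"], matched, row["amount_eur"])
print(f"Total for matched rows: {total:.2f}")
```

Plain substring matching is deliberately crude. It will also catch unrelated words that happen to contain a term, so every hit still needs to be read and checked before it becomes a figure in a story.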

But key terms don’t always give you what you want. Sometimes you have to sit back and think about what you’re really asking for. During this project we also wanted to find out how much Commissioners spent on private jet travel, but as the dataset didn’t contain the phrase ‘private jet’, we had to get the name of their travel provider by other means. Once we knew the name of the Commission’s service provider, ‘Abelag’, we were able to query the data to find out how much was being spent on its services.

With this approach we had a clearly defined objective in querying the data: to find a figure that would provide a headline. The color followed.

Another approach is to start with a blacklist and look for exclusions. An easy way to pull storylines from data is to know what you shouldn’t find in there! A good example of how this can work is illustrated by the collaborative EU Structural Funds project between the Financial Times and the Bureau of Investigative Journalism.

We queried the data based on the Commission’s own rules about which kinds of companies and associations should be prohibited from receiving structural funds. One example was expenditure on tobacco and tobacco producers.

By querying the data with the names of tobacco companies, producers and growers, we found that British American Tobacco was receiving €1.5m for a factory in Germany.

As the funding was outside the rules of Commission expenditure, it was a quick way to find a story in the data.
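The blacklist approach can be sketched as a name match between a list of excluded organizations and the beneficiaries in the data. Everything here is an assumption made for illustration: the beneficiary names, amounts and field names are invented, and the helper functions `normalise` and `flag_excluded` are hypothetical, not part of the original project.

```python
# Hypothetical rows and names, invented for illustration only. They are not
# taken from the Structural Funds data.
PAYMENTS = [
    {"beneficiary": "Example Tobacco GmbH", "project": "Factory upgrade", "amount_eur": 100000.0},
    {"beneficiary": "Example Dairy Co-op", "project": "Cold storage", "amount_eur": 250000.0},
    {"beneficiary": "EXAMPLE  TOBACCO Growers Assoc.", "project": "Training", "amount_eur": 40000.0},
]

# Names that, under the funder's own rules, should not appear as recipients.
BLACKLIST = ["example tobacco", "sample cigarettes"]


def normalise(name):
    """Lowercase and collapse whitespace so spelling variants compare equal."""
    return " ".join(name.lower().split())


def flag_excluded(payments, blacklist):
    """Return payments whose beneficiary name contains a blacklisted name."""
    wanted = [normalise(b) for b in blacklist]
    return [
        p for p in payments
        if any(b in normalise(p["beneficiary"]) for b in wanted)
    ]


flagged = flag_excluded(PAYMENTS, BLACKLIST)
for p in flagged:
    print(p["beneficiary"], p["project"], p["amount_eur"])
```

Normalizing names before comparing them matters because the same organization is often recorded with different capitalization or spacing. A match is only a lead: subsidiaries and local-language company names will not show up unless they are on the list too.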

You never know what you might find in a dataset, so just have a look. You have to be quite bold, and this approach generally works best when trying to identify obvious characteristics that will show up through filtering (the biggest, the extremes, the most common, etc.).
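As a rough sketch of that kind of first look, sorting and counting are usually enough to surface the biggest, the extremes and the most common values. The rows and field names below are invented for illustration.

```python
from collections import Counter

# Hypothetical rows, invented for illustration only.
PAYMENTS = [
    {"recipient": "Recipient A", "category": "travel", "amount_eur": 5000.0},
    {"recipient": "Recipient B", "category": "catering", "amount_eur": 120.0},
    {"recipient": "Recipient C", "category": "travel", "amount_eur": 98000.0},
    {"recipient": "Recipient D", "category": "printing", "amount_eur": 750.0},
]

# The extremes: sort once by amount, then read off both ends.
by_amount = sorted(PAYMENTS, key=lambda p: p["amount_eur"], reverse=True)
largest = by_amount[0]
smallest = by_amount[-1]

# The most common: count how often each category appears.
category_counts = Counter(p["category"] for p in PAYMENTS)
most_common_category, count = category_counts.most_common(1)[0]

print("Largest payment:", largest["recipient"], largest["amount_eur"])
print("Smallest payment:", smallest["recipient"], smallest["amount_eur"])
print("Most common category:", most_common_category, count)
```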