Productizing the Data Reporting Process

At least that’s the way it looks from a manufacturing standpoint. When we see great reporting — the kind that wins Pulitzer Prizes and awards from the nonprofit Investigative Reporters & Editors — the product we’re delivered is just the tip of the iceberg compared to the process it takes to get the story done. And the revenue these stories generate for news organizations almost always pales in comparison to the hundreds of thousands of dollars it costs to get them done right.

In a world where news organizations are collecting data about page views and watch time in order to sell more ads and better understand audience preferences, there’s less tolerance for a minute of newsroom effort that doesn’t yield a content product that can be counted and monetized. The short-term view of hedge funds that are stripping and flipping small newspapers only exacerbates that impatience for newsroom production.

It’s easy for an investigative story that requires some real digging to take six months of work before any of that effort sees the light of day. And that’s in a best-case scenario, when a story idea pans out. Even reporters who are really cranking out enterprise pieces often spend two weeks just following dead-end leads, at a cost of perhaps $3,000 that has to be wrapped into the overall cost of any story that does eventually pan out.

For independent investigative journalism to survive anywhere below the national level, we have to find a way to lower the costs of doing this kind of reporting as well as increase the revenue that news organizations get from the product once it is published.

Increasingly at Reese News Lab, we see data as an important foundation for both of those imperatives.

The modern use of data in newsrooms is often traced back to Precision Journalism, the 1973 book by Phil Meyer, who later went on to become a Knight Chair in Journalism here at UNC. Two years before I was born, Meyer argued, and had demonstrated, that by analyzing data using statistics and social science methods, you could find hidden stories. That interest took off and became known as “computer-assisted reporting” in the early 1990s. And in the last 10 years, the ability to publish data on the web in interactive graphics and web apps, along with the increased availability of computing power and open-source statistical tools, has fueled another boost in “data journalism.” I trace this most recent spike to Adrian Holovaty’s 2006 blog post, “A fundamental way newspaper sites need to change.”

These two flavors of computational journalism, computer-assisted reporting and data journalism, have always been a little at odds, but they are drawing closer together. If you talk about “computer-assisted reporting,” you’re probably talking about wrestling data away from a government agency that doesn’t want to give it up, then analyzing it and using the analysis to explain a broken system or the bad actions of a powerful person. If you talk about “data journalism,” you’re probably talking about taking data you’ve been given and turning it into a useful visualization or tool that lets readers search and sort the data to customize their experience in some way.

That said, the two fields are merging and that’s probably good for the economics of quality reporting across the board.

Jay Hamilton’s incredibly detailed book Democracy’s Detectives makes the best argument I’ve seen for using computer automation to lower the costs of accountability and explanatory journalism. And in his visit to the Lab earlier this year, Spotify’s former VP of product Shiva Rajaraman introduced me to the idea of the “platform stack” — in which data serves as the foundation of a good user experience that generates revenue.

At the Reese News Lab, we’ve been melding these two uses of data — for accountability reporting and product creation — for some time. And this summer, with the addition of two data analysis fellows, we’re increasing the pace at which we’re trying to bring them together to create a sustainable future for journalism.

In the summer of 2014, students working in the Lab came up with the idea of a product called Legal Stats, which would use data from North Carolina’s Administrative Office of the Courts as the foundation for a tool that would help users decide their best economic option after receiving a traffic ticket. By all accounts, the prototype they pitched at American Underground that August was both desirable and viable. Over the next year, it went on to win at least $7,000 in campus-wide entrepreneurship competitions.
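The economic comparison at the heart of such a tool can be sketched as a simple expected-value calculation. Everything below is hypothetical: the dollar amounts, the dismissal probability, and the function names are illustrative placeholders, not Legal Stats’ actual model or figures from the Administrative Office of the Courts.

```python
# Hedged sketch: comparing the expected cost of paying a traffic ticket
# outright vs. hiring a lawyer to seek a dismissal. All numbers are
# invented placeholders, not real North Carolina data.

def expected_cost_pay(fine, court_costs, insurance_increase):
    """Total cost of simply paying the ticket, insurance hike included."""
    return fine + court_costs + insurance_increase

def expected_cost_fight(lawyer_fee, fine, court_costs,
                        insurance_increase, p_dismissal):
    """Expected cost of hiring a lawyer, weighted by the dismissal odds.

    If the charge is dismissed, the driver pays only the lawyer;
    otherwise, the lawyer fee plus everything they would have paid anyway.
    """
    cost_if_dismissed = lawyer_fee
    cost_if_convicted = lawyer_fee + fine + court_costs + insurance_increase
    return (p_dismissal * cost_if_dismissed
            + (1 - p_dismissal) * cost_if_convicted)

# Hypothetical inputs: a $100 fine, $190 in court costs, $900 in
# insurance increases over three years, a $250 lawyer fee, and a 60%
# dismissal rate estimated from historical court records.
pay = expected_cost_pay(100, 190, 900)
fight = expected_cost_fight(250, 100, 190, 900, 0.60)
print(f"Pay the ticket: ${pay:,.2f}")
print(f"Hire a lawyer:  ${fight:,.2f}")
```

The hard part, of course, is estimating that dismissal probability credibly from the court records, which is exactly where the statistical challenges come in.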

But one of the challenges that we ran into was the idea’s feasibility: the data was not always clean or clear; the legal process seemed filled with loopholes and rare but important conditions to consider; and we didn’t know how to pull off the analysis with sufficient statistical validity.

That’s why the advice of News & Observer database editor David Raynor was so vital to the next step of the plan. He has used the database to develop stories that had to be accurate and relevant to meet the paper’s publication standards, and he’s provided valuable insights as we’ve continued to develop the product this summer, led by UNC statistics majors Ishan Shah and Katherine Wang. Along the way to developing the statistics for the product, they’ve come across some interesting potential story ideas about the state’s justice system, as well as the ability to quickly localize their insights for any small community.

Similarly, Scott Smith, a statistics master’s degree student, is working with Iain Carmichael, a doctoral candidate in statistics, to analyze voting records and election results in North Carolina. Ultimately, we hope their work yields both a product that consumers find desirable enough to generate revenue and a tool that lowers the cost of explanatory and accountability reporting for journalists.

Soon we will post a job opening for a data scientist to join us in the Lab, helping us automate and productize the reporting process in an effort to reduce the waste that has historically been inherent in great investigative journalism with tremendous public benefit.

The kind of automation I expect we’ll be able to achieve will probably take the form of lead generation for investigative work, but it may also include automated, personalized content generation for more standard event-driven daily watchdog reporting.
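As a hedged illustration of what that lead generation might look like, the sketch below flags jurisdictions whose numbers deviate sharply from their peers — the kind of anomaly a reporter might want to dig into. The metric, the county names, and the threshold are all invented for the example; this is not the Lab’s actual tooling.

```python
# Hedged sketch of automated lead generation: flag counties whose metric
# (here, a hypothetical case-dismissal rate) is a statistical outlier,
# giving a reporter a starting point for digging. The data are invented
# placeholders, not real North Carolina figures.

import statistics

def flag_outliers(rates, threshold=1.5):
    """Return names whose rate sits more than `threshold` standard
    deviations from the mean -- candidate leads, not conclusions."""
    mean = statistics.mean(rates.values())
    stdev = statistics.stdev(rates.values())
    return sorted(
        name for name, rate in rates.items()
        if abs(rate - mean) > threshold * stdev
    )

# Hypothetical dismissal rates by county.
dismissal_rates = {
    "County A": 0.41, "County B": 0.43, "County C": 0.39,
    "County D": 0.44, "County E": 0.40, "County F": 0.92,  # unusually high
}
leads = flag_outliers(dismissal_rates)
print(leads)
```

A flagged county is only a question, not a story; the expensive human reporting still has to follow, but it starts two weeks closer to the answer.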

On the product development side, data — including better data about audiences — can help get the right information to the right people at the right time. But it also might yield entirely new services or experiences beyond the traditional news report.

If we can show that data plays an important role as both a cost-saving and a revenue-generating strategy, then we’ll be able to teach the method to local news organizations that desperately need a viable business model in order to serve their communities.