Automated Stories: Using Algorithms to Craft News Content

“Boom! Brown defeated Cornell by 6 points the last time they played (Feb 22, 2014)”

That there is a piece of StatSmack, an automatically generated snippet of text produced by Stat Sheet, designed for Brunonians like myself to trash talk on social media. Sure, no Pulitzer there, but it’s a creative application of text generation algorithms, used to create a new experience and a new opportunity to engage, driven directly by the data.

Automated Insights is the parent company of Stat Sheet, and, like its competitor Narrative Science, it uses algorithms to automatically analyze structured data and produce readable texts, reports, and dashboards. “Human insight at machine scale” reads Narrative Science’s website. New analytics services like Echobox are now coming online as well, producing readable and actionable pieces of editorial advice written in plain English, from nothing more than the stream of clicks and shares on your site.
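To make the idea concrete, here is a minimal sketch of template-based text generation from one structured record. It is only an illustration, not how Stat Sheet or Narrative Science actually work: the function name `trash_talk`, the field names, and the two scores are all hypothetical, with the scores chosen so the margin matches the 6 points in the snippet quoted above.

```python
from datetime import date


def trash_talk(game: dict) -> str:
    """Turn one structured game record into a short, shareable sentence."""
    # Derive the "insight" (margin of victory) from the raw scores.
    margin = game["winner_score"] - game["loser_score"]
    unit = "point" if margin == 1 else "points"
    # Format the date by hand so the day has no leading zero.
    played = game["date"]
    when = f"{played:%b} {played.day}, {played.year}"
    # Slot the derived values into a fixed sentence template.
    return (
        f"Boom! {game['winner']} defeated {game['loser']} by {margin} {unit} "
        f"the last time they played ({when})"
    )


# Hypothetical record: only the teams, date, and 6-point margin come from the quote.
game = {
    "winner": "Brown",
    "loser": "Cornell",
    "winner_score": 70,
    "loser_score": 64,
    "date": date(2014, 2, 22),
}

print(trash_talk(game))
```

Real systems presumably go well beyond a single template, for instance by choosing which facts are worth mentioning and varying the phrasing, but the basic step of mapping data fields into language is the same.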

Other automation efforts have involved using algorithms to provide context for a story, an activity that journalists often engage in when making sense of an ongoing event. A research paper from 2012 developed a technique that analyzes the statistics of a baseball game as it unfolds and suggests color commentary to liven things up during a slow spell. And my own recent research has looked at automatically annotating charts and maps to help explain the context of outliers or salient trends. All of these techniques can enrich a data story and provide additional entry points and avenues for engagement with the content.
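As a rough illustration of what flagging an outlier for annotation could look like, the sketch below marks values that sit far from the average of a series and writes a one-line note for each. This is a simple z-score heuristic swapped in for illustration, not the method from the research mentioned above; the function name `annotate_outliers`, the threshold of 2.0, and the daily figures are all assumptions.

```python
from statistics import mean, stdev


def annotate_outliers(series: dict, threshold: float = 2.0) -> list:
    """Return a plain-English note for each value far from the series average."""
    values = list(series.values())
    mu, sigma = mean(values), stdev(values)
    notes = []
    if sigma == 0:
        return notes  # every value is identical, so nothing stands out
    for label, value in series.items():
        # z-score: distance from the mean in units of standard deviation
        z = (value - mu) / sigma
        if abs(z) >= threshold:
            direction = "above" if z > 0 else "below"
            notes.append(
                f"{label}: {value} is {abs(z):.1f} standard deviations "
                f"{direction} the average of {mu:.1f}"
            )
    return notes


# Hypothetical daily counts with one obvious spike on Sunday.
daily_counts = {"Mon": 10, "Tue": 11, "Wed": 9, "Thu": 10, "Fri": 12, "Sat": 10, "Sun": 40}
notes = annotate_outliers(daily_counts)
for note in notes:
    print(note)
```

A note like this only says that a point is unusual. Explaining why it is unusual, which is what makes an annotation useful to a reader, takes additional context that a heuristic this simple does not supply.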

And just because it’s automated doesn’t mean it’s robotic sounding either. A paper published just last week by Christer Clerwall showed in evaluations that readers couldn’t tell the difference between a football game recap written by Automated Insights and one written by a human journalist. The algorithmically generated story garnered slightly higher scores on accuracy, trustworthiness, and objectivity ratings, but the journalist’s story was statistically “more pleasant to read.” Given the limited nature of the study (e.g., just one piece of content and just one algorithm), it’s hard to draw final conclusions, but it does seem that algorithmically generated text can do just about as well as people in some cases, like game recaps.

Nicholas Diakopoulos is a Tow Fellow working on the Tow Center’s Data Journalism Project at the Tow Center for Digital Journalism. The Data Journalism Project is a project made possible by generous funding from both The Tow Foundation and the John S. and James L. Knight Foundation. The Data Journalism Project includes a wide range of academic research, teaching, public engagement and development of best practices in the field of data and computational journalism. Follow Nicholas Diakopoulos @ndiakopoulos. To learn more about the Tow Center Fellowship Program, please contact the Tow Center’s Research Director Taylor Owen: taylor.owen@columbia.edu.