Trying to find useful things to do with emerging technologies in open education and data journalism

Notes on Narrative Science and Automated Insights

In October 2009, the New York Times Media Decoder blog picked up on a story that had been doing the rounds about a research project called Stats Monkey from the Intelligent Information Laboratory at Northwestern University. “The Robots Are Coming!”, it declared, with the immediate rejoinder, “Oh, They’re Here.” Using play-by-play baseball data, Stats Monkey produced human-readable reports of a baseball game, formulaic admittedly, but good enough, particularly when complemented by quotes from a post-match press conference report. Mechanical churnalism complementing data-driven analysis, cast into prose. (It’s worth noting that the Media Decoder post itself is little more than a restatement of what was presumably the Stats Monkey website blurb at the time.)

(What other companies/apps are out there for crowdsourcing sports analytics in this way, I wonder?)

Using GameChanger data and Narrative Science story generation tools, it was possible to automate the creation of match reports for small audiences. I don’t know if these stories used to be freely accessible, but today the match reports appear to take the form of paywalled “recap stories”.

You can also just search for the byline, as it appears, for example, in this report:

In passing, it’ll be interesting to see how automatically generated stories start to feed into the glitch aesthetic (h/t @danmcquillan for introducing me to this phrase and the related notion of the new aesthetic in his presentation at #opentech last week).

What this example and the GameChanger example both show is how the generation of timely text stories can be automated on top of regularly updated datasets. The use of natural language interpretive text to describe patterns observed in the underlying data presumably also has SEO benefits.

More recently, Automated Insights have started producing realtime content feeds to support sports commentators – Real-time Insights for MLB – as well as feeding consumers via the stat.us powered Twitter feeds.

(See also: yseop, a French company that generates automated reports from data. [Any more?])

What I thought was particularly interesting about the ProPublica example was how it suggests a possible widespread future use of “automatically generated insight” pulling out headline interpretations from open data sets, as touched on in this great introductory technical presentation by Narrative Science’s Larry Adams (which also happens to mention the possibility of Narrative Science offering platform services via an API…? It also mentions work with the NHS?):

At one point during that presentation, Larry Adams suggests that Narrative Science use a small set of narrative templates or story types (“the horserace” for example, or “top 10”) to frame the construction of their stories, as well as mentioning the sorts of features that they look for within a data set (trends and changes in trends, for example, or outliers). Another presentation, this time by Narrative Science’s Kris Hammond, also hints at some of the features they look for in data: “inflexion points, trends, correlations”.

So what sorts of techniques might we use ourselves to start generating the insights that we might be able to work up into simple narrative sentences, at least for starters?
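
By way of a first stab, here is a minimal sketch of the sort of thing I have in mind, using nothing more than the Python standard library. It is not a description of how Narrative Science or Automated Insights actually work. It simply looks for two of the features mentioned above, an overall trend (via a least-squares slope) and outliers (values more than a couple of standard deviations from the mean), and drops them into canned sentence templates. The function names (`linear_trend`, `find_outliers`, `describe`), the threshold of 2 standard deviations, the sentence wording and the sample data are all my own illustrative assumptions.

```python
from statistics import mean, pstdev


def linear_trend(values):
    """Least-squares slope of the values against their position (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to estimate a trend")
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def find_outliers(labels, values, threshold=2.0):
    """Return (label, value) pairs more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Every value is identical, so nothing can stand out.
        return []
    return [(lab, v) for lab, v in zip(labels, values)
            if abs(v - mu) / sigma > threshold]


def describe(name, labels, values, threshold=2.0):
    """Turn the detected features into simple templated sentences."""
    slope = linear_trend(values)
    if slope > 0:
        verb = "rose"
    elif slope < 0:
        verb = "fell"
    else:
        verb = "held steady"
    sentences = [f"{name} {verb} overall between {labels[0]} and {labels[-1]}."]

    mu = mean(values)
    for lab, v in find_outliers(labels, values, threshold):
        side = "high" if v > mu else "low"
        sentences.append(f"{lab} stands out with an unusually {side} value of {v}.")
    return sentences


# Made-up data, purely for illustration.
weeks = ["W1", "W2", "W3", "W4", "W5", "W6", "W7"]
hits = [10, 12, 11, 13, 40, 14, 15]

for sentence in describe("Weekly hits", weeks, hits):
    print(sentence)
```

This is crude, of course: a single spike can drag the fitted slope around, and a fixed threshold will not suit every dataset. But it hints at how a handful of simple tests over a regularly updated dataset might be used to pick a story type and fill in the blanks.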

Hi Tony and Fridolin,
Great post! In your article, you mention briefly Yseop (“a French company that generates automated reports from data”).

I work for Yseop, and I wanted to tell you that we do offer our users a package which enables them to manage their own business rules and text. With Yseop, you construct tailor-made applications on our technology, ensuring that you have complete control over the output and that the data remains on your servers. We provide you with a full development kit, as well as a series of frameworks, which will help you in building your own text-generating applications in no time.

If you want to see how it works, please go to our home page and request an evaluation package. One of the members of our team will get in contact with you and provide you with an introductory development package as well as a set of tutorials that will give you an idea of how easy our technology is to work with and how one goes about building an application, including adding business rules.

The central differences between Yseop and the other companies you cited in your article are that we are able to write text in multiple languages (i.e. analyze data and explain what it means in French, English, Spanish, German, etc.), that users are able to build and construct applications on their own (i.e. manage their own business rules and text), and that data remains on their servers.