Imagine finishing a novel, and realizing that it’s one of the best novels you’ve ever read. Then, someone tells you that the novel was written by a robot. Would you believe them?

Today, linguistics and artificial intelligence are in the earliest, pioneering stage of "bot" authorship. At least two of the most significant content producers on the Internet – Wikipedia and the Associated Press – already use robots to write online articles.

The Wiki Bot

The virtual droid that received the greatest press recently is a Wikipedia bot named Lsjbot. It is the creation of Sverker Johansson of Sweden, who wrote the code to scrape information from a number of trusted sources, for the purpose of piecing together short articles called “stubs” on topics related to animal taxonomy.

Lsjbot reportedly pumps out 10,000 articles per day and has so far written over 2.7 million articles, all of them human-readable and intelligible. Media reports such as Popular Science's claim this represents "8.5 percent of the articles on Wikipedia". However, as the Wikimedia blog explains, those articles make up a very large bulk of the Swedish-language Wikipedia; none of them appear in the far more popular and voluminous English Wikipedia.

Just about any U.S. town you search for on Wikipedia likely had its first-draft article created by Rambot, a bot that generated stubs from U.S. Census data. Even the tiny 800-person town where I grew up got its own Wikipedia page that way, back in 2002!

Other Wikipedia article-producing bots over the years have included:

Robbot – A bot that initially was used to resolve interlanguage links, and eventually to resolve disambiguation page links.

Asteroids – This bot scraped NASA data and wrote thousands of Wiki articles about asteroids.

Today there are just under a thousand bots prowling Wikipedia, constantly making edits to existing pages whenever errors or omissions are found. The most active is Cydebot, which to date has made over 4.5 million edits to Wikipedia pages.

Other Bot-Created Content

In July of this year, the Associated Press announced that it would begin producing automated, robot-written business articles. Forbes reportedly uses bots to post short stock-market pieces about companies that are performing well.

The most impressive use of bot technology for article creation is that of journalist-programmer Ken Schwencke of the Los Angeles Times, who wrote a program called Quakebot to automatically write articles about earthquakes only moments after they occur. The data for the articles comes directly from U.S. Geological Survey alerts. In a Slate interview, Schwencke reported that just this year, thanks to Quakebot, the LA Times became the first media outlet to report on a morning tremor, within three minutes of the event actually occurring.

The post consisted of only four short paragraphs, fabricated by interlacing the relevant data with a sentence template that Schwencke had written ahead of time.
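To make the template idea concrete, here is a minimal sketch of that kind of fill-in-the-blanks generation. The template wording and field names below are illustrative assumptions, not Quakebot's actual code or the USGS alert schema.

```python
# A minimal sketch of Quakebot-style template filling: interlace alert
# data with a pre-written sentence template. Field names are invented.
def write_quake_report(quake):
    """Render one earthquake alert as a short news sentence."""
    template = (
        "A magnitude {magnitude} earthquake struck {distance} miles from "
        "{place} on {day} at {time}, according to the U.S. Geological "
        "Survey. The quake occurred at a depth of {depth} miles."
    )
    return template.format(**quake)

# Example alert, shaped like the data a USGS feed might supply:
alert = {
    "magnitude": 4.7,
    "distance": 6,
    "place": "Westwood, California",
    "day": "Monday",
    "time": "6:25 a.m.",
    "depth": 5.0,
}
print(write_quake_report(alert))
```

Because every sentence is pre-written by a human, the bot's only job is substitution, which is why such reports can go out within minutes of an alert.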

Writing About Complex Stories

Of course, linguistics has been a part of Artificial Intelligence for a very long time. In the article “Artificial Intelligence”, published in the Handbook of Pragmatics, the authors wrote:

Generating an extended piece of discourse involves some careful amount of planning. This complex task has conveniently been divided into two subtasks: deciding what to say and deciding how to say it.

In other words, AI scientists attempting to get a machine to create discourse that appears authentic to humans not only need to piece together the right words to say; the "bot" also needs to understand how to say those things within the context of the subject matter. This is difficult enough for the human mind, where an appreciation of context is instilled in children from a very young age. For machines, it's a whole different ballgame.

Generating discourse is a multiply constrained process in which various knowledge sources should be taken into account: knowledge of the domain of discourse, the situational context and past discourse, as well as knowledge about the interlocutor or reader.

Understanding the subject matter, having a knowledge base of existing information and data, and, most importantly, actually understanding what the reader wants are all critical not only for assembling informational text, but also for creating more abstract writing like creative fiction.

Authors, even very young authors, learn to do this at an intuitive level. For programmers to create artificial intelligence that can do the same thing requires a level of algorithm generation (and self-teaching) still far more advanced than what the data-scraping bots of Wikipedia, the Associated Press and others are yet capable of. Yet, as those same authors point out, it isn't impossible.

First, new symbols and structures can be created dynamically during program execution. Second, structures can be recursively defined and can thus represent a potentially infinite number of actual structures. And third, programs are also symbolic structures and can thus be created or manipulated by other programs.

If you look at Ken Schwencke's use of Quakebot to generate quick, accurate articles about earthquakes, you'll see that the template approach is something of a shortcut: the program simply inserts the data where it needs to go, and the article "sounds" as if a human wrote it, but only because the template actually was written by a human ahead of time.

Programs That Write Like Humans

What the programmers at Narrative Science are doing is taking complex data — whether it’s the scoring pattern and player stats during the course of a professional football game, or the stock values and business data about companies — and using the data itself to formulate exactly what needs to be said and how to say it.

In 2011, for example, The New York Times published a snippet of a sports report generated by Narrative Science, which shows just what this technology is capable of.

WISCONSIN appears to be in the driver’s seat en route to a win, as it leads 51-10 after the third quarter. Wisconsin added to its lead when Russell Wilson found Jacob Pedersen for an eight-yard touchdown to make the score 44-3.

As you can see, Narrative Science creates an algorithm that uses both the context (sports), and the data (scores and player stats), to formulate a report that sounds exactly like what sports fans would expect to read from a human writing about sports.
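A hedged sketch of how a system might let the data choose both what to say and how to say it: pick a narrative "angle" from the score margin, then render it as a sports-page sentence. Narrative Science's actual system is proprietary and far more sophisticated; the angles, thresholds, and function names here are invented for illustration.

```python
# Sketch of data-driven phrasing: the score margin decides the angle
# ("what to say"), and a template renders it ("how to say it").
def recap(leader, leader_score, trailer_score, quarter):
    margin = leader_score - trailer_score
    # "What to say": choose a narrative angle from the data.
    if margin >= 21:
        angle = "appears to be in the driver's seat en route to a win"
    elif margin >= 7:
        angle = "holds a comfortable lead"
    else:
        angle = "clings to a slim advantage"
    # "How to say it": render the angle in a sportswriter's register.
    return (f"{leader} {angle}, as it leads "
            f"{leader_score}-{trailer_score} after the {quarter} quarter.")

print(recap("WISCONSIN", 51, 10, "third"))
```

The same data with a different margin would produce a differently worded report, which is the key difference from a single fixed template like Quakebot's.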

Where Bots Go From Here

Even this impressive use of data analysis and AI linguistics is very limited in scope and capability. Company founder Kris Hammond made an over-the-top claim that in 20 years the company’s own computer program might be able to win a Pulitzer Prize in journalism.

While the enthusiasm is commendable, the reality is that it’ll probably take well over twenty years to accomplish that feat.

Case in point: Just this year, researchers at the University of New South Wales in Australia created a computer program they called the "Moral Storytelling System". The goal was to have the system create a fable based on user preferences.

The Moral Storytelling System created a fable that was intended to portray the lesson of retribution, where a fairy is punished for stealing a knight’s sword. This is the story that the computer program came up with.

Once upon a time there lived a unicorn, a knight and a fairy. The unicorn loved the knight.
One summer’s morning the fairy stole the sword from the knight. As a result, the knight didn’t have the sword anymore. The knight felt distress that he didn’t have the sword anymore. The knight felt anger towards the fairy about stealing the sword because he didn’t have the sword anymore. The unicorn and the knight started to hate the fairy.
The next day the unicorn kidnapped the fairy. As a result, the fairy was not free. The fairy felt distress that she was not free.

Not exactly an award-winning story. This is what the folks at Narrative Science and others like it are up against. If they want that Pulitzer, they’ve got a long way to go.



Ryan Dube is MUO's Programming Editor. Ryan has a BSc degree in Electrical Engineering. He's worked 13 years in automation engineering, 5 years in IT, and now is an Apps Engineer. He's spoken at national conferences on Data Visualization and has been featured on national TV and radio.