Machine reading effort builds dossiers on people and organizations from translated news sources.

They look a bit like communally written Wikipedia pages. But these articles—concise profiles of people and organizations, complete with lists of connected organizations, people, and events—were in fact written by computers, in a new bid by the Pentagon to build machines that can follow global news events and provide intelligence analysts with useful summaries in close to real time.

The prototype system is part of a nonpublic site built for intelligence agencies by Raytheon BBN in Cambridge, Massachusetts, and scheduled for delivery to the government later this year. It gathers information from 40 news websites written in English, Chinese, and Arabic, and eventually it will cover hundreds of news sites in all major languages. Ultimately the system will be linked with an existing TV broadcast monitoring network.

On the new site, if you search for information on the Nigerian jihadist movement Boko Haram, you get this entirely computer-generated summary: “Founded by Mohammed Yusuf in 2002, Boko Haram is led by Ibrahim Abubakar Shekau. (Former leaders include Mohammed Yusuf.) It has headquarters in Maiduguri. It has been described as ‘a new radical fundamentalist sect,’ ‘the main anchor for mayhem in the state,’ ‘a fractured sect with no clear structure,’ and ‘the misguided extremist sect.’”

To be sure, Wikipedia’s Boko Haram entry is clearer. But the BBN system captures everything that appears on news sites—not just on topics people chose to write Wikipedia pages about—and constantly and automatically adds information, says Sean Colbath, a senior scientist at BBN Technologies who demonstrated the technology. “I could go and read 200 articles to learn about Bashar Al-Assad (the Syrian dictator). But I’d like to have a machine tell me about it,” says Colbath. (The system, by the way, picks up the fact that the brutal Al-Assad is also a licensed ophthalmologist.)

It starts by detecting an “entity”—a name or an organization, such as Boko Haram, accounting for a variety of spellings. Then it identifies other entities (events and people) that are connected to it, along with statements made by and about the subject. “It’s automatically extracting relationships between entities,” Colbath says. “Here the machine has learned, by being given examples, how to put these relationships together and fill in those slots for you.”
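As a rough illustration of the steps Colbath describes, the toy Python sketch below merges variant spellings into one entity and fills "founder" and "leader" slots from sentences. It is not BBN's method: the real system learns its extractors from examples, whereas the alias table, the regular expressions, and the names `ALIASES`, `PATTERNS`, `canonical`, and `build_profiles` here are hand-written assumptions made up for this example.

```python
import re
from collections import defaultdict

# Hypothetical alias table: variant spellings mapped to one canonical name.
ALIASES = {
    "boko haram": "Boko Haram",
    "boku haram": "Boko Haram",
}

# A run of capitalized words, used as a crude stand-in for entity detection.
NAME = r"[A-Z]\w+(?: [A-Z]\w+)*"

# Hand-written patterns standing in for the learned relation extractors.
PATTERNS = [
    ("founder", re.compile(rf"(?P<org>{NAME}) was founded by (?P<person>{NAME})")),
    ("leader", re.compile(rf"(?P<org>{NAME}) is led by (?P<person>{NAME})")),
]


def canonical(name):
    """Return the canonical spelling of an entity name, if one is known."""
    return ALIASES.get(name.lower(), name)


def build_profiles(sentences):
    """Fill relation slots for each entity mentioned in the sentences."""
    profiles = defaultdict(lambda: defaultdict(set))
    for sentence in sentences:
        for slot, pattern in PATTERNS:
            for match in pattern.finditer(sentence):
                entity = canonical(match.group("org"))
                profiles[entity][slot].add(match.group("person"))
    return profiles


sentences = [
    "Boko Haram was founded by Mohammed Yusuf in 2002.",
    "Boku Haram is led by Ibrahim Abubakar Shekau.",
]
profiles = build_profiles(sentences)
for slot, people in sorted(profiles["Boko Haram"].items()):
    print(slot, sorted(people))
```

Both sentences end up in a single Boko Haram profile even though the second uses a different spelling, which is the kind of merging the article attributes to the system.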

The Boko Haram page goes on to list associated organizations and statements by and about the group. Clicking on any of them takes you back to original news sources, many of them translations of articles originally published in Arabic by sites such as Al Sharq in Qatar and Al Balad in Lebanon.

The BBN project is the fruit of the Defense Advanced Research Projects Agency’s latest effort to build machines that read as humans do, a decades-old problem that has been the focus of increasing research in recent years. Under DARPA’s research program, prototypes have been built by SRI International and IBM as well as Raytheon BBN.

Bonnie Dorr, DARPA’s program manager for the project, says the technology incorporates recent improvements in machine reading, enabling it to do a better job of understanding when the same underlying event is described in multiple ways—such as “Joe is married to Sue” and “Sue is Joe’s spouse”—and to determine the sentiment implied in phrases like “really awesome.”
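To make the paraphrase problem concrete, here is a minimal Python sketch that maps both of Dorr's example sentences to the same underlying fact. It is only an illustration under stated assumptions: the two patterns and the function name `spouse_relation` are invented for this example, and a real machine-reading system would learn such equivalences rather than list them by hand.

```python
import re

# Two hypothetical surface patterns that express the same marriage relation.
PARAPHRASES = [
    re.compile(r"^(?P<a>\w+) is married to (?P<b>\w+)$"),
    re.compile(r"^(?P<a>\w+) is (?P<b>\w+)'s spouse$"),
]


def spouse_relation(sentence):
    """Return a canonical, order-independent spouse fact, or None."""
    # Normalize trailing periods and curly apostrophes before matching.
    text = sentence.strip().rstrip(".").replace("\u2019", "'")
    for pattern in PARAPHRASES:
        match = pattern.match(text)
        if match:
            # A frozenset ignores order, since marriage is symmetric.
            return ("spouse", frozenset((match.group("a"), match.group("b"))))
    return None


first = spouse_relation("Joe is married to Sue")
second = spouse_relation("Sue is Joe\u2019s spouse")
print(first == second)
```

Storing the pair as an unordered set is a design choice that lets the two differently worded sentences compare equal.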

Automatically summarizing text is notoriously tricky given the difficulty of detecting humor, sarcasm, obviously incorrect information, idioms, and variant spellings and syntax, not to mention the problems involved in interpreting and translating information sources in different languages.

Page views: This entry on the Muslim Brotherhood was composed by computers using information gathered from online news sources.

Accordingly, many of the system’s results come across as a bit wooden or off-key. The profile of Barack Obama, for example, correctly identifies him as the president of the United States, but then summarizes him this way: “Obama has been described as ‘Nobel Peace Prize winner,’ ‘the only reasonable guy in the room,’ ‘an anti-apartheid campus divestment activist,’ and ‘the most trusted politician in the CR-poll.’”

At another point it notes, “Obama is married to Michelle LaVaughn Robinson Obama; other family members include Henry Healy, Malia Obama, and Ann Dunham.” (Healy is a distant Obama cousin from Moneygall, Ireland. Obama’s younger daughter, Sasha, isn’t mentioned.)

The system lacks real-world knowledge that would help a human analyst recognize something as false, humorous, or plainly irrelevant. Indeed, some of the outputs can be a little comical. I looked up Abraham Lincoln and found that the statements attributed to him include a number of accurate ones (though nothing from his most famous speech, the Gettysburg Address). Then I stumbled across this quote, which seems to have been produced when the system got itself snagged in some published list of famous sayings and did its best to synthesize them. “Abraham Lincoln says that the point of honey one fishing of flies more than fish barrels of a bitter pill, as well as the case for humans,” the profile reports.

Humans aren’t going to be completely replaced anytime soon.

I’m MIT Technology Review’s senior writer, interested in a wide range of topics including climate change, energy, and information and communication technologies. Recent projects have included traveling to China to write about GMO crop development there, and to Germany to explore how it will try to ramp up renewable power while closing down nuclear plants. My 2008 feature on the Obama campaign’s social-networking operation was selected for The Best Technology Writing 2009.
