Software to allow security officials to better search and translate documents in foreign languages, especially Arabic, has been demonstrated at a technology show in Las Vegas, as Clark Boyd reports.

Hi-tech tools are helping to search for terror suspects

There is an old saying in computing - garbage in, garbage out. And never has the world been so awash in digital garbage.

This "needle in a haystack" problem is compounded even further for US intelligence officers on the hunt for, say, Osama Bin Laden.

For starters, American intelligence agencies are short on people who are competent in Arabic, or who even want to be.

"What we've seen is that the folks in DC would most like to work in English if they could," said Steven Cohen of Basis Technology, a US company specialising in text analyzing software.

"And there are some outposts where they are still attempting to work with Arabic as if it were English, or convert it to English. That's not effective.

"Our approach is to try to treat a language and deal with it and analyze it and search it in its own script," he said.

Rich language

Basis has developed a program designed to let a non-Arabic reader search Arabic texts. It is not a translation program; it is closer to a text-mining tool.

You put in your search query in English. The program then searches and finds where your query occurs in the Arabic text, and highlights it.

One of the challenges is that Arabic is a rich, complex language. While al-Qaeda has one spelling in Arabic, it is written in many different ways in English.

"The problem is the 'qua' or 'qui'," said Mr Cohen. "It could be written many different ways."

"One of the first challenges, and one that we're talking to a lot of different agencies about, is using a phonetic matching approach, where you can match all of the half a dozen different ways of writing al-Qaeda."

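
As a rough illustration of the phonetic matching idea Mr Cohen describes, the Python sketch below reduces several English spellings of the same Arabic name to one crude sound key. The function names, the normalisation rules and the sample spellings are all assumptions made for illustration. This is not Basis Technology's method, and it only compares Latin-script spellings rather than searching Arabic script.

```python
import re

def phonetic_key(name: str) -> str:
    """Reduce a romanised name to a crude consonant skeleton (illustrative rules only)."""
    s = name.lower()
    s = re.sub(r"[^a-z]", "", s)        # drop hyphens, spaces and apostrophes
    s = re.sub(r"^(al|el)", "", s)      # drop a leading definite article
    s = s.replace("q", "k")             # treat q and k as the same sound
    s = re.sub(r"[aeiouh]", "", s)      # ignore vowels and h, which vary most
    return re.sub(r"(.)\1+", r"\1", s)  # collapse doubled letters

def find_variants(query: str, candidates: list[str]) -> list[str]:
    """Return the candidates whose key matches the query's key."""
    target = phonetic_key(query)
    return [c for c in candidates if phonetic_key(c) == target]

# Hypothetical sample spellings, chosen for illustration.
spellings = ["al-Qaeda", "al-Qaida", "Al Qa'ida", "el-Qaidah", "al-Jazeera"]
matches = find_variants("al-Qaeda", spellings)
print(matches)
```

Under these rules the first four spellings share one key and al-Jazeera does not. A key this short would also collide with unrelated words, so a working system would need far more careful rules.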

US officials are desperate for these kinds of technologies. They want to be able to find documents that contain, for example, the words Osama Bin Laden and chemical weapons.

Those documents, be they scanned documents, webpages, or intercepted e-mails, could then be passed along to Arabic language experts for further scrutiny.
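
The kind of filtering described here, finding documents that mention both a name and a topic before passing them to a human expert, can be pictured with a small Python sketch. The document identifiers, sample texts and function names are hypothetical, and simple substring matching stands in for whatever the real products do.

```python
def mentions_all(text: str, phrases: list[str]) -> bool:
    """True if every phrase occurs somewhere in the text, ignoring case."""
    lowered = text.casefold()
    return all(p.casefold() in lowered for p in phrases)

def triage(documents: dict[str, str], phrases: list[str]) -> list[str]:
    """Return the ids of documents that mention all of the phrases."""
    return [doc_id for doc_id, text in documents.items()
            if mentions_all(text, phrases)]

# Made-up documents for illustration only.
documents = {
    "memo-1": "Report mentions Osama Bin Laden and possible chemical weapons.",
    "memo-2": "Shipping schedule for agricultural chemical supplies.",
    "memo-3": "Profile of Osama bin Laden's early life.",
}
flagged = triage(documents, ["osama bin laden", "chemical weapons"])
print(flagged)
```

Only the first memo mentions both phrases, so it is the only one that would be forwarded for closer reading.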

At least five companies at last week's Government Convention on Emerging Technologies exhibition in Las Vegas offered some kind of language component in their search software.

Basis Technology's John Machonis says companies are simply responding to the need.

"It is very important for them to find documents that might have a name, a place and a date, and then relate that to other documents that have similar names," he said.

"Then they start forming a kind of relationships between all these people. By doing that they can start forming ideas about the movements of these people, what's happening next, and that's what this show is all about - finding the kinds of technologies that do that."

Natural selection

Not all of the language technologies on display in Las Vegas rejected the idea that computers can adequately translate Arabic documents directly into English.

Language Weaver is a California-based company that is working with something called statistical natural language processing.

The tools could help in the search for Osama bin Laden

The idea is to train the software using existing human translations. In a sense, the more information the program is fed, the more human its translations become.

"The first advantage is that it's very natural sounding. The statistical approach gives the system the ability to judge how close it is to real natural language," said Language Weaver's Laurie Gerber.

"The second advantage is that because it learns automatically, you can develop new language pairs very quickly.

"The third advantage is, by the same automatic learning capability, we can customize the system to any subject area."

Language Weaver has just launched its Arabic-to-English version. Government officials could use such tools to keep abreast of developments in the Arab press, for example.
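
One way to picture how a statistical system might judge "how close it is to real natural language" is a tiny word-pair model that scores candidate sentences against text it has already seen. The Python sketch below is an assumption-laden toy, not Language Weaver's system: the training sentences, the function names and the add-one smoothing are all chosen purely for illustration.

```python
import math
from collections import Counter

def train_bigrams(sentences: list[str]) -> tuple[Counter, Counter]:
    """Count single words and adjacent word pairs in the training text."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.lower().split() + ["</s>"]
        unigrams.update(tokens[:-1])             # words that start a pair
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return unigrams, bigrams

def fluency(sentence: str, unigrams: Counter, bigrams: Counter) -> float:
    """Average log-probability per word pair, with add-one smoothing."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    vocab = len(unigrams) + 1                    # rough vocabulary size
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab))
    return total / (len(tokens) - 1)

# Made-up training text standing in for a large body of human translations.
corpus = ["the minister met the delegation", "the delegation left the city"]
uni, bi = train_bigrams(corpus)
natural = fluency("the minister met the delegation", uni, bi)
scrambled = fluency("delegation the met minister the", uni, bi)
print(natural > scrambled)
```

The sentence in familiar word order scores higher than the scrambled one, which is the sense in which a model trained on more text can prefer more natural-sounding output.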

The technology could also be used to aid field translation for US soldiers.

"One of the big things right now is being able to scan and translate documents in the field, sometimes in quite degraded condition, and they need to be able to figure out is this something of strategic importance," said Ms Gerber.

"And so even if it's quite rough, it still can give them a new degree of strategic intelligence in the field."

Clamour for software

Strategic intelligence is crucial, whether it comes from a Basis-style search engine or a Language Weaver-type translation.

Language Weaver treats text like people do

And for the government, the choice should not necessarily be one or the other.

Stephen Gale, of the Center for Terrorism, Counter-Terrorism and Homeland Security at the Foreign Policy Research Institute in Philadelphia, argues that many of the technologies on display in Las Vegas are useful, provided the government uses them appropriately.

"Many of the examples that you see here are very interesting and strong components of an entire system.

"It's the way in which we use it, and integrate it, the organization that backs it up, that makes it valuable in security."

US officials are not the only ones clamouring for language software, by the way.

Japanese officials have contacted Basis Technology to ask for a version of the program that would search for Arabic text from a query in Japanese characters.

Clark Boyd is technology correspondent for The World, a BBC World Service and WGBH-Boston co-production.