The Defense Advanced Research Projects Agency (DARPA) will later this month detail how it hopes to combine advanced technologies from artificial intelligence, computational linguistics, machine learning and natural-language processing to build an automated system that will let analysts and others better grasp meaning from large volumes of text documents.

From DARPA: "Automated, deep natural-language understanding technology may hold a solution for more efficiently processing text information. When processed at its most basic level without ingrained cultural filters, language offers the key to understanding connections in text that might not be readily apparent to humans. Sophisticated artificial intelligence of this nature has the potential to enable defense analysts to efficiently investigate orders of magnitude more documents so they can discover implicitly expressed, actionable information contained within them."

Technology developed within the Deep Exploration and Filtering of Text (DEFT) program is expected to provide the capability to identify and interpret both explicit and implicit information from highly ambiguous and vague narrative text, and integrate individual facts into large domain models for assessment, planning and prediction, DARPA stated.

"Overwhelmed by deadlines and the sheer volume of available foreign intelligence, analysts may miss crucial links, especially when meaning is deliberately concealed or otherwise obfuscated," said Bonnie Dorr, DARPA program manager for DEFT. "DEFT is attempting to create technology to make reliable inferences based on basic text. We want the ability to mitigate ambiguity in text by stripping away filters that can cloud meaning and by rejecting false information. To be successful, the technology needs to look beyond what is explicitly expressed in text to infer what is actually meant."

Dorr added that much of the basic research needed for DEFT has been accomplished, but now has to be scaled, applied and integrated through the development of new technology. DARPA will hold a proposers' day in Arlington, Va., on May 16 to detail DEFT.

DARPA has a number of other programs that are looking to make sense of large volumes of data, including:

The Anomaly Detection at Multiple Scales (ADAMS) program looks at the problem of anomaly detection and characterization in massive data sets. In this context, anomalies in data are intended to cue collection of additional, actionable information in a wide variety of real-world contexts. The initial ADAMS application domain is insider-threat detection, in which anomalous actions by an individual are detected against a background of routine network activity.
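The kind of detector ADAMS describes can be sketched, in miniature, as a statistical outlier test: flag the days on which a user's activity deviates sharply from that user's own baseline. The data, field meanings and threshold below are invented for illustration; real insider-threat detectors are far more sophisticated than a z-score.

```python
from statistics import mean, stdev

def zscore_anomalies(counts, threshold=2.5):
    """Return indices whose value deviates from the mean by more than
    `threshold` standard deviations. A single large outlier inflates the
    standard deviation, hence the modest default threshold."""
    mu = mean(counts)
    sigma = stdev(counts)
    if sigma == 0:
        return []
    return [i for i, c in enumerate(counts) if abs(c - mu) / sigma > threshold]

# Hypothetical daily file-access counts for one user; day 9 is a sudden spike.
daily_accesses = [12, 15, 11, 14, 13, 12, 16, 14, 13, 250]
print(zscore_anomalies(daily_accesses))  # [9]
```

The point of the toy is the framing ADAMS uses: an anomaly is not itself the answer, it is a cue to go collect more information about that user and that day.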

The Machine Reading program seeks to realize artificial intelligence applications by developing learning systems that process natural text and insert the resulting semantic representation into a knowledge base, rather than relying on the expensive and time-consuming current processes for knowledge representation, which require experts and associated knowledge engineers to handcraft information.
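The pipeline the program describes — text in, structured facts out — can be shown in miniature. The pattern matching below is a deliberately naive stand-in for the trained language-understanding systems Machine Reading targets, and the sentences are invented; the point is the shape of the output: machine-queryable triples instead of raw prose.

```python
import re

def extract_triples(sentence):
    """Naively parse 'Subject verb Object.' sentences into
    (subject, relation, object) triples. A real machine-reading system
    would use learned parsers; this only illustrates the output shape."""
    m = re.match(r"^(\w+)\s+(\w+)\s+(\w+)\.?$", sentence.strip())
    return [m.groups()] if m else []

knowledge_base = set()
for sentence in ["Alice manages Bob.", "Bob audits payroll."]:
    knowledge_base.update(extract_triples(sentence))

# Once facts live in a knowledge base, analysts query structure,
# not re-read text:
print([obj for subj, rel, obj in knowledge_base if subj == "Alice"])
```

Automating this extraction step is precisely what replaces the handcrafted knowledge engineering the program cites as expensive and slow.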

The Programming Computation on Encrypted Data (PROCEED) research effort seeks to overcome a major challenge for information security in cloud-computing environments by developing practical methods and associated modern programming languages for computation on data that remains encrypted the entire time it is in use. Because the data could be manipulated without first being decrypted, adversaries who intercept it would have a much harder time extracting anything useful.
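A small, well-known example of the underlying idea is that textbook (unpadded) RSA is multiplicatively homomorphic: the product of two ciphertexts decrypts to the product of the plaintexts. PROCEED targets fully homomorphic schemes far beyond this, and textbook RSA with toy parameters is not secure in practice, but the sketch shows computation happening on ciphertexts alone.

```python
# Toy parameters only -- far too small for real use.
p, q = 61, 53
n = p * q                        # modulus 3233
e, d = 17, 2753                  # valid public/private exponents for this n

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

c1, c2 = encrypt(7), encrypt(6)
product_cipher = (c1 * c2) % n   # multiply ciphertexts, never decrypting
print(decrypt(product_cipher))   # 42 == 7 * 6, computed while encrypted
```

A cloud server holding only `c1` and `c2` could compute `product_cipher` without ever seeing 7 or 6; making that style of computation general and practical is PROCEED's goal.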

The Video and Image Retrieval and Analysis Tool (VIRAT) program aims to develop a system to provide military imagery analysts with the capability to exploit the vast amount of overhead video content being collected. If successful, VIRAT will enable analysts to establish alerts for activities and events of interest as they occur. VIRAT also seeks to develop tools that would enable analysts to rapidly retrieve, with high precision and recall, video content from extremely large video libraries.

The XDATA program seeks to develop computational techniques and software tools for analyzing large volumes of semi-structured and unstructured data. Central challenges to be addressed include scalable algorithms for processing imperfect data in distributed data stores and effective human-computer interaction tools that are rapidly customizable to facilitate visual reasoning for diverse missions. The program envisions open source software toolkits for flexible software development that enable processing of large volumes of data for use in targeted defense applications.
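One of the XDATA challenges named above, processing imperfect semi-structured data at scale, reduces in miniature to tolerating malformed records instead of failing on them. The record format and field name below are invented for the example; real XDATA tooling would operate over distributed data stores.

```python
import json
from collections import Counter

def count_tags(lines):
    """Tally the 'tag' field across JSON-lines records, skipping
    malformed or incomplete records rather than crashing -- a toy nod
    to scalable processing of imperfect data."""
    counts = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue                     # imperfect data: skip, don't fail
        tag = record.get("tag")          # tolerate records missing the field
        if tag is not None:
            counts[tag] += 1
    return counts

raw = ['{"tag": "alert"}', 'not json at all', '{"tag": "alert"}', '{"other": 1}']
print(count_tags(raw))
```

Each record is handled independently, so the same logic parallelizes naturally across partitions of a distributed store.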