Is government ready for the semantic Web?

It's been slow going, but an interagency XML project could boost law enforcement, health care efforts

By John Moore

Mar 22, 2011

When IBM’s Watson recently trounced the two most successful "Jeopardy!" players of all time, the supercomputer was relying in part on an emerging field of computerized language processing known as semantic technology.

In addition to helping Watson work out answers to questions such as what fruit trees provide flavor to Sakura cheese, semantic technology can answer questions that might interest government agencies and other groups that have historically had trouble identifying patterns or probable sequences in oceans of data.

The idea is to help machines understand the context of a piece of information and how it relates to other bits of content. As such, it has the potential to improve search engines and enable computer systems to more readily exchange data in ways that could be useful to agencies involved in a wide range of pursuits, including homeland security and health care.

While semantic technology has mostly been an academic exercise in recent years, it is now finding a greater role in a practical-minded government project called the National Information Exchange Model (NIEM).

NIEM pursues intergovernment information exchange standards with the goal of helping agencies more readily circulate suspicious activity reports or issue Amber Alerts, for example. The goal is to create bridges, or exchanges, between otherwise isolated applications and data stores.

The building of those exchanges calls for a common understanding of the data changing hands. The richer detail of semantic descriptions makes for more precise matches when systems seek to consume data from other systems. Agreement on semantics also promotes reuse; common definitions let agencies recycle exchanges.

Semantics in government IT

Today, NIEM offers a degree of semantic support. But some observers believe the interoperability effort will take a deeper dive into semantic technology. They view NIEM as a vehicle that could potentially make semantics a mainstream component of government IT.

“Semantically, there is a huge opportunity with NIEM,” said Peter Doolan, vice president and chief technology officer at Oracle Public Sector, which is working on tools for NIEM. “NIEM is a forcing function for the broader adoption of the deeper semantic technology that we have talked about for some time.”

As more agencies adopt NIEM, the impetus for incorporating semantics will grow. NIEM launched in 2005 with the Justice and Homeland Security departments as the principal backers. Last year, the Health and Human Services Department joined Justice and DHS as co-partners. State and local governments, particularly in law enforcement, have taken to NIEM as well. And in a move that underscores that trend, the National Association of State Chief Information Officers last month joined the NIEM executive steering committee.

“NIEM adoption is going at a furious pace,” said Richard Mark Soley, chairman and CEO of the Object Management Group (OMG), which has been working with NIEM. “As it gets adoption, they are going to need a way to translate information that is currently in other formats. That is when you need semantic descriptions.”

NIEM’s leadership says the program is prepared for greater use of semantics. “The NIEM program stands ready to respond to the overall NIEM community regarding a broader adoption of semantic technologies,” said DHS officials who responded to questions via e-mail.

Support for semantics

NIEM is based on XML. The project grew out of the Global Justice XML Data Model (GJXDM), a guide for information exchange in the justice and public safety sectors. Although XML serves as a foundational technology for data interoperability, it is not necessarily viewed as semantic.

However, John Wandelt, principal research scientist at the Georgia Tech Research Institute (GTRI) and division chief of that organization’s Information Exchange and Architecture Division, said semantic capability has been part of NIEM since its inception. GTRI serves as the technical architect and lead developer for GJXDM and NIEM.

“From the very early days, the community has pushed for strong semantics,” he said. Wandelt pointed to XML schema, which describes the data to be shared in an exchange. “Some say schema doesn’t carry semantics,” he said. “But the way we do XML schema in NIEM, it does carry semantics.”

NIEM’s Naming and Design Rules help programmers layer an “incremental set of semantics on top of base XML,” Wandelt said. For example, a group of XML programmers tasked to build a data model of their family trees would depict relationships between parents, siblings, and grandparents. But those ties would be implied and based entirely on an individual programmer’s way of modeling.

NIEM’s design rules, on the other hand, provide a consistent set of instructions for describing connections among entities. Wandelt said those rules make relationships explicit, thereby boosting semantic understanding.
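
The difference between implied and explicit relationships can be sketched in a few lines of Python. This is only an illustration of the family-tree example: the element names (person, parentChildAssociation) and the helper function are hypothetical and are not taken from NIEM's actual Naming and Design Rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical ad hoc model: the parent-child tie is implied by nesting only.
IMPLIED = """
<family>
  <person name="Ann">
    <person name="Bob"/>
  </person>
</family>
"""

# Hypothetical explicit model: a named association element states the tie.
# These element names are illustrative, not actual NIEM components.
EXPLICIT = """
<family>
  <person id="p1" name="Ann"/>
  <person id="p2" name="Bob"/>
  <parentChildAssociation parent="p1" child="p2"/>
</family>
"""


def explicit_relationships(xml_text):
    """Return (parent, relation, child) tuples that the document states outright."""
    root = ET.fromstring(xml_text)
    # Map each person's id to a name so associations can be resolved.
    names = {p.get("id"): p.get("name") for p in root.iter("person")}
    return [
        (names[a.get("parent")], "parentOf", names[a.get("child")])
        for a in root.iter("parentChildAssociation")
    ]


print(explicit_relationships(EXPLICIT))
print(explicit_relationships(IMPLIED))
```

A generic reader can recover the relationship from the explicit document without knowing how a particular programmer chose to nest elements. The implied version yields nothing unless the reader already knows that nesting means parentage.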

NIEM also uses Resource Description Framework (RDF), an important underpinning of the Semantic Web, which has been slowly making its way into government IT.

RDF aims to describe data in a way that helps machines better understand relationships.

Wandelt said NIEM uses an RDF model to represent relationships among NIEM reusable data components. Examples of components, viewed as the basic building blocks of NIEM, include people, locations, organizations and vehicles. The RDF model generates NIEM-conforming XML reference schemas and subsets of those schemas, Wandelt said. It’s those subsets that developers use to specify a particular exchange.
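
RDF expresses data as subject-predicate-object statements, often called triples. The sketch below models a handful of components as plain Python tuples and collects the subset reachable from one starting component. It assumes made-up component and predicate names and does not reflect how NIEM's tooling actually generates schemas or subsets.

```python
# Each triple is (subject, predicate, object). All identifiers are invented
# for illustration and are not actual NIEM component names.
TRIPLES = [
    ("Person", "hasProperty", "PersonName"),
    ("Person", "hasProperty", "PersonBirthDate"),
    ("Vehicle", "hasProperty", "VehicleIdentification"),
    ("Vehicle", "hasProperty", "VehicleOwner"),
    ("VehicleOwner", "hasType", "Person"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object linked to a subject by the given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subset_for(component, triples=TRIPLES):
    """Collect a component plus everything reachable from it through any triple."""
    seen, stack = set(), [component]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        # Follow every outgoing relationship from this node.
        stack.extend(o for s, _, o in triples if s == node)
    return sorted(seen)


print(objects("Person", "hasProperty"))
print(subset_for("Vehicle"))
```

Because the vehicle's owner is typed as a person, asking for the vehicle subset also pulls in the person components. That is the kind of dependency an explicit relationship model makes available to a machine.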

Semantic boost needed?

Some NIEM watchers, however, would like to see a greater push into semantic technology. Soley said the approach NIEM uses to spell out an exchange -- a specification called an Information Exchange Package Documentation (IEPD) -- could use a semantic boost.

“IEPDs are just a package of information,” Soley said. “They use XML, which doesn’t give sufficient richness.”

Soley said OMG has been working to bring semantic representation to NIEM. His group has been meeting with the NIEM community to discuss the use of OMG specifications. Those include Model Driven Message Interoperability (MDMI), which helps move data between different message formats, and Semantics of Business Vocabulary and Rules (SBVR), which seeks to facilitate the sharing of information between groups that use the same concepts but have different ways of describing them.

Others see a future in which NIEM will make greater use of Semantic Web standards such as RDF and the OWL Web Ontology Language. RDF provides a standard model for data interchange. OWL, meanwhile, goes beyond XML and RDF, improving computers’ ability to interpret content. OWL is used to create ontologies, which organize information within a particular domain, such as emergency management.
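
One simple kind of interpretation an ontology enables is subclass reasoning: if a wildfire is a natural disaster and a natural disaster is an incident, a system can conclude that a wildfire is an incident without being told so directly. The Python sketch below shows only that one idea. The class names are hypothetical, and real OWL supports far richer constructs than a single subclass chain.

```python
# Hypothetical emergency-management hierarchy: each class maps to its parent.
# This stands in for subclass statements in an ontology; it is not OWL syntax.
SUBCLASS_OF = {
    "Wildfire": "NaturalDisaster",
    "Flood": "NaturalDisaster",
    "NaturalDisaster": "Incident",
}


def ancestors(cls, hierarchy=SUBCLASS_OF):
    """Walk up the hierarchy and return every inferred superclass, nearest first."""
    result = []
    while cls in hierarchy:
        cls = hierarchy[cls]
        result.append(cls)
    return result


def is_a(cls, target, hierarchy=SUBCLASS_OF):
    """True if cls is the target class or can be inferred to be a kind of it."""
    return cls == target or target in ancestors(cls, hierarchy)


print(ancestors("Wildfire"))
print(is_a("Flood", "Incident"))
```

A search for incidents could then return flood and wildfire reports even though neither record uses the word "incident."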

David Webber, information architect at Oracle, said OWL isn’t used much at the moment but noted that the next release of NIEM will start to use that Semantic Web standard.

Webber pointed to IEPDs, which he likened to instructions for assembling a Lego kit, as one area in which OWL may come into play. IEPDs are stored in the NIEM repository, where exchange builders can download them. OWL, along with RDF, could be used to describe an IEPD’s purpose to make it easier to identify candidates for reuse, Webber said.

Oracle’s Doolan said he views NIEM as a chance to “take the promise of semantic data technologies and capabilities and bring it beyond an academic paper.”

Is NIEM ready?

So is the NIEM community ready for full-blown semantic technology?

“While a few members within the NIEM community have expressed interest in evolving the model to provide semantic capabilities at the implementation level, larger communities of interest have not indicated an interest in accelerating adoption of semantic capabilities,” the DHS officials said.

NIEM has been very pragmatic when it comes to problem solving. While some members of the community may be interested in pursuing some of the new Semantic Web technologies, others will question the value proposition of actually doing that. “NIEM historically has been very business-case driven,” Wandelt said.

NIEM’s current use of layering additional semantics on top of plain XML provides a “big semantic bang for the buck” without the downside of assembling expertise in environments such as OWL, he said, adding, “not everyone is an ontologist.”

Michael Daconta, chief technology officer at Accelerated Information Management, said users will nudge NIEM further into semantic technology when they are ready to build more sophisticated applications.

Semantic technology, he said, can power inferencing systems capable of detecting anomalies in large amounts of data -- signs of fraud in Medicaid payments, for example. At some point, NIEM will support such applications, he suggested.

“Those higher requirements are all about fidelity and precision,” said Daconta, who helped launch NIEM in 2005 while he was a metadata program manager at DHS. “NIEM will have to support higher fidelity.”

Wandelt said NIEM is positioned to move more deeply into semantic technology -- if the demand and specific business cases materialize. With RDF underpinning NIEM, community developers can generate RDF and OWL as well as XML, he noted. When the timing is right, he added, the NIEM governance process can accommodate the new technology.

The NIEM program, meanwhile, believes an upcoming conference could yield feedback for developing a semantic technology plan. The second annual NIEM National Training Event is slated for Aug. 23-25 in Philadelphia.

“Until then, we remain cautiously optimistic and on a path of exploration on behalf of the NIEM community,” DHS officials said.


Reader Comments

Wed, Apr 6, 2011
Brand Niemann
http://semanticommunity.info

NIEM is about standardizing syntax for messages and the Semantic Web is about Web links that have meaning for combining data. See Unifying Universal Core, SUMO, OWL 2, and XML Standards to Build Intelligence Ontologies at http://semtech2011.semanticweb.com/sessionPop.cfm?confid=62&proposalid=3963

Thu, Mar 24, 2011
John Mayer
Chicago

Wed, Mar 23, 2011
Richard Ordowich
Princeton NJ

Imposing semantics onto an existing set of legacy data is the equivalent of trying to derive architecture from a building or systems technology environment. It ends up looking like a dish of spaghetti. Semantics must be adopted from the outset and imposed upon the data. Names and definitions must be semantically structured following precise repeatable rules. Reverse engineering semantics from data is a frustratingly manual task and is made that much more difficult because the semantics of the data are context specific. Meanings are also subjective. Tagging data is not semantics.
