
October 21, 2009

Back on August 1, Ralph Hodgson declared Data Independence Day, to celebrate the opening of OEGov, a website that collects and organizes ontologies and data sets about government. Along with recent developments in open data in the US government, this creates an opportunity to mash up government data in a way that has not been possible before.

We're celebrating next week at ISWC with a tutorial on building semantic web applications for government. The tutorial will show attendees how to use semantic web standards to create their own data mashup applications. A lot of the features of the semantic web come into play - distributed vocabularies (using SKOS, of course), linked open data, RSS, etc. The idea is that each attendee will walk away from the workshop with their own app that they created from data now available from the government.

Controlled vocabularies play a big role in this - bigger than you might have thought possible. After all, if two people use a common controlled vocabulary well, they can share data. But if they use it badly, well, then data quality issues dominate. Fortunately, there are some controlled vocabularies being used in the government in a pretty consistent way. They are published in convenient forms on OEGov, where they can be used as terminology hubs for mashing up information.
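To make the "terminology hub" idea concrete, here is a minimal sketch in Python of two data sets being joined on a shared vocabulary term. Everything in it is made up for illustration: the concept URIs, the agency names, the figures, and the `mash_up` helper are assumptions, not anything published on OEGov.

```python
from collections import defaultdict

# Hypothetical concept URIs standing in for terms from a shared controlled vocabulary.
EDUCATION = "http://example.org/vocab/education"
TRANSPORT = "http://example.org/vocab/transportation"

# Two made-up data sets from different publishers. Each record is tagged
# with a concept URI instead of a free-text topic.
budget_records = [
    {"agency": "Agency A", "topic": EDUCATION, "budget_musd": 120},
    {"agency": "Agency B", "topic": TRANSPORT, "budget_musd": 80},
]
program_records = [
    {"program": "School Grants", "topic": EDUCATION},
    {"program": "Bridge Repair", "topic": TRANSPORT},
    {"program": "Teacher Training", "topic": EDUCATION},
]


def mash_up(left, right, key="topic"):
    """Join two lists of records on a shared vocabulary term."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    merged = []
    for row in left:
        for match in index.get(row[key], []):
            merged.append({**row, **match})
    return merged


for row in mash_up(budget_records, program_records):
    print(row["agency"], "|", row["program"], "|", row["topic"])
```

The join only works because both publishers used exactly the same term for the same idea. If one of them had typed a free-text topic instead, the records would silently fail to match, which is the data quality problem described above.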

The workshop is part of the International Semantic Web Conference 2009, to be held near Washington, DC from 25-29 October (the workshop itself will be held on Oct 26 in the afternoon, and you can register without attending the whole conference!). The conference this year has a special focus on government data and applications, and should be a great event for anyone interested in openness of government data.

September 21, 2009

Since the W3C made SKOS (the Simple Knowledge Organization System) a full Recommendation, a large number of implementations in SKOS have shown up. This is exciting news - SKOS is a well-organized standard that brings the distribution aspects of RDF to thesaurus management. This means that you can use a URI to refer to a SKOS Concept across the web, which provides an unprecedented capability for linking vocabularies.

But now that we have these vocabularies, how can we view or edit them? One way is to use TopBraid Composer. Since Composer is a native RDF system, importing, viewing, editing and saving SKOS files is second nature. As an example, I have downloaded one of my favorite vocabularies, the AGROVOC vocabulary from the United Nations Food and Agriculture Organization. AGROVOC appears on the W3C SKOS Implementation Page as a SKOS (RDF) file [0]. The screenshot below shows this file displayed in the Free Edition of TopBraid Composer:

Since AGROVOC is a multi-lingual thesaurus, I can usefully set the language of Composer to something other than my native tongue; in this case, I have chosen French. In the upper left, we see the broader/narrower tree, in particular the part about Ruminants, with current focus on Dairy Cattle. In the center form, we see the details of this term: its preferred expression in several languages (this is part of the AGROVOC data), its situation in terms of broader/narrower terms, and even the related term, Milk ("Lait" in French). In the upper right, we see the SKOS relationship hierarchy; we are currently focusing on skos:broader for our view. At the bottom, we see a SPARQL query, rather fancifully determining the connection between Cattle and Foxes. Notice that, like many professional thesauri, AGROVOC uses numeric codes for its terms; creating such a SPARQL query could be quite hard work if you had to cross-reference all these numbers. But in Composer, you can use the display name to help you out. This query was written by copying and pasting terms from the bookmarks window ("Basket") into the SPARQL tab. We see the terms printed with readable names (in French) in the Basket; they show up in the SPARQL editor as URIs, processable by the SPARQL engine.
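The split between numeric term codes and readable display names can be sketched in a few lines of Python. This is not how Composer works internally, and the URIs, codes and most of the labels below are placeholders I made up rather than real AGROVOC data; only the Milk/"Lait" pair comes from the example above.

```python
# Hypothetical concept URIs with opaque numeric codes, each carrying
# preferred labels keyed by language tag (in the spirit of skos:prefLabel).
LABELS = {
    "http://example.org/thesaurus/c_1001": {"en": "Dairy cattle", "fr": "Bovin laitier"},
    "http://example.org/thesaurus/c_1002": {"en": "Milk", "fr": "Lait"},
}


def pref_label(uri, lang, fallback="en"):
    """Return the label in the requested language, then the fallback, then the URI."""
    labels = LABELS.get(uri, {})
    return labels.get(lang) or labels.get(fallback) or uri


def find_concept(label, lang):
    """Look up a concept URI from a display name, ignoring case."""
    wanted = label.casefold()
    for uri, labels in LABELS.items():
        if labels.get(lang, "").casefold() == wanted:
            return uri
    return None


milk = find_concept("lait", "fr")
print(milk)                    # the URI that a query engine would process
print(pref_label(milk, "fr"))  # the name a person would read
```

A person picks terms by name in whatever language suits them, and the tool hands the query engine the URI behind that name, so nobody has to cross-reference the codes by hand.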

In the Maestro edition of TopBraid Composer, you can even see the relationships graphically; below you see the results of that SPARQL query displayed as a graph, showing all the steps from Cattle to Foxes (now in English) in the AGROVOC vocabulary.
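For readers who want a feel for what "all the steps from Cattle to Foxes" involves, here is a rough Python sketch that finds a shortest chain of broader/narrower links between two concepts. It is a plain breadth-first search, not the SPARQL query from the screenshot, and the little hierarchy in it is invented for illustration rather than copied from AGROVOC.

```python
from collections import deque

# Made-up (narrower, broader) pairs, standing in for skos:broader statements.
BROADER = [
    ("Cattle", "Ruminants"),
    ("Goats", "Ruminants"),
    ("Ruminants", "Mammals"),
    ("Foxes", "Canidae"),
    ("Canidae", "Mammals"),
]


def build_neighbors(broader_pairs):
    """Treat each broader link as walkable in both directions."""
    neighbors = {}
    for narrower, broader in broader_pairs:
        neighbors.setdefault(narrower, set()).add(broader)
        neighbors.setdefault(broader, set()).add(narrower)
    return neighbors


def find_path(start, goal, broader_pairs):
    """Breadth-first search for a shortest chain of concepts from start to goal."""
    neighbors = build_neighbors(broader_pairs)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # the two concepts are not connected


print(find_path("Cattle", "Foxes", BROADER))
```

In this toy hierarchy the chain climbs from Cattle to a shared ancestor and then descends to Foxes, which is the kind of route a graph view lets you see at a glance.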

We are finding SKOS to be an invaluable asset in vocabulary management applications. It covers the basics that are expected of any vocabulary representation (including multilingualism) with a very simple meta-model. The meta-model itself makes modest use of OWL (transitive, symmetric, and inverse properties, plus one functional property), but there is no need for someone who is editing or viewing a vocabulary to have any familiarity with OWL at all. The ability to distribute vocabularies over the web, and to connect them together (using the SKOS matching vocabulary) addresses a wide variety of real-world vocabulary management needs, which are not met by any other standard. I'll be giving a tutorial at KMWorld on the use of SKOS in vocabulary management on November 16.
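To give a sense of how modest that use of OWL is, the following Python sketch applies three of those property characteristics by hand: narrower as the inverse of broader, related as symmetric, and broaderTransitive as the transitive relation implied by broader. It is a toy, not a reasoner. It leaves out the functional property and the rest of SKOS, and the three starting statements are assumptions loosely modelled on the Dairy Cattle example.

```python
def infer(triples):
    """Repeatedly apply inverse, symmetric and transitive rules until nothing new appears."""
    facts = set(triples)
    changed = True
    while changed:
        new = set()
        for s, p, o in facts:
            if p == "broader":
                new.add((o, "narrower", s))           # inverse property
                new.add((s, "broaderTransitive", o))  # broader implies broaderTransitive
            elif p == "narrower":
                new.add((o, "broader", s))            # inverse property
            elif p == "related":
                new.add((o, "related", s))            # symmetric property
        # Transitive property: chain broaderTransitive links together.
        links = [(s, o) for s, p, o in facts if p == "broaderTransitive"]
        for a, b in links:
            for c, d in links:
                if b == c:
                    new.add((a, "broaderTransitive", d))
        changed = not new <= facts
        facts |= new
    return facts


# Assumed starting statements, for illustration only.
asserted = [
    ("Dairy Cattle", "broader", "Cattle"),
    ("Cattle", "broader", "Ruminants"),
    ("Dairy Cattle", "related", "Milk"),
]

for fact in sorted(infer(asserted) - set(asserted)):
    print(fact)
```

Note that the sketch never concludes that Dairy Cattle has Ruminants as a direct broader term; only the separate broaderTransitive relation is chained. A vocabulary editor sees just the broader, narrower and related links and never has to think about any of these rules.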

[0] Last time I checked, the link on the W3C page to AGROVOC was broken. I downloaded the example file a few weeks ago, and still have it. I don't know if the link is temporarily broken, or if the file has moved, or if there is another reason why the link is currently not available.