The blog of @ldodds

Monthly Archives: November 2005

There’s a short article in Nature (subscribers only, I’m afraid) this week about Google Base and its potential impact on the science community, in particular whether it might galvanise greater data sharing between scientists.
I’ve been corresponding with Declan Butler, the author of the piece, on this and some related topics recently, and he ended up quoting me:

I’ve been playing with Google Maps a bit for a talk I’m giving on Monday. It’s addictive stuff.
But I’d like to be able to use alternate kinds of maps: plotting Samuel Pepys’s diaries on a historical map of London, or points on a sonar map of the sea, or views of MMORPG maps.
It’s led me to wonder: has anyone written a general-purpose front-end similar to Google Maps? Basically all of the Ajax/DHTML magic, but without the actual Google-supplied tiles? It’d be interesting to have that and then be able to plug in different server-supplied tiles, perhaps on the fly (e.g. radar vs satellite images).
The whole client-side framework would be portable I think. It would be an interesting way to explore any data set that can be visualised in two dimensions.
Anyone already done or doing this? Drop me a mail if you are.

I’ve been enjoying a bit of SPARQLing recently and you can now begin to see some of the results:
XML.com has published the first part of my SPARQL tutorial. The tutorial is backed with a SPARQL query service that I whipped up using Jena. As the documentation explains there are several output options supported by the service including JSON and a quick Javascript hack, so incorporating SPARQL results into your applications (or blog!) has never been easier. The AJAX client is coming along nicely too, although I need to test in Opera and write up some proper examples.
The SPARQL service is part of a wider project to expose a number of markup and Semantic Web tools to the web for people to play with. Implemented, but not yet documented, are some basic RDF graph algebra and a rules engine; basically putting bits of Jena functionality on the web.
I’ve also started implementing an RDF data storage system. Again this involves wrapping up Jena functionality as a web service. Eventually you’ll be able to sign up, create some triple stores, and interact with them using a RESTful API, plus SPARQL queries. In addition I’ve constructed a little language, kind of a Scutterplan++, that describes not only the sources of your store, but some processing to do on it after the data is collected. For example, smush it then apply some inferencing. More on that when things are a bit less vapoury.
There’s the obligatory project blog so you can follow along there if you’re interested.

The core SPARQL specification provides some hooks for extension in the form of Extensible Value Testing. This allows an application to provide custom functions for testing variables in a SPARQL query, where the built-in tests don’t cover a particular need.

The specification notes that SPARQL queries using extension functions are likely to have limited interoperability, so you should use them with some care. The process of implementing a custom function and registering it with your query engine will be specific to each API.

Caveats aside, I think it’s likely that once SPARQL sees further adoption a number of useful functions will be identified by the community. The process of ensuring that such functions are ported across differing SPARQL implementations could be the job of a community initiative, just as the EXSLT initiative has helped standardise XSLT extension functions, many of which were incorporated into the more recent XSLT specifications.

So let’s take a quick look at how to implement a custom extension function using ARQ.
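Here’s a minimal sketch of what such a function can look like. The function URI and class are made up for illustration, and the package names are from later Apache Jena releases (the ARQ of this era lived under the com.hp.hpl.jena namespace), so adjust to your version:

```java
import org.apache.jena.sparql.expr.NodeValue;
import org.apache.jena.sparql.function.FunctionBase1;
import org.apache.jena.sparql.function.FunctionRegistry;

// A trivial custom filter function that upper-cases its string argument.
// The URI is illustrative; custom functions are identified by URI in queries.
public class UCase extends FunctionBase1 {
    public static final String URI = "http://example.org/function#ucase";

    @Override
    public NodeValue exec(NodeValue v) {
        return NodeValue.makeString(v.asString().toUpperCase());
    }

    // Register once at startup; a query can then invoke the function by URI:
    //   PREFIX ex: <http://example.org/function#>
    //   ... FILTER ( ex:ucase(?name) = "ALICE" ) ...
    public static void register() {
        FunctionRegistry.get().put(URI, UCase.class);
    }
}
```

The registration step is exactly the kind of engine-specific wiring the specification’s interoperability caveat is warning about.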

When writing queries for an application, whether against a SQL or an RDF data source, it’s common to end up with a core set of queries that are used time and again. Typically these queries vary only in the values of a few variables.

For example, you might have a query to look up the details of a user (name, homepage, etc.), with the variable being the value of the user identifier. The same query is used for all users; you just substitute a different value for the variable on each execution.

In Java, the JDBC API provides support for this usage through the PreparedStatement object. The ARQ SPARQL query API also provides support for this kind of parameterised query. Here’s some example code that illustrates the technique.

Note: to make use of this technique you’ll have to grab the latest CVS version of ARQ. One of the classes below (QuerySolutionMap) only got added this weekend.

First up, let’s create a sample SPARQL query to extract the name of a foaf:Person based on the URL of their weblog:
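A query of this shape, using the standard FOAF vocabulary, fits the description (the exact listing is a reconstruction on my part); ?weblog is the variable we’ll bind at execution time:

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?name
WHERE {
  ?person foaf:weblog ?weblog ;
          foaf:name   ?name .
}
```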

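The technique looks something like this in ARQ. The package names below are from later Apache Jena releases (the ARQ of this era used the com.hp.hpl.jena namespace), and the sample data and URLs are purely illustrative:

```java
import java.util.*;
import org.apache.jena.query.*;
import org.apache.jena.rdf.model.*;

public class ParamQuery {
    // The reusable query: ?weblog is left unbound and supplied per execution.
    static final String QUERY =
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n" +
        "SELECT ?name WHERE { ?person foaf:weblog ?weblog ; foaf:name ?name . }";

    // Execute the shared query with ?weblog pre-bound to the given URL.
    public static List<String> namesForWeblog(Model model, String weblogUrl) {
        Query query = QueryFactory.create(QUERY);

        // The initial bindings: variable name -> RDFNode
        QuerySolutionMap initialBindings = new QuerySolutionMap();
        initialBindings.add("weblog", model.createResource(weblogUrl));

        List<String> names = new ArrayList<>();
        try (QueryExecution qe =
                 QueryExecutionFactory.create(query, model, initialBindings)) {
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                names.add(results.next().getLiteral("name").getString());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // A tiny in-memory model standing in for real FOAF data
        Model model = ModelFactory.createDefaultModel();
        Property name   = model.createProperty("http://xmlns.com/foaf/0.1/name");
        Property weblog = model.createProperty("http://xmlns.com/foaf/0.1/weblog");
        model.createResource("http://example.org/me")
             .addProperty(name, "Leigh")
             .addProperty(weblog, model.createResource("http://www.ldodds.com/blog/"));

        System.out.println(namesForWeblog(model, "http://www.ldodds.com/blog/"));
        // prints [Leigh]
    }
}
```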
The code snippet is a fairly typical example of running a SELECT query using ARQ. The key difference is the use of a QuerySolutionMap. This class implements the QuerySolution interface, and when used in this context, i.e. as a parameter to the QueryExecutionFactory.create() method, provides the initial bindings for variables in the query.

The class is basically a map of variable names to RDFNode objects, so you can bind as many variables as you like.

Using this style of query means that you can reuse queries, avoiding the need for string concatenation or API calls to construct each query dynamically.

While I’m at it, here’s another useful tip for folk working with FOAF data. The ARQ function library contains a little function to calculate the sha1sum of a literal or resource value.
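For example, you could use it to find a person by matching their foaf:mbox_sha1sum without ever putting the plain mailto: address into your data. A sketch, assuming ARQ’s function namespace (the mailto: address is illustrative):

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX afn:  <http://jena.hpl.hp.com/ARQ/function#>

SELECT ?name
WHERE {
  ?person foaf:name ?name ;
          foaf:mbox_sha1sum ?sum .
  FILTER ( ?sum = afn:sha1sum(<mailto:someone@example.org>) )
}
```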

HP Labs have announced the first Jena User Conference to be held at the labs in Bristol on 10th-11th May 2006.
As the website notes, the conference will include presentations on:

applications and tools developed by Jena users

demos

in-depth explorations of Jena features

tutorials

discussions about the future development of Jena

See the call for submissions for details of how to submit a demo, position or workshop paper.
Should be a fun couple of days. It’s the week before XTech 2006 so no clashes for those of you who might want to attend both.

Ken North just posted this email to XML-DEV drawing attention to a presentation by Daniela Florescu titled Declarative XML Processing with XQuery — Re-evaluating the Big Picture (Warning: PDF). It makes for interesting reading.
In the presentation, Florescu argues that XML is in a growth crisis and that there’s a need for more architectural work to tie together components of the XML landscape ranging from XQuery and XSLT through to RDF and OWL. Florescu believes that XML is about more than syntax and will in fact become the key model for information, not just bits on a wire. In short Florescu believes that XML has yet to achieve its full potential and to do that some further work needs to be done.
The presentation is worth reading in its entirety. The majority of the presentation does focus on XQuery, in particular the fact that it’s not really a query language: it’s a programming language, and folk are already using it in that context. But there’s much more to it. Semantic web folk will find much that will have them nodding in agreement.
Florescu suggests a number of concrete areas that require work. Amongst these are:

Make XML a graph not a tree, by making links a first class part of the model

Integrate the XML data model with RDF

Extend the programming capabilities of XQuery, e.g. to include assertions, error-handling, metadata extraction functions and continuous queries. This latter area is interesting, as it would allow an XQuery to run continuously, acting on a stream of XML documents as they arrive

Integrate XQuery with OWL and RDF. E.g. to allow searching an XML document by semantic classification of nodes, rather than their names.

Make browsers XQuery aware, and develop a simple HTTP protocol for invoking XQuery on a remote repository. (I’ve been working with the SPARQL protocol recently and it’s occurred to me several times that an equivalent for XQuery is an obvious area for further work)

All in all I find this to be a very thought-provoking presentation; there are a lot of interesting ideas in there. For the Semantic Web crowd many of these will be old news: being able to query/manipulate data based on semantics is the core of RDF; linking as a first-class model element is something we rely on constantly. But there are also some new angles to consider. For example, there’s a lot of work happening to tie programming languages in with XML, and XML vocabularies such as XQuery are becoming more like scripting languages: what’s the equivalent in semantic web circles? Could an ontology-aware version of XQuery provide a useful data manipulation environment?
I expect the XML-DEV thread to grow pretty quickly. Will be interested to see if this gets picked up and discussed by other communities also.

So you’re sitting in a coffee shop talking over an idea with a friend: how to get people to contribute quality metadata on all manner of topics. Kind of a semantic Wikipedia, but where the goal is data entry rather than essay writing.
The basic concept is straightforward: start from a basic fact such as “Isaac Newton is a Person”. Then, using appropriate ontologies, generate additional questions: a Person has a birthdate, therefore “What is Isaac Newton’s birthdate?” (*). And so on. Generate RSS feeds containing the questions, with embedded forms allowing people to answer each one. Update your database based on the answers, using a voting (or similar) model to take the most common answer from all contributors. Provide stats on who answers the most questions, answers them first, answers the most questions in a particular domain, etc. You’ll then end up with a positive feedback effect that (hopefully) encourages more people to contribute.
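The “take the most common answer” step is easy to sketch. A minimal majority vote in Java (the class and method names are hypothetical, not from any real system):

```java
import java.util.*;

public class AnswerTally {
    // Return the most frequently submitted answer for a question,
    // breaking ties arbitrarily and returning null for no answers.
    // A real system would weight votes by contributor reputation, etc.
    public static String consensus(List<String> answers) {
        Map<String, Integer> counts = new HashMap<>();
        for (String a : answers) {
            counts.merge(a, 1, Integer::sum);  // tally each distinct answer
        }
        return counts.entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey)
            .orElse(null);
    }

    public static void main(String[] args) {
        List<String> answers = Arrays.asList(
            "1643-01-04", "1643-01-04", "1642-12-25");
        System.out.println(consensus(answers));  // prints 1643-01-04
    }
}
```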
Move forward two days and Amazon have done all the hard work: Amazon Mechanical Turk. With the spin that it’s all about micropayments rather than karma; and unlike the original, it doesn’t play chess.

(*) January 4, 1643 by the Gregorian calendar.