Dave Reynolds' noodlings on the semantic web with just a smattering of Aikido

RDF Result sets

[This post was prompted by a discussion with Jeni Tennison and will probably end up cross-posted to our company blog sometime.]

In some recent semantic web applications, where we’ve been creating user interfaces over REST-style interfaces over RDF data sets, we found a common pattern emerging: ResultSets. The approach we took has been documented, but it’s buried in other details, so I’d like to pull out the essential pattern in this post.

Situation

The situation is that your UI (or other client) wants to find all resources that match some criteria and get a description of them. Typically the client wants to see those resources ordered (e.g. by relevance to some original query, by name, or whatever).

This is not just a SPARQL SELECT. SELECT allows you to find the matching resources and to sort them, but it can only extract a fixed set of values from the resources. A key value of RDF is its ability to handle schema-less information and not require resource descriptions to be of uniform shape. If we only pull back descriptions via SELECT we lose that.

Nor is this a simple subgraph of the RDF dataset (e.g. as you would get from a DESCRIBE), since then you lose the information on which are the top-level matching resources and how they are ordered.

Specifying the query

Abstractly we specify the query using the template:

query(select, var, description)

Here select is a SPARQL SELECT query which extracts the resources we want, possibly ordering them; var is the name of a variable in the select which corresponds to the retrieved resources; and description is either the single keyword “DESCRIBE” (meaning that each resource should be returned via a SPARQL DESCRIBE operation) or a SPARQL ConstructTemplate which refers to other variables in the select.
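To make the template concrete, here is a minimal Python sketch of how the three parts might be bundled together. The class name ResultQuery, its method, and the example.org URI are my own illustrative choices, not part of the actual API; the only check it makes is that var really occurs in the select.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ResultQuery:
    """Hypothetical container for the query(select, var, description) template."""

    select: str                    # SPARQL SELECT that finds (and maybe orders) the resources
    var: str                       # name of the SELECT variable bound to each matching resource
    description: str = "DESCRIBE"  # either "DESCRIBE" or a SPARQL ConstructTemplate

    def __post_init__(self):
        # The named variable must actually occur in the select (as ?var or $var).
        if not re.search(r"[?$]" + re.escape(self.var) + r"\b", self.select):
            raise ValueError(f"variable ?{self.var} does not appear in select")

    def uses_describe(self) -> bool:
        """True if each resource should be fetched with a SPARQL DESCRIBE."""
        return self.description.strip().upper() == "DESCRIBE"


# Example: find books, ordered, and describe each one with DESCRIBE.
books = ResultQuery(
    select="SELECT ?item WHERE { ?item a <http://example.org/Book> } ORDER BY ?item",
    var="item",
)
```

Passing a ConstructTemplate string as description instead of the default would select the second mode; the sketch does not attempt to parse or validate that template.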

In fact there’s a lot of separate machinery for how to build up the query as a series of query refinement operations, but that’s not relevant here.

Returning the results

To return results we provide two abstractions – a ResultSet and a ResultWindow.

A ResultSet:

is identified by a URI

has RDF metadata to describe that URI (the dataset operated on, the query run, when it ran etc)

can be used to open a ResultWindow: openWindows(ResultSet, {start, {end}})

A ResultWindow:

is also identified by a URI and identifies:

an ordered list of resources

an RDF graph containing at least the descriptions of the resources within the window

a flag to indicate if the window reaches to the end of the ResultSet
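As a rough illustration of these two abstractions, here is an in-memory Python sketch. Everything in it is an assumption made for the example: the field names, the representation of the graph as a set of (subject, predicate, object) tuples, the fully evaluated result list, and especially the window URI scheme, which is invented here and is not the one the real service uses.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ResultWindow:
    uri: str           # the window is itself identified by a URI
    resources: list    # ordered list of resource URIs in this window
    graph: set         # (s, p, o) triples describing at least those resources
    reaches_end: bool  # True if the window reaches the end of the ResultSet


@dataclass
class ResultSet:
    uri: str
    metadata: dict                               # dataset, query, when it ran, etc.
    ordered: list = field(default_factory=list)  # stand-in for the evaluated, ordered matches
    triples: set = field(default_factory=set)    # stand-in for the underlying RDF data

    def open_window(self, start: int = 0, end: Optional[int] = None) -> ResultWindow:
        """Open a window over rows [start, end); end=None means 'to the end'."""
        total = len(self.ordered)
        stop = total if end is None else min(end, total)
        resources = self.ordered[start:stop]
        wanted = set(resources)
        # Keep only triples whose subject is one of the windowed resources.
        graph = {t for t in self.triples if t[0] in wanted}
        return ResultWindow(
            uri=f"{self.uri}/window?start={start}&end={stop}",  # hypothetical URI scheme
            resources=resources,
            graph=graph,
            reaches_end=stop >= total,
        )
```

A real server would not necessarily hold the whole result list in memory like this; the point is only the shape of what a window hands back.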

Having a first class representation of the whole result set allows us to pass it around, annotate it, share it, without having to copy the actual results. It is up to the server to decide how eager/lazy to be on evaluation and what caching (if any) to do.

Having a window allows us to probe and page through inconveniently large result sets. If a client opens a window over the whole set, or pages through in order, then the server can still stream results. However, the server has to be prepared to reissue the query with a LIMIT/OFFSET, or rewind the results, if the client opens windows out of order.
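For the out-of-order case, reissuing the query might look something like the Python sketch below. The function name window_query is hypothetical, and it assumes the select has no LIMIT or OFFSET of its own. Asking for one extra row is a trick I am adding for illustration, so the caller can work out the "reaches the end" flag; it is not something the approach above prescribes.

```python
from typing import Optional


def window_query(select: str, start: int, end: Optional[int] = None) -> str:
    """Append OFFSET/LIMIT so the SELECT returns rows [start, end) plus one extra.

    Assumes `select` carries no LIMIT/OFFSET clause already.
    """
    if start < 0 or (end is not None and end < start):
        raise ValueError("invalid window bounds")
    parts = [select.rstrip()]
    if start > 0:
        parts.append(f"OFFSET {start}")
    if end is not None:
        # One row more than the window: if it comes back, the set continues past `end`.
        parts.append(f"LIMIT {end - start + 1}")
    return "\n".join(parts)


paged = window_query("SELECT ?item WHERE { ?item ?p ?o } ORDER BY ?item", 10, 20)
```

The caller would drop the extra row before building the window. Note that paging this way is only stable if the select has an ORDER BY, since otherwise the store is free to return rows in a different order on each run.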

Packaging up as a RESTful API

So far we’ve been talking abstractly, but as well as providing a Java API for this query interface, we want to use it in a RESTful web service setting.

The query endpoint is simple, supporting GET (or POST for large queries):