The blog of @ldodds

Monthly Archives: February 2008

As with my first “FOAF tale”, “Joe Triple”, yesterday’s story “Bee Node” was intended as more than an exercise in punning. The story was intended to help illustrate a few aspects of Semantic Web technology which I think are worth drawing attention to. But this time around the focus is mainly on SPARQL rather than on RDF modelling and ontologies.

The SPARQL queries in the story illustrate a general pattern of interaction that I expect will become common in clients accessing data via SPARQL endpoints.

This pattern is ASK, DESCRIBE, CONSTRUCT, which I’ll call “ADC” from now on. The ADC pattern provides a way to probe a remote data set to see whether it has information of interest, and then to extract information from that data set with increasing levels of precision and control.

The ADC Pattern: ASK

The initial step is the ASK query. When I was first learning SPARQL I didn’t really see the usefulness of ASK. It seemed that the same effect, i.e. detecting whether a given graph pattern can be matched against the data, could be achieved with a SELECT query:

SELECT *
WHERE {
...pattern of interest...
}
LIMIT 1

If there’s at least one row, then we know there’s matching data. This kind of query is useful when checking for existence of data in a relational database for example.

But, as I understand it, a SPARQL query engine can optimize for this common usage: since it need not return any data (as it must with a SELECT), it can simply terminate the query once it has found the first solution. Better all round really, as the ASK query form reflects the intent of the query better than the “LIMIT 1” hack does.
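For comparison, the ASK form of the same existence check is simply:

```sparql
ASK {
  ...pattern of interest...
}
```

The result is a single boolean rather than a result set, which is exactly what the client wanted to know in the first place.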

Detective Sparql applies this query form in his investigation. His first query attempts to find sources that have the location of Bee Node, simply asking whether the endpoint has the specific data items:
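The query itself isn’t reproduced here, but a sketch of what it might look like follows; the endpoint URIs, the `</person/bnode>` identifier and the choice of `foaf:name` as the identifying property are illustrative assumptions rather than details from the story:

```sparql
# Hypothetical reconstruction: does this endpoint have
# latitude/longitude data for Bee Node, identified either
# directly by URI or via an identifying property?
PREFIX geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

ASK {
  { </person/bnode> geo:lat ?lat ; geo:long ?long . }
  UNION
  { ?person foaf:name "Bee Node" ;
            geo:lat ?lat ;
            geo:long ?long . }
}
```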

The query uses a UNION to ask the same question in slightly different ways. The first pattern uses a URI for Bee Node, the second references her via an identifying property. This is a realistic and likely scenario as different endpoints may have different URIs for the same resource.

The second ASK query that Piotr uses does essentially the same thing, but instead of looking for specific triples, e.g. geo:lat, it ASKs a more general question: does the endpoint have any triples for a specified subject, in this case Bee Node? It does this by using variables in place of both the predicate and object:

</person/bnode> ?p ?o.
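In full, that more general ASK might read as follows (again the Bee Node URI is an illustrative assumption):

```sparql
# Does the endpoint know anything at all about this subject?
ASK {
  </person/bnode> ?p ?o .
}
```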

Queries that use wildcards for properties are a brilliantly useful feature of SPARQL, as they allow one to describe very general, reusable graph patterns.

Actually Detective Sparql missed a trick here as what he should have asked is:
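The improved query isn’t shown, but based on the description it would check for the resource appearing in either the subject or the object position, along these lines (URI illustrative):

```sparql
# Matches triples where Bee Node is the subject OR the object
ASK {
  { </person/bnode> ?p ?o . }
  UNION
  { ?s ?p </person/bnode> . }
}
```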

…as that query would have checked for both facts about Bee Node and facts relating to Bee Node.

The ADC Pattern: DESCRIBE

Following up on the ASK queries, Detective Sparql uses a DESCRIBE query to request that specific sources “spill the beans” and demonstrate what they know and provide whatever information they find useful.
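A DESCRIBE query is about as terse as SPARQL gets; assuming the same illustrative URI as above, it could be as simple as:

```sparql
DESCRIBE </person/bnode>
```

The endpoint decides what constitutes a useful description of that resource, which is precisely the point: the client asks for everything relevant without having to know the shape of the data in advance.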

This provides a good way to extract some useful view of the data context within which a specific resource sits: its literal properties and relationships to other resources in the dataset. Depending on the algorithm the endpoint uses to generate these views (and the shape of the underlying data set) the amount of data returned by a DESCRIBE query can vary wildly.

This is very useful in some contexts, particularly web crawling, where the client just wants to execute some general queries and use the results as a starting point for further accesses. However, in many other contexts this unpredictability may not be suitable, particularly where the client wants or needs to control the shape of the result graph and the amount of information returned.

The ADC Pattern: CONSTRUCT

It’s at this point where the CONSTRUCT query becomes useful.

The advantage of a CONSTRUCT query is that it provides the client with complete control over how the result graph is constructed. The client can specify exactly what resources it wants returned and which properties it’s interested in.
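For example, a client interested only in a name and location could request exactly those triples and nothing else. A sketch, reusing the illustrative Bee Node URI and vocabularies from earlier:

```sparql
PREFIX geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# The result graph will contain only these three properties,
# however much other data the endpoint holds about the resource
CONSTRUCT {
  </person/bnode> foaf:name ?name ;
                  geo:lat   ?lat ;
                  geo:long  ?long .
}
WHERE {
  </person/bnode> foaf:name ?name ;
                  geo:lat   ?lat ;
                  geo:long  ?long .
}
```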

Like ASK, I originally wrote off CONSTRUCT and DESCRIBE as specialized queries that would only be of limited interest. I expected that SELECT queries, which line up very nicely with their SQL equivalents, would be the primary SPARQL query form. But I was mistaken. Now that I’ve actually begun writing applications that make heavy use of SPARQL, I’ve found that CONSTRUCT is the query form with the most flexibility. There’s more to say about that, but the presentation I gave at a recent SWIG meeting is useful background reading.

One important utility of CONSTRUCT is the ability to transform the underlying data set. Currently a CONSTRUCT query is the closest thing that RDF has to XSLT. Using CONSTRUCT, a data set can be transformed into a particular shape that may fit the processing expectations of the client application. Although it should be said that CONSTRUCT is a poor cousin to XSLT (or SQL for that matter), in that it’s limited in what it can achieve; at least until SPARQL gets more basic functions for things like string manipulation.

Detective Sparql uses this feature to transform SIOC data into his preferred ontology. This is going to be inevitable where vocabularies don’t neatly line up with one another as is the case with SIOC and FOAF.
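The actual query from the story isn’t shown, but the general shape of such a mapping is easy to sketch. This example assumes sioc:User and sioc:name on the input side; the choice of target properties is illustrative:

```sparql
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Re-express SIOC user data in FOAF terms
CONSTRUCT {
  ?user a foaf:Person ;
        foaf:name ?name .
}
WHERE {
  ?user a sioc:User ;
        sioc:name ?name .
}
```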

CONSTRUCT also provides a limited form of inferencing capability without requiring a full reasoner.
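As a small illustration of the idea: a query can materialize triples that a reasoner would otherwise have to infer, such as treating a relationship as symmetric (foaf:knows isn’t formally declared symmetric, so this is purely an assumption for the example):

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Materialize the reverse direction of every foaf:knows triple,
# as if the property were symmetric
CONSTRUCT { ?b foaf:knows ?a . }
WHERE     { ?a foaf:knows ?b . }
```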

Where CONSTRUCT is limited is in its ability to traverse an RDF graph: limited in the sense that the traversal must be explicitly specified. DESCRIBE doesn’t suffer from this, except that you have to rely on the SPARQL endpoint to decide where and how far to traverse. It’d be interesting to see DESCRIBE extended to allow the client to specify the algorithm for generating the view.

Hopefully this posting demonstrates some aspects of SPARQL which go beyond the simple query language, and illustrates how the different query forms have their own strengths and weaknesses, and how they can be combined to work with data out in the wild.