
August 29, 2010

Graphical "more like this" Query Building

I promised in an earlier blog post to talk about how to create queries over OWL in RDF. So here it is.

As Ivan alluded to in his comment, there are some syntax issues with talking about OWL restrictions in RDF. What is he referring to? Well, let's take the same example as in the last blog post, a datatype restriction about things with age >= 21. We could write this in Manchester Syntax as

hasAge only xsd:integer [>=21]

But the OWL/RDF rendition of this is where the 'arcane' syntax comes in. We can see it just by looking at the source code in Turtle, where it looks like this:

How do you write a rule like that? By looking up in the standard how to express datatype restrictions, and how to link those to restricted value sets, and so on. If that seems labor-intensive and error-prone to you, then you're right. It is.

But we can use a power tool to help make this happen. The power tools aren't included in the free version of TopBraid Composer, so if you want to follow along here, you'll need the Maestro Edition; a free 30-day trial is available.

Start by loading http://workingontologist.org/Examples/adult.rdf into Composer, just as shown before, and open it. We're going to use the model itself as a prototype to create a query. Let's start by looking at an example of the restriction we want to match - look at the definition of Adult in the model:

You can type it in just like that. But that doesn't help us write a SPARQL query to match any restriction of this form. How can we do that? If you click on "Graph" at the bottom of the pane, you can explore this definition in RDF. If you drill down to the Datatype Restriction itself, you get a view like the top of this figure:

This is just a graphic representation of triples in the model - you can see all the structure of the RDF representation of the restriction.

Now comes the fun part - let's turn this image into a query (which, to avoid suspense, is already shown at the bottom of the figure). We want a query that will match restrictions "like this" one. What does "like this" mean? That's what we have to specify - some aspects of this example should be included in the match (like the fact that it is an owl:Restriction, on an rdfs:Datatype xsd:integer, and that it is an owl:minInclusive restriction), and others should not (that the property is :hasAge; after all, we want this to match restrictions on any property). So, we select the things that we want to keep in the query, marked with a small "x" (you can set/reset the "x" by clicking on the small box in each node in the graph).

Once you have selected the aspects that specify what you mean by "like this" (a Datatype Restriction, on some property, with minInclusive over xsd:integers), you can generate the query automatically by clicking the button. You can see the generated query at the bottom of the figure.

All the generator did was take the triples shown in the figure and render them in the query. Selected nodes (with "x") appear in the query as themselves; unselected nodes (no "x") become variables. Properties always show up as themselves. The generator makes a best guess at meaningful variable names, using type information for the guesses.
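To make that mapping concrete, here is a minimal Python sketch of the idea. It is not TopBraid Composer's actual implementation. The eight triples are an assumption based on the standard OWL 2 mapping to RDF for a datatype restriction (where the facet is written xsd:minInclusive, and the rdf:rest triple of the list is left out). The blank-node labels, the VARIABLE_NAMES table, and the function names are hypothetical; only ?datatypeproperty is a variable name taken from this post.

```python
# Assumed triples behind the graph view of the restriction (hypothetical labels).
# Blank nodes are written "_:name"; everything else is a prefixed name or literal.
TRIPLES = [
    ("_:restriction", "rdf:type", "owl:Restriction"),
    ("_:restriction", "owl:onProperty", ":hasAge"),
    ("_:restriction", "owl:allValuesFrom", "_:datatype"),
    ("_:datatype", "rdf:type", "rdfs:Datatype"),
    ("_:datatype", "owl:onDatatype", "xsd:integer"),
    ("_:datatype", "owl:withRestrictions", "_:list"),
    ("_:list", "rdf:first", "_:facet"),
    ("_:facet", "xsd:minInclusive", "21"),
]

# Nodes marked with an "x" in the graph: these stay constants in the query.
SELECTED = {"owl:Restriction", "rdfs:Datatype", "xsd:integer", "21"}

# Variable names for unselected nodes. The real tool guesses these from type
# information; here they are simply listed by hand (an assumption).
VARIABLE_NAMES = {
    "_:restriction": "?restriction",
    ":hasAge": "?datatypeproperty",
    "_:datatype": "?datatype",
    "_:list": "?list",
    "_:facet": "?facet",
}


def render_node(node):
    """Selected nodes appear as themselves; unselected nodes become variables."""
    return node if node in SELECTED else VARIABLE_NAMES[node]


def generate_where(triples):
    """Render each triple as one pattern; properties always appear as themselves."""
    lines = [f"  {render_node(s)} {p} {render_node(o)} ." for s, p, o in triples]
    return "WHERE {\n" + "\n".join(lines) + "\n}"


print(generate_where(TRIPLES))
```

Because :hasAge is not selected, it comes out as ?datatypeproperty, while the selected constant 21 is carried into the query unchanged.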

There are a few differences between the generated query and the WHERE clause of the rule:

The first difference is ordering of triples - the generator isn't very fussy about the order in which triples are generated, so it is different each time (if you are following along at home, your generated query will probably be different from the one shown here, and also from the rule).

The second difference is the inclusion of a triple to match data, to wit:

?x ?datatypeproperty ?val .

After all, in a rule, we want to say "when some data satisfies this restriction, ..." This clause uses the same variable for the property (?datatypeproperty) as used in the rest of the query.

The final difference has to do with the constant "21". The generated query includes the constant, whereas the rule turns it into a variable (?mval) and adds a filter to compare it to the actual data (?val). After all, the value "21" comes from the model, and shouldn't be built into the rule.
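The last two hand edits can be sketched in a few lines of Python. This is only an illustration under assumptions: the generated patterns below follow the standard OWL 2 mapping to RDF with hypothetical variable names (?restriction, ?datatype, ?list, ?facet), the function name is made up, and the direction of the comparison (?val >= ?mval) is inferred from the ">= 21" restriction rather than copied from the rule itself. The names ?x, ?datatypeproperty, ?val and ?mval are the ones used in this post.

```python
# Assumed output of the generator: eight triple patterns, one per line.
GENERATED = [
    "?restriction rdf:type owl:Restriction .",
    "?restriction owl:onProperty ?datatypeproperty .",
    "?restriction owl:allValuesFrom ?datatype .",
    "?datatype rdf:type rdfs:Datatype .",
    "?datatype owl:onDatatype xsd:integer .",
    "?datatype owl:withRestrictions ?list .",
    "?list rdf:first ?facet .",
    "?facet xsd:minInclusive 21 .",
]


def to_rule_where(generated, constant="21"):
    """Apply the hand edits: constant -> ?mval, add the data triple, add the filter."""
    clauses = []
    for pattern in generated:
        subject, predicate, obj, dot = pattern.split()
        if obj == constant:
            # The value comes from the model, so it should not be built into the rule.
            obj = "?mval"
        clauses.append(" ".join([subject, predicate, obj, dot]))
    # Match the data, reusing the property variable from the restriction.
    clauses.append("?x ?datatypeproperty ?val .")
    # Compare the data value with the bound taken from the model (assumed direction).
    clauses.append("FILTER (?val >= ?mval)")
    return clauses


for clause in to_rule_where(GENERATED):
    print(clause)
```

The result has ten clauses: nine triple patterns and one filter.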

So yes, these modifications have to be made by hand (using the SPARQL editor, where the generator put the query). The query generator should be seen as a power tool; you still need an operator who knows how to use it, but it simplifies a lot of the heavy lifting for query writing. In this case, we have a rule with 10 clauses (9 triples and a filter). The generator created seven of the triples, and most of the eighth one; the human only had to write the last two clauses. That is, the power tool took care of the "arcane syntax" that Ivan referred to, leaving the human to figure out what they really want the rule to mean.

I use this feature of TopBraid Composer all the time, in this pattern. I want to write a query that matches some 'arcane' bit of RDF (e.g., from DBpedia, the OWL in RDF standard, the XML DOM, SKOS, etc.). Instead of trying to write a query from scratch, I find (or even build) an example of the thing I want to match. Then I generate the query - automatically guaranteeing that I didn't leave out any triples, that I got all the namespaces and property names correct, that I didn't accidentally collide bnodes by giving them the same variable name, etc. Then I beat up the result to create the query that I really want - in which I define what I want to do with the match.

So when you see an elaborate query with dozens of triples in it, and you wonder what sort of geek can write or maintain such a thing, keep in mind that it might not have been written at all; it might have been generated from an example.
