Simple Queries

Let us have a look at how a query can be constructed. As an example we will query for all resources that are tagged with a certain tag. Let's imagine that we have a reference to this tag stored in myTag. (Please ignore the fact that Nepomuk::Tag::tagOf essentially returns the same information. After all, we are here to learn how it works.)

We begin by constructing the SPARQL query string. It is a simple query, and if you know SQL it should be easy to understand. Basically we select the resources that match the patterns in the where clause. In this case the resource needs to have the hasTag property with myTag as its object. As we can see, Soprano already provides a set of standard URIs as static instances in the Soprano::Vocabulary namespace. And since we have the Nepomuk resource object for the tag, we can simply use its unique URI to directly access the tagged resources.
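As a minimal sketch of the query string described above, the following builds it with the C++ standard library only. The helper name buildTaggedResourcesQuery and the example tag URI are hypothetical; in Nepomuk code the property URI would normally come from Soprano::Vocabulary and the tag URI from myTag.

```cpp
#include <string>

// Full URI of the nao:hasTag property, written out here so that the
// sketch does not depend on Soprano::Vocabulary.
const std::string NAO_HAS_TAG =
    "http://www.semanticdesktop.org/ontologies/2007/08/15/nao#hasTag";

// Hypothetical helper: select every resource ?r that has the hasTag
// property with the given tag URI as its object.
std::string buildTaggedResourcesQuery(const std::string& tagUri) {
    return "select ?r where { ?r <" + NAO_HAS_TAG + "> <" + tagUri + "> . }";
}
```

Calling buildTaggedResourcesQuery with the URI of the tag yields a single graph pattern in which only ?r is left as a variable.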

But what if we do not have the tag URI but only its label, i.e. the name given by the user?
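One way to write such a query, sketched here with the C++ standard library only, is to match the tag through its rdfs:label. The sketch gives two versions, one with full URIs and one using the SPARQL PREFIX syntax. The helper names and the escaping function are assumptions made for illustration, not part of Soprano or Nepomuk.

```cpp
#include <string>

const std::string NAO  = "http://www.semanticdesktop.org/ontologies/2007/08/15/nao#";
const std::string RDFS = "http://www.w3.org/2000/01/rdf-schema#";
const std::string XSD  = "http://www.w3.org/2001/XMLSchema#";

// Escape backslashes and double quotes so that the label can be placed
// inside a double-quoted SPARQL string literal.
std::string escapeLiteral(const std::string& text) {
    std::string out;
    for (char c : text) {
        if (c == '\\' || c == '"') out += '\\';
        out += c;
    }
    return out;
}

// Version 1: every property and datatype is written as a full URI.
std::string buildLabelQueryFullUris(const std::string& label) {
    return "select ?r where { ?r <" + NAO + "hasTag> ?tag . "
           "?tag <" + RDFS + "label> \"" + escapeLiteral(label) +
           "\"^^<" + XSD + "string> . }";
}

// Version 2: the same query, shortened with PREFIX declarations.
std::string buildLabelQueryPrefixed(const std::string& label) {
    return "PREFIX nao: <" + NAO + "> "
           "PREFIX rdfs: <" + RDFS + "> "
           "PREFIX xsd: <" + XSD + "> "
           "select ?r where { ?r nao:hasTag ?tag . "
           "?tag rdfs:label \"" + escapeLiteral(label) + "\"^^xsd:string . }";
}
```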

Both queries are equivalent, and it is up to the query writer to decide which version to use. We present both versions here only for demonstration purposes.

Now let us analyse what is happening here. Instead of just matching a single graph pattern, we match two, where the first one introduces another variable that is then reused in the second one. rdfs:label has a string literal range, meaning that each object related to a resource via the rdfs:label property is a string literal. In this case we want to select the tag that has myTagLabel as its label.

Bringing more context into the mix

In Introduction to RDF and Ontologies we briefly learned about named graphs, or contexts, which make up the fourth part of each statement in Nepomuk. We can now use this information to filter our results based on creation dates. Imagine for example that we want to retrieve all resources tagged before the first of January 2008. We do this by introducing some more complex SPARQL syntax. For simplicity we go back to our first example of matching the tag URI directly, to keep the query readable. But of course both can be combined. (Keep in mind that we only use the prefix syntax here for readability. In actual code it may be better to directly add the URIs from Soprano::Vocabulary to prevent typing errors in property and class names.)
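A query of this kind could be sketched as follows, again using only the C++ standard library. The helper name buildTaggedBeforeQuery is hypothetical, and the date is passed in as an already formatted xsd:dateTime string such as 2008-01-01T00:00:00Z; in Soprano-based code that string would be produced from a Soprano::LiteralValue.

```cpp
#include <string>

const std::string NAO = "http://www.semanticdesktop.org/ontologies/2007/08/15/nao#";
const std::string XSD = "http://www.w3.org/2001/XMLSchema#";

// Hypothetical helper: select every resource ?r that was tagged with
// tagUri in a graph ?g whose nao:created date lies before xsdDateTime.
std::string buildTaggedBeforeQuery(const std::string& tagUri,
                                   const std::string& xsdDateTime) {
    return "PREFIX nao: <" + NAO + "> "
           "PREFIX xsd: <" + XSD + "> "
           "select ?r where { "
           // The graph keyword binds the context of the tagging statement to ?g.
           "graph ?g { ?r nao:hasTag <" + tagUri + "> . } "
           // The creation date is a property of the graph itself.
           "?g nao:created ?date . "
           // Keep only graphs created before the given point in time.
           "FILTER(?date < \"" + xsdDateTime + "\"^^xsd:dateTime) . }";
}
```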

As we can see, SPARQL does not simply add the context as a fourth parameter. Instead, we need to surround the triples we want to match in a certain context with the graph keyword.

We use the SPARQL FILTER keyword to keep only those graphs/contexts that have a nao:created value earlier than the first of January 2008.

We use Soprano::LiteralValue instead of QDateTime directly. This is important since QDateTime does not support the RDF way of formatting a dateTime string. Thus, we need to use Soprano's internal dateTime string conversion algorithm by using LiteralValue.

Full text queries

While SPARQL in theory supports full text queries through the FILTER and REGEX keywords, the storage backends do not have a real full text index of their own. Thus, a full text search using SPARQL FILTER may become very slow if there are many statements to filter.

That is why Soprano has the CLucene based full text index model. It is stacked on top of the actual storage model within the Nepomuk Server and provides a full text index on all literal object nodes in the repository. Since Soprano does not have a fancy query API yet (using plain strings as queries does not count as fancy), full text queries still have to be performed separately. This may be inconvenient but will hopefully be solved in Soprano 3.

So for now we have to learn a second way to query the repository: using the Lucene Query Language. But that is much easier in most cases.

Let us assume that we want to search for resources that are related to some literal object that matches the value "nepomuk". In SPARQL the query would be:

select ?r where {
    ?r ?p ?o .
    FILTER REGEX(STR(?o), 'nepomuk', 'i') .
    FILTER isLiteral(?o) .
}

We convert the object literal into a string and match it to a regular expression ignoring case. This works but may be slow. Using the Soprano lucene full text index we perform this query as follows:
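For a single search term the Lucene query string is simply the term itself, here "nepomuk". The following is a minimal sketch of preparing such a term with the C++ standard library only. The helper escapeLuceneTerm is hypothetical and only needed when the term may contain characters that are special in the Lucene query syntax. How the string is then handed to the index model is an assumption here: with Soprano it would presumably be passed to the model's executeQuery method as a user-defined query language named "lucene" rather than as SPARQL, which should be checked against the Soprano API documentation.

```cpp
#include <string>

// Hypothetical helper: put a backslash in front of every character that
// has a special meaning in the Lucene query syntax, so that the term is
// matched literally. '&' and '|' are only special when doubled, but
// escaping them one by one is harmless.
std::string escapeLuceneTerm(const std::string& term) {
    static const std::string special = "+-&|!(){}[]^\"~*?:\\";
    std::string out;
    for (char c : term) {
        if (special.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return out;
}
```

A plain word such as nepomuk passes through unchanged, so the whole full text query is just that one word, compared to the three patterns needed in the SPARQL version.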