We took away their mystery

Introduction to RDF and SPARQL

Let’s start with a relatively simple graph. The graph shows the relationships between John, Fred, Max and Picca. John and Fred are humans who we’ll refer to as contacts. Max and Picca are pets. Max is a dog and Picca is a parrot. Both Picca and Max are owned by John. Fred claims that John is his friend.

If we want to represent this story semantically, we first need to make a dictionary that describes pets, contacts, dogs and parrots. The dictionary would also describe possible relationships, like the ownership of a pet and the friendship between two contacts. Don’t forget, making something semantic means that you want to give meaning to the things that interest you.

Giving meaning is exactly what we’ll start with. We will write the schema for making this story possible. We will call this an ontology.

We describe our ontology using the Turtle format. In Turtle you can have prefixes. The prefix test: for example is the same as using <http://test.org/ontologies/tracker#>.
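To make the prefix rule concrete, here is a minimal sketch. The names test:Dog and test:Pet are assumptions chosen for illustration; rdfs:subClassOf is the standard RDF Schema property.

```turtle
@prefix test: <http://test.org/ontologies/tracker#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Written with prefixes:
test:Dog rdfs:subClassOf test:Pet .

# Exactly the same statement with every IRI written out in full:
<http://test.org/ontologies/tracker#Dog>
    <http://www.w3.org/2000/01/rdf-schema#subClassOf>
    <http://test.org/ontologies/tracker#Pet> .
```

A Turtle parser expands the first form into the second, so both lines describe one and the same triple.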

In Turtle you describe statements by giving a subject, a predicate and then an object. The subject is what you are talking about. The predicate is what you are saying about the subject. And finally the object is the value. This value can be a resource or a literal.
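As a small illustration of the two kinds of object, the sketch below states one fact whose value is a resource and one whose value is a literal. The property names test:owns and test:name are hypothetical and only serve the example.

```turtle
@prefix test: <http://test.org/ontologies/tracker#> .

# subject    predicate   object (a resource: another node in the graph)
test:John    test:owns   test:Max .

# subject    predicate   object (a literal: a plain string value)
test:Max     test:name   "Max" .
```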

When you write a . (a dot) in Turtle it means that you are done describing the subject. When you write a ; (semicolon) it means that you continue with the same subject, but will start describing a new predicate. When you write a , (comma) it means that you continue with both the same subject and the same predicate, and only give a new object. The same rules apply in the WHERE section of a SPARQL query. But first things first: the ontology.
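A rough sketch of what an ontology for this story could look like is shown here, mainly to illustrate the ; and . rules. It assumes hypothetical class and property names (test:Contact, test:Pet, test:Dog, test:Parrot, test:owns, test:hasFriend) and uses only standard RDF Schema vocabulary; a real Tracker ontology may need additional Tracker-specific declarations.

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix test: <http://test.org/ontologies/tracker#> .

# Two top-level classes (assumed names).
test:Contact a rdfs:Class .
test:Pet a rdfs:Class .

# ';' keeps the subject test:Dog and starts a new predicate.
test:Dog a rdfs:Class ;
    rdfs:subClassOf test:Pet .

test:Parrot a rdfs:Class ;
    rdfs:subClassOf test:Pet .

# Ownership: a contact owns a pet.
test:owns a rdf:Property ;
    rdfs:domain test:Contact ;
    rdfs:range test:Pet .

# Friendship: a contact has another contact as a friend.
test:hasFriend a rdf:Property ;
    rdfs:domain test:Contact ;
    rdfs:range test:Contact .
```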

Note that the “test” ontology is not officially registered at tracker-project.org. It serves merely as an example.

Now that we have meaning, we will introduce the actors: Picca, Max, John and Fred. Copy the @prefix lines of the ontology file from above, put the ontology file in the share/tracker/ontologies directory and run tracker-processes -r before restarting tracker-store in master. After doing all that you can store this as a /tmp/import.ttl file and then run tracker-import /tmp/import.ttl, and it should import just fine. The data is then ready for the queries below, which you execute with the tracker-sparql -q ‘$query’ command.
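As a sketch of what such an import file might contain, the actors from the story can be written as follows. The class and property names (test:Dog, test:Parrot, test:Contact, test:owns, test:hasFriend) are assumptions and must match whatever the installed ontology actually defines; the subjects are written as prefixed names.

```turtle
@prefix test: <http://test.org/ontologies/tracker#> .

# The pets.
test:Max a test:Dog .
test:Picca a test:Parrot .

# ';' continues with test:John, ',' continues with test:owns.
test:John a test:Contact ;
    test:owns test:Max , test:Picca .

# Fred claims that John is his friend.
test:Fred a test:Contact ;
    test:hasFriend test:John .
```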

Note that tracker-processes -r destroys all your RDF data in Tracker. We don’t yet support adding custom ontologies at runtime, so for doing this test you have to start everything from scratch.

which I think is what you want and which is exactly what test:Picca (without the angle brackets) means given your @prefix test: definition. In other words, the angle brackets in Turtle (and SPARQL) are like quotes and allow you to use arbitrary URIs for labeling identifiers for nodes in the RDF graph [1]. Check out

By the way there is a convention to use lowercase identifiers for instances (e.g. test:picca instead of test:Picca) and identifiers starting with a capital for classes but that is just a convention.

[1] People usually use http URLs. One can also use URIs which are not URLs, e.g. is a reasonable way to refer to the RDF node representing the mailbox of John Smith. Personally, I prefer to use non-URLs for “things” like parrots that cannot be obviously retrieved over the internet, i.e. I like to use for a nicely namespaced, uniquely labelled RDF node for a parrot Picca, an RDF node for his homepage and for the RDF node of his mailbox. I can then say

to claim that Picca the parrot has a homepage and a mailbox (note that the URI labels of the RDF nodes have no meaning other than giving them a unique name). One could then say other things about that homepage or the mailbox, e.g.

@IvanFrade: Using triplets doesn’t save you from having to do backtracking or something similar (probably less efficient). Consider, for example, this graph: http://gist.github.com/149936 . It’s 21 nodes linked in a linear fashion from n0 to n20, including backlinks. To find the correct way from n0 to n20 you’ll use this query: http://gist.github.com/149938 . Finding N1 is easy, because there’s only one node adjacent to n0. But N2 could be n0 or n2. The only way to know whether it is one or the other (or both) is to try out all possibilities for the rest of the query. Of course you could do this without backtracking by, for example, first generating all possible assignments (the Cartesian product) and then eliminating the ones which contain non-existent triplets. That would be less efficient and would cost much more memory, though.

As it happens, I’ve tried running that query through rdfproc – it took about 50 seconds to complete. The same query in Prolog (http://gist.github.com/149940), using SWI-Prolog (which is not known for its performance), took a bit more than 100 milliseconds, including start-up, compilation and shut-down.

@Rogier Brussee: Yeah, I was aware of that. But while I was making the examples I liked the URIs more (for being short on the blog). I hope it didn’t confuse people too much, though. If those people simply don’t write the < and > for the subjects, they should get the full URI as subject instead.

Mark, this is definitely an interesting find, although I suspect you can find examples where Prolog would be slower. I hope SPARQL backend writers are aware of this approach also. If not, you’d better let them know :)