0:02 BARRY NORTON: OK, so to reintroduce SPARQL, as I said, we have the stack of technologies developed for this idea of the Semantic Web, many of which are applied in linked data. And SPARQL sits over here, on top of RDF. So it's an RDF query language, or in full, the SPARQL Protocol and RDF Query Language. And also, as we'll see in the second part of the webinar, it sits alongside things like OWL and RDFS. So those can be applied during querying, and will be in the examples that we look at. So as I said, SPARQL in the first instance is a declarative (‘say what you want’) query language for RDF data. And you can find the specification here.

0:53 Underneath that, it has an algebra, which is related to the relational algebra that explains how, for instance, SQL works (or ‘sequel’, as you'll often hear me say, slightly incorrectly). So the relational algebra explains how SQL queries work in its first use. It also explains how SPARQL queries work over linked data, over RDF. A newer part of the SPARQL standards, or recommendations, which are effectively standards from the W3C, is the update language. That says you have some store managing some RDF data, and you want to effect changes upon the data that's being managed. For that, you use the update language.

1:38 In both cases, whether you are issuing queries or updates, you need some basic knowledge of this protocol to say how to issue your requests and see the results.

1:50 Now concerning the query part, and we'll cover both SPARQL 1.0 and the newer SPARQL 1.1 together, we have the basic query language over graph data, with keywords for combining patterns, which is the means by which you express what data you want. We'll cover at the same time extensions that were introduced in the new language, so we can do things like aggregates. We can issue subqueries, that is, queries within queries, as well as updates. We'll touch only very briefly on a third extension, which is towards federation, where instead of having one store to which you issue a query, you actually know about several stores and you want them to work together to answer your query.

2:39 Now how federation works and is applied is something that we really cover later, in chapter 5. So for the most part we're talking about single queries and updates on a single store holding all of the data.

SPARQL: Basic concepts

SPARQL was proposed as a standard by the World Wide Web Consortium (W3C) in November 2007 and became a W3C Recommendation in January 2008.

It is maintained and developed by the W3C SPARQL Working Group, which in November 2012 proposed an upgraded version, SPARQL 1.1 (finalised as a W3C Recommendation in March 2013), with new features including an update language (allowing users to change as well as consult RDF datasets).

The latest recommendation can be found at these two sites, one for the query language and one for update:

Along with RDF and OWL, SPARQL is one of the three core standards of the Semantic Web. Its location in the Semantic Web ‘stack of languages’ is shown in Figure 2.2. One point to note in the figure is that SPARQL does not depend on RDFS and OWL. However, as will be shown later, knowledge encoded in RDFS and OWL may enhance the power of querying.

Figure 2.2 SPARQL in the Semantic Web Stack

The essence of querying is shown by the following illustration, using for the time being English rather than RDF. Imagine an RDF dataset with statements containing the following information:

The Beatles made the album ‘Help’.
The Beatles made the album ‘Abbey Road’.
The Beatles made the album ‘Let it be’.
The Beatles includes band-member Paul McCartney.
Wings made the album ‘Band on the run’.
Wings made the album ‘London Town’.
Wings includes band-member Paul McCartney.
The Rolling Stones made the album ‘Hot Rocks’.

One can imagine various queries that a music portal might need to run over such a dataset. For instance, the portal might construct web pages on demand for any album or group nominated by the user. This would require retrieval of information from the dataset for questions such as the following:

Who made the album ‘Help’?
Which albums did the Beatles make?

These are so-called WH-questions (‘who’, ‘what’, ‘where’, etc.), for which the first would receive a single answer (‘The Beatles’), and the second a list of three answers (‘Help’, ‘Abbey Road’, ‘Let it be’).

The SPARQL counterparts to these questions use RDF triples that contain variables; these correspond to the WH-words in the English queries. The general form for such questions (still working in English) is as follows:

Give me all values of X such that X made the album ‘Help’.
Give me all values of X such that the Beatles made X.
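
As a rough illustration of what these one-variable questions ask for, the sketch below models the eight English statements as (subject, predicate, object) tuples in plain Python and filters them. This is not SPARQL and not how an RDF store works internally; the predicate labels "made" and "has_member" are names invented for this sketch.

```python
# The statements from the text as (subject, predicate, object) tuples.
# "made" and "has_member" are illustrative labels, not real RDF properties.
DATASET = [
    ("The Beatles", "made", "Help"),
    ("The Beatles", "made", "Abbey Road"),
    ("The Beatles", "made", "Let it be"),
    ("The Beatles", "has_member", "Paul McCartney"),
    ("Wings", "made", "Band on the run"),
    ("Wings", "made", "London Town"),
    ("Wings", "has_member", "Paul McCartney"),
    ("The Rolling Stones", "made", "Hot Rocks"),
]

# Give me all values of X such that X made the album 'Help'.
who_made_help = [s for (s, p, o) in DATASET if p == "made" and o == "Help"]

# Give me all values of X such that the Beatles made X.
beatles_albums = [o for (s, p, o) in DATASET if s == "The Beatles" and p == "made"]

print(who_made_help)
print(beatles_albums)
```

In each case the variable X stands in the position whose values we want back, and the fixed parts of the statement act as the filter.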

We can go further than this by introducing more than one variable, thus generalising the query:

Give me all values of X and Y such that X made Y.

This is like asking a question with two WH-words, such as ‘Which bands made which albums?’. The answer is not a list of values, as before, but a list of X-Y pairs that could be conveniently presented in a table:

X                     Y
The Beatles           ‘Help’
The Beatles           ‘Abbey Road’
The Beatles           ‘Let it be’
Wings                 ‘Band on the run’
Wings                 ‘London Town’
The Rolling Stones    ‘Hot Rocks’

In all these examples, the question is represented by a single statement with one or more variables; however, we can also construct more complex queries containing several statements:

Give me all values of X and Y such that: (a) X made Y, and (b) X includes band member Paul McCartney.

The answer would be the first five pairs from the previous answer, excluding ‘Hot Rocks’ since the dataset does not list Paul McCartney as a band member of the Rolling Stones.
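
The two-statement query can be sketched in plain Python as two filters that must agree on the value of X. As before, this is only an illustration under the assumption that the statements are stored as simple tuples; the labels "made" and "has_member" are invented for the sketch.

```python
# The statements from the text as (subject, predicate, object) tuples.
# "made" and "has_member" are illustrative labels, not real RDF properties.
DATASET = [
    ("The Beatles", "made", "Help"),
    ("The Beatles", "made", "Abbey Road"),
    ("The Beatles", "made", "Let it be"),
    ("The Beatles", "has_member", "Paul McCartney"),
    ("Wings", "made", "Band on the run"),
    ("Wings", "made", "London Town"),
    ("Wings", "has_member", "Paul McCartney"),
    ("The Rolling Stones", "made", "Hot Rocks"),
]

# (a) X made Y: every band/album pair in the dataset.
made = [(s, o) for (s, p, o) in DATASET if p == "made"]

# (b) X includes band member Paul McCartney.
pauls_bands = {
    s for (s, p, o) in DATASET
    if p == "has_member" and o == "Paul McCartney"
}

# Both statements must hold for the same X, so keep only the pairs
# from (a) whose X also satisfies (b).
answer = [(x, y) for (x, y) in made if x in pauls_bands]

for x, y in answer:
    print(x, "|", y)
```

The shared variable X is what ties the two statements together: a pair survives only if the same band appears in both.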

Moving now from English to SPARQL, here is the encoding for the simple query ‘Which albums did the Beatles make?’ for the MusicBrainz dataset. For now don’t worry about learning the exact syntax; the important thing is to understand what the various bits and pieces are doing.

The query begins with PREFIX statements that define abbreviations for namespaces.

The query proper begins in the line starting SELECT, which also contains a variable (corresponding to X and Y in our English examples) starting with the question mark character ‘?’. Choose any word you like for the rest of the variable name, provided that you use it consistently.

The remainder of the query, starting WHERE, contains a list of RDF triple patterns. These are like RDF triples except that they include variables. They are expressed in Turtle, which we introduced last week.

The WHERE clause in the example has two RDF triple patterns, separated by a full stop. The first pattern matches resources made by the Beatles; the second requires that these resources belong to a class mo:SignalGroup (this rather weird name distinguishes albums, which are ‘signal groups’, from their constituent tracks, which are also encoded as resources made by the Beatles).

The response to a query is computed by a process known as graph matching, shown diagrammatically in Figure 2.3, where both query and dataset are shown as RDF graphs specified in Turtle (to simplify, only part of the above dataset is included).

Figure 2.3 Answering a query by graph matching
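
To make the idea of graph matching more concrete, here is a minimal Python sketch of a matcher for a list of triple patterns. It is a toy, not the algorithm a real SPARQL engine uses. The data is invented: the "ex:" names and the track are hypothetical, and the use of foaf:made and rdf:type as the two properties is an assumption based on the description of the WHERE clause above; only mo:SignalGroup is named in the text.

```python
# Toy dataset: prefixed names are kept as plain strings.
# The "ex:" resources, including the track, are invented for this sketch.
TRIPLES = [
    ("ex:TheBeatles", "foaf:made", "ex:Help"),
    ("ex:Help", "rdf:type", "mo:SignalGroup"),
    ("ex:TheBeatles", "foaf:made", "ex:AbbeyRoad"),
    ("ex:AbbeyRoad", "rdf:type", "mo:SignalGroup"),
    ("ex:TheBeatles", "foaf:made", "ex:Yesterday"),
    ("ex:Yesterday", "rdf:type", "mo:Track"),
    ("ex:Wings", "foaf:made", "ex:LondonTown"),
    ("ex:LondonTown", "rdf:type", "mo:SignalGroup"),
]


def unify(pattern, triple, bindings):
    """Extend bindings so that pattern equals triple, or return None on a clash."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it against its earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            # A fixed term that does not match this triple.
            return None
    return result


def match(patterns, triples):
    """Return every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in triples
            if (extended := unify(pattern, triple, bindings)) is not None
        ]
    return solutions


# Two triple patterns, mirroring the two patterns of the WHERE clause.
where = [
    ("ex:TheBeatles", "foaf:made", "?album"),
    ("?album", "rdf:type", "mo:SignalGroup"),
]
albums = [solution["?album"] for solution in match(where, TRIPLES)]
print(albums)
```

The first pattern alone would also return the track, since it too was made by the Beatles; the second pattern removes it, which is the role the mo:SignalGroup restriction plays in the query described above.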

Before moving further with SPARQL query syntax, the next step provides a glossary of terms that you may find useful throughout the course.