23 November 2006

Basic graph patterns (BGPs) are a block of adjacent triple patterns
that match, or don't match, all together. In BGP matching every named
variable must have some value for the BGP to have been matched. Filters
add restrictions on the values variables can take.

They are also an extension point in SPARQL because you can plug in
a different BGP matcher (e.g. a DL reasoner or a wrapper for legacy
SQL data), then reuse the other two layers to form complex queries out
of the exact matches from BGP matching.

Graph expressions (OPTIONAL, UNION,
GRAPH, Groups (things between {}) combine BGPs in various
ways to give more complicated patterns. Graph expressions are recursive
so you can have patterns within patterns. A SPARQL query has one
graph expression in the WHERE clause.

Solution modifiers (Project, DISTINCT,
ORDER BY, LIMIT/OFFSET) take the output of
matching the query graph pattern and process it in various ways to
yield the result set.

Current Situation

The semantics for SPARQL have, so far, been declarative and top-down.
A solution to a query passes if it meets all the conditions following
from the definitions of each of the SPARQL graph pattern forms and filters.
That means, in theory, the whole of the solution is available.

In fact, ARQ builds solutions up by adding the
minimum necessary at each subpattern to make the solution match.
This is what gives ARQ
streaming evaluation which keeps the total memory use down. But the
effect is the same, whole solutions are available everywhere in the
pattern.

Another way is to have a bottom-up algebra. The
relational
algebra works this way. Subexpressions are evaluated
to give tables (relations) and then these tables combined together
to form new tables.

It turns out that these are nearly the same as shown by
Richard Cyganiak and
Jorge Perez et al
(see references below).
The class of graph patterns that differ are a certain kind of
nested
OPTIONAL where a variable is used on the outside,
fixed part, and in the inner OPTIONAL, but not in
between.

Now that is a rather unusual query.
The tricky bit is the use of ?v at the outermost and
innermost levels but not in between.The query can be rewritten so
as not to nest (note the repeat of :x3 :q ?w).

In DAWG, we're considering
an algebra for SPARQL, based on the paper
Semantics
of SPARQL. This will mean a change of results in the case
above. I've never seen such a query in the wild, only in artificial test cases.

The change from declarative semantics to a constructional algebra is
motivated primarily by the implementers in the working group wanting to
apply all the good stuff on relational algebra optimization to SPARQL
engines. The complexity of the exact mapping to SQL in the
semantics preserving SPARQL-to-SQL is daunting.

Not that simple!

But other things differ as well if a purely relational style is
applied (and that's not being proposed currently) because
"Semantics of SPARQL"
does not consider the difference in treatment of filters (it's focus is on the
combination of graph patterns).

Two cases arise: one where filters are in OPTIONALs, and one
where filters are nested inside groups.

SELECT *
{ ?x :p ?v .
OPTIONAL
{
?y :q ?w .
FILTER(?v = 2)
}
}

The FILTER uses a variable from the fixed part of
the OPTIONAL. Under Jorge's semantics the variable
isn't in-scope, so the filter evaluation is an error (unbound variable),
otherwise known as false; the optional part never matches.
This form of query is realistic and does occur for real
so the current proposal for DAWG makes this work. Like SQL
and LEFT OUTER JOIN with the ON clause,
the LeftJoin operation in the SPARQL algebra can take a condition
which applies over the whole LeftJoin.

The fixed
pattern could be repeated inside the optional part but such repetition
for something application might well use is bad, very bad.

Application writers will have to take care about scope and
groups: putting FILTERs inside {} changes
their scope.

Currently,

{ :x :p ?v . FILTER(?v < 5) }

and

{ :x :p ?v . { FILTER(?v < 5) } }

have the same effect. But they aren't in the algebraic form
because { FILTER(?v < 5) } evaluates on the empty
table that does not have a ?v in it; evaluation is an
error and hence false.

ARQ

There is an implementation of the algebra in
ARQ in CVS
(not in the current release ARQ 1.4). It's enabled in the command
line tool
arq.sparql
with --engine=ref from the command line and

QueryEngine2.register() ;

in code.

The work-in-progress
unfinished
editor's working text is available. Check it is actually up to
date because that's just work space and isn't a DAWG document. It does not
reflect working group decisions - it more a proposal for consideration.

Disclaimer: Any views expressed here are mime and not
the working group (which at
the time of writing is thinking about it but hasn't decided
anything yet).