13 April 2006

SPARQL is now (April 2006) a W3C Candidate Recommendation,
which means it is stable enough for widespread
implementation. In fact, there are already quite a few implementations
(see the SPARQL
implementations page on the ESW wiki).

RDQL predates
SPARQL; in fact, the RDQL design predates the
current RDF specifications, and some of the design decisions
in RDQL reflect that. The most significant is
that RDF had no datatyping at the time, so RDQL tests values
such as integers without checking the datatype (if it looks like
an integer, it can be tested as an integer).

SPARQL has all the features of RDQL and more:

ability to add optional information to query results

disjunction of graph patterns

more expression testing (date-time support, for example)

named graphs

sorting

but, above all, it is more tightly specified, so a query should
behave the same way in every implementation.

ARQ - A Query Engine for Jena

In parallel with developing the SPARQL specification,
I have been developing a new query subsystem for
Jena called
ARQ.
ARQ is now part of the Jena download.

ARQ builds on top of the existing Jena query support
for matching basic graph patterns
(BGPs are the basic building blocks of SPARQL queries).
ARQ can execute SPARQL and RDQL, as well as an extended form of
SPARQL. It has several extension points, such as
property functions. The ARQ query engine works with
any Jena Graph or Model.

Converting RDQL code to SPARQL code

The functionality of RDQL is a subset of SPARQL's, so it's
not hard to convert RDQL queries to SPARQL. Two things need
to change: the triple syntax and
any constraints.

Syntax

SPARQL uses a Turtle-like syntax that will be familiar to anyone who knows N3.
Namespace prefixes are declared at the start of the query with PREFIX, not at the end with
USING as in RDQL. There are no parentheses around triple
patterns; instead, a "." (a single dot) separates
consecutive triple patterns. An RDQL query only ever has one graph pattern;
in SPARQL, blocks of triple patterns are
delimited by {}.
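To make these syntactic differences concrete, here is a small Java sketch that assembles a SPARQL SELECT query from triple patterns. It is a hypothetical helper (toSparql is not part of ARQ or Jena) and it does not parse RDQL: it assumes the subject, predicate and object of each pattern have already been split out. The vcard namespace in main is only an example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of ARQ: builds SPARQL text from triple
// patterns that have already been split into subject, predicate and object.
class SparqlFromTriples {

    static String toSparql(List<String> vars, Map<String, String> prefixes,
                           List<List<String>> triples) {
        StringBuilder sb = new StringBuilder();
        // Namespace prefixes are declared at the start of the query.
        for (Map.Entry<String, String> e : prefixes.entrySet()) {
            sb.append("PREFIX ").append(e.getKey()).append(": <")
              .append(e.getValue()).append(">\n");
        }
        sb.append("SELECT ").append(String.join(" ", vars)).append("\n");
        // The graph pattern is a block delimited by {}.
        sb.append("WHERE {\n");
        for (int i = 0; i < triples.size(); i++) {
            // No parentheses or commas: just subject, predicate, object...
            sb.append("  ").append(String.join(" ", triples.get(i)));
            // ...with a single dot between consecutive triple patterns.
            sb.append(i < triples.size() - 1 ? " .\n" : "\n");
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        Map<String, String> prefixes = new LinkedHashMap<>();
        prefixes.put("vcard", "http://www.w3.org/2001/vcard-rdf/3.0#");
        System.out.println(toSparql(
            List.of("?givenName"),
            prefixes,
            List.of(List.of("?y", "vcard:Family", "\"Smith\""),
                    List.of("?y", "vcard:Given", "?givenName"))));
    }
}
```

Leaving out the dot after the last triple pattern is a stylistic choice; SPARQL also accepts a trailing dot before the closing brace.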

You can even use the command-line tool
arq.qparse to read in an
RDQL query and write out a SPARQL query, but the output is only a rough approximation:
you'll need to check it, and it may not be completely legal SPARQL.

Constraints

The constraints need the most care because SPARQL uses RDF datatyping
and RDQL doesn't. Common trouble spots are:

regular expressions

string equality and numeric equality

Regular expressions

A SPARQL regular expression looks like:

regex(expr, "pattern")
regex(expr, "pattern", "i")

The catch is that expr must be a literal; it can't be a URI.
(Well, it can, but it will never match!) If you want to perform a regular
expression match on a URI, use the
str() built-in to get the string form of the URI:

regex(str(?uri), "^http://example/ns#")
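The rule above can be modelled in Java with a toy term class. Term, str and regex here are hypothetical names, not Jena's API. The sketch uses java.util.regex, whose syntax is close to, but not identical to, the XPath regular expressions SPARQL specifies, and it calls find() so that, as in SPARQL, the pattern can match anywhere in the string unless it is anchored.

```java
import java.util.regex.Pattern;

// Minimal sketch of SPARQL regex() behaviour over a toy term model (not Jena classes).
class RegexSketch {

    // A toy RDF term: either a URI or a literal, each with a string form.
    static final class Term {
        final boolean isUri;
        final String lexical;

        private Term(boolean isUri, String lexical) {
            this.isUri = isUri;
            this.lexical = lexical;
        }

        static Term uri(String u) { return new Term(true, u); }
        static Term literal(String s) { return new Term(false, s); }
    }

    // str(): the string form of a URI or literal, returned as a literal.
    static Term str(Term t) {
        return Term.literal(t.lexical);
    }

    // regex(expr, pattern, flags): only literals can match. In SPARQL a URI
    // argument is an error, which a FILTER treats as "no match"; here it is false.
    static boolean regex(Term expr, String pattern, String flags) {
        if (expr.isUri) {
            return false;
        }
        int f = flags.contains("i") ? Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE : 0;
        // find(), not matches(): an unanchored pattern may match a substring.
        return Pattern.compile(pattern, f).matcher(expr.lexical).find();
    }

    static boolean regex(Term expr, String pattern) {
        return regex(expr, pattern, "");
    }

    public static void main(String[] args) {
        Term uri = Term.uri("http://example/ns#thing");
        System.out.println(regex(uri, "^http://example/ns#"));            // false
        System.out.println(regex(str(uri), "^http://example/ns#"));       // true
        System.out.println(regex(Term.literal("SPARQL"), "sparql", "i")); // true
    }
}
```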

Equality

RDQL has = for numeric equality and eq for
string equality. A number in RDQL is anything that can be parsed as a number, whether
it has a datatype or not (or even the wrong datatype). Likewise, anything can be
treated as a string (like URIs in regular expressions).

SPARQL has a single = operator, whose semantics are taken from
XQuery/XPath Functions and Operators. It decides whether to apply numeric equality,
string equality or URI equality based on the kinds of arguments it is given.
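The difference can be sketched in Java, assuming a toy literal class (Lit) with a lexical form and an optional datatype URI; none of these names come from Jena. Only a handful of XSD numeric types are recognized, and the SPARQL side is simplified: where the specification raises a type error for mismatched arguments (which makes a FILTER reject the row), this sketch just returns false.

```java
import java.math.BigDecimal;
import java.util.Objects;
import java.util.Set;

// Toy model contrasting RDQL-style and SPARQL-style equality (not Jena classes).
class EqualitySketch {

    static final String XSD = "http://www.w3.org/2001/XMLSchema#";
    // Only a few XSD numeric types are recognized in this sketch.
    static final Set<String> NUMERIC =
        Set.of(XSD + "integer", XSD + "decimal", XSD + "int", XSD + "long");

    // A literal: lexical form plus datatype URI (null for a plain literal).
    static final class Lit {
        final String lexical;
        final String datatype;

        Lit(String lexical, String datatype) {
            this.lexical = lexical;
            this.datatype = datatype;
        }
    }

    // Parse a lexical form as a number, or return null if it does not look like one.
    static BigDecimal asNumber(String s) {
        try {
            return new BigDecimal(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // RDQL "=": numeric equality whenever both values look like numbers, ignoring datatypes.
    static boolean rdqlNumericEquals(Lit a, Lit b) {
        BigDecimal x = asNumber(a.lexical), y = asNumber(b.lexical);
        return x != null && y != null && x.compareTo(y) == 0;
    }

    // RDQL "eq": string equality on the lexical forms.
    static boolean rdqlStringEquals(Lit a, Lit b) {
        return a.lexical.equals(b.lexical);
    }

    // SPARQL "=" (simplified): the kinds of the arguments choose the comparison.
    static boolean sparqlEquals(Lit a, Lit b) {
        boolean aNum = a.datatype != null && NUMERIC.contains(a.datatype);
        boolean bNum = b.datatype != null && NUMERIC.contains(b.datatype);
        if (aNum && bNum) {
            // Numeric equality compares values, so "01" and "1" are equal integers.
            BigDecimal x = asNumber(a.lexical), y = asNumber(b.lexical);
            return x != null && y != null && x.compareTo(y) == 0;
        }
        // Otherwise: plain literals compare as strings, anything else as the same term.
        return a.lexical.equals(b.lexical) && Objects.equals(a.datatype, b.datatype);
    }

    public static void main(String[] args) {
        Lit plain01 = new Lit("01", null), plain1 = new Lit("1", null);
        Lit int01 = new Lit("01", XSD + "integer"), int1 = new Lit("1", XSD + "integer");
        System.out.println(rdqlNumericEquals(plain01, plain1)); // true: both look like numbers
        System.out.println(sparqlEquals(plain01, plain1));      // false: plain literals compare as strings
        System.out.println(sparqlEquals(int01, int1));          // true: typed integers compare as numbers
        System.out.println(sparqlEquals(plain1, int1));         // false in this sketch
    }
}
```

In practice this means an RDQL numeric test on untyped data may need the data, or the query, to carry proper datatypes before it behaves the same way in SPARQL.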

API Changes

The ARQ API is in the package com.hp.hpl.jena.query. The RDQL API
is deprecated as of Jena 2.4. For SELECT queries, the new API
is similar in style to the old one, with iteration over
the rows of the results (javadoc).
Differences include the widespread use of factories, naming consistent with the
SPARQL specifications, and a separate exec operation for each
kind of SPARQL query (execSelect, execConstruct, execDescribe and execAsk). QueryExecution objects should be
closed when the application has finished with them.

One change is that, to get the triples that matched a query, the application no longer asks the binding for the triples
used in the match; instead, it should make a
CONSTRUCT query.
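Why a CONSTRUCT query recovers the matched triples can be sketched in Java with a toy model (hypothetical, not ARQ's API): triples are three-element lists of terms, and each query solution maps variable names to terms. Using the query's own graph pattern as the CONSTRUCT template turns each result row back into the triples it matched.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of CONSTRUCT (hypothetical, not ARQ's API): triples are
// three-element lists of terms, solutions map variable names to terms.
class ConstructSketch {

    static Set<List<String>> construct(List<List<String>> template,
                                       List<Map<String, String>> solutions) {
        // The result is a graph, i.e. a set: duplicate triples collapse.
        Set<List<String>> graph = new LinkedHashSet<>();
        for (Map<String, String> row : solutions) {
            for (List<String> pattern : template) {
                List<String> triple = new ArrayList<>(3);
                for (String term : pattern) {
                    // Substitute variables (?name) with this row's binding.
                    String value = term.startsWith("?") ? row.get(term.substring(1)) : term;
                    if (value == null) {
                        break; // unbound variable: this template triple is skipped
                    }
                    triple.add(value);
                }
                if (triple.size() == 3) {
                    graph.add(triple);
                }
            }
        }
        return graph;
    }

    public static void main(String[] args) {
        // Template copied from the query's WHERE pattern: ?x <http://example/p> ?y
        List<List<String>> template = List.of(List.of("?x", "<http://example/p>", "?y"));
        List<Map<String, String>> rows = List.of(
            Map.of("x", "<http://example/a>", "y", "\"1\""),
            Map.of("x", "<http://example/b>", "y", "\"2\""));
        construct(template, rows).forEach(System.out::println);
    }
}
```

Because the result is a graph, identical triples produced by different rows appear only once, and a template triple with an unbound variable is left out, which mirrors how CONSTRUCT treats them.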