Metalog - Query language for RDF

Massimo Marchiori, Janne Saarela
{massimo,jsaarela}@w3.org

World Wide Web Consortium

The Resource Description Framework (RDF) Model and Syntax
Specification describes a metadata infrastructure that can
accommodate classification elements from different vocabularies,
i.e., schemas. The underlying model is a labeled directed
graph, which can be serialized into the eXtensible Markup
Language (XML) transfer syntax for interchange between applications.

This paper demonstrates how a new query language, Metalog,
allows users to write queries in an English-like syntax. We
show that these queries have equivalent representations
both as RDF descriptions and as logic programs, and that
automated compilation between these representations is
possible.

The RDF representation of the queries and the XML namespace
mechanism together can be used to refine existing queries already
available and addressable on the Web.

Finally, we present some practical work with the Metalog
query language in the context of managing multilingual Web sites
and Access Control Lists.

Introduction

Resource Description Framework

Figure X. An RDF data model

Going one step further, the data model, a labeled directed
graph, can also be represented as a set of predicates, each of which
corresponds to an arc of the graph and thus connects two
nodes. The example in Figure X corresponds to 11 triples, some of
which are presented in the following excerpt:

The example in Figure N shows how RDF deals with higher-order
statements. When a statement asserts something about another
statement, RDF does not allow triples to be nested; instead it uses
a mechanism called reification, which gives an assertion a unique
identifier so that other triples can refer to it. This mechanism also
limits recursion by not allowing reified statements to be further
reified.
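Reification can be sketched in Python over a set of (subject, predicate, object) tuples. This is an illustration under assumptions, not the authors' implementation: the `reify` helper, the blank-node naming scheme `_:stmtN`, the `says` property and the example.org URIs are hypothetical; only the RDF namespace and the rdf:Statement, rdf:subject, rdf:predicate and rdf:object vocabulary come from the RDF Model and Syntax Specification.

```python
from itertools import count

# RDF namespace from the RDF Model and Syntax Specification.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical counter used to mint a fresh identifier per reified statement.
_fresh = count(1)


def reify(subject, predicate, obj):
    """Give the statement (subject, predicate, obj) its own identifier and
    describe it with four triples, so that other triples can refer to it."""
    stmt = f"_:stmt{next(_fresh)}"
    return stmt, {
        (stmt, RDF + "type", RDF + "Statement"),
        (stmt, RDF + "subject", subject),
        (stmt, RDF + "predicate", predicate),
        (stmt, RDF + "object", obj),
    }


# A higher-order statement: someone says who the creator of a document is.
stmt, triples = reify("http://example.org/doc",
                      "http://example.org/schema#Creator", "Janne Saarela")
triples.add(("http://example.org/people#Massimo",
             "http://example.org/schema#says", stmt))
print(len(triples))  # 5
```

The outer triple points at the identifier `stmt` rather than nesting the inner triple, which is the point of the mechanism.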

The following triple corresponds to the four triples presented
in the following example:

Syntax

RDF could have defined a syntax of its own, but because of the popularity
of the XML document encoding syntax, the decision was made to build RDF on
top of XML. This relieves RDF of some specification work, for example
internationalization (I18N), which XML defines to be based on Unicode.

The following example presents the XML encoding of the data model
presented in Figure X.

Query Languages

In general, query languages are formal languages used to retrieve
data from a database. Standardized languages already exist for
retrieving information from different types of databases, such
as the Structured Query Language (SQL) for relational databases and
the Object Query Language (OQL) and SQL3 for object databases.

Semi-structured query languages such as XML-QL [ref]
operate on the document-level structure ...

Logic programs consist of facts and rules where valid
inference rules are used to determine all the facts that
apply within a given model.

With RDF, the most suitable approach is to focus on the
underlying data model. Even though XML-QL could be used to
query RDF descriptions in their XML-encoded form, a single
XML-QL query could not reliably retrieve the contents of an RDF
data model, because RDF allows several XML encodings of the
same data model.
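A minimal Python sketch makes the point concrete. The example URIs and the `triples_of` helper are assumptions for illustration, and `triples_of` understands only a tiny subset of RDF/XML. ElementTree's path query merely stands in for a structure-oriented language such as XML-QL; it is not XML-QL.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABOUT = f"{{{RDF_NS}}}about"

# Two RDF/XML encodings of the same statement (hypothetical example URIs):
# the property written as a child element, and as an abbreviated attribute.
ENCODING_1 = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:s="http://example.org/schema#">
  <rdf:Description rdf:about="http://example.org/doc">
    <s:Creator>Janne Saarela</s:Creator>
  </rdf:Description>
</rdf:RDF>"""

ENCODING_2 = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:s="http://example.org/schema#">
  <rdf:Description rdf:about="http://example.org/doc" s:Creator="Janne Saarela"/>
</rdf:RDF>"""


def triples_of(xml_text):
    """Read (subject, property, literal) triples from a small subset of
    RDF/XML: rdf:Description elements with literal-valued properties given
    either as child elements or as attributes."""
    triples = set()
    for desc in ET.fromstring(xml_text).iter(f"{{{RDF_NS}}}Description"):
        subject = desc.get(ABOUT)
        for name, value in desc.attrib.items():
            if name != ABOUT:
                triples.add((subject, name, value))
        for prop in desc:
            triples.add((subject, prop.tag, (prop.text or "").strip()))
    return triples


# Same data model...
print(triples_of(ENCODING_1) == triples_of(ENCODING_2))  # True
# ...but a query over document structure finds the Creator element only once.
creator = ".//{http://example.org/schema#}Creator"
print(ET.fromstring(ENCODING_1).find(creator) is not None)  # True
print(ET.fromstring(ENCODING_2).find(creator) is not None)  # False
```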

Metalog

We believe that the query language we propose, and any other
query language intended for wide deployment on the Web, must
address the following requirements:

1. The query language must be easy to author.

2. The query language must have extensible semantics.

3. The evaluation environment must be easy to implement.

Requirement 1: Easy to author

To address requirement 1, we want users to be able to write queries
in a simple syntax that reads easily.

If the language of a document is X
and the author of the document is Y
then Y can speak X.
If the "Language" of a DOCUMENT is X
and the "Author" of the DOCUMENT is Y
then speak of X is Y.
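Read as a logic program, the rule above is a join of two sets of facts on the shared document variable. The sketch below assumes plain (subject, property, value) tuples with short hypothetical names in place of URIs, and evaluates the rule directly in Python rather than through the Metalog toolchain.

```python
# Hypothetical facts: (subject, property, value)
FACTS = {
    ("doc1", "Language", "fi"),
    ("doc1", "Author", "Janne"),
    ("doc2", "Language", "it"),
    ("doc2", "Author", "Massimo"),
    ("doc3", "Language", "en"),  # no author, so no conclusion is drawn
}


def can_speak(facts):
    """If the Language of DOCUMENT is X and the Author of DOCUMENT is Y,
    then Y can speak X: join the two properties on DOCUMENT."""
    language = {(doc, x) for (doc, prop, x) in facts if prop == "Language"}
    author = {(doc, y) for (doc, prop, y) in facts if prop == "Author"}
    return {(y, x) for (d1, x) in language for (d2, y) in author if d1 == d2}


print(sorted(can_speak(FACTS)))  # [('Janne', 'fi'), ('Massimo', 'it')]
```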

Mapping RDF constructs to Metalog syntax

To support the different primitive constructs of the RDF data
model in the query language, we propose the following syntactic
sugar.

If a "Language" of a DOCUMENT is "fi" then ...

The keyword a indicates that the value is searched for among
the members of a Bag container instance in the data model.

If the 2nd "Author" of a DOCUMENT is "Janne Saarela" then ...

An ordinal keyword (such as 2nd) indicates that the value is searched
for among the members of a Sequence container instance in the data
model. Because order is significant, the match must occur at the
given list position.

If the "Language" of a DOCUMENT is "fi" then ...

The keyword the indicates that the value is searched for both among
the members of an Alternative container instance and as a direct
value in the data model. Because Alternatives indicate mutually
exclusive values, only one of them can match for a given query.
Thus, the result of this query introduces a new fact into the data model.
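The three keywords can be sketched as match functions over (subject, property, value) tuples, with containers encoded as in the RDF Model and Syntax Specification: a node typed rdf:Bag, rdf:Seq or rdf:Alt whose members are the values of rdf:_1, rdf:_2, and so on. The function names and data are hypothetical, and `the_match` is only one possible reading of the Alternative rule: it accepts a direct value or any single member of an Alt.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical data: a Bag of languages, a Seq of authors, an Alt of titles
# and one direct value. Members use the ordinal properties rdf:_1, rdf:_2, ...
FACTS = {
    ("doc", "Language", "_:langs"),
    ("_:langs", RDF + "type", RDF + "Bag"),
    ("_:langs", RDF + "_1", "en"),
    ("_:langs", RDF + "_2", "fi"),
    ("doc", "Author", "_:authors"),
    ("_:authors", RDF + "type", RDF + "Seq"),
    ("_:authors", RDF + "_1", "Massimo Marchiori"),
    ("_:authors", RDF + "_2", "Janne Saarela"),
    ("doc", "Title", "_:titles"),
    ("_:titles", RDF + "type", RDF + "Alt"),
    ("_:titles", RDF + "_1", "Title (en)"),
    ("_:titles", RDF + "_2", "Title (fi)"),
    ("doc", "Format", "text/html"),
}


def is_container(facts, node, kind):
    return (node, RDF + "type", RDF + kind) in facts


def members(facts, node):
    """Map each ordinal position n to the value of rdf:_n on node."""
    prefix = RDF + "_"
    return {int(p[len(prefix):]): o for (s, p, o) in facts
            if s == node and p.startswith(prefix)}


def values_of(facts, subject, prop):
    return [o for (s, p, o) in facts if s == subject and p == prop]


def a_match(facts, subject, prop, value):
    """'If a PROP of SUBJECT is VALUE': VALUE is any member of a Bag."""
    return any(is_container(facts, c, "Bag") and value in members(facts, c).values()
               for c in values_of(facts, subject, prop))


def nth_match(facts, subject, prop, n, value):
    """'If the Nth PROP of SUBJECT is VALUE': VALUE is at position n of a Seq."""
    return any(is_container(facts, c, "Seq") and members(facts, c).get(n) == value
               for c in values_of(facts, subject, prop))


def the_match(facts, subject, prop, value):
    """'If the PROP of SUBJECT is VALUE': VALUE is a direct value or one of
    the mutually exclusive members of an Alt."""
    for o in values_of(facts, subject, prop):
        if is_container(facts, o, "Alt"):
            if value in members(facts, o).values():
                return True
        elif o == value:
            return True
    return False


print(a_match(FACTS, "doc", "Language", "fi"))                # True
print(nth_match(FACTS, "doc", "Author", 2, "Janne Saarela"))  # True
print(nth_match(FACTS, "doc", "Author", 1, "Janne Saarela"))  # False
print(the_match(FACTS, "doc", "Title", "Title (fi)"))         # True
print(the_match(FACTS, "doc", "Format", "text/html"))         # True
```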

Requirement 2: Extensible semantics

Requirement 3: The evaluation environment must be easy to implement

Metalog queries as RDF schemas

RDF schemas provide a way to define type systems using the RDF
data model. These types allow the authors of RDF descriptions to use
specific properties with correspondingly constrained property values
of a given arity.

We propose that Metalog programs must have a corresponding RDF
schema representation for extensibility. In this way, the author of a
Metalog query can point to the RDF schema representation of an
existing Metalog query and refine that query.

Metalog allows the user to point to an RDF schema through a namespace
mechanism [ref] that uses URIs. In this way, each predicate
(i.e., propertyName) within a Metalog query is unique.
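Alias resolution can be sketched as follows, assuming a query header in the style of Query 1 and the RDF convention that a full property name is the namespace URI followed by the local name. The regular expression, helper names and example.org namespaces are hypothetical. Two schemas can both define a Creator property without the names colliding:

```python
import re

# Hypothetical grammar modelled on the NAMESPACE ... ALIAS line of Query 1.
DECLARATION = re.compile(r'NAMESPACE URI "([^"]+)" ALIAS (\w+)')


def parse_namespaces(query):
    """Map each alias declared in the query to its namespace URI."""
    return {alias: uri for uri, alias in DECLARATION.findall(query)}


def expand(name, namespaces):
    """Rewrite alias:local as namespace URI + local name; leave other names alone."""
    alias, sep, local = name.partition(":")
    if sep and alias in namespaces:
        return namespaces[alias] + local
    return name


QUERY = '''NAMESPACE URI "http://example.org/dc#" ALIAS dc
NAMESPACE URI "http://example.org/acl#" ALIAS acl'''

ns = parse_namespaces(QUERY)
print(expand("dc:Creator", ns))   # http://example.org/dc#Creator
print(expand("acl:Creator", ns))  # http://example.org/acl#Creator
```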

Figure N. Refinement of Metalog queries using
URI addressing.

Higher-order statements in Metalog

Evaluation environment requirements

Because RDF can represent values in several ways, the evaluation
environment should provide some a priori knowledge of how
the data model can be queried in general. We therefore present
some useful queries that should always be present in the query
evaluation system, either passed along with any given query or
hard-wired into the evaluation code.

direct value - there is a fact in the corresponding data
model where the value is directly present in the triple.

proxied value through collection - if a property has
multiple values, the author may use different collection nodes
(Sequence, Bag, or Alternative) to indicate whether the values
preserve order, do not preserve order, or are mutually exclusive,
respectively. In this case, the value is proxied through an instance
of one of these nodes.

The following default rules first define rules for the two value
cases above, and then rules that identify reified statements and
collections, using the reificated/4 and collection/1 predicates,
respectively.
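As an illustration only, and not a reproduction of the default rules themselves, the same four checks can be expressed as Python functions over (subject, property, value) tuples. The names `value`, `collection` and `reificated` mirror the predicates named above, but their Python signatures, the container encoding via rdf:_n properties and the example data are assumptions.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CONTAINER_TYPES = {RDF + "Bag", RDF + "Seq", RDF + "Alt"}


def collection(facts, node):
    """collection/1: node is typed as a Bag, Seq or Alt container."""
    return any(s == node and p == RDF + "type" and o in CONTAINER_TYPES
               for (s, p, o) in facts)


def is_member_property(prop):
    """True for the ordinal membership properties rdf:_1, rdf:_2, ..."""
    prefix = RDF + "_"
    return prop.startswith(prefix) and prop[len(prefix):].isdigit()


def value(facts, subject, prop):
    """All values of prop on subject: direct values, plus values proxied
    through a collection node."""
    found = set()
    for (s, p, o) in facts:
        if s == subject and p == prop:
            if collection(facts, o):
                found |= {m for (c, q, m) in facts if c == o and is_member_property(q)}
            else:
                found.add(o)
    return found


def reificated(facts):
    """reificated/4: every (id, subject, predicate, object) described by a
    node typed rdf:Statement."""
    result = set()
    for (s, p, o) in facts:
        if p == RDF + "type" and o == RDF + "Statement":
            part = {q: v for (x, q, v) in facts if x == s}
            result.add((s, part.get(RDF + "subject"),
                        part.get(RDF + "predicate"), part.get(RDF + "object")))
    return result


# Hypothetical data: one direct value, one Bag, one reified statement.
FACTS = {
    ("doc", "Format", "text/html"),
    ("doc", "Language", "_:langs"),
    ("_:langs", RDF + "type", RDF + "Bag"),
    ("_:langs", RDF + "_1", "en"),
    ("_:langs", RDF + "_2", "fi"),
    ("_:s1", RDF + "type", RDF + "Statement"),
    ("_:s1", RDF + "subject", "doc"),
    ("_:s1", RDF + "predicate", "Author"),
    ("_:s1", RDF + "object", "Janne Saarela"),
}
print(sorted(value(FACTS, "doc", "Language")))  # ['en', 'fi']
print(value(FACTS, "doc", "Format"))            # {'text/html'}
print(reificated(FACTS))  # {('_:s1', 'doc', 'Author', 'Janne Saarela')}
```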

Results

This compiler is written in C++ using a combination of flex and
bison to create a parse tree.

RDF/XML document -> triple compiler

This compiler is written in Java and uses the Simple API for
XML (SAX) to parse RDF/XML-encoded files. Once a parse tree has
been built from the SAX events, a translation pass over the tree
produces the corresponding triple representation of the
underlying data model.
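The event-driven approach can be sketched with Python's standard `xml.sax` module standing in for the Java SAX API. This is a simplification, not the authors' compiler: it handles only rdf:Description elements with literal-valued property elements, and it emits triples directly from the SAX events instead of first building a parse tree. The handler class, function name and example URIs are hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class TripleHandler(ContentHandler):
    """Collect (subject, property, literal) triples from SAX events."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self.subject = None  # rdf:about of the open rdf:Description
        self.prop = None     # full URI of the open property element
        self.text = []

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        if name == (RDF, "Description"):
            self.subject = attrs.get((RDF, "about"))
        elif self.subject is not None:
            self.prop = (uri or "") + local
            self.text = []

    def characters(self, content):
        if self.prop is not None:
            self.text.append(content)

    def endElementNS(self, name, qname):
        uri, local = name
        if self.prop is not None and (uri or "") + local == self.prop:
            self.triples.append((self.subject, self.prop, "".join(self.text).strip()))
            self.prop = None
        elif name == (RDF, "Description"):
            self.subject = None


def rdfxml_to_triples(data: bytes):
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)  # report (uri, localname) pairs
    handler = TripleHandler()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.triples


DOC = b"""<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:s="http://example.org/schema#">
  <rdf:Description rdf:about="http://example.org/doc">
    <s:Creator>Janne Saarela</s:Creator>
    <s:Language>fi</s:Language>
  </rdf:Description>
</rdf:RDF>"""

for triple in rdfxml_to_triples(DOC):
    print(triple)
```

Streaming keeps memory flat for large files; a tree, as in the compiler described above, is easier when the translation needs to look at sibling or nested elements.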

RDF schema -> Prolog syntax compiler

This compiler translates the RDF description of a query into a
Prolog-like syntax. We call it Prolog-like because the resulting
programs may fall outside the semantics that Prolog supports; for
example, a clause may have a disjunctive head.
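The last step of such a translation, printing a rule in Prolog-like syntax, can be sketched as follows. The input here is a hypothetical intermediate form (head and body as lists of tuples), not the RDF description of a query, and the `triple` relation name is an assumption. Note that a disjunctive head is printed even though Prolog itself would reject it.

```python
import re

LOWER_ATOM = re.compile(r"[a-z][A-Za-z0-9_]*\Z")


def term(t):
    """Metalog-style variables (all capitals, e.g. DOC) stay bare; plain
    lower-case names are atoms; anything else becomes a quoted atom."""
    if t.isupper() or LOWER_ATOM.match(t):
        return t
    return "'" + t.replace("'", "''") + "'"


def atom(predicate, *args):
    if not args:
        return term(predicate)
    return f"{term(predicate)}({', '.join(term(a) for a in args)})"


def clause(head, body):
    """Render one rule. A head with several atoms is disjunctive and is
    joined with ';', which plain Prolog does not accept as a head."""
    text = " ; ".join(atom(*a) for a in head)
    if body:
        text += " :- " + ", ".join(atom(*a) for a in body)
    return text + "."


print(clause([("canSpeak", "Y", "X")],
             [("triple", "D", "Language", "X"), ("triple", "D", "Author", "Y")]))
print(clause([("lang", "D", "fi"), ("lang", "D", "en")], [("bilingual", "D")]))
```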

Logic program evaluation environment

We selected the Coral deductive database [X] as the test
environment and used plain semi-naive evaluation for all the
test queries presented later.
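Semi-naive evaluation avoids recomputing old conclusions: in each round, the recursive rule is joined only with the facts that were new in the previous round. A minimal sketch for a transitive-closure rule, written in Python rather than in Coral, with the relation names `edge` and `path` chosen for illustration:

```python
def semi_naive_closure(edges):
    """Semi-naive evaluation of
         path(X, Y) :- edge(X, Y).
         path(X, Z) :- path(X, Y), edge(Y, Z).
    Each round joins only the facts derived in the previous round (delta)
    with edge/2, instead of re-joining the whole path/2 relation."""
    path = set(edges)
    delta = set(edges)
    while delta:
        derived = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2}
        delta = derived - path  # keep only genuinely new facts
        path |= delta
    return path


print(sorted(semi_naive_closure({("a", "b"), ("b", "c"), ("c", "d")})))
# [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```

The loop stops once a round derives nothing new, which also makes it terminate on cyclic data.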

We emphasize that both compilers can easily be ported from the
Solaris 2.6 environment we have been using to other platforms. We
hope people will be able to embed the evaluation environment into
different applications using different evaluation strategies.

As input data we used a set of 2700 RDF data model
triples corresponding to the data available on the World Wide Web
Consortium technical reports page. This page lists the public
documents the Consortium has published along with their authors,
dates, and URIs. The first example in Figure N is an excerpt of this
data.

The queries we tested were of N different types, which are
discussed in the following test set-ups.

Trivial queries

We start with straightforward queries, using the example
already described in Figure N as our case example.

NAMESPACE URI "http://purl.org/schemas/DublinCore/RDF" ALIAS uri1
If "uri1:Creator" of DOC is PERSON and
"uri1:Language" of DOC is LANGUAGE
then "Speaks" PERSON LANGUAGE.

Query 1 - Metalog syntax

Query 1 - RDF/XML encoding of the query

Query 1 - Query in prolog syntax

Related work

The use of Web infrastructure to accommodate logic programs has
been suggested by (Sandevall, 1996) and (Loke & Davidson, 1996). The
latter approach uses familiar logic program notation to place facts
and queries on HTML pages. The embedded rules can also refer to
predicates on other HTML pages using a namespace mechanism. In this
way, their evaluation context grows with the number of HTML pages
retrieved to find facts that satisfy the queries.

Future work

Conclusions

Acknowledgements

The authors would like to thank Bert Bos for his help in running
the test sets.