Summary

To write any such rules in the XML syntax of RIF Core WD1 we have to make some assumptions about the mapping from RDF to RIF. It is easy to invent such mappings but there are several alternatives.

Once we have the mapping, expressing the rules in a concrete XML syntax, as opposed to the abstract syntax, means filling in the blanks on tedious details like namespace handling and datatypes.

We would also like to be able to annotate the rules and rulesets in various ways. This example does so using RDF.

It was fairly easy to guess what the intended XML syntax was like from the current write-up, but my first guesses had mistakes. This may just be my fault, but I think it suggests the need for a rather clearer specification of the syntax.

Background

The use case "Vocabulary Mapping for Data Integration" is about integrating data from multiple sources. The sources are provided as RDF conforming to RDFS/OWL vocabularies. Rules are used to translate the individual source representations to a common vocabulary.

The use case is loosely based on several existing applications of Jena and JenaRules, at least one of which is shipping commercially.

Source rules

These are artificial rules loosely based on the existing applications. They have been chosen to illustrate the typical range of features used in this class of applications.

The first is the simplest and supports quantification over RDF predicates without requiring quantification over RIF relations. The second is in some ways the most "natural" since we would normally regard an RDF triple as representing an instance of a binary relation.

For this exercise we chose the second option.
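The difference between the two options discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Atom` type, the fixed relation name `triple` and the subject `ex:n1` are hypothetical, and the reading of the first option as a single ternary relation is inferred from the description rather than taken from any RIF document.

```python
from typing import NamedTuple, Tuple


class Atom(NamedTuple):
    """A relation name applied to arguments (illustrative type, not RIF syntax)."""
    relation: str
    args: Tuple[str, ...]


def as_ternary(s: str, p: str, o: str) -> Atom:
    # Assumed reading of the first option: one fixed relation,
    # with the RDF predicate passed as an ordinary argument.
    return Atom("triple", (s, p, o))


def as_binary(s: str, p: str, o: str) -> Atom:
    # Second option: the RDF predicate itself becomes the binary relation.
    return Atom(p, (s, o))


ground = ("ex:n1", "rdf:type", "it:ComputeNode")  # ex:n1 is a made-up subject
pattern = ("?x", "?p", "?y")                      # predicate is a variable

print(as_ternary(*ground))
print(as_binary(*ground))
print(as_ternary(*pattern))  # variable stays in argument position
print(as_binary(*pattern))   # variable ends up in relation position
```

With the binary form, a pattern whose predicate is a variable puts that variable in relation position, which is where quantification over RIF relations comes in.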

RDF Resource mapping

Next we have to decide how to map all the URIs like it:ComputeNode into Const(ant)s.

They could be simply strings, they could be instances of some URIsort or they could be called out as special cases in the abstract syntax.

Strings are the easiest, but for the concrete syntax we would prefer some sort of qname/curie syntax. At the abstract syntax level this is irrelevant and boring. At the concrete syntax level, handling namespaces in XML is one tedious headache. To simplify this we extend the syntax to use attributes for (c)URI(es).

So for example the first triple pattern in the first rule would look like:

This assumes some RIF-mandated rule about expansion of curies. This may not be acceptable W3C practice, in which case wherever you see "pre:foo" imagine you actually see "&pre;foo" where pre is an XML Entity.
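As a rough illustration of what such an expansion rule might amount to, here is a minimal Python sketch. The prefix table and the `expand_curie` helper are assumptions made for this example; in particular the URI bound to `pre` is invented, and nothing here is defined by RIF.

```python
# Hypothetical prefix bindings; only the xsd namespace URI is a real one.
PREFIXES = {
    "pre": "http://example.org/pre#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand_curie(curie: str, prefixes: dict = PREFIXES) -> str:
    """Expand 'prefix:local' to a full URI by simple concatenation."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"cannot expand {curie!r}: unknown or missing prefix")
    return prefixes[prefix] + local


print(expand_curie("pre:foo"))
print(expand_curie("xsd:double"))
```

The XML Entity alternative gives the same result by a different route: the XML parser does the substitution before a rule processor ever sees the value.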

Quantification over predicates

Having chosen the "natural" mapping that RDF predicates map to binary RIF relations (Consts) we have a problem with rule r5 which quantifies over such predicates.

To cope with this we extended the syntax in an "obvious" way so the second triple pattern in r5 would look like:

However, from discussion with Harold it seems that Const is intended to be a leaf node, not a role specifier, and that there is a role specifier <op> that can be used in this situation, so that the recommended syntax is:

Datatypes

Rule r4 has a numeric constant (the rule syntax 0.75 will translate into the RDF literal "0.75"^^xsd:double). RIF has no agreed concrete syntax for such constants, so we adopt an obvious one that uses an attribute to give the datatype, and we assume that all reasonable XSD atomic datatypes and curie/qname syntax are supported.

So we assume the constant will look like:

<Const rif:datatype="xsd:double">0.75</Const>

In the bipartitioned proposal this would become:

<Data rif:datatype="xsd:double">0.75</Data>
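To show how a consumer might read such a typed constant, here is a small Python sketch using the standard library XML parser. The namespace URI bound to the `rif` prefix is a placeholder invented for this example (the parser needs the prefix to be declared), and the `CASTS` table covers only a few datatypes for illustration.

```python
import xml.etree.ElementTree as ET

RIF_NS = "http://example.org/rif#"  # placeholder, not an official namespace

# The Const element from the text, with the rif prefix declared.
CONST_XML = f'<Const xmlns:rif="{RIF_NS}" rif:datatype="xsd:double">0.75</Const>'

# Illustrative subset of XSD atomic datatypes mapped to Python types.
CASTS = {"xsd:double": float, "xsd:integer": int, "xsd:string": str}


def read_const(xml_text: str):
    """Return (datatype curie, Python value) for a typed constant element."""
    elem = ET.fromstring(xml_text)
    datatype = elem.get(f"{{{RIF_NS}}}datatype")
    if datatype not in CASTS:
        raise ValueError(f"unsupported or missing datatype: {datatype!r}")
    return datatype, CASTS[datatype](elem.text)


print(read_const(CONST_XML))
```

The same function would handle the Data form once the element name changes, since it only inspects the attribute and the text content.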

Builtins

Rule r4 also refers to a builtin function ("product"). We haven't yet discussed specific sets of builtins for RIF. Since we are using XSD for atomic types it might be logical to use XQuery functions and operators but that doesn't supply URIs to identify the operators like "*". Similarly we could use MathML but that only gives us QNames and not URIs. So we'll just pretend RIF has defined a set of builtins.

bNodes

This class of JenaRule application goes outside Horn in that the rules can manufacture new bNodes to represent objects we know to be present but to which we are not assigning a URI. This treats bNodes simply as Skolem constants. We've represented this as the makeTemp builtin in rule r3. To translate this rule we assume some equivalent genSym function that can manufacture a Skolem constant.
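A genSym-style function of the kind assumed here could be as simple as the following Python sketch. The class name, the `_:sk` label prefix and the counter-based naming are all choices made for this illustration; neither the source rules nor RIF prescribes them.

```python
import itertools


class GenSym:
    """Hand out fresh constant names, one per call (illustrative only)."""

    def __init__(self, prefix: str = "_:sk"):
        self._prefix = prefix
        self._counter = itertools.count(1)

    def __call__(self) -> str:
        # Each call yields a name that has not been handed out before.
        return f"{self._prefix}{next(self._counter)}"


gensym = GenSym()
first = gensym()
second = gensym()
print(first, second)
```

A variant that remembers the rule's variable bindings and returns the same name for the same bindings would behave more like a Skolem function; which of the two is wanted is a design decision this sketch leaves open.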

Rule naming

A boring but sometimes useful feature of the source rules is the syntactic rule label. We'd like to be able to attach arbitrary descriptive metadata to rules, such as names, descriptions, authors, etc.

We could have a single Literal for the entire rule and follow the B.1 rule syntax for that literal. However, then for the rules with repeated bodies (lots in this example) we end up with a lot of duplication. To make the translation marginally less unreadable we've chosen to go with a head/body/vars split to enable us to have multiple heads as a purely syntactic convenience and just use the A.1 syntax for those components.

[Note that rif:vars as used here implies universal quantification. If we were to adopt something like it then the quantification should be explicit.]

This is not a fundamental issue and switching to a single rif:ruleSrc property which points to a B.1 literal with all rules expanded in full would be perfectly acceptable.

Rule and Ruleset labelling

It would be convenient to be able to label these rules as being intended for processing RDF data so a translator knows to expect only binary predicates.

It would also be convenient to be able to label them as intended for model transformation rather than deductive closure. In the original application the rules are actually production rules with implicit "asserts" for each triple in the conclusion. The desired output of the rule processor is just the set of newly asserted conclusions, not the full deductive closure. This procedural usage is presumably outside RIF core but we can at least annotate the ruleset to indicate this was the original intended usage.

For both of these we've invented RDFS/OWL classes and used them as annotations on the base Ruleset. One can argue whether they should be Rule rather than Ruleset classifications.
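The distinction between model transformation and deductive closure described above can be made concrete with a short Python sketch. Everything in it is assumed for illustration: the rule `rename_class`, the source class `src:Server`, the property `src:name` and the subject `ex:n1` are invented, and the naive fixpoint loop stands in for whatever a real rule engine does.

```python
def rename_class(triples):
    """Hypothetical mapping rule: (?x rdf:type src:Server) -> (?x rdf:type it:ComputeNode)."""
    return {
        (s, "rdf:type", "it:ComputeNode")
        for (s, p, o) in triples
        if p == "rdf:type" and o == "src:Server"
    }


def deductive_closure(source, rules):
    """Naive forward chaining: apply the rules until nothing new is derived."""
    known = set(source)
    while True:
        new = set()
        for rule in rules:
            new |= rule(known) - known
        if not new:
            return known
        known |= new


def transform(source, rules):
    """Model transformation: keep only the newly asserted conclusions."""
    return deductive_closure(source, rules) - set(source)


source = {
    ("ex:n1", "rdf:type", "src:Server"),
    ("ex:n1", "src:name", "alpha"),
}
print(sorted(deductive_closure(source, [rename_class])))
print(sorted(transform(source, [rename_class])))
```

The closure still contains the source-vocabulary triples, whereas the transformation output contains only triples in the common vocabulary, which is the result the original application wants.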