05 January 2008

In Jena, Graph is an interface. It abstracts anything that looks like RDF -
storage options, inference, other legacy data sources.

The main operations are find(Triple), add(Triple) and remove(Triple). In
addition, there are a number of getters to access handlers for various features
(query, statistics, reification, bulk update, event manager). Having handlers,
rather than including all the operations for each feature directly, reduces the
size of the interface and makes it easier to provide default implementations of
each feature.

Implementing a graph rarely requires implementing the interface directly.
More usually, an implementation starts by inheriting from the class GraphBase.
A minimal (read-only) implementation just needs to implement graphBaseFind.
Wrapping legacy data often only makes sense as a read-only graph. To provide
update operations, just implement the methods performAdd and performDelete,
which are the methods called from the base implementations of add(Triple) and
remove(Triple).
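
As a rough illustration of that division of labour, here is a self-contained
sketch of the pattern in plain Java. Nothing in it is Jena code: SimpleTriple,
SketchGraphBase, LegacyRowsGraph and MemoryGraph are stand-ins invented for this
example (a triple is just three strings, with null as a wildcard), and only the
hook names graphBaseFind, performAdd and performDelete are borrowed from the
description above. The real GraphBase does more and its signatures differ.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Stand-in for a triple: three strings; null in a pattern means "match anything".
record SimpleTriple(String s, String p, String o) {
    boolean matches(SimpleTriple pattern) {
        return (pattern.s() == null || pattern.s().equals(s))
            && (pattern.p() == null || pattern.p().equals(p))
            && (pattern.o() == null || pattern.o().equals(o));
    }
}

// Stand-in for the role GraphBase plays: public operations delegate to hooks.
abstract class SketchGraphBase {
    public final Iterator<SimpleTriple> find(SimpleTriple pattern) { return graphBaseFind(pattern); }
    public final void add(SimpleTriple t) { performAdd(t); }
    public final void remove(SimpleTriple t) { performDelete(t); }

    // The only hook a read-only graph has to supply.
    protected abstract Iterator<SimpleTriple> graphBaseFind(SimpleTriple pattern);

    // In this sketch the default hooks reject updates, so a subclass is
    // read-only unless it overrides them.
    protected void performAdd(SimpleTriple t) {
        throw new UnsupportedOperationException("add not supported");
    }
    protected void performDelete(SimpleTriple t) {
        throw new UnsupportedOperationException("delete not supported");
    }
}

// Read-only view over "legacy" rows of (id, name); triples are made on the fly.
class LegacyRowsGraph extends SketchGraphBase {
    private final List<String[]> rows;
    LegacyRowsGraph(List<String[]> rows) { this.rows = rows; }

    @Override
    protected Iterator<SimpleTriple> graphBaseFind(SimpleTriple pattern) {
        List<SimpleTriple> out = new ArrayList<>();
        for (String[] row : rows) {
            SimpleTriple t = new SimpleTriple("urn:row:" + row[0], "urn:prop:name", row[1]);
            if (t.matches(pattern)) out.add(t);
        }
        return out.iterator();
    }
}

// Updatable graph: the same find hook plus the two update hooks.
class MemoryGraph extends SketchGraphBase {
    private final List<SimpleTriple> triples = new ArrayList<>();

    @Override
    protected Iterator<SimpleTriple> graphBaseFind(SimpleTriple pattern) {
        List<SimpleTriple> out = new ArrayList<>();
        for (SimpleTriple t : triples) if (t.matches(pattern)) out.add(t);
        return out.iterator();
    }
    @Override protected void performAdd(SimpleTriple t) { triples.add(t); }
    @Override protected void performDelete(SimpleTriple t) { triples.remove(t); }
}

class GraphBaseSketchDemo {
    public static void main(String[] args) {
        SketchGraphBase legacy = new LegacyRowsGraph(List.of(
            new String[] {"1", "Alice"}, new String[] {"2", "Bob"}));
        // Print every triple whose object is "Bob".
        Iterator<SimpleTriple> it = legacy.find(new SimpleTriple(null, null, "Bob"));
        while (it.hasNext()) System.out.println(it.next());
    }
}
```

The point of the shape is that LegacyRowsGraph never mentions updates at all,
while MemoryGraph gains them by overriding two small methods.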

Then for testing with JUnit, inherit
from AbstractGraphTest (override tests that don't make sense in a particular circumstance)
and provide the getGraph operation to generate a graph instance to test.
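
The shape of that testing arrangement can be sketched without JUnit or Jena on
the classpath. Everything named here (TinyGraph, TinyGraphChecks, SetGraph,
FixedGraph) is hypothetical and much smaller than the real test suite; the only
idea taken from the text is an abstract set of checks with a getGraph hook,
where a subclass replaces checks that don't apply to it.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, deliberately tiny graph contract; a triple is just a string here.
interface TinyGraph {
    void add(String triple);
    boolean contains(String triple);
}

// Shared checks, written once. Subclasses say which graph to test.
abstract class TinyGraphChecks {
    // Factory hook, playing the role the text gives to getGraph.
    protected abstract TinyGraph getGraph();

    void testUnknownTripleIsAbsent() {
        check(!getGraph().contains("x y z"), "unknown triple should be absent");
    }

    void testAddThenContains() {
        TinyGraph g = getGraph();
        g.add("a p b");
        check(g.contains("a p b"), "added triple should be found");
    }

    // Runs every check, using whatever overrides the subclass supplies.
    final void runAll() {
        testUnknownTripleIsAbsent();
        testAddThenContains();
    }

    static void check(boolean ok, String message) {
        if (!ok) throw new AssertionError(message);
    }
}

// An updatable graph backed by a set.
class SetGraph implements TinyGraph {
    private final Set<String> triples = new HashSet<>();
    public void add(String triple) { triples.add(triple); }
    public boolean contains(String triple) { return triples.contains(triple); }
}

// A read-only graph with fixed content.
class FixedGraph implements TinyGraph {
    private final Set<String> triples = Set.of("a p b");
    public void add(String triple) { throw new UnsupportedOperationException("read-only"); }
    public boolean contains(String triple) { return triples.contains(triple); }
}

class SetGraphChecks extends TinyGraphChecks {
    @Override protected TinyGraph getGraph() { return new SetGraph(); }
}

class FixedGraphChecks extends TinyGraphChecks {
    @Override protected TinyGraph getGraph() { return new FixedGraph(); }

    // The update check makes no sense for a read-only graph, so replace it.
    @Override void testAddThenContains() {
        boolean rejected = false;
        try { getGraph().add("c p d"); } catch (UnsupportedOperationException e) { rejected = true; }
        check(rejected, "read-only graph should reject add");
    }
}

class ContractDemo {
    public static void main(String[] args) {
        new SetGraphChecks().runAll();
        new FixedGraphChecks().runAll();
        System.out.println("all checks passed");
    }
}
```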

Application APIs

Graph/Triple/Node provide the low level interface in Jena;
Model/Statement/Resource/Literal provide the
RDF API
and the ontology API
provides an OWL-centric view of the RDF data.

Where the graph level is minimal and symmetric (e.g. literals as subjects,
inclusion of named variables) for easy implementation, the RDF API enforces the
RDF conditions and provides a wide variety of convenience operations so that
programs can be succinct, without the application writer having to write
boilerplate code. The ontology API does the same for OWL.
If you look at the javadoc, you'll see the APIs are large but the system level
interface is small.

A graph is turned into a Model by calling
ModelFactory.createModelForGraph(Graph). All the key application APIs
are interface-based, although it's rarely necessary to do anything other than
use the standard Model-Graph bridge.

Data access to the graph all goes via find. All the read operations of the
application APIs, directly or indirectly, come down to calling Graph.find or a
graph query handler. The default graph query handler works by calling
Graph.find, so once find is implemented everything (read-only) works, including
ARQ's query API, which includes a SPARQL implementation. It may not be the most
efficient way, but importantly all functionality is available, so the graph
implementer can quickly get a first implementation up and running, then decide
where and when to spend further development time - or whether that's needed at
all.
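
To make the "everything comes down to find" point concrete, here is a small
sketch in plain Java. It is not how Jena's query handler is written; Tr,
FindOnlyGraph, ReadOps and ListGraphs are names made up for this example, and
nodes are plain strings with null as a wildcard. It only shows that contains,
size and even a naive two-pattern join can be layered on a single find
primitive.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Stand-in triple of three strings.
record Tr(String s, String p, String o) {}

// The single primitive a graph implementer supplies; null means "any".
interface FindOnlyGraph {
    Iterator<Tr> find(String s, String p, String o);
}

// Read operations written purely in terms of find.
final class ReadOps {
    static boolean contains(FindOnlyGraph g, String s, String p, String o) {
        return g.find(s, p, o).hasNext();
    }

    static int size(FindOnlyGraph g) {
        int n = 0;
        for (Iterator<Tr> it = g.find(null, null, null); it.hasNext(); it.next()) n++;
        return n;
    }

    // Naive join of two patterns: (?x p1 ?y) and (?y p2 ?z), returning [x, z] pairs.
    // One find for the first pattern, then one find per intermediate match.
    static List<String[]> path(FindOnlyGraph g, String p1, String p2) {
        List<String[]> out = new ArrayList<>();
        for (Iterator<Tr> a = g.find(null, p1, null); a.hasNext();) {
            Tr first = a.next();
            for (Iterator<Tr> b = g.find(first.o(), p2, null); b.hasNext();) {
                out.add(new String[] { first.s(), b.next().o() });
            }
        }
        return out;
    }
}

// A trivial find implementation over an in-memory list, for demonstration.
final class ListGraphs {
    static FindOnlyGraph of(List<Tr> data) {
        return (s, p, o) -> data.stream()
            .filter(t -> (s == null || s.equals(t.s()))
                      && (p == null || p.equals(t.p()))
                      && (o == null || o.equals(t.o())))
            .iterator();
    }
}

class FindOnlyDemo {
    public static void main(String[] args) {
        FindOnlyGraph g = ListGraphs.of(List.of(
            new Tr("a", "knows", "b"),
            new Tr("b", "knows", "c"),
            new Tr("b", "name", "Bob")));
        System.out.println("size = " + ReadOps.size(g));
        for (String[] pair : ReadOps.path(g, "knows", "name")) {
            System.out.println(pair[0] + " knows someone named " + pair[1]);
        }
    }
}
```

The join issues one find per intermediate result, which is the "may not be the
most efficient way" trade-off: it works against any find implementation, and a
store with its own query engine can replace it later.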

Jena-Mulgara

An example of this is a prototype
Jena-Mulgara bridge (work
in progress as of Jan'08). This maps the Graph API to a Mulgara session object,
which can be a local Mulgara database or a remote Mulgara server. The prototype
is a single class together with a set of factory operations for more convenient
creation of a bridge graph wrapped in all Jena's APIs.

Implementing graph nodes, for IRIs and for literals, is straightforward.
Mulgara uses JRDF to represent these
nodes and to represent triples. Mapping to and from the Jena versions of the
same is just a change in naming.

Blank nodes are more interesting. A blank node in Jena has an internal label
(which is not a URI in disguise). When working at the lowest level of Graph,
the code is manipulating things at a concrete, syntactic level.

A blank node in Mulgara has an internal id but it can change. It really is
the internal node index, as I found out by creating a blank node with id=1 and
finding that it turned into rdf:type, which was what was really at node slot 1.
Paul has been (patiently!)
explaining this to me on a Mulgara mailing list. The session interface is an
interface onto the RDF data, not an interface to extend the graph details to the
client. Both approaches are valid - it's just different levels of abstraction.

If the Jena application is careful about blank nodes (not assuming they are
stable across transactions, and not deleting all triples involving some blank
node, then creating triples involving that blank node) then it all works out.
The most important case of reading data within a transaction is safe. Bulk
loading is better done via the native Mulgara interfaces anyway. The
Jena-Mulgara bridge enables a Jena application to access a Mulgara server
through the same interfaces as any other RDF data.