Plone and SPARQL

One side of the Plone/SPARQL coin is letting Plone expose its catalog as a SPARQL endpoint. The other side is letting Plone/Zope query remote SPARQL endpoints and work with the result sets.

Everyone else is doing it, so why don't we? Take a look at projects like SparqlPress, which aim to do similar things for WordPress.

This isn't the first time the Semantic Web has been mixed with Plone/Zope. A while back (2005) there was a project called Zemantic, which Elliot told me about after seeing a presentation. It looked to be going strong until, all of a sudden, it stopped and the website disappeared. The source code is still available, and we should see what's been done.

Then in 2006 came Plone Proposal #103, which discussed integrating RDF with Plone. It got some traction but then sadly died, though not without some interesting comments from a couple of chaps, Nick Bower and Tim Hoffman, who seem to have similar ideals to us.

Plone as a SPARQL Endpoint

There are a few ways we could imagine doing this. One would be to take SPARQL queries and evaluate them at run time against an algorithmic mapping between the contents of the portal_catalog and a virtual RDF representation. By this I mean that we would evaluate the algebraic representation of a SPARQL query and map triple patterns like (X, rdf:type, plone:Image) to simple queries against the portal_catalog, building up an RDF model as we go.
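To make the mapping idea concrete, here is a minimal sketch of translating a single triple pattern into a catalog query. Nearly everything in it is an assumption for illustration: the `plone:` namespace URI, the `FakeCatalog` stand-in (the real portal_catalog returns brains, not dicts), the `site_url` default, and the choice to map `plone:Image` onto the `portal_type` index. It also handles only patterns whose subject is a variable.

```python
# Hypothetical namespace; the ontology for the catalog has not been defined yet.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PLONE = "http://example.org/plone#"


class Var(str):
    """A SPARQL variable such as ?x."""


class FakeCatalog:
    """Stand-in for portal_catalog: keyword search over a list of dict records."""

    def __init__(self, records):
        self.records = records

    def __call__(self, **query):
        return [r for r in self.records
                if all(r.get(k) == v for k, v in query.items())]


def pattern_to_query(pattern):
    """Translate one triple pattern into keyword arguments for the catalog."""
    subject, predicate, obj = pattern
    if (predicate == RDF_TYPE and not isinstance(obj, Var)
            and obj.startswith(PLONE)):
        # Assumed mapping: the class name in plone:<Name> is a portal_type value.
        return {"portal_type": obj[len(PLONE):]}
    raise NotImplementedError("no mapping for pattern %r" % (pattern,))


def evaluate(pattern, catalog, site_url="http://example.org/site"):
    """Run the mapped query and build the matching triples as we go."""
    subject, predicate, obj = pattern
    return [(site_url + record["path"], predicate, obj)
            for record in catalog(**pattern_to_query(pattern))]
```

A full evaluator would walk the whole SPARQL algebra (joins, optionals, filters) and call something like `evaluate` for each basic triple pattern; this sketch covers only that innermost step.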

Another approach would be to build an RDF model alongside (in sync with) the portal_catalog, and simply query that directly.
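The second approach could look roughly like the sketch below: a small triple container that is updated whenever content is indexed, reindexed, or unindexed, and that can then be queried directly. The class name `MirroredModel`, its method names, the namespace URIs, and the choice of which catalog metadata to turn into triples are all assumptions; wiring it to the real catalog's indexing events is left out.

```python
# Hypothetical vocabulary for the sketch.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
PLONE = "http://example.org/plone#"


class MirroredModel:
    """Keeps triples grouped by content path so a reindex can replace them."""

    def __init__(self, site_url):
        self.site_url = site_url
        self._by_path = {}

    def _triples_for(self, path, metadata):
        subject = self.site_url + path
        triples = {(subject, RDF_TYPE, PLONE + metadata["portal_type"])}
        if metadata.get("Title"):
            triples.add((subject, DC_TITLE, metadata["Title"]))
        return triples

    def index(self, path, metadata):
        """Call on catalog index and reindex; old triples for the path are dropped."""
        self._by_path[path] = self._triples_for(path, metadata)

    def unindex(self, path):
        """Call when the object leaves the catalog."""
        self._by_path.pop(path, None)

    def match(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None acts as a wildcard."""
        for triples in self._by_path.values():
            for triple in triples:
                if all(want is None or want == got
                       for want, got in zip((s, p, o), triple)):
                    yield triple
```

Grouping by path is what keeps the model in sync cheaply: a changed title replaces one small set instead of requiring a search for stale triples.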

The tools we have available really boil down to a couple of choices. One is the Python-based RDFLib, which amongst other things would allow us to store the RDF model in the ZODB, and which on digging also seems to have some support for SPARQL's algebraic representation and evaluation strategies. The other is Redland's Python bindings, which may ultimately be more optimised, but which would require an external triple store.

Since our architecture keeps our data loosely coupled from our implementations and schemas, we won't lose much by starting with RDFLib and postponing the decision on the best way to do things as an optimization issue. For starters we need to figure out how to model the information in the portal_catalog in RDF, i.e. come up with an ontology.

Plone as a SPARQL Client

So far we've not thought too much about this, but over the years we've had a few goes at doing similar things. The main idea is to present information in a web page (or set of pages) that amalgamates information from a bunch of different sources. To date, we've used a combination of KebasData, RDF Grabber and Page Templates, which has worked rather well, if not necessarily scalably.

While using Page Templates to walk over an RDF graph is still appealing, the more standard approach to fetching external data is to run queries and then iterate over the result sets. ZSQL is/was one way to do this with external relational sources, so perhaps we should try the same thing with SPARQL and call it ZPARQL?
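As a rough idea of what a ZPARQL method might feel like, here is a sketch that holds a query template, sends it to an endpoint, and yields one dict per result row, much as a ZSQL method yields records. The class `SparqlMethod`, the `fetch` hook, and the `%(name)s` template style are assumptions, not an existing Zope product. The request uses the SPARQL protocol's GET binding with a `query` parameter, and the parsing assumes the endpoint can return the SPARQL JSON results format.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def http_fetch(endpoint, query):
    """Send the query over HTTP GET and return the response body as text."""
    request = Request(endpoint + "?" + urlencode({"query": query}),
                      headers={"Accept": "application/sparql-results+json"})
    with urlopen(request) as response:
        return response.read().decode("utf-8")


class SparqlMethod:
    """Hypothetical ZPARQL-style method: a stored query you can call and loop over."""

    def __init__(self, endpoint, template, fetch=http_fetch):
        self.endpoint = endpoint
        self.template = template
        self.fetch = fetch  # swappable so the method can be tested offline

    def __call__(self, **params):
        # Naive substitution with no escaping; fine for a sketch only.
        query = self.template % params
        document = json.loads(self.fetch(self.endpoint, query))
        variables = document["head"]["vars"]
        for binding in document["results"]["bindings"]:
            # Unbound variables are simply absent from a row.
            yield {name: binding[name]["value"]
                   for name in variables if name in binding}
```

A Page Template could then loop over `method(type=...)` just as it would over ZSQL results. Real use would need proper escaping of parameters and would probably want to keep the datatype and language of each bound value rather than just its string.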

Now, one of the issues raised in Plone Proposal #103 was that of scalability. I reckon this is the same issue we're trying to address with ontology-driven federated search: on the one side we want the full expressive power of OWL-DL plus DL-safe rules, while on the other we want to be able to address large amounts of distributed instance data. One method is to collect all that instance data locally and then put an inference engine in front, but scaling that is still work in progress (cf. IBM's SHER). The method we're proposing is more pragmatic: for each query we try to collect just enough instance data to satisfy the query, then run the inference engine over this collected data and return the results.
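The per-query strategy can be sketched as three steps: collect, infer, answer. The sketch below is deliberately a toy. Sources are plain lists of triples, "enough data" is approximated by keeping only triples whose predicate the query mentions, and the inference step just propagates rdf:type along rdfs:subClassOf links, standing in for a real OWL-DL reasoner. All function names and the `ex:` identifiers are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def collect(sources, predicates):
    """Gather from every source only the triples the query could need."""
    data = set()
    for source in sources:
        data.update(t for t in source if t[1] in predicates)
    return data


def infer_types(triples):
    """Toy inference: push rdf:type up subclass links until nothing changes."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        supers = [(s, o) for s, p, o in closed if p == SUBCLASS_OF]
        for s, p, o in list(closed):
            if p != RDF_TYPE:
                continue
            for sub, sup in supers:
                if o == sub and (s, RDF_TYPE, sup) not in closed:
                    closed.add((s, RDF_TYPE, sup))
                    changed = True
    return closed


def answer(sources, ontology, wanted_class):
    """Return the instances of wanted_class after collecting and inferring."""
    data = collect(sources, {RDF_TYPE})
    closed = infer_types(data | set(ontology))
    return sorted(s for s, p, o in closed
                  if p == RDF_TYPE and o == wanted_class)
```

The point of the shape is that the reasoner only ever sees the small, query-specific slice returned by `collect`, not the whole of the distributed data.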

Since we've got a component-based architecture, this latter approach will be implemented as a separate component with its own SPARQL endpoint, which can be addressed via ZPARQL methods. So Plone as a SPARQL client will be able either to query itself directly, or to indirectly query all available and relevant data sources through this separate component. That component needs a name; let's say Ontology Driven Distributed SPARQL, ODDS.