From: Sean Bechhofer <seanb@cs.man.ac.uk>
Subject: Streaming OWL Parsers
Date: Fri, 14 Feb 2003 16:49:21 +0000 (GMT Standard Time)
> In trying to build some "OWL Implementations", a number of issues are
> coming to light, not least with how one deals with XML-RDF encodings of
> OWL.
>
> We've been trying to build a streaming parser, but this is proving
> difficult with OWL represented as RDF triples. Given an OWL-RDF ontology,
> the various triples that make up any expression or assertion could be
> scattered liberally throughout the document, so even if I try and build
> things in a streaming fashion, there's a load of stuff that I'm going to
> have to cache or remember or make assumptions about and clean up later.
>
> If we're talking about the kinds of example model that we've seen in
> things like the test sets, this is, of course, not an issue. Just build
> the data structure. Big deal. But what happens when I've got an ontology
> with 10^8 concepts/individuals in it and I want to do some simple
> processing on it, that doesn't necessarily warrant me building the whole
> data structure? Perhaps I want to find all the assertions of a certain
> form. With a more rigorously structured grammar (like an XML schema or
> lisp like expressions), things would be easier, but when presented with a
> collection of triples, if I'm very unlucky, I'll end up having to
> construct some structure in order to get what I want out of it.
>
> <flippant>
> An analogy that springs to mind is that it's like building a model of the
> eiffel tower out of matchsticks. Except that what's happened is that
> someone's done it already, labelled each pair of touching ends of each
> matchstick with a unique number, dismantled it, and then given it to you
> in a box saying "what do you think that is then?".
> </flippant>
>
> Ok, perhaps an over-exaggeration, but it's what it feels like sometimes.
>
> Of course, I'm not claiming that it's *impossible* to do this, it just
> seems a lot of work, and requires some unpleasant and messy engineering,
> when the presence of further constraints would make things a lot easier.
> This is not intended as a simple "bash-RDF" polemic -- I'd be interested
> to hear any positive experiences of trying to handle similar
> languages/grammars expressed as RDF schema. Is there any? Is it possible
> to guarantee that you can do this without having to load the whole thing
> into memory at some point?
I believe that it is in general impossible to write an n-pass streaming
``parser'' for many interesting RDF tasks. For example, suppose that you
are given a sequence of RDF triples, and want to emit the RDFS-instances of
a particular class. It might be the case that you have to determine the
sub-classes of this class to an arbitrary depth, and these triples might be
arranged in reverse order. Thus there is no n-pass constant-memory-size
algorithm that can do this, for any fixed n.
In OWL, you might simply want to extra an axiom from a sequence of RDF
triples. This again cannot be done using an n-pass constant-memory-size
algorithm because the nesting might be done in reverse order.
> And if someone actually has implemented a streaming OWL parser, simply let
> me have it and then I'll shut up and go away :-)
I would be very surprised if one existed.
> Sean
peter