Hi all (but especially students and academic staff),
Yesterday I found a bug in Redland's librdfa-based RDFa parsing
facilities. A fairly obscure markup pattern caused the librdfa library
to fail to generate an RDF triple. Redland/raptor deals with this by
throwing a fatal error, bringing my RDFa-parsing ambitions to a grinding
halt. This was on input data I'd generated myself (the curious can see
details at http://bugs.librdf.org/mantis/view.php?id=289 ).
If RDF (and especially RDFa) parsers are going to robustly handle all
the scary messy markup that's out there, then I don't think we can wait
for humans like me to stumble upon the awkward corner cases that trip
them up. So I've a proposal (based on some old work by Janne Saarela):
I'd like to see an auto-generated repository of RDFa samples, most (but
not all) of which are decent well-formed XHTML with RDFa, but also with a
good number of poorly marked-up files. Note that poor, confusing or
downright weird markup may or may not trip up XML's well-formedness rules.
Here is an old set of RDF/XML test files autogenerated with Prolog:
http://www.w3.org/RDF/Test/Janne/
Related tools include the Dada Engine, http://dev.null.org/dadaengine/
(the tool behind http://www.elsewhere.org/pomo/ ) and Rmutt,
http://www.schneertz.com/rmutt/ ... either of which could be used to
make the output more entertaining.
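As a rough illustration of the kind of generator meant here, the sketch
below builds random XHTML+RDFa documents from a tiny hand-written
vocabulary. It is not Janne's Prolog approach, and it does not use the
Dada Engine or Rmutt grammar formats. The element list, the attribute
values, and the names gen_element, gen_document and is_wellformed are all
made up for this example, and the break_rate knob is just one assumed way
of producing deliberately broken files.

```python
import random
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical vocabulary for illustration only; a real test set would
# want far more variety (and nastier values) than this.
ELEMENTS = ["div", "span", "p", "a"]
ATTRS = {
    "about": ["#me", "", "http://example.org/thing", "[_:b0]"],
    "property": ["dc:title", "foaf:name", "bogus:", ":x"],
    "rel": ["foaf:knows", "next", "dc:creator foaf:maker"],
    "resource": ["#you", "[ex:curie]", "http://example.org/other"],
    "typeof": ["foaf:Person", ""],
    "content": ["literal value", ""],
    "datatype": ["xsd:string", "", "rdf:XMLLiteral"],
}
WORDS = ["foo", "bar", "baz"]


def gen_element(rng, depth=0, break_rate=0.0):
    """Return one random element, possibly with nested children."""
    name = rng.choice(ELEMENTS)
    chosen = rng.sample(sorted(ATTRS), rng.randint(0, 3))
    attrs = "".join(f' {a}="{rng.choice(ATTRS[a])}"' for a in chosen)
    if depth < 3 and rng.random() < 0.6:
        body = "".join(
            gen_element(rng, depth + 1, break_rate)
            for _ in range(rng.randint(1, 3))
        )
    else:
        body = rng.choice(WORDS)
    close = f"</{name}>"
    if rng.random() < break_rate:
        close = ""  # deliberately drop the end tag to break well-formedness
    return f"<{name}{attrs}>{body}{close}"


def gen_document(seed, break_rate=0.0):
    """Return a complete document; the same seed gives the same document."""
    rng = random.Random(seed)
    return (
        '<html xmlns="http://www.w3.org/1999/xhtml" '
        'xmlns:dc="http://purl.org/dc/elements/1.1/" '
        'xmlns:foaf="http://xmlns.com/foaf/0.1/" '
        'xmlns:xsd="http://www.w3.org/2001/XMLSchema#">'
        "<head><title>t</title></head><body>"
        + gen_element(rng, 0, break_rate)
        + "</body></html>"
    )


def is_wellformed(doc):
    """Check XML well-formedness only; says nothing about RDFa validity."""
    try:
        minidom.parseString(doc)
        return True
    except ExpatError:
        return False
```

Seeding each document means a failing case can be regenerated from its
number alone, which makes bug reports easy to reproduce. Note that
well-formed output can still be thoroughly weird RDFa (empty about,
undeclared prefixes and so on), which is the point.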
Generating such a test set and then wiring it up to a set of RDFa
parsers (via http://rdfa.digitalbazaar.com/rdfa-test-harness/ or
something like it) shouldn't be a huge job, but it would be a very
useful one. I'd like to see perhaps 1000 'nonsense' RDFa documents that
experiment with every conceivable or inconceivable syntactic variant
that parsers might encounter in the wild. And then find out (a) whether
any parsers fail completely on that input, (b) how many triples are
generated and what they contain, and (c) whether the spec gurus agree on
what ought to be generated.
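Checks (a) and (b) could be wired up along the following lines. This is
only a sketch: it does not use the rdfa-test-harness mentioned above, and
parser_a and parser_b are stand-in stubs, not real parser bindings. The
one assumption is that each parser can be wrapped as a function from a
document string to a set of (subject, predicate, object) tuples. Comparing
plain sets also ignores blank node renaming, which a real comparison would
have to handle.

```python
def run_parsers(parsers, documents):
    """Run every parser on every document; record triples or the failure."""
    report = {}
    for doc_id, doc in documents.items():
        results = {}
        for name, parse in parsers.items():
            try:
                results[name] = ("ok", frozenset(parse(doc)))
            except Exception as exc:  # a crash is data here, not an error
                results[name] = ("crash", type(exc).__name__)
        report[doc_id] = results
    return report


def summarize(report):
    """Per document: who crashed, triple counts, and whether outputs differ."""
    summary = {}
    for doc_id, results in report.items():
        crashed = sorted(n for n, (status, _) in results.items()
                         if status == "crash")
        counts = {n: len(out) for n, (status, out) in results.items()
                  if status == "ok"}
        outputs = {out for status, out in results.values() if status == "ok"}
        summary[doc_id] = {
            "crashed": crashed,
            "counts": counts,
            "disagree": len(outputs) > 1,
        }
    return summary


# Stand-in stubs, purely hypothetical, so the sketch runs on its own.
def parser_a(doc):
    if "<span/>" in doc:
        raise ValueError("stub: cannot cope with empty span")
    return {("#me", "foaf:name", "Dan")}


def parser_b(doc):
    return {("#me", "foaf:name", "Dan")} if "property" in doc else set()
```

Documents flagged with "disagree" are the ones worth taking to the spec
gurus for question (c), since the code cannot say which parser is right.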
Does this sound worthwhile? Anyone willing to work on it or to help
explore it as a student project? Students would gain an understanding of
XML, RDFa grammars and the state of the art (and lack thereof ;) in
automatic tool support for assuring compliance with the standards.
cheers,
Dan
--
http://danbri.org/