-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
State of Redland 2007-02
Redland was born 2000-08. Happy 6.5th birthday!
This is a review of the last approximately 15 months since I moved to
the USA in Oct 2005 to work for Yahoo! Media Group in Sunnyvale,
California. It covers:
* Review of Redland users, current state, releases
* Redland challenges, tasks including work already underway and my
future ideas
* Call for participation: how I want to change the project
(This is on the web at http://librdf.org/2007/02/18-state/)
1. Redland Users
Redland is made available by several Linux, Unix and other open source
projects such as:
* Debian (sarge, etch)
* Fedora (FC4 onwards) : just Raptor
* FreeBSD Ports
* Gentoo
* Mandriva (9.1 onwards) : Raptor and Redland
* SUSE (9.2 onwards)
* Ubuntu (breezy, hoary, dapper, edgy)
and the libraries are also used inside other applications and services
such as, for example:
* ActiveRDF ruby RDF
* Amaya web browser and HTML Editor
* Ardour digital audio workstation
* Hydrogen simple drum machine/step sequencer
* Morla RDF graphical editor
* My Opera
* Nepomuk KDE semantic desktop app
* The Venice Project client side (I think!)
* Venus feed aggregator
* Yahoo! Food, TV, Personal Finance ... web sites
* ... but I am not keeping track of these very well in the
applications list ...
2. State of the libraries
My summary of the high-level state of the libraries is:
Raptor syntax parsing and serializing: libraptor
Very mature. The API is changing rarely, mostly bug fixes or
adding new features to existing parsers/serializers or adding
entirely new ones.
Rasqal query parsing, executing: librasqal
Under development. The API is changing with each release because
it is not yet complete and the SPARQL query engine implementation
is not fully functional.
Redland RDF API and triple stores: librdf
Mature. Some API change is happening to add new features
especially for query and storage.
Binding languages
A mixture of mature bindings such as Perl, Python which are
well tested, working and complete and immature ones with little
testing or known incomplete, such as Tcl and Java.
I feel this is too large for one person to maintain, since it needs
skills in all N languages, unless that person is me and I do nothing
else!
3. Releases
For each of the libraries, the period above has seen the following
releases with major changes:
Raptor 1.4.8 - 1.4.14 (7 releases)
+ A new user tutorial covering the entire API was written.
+ A new RSS tag soup parser was added
+ New Atom 1.0, RSS 1.0, Turtle and DOT serializers were added.
Rasqal 0.9.11 - 0.9.13 (3 releases)
+ Updated the SPARQL syntax support to match the November 2005
and April 2006 W3C Working Drafts.
+ Can now serialize query results to JSON.
+ Added APIs to manage query results serializing.
+ The query engine had its ordering, distinct and limit
support fixed.
+ Lots of internal query engine changes, in particular to split
the query parsing ('prepare') and the query execution
('execute'). These were too intertwined in earlier versions.
So now you can nearly execute the same query multiple times.
Redland 1.0.3-1.0.5 (3 releases)
+ A new PostgreSQL storage was added
+ Many fixes for SQLite storage
Language Bindings 1.0.3.1-1.0.5.1 (3 releases)
+ Many fixes were made across all the bindings especially to
handle query results.
+ The Python and Ruby bindings got many fixes
and all of them have benefited from better API documents using gtk-doc
to replace the older kernel-doc, giving better DocBook and better HTML
output. The entire project also switched over from CVS to Subversion
early in 2006.
4. Challenge
The main challenge I see is to make the project more scalable - moving
from the current state where I do all the packaging and am the main
developer. To help this, my goals for 2007 are to:
* Try to make the development more of a shared task
* Make it easier to work on just part of Redland
* Turn the main website into a shared read/write developer resource
* Schedule #redland IRC developer meetings if that will help give
the project more of a regular heartbeat
5. General Tasks
More of a wishlist than an ordered list
* Think about a License change to Apache2 only.
* Make Redland turn SPARQL into underlying SQL queries when
possible.
* Create the redland developer's site in something like Drupal.
* Start the redland (librdf) API tutorial.
* Create some documentation to explain the libraries' structure and
relationships.
* Consider not shipping raptor and rasqal inside the redland tarball
* Create documentation on the data flow inside the libraries
* Figure out whether to keep writing manual pages as well as gtkdoc.
(DRY)
* Figure out where module/implementation documentation goes, such as
storage options in redland, parser features in raptor etc. This is
needed in C and in the bindings as it is not about the actual
functions called. (DRY)
* The demos need to be updated and the changes made put back into
subversion.
* A SPARQL protocol endpoint demo would be good to have
DRY = Don't Repeat Yourself
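As a rough illustration of the SPARQL-to-SQL item above, here is a
minimal sketch of turning a list of triple patterns into one SQL query
with self-joins. Everything in it is an assumption made for the
example: the single triples(subject, predicate, object) table is not
the schema of any Redland store, and bgp_to_sql() and run_bgp() are
hypothetical names, not Redland or Rasqal functions. It only handles
plain triple patterns, not OPTIONAL, UNION or filters.

```python
import sqlite3


def bgp_to_sql(patterns, table="triples"):
    """Translate triple patterns into one SQL SELECT with self-joins.

    Each pattern is a (subject, predicate, object) tuple of strings.
    Terms starting with "?" are variables, everything else is a constant.
    The table layout is hypothetical, chosen only for this sketch.
    """
    columns = ("subject", "predicate", "object")
    first_seen = {}   # variable -> "alias.column" where it first appeared
    conditions = []   # SQL conditions joined with AND
    params = []       # constants, in the same order as the "?" placeholders
    for i, pattern in enumerate(patterns):
        for column, term in zip(columns, pattern):
            ref = f"t{i}.{column}"
            if term.startswith("?"):
                if term in first_seen:
                    # A repeated variable becomes a join condition.
                    conditions.append(f"{ref} = {first_seen[term]}")
                else:
                    first_seen[term] = ref
            else:
                conditions.append(f"{ref} = ?")
                params.append(term)
    select = ", ".join(f'{ref} AS "{var[1:]}"'
                       for var, ref in first_seen.items()) or "1"
    tables = ", ".join(f"{table} AS t{i}" for i in range(len(patterns)))
    sql = f"SELECT {select} FROM {tables}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


def run_bgp(db, patterns):
    """Run the translated query on a sqlite3 connection and return rows."""
    sql, params = bgp_to_sql(patterns)
    return db.execute(sql, params).fetchall()
```

The point of doing it this way is that the joins happen inside the
database rather than one triple pattern at a time in the query engine.
Variable names are pasted into the SQL as column aliases, so a real
version would have to validate them first.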
5.1 Pending stuff
There are several tasks already in progress either sitting in a patch,
in Subversion or underway separately.
* A new schema for the SQLite store: me (patch)
* Redland transaction support: me (in Subversion)
* Object-based PHP5 bindings: Yahoo! (pending)
* SPARQL syntax extensions called LAQRS: me (in Subversion)
* Apache2 mod_sparql: David Reid (separate project)
* A new native Ruby binding not using SWIG: somebody on IRC
* Complete the Raptor GRDDL support: me (in Subversion)
5.2 Raptor tasks
* Complete the GRDDL support: nearly done
* Bug fixes only for 2007
5.3 Rasqal tasks
* Make Rasqal be able to execute complete SPARQL
* Make SPARQL OPTIONALs work
* Make SPARQL GROUP work
* Make SPARQL UNION work
* Make datatypes work, especially xsd:date and xsd:decimal (bignum
library)
* Read result sets from the sparql query results XML
* Write a query optimiser
* Add a way to declare extension functions
* Look into language extensions
* Address query engine denial of service:
+ limit query wall clock time
+ limit triple pattern matches
+ callback to allow application to abort queries?
+ limit memory use?
+ limit sorting of results?
+ limit URI fetching is done now with the raptor changes
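To make the denial of service limits above more concrete, here is a
minimal sketch of how three of them (wall clock time, number of triple
pattern matches and an application abort callback) could be enforced
around a stream of matches. All of the names (QueryAborted,
limited_matches and its parameters) are made up for this example and
are not part of the Rasqal API. It does not attempt the memory or
sorting limits.

```python
import time


class QueryAborted(Exception):
    """Raised when a query exceeds one of its configured limits."""


def limited_matches(matches, max_seconds=None, max_matches=None,
                    should_abort=None, clock=time.monotonic):
    """Yield items from `matches` until a limit is hit.

    Hypothetical helper for illustration only:
    - max_seconds: wall clock budget for the whole query
    - max_matches: maximum number of matches to hand out
    - should_abort: callback returning True when the application
      wants the query stopped
    """
    start = clock()
    count = 0
    for match in matches:
        if max_seconds is not None and clock() - start > max_seconds:
            raise QueryAborted("wall clock limit exceeded")
        if max_matches is not None and count >= max_matches:
            raise QueryAborted("triple pattern match limit exceeded")
        if should_abort is not None and should_abort():
            raise QueryAborted("aborted by application callback")
        count += 1
        yield match
```

The checks only run when the next match is produced, so a single slow
lookup in the underlying store could still overrun the time budget.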
5.4 Bindings tasks
* Split the single language bindings package to be one per-binding.
That would be: Perl, PHP5, Python and Ruby
* Make the Perl binding into a CPAN installable tarball - partially
done but not entirely working
* Deprecate or remove bindings that have no active maintainer. These
would be C#, Java and Tcl.
6. Future Ideas
6.1 New Version control system
This is more speculative and I am giving no firm commitment that this
will happen soon. Subversion is stable and well supported.
Move from Subversion to a more distributed development-friendly
version control system.
My requirements for a new VCS:
* Distributed - no central repository required
* Can operate networkless
* Friendly to managing patches
* Quick
* Reliable and successful (no research project, bleeding edge)
* Mature
GIT seems one possibility - I tried this conversion already and it
worked well. Mercurial I couldn't get it converted without losing
information. SVK I'm not so sure about, as I don't like VCS that are
layered on others e.g. CVS still leaks its original RCS basis. I
didn't try DARCS. Arch / Bazaar / Bazaar-NG is too bleeding edge. This
is a medium term goal.
6.2 Raptor Version 2
This is a break-the-binary-API choice, not a rebuild. The main reason
to do this would be to add a 'world' style argument to constructors,
like redland has and similar to the curl handle, APR pool or BDB
environment. This would mean that raptor_init() and raptor_finish()
would be replaced by something like rw = raptor_new_world() and
raptor_free_world(rw).
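The idea of a 'world' argument can be sketched in a few lines. This is
only an illustration of the pattern, written in Python rather than C,
and the World and Parser classes are hypothetical stand-ins, not the
raptor API: state that was set up globally by an init call moves into
an object that every constructor receives explicitly.

```python
class World:
    """Hypothetical 'world' object: owns state that would otherwise be global."""

    def __init__(self):
        self.parser_names = {"rdfxml", "ntriples"}  # illustrative registry
        self.is_open = True

    def free(self):
        """Release everything owned by this world."""
        self.parser_names.clear()
        self.is_open = False


class Parser:
    """Hypothetical parser whose constructor takes the world explicitly."""

    def __init__(self, world, name):
        if not world.is_open:
            raise RuntimeError("world has already been freed")
        if name not in world.parser_names:
            raise ValueError(f"unknown parser: {name}")
        self.world = world
        self.name = name
```

With this shape two independent worlds can exist in one process, for
example one per library that embeds the parser, without either
disturbing the other's configuration.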
One other reason to do this would be to add a pull-style triple parser,
where the model is:
    parser = new RDFXMLparser()
    parser.start_parse( { URI => uri} )
    while (not parser.done())
        triple = parser.raptor_get_next_triple()
        ...
    delete parser
... rather than the current one of receiving triples via a callback.
However, this would either need a pull-based WWW library (I know of
only libwww and I don't want to use that), or need to batch up the
triples in memory by wrapping the push-based parser, or need multiple
threads, which has its own set of problems. This would also need an update to the
raptor_iostream class to add read methods, but that's easier than the
first problem. So this is likely not V2 stuff.
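For illustration, here is a minimal sketch of the multiple threads
option: a callback-based (push) parser runs in a worker thread and
hands triples over a bounded queue, so the caller can pull them one at
a time. push_parse() is a made-up stand-in for a callback-style parser
and pull_triples() is a hypothetical wrapper; neither is raptor code.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the triple stream


def push_parse(triples, handler):
    """Stand-in for a push parser: calls handler once per triple."""
    for triple in triples:
        handler(triple)


def pull_triples(push_parser, source, max_buffered=16):
    """Expose a push parser as an iterator using a worker thread.

    The bounded queue limits how many triples are held in memory;
    the worker blocks in put() until the caller asks for more.
    """
    buffer = queue.Queue(maxsize=max_buffered)

    def worker():
        try:
            push_parser(source, buffer.put)
        finally:
            # Always signal the end, even if the parser failed.
            buffer.put(_DONE)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buffer.get()
        if item is _DONE:
            return
        yield item
```

This also shows some of the problems: an error raised inside the
parser is not passed on to the caller here, and if the caller stops
reading early the worker thread stays blocked on the queue.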
For V2 there would also be a bunch of other API cleanups:
* Rename all raptor_foo functions to be raptor_parser_foo where they
really are about parsers
* Ditch the URI context/data and use raptor_world to hold that
* Alter raptor_statement to have 4 components including a context /
graph / formula so that Raptor could parse N3. Possibly rename it
to raptor_triple.
So in summary: this is not being done soon.
7. Call for participation
This is your opportunity to help more directly with Redland, in
particular with language bindings as there is a trickle of patches
and fixes to these that take me some time to get to looking at and
releasing.
These are the areas I've seen that can benefit from an active person:
* OSX porter / (ObjC binding maintainer)
* Win32 porter
* Perl binding maintainer
* Python binding maintainer
* (New Ruby binding?)
* (New PHP5 binding: Yahoo! pays me to look after this)
I would also deprecate / remove the bindings for C#, Java and Tcl. They
would stay in Subversion, but no longer be shipped.
Saying "yes" to one of the roles above would mean gaining the role in
the bug tracker for the area and gaining commit access to the Redland
Subversion for the area, which might mean adding a new area if needed.
If the single bindings package is split into individual language
packages, a Subversion change to match might also be needed.
Thanks for reading.
Dave Beckett
California, USA, 2007-02-18
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFF2IzXQ+ySUE9xlVoRAoboAJ4texM+WY5d1qZG7RtUciL8uTpKuACdEpZn
Xtd8367vA6Ahc0IXxOtLss8=
=sW7e
-----END PGP SIGNATURE-----