data.ox.ac.uk has a Python frontend, which queries a Fuseki instance over HTTP. When a request for a page comes in, it performs a SPARQL query against Fuseki, asking for the results back as RDF/XML. These are parsed into an in-memory rdflib graph, which can then be queried to construct HTML pages or transformed into other formats (other RDF serializations, JSON, RSS, etc.).
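Roughly, that request-and-parse flow could look like the sketch below. This is not the actual frontend code: the function names, the endpoint URL and the use of `urllib` are assumptions made for illustration, and `rdflib` is a third-party package that is only imported when a graph is actually fetched.

```python
import urllib.parse
import urllib.request


def build_sparql_request(endpoint, query, accept="application/rdf+xml"):
    """Build a GET request that asks a SPARQL endpoint for RDF/XML.

    `endpoint` is a hypothetical Fuseki query URL, e.g.
    "http://localhost:3030/dataset/query".
    """
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": accept})


def fetch_graph(endpoint, query):
    """Run the query and parse the RDF/XML response into an rdflib graph."""
    import rdflib  # third-party; imported lazily so the helper above stays stdlib-only

    request = build_sparql_request(endpoint, query)
    with urllib.request.urlopen(request) as response:
        graph = rdflib.ConjunctiveGraph()
        graph.parse(data=response.read(), format="xml")
    return graph
```

A CONSTRUCT or DESCRIBE query is the natural fit here, since those return a graph rather than a table of bindings.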

In a bid to make things a bit quicker I decided to benchmark some of the rdflib parsers. I timed rdflib.ConjunctiveGraph.parse() ten times for each parser (interleaved) over 100,000 triples. Here are the results:
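For anyone wanting to try something similar, here is a minimal sketch of an interleaved timing harness. It is a reconstruction under assumptions, not the script used for the numbers in this post: the helper names are made up, and `samples` is assumed to map an rdflib format name (such as `"xml"` or `"nt"`) to the same set of triples serialized in that format.

```python
import time


def benchmark_interleaved(parsers, runs=10, clock=time.perf_counter):
    """Time each callable `runs` times, cycling through all of them per round.

    Interleaving spreads background noise on the machine across every parser
    instead of letting it land on just one.
    """
    timings = {name: [] for name in parsers}
    for _ in range(runs):
        for name, parse in parsers.items():
            start = clock()
            parse()
            timings[name].append(clock() - start)
    return timings


def rdflib_parsers(samples):
    """Wrap each (format, data) pair in a zero-argument parse call."""
    import rdflib  # third-party; only needed when benchmarking real parsers

    def make(fmt, data):
        # A fresh graph per call, so earlier runs do not affect later ones.
        return lambda: rdflib.ConjunctiveGraph().parse(data=data, format=fmt)

    return {fmt: make(fmt, data) for fmt, data in samples.items()}


def triples_per_second(triple_count, timings):
    """Convert per-run timings into an average parse rate per parser."""
    return {
        name: triple_count / (sum(runs) / len(runs))
        for name, runs in timings.items()
    }
```

With real data this would be called as `benchmark_interleaved(rdflib_parsers(samples))`, and the result passed to `triples_per_second` along with the number of triples in the sample.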

This isn’t a perfect benchmark, as my work box was doing who-knows-what at the same time, but things should have evened out well enough for a comparative analysis. It’s also quite clear that the N-Triples parser is about three times faster than the expat-based RDF/XML parser. On the basis of this I’m going to make the data.ox.ac.uk frontend request data as N-Triples; hopefully it’ll make a noticeable improvement to response times. I am slightly shocked that the RDF/XML parser only manages an average of 238 triples per second.
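Switching serialization touches two places: the `Accept` header sent to Fuseki and the format name handed to rdflib. A small sketch of keeping the two in step follows. The table and the `negotiate` helper are hypothetical, and `application/n-triples` is assumed to be what the endpoint expects (older setups may serve N-Triples as `text/plain`).

```python
# Assumed pairing of HTTP media types with rdflib parser format names.
SERIALIZATIONS = {
    "rdfxml": ("application/rdf+xml", "xml"),
    "ntriples": ("application/n-triples", "nt"),
}


def negotiate(name):
    """Return the request headers and the matching rdflib format name."""
    media_type, rdflib_format = SERIALIZATIONS[name]
    return {"Accept": media_type}, rdflib_format
```

Calling `negotiate("ntriples")` gives headers to send with the SPARQL request and the `format` argument to pass to `parse()`, so the two cannot drift apart.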