Store your RDF triples in a database for faster performance and greater scalability.

by Rod Coffin

Sep 26, 2007

RDF Store Overview
You have several options to consider when choosing an RDF database for your application. AllegroGraph is a commercial option that boasts impressive performance as well as other value-added features such as a built-in reasoner and support for federated databases. Three popular open source options are Sesame, Jena, and Mulgara.

Jena's traditional database layout, RDB, was optimized for the Jena Model API. A new Jena component, SDB, is being developed to offer an alternative database layout optimized for larger patterns, such as those you would typically execute when performing SPARQL queries. SDB is currently in beta, so Jena references in this article refer to the RDB database layout unless otherwise noted.

The aspects you will want to consider when selecting an RDF database are:

Database compatibility

API compatibility

Load and query performance

Tool support

Inferencing support

AllegroGraph is itself a database and as such doesn't rely on or truly integrate with traditional relational databases. It does, however, offer the ability to back up its operations to a relational database. Sesame and Jena, on the other hand, are not databases; they're toolkits for working with RDF data. Both have the ability to either sit on top of popular databases (see Table 1) or use file-based and memory-based modes.

Table 1. Database Compatibility: The table shows the relational databases supported by four popular RDF databases.

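| RDF store | MySQL | PostgreSQL | Oracle | SQL Server | Derby | HSQLDB |
|---|---|---|---|---|---|---|
| Jena | Yes | Yes | Yes | Yes | Yes | Yes |
| Sesame | Yes | Yes | Yes | No | No | No |
| AllegroGraph | NA | NA | NA (backup) | NA | NA | NA |
| Mulgara | NA | NA | NA | NA | NA | NA |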

Sesame and Jena both allow you to store RDF triples. The schema that each uses to store the triples is proprietary, but each exposes an API to manage and query the stored RDF data. You can access AllegroGraph repositories through both the Sesame and Jena APIs.
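To make the idea of keeping triples in a relational database concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the schema that Jena or Sesame actually uses (those layouts are specific to each toolkit); the single `triples` table, the helper names `open_store`, `add_triple`, and `match`, and the `ex:` identifiers are all assumptions made for illustration.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) a SQLite database holding one table of triples."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS triples (
               subject   TEXT NOT NULL,
               predicate TEXT NOT NULL,
               object    TEXT NOT NULL,
               PRIMARY KEY (subject, predicate, object)
           )"""
    )
    return conn

def add_triple(conn, s, p, o):
    """Insert a triple; a duplicate statement is silently ignored."""
    conn.execute("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", (s, p, o))

def match(conn, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    clauses, params = [], []
    for column, value in (("subject", s), ("predicate", p), ("object", o)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT subject, predicate, object FROM triples"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sorted(conn.execute(sql, params).fetchall())

# Hypothetical data: two statements about one article.
store = open_store()
add_triple(store, "ex:article", "ex:author", "ex:rod")
add_triple(store, "ex:article", "ex:topic", "ex:rdf")
add_triple(store, "ex:article", "ex:topic", "ex:rdf")  # duplicate, ignored

topics = match(store, s="ex:article", p="ex:topic")
```

A real toolkit layers much more on top of a table like this (node encoding, indexes, typed literals, named graphs), which is one reason to go through the Jena or Sesame API rather than query the underlying tables directly.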

For most of the RDF databases discussed here, you can access the store through either the Jena API or the Sesame API (Mulgara, which has no Sesame API access, is the exception). The Jena Sesame Model project allows developers to access Sesame databases through Jena's model abstraction. Conversely, the Sesame-Jena Adapter project provides access to Jena models through the Sesame API. Although you can use either, you will generally be better off using the Jena API to access Jena databases and the Sesame API to access Sesame databases. You may want to factor in this affinity when weighing the trade-offs among RDF databases (see Table 2).

Table 2. API Compatibility: You can access the RDF databases analyzed here through the Jena API and, except for Mulgara, through the Sesame API.

| RDF store | Jena API | Sesame API |
|---|---|---|
| Jena | Yes | Yes (via Sesame-Jena Adapter) |
| Sesame | Yes (via Jena Sesame Model) | Yes |
| AllegroGraph | Yes (via AllegroGraph interfaces) | Yes (via AllegroGraph interfaces) |
| Mulgara | Yes (also exposes its own JRDF API) | No |

Load and query performance are among the biggest factors in any RDF database selection. Benchmarking and tuning results depend heavily on context, regardless of the technology being considered, so I urge you to perform your own benchmarks on your own network, with your own hardware, datasets, and query types. The respective RDF database providers also report their own performance benchmarks, which are worth consulting as a starting point.
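If you do run your own benchmarks, a small timing harness is often enough to get started. The sketch below is one possible shape for it, not a prescribed methodology: `best_time`, `make_triples`, `load`, and `query` are hypothetical names, the data is synthetic, and an in-memory SQLite table stands in for whichever RDF database you are evaluating. Swap the bodies of `load` and `query` for calls into your real store, your real data, and your real queries.

```python
import sqlite3
import time

def best_time(fn, repeats=5):
    """Run fn several times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)

def make_triples(n):
    """Generate n synthetic triples; replace with a sample of your real data."""
    return [(f"ex:s{i}", f"ex:p{i % 10}", f"ex:o{i % 100}") for i in range(n)]

def load(triples):
    """Bulk-load triples into a fresh in-memory table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    conn.commit()
    return conn

def query(conn):
    """A stand-in query: count the statements that use one predicate."""
    return conn.execute(
        "SELECT COUNT(*) FROM triples WHERE predicate = ?", ("ex:p3",)
    ).fetchone()[0]

data = make_triples(10_000)
conn = load(data)

# Measure load and query separately, since stores often trade one for the other.
load_seconds = best_time(lambda: load(data))
query_seconds = best_time(lambda: query(conn))
print(f"load: {load_seconds:.4f}s  query: {query_seconds:.6f}s")
```

Taking the best of several runs reduces noise from other processes; for a store accessed over the network you may prefer to report the median or the full distribution instead.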

Tool support is another important consideration when choosing technologies; RDF databases provide different types and levels of tooling around the core function of managing RDF data. Sesame ships with graphical tools to manage a Sesame server, and supports load, query, and explore operations via a web interface. Jena itself offers only command-line management utilities, although several related projects can help you manage Jena RDF databases.

Another potentially important consideration when evaluating RDF databases is the query languages they support. Each of the RDF databases explored here has a native query language for RDF data, but not all of them support SPARQL, an emerging standard RDF query language. Table 4 highlights the differences in support for RDF query languages among the various tools:

Table 4. RDF Query Language Support: One distinguishing characteristic of RDF databases lies in their support for SPARQL.

| RDF store | Native RDF Query Language | SPARQL Support |
|---|---|---|
| Jena | RDQL | Yes |
| Sesame | RQL | No |
| AllegroGraph | SPARQL | Yes |
| Mulgara | iTQL | No |

Inferencing support is yet another important characteristic to consider when selecting an RDF database. Sesame and AllegroGraph notably provide optional inferencing front ends that can dynamically create entailments during database operations and can insert these additional entailments along with the asserted statements into the database. Jena features a robust and highly configurable inference engine, but at this time you can't configure it as a front end to an RDF database. Fortunately, there's a relatively simple workaround; you can create your own entailments using Jena's inference engine and add those into your RDF database explicitly.
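The workaround amounts to computing the entailments yourself and then writing them to the store next to the asserted statements. The sketch below illustrates that flow under stated assumptions: it does not use Jena's inference engine, but a hand-rolled loop that applies just two RDFS rules (subclass transitivity and type propagation) until nothing new appears. The `entail` function, the `ex:` resources, and the SQLite table are all hypothetical stand-ins.

```python
import sqlite3

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

def entail(asserted):
    """Return the triples implied by two RDFS rules but not already asserted.

    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)       => (x type B)
    """
    closure = set(asserted)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        subclass_pairs = [(s, o) for s, p, o in closure if p == SUBCLASS_OF]
        new = set()
        for a, b in subclass_pairs:
            for s, p, o in closure:
                if p == SUBCLASS_OF and s == b:
                    new.add((a, SUBCLASS_OF, o))
                if p == RDF_TYPE and o == a:
                    new.add((s, RDF_TYPE, b))
        new -= closure
        if new:
            closure |= new
            changed = True
    return closure - set(asserted)

# Hypothetical asserted statements.
asserted = {
    ("ex:Dog", SUBCLASS_OF, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS_OF, "ex:Animal"),
    ("ex:rex", RDF_TYPE, "ex:Dog"),
}
inferred = entail(asserted)

# Add the asserted and inferred statements to the database explicitly.
store = sqlite3.connect(":memory:")
store.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
store.executemany("INSERT INTO triples VALUES (?, ?, ?)", sorted(asserted | inferred))
stored_count = store.execute("SELECT COUNT(*) FROM triples").fetchone()[0]
```

One consequence of materializing entailments this way is that they are not updated automatically: if an asserted statement is later removed or changed, the statements derived from it have to be recomputed and rewritten as well.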