RDF for Fun and Profit: Using Empire

Empire is an implementation of the Java Persistence API (JPA) for RDF
and the Semantic Web. Instead of being yet another JPA implementation
for relational databases, Empire implements JPA for RDF and SPARQL,
allowing developers who are familiar with JPA, but not with Semantic
Web technologies like RDF, to make an easy transition into this brave
new world. JPA is a specification for managing Java objects, most
commonly with an RDBMS; it is the industry standard for Java ORMs.

What Itch Does Empire Scratch?

We started Empire, which is available under the terms of the Apache
2.0 open source license, to bridge the gap between an RDBMS-backed web
application and the Semantic Web. We built a web application for a
customer which used JPA and Hibernate, but we also wanted to provide a
SPARQL endpoint so that we could use Pelorus, a faceted browser for
RDF and SPARQL. Ideally, we wanted to use a JPA implementation that
would operate against an RDF database in support of these
requirements. The objective of this article is to walk through some
basic uses of Empire to illustrate how it can be used in your
application. For the purposes of the article, we’ll present some
examples from an application which uses metadata about various
O'Reilly books.

Persistence With Plain RDF

O'Reilly has recently started publishing its catalog pages with
RDFa markup. For example, if you check out the page for “Switching to
the Mac”, you’d find this RDF embedded in the page:

If you were to create this data by hand using the Sesame API, it
would look something like this:
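As a stand-in with the same statement-by-statement shape, here is a small, self-contained Java sketch. It deliberately avoids Sesame's real API and instead builds N-Triples strings by hand; the example URI and the choice of Dublin Core and FRBR terms are our own assumptions, not O'Reilly's actual markup.

```java
import java.util.ArrayList;
import java.util.List;

class ManualRdf {
    // Vocabulary constants, the kind of thing you would keep in a shared class.
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String DC = "http://purl.org/dc/terms/";
    static final String FRBR = "http://purl.org/vocab/frbr/core#";

    static String iri(String uri) {
        return "<" + uri + ">";
    }

    // Quote a plain literal, escaping backslashes and double quotes.
    static String literal(String text) {
        return "\"" + text.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // One N-Triples line per statement: subject, predicate, object, terminating dot.
    static String statement(String s, String p, String o) {
        return s + " " + p + " " + o + " .";
    }

    // Every fact about the book has to be spelled out as its own statement.
    static List<String> describeBook(String bookUri, String title, String publisher) {
        List<String> graph = new ArrayList<>();
        graph.add(statement(iri(bookUri), iri(RDF_TYPE), iri(FRBR + "Expression")));
        graph.add(statement(iri(bookUri), iri(DC + "title"), literal(title)));
        graph.add(statement(iri(bookUri), iri(DC + "publisher"), literal(publisher)));
        return graph;
    }

    public static void main(String[] args) {
        // Hypothetical identifier, for illustration only.
        String book = "http://example.org/books/switching-to-the-mac";
        describeBook(book, "Switching to the Mac", "O'Reilly Media").forEach(System.out::println);
    }
}
```

Even in this toy form, the reader has to know about subjects, predicates, literal quoting, and vocabulary URIs just to record a title.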

You might have factory classes or constants to represent common
concepts such as terms from the FOAF or DC ontologies; but for the
most part, creating RDF data is going to look quite similar to this.
While this is a perfectly functional example, it has a couple of
issues. First, this code does not look “natural”: it does not
represent what is actually going on in an easily discernible way. It
doesn’t really look like we’re creating some data about a book in the
O'Reilly catalog. Second, it locks us into a particular RDF API; this
is Sesame code, and it’s a non-trivial task to transition it to
another API. Third, the
code is only going to make sense to someone who is familiar with
RDF; it exposes a lot of RDF minutiae to the developer, which is
only going to increase the learning curve for new developers.

What we want are simple Java beans to represent the concepts in our application: code that is easier to create
and maintain, that does not leak RDF specifics into the codebase, and that does not tie us to any particular RDF API.

Consider the following example:
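A minimal sketch of such a bean, assuming a handful of illustrative fields (the names `title`, `publisher`, `issued` and `subjects` are our choice, not necessarily those of the article's class):

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

class BookExample {
    // A plain Java bean: no RDF types, no store API, just the domain.
    static class Book {
        private String title;
        private String publisher;
        private LocalDate issued;
        private final List<String> subjects = new ArrayList<>();

        public String getTitle() { return title; }
        public void setTitle(String title) { this.title = title; }

        public String getPublisher() { return publisher; }
        public void setPublisher(String publisher) { this.publisher = publisher; }

        public LocalDate getIssued() { return issued; }
        public void setIssued(LocalDate issued) { this.issued = issued; }

        // Live list, so callers can add subjects directly.
        public List<String> getSubjects() { return subjects; }
    }

    public static void main(String[] args) {
        Book book = new Book();
        book.setTitle("Switching to the Mac");
        book.setPublisher("O'Reilly Media");
        System.out.println(book.getTitle() + " (" + book.getPublisher() + ")");
    }
}
```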

This code is much easier to work with: it is clearer about what it's
trying to accomplish, it succinctly represents our domain, it does not
tie us to any API other than our own, and it exposes no RDF details to
the programmer. Nearly any developer, Java or otherwise, could look
at this code and immediately understand what's going on. Using Plain
Old Java Objects (POJOs) is clearly ideal, but that is only half of
the challenge. We still need to save, remove and search for our data,
and we want it represented as RDF. This is where Empire comes in.

Empire Persistence

If you’ve used a JPA implementation before, a lot of the following code should be very familiar to you.
Mappings between a Java bean and an RDBMS are often controlled through the common annotations provided by
JPA. You typically begin by declaring that your bean is a JPA entity:

Empire simply extends this approach by adding an additional annotation to the class to specify its type:
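To show what a class-level type annotation plus namespace declarations can amount to, here is a self-contained sketch that resolves a bean's RDF class via reflection. `Namespaces` and `RdfsClass` are locally defined stand-ins, not Empire's own types; we assume the alternating prefix/namespace array layout, and the FRBR and Dublin Core namespace URIs are our choice.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.HashMap;
import java.util.Map;

class TypeMapping {
    // Stand-in annotations modeled on the ones the article describes; not Empire's own types.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Namespaces {
        String[] value(); // alternating prefix, namespace URI, prefix, namespace URI, ...
    }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface RdfsClass {
        String value(); // qname of the RDF class that instances belong to
    }

    @Namespaces({"frb", "http://purl.org/vocab/frbr/core#",
                 "dc", "http://purl.org/dc/terms/"})
    @RdfsClass("frb:Expression")
    static class Book { }

    // Collect the prefix -> namespace pairs declared on a class.
    static Map<String, String> prefixes(Class<?> type) {
        Map<String, String> map = new HashMap<>();
        Namespaces ns = type.getAnnotation(Namespaces.class);
        if (ns != null) {
            String[] v = ns.value();
            for (int i = 0; i + 1 < v.length; i += 2) {
                map.put(v[i], v[i + 1]);
            }
        }
        return map;
    }

    // Expand "prefix:local" to a full URI using the class's declared namespaces.
    static String expand(Class<?> type, String qname) {
        int colon = qname.indexOf(':');
        String namespace = colon < 0 ? null : prefixes(type).get(qname.substring(0, colon));
        if (namespace == null) {
            throw new IllegalArgumentException("cannot expand " + qname);
        }
        return namespace + qname.substring(colon + 1);
    }

    // The full URI of the RDF class a bean is mapped to.
    static String rdfType(Class<?> type) {
        RdfsClass rdfsClass = type.getAnnotation(RdfsClass.class);
        if (rdfsClass == null) {
            throw new IllegalArgumentException(type.getName() + " has no @RdfsClass");
        }
        return expand(type, rdfsClass.value());
    }

    public static void main(String[] args) {
        System.out.println(rdfType(Book.class)); // http://purl.org/vocab/frbr/core#Expression
    }
}
```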

We’ve now mapped instances of the Book class to individuals of the
frb:Expression class. You’ll notice an additional optional
annotation, @Namespaces, on the class where we specify namespaces
that we’ll use throughout our markup; this allows us to use qnames
instead of full URIs. We need to make one last change before we can
start mapping the properties of the class to RDF: we need to assert
that this book can have an RDF identifier:
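A sketch of what an identifier contract might look like, assuming a hypothetical `HasRdfId` interface and `RdfIdSupport` base class (Empire's own contract is, as far as we know, an interface called `SupportsRdfId`, whose details differ). Named individuals carry an absolute URI; anonymous ones get a generated blank-node label on first access.

```java
import java.net.URI;
import java.util.concurrent.atomic.AtomicLong;

class RdfIds {
    // Hypothetical contract: an entity exposes an RDF identifier that is either
    // a named URI or an anonymous (blank node) id.
    interface HasRdfId {
        String getRdfId();
        void setRdfId(String id);
    }

    // Reusable base class so beans never have to create ids themselves.
    abstract static class RdfIdSupport implements HasRdfId {
        private static final AtomicLong COUNTER = new AtomicLong();
        private String rdfId;

        @Override
        public String getRdfId() {
            if (rdfId == null) {
                // No name was given: generate a blank-node label on first use.
                rdfId = "_:b" + COUNTER.incrementAndGet();
            }
            return rdfId;
        }

        @Override
        public void setRdfId(String id) {
            // Named individuals need absolute URIs; URI.create rejects malformed input.
            if (!id.startsWith("_:") && !URI.create(id).isAbsolute()) {
                throw new IllegalArgumentException("expected an absolute URI: " + id);
            }
            this.rdfId = id;
        }
    }

    static class Book extends RdfIdSupport {
        String title;
    }

    public static void main(String[] args) {
        Book named = new Book();
        named.setRdfId("http://example.org/books/1"); // hypothetical URI
        Book anonymous = new Book();
        System.out.println(named.getRdfId());
        System.out.println(anonymous.getRdfId());
    }
}
```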

In Empire it’s easier to work with named individuals than anonymous
ones, but Empire supports both and provides built-in handlers for
keeping them straight. You never have to worry about setting or
creating ids. Now we need to map the properties of our Java bean to
the properties of the Book instances in our database. With
Hibernate, TopLink, or another JPA implementation, standard
properties are very easy to map; you just declare them:

These three fields will get persisted in the database when you save your Book object. If you have a collection of items, you’ll just need
to specify some basics of the mapping:

Empire only requires a little bit more information; namely, it needs to know what property each field in your bean corresponds to:
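To make the field-to-property idea concrete, here is a sketch of how such an annotation can drive RDF output through reflection. `RdfProperty` below is a locally defined stand-in (Empire's own annotation of that name lives in Empire's packages and does far more), and the Dublin Core property URIs are our choice. A collection field contributes one statement per element, matching how multi-valued properties work in RDF.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

class FieldMapping {
    // Stand-in for a field-level mapping annotation; the value is the property URI.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface RdfProperty {
        String value();
    }

    static class Book {
        @RdfProperty("http://purl.org/dc/terms/title")
        String title;

        @RdfProperty("http://purl.org/dc/terms/publisher")
        String publisher;

        @RdfProperty("http://purl.org/dc/terms/subject")
        List<String> subjects = new ArrayList<>();

        String notes; // no annotation, so this sketch never writes it
    }

    // Turn each annotated, non-null field into (property, value) pairs;
    // a collection contributes one pair per element.
    static List<Map.Entry<String, String>> toStatements(Object bean) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (Field field : bean.getClass().getDeclaredFields()) {
            RdfProperty property = field.getAnnotation(RdfProperty.class);
            if (property == null) {
                continue;
            }
            Object value;
            try {
                field.setAccessible(true);
                value = field.get(bean);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            if (value instanceof Collection) {
                for (Object item : (Collection<?>) value) {
                    out.add(Map.entry(property.value(), String.valueOf(item)));
                }
            } else if (value != null) {
                out.add(Map.entry(property.value(), String.valueOf(value)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Book book = new Book();
        book.title = "Switching to the Mac";
        book.subjects.add("example subject"); // hypothetical value
        toStatements(book).forEach(e -> System.out.println(e.getKey() + " -> " + e.getValue()));
    }
}
```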

With these simple additional annotations, the Java bean can now be used with Empire.

Using Empire

Initializing Empire is trivial; you simply need to declare which API
bindings you’d like to load. The following example shows how to
load the support for Sesame, which allows Empire to connect to
Sesame repositories. You can load multiple bindings at once and
have different persistence contexts connected to databases of
different types, while still maintaining the same public API:
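The idea of loading several backend bindings behind one public API can be sketched with a plain registry. Everything here is hypothetical: `RdfStore`, `InMemoryStore` and `BindingRegistry` are illustrative names, and a simple map of factories replaces the Guice-based wiring Empire actually uses.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

class BindingRegistry {
    // Hypothetical interface that every backend binding implements.
    interface RdfStore {
        void add(String subject, String predicate, String object);
        int size();
    }

    // Toy in-memory backend standing in for a real Sesame or Jena binding.
    static class InMemoryStore implements RdfStore {
        private final Set<List<String>> statements = new HashSet<>();

        @Override
        public void add(String subject, String predicate, String object) {
            statements.add(List.of(subject, predicate, object)); // duplicates collapse, as in an RDF graph
        }

        @Override
        public int size() {
            return statements.size();
        }
    }

    private final Map<String, Supplier<RdfStore>> bindings = new HashMap<>();

    // "Loading" a binding just registers a factory under a backend name.
    void load(String backend, Supplier<RdfStore> factory) {
        bindings.put(backend, factory);
    }

    // Each call opens a new, independent store; callers only see RdfStore.
    RdfStore open(String backend) {
        Supplier<RdfStore> factory = bindings.get(backend);
        if (factory == null) {
            throw new IllegalArgumentException("no binding loaded for " + backend);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        BindingRegistry registry = new BindingRegistry();
        registry.load("memory", InMemoryStore::new);
        RdfStore books = registry.open("memory");
        books.add("urn:x-book:1", "http://purl.org/dc/terms/title", "Switching to the Mac");
        System.out.println(books.size()); // 1
    }
}
```

Because application code depends only on the interface, adding another backend means registering another factory, not touching the callers.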

Here we use the standard JPA framework to grab an instance of our persistence context named ‘oreilly’. The resulting EntityManager
will be connected to the Sesame repository specified in our configuration:

The following shows how to retrieve a specific item, in this case a book, from the database and print some of its data:

Here we show that finding the same object in the database yields an instance which is .equals() to our
original copy:
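One plausible way to get that behavior, sketched under our own assumption that equality is based solely on the RDF identifier (neither the article nor Empire is confirmed to do it this way): the hypothetical `Store` below returns a new instance on every `find()`, which is equal to, but not the same object as, the original.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class FindEquality {
    static class Book {
        final String rdfId;
        String title;

        Book(String rdfId, String title) {
            this.rdfId = Objects.requireNonNull(rdfId);
            this.title = title;
        }

        // In RDF a resource is identified by its URI, so this sketch compares ids only.
        @Override
        public boolean equals(Object other) {
            return other instanceof Book && ((Book) other).rdfId.equals(rdfId);
        }

        @Override
        public int hashCode() {
            return rdfId.hashCode();
        }
    }

    // Hypothetical store that, like a fresh database read, returns a new instance on every find().
    static class Store {
        private final Map<String, String> titles = new HashMap<>();

        void persist(Book book) {
            titles.put(book.rdfId, book.title);
        }

        Book find(String rdfId) {
            return titles.containsKey(rdfId) ? new Book(rdfId, titles.get(rdfId)) : null;
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        Book original = new Book("urn:x-book:1", "Switching to the Mac");
        store.persist(original);
        Book found = store.find("urn:x-book:1");
        System.out.println(found.equals(original)); // true
        System.out.println(found == original);      // false
    }
}
```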

Next, we make some edits to our original and save them
back into the database. Our copy remains unchanged and is a
snapshot of the state of the book at the time we retrieved it. This
also shows how attributes on the JPA annotations can control the
persistence behavior; in this case, how persistence is cascaded
between objects:

We can always refresh a “stale” object with the latest data from the database.
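The snapshot-then-refresh behavior can be sketched with a hypothetical in-memory store (not Empire's implementation): `find()` hands out detached copies, so a copy goes stale when someone else merges changes, and `refresh()` overwrites it with the stored state.

```java
import java.util.HashMap;
import java.util.Map;

class RefreshDemo {
    static class Book {
        final String rdfId;
        String title;

        Book(String rdfId, String title) {
            this.rdfId = rdfId;
            this.title = title;
        }
    }

    // Hypothetical store that hands out detached snapshots of what it holds.
    static class Store {
        private final Map<String, String> titles = new HashMap<>();

        Book find(String rdfId) {
            return titles.containsKey(rdfId) ? new Book(rdfId, titles.get(rdfId)) : null;
        }

        void merge(Book book) {
            titles.put(book.rdfId, book.title);
        }

        // Overwrite a possibly stale instance with the currently stored state.
        void refresh(Book book) {
            if (!titles.containsKey(book.rdfId)) {
                throw new IllegalStateException("not in the store: " + book.rdfId);
            }
            book.title = titles.get(book.rdfId);
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.merge(new Book("urn:x-book:1", "First title"));
        Book original = store.find("urn:x-book:1");
        Book copy = store.find("urn:x-book:1");

        original.title = "Revised title";
        store.merge(original);
        System.out.println(copy.title); // First title (the copy is a snapshot)

        store.refresh(copy);
        System.out.println(copy.title); // Revised title
    }
}
```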

Here is an example of removing an object from the database; it again demonstrates how persistence operations are
controlled through the JPA annotations:
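Cascading, for both saving and removing, can be illustrated with a small sketch. The `Cascade` enum is a stand-in for JPA's cascade options; in real JPA the setting lives on the relationship annotation in the entity class, whereas here it is passed to the constructor for brevity. The store and entity classes are hypothetical.

```java
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;

class CascadeDemo {
    // Stand-in for the cascade options JPA declares on relationship annotations.
    enum Cascade { MERGE, REMOVE }

    static class Author {
        final String rdfId;

        Author(String rdfId) {
            this.rdfId = rdfId;
        }
    }

    static class Book {
        final String rdfId;
        final Author author;
        // In JPA this would sit on the relationship's annotation; here it is a
        // constructor argument to keep the sketch short.
        final Set<Cascade> authorCascade;

        Book(String rdfId, Author author, Set<Cascade> authorCascade) {
            this.rdfId = rdfId;
            this.author = author;
            this.authorCascade = authorCascade;
        }
    }

    // Hypothetical store that only tracks which resources exist.
    static class Store {
        final Set<String> stored = new HashSet<>();

        void merge(Book book) {
            stored.add(book.rdfId);
            // Save the related author too, but only if MERGE cascades.
            if (book.author != null && book.authorCascade.contains(Cascade.MERGE)) {
                stored.add(book.author.rdfId);
            }
        }

        void remove(Book book) {
            stored.remove(book.rdfId);
            // Delete the related author too, but only if REMOVE cascades.
            if (book.author != null && book.authorCascade.contains(Cascade.REMOVE)) {
                stored.remove(book.author.rdfId);
            }
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        Author author = new Author("urn:x-author:1");
        Book book = new Book("urn:x-book:1", author, EnumSet.of(Cascade.MERGE));
        store.merge(book);
        System.out.println(store.stored.size()); // 2: book and author
        store.remove(book);
        System.out.println(store.stored);        // [urn:x-author:1]: REMOVE did not cascade
    }
}
```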

A final example demonstrates how standard JPA parameterized queries can be used with normal SPARQL to query the database:
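What parameter binding has to do under the hood can be sketched as safe literal substitution. The `??name` placeholder syntax is our assumption (JPA's usual `:name` form would collide with SPARQL qnames such as `dc:title`), and `SparqlParams` is a hypothetical helper, not Empire's query API. The key point is escaping: a bound value must never be able to close its quotes and inject query text.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SparqlParams {
    // Assumed placeholder syntax "??name"; a single "?" stays an ordinary SPARQL variable.
    private static final Pattern PARAM = Pattern.compile("\\?\\?([A-Za-z_][A-Za-z0-9_]*)");

    // Render a Java string as a SPARQL string literal, escaping what could end the quotes early.
    static String literal(String value) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default:   sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Replace every placeholder with its escaped literal; unbound names are an error.
    static String bind(String query, Map<String, String> params) {
        Matcher m = PARAM.matcher(query);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = params.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("unbound parameter: " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(literal(value)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String query = "PREFIX dc: <http://purl.org/dc/terms/>\n"
                + "SELECT ?book WHERE { ?book dc:title ??title }";
        System.out.println(bind(query, Map.of("title", "Switching to the Mac")));
    }
}
```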

Features and Support

Empire implements as much of JPA as possible while attempting to
retain the expected behavior based on the JPA spec. Some features of
JPA are not yet supported, such as @SqlResultSetMapping, and others,
such as @Table or @Column, have no analogue in an RDF-based system.

Configuration of Empire is controlled through simple properties or
XML format files loaded at startup. There is no tricky XML mapping
language to learn; all mappings are controlled through the standard
JPA annotations. The configuration files simply define the connection
parameters for your databases and allow global properties to be
shared by all databases.
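A sketch of per-context settings with shared globals, using `java.util.Properties`. The key layout (`<context>.<key>`, plus `global.<key>` applied everywhere) and the example values are hypothetical, not Empire's documented file format; `Properties.loadFromXML` would cover the XML variant in the same way.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

class ContextConfig {
    // Parse "<context>.<key>=value" lines; "global.<key>" applies to every context.
    // The key layout is hypothetical, not Empire's documented format.
    static Map<String, Map<String, String>> parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> globals = new HashMap<>();
        Map<String, Map<String, String>> contexts = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            int dot = key.indexOf('.');
            if (dot < 0) {
                continue; // not in "<scope>.<name>" form
            }
            String scope = key.substring(0, dot);
            String name = key.substring(dot + 1);
            String value = props.getProperty(key);
            if (scope.equals("global")) {
                globals.put(name, value);
            } else {
                contexts.computeIfAbsent(scope, s -> new HashMap<>()).put(name, value);
            }
        }
        // A context's own settings win; globals only fill the gaps.
        for (Map<String, String> settings : contexts.values()) {
            globals.forEach(settings::putIfAbsent);
        }
        return contexts;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "global.timeout=30",
                "oreilly.factory=sesame",
                "oreilly.url=http://localhost:8080/repositories/books");
        System.out.println(parse(text).get("oreilly"));
    }
}
```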

Empire uses Dependency Injection via Google Guice to manage its
plugin architecture, and Javassist for bytecode manipulation:
generating instances from interfaces or abstract classes at runtime,
and lazily loading resources from the database using method
interceptors. This allows Empire to provide an API-agnostic mechanism
for working with RDF databases, avoiding API and database lock-in.
Empire provides out-of-the-box support for Sesame, Jena, and 4Store;
support for BigData, Oracle 11g, and Virtuoso is coming soon.
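The interceptor idea can be sketched without Javassist by using the JDK's `java.lang.reflect.Proxy`, which we swap in here for simplicity. JDK proxies can only implement interfaces, which is one reason a bytecode library is needed for abstract classes. The `loader` function is a hypothetical stand-in for a database lookup.

```java
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class LazyEntities {
    // An entity declared only as an interface; an implementation is generated at runtime.
    interface Book {
        String getTitle();
        String getPublisher();
    }

    // Build a proxy whose getters load each property on first access and cache it.
    // `loader` stands in for a database lookup keyed by property name.
    static <T> T lazy(Class<T> type, Function<String, Object> loader) {
        Map<String, Object> cache = new HashMap<>();
        Object proxy = Proxy.newProxyInstance(
                type.getClassLoader(),
                new Class<?>[] {type},
                (self, method, args) -> {
                    String name = method.getName();
                    if (name.equals("equals")) {
                        return self == args[0];
                    }
                    if (name.equals("hashCode")) {
                        return System.identityHashCode(self);
                    }
                    if (name.equals("toString")) {
                        return type.getSimpleName() + cache;
                    }
                    if (name.startsWith("get") && name.length() > 3 && method.getParameterCount() == 0) {
                        // getTitle -> "title"
                        String property = Character.toLowerCase(name.charAt(3)) + name.substring(4);
                        return cache.computeIfAbsent(property, loader);
                    }
                    throw new UnsupportedOperationException(name);
                });
        return type.cast(proxy);
    }

    public static void main(String[] args) {
        Book book = lazy(Book.class, property -> {
            System.out.println("loading " + property);
            return property.equals("title") ? "Switching to the Mac" : "O'Reilly Media";
        });
        System.out.println(book.getTitle()); // triggers one load
        System.out.println(book.getTitle()); // served from the cache
    }
}
```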

Conclusion

Empire provides a standard, widely-known Java persistence framework
for use in Semantic Web projects where data is stored in RDF. By
providing an implementation of JPA and using it to abstract away the
minutiae of RDF, it lowers the learning curve for new developers and
helps provide a straightforward path for migrating or enhancing
existing traditional web applications to use semantic technologies.