Sir Thomas More's famous treatise, Utopia [1], recounts the experiences of a fictitious
traveler to an imaginary island where everyone lives well, all citizens are educated, and no
one is left behind. Penned in 1516, Utopia describes how government,
professions, social relations, travel, the military, religion, and even marriage work in that
"ideal" world.

If Sir Thomas were writing today, he would do well to include a chapter on data
management in his book. In an ideal world, what would data management be like? While
we can only fancy More's description, the Java Content Repository API (JSR 170) [2] expert group may have a partial
answer. The new API, which was approved as a final Java standard by the JCP [3] on May 31st, claims to radically
simplify Java data management by creating a unified access model for data repositories.

If the Java Content Repository (JCR) API expert group's vision bears out, in five or ten
years' time we will all program to repositories, not databases, according to David
Nuescheler, CTO of Day Software [4],
and JSR 170 spec lead. Repositories are an outgrowth of many years of data
management research, and are best understood as fancy object stores especially suited to
today's applications.

To experience firsthand whether the JCR API's promise of simplifying Java data
management is real or utopian, I took the JSR 170 reference implementation,
Apache Jackrabbit [5], on a test drive. I built a small blogging application with JCR, and
will share my experiences with you in this article.

My findings? The JCR is worth a serious look if you are building real-world, data-centric
Java applications. And while programming to a content repository as opposed to a
database can save serious development time, the devil, as you probably
expected, is in the details.

Not your father's database

Commercial repositories are often implemented on top of more traditional database technology, or
even on top of a filesystem. Therefore, repositories often serve as a layer between a data
store, such as an RDBMS, and an application requiring access to persistent data. A
repository typically consists of the following components [6]:

A repository engine that manages the repository's content, or repository
objects.

A repository information model that describes the repository
contents.

A repository API that allows applications to interact with the repository
engine and provides for querying or altering the repository information model.

Optionally, a repository includes an interface to a persistent store, or a
persistence manager.

The relationships between these components are illustrated in figure 1.

Figure 1: Repository components.
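
To make the division of labor between these components concrete, here is a minimal sketch in plain Java. It is an illustration only: every name in it (PersistenceManager, InformationModel, RepositoryEngine, RepositoryApi) is hypothetical and does not come from JCR or from any repository product, and a real engine does far more than this one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Optional persistence manager: the engine's interface to a persistent store. */
interface PersistenceManager {
    void save(String id, String type, String content);
}

/** Information model: describes what kinds of objects the repository may hold. */
final class InformationModel {
    private final Map<String, String> types = new HashMap<>();

    void defineType(String name, String description) { types.put(name, description); }

    boolean hasType(String name) { return types.containsKey(name); }
}

/** Engine: manages the repository objects themselves. */
final class RepositoryEngine {
    private final Map<String, String> contentById = new HashMap<>();
    private final PersistenceManager persistence; // may be null: the store is optional

    RepositoryEngine(PersistenceManager persistence) { this.persistence = persistence; }

    void put(String id, String type, String content) {
        contentById.put(id, content);
        if (persistence != null) persistence.save(id, type, content);
    }

    String get(String id) { return contentById.get(id); }
}

/** API: the only surface applications see; it fronts both the engine and the model. */
final class RepositoryApi {
    private final InformationModel model = new InformationModel();
    private final RepositoryEngine engine;

    RepositoryApi(PersistenceManager persistence) { engine = new RepositoryEngine(persistence); }

    void defineType(String name, String description) { model.defineType(name, description); }

    boolean hasType(String name) { return model.hasType(name); }

    void add(String id, String type, String content) {
        // Objects may only be stored under a type the information model knows about.
        if (!model.hasType(type)) throw new IllegalArgumentException("Unknown type: " + type);
        engine.put(id, type, content);
    }

    String read(String id) { return engine.get(id); }
}

final class ComponentsSketch {
    public static void main(String[] args) {
        List<String> persistedIds = new ArrayList<>();
        // The lambda stands in for a real store such as an RDBMS.
        RepositoryApi repo = new RepositoryApi((id, type, content) -> persistedIds.add(id));
        repo.defineType("BlogEntry", "A single blog post");
        repo.add("entry-1", "BlogEntry", "Hello, repository");
        System.out.println(repo.read("entry-1") + " / persisted: " + persistedIds);
    }
}
```

The application talks only to RepositoryApi. Whether a persistent store sits behind the engine, and which one, stays hidden from it.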

What benefits do these components bring to a plain old database? According to Microsoft
Research's Phil Bernstein, who served as architect of that company's object repository
that first shipped in Visual Basic 5, a repository engine offers six features above a
traditional relational or object database management system [6]:

Dynamic extensibility: Each repository object has a type. The
repository information model is a collection of the possible object types in the repository
as well as of the objects that implement those types. The repository engine allows adding
new types and extending existing types. In contrast to relational databases, a repository
information model, including type information, is often implemented not as metadata, but
as a collection of first-class repository objects. As a result, a repository often has no
metadata in the sense of relational database metadata. This is roughly analogous to how
objects run in a Java virtual machine, for instance: Type information is represented by
first-class objects of the type Class, and the JVM associates
non-Class objects with Class objects that define the object's
type.
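
The JVM analogy is easy to verify with standard reflection calls. The short sketch below uses only java.lang.Object.getClass() and java.lang.Class.getSimpleName(); the class name TypeObjectsSketch and the helper describe are made up for the illustration.

```java
import java.util.ArrayList;

final class TypeObjectsSketch {
    /** Describes an object using only the Class object the JVM associates with it. */
    static String describe(Object o) {
        Class<?> type = o.getClass();          // the type is itself an object...
        Class<?> typeOfType = type.getClass(); // ...whose own type is Class
        return type.getSimpleName() + " is described by a " + typeOfType.getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(describe("some text"));
        System.out.println(describe(new ArrayList<String>()));
    }
}
```

Because the type is an ordinary object, a program can inspect it with the same mechanisms it uses for any other object. That is the property the repository information model shares.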

Relationship management: While relational databases define entity
relations between database objects, they do so at the level of the database schema
(metadata), not in terms of actual database objects. By contrast, repositories allow object
relationships to be specified in terms of first-class objects representing those
relationships. For instance, two Page objects might be related via a
Link object, denoting that one page links to another. Because
Link is a repository object, it can be associated with a rich object type:
For example, one describing a bi-directional link between the two pages. A repository
engine enforces referential integrity between related objects.
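
The Page and Link example can be sketched in a few lines of plain Java. This is a hedged illustration, not how any particular engine works: the LinkingRepository class and its methods are hypothetical, and integrity is checked here with a simple linear scan.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** A repository object. */
final class Page {
    final String title;

    Page(String title) { this.title = title; }
}

/** The relationship is itself an object and can carry its own properties. */
final class Link {
    final Page from;
    final Page to;
    final boolean bidirectional;

    Link(Page from, Page to, boolean bidirectional) {
        this.from = from;
        this.to = to;
        this.bidirectional = bidirectional;
    }
}

/** Hypothetical engine fragment that refuses to break referential integrity. */
final class LinkingRepository {
    private final Set<Page> pages = new HashSet<>();
    private final List<Link> links = new ArrayList<>();

    void addPage(Page page) { pages.add(page); }

    Link link(Page from, Page to, boolean bidirectional) {
        if (!pages.contains(from) || !pages.contains(to)) {
            throw new IllegalStateException("Both pages must be in the repository");
        }
        Link link = new Link(from, to, bidirectional);
        links.add(link);
        return link;
    }

    void removePage(Page page) {
        // A page that a link still refers to cannot be removed.
        for (Link link : links) {
            if (link.from == page || link.to == page) {
                throw new IllegalStateException("Still referenced by a link: " + page.title);
            }
        }
        pages.remove(page);
    }

    void removeLink(Link link) { links.remove(link); }

    boolean contains(Page page) { return pages.contains(page); }
}

final class RelationshipSketch {
    public static void main(String[] args) {
        LinkingRepository repo = new LinkingRepository();
        Page home = new Page("Home");
        Page about = new Page("About");
        repo.addPage(home);
        repo.addPage(about);
        Link link = repo.link(home, about, true);
        try {
            repo.removePage(about);
        } catch (IllegalStateException e) {
            System.out.println("Refused: " + e.getMessage());
        }
        repo.removeLink(link);
        repo.removePage(about);
        System.out.println("About still present? " + repo.contains(about));
    }
}
```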

Notification: Objects both inside and outside the repository
may listen to changes occurring to repository objects. The repository engine dispatches
notifications as such changes take place.
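
In Java terms, notification is a listener arrangement. The sketch below assumes a hypothetical ChangeListener interface and NotifyingRepository class; it dispatches synchronously and in registration order, which a real engine need not do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Callback invoked by the engine whenever a repository object changes. */
interface ChangeListener {
    void changed(String objectId, String oldValue, String newValue);
}

final class NotifyingRepository {
    private final Map<String, String> objects = new HashMap<>();
    private final List<ChangeListener> listeners = new ArrayList<>();

    void addListener(ChangeListener listener) { listeners.add(listener); }

    void put(String id, String value) {
        String old = objects.put(id, value); // null when the object is new
        for (ChangeListener listener : listeners) {
            listener.changed(id, old, value);
        }
    }
}

final class NotificationSketch {
    public static void main(String[] args) {
        NotifyingRepository repo = new NotifyingRepository();
        repo.addListener((id, oldValue, newValue) ->
                System.out.println(id + ": " + oldValue + " -> " + newValue));
        repo.put("entry-1", "draft");
        repo.put("entry-1", "published");
    }
}
```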

Version management: Most applications today require versioned
data: Given a data item, an application must be able to access the current as well as all
past versions of that data item. Neither relational nor object databases provide standard,
out-of-the-box versioning, leaving versioning chores to each application accessing the data
store. By contrast, keeping track of versions, and making those versions available to
applications, is an important repository feature.
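
What "keeping track of versions" means for the application can be shown with a deliberately naive sketch. The VersionedStore class is hypothetical and simply keeps every saved state in a list, numbering versions from 1; real repositories typically offer richer version histories than this.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

final class VersionedStore {
    private final Map<String, List<String>> history = new HashMap<>();

    /** Saves a new version and returns its 1-based version number. */
    int save(String id, String content) {
        List<String> versions = history.computeIfAbsent(id, key -> new ArrayList<>());
        versions.add(content);
        return versions.size();
    }

    /** Returns the most recently saved version. */
    String current(String id) {
        List<String> versions = versionsOf(id);
        return versions.get(versions.size() - 1);
    }

    /** Returns a specific past version by its 1-based number. */
    String version(String id, int number) {
        return versionsOf(id).get(number - 1);
    }

    /** Returns a read-only view of all versions, oldest first. */
    List<String> versionsOf(String id) {
        List<String> versions = history.get(id);
        if (versions == null) throw new NoSuchElementException("No such item: " + id);
        return Collections.unmodifiableList(versions);
    }
}

final class VersioningSketch {
    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        store.save("entry-1", "First draft");
        store.save("entry-1", "Second draft");
        System.out.println("Current: " + store.current("entry-1"));
        System.out.println("Version 1: " + store.version("entry-1", 1));
    }
}
```

The point of the sketch is where the bookkeeping lives: the application calls save and asks for a version, and the store, not the application, remembers the history.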

Configuration management: Applications often need to keep track
of subsets of repository objects. For instance, a single repository might contain objects
belonging to several users or companies, or might comprise objects for several software
packages. Such repository object subsets are termed configurations or workspaces.
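
A workspace can be pictured as a named partition of the repository. The sketch below is an assumption-laden simplification: the WorkspaceRepository class is hypothetical, and it models a workspace as nothing more than a separate map, so the same object identifier can exist independently in two workspaces.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class WorkspaceRepository {
    // workspace name -> (object id -> content)
    private final Map<String, Map<String, String>> workspaces = new TreeMap<>();

    void put(String workspace, String id, String content) {
        workspaces.computeIfAbsent(workspace, name -> new TreeMap<>()).put(id, content);
    }

    /** Returns the object's content in the given workspace, or null if absent. */
    String get(String workspace, String id) {
        return workspaces.getOrDefault(workspace, Collections.emptyMap()).get(id);
    }

    /** Returns the ids of the objects that belong to one workspace. */
    Set<String> idsIn(String workspace) {
        return Collections.unmodifiableSet(
                workspaces.getOrDefault(workspace, Collections.emptyMap()).keySet());
    }
}

final class WorkspaceSketch {
    public static void main(String[] args) {
        WorkspaceRepository repo = new WorkspaceRepository();
        repo.put("acme", "logo", "Acme logo");
        repo.put("globex", "logo", "Globex logo");
        repo.put("acme", "home", "Acme home page");
        System.out.println("acme: " + repo.idsIn("acme"));
        System.out.println("globex logo: " + repo.get("globex", "logo"));
    }
}
```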

If your application can use any of the above features, then repositories might be for you.
There are dozens of repository products to choose from. For starters, database vendors
often ship a repository component as part of their high-end DBMS product (the
Microsoft Repository ships with SQL Server, for instance) [7]. IDE and software configuration
tool vendors also include repositories in their offerings. Version control systems, such as
CVS or Subversion, are specialized repositories [8]. In the near future, even file
systems will incorporate some repository features: examples include Sun's ZFS filesystem [9] and the WinFS filesystem that
will ship with Microsoft's Longhorn operating system [10]. Many open source and
commercial content management systems (CMS) are also based on repositories. And
now, there is Jackrabbit, an open-source content repository from the Apache
Incubator Project.