Target audience

This reference guide is for developers of JBoss DNA and for users who want a better understanding of how JBoss DNA
works or how to extend its functionality. For a higher-level introduction to JBoss DNA, see the Getting Started document.

Chapter 1. Introduction to JBoss DNA

JBoss DNA is a JCR implementation that provides access to content stored in many different kinds of systems.
A JBoss DNA repository isn't yet another silo of isolated information, but rather a JCR view of the information
you already have in your environment: file systems, databases, other repositories, services, applications, etc.

To your applications, JBoss DNA looks and behaves like a regular JCR repository. Using the standard JCR API,
applications can search, navigate, version, and listen for changes in the content. But under the covers, JBoss DNA
gets its content by federating multiple back-end systems (like databases, services, other repositories, etc.),
allowing those systems to continue "owning" the information while ensuring the unified repository stays up-to-date
and in sync.

Of course when you start providing a unified view of all this information, you start recognizing the need to store
more information, including metadata about and relationships between the existing content. JBoss DNA lets you do this, too.
And JBoss DNA even tries to help you discover more about the information you already have, especially the information
wrapped up in the kinds of files often found in enterprise systems: service definitions, policy files, images, media,
documents, presentations, application components, reusable libraries, configuration files, application installations,
database schemas, management scripts, and so on. As files are loaded into the repository, you can make JBoss DNA
automatically sequence these files to extract from their content meaningful information that can be stored in the repository,
where it can then be searched, accessed, and analyzed using the JCR API.

This document goes into detail about how JBoss DNA works to provide these capabilities. It also talks in detail
about many of the parts within JBoss DNA - what they do, how they work, and how you can extend or customize the
behavior. In particular, you'll learn about JBoss DNA connectors
and sequencers, how you can use the implementations included in JBoss DNA,
and how you can write your own to tailor JBoss DNA for your needs.

So whether you're a developer on the project or you're trying to learn the intricate details of
how JBoss DNA works, we hope this document serves as a good reference.

1.1. Use cases for JBoss DNA

JBoss DNA repositories can be used in a variety of applications. One of the more obvious use cases for a metadata repository
is in provisioning and management, where it's critical to understand and keep track of the metadata for models, databases, services,
components, applications, clusters, machines, and other systems used in an enterprise. Governance takes that a step
further, by also tracking the policies and expectations against which the performance of the systems described by the repository can be verified.
In these cases, a repository is an excellent mechanism for managing this complex and highly-varied information.

But these large and complex use cases aren't the only way to use a JBoss DNA repository. You could use an embedded JBoss DNA repository
to manage configuration information for an application, or you could use JBoss DNA simply to provide a JCR interface on top of a few non-JCR systems.

The point is that JBoss DNA can be used in many different ways, ranging from the very tiny embedded repository to a large and distributed
enterprise-grade repository. The choice is yours.

1.2. What is metadata?

Before we dive into more detail about JBoss DNA and metadata repositories, it's probably useful to explain what we
mean by the term "metadata." Simply put, metadata is the information you need to manage something.
For example, it's the information needed to configure an operating system, or the description of the information in an LDAP tree,
or the topology of your network. It's the configuration of an application server or enterprise service bus.
It's the steps involved in validating an application before it can go into production. It's the description of your
database schemas, or of your services, or of the messages going in and coming out of a service. JBoss DNA is
designed to be a repository for all this (and more).

There are a couple of important things to understand about metadata. First, many systems manage (and frequently change) their own metadata and information.
Databases, applications, file systems, source code management systems, services, content management systems, and even other repositories
are just a few types of systems that do this. We can't pull the information out and duplicate it, because
then we risk having multiple copies that are out-of-sync. Ideally, we could access all of this information through a homogeneous API
that also provides navigation, caching, versioning, search, and notification of changes. That would make our lives significantly easier.

What we want is federation.
We can connect to these back-end systems to dynamically access the content and project it into a single, unified
repository. We can also cache it for faster access, as long as the cache can be invalidated based upon time or event.
But we also need to maintain a clear picture of where all the bits come from, so users can be sure they're looking
at the right information. And we need to make it as easy as possible to write new connectors, since there are
a lot of systems out there that have information we want to federate.

The second important characteristic of metadata is that a lot of it is represented as files, and there are
a lot of different file formats. These include source code, configuration files, web pages, database schemas,
XML schemas, service definitions, policies, documents, spreadsheets, presentations, images, audio files, workflow
definitions, business rules, and on and on. And logically if files contain metadata, we want to add those files
to our metadata repository. The problem is, all that metadata is tied up as blobs in the repository.
Ideally, our repository would automatically extract from those files the content that's most useful to us,
and place that content inside the repository where it can be much more easily used, searched, related, and analyzed.
JBoss DNA does exactly this via a process we call sequencing,
and it's an important part of a metadata repository.

The third important characteristic of metadata is that it rarely stays the same. Different consumers of the
information need to see different views of it. Metadata about two similar systems is not always the same.
The metadata often needs to be tagged or annotated with additional information. And the things being
described often change over time, meaning the metadata has to change, too. As a result, the way in which
we store and manage the metadata has to be flexible and able to adapt to our ever-changing needs, and the object model
we use to interact with the repository must accommodate these needs. The graph-based nature of the JCR API provides this
flexibility while also giving us the ability to constrain information when it needs to be constrained.

1.3. What is JCR?

There are a lot of choices for how applications can store information persistently so that it can be accessed at a
later time and by other processes. The challenge developers face is how to use an approach that most closely matches the
needs of their application. This choice becomes more important as developers choose to focus their efforts on
application-specific logic, delegating much of the responsibilities for persistence to libraries and frameworks.

Perhaps one of the easiest techniques is to simply store information in
files
. The Java language makes working with files relatively easy, but Java really doesn't provide many bells and whistles. So
using files is an easy choice when the information is either not complicated (for example property files), or when users may
need to read or change the information outside of the application (for example log files or configuration files). But using
files to persist information becomes more difficult as the information becomes more complex, as the volume of it increases,
or if it needs to be accessed by multiple processes. For these situations, other techniques often have more benefits.

Another technique built into the Java language is
Java serialization
, which is capable of persisting the state of an object graph so that it can be read back in at a later time. However, Java
serialization can quickly become tricky if the classes are changed, so it is usually best suited to information that is
persisted for only a very short period of time. For example, serialization is sometimes used to send an object graph from one
process to another. Using serialization for longer-term storage of information is riskier.

One of the more popular and widely-used persistence technologies is the relational database.
Relational database management systems have been around for decades and are very capable. The Java Database Connectivity
(JDBC) API provides a standard interface for connecting to and interacting with relational databases. However, it is a
low-level API that requires a lot of code to use correctly, and it still doesn't abstract away the DBMS-specific SQL
grammar. Also, working with relational data in an object-oriented language can feel somewhat unnatural, so many developers
map this data to classes that fit much more cleanly into their application. The problem is that manually creating this
mapping layer requires a lot of repetitive and non-trivial JDBC code.

Object-relational mapping
libraries automate the creation of this mapping layer and result in far less code that is much more maintainable with
performance that is often as good as (if not better than) handwritten JDBC code. The new
Java Persistence API (JPA)
provides a standard mechanism for defining the mappings (through annotations) and working with these entity objects. Several
commercial and open-source libraries implement JPA, and some even offer additional capabilities and features that go beyond
JPA. For example, Hibernate is one of the most feature-rich JPA implementations
and offers object caching, statement caching, extra association
mappings, and other features that help to improve performance and usefulness. Plus, Hibernate is open-source (with support
offered by JBoss).

While relational databases and JPA work well for many applications, they are more limited when the
information structure is highly flexible, is not known a priori, or is subject to frequent change and customization.
In these situations, content repositories may offer a better choice for persistence. Content repositories are almost
a hybrid, combining the storage capabilities of relational databases with the flexibility offered by simpler systems,
such as files. Content repositories also
typically provide other capabilities as well, including versioning, indexing, search, access control,
transactions, and observation. Because of this, content repositories are used by content management systems (CMS), document
management systems (DMS), and other applications that manage electronic files (e.g., documents, images, multi-media, web
content, etc.) and metadata associated with them (e.g., author, date, status, security information, etc.). The
Content Repository for Java technology API
provides a standard Java API for working with content repositories. Abbreviated "JCR", this API was developed as part of the
Java Community Process under JSR-170
and is being revised under JSR-283.

The JCR API provides a number of information services that are needed by many applications,
including: read and write access to information; the ability to structure information in a hierarchical and flexible manner that can adapt
and evolve over time; ability to work with unstructured content; ability to (transparently) handle large strings;
notifications of changes in the information; search and query; versioning of information; access control; integrity constraints;
participation within distributed transactions; explicit locking of content; and of course persistence.

Figure 1.1. JCR API features

1.4. Project roadmap

The roadmap for JBoss DNA is managed in the project's
JIRA instance
. The roadmap shows the different tasks, requirements, issues and other activities that have been targeted to each of the
upcoming releases. (The
roadmap report
always shows the next three releases.)

By convention, the JBoss DNA project team periodically reviews JIRA issues that aren't targeted to a release, and then schedules
them based upon current workload, severity, and the roadmap. And if we review an issue and don't know how to target it,
we target it to the
Future Releases
bucket.

At the start of a release, the project team reviews the roadmap, identifies the goals for the release, and targets (or retargets)
the issues appropriately.

1.5. Development methodology

Rather than use a single formal development methodology, the JBoss DNA project incorporates those techniques, activities, and
processes that are practical and work for the project. In fact, the committers are given a lot of freedom for how they develop
the components and features they work on.

Nevertheless, we do encourage familiarity with several major techniques, including:

Agile software development
includes those software methodologies (e.g., Scrum) that promote development iterations and open collaboration. While the
JBoss DNA project doesn't follow these closely, we do emphasize the importance of always having running software
and using running software as a measure of progress. The JBoss DNA project also wants to move to more frequent
releases (on the order of 4-6 weeks).

Test-driven development (TDD)
techniques encourage first writing test cases for new features and functionality, then changing the code to add the
new features and functionality, and finally refactoring the code to clean up and address any duplication or inconsistencies.

Behavior-driven development (BDD)
is an evolution of TDD, where developers specify the desired behaviors first (rather than writing "tests").
In practice, BDD adopts the language of the user so that tests are written using words that are meaningful
to users. With recent test frameworks (like JUnit 4.4), we're able to write our unit tests to express
the desired behavior. For example, a test class for a sequencer implementation might have a test method
shouldNotThrowAnErrorWhenStreamIsNull(), whose intent is very easy to understand.
The result tends to be a larger number of finer-grained test methods, but ones that are more easily understood
and easier to write. In fact, many advocates of BDD argue that one of the biggest challenges of TDD is knowing what
tests to write in the beginning, whereas with BDD the shift in focus and terminology makes it easier for
developers to enumerate the tests they need.
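As an illustration of this naming style, here is a self-contained sketch; the sequence() method is entirely hypothetical and stands in for a real sequencer, and in the actual project the behavior method would be a JUnit @Test using assertThat(...):

```java
import java.io.InputStream;
import java.util.Collections;
import java.util.List;

public class BddNamingExample {
    // A hypothetical sequencer method: a null stream yields no output rather than an error
    static List<String> sequence(InputStream stream) {
        if (stream == null) return Collections.emptyList();
        return Collections.singletonList("extracted content");
    }

    // The method name states the expected behavior in user-meaningful words,
    // which is the shift in terminology that BDD encourages
    static void shouldNotThrowAnErrorWhenStreamIsNull() {
        List<String> output = sequence(null);
        if (!output.isEmpty()) throw new AssertionError("a null stream should yield no output");
    }

    public static void main(String[] args) {
        shouldNotThrowAnErrorWhenStreamIsNull();
        System.out.println("behavior verified"); // prints "behavior verified"
    }
}
```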

Lean software development
is an adaptation of lean manufacturing techniques,
where emphasis is placed on eliminating waste (e.g., defects, unnecessary complexity, unnecessary code/functionality/features),
delivering as fast as possible, deferring irrevocable decisions as much as possible,
continuous learning (continuously adapting and improving the process), empowering the team (or community, in our case),
and several other guidelines. Lean software development can be thought of as an evolution of agile techniques
in the same way that behavior-driven development is an evolution of test-driven development. Lean techniques
help the developer to recognize and understand how and why features, bugs, and even their processes impact the development
of software.

1.6. JBoss DNA modules

JBoss DNA consists of the following modules:

dna-jcr
contains JBoss DNA's implementation of the JCR API. If you're using JBoss DNA as a JCR repository, this is the
top-level dependency that you'll want to use. The module defines all required dependencies, except for
the repository connector(s) and any sequencer implementations needed by your configuration.
As we'll see later on, using JBoss DNA as a JCR repository is easy: simply create a configuration, start the JCR engine,
get the JCR Repository object for your repository, and then use the JCR API.
This module also contains the Jackrabbit JCR API unit tests that verify the behavior of the JBoss DNA implementation.
As DNA does not yet fully implement the JCR 1.0.1 specification, a number of these tests are currently commented
out in this module. The dna-jcr-tck module contains all of these tests.

dna-repository
provides the core DNA graph engine and services for managing repository connections, sequencers, MIME type detectors,
and observation. If you're using JBoss DNA repositories via our graph API rather than JCR, then this is where you'd start.

dna-graph
defines the Application Programming Interface (API) for JBoss DNA's low-level graph model,
including a DSL-like API for working with graph content. This module also defines the
APIs necessary to implement custom connectors, sequencers, and MIME type detectors.

dna-common
is a small low-level library of common utilities and frameworks, including logging, progress monitoring,
internationalization/localization, text translators, component management, and class loader factories.

There are several modules that provide system- and integration-level tests:

dna-jcr-tck
provides a separate testing project that executes all Jackrabbit JCR TCK tests on a nightly basis to track implementation
progress against the JCR 1.0 specification. This module will likely be retired when the dna-jcr implementation
is complete.

dna-integration-tests
provides a home for all of the integration tests, which involve more components than unit tests do. Integration
tests are often more complicated, take longer, and test the integration and functionality of multiple
components (whereas unit tests focus on testing a single class or component and may use stubs or mock objects
to isolate the code being tested from other related components).

The following modules are optional extensions that may be used selectively and as needed (and are located in the source
under the
extensions/
directory):

dna-classloader-maven
is a small library that provides a
ClassLoaderFactory
implementation that can create
java.lang.ClassLoader
instances capable of loading classes given a Maven Repository and a list of Maven coordinates. The Maven Repository
can be managed within a JCR repository.

dna-connector-federation
is a DNA repository connector that federates, integrates, and caches information from multiple sources (via other
repository connectors).

dna-connector-filesystem
is a DNA repository connector that provides read-only access to file systems, allowing their structure and data to be
viewed as repository content.

dna-connector-jdbc-metadata
is a prototype DNA repository connector that provides read-only access to metadata from relational databases through a JDBC
connection.
This is still under development.

dna-connector-jbosscache
is a DNA repository connector that manages content within a
JBoss Cache
instance. JBoss Cache is a powerful cache implementation that can serve as a distributed cache and that can persist
information. The cache instance can be found via JNDI or created and managed by the connector.

dna-connector-store-jpa
is a DNA repository connector that provides for persistent storage and access of DNA content in a relational database. This connector
is based on JPA technology.

dna-connector-svn
is a prototype DNA repository connector that obtains content from a Subversion repository, providing that content in
the form of nt:file and nt:folder nodes.

dna-sequencer-zip
is a DNA sequencer that extracts from ZIP archives the files (with content) and folders.
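The JDK's java.util.zip package shows the kind of structural information such a sequencer works with. This is an illustrative sketch (not the sequencer's actual code) that builds a tiny archive in memory and lists its folder and file entries:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipStructureExample {
    // List the entries of a ZIP archive: the folder/file structure a ZIP sequencer extracts
    static List<String> entryNames(byte[] zipBytes) throws IOException {
        List<String> names = new ArrayList<String>();
        ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes));
        for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
            names.add(entry.getName());
        }
        zip.close();
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny archive in memory: one folder and one file with content
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ZipOutputStream zip = new ZipOutputStream(bytes);
        zip.putNextEntry(new ZipEntry("docs/"));
        zip.closeEntry();
        zip.putNextEntry(new ZipEntry("docs/readme.txt"));
        zip.write("hello".getBytes("UTF-8"));
        zip.closeEntry();
        zip.close();

        System.out.println(entryNames(bytes.toByteArray())); // prints "[docs/, docs/readme.txt]"
    }
}
```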

dna-sequencer-xml
is a DNA sequencer that extracts the structure and content from XML files.
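As a rough illustration of what "structure and content" means here, the sketch below uses the JDK's own DOM parser to read a small invented document; the root element, its children, and their attributes are the pieces a sequencer would project into repository nodes and properties:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XmlStructureExample {
    // Parse an XML document and return its root element, the starting point for
    // walking the document's structure
    static Element rootOf(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element root = rootOf("<schema name='example'><table name='users'/></schema>");
        Element child = (Element) root.getFirstChild();
        System.out.println(root.getTagName() + " -> " + child.getTagName()
                + " (" + child.getAttribute("name") + ")");
        // prints "schema -> table (users)"
    }
}
```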

dna-sequencer-java
is a DNA sequencer that extracts the package, class/type, member, documentation, annotations, and other information
from Java source files.

dna-sequencer-msoffice
is a DNA sequencer that extracts metadata and summary information from
Microsoft Office
documents. For example, the sequencer extracts from a PowerPoint presentation the outline as well as thumbnails of
each slide. Microsoft Word and Excel files are also supported.

dna-sequencer-jbpm-jpdl
is a prototype DNA sequencer that extracts process definition metadata from jBPM process definition language (jPDL) files.
This is still under development.

dna-mimetype-detector-aperture
is a DNA MIME type detector that uses the
Aperture
library to determine the best MIME type from the filename and file contents.
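For comparison, the JDK itself offers simple filename-based detection; Aperture goes further by also examining the file contents. An illustrative sketch of the filename-only approach:

```java
import java.net.URLConnection;

public class MimeTypeExample {
    public static void main(String[] args) {
        // Detection here is based only on the file extension; a content-based
        // detector would also inspect the leading bytes of the file
        System.out.println(URLConnection.guessContentTypeFromName("report.html")); // prints "text/html"
        System.out.println(URLConnection.guessContentTypeFromName("logo.png"));    // prints "image/png"
    }
}
```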

dna-web-jcr-rest
provides a set of JSR-311 (JAX-RS) objects that form the basis of a RESTful server for Java Content Repositories. This project
provides integration with DNA's JCR implementation (of course) but also contains a service provider interface (SPI) that can be
used to integrate other JCR implementations with these RESTful services in the future. For ease of packaging, these classes are
provided as a JAR that can be placed in the WEB-INF/lib of a deployed RESTful server WAR.

dna-web-jcr-rest-war
wraps the RESTful services from the dna-web-jcr-rest JAR into a WAR and provides in-container integration tests. This project
can be consulted as a template for how to deploy the RESTful services in a custom implementation.

There are also documentation modules (located in the source under the
docs/
directory):

docs-getting-started
is the project with the
DocBook
source for the JBoss DNA Getting Started document.

docs-getting-started-examples
is the project with the Java source for the example application used in the JBoss DNA Getting Started document.

docs-reference-guide
is the project with the
DocBook
source for this document, the JBoss DNA Reference Guide document.

Finally, there is a module that represents the whole JBoss DNA project:

dna
is the parent project that aggregates all of the other projects and that contains some asset files to create the
necessary Maven artifacts during a build.

Each of these modules is a Maven project with a group ID of
org.jboss.dna
. All of these projects correspond to artifacts in the
JBoss Maven 2 Repository
.

Part I. Developers and Contributors

The JBoss DNA project uses a number of processes, tools, and procedures to assist in the development of
the software. This portion of the document focuses on these aspects and will help developers and contributors
obtain the source code, build locally, and contribute to the project.

If you're not contributing to the project but are still developing custom connectors or sequencers,
this information may be helpful in establishing your own environment.

The JBoss DNA project uses Maven as its primary build tool, Subversion
for its source code repository, JIRA for the issue management and bug tracking system,
and Hudson for the continuous integration system. We do not stipulate a specific integrated
development environment (IDE), although most of us use Eclipse and rely upon the code formatting
and compile preferences to ensure no warnings or errors.

The rest of this chapter talks in more detail about these different tools and how to set them up.

2.1. JDK

Currently, JBoss DNA is developed and built using JDK 5.
So if you're trying to get JBoss DNA to compile locally, you should make sure you have JDK 5 installed and are using it.
If you're a contributor, you should make sure that you're using JDK 5 before committing any changes.

Note

You should be able to use the latest JDK,
which is currently JDK 6. It is possible to build JBoss DNA using JDK 6 without any code changes, but it's
not our official JDK (yet).

Why do we build using JDK 5 and not 6? The main reason is that if we were to use JDK 6, then JBoss DNA couldn't really be used in any
applications or projects that still used JDK 5. Plus, anybody using JDK 6 can still use JBoss DNA.
However, considering that the end-of-life for Java 5 is
October 2009, we may be switching to
Java 6 sometime in 2009.

When installing a JDK, simply follow the procedure for your particular platform. On most platforms, this should set the
JAVA_HOME environment variable. But if you run into any problems, first check that this environment
variable was set to the correct location, and then check that you're running the version you expect by running
the following command:

$ java -version

If you don't see the correct version, double-check your JDK installation.

2.2. JIRA

JBoss DNA uses JIRA as its bug tracking, issue tracking, and project management tool.
This is a browser-based tool, with very good functionality for managing the different tasks. It also serves as
the community's roadmap, since we can define new features and manage them alongside the bugs and other issues.
Although most of the issues have been created by community members, we encourage any users to suggest new features,
log defects, or identify shortcomings in JBoss DNA.

The JBoss DNA community also encourages its members to work only on issues that are managed in JIRA, and preferably those
that are targeted to the current release effort. If something isn't in JIRA but needs to get done, then create an
issue before you start working on the code changes. Once you have code changes, you can upload a patch to the JIRA issue
if the change is complex, if you want someone to review it, or if you don't have commit privileges and have fixed
a bug.

2.3. Subversion

JBoss DNA uses Subversion as its source code management system, and specifically the instance at
JBoss.org. Although you can view the
trunk of the Subversion repository directly
(or using FishEye) through your browser,
in order to get more than just a few files of the latest version of the source code, you probably want
to have an SVN client installed. Several IDEs have SVN support included (or available as plugins),
but having the command-line SVN client is recommended. See
http://subversion.tigris.org/ for downloads and instructions for your
particular platform.

When committing to SVN, be sure to include a commit comment that identifies the JIRA issue the commit applies to and gives a very
good and thorough description of what was done. It only takes a minute or two to be very clear about the change. And including
the JIRA issue key (e.g., "DNA-123") in the comment allows the JIRA system to track the changes that have been made for each issue.

Also, any single SVN commit should apply to one and only one JIRA issue. Doing this helps ensure that each commit is atomic
and focused on a single activity. There are exceptions to this rule, but they are rare.

Sometimes you may have some local changes that you don't want to (or aren't allowed to) commit. You can make a patch file
and upload it to the JIRA issue, allowing other committers to review the patch. However, to ensure that patches are easily
applied, please use SVN to create the patch. To do this, simply do the following in the top of the codebase (e.g., the
trunk directory):

$ svn diff . > ~/DNA-000.patch

where DNA-000 represents the DNA issue number. Note that the above command places the patch file in your home directory,
but you can place the patch file anywhere. Then, simply use JIRA to attach the patch file to the particular issue, also adding
a comment that describes the version number against which the patch was created.

To apply a patch, you usually want to start with a workspace that has no changes. Download the patch file, then issue the
following command (again, from the top-level of the workspace):

$ patch -E -p0 < ~/DNA-000.patch

The "-E" option specifies to delete any files that were made empty by the application of the patch, and the "-p0" option instructs
the patch tool to not change any of the paths. After you run this command, your working area should have the changes defined
by the patch.

2.4. Git

Several contributors are using Git on their local development machines. This allows
the developer to use Git branches, commits, merges, and other Git tools, while still using
the JBoss DNA Subversion repository. For more information, see our
blog posts on the topic.

2.5. Maven

JBoss DNA uses Maven 2 for its build system. Using Maven 2 has several advantages, including
the ability to manage dependencies. If a library is needed, Maven automatically finds and downloads that library, plus
everything that library needs. This means that it's very easy to build the examples - or even create a Maven project that
depends on the JBoss DNA JARs.

To use Maven with JBoss DNA, you'll need to have JDK 5 or 6 and Maven 2.0.9 (or higher).

Maven can be downloaded from http://maven.apache.org/, and is installed by unzipping the
maven-2.0.9-bin.zip file to a convenient location on your local disk. Simply add $MAVEN_HOME/bin
to your path and add the following profile to your ~/.m2/settings.xml file:
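A profile along the following lines declares the two repositories; the repository IDs and URLs shown here are illustrative and should be checked against the JBoss.org wiki:

```xml
<profile>
  <id>jboss.repository</id>
  <activation>
    <activeByDefault>true</activeByDefault>
  </activation>
  <repositories>
    <repository>
      <id>jboss-releases</id>
      <url>http://repository.jboss.org/maven2</url>
    </repository>
    <repository>
      <id>jboss-snapshots</id>
      <url>http://snapshots.jboss.org/maven2</url>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
    </repository>
  </repositories>
</profile>
```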

This profile informs Maven of the two JBoss repositories (snapshots
and releases) that contain all of the JARs for JBoss DNA and all dependent libraries.

While you're adding $MAVEN_HOME/bin to your path, you should also set the $MAVEN_OPTS environment variable
to "-Xmx256m". If you don't do this, you'll likely see a java.lang.OutOfMemoryError sometime during a full
build.

Note

The JBoss Maven repository provides a central location not only for the artifacts produced by the JBoss.org projects (well, at least those
that use Maven), but also for the artifacts that those projects depend on. JBoss DNA has a policy that
the source code and JARs for all dependencies must be loaded into the
JBoss Maven repository. It may be a little bit more work for the developers, but it does help ensure that developers have easy
access to the source and that the project (and dependencies) can always be rebuilt when needed.

For more information about the JBoss Maven repository, including instructions for adding source and JAR artifacts,
see the JBoss.org Wiki.

There are just a few commands that are useful for building JBoss DNA (and its subprojects).
Usually, these are issued while at the top level of the code (usually just below trunk/), although issuing
them inside a subproject just applies to that subproject.

Table 2.2. Useful Maven commands

Command

Description

mvn clean

Clean up all built artifacts (e.g., the target/ directory in each project)

mvn clean install

Clean up all built artifacts, then compile, run the unit tests, and install the resulting JAR artifact(s)
into your local Maven repository (e.g., usually ~/.m2/repository).

2.6. Continuous integration with Hudson

JBoss DNA's continuous integration is done with several Hudson jobs on JBoss.org.
These jobs run periodically and basically run the Maven build process. Any build failures or test failures are reported,
as are basic statistics and history for each job.

One build runs every night (usually around 2 a.m. EDT), regardless of whether changes have been committed to SVN
since the previous night.

2.7. Eclipse IDE

Many of the JBoss DNA committers use the Eclipse IDE, and all project files required by Eclipse are committed in SVN, making
it pretty easy to get an Eclipse workspace running with all of the JBoss DNA projects.

We're using the latest released version of Eclipse (3.4, called "Ganymede"),
available from Eclipse.org. Simply follow the instructions for your platform.

After Eclipse is installed, create a new workspace. Before importing the JBoss DNA projects, import (via
File->Import->Preferences) a subset of the Eclipse preferences from the
eclipse-preferences.epf file (located under trunk/). Then, open the Eclipse preferences,
go to the Java->Code Style->Formatter preference page, press the "Import" button, and
choose the eclipse-code-formatter-profile.xml file (also located under trunk/). This will load the code
formatting preferences for the JBoss DNA project.

Then install Eclipse plugins for SVN and Maven. (Remember, you will have to restart Eclipse after installing them.)
We use the following plugins:

After you check out the JBoss DNA codebase, you can import the JBoss DNA Maven projects into Eclipse as Eclipse projects.
To do this, go to "File->Import->Existing Projects", navigate to the trunk/ folder in the import wizard,
and then check each of the subprojects that you want to have in your workspace.
Don't forget about the projects under extensions/ or docs/.

2.8. Releasing

This section outlines the basic process of releasing JBoss DNA. This must be done
by the project lead, or only after communicating with the project lead.

Before continuing, your local workspace should contain no changes and should be a perfect reflection of Subversion.
You can verify this by getting the latest from Subversion

$ svn update

and ensuring that you have no additional changes with

$ svn status

You may also want to note the revision number for use later in the process. The revision number is returned by
the svn update command, but may also be found using

$ svn info

At this point, you're ready to verify that everything builds normally.

2.8.1. Building all artifacts and assemblies

By default, the project's Maven build process does not build the documentation, JavaDocs, or assemblies.
These take extra time, and most of our builds don't require them. So the first step of releasing JBoss DNA
is to use Maven to build all of the regular artifacts (e.g., JARs) as well as these extra documents and assemblies.

Note

Before running Maven commands to build the releases, increase the memory available to Maven with this command:
$ export MAVEN_OPTS=-Xmx256m

To perform this complete build, issue the following command while in the top-level trunk/ directory:

$ mvn -P assembly clean javadoc:javadoc install

This command runs the "clean", "javadoc:javadoc", and "install" goals using the "assembly" profile,
which adds the production of JavaDocs, the Getting Started document, the Reference Guide document,
the Getting Started examples, and several ZIP archives. The order of the goals is important,
since the "install" goal attempts to include the JavaDoc in the archives.

After this build has completed, verify that the assemblies under target/ have actually been created and that
they contain the correct information.
At this point, we know that the actual Maven build process is building
everything we want and will complete without errors. We can now proceed with preparing for the release.

2.8.2. Determine the version to be released

The version being released should match the JIRA road map. Make sure that all issues related to the release are closed.
The project lead should be notified and approve that the release is taking place.

2.8.3. Release dry run

The next step is to ensure that all information in the POM is correct and contains all the information required for
the release process. This is called a dry run, and is done with the Maven "release" plugin:

$ mvn -Passembly release:prepare -DdryRun=true

This may download a lot of Maven plugins if they haven't already been downloaded, but it will eventually prompt you for
the release version of each of the Maven projects, the tag name for the release, and the next development versions
(again, for each of the Maven projects). The default values are probably acceptable; if not, check that the
"<version>" tags in each of the POM files are correct and end with "-SNAPSHOT".

After the dry run completes, you should clean up the files that the release plugin created during the dry run:

$ mvn -Passembly release:clean

2.8.4. Prepare for the release

Run the prepare step (without the dryRun option):

$ mvn -Passembly release:prepare

You will again be prompted for the release versions and tag name. These should be the same as what was used during the dry run.
This will run the same steps as the dry run, with the additional step of tagging the release in SVN.

If there are any problems during this step, you should go back and try the dry run option. But after this runs successfully,
the release will be tagged in SVN, and the pom.xml files in SVN under /trunk will have the
next version in the "<version>" values.
However, the artifacts for the release are not yet published. That's the next step.

2.8.5. Perform the release

At this point, the release's artifacts need to be published to the JBoss Maven repository. This next command checks out the
files from the release tag created earlier (into a trunk/target/checkout directory), runs a build, and then
deploys the generated artifacts. Note that this ensures that the artifacts are built from the tagged code.

$ mvn release:perform -DuseReleaseProfile=false

Note

If during this process you get an error finding the released artifacts in your local Maven repository, you may
need to go into the trunk/target/checkout folder and run $ mvn install. This is a simple
workaround to make the artifacts available locally. Another option to try is adding -Dgoals=install,assembly
to the $ mvn release:perform... command above.

The artifacts are deployed to the local file system, into a local checkout of the JBoss Maven2 repository
at a location specified by a combination of the <distributionManagement> section of several pom.xml
files and your personal settings.xml file. Once this Maven command completes, you will need to
commit the newly deployed files. For more information, see the
JBoss wiki.

At this point, the software has been released and tagged, and it's been deployed to a local checked-out copy of the
JBoss DNA Maven 2 repository (via the "<distributionManagement>" section of the pom.xml files). Those files need to be
committed into the Maven 2 repository using SVN. And finally, the last step is to publish the release onto
the project's downloads and documentation pages.

The assemblies of the source, binaries, etc. also need to be published to the http://www.jboss.org/dna/downloads.html area of
the project page. This process is expected to change as JBoss.org
improves its infrastructure.

2.9. Summary

In this chapter, we described the various aspects of developing code for the JBoss DNA project. Before we start talking
about some of the details of JBoss DNA repositories, connectors, and sequencers, we'll first cover another
ubiquitous topic: how JBoss DNA is tested.
This is the topic of the next chapter.

Chapter 3. Testing

The JBoss DNA project uses automated testing to verify that the software is doing what it's supposed to
and not doing what it shouldn't do. These automated tests are run continuously and also act as regression tests,
ensuring that we know if any problems we find and fix reappear later. All of our tests are executed as part of
our Maven build process, and the entire build process (including the tests)
is automatically run using the Hudson continuous integration system.

3.1. Unit tests

Unit tests verify the behavior of a single class (or small set of classes) in isolation
from other classes.
We use the JUnit 4.4 testing framework, which has significant improvements over earlier versions and makes
it very easy to quickly write unit tests with little extra code. We also frequently use the Mockito library
to help create mock implementations of other classes that are not under test but are used in the tests.

Unit tests should generally run quickly and should not require large assemblies of components. Additionally,
they may rely upon the file resources included in the project, but these tests should require no external resources
(like databases or servers). Note that our unit tests are run during the "test" phase of the standard
Maven lifecycle.
This means that they are executed against the raw .class files created during compilation.

Developers are expected to run all of the JBoss DNA unit tests in their local environment before committing changes to SVN.
So, if you're a developer and you've made changes to your local copy of the source, you can run those tests that are
related to your changes using your IDE or with Maven (or any other mechanism). But before you commit your changes,
you are expected to run a full Maven build using mvn clean install (in the "trunk/" directory).
Please do not rely upon continuous integration to run all of the tests for you - the CI
system is there to catch the occasional mistakes and to also run the integration tests.

3.2. Integration tests

While unit tests test individual classes in (relative) isolation, the purpose of
integration tests is to verify that assemblies of classes and components
behave correctly. These assemblies are often the same ones that end users will actually use. In fact,
integration tests are executed during the "integration-test" phase of the standard
Maven lifecycle,
meaning they are executed against the packaged JARs and artifacts of the project.

Integration tests also use the JUnit 4.4 framework, so they are again easy to write and follow the same pattern
as unit tests. However, because they're working with larger assemblies of components, they often will take longer
to set up, longer to run, and longer to tear down. They also may require initializing "external resources", like
databases or servers.

Note that while external resources may be required, care should be taken to minimize these dependencies and to
ensure that most (if not all) integration tests may be run by anyone who downloads the source code. This means
that these external resources should be set up by the tests themselves. For example, use in-memory databases
where possible. Or, if a database is required, use an open-source database (e.g., MySQL or PostgreSQL). And when
these external resources are not available, it should be obvious from the test class names and/or test method names
that they involve an external resource (e.g., "MySqlConnectorIntegrationTest.shouldFindNodeStoredInDatabase()").

3.3. Writing tests

As mentioned in the introduction, the JBoss DNA project doesn't follow any one methodology
or process. Instead, we simply have a goal that as much code as possible is tested to ensure it behaves as expected.
Do we expect 100% of the code is covered by automated tests? No, but we do want to test as much as we can.
Maybe a simple JavaBean class doesn't need many tests, but any class with non-trivial logic should be tested.

We do encourage writing tests either before or while you write the code. Again, we're not blindly following a methodology.
Instead, there's a very practical reason: writing the tests early on helps you write classes that are testable.
If you wait until after the class (or classes) are done, you'll probably find that it's not easy to test all
of the logic (especially the complicated logic).

Another suggestion is to write tests so that they specify and verify the behavior that is expected from a class or component.
One challenge developers often have is knowing what they should even test and what the tests should look like.
This is where Behavior-driven development (BDD)
helps out. If you think about what a class' behaviors are supposed to be (e.g., requirements), simply capture those
requirements as test methods (with no implementations). For example, a test class for a sequencer
implementation might have a test method shouldNotThrowAnErrorWhenTheSuppliedStreamIsNull() { }. Then, after you enumerate
all the requirements you can think of, go back and start implementing the test methods.
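Following this approach, such a requirements-first test class might start out as a set of named but unimplemented test methods. Here is a sketch using JUnit 4.4; the sequencer class and test names are illustrative, not taken from the DNA codebase:

```java
import org.junit.Before;
import org.junit.Test;

public class ExampleSequencerTest {

    private ExampleSequencer sequencer; // hypothetical class under test

    @Before
    public void beforeEach() {
        sequencer = new ExampleSequencer();
    }

    // Each method name captures one requirement. Enumerate them all first,
    // then go back and fill in the implementations.
    @Test
    public void shouldNotThrowAnErrorWhenTheSuppliedStreamIsNull() {
        sequencer.sequence(null); // expect a graceful no-op, not an exception
    }

    @Test
    public void shouldExtractMetadataFromAWellFormedStream() {
        // to be implemented
    }
}
```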

If you look at the existing test cases, you'll find that the names of the unit and integration tests in JBoss DNA
follow a naming style, where the test method names are readable sentences. Actually, we try to name the test methods
and the test classes such that they form a concisely-worded requirement. For example,

InMemorySequencer should not throw an error when the supplied stream is null.

In fact, at some point in the future, we'd like to process the source to automatically generate a list of the behavior specifications
that are asserted by the tests.

But for now, we write tests - a lot of them. And by following a few simple conventions and practices, we're able
to do it quickly and in a way that makes it easy to understand what the code is supposed to do (or not do).

3.4. Technology Compatibility Kit (TCK) tests

Many Java specifications provide TCK test suites that can be used to check or verify that an implementation
correctly implements the API or SPI defined by the specification. These TCK tests vary by technology, but
JSR-170 does provide TCK tests that ensure that a JCR repository implementation exhibits the correct and expected
behavior.

JBoss DNA has not yet passed enough of the TCK tests to publish the results. We still have to implement
queries, which is a required feature of Level 1 repositories. However, suffice it to say that JBoss DNA has passed
many of the individual tests that make up the Level 1 and Level 2 tests, and it is a major objective of the next
release to pass the remaining Level 1 and Level 2 tests (along with some other optional features).

JBoss DNA also frequently runs the JCR unit tests from the Apache Jackrabbit project. (Though these tests are not
the official TCK, they apparently are used within the official TCK.) These unit tests are set up in the
dna-jcr-tck project.

Part II. JBoss DNA Core

The JBoss DNA project organizes the codebase into a number of subprojects. The most fundamental are the
core libraries, including the graph API, connector framework, and sequencing framework,
as well as the configuration and engine in which all the components run. These are all topics covered
in this part of the document.

The JBoss DNA implementation of the JCR API as well as some other
JCR-related components are covered in the next part.

Chapter 4. Execution Context

The various components of JBoss DNA are designed as plain old Java objects, or POJOs. And rather than making assumptions
about their environment, each component instead requires that any external dependencies necessary for it to operate
must be supplied to it. This pattern is known as Dependency Injection, and it allows the components to be simpler
and allows for a great deal of flexibility and customization in how the components are configured.

The approach that JBoss DNA takes is straightforward: a single POJO represents everything about the environment
in which components operate. Called ExecutionContext, it contains references to most of the essential
facilities, including: security (authentication and authorization); namespace registry; name factories; factories
for properties and property values; logging; and access to class loaders (given a classpath).
Most of the JBoss DNA components require an ExecutionContext and thus have access to all these facilities.

The ExecutionContext is a concrete class that is instantiated with the no-argument constructor:

public class ExecutionContext implements ClassLoaderFactory {
/**
* Create an instance of an execution context, with default implementations for all components.
*/
public ExecutionContext() { ... }
/**
* Get the factories that should be used to create values for {@link Property properties}.
* @return the property value factory; never null
*/
public ValueFactories getValueFactories() {...}
/**
* Get the namespace registry for this context.
* @return the namespace registry; never null
*/
public NamespaceRegistry getNamespaceRegistry() {...}
/**
* Get the factory for creating {@link Property} objects.
* @return the property factory; never null
*/
public PropertyFactory getPropertyFactory() {...}
/**
* Get the security context for this environment.
* @return the security context; never null
*/
public SecurityContext getSecurityContext() {...}
/**
* Return a logger associated with this context. This logger records only those activities within the
* context and provides a way to capture the context-specific activities. All log messages are also
* sent to the system logger, so classes that log via this mechanism should <i>not</i> also
* {@link Logger#getLogger(Class) obtain a system logger}.
* @param clazz the class that is doing the logging
* @return the logger, named after clazz; never null
*/
public Logger getLogger( Class<?> clazz ) {...}
/**
* Return a logger associated with this context. This logger records only those activities within the
* context and provides a way to capture the context-specific activities. All log messages are also
* sent to the system logger, so classes that log via this mechanism should <i>not</i> also
* {@link Logger#getLogger(Class) obtain a system logger}.
* @param name the name for the logger
* @return the logger, named after name; never null
*/
public Logger getLogger( String name ) {...}
...
}

The fact that so many of the JBoss DNA components take ExecutionContext instances gives us some interesting possibilities.
For example, one execution context instance can be used as the highest-level (or "application-level") context for all of the services
(e.g., RepositoryService, SequencingService, etc.).
Then, an execution context could be created for each user that will be performing operations, and that user's context can
be passed around to not only provide security information about the user but also to allow the activities being performed
to be recorded for user feedback, monitoring and/or auditing purposes.

As mentioned above, the starting point is to create a default execution context, which will have all the default components:
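For example, using the no-argument constructor shown above (a minimal sketch):

```java
// Create the top-level execution context, with default implementations
// for the security context, namespace registry, and all factories.
ExecutionContext context = new ExecutionContext();

// The context gives access to the common facilities:
NamespaceRegistry registry = context.getNamespaceRegistry();
ValueFactories factories = context.getValueFactories();
```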

Once you have this top-level context, you can start creating subcontexts with different components,
and different security contexts. (Of course, you can create a subcontext from any instance.)
To create a subcontext, simply use one of the with(...) methods on the parent context. We'll show examples
later on in this chapter.

4.1. Security

JBoss DNA uses a simple abstraction layer to isolate it from the security infrastructure used within an application.
A SecurityContext represents the context of an authenticated user, and is defined as an interface:

public interface SecurityContext {
/**
* Get the name of the authenticated user.
* @return the authenticated user's name
*/
String getUserName();
/**
* Determine whether the authenticated user has the given role.
* @param roleName the name of the role to check
* @return true if the user has the role and is logged in; false otherwise
*/
boolean hasRole( String roleName );
/**
* Logs the user out of the authentication mechanism.
* For some authentication mechanisms, this will be implemented as a no-op.
*/
void logout();
}

Every ExecutionContext has a SecurityContext instance, though the top-level (default) execution context does not represent
an authenticated user. But you can create a subcontext for a user authenticated via JAAS:
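For example, the following sketch shows the general shape of creating such a subcontext. Both the JaasSecurityContext class name and the with(...) overload are assumptions here, not confirmed API; check the ExecutionContext of your JBoss DNA version:

```java
// Sketch: authenticate a user against the JAAS realm named "dna-jaas"
// (the realm name, class name, and with(...) overload are assumptions).
SecurityContext securityContext = new JaasSecurityContext("dna-jaas", userName, password);
ExecutionContext userContext = context.with(securityContext);
```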

There are quite a few JAAS providers available, but one of the best and most powerful providers is
JBoss Security, the open source
security framework used by JBoss. JBoss Security offers a number of JAAS login modules, including:

User-Roles Login Module
is a simple
javax.security.auth.spi.LoginModule
implementation that uses usernames and passwords stored in a properties file.

Client Login Module
prompts the user for their username and password.

Database Server Login Module
uses a JDBC database to authenticate principals and associate them with roles.

Operating System Login Module
authenticates using the operating system's mechanism.

and many others. Plus, JBoss Security also provides other capabilities, such as using XACML policies or using federated single sign-on.
For more detail, see the JBoss Security project.

4.1.2. Web application security

If JBoss DNA is being used within a web application, then it is probably desirable to reuse the security infrastructure
of the application server. This can be accomplished by implementing the SecurityContext interface with an implementation
that delegates to the HttpServletRequest. Then, for each request, create a SecurityContextCredentials
instance around your SecurityContext, and use those credentials to obtain a JCR Session.

Here is an example of the SecurityContext implementation that uses the servlet request:

@Immutable
public class ServletSecurityContext implements SecurityContext {
private final String userName;
private final HttpServletRequest request;
/**
* Create a {@link ServletSecurityContext} with the supplied
* {@link HttpServletRequest servlet information}.
*
* @param request the servlet request; may not be null
*/
public ServletSecurityContext( HttpServletRequest request ) {
this.request = request;
this.userName = request.getUserPrincipal() != null ? request.getUserPrincipal().getName() : null;
}
/**
* Get the name of the authenticated user.
* @return the authenticated user's name
*/
public String getUserName() {
return userName;
}
/**
* Determine whether the authenticated user has the given role.
* @param roleName the name of the role to check
* @return true if the user has the role and is logged in; false otherwise
*/
public boolean hasRole( String roleName ) {
return request.isUserInRole(roleName);
}
/**
* Logs the user out of the authentication mechanism.
* For some authentication mechanisms, this will be implemented as a no-op.
*/
public void logout() {
}
}

We'll see later in the JCR chapter how this can be used to obtain a JCR Session for
the authenticated user.

4.2. Namespace Registry

As we saw earlier, every ExecutionContext has a registry of namespaces. Namespaces are used throughout the graph API
(as we'll see soon), and the prefix associated with each namespace makes for more readable string representations.
The namespace registry tracks all of these namespaces and prefixes, and allows registrations to be added, modified, or
removed. The interface for the NamespaceRegistry shows how these operations are done:

public interface NamespaceRegistry {
/**
* Return the namespace URI that is currently mapped to the empty prefix.
* @return the namespace URI that represents the default namespace,
* or null if there is no default namespace
*/
String getDefaultNamespaceUri();
/**
* Get the namespace URI for the supplied prefix.
* @param prefix the namespace prefix
* @return the namespace URI for the supplied prefix, or null if there is no
* namespace currently registered to use that prefix
* @throws IllegalArgumentException if the prefix is null
*/
String getNamespaceForPrefix( String prefix );
/**
* Return the prefix used for the supplied namespace URI.
* @param namespaceUri the namespace URI
* @param generateIfMissing true if the namespace URI has not already been registered and the
* method should auto-register the namespace with a generated prefix, or false if the
* method should never auto-register the namespace
* @return the prefix currently being used for the namespace, or "null" if the namespace has
* not been registered and "generateIfMissing" is "false"
* @throws IllegalArgumentException if the namespace URI is null
* @see #isRegisteredNamespaceUri(String)
*/
String getPrefixForNamespaceUri( String namespaceUri, boolean generateIfMissing );
/**
* Return whether there is a registered prefix for the supplied namespace URI.
* @param namespaceUri the namespace URI
* @return true if the supplied namespace has been registered with a prefix, or false otherwise
* @throws IllegalArgumentException if the namespace URI is null
*/
boolean isRegisteredNamespaceUri( String namespaceUri );
/**
* Register a new namespace using the supplied prefix, returning the namespace URI previously
* registered under that prefix.
* @param prefix the prefix for the namespace, or null if a namespace prefix should be generated
* automatically
* @param namespaceUri the namespace URI
* @return the namespace URI that was previously registered with the supplied prefix, or null if the
* prefix was not previously bound to a namespace URI
* @throws IllegalArgumentException if the namespace URI is null
*/
String register( String prefix, String namespaceUri );
/**
* Unregister the namespace with the supplied URI.
* @param namespaceUri the namespace URI
* @return true if the namespace was removed, or false if the namespace was not registered
* @throws IllegalArgumentException if the namespace URI is null
* @throws NamespaceException if there is a problem unregistering the namespace
*/
boolean unregister( String namespaceUri );
/**
* Obtain the set of namespaces that are registered.
* @return the set of namespace URIs; never null
*/
Set<String> getRegisteredNamespaceUris();
/**
* Obtain a snapshot of all of the {@link Namespace namespaces} registered at the time this method
* is called. The resulting set is immutable, and will not reflect changes made to the registry.
* @return an immutable set of Namespace objects reflecting a snapshot of the registry; never null
*/
Set<Namespace> getNamespaces();
}
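To illustrate the registry's prefix-to-URI semantics, here is a minimal in-memory sketch. This is not the DNA implementation; it covers only the basic mapping methods, and the real registry also handles generated prefixes, snapshots, and thread safety:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal illustration of prefix <-> URI mapping semantics.
class SimpleNamespaceRegistry {
    private final Map<String, String> uriByPrefix = new HashMap<String, String>();

    /** Register a namespace, returning the URI previously bound to the prefix (or null). */
    public String register( String prefix, String namespaceUri ) {
        if (namespaceUri == null) throw new IllegalArgumentException("namespaceUri may not be null");
        return uriByPrefix.put(prefix, namespaceUri);
    }

    /** Get the URI registered for the prefix, or null if none. */
    public String getNamespaceForPrefix( String prefix ) {
        if (prefix == null) throw new IllegalArgumentException("prefix may not be null");
        return uriByPrefix.get(prefix);
    }

    /** Unregister the namespace with the supplied URI, returning true if it was registered. */
    public boolean unregister( String namespaceUri ) {
        if (namespaceUri == null) throw new IllegalArgumentException("namespaceUri may not be null");
        return uriByPrefix.values().remove(namespaceUri);
    }
}
```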

4.3. Class loaders

JBoss DNA is designed around extensions: sequencers, connectors, MIME type detectors, and class loader factories.
The core part of JBoss DNA is relatively small and has few dependencies, while many of the "interesting" components
are extensions that plug into and are used by different parts of the core or by layers above (such as the
JCR implementation). The core doesn't really care what
the extensions do or what external libraries they require, as long as the extension fulfills its end of the
extension contract.

This means that you only need the core modules of JBoss DNA on the application classpath, while the extensions
do not have to be on the application classpath. And because the core modules of JBoss DNA have few dependencies,
the risk of JBoss DNA libraries conflicting with the application's is lower. Extensions, on the other hand,
will likely have a lot of unique dependencies. By separating the core of JBoss DNA from the class loaders used
to load the extensions, your application is isolated from the extensions and their dependencies.

Note

Of course, you can put all the JARs on the application classpath, too.
This is what the examples in the Getting Started document do.

But in this case, how does JBoss DNA load all the extension classes? You may have noticed earlier that
ExecutionContext implements the ClassLoaderFactory interface with a single method:

public interface ClassLoaderFactory {
/**
* Get a class loader given the supplied classpath. The meaning of the classpath
* is implementation-dependent.
* @param classpath the classpath to use
* @return the class loader; may not be null
*/
ClassLoader getClassLoader( String... classpath );
}

This means that any component that has a reference to an ExecutionContext has the ability to create a
class loader with a supplied class path. As we'll see later, the connectors and sequencers are all
defined with a class and optional class path. This is where that class path comes in.

The actual meaning of the class path, however, is a function of the implementation. JBoss DNA uses
a StandardClassLoaderFactory that just loads the classes using the current thread's context
class loader (or, if there is none, delegates to the class loader that loaded the StandardClassLoaderFactory class).
Of course, it's possible to provide other ClassLoaderFactory implementations.
Then, just create a subcontext with your implementation:
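As an illustration of a custom factory (a sketch, not a DNA class), each classpath entry could be treated as a file-system path to a directory or JAR. The ClassLoaderFactory interface is repeated here so the sketch is self-contained:

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;

// The ClassLoaderFactory interface shown earlier, repeated for self-containment.
interface ClassLoaderFactory {
    ClassLoader getClassLoader( String... classpath );
}

// A sketch of a custom factory that treats each classpath entry as a
// file-system path to a directory or JAR file.
class FileSystemClassLoaderFactory implements ClassLoaderFactory {
    public ClassLoader getClassLoader( String... classpath ) {
        try {
            URL[] urls = new URL[classpath.length];
            for (int i = 0; i != classpath.length; ++i) {
                urls[i] = new File(classpath[i]).toURI().toURL();
            }
            // Fall back to the current thread's context class loader for
            // classes not found on the supplied classpath.
            return new URLClassLoader(urls, Thread.currentThread().getContextClassLoader());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Invalid classpath entry", e);
        }
    }
}
```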

Note

The dna-classloader-maven project has a class loader factory implementation that parses the names into
Maven coordinates, then uses those coordinates
to look up artifacts in a Maven 2 repository. The artifact's POM file is used to determine the dependencies,
which is done transitively to obtain the complete dependency graph. The resulting class loader has access
to these artifacts in dependency order.

This class loader is not ready for use, however, since there is no tooling to help populate the repository.

4.4. MIME Type Detectors

JBoss DNA often needs the ability to determine the MIME type for some binary content. When uploading content into
a repository, we may want to add the MIME type as metadata. Or, we may want to make some processing decisions
based upon the MIME type. So, JBoss DNA created a small pluggable framework for determining the MIME type by using
the name of the file (e.g., extensions) and/or by reading the actual content.

JBoss DNA defines a MimeTypeDetector interface that abstracts the implementation that actually determines
the MIME type given the name and content.
If the detector is able to determine the MIME type, it simply returns
it as a string. If not, it merely returns null. Note, however, that a detector must be thread-safe.
Here is the interface:

@ThreadSafe
public interface MimeTypeDetector {
/**
* Returns the MIME-type of a data source, using its supplied content and/or its supplied name,
* depending upon the implementation. If the MIME-type cannot be determined, either a "default"
* MIME-type or null may be returned, where the former will prevent earlier
* registered MIME-type detectors from being consulted.
*
* @param name The name of the data source; may be null.
* @param content The content of the data source; may be null.
* @return The MIME-type of the data source, or optionally null
* if the MIME-type could not be determined.
* @throws IOException If an error occurs reading the supplied content.
*/
String mimeTypeOf( String name, InputStream content ) throws IOException;
}

To use a detector, simply invoke the method and supply the name of the content (e.g., the name of the file, with the extension)
and the InputStream to the actual binary content. The result is a String containing the
MIME type
(e.g., "text/plain") or null if the MIME type cannot be determined. Note that the name or InputStream may be
null, making this a very versatile utility.
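To make the contract concrete, here is a small name-based detector implementing the interface above. This is a sketch, not the DNA implementation; the extension table here is hard-coded for illustration:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

// The MimeTypeDetector interface from above, repeated for self-containment.
interface MimeTypeDetector {
    String mimeTypeOf( String name, InputStream content ) throws IOException;
}

// A sketch of a purely name-based detector: the content is ignored, and the
// MIME type is looked up from the file name's extension.
class ExtensionBasedMimeTypeDetector implements MimeTypeDetector {
    private final Map<String, String> mimeTypesByExtension = new HashMap<String, String>();

    public ExtensionBasedMimeTypeDetector() {
        // Hard-coded entries for illustration only.
        mimeTypesByExtension.put("txt", "text/plain");
        mimeTypesByExtension.put("xml", "application/xml");
        mimeTypesByExtension.put("png", "image/png");
    }

    public String mimeTypeOf( String name, InputStream content ) throws IOException {
        if (name == null) return null; // content-based detection not implemented
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) return null;
        String extension = name.substring(dot + 1).toLowerCase();
        return mimeTypesByExtension.get(extension); // null if the extension is unknown
    }
}
```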

Once again, you can obtain a MimeTypeDetector from the ExecutionContext. By default, JBoss DNA provides and uses
an implementation that uses only the name (the content is ignored), looking at the name's extension
and looking for a match in a small listing (loaded from the org/jboss/dna/graph/mime.types file on the classpath).
You can add extensions by copying this file, adding or correcting the entries, and then placing your updated file in the
expected location on the classpath.

Of course, you can always use a different MimeTypeDetector by creating a subcontext and supplying your implementation:
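For instance, the general shape would be the following sketch. The with(...) overload is an assumption here, and MyCustomMimeTypeDetector is a hypothetical implementation; consult the ExecutionContext API of your DNA version:

```java
// Sketch: create a subcontext that uses a custom detector; components that
// use the subcontext will then use this detector instead of the default.
MimeTypeDetector myDetector = new MyCustomMimeTypeDetector(); // hypothetical implementation
ExecutionContext contextWithDetector = context.with(myDetector);
```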

4.5. Property factory and value factories

Two other components are made available by the ExecutionContext. The PropertyFactory is an interface
that can be used to create Property instances, which are used throughout the graph API. The ValueFactories
interface provides access to a number of different factories for different kinds of property values.
These will be discussed in much more detail in the next chapter. But like the other components that
are in an ExecutionContext, you can create subcontexts with different implementations:

Of course, implementing your own factories is a pretty advanced topic, and it will likely be something you do not
need to do in your application.

4.6. Summary

In this chapter, we introduced the ExecutionContext as a representation of the environment in which many of the
JBoss DNA components operate. ExecutionContext provides a very simple but powerful way to inject commonly-needed
facilities throughout the system.

In the next chapter, we'll dive into the Graph API and introduce the notions of
nodes, paths, names, and properties that are essential to and used throughout JBoss DNA.

One of the central concepts within JBoss DNA is that of its graph model.
Information is structured into a hierarchy of nodes with properties, where nodes in the hierarchy
are identified by their path (and/or identifier properties). Properties are identified by
a name that incorporates a namespace and local name, and contain one or more property values
consisting of normal Java strings, names, paths, URIs, booleans, longs, doubles, decimals, binary content,
dates, UUIDs, references to other nodes, or any other serializable object.

Therefore, this chapter provides foundational information that is essential to really understanding
how the connectors, sequencers, and other JBoss DNA features work.

5.1. Names

JBoss DNA uses names to identify quite a few different types of objects. As we'll soon see, each property
of a node is identified by a name, and each segment in a path consists of a name. Therefore,
names are a very important concept.

JBoss DNA names consist of a local part and are qualified with a namespace. The local part can consist of
any character, and the namespace is identified by a URI. Namespaces were introduced in the
previous chapter and are managed by the ExecutionContext's
namespace registry. Namespaces help reduce the risk of
clashes between names that have the same local part.

All names are immutable, which means that once a Name object is created, it will never change.
This characteristic makes it much easier to write thread-safe code - the objects never change and therefore
require no locks or synchronization to guarantee atomic reads. This is a technique that is more and more
often found in newer languages and frameworks that simplify concurrent operations.

@Immutable
public interface Name extends Comparable<Name>, Serializable, Readable {
/**
* Get the local name part of this qualified name.
* @return the local name; never null
*/
String getLocalName();
/**
* Get the URI for the namespace used in this qualified name.
* @return the URI; never null but possibly empty
*/
String getNamespaceUri();
}

The use of a factory may seem like a disadvantage and unnecessary complexity, but there actually
are several benefits. First, it hides the concrete implementations, which is very appealing if
an optimized implementation can be chosen for particular situations. Second, it simplifies the
usage, since Name only has a few methods. Third, it allows the factory to cache or pool instances
where appropriate to help conserve memory. Finally, the very same factory actually serves as
a conversion mechanism from other forms. We'll actually see more of this
later in this chapter, when we talk about other kinds of property values.

The factory for creating Name objects is called NameFactory and is available within the ExecutionContext,
via the getValueFactories() method. But before we see that, let's first discuss how names are represented as strings.

We'll see how names are used later on, but one more point to make: Name is both serializable and comparable,
and all implementations should support equals(...) and hashCode() so that Name can
be used as a key in a hash-based map. Name also extends the Readable interface, which we'll learn
more about later in this chapter.
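To see why immutable names with proper equals(...) and hashCode() matter, consider this self-contained sketch. The SimpleName class is a hypothetical stand-in for Name, not DNA code; it shows how two names with the same local part but different namespaces remain distinct map keys:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class NameKeyExample {
    /** Minimal immutable qualified name: a namespace URI plus a local part. */
    static final class SimpleName {
        private final String namespaceUri;
        private final String localName;
        SimpleName( String namespaceUri, String localName ) {
            this.namespaceUri = namespaceUri;
            this.localName = localName;
        }
        @Override
        public boolean equals( Object obj ) {
            if (!(obj instanceof SimpleName)) return false;
            SimpleName that = (SimpleName)obj;
            return namespaceUri.equals(that.namespaceUri) && localName.equals(that.localName);
        }
        @Override
        public int hashCode() {
            return Objects.hash(namespaceUri, localName);
        }
    }

    /** Two names with the same local part but different namespaces are distinct map keys. */
    public static int distinctTitleKeys() {
        Map<SimpleName, String> properties = new HashMap<>();
        properties.put(new SimpleName("http://www.jcp.org/jcr/1.0", "title"), "JCR title");
        properties.put(new SimpleName("http://example.com/my", "title"), "my title");
        return properties.size();
    }
}
```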

5.2. Paths

Another important concept in JBoss DNA's graph model is that of a path, which provides a way
of locating a node within a hierarchy. JBoss DNA's Path object is an immutable ordered sequence
of Path.Segment objects. A small portion of the interface is shown here:

@Immutable
public interface Path extends Comparable<Path>, Iterable<Path.Segment>, Serializable, Readable {
/**
* Return the number of segments in this path.
* @return the number of path segments
*/
public int size();
/**
* Return whether this path represents the root path.
* @return true if this path is the root path, or false otherwise
*/
public boolean isRoot();
/**
* {@inheritDoc}
*/
public Iterator<Path.Segment> iterator();
/**
* Obtain a copy of the segments in this path. None of the segments are encoded.
* @return the array of segments as a copy
*/
public Path.Segment[] getSegmentsArray();
/**
* Get an unmodifiable list of the path segments.
* @return the unmodifiable list of path segments; never null
*/
public List<Path.Segment> getSegmentsList();
/**
* Get the last segment in this path.
* @return the last segment, or null if the path is empty
*/
public Path.Segment getLastSegment();
/**
* Get the segment at the supplied index.
* @param index the index
* @return the segment
* @throws IndexOutOfBoundsException if the index is out of bounds
*/
public Path.Segment getSegment( int index );
/**
* Return an iterator that walks the paths from the root path down to this path. This method
* always returns at least one path (the root returns an iterator containing itself).
* @return the path iterator; never null
*/
public Iterator<Path> pathsFromRoot();
/**
* Return a new path consisting of the segments starting at beginIndex index (inclusive).
* This is equivalent to calling path.subpath(beginIndex,path.size()).
* @param beginIndex the beginning index, inclusive.
* @return the specified subpath
* @exception IndexOutOfBoundsException if the beginIndex is negative or larger
* than the length of this Path object
*/
public Path subpath( int beginIndex );
/**
* Return a new path consisting of the segments between the beginIndex index (inclusive)
* and the endIndex index (exclusive).
* @param beginIndex the beginning index, inclusive.
* @param endIndex the ending index, exclusive.
* @return the specified subpath
* @exception IndexOutOfBoundsException if the beginIndex is negative, or
* endIndex is larger than the length of this Path
* object, or beginIndex is larger than endIndex.
*/
public Path subpath( int beginIndex, int endIndex );
...
}

There are actually quite a few methods (not shown above) for obtaining related paths: the path of the parent, the path of an ancestor,
resolving a path relative to this path, normalizing a path (by removing "." and ".." segments), finding the lowest
common ancestor shared with another path, etc. There are also a number of methods that compare the path with others,
including determining whether a path is above, equal to, or below this path.

Each Path.Segment is an immutable pair of a Name and same-name-sibling (SNS) index. When two sibling nodes
have the same name, the first sibling will have an SNS index of "1" and the second will be given an SNS index of "2".
(This mirrors the same-name-sibling index behavior of JCR paths.)

@Immutable
public static interface Path.Segment extends Cloneable, Comparable<Path.Segment>, Serializable, Readable {
/**
* Get the name component of this segment.
* @return the segment's name
*/
public Name getName();
/**
* Get the index for this segment, which will be 1 by default.
* @return the index
*/
public int getIndex();
/**
* Return whether this segment has an index that is not "1"
* @return true if this segment has an index, or false otherwise.
*/
public boolean hasIndex();
/**
* Return whether this segment is a self-reference (or ".").
* @return true if the segment is a self-reference, or false otherwise.
*/
public boolean isSelfReference();
/**
* Return whether this segment is a reference to a parent (or "..")
* @return true if the segment is a parent-reference, or false otherwise.
*/
public boolean isParentReference();
}
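The SNS rule can be demonstrated with a short, self-contained sketch (an illustration, not DNA code) that assigns indexes to an ordered list of child names exactly as described:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SnsIndexExample {
    /** Assign same-name-sibling indexes: the first "child" gets [1], the next "child" gets [2], etc. */
    public static List<String> withSnsIndexes( List<String> childNames ) {
        Map<String, Integer> countsByName = new HashMap<>();
        List<String> segments = new ArrayList<>();
        for (String name : childNames) {
            int snsIndex = countsByName.merge(name, 1, Integer::sum);
            segments.add(name + "[" + snsIndex + "]");
        }
        return segments;
    }
}
```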

5.3. Properties

The JBoss DNA graph model allows nodes to hold multiple properties, where each property is identified
by a unique Name and may have one or more values. Like many of the other classes used in the graph model,
Property is an immutable object that, once constructed, can never be changed and therefore provides
a consistent snapshot of the state of a property as it existed at the time it was read.

JBoss DNA properties can hold a wide range of value objects, including normal Java strings, names, paths,
URIs, booleans, longs, doubles, decimals, binary content, dates, UUIDs, references to other nodes,
or any other serializable object. All but three of these use standard Java classes: dates are
represented by an immutable DateTime class; binary content is represented by an immutable Binary
interface patterned after the proposed interface of the same name in JSR-283;
and references are represented by an immutable Reference interface patterned after the corresponding interface in
JSR-170 and JSR-283.

The Property interface defines methods for obtaining the name and property values:

@Immutable
public interface Property extends Iterable<Object>, Comparable<Property>, Readable {
/**
* Get the name of the property.
*
* @return the property name; never null
*/
Name getName();
/**
* Get the number of actual values in this property.
* @return the number of actual values in this property; always non-negative
*/
int size();
/**
* Determine whether the property currently has multiple values.
* @return true if the property has multiple values, or false otherwise.
*/
boolean isMultiple();
/**
* Determine whether the property currently has a single value.
* @return true if the property has a single value, or false otherwise.
*/
boolean isSingle();
/**
* Determine whether this property has no actual values. This method may return true
* regardless of whether the property has a single value or multiple values.
* This method is a convenience method that is equivalent to size() == 0.
* @return true if this property has no values, or false otherwise
*/
boolean isEmpty();
/**
* Obtain the property's first value in its natural form. This is equivalent to calling
* isEmpty() ? null : iterator().next()
* @return the first value, or null if the property is {@link #isEmpty() empty}
*/
Object getFirstValue();
/**
* Obtain the property's values in their natural form. This is equivalent to calling iterator().
* A valid iterator is returned whether the property is single-valued or multi-valued.
* The resulting iterator is immutable, and all property values are immutable.
* @return an iterator over the values; never null
*/
Iterator<?> getValues();
/**
* Obtain the property's values as an array of objects in their natural form.
* A valid array is returned whether the property is single-valued or multi-valued; a
* null value is returned if the property is {@link #isEmpty() empty}.
* The resulting array is a copy, guaranteeing immutability for the property.
* @return the array of values
*/
Object[] getValuesAsArray();
}

Creating Property instances is done by using the PropertyFactory object owned by the ExecutionContext.
This factory defines methods for creating properties with a Name and various representations of the values,
including variable-length arguments, arrays, Iterator, and Iterable.

When it comes to using the property values, JBoss DNA takes a non-traditional approach.
Many other graph models (including JCR) mark each property with a data type and then require
all property values adhere to this data type. When the property values are obtained, they
are guaranteed to be of the correct type. However, many times the property's data type may
not match the data type expected by the caller, so a conversion is still required
and has to be coded by hand.

The JBoss DNA graph model takes a different tack. Because callers almost always have to convert the
values to the types they can handle, JBoss DNA skips the steps of associating the Property with a data type
and ensuring the values match. Instead, JBoss DNA simply provides a very easy mechanism to convert
the property values to the type desired by the caller. In fact, the conversion mechanism
is exactly the same as the factories that create the values in the first place.

5.4. Values and value factories

JBoss DNA properties can hold a variety of types of value objects: strings, names, paths,
URIs, booleans, longs, doubles, decimals, binary content, dates, UUIDs, references to other nodes,
or any other serializable object. To assist in the creation of these values and conversion
into other types, JBoss DNA defines a ValueFactory interface. This interface is parameterized
with the type of value that is being created, but defines methods for creating those values
from all of the other known value types.

This makes it very easy to convert one or more values (of any type, including mixtures) into
corresponding value(s) that are of the desired type. For example, converting the first value
of a property (regardless of type) to a String is simple:
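The idea that the factory itself is the conversion mechanism can be sketched in a self-contained way. The StringFactory class below is an illustrative stand-in for DNA's ValueFactory&lt;String&gt;, not its actual implementation:

```java
public class StringConversionExample {
    /** Minimal stand-in for ValueFactory<String>: create(...) doubles as a conversion. */
    static final class StringFactory {
        public String create( Object value ) {
            if (value == null) return null;
            if (value instanceof String) return (String)value;
            // Booleans, longs, doubles, URIs, etc. all have sensible string forms.
            return value.toString();
        }
    }
}
```

With such a factory in hand, converting a property's first value, regardless of its actual type, is a single create(...) call.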

What we've glossed over so far, however, is how to obtain the correct ValueFactory for the desired type.
If you remember back to the previous chapter, ExecutionContext has a getValueFactories() method
that returns the ValueFactories interface.

This interface exposes a ValueFactory for each of the types, and even has methods to obtain a ValueFactory
given the PropertyType enumeration.

You might have noticed that several of the ValueFactories methods return subinterfaces of ValueFactory. These
add type-specific methods that are more commonly needed in certain cases; the NameFactory interface, for example, adds methods for creating names from a namespace URI and a local name.

And finally, the BinaryFactory defines methods for creating Binary objects from a variety of binary formats,
as well as a method that looks for a cached Binary instance given the supplied secure hash:

public interface BinaryFactory extends ValueFactory<Binary> {
/**
* Create a value from the binary content given by the supplied input, the approximate length,
* and the SHA-1 secure hash of the content. If the secure hash is null, then a secure hash is
* computed from the content. If the secure hash is not null, it is assumed to be the hash for
* the content and may not be checked.
*/
Binary create( InputStream stream, long approximateLength, byte[] secureHash )
throws ValueFormatException, IoException;
Binary create( Reader reader, long approximateLength, byte[] secureHash )
throws ValueFormatException, IoException;
/**
* Create a binary value from the given file.
*/
Binary create( File file ) throws ValueFormatException, IoException;
/**
* Find an existing binary value given the supplied secure hash. If no such binary value exists,
* null is returned. This method can be used when the caller knows the secure hash (e.g., from
* a previously-held Binary object), and would like to reuse an existing binary value
* (if possible) rather than recreate the binary value by processing the stream contents. This is
* especially true when the size of the binary is quite large.
*
* @param secureHash the secure hash of the binary content, which was probably obtained from a
* previously-held Binary object; a null or empty value is allowed, but will always
* result in returning null
* @return the existing Binary value that has the same secure hash, or null if there is no
* such value available at this time
*/
Binary find( byte[] secureHash );
}

JBoss DNA provides efficient implementations of all of these interfaces: the ValueFactory interfaces and subinterfaces;
the Path, Path.Segment, Name, Binary, DateTime, and Reference interfaces; and the ValueFactories interface
returned by the ExecutionContext. In fact, some of these interfaces have multiple implementations that are optimized for
specific but frequently-occurring conditions.

5.5. Readable, TextEncoder, and TextDecoder

As shown above, the Name, Path.Segment, Path, and Property interfaces all extend the Readable interface,
which defines a number of getString(...) methods that can produce a (readable) string representation
of that object. Recall that all of these objects contain names with namespace URIs and local names (consisting of any
characters), and so obtaining a readable string representation will require converting the URIs to prefixes,
escaping certain characters in the local names, and formatting the prefix and escaped local name appropriately.
The different getString(...) methods of the Readable interface accept various combinations
of NamespaceRegistry and TextEncoder parameters:

@Immutable
public interface Readable {
/**
* Get the string form of the object. A default encoder is used to encode characters.
* @return the encoded string
*/
public String getString();
/**
* Get the encoded string form of the object, using the supplied encoder to encode characters.
* @param encoder the encoder to use, or null if the default encoder should be used
* @return the encoded string
*/
public String getString( TextEncoder encoder );
/**
* Get the string form of the object, using the supplied namespace registry to convert any
* namespace URIs to prefixes. A default encoder is used to encode characters.
* @param namespaceRegistry the namespace registry that should be used to obtain the prefix
* for any namespace URIs
* @return the encoded string
* @throws IllegalArgumentException if the namespace registry is null
*/
public String getString( NamespaceRegistry namespaceRegistry );
/**
* Get the encoded string form of the object, using the supplied namespace registry to convert
* any namespace URIs to prefixes.
* @param namespaceRegistry the namespace registry that should be used to obtain the prefix for
* the namespace URIs
* @param encoder the encoder to use, or null if the default encoder should be used
* @return the encoded string
* @throws IllegalArgumentException if the namespace registry is null
*/
public String getString( NamespaceRegistry namespaceRegistry,
TextEncoder encoder );
/**
* Get the encoded string form of the object, using the supplied namespace registry to convert
* the names' namespace URIs to prefixes and the supplied encoder to encode characters, and using
* the delimiter encoder to encode (or convert) the delimiter used between the namespace prefix
* and the local part of any names.
* @param namespaceRegistry the namespace registry that should be used to obtain the prefix
* for the namespace URIs in the names
* @param encoder the encoder to use for encoding the local part and namespace prefix of any names,
* or null if the default encoder should be used
* @param delimiterEncoder the encoder to use for encoding the delimiter between the local part
* and namespace prefix of any names, or null if the standard delimiter should be used
* @return the encoded string
*/
public String getString( NamespaceRegistry namespaceRegistry,
TextEncoder encoder, TextEncoder delimiterEncoder );
}

The Jsr283Encoder escapes characters that are not allowed in JCR names,
per the JSR-283 specification. Specifically,
these are the '*', '/', ':', '[', ']', and '|' characters, which are escaped by replacing
them with the Unicode characters U+F02A, U+F02F, U+F03A, U+F05B, U+F05D, and U+F07C, respectively.
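That character mapping is easy to express directly. The following self-contained sketch (illustrative, not the Jsr283Encoder source) applies the substitutions listed above:

```java
public class Jsr283EscapeExample {
    /** Escape the characters disallowed in JCR names, using the mapping described above. */
    public static String encode( String text ) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '*': sb.append('\uF02A'); break;
                case '/': sb.append('\uF02F'); break;
                case ':': sb.append('\uF03A'); break;
                case '[': sb.append('\uF05B'); break;
                case ']': sb.append('\uF05D'); break;
                case '|': sb.append('\uF07C'); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }
}
```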

The UrlEncoder converts text to be used within the different parts of a URL, as defined by Section 2.3 of
RFC 2396. Note that this class does not
encode a complete URL (since java.net.URLEncoder and java.net.URLDecoder
should be used for such purposes).

The XmlValueEncoder escapes characters that are not allowed in XML values. Specifically,
these are the '&', '<', '>', '"', and "'" characters, which are escaped to
"&amp;", "&lt;", "&gt;", "&quot;", and "&#039;", respectively.

All of these classes also implement the TextDecoder interface, which defines a method that
decodes an encoded string using the opposite transformation.

Of course, you can provide alternative implementations, and supply them to the appropriate getString(...) methods
as required.

5.6. Locations

In addition to Path objects, nodes can be identified by one or more identification properties.
These really are just Property instances with names that have a special meaning
(usually to connectors).
JBoss DNA also defines a Location class that encapsulates:

the path to the node;

one or more identification properties that are likely source-specific
and that are represented with Property objects; or

a combination of both.

So, when a client knows the path and/or the identification properties, they can create a Location object
and then use that to identify the node. Location is a class that can be instantiated through factory
methods on the class.

Like many of the other classes and interfaces, Location is immutable and cannot be changed once created.
However, there are methods on the class to create a copy of the Location object with a different Path,
a different UUID, or different identification properties:
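The copy-with pattern these methods follow can be illustrated with a self-contained sketch. SimpleLocation is a hypothetical stand-in for Location, not DNA code:

```java
import java.util.UUID;

public class LocationCopyExample {
    /** Minimal immutable location: a path plus an optional UUID. */
    static final class SimpleLocation {
        private final String path;
        private final UUID uuid;
        SimpleLocation( String path, UUID uuid ) {
            this.path = path;
            this.uuid = uuid;
        }
        /** Copy-with methods return a new instance; the original is never modified. */
        SimpleLocation withPath( String newPath ) { return new SimpleLocation(newPath, uuid); }
        SimpleLocation withUuid( UUID newUuid ) { return new SimpleLocation(path, newUuid); }
        String getPath() { return path; }
        UUID getUuid() { return uuid; }
    }
}
```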

One more thing about locations: we'll see in the next chapter how they are used to make requests
to the connectors. When creating the requests, clients usually have an
incomplete location (e.g., a path but no identification properties). When processing the requests, connectors
provide an actual location that contains the path and all identification properties.
If actual Location objects are then reused in subsequent requests by the client, the connectors will have the benefit of having
both the path and identification properties and may be able to more efficiently locate the identified node.

5.7. Graph API

JBoss DNA's Graph API was designed as a lightweight public API for working with graph information.
The Graph class is the primary class in the API, and each instance represents a single, independent
view of a single graph. Graph instances don't maintain state, so every request (or batch of requests) operates against
the underlying graph and returns immutable snapshots of the requested state at the time
the request was made.

There are several ways to obtain a Graph instance, as we'll see in later chapters. For the time being, the important
thing to understand is what a Graph instance represents and how it interacts with the underlying content to return
representations of portions of that underlying graph content.

The Graph class basically implements an internal domain-specific language (DSL),
designed to be easy to use in an application.
The Graph API makes extensive use of interfaces and method chaining, so that methods return a concise interface that has only those
methods that make sense at that point. In fact, discovering the API should be really easy if your IDE has code completion.
Just remember that under the covers, a Graph is just building Request objects, submitting them to the connector,
and then exposing the results.

5.7.1. Using workspaces

JBoss DNA graphs have the notion of workspaces that provide different views of the content. Some graphs may have
one workspace, while others may have multiple workspaces. Some graphs will allow a client to create new workspaces or destroy
existing workspaces, while other graphs will not allow adding or removing workspaces. Some graphs may have workspaces that show the same (or very
similar) content, while other graphs may have workspaces that each contain completely independent content.

The Graph object is always bound to a workspace, which initially is the default workspace. To find out
what the name of the default workspace is, simply ask for the current workspace after creating the Graph:
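A sketch of such a call (assuming, per the Graph JavaDocs, that the graph exposes its current workspace by name):

```java
// Sketch only: assumes Graph exposes the current workspace and its name.
String defaultWorkspaceName = graph.getCurrentWorkspace().getName();
```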

Once you know the name of a particular workspace, you can specify that the graph should use it:

graph.useWorkspace("myWorkspace");

From this point forward, all requests will apply to the workspace named "myWorkspace". At any time, you can use a different workspace,
which will affect all subsequent requests made using the graph. To go back to the default workspace, simply supply a null name:

graph.useWorkspace(null);

Of course, creating a new workspace is just as easy:

graph.createWorkspace().named("newWorkspace");

This will attempt to create a workspace named "newWorkspace", which will fail if that workspace already exists. You may
want to create a new workspace with a name that should be altered if the name you supply is already used. The following code shows
how you can do this:

graph.createWorkspace().namedSomethingLike("newWorkspace");

If there is no existing workspace named "newWorkspace", a new one will be created with this name. However, if "newWorkspace" already
exists, this call will create a workspace with a name that is some alteration of the supplied name.
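Reading content follows the same chaining style. The following sketch shows what such read calls look like; the method names follow the Graph JavaDocs, but treat the exact signatures, and the graph and path variables, as assumptions:

```java
// Read all of the children of a node, supplying a Path to of(...):
List<Location> children = graph.getChildren().of(path);
// Read a single property by name, supplying the Path to on(...):
Property title = graph.getProperty("jcr:title").on(path);
```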

Notice that the examples pass a Path instance to the on(...) and of(...) methods. Many
of the Graph API methods accept a variety of parameter types, including String, Path, Location, UUID, or Property parameters.
This should make it easy to use in many different situations.

Of course, changing content is more interesting and offers more possibilities. Here are a few examples:
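For illustration, here is a sketch of a few immediate changes (method names are assumptions based on the chaining style described in this section; consult the Graph JavaDocs for the exact API):

```java
graph.create("/a/b").and();                    // create a new node
graph.set("title").on("/a/b").to("My Title");  // set a property on that node
graph.delete("/a/b");                          // remove the node
```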

The methods shown above work immediately, as soon as each request is built. However, there is another way to use
the Graph object, and that is in a batch mode. Simply create a Graph.Batch object using the
batch() method, create the requests on that batch object, and then execute all of the commands on the
batch by calling its execute() method. That execute() method returns a Results interface
that can be used to read the node information retrieved by the batched requests.

Method chaining works really well with the batch mode, since multiple commands can be assembled together very easily:
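For example, a batch might be assembled and executed like this sketch (again, treat the exact method names as assumptions and consult the Graph JavaDocs):

```java
// Assemble several requests into one unit of work, then execute them together.
Results results = graph.batch()
                       .create("/a/b").and()
                       .create("/a/b/c").and()
                       .execute();
```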

Of course, this section provided just a hint of the Graph API.
The Graph interface is actually quite complete and offers a full-featured approach for reading and updating a graph.
For more information, see the Graph JavaDocs.

5.8. Requests

JBoss DNA Graph objects operate upon the underlying graph content, but we haven't really talked about how that works.
Recall that the Graph objects don't maintain any stateful representation of the content, but instead submit requests
to the underlying graph and return representations of the requested portions of the content.
This section focuses on what those requests look like, since they'll actually become very important when
working with connectors in the next chapter.

A graph Request is an encapsulation of a command that is to be executed by the underlying graph owner (typically
a connector). Request objects can take many different forms, as there are different classes for each kind of request.
Each request contains the information needed to complete the processing, and it is also the place
where the results (or error) are recorded.

The Graph object creates the Request objects using Location objects to identify the node (or nodes) that are the
subject of the request. The Graph can either submit the request immediately, or it can batch multiple requests
together into "units of work". The submitted requests are then processed by the underlying system (e.g., connector)
and returned back to the Graph object, which then extracts and returns the results.
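The shape of these request objects can be sketched in a self-contained way (an illustration of the pattern, not DNA's actual classes):

```java
import java.util.Map;

public class RequestSketch {
    /** Minimal request base class: the place where an error (if any) is recorded. */
    static abstract class Request {
        private Throwable error;
        void setError( Throwable error ) { this.error = error; }
        boolean hasError() { return error != null; }
        Throwable getError() { return error; }
    }

    /** A read-style request: the location is the input; the properties are the recorded result. */
    static final class ReadAllPropertiesRequest extends Request {
        final String path;                 // stand-in for a Location
        Map<String, Object> properties;    // recorded by whoever processes the request
        ReadAllPropertiesRequest( String path ) { this.path = path; }
    }
}
```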

ReadNodeRequest

A request to read from the named workspace in the source a node's properties and children.
The node may be specified by path and/or by identification properties.
The connector returns all properties and the locations for all children,
or sets a PathNotFoundException error on the request if the node did not exist in the workspace.
If the node is found, the connector sets on the request the actual location of the node (including the path and identification properties).
The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.

VerifyNodeExistsRequest

A request to verify the existence of a node at the specified location in the named workspace of the source.
The connector returns the actual location for the node if it exists, or
sets a PathNotFoundException error on the request if the node does not exist in the workspace.
The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.

ReadAllPropertiesRequest

A request to read from the named workspace in the source all of the properties of a node.
The node may be specified by path and/or by identification properties.
The connector returns all properties that were found on the node,
or sets a PathNotFoundException error on the request if the node did not exist in the workspace.
If the node is found, the connector sets on the request the actual location of the node (including the path and identification properties).
The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.

ReadPropertyRequest

A request to read from the named workspace in the source a single property of a node.
The node may be specified by path and/or by identification properties,
and the property is specified by name.
The connector returns the property if found on the node,
or sets a PathNotFoundException error on the request if the node or property did not exist in the workspace.
If the node is found, the connector sets on the request the actual location of the node (including the path and identification properties).
The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.

ReadAllChildrenRequest

A request to read from the named workspace in the source all of the children of a node.
The node may be specified by path and/or by identification properties.
The connector returns an ordered list of locations for each child found on the node,
an empty list if the node had no children,
or sets a PathNotFoundException error on the request if the node did not exist in the workspace.
If the node is found, the connector sets on the request the actual location of the parent node (including the path and identification properties).
The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.

ReadBlockOfChildrenRequest

A request to read from the named workspace in the source a block of children of a node, starting with the nth child.
This is designed to allow paging through the children, which is much more efficient for large numbers of children.
The node may be specified by path and/or by identification properties, and the block
is defined by a starting index and a count (i.e., the block size).
The connector returns an ordered list of locations for each of the node's children found in the block,
or an empty list if there are no children in that range.
The connector also sets on the request the actual location of the parent node (including the path and identification properties)
or sets a PathNotFoundException error on the request if the parent node did not exist in the workspace.
The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.

ReadNextBlockOfChildrenRequest

A request to read from the named workspace in the source a block of children of a node, starting with the children that immediately follow
a previously-returned child.
This is designed to allow paging through the children, which is much more efficient for large numbers of children.
The node may be specified by path and/or by identification properties, and the block
is defined by the location of the node immediately preceding the block and a count (i.e., the block size).
The connector returns an ordered list of locations for each of the node's children found in the block,
or an empty list if there are no children in that range.
The connector also sets on the request the actual location of the parent node (including the path and identification properties)
or sets a PathNotFoundException error on the request if the parent node did not exist in the workspace.
The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.

ReadBranchRequest

A request to read a portion of a subgraph that has as its root a particular node, up to a maximum depth.
This request is an efficient mechanism when a branch (or part of a branch) is to be navigated and processed,
and replaces some non-trivial code to read the branch iteratively using multiple ReadNodeRequests.
The connector reads the branch to the specified maximum depth, returning the properties and children for all
nodes found in the branch.
The connector also sets on the request the actual location of the branch's root node (including the path and identification properties).
The connector sets a PathNotFoundException error on the request if the node at
the top of the branch does not exist in the workspace.
The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.
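The block-based paging that ReadBlockOfChildrenRequest and ReadNextBlockOfChildrenRequest describe can be sketched independently of the connector API. The class and method names below are illustrative, not part of JBoss DNA; a real connector would apply the same index arithmetic to its own child list before building the locations it returns.

```java
import java.util.List;

public class ChildPaging {
    // Return up to blockSize children starting immediately after the given
    // previously-returned child, mirroring ReadNextBlockOfChildrenRequest.
    // Returns an empty list if there are no children in that range.
    static List<String> blockAfter(List<String> children, String previousChild, int blockSize) {
        int start = children.indexOf(previousChild) + 1;
        if (start <= 0 || start >= children.size()) return List.of();
        return children.subList(start, Math.min(start + blockSize, children.size()));
    }

    public static void main(String[] args) {
        List<String> kids = List.of("a", "b", "c", "d", "e");
        System.out.println(blockAfter(kids, "b", 2)); // [c, d]
        System.out.println(blockAfter(kids, "e", 2)); // []
    }
}
```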

ChangeRequest is a subclass of Request that provides a base class for all the requests that request a change
be made to the content. As we'll see later, these ChangeRequest objects also get reused by the
observation system.

Table 5.2. Types of Change Requests

Name

Description

CreateNodeRequest

A request to create a node at the specified location, setting on the new node the properties included in the request.
The connector creates the node at the desired location, adjusting any same-name-sibling indexes as required.
(If an SNS index is provided in the new node's location, existing children with the same name after that SNS index
will have their SNS indexes adjusted. However, if the requested location does not include an SNS index, the new
node is added after all existing children, and its SNS index is set accordingly.)
The connector also sets on the request the actual location of the new node (including the path and identification properties).
The connector sets a PathNotFoundException error on the request if the parent node does not exist in the workspace.
The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.

RemovePropertiesRequest

A request to remove a set of properties from an existing node. The request contains the location of the node as well as the
names of the properties to be removed. The connector performs these changes and sets on the request the
actual location (including the path and identification properties) of the node.
The connector sets a PathNotFoundException error on the request if the node does not exist in the workspace.
The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.

UpdatePropertiesRequest

A request to set or update properties on an existing node. The request contains the location of the node as well as the
properties to be set and those to be deleted. The connector performs these changes and sets on the request the
actual location (including the path and identification properties) of the node.
The connector sets a PathNotFoundException error on the request if the node does not exist in the workspace.
The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.

RenameNodeRequest

A request to change the name of a node. The connector changes the node's name, adjusts all SNS indexes
accordingly, and returns the actual locations (including the path and identification properties) of both the original
location and the new location.
The connector sets a PathNotFoundException error on the request if the node does not exist in the workspace.
The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.

CopyBranchRequest

A request to copy a portion of a subgraph that has as its root a particular node, up to a maximum depth.
The request includes the name of the workspace where the original node is located as well as the name of the
workspace where the copy is to be placed (these may be the same, but may be different).
The connector copies the branch from the original location, up to the specified maximum depth, and places a copy
of the node as a child of the new location.
The connector also sets on the request the actual location (including the path and identification properties)
of the original location as well as the location of the new copy.
The connector sets a PathNotFoundException error on the request if the node at
the top of the branch does not exist in the workspace.
The connector sets an InvalidWorkspaceException error on the request if one of the named workspaces does not exist.

MoveBranchRequest

A request to move a subgraph that has a particular node as its root.
The connector moves the branch from the original location and places it as child of the specified new location.
The connector also sets on the request the actual location (including the path and identification properties)
of the original and new locations. The connector will adjust SNS indexes accordingly.
The connector sets a PathNotFoundException error on the request if the node that is to be moved or the
new location does not exist in the workspace.
The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.

DeleteBranchRequest

A request to delete an entire branch specified by a single node's location.
The connector deletes the specified node and all nodes below it, and sets the actual location,
including the path and identification properties, of the node that was deleted.
The connector sets a PathNotFoundException error on the request if the node being deleted does not exist in the workspace.
The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.

CompositeRequest

A request that actually comprises multiple requests (none of which will be a composite).
The connector simply processes all of the requests in the composite request, but should set on the composite
request any error (usually the first error) that occurs during processing of the contained requests.

There are also requests that deal with workspaces:

Table 5.3. Types of Workspace Read Requests

Name

Description

GetWorkspacesRequest

A request to obtain the names of the existing workspaces that are accessible to the caller.

VerifyWorkspaceRequest

A request to verify that a workspace with a particular name exists.
The connector returns the actual location for the root node if the workspace exists, as well as the actual name of the workspace
(e.g., the default workspace name if a null name is supplied).

And there are also requests that deal with changing workspaces (and thus extend ChangeRequest):

Table 5.4. Types of Workspace Change Requests

Name

Description

CreateWorkspaceRequest

A request to create a workspace with a particular name.
The connector returns the actual location of the root node of the newly created workspace, as well as the actual name of the workspace
(e.g., the default workspace name if a null name is supplied).
The connector sets an InvalidWorkspaceException error on the request if the named workspace already exists.

DestroyWorkspaceRequest

A request to destroy a workspace with a particular name.
The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.

CloneWorkspaceRequest

A request to clone one named workspace as another new named workspace.
The connector sets an InvalidWorkspaceException error on the request if the original workspace does not exist,
or if the new workspace already exists.

Although there are already over a dozen different kinds of requests, we anticipate adding more in future releases.
For example, DNA will likely support searching repository content in sources through an additional subclass of Request.
Getting the version history for a node will likely be another kind of request added in an upcoming release.

This section covered the different kinds of Request classes. The next section provides an easy way to encapsulate how
a component should respond to these requests, and after that we'll see how these Request objects are also used
in the observation framework.

5.9. Request processors

JBoss DNA connectors are typically the components that receive these Request objects. We'll dive deep into connectors
in the next chapter, but before we do there is one more component related to
Requests that should be discussed.

The RequestProcessor class is an abstract class that defines a process(...) method for each concrete Request subclass.
In other words, there is a process(CompositeRequest) method, a process(ReadNodeRequest) method,
and so on. This makes it easy to implement behavior that responds to the different kinds of Request classes:
simply subclass the RequestProcessor, override all of the abstract methods, and optionally
override any of the other methods that have a default implementation.
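As a rough sketch, a processor subclass might look like the following. The Request classes here are simplified stand-ins for the real classes in org.jboss.dna.graph.request, which carry locations, results, and errors; only the overload-per-subclass pattern is the point.

```java
public class ProcessorExample {
    // Simplified stand-ins for DNA's Request hierarchy.
    static class Request {}
    static class ReadNodeRequest extends Request {}
    static class CompositeRequest extends Request {
        final java.util.List<Request> parts;
        CompositeRequest(java.util.List<Request> parts) { this.parts = parts; }
    }

    // One process(...) overload per concrete Request subclass, mirroring RequestProcessor.
    abstract static class RequestProcessor {
        public abstract void process(ReadNodeRequest request);

        // Default implementation: simply process each contained request.
        public void process(CompositeRequest composite) {
            for (Request r : composite.parts) process(r);
        }

        // Dispatch on the concrete request type.
        public void process(Request request) {
            if (request instanceof CompositeRequest) process((CompositeRequest) request);
            else if (request instanceof ReadNodeRequest) process((ReadNodeRequest) request);
        }
    }

    // Count how many node reads a (possibly composite) request would trigger.
    public static int countReads(Request request) {
        final int[] reads = {0};
        RequestProcessor processor = new RequestProcessor() {
            @Override public void process(ReadNodeRequest r) { reads[0]++; }
        };
        processor.process(request);
        return reads[0];
    }

    public static void main(String[] args) {
        Request batch = new CompositeRequest(
                java.util.List.of(new ReadNodeRequest(), new ReadNodeRequest()));
        System.out.println(countReads(batch)); // 2
    }
}
```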

Note

The RequestProcessor abstract class contains default implementations for quite a few of the process(...) methods,
and these will be sufficient but probably not efficient or optimal. If you can provide a more efficient
implementation given your source, feel free to do so. However, if performance is not a big issue, all of the concrete methods
will provide the correct behavior. Keep things simple to start out - you can always provide better implementations later.

5.10. Observation

The JBoss DNA graph model also incorporates an observation framework that allows components to register and
be notified when changes occur within the content owned by a graph.

Many event frameworks define the listeners and sources as interfaces. While this is often useful, it requires
that implementations properly address the thread-safety concerns of managing and calling the listeners.
The JBoss DNA observation framework uses abstract or concrete classes to minimize the effort required for implementing
ChangeObserver or Observable. These abstract classes provide implementations for a number of
utility methods (such as the unregister() method on ChangeObserver) that
also save effort and code.

However, one of the more important reasons for providing classes is that ChangeObserver uses
weak references to track the Observable instances, and the ChangeObservers
class uses weak references for the listeners. This means that an observer does not prevent Observable instances
from being garbage collected, nor does an Observable prevent its observers from being garbage collected.
These abstract classes provide all this functionality for free.
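The weak-reference bookkeeping can be sketched with java.lang.ref.WeakReference. This is a simplified stand-in, not DNA's actual implementation: holding listeners weakly means registration alone never keeps them alive, and collected entries are pruned as they are encountered.

```java
import java.lang.ref.WeakReference;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class WeakObservers {
    interface Observer { void notify(String changes); }

    // Hold observers weakly so registration never prevents garbage collection.
    private final List<WeakReference<Observer>> observers = new CopyOnWriteArrayList<>();

    public void register(Observer observer) {
        observers.add(new WeakReference<>(observer));
    }

    // Deliver the changes to every still-live observer; prune collected ones.
    public int broadcast(String changes) {
        int delivered = 0;
        for (WeakReference<Observer> ref : observers) {
            Observer o = ref.get();
            if (o == null) { observers.remove(ref); continue; }
            o.notify(changes);
            delivered++;
        }
        return delivered;
    }

    public static void main(String[] args) {
        WeakObservers observable = new WeakObservers();
        StringBuilder log = new StringBuilder();
        Observer observer = changes -> log.append(changes);
        observable.register(observer);
        observable.broadcast("node-moved");
        System.out.println(log); // node-moved
    }
}
```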

5.10.1. Observable

Any component that can have changes and be observed can implement the Observable interface. This interface
allows components to register (or be registered) to receive notifications of the changes. However, a concrete and thread-safe
implementation of this interface, called ChangeObservers, is available and should be used where possible, since it
automatically manages the registered ChangeObserver instances and properly implements the register and unregister mechanisms.

5.10.2. Observers

Components that are to receive notifications of changes are called observers. To create an observer, simply extend
the ChangeObserver abstract class and provide an implementation of the notify(Changes) method.
Then, register the observer with an Observable using its register(ChangeObserver) method.
The observer's notify(Changes) method will then be called with the changes that have
been made to the Observable.

When an observer is no longer needed, it should be unregistered from all Observable instances with which
it was registered. The ChangeObserver class automatically tracks which Observable instances it is
registered with, and calling the observer's unregister() will unregister the observer from
all of these Observables. Alternatively, an observer can be unregistered from a single Observable using the
Observable's unregister(ChangeObserver) method.

5.10.3. Changes

The Changes class represents the set of individual changes that have been made during a single, atomic
operation. Each Changes instance has information about the source of the changes, the timestamp at which
the changes occurred, and the individual changes that were made. These individual changes take the form of
ChangeRequest objects, which we'll see more of in the next chapter. Each request is
frozen, meaning it is immutable and will not change. Also none of the change requests will be marked as cancelled.

Using the actual ChangeRequest objects as the "events" has a number of advantages.
First, the existing ChangeRequest subclasses already contain the information to accurately and completely
describe the operation. Reusing these classes means we don't need a duplicate class structure or come up with a generic
event class.

Second, the requests have all the state required for an event, plus they often will have more. For example,
the DeleteBranchRequest has the actual location of the branch that was deleted (and in this way is not much different than
a more generic event), but the CreateNodeRequest has the actual location of the created node along with the properties
of that node. Additionally, the RemovePropertyRequest has the actual location of the node along with the name of the property
that was removed. In many cases, these requests carry all the information a more general event class would have, plus
enough additional detail for many observers to use directly without having to read the graph to determine what
actually changed.

Third, the requests that make up a Changes instance can actually be replayed. Consider the case of a cache
that is backed by a RepositorySource, which might use an observer to keep the cache in sync.
As the cache is notified of Changes, the cache can simply replay the changes against its source.

As we'll see in the next chapter, each connector is responsible for propagating
the ChangeRequest objects to the connector's Observer. But that's not the only use of the observation framework.
We'll also see later how the sequencing system uses it to monitor
for changes in the graph content to determine which, if any, sequencers should be run. And, the
JCR implementation also uses the observation framework to propagate those changes
to JCR clients.

5.11. Summary

In this chapter, we introduced JBoss DNA's graph model and showed the different
kinds of objects used to represent nodes, paths, names, and properties. We saw how all of these objects
are actually immutable, and how the low-level Graph API uses this characteristic to provide a stateless
and thread-safe interface for working with repository content, along with the request model
used to read, update, and change content.

Next, we'll dive into the connector framework, which builds
on top of the graph model and request model, allowing JBoss DNA to access the graph content stored
in many different kinds of systems.

There is a lot of information stored in many different places: databases, repositories, SCM systems,
registries, file systems, services, etc. The purpose of the federation engine is to allow applications to use the JCR API
to access that information as if it were all stored in a single JCR repository, but to really leave the information where
it is.

Why not just copy or move the information into a JCR repository? Moving it is probably pretty difficult, since most
likely there are existing applications that rely upon that information being where it is. All of those applications
would break or have to change. And copying the information means that we'd have to continually synchronize the changes.
Not only is this a lot of work, but it often makes it difficult to know which copy of the information is accurate and "the master" data.

JBoss DNA lets us leave information where it is, yet access it through the JCR API as if it were in one big repository.
One major benefit is that existing applications that use the information in the original locations don't break, since they
can keep using the information. But now our JCR clients can access all the information, too. And if our federating JBoss DNA repository is
configured to allow updates, JCR client applications can change the information in the repository and JBoss DNA will propagate
those changes down to the original source, making those changes visible to all the other applications.

In short, all clients see the correct information, even when it changes in the underlying systems. But the JCR clients can get to all of the information
in one spot, using one powerful standard API.

6.1. Connectors

With JBoss DNA, your applications use the JCR API to work with the repository,
but the DNA repository transparently fetches the information from different kinds of repositories and storage systems,
not just a single purpose-built store. This is fundamentally what makes JBoss DNA different.

How does JBoss DNA do this? At the heart of JBoss DNA and its JCR implementation is a simple graph-based
connector system. Essentially, JBoss DNA's JCR implementation uses a single
connector to access all content:

Figure 6.1. JBoss DNA's JCR implementation delegates to a connector

That single connector could use an in-memory repository, a JBoss Cache instance (including those that are clustered and replicated),
or a federated repository where content from multiple sources is unified.

Figure 6.2. JBoss DNA can put JCR on top of multiple kinds of systems

Really, the federated connector gives us all kinds of possibilities, since we can use that connector on top of lots of connectors
to other individual sources. This simple connector architecture, along with a good library of connectors (which is what
we're planning to create), is fundamentally what makes JBoss DNA so powerful and flexible.

For instance, we want to build a connector to other JCR repositories, and another that accesses
the local file system. We've already started on a Subversion connector,
which will allow JCR to access the files in a SVN repository (and perhaps push changes into SVN through a commit).
And of course we want to create a connector that accesses data
and metadata from relational databases. For more information, check out our
roadmap.
Of course, if we don't have a connector to suit your needs, you can write your own.

Figure 6.3. Future JBoss DNA connectors

It's even possible to put a different API layer on top of the connectors. For example, the New I/O (JSR-203)
API offers the opportunity to build new file system providers. This would be very straightforward to put on top of a JCR implementation,
but it could be made even simpler by putting it on top of a DNA connector. In both cases, it'd be a trivial mapping from nodes that represent
files and folders into JSR-203 files and directories, and events on those nodes could easily be translated into JSR-203 watch events.
Then, simply choose a DNA connector and configure it to use the source you want to use.

Figure 6.4. Virtual File System with JBoss DNA

Before we go further, let's define some terminology regarding connectors.

A connector is the runnable code packaged in one or more JAR files that
contains implementations of several interfaces (described below). A Java developer writes
a connector to a type of source, such as a particular database management system, LDAP directory, source code
management system, etc. It is then packaged into one or more JAR files (including dependent JARs) and deployed
for use in applications that use JBoss DNA repositories.

The description of a particular source system (e.g., the "Customer" database, or the company LDAP system)
is called a repository source. JBoss DNA defines a RepositorySource interface
that defines methods describing the behavior and supported features and a method for establishing connections.
A connector will have a class that implements this interface and that has JavaBean properties for
all of the connector-specific properties required to fully describe an instance of the system. Use of JavaBean
properties is not required, but it is highly recommended, as it enables reflective configuration and administration.
Applications that use JBoss DNA create an instance of the connector's RepositorySource implementation and set
the properties for the external source that the application wants to access with that connector.

A repository source instance is then used to establish connections to
that source. A connector provides an implementation of the RepositoryConnection interface, which
defines methods for interacting with the external system. In particular, the execute(...) method
takes an ExecutionContext instance and a Request object. The ExecutionContext object defines the
environment in which the processing is occurring,
while the Request object describes the requested operations on the content, with different concrete subclasses
representing each type of activity. Examples of requests include (but are not limited to) getting a node, moving a node, creating a node,
changing a node, and deleting a node. And, if the repository source is able to participate in JTA/JTS distributed transactions, then the
RepositoryConnection must implement the getXaResource() method by returning
a valid javax.transaction.xa.XAResource object that can be used by the transaction monitor.

As an example, consider that we want JBoss DNA to give us access through JCR to the schema information contained in
relational databases. We first have to develop a connector that allows us to interact with relational databases using JDBC.
That connector would contain a JdbcRepositorySource Java class that implements RepositorySource,
and that has all of the various JavaBean properties for setting the name of the driver class, URL, username, password,
and other properties. (Or we might have a JavaBean property that defines the JNDI name where we can find a JDBC
DataSource instance pointing to our JDBC database.)
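Such a source class might look like the following sketch. The class name comes from the text above, but the property names are illustrative assumptions, and the real class would also implement the RepositorySource interface.

```java
// JavaBean describing a single JDBC database instance. A no-arg constructor
// plus getters/setters let administrative tools create and configure it reflectively.
public class JdbcRepositorySource {
    private String name;
    private String driverClassName;
    private String url;
    private String username;
    private String password;
    private String dataSourceJndiName; // optional alternative to driver/url/credentials

    public JdbcRepositorySource() {} // required for reflective instantiation

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getDriverClassName() { return driverClassName; }
    public void setDriverClassName(String driverClassName) { this.driverClassName = driverClassName; }
    public String getUrl() { return url; }
    public void setUrl(String url) { this.url = url; }
    public String getUsername() { return username; }
    public void setUsername(String username) { this.username = username; }
    public String getPassword() { return password; }
    public void setPassword(String password) { this.password = password; }
    public String getDataSourceJndiName() { return dataSourceJndiName; }
    public void setDataSourceJndiName(String jndiName) { this.dataSourceJndiName = jndiName; }
}
```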

Our new connector would also have a JdbcRepositoryConnection Java class that implements the
RepositoryConnection interface. This class would probably wrap a JDBC database connection,
and would implement the execute(...) method such that the nodes exposed by the connector
describe the database schema of the database. For example, the connector might represent each database table
as a node with the table's name, with properties that describe the table (e.g., the description, whether it's a
temporary table), and with child nodes that represent each of the columns, keys and constraints.

To use our connector in an application that uses JBoss DNA, we need to create an instance of the
JdbcRepositorySource for each database instance that we want to access. If we have 3 MySQL databases,
9 Oracle databases, and 4 PostgreSQL databases, then we'd need to create a total of 16 JdbcRepositorySource
instances, each with the properties describing a single database instance. Those sources are then available for use by
JBoss DNA components, including the JCR implementation.

So far, we've learned what a connector is and how it's used to establish connections to the underlying sources
and access the content in those sources. Next we'll show how connectors expose the notion of workspaces, and describe
how to create your own connectors.

6.2. Out-of-the-box connectors

A number of connectors are already available in JBoss DNA, and are outlined in detail
later in the document.
Note that we plan to build more connectors in upcoming releases.

6.3. Writing custom connectors

There may come a time when you want to tackle creating your own connector. Maybe the connectors we provide out-of-the-box
don't work with your source. Maybe you want to use a different cache system.
Maybe you have a system that you want to make available through a JBoss DNA repository. Or, maybe you're
a contributor and want to help us round out our library with a new connector. No matter what the reason, creating a new connector
is pretty straightforward, as we'll see in this section.

Creating a custom connector involves the following steps:

Create a Maven 2 project for your connector.

Implement the RepositorySource interface, using JavaBean properties for each bit of information the implementation will
need to establish a connection to the source system.

Then, implement the RepositoryConnection interface with a class that represents a connection to the source. The
execute(ExecutionContext, Request) method should process any and all requests that may come down the pike,
and the results of each request can be put directly on that request.

Don't forget unit tests that verify that the connector is doing what it's expected to do. (If you'll be committing the connector
code to the JBoss DNA project, please ensure that the unit tests can be run by others that may not have access to the
source system. In this case, consider writing integration tests that can be easily configured to use different sources
in different environments, and try to make the failure messages clear when the tests can't connect to the underlying source.)

Configure JBoss DNA to use your connector. This may involve just registering the source with the RepositoryService,
or it may involve adding a source to a configuration repository used by the federated repository.

Deploy the JAR file with your connector (as well as any dependencies), and make them available to JBoss DNA
in your application.

Let's go through each one of these steps in more detail.

6.3.1. Creating the Maven 2 project

The first step is to create the Maven 2 project that you can use to compile your code and build the JARs.
Maven 2 automates a lot of the work, and if you're already set up to use Maven,
using it for your project will save you a lot of time and effort. Of course, you don't have to use Maven 2, but then you'll
have to get the required libraries and manage the compiling and building process yourself.

Note

JBoss DNA may in the future provide a Maven archetype for creating connector projects. If you'd find this useful
and would like to help create it, please join the community.

In lieu of a Maven archetype, you may find it easier to start with a small existing connector project.
The dna-connector-filesystem project is small, but it may be tough to separate
the stuff that every connector needs from the extra code and data structures that manage the content.
See the Subversion repository: http://anonsvn.jboss.org/repos/dna/trunk/extensions/dna-connector-filesystem/

You can create your Maven project any way you'd like. For examples, see the
Maven 2 documentation.
Once you've done that, just add the dependencies in your project's pom.xml dependencies section:
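A minimal dependency block might look like the following; the artifactId 'dna-graph' is named below, while the groupId and the version placeholder are assumptions that should be checked against the release you're using:

```xml
<dependency>
  <groupId>org.jboss.dna</groupId>
  <artifactId>dna-graph</artifactId>
  <version>${dna.version}</version>
</dependency>
```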

This is the only dependency required for compiling a connector - Maven pulls in all of the dependencies needed by
the 'dna-graph' artifact. Of course, you'll still have to add dependencies for any library your connector needs
to talk to its underlying system.

As for testing, you probably will want to add more dependencies, such as those listed here:
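For example, a typical set of test dependencies might include JUnit; the coordinates shown are assumptions, and you should add whatever mocking or test-support libraries your connector needs:

```xml
<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.4</version>
  <scope>test</scope>
</dependency>
```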

Testing JBoss DNA connectors does not require a JCR repository or the JBoss DNA services. (For more detail,
see the testing section.) However, if you want to do
integration testing with a JCR repository and the JBoss DNA services, you'll need additional dependencies
(e.g., dna-repository and any other extensions).

6.3.2. Implementing a RepositorySource

As mentioned earlier, a connector consists of the Java code that is used to access content
from a system. Perhaps the most important class that makes up a connector is the implementation of the
RepositorySource. This class is analogous to JDBC's DataSource in that it is instantiated to represent
a single instance of a system that will be accessed, and it contains enough information (in the form of JavaBean properties)
so that it can create connections to the source.

Why is the RepositorySource implementation a JavaBean? Well, this is the class that is instantiated, usually
reflectively, and so a no-arg constructor is required. Using JavaBean properties makes it possible
to reflect upon the object's class to determine the properties that can be set (using setters) and read
(using getters). This means that an administrative application can instantiate, configure, and manage
the objects that represent the actual sources, without having to know anything about the actual implementation.
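The reflective configuration described here can be sketched with the standard java.beans introspection API. The FileSource bean and its property name are hypothetical; the point is that the administrative code never mentions the implementation class at compile time.

```java
import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Map;

public class ReflectiveConfig {
    // An illustrative source bean; any JavaBean-style source works the same way.
    public static class FileSource {
        private String path;
        public String getPath() { return path; }
        public void setPath(String path) { this.path = path; }
    }

    // Instantiate a bean by class name and apply named properties via its
    // setters, with no compile-time knowledge of the implementation class.
    static Object configure(String className, Map<String, Object> properties) throws Exception {
        Object bean = Class.forName(className).getDeclaredConstructor().newInstance();
        BeanInfo info = Introspector.getBeanInfo(bean.getClass());
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            if (properties.containsKey(pd.getName()) && pd.getWriteMethod() != null) {
                pd.getWriteMethod().invoke(bean, properties.get(pd.getName()));
            }
        }
        return bean;
    }

    public static void main(String[] args) throws Exception {
        FileSource source = (FileSource) configure(
                FileSource.class.getName(), Map.of("path", "/tmp/repo"));
        System.out.println(source.getPath()); // /tmp/repo
    }
}
```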

So, your connector will need a public class that implements RepositorySource and provides JavaBean properties
for any kind of inputs or options required to establish a connection to and interact with the underlying source.
Most of the semantics of the class are defined by the RepositorySource interface and the interfaces it extends.
However, there are a few characteristics that are worth mentioning here.

6.3.2.1. Workspaces

The previous chapter talked about how connectors expose their information through the graph language of JBoss DNA.
This is true, except that we didn't dive into too much of the detail. JBoss DNA graphs have the notion of workspaces
in which the content appears, and it's very easy for clients using the graph to switch between workspaces. Often,
workspaces differ from each other in that they provide different views of the same information.

Consider a source control system, like SVN or CVS. These systems provide different views of the source code:
a mainline development branch as well as other branches (or tags) commonly used for releases. So, just like one source
file might appear in the mainline branch as well as the previous two release branches, a node in a repository source
might appear in multiple workspaces.

However, each connector can decide how (or whether) it uses workspaces. For example, there may be no overlap
in the content between workspaces. Or a connector might only expose a single workspace (in other words, there's only one
"default" workspace).

6.3.2.3. Cache policy

Each connector is responsible for determining whether and how long DNA is to cache the
content made available by the connector. This is referred to as the caching policy,
and consists of a time-to-live (TTL) value representing the number of milliseconds that
a piece of data may be cached. After the TTL has passed, the information is no longer used.

DNA allows a connector to use a flexible and powerful caching policy. First, each connection returns the
default caching policy for all information returned by that connection.
Often this policy can be configured via properties on the RepositorySource implementation.
This is optional, meaning the connector can return null if it does not wish to
have a default caching policy.

Second, the connector is able to override its default caching policy on individual requests
(which we'll cover in the next section).
Again, this is optional, meaning that a null caching policy on a request implies that the
request has no overridden caching policy.

Third, if the connector has no default caching policy and none is set on the individual requests,
DNA uses whatever caching policy is set up for that component using the connector. For example, the federating
connector allows a default caching policy to be specified, and this policy is used should the sources
being federated not define their own caching policy.

In summary, a connector has total control over whether and for how long the information it provides
is cached.
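The three-tier fallback above can be summarized in a few lines. The CachePolicy class here is a deliberately reduced stand-in for DNA's richer interface, keeping only the TTL.

```java
public class CachePolicyResolution {
    // A caching policy reduced to its time-to-live in milliseconds.
    static class CachePolicy {
        final long timeToLiveMillis;
        CachePolicy(long ttl) { this.timeToLiveMillis = ttl; }
    }

    // Resolution order described in the text: the per-request override wins,
    // then the connection's default, then the policy configured on the
    // component that uses the connector.
    static CachePolicy effectivePolicy(CachePolicy onRequest,
                                       CachePolicy connectionDefault,
                                       CachePolicy componentDefault) {
        if (onRequest != null) return onRequest;
        if (connectionDefault != null) return connectionDefault;
        return componentDefault;
    }

    public static void main(String[] args) {
        CachePolicy component = new CachePolicy(60_000);
        CachePolicy connection = new CachePolicy(10_000);
        System.out.println(effectivePolicy(null, connection, component).timeToLiveMillis); // 10000
        System.out.println(effectivePolicy(null, null, component).timeToLiveMillis);       // 60000
    }
}
```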

6.3.2.4. Leveraging JNDI

Sometimes it is necessary (or easier) for a RepositorySource implementation to look up an object in JNDI.
One example of this is the JBoss Cache connector: while the connector can
instantiate a new JBoss Cache instance, more interesting use cases involve JBoss Cache instances that are
set up for clustering and replication, something that is generally difficult to configure in a single JavaBean.
Therefore the JBossCacheSource has optional JavaBean properties that define how it is to look up a
JBoss Cache instance in JNDI.

This is a simple pattern that you may find useful in your connector. Basically, if your source implementation
can look up an object in JNDI, simply use a single JavaBean String property that defines the
full name that should be used to locate that object in JNDI. Usually it's best to include "Jndi" in the
JavaBean property name so that administrative users understand the purpose of the property.
(And some may suggest that any optional property also use the word "optional" in the property name.)

6.3.2.5. Capabilities

Another characteristic of a RepositorySource implementation is that it provides hints about which
features it supports. This is defined on the interface as a method that returns a
RepositorySourceCapabilities object. This class currently provides methods that say whether the connector supports
updates, whether it supports same-name-siblings (SNS), and whether the connector supports listeners and events.

Note that these may be hard-coded values, or the connector's response may be determined at runtime by various factors.
For example, a connector may interrogate the underlying system to decide whether it can support updates.

The RepositorySourceCapabilities can be used as is (the class is immutable), or it can be subclassed
to provide more complex behavior. It is important, however, that the capabilities remain constant
throughout the lifetime of the RepositorySource instance.
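The two styles (hard-coded versus computed at startup) can be sketched like this. The `Capabilities` class below is a minimal, immutable stand-in for DNA's RepositorySourceCapabilities, and the connector scenarios are hypothetical:

```java
// Hypothetical sketch of how a connector might expose capabilities, either
// hard-coded or computed once at runtime. Capabilities is a minimal stand-in
// for DNA's RepositorySourceCapabilities class.
public class CapabilitiesExample {

    static final class Capabilities {
        final boolean supportsUpdates;
        final boolean supportsSameNameSiblings;
        final boolean supportsEvents;
        Capabilities(boolean updates, boolean sns, boolean events) {
            this.supportsUpdates = updates;
            this.supportsSameNameSiblings = sns;
            this.supportsEvents = events;
        }
    }

    // A read-only connector might simply hard-code its answer:
    static Capabilities readOnlyCapabilities() {
        return new Capabilities(false, true, false);
    }

    // Another connector might interrogate the underlying system once at startup.
    // The result must then stay constant for the lifetime of the source.
    static Capabilities detectedCapabilities(boolean underlyingSystemIsWritable) {
        return new Capabilities(underlyingSystemIsWritable, true, false);
    }
}
```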

Note

Why a concrete class and not an interface? By using a concrete class, connectors inherit the default
behavior. If additional capabilities need to be added to the class in future releases, connectors may
not have to override the defaults. This provides some insulation against future enhancements to the connector framework.

6.3.2.6. Security and authentication

As we'll see in the next section, the main method connectors have to process requests takes an ExecutionContext,
which contains the JAAS security information of the subject performing the request. This means that the connector
can use this to determine authentication and authorization information for each request.

Sometimes that is not sufficient. For example, it may be that the connector needs its own authorization information
so that it can establish a connection (even if user-level privileges still use the ExecutionContext provided with
each request). In this case, the RepositorySource implementation will probably need JavaBean properties
that represent the connector's authentication information. This may take the form of a username and password,
or it may be properties that are used to delegate authentication to JAAS.
Either way, just realize that it's perfectly acceptable for the connector to require its own security properties.

6.3.3. Implementing a RepositoryConnection

One job of the RepositorySource implementation is to create connections to the underlying sources.
Connections are represented by classes that implement the RepositoryConnection interface, and creating this
class is the next step in writing a connector. This is what we'll cover in this section.

/**
 * A connection to a repository source.
 * <p>
 * These connections need not support concurrent operations by multiple threads.
 * </p>
 */
@NotThreadSafe
public interface RepositoryConnection {

    /**
     * Get the name for this repository source. This value should be the same as that returned
     * by the same RepositorySource that created this connection.
     *
     * @return the identifier; never null or empty
     */
    String getSourceName();

    /**
     * Return the transactional resource associated with this connection. The transaction manager
     * will use this resource to manage the participation of this connection in a distributed transaction.
     *
     * @return the XA resource, or null if this connection is not aware of distributed transactions
     */
    XAResource getXAResource();

    /**
     * Ping the underlying system to determine if the connection is still valid and alive.
     *
     * @param time the length of time to wait before timing out
     * @param unit the time unit to use; may not be null
     * @return true if this connection is still valid and can still be used, or false otherwise
     * @throws InterruptedException if the thread has been interrupted during the operation
     */
    boolean ping( long time, TimeUnit unit ) throws InterruptedException;

    /**
     * Get the default cache policy for this repository. If none is provided, a global cache policy
     * will be used.
     *
     * @return the default cache policy
     */
    CachePolicy getDefaultCachePolicy();

    /**
     * Execute the supplied commands against this repository source.
     *
     * @param context the environment in which the commands are being executed; never null
     * @param request the request to be executed; never null
     * @throws RepositorySourceException if there is a problem loading the node data
     */
    void execute( ExecutionContext context, Request request ) throws RepositorySourceException;

    /**
     * Close this connection to signal that it is no longer needed and that any accumulated
     * resources are to be released.
     */
    void close();
}

While most of these methods are straightforward, a few warrant additional information.
The ping(...) method allows DNA to check whether the connection is still
alive. This method can be used in a variety of situations, ranging from verifying that a RepositorySource's
JavaBean properties are correct to ensuring that a connection is still alive before returning it from
a connection pool.

If the connector is able to publish events, it could support listeners. However,
DNA hasn't yet defined the event mechanism, so connectors currently have no listener-related methods to invoke.
This will be defined in the next release, so there is nothing to manage just yet. Note that by default the RepositorySourceCapabilities returns
false for supportsEvents().

The most important method on this interface, though, is the execute(...) method, which serves as the
mechanism by which the component using the connector accesses and manipulates the content exposed by the connector.
The first parameter to this method is the ExecutionContext, which contains information about the environment
as well as the subject performing the request. This was discussed earlier.

The second parameter, however, represents a Request that is to be processed by the connector. Request objects can
take many different forms, as there are different classes for each kind of request (see the
previous chapter for details).
Each request contains the information a connector needs to do the processing, and it is also the place
where the connector places the results (or the error, if one occurs).

There are already over a dozen different kinds of requests, and we anticipate adding more in future releases.
For example, DNA will likely support searching repository content in sources through an additional subclass of Request.
Getting the version history for a node will likely be another kind of request added in an upcoming release.

A connector is technically free to implement the execute(...) method in any way, as long as the semantics
are maintained. But as discussed in the previous chapter, JBoss DNA provides
a RequestProcessor class that can simplify writing your own connector and at the
same time help insulate your connector from new kinds of requests that may be added in the future. The RequestProcessor
is an abstract class that defines a process(...) method for each concrete Request subclass.
In other words, there is a process(CompositeRequest) method, a process(ReadNodeRequest) method,
and so on.

To use this in your connector, simply create a subclass of RequestProcessor, overriding all of the abstract methods and optionally
overriding any of the other methods that have a default implementation.
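The dispatch pattern can be sketched with simplified stand-ins. The mini classes below mirror the general shape of DNA's Request and RequestProcessor classes but are not the real API, and the node-reading logic is elided:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for DNA's Request/RequestProcessor classes, sketching
// the per-request-type dispatch described above. Not the real DNA API.
public class RequestProcessorSketch {

    static abstract class Request {
        Throwable error; // slot where the connector records any failure
        void setError(Throwable t) { this.error = t; }
    }

    static final class ReadNodeRequest extends Request {
        final String path;
        final Map<String, Object> properties = new HashMap<>();
        ReadNodeRequest(String path) { this.path = path; }
    }

    static abstract class RequestProcessor {
        // The real class has one process(...) overload per concrete Request subclass.
        void process(Request request) {
            if (request instanceof ReadNodeRequest) process((ReadNodeRequest) request);
        }
        abstract void process(ReadNodeRequest request);
    }

    // A connector's processor only fills in the source-specific logic:
    static final class MyProcessor extends RequestProcessor {
        @Override
        void process(ReadNodeRequest request) {
            try {
                // ... read the node from the underlying system (elided) ...
                request.properties.put("jcr:primaryType", "nt:unstructured");
            } catch (RuntimeException e) {
                request.setError(e); // record the problem on the request itself
            }
        }
    }
}
```

The `setError(...)` call also illustrates the exception-handling convention discussed later in this section: problems are recorded on the request rather than thrown, unless the connection itself has failed.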

Note

The RequestProcessor abstract class contains default implementations for quite a few of the process(...) methods,
and these will be sufficient, though probably not efficient or optimal. If you can provide a more efficient
implementation for your source, feel free to do so. However, if performance is not a big issue, all of the concrete methods
will provide the correct behavior. Keep things simple to start with - you can always provide better implementations later.

If you do this, the bulk of your connector implementation may live in the RequestProcessor subclass's methods.
This is not only quite maintainable, it also lends itself to easier testing. And should any new request types be added
in the future, your connector may work just fine without any changes. In fact, if the RequestProcessor class
can implement meaningful default methods for those new request types, your connector may "just work". Or, at the very least,
your connector will still be binary compatible, even if it won't support any of the new features.

Finally, how should the connector handle exceptions? As mentioned above, each Request object has a slot where the connector
can set any exception encountered during processing. This not only handles the exception, but in the case of CompositeRequests
it also correctly associates the problem with the request. However, it is perfectly acceptable to throw an exception
if the connection becomes invalid (e.g., there is a communication failure) or if a fatal error would prevent subsequent
requests from being processed.

6.3.4. Testing custom connectors

Testing connectors is not really that much different from testing other classes. Using mocks may help to isolate your
class under test so you can write more unit tests that don't require the underlying source system.

However, there may be times when you have to use the underlying source system in your tests. If this is the case,
we recommend using Maven integration tests, which run at a different point in the Maven lifecycle. The benefit of
using integration tests is that by convention they're able to rely upon external systems. Plus, your unit tests
don't become polluted with slow-running tests that break if the external system is not available.
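One way to realize this convention (in newer Maven versions) is the Failsafe plugin, which by default runs test classes named `*IT` during the integration-test phase. A hedged POM fragment (the plugin version shown is illustrative):

```xml
<!-- Hedged example: bind integration tests (classes named *IT by convention)
     to the Maven integration-test phase via the Failsafe plugin. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-failsafe-plugin</artifactId>
  <version>2.22.2</version>
  <executions>
    <execution>
      <goals>
        <goal>integration-test</goal>
        <goal>verify</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```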

6.4. Summary

In this chapter, we covered all the aspects of JBoss DNA connectors, including the connector API,
how DNA's JCR implementation works with connectors, what connectors are available (and how to use them),
and how to write your own connector. So now that you know how to set up and use JBoss DNA repositories,
the next chapter describes the sequencing framework and how
to build your own custom sequencers. After that, we'll get into how to configure
JBoss DNA and use JCR.

Chapter 7. Sequencing framework

Many repositories are used (at least in part) to manage files and other artifacts, including
service definitions, policy files, images, media, documents, presentations, application components,
reusable libraries, configuration files, application installations, databases schemas, management scripts, and so on.
Unlocking the information buried within all of those files is what JBoss DNA sequencing is all about.
As files are loaded into the repository, JBoss DNA can automatically sequence these files to extract
from their content meaningful information that can be stored in the repository, where it can then be
searched, accessed, and analyzed using the JCR API.

7.1. Sequencers

Sequencers are just POJOs that implement a specific interface, and their job is to process a stream of
data (supplied by JBoss DNA) to extract meaningful content that usually takes the form of a structured graph.
Exactly what content is extracted is up to each sequencer
implementation. For example, JBoss DNA comes with an image sequencer
that extracts the simple metadata from different kinds of image files (e.g., JPEG, GIF, PNG, etc.).
Another example is the Compact Node Definition (CND) sequencer
that processes the CND files to extract and produce a structured representation of the node type definitions,
property definitions, and child node definitions contained within the file.

Sequencers are configured to identify the kinds of nodes that the sequencers can work against.
When content in the repository changes, JBoss DNA looks to see which (if any) sequencers might be able
to run on the changed content. If any sequencer configurations do match, those sequencers are run
against the content, and the structured graph output of the sequencers is then written back into the repository
(at a location dictated by the sequencer configuration). And once that information is in the repository,
it can be easily found and accessed via the standard JCR API.

In other words, JBoss DNA uses sequencers to help you extract more meaning from the artifacts you already
are managing, and makes it much easier for applications to find and use all that valuable information.
All without your applications doing anything extra.

7.2. Stream Sequencers

The StreamSequencer interface defines the single method that must be implemented by a sequencer:

public interface StreamSequencer {

    /**
     * Sequence the data found in the supplied stream, placing the output
     * information into the supplied map.
     *
     * @param stream the stream with the data to be sequenced; never null
     * @param output the output from the sequencing operation; never null
     * @param context the context for the sequencing operation; never null
     */
    void sequence( InputStream stream, SequencerOutput output, StreamSequencerContext context );
}

Implementations are responsible for processing the content in the supplied InputStream content and generating
structured content using the supplied SequencerOutput interface.
The StreamSequencerContext provides additional details about the information that is being sequenced,
including the location and properties of the node being sequenced, the MIME type
of the node being sequenced, and a Problems object where the sequencer can record problems that aren't
severe enough to warrant throwing an exception. The StreamSequencerContext also provides access
to the ValueFactories that can be used to create Path, Name, and any other value objects.

The SequencerOutput interface is fairly easy to use, and its job is to hide from the sequencer
all the specifics about where the output is being written. Therefore, the interface has only a few methods
for implementations to call.
Two methods set the property values on a node, while the other sets references to other nodes in the repository.
Use these methods to describe the properties of the nodes you want to create, using relative paths for the nodes and
valid JCR property names for properties and references. JBoss DNA will ensure that nodes are created or updated
whenever they're needed.

public interface SequencerOutput {

    /**
     * Set the supplied property on the supplied node. The allowable
     * values are any of the following:
     * - primitives (which will be autoboxed)
     * - String instances
     * - String arrays
     * - byte arrays
     * - InputStream instances
     * - Calendar instances
     *
     * @param nodePath the path to the node containing the property;
     *        may not be null
     * @param property the name of the property to be set
     * @param values the value(s) for the property; may be empty if
     *        any existing property is to be removed
     */
    void setProperty( String nodePath, String property, Object... values );

    void setProperty( Path nodePath, Name property, Object... values );

    /**
     * Set the supplied reference on the supplied node.
     *
     * @param nodePath the path to the node containing the property;
     *        may not be null
     * @param property the name of the property to be set
     * @param paths the paths to the referenced property, which may be
     *        absolute paths or relative to the sequencer output node;
     *        may be empty if any existing property is to be removed
     */
    void setReference( String nodePath, String property, String... paths );
}

Note

JBoss DNA will create nodes of type nt:unstructured unless you specify the value for the
jcr:primaryType property. You can also specify the values for the jcr:mixinTypes property
if you want to add mixins to any node.

7.3. Path Expressions

Each sequencer must be configured to describe the areas or types of content that it is capable
of handling. This is done by specifying path expressions that
identify the nodes (or node patterns) that should be sequenced and where to store the output generated by the sequencer.
We'll see how to fully configure a sequencer in the next chapter,
but before then let's dive into path expressions in more detail.

A path expression consists of two parts: a selection criteria (or input path) and an output path:

inputPath => outputPath

The inputPath part defines an expression for the path of a node that is to be sequenced.
Input paths consist of '/' separated segments, where each segment represents a pattern for a single node's
name (including the same-name-sibling indexes) and '@' signifies a property name.

Let's first look at some simple examples:

Table 7.1. Simple Input Path Examples

Input Path

Description

/a/b

Match node "b" that is a child of the top level node "a". Neither node
may have any same-name-siblings.

/a/*

Match any child node of the top level node "a".

/a/*.txt

Match any child node of the top level node "a" that also has a name ending in ".txt".

/a/b@c

Match the property "c" of node "/a/b".

/a/b[2]

The second child named "b" below the top level node "a".

/a/b[2,3,4]

The second, third or fourth child named "b" below the top level node "a".

/a/b[*]

Any (and every) child named "b" below the top level node "a".

//a/b

Any node named "b" that exists below a node named "a", regardless
of where node "a" occurs. Again, neither node may have any same-name-siblings.

With these simple examples, you can probably discern the most important rules. First, the '*' is a wildcard character
that matches any character or sequence of characters in a node's name (or index if appearing in between square brackets), and
can be used in conjunction with other characters (e.g., "*.txt").

Second, square brackets (i.e., '[' and ']') are used to match a node's same-name-sibling index.
You can put a single non-negative number or a comma-separated list of non-negative numbers inside the brackets. Use '0' to match a node that has no
same-name-siblings, or any positive number to match the specific same-name-sibling.

Third, combining two delimiters (e.g., "//") matches any sequence of nodes, regardless of their names
or how many nodes there are. This is often used with other patterns to identify nodes at any level that match those patterns.
Three or more sequential slash characters are treated as two.

Many input paths can be created using just these simple rules. However, input paths can be more complicated. Here are some
more examples:

Table 7.2. More Complex Input Path Examples

Input Path

Description

/a/(b|c|d)

Match children of the top level node "a" that are named "b",
"c" or "d". None of the nodes may have same-name-sibling indexes.

/a/b[c/d]

Match node "b", a child of the top level node "a", when node
"b" has a child named "c", and "c" has a child named "d".
Node "b" is the selected node, while nodes "c" and "d" are used as criteria but are not
selected.

/a(/(b|c|d|)/e)[f/g/@something]

Match node "/a/b/e", "/a/c/e", "/a/d/e",
or "/a/e" when they also have a child "f" that itself has a child "g" with property
"something". None of the nodes may have same-name-sibling indexes.

These examples show a few more advanced rules. Parentheses (i.e., '(' and ')') can be used
to define a set of options for names, as shown in the first and third rules. Whatever part of the selected node's path
appears between the parentheses is captured for use within the output path. Thus, the first input path in the previous table
would match node "/a/b", and "b" would be captured and could be used within the output path using "$1",
where the number used in the output path identifies the parentheses.

Square brackets can also be used to specify criteria on a node's properties or children. Whatever appears in between the square
brackets does not appear in the selected node.

Let's go back to the previous code fragment and look at the first path expression:

This matches a node named "jcr:content" with property "jcr:data" but no siblings with the same name,
and that is a child of a node whose name ends with ".jpg", ".jpeg", ".gif", ".bmp", ".pcx",
or ".png" that may have any same-name-sibling index. These nodes can appear at any level in the repository.
Note how the input path captures the filename (the segment containing the file extension), including any same-name-sibling index.
This filename is then used in the output path, which is where the sequenced content is placed.
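The expression in question resembles the path expression used by DNA's image sequencer; it is reconstructed here for illustration, since the fragment itself is not reproduced in this excerpt (the output path in particular is an assumption):

```
//(*.(jpg|jpeg|gif|bmp|pcx|png)[*])/jcr:content[@jcr:data] => /images/$1
```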

7.4. Out-of-the-box sequencers

A number of sequencers are already available in JBoss DNA, and they are outlined in detail
later in the document.
We plan to build more sequencers
in upcoming releases.

7.5. Creating custom sequencers

The current release of JBoss DNA comes with six sequencers. However, it's very easy to create your own
sequencers and to then configure JBoss DNA to use them in your own application.

Creating a custom sequencer involves the following steps:

Create a Maven 2 project for your sequencer;

Implement the StreamSequencer interface with your own implementation, and create unit tests to verify
the functionality and expected behavior;

Deploy the JAR file with your implementation (as well as any dependencies), and make them available to JBoss DNA
in your application.

It's that simple.

7.5.1. Creating the Maven 2 project

The first step is to create the Maven 2 project that you can use to compile your code and build the JARs.
Maven 2 automates a lot of the work, and if you're already set up to use Maven,
using it for your project will save you a lot of time and effort. Of course, you don't have to use Maven 2, but then you'll
have to obtain the required libraries and manage the compile and build process yourself.

Note

JBoss DNA may in the future provide a Maven archetype for creating sequencer projects. If you'd find this useful
and would like to help create it, please join the community.

Testing JBoss DNA sequencers does not require a JCR repository or the JBoss DNA services. (For more detail,
see the testing section.) However, if you want to do
integration testing with a JCR repository and the JBoss DNA services, you'll need additional dependencies for these libraries.

At this point, your project should be set up correctly, and you're ready to move on to
write your custom implementation of the StreamSequencer interface. As stated earlier, this should be fairly
straightforward: process the stream and generate the output that's appropriate for the kind of file being
sequenced.
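A sketch of what such an implementation might look like follows. The interfaces here are minimal stand-ins for DNA's StreamSequencer and SequencerOutput (the real `sequence(...)` method also takes a context argument), the `image:*` names follow the image sequencer example discussed below, and the actual metadata parsing is elided:

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of a StreamSequencer-style implementation. The interfaces are
// simplified stand-ins for DNA's API; the metadata-extraction step is elided.
public class ImageSequencerSketch {

    interface SequencerOutput {
        void setProperty(String nodePath, String property, Object... values);
    }

    static final class ImageSequencer {
        public void sequence(InputStream stream, SequencerOutput output) {
            // ... parse the image header from 'stream' to obtain format/width/height ...
            // A single "image:metadata" node is described via relative-path output;
            // the values below are placeholders for what parsing would produce.
            output.setProperty("image:metadata", "jcr:primaryType", "image:metadata");
            output.setProperty("image:metadata", "image:formatName", "PNG");
            output.setProperty("image:metadata", "image:width", 16);
            output.setProperty("image:metadata", "image:height", 16);
        }
    }

    // A recording output implementation, handy for unit tests:
    static final class RecordingOutput implements SequencerOutput {
        final List<String> calls = new ArrayList<>();
        public void setProperty(String nodePath, String property, Object... values) {
            calls.add(nodePath + " @" + property);
        }
    }
}
```

Because the sequencer only ever calls the output interface, a test can swap in the recording implementation and assert on the described nodes and properties without any repository at all.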

Notice how the image metadata is extracted and the output graph is generated. A single node is created with the name
image:metadata
and with the image:metadata node type. No mixins are defined for the node, but several properties are set on the node
using the values obtained from the image metadata. After this method returns, the constructed graph will be saved to the repository
in all of the places defined by its configuration. (This is why only relative paths are used in the sequencer.)

7.5.2. Testing custom sequencers

The sequencing framework was designed to make testing sequencers much easier. In particular, the
StreamSequencer interface does not make use of the JCR API. So instead of requiring a fully-configured
JCR repository and JBoss DNA system, unit tests for a sequencer can focus on testing that the content is
processed correctly and the desired output graph is generated.

Note

For a complete example of a sequencer unit test, see the ImageMetadataSequencerTest unit test
in the org.jboss.dna.sequencer.images package of the dna-sequencers-image project.

The following code fragment shows one way of testing a sequencer, using JUnit 4.4 assertions and
some of the classes made available by JBoss DNA. Of course,
this example code does not do any error handling and does not make all the assertions a real test would.

These are just two simple tests that show ways of testing a sequencer. Some tests may get quite involved,
especially if a lot of output data is produced.

It may also be useful to create some integration tests
that configure JBoss DNA to use a custom sequencer, and to then upload
content using the JCR API, verifying that the custom sequencer did run. However, remember that JBoss DNA
runs sequencers asynchronously in the background, and you must synchronize your tests to ensure that the
sequencers have a chance to run before checking the results.

7.6. Summary

In this chapter, we described how JBoss DNA sequences files as they're uploaded into a repository. We've also learned
in previous chapters about the JBoss DNA execution contexts,
graph model, and connectors.
In the next part we'll put all these pieces together to learn how
to set up a JBoss DNA repository and access it using the JCR API.

Part III. JBoss DNA JCR

The JBoss DNA project provides an implementation of the JCR API, which is
built on top of the core libraries discussed earlier. This implementation
as well as a number of JCR-related components are described in this part of the document.
But before talking about how to use the JCR API with a JBoss DNA repository, first we need to
show how to set up a JBoss DNA engine.

Chapter 8. Configuring and Using JBoss DNA

Using JBoss DNA within your application is actually quite straightforward. As you'll see in this chapter,
the first step is setting up JBoss DNA and starting the JcrEngine. After that, you obtain the
javax.jcr.Repository instance for a named repository and just use the standard JCR API throughout your
application.

8.1. JBoss DNA's JcrEngine

JBoss DNA encapsulates everything necessary to run one or more JCR repositories into a single JcrEngine instance.
This includes all underlying repository sources, the pools of connections to the sources, the sequencers,
the MIME type detector(s), and the Repository implementations.

Obtaining a JcrEngine instance is very easy - assuming that you have a valid JcrConfiguration instance. We'll see
how to get one of those in a little bit, but if you have one then all you have to do is build and start the engine:
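A minimal sketch of what this looks like (assuming DNA's published JcrConfiguration/JcrEngine API; the repository name and timeout are illustrative):

```java
// Illustrative fragment; assumes a fully-defined JcrConfiguration named 'configuration'.
JcrEngine engine = configuration.build();
engine.start();

// Obtain a repository by the name given in the configuration (name is an example):
javax.jcr.Repository repository = engine.getRepository("car repository");

// ... use the standard JCR API ...

// Gracefully shut down and wait (here, up to 5 seconds) for completion:
engine.shutdown();
engine.awaitTermination(5, java.util.concurrent.TimeUnit.SECONDS);
```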

When the shutdown() method is called, the Repository instances managed by the engine are marked as being shut down,
and they will not be able to create new Sessions. However, any existing Sessions or ongoing operations (e.g., event notifications)
present at the time of the shutdown() call will be allowed to finish.
In essence, shutdown() is a graceful request, and since it may take some time to complete,
you can wait until the shutdown has completed by simply calling awaitTermination(...) as shown above.
This method will block until the engine has indeed shut down or until the supplied time duration has passed (whichever comes first).
And, yes, you can call the awaitTermination(...) method repeatedly if needed.

8.2. JcrConfiguration

The previous section assumed the existence of a JcrConfiguration. Creating an instance is actually quite easy:
there's just a single no-argument constructor. What can be a little more challenging,
though, is setting up the JcrConfiguration instance, which must define the following components:

Repository sources are the POJO objects that each describe a particular
location where content is stored. Each repository source object is an instance of a JBoss DNA connector, and is configured
with the properties for that particular source. JBoss DNA's RepositorySource classes are analogous to JDBC's DataSource classes -
they are implemented by specific connectors (aka, "drivers") for specific kinds of repository sources (aka, "databases").
Similarly, a RepositorySource instance is analogous to a DataSource instance, with bean properties for each configurable
parameter. Therefore, each repository source definition must supply the name of the RepositorySource class, any
bean properties, and, optionally, the classpath that should be used to load the class.

Repositories define the JCR repositories that are available. Each
repository has a unique name that is used to obtain the Repository instance from the JcrEngine's getRepository(String)
method, but each repository definition also can include the predefined namespaces (other than those automatically defined by
JBoss DNA), various options, and the node types that are to be available in the repository without explicit registration
through the JCR API.

Sequencers define the particular sequencers that are available for use.
Each sequencer definition provides the path expressions governing which nodes in the repository should be sequenced when those nodes change,
and where the resulting output generated by the sequencer should be placed. The definition also must state the name of
the sequencer class, any bean properties and, optionally, the classpath that should be used to load the class.

MIME type detectors define the particular MIME type detector(s) that should
be made available. A MIME type detector does exactly what the name implies: it attempts to determine the MIME type given a
"filename" and contents. JBoss DNA automatically uses a detector that uses the file extension to identify the MIME type,
but also provides an implementation that uses an external library to identify the MIME type based upon the contents.
The definition must state the name of the detector class, any bean properties and, optionally, the classpath that should
be used to load the class.

There really are three options for defining the configuration:

Load from a file is conceptually the easiest and requires the least amount
of Java code, but it does require a configuration file.

Load from a configuration repository is not much more complicated than loading
from a file, but it does allow multiple JcrEngine instances (usually in different processes perhaps on different machines)
to easily access their (shared) configuration. And technically, loading the configuration from a file really just creates an
InMemoryRepositorySource, imports the configuration file into that source, and then proceeds with this approach.

Programmatic configuration is always possible, even if the configuration is loaded
from a file or repository. Using the JcrConfiguration's API, you can define (or update or remove) all of the definitions that make
up a configuration.

Each of these approaches has its obvious advantages, so the choice of which one to use is entirely up to you.

8.2.1. Loading from a configuration file

Loading the JBoss DNA configuration from a file is actually very simple:
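In code, that is simply (a hedged sketch; `file` stands for any of the accepted parameter types described below):

```java
JcrConfiguration configuration = new JcrConfiguration();
configuration.loadFrom(file);
```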

where the file parameter can actually be a File instance, a URL to the file, an InputStream
containing the contents of the file, or even a String containing the contents of the file.

Note

The loadFrom(...) method can be called any number of times, but each time it is called it completely wipes
out any current notion of the configuration and replaces it with the configuration found in the file.

There is an optional second parameter that defines the Path within the configuration file identifying the parent node of the various
configuration nodes. If not specified, it assumes "/". This makes it possible for the configuration content to be
located at a different location in the hierarchical structure. (This is not often required, but when it is required
this second parameter is very useful.)

Here is the configuration file that is used in the repository example (though it has been simplified a bit and most comments
have been removed for clarity):

<?xml version="1.0" encoding="UTF-8"?>
<configuration xmlns:dna="http://www.jboss.org/dna/1.0" xmlns:jcr="http://www.jcp.org/jcr/1.0">
    <!-- Define the JCR repositories -->
    <dna:repositories>
        <!-- Define a JCR repository that accesses the 'Cars' source directly.
             This of course is optional, since we could access the same content through 'vehicles'. -->
        <dna:repository jcr:name="car repository" dna:source="Cars">
            <dna:options jcr:primaryType="dna:options">
                <jaasLoginConfigName jcr:primaryType="dna:option" dna:value="dna-jcr"/>
            </dna:options>
        </dna:repository>
    </dna:repositories>
    <!-- Define the sources for the content.
         These sources are directly accessible using the DNA-specific Graph API. -->
    <dna:sources jcr:primaryType="nt:unstructured">
        <dna:source jcr:name="Cars" dna:classname="org.jboss.dna.graph.connector.inmemory.InMemoryRepositorySource"
                    dna:retryLimit="3" dna:defaultWorkspaceName="workspace1"/>
        <dna:source jcr:name="Aircraft" dna:classname="org.jboss.dna.graph.connector.inmemory.InMemoryRepositorySource">
            <!-- Define the name of the workspace used by default. Optional, but convenient. -->
            <defaultWorkspaceName>workspace2</defaultWorkspaceName>
        </dna:source>
    </dna:sources>
    <!-- Define the sequencers. This is an optional section.
         For this example, we're not using any sequencers. -->
    <dna:sequencers>
        <!--
        <dna:sequencer jcr:name="Image Sequencer">
            <dna:classname>org.jboss.dna.sequencer.image.ImageMetadataSequencer</dna:classname>
            <dna:description>Image metadata sequencer</dna:description>
            <dna:pathExpression>/foo/source => /foo/target</dna:pathExpression>
            <dna:pathExpression>/bar/source => /bar/target</dna:pathExpression>
        </dna:sequencer>
        -->
    </dna:sequencers>
    <dna:mimeTypeDetectors>
        <dna:mimeTypeDetector jcr:name="Detector" dna:description="Standard extension-based MIME type detector"/>
    </dna:mimeTypeDetectors>
</configuration>

8.2.2. Loading from a configuration repository

Loading the JBoss DNA configuration from an existing repository is also pretty straightforward. Simply create and configure the
RepositorySource instance to point to the desired repository, and then call the loadFrom(RepositorySource source)
method:
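A minimal sketch of this pattern, assuming the configuration content lives in an in-memory source (the source name, workspace name, and package locations are illustrative, following the classes used elsewhere in this guide):

```java
import org.jboss.dna.graph.connector.inmemory.InMemoryRepositorySource;
import org.jboss.dna.jcr.JcrConfiguration;

// Create and configure the source that holds the configuration content ...
InMemoryRepositorySource configSource = new InMemoryRepositorySource();
configSource.setName("Configuration");
configSource.setDefaultWorkspaceName("default");

// ... and load the configuration from it.
JcrConfiguration configuration = new JcrConfiguration();
configuration.loadFrom(configSource);
```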

This really is a more advanced way to define your configuration, so we won't go into how you configure a RepositorySource.
For more information, consult the Getting Started document.

Note

The loadFrom(...) method can be called any number of times, but each time it is called it completely wipes
out any current notion of the configuration and replaces it with the configuration found in the supplied source.

There is an optional second parameter that defines the name of the workspace in the supplied source where the configuration content
can be found. It is not needed if the workspace is the source's default workspace.
There is an optional third parameter that defines the Path within the configuration repository identifying the parent node of the various
configuration nodes. If not specified, it assumes "/". This makes it possible for the configuration content to be
located at a different location in the hierarchical structure. (This is not often required, but when it is required
this third parameter is very useful.)

8.2.3. Programmatic configuration

Defining the configuration programmatically is not terribly complicated, though for obvious reasons it results in more verbose Java code.
But this approach is very useful and often the easiest approach when the configuration must change or is a reflection of other
dynamic information.

The JcrConfiguration class was designed with an easy-to-use API that makes it simple to configure each of the different kinds of
components, especially when using an IDE with code completion. Here are several examples:

8.2.3.1. Repository sources

Each repository source definition must include the name of the RepositorySource class as well as each bean property
that should be set on the object:
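For instance, an in-memory source named "source A" might be defined along these lines (the description text is illustrative, and the setDescription and setProperty method names are assumptions based on the fluent API style used elsewhere in this chapter):

```java
JcrConfiguration config = ...
config.repositorySource("source A")
      .usingClass(InMemoryRepositorySource.class)
      .setDescription("An in-memory source for our content")
      .setProperty("defaultWorkspaceName", "workspace1");
```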

This example defines an in-memory source with the name "source A", a description, and a single "defaultWorkspaceName" bean property.
Different RepositorySource implementations will have different bean properties that are required and optional.
Of course, the class can be specified as a Class reference or as a string (followed by an indication of whether the class should be loaded
from the classpath or from a specific classpath).

Note

Each time repositorySource(String) is called, it will either load the existing definition with the supplied
name or will create a new definition if one does not already exist. To remove a definition, simply call remove()
on the result of repositorySource(String).
The set of existing definitions can be accessed with the repositorySources() method.

8.2.3.2. Repositories

Each repository must be defined to use a named repository source, but all other aspects (e.g., namespaces, node types, options)
are optional.
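A sketch of such a definition (the setSource, addNodeTypes, registerNamespace, and setOption method names are assumptions based on the fluent configuration style; the "dna-jcr" value is the default JAAS realm name mentioned later in this guide):

```java
JcrConfiguration config = ...
config.repository("repository A")
      .setSource("source 1")
      .addNodeTypes("myCustomNodeTypes.cnd")
      .registerNamespace("acme", "http://www.example.com/acme")
      .setOption(Option.JAAS_LOGIN_CONFIG_NAME, "dna-jcr");
```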

This example defines a repository that uses the "source 1" repository source (which could be a federated source, an in-memory source,
a database store, or any other source). Additionally, this example adds the node types in the "myCustomNodeTypes.cnd" file as those
that will be made available when the repository is accessed. It also defines the "http://www.example.com/acme" namespace,
and finally sets the "JAAS_LOGIN_CONFIG_NAME" option to define the name of the JAAS login configuration that should be used by
the JBoss DNA repository.

Note

Each time repository(String) is called, it will either load the existing definition with the supplied
name or will create a new definition if one does not already exist. To remove a definition, simply call remove()
on the result of repository(String).
The set of existing definitions can be accessed with the repositories() method.

8.2.3.3. Sequencers

Each defined sequencer must specify the name of the StreamSequencer implementation class as well as the path expressions
defining which nodes should be sequenced and the output paths defining where the sequencer output should be placed (often as a function
of the input path expression).

JcrConfiguration config = ...
config.sequencer("Image Sequencer")
      .usingClass("org.jboss.dna.sequencer.image.ImageMetadataSequencer")
      .loadedFromClasspath()
      .setDescription("Sequences image files to extract the characteristics of the image")
      .sequencingFrom("//(*.(jpg|jpeg|gif|bmp|pcx|png|iff|ras|pbm|pgm|ppm|psd)[*])/jcr:content[@jcr:data]")
      .andOutputtingTo("/images/$1");

This shows an example of a sequencer definition named "Image Sequencer" that uses the ImageMetadataSequencer class
(loaded from the classpath), that is to sequence the "jcr:data" property on any new or changed nodes that are named
"jcr:content" below a parent node with a name ending in ".jpg", ".jpeg", ".gif", ".bmp", ".pcx", ".png", ".iff",
".ras", ".pbm", ".pgm", ".ppm" or ".psd". The output of the sequencing operation should be placed at the "/images/$1" node,
where the "$1" value is captured as the name of the parent node. (The capture groups work the same way as in regular expressions;
see the Getting Started document for more details.)
Of course, the class can be specified as a Class reference or as a string (followed by an indication of whether the class should be loaded
from the classpath or from a specific classpath).

Note

Each time sequencer(String) is called, it will either load the existing definition with the supplied
name or will create a new definition if one does not already exist. To remove a definition, simply call remove()
on the result of sequencer(String).
The set of existing definitions can be accessed with the sequencers() method.

8.2.3.4. MIME type detectors

Each defined MIME type detector must specify the name of the MimeTypeDetector implementation class as well as any
other bean properties required by the implementation.
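A sketch of such a definition (the ExtensionBasedMimeTypeDetector class name is an assumption; substitute your detector implementation):

```java
config.mimeTypeDetector("Detector")
      .usingClass(ExtensionBasedMimeTypeDetector.class)
      .setDescription("Standard extension-based MIME type detector");
```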

Of course, the class can be specified as a Class reference or as a string (followed by an indication of whether the class should be loaded
from the classpath or from a specific classpath).

Note

Each time mimeTypeDetector(String) is called, it will either load the existing definition with the supplied
name or will create a new definition if one does not already exist. To remove a definition, simply call remove()
on the result of mimeTypeDetector(String).
The set of existing definitions can be accessed with the mimeTypeDetectors() method.

8.3. What's next

This chapter outlined how to configure JBoss DNA, how to then access a javax.jcr.Repository instance,
and how to use the standard JCR API to interact with the repository. The
next chapter talks about using the JCR API with your JBoss DNA repository.

The
Content Repository for Java technology API
provides a standard Java API for working with content repositories. Abbreviated "JCR", this API was developed as part of the
Java Community Process under JSR-170 (JCR 1.0) and is being revised under JSR-283.
JBoss DNA provides a partial JCR 1.0 implementation that allows you to work with the contents of a repository using the
JCR API. For information about how to use the JCR API, please see the JSR-170 specification.

Note

In the interests of brevity, this chapter does not attempt to reproduce the JSR-170 specification nor provide
an exhaustive definition of JBoss DNA JCR capabilities. Rather, this chapter will describe any deviations from the
specification as well as any DNA-specific public APIs and configuration.

Using JBoss DNA within your application is actually quite straightforward. As you'll see in this chapter,
the first step is setting up JBoss DNA and starting the JcrEngine. After that, you obtain the
javax.jcr.Repository instance for a named repository and just use the standard JCR API throughout your
application.

9.1. Obtaining JCR Repositories

Once you've obtained a reference to a JcrEngine as described in
the previous chapter, obtaining a repository is as easy as calling
the getRepository(String) method with the name of the repository that you just configured.
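For example, using the "car repository" name from the configuration file shown in the previous chapter:

```java
// The engine is assumed to have been configured and started as in the previous chapter.
JcrEngine engine = ...
Repository repository = engine.getRepository("car repository");
```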

At this point, your application can proceed by working with the JCR API.

9.2. Creating JCR Sessions

Once you have obtained a reference to the JCR Repository, you can create a JCR session using one of its
login(...) methods. The JSR-170 specification provides four login methods.

The first method allows the implementation to choose its own security context to create a session in the default workspace
for the repository. The JBoss DNA JCR implementation uses the security context from the current AccessControlContext. This implies
that this method will throw a LoginException if it is not executed as a PrivilegedAction. Here is one example of how this might
work:
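A sketch of that pattern, with exception handling omitted ("dna-jcr" is the default JAAS realm name, and the callback handler is hypothetical):

```java
// Authenticate via JAAS, then log in to the repository inside a privileged action.
LoginContext loginContext = new LoginContext("dna-jcr", new MyCallbackHandler()); // hypothetical handler
loginContext.login();

Session session = Subject.doAsPrivileged(loginContext.getSubject(),
    new PrivilegedExceptionAction<Session>() {
        public Session run() throws Exception {
            return repository.login();
        }
    }, AccessController.getContext());
```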

It is also possible to supply the Credentials directly as part of the login process, although JBoss DNA imposes
some requirements on what types of Credentials may be supplied. The simplest way is to provide a SimpleCredentials object.
These credentials will be validated against the JAAS realm named "dna-jcr" unless another realm name is provided as an option
during the JCR repository configuration. For example:
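A sketch of this approach, with illustrative credentials:

```java
// The username and password here are illustrative; they are validated
// against the JAAS realm named "dna-jcr" unless configured otherwise.
Credentials credentials = new SimpleCredentials("jsmith", "secret".toCharArray());
Session session = repository.login(credentials);
```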

If a LoginContext is available for the user, that can be used as part of the credentials to authenticate the user with
JBoss DNA instead. This snippet uses an anonymous class to provide the login context, but any class with a LoginContext getLoginContext()
method can be used as well.
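A sketch of such an anonymous class (how the LoginContext was obtained is not shown):

```java
// The Credentials implementation only needs a public LoginContext getLoginContext() method.
final LoginContext loginContext = ... // authenticated earlier
Credentials credentials = new Credentials() {
    public LoginContext getLoginContext() {
        return loginContext;
    }
};
Session session = repository.login(credentials);
```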

Servlet-based applications may wish to reuse the authentication information from HttpServletRequest instead. Please note that
the example below assumes that the servlet has a security constraint that prevents unauthenticated access.

Once the Session is obtained, the repository content can be accessed and modified like any other JCR repository. No roles are required to connect
to any workspace at this time. Restrictions on workspace connections will likely be added to JBoss DNA in the near future. The roles from the JAAS
information or the HttpServletRequest are used to control read and write access to the repository. Please see the JCR Security section
for more details on how access is controlled.

9.3. JCR Specification Support

The JBoss DNA JCR implementation will not be JCR-compliant prior to the 1.0 release. Additionally, the JCR
specification allows some latitude to implementors for some implementation details. The sections below
clarify JBoss DNA's current and planned behavior.

9.3.1. L1 and L2 Features

JBoss DNA currently supports most of the Level 1 and Level 2 feature set defined by the JSR-170 specification.
Queries, which are part of Level 1, are not implemented. Some of the Level 2 features, such as workspace cloning and updating, corresponding nodes,
and referential integrity for REFERENCE properties, are also not yet implemented. As the current implementation does provide many
of the features that an application may need, we hope that this release will allow you to give us feedback on what we have so far.

9.3.2. Optional Features

JBoss DNA does not currently support any of the optional JCR features. Currently, the observation optional feature is planned to be complete prior
to the 1.0 release. The locking optional feature may be implemented in this timeframe as well.

Note

The JCR-SQL optional feature is not planned to be implemented as it has been dropped from the JSR-283 specification.

9.3.3. JCR Security

Although the JSR-170 specification requires implementation of the Session.checkPermission(String, String) method,
it allows implementors to choose the granularity of their access controls. JBoss DNA supports coarse-grained, role-based access control at the repository
and workspace level.

JBoss DNA currently defines two permissions: READONLY and READWRITE. If the Credentials passed into Session.login(...)
(or the Subject from the AccessControlContext, if one of the no-credential login methods were used) has either role, the session will have
the corresponding access to all workspaces within the repository. That is, having the READONLY role implies that Session.checkPermission(path, "read")
will not throw an AccessDeniedException for any value of path in any workspace in the repository. Similarly, having the READWRITE
role implies that Session.checkPermission(path, actions) will not throw an AccessDeniedException for any values of path and
actions.

Note

In this release, JBoss DNA does not properly check for actions, nor does it verify that the actions parameter passed into
Session.checkPermission(...) is valid. This will be corrected prior to the 1.0 release.

It is also possible to grant access only to one or more named workspaces. For a workspace named "staging", this can be done by assigning a role named
READONLY.staging. Appending "." + workspaceName to the READWRITE role works as well.

As a final note, the JBoss DNA JCR implementation will likely have additional security roles added prior to the 1.0 release. A CONNECT role
is already being used by the DNA REST Server to control whether users have access to the repository through that means.

9.3.4. Built-In Node Types

JBoss DNA supports all of the built-in node types described in the JSR-170 specification. However, several of these node types
(mix:lockable, mix:versionable, nt:version, nt:versionLabels, nt:versionHistory, and nt:frozenNode) are semantically meaningless
as JBoss DNA does not yet support the locking or versioning optional features.

Although JBoss DNA does define some custom node types in the dna namespace, none of these
node types are intended to be used by developers integrating with JBoss DNA and may be changed or removed
at any time.

9.3.5. Custom Node Type Registration

Although the JSR-170 specification does not require support for registration of custom types, JBoss DNA supports this extremely
useful feature. Custom node types can be added at startup, as noted above, or at runtime through a DNA-specific interface. JBoss DNA supports defining node
types either through a JSR-283-like template approach or through the use of Compact Node Definition (CND) files.
Both type registration mechanisms are supported equally within JBoss DNA, although the CND approach for defining node types is recommended.

Note

JBoss DNA also supports defining custom node types to load at startup. This is discussed in more detail
in the next chapter.

Although the JSR-283 specification is not yet final, it does provide a useful means of programmatically defining JCR node types. JBoss DNA supports a comparable
node type definition API that implements the functionality from the specification, albeit with classes in an org.jboss.dna.jcr package. The intent
is to deprecate these classes and replace their usage with the JSR-283 equivalents in a future release, after JBoss DNA fully supports the JSR-283 specification.
Node types can be defined like so:
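A sketch of what such a definition might look like, using JSR-283-style template classes (the "acme:contact" node type and its property are hypothetical, and the exact DNA class and method names in org.jboss.dna.jcr may differ):

```java
// Define a hypothetical "acme:contact" node type with one string property.
NodeTypeTemplate nodeType = nodeTypeManager.createNodeTypeTemplate();
nodeType.setName("acme:contact");
nodeType.setDeclaredSuperTypeNames(new String[] { "nt:unstructured" });

PropertyDefinitionTemplate property = nodeTypeManager.createPropertyDefinitionTemplate();
property.setName("acme:emailAddress");
property.setRequiredType(PropertyType.STRING);
nodeType.getPropertyDefinitionTemplates().add(property);

// Register the new node type (not allowing updates of an existing type).
nodeTypeManager.registerNodeType(nodeType, false);
```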

Residual properties and child node definitions can also be defined simply by not calling setName on
the template.

Custom node types can be defined more succinctly through the Compact Node Definition file format. In fact, this is how JBoss
DNA defines its built-in node types. An example CND file that declares the same node type as above would be:
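A sketch of what such a CND file might look like, declaring a hypothetical "acme:contact" node type with a single string property:

```
<acme = 'http://www.example.com/acme'>
[acme:contact] > nt:unstructured
- acme:emailAddress (string)
```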

Note

JBoss DNA does not yet support a simple means of unregistering types at this time, so be careful before registering types outside of a
sandboxed environment.

9.4. Summary

In this chapter, we covered how to use JCR with JBoss DNA and learned about how it implements the JCR specification.
Now that you know how JBoss DNA repositories work and how to use JCR to work with DNA repositories, we'll move on in
the next chapter to show how you can use the RESTful web service to
provide access to the content in a JCR repository to clients.

Chapter 10. The JBoss DNA RESTful Web Service

JBoss DNA now provides a RESTful interface to its JCR implementation that allows HTTP-based
access and updating of content. Although the initial version of this REST server only supports the JBoss DNA
JCR implementation, it has been designed to make integration with other JCR implementations easy. This
chapter describes how to configure and deploy the REST server.

10.1. Supported Resources and Methods

The REST Server currently supports the URIs and HTTP methods described below. The URI patterns assume
that the REST server is deployed at its conventional location of "/resources"; these patterns would
change accordingly if the REST server were deployed under a different web context. Currently, only
JSON-encoded responses are provided.

Table 10.1. Supported URIs for the JBoss DNA REST Server

URI Pattern                                             | HTTP Method(s) | Description
/resources                                              | GET            | Returns a list of accessible repositories
/resources/{repositoryName}                             | GET            | Returns a list of accessible workspaces within that repository
/resources/{repositoryName}/{workspaceName}             | GET            | Returns a list of available operations within the workspace
/resources/{repositoryName}/{workspaceName}/item/{path} | ALL            | Accesses the item (node or property) at the path

Note that this approach supports dynamic discovery of the available repositories on the server. A typical
conversation might start with a request to the server to check the available repositories.

GET http://www.example.com/resources

This request would generate a response that mapped the names of the available repositories to metadata information
about the repositories like so:
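A sketch of such a response, pretty-printed here for readability (the field names are illustrative, consistent with the description that follows):

```json
{
  "dna%3arepository": {
    "repository": {
      "name": "dna:repository",
      "resources": {
        "workspaces": "/resources/dna%3arepository"
      }
    }
  }
}
```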

The actual response wouldn't be pretty-printed like the example, but the format would be the same. The name
of the repository ("dna:repository" URL-encoded) is mapped to a repository object that contains a name
(the redundant "dna:repository") and a list of available resources within the repository and their respective
URIs. Note that JBoss DNA supports deploying multiple JCR repositories side-by-side on the same server,
so this response could easily contain multiple repositories in a real deployment.

The only thing that you can do with a repository through the REST interface at this time is to
get a list of its workspaces. A request to do so can be built up from the previous response like this:

GET http://www.example.com/resources/dna%3arepository

This request (and all of the following requests) actually creates a JCR Session to service the request and
requires that security be configured. This process is described in more detail in
a later section. Assuming that security has been properly
configured, the response would look something like this:
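A sketch of such a response (field names illustrative; note the "items" URI):

```json
{
  "default": {
    "workspace": {
      "name": "default",
      "resources": {
        "items": "/resources/dna%3arepository/default/items"
      }
    }
  }
}
```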

Like the first response, this response consists of a list of workspace names mapped to metadata about the
workspaces. The example above only lists one workspace for simplicity, but there could be many different
workspaces returned in a real deployment. Note that the "items" resource builds the full URI to the root
of the items hierarchy, including the encoding of the repository name and the workspace name.

Now a request can be built to retrieve the root item of the repository.

GET http://www.example.com/resources/dna%3arepository/default/items

Any other item in the repository could be accessed by appending its path to the URI above. In a default
repository with no content, this would return the following response:
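A sketch of such a response (the property values, UUID, and child name are illustrative):

```json
{
  "properties": {
    "jcr:primaryType": "dna:root",
    "jcr:uuid": "97d7e2ef-996e-4d99-8ec2-dc623e6c2239"
  },
  "children": ["jcr:system"]
}
```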

The response contains a mapping of property names to their values and an array of child names. Had one of
the properties been multi-valued, the values for that property would have been provided as an array as well,
as will shortly be shown.

The items resource also accepts an optional query parameter: dna:depth. This parameter, which defaults
to 1, controls how deep the hierarchy of returned nodes should be. Had the request included the parameter:

GET http://www.example.com/resources/dna%3arepository/default/items?dna:depth=2

Then the response would have contained details for the children of the root node as well.

Adding content simply requires a POST to the URI of the relative root node of the
content that you wish to add, with a request body in the same format as the response from a GET. Adding multiple
nodes at once is supported, as shown below.
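A sketch of such a request body, with illustrative node, property, and value names:

```json
{
  "properties": {
    "jcr:primaryType": "nt:unstructured",
    "jcr:mixinTypes": "mix:referenceable",
    "testProperty": "testValue",
    "multiValuedProperty": ["value1", "value2"]
  },
  "children": {
    "childNode": {
      "properties": {
        "jcr:primaryType": "nt:unstructured"
      }
    }
  }
}
```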

Note that protected properties like jcr:uuid are not provided but that the primary type and mixin types are
provided as properties. The REST server will translate these into the appropriate calls behind the
scenes. The response from the request will be empty by convention.

The PUT method allows for updates of nodes and properties. If the URI points to a property, the body of the
request should be the new JSON-encoded value for the property.

Setting multiple properties at once can be performed by providing a URI to a node instead of a property. The
body of the request should then be a JSON object that maps property names to their new values.

Note

The PUT method doesn't currently support adding or removing mixin types. This will be corrected in the future.
A JIRA issue has been created to help
track this issue.

10.2. Configuring the DNA REST Server

The DNA REST server is deployed as a WAR and configured mostly through its web configuration file (web.xml).
Here is an example web configuration that is used for integration testing of the DNA REST server along with
an explanation of its parts.

<!--
This parameter provides the fully-qualified name of a class that implements
the o.j.d.web.jcr.rest.spi.RepositoryProvider interface. It is required
by the DnaJcrDeployer that controls the lifecycle for the DNA REST server.
-->
<context-param>
<param-name>org.jboss.dna.web.jcr.rest.REPOSITORY_PROVIDER</param-name>
<param-value>org.jboss.dna.web.jcr.rest.spi.DnaJcrRepositoryProvider</param-value>
</context-param>

As noted above, this parameter informs the DnaJcrDeployer of the specific repository provider in use.
Unless you are using the JBoss DNA REST server to connect to a different JCR implementation, this should
never change.

<!--
This parameter, specific to the DnaJcrRepositoryProvider implementation, specifies
the name of the configuration file to initialize the repository or repositories.
This configuration file must be on the classpath and is given as a classpath-relative
path.
-->
<context-param>
<param-name>org.jboss.dna.web.jcr.rest.CONFIG_FILE</param-name>
<param-value>/configRepository.xml</param-value>
</context-param>

If you are not familiar with the file format for a JcrEngine configuration file, you can build one
programmatically with the JcrConfiguration class and call save(...) instead of build()
to output the configuration file that equates to the configuration.

This is followed by a bit of RESTEasy and JAX-RS boilerplate.

<!--
This parameter defines the JAX-RS application class, which is really just a metadata class
that lets the JAX-RS engine (RESTEasy in this case) know which classes implement pieces
of the JAX-RS specification like exception handling and resource serving.
This should not be modified.
-->
<context-param>
<param-name>javax.ws.rs.Application</param-name>
<param-value>org.jboss.dna.web.jcr.rest.JcrApplication</param-value>
</context-param>
<!-- Required parameter for RESTEasy - should not be modified -->
<listener>
<listener-class>org.jboss.resteasy.plugins.server.servlet.ResteasyBootstrap</listener-class>
</listener>
<!-- Required parameter for JBoss DNA REST - should not be modified -->
<listener>
<listener-class>org.jboss.dna.web.jcr.rest.DnaJcrDeployer</listener-class>
</listener>
<!-- Required parameter for RESTEasy - should not be modified -->
<servlet>
<servlet-name>Resteasy</servlet-name>
<servlet-class>org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher</servlet-class>
</servlet>
<!-- Required parameter for JBoss DNA REST - should not be modified -->
<servlet-mapping>
<servlet-name>Resteasy</servlet-name>
<url-pattern>/*</url-pattern>
</servlet-mapping>

In general, this part of the web configuration file should not be modified.

Finally, security must be configured for the REST server.

<!--
The JBoss DNA REST implementation leverages the HTTP credentials for authentication and authorization
within the JCR repository. It makes no sense to try to log into the JCR repository without credentials,
so this constraint helps lock down the repository.
This should generally not be modified.
-->
<security-constraint>
<display-name>DNA REST</display-name>
<web-resource-collection>
<web-resource-name>RestEasy</web-resource-name>
<url-pattern>/*</url-pattern>
</web-resource-collection>
<auth-constraint>
<!--
A user must be assigned this role to connect to any JCR repository, in addition to needing the READONLY
or READWRITE roles to actually read or modify the data. This is not used internally, so another
role could be substituted here.
-->
<role-name>connect</role-name>
</auth-constraint>
</security-constraint>
<!--
Any auth-method will work for JBoss DNA. BASIC is used in this example for simplicity.
-->
<login-config>
<auth-method>BASIC</auth-method>
</login-config>
<!--
This must match the role-name in the auth-constraint above.
-->
<security-role>
<role-name>connect</role-name>
</security-role>
</web-app>

As noted above, the REST server will not function properly unless security is configured. All authorization
methods supported by the Servlet specification are supported by JBoss DNA and can be used interchangeably, as
long as authenticated users have the connect role listed above.

10.3. Deploying the DNA REST Server

Deploying the DNA REST server only requires three steps:
preparing the web configuration, configuring the users and their roles in your web container
(outside the scope of this document), and assembling the WAR. This section describes the requirements
for assembling the WAR.

If you are using Maven to build your projects, the WAR can be built from a POM. Here is a portion of the
POM used to build the JBoss DNA REST Server integration subproject.

If you are using sequencers or any connectors other than the in-memory or federated connector, you will also have
to add the JARs for those dependencies into the WEB-INF/lib directory as well. You will also have to
change the version numbers on the JARs to reflect the current version of JBoss DNA.

This WAR can be deployed into your servlet container.

10.4. Repository Providers

The JBoss DNA REST server can also be used as an interface to other JCR repositories by creating
an implementation of the RepositoryProvider interface that connects to the other repository.

The RepositoryProvider only has a few methods that must be implemented. When the DnaJcrDeployer starts
up, it will dynamically load the RepositoryProvider implementation (as noted above) and call the
startup(ServletContext) method on the provider. The provider can use this method to load any
required configuration parameters from the web configuration (web.xml) and initialize the repository.

As an example, here's the DNA JCR provider implementation of this method with exception handling omitted for brevity.
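A sketch of that implementation (the context-parameter name matches the web.xml example above; the field name and resource-loading details are assumptions, and exception handling is omitted):

```java
public void startup(ServletContext context) {
    // Read the classpath-relative configuration file name from web.xml ...
    String configFile = context.getInitParameter("org.jboss.dna.web.jcr.rest.CONFIG_FILE");

    // ... load it into a configuration, then build and start the engine.
    JcrConfiguration configuration = new JcrConfiguration();
    configuration.loadFrom(getClass().getResourceAsStream(configFile));
    jcrEngine = configuration.build();
    jcrEngine.start();
}
```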

As you can see, the name of the configuration file for the JcrEngine is read from the servlet context and used
to initialize the engine.
Once the repository has been started, it is ready to service the main methods that provide the interface
to the repository.

The first method returns the set of repository names supported by this REST server.

The JBoss DNA JCR repository does support multiple repositories on the same server. Other JCR implementations
that don't support multiple repositories are free to return a singleton set containing any string from this method.

The other required method returns an open JCR Session for the user from the current request in a given repository
and workspace. The provider can use the HttpServletRequest to get the authentication credentials for the
HTTP user.
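A sketch of such a method (the signature, field name, and the credentialsFor(...) helper that derives JCR credentials from the authenticated HTTP user are assumptions):

```java
public Session getSession(HttpServletRequest request,
                          String repositoryName,
                          String workspaceName) throws RepositoryException {
    Repository repository = jcrEngine.getRepository(repositoryName);
    // Derive JCR credentials from the already-authenticated HTTP user.
    Credentials credentials = credentialsFor(request); // hypothetical helper
    return repository.login(credentials, workspaceName);
}
```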

Chapter 11. In-Memory Connector

The in-memory repository connector is a simple connector that creates a transient, in-memory repository.
This repository is used as a very simple in-memory cache or as a standalone transient repository.
This connector works well for a readable and writable repository source with small to moderate-sized
content that need not be permanently saved.

Optional property that, if used, specifies the name in JNDI where an InMemoryRepository instance can be found.
This is an advanced property that is infrequently used.

rootNodeUuid

Optional property that, if used, defines the UUID of the root node in the in-memory repository. If not used,
then a new UUID is generated.

retryLimit

Optional property that, if used, defines the number of times that any single operation on a RepositoryConnection to this source should be retried
following a communication failure. The default value is '0'.

defaultCachePolicy

Optional property that, if used, defines the default for how long the information provided by this source may be
cached by other, higher-level components. The default value of null implies that this source does not define a specific
duration for caching information provided by this repository source.

defaultWorkspaceName

Optional property that is initialized to an empty string and which defines the name for the workspace that will be used by default
if none is specified.

Chapter 12. File System Connector

This connector exposes an area of the local file system as a read-only graph of "nt:file" and "nt:folder" nodes.
The connector considers a workspace name to be the path to the directory on the file system that represents the root of that
workspace. Each connector can define whether new workspaces can be created, but if so, the names of the new workspaces
must represent valid paths to existing directories.

The FileSystemSource class provides a number of JavaBean properties that control its behavior:

Optional property that, if used, specifies the file system path to the existing directory that should be used for the
default workspace. If null (or not specified), the source will use the current working directory of this virtual machine
(as defined by new File(".").getAbsolutePath()).

predefinedWorkspaceNames

Optional property that, if used, defines names of the workspaces that are predefined and need not be created before being used.
This can be coupled with a "false" value for the "creatingWorkspaceAllowed" property to allow the use of only predefined workspaces.

retryLimit

Optional property that, if used, defines the number of times that any single operation on a RepositoryConnection to this source should be retried
following a communication failure. The default value is '0'.

cacheTimeToLiveInMilliseconds

Optional property that, if used, defines the maximum time in milliseconds that any information returned by this connector
is allowed to be cached before being considered invalid. When not used, this source will not define a specific
duration for caching information.

Chapter 13. JDBC Storage (JPA) Connector

This connector stores a graph of any structure or size in a relational database, using a JPA provider on top of a JDBC driver.
Currently this connector relies upon some Hibernate-specific capabilities. The schema of the database is dictated by this
connector and is optimized for storing a graph structure.
(In other words, this connector does not expose as a graph the data in an existing database with an arbitrary schema.)

The JpaSource class provides a number of JavaBean properties that control its behavior:

Determines whether the content in the database can be updated ("true"), or if the content may only be read ("false").
The default value is "true".

rootNodeUuid

Optional property that, if used, defines the UUID of the root node in the in-memory repository. If not used,
then a new UUID is generated.

nameOfDefaultWorkspace

Optional property that is initialized to an empty string and which defines the name for the workspace that will be used by default
if none is specified.

predefinedWorkspaceNames

Optional property that, if used, defines names of the workspaces that are predefined and need not be created before being used.
This can be coupled with a "false" value for the "creatingWorkspaceAllowed" property to allow the use of only predefined workspaces.

Required property that defines the dialect of the database. This must match one of the Hibernate dialect names, and must correspond to the type of driver being used.

dataSourceJndiName

The JNDI name of the JDBC DataSource instance that should be used. If not specified, the other driver properties must be set.

driverClassName

The name of the JDBC driver class.
This is not required if the DataSource is found in JNDI, but is required otherwise.

driverClassloaderName

The name of the class loader or classpath that should be used to load the JDBC driver class.
This is not required if the DataSource is found in JNDI.

url

The URL that should be used when creating JDBC connections using the JDBC driver class.
This is not required if the DataSource is found in JNDI.

username

The username that should be used when creating JDBC connections using the JDBC driver class.
This is not required if the DataSource is found in JNDI.

password

The password that should be used when creating JDBC connections using the JDBC driver class.
This is not required if the DataSource is found in JNDI.

maximumConnectionsInPool

The maximum number of connections that may be in the connection pool.
The default is "5".

minimumConnectionsInPool

The minimum number of connections that will be kept in the connection pool.
The default is "0".

maximumConnectionIdleTimeInSeconds

The maximum number of seconds that a connection may remain idle in the pool before being closed.
The default is "600" seconds (or 10 minutes).

maximumSizeOfStatementCache

The maximum number of statements that should be cached.
Statement caching can be disabled by setting to "0".
The default is "100".

numberOfConnectionsToAcquireAsNeeded

The number of connections that should be added to the pool when there are not enough to be used.
The default is "1".

idleTimeInSecondsBeforeTestingConnections

The number of seconds that a connection may remain idle in the pool before being tested to ensure it is still valid.
The default is "180" seconds (or 3 minutes).

referentialIntegrityEnforced

An advanced boolean property that dictates whether the database's referential integrity constraints should be enforced ("true")
or not ("false"). While referential integrity does help to ensure the consistency of the records, it adds work to update
operations and can impact performance.
The default value is "true".

largeValueSizeInBytes

An advanced integer property that controls the size, in bytes, at which property values are considered to be "large values".
Depending upon the model, large property values may be stored in a centralized area and keyed by a secure hash
of the value. This is a space and performance optimization that stores each unique large value only once.
The default value is "1024" bytes, or 1 kilobyte.

compressData

An advanced boolean property that dictates whether large binary and string values should be stored in a compressed form.
This is enabled by default. Setting this value only affects how new records are stored; records can always be read
regardless of the value of this setting.
The default value is "true".

model

An advanced property that dictates the type of storage schema that is used. Currently, the only supported value is "Basic",
which is also the default.

retryLimit

Optional property that, if used, defines the number of times that any single operation on a RepositoryConnection to this source should be retried
following a communication failure.
The default value is '0'.

cacheTimeToLiveInMilliseconds

Optional property that, if used, defines the maximum time in milliseconds that any information returned by this connector
is allowed to be cached before being considered invalid. When not used, this source will not define a specific
duration for caching information.
The default value is "600000" milliseconds, or 10 minutes.

The JPA connector is used by creating, in the JcrConfiguration, a repository source that uses the JpaSource class.
For example:
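A minimal configuration sketch follows; the fluent method calls and the property values shown are illustrative assumptions, so check them against the JcrConfiguration API documentation:

```java
JcrConfiguration configuration = new JcrConfiguration();
configuration.repositorySource("jpa store")
             .usingClass(JpaSource.class)
             .setDescription("The database store for our content")
             // 'dialect' is required and must match a Hibernate dialect name
             .setProperty("dialect", "org.hibernate.dialect.MySQLDialect")
             // illustrative only; a DataSource in JNDI replaces the driver properties
             .setProperty("dataSourceJndiName", "java:/MyDataSource")
             .setProperty("nameOfDefaultWorkspace", "default");
```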

Of course, setting other more advanced properties would entail calling setProperty(...) for each. Since almost all
of the properties have acceptable default values, however, we don't need to set very many of them.

13.1. Basic Model

This database schema model stores node properties as opaque records and children as transparent records.
Large property values are stored separately.

The set of tables used in this model includes:

Namespaces - the set of namespace URIs used in paths, property names, and property values.

Properties - the properties for each node, stored in a serialized (and optionally compressed) form.

Large values - property values larger than a certain size will be broken out into this table, where they are tracked by
their SHA-1 hash and shared by all properties that have that same value. The values are stored in a binary (and optionally
compressed) form.

Children - the children for each node, where each child is represented by a separate record. This approach makes it
possible to efficiently work with nodes containing large numbers of children, where adding and removing child nodes is largely
independent of the number of children. Working with properties is also completely independent of the number of child
nodes.

ReferenceChanges - the references from one node to another

Subgraph - a working area for efficiently computing the space of a subgraph; see below

Options - the parameters for this store's configuration (common to all models)

This database model contains two tables that are used in an efficient mechanism to find all of the nodes in the subgraph below
a certain node. This process starts by creating a record for the subgraph query, and then proceeds by executing a join to find
all the children of the top-level node, and inserting them into the database (in a working area associated with the subgraph
query). Then, another join finds all the children of those children and inserts them into the same working area. This continues
until the maximum depth has been reached, or until there are no more children (whichever comes first). All of the nodes in the
subgraph are then represented by records in the working area, and can be used to quickly and efficiently work with the subgraph
nodes. When finished, the mechanism deletes the records in the working area associated with the subgraph query.

This subgraph query mechanism is extremely efficient, performing one join/insert statement <i>per level of the subgraph</i>,
and is completely independent of the number of nodes in the subgraph. For example, consider a subgraph of node A, where A has
10 children, and each child contains 10 children, and each grandchild contains 10 children. This subgraph has a total of 1111
nodes (1 root + 10 children + 10*10 grandchildren + 10*10*10 great-grandchildren). Finding the nodes in this subgraph would
normally require 1 query per node (in other words, 1111 queries). But with this subgraph query mechanism, all of the nodes in
the subgraph can be found with 1 insert plus 4 additional join/inserts.

This mechanism has the added benefit that the set of nodes in the subgraph are kept in a working area in the database, meaning
they don't have to be pulled into memory.
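The level-at-a-time behavior can be simulated in plain Java. The sketch below is an in-memory stand-in for the connector's actual SQL (the class and method names are invented for illustration); it counts one "join/insert" pass per level and reproduces the 1111-node arithmetic above:

```java
import java.util.*;

public class SubgraphQuery {

    // Build a uniform in-memory tree: each node down to `depth` levels has `fanout` children.
    static Map<String, List<String>> buildTree(String root, int fanout, int depth) {
        Map<String, List<String>> tree = new HashMap<>();
        addChildren(tree, root, fanout, depth);
        return tree;
    }

    private static void addChildren(Map<String, List<String>> tree, String node, int fanout, int depth) {
        List<String> kids = new ArrayList<>();
        if (depth > 0) {
            for (int i = 0; i < fanout; i++) {
                String kid = node + "/" + i;
                kids.add(kid);
                addChildren(tree, kid, fanout, depth - 1);
            }
        }
        tree.put(node, kids);
    }

    // Simulate the working area: one "join/insert" pass per level,
    // stopping when a pass finds no more children.
    static int[] subgraphQuery(Map<String, List<String>> tree, String root) {
        Set<String> workingArea = new LinkedHashSet<>();
        workingArea.add(root);            // the initial insert of the top-level node
        Set<String> frontier = new LinkedHashSet<>(workingArea);
        int joinPasses = 0;
        while (!frontier.isEmpty()) {
            joinPasses++;                 // one join/insert statement for this level
            Set<String> next = new LinkedHashSet<>();
            for (String node : frontier) {
                next.addAll(tree.getOrDefault(node, Collections.emptyList()));
            }
            workingArea.addAll(next);
            frontier = next;
        }
        return new int[] { workingArea.size(), joinPasses };
    }

    public static void main(String[] args) {
        int[] result = subgraphQuery(buildTree("A", 10, 3), "A");
        System.out.println(result[0] + " nodes found with " + result[1] + " join/insert passes");
    }
}
```

Running this on the example tree (fanout 10, depth 3) finds all 1111 nodes with the initial insert plus 4 join/insert passes, independent of the number of nodes per level.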

Subgraph queries are used to efficiently process a number of different requests, including
DeleteBranchRequest and CopyBranchRequest. Processing each of these kinds of
requests requires knowledge of the subgraph, and most need to know the complete
subgraph.

Chapter 14. Federation Connector

The federated repository source provides a unified repository consisting of information that is dynamically federated from multiple other
RepositorySource instances. This is a very powerful repository source that appears to be a single repository, when in
fact the content is stored and managed in multiple other systems. Each FederatedRepositorySource is typically configured
with the name of another RepositorySource that should be used as the local, unified cache of the federated content.
The FederatedRepositorySource then looks in the configuration repository to determine the various workspaces
and how other sources are projected into each workspace.

14.1. Projections

Each federated repository source provides a unified repository consisting of information that is dynamically federated
from multiple other RepositorySource instances. The connector is configured with a number of projections
that each describe where in the unified repository the federated connector should place the content from another source.
Projections consist of the name of the source containing the content and a number of rules that
define the path mappings, where each rule is defined as a string with this format:

pathInFederatedRepository => pathInSourceRepository

Here, the pathInFederatedRepository is the string representation of the path in the unified
(or federated) repository, and pathInSourceRepository is the string representation of the path of the
actual content in the underlying source. For example:

/ => /

is a trivial rule that states that all of the content in the underlying source should be mapped into the unified
repository such that the locations are the same. Therefore, a node at /a/b/c in the source would
appear in the unified repository at /a/b/c. This is called a mirror projection,
since the unified repository mirrors the underlying source repository.

Another example is an offset projection, which is similar to the mirror projection except that
the federated path includes an offset not found in the source:

/alpha/beta => /

Here, a node at /a/b/c in the source would actually appear in the unified repository at
/alpha/beta/a/b/c. The offset path (/alpha/beta in this example) can have 1 or more segments.
(If there are no segments, then it reduces to a mirror projection.)

Often a rule will map a path in one source into another path in the unified repository:

/alpha/beta => /foo/bar

Here, the content at /foo/bar is projected in the unified repository under /alpha/beta,
meaning that the /foo/bar prefix never even appears in the unified repository. So the node at
/foo/bar/baz/raz would appear in the unified repository at /alpha/beta/baz/raz. Again,
the lengths of the two paths in the rule don't need to match.
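These mappings can be illustrated with a small helper (an invented stand-in, not the connector's actual projection classes) that translates a source path into its location in the unified repository:

```java
public class ProjectionRule {

    // Map a path in the underlying source to its path in the unified (federated)
    // repository, according to a rule of the form "fedPath => sourcePath".
    // Returns null when the source path falls outside the projected branch.
    static String sourceToFederated(String rule, String sourcePath) {
        String[] sides = rule.split("=>");
        String fed = normalize(sides[0].trim());
        String src = normalize(sides[1].trim());
        String srcRoot = src.isEmpty() ? "/" : src;
        boolean inside = sourcePath.equals(srcRoot) || sourcePath.startsWith(src + "/");
        if (!inside) {
            return null; // e.g. /foo/bum under the rule "/alpha/beta => /foo/bar"
        }
        String remainder = sourcePath.equals(srcRoot) ? "" : sourcePath.substring(src.length());
        String result = fed + remainder;
        return result.isEmpty() ? "/" : result;
    }

    // Treat the root path "/" as an empty prefix so concatenation stays clean.
    private static String normalize(String path) {
        return path.equals("/") ? "" : path;
    }

    public static void main(String[] args) {
        System.out.println(sourceToFederated("/ => /", "/a/b/c"));                       // mirror projection
        System.out.println(sourceToFederated("/alpha/beta => /", "/a/b/c"));             // offset projection
        System.out.println(sourceToFederated("/alpha/beta => /foo/bar", "/foo/bar/baz/raz"));
    }
}
```

The three calls in main reproduce the mirror, offset, and remapping examples above, yielding /a/b/c, /alpha/beta/a/b/c, and /alpha/beta/baz/raz respectively.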

14.2. Multiple Projections

Federated repositories that use a single projection are useful, but they aren't as interesting or powerful as
those that use multiple projections. Consider a federated repository that is defined by two projections:

Note how the /foo/bum branch does not even appear in the unified repository, since it is outside of the
branch being projected. Also, the /alpha node doesn't exist in S1 or S2; it's what is called a
placeholder node that exists purely so that the nodes below it have a place to exist.
Placeholders are somewhat special: they allow any structure below them (including other placeholder nodes or real
projected nodes), but they cannot be modified.

Even more interesting are cases that involve more projections. Consider a federated repository that contains
information about different kinds of automobiles, aircraft, and spacecraft, except that the information
about each kind of vehicle exists in a different source (and possibly a different kind of source, such as
a database, or file, or web service).

First, the sources. The "Cars" source contains the following structure:

14.3. Processing flow

This connector executes requests against the federated repository by
projecting them into requests against the underlying sources that are being federated.

One important design feature of the connector framework is that requests can be submitted in a batch, which may be processed more efficiently
than if each request were submitted one at a time.
This connector takes advantage of this by projecting the incoming requests into requests against each source, then
submitting the batch of projected requests to each source, and then transforming the results of the projected requests back
into results for the original requests.

This is accomplished using a three-step process:

Process the incoming requests and for each generate the appropriate request(s) against the sources
(dictated by the workspace's projections). These
"projected requests" are then enqueued for each source.

Submit each batch of projected requests to the appropriate source, in parallel where possible.
Note that the requests are still ordered correctly for each source.

Accumulate the results for the incoming requests by post-processing the projected requests and
transforming the source-specific results back into the federated workspace (again, using the workspace's projections).

This process is a form of the fork-join divide-and-conquer algorithm, which involves splitting a problem into smaller
parts, forking new subtasks to execute each smaller part, joining on the subtasks (waiting until all have finished), and then
composing the results. Technically, Step 2 performs the fork and join operations, but this connector uses RequestProcessor
implementations to do Steps 1 and 3 (called ForkRequestProcessor and JoinRequestProcessor, respectively).

Such fork-join style techniques are well-suited to parallel processing. This connector uses an ExecutorService
to allow these different processors to operate concurrently. This can greatly improve the performance as perceived
by the clients, since many of the operations on the different sources occur at the same time.
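The three steps can be sketched with a plain ExecutorService. This is an illustration of the fork-join pattern only; the request and source handling below are invented stand-ins (here the first path segment plays the role of a projection lookup), not the actual ForkRequestProcessor or JoinRequestProcessor code:

```java
import java.util.*;
import java.util.concurrent.*;

public class FederatedProcessing {

    // Step 1 (fork): project each incoming request onto the source that owns it.
    // The first path segment stands in for a real projection lookup.
    static Map<String, List<String>> projectBySource(List<String> requests) {
        Map<String, List<String>> bySource = new LinkedHashMap<>();
        for (String path : requests) {
            String source = path.split("/")[1];
            bySource.computeIfAbsent(source, k -> new ArrayList<>()).add(path);
        }
        return bySource;
    }

    // Steps 2 and 3: submit each source's batch in parallel, join on the
    // futures, and accumulate the per-source results into one answer.
    static List<String> process(List<String> requests) {
        Map<String, List<String>> bySource = projectBySource(requests);
        ExecutorService executor = Executors.newFixedThreadPool(Math.max(1, bySource.size()));
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (Map.Entry<String, List<String>> entry : bySource.entrySet()) {
                futures.add(executor.submit(() -> {
                    List<String> results = new ArrayList<>();
                    for (String path : entry.getValue()) {
                        results.add("read:" + path); // each source executes its batch in order
                    }
                    return results;
                }));
            }
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                merged.addAll(future.get()); // join: wait for every source to finish
            }
            return merged;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(process(List.of("/cars/sports", "/planes/B-52", "/cars/utility")));
    }
}
```

Note that the batches for different sources run concurrently, while the requests within each source's batch remain in their original order.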

It is also possible that not every incoming Request gets projected to all sources. Indeed, many operations can
effectively be mapped to a single projection. In such cases, the overhead of the federated
connector is quite minimal.

Note

Requests that include the Path within the request's Location can be very quickly mapped to the correct projection,
and thus such federated requests can be processed with very little overhead. However, when requests contain Locations
that only contain identification properties (e.g., UUIDs), the connector may not be able to determine the correct
projection(s), and may have to simply forward the request to all of the projections. This is obviously less desirable,
so when possible ensure that the Request objects include the Path.

14.4. Update operations

The federated connector behavior for read-only requests is fairly obvious. In the best case, the connector determines the
appropriate projections, forwards the request into the appropriate sources, and then combines the results.
But what happens with change requests?

Currently, the federated connector requires that each ChangeRequest be mapped to one and only one projection.
When a single projection cannot be determined for a ChangeRequest, the connector throws an error.

This limitation is thought to be a minor one that will not actually be an issue in most uses of the federated connector.
If you find that your usage does indeed fall into this category,
please let us know via the mailing lists or log
an enhancement request in JIRA. Be sure to include as much detail as possible about the scenario,
the problem condition, and the desired behavior.

14.5. Configuration

The federated repository uses other RepositorySources that are to be federated and a RepositorySource that is to be used as the
cache of the unified contents. These are configured in another RepositorySource that is treated as a configuration repository,
which should contain information about the workspaces and how other sources are projected:

Note

We're using XML to represent a graph structure, since the two map pretty well. Each XML element represents
a node and XML attributes represent properties on a node. The name of the node is defined by either the
jcr:name attribute (if it exists) or the name of the XML element. And we use XML namespaces
to define the namespaces used in the node and property names. Incidentally, this is exactly how the XML graph importer
works.
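As a purely illustrative fragment of these conventions (every element and attribute name below except jcr:name is invented; only the element-as-node, attribute-as-property, and jcr:name conventions follow the description above):

```xml
<config xmlns:jcr="http://www.jcp.org/jcr/1.0"
        xmlns:ex="http://example.com/ns">
  <!-- a node named "workspaces" with a child node named "production" -->
  <ex:workspaces>
    <ex:workspace jcr:name="production" ex:sourceName="Cars"/>
  </ex:workspaces>
</config>
```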

14.6. Repository Source properties

While the majority of the configuration is defined using the configuration source (as discussed above), the FederatedRepositorySource
class has a few JavaBean properties:

retryLimit

Optional property that, if used, defines the number of times that any single operation on a RepositoryConnection to this source should be retried
following a communication failure. The default value is '0'.

Chapter 15. Subversion Connector

This connector provides read-only access to the directories and folders within a Subversion repository, providing that content in
the form of nt:file and nt:folder nodes.
This source considers a workspace name to be the path (relative to the repository's root directory) of the directory
that represents the root of that workspace (e.g., "trunk" or "branches").
New workspaces can be created, as long as the names represent valid existing directories within the SVN repository.

The SVNRepositorySource class provides a number of JavaBean properties that control its behavior:

Required property that should be set with the URL to the Subversion repository.

username

The username that should be used to establish a connection to the repository.

password

The password that should be used to establish a connection to the repository. This is not required if the URL represents an
anonymous SVN repository address.

directoryForDefaultWorkspace

Optional property that, if used, specifies the relative path of the directory in the repository that should be
exposed as the default workspace.

predefinedWorkspaceNames

Optional property that, if used, defines names of the workspaces that are predefined and need not be created before being used.
This can be coupled with a "false" value for the "creatingWorkspaceAllowed" property to allow the use of only predefined workspaces.

retryLimit

Optional property that, if used, defines the number of times that any single operation on a RepositoryConnection to this source should be retried
following a communication failure. The default value is '0'.

cacheTimeToLiveInMilliseconds

Optional property that, if used, defines the maximum time in milliseconds that any information returned by this connector
is allowed to be cached before being considered invalid. When not used, this source will not define a specific
duration for caching information.

Chapter 16. JBoss Cache Connector

The JBoss Cache repository connector allows a JBoss Cache instance to be
used as a JBoss DNA (and thus JCR) repository. This provides a repository that is an effective, scalable, and distributed cache,
and is often paired with other repository sources to provide a local or federated
repository.

The JBossCacheSource class provides a number of JavaBean properties that control its behavior:

Optional property that, if used, specifies the name in JNDI where an existing JBoss Cache Factory instance can be found.
That factory would then be used if needed to create a JBoss Cache instance. If no value is provided, then the
JBoss Cache DefaultCacheFactory class is used.

cacheConfigurationName

Optional property that, if used, specifies the name of the configuration that is supplied to the cache factory
when creating a new JBoss Cache instance.

cacheJndiName

Optional property that, if used, specifies the name in JNDI where an existing JBoss Cache instance can be found.
This should be used if your application already has a cache that is used, or if you need to configure the cache in
a special way.

uuidPropertyName

Optional property that, if used, defines the property that should be used to find the UUID value for each node
in the cache. "dna:uuid" is the default.

retryLimit

Optional property that, if used, defines the number of times that any single operation on a RepositoryConnection to this source should be retried
following a communication failure. The default value is '0'.

defaultCachePolicy

Optional property that, if used, defines the default for how long the information provided by this source may be
cached by other, higher-level components. The default value of null implies that this source does not define a specific
duration for caching information provided by this repository source.

nameOfDefaultWorkspace

Optional property that is initialized to an empty string and which defines the name for the workspace that will be used by default
if none is specified.

predefinedWorkspaceNames

Optional property that defines the names of the workspaces that exist and that are available for use without having to create them.

creatingWorkspacesAllowed

Optional property, 'true' by default, that defines whether clients can create new workspaces.

Chapter 17. JDBC Metadata Connector

This connector is a prototype that provides read-only access to the database schema (metadata) from relational databases through a JDBC
connection.
This is still under development.

Part V. Sequencer Library

The JBoss DNA project provides a number of sequencers out-of-the-box.
These are ready to be used by simply including them in the classpath and configuring
them appropriately.

Chapter 18. Compact Node Type (CND) Sequencer

This sequencer processes JCR Compact Node Definition (CND) files
to extract the node definitions with their property definitions, and inserts these into the repository using JCR built-in types.
The node structure generated by this sequencer is equivalent to the node structure used in /jcr:system/jcr:nodeTypes.
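For reference, here is a small fragment in the CND format that this sequencer consumes; the namespace and node type names are invented for illustration:

```
<ex = 'http://example.com/ns/1.0'>
[ex:blogPost] > nt:unstructured
  - ex:title (string) mandatory
  - ex:postedOn (date)
  + ex:comment (ex:comment) multiple

[ex:comment]
  - ex:text (string)
```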

Chapter 20. ZIP File Sequencer

The ZIP file sequencer is included in JBoss DNA and extracts the files and folders contained in a ZIP archive file,
writing them into the repository using JCR's nt:file and nt:folder
built-in node types. The structure of the output thus matches the logical structure of the contents of the ZIP file.

To use this sequencer, simply include the dna-sequencer-zip JAR
in your application and configure the JcrConfiguration to use this sequencer using something similar to:

Chapter 21. Microsoft Office® Document Sequencer

This sequencer is included in JBoss DNA and processes Microsoft Office documents, including Word documents, Excel spreadsheets,
and PowerPoint presentations. With documents, the sequencer attempts to infer the internal structure from the heading styles.
With presentations, the sequencer extracts the slides, titles, text and slide thumbnails.
With spreadsheets, the sequencer extracts the names of the sheets. And, the sequencer extracts for all the files the
general file information, including the name of the author, title, keywords, subject, comments, and various dates.

To use this sequencer, simply include the dna-sequencer-msoffice JAR and all of the
POI JARs
in your application and configure the JcrConfiguration to use this sequencer using something similar to:

Chapter 22. Java Source File Sequencer

One of the sequencers included in JBoss DNA is the dna-sequencer-java subproject.
This sequencer parses Java source code added to the repository and extracts the basic structure of the classes and enumerations
defined in the code.
This structure includes: the package structures, class declarations, class and member attribute declarations,
class and member method declarations with signature (but not implementation logic), enumerations with each enumeration literal value,
annotations, and JavaDoc information for all of the above.
After extracting this information from the source code, the sequencer then writes this structure into the repository,
where it can be further processed, analyzed, searched, navigated, or referenced.

To use this sequencer, simply include the dna-sequencer-java JAR (plus all of the JARs that it is dependent upon)
in your application and configure the JcrConfiguration to use this sequencer using something similar to:

Chapter 23. Image Sequencer

The ImageMetadataSequencer sequencer extracts metadata from JPEG, GIF, BMP, PCX, PNG, IFF, RAS, PBM, PGM, PPM and PSD image files.
This sequencer extracts the file format, image resolution, number of bits per pixel and optionally number of images, comments
and physical resolution, and then writes this information into the repository using the following structure:

image:metadata node of type image:metadata

jcr:mimeType - optional string property for the mime type of the image

image:numberOfImages - optional integer property for the number of images stored in the file; defaults
to 1

image:physicalWidthDpi - optional integer property for the physical width of the image in dots per inch

image:physicalHeightDpi - optional integer property for the physical height of the image in dots per
inch

image:physicalWidthInches - optional double property for the physical width of the image in inches

image:physicalHeightInches - optional double property for the physical height of the image in inches

This structure could be extended in the future to add EXIF and IPTC metadata as child nodes. For example, EXIF metadata is
structured as tags in directories, where the directories form something like namespaces, and which are used by different camera
vendors to store custom metadata. This structure could be mapped with each directory (e.g. "EXIF" or "Nikon Makernote" or
"IPTC") as the name of a child node, with the EXIF tags values stored as either properties or child nodes.

To use this sequencer, simply include the dna-sequencer-images JAR
in your application and configure the JcrConfiguration to use this sequencer using something similar to:

Chapter 24. MP3 Sequencer

Another sequencer that is included in JBoss DNA is the dna-sequencer-mp3 sequencer project.
This sequencer processes MP3 audio files added to a repository and extracts the ID3
metadata for the file, including the track's title, author, album name, year, and comment.
After extracting this information from the audio files, the sequencer then writes this structure into the repository,
where it can be further processed, analyzed, searched, navigated, or referenced.

To use this sequencer, simply include the dna-sequencer-mp3 JAR and the JAudioTagger
library in your application and configure the JcrConfiguration to use this sequencer using something similar to:

Chapter 25. Aperture MIME type detector

The ApertureMimeTypeDetector class is an implementation of MimeTypeDetector that uses the
Aperture open-source library, which
is a very capable utility for determining the MIME type for a wide range of file types,
using both the file name and the actual content.

To use, simply include the dna-mime-type-detector-aperture.jar file on the classpath
and create a new ExecutionContext subcontext with it:

Deploy the JAR file with your implementation (as well as any dependencies), and make them available to JBoss DNA
in your application.

It's that simple.

The first step is to create the Maven 2 project that you can use to compile your code and build the JARs.
Maven 2 automates a lot of the work, and since you're already set up to use Maven,
using Maven for your project will save you a lot of time and effort. Of course, you don't have to use Maven 2, but then you'll
have to get the required libraries and manage the compiling and building process yourself.

Note

JBoss DNA may provide in the future a Maven archetype for creating detector projects. If you'd find this useful
and would like to help create it, please join the community.

After you've created the project, simply implement the MimeTypeDetector interface. Testing should be
quite straightforward, since MIME type detectors don't require any other components. In your tests,
simply instantiate your MimeTypeDetector implementation, supply various combinations of names and/or InputStreams,
and verify that the output is what you expect.
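As an illustration of how simple such a detector and its tests can be, here is a toy detector keyed on the file extension alone. The static mimeTypeOf(name, content) method is a stand-in modeled loosely on the MimeTypeDetector interface, not its actual signature; a real detector would also sniff the stream contents:

```java
import java.io.InputStream;
import java.util.Map;

public class ExtensionDetector {

    // A toy mapping from file extension to MIME type; real detectors
    // examine the content as well as the name.
    private static final Map<String, String> TYPES = Map.of(
            "txt", "text/plain",
            "xml", "application/xml",
            "mp3", "audio/mpeg");

    public static String mimeTypeOf(String name, InputStream content) {
        int dot = (name == null) ? -1 : name.lastIndexOf('.');
        if (dot < 0) {
            return null; // unknown: a caller could fall back to another detector
        }
        return TYPES.get(name.substring(dot + 1).toLowerCase());
    }

    public static void main(String[] args) {
        System.out.println(mimeTypeOf("song.mp3", null));
    }
}
```

A test then just supplies name/stream combinations and checks the result, returning null for anything the detector does not recognize.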

Chapter 27. Looking to the future

This release of JBoss DNA added a lot of new features and capabilities. It introduced an initial RESTful server that makes
JCR repositories accessible over HTTP to clients. The JCR implementation was enhanced to support more features,
including the ability to define and register node types using the Compact Node Definition (CND) format.
A new configuration system was added, making it very easy to configure and manage the JBoss DNA JCR engine.
An observation framework was added to the graph API. The federation connector was rewritten to improve performance
and correct several issues. And quite a few bugs were fixed.

What's next for JBoss DNA? Passing all of the JCR API compatibility tests for Level 1 and Level 2,
plus some of the optional features, is the primary focus for the next release. Of course, there are a handful of
improvements we'd like to make under the covers, and a few outstanding issues that we'll address.
Farther out on our roadmap are the development of additional connectors and sequencers,
some Eclipse tooling for publishing artifacts to a repository, and quite a few other interesting features.

We're always looking for suggestions and contributors. If you'd like to get involved on JBoss DNA, the first
step is joining the mailing lists or hopping into our chat room
on IRC (at irc.freenode.net#jbossdna). You can also download the code
and get it building, and start looking for simple issues or bugs in our
JIRA issue management system.

But if nothing else, please contact us and let us know how you're using JBoss DNA and what we can do to make it even better.

And, if you haven't already, check out our Getting Started guide, which has examples that you can build and run to see
JBoss DNA in action.