ItemState management

Note: This documentation has been created based on jackrabbit-core-1.6.1.

The JCR API revolves around Nodes and Properties (collectively called Items). Internally, each Item has an ItemState, which may or may not be persistent. Modifications to Items through the JCR Session act on the ItemState; saving the JCR Session persists the touched ItemStates to the persistent store and makes the changes visible to other JCR Sessions. The management of ItemState instances across concurrent sessions is an important responsibility of the Jackrabbit core. The following picture shows some of the relevant components in the management of ItemStates.

Core components and their responsibilities:

A Session implementation is the starting point for interaction with a JCR instance. It is obtained by logging in through the Repository instance, which is not shown in the picture above. A Session reflects the client's view of the repository state and manages the client's transient changes to that state, which can be persisted by calling Session.save(). A SessionImpl instance has a SessionItemStateManager, a HierarchyManager and an ItemManager.
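The visibility rule described above (transient changes become visible to other sessions only on save) can be illustrated with a toy model. This is not Jackrabbit code; all names below are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (not Jackrabbit code) of the visibility rule described above:
// transient changes made through one session only become visible to other
// sessions after save(). All names here are illustrative.
public class SessionModel {
    // shared persistent store: property path -> value
    static final Map<String, String> SHARED = new HashMap<>();

    static class Session {
        private final Map<String, String> transientChanges = new HashMap<>();

        void setProperty(String path, String value) {
            transientChanges.put(path, value); // transient until save()
        }

        String getProperty(String path) {
            // a session sees its own transient state first, then the shared state
            String v = transientChanges.get(path);
            return v != null ? v : SHARED.get(path);
        }

        void save() {
            SHARED.putAll(transientChanges); // persist the touched states
            transientChanges.clear();
        }
    }

    public static void main(String[] args) {
        Session a = new Session();
        Session b = new Session();
        a.setProperty("/A/title", "hello");
        System.out.println(b.getProperty("/A/title")); // null: not yet visible
        a.save();
        System.out.println(b.getProperty("/A/title")); // hello: visible after save
    }
}
```

In the real implementation the "shared store" role is played by the SharedItemStateManager and the transient map by the SessionItemStateManager, as described in the following sections.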

The SessionItemStateManager implements the UpdatableItemStateManager type (which extends the ItemStateManager type). Its main responsibilities are to provide access to persistent items and to manage transient items (i.e., items that have been modified through the session but not yet saved).

A HierarchyManager is responsible for translating paths to internal ItemId instances (a UUID for Nodes, and a UUID plus property name for Properties) and vice versa. It is typically session local, as it must take the session's transient changes into account.
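The shape of this translation can be sketched as follows. The class and method names are ours, not Jackrabbit's actual HierarchyManager API; a PropertyId is modeled here simply as a string.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the translation a HierarchyManager performs: a node
// path resolves to a UUID, a property path to a UUID plus property name.
// The classes and method names are ours, not Jackrabbit's.
public class PathResolver {
    private final Map<String, String> nodeUuidByPath = new HashMap<>();

    void addNode(String path, String uuid) {
        nodeUuidByPath.put(path, uuid);
    }

    /** NodeId: just the UUID of the node at the given path. */
    String resolveNode(String path) {
        return nodeUuidByPath.get(path);
    }

    /** PropertyId, modeled here as "parentUuid:propertyName". */
    String resolveProperty(String path) {
        int slash = path.lastIndexOf('/');
        String parentPath = slash == 0 ? "/" : path.substring(0, slash);
        String name = path.substring(slash + 1);
        String parentUuid = nodeUuidByPath.get(parentPath);
        return parentUuid == null ? null : parentUuid + ":" + name;
    }

    public static void main(String[] args) {
        PathResolver r = new PathResolver();
        r.addNode("/A", "deadbeef-0000");
        System.out.println(r.resolveProperty("/A/title")); // deadbeef-0000:title
    }
}
```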

The SharedItemStateManager manages persistent states and stores states when sessions are saved using a PersistenceManager instance. It also updates the search index, synchronizes with the cluster and triggers the observation mechanism (all on session save).

Miscellaneous concepts:

The ItemData type and subtypes are a thin wrapper around the ItemState type and subtypes. The ItemManager creates new Item instances every time it is asked for an item (e.g., via session.getNode("/A")). These items, however, point to the same NodeData instance which keeps track of whether the item has been modified (via its status field).

The NodeStateMerger type is a helper class whose responsibility it is to merge changes from other Sessions to the current Session. It is used in the SessionItemStateManager and the SharedItemStateManager. In the first case it is used in a callback function (called from the SharedItemStateManager) to merge changes in the persistent state to the transient state of the Session. In the second case it is used just before a ChangeLog is persisted to merge differences between the given ChangeLog and the persistent state.

The locking strategy of the SharedItemStateManager is configurable per workspace through the ISMLocking entry in the repository configuration.

The next sections show collaboration diagrams for various use cases that read and/or modify content.

Use case 1: simple read

The following is a collaboration diagram of what happens w.r.t. ItemState management in the Jackrabbit core when a Session reads a single existing node just after startup (i.e., Jackrabbit caches are empty).

The client of the API has a Session already and asks it to get the node at path "/A".

The ItemManager then creates a new NodeImpl instance based on the ItemData and returns it to the Session, which hands it to the client.

Remarks:

The ItemManager returns a new Item instance for every read call. Two such instances that refer to the same Item, however, share the same ItemData reference. This ensures that changes to the state of one instance are immediately visible through the other.

The LocalItemStateManager creates a copy of the shared (persisted) ItemState managed by the SharedItemStateManager. The local state has the shared state as overlayed state. Similarly, if an Item is modified in the session, then a new transient state is created and managed by the SessionItemStateManager (see next use case). This transient state has the local state as overlayed state.
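The resulting overlay chain can be sketched in a few lines. The field names are illustrative; Jackrabbit's real ItemState type is far richer.

```java
// Minimal sketch of the overlay chain described above: a transient state
// overlays a local state, which overlays the shared (persisted) state.
// Field names are illustrative; Jackrabbit's ItemState is far richer.
public class OverlayChain {
    static class ItemState {
        final ItemState overlayed; // the state this one was copied from
        String data;

        ItemState(String data, ItemState overlayed) {
            this.data = data;
            this.overlayed = overlayed;
        }
    }

    public static void main(String[] args) {
        ItemState shared = new ItemState("persisted", null);         // SharedItemStateManager
        ItemState local = new ItemState(shared.data, shared);        // LocalItemStateManager copy
        ItemState transientState = new ItemState(local.data, local); // SessionItemStateManager copy
        transientState.data = "modified";                            // only the transient copy changes
        System.out.println(transientState.overlayed.overlayed == shared); // true
        System.out.println(shared.data); // persisted
    }
}
```

Only the transient copy is touched by a modification; the shared state stays unchanged until the session is saved.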

Use case 2: transient modification

Modification of a node creates a transient copy of the local ItemState.

Now suppose that another Session also adds a property to the same node. The following picture shows the relations of the various item state managers and the existing item states in the caches of the unsaved sessions.

The SharedItemStateManager has the shared state of the persistent node. It has two listeners: the LocalItemStateManagers of the existing Sessions. Each LocalItemStateManager has a local copy of the shared state (the local state, which has the shared state as overlayed state) and has a SessionItemStateManager as a listener. Both SessionItemStateManagers have a transient copy of their local state (the overlayed state) containing the modification (the addition of a property), and also a transient state for the new property.

Use case 3: save

Consider the situation in which we have just executed use case 2. I.e., the client has added a single property to an existing node. The transient space contains two items: the new PropertyState and the modified NodeState of the Node to which the new property has been added. The following shows what happens when the session is saved. Note that only the modified NodeState instance is taken into account because otherwise the picture becomes even more unreadable. The handling of the new PropertyState however is approximately the same.

During the save of a session information flows from the transient state to the shared state. The ItemImpl (and subtypes) copy transient item state information to the local state. The SharedItemStateManager calls push on the local ChangeLog to push the local changes to the shared states (and thus to the shared ChangeLog).
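The push step can be sketched as follows: each modified local state in the ChangeLog copies its data onto the shared state it overlays. Names are illustrative, not Jackrabbit's actual ChangeLog API.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "push" step described above: each modified local state in
// the ChangeLog copies its data onto the shared state it overlays. Names
// are illustrative, not Jackrabbit's actual ChangeLog API.
public class ChangeLogPush {
    static class State {
        State overlayed; // the shared counterpart
        String data;
    }

    static class ChangeLog {
        final List<State> modified = new ArrayList<>();

        void push() {
            for (State local : modified) {
                local.overlayed.data = local.data; // local -> shared
            }
        }
    }

    public static void main(String[] args) {
        State shared = new State();
        shared.data = "old";
        State local = new State();
        local.overlayed = shared;
        local.data = "new";
        ChangeLog log = new ChangeLog();
        log.modified.add(local);
        log.push();
        System.out.println(shared.data); // new
    }
}
```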

During a save of a session information also flows back from the shared state to the transient and local states in other sessions. This is modeled in the next view.

Consider the situation in which we have just executed use case 2. I.e., the client has added a single property to an existing node. The transient space contains two items: the new PropertyState and the modified NodeState of the Node to which the new property has been added. Now suppose that there is another Session which also has added a property to that existing node. The following shows what happens when the session is saved (focus on callbacks from the SharedItemStateManager to the transient states in the second session).

The API client saves the session.

The update operation on the LocalItemStateManager is called (the local ChangeLog has already been constructed and the local states in that ChangeLog have been disconnected from their shared counterparts).

The states in the local ChangeLog are connected to their shared counterparts and merged (step 4') if they are stale. This is necessary because between step 22 and step 23 another session on another thread may have been saved, changing states in the local ChangeLog (which are disconnected from their shared states). The detection of whether a state is stale is done via a modification counter. If the merge fails, a StaleItemStateException is thrown and the save fails.

When the merge succeeds the shared state is added to the shared ChangeLog, which disconnects the shared state (this effectively does nothing, as the shared state has no overlayed state).
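The modification-counter staleness check used in this step can be sketched as follows. Names are illustrative; the real check lives in ItemState.isStale.

```java
// Sketch of the modCount-based staleness check described above: a local
// state is stale when the shared state it overlays has been touched (its
// modCount bumped) by another session's save. Names are illustrative.
public class StalenessCheck {
    static class State {
        State overlayed; // the shared counterpart
        short modCount;  // revision of the underlying shared state
    }

    static boolean isStale(State local) {
        // stale if the shared state advanced past the revision this copy saw
        return local.overlayed != null && local.modCount != local.overlayed.modCount;
    }

    public static void main(String[] args) {
        State shared = new State();
        State local = new State();
        local.overlayed = shared;
        local.modCount = shared.modCount; // local copy taken
        shared.modCount++;                // another session saved ("touch")
        System.out.println(isStale(local)); // true: a merge must be attempted
    }
}
```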

Staleness detection, merging and synchronization considerations

Since different Sessions can be used concurrently (i.e., from different threads) it is possible that two Sessions modify the same node and save at the same moment. There is essentially a single lock that serializes save calls: the ISMLocking instance in the SharedItemStateManager. This is a Jackrabbit-specific read-write lock which supports write locks for ChangeLog instances and read locks for individual items. The read lock is acquired when the SharedItemStateManager needs to read from its ItemStateReferenceCache (typically when Sessions access items: the LocalItemStateManagers create a local copy of the shared state). The write lock is acquired in two places:

When an API client saves its Session: the write lock is acquired in the update method, which is called on the SharedItemStateManager by the LocalItemStateManager of the Session being saved. It is downgraded to a read lock after the changes have been persisted, just before the other sessions are notified. The read lock is released after that.

When an external update from another cluster node needs to be processed (see next section): the write lock is acquired and released in the externalUpdate method of the SharedItemStateManager. In between, a ChangeLog from another cluster node is processed (i.e., applied to the shared states and pushed to the local states).
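The write-to-read downgrade in the first case can be illustrated with the JDK's ReentrantReadWriteLock. Jackrabbit's ISMLocking is its own abstraction (with ChangeLog-aware write locks), so this sketch only shows the acquire/downgrade/release ordering.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the write-to-read downgrade described above, using the JDK's
// ReentrantReadWriteLock. Jackrabbit's ISMLocking is its own abstraction
// (with ChangeLog-aware write locks), so this only shows the ordering.
public class DowngradeSketch {
    static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock();

    static void update() {
        LOCK.writeLock().lock();       // exclusive: persist the ChangeLog
        try {
            // ... write the changes via the PersistenceManager ...
            LOCK.readLock().lock();    // downgrade: take the read lock ...
        } finally {
            LOCK.writeLock().unlock(); // ... before releasing the write lock
        }
        try {
            // ... notify the other sessions' listeners under the read lock ...
        } finally {
            LOCK.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        update();
        System.out.println(LOCK.isWriteLocked()); // false: fully released
    }
}
```

Taking the read lock before releasing the write lock guarantees that no other writer can slip in between persisting the changes and notifying the listeners.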

Thus, two threads can concurrently build their local ChangeLogs, but they need exclusive access to the SharedItemStateManager to process their ChangeLog. During this processing, the saving thread invokes callbacks on the ItemStateListener interface (LocalItemStateManager and SessionItemStateManager implement that type) in other open sessions (via the persisted call on the shared ChangeLog in step 11). The intention of this listener structure is that open sessions get immediate access to the saved changes. Of course, there can be collisions, which must be detected. Therefore, every ItemState instance has the following fields:

a modCount counter which essentially is a revision number of the underlying shared state (it is updated through the touch method called from the SharedItemStateManager)

a status field which indicates the status of the state (one of UNDEFINED, EXISTING, EXISTING_MODIFIED, EXISTING_REMOVED, NEW, STALE_MODIFIED, STALE_DESTROYED)

The isStale method on ItemState is used to determine whether an ItemState can be saved to the persistent store. The method of staleness detection depends on whether the state is transient: if it is, staleness is determined by checking the status field; if it is not, staleness is determined by comparing the modCount with the modCount of the overlayed state. It is used in the following places:

The stateModified callback method on the SessionItemStateManager uses it to determine whether a merge is attempted. If a merge failed, the status is set to a stale value. In that way, merges are only tried if they have not yet failed for the state. (The ItemImpl.save method uses stale statuses to fail fast.)

The SharedItemStateManager uses the isStale method to try a merge just before saving if a local state has become stale. This can occur because a local ChangeLog has been disconnected from the shared states. If another session then saves, it is possible that states in the local ChangeLog become stale (detected by the modCount).
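The callback behaviour described in the first case can be sketched as follows: a merge is attempted once, and a failed merge marks the state stale so later saves fail fast. All names here are ours, not the real ItemStateListener signature.

```java
// Sketch of the callback behaviour described above: on a shared-state
// change, a merge into the transient copy is attempted once; if it fails,
// the state is marked stale so later saves fail fast. Names are ours.
public class MergeOnCallback {
    enum Status { EXISTING_MODIFIED, STALE_MODIFIED }

    static class TransientState {
        Status status = Status.EXISTING_MODIFIED;
    }

    /** Returns true if the shared changes could be merged into the state. */
    static boolean stateModified(TransientState state, boolean mergeable) {
        if (state.status == Status.STALE_MODIFIED) {
            return false; // a merge already failed; do not try again
        }
        if (!mergeable) {
            state.status = Status.STALE_MODIFIED; // saves will now fail fast
            return false;
        }
        // ... copy the non-conflicting shared changes into the transient state ...
        return true;
    }

    public static void main(String[] args) {
        TransientState s = new TransientState();
        System.out.println(stateModified(s, false)); // false: state is now stale
        System.out.println(s.status);                // STALE_MODIFIED
    }
}
```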

Concurrency issues:

There is a lack of synchronization between client-triggered operations, such as moving or removing a node, and callbacks from other sessions. These callbacks may affect the transient states and disrupt the client-triggered operation (by interleaving).

Clustering (external updates)

In clustered environments, potentially every cluster node may write to the repository. Because the JCR specification requires a certain level of consistency, synchronized access to the local SharedItemStateManager alone is not enough to make things work. The general idea of the clustering mechanism is the following:

When the SharedItemStateManager saves a ChangeLog to the persistent store, it also writes a cluster revision (a representation of the just-persisted ChangeLog) to a place where other cluster nodes can read it. Such a revision has a number, and every cluster node keeps a local revision number indicating which revision entries it has processed.

There is a global revision counter which acts as a cluster-wide lock. Before a SharedItemStateManager can save a ChangeLog it must acquire this repository-wide lock. This happens at the beginning of the update operation with the call eventChannel.updateCreated(this). The locked global revision then becomes the number of the revision that is about to be written. Other cluster nodes now cannot save because they are blocked on the global revision lock.

Just after the SharedItemStateManager (or actually the ClusterNode type) acquires the global revision lock it also reads revisions of other cluster nodes to get completely up-to-date. This might prevent the current ChangeLog from being saved because other cluster nodes may have saved incompatible changes.

At the end of the update operation the global lock is released and other cluster nodes can save.
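The protocol above can be sketched with the shared journal modeled as a counter. None of these names are Jackrabbit's actual ClusterNode API; in reality the lock and the journal live in shared storage (e.g., a database), not in JVM-local objects.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the cluster-wide save protocol described above: acquire the
// global revision lock, catch up on revisions written by other nodes,
// write the own revision, release. The shared journal is modeled as a
// counter; none of these names are Jackrabbit's actual ClusterNode API.
public class ClusterRevisionSketch {
    static final ReentrantLock GLOBAL = new ReentrantLock(); // global revision lock
    static final AtomicLong JOURNAL = new AtomicLong();      // last revision in the journal
    static long localRevision;                               // last revision processed here

    static long save() {
        GLOBAL.lock(); // updateCreated: other cluster nodes block here
        try {
            localRevision = JOURNAL.get();            // catch up before writing
            long written = JOURNAL.incrementAndGet(); // persist ChangeLog + revision
            localRevision = written;
            return written;
        } finally {
            GLOBAL.unlock(); // other cluster nodes can save again
        }
    }
}
```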

In order to keep up to date with the latest revision, the ClusterNode type periodically reads revisions and pushes these via the SharedItemStateManager to any active Sessions.