JBoss Cache as a POJO Cache

In-memory caching is a crucial feature in today's large-scale enterprise applications, where scalability and high performance are required. An in-memory cache can store either application state information (e.g., an HttpSession in a web application) or database query results (i.e., entity data). Since many enterprise applications run in clustered environments, the cache needs to be replicated across the cluster. Furthermore, if greater reliability is needed, the in-memory cache should also be persisted to the hard disk or a database.

Most existing in-memory caching solutions fall into the category of what we call a "plain" cache system, in which the direct object references are stored and cached. Since a plain cache deals with the object references directly, it acts like an elaborate HashMap and thus is very intuitive to use. When an object needs to be replicated or persisted in a plain cache system, the object has to implement the Serializable interface. However, a plain cache also has some known limitations relating to replication or persistency:

The user will have to manage the cache specifically. For instance, when an object is updated, a user will need to execute a corresponding API to update the cache content.

The need for Java object serialization could hamper performance. If the object size is huge, even a single field update would trigger serialization of the whole object and replication across the cluster. Thus, it can be unnecessarily expensive.

Java object serialization cannot preserve the relationships between cached objects. In particular, a cached object cannot be referenced by more than one other object (multiply referenced) or hold an indirect reference to itself (cyclic); otherwise, the relationship is broken upon serialization. Figure 1 illustrates this problem during replication: if two Person instances share the same Address object, replication splits it into two separate Address instances instead of one.
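The shared-reference problem can be reproduced with nothing but the JDK. The following is a minimal sketch, assuming each cache entry is serialized on its own (as when two Person objects are stored under separate keys); the Person and Address classes and the roundTrip() helper are illustrative names, not part of any cache API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SharedReferenceDemo {
    static class Address implements Serializable {
        private static final long serialVersionUID = 1L;
        String city;
        Address(String city) { this.city = city; }
    }

    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final Address address;
        Person(String name, Address address) { this.name = name; this.address = address; }
    }

    /** Serializes and deserializes one object in its own stream, like one cache entry. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Address home = new Address("Tokyo");
        Person joe = new Person("Joe", home);
        Person mary = new Person("Mary", home);
        System.out.println(joe.address == mary.address);         // prints true

        // Each person travels in a separate stream, so each copy gets its own Address.
        Person joeCopy = roundTrip(joe);
        Person maryCopy = roundTrip(mary);
        System.out.println(joeCopy.address == maryCopy.address); // prints false

        // An update through one copy is no longer visible through the other.
        joeCopy.address.city = "Yokohama";
        System.out.println(maryCopy.address.city);               // prints Tokyo
    }
}
```

Within a single ObjectOutputStream the shared reference would survive; the break appears because the two entries are serialized independently.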

To address these problems with plain cache systems, a new category of cache system has emerged: the POJO (plain old Java object) cache. A POJO cache is a system that acts as an "object-oriented" and distributed cache. In this system, once a user attaches the POJO to the cache, the caching aspect (e.g., replication and persistence) should be transparent to the user. A user would simply operate on the POJO without worrying about updating the cache content or maintaining the object relationship. There is no explicit API called to manage the cache. In addition, it has three more characteristics:

There is no need to implement the Serializable interface for POJOs.

Replication (or even persistence) is done on a per-field basis (as opposed to the whole object binary level in plain cache), resulting in a potential boost to performance.

The object relationship and identity are preserved automatically in a distributed replicated environment. This keeps usage transparent and can improve performance, since shared objects are not duplicated.
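To give a feel for why per-field replication can be cheaper, the sketch below compares the serialized size of a whole object with the size of a single changed field value. It is an illustration only: SensorRecord and serializedSize() are hypothetical names, and the numbers say nothing about the wire format JBoss Cache actually uses.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class FieldLevelSizeDemo {
    /** Hypothetical object with one small field and one large payload. */
    static class SensorRecord implements Serializable {
        private static final long serialVersionUID = 1L;
        int state;
        byte[] history = new byte[64 * 1024];
    }

    /** Returns the number of bytes Java serialization produces for the value. */
    static int serializedSize(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            return bytes.size();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SensorRecord record = new SensorRecord();
        record.state = 1;

        // Plain cache: a one-field update still ships the whole object.
        System.out.println("whole object: " + serializedSize(record) + " bytes");

        // Per-field approach: only the changed value needs to travel.
        System.out.println("state field:  " + serializedSize(Integer.valueOf(record.state)) + " bytes");
    }
}
```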

A leading in-memory POJO cache solution is the open source JBoss Cache. JBoss Cache is the first Java library that supports replicated, persistent, transactional, and fine-grained caching. It can be used both as a POJO cache and a plain cache. Since JBoss Cache is 100 percent Java-based, it runs in any Java SE environment, including inside an application server or as a standalone process. JBoss Cache has been used inside the JBoss application server for EJB 3.0 stateful session bean clustering and HTTP session replication, for example.

In this article, we demonstrate how JBoss Cache can be used as a POJO cache (through its JBossCacheAop component). A use case will also be given to illustrate key features in a distributed setting.

Brief JBoss Cache Overview

Plain Cache

The default plain cache module in JBoss Cache is called TreeCache. You can configure it either programmatically or through an external XML file. Here are the features that you can configure:

Cache mode: It can be either local or replicated. If it is replicated, you can further specify synchronous or asynchronous modes.

TransactionManager: You can specify a JTA-compliant transaction manager for JBoss Cache to look up. If it finds an ongoing transaction context, it will participate in that transaction and perform commit or rollback accordingly.

Pluggable eviction policy: The cache eviction policy refers to the algorithm the cache uses to expire its contents. In JBoss Cache, you can implement your own eviction policy via a pluggable interface. JBoss Cache currently ships with a region-based LRUEvictionPolicy.

Pluggable CacheLoader: CacheLoader allows you to load persisted cache contents back into the memory. JBoss Cache currently supports file loaders and SleepyCat- and JDBC-based loaders.

Overflowing: Combined with a cache loader and an eviction policy, this provides the passivation/activation feature seen in EJB. Whenever an item is evicted, it will be passivated so it is always persistent.
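The interplay of eviction, cache loading, and overflow can be sketched in plain Java. The class below is a conceptual model only, not JBoss Cache code: OverflowCache is a hypothetical name, a LinkedHashMap in access order plays the role of an LRU eviction policy, and a HashMap stands in for a file or JDBC cache loader.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class OverflowCacheSketch {
    /** Hypothetical LRU cache that passivates evicted entries to a backing store. */
    static class OverflowCache<K, V> {
        // Stands in for a persistent cache loader (file, database, ...).
        private final Map<K, V> store = new HashMap<>();
        private final LinkedHashMap<K, V> memory;

        OverflowCache(final int maxEntries) {
            // accessOrder = true keeps the least recently used entry first.
            this.memory = new LinkedHashMap<K, V>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                    if (size() > maxEntries) {
                        store.put(eldest.getKey(), eldest.getValue()); // passivate
                        return true;                                   // evict from memory
                    }
                    return false;
                }
            };
        }

        void put(K key, V value) {
            store.remove(key);      // drop any stale passivated copy
            memory.put(key, value);
        }

        V get(K key) {
            V value = memory.get(key);
            if (value == null && store.containsKey(key)) {
                value = store.remove(key); // activate: load back from the store
                memory.put(key, value);    // may in turn passivate another entry
            }
            return value;
        }

        boolean inMemory(K key) { return memory.containsKey(key); }
        boolean inStore(K key) { return store.containsKey(key); }
    }

    public static void main(String[] args) {
        OverflowCache<String, String> cache = new OverflowCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // "a" is now the most recently used entry
        cache.put("c", "3"); // exceeds capacity, so "b" is passivated
        System.out.println(cache.inStore("b")); // prints true
        System.out.println(cache.get("b"));     // prints 2 (activated again)
    }
}
```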

POJO Cache

The POJO cache module in JBoss Cache is called TreeCacheAop. In order to use the POJO cache, you have to "prepare" the objects (this process is also known as object instrumentation) before they are cached. This is needed so that the system can intercept the POJO operations. The object instrumentation process is performed by the JBoss AOP library. JBoss AOP allows you to specify the to-be-instrumented classes via an XML file or annotations. Currently, we support only JDK-1.4-style annotation (a specific feature of JBoss AOP). JDK 5.0 annotation support is coming in the next release and it will make the instrumentation process nearly transparent!

TreeCacheAop is a subclass of TreeCache, so it uses the same XML file for configuration and provides the same caching functionality as its superclass counterpart. The JBoss POJO Cache also has a POJO-based eviction policy.
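The idea of intercepting POJO operations can be illustrated with a JDK dynamic proxy. This is an analogy only: JBoss AOP instruments the classes themselves rather than wrapping them in proxies, and the StateItem interface, StateItemImpl class, and intercept() helper below are hypothetical names chosen for the sketch.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

class InterceptionSketch {
    /** Hypothetical POJO contract with a single integer state field. */
    interface StateItem {
        int getState();
        void setState(int state);
    }

    static class StateItemImpl implements StateItem {
        private int state;
        public int getState() { return state; }
        public void setState(int state) { this.state = state; }
    }

    /** Wraps the target so that each setter call is forwarded and then recorded. */
    static StateItem intercept(StateItem target, List<String> changeLog) {
        InvocationHandler handler = (proxy, method, args) -> {
            Object result = method.invoke(target, args);
            // Only setters change state; record "Property=value" for each one.
            if (method.getName().startsWith("set") && args != null && args.length == 1) {
                changeLog.add(method.getName().substring(3) + "=" + args[0]);
            }
            return result;
        };
        return (StateItem) Proxy.newProxyInstance(
                StateItem.class.getClassLoader(), new Class<?>[] {StateItem.class}, handler);
    }

    public static void main(String[] args) {
        List<String> changeLog = new ArrayList<>();
        StateItem item = intercept(new StateItemImpl(), changeLog);
        item.setState(3);  // intercepted: a field-level change is recorded
        item.getState();   // reads are forwarded but not recorded
        System.out.println(changeLog); // prints [State=3]
    }
}
```

A cache built on this kind of interception could ship each recorded change to other nodes instead of re-serializing the whole object.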

A Use Case Walkthrough

In the rest of the article, we will use a "sensor network supervising system" example to illustrate the capability of JBoss POJO Cache in providing instantaneous fine-grained state replication and automatic preservation of state object relationship.

Problem Description

In the different stations along a high-speed railway, there are thousands of sensors that need to be monitored and supervised. Examples of such instruments include temperature, wind, and rain sensors that are critical to the successful operation of a high-speed train. If a particular sensor is malfunctioning, the manager computer in the supervising system should alert the administrator and shut down the unit and/or schedule it for maintenance.

Since the operation is mission-critical, the supervising system has to be highly available, meaning that whenever one sensor manager computer in the system goes down, the administrator should be able to seamlessly switch over to another one to perform supervision and active management. Hence, all manager computers must replicate the same sensor network state information in real time. Note that the characteristics of this kind of system--i.e., the requirement of high availability and the existence of thousands (or even larger numbers) of elements--are also commonplace elsewhere in modern-day network management. Figure 2 gives an overview of such a system, including its clustering capability.

Figure 2. Overview of the sensor supervising system

Because of the hierarchical nature of the sensor network, a complicated domain object graph will typically be required to model the sensor network on the manager side. When the domain object states are not replicated (or persisted), management of object relationships (e.g., adding nodes and traversing the graph nodes) is provided by the JVM itself, and thus is transparent to the end user. However, because the Java serialization process does not recognize the object relationship, this seemingly simplistic object-graph relationship will break down when the states are either replicated or persisted. As a result, it renders a simple failover of the manager side components difficult to achieve.

Traditionally, to provide complete failover (or persistence) capability, the system has to be designed to manage the object relationships explicitly, as in a modern object-relational mapping (ORM) solution or a traditional entity persistence layer, which adds complexity to the design. By contrast, implementing the manager-side components on a POJO cache provides the following:

Optional state persistency that can preserve the object graph, as well.

Fine-grained replication (or persistency) that can be batched for optimized network traffic.

As we described above, the replication and/or persistence aspects in the POJO cache are totally transparent to the user. Note that for total clustering, there is another aspect of load balancing or locating the primary and/or secondary managers to which both the client GUI (for administrators) and the sensors should connect. We do not cover such issues here, and will focus only on the replication of the sensor network state across the managers.

Topology and Object Modeling

Figure 3 is the topology for our sensor supervising system example. Basically, in this simplified example, we have two stations (Tokyo and Yokohama) and within each station we will have one wind and one rain sensor. For each sensor, there are numerous components that need to be supervised; e.g., the power supply and the sensor unit itself. Our goal is to supervise the sensors efficiently by 1) monitoring the individual item status (e.g., a StateItem) as well as their overall status (e.g., Wind Summary), and 2) having the ability to bring up and down individual sensors at runtime without restarting the clustered manager nodes.

Figure 3. Topology for the sensor supervising system

Figure 4 illustrates this domain model using a class diagram. Essentially, we have a PropagationManager at the top level that handles the whole sensor network. It always has one root Node. The root Node can have, recursively, numerous Nodes. Each Node can have multiple StateItems. A StateItem is the smallest management unit; for example, representing the unit's power supply.

Figure 4. Class diagram for the sensor supervising system

For the current topology in Figure 3, Japan represents the root node, while Tokyo and Yokohama are the station nodes, respectively. The different sensors (rain or wind) are represented by a Node object as well. Each sensor then has two StateItems, namely Power Supply and Sensor Unit.

The relationship between Nodes is bidirectional; we can navigate from the root node all the way to a leaf node, or we can navigate backwards through the parent node as well. In addition, there is one Wind Summary and one Rain Summary StateItem instance, each of which references the respective sensors. The purpose of the summary items is to monitor the overall health of, say, the wind and rain sensor units as a group. As a result, the sensor objects in the object graph are multiply referenced (as shown in Figures 2 and 3).
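The object graph described above can be sketched in plain Java as follows. The field and method names (addChild(), addItem(), pathToRoot(), SummaryItem, and so on) are assumptions made for illustration and may differ from the classes in the actual example; only the shape of the graph follows the text.

```java
import java.util.ArrayList;
import java.util.List;

class SensorTopologySketch {
    /** Smallest management unit, e.g. a sensor's power supply. */
    static class StateItem {
        final String name;
        int state;
        StateItem(String name) { this.name = name; }
    }

    /** A node knows both its children and its parent (bidirectional). */
    static class Node {
        final String name;
        Node parent;
        final List<Node> children = new ArrayList<>();
        final List<StateItem> items = new ArrayList<>();
        Node(String name) { this.name = name; }

        Node addChild(String childName) {
            Node child = new Node(childName);
            child.parent = this;
            children.add(child);
            return child;
        }

        StateItem addItem(String itemName) {
            StateItem item = new StateItem(itemName);
            items.add(item);
            return item;
        }
    }

    /** Hypothetical summary holding extra references to sensors owned by stations. */
    static class SummaryItem {
        final String name;
        final List<Node> sensors = new ArrayList<>();
        SummaryItem(String name) { this.name = name; }
    }

    static Node addSensor(Node station, String sensorName) {
        Node sensor = station.addChild(sensorName);
        sensor.addItem("Power Supply");
        sensor.addItem("Sensor Unit");
        return sensor;
    }

    /** Builds Japan -> {Tokyo, Yokohama} -> {Wind Sensor, Rain Sensor}. */
    static Node buildJapan(SummaryItem wind, SummaryItem rain) {
        Node japan = new Node("Japan");
        for (String station : new String[] {"Tokyo", "Yokohama"}) {
            Node stationNode = japan.addChild(station);
            wind.sensors.add(addSensor(stationNode, "Wind Sensor")); // second reference
            rain.sensors.add(addSensor(stationNode, "Rain Sensor")); // second reference
        }
        return japan;
    }

    /** Walks the parent links from a node back up to the root. */
    static String pathToRoot(Node node) {
        String path = node.name;
        for (Node n = node.parent; n != null; n = n.parent) {
            path = n.name + "/" + path;
        }
        return path;
    }

    public static void main(String[] args) {
        SummaryItem wind = new SummaryItem("Wind Summary");
        SummaryItem rain = new SummaryItem("Rain Summary");
        Node japan = buildJapan(wind, rain);
        Node tokyoWind = japan.children.get(0).children.get(0);
        System.out.println(pathToRoot(tokyoWind));            // prints Japan/Tokyo/Wind Sensor
        System.out.println(wind.sensors.get(0) == tokyoWind); // prints true
    }
}
```

Each sensor Node is reachable both through its station and through a summary, which is exactly the kind of shared reference that plain serialization would duplicate.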

Configuration

Before we can use the POJO cache functionality, we will need to use JBoss AOP tools to instrument the POJOs (namely, the PropagationManager, Node, and StateItem classes). We can do this via either XML declaration or annotation. Here we illustrate the POJO instrumentation via the JBoss AOP JDK 1.4 annotation (JDK 5.0 annotation will be supported in the next JBoss Cache release). Below are the code snippets that declare the annotation on the three main interfaces.

Note the annotation inside the JavaDoc--@@org.jboss.cache.aop.InstanceOfAopMarker is a JBoss POJO Cache annotation that essentially declares that all instances of this interface will be instrumented, so there is no need to annotate individual classes. If you want to annotate a specific class (without propagating to its subclass), you can also use the @@org.jboss.cache.aop.AopMarker annotation.
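As a sketch of what such a declaration looks like, the listing below places the marker in the JavaDoc of one interface and, for comparison, places the class-level marker on a standalone class. The method signatures, the BasicStateItem class, and the StandaloneItem class are assumptions made for illustration; only the two marker names come from the text above. To a regular Java compiler the markers are ordinary comments, so the listing compiles without JBoss AOP on the classpath.

```java
/**
 * Hypothetical version of the StateItem interface.
 * All implementations of this interface are meant to be instrumented.
 *
 * @@org.jboss.cache.aop.InstanceOfAopMarker
 */
interface StateItem {
    int getState();
    void setState(int state);
}

/** Picks up instrumentation through the marked interface; needs no marker itself. */
class BasicStateItem implements StateItem {
    private int state;
    public int getState() { return state; }
    public void setState(int state) { this.state = state; }
}

/**
 * Hypothetical class marked on its own, without propagation to subclasses.
 *
 * @@org.jboss.cache.aop.AopMarker
 */
class StandaloneItem {
    private String label;
    public String getLabel() { return label; }
    public void setLabel(String label) { this.label = label; }
}
```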

After the interfaces are annotated, we then use a JDK-1.4-style JBoss AOP annotation precompiler, annoc, and an AOP precompiler, aopc, to perform compile-time instrumentation. Once these steps are done, the instrumentation process is complete and we are ready to run the example.

Code Snippet

Below is a code snippet that instantiates a PropagationManager instance and sets the appropriate relationship between the station and the sensor nodes. Finally, we use the TreeCacheAop API putObject() to put the POJO (in this case, the PropagationManager instance) under cache management. After that, any POJO operation will be fine-grain replicated; e.g., a setState() operation will only replicate the corresponding state field (which is an integer).

The two cache instances are configured through XML injection and started thereafter. They can be run on two separate JVMs (and different physical machines).

A PropagationManager instance is put into cache management via a putObject() API call. After that, only pure POJO operations are needed. For example, we can add additional nodes to the PropagationManager instance and it will replicate accordingly.

We retrieve a fresh instance of PropagationManager on cache #2 via a getObject() call.

Various setState() calls trigger only fine-grained replication to the other cache instance.

Result Output

Finally, when the example is run, the resulting output will be printed as follows:

Please note the bold text lines. Basically, we do a POJO operation setState first on manager #1 and then print out the propagation tree from the second manager to verify that the state has been updated, and vice versa. It is worthwhile to repeat that, although not shown in the output for the replication layer traffic, each setState() operation will only trigger a fine-grained field-level replication. In addition, if the call is under a transaction context, then the update will be batched; i.e., replicated only when it is ready to commit. Finally, notice that we have the capability to add a new sensor into the network on the fly. A traditional system would have required some sort of restart mechanism.
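The batching behavior under a transaction can be modeled with a few lines of plain Java. This is a conceptual sketch, not JBoss Cache or JTA code: the Replicator class and its begin(), fieldChanged(), commit(), and rollback() methods are hypothetical, and the sent list stands in for messages going out on the network.

```java
import java.util.ArrayList;
import java.util.List;

class BatchingSketch {
    /** Hypothetical replicator that defers field updates while a transaction is open. */
    static class Replicator {
        final List<String> sent = new ArrayList<>(); // one entry per network message
        private List<String> pending;                // non-null while a transaction is open

        void begin() { pending = new ArrayList<>(); }

        void fieldChanged(String field, Object value) {
            String update = field + "=" + value;
            if (pending != null) {
                pending.add(update); // hold back until commit
            } else {
                sent.add(update);    // no transaction: replicate immediately
            }
        }

        void commit() {
            if (pending == null) {
                throw new IllegalStateException("no transaction in progress");
            }
            if (!pending.isEmpty()) {
                sent.add(String.join(",", pending)); // one message for the whole batch
            }
            pending = null;
        }

        void rollback() { pending = null; } // discard updates; nothing is replicated
    }

    public static void main(String[] args) {
        Replicator replicator = new Replicator();
        replicator.begin();
        replicator.fieldChanged("state", 2);
        replicator.fieldChanged("name", "wind");
        System.out.println(replicator.sent); // prints []
        replicator.commit();
        System.out.println(replicator.sent); // prints [state=2,name=wind]
    }
}
```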

If you are interested in running this example yourself, you can download JBoss Cache release 1.2.4. Look in the directory examples/aop/sensor, which contains the full source and instructions for running it.

Conclusion

In this article, we have demonstrated how JBoss Cache can act as a POJO cache through its TreeCacheAop component. The POJO cache functionality provides seamless failover for POJOs by performing fine-grained replication while preserving object-graph relationships.

Acknowledgment

The author would like to acknowledge Mr. Yusuke Komori of SMG Co. from Japan for kindly contributing the use case in this article.