Designing J2EE Applications for Real-Life Clustered Environments

One of the most important requirements fulfilled by the J2EE platform is its
clustering capability that enables scalability and high availability. The importance
of high availability in today's increasingly inter-connected world can be gauged
by one incident: a single 22-hour service outage of eBay in June 1999 caused an
interruption of around 2.3 million auctions, and sparked a strong reaction on
Wall Street with a 9.2 percent drop in eBay's stock value.

In an application server cluster,
all of the dynamics of a distributed computing environment come into play, and any
piece of code or design that assumes a non-distributed environment breaks down.
Common examples of what works on a standalone application server but fails on
a cluster include static objects that store information required by all connected
users, external files that the application server cannot replicate transparently,
and application-level caching that leads to out-of-sync caches on different nodes.

So the big question for architects and developers of J2EE applications is: "Is there nothing we can do to ensure that the J2EE application being developed and tested on a standalone server will run on a cluster without any code changes?" The answer is that you can write cluster-ready J2EE applications; however, you need to be aware of the differences between standalone and clustered environments and ensure that you do not make any assumption about the single-process nature of an application. In this article, the authors draw from their practical experience to list and discuss some critical considerations when building J2EE applications so that they can be deployed in a clustered environment without requiring any code changes.

Static Variables for Application State

When an application needs to share a state among many objects, a state that needs to have a single instance, the most popular solution is to store the state in a static variable. Many Java applications do it, and so do many J2EE applications -- and that's where the problem is. This approach works absolutely fine on a single server, but fails miserably in a cluster. Each node in the cluster would maintain its own copy of the static variable, thereby creating as many different values for the state as the number of nodes in the cluster.

One place where we saw this situation in action was in an online dating application. This application worked absolutely fine on a single server, but started showing erroneous data when deployed on a cluster. As you would expect, there were four categories of users (men seeking women, women seeking men, etc.). Whenever a user logged in, the user could see how many people were logged in to each category. The application maintained this state in a static Hashtable, which worked on a standalone server. However, in a cluster, each node had its own copy of the Hashtable, and a user's login would update only the copy residing on the node that served the user's request.

In this case, since it wasn't necessary for the information to be "accurate-to-the-minute," a quick solution was provided by migrating the state to the database and making each node lazily update its state. In other cases, a major change in application design might be unavoidable.
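The fix described above can be sketched roughly as follows. This is only an illustration under stated assumptions: CountStore is a hypothetical stand-in for the shared database table, InMemoryCountStore exists only so the sketch runs without a database, and the class names and the refresh interval are not taken from the original application.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the shared database table that holds the counts. */
interface CountStore {
    void increment(String category);
    Map<String, Integer> readAll();
}

/** In-memory store, used only so the sketch runs without a database. */
class InMemoryCountStore implements CountStore {
    private final Map<String, Integer> rows = new ConcurrentHashMap<>();

    public void increment(String category) {
        rows.merge(category, 1, Integer::sum);
    }

    public Map<String, Integer> readAll() {
        return new HashMap<>(rows);
    }
}

/** One instance per node: writes go to the shared store, reads use a local snapshot. */
class LoginCounts {
    private final CountStore store;
    private final long maxAgeMillis;
    private Map<String, Integer> snapshot = new HashMap<>();
    private long loadedAt;
    private boolean loaded;

    LoginCounts(CountStore store, long maxAgeMillis) {
        this.store = store;
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Every login is recorded in the shared store, never in node-local static state. */
    void userLoggedIn(String category) {
        store.increment(category);
    }

    /** Lazily reloads the snapshot once it is older than maxAgeMillis. */
    synchronized int count(String category, long nowMillis) {
        if (!loaded || nowMillis - loadedAt >= maxAgeMillis) {
            snapshot = store.readAll();
            loadedAt = nowMillis;
            loaded = true;
        }
        return snapshot.getOrDefault(category, 0);
    }
}
```

Two LoginCounts instances sharing one store behave like two nodes: a login recorded through one becomes visible to the other once its snapshot expires. The clock value is passed in explicitly only to keep the sketch deterministic; a real node would use System.currentTimeMillis().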

Serializable Objects

Because most application servers support replication of Stateful Session Beans (SFSBs) and HttpSession, replicable runtime state should be kept in these two places. Since the container needs to serialize this state in order to send it to another node, some common serialization rules must be followed. Failing to follow them might not cause problems on a standalone server, but will break instantly on a cluster.

Mark non-serializable SFSB variables as transient or set them to null in ejbPassivate(). The container is guaranteed to call ejbPassivate() before it serializes the state to another node.
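The effect of marking a field transient can be seen with plain Java serialization, which is used here only to imitate what a container might do when it moves state to another node. This is a sketch: CartState merely mimics the fields of an SFSB, PriceLookup is a hypothetical non-serializable helper, and no EJB API is involved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that is NOT serializable (think of a connection wrapper). */
class PriceLookup {
    int priceOf(String item) {
        return item.length() * 10;
    }
}

/** Conversational state shaped like the fields of an SFSB. */
class CartState implements Serializable {
    private static final long serialVersionUID = 1L;

    final List<String> items = new ArrayList<>();

    // transient: skipped during serialization, so it arrives as null on the other node
    transient PriceLookup lookup = new PriceLookup();

    /** Re-creates the helper on demand after the state has been deserialized. */
    PriceLookup lookup() {
        if (lookup == null) {
            lookup = new PriceLookup();
        }
        return lookup;
    }

    int total() {
        int sum = 0;
        for (String item : items) {
            sum += lookup().priceOf(item);
        }
        return sum;
    }
}

class TransientFieldDemo {
    /** Serializes and deserializes the state, imitating a copy to another node. */
    static CartState roundTrip(CartState state) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(state);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (CartState) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Without the transient keyword, writeObject() would fail with a NotSerializableException because PriceLookup does not implement Serializable. In a real SFSB, the same field could instead be set to null in ejbPassivate().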

EJB Remotes may not be serializable, but an EJB Handle is. If an EJB Remote object is directly or indirectly referenced by an SFSB or an HttpSession attribute, replace the EJB Remote reference with an EJB Handle when the state is serialized, and recreate the remote reference from the handle in the other JVM.
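One way to apply this handle swap is through the writeObject()/readObject() hooks of Java serialization. The sketch below is an assumption-laden illustration rather than container code: to stay runnable without an application server, OrderRemote and OrderHandle are local stand-ins for a remote interface extending javax.ejb.EJBObject and for javax.ejb.Handle. Only the method names getHandle() and getEJBObject() mirror the real API; everything else is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Stand-in for javax.ejb.Handle, which is Serializable. */
interface OrderHandle extends Serializable {
    OrderRemote getEJBObject();
}

/** Stand-in for an EJB remote reference; deliberately not Serializable. */
interface OrderRemote {
    OrderHandle getHandle();
    String describe();
}

class FakeOrderRemote implements OrderRemote {
    private final String id;
    FakeOrderRemote(String id) { this.id = id; }
    public OrderHandle getHandle() { return new FakeOrderHandle(id); }
    public String describe() { return "order:" + id; }
}

class FakeOrderHandle implements OrderHandle {
    private static final long serialVersionUID = 1L;
    private final String id;
    FakeOrderHandle(String id) { this.id = id; }
    public OrderRemote getEJBObject() { return new FakeOrderRemote(id); }
}

/** State object that holds a remote reference but serializes only its handle. */
class OrderHolder implements Serializable {
    private static final long serialVersionUID = 1L;

    private transient OrderRemote order; // skipped by default serialization

    OrderHolder(OrderRemote order) { this.order = order; }

    OrderRemote order() { return order; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        // Write the serializable handle in place of the remote reference.
        out.writeObject(order == null ? null : order.getHandle());
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        // Recreate the remote reference from the handle in the receiving JVM.
        OrderHandle handle = (OrderHandle) in.readObject();
        order = (handle == null) ? null : handle.getEJBObject();
    }
}

class HandleSwapDemo {
    static OrderHolder roundTrip(OrderHolder holder) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(holder);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (OrderHolder) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With the real javax.ejb types, both getHandle() and getEJBObject() can throw RemoteException, which the hooks would need to handle or wrap in an IOException.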

For HttpSession attributes, a safer approach is to apply the same handle swap through HttpSessionActivationListener, because containers are not restricted to the native JVM serialization mechanism when serializing HttpSessions and their attributes. Each HttpSession attribute that holds an EJB remote reference should implement HttpSessionActivationListener. In sessionWillPassivate(), create a handle from the EJB remote reference and then set the reference to null. In sessionDidActivate(), obtain the remote reference from the handle and then set the handle to null.
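The listener-based variant might look like the sketch below. It is self-contained on purpose, so every servlet and EJB type is replaced by a local stand-in: ActivationCallbacks stands in for javax.servlet.http.HttpSessionActivationListener (whose real callbacks take an HttpSessionEvent argument, omitted here), and AccountRemote and AccountHandle stand in for an EJB remote reference and javax.ejb.Handle. All other names are hypothetical.

```java
import java.io.Serializable;

/** Stand-in for HttpSessionActivationListener; the real methods take an HttpSessionEvent. */
interface ActivationCallbacks {
    void sessionWillPassivate();
    void sessionDidActivate();
}

/** Stand-in for javax.ejb.Handle. */
interface AccountHandle extends Serializable {
    AccountRemote getEJBObject();
}

/** Stand-in for an EJB remote reference; not Serializable. */
interface AccountRemote {
    AccountHandle getHandle();
    String owner();
}

class FakeAccountRemote implements AccountRemote {
    private final String owner;
    FakeAccountRemote(String owner) { this.owner = owner; }
    public AccountHandle getHandle() { return new FakeAccountHandle(owner); }
    public String owner() { return owner; }
}

class FakeAccountHandle implements AccountHandle {
    private static final long serialVersionUID = 1L;
    private final String owner;
    FakeAccountHandle(String owner) { this.owner = owner; }
    public AccountRemote getEJBObject() { return new FakeAccountRemote(owner); }
}

/** Session attribute that swaps its remote reference for a handle around passivation. */
class AccountAttribute implements Serializable, ActivationCallbacks {
    private static final long serialVersionUID = 1L;

    private AccountRemote remote; // null while the session is passivated
    private AccountHandle handle; // null while the session is active

    AccountAttribute(AccountRemote remote) { this.remote = remote; }

    /** Called by the container before the session is moved or stored. */
    public void sessionWillPassivate() {
        if (remote != null) {
            handle = remote.getHandle();
            remote = null;
        }
    }

    /** Called by the container after the session arrives on the new node. */
    public void sessionDidActivate() {
        if (handle != null) {
            remote = handle.getEJBObject();
            handle = null;
        }
    }

    AccountRemote remote() { return remote; }

    boolean holdsHandle() { return handle != null; }
}
```

Because the swap happens in the callbacks rather than in serialization hooks, it does not depend on which serialization mechanism the container uses.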

Use of External Files

The EJB specification forbids enterprise beans from using the java.io package to access the file system, as file system APIs are not well suited for business components to access data. Many applications still use external files for various purposes, including application configuration. Though this doesn't really cause any practical problems on a standalone server, clusters won't tolerate this defiance of the specification. The problem is that the application server has no way of replicating these files to other nodes.

The solution is to use the database in place of external files, if possible. One could also choose to go for Entity Beans, if they fit the requirement. Another solution is to build a cluster-specific solution using JMS. In this case, a notification is broadcast to all nodes through a topic to update their external files with newly changed state.

Note that if the external files are read-only, there is no problem at all. But again, can you be sure that writing to these files won't be required in the future?

Session Storage

The J2EE specification strongly recommends storing session state in SFSBs, but many applications use HttpSession for storing the session state. It's easier to code that way, and keeps the system simpler. HttpSessions also become the most dangerous spot for breakage when migrating the application to clustered environments. To minimize breakage, follow these guidelines when using HttpSessions:

ServletContext is not serializable. Care must be taken to ensure that ServletContext is not stored as an attribute in HttpSession, directly or indirectly. If that cannot be avoided, mark it as transient.

Though ServletContext has getAttribute() and setAttribute() methods, they are not meant to store replicable application state.

The application server has to do a lot of work to replicate the state across various nodes. After all, it does take a significant number of CPU cycles to broadcast a message, receive it at the other end, and update the state there. Sending these objects to other nodes also takes a toll on the network. Therefore, keep the replicable state minimal. Heavy state objects might not affect the "portability" of the application, but they surely can affect its "usability" in a cluster.

If the application server uses in-memory replication instead of DB persistence, do not use complex objects in an HttpSession. If objects stored under session keys are complex and depend upon each other, copies will be made on the backup machine. For an example, refer to the situation illustrated in Figure 1. Assume Object a holds a reference to Object b, and b holds a reference to c. If both a and b are stored as HttpSession attributes, there will be two copies of c on the backup node, one referenced indirectly by a and another referenced by b. The solution is to use simple objects. If it is unavoidable to have a few complex objects, then whenever the state of any dependent object changes, call setAttribute() on all objects referencing this object directly or indirectly. Though multiple copies cannot be avoided, doing so will ensure that top-level objects are not referring to a stale copy of the referred object.

Figure 1. Multiple copies of attributes with in-memory replication
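The duplication shown in Figure 1 can be reproduced with plain Java serialization, assuming the container serializes each session attribute independently. That assumption, and the class names NodeA, NodeB, and NodeC for objects a, b, and c, are illustrative only; real containers may replicate differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class NodeC implements Serializable {
    private static final long serialVersionUID = 1L;
    int value;
}

class NodeB implements Serializable {
    private static final long serialVersionUID = 1L;
    NodeC c;
}

class NodeA implements Serializable {
    private static final long serialVersionUID = 1L;
    NodeB b;
}

class ReplicationCopyDemo {
    /** Copies one attribute on its own, the way a per-attribute replicator would. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T replicate(T attribute) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(attribute);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        NodeC c = new NodeC();
        c.value = 1;
        NodeB b = new NodeB();
        b.c = c;
        NodeA a = new NodeA();
        a.b = b;

        // Both attributes are replicated separately: the backup holds two copies of c.
        NodeA backupA = replicate(a);
        NodeB backupB = replicate(b);
        System.out.println("same c on backup: " + (backupA.b.c == backupB.c));

        // c changes, but only b is re-set: the copy reachable from a is now stale.
        c.value = 2;
        backupB = replicate(b);
        System.out.println("a sees " + backupA.b.c.value + ", b sees " + backupB.c.value);
    }
}
```

Replicating a again after the change, which corresponds to calling setAttribute() on every attribute that reaches c, brings both backup copies up to date, although there are still two of them.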

Application Cache

J2EE has significant provisions for caching to improve performance, and all enterprise-class application servers provide extra degrees of caching to enable faster applications. Beyond what the J2EE specification and application servers provide, there are circumstances where there might be a need to implement application-level caching to gain still faster response times. These application caches are typically designed for a standalone environment. In a cluster, each node will end up maintaining its own copy of the cache, which will eventually run out of sync with the others.

If the gains from caching at the application level are significant enough to call for the pain, you may want to use some design patterns to keep the caches in sync. One such design pattern is the Active Clustered Expiry Cache Pattern. This pattern uses JMS-based lazy synchronization to keep read-mostly data (mutable data that is read very frequently but modified rarely) in sync. Note that if the data is not read-mostly, the advantages of caching will be significantly less compared to the cost of syncing up.
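The general idea of keeping per-node caches in sync through expiry notifications can be sketched as follows. This is not the published implementation of the pattern named above. InvalidationTopic is an in-process stand-in for a JMS topic, the loader function stands in for a database read, and all names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;
import java.util.function.Function;

/** In-process stand-in for a JMS topic: every subscriber receives every published key. */
class InvalidationTopic {
    private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<String> subscriber) {
        subscribers.add(subscriber);
    }

    void publish(String key) {
        for (Consumer<String> subscriber : subscribers) {
            subscriber.accept(key);
        }
    }
}

/** One cache per node: entries are expired on notification and reloaded lazily. */
class NodeCache {
    private final Map<String, String> entries = new ConcurrentHashMap<>();
    private final Function<String, String> loader;
    private final InvalidationTopic topic;

    NodeCache(Function<String, String> loader, InvalidationTopic topic) {
        this.loader = loader;
        this.topic = topic;
        // Drop the local entry whenever any node announces a change.
        topic.subscribe(key -> entries.remove(key));
    }

    /** Serves from the local cache, loading from the backing store on a miss. */
    String get(String key) {
        return entries.computeIfAbsent(key, loader);
    }

    /** Call after the backing data for this key has been updated. */
    void notifyChanged(String key) {
        topic.publish(key);
    }
}
```

Only the key travels across the cluster, not the new value, so each node reloads the data the next time it is actually read. Until the notification arrives, a node may still serve the old value, which is why this approach suits read-mostly data.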

Advanced J2EE Features

Some of the advanced J2EE features and their support are so common now that the only reason we still call them "advanced" is that they are not much talked about in the J2EE specification. For example, almost all application servers support various concurrency options, though the degree of support and interpretation varies a little among implementations. Despite the differences, it is usually possible to tweak the transaction attributes, transaction isolation levels in the database/application server, and the concurrency options to achieve portability across the application servers in a standalone mode. In a clustered mode, though, the differences are much more pronounced. It is advisable to avoid dependence on such features unless you have studied the support for them by all application servers in your choice list.

The behavior of JMS clusters is another source of confusion in J2EE application design. Each JMS vendor interprets the need and scope of clustering to its own convenience. For example, while one JMS vendor considers its implementation "clustered" because it provides high availability through a cold standby, another vendor might provide a truly distributed solution with support for hot fail-over for both topics and queues, and for both sender and receiver clients. Many variants lie in between the two extremes. Following these guidelines will help avoid surprises when going live:

It is safe to have JMS senders within EJBs and servlets if there are no distributed transactions, or if the support for distributed transactions has been evaluated against target vendors.

JMS Receivers should be either standalone Java clients (running outside of the application server) or MDBs.

Avoid topics (durable subscribers, in particular) as much as possible, as there is a wider difference in interpretations with respect to clustering support for topics.

Again, care must be taken when using design patterns. Not all design patterns scale directly to a clustered environment. As an example, take the Sequence Blocks Pattern for Primary Key Generation, discussed in the book EJB Design Patterns by Floyd Marinescu. If used in a clustered environment, the developer must be ready to handle a TransactionRolledBackLocalException, which could be thrown if more than one Sequence Entity Bean hits the database at the same time for the next block of Primary Keys. (Please refer to the EJB Design Patterns book in the References section for more details about this pattern.)

Another "design pattern" we have seen is the use of local interfaces in JSP/servlets. Typically, both the web container and the EJB container reside in the same JVM in standalone J2EE servers. It is, therefore, safe to have a JSP/servlet look up the local interface of an EJB. The performance improvement achieved by avoiding the RMI calls is definitely tempting, but the application will break immediately if it is moved to a cluster with separate nodes for the web container and the EJB container. Web components must always be designed assuming that the EJB layer is yet another tier.

Conclusion

The challenges in clustering a J2EE application come from various quarters:
gaps left in the J2EE specification, differing interpretations by various
application server implementations, and the very nature of distributed computing.
As we discussed above, most of these issues can be handled if you understand
that the silent expectation in the J2EE specification -- that a cluster behaves like
a standalone server to clients -- is not always met in real life.
Some inherent limitations of distributed architectures will definitely manifest themselves
in all implementations; some might surface depending upon the implementation
approach of a specific J2EE application server. A basic understanding of how clustering
works in general and some specific insights into the implementation approach of
your application servers of choice can significantly reduce reworking efforts at a later
point.