Clustering at the JVM Level to Maintain Business Logic Integrity : Page 2

The typical three-tier architecture keeps the code Java developers need for clustering inside the business logic, making clustering a real chore. Clustering at the JVM level makes Java applications easier to write and cheaper to run.

by Ari Zilka

Apr 14, 2006


Clustering Tooling Options

Developers have access to several seemingly different tools for clustering an application. With proper integration, a developer can create an environment in which machines can be added to a cluster on demand to increase transaction capacity. And, with multiple machines deployed, transactions can hop, or fail over, between JVMs on different machines, which makes them resilient to the failure of any particular machine.

All clustering tools can be analyzed across the following functional dimensions:

Scalability: Does the tool needlessly use the network or the database and assume that all changes must be replicated? If so, the tool may deliver on the clustering promise, but it will not scale past the point where the network or database becomes a bottleneck.

Availability: Some tools sacrifice absolute availability for scalability. Other solutions assume that the operator will use yet more technology to provide availability, such as clustering a database or a messaging server.

Not serialization-based: Native Java serialization and custom serialization are both designed to create a cloneable, copyable, or otherwise transferable representation of an object tree for later reconstitution. Clustering tools use serialization to copy an object from one machine's memory across the machine boundary to another JVM, but serialization and clustering are like oil and water, even though serialization is the tool that the standard offers for this purpose. Serialization creates copies of objects; it doesn't actually share one object across multiple machines. The issue with copies is that the Java language works the opposite way. If a caller passes an object to a method that then operates on that object, the caller expects the changes to be visible once the callee returns. With serialization, by analogy, the callee would return a copy of the object to the caller, and the caller would then be responsible for dropping its local object reference and picking up the new, modified copy. In the serialization scenario, one object goes in, but two come out. Any tool based on serialization suffers from this "copy-on-change" affliction.

Cooperation: Most options today cannot help with cross-JVM cooperation, but clustered applications would be easier to develop if threads across JVMs could cooperate using the same APIs as threads within a single JVM. The synchronized keyword, wait()/notify(), and join() should all work across the JVM boundary. Otherwise, the clustering tool falls short of clustered shared memory, because two processes cannot cooperate on memory operations and must instead inform each other through a separate signaling mechanism whenever memory needs to change. If cooperative clustering of memory requires a separate signaling mechanism, the business logic is doubly impacted: the clustering tool requires explicit integration with the code, and the separate signaling tool requires still more custom infrastructure logic.
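To make the cooperation dimension concrete, here is a minimal sketch of two threads in a single JVM handing off work with nothing but synchronized, wait()/notifyAll(), and join(). The class and method names are hypothetical and the example does not cluster anything; it only shows the kind of code that, ideally, would keep working unchanged if the two threads lived in different JVMs.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical single-JVM hand-off between two threads. Nothing here is
// clustered; it shows the plain Java APIs the text refers to.
final class CooperationSketch {
    private final Queue<String> orders = new ArrayDeque<>();

    // Producer side: add an order and wake any waiting consumer.
    synchronized void submit(String order) {
        orders.add(order);
        notifyAll();
    }

    // Consumer side: block until an order is available.
    synchronized String take() throws InterruptedException {
        while (orders.isEmpty()) {
            wait();
        }
        return orders.remove();
    }

    // Runs one producer/consumer hand-off and returns what the consumer saw.
    static String runOnce() {
        CooperationSketch shared = new CooperationSketch();
        String[] received = new String[1];

        Thread consumer = new Thread(() -> {
            try {
                received[0] = shared.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        shared.submit("order-42");
        try {
            consumer.join(); // join() also makes the consumer's write visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

With a serialization-based tool, the consumer in another JVM would hold its own copy of the queue, so the notifyAll() call would never reach it and a separate signaling channel would be needed.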

Table 1 illustrates the current state of the art in clustering and how the strengths and weaknesses of each tool lead to a resulting impact on the business logic.

Tool            Scalable   Available   Not Serialization-Based   Cooperation   Impact on Business Logic
JMS             -          X           -                         -             High
Database        -          -           -                         -             High
JGroups         -          X           -                         -             High
Custom API      X          X           -                         -             High
App Server      X          X           -                         -             Medium
Clustered JVM   X          X           X                         X             Low

Table 1. State of the Art Clustering Solutions

Java Message Service (JMS) provides a wrapper on top of classic message-queuing services. Developers often send messages between JVMs on different machines by publishing them on a "clustering" topic, where all instances listen and learn about objects, transactions, and other state through explicit sharing of the data. Developers must integrate JMS into the business logic to share information across JVMs. This integration is commingled with the application, making the application's intent hard to decipher as more and more lines of code become about clustering instead of remaining purely business.

The JMS approach provides high availability by sharing critical information across machines and JVMs. However, it sends all information to all JVMs and will bottleneck on the network long before the business logic taxes the CPU. Hence, JMS delivers availability without scalability, and it has a negative impact (serialization and lack of cooperation) on business logic.
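The following sketch illustrates both complaints about message-based clustering. It does not use the real JMS API: InMemoryTopic is a hypothetical stand-in for a "clustering" topic that only counts the bytes it would deliver, and Cart and CartService are invented for illustration. The point is the shape of addItem(), where one line of business logic is followed by serialization and publishing code, and the fact that traffic grows with the number of listening JVMs.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class BroadcastSketch {
    // Bytes delivered across the cluster after adding three items to one cart.
    static long trafficFor(int otherJvms) {
        InMemoryTopic topic = new InMemoryTopic(otherJvms);
        CartService service = new CartService(topic);
        Cart cart = new Cart();
        for (String sku : new String[] {"sku-1", "sku-2", "sku-3"}) {
            service.addItem(cart, sku);
        }
        return topic.bytesDelivered;
    }

    public static void main(String[] args) {
        System.out.println("bytes delivered to 1 peer:  " + trafficFor(1));
        System.out.println("bytes delivered to 8 peers: " + trafficFor(8));
    }
}

// Hypothetical stand-in for a messaging topic; NOT the JMS API.
final class InMemoryTopic {
    private final int subscriberCount;
    long bytesDelivered = 0;

    InMemoryTopic(int subscriberCount) {
        this.subscriberCount = subscriberCount;
    }

    // Every published message is delivered to every subscribed JVM.
    void publish(byte[] message) {
        bytesDelivered += (long) message.length * subscriberCount;
    }
}

final class Cart implements Serializable {
    private static final long serialVersionUID = 1L;
    final List<String> items = new ArrayList<>();
}

final class CartService {
    private final InMemoryTopic topic;

    CartService(InMemoryTopic topic) {
        this.topic = topic;
    }

    void addItem(Cart cart, String sku) {
        cart.items.add(sku); // the only line of business logic

        // Everything below is clustering plumbing inside the business method.
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(cart); // the whole cart is copied, not just the change
            }
            topic.publish(buffer.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch the delivered byte count scales linearly with the number of peers, which is the network bottleneck described above in miniature.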

Databases can store serialized Java objects under a unique ID, which is usually a session ID. This scheme can be used to store cached data, with session state as the prime example, without an object-relational (OR) mapper. The database acts as a central data hub for all objects and ensures transactional updates. This leads to a stable application, but both capacity and availability are bound by the database. So, it delivers neither predictable capacity nor high availability (without clustering the database, of course).

The database approach can provide high availability, but only when the data is stored in a database server that is itself highly available. Even then, it sends all information to the database and will bottleneck on the DB server long before the business logic taxes the CPU. Like JMS, a database delivers at best availability without scalability, and it has a negative impact (serialization and lack of cooperation) on business logic.

JGroups, according to its own site, "is a toolkit for reliable multicast communication." (Note that this doesn't necessarily mean IP Multicast; JGroups can also use transports such as TCP.)

It can be used to create groups of processes whose members can send messages to each other. The main features include the following:

Group creation and deletion. Group members can be spread across LANs or WANs.

Joining and leaving of groups

Membership detection and notification about joined/left/crashed members

As its description suggests, JGroups would be used to cluster applications in much the same way as JMS. Objects would be serialized in any one JVM and sent as a message to all other JVMs. Because of the similarities, you can guess what JGroups' clustering [dis]advantages are.

Custom API solutions are most easily characterized as a "shared bucket" of data or clustered shared memory. They may not use Java native serialization, but they still copy data between the JVM's natural heap and the bucket in order to move data across machines. These solutions impact the application in all the same ways as JMS, databases, or JGroups do. The main difference is that these custom tools are built to be scalable and designed to have no single point of failure (i.e., any one machine loss does not constitute a loss of data). So, while custom solutions impact business logic as much as other solutions, they can deliver good operating characteristics.

Application servers such as BEA's WebLogic leverage the notion of sticky load balancers and share objects between two machines, regardless of the size of the cluster. This is generically referred to as the buddy system. Most vendors are now using custom solutions or JGroups to implement the buddy system architecture and are starting to provide capacity and availability as long as a load balancer can be used. Again, buddy systems are a viable option, but they still impact the business logic.
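As a rough illustration of the buddy system, the sketch below picks a primary node and one buddy for each session. The selection rule is an assumption made for this example (hash the session ID to find the primary, then use the next node in the ring as the buddy); the article does not say how any particular application server chooses buddies, and the class and method names are hypothetical.

```java
final class BuddySketch {
    // Assumed rule: the sticky load balancer routes a session to a node
    // chosen by hashing the session ID.
    static int primaryFor(String sessionId, int clusterSize) {
        if (clusterSize < 2) {
            throw new IllegalArgumentException("a buddy needs at least two nodes");
        }
        return Math.floorMod(sessionId.hashCode(), clusterSize);
    }

    // Assumed rule: the next node in the ring holds the backup copy.
    static int buddyFor(String sessionId, int clusterSize) {
        return (primaryFor(sessionId, clusterSize) + 1) % clusterSize;
    }

    public static void main(String[] args) {
        int clusterSize = 8;
        for (String session : new String[] {"session-a", "session-b", "session-c"}) {
            System.out.println(session
                    + " -> primary node " + primaryFor(session, clusterSize)
                    + ", buddy node " + buddyFor(session, clusterSize));
        }
    }
}
```

Whatever the cluster size, each session lives on exactly two nodes, which is why replication traffic does not grow with the cluster. The session object is still copied between those two nodes, so the serialization impact on the business logic remains.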

No matter which of these solutions you use for clustering, you have to change the business logic to address the impact of serialization: multiple copies of objects floating around on many machines. And, while this impact may be acceptable for a single, small application, most businesses have front-office, back-office, and partner-integration applications, each of which is usually running on Java.
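The "multiple copies" problem can be reproduced in a few lines. In this sketch, depositLocal() is an ordinary Java call, while depositRemote() simulates a call that crosses a serialization boundary by round-tripping the argument through Java serialization in memory. The Account class and both method names are hypothetical and exist only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class CopyOnChangeSketch {
    // Plain Java call: the callee mutates the caller's own object.
    static void depositLocal(Account account, int amount) {
        account.balance += amount;
    }

    // Simulated remote call: the callee only ever sees a deserialized copy.
    static Account depositRemote(Account account, int amount) {
        Account copy = roundTrip(account);
        copy.balance += amount;
        return copy;
    }

    // Serializes and deserializes in memory, as a clustering tool would across JVMs.
    static Account roundTrip(Account original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Account) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Account account = new Account(100);

        depositLocal(account, 50);
        System.out.println("after local call:   " + account.balance);

        Account returned = depositRemote(account, 50);
        System.out.println("caller's object:    " + account.balance);
        System.out.println("returned copy:      " + returned.balance);
        System.out.println("same object?        " + (returned == account));
    }
}

final class Account implements Serializable {
    private static final long serialVersionUID = 1L;
    int balance;

    Account(int balance) {
        this.balance = balance;
    }
}
```

After the simulated remote call the caller's object is unchanged and a second, updated object exists. The business logic has to notice this and swap references by hand, and every caller holding the old reference has to do the same.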

The Bottom Line

Moving objects between machines and JVMs has nothing to do with the core of any business and, more important to developers, has nothing to do with the core business logic you set out to deliver with Java. In other words, clustering is clustering, whether it is done with JMS or JGroups, and whether the developer is clustering an e-commerce Web site session or a financial trade.