As the J2EE platform has matured, it has
opened up the opportunity to deploy commodity servers in networked
cluster configurations for scaling of Web services and Web applications
at the Web tier. These commodity servers, interconnected through
commodity LAN hardware, can provide cost-effective clustering solutions.
The last piece of the clustering puzzle is in the software. In this
series, Sing Li examines three open source software substrates that can
enable high-impact Web tier clustering, beginning with
JavaGroups.

On the Internet, the popularity of J2EE-based Web applications and Web
services has pushed the requirement to handle thousands (or more) users
simultaneously to the forefront. It is no longer a "future deliverable"
luxury in many commercial deployments, but a necessity. In this
competitive business environment, an online shop that hangs or crashes
when there are too many shoppers simply will not survive. While scalable
solutions are widely available for the transaction tier of the J2EE model
(for databases, transaction monitors, message queues, and so on),
solutions for scaling Web applications or services at the Web tier are
still emerging. In this series, we'll take a look at several software
technologies that can be applied to scale applications at the Web tier.
Each technology takes a different approach and resolves a slightly
different set of issues. In this first article, we will examine a popular
open source distributed communication substrate called
JavaGroups.

Scaling applications at the Web
tierThere are tried-and-true ways to scale the Web tier.
The intuitive way to scale up the number of concurrent sessions handled by
a service is to add resources to the application server. These resources
can take the form of memory, disk space (storage resources), and CPU
(computing resource). Figure 1 illustrates this single-machine approach to
scalability:

The roadblocks to this approach are the hard limit imposed by the
address space of the processor used and the limits of commodity
(reasonable cost) symmetric multiprocessor (SMP) hardware. Server
configurations beyond four processors might require proprietary or custom
hardware to handle the resource load and can quickly become costly to
acquire and maintain. These restrictions place a practical upper limit to
the number of user sessions we can handle on the Web tier with a
single-server solution.

In addition to the limit on sessions, the single-server solution is
often not a robust solution because of its single point of failure.
Availability could be sporadic because the service is not available when
the single server is down. While there are viable technical solutions to
the problem (such as hot-swappable, redundant back-up resources), these
solutions can also be very costly.

Micro and minicomputer system manufacturers have long turned to
clustering as a viable solution to the scalability problem.
Clustering enables a group of (typically loosely coupled) servers to
operate logically as a single server. The advantages of clustering
include:

Elimination of a single point of failure

High service availability if multiple servers in the cluster can
handle the same service

Load balancing by diverting requests to the least loaded server
hosting the same service

Recently, clustering has "hit the mainstream" due to a number of
converging factors:

J2EE Web tier containers (application servers) technology is finally
maturing, and their state management and operational models are well
specified and understood. By replicating the state of Web tier
containers across a cluster of servers, you can implement a scalable
service solution.

The cost of commodity PC-based servers is at historically low levels
(with CPU power per server continuing to increase), making clustering
more affordable than ever.

High-speed LAN-based interconnects are widely available and
inexpensive. At the same time, sockets, TCP/IP, and higher level
networking APIs make the programming requirement quite simple. Now you
can use LAN-based interconnects for a cluster in place of proprietary
hardware link/bus-interconnects.

The wide adoption of the open source Linux operating system enables
even custom clustering solutions to be implemented, maintained, and
sustained in a non-proprietary manner.

While these LAN-connected, commodity server-based clustering solutions
often do not have the hard guarantees of well-calibrated, finely-tuned,
custom-designed proprietary systems, they do offer a highly cost-effective
solution for implementing scalable, available, load-balanced systems.
Figure 2 illustrates the topology of this commodity hardware-based
solution:

Of course, having the right hardware is only half the story. Instead of
writing custom networking communications software for every specific
application, it would be ideal if some generic "glue" software could be
found for creating cluster solutions. This is an emerging area of
practical research, and is the final enabler that can make
commodity-clustered solutions a reality. Before discussing how JavaGroups,
the open source distributed communication substrate, can provide this
glue, let's first get a better picture of the Web tier scaling
problem.

Visualizing the scaling problem at
the Web tierImagine a shopper at an online store. She has
been going through catalogs and placed several items in her shopping cart.
Typical shopping cart implementations manage a session on the server. The
key to the session is either stored as a cookie on the client's browser or
URLs that have been rewritten with attached session ID information.
Subsequent requests from her browser will send back this session ID,
enabling the server to track her session. Many shoppers may be online
concurrently, and the online store service must manage all the sessions.
In our scenario, we will assume that these sessions are non-persistent and
that they are stored in memory by the online store service.

The problem with scaling is that if the online store site is actually
serviced by a cluster of machines, successive requests for a particular
session must all be directed to the same machine (because the session is
stored only on that machine). By externalizing the sessions and
replicating it across a cluster of servers, all servers in the cluster can
take the incoming shopper requests for any replicated session.

It is certainly possible to write our own custom networking software to
handle this session replication. Due to the possibility of network
hardware failure, however, this software can be difficult to code, test,
and maintain. Thankfully, JavaGroups provides ready-to-deploy solutions
for session replication in a cluster.

To understand how this replication works, and why several open source
application servers have already selected JavaGroups for session
replication, let's examine JavaGroups in more details.

JavaGroups
architectureJavaGroups is a software toolkit (API library)
for designing, implementing, and experimenting with distributed system
solutions (more precisely, in the academic realm, it is known as
process group communications). The architecture of JavaGroups is
divided into two interrelated pieces, as shown in Figure 3. A Java API
abstraction called a Channel provides the boundary of separation.

This boundary also separates the two distinct roles of potential
JavaGroups users: distributed application developers and
protocol implementers.

JavaGroups usersOn
top of the Channel boundary, we have distributed application developers
who will use JavaGroups as a substrate to perform distributed operations.
In fact, that is our role -- we are distributed application developers who
will use JavaGroups to implement Web tier clustering.

Below the Channel boundary, JavaGroups supports a flexibly stackable,
run-time reconfigurable, 100-percent pure Java protocol stacking
framework. This is a fantasy come true for communications protocol
experimenters, designers, and implementers. Using the framework, you can
write and test a moderately complex protocol implementation within a
couple of pages of Java code -- making it easy to debug, maintain, and
evolve. Programming at the protocol framework level is beyond the scope of
this series, but interested readers can consult the "JavaGroups User's
Guide" in the Resources
section.

Virtual synchrony versus probabilistic
broadcastThe basic set of JavaGroups microprotocols,
included with the JChannel implementation, provides some very strong
guarantees in terms of quality of service for the protocol stack.
The group management service (GMS), is based on the virtual
synchrony model (see Resources
for a reference book on this topic). Each member installs a sequence
of views (membership lists) through time and is guaranteed to
receive the same set of messages between views. Any message sent in
one view is also guaranteed to be received in that view. While
stable for small memberships, the implementation is not scalable to
a very large membership. In fact, the virtual synchrony
implementation in JavaGroups can be quite problematic with large
group memberships.

To support very large memberships -- where probabilistic rather
than absolute guarantees are acceptable -- JavaGroups supplies a set
of protocols based on probabilistic broadcast. These
protocols are scalable and stable as the membership
grows.

JavaGroups ChannelA
Channel is a socket-like entity that has a lot of value added to make our
distributed programming life simpler. As distributed application
developers, we program above the Channel line, and Channel provides a
facade for us to access the rich protocol support provided by
JavaGroups.

Like a socket, a Channel has an address associated with it and is the
object we use to send and receive data. Unlike a socket, however, the
address associated with a Channel is opaque (application developers need
not know the physical details of the address) and the data that we send
and receive from a Channel are messages -- a higher-level entity than a
socket's packet.

To be useful for process group communications, a Channel is associated
with a group of processes. Every Channel has a textual Channel name
(sometimes called a group address) associated with it, and Channel
instances with the same name logically belong to the same group.

Group membership management in a distributed network is not an easy
problem to solve or program. In fact, group membership management is one
of JavaGroup's most valuable features. The protocol stack below the
Channel abstraction can perform group membership management for us,
keeping track of members as they join and leave the group. You even have a
choice of using algorithms based on virtual synchrony or probabilistic
broadcast (see Virtual
synchrony versus probabilistic broadcast).

Together, the address and Channel name uniquely identify a Channel
instance. To use a Channel instance, you must first connect to it. Only
one process can connect to a Channel instance at a time. You can also
disconnect from a Channel instance, freeing it for use by others, or close
a Channel instance to permanently disable it. Figure 4 illustrates how
Channel instances facilitate process group communications:

Figure 4 shows four Channel instances on three machines over a LAN.
Note that all Channel instances have the same Channel name, but different
physical addresses. Each physical address in this case consists of
<IP address: port>, allowing multiple physical
addresses on the same host (one IP address). Also, note that only one
process is connected to each Channel instance. All four processes belong
to the same group.

Communications in process
groupsYou can send messages through a Channel to a specific
member of the group (unicast) or to all members of the group
(multicast). To send message to a specific member of the group, you
need its address.

Since the Channel manages group membership for us, you can always
obtain the members in the current "view" by retrieving the membership list
from the Channel. In addition, you can also obtain view changes
(membership change notifications) from the Channel as it occurs. Figure 5
illustrates the separation of responsibility between the distributed
application developer and the Channel abstraction:

Reusable communications coding
patternsTo further simplify distributed application
programming, JavaGroups offers a collection of frequently used
communications coding patterns in the form of Java classes. You can use
many of these programming patterns (also called building
blocks) instead of, or in addition to, direct access to the Channel
abstractions.

You can find all of these patterns in the
org.javagroups.blocks package. Table 1 shows a partial
listing of the most useful building blocks:

Table 1. Coding patterns building blocks

Class Name

Description

PullPushAdapter

Alleviates the need for the user to check the Channel
for incoming messages. The user registers a listener, and the
adapter will call back upon receipt of incoming messages or change
of view.

MessageDispatcher

Encapsulates the synchronous sending of a request to
all members and correlates the subsequent receipt of responses. Can
wait for first response, all responses, a specific number of
responses, majority, or timeout. API is push in nature, through
registration of a MessageListener. In addition, the
user can register a RequestHandler to deal with
incoming requests to the Channel.

RpcDispatcher

Layered on top of MessageDispatcher,
adds remote method invocation semantics to the message
dispatcher-managed scenario. Enables the user to call remote methods
and correlates return values from all or a subset of members. Also
supports incoming remote calls from other RpcDispatcher
instances by using reflection on a server object supplied by the
user.

The coding patterns are applicable to many distributed designs, and are
specially created to work well with JavaGroups Channels. For example, by
programming to RpcDispatcher, you can substantially reduce
the code required for distributed applications involving remote procedure
call semantics.

Manages a complete distributed tree data structure,
replicating all changes reliably to group members. Any member can
add and delete nodes.

DistributedHashtable

Implements a replicated hash table that will
propagate changes of the hash table to all group members.

NotificationBus

Self-contained (creates its own Channel) building
block implementing a notification bus where consumers can register
for notification sent by producers. Each group member can
participate in either or both roles. Designed to support the
implementation of a replicated cache.

Pluggable Channel
implementationSo far, we've talked about the JavaGroups
Channel as if it is a concrete implementation. However,
Channel is actually an abstract class in JavaGroups. In fact,
the current JavaGroups distribution comes with multiple
Channel implementations, as illustrated in Figure 6:

JChannel is the 100-percent pure Java implementation of
the flexible protocol stack framework, combined with an extensible
collection of protocols. This is the most frequently used Channel
implementation.

EnsembleChannel accesses Ensemble (see Resources),
a robust process groups communication substrates (non-Java), through a
connector written in the Java language.

It is also possible to extend JavaGroups by creating your own
Channel implementation.

Starts sending messages or processing incoming messages (possibly
with the help of the building blocks)

Disconnects and closes the Channel

Assuming we are using JChannel, we can create a protocol
stack by simply setting an initialization string (alternatively, an
external XML-based configuration file may also be used -- see the
JavaGroups User's Guide link in the Resources
section for more information on the XML-based configuration). Listing 1 is
an example of such a string:

Each component of the string, separated by a colon (:), specifies a
microprotocol that implements a composable protocol feature or quality. In
fact, each microprotocol is implemented by a Java class of the same name
and is loaded at run time by JChannel. Many of these
microprotocols can be found in the org.javagroups.protocol
package, but the protocol designer is free to use any package name. Each
microprotocol specified in the stack can have one or more associated
properties that can be set in parentheses in the initialization string.
The protocol stack is built from the bottom up at run time, layer upon
layer of microprotocol, according to the initialization string.

Run-time configurable, stackable
microprotocolsTable 3 shows descriptions of some of the
frequently used microprotocols, including ones used by our sample
initialization string. For more information on individual property
details, see the Resources
section for a link to the "JavaGroups User's Guide."

Failure detection using heartbeat protocol. Heartbeat
messages are sent to neighbor members according to ordering in
membership list.

FD_SOCK

Failure detection based on TCP sockets. Ring-based
pings are sent between neighboring members. Works best when all
members are on the same physical host.

FD_PID

Failure detection using process ID (native JNI code
to obtain PID required). Works only on the same hosts (one IP
address).

FD_PROB

Failure detection using probabilistic algorithm.
Every member of the group sends heartbeat and maintains heartbeat
counters of others.

FLOW_CONTROL

Flow control implementation limiting maximum number
of messages sent between message receipts.

FLUSH

Flushes all messages in a consistent way across all
members. Typically performed before view changes.

FRAG

Message fragmentation and reassembly. Ensures larger
messages are fragmented to FRAG_SIZE before being sent down the
stack. Fragmented messages are reassembled at the receiver before
being sent up the stack.

GMS

Group management service. Manages group membership
based on the virtual synchrony model.

MERGEMERGE2

Merge separated subgroups. Subgroups are formed when
the network separates into partitions due to failure.

Group management service, based on probabilistic
broadcast (gossip). Does not require FLUSH.

pbcast.FD

Passive failure detection based on gossip. Does not
send heartbeat message.

pbcast.PBCAST

Implements probabilistic broadcast, gossips regularly
to a random subset of the membership.

pbcast.STABLE

Implements distributed garbage collection protocol
(that is, deletes all messages that have been received by all
members of the group).

pbcast.NAKACK

Negative acknowledgement implementation for
retransmission of missing messages and sequenced delivery of
messages.

pbcast.STATE_TRANSFER

Uses probabilistic broadcast for state transfer
implementation. Does not require QUEUE.

To traverse WANs and firewalls, JavaGroups also provides the
microprotocol support shown in Table 5:

Table 5. Microprotocols for WAN and firewall traversal

TCP

Used in place of UDP as the lowest layer transport.
Sends multiple unicast messages to members through TCP connections
instead of multicast (not possible). Reliability, FIFO ordering, and
flow control are already built-in.

TCPPING

Uses a known set of members to bootstrap the
membership management over TCP.

TCPGOSSIP

Uses an external gossip (see Resources)
server to locate initial set of members for bootstrap of membership
management.

TUNNEL

Enables tunneling through firewalls when used in
place of UDP or TCP as the lowest layer transport. Works in
conjunction with a JavaGroups Router process outside of the
firewall.

Exploring session replication with a
visual shopping cartTo see how we can use JavaGroups for
session replication -- enabling Web tier clustering -- we can create a
sample visual shopping cart called JGCart. JGCart represents an x-ray view
into a single session managed by a Web application server. Imagine that
there are hundreds of these on each application server instance. Figure 7
shows the GUI for this visual shopping cart:

Shopping cart GUI and event
flowAt the top of the cart is a product catalog. We can
select any category of product by clicking its tab. Clicking the
Buy button next to the item we want adds it to the cart below. The
cart keeps track of the items that we have ordered -- including price and
quantity -- and calculates the extended price (price per item multiplied
by the quantity ordered). This is the visual representation of a single
shopping cart session within an application server. At any time, an
application server may be managing many such sessions in memory. We can
use JavaGroups to enable replication of sessions such as this one in a way
that is easy to program and maintain.

Figure 8 shows the hierarchy of GUI components in our JGCart
application. The entire GUI is created using the Swing GUI library.

On the top half of the GUI is the CatalogUI component.
CatalogUI is a JPanel component with a managed
JTabbedPane displaying a list of CatalogItem
components. Each CatalogItem is a JPanel
component with a JButton and two JLabel
components. JButton is the Buy button, and the two
JLabels display the description and price of each item.

CatalogUI supports the forwarding of Buy events by
offering a setOrderListener() method. The
OrderListener interface is used to forward an
OrderEvent from the CatalogUI component whenever
a Buy button is clicked. Figure 9 shows the event forwarding action:

On the bottom half of the GUI in Figure 8, we have the
OrderList component. This component is a JPanel
with a managed JTable displaying the content of the shopping
cart at any time. The managed JTable has a custom model
(containing the data displayed), as implemented by our
OrderTableModel class. This custom model ensures that the
state we maintain in our session -- in an instance of the
CartState class -- is synchronized with what is displayed
within the managed JTable. We can update the data in the
model (and thus the data displayed) at any time by using the
OrderTableModel.changeData() method.

Programming the GUI and wiring the
events flowWe can see in Listing 2 (highlighted in red) how
the CatalogUI component is wired to the
OrderList component by the
CreateUIandPrepChannel() and addOrderItem()
methods of the JGCart class.

Note that the addOrderItem() method is not directly called
due to a Buy button click event. Instead, the Buy button
click generates the broadcast of an AddItemMessage message to
all the members of the cluster, including the member who sent the message.
It is during the handling of this message, through the
receive() method of the MessageListener
interface (which the JGCart class implements), where the
addOrderItem() method will be called. This effectively
replicates all changes in the cart to all members of the group.

If you are interested in the detailed operations of all the GUI
classes, see the Resources
section to download the source code.

Start an instance of JGCart on your system by using the
run.bat batch file (under the src directory of
the code distribution). You will need to edit the run.bat file
specifying where your JavaGroups library is located.

Start another instance on another PC on the same LAN (or start
another instance on the same PC if you're not working on a LAN).

On the first instance, click on the catalog, then click the
Buy button for several items. Note that the session state changes
are replicated immediately to the other machine.

Figure 10 illustrates the two replicated sessions as represented by
JGCart. Notice how they are kept in sync with one another.

If you have more than two machines on the LAN, you can easily extend
this experimental cluster to more machines -- just start new instances of
JGCart on those machines.

Now, imagine that this shopping cart session is running inside an
application server and the server hardware crashes. We can simulate this
scenario by closing the first JGCart instance. Of course, it is easy to
see that we can continue to shop by sending requests to the second
instance of the session. In clustering terms, this is a fail-over.
The ability to survive hardware crashes across a number of servers within
the cluster ensures high availability of the service.

In fact, even if there is no system crash, shopping cart requests can
be directed to either server A or B at any time, since the session exists
on both servers and can be changed on either server. The incoming request
can be directed to the server currently with the lowest workload (called
load balancing), transparent to the shopper. As you can see, Web
tier session replication across a cluster of machines using JavaGroups can
provide high availability service, with fail-over and load balancing
possibilities.

A problem in session
replicationAs I'll now illustrate, there can be problems
with session replication. Start another instance of JGCart (on another
machine over the LAN or on the same machine). Now, go back to the original
instance and add some more items. Figure 11 illustrates what you should
see:

While all the additions to the shopping cart session are still being
replicated to the second instance, the two instances are completely out of
sync. The reason for this problem, of course, is that the second instance
was started after we had already placed several items into the first
shopping cart. We did not have this problem initially because all
replicated sessions were started at the same time.

In cluster implementations, it is unreasonable to insist that clustered
machines be started at the same time. In fact, we should be able to add
machines or remove machines at any time. This requires that the newly
joining clustered machine be able to request the "current state" from the
replicated sessions. Not surprisingly, JavaGroup's JChannel provides a
STATE_TRANSFER microprotocol specifically for this purpose.
See the Resources
section for details about the STATE_TRANSFER microprotocol
implementation.

Set Channel options to respond to GET_STATE requests.
By default, any GET_STATE event from the
STATE_TRANSFER protocol will not be propagated up to the
application to simplify typical client implementations. In our case, we
want to receive the GET_STATE event. Listing 3 shows how we
set the option (highlighted in red). This is done inside the
PrepareChannel() method.

After you're connected to the Channel, call JChannel's
getState() method to obtain the current state from the
cluster (since the session is identically replicated in the cluster,
this can be obtained from any member), as shown in Listing 3
(highlighted in green). Again, it is part of the
PrepareChannel() method.

After the SET_STATE request is received from the
microprotocol stack, the PullPushAdapter instance will call
back to the setState() method of the
MessageListener interface. We need to implement this method
to set our state. Listing 4 shows this implementation. Here, we simply
set the private cstate variable, then update
TableModel with cstate. This will update the
GUI view automatically, because the Swing library routine will
synchronize the GUI view with TableModel.

After the GET_STATE request has been received --
typically from a new member joining the cluster -- we need to return the
current state. The PullPushAdapter class will call back on
the getState() method of the MessageListener
interface. Listing 5 shows our implementation of this method. We simply
make a copy of the state, then return it as an object.

It is worthwhile to note
that a deep copy (a member-by-member copy of every referenced
Java object) of the state must be made because it is possible that the
STATE_TRANSFER protocol may hold on to the state for awhile
before it is transmitted on the wire. If any object referenced by the
state is modified at this time, the state transmitted can become
inconsistent. In fact, because the deep copy operation itself can take
significant time, the state must be protected from concurrent access
during the copy through synchronization of state operations.

Finally, we need to call getState() ourselves during
JGCart start up. The line highlighted in green in Listing 3 is
responsible for this action and is part of the
PrepareChannel() method. Note that the
getState() call is asynchronous. It merely starts the
STATE_TRANSFER protocol to obtain the state from the
cluster, but it returns immediately. A setState() callback
on the MessageListener interface by the
PullPushAdapter class will occur sometime later.

The code in Listings 3 through 5 are in the source code (see Resources),
but are commented out. To get the state transfer support functionality,
remove the comments and recompile.

With the STATE_TRANSFER support in place, we are ready to
try the JGCart cluster simulation again. First, start up a JGCart
instance, then add several items to the cart. Now, simulate the addition
of a new machine to the cluster by starting another JGCart instance. The
instance now starts up with all the items in the cart, identical to the
first instance. After it is started, we can use either JGCart to continue
our shopping. JavaGroups' state transfer protocol enables us to add new
machines to a cluster at any time.

Of course, if the state that needs to be transferred is very large, it
may not be practical or even possible to send it over the wire to every
new machine joining the cluster. This is especially true if machines are
constantly joining and leaving the cluster. Fortunately, J2EE Servlet/JSP
container-level implementations often involve a low membership count and
infrequent membership changes (for instance, machines crashing or being
taken out of the cluster for maintenance).

ConclusionsCommodity
machine clusters can provide a scalable and highly available platform to
deploy Web application and Web services. The networking software required
for such clusters, however, is often custom to the specific application
and can be daunting to write and test. JavaGroups, an open source
distributed systems toolkit, can help by providing ready-to-deploy,
high-level features such as:

Group membership management

Multicast and unicast message-based group communications

State transfer protocol

Functional distributed data structure

A library of reusable, frequently used communication coding patterns

By leveraging the features of JavaGroups, we have created a visual
demonstration of a session replication mechanism for a clustered shopping
cart Web application: JGCart. Experimenting with JGCart, we've seen how
session replication can improve the availability and scalability of the
Web tier application.

By using JavaGroups to handle the group communications, state transfer,
and data replication aspects of a clustering solution, designers can focus
on other application-specific requirements.

Information on Ensemble
and Horus, JavaGroup's predecessors, is available in research papers
from the Cornell University Web site.

Check out Building
Secure and Reliable Network Applications by Kenneth P. Birman
(Manning Publications Co., 1996) to learn more about virtual
synchrony, probabilistic broadcast, how some of the microprotocols work,
and how to design reliable network applications in general. This classic
is a must read.

About the
author Sing Li is the author of Early
Adopter JXTA and Professional Jini, as well as numerous
other books with Wrox Press. He is a regular contributor to
technical magazines and is an active evangelist of the P2P
evolution. Sing is a consultant and freelance writer and can be
reached at westmakaha@yahoo.com.