ACID is Good. Take it in Short Doses

Some of you may remember the Five
Minute University by Father Guido Sarducci of Saturday Night Live fame. In
five minutes, Sarducci teaches you everything you actually remember from college
after five years. His topics include Economics (supply and
demand), Spanish (¿Cómo está Usted?
Muy bien), Theology (Where is God? God is everywhere)
and Business (Buy something, sell it for more).

If Sarducci’s Five Minute University were to cover Transaction
Theory, he would probably teach you “ACID is good. Take it
in short doses.” Congratulations if you still remember that after
five years out of college. As with all of Sarducci’s five-minute phrases,
no explanation comes with it. This article is about
why ACID is good for you, why ACID doesn’t work in long doses, why you
shouldn’t give up and what concepts, models and technologies you can take
in longer doses.

ACID is good

Do you remember when you first learned to write a program? Perhaps it was in
high school. Looking back, remember how simple it was? You didn’t worry
about the effects of your program failing. You didn’t worry about the
effects of multiple users and multiple threads of control accessing shared data.
You simply wrote your single threaded algorithms on transient data. Maybe you
accessed files, but you probably didn’t worry too much about it.

If you develop challenging distributed enterprise applications and systems,
short ACID transactions are your friends. The ACID properties of transactions
enable you to write software without considering the complex environment in
which the application runs. ACID transactions bring simple high-school programming
to the complex real world. With ACID transactions you can concentrate on the
application logic and not on failure detection, recovery and synchronizing access
to shared data.

With ACID transactions, your software need not include logic to recover the
state of the application should it fail. Instead you simply define transaction
boundaries in your application and the system ensures atomicity – the
actions taken within the transaction will happen completely, or not at all.
If the application fails midstream, the system will recover to the previous
state, as if the transaction never took place. If you have ever written an application
without transactions that attempted to detect failures and recover from them,
you know that logic can get quite complex.
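
To make the division of labor concrete, here is a minimal in-memory sketch. It is not how a database or application server implements atomicity (real systems use logging and recovery, not map copies), and the names AtomicitySketch and inTransaction are hypothetical. It only shows the shape of the contract: the application marks the boundary and the "system" makes the work all-or-nothing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Illustrative in-memory store; the class and method names are hypothetical.
final class AtomicitySketch {
    private Map<String, Integer> committed;

    AtomicitySketch(Map<String, Integer> initial) {
        committed = new HashMap<>(initial);
    }

    // The "system" side: run the body against a private working copy and
    // install the copy only if the body finishes without throwing.
    boolean inTransaction(Consumer<Map<String, Integer>> body) {
        Map<String, Integer> working = new HashMap<>(committed); // begin
        try {
            body.accept(working);
            committed = working;                                  // commit
            return true;
        } catch (RuntimeException failure) {
            return false;                                         // roll back: discard the copy
        }
    }

    int read(String key) {
        return committed.get(key);
    }

    public static void main(String[] args) {
        Map<String, Integer> initial = new HashMap<>();
        initial.put("checking", 100);
        initial.put("savings", 0);
        AtomicitySketch store = new AtomicitySketch(initial);

        // The "application" side: plain logic, no recovery code.
        boolean ok = store.inTransaction(accounts -> {
            accounts.put("savings", accounts.get("savings") + 150);
            if (accounts.get("checking") < 150) {
                throw new IllegalStateException("insufficient funds");
            }
            accounts.put("checking", accounts.get("checking") - 150);
        });

        System.out.println(ok + " " + store.read("checking") + " " + store.read("savings"));
    }
}
```

The transfer body fails after it has already credited the savings account, yet the committed state never shows the partial update.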

ACID transactions preserve consistency. Assuming there are no bugs in your
transaction, the transaction will take the system from one consistent state
to another. The good news is that atomicity and isolation make writing a transaction
without bugs easier. You focus on getting the application logic correct in the
high school programming environment, not in the complex environment.

Consistency is especially important in a web application with dynamic servers.
When users navigate a web application, they are viewing snapshots of the server
state. If the snapshot is computed within a transaction, the state returned
to the user is consistent. For many applications this is extremely important.
Otherwise the inconsistent view of the data could be confusing to the user.
Many developers have the incorrect perception that you don’t need transactions
if all you are doing is reading a database. If you are doing multiple reads
and you want them to be consistent, then you need to do them within a transaction.

With ACID transactions, your software need not be a complicated concurrent
program in which you explicitly synchronize multiple concurrent activities accessing
shared data. The concurrent transactions are isolated from each other. There
is concurrency – multiple transactions accessing shared data can run concurrently
– it’s just that you do not have to worry about it in your application.
The transaction code is written assuming it is the only code accessing the data.

The individual actions of a transaction are scheduled to execute according
to some notion of correctness. A schedule is serializable if it has the same
effect as running the transactions serially, that is, just the way you wrote
the code. It is up to the system to produce a correct schedule. If you have
ever written a complex multi-threaded program, you know it is hard to get it
right, test it and debug it. Serializability is a formally defined property
that allows you to avoid concurrent programming.

Many databases and some application servers weaken serializability with their
so-called isolation levels. This requires you to reason about using inconsistent
data and this is hard. You have to use application knowledge to argue that a
transaction reading an inconsistent value, possibly one that will later be rolled
back, does not affect the correctness of the application. Furthermore, the weaker
isolation levels are not the same from one database to another. This makes porting
your application very hard. Only recently have the weaker isolation levels
been formally defined. [ALO]

Finally, the effects of transactions are durable: when a transaction
commits, the new state is persistently stored. “Durable” makes for
a nice acronym.

Take it in Short Doses

Unfortunately, ACID transactions do not work effectively over a long period
of time. Do not expect things to work if your transactions last more than even
a few seconds. Forget putting transaction boundaries around units of work that
last minutes, hours, days, months or years. This eliminates defining transactions
that do a lot of computation or ones that include user input. Users are finicky
– they drink coffee, go on vacation and die. You cannot expect them to
commit their work in a timely manner.

This is the so-called “long transaction problem”. No one has found
a solution for it after many years of research. The basic problem is achieving
isolation – the “I” in “ACID”. There are no known
concurrency control algorithms that will operate over a long period of time.
Concurrency control algorithms for ensuring serializability come in two flavors:
pessimistic and optimistic.

Pessimistic concurrency control algorithms achieve serializable transactions
by locking shared resources. In two-phase locking as it is usually implemented,
a transaction acquires its locks as it runs and holds on to them until it completes. Competing transactions
waiting for the same shared resource block until the first transaction completes.
If the transaction holds the shared resource for a long time, little work is
accomplished concurrently because the competing transactions are blocked for
a long time.
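
The blocking effect is easy to reproduce with a plain Java lock. The sketch below is only an analogy under stated assumptions: a ReentrantLock stands in for a database lock, a sleeping thread stands in for a transaction waiting on a user, and the class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Analogy only: a ReentrantLock plays the role of a lock on a shared resource.
final class LongLockSketch {
    // Returns true if a competing "transaction" obtains the lock within
    // waitMillis while another one holds it for holdMillis.
    static boolean competitorGetsLock(long holdMillis, long waitMillis) {
        ReentrantLock resource = new ReentrantLock();
        CountDownLatch lockHeld = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            resource.lock();
            try {
                lockHeld.countDown();
                Thread.sleep(holdMillis); // the holder "thinks" while keeping the lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                resource.unlock();        // released only when the holder completes
            }
        });
        holder.start();

        try {
            lockHeld.await();             // make sure the holder really owns the lock
            boolean acquired = resource.tryLock(waitMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                resource.unlock();
            }
            holder.join();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("short holder: " + competitorGetsLock(10, 2000));
        System.out.println("long holder:  " + competitorGetsLock(500, 50));
    }
}
```

With a short holder the competitor gets through almost immediately; with a long holder it gives up without having done any work, which is what every competing transaction experiences when a lock is held across user think time.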

Optimistic concurrency control algorithms let transactions access shared resources
and then validate the resulting schedule at commit time. If the schedule violates
serializability, the transaction is rolled back. This works when the transactions
are short – a small amount of work is rolled back. If the transaction
is long-lived, the system appears to be humming along but after doing all of
that work, it starts rolling back the transactions.
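
A common simplification of commit-time validation is a version check on each item: a transaction remembers the version it read and may commit only if that version is still current. The sketch below assumes a single shared value and hypothetical class names; it does not validate full serializability across many items.

```java
// Simplified optimistic validation on one shared value; names are hypothetical.
final class OptimisticSketch {
    // What a transaction saw when it read the value.
    static final class Snapshot {
        final int value;
        final long version;

        Snapshot(int value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private int value;
    private long version;

    OptimisticSketch(int initial) {
        this.value = initial;
    }

    // Reading takes no lock that outlives the call.
    synchronized Snapshot read() {
        return new Snapshot(value, version);
    }

    // Validation happens here, at commit time: if anyone committed since the
    // snapshot was taken, the caller's work is rejected and must be redone.
    synchronized boolean commit(Snapshot basedOn, int newValue) {
        if (basedOn.version != version) {
            return false; // roll back
        }
        value = newValue;
        version++;
        return true;
    }

    public static void main(String[] args) {
        OptimisticSketch seats = new OptimisticSketch(10);
        Snapshot slow = seats.read();   // long-lived transaction starts
        Snapshot quick = seats.read();  // short transaction starts later
        System.out.println("quick commits: " + seats.commit(quick, quick.value - 1));
        System.out.println("slow commits:  " + seats.commit(slow, slow.value - 2));
    }
}
```

The longer a transaction runs between read() and commit(), the more likely some shorter transaction has committed in between, so the long one is the one whose work is thrown away.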

Don’t Give Up

Just because you cannot achieve ACID properties over a long period of time,
do not throw up your hands and forget long-lived activities. [MP] Applications
do operate over long periods of time, and if you ignore what happens over
that period you may end up with unwanted processing, inconsistent
data or garbage in your database.

Consider a simple Web application collecting user data through a series of
forms. To be more specific, assume the application lets a user plan a trip and
each form is related to one piece of the trip. The first reserves a car, the
second reserves a hotel and the third reserves a flight. In order to achieve
atomicity, we might be tempted to start a transaction before the trip planning
exercise begins and to commit the transaction after the car, the hotel and the
flight have been reserved. But since a user is involved, the activity is long-lived
and we cannot make it an ACID transaction.

We should still model this long-lived activity and consider what happens over
the long period of time. At each form the user enters data and clicks on the
submit button. After submitting the first form, the server reserves the car.
After submitting the second form, the server reserves the hotel and after submitting
the third form, the server reserves the flight. Each one of these steps is a
single, short ACID transaction. But what happens if the user completes the first
two steps and not the third? If we don’t consider the long-lived activity,
the application could end up reserving the car and the hotel but not the flight.

Over the years, several techniques have been proposed for managing long-lived
activities. One of the first is called a Saga. [GGKKS] Sagas require you to
define compensating transactions. A compensating transaction compensates for
the effects of a transaction. For example, a compensating transaction for reserving
a hotel room would be a transaction that cancels the reservation.

Given a long-lived activity as a sequence of short ACID transactions T1, T2
… Tn and compensating transactions C1, C2 … Cn, the Saga ensures
that either T1, T2 … Tn complete or T1, T2 … Tj, Cj, Cj-1 …
C1 complete. In other words, either the long-lived activity completes or compensating
transactions are run in reverse order from the last successful short ACID transaction.

Consider the long-lived activity:

Begin
reserveCar();
reserveHotel();
reserveFlight();
End

Assume that after running reserveHotel(), the long-lived activity “rolls
back.” The Saga compensates by running cancelHotel() and cancelCar().
The sequence of short transactions actually run would be:

reserveCar();
reserveHotel();
cancelHotel();
cancelCar();
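
The bookkeeping behind that sequence can be sketched in a few lines of Java. This is an illustration under assumptions, not infrastructure you can deploy: SagaSketch and its methods are hypothetical names, each step stands in for a short ACID transaction, and the rollback is triggered here by a reserveFlight step that fails.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Minimal Saga runner; every name here is hypothetical.
final class SagaSketch {
    private static final class Step {
        final Runnable action;        // stands in for a short ACID transaction Ti
        final Runnable compensation;  // stands in for its compensating transaction Ci

        Step(Runnable action, Runnable compensation) {
            this.action = action;
            this.compensation = compensation;
        }
    }

    private final List<Step> steps = new ArrayList<>();

    SagaSketch step(Runnable action, Runnable compensation) {
        steps.add(new Step(action, compensation));
        return this;
    }

    // Runs T1..Tn in order. If a step throws, runs Cj..C1 for the steps that
    // had already completed, most recent first, and reports failure.
    boolean run() {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action.run();
                completed.push(step);
            } catch (RuntimeException failure) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        boolean done = new SagaSketch()
            .step(() -> log.add("reserveCar"), () -> log.add("cancelCar"))
            .step(() -> log.add("reserveHotel"), () -> log.add("cancelHotel"))
            .step(() -> { throw new IllegalStateException("no flights"); },
                  () -> log.add("cancelFlight"))
            .run();
        System.out.println(done + " " + log);
    }
}
```

The failed step is not compensated: it was itself a short ACID transaction, so it left nothing behind to undo.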

The Saga is approximating atomicity over a long period of time. Note however,
it is not providing the isolation property. In the example, if a transaction
reserves the last car, a second transaction can observe that fact and conclude
there are no cars available. But if later we compensate for the first transaction
by canceling the reservation, the second transaction has observed an inconsistent
state. If they were isolated, the second transaction would never have observed
the status of the rental car agency until the first long-lived activity completed.
But remember we cannot achieve isolation over a long period of time. Just as
we live with this in the real world, our computer applications must as well.

Sagas define a simple model approximating atomicity over a long period of time.
But when we consider what we need in a real application, we want to generalize
Sagas. For example, if the reserveHotel() operation failed because there were
no more rooms left in the hotel, we may want to try to reserve a room in a different
hotel. Rather than rolling back the long-lived activity, we want to explore
a different path. When we generalize Sagas to support more general computations
we end up with workflow.
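
As a small illustration of exploring a different path, the sketch below tries a list of hotels in turn and reports failure only when none of them can be booked. The class, the method and the tryReserve predicate are all hypothetical; in a real application each attempt would be its own short ACID transaction.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// One step of a generalized Saga that explores alternatives before giving up.
final class AlternativePathSketch {
    // Returns the first hotel that tryReserve accepts, or empty if none does.
    static Optional<String> reserveFirstAvailable(List<String> hotels,
                                                  Predicate<String> tryReserve) {
        for (String hotel : hotels) {
            if (tryReserve.test(hotel)) {
                return Optional.of(hotel);
            }
        }
        return Optional.empty(); // only now would the activity fall back to compensation
    }

    public static void main(String[] args) {
        List<String> hotels = Arrays.asList("Grand", "Plaza", "Harbor");
        // Pretend the first hotel is full.
        Optional<String> booked = reserveFirstAvailable(hotels, h -> !h.equals("Grand"));
        System.out.println(booked.orElse("none"));
    }
}
```

Once steps can branch like this, the long-lived activity is no longer a straight line of transactions, which is the point at which a Saga starts to look like a workflow.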

Unfortunately, today’s application servers do not have any built in support
for long-lived activities. You should model what happens in your application
over a long period of time but you will need to implement all of it. For example,
if Sagas fit your application, you would need to implement the infrastructure
necessary to run compensating transactions.

Fortunately, this lack of support may change in the future. Standards have
been defined to support long-lived activities and extended transaction models.
The Web Services world is adopting compensating transactions. CORBA has defined
the Activity Service and the Java world has JSR 95: J2EE Activity Service
[JSR95] for Extended Transactions.

J2EE Activity Service

As you’ve seen, ACID transactions aren’t sufficient for everything
and Sagas are one possible solution. However, there is a range of extended
transaction models, each typically suited to a specific set of use cases. What
this means is that one size doesn’t fit all and as usual it is necessary
to “use the right tool for the right job”. A good architect, whether
working in the world with bricks and mortar or databases and entity beans, needs
to know about all of the tools at his disposal.

Therefore, rather than provide support for a single model, such as Sagas, JSR
95 defines an infrastructure to support a wide range of extended transaction
models. The architecture is based on the insight that the various extended transaction
models can be supported by providing a general purpose event signaling mechanism
that can be programmed to enable activities (application specific units of computations)
to coordinate each other in a manner prescribed by the extended transaction
model under consideration.

An activity is actually a fairly abstract entity, whose precise nature needs
to be defined by applications or users of the service. Whatever work an activity
does, the result of a completed activity is its outcome, which can be used to
determine subsequent flow of control to other activities. Activities can run
over long periods of time and can be suspended and resumed later, similarly
to the Java Transaction API (JTA). Activities can also be transactional, using
JTA transactions, though they don’t have to use the native application
server transactions at all.

If you look at the example activity structure, the solid ellipses represent
JTA transaction boundaries, whereas the dotted ellipses are activity boundaries.
Activity A1 uses two top-level transactions during its execution, whereas A2
uses none. Additionally, transactional activity A3 has another transactional
activity, A3’ nested within it. The J2EE Activity Service is responsible
for distributing both the activity and transaction contexts between execution
environments in order that the hierarchy can be fully distributed.

Containers and high level services

The high-level service (HLS) is the embodiment of an extended transaction service in the J2EE Activity
Service architecture. It’s a service-provider component that plugs into
the application server and offers to applications, service-specific interfaces
that are mediated by the application server through interactions between the
HLS and the Activity Service. The most important components in the HLS are the
Action, Signal and SignalSet: it’s these interfaces and classes that are
at the heart of the pluggable coordination nature of the Activity Service.

At its heart, the J2EE Activity Service is really about supporting the coordination
and control of these activities through a pluggable protocol layer: the coordinator
intelligence (for example, whether it runs a typical two-phase commit protocol
or a three-phase commit protocol) can be written by a third party to be plugged
into the J2EE Activity Service infrastructure.

Associated with each activity is a coordinator that can coordinate the execution
of constituent activities or participants. Demarcation messages (javax.activity.Signals)
are sent between activities by the coordinator. In order to allow the architecture
to be extensible, Signals are used to encode arbitrary protocol messages that
flow between activities.

The org.omg.CORBA.Any is CORBA’s way of allowing arbitrary object types
to be communicated between clients and services and is needed for interoperability
with the original OMG work. Obviously at some point it’s necessary to
be able to decode the Signal payload, and the getName method helps with this.

Now a coordination protocol that only sends a single type of message is not
the normal case (e.g., two-phase commit can have four types of message flowing
from the coordinator to the participant: prepare, commit, rollback and commit_one_phase).
So, the Activity Service lets messages associated with a specific coordination
model be grouped into a javax.activity.SignalSet. The SignalSet is also the
place where the pluggable coordinator intelligence goes, but we’ll come
onto that in a moment, after we finish gluing together activities.

To receive a Signal from one activity, you would register a participant (Action)
with that activity’s coordinator. (It’s like registering an XAResource
with a JTA transaction.) Although Actions are registered with the coordinator,
they are associated with a specific SignalSet, so that a given Action will
receive all of the Signals generated by that SignalSet. The Action interface is fairly
generic, as you might expect.

Signals can be used to infer a flow of control during the execution of an application.
For example, the termination of one activity may initiate the start/restart
of other activities in a workflow-like environment.

One of the keys to the extensibility of this framework is the SignalSet, whose
behavior is peculiar to the kind of extended transaction. This is the entity
that generates Signals that are sent to participants by the coordinator and
processes the results returned. As a result, the coordinator is a fairly lightweight
entity, having delegated most of its responsibilities to the SignalSet.

The activity coordinator interacts with the SignalSet to obtain the Signal
to send to registered Actions. A SignalSet may generate a different sequence
of Signals depending upon the state of the activity (e.g., rollback versus commit).
The setCompletionStatus method tells the SignalSet what state the activity is
in before it starts to generate signals.

The coordinator can then start calling the getSignal method to get the Signal
to send to each participant. The coordinator sends each Signal to every registered
participant and passes the results (the Outcomes) back to the SignalSet via
the setResponse method. This method returns a CoordinationInformation instance
which the coordinator uses to determine the flow of the Signals.
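
The interaction just described can be sketched as a loop. The interfaces below are deliberately simplified stand-ins and are not the JSR 95 signatures: Signals and Outcomes are plain strings, the completion status is a boolean, and setResponse returns a boolean where the real method returns a CoordinationInformation. The two-phase SignalSet is likewise only an assumed example of pluggable coordinator intelligence.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the coordinator, SignalSet and Action roles.
// These are NOT the javax.activity interfaces; all signatures are assumptions.
final class CoordinatorSketch {
    interface Action {
        String processSignal(String signal);    // returns an outcome
    }

    interface SignalSet {
        void setCompletionStatus(boolean success);
        String getSignal();                     // next signal, or null when finished
        boolean setResponse(String outcome);    // true: stop sending the current signal
    }

    // Example protocol: prepare, then commit or rollback depending on the votes.
    static final class TwoPhaseSignalSet implements SignalSet {
        private boolean success;
        private boolean allPrepared = true;
        private int phase = 0;

        public void setCompletionStatus(boolean success) {
            this.success = success;
        }

        public String getSignal() {
            phase++;
            if (!success) {
                return phase == 1 ? "rollback" : null;
            }
            if (phase == 1) {
                return "prepare";
            }
            if (phase == 2) {
                return allPrepared ? "commit" : "rollback";
            }
            return null;
        }

        public boolean setResponse(String outcome) {
            if ("no".equals(outcome)) {
                allPrepared = false;
                return true;   // one "no" vote is enough; move on to the next signal
            }
            return false;
        }
    }

    // The coordinator itself knows nothing about the protocol it is running.
    static void complete(SignalSet signalSet, List<Action> actions, boolean success) {
        signalSet.setCompletionStatus(success);
        for (String s = signalSet.getSignal(); s != null; s = signalSet.getSignal()) {
            for (Action action : actions) {
                String outcome = action.processSignal(s);
                if (signalSet.setResponse(outcome)) {
                    break;
                }
            }
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        List<Action> actions = new ArrayList<>();
        actions.add(s -> { log.add("A:" + s); return "yes"; });
        actions.add(s -> { log.add("B:" + s); return "yes"; });
        complete(new TwoPhaseSignalSet(), actions, true);
        System.out.println(log);
    }
}
```

Swapping in a different SignalSet changes the protocol without touching complete(), which is the sense in which the coordinator stays lightweight.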

With the exception of some predefined Signals and SignalSets, the majority
of Signals and SignalSets will be defined and provided by the higher-level applications
that make use of this Activity Service framework. Predefined SignalSets include
a Synchronization protocol (similar to the JTA one) and a Lifetime protocol
that allows activities to be informed when other activities start or end.

Leveraging the Activity Service for the Trip Planning Activity

We’ve already mentioned that Sagas are one means whereby acidity may
be relaxed: any work performed by a committed transaction can be undone later
if required, by a compensation transaction. Obviously managing these compensation
transactions is an issue if you had to do it by hand. We’ll now show how
the J2EE Activity Service can be leveraged instead.

Take a look at the sequence of transactions shown below; the dotted ellipse
is the controlling activity (Saga “manager”) and the solid ellipses
are transactions (C for reserveCar, H for reserveHotel and F for reserveFlight).
Each transaction has a corresponding compensation transaction denoted by !C,
for example. What we want to do is ensure that if the overall activity decides
it can complete successfully (car, hotel and flight have been obtained), then
nothing happens. If, however, it needs to cancel, each of the compensation transactions
executes in the reverse order.

In order to support this scheme within the Activity Service, we first assume
that C, H and F each execute within their own activity, which has a CompletionSignalSet
supporting the Failure (equivalent to roll back) and Propagate (successful completion,
but later compensation may be required) Signals. Associated with these activities
will be a CompensationAction participant, whose job it is to ensure that the
compensation transaction is propagated to the enclosing activity.

For the enclosing activity, we’ll have a SagaSignalSet, which has Signals
for Success (no compensation) and Failure (do compensation). It will also ensure
that each compensation transaction is executed in the right order (the Activity
Service API enables an ordering to be placed on participants when the coordination
protocol fires).

So, whenever a transaction begins, a CompensationAction is registered with
the enclosing activity. Assuming the transaction commits successfully, the CompletionSignalSet
will propagate the CompensationAction to the parent activity and ensure that
it is placed in the right order of compensations to be executed.

If at any point a transaction rolls back, the CompletionSignalSet is responsible
for ensuring that the enclosing activity fails, which triggers any remaining
compensation transactions.

Assuming we manage to get the car, hotel and flight, the enclosing activity
can terminate successfully and the CompensationActions will be ignored: there’s
no work for them to do.

Conclusions

We have discussed why ACID is good, why ACID does not work in long doses,
why you should not give up and what concepts, models and technologies you can
take in longer doses. We used the simple example of planning a trip to illustrate
a long-lived activity. We focused on the simple concept of a Saga as one way
of approaching the trip planning activity. But when a Saga is extended, we end
up with more generalized workflow models.

We next turned to forthcoming infrastructure to support activities. In particular,
we focused on the J2EE Activity Service as defined by JSR 95. We look forward
to support for activities in application servers.

Finally, we described how to implement our trip planning activity as a Saga
with the J2EE Activity Service. The Activity Service is a more general mechanism
than that. Other trip planning behaviors besides atomicity can be achieved using
the Activity Service. We leave that as an exercise for you.

References

[ALO]

Atul Adya, Barbara Liskov, and Patrick O'Neil. “Generalized
Isolation Level Definitions.” In Proceedings of the IEEE International
Conference on Data Engineering, March 2000.

Author Bios

Mark Little
Before Arjuna Technologies, Mark was a Distinguished Engineer/Architect within HP Arjuna Labs, where he led the HP-TS and HP-WST teams, developing J2EE and Web services transactions products respectively. Mark is one of the primary authors of the OMG Activity Service specification and is on the expert group for the same work in J2EE (JSR 95). He is on the OTS Revision Task Force and the OASIS BTP and OASIS WS-CAF technical committees. Mark has published extensively in the Web Services Journal, Java Developers Journal and other journals and magazines.

Bruce Martin
Bruce Martin is a Middleware Maven at the Middleware Company. For the past few years, Bruce has been writing, teaching and consulting about J2EE and distributed object technologies. Bruce's recent endeavors have included The Middleware Company's TORPEDO initiative. Bruce created The Middleware Company's popular Architect's Course and has given it to several hundred software architects. Bruce is one of the pioneers of distributed object computing. At Hewlett Packard Laboratories, he designed and implemented an interface definition language that became the basis for HP's original CORBA submission. At Sun Microsystems, he was one of Sun's CORBA architects and was the primary author of five of the OMG's CORBA Services specifications. Bruce holds a Ph.D. and M.S. in Computer Science from the University of California at San Diego.
