Oracle Blog

Software matters

Monday Jan 15, 2007

A few months ago, in my blog entry "Transactions, disks, and performance",
I went into the importance of minimizing the number of writes.
Transaction logging is one of those cases where minimizing the number
of writes greatly enhances performance. In this entry, I'll describe a
way to avoid transaction logging altogether.

What is transaction logging? Transaction logging
refers to persisting the state of a two-phase transaction so that, in
the event of a crash, the transaction can be either committed or rolled
back (recovered). I won't go into the details of what XA is; more
information about XA transactions can be found elsewhere, e.g. in Mike Spille's "XA
Exposed".

Let me illustrate what recovery is using a
"diagram".
Consider an XA two-phase transaction with three Resource Managers (RMa, RMb, and RMc). To
indicate what happens when, I'll put all actions in a table;
each row corresponds to a different point in time.

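| time | RMa                     | RMb                     | RMc                     | Coordinator     |
|------|-------------------------|-------------------------|-------------------------|-----------------|
| t1   | start(xid1a, TMNOFLAGS) |                         |                         |                 |
| t2   |                         | start(xid1b, TMNOFLAGS) |                         |                 |
| t3   |                         |                         | start(xid1c, TMNOFLAGS) |                 |
| t4   | end(xid1a, TMSUCCESS)   |                         |                         |                 |
| t5   |                         | end(xid1b, TMSUCCESS)   |                         |                 |
| t6   |                         |                         | end(xid1c, TMSUCCESS)   |                 |
| t7   | prepare(xid1a)          |                         |                         |                 |
| t8   |                         | prepare(xid1b)          |                         |                 |
| t9   |                         |                         | prepare(xid1c)          |                 |
| t10  |                         |                         |                         | log             |
| t11  | commit(xid1a, false)    |                         |                         |                 |
| t12  |                         | commit(xid1b, false)    |                         |                 |
| t13  |                         |                         | commit(xid1c, false)    |                 |
| t14  |                         |                         |                         | delete from log |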
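The table can be read as the sequence of calls a coordinator makes. Below is a minimal sketch of steps t7 through t14 in Java against the standard javax.transaction.xa interfaces, assuming start() and end() (t1 through t6) have already been called. The TxLog interface with its logCommit() and forget() methods is a hypothetical stand-in for a durable transaction log (a real one would force each record to disk), and Branch and LoggedCoordinator are illustrative names, not part of any API.

```java
import java.util.List;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

/** Hypothetical durable transaction log; a real one would force writes to disk. */
interface TxLog {
    void logCommit(byte[] gtrid); // t10: record the commit decision
    void forget(byte[] gtrid);    // t14: delete the record
}

/** One enlisted branch: a resource manager plus the Xid it was started with. */
final class Branch {
    final XAResource resource;
    final Xid xid;

    Branch(XAResource resource, Xid xid) {
        this.resource = resource;
        this.xid = xid;
    }
}

final class LoggedCoordinator {
    /** Steps t7..t14 of the table; start() and end() are assumed to be done. */
    static void commit(byte[] gtrid, List<Branch> branches, TxLog log) throws XAException {
        boolean[] readOnly = new boolean[branches.size()];
        try {
            for (int i = 0; i < branches.size(); i++) {          // t7..t9
                Branch b = branches.get(i);
                readOnly[i] = b.resource.prepare(b.xid) == XAResource.XA_RDONLY;
            }
        } catch (XAException e) {
            // A branch voted no: nothing was logged, so roll everything back.
            for (Branch b : branches) {
                try {
                    b.resource.rollback(b.xid);
                } catch (XAException ignored) {
                    // Left to recovery, which finds no commit record and rolls back.
                }
            }
            throw e;
        }
        log.logCommit(gtrid);                                     // t10
        for (int i = 0; i < branches.size(); i++) {               // t11..t13
            if (!readOnly[i]) { // read-only branches are already finished
                branches.get(i).resource.commit(branches.get(i).xid, false);
            }
        }
        // If a commit above throws, the record stays in the log for recovery.
        log.forget(gtrid);                                        // t14
    }
}
```

The log write at t10 is the single extra forced write per transaction that the rest of this entry tries to get rid of.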

At t10 the transaction manager writes the decision to commit to the log.
Suppose the system crashes after t10, say between t11 and t12. When the system restarts,
it calls recover() on all known Resource Managers and reads the transaction log. In
the transaction log it finds that xid1 was marked for commit.
Through recover() it finds that xid1b and xid1c are in doubt. It knows that these two
need to be committed because of the commit decision in the log.

What happens if the system crashes before the
commit decision is written to the log, for example between t8 and t9? Upon recovery,
the recover() methods of RMa, RMb, and RMc together return xid1a and xid1b (but not
xid1c, because prepare has not yet been called on RMc). The transaction manager
will roll back RMa and RMb because no commit decision was found in the log.
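On restart, the decision therefore reduces to a lookup: every in-doubt branch whose global transaction ID has a commit record is committed, and every other one is rolled back (the "presumed abort" rule). A minimal sketch in Java, assuming the Xids from all recover() calls have been gathered into one collection and the log has been read into a set of global transaction IDs; LoggedRecovery and Plan are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import javax.transaction.xa.Xid;

/** Recovery decision in the logged scheme (hypothetical helper, not a standard API). */
final class LoggedRecovery {
    /** Branches to commit and branches to roll back. */
    static final class Plan {
        final List<Xid> commit = new ArrayList<>();
        final List<Xid> rollback = new ArrayList<>();
    }

    /**
     * @param inDoubt         Xids returned by recover() on all known resource managers
     * @param committedGtrids global transaction IDs that have a commit record in the log
     */
    static Plan decide(Collection<Xid> inDoubt, Set<ByteBuffer> committedGtrids) {
        Plan plan = new Plan();
        for (Xid xid : inDoubt) {
            // ByteBuffer.wrap gives the byte[] content-based equals/hashCode.
            ByteBuffer gtrid = ByteBuffer.wrap(xid.getGlobalTransactionId());
            if (committedGtrids.contains(gtrid)) {
                plan.commit.add(xid);    // the decision was logged: finish the commit
            } else {
                plan.rollback.add(xid);  // no record: the decision was never made
            }
        }
        return plan;
    }
}
```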

SeeBeyond's Logless XA Transactions

Let's take a look at the recover() method on XAResource. This method returns
an array of Xid objects.
Besides a format identifier, each Xid object holds two byte[] arrays, which
represent the global transaction ID and the branch qualifier. They are typically
random numbers picked by the transaction manager. The Resource Managers that
receive these Xids should use them as identifiers and return them unmodified
from the recover() method.
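Because the transaction manager has to recognize the Xids that come back from recover(), it needs to compare them by the contents of these byte arrays. A minimal sketch of an immutable Xid in Java, assuming a 16-byte random global transaction ID; the SimpleXid class, its FORMAT_ID value, and the newGlobalId()/copyOf() helpers are hypothetical, while the Xid interface and its MAXGTRIDSIZE/MAXBQUALSIZE limits (64 bytes each) come from javax.transaction.xa.

```java
import java.security.SecureRandom;
import java.util.Arrays;
import javax.transaction.xa.Xid;

/** Minimal immutable Xid with value equality (hypothetical class, not a library type). */
final class SimpleXid implements Xid {
    static final int FORMAT_ID = 0x5842; // hypothetical format identifier
    private static final SecureRandom RANDOM = new SecureRandom();

    private final int formatId;
    private final byte[] gtrid;
    private final byte[] bqual;

    SimpleXid(int formatId, byte[] gtrid, byte[] bqual) {
        if (gtrid.length > MAXGTRIDSIZE || bqual.length > MAXBQUALSIZE) {
            throw new IllegalArgumentException("Xid component too long");
        }
        this.formatId = formatId;
        this.gtrid = gtrid.clone(); // defensive copies keep the Xid immutable
        this.bqual = bqual.clone();
    }

    /** A fresh random global transaction ID, as a transaction manager might pick. */
    static byte[] newGlobalId() {
        byte[] id = new byte[16];
        RANDOM.nextBytes(id);
        return id;
    }

    /** Copies any Xid (e.g. an RM's own object from recover()) so it compares by value. */
    static SimpleXid copyOf(Xid xid) {
        return new SimpleXid(xid.getFormatId(), xid.getGlobalTransactionId(), xid.getBranchQualifier());
    }

    @Override public int getFormatId() { return formatId; }
    @Override public byte[] getGlobalTransactionId() { return gtrid.clone(); }
    @Override public byte[] getBranchQualifier() { return bqual.clone(); }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SimpleXid)) return false;
        SimpleXid x = (SimpleXid) o;
        return formatId == x.formatId && Arrays.equals(gtrid, x.gtrid) && Arrays.equals(bqual, x.bqual);
    }

    @Override public int hashCode() {
        return 31 * (31 * formatId + Arrays.hashCode(gtrid)) + Arrays.hashCode(bqual);
    }
}
```

A Resource Manager may hand back its own Xid implementation from recover(), one that does not necessarily override equals(), which is why the sketch normalizes recovered Xids through copyOf() instead of comparing foreign objects directly.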

At SeeBeyond, Jerry Waldorf and Venugopalan Venkataraman came up with the idea
of using the storage space in the byte[] arrays of the Xid to persist the
transaction state. Here's how it works. Let's modify the above
example by removing transaction logging:

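| time | RMa                     | RMb                     | RMc                     | Coordinator |
|------|-------------------------|-------------------------|-------------------------|-------------|
| t1   | start(xid1a, TMNOFLAGS) |                         |                         |             |
| t2   |                         | start(xid1b, TMNOFLAGS) |                         |             |
| t3   |                         |                         | start(xid1c, TMNOFLAGS) |             |
| t4   | end(xid1a, TMSUCCESS)   |                         |                         |             |
| t5   |                         | end(xid1b, TMSUCCESS)   |                         |             |
| t6   |                         |                         | end(xid1c, TMSUCCESS)   |             |
| t7   |                         |                         | prepare(xid1c)          |             |
| t8   |                         | prepare(xid1b)          |                         |             |
| t9   | prepare(xid1a)          |                         |                         |             |
| t10  |                         |                         | commit(xid1c, false)    |             |
| t11  |                         | commit(xid1b, false)    |                         |             |
| t12  | commit(xid1a, false)    |                         |                         |             |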

A commit decision is still being made, but this decision is no longer
persisted in a separate transaction log. Instead, it is persisted in xid1a:
once prepare(xid1a) succeeds at t9, the commit decision has been made. If the
system finds xid1a upon recovery, it knows that a commit decision was made. If
it doesn't find xid1a, it knows that a commit decision was not made. Note that
the order in which both prepare and commit are called on the three
Resource Managers is very important.
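The ordering in the table can be sketched as a coordinator that treats one branch as the decision branch (RMa here): it prepares the other branches first and the decision branch last, then commits the other branches first and the decision branch last, so that the prepared xid1a survives until every other branch is done. This is a minimal sketch using the standard javax.transaction.xa interfaces, not SeeBeyond's actual implementation; LoglessBranch and LoglessCoordinator are hypothetical names, and start()/end() are assumed to have been called already.

```java
import java.util.List;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

/** One enlisted branch: a resource manager plus the Xid it was started with. */
final class LoglessBranch {
    final XAResource resource;
    final Xid xid;

    LoglessBranch(XAResource resource, Xid xid) {
        this.resource = resource;
        this.xid = xid;
    }
}

final class LoglessCoordinator {
    /**
     * branches.get(0) is the decision branch (RMa in the table). It is prepared
     * last and committed last, so its prepared Xid acts as the commit record.
     */
    static void commit(List<LoglessBranch> branches) throws XAException {
        int n = branches.size();
        boolean[] readOnly = new boolean[n];
        try {
            for (int i = n - 1; i >= 0; i--) {           // t7..t9: c, b, then a
                LoglessBranch b = branches.get(i);
                readOnly[i] = b.resource.prepare(b.xid) == XAResource.XA_RDONLY;
            }
        } catch (XAException e) {
            rollbackAll(branches);                        // no decision was made
            throw e;
        }
        if (readOnly[0]) {
            // The decision branch left no prepared Xid, so nothing records the
            // decision. Committing now would be unsafe (a crash midway would make
            // recovery roll back the rest), so this sketch rolls back instead.
            rollbackAll(branches);
            throw new XAException(XAException.XA_RBPROTO);
        }
        // The commit decision is now durable as the prepared Xid of branch 0.
        for (int i = n - 1; i >= 1; i--) {                // t10..t11
            if (!readOnly[i]) {
                branches.get(i).resource.commit(branches.get(i).xid, false);
            }
        }
        branches.get(0).resource.commit(branches.get(0).xid, false); // t12: erases the record
    }

    private static void rollbackAll(List<LoglessBranch> branches) {
        for (LoglessBranch b : branches) {
            try {
                b.resource.rollback(b.xid);
            } catch (XAException ignored) {
                // Left to recovery, which finds no decision branch and rolls back.
            }
        }
    }
}
```

Committing the decision branch last plays the role of "delete from log" in the first table: as long as any other branch is still in doubt, the prepared xid1a is there to tell recovery to commit it.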

As in the first example, if the system crashes before a commit
decision has been made, it will roll back all in-doubt branches upon recovery.
E.g. if the system crashes between t8 and t9, it will encounter xid1c and xid1b
and will call rollback() on them, because it cannot find a record of a commit
decision for xid1, i.e. it cannot find xid1a.

If the system crashes after a commit decision has been made, for
example between t10 and t11, it will find xid1b and xid1a. Since xid1a signifies a
commit
decision, both xid1b
and xid1a
should be committed.

So far so good. But how does the transaction manager know that if it
encounters xid1b it should look for xid1a to figure out whether a commit
decision was made? This is where the transaction manager uses the byte[] arrays
of the Xid: it stores this information in one of them.
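The post does not describe SeeBeyond's actual byte layout, so the following assumes one purely for illustration: the branch qualifier holds the branch's own number and the number of the decision branch. Recovery then only needs the Xids returned by recover(): a branch is committed if the decision branch of its global transaction was also returned, and rolled back otherwise. A minimal sketch in Java; LoglessRecovery, Plan, and branchQualifier() are hypothetical, and it assumes recover() succeeded on every Resource Manager (if the RM holding the decision branch is unreachable, its branches must stay in doubt rather than be rolled back).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.transaction.xa.Xid;

/** Logless recovery under a hypothetical branch-qualifier layout. */
final class LoglessRecovery {
    /** Assumed layout: bqual[0] = own branch number, bqual[1] = decision branch number. */
    static byte[] branchQualifier(int branch, int decisionBranch) {
        return new byte[] {(byte) branch, (byte) decisionBranch};
    }

    // Assumes the Xid was created by this TM; a real TM would check getFormatId() first.
    static boolean isDecisionBranch(Xid xid) {
        byte[] bq = xid.getBranchQualifier();
        return bq[0] == bq[1];
    }

    private static String key(byte[] gtrid, byte[] bqual) {
        return Arrays.toString(gtrid) + "/" + Arrays.toString(bqual);
    }

    /** Key of the decision branch that belongs to the same global transaction as xid. */
    private static String decisionKey(Xid xid) {
        byte d = xid.getBranchQualifier()[1];
        return key(xid.getGlobalTransactionId(), branchQualifier(d, d));
    }

    static final class Plan {
        final List<Xid> commit = new ArrayList<>();   // in the order to commit
        final List<Xid> rollback = new ArrayList<>();
    }

    /** inDoubt: Xids returned by recover() on ALL resource managers. */
    static Plan decide(List<Xid> inDoubt) {
        Set<String> found = new HashSet<>();
        for (Xid xid : inDoubt) {
            found.add(key(xid.getGlobalTransactionId(), xid.getBranchQualifier()));
        }
        Plan plan = new Plan();
        List<Xid> decisionBranches = new ArrayList<>();
        for (Xid xid : inDoubt) {
            if (isDecisionBranch(xid)) {
                decisionBranches.add(xid);       // its presence is the commit record
            } else if (found.contains(decisionKey(xid))) {
                plan.commit.add(xid);            // decision was made: commit
            } else {
                plan.rollback.add(xid);          // no decision found: roll back
            }
        }
        // Commit decision branches last, so a crash during recovery keeps the record.
        plan.commit.addAll(decisionBranches);
        return plan;
    }
}
```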

Complicating factors

A problem with this scheme occurs when the prepare(xid1a) method returns
XA_RDONLY. If that happens, commit(xid1a, false) cannot be called, and RMa will
not return xid1a upon calling recover(). Recall that xid1a had special
significance! Hence it is important to order the Resource Managers such that
the one holding the commit decision (RMa, on which prepare() is called last) is
both reliable and will not return XA_RDONLY. Moreover, because the Xid of every
branch already has to point to xid1a when start() is called on it, this choice
has to be made by the time the first resource is enlisted. However, in normal
Java EE applications, the application prescribes the order in which resources
are enlisted in a transaction. Hence, to use this logless transaction scheme,
the application server either needs to be extended with a way to specify
resources a priori, or it needs to be extended with a learning capability so
that it knows which resources are enlisted in a particular operation and can
pick the right Resource Manager to write the commit decision to.
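A learning capability could be as simple as remembering, per operation, which resources voted XA_OK (rather than XA_RDONLY) in every past transaction, and picking one of those up front. A minimal sketch in Java under that assumption; DecisionResourceLearner, the notion of an "operation" name, and this policy are all hypothetical. It does not model reliability, and a prediction can still turn out wrong at run time, so the transaction manager must handle that case safely (for example by rolling back).

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical learning helper: for each operation, remembers which resources
 * did real work (prepare() returned XA_OK rather than XA_RDONLY) in every
 * transaction observed so far.
 */
final class DecisionResourceLearner {
    private final Map<String, Set<String>> alwaysWrote = new HashMap<>();

    /** Call after the prepare phase with the names of the resources that voted XA_OK. */
    void record(String operation, Set<String> votedOk) {
        alwaysWrote.merge(operation, new LinkedHashSet<>(votedOk), (seen, now) -> {
            seen.retainAll(now); // keep only resources that wrote every time
            return seen;
        });
    }

    /**
     * Predicts, before the first resource is enlisted, which resource should hold
     * the commit decision. Empty means no safe candidate is known yet.
     */
    Optional<String> choose(String operation) {
        return alwaysWrote.getOrDefault(operation, Set.of()).stream().findFirst();
    }
}
```

When choose() returns empty, the server would need another strategy for that transaction, for instance ordinary transaction logging.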

The SeeBeyond logless transaction approach is one of the ways in which
transaction logging can be made less expensive. In a future blog entry, I'll
cover additional ones.