I have just been discussing this with Jonathan, and we are proposing the following approach to transaction timeout.

1. When the transport determines that the transaction is flowing to a different server, it should query TransactionTimeoutConfiguration to find out how much time is left for the transaction by calling getTimeLeftBeforeTransactionTimeout.

2. It flows this value over to the remote side and calls SubordinationManager.getTransactionImporter().importTransaction(xid, timeout) with a timeout equal to this original value plus an extra amount (for the purposes of this discussion we can call it a fudgefactor).
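To make the arithmetic in the two steps concrete, here is a minimal sketch. It is not the real API: SubordinateTimeouts, remainingSeconds and extraSeconds are hypothetical names, and seconds are an assumed unit. In practice the first argument would come from getTimeLeftBeforeTransactionTimeout on the sending side, and the result would be passed as the timeout to importTransaction(xid, timeout) on the remote side.

```java
final class SubordinateTimeouts {
    private SubordinateTimeouts() {}

    /**
     * Timeout to give the subordinate: the time left at the sender plus the
     * extra amount (the "fudgefactor"). Both values are assumed to be seconds.
     */
    static int timeoutForImport(int remainingSeconds, int extraSeconds) {
        if (remainingSeconds < 0 || extraSeconds < 0) {
            throw new IllegalArgumentException("timeouts must not be negative");
        }
        // addExact fails loudly on overflow instead of silently wrapping
        return Math.addExact(remainingSeconds, extraSeconds);
    }
}
```

For example, with 1 second left at the root and an extra amount of 10 seconds, the subordinate would be imported with a timeout of 11 seconds.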

Why the need for the fudgefactor?

Basically we need to ensure that the subordinate transactions do not time out before the parent transaction. If that happens, the transaction will get a heuristic when it tries to complete. Why? After the subordinate transaction rolls back at time T (because of the timeout) it cleans up after itself. When the parent then tries to roll back the transaction (at time T+1) the subordinate will not know about the transaction. The rollback still cascades, but as a heuristic one, which is not great.

Ideally we would have the parent transaction manager responsible for monitoring the timeout and cascading this down, but if the root fails (or a link in the chain) then locks would be held indefinitely until that node resumed.

If we have a directly connected transport that can determine when a linked node fails then we could remove the fudgefactor and look at implementing an immediate rollback in the case of parent/child failure...

By the way, a more favourable description of the so-called fudgefactor is the following:

"The amount of time after a transaction should have timed out to wait for the root transaction manager to time out the transaction before assuming that the root transaction manager has died and so locks should be released by rolling back the transaction at this subordinate and below".

How will you determine the value of the fudge factor? All you can ever do is try to narrow the window of vulnerability and no value is ever going to be right for everyone or every scenario. It could also be argued that by adding a fudge factor you're allowing transactions to live longer than they should do.

As I mentioned above, the fudgefactor is really a transport-specific detail. In the normal case the transaction will time out at the root transaction manager and cascade down to the subordinate transaction managers. Ideally we would not monitor timeout in the subordinate at all and would be able to rely on the root TransactionReaper to time us out, but if the root (or an intermediary) fails we can't rely solely on this. The fudgefactor is there in case a parent transaction manager fails: if the transport can't tell us that a parent has died, we need it so that we can say, basically, "we haven't heard anything from the parent in a reasonable time", time out the transaction here, and accept the fact that there will be a heuristic at the parent.

I will check what happens in JTS though I am guessing that in JTS the ORB tells you a parent has died so you can clean up your transaction immediately.

Totally appreciate your point about how to determine timeout though, it will need to be a configuration of the transport I would argue.

Ideally we would be notified by the transport; I am waiting to hear from David whether this is something we can rely on.

In the JTS we propagate the remaining timeout down to the subordinates and the transaction they create locally has that timeout associated with it. So the leaf node can timeout even if the parent doesn't send a completion message. There's no fudge factor, i.e., we don't add a network delay to the timeout at the leaf, assuming that the transmission is instantaneous. Realistically since most timeouts are measured in seconds not milliseconds this hasn't caused a problem so far.

Transport-wise we can definitely propagate timeout information down. I think we can configure the extra timeout in the client transaction context (with a reasonable default). And yes, we should be able to rely on the transport for propagating the connection timeout message, assuming that XAResource is informed about timeouts that is (via a rollback message I would assume?). The extra time value is related to the latency of the connection which can vary based on conditions, but in a normal setup would probably be fairly low (10 seconds or less) I would guess.

We definitely cannot count on synchronized clocks, so each node will be informed of the remaining transaction time upon inflow.
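As a rough illustration of "remaining time upon inflow" without synchronized clocks: the receiving node can turn the relative remaining time into a deadline on its own monotonic clock. This is only a sketch; LocalDeadline and its methods are made-up names, and the caller is assumed to pass in readings from System.nanoTime().

```java
import java.util.concurrent.TimeUnit;

final class LocalDeadline {
    private final long deadlineNanos;

    private LocalDeadline(long deadlineNanos) {
        this.deadlineNanos = deadlineNanos;
    }

    /** On inflow: convert the relative remaining time into a deadline on this node's clock. */
    static LocalDeadline fromRemaining(long remainingMillis, long nowNanos) {
        return new LocalDeadline(nowNanos + TimeUnit.MILLISECONDS.toNanos(remainingMillis));
    }

    /** True once the local clock has reached the deadline. */
    boolean isExpired(long nowNanos) {
        // subtract-then-compare is the overflow-safe way to compare nanoTime values
        return nowNanos - deadlineNanos >= 0;
    }

    /** Value to send when the transaction flows onward to yet another node. */
    long remainingMillis(long nowNanos) {
        return Math.max(0, TimeUnit.NANOSECONDS.toMillis(deadlineNanos - nowNanos));
    }
}
```

Only durations cross the wire, so no node ever has to interpret another node's wall-clock time; the cost is that network latency is silently added to the effective timeout at each hop.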

Also we probably need a better term than "fudge factor" in the API... maybe "extra time" or something.

Say there is a root TM and a subordinate TM with a timeout of, say, 2 seconds.

time T+0: root TM starts transaction with a timeout of 2

time T+1: tx flows to subordinate TM (using the remaining amount of timeout at root TM as the timeout for subordinate TM, i.e. the timeout value appears to be 1 at the subordinate TM)

time T+2: subordinate TM times out and forgets about the transaction AND root TM times out. The root TM timeout cascades the abort to all XAResources (remember, in the model we are going for, subordinate transaction managers are registered as XAResources, so each subordinate gets a call to abort from the root TM)

time T+3: the subordinate transaction manager receives the abort message from the root TM via the proxy XAResource but by now has cleaned up the transaction, so it will return an error indicating that it can't find the transaction

Yeah I get the potential issue. The extra time added has to be greater than the expected latency between the time the root controller issues the timeout notice and the time that the subordinate can receive and process it.

That said, we need some level of tolerance for the case where the given extra time is not sufficient and the subordinate node is unable to abort the transaction because it's already gone, because it will happen at some point. Ideally, this transaction abort would be idempotent.
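A minimal sketch of what an idempotent abort could look like at the subordinate, assuming it keeps a marker for a transaction it has timed out locally instead of forgetting it straight away. Everything here (SubordinateAborts, the State and AbortResult enums, string transaction ids) is hypothetical, and how long such a marker should be retained is deliberately left open.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SubordinateAborts {
    enum State { ACTIVE, ROLLED_BACK }
    enum AbortResult { ROLLED_BACK_NOW, ALREADY_ROLLED_BACK, UNKNOWN }

    private final Map<String, State> transactions = new ConcurrentHashMap<>();

    void begin(String txId) {
        transactions.put(txId, State.ACTIVE);
    }

    /** Local timeout: roll back, but keep a marker rather than forgetting the transaction. */
    void onLocalTimeout(String txId) {
        transactions.replace(txId, State.ACTIVE, State.ROLLED_BACK);
    }

    /** Abort from the parent: a repeat, or one arriving after a local timeout, is not an error. */
    AbortResult abort(String txId) {
        // atomic compare-and-set, so a racing local timeout and parent abort cannot both "win"
        if (transactions.replace(txId, State.ACTIVE, State.ROLLED_BACK)) {
            return AbortResult.ROLLED_BACK_NOW;
        }
        return transactions.get(txId) == State.ROLLED_BACK
                ? AbortResult.ALREADY_ROLLED_BACK
                : AbortResult.UNKNOWN;
    }
}
```

With this shape, an abort that arrives after the extra time has run out gets "already rolled back" rather than "can't find the transaction", so the parent has no reason to report a heuristic.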

Ideally the subordinate will never need to time out the transaction as the parent will have done it for it. If the parent times out the transaction at the subordinate, it is removed from the subordinate's reaper, so that is ideal.

As you say, in certain scenarios a delay of <insert your fudgefactor here> is realistic (say a stop-the-world GC in the parent); at that point we will get the heuristic scenario in the parent where the child has already timed out the tx. I don't think we can safely leave the transaction lying around indefinitely in the subordinate waiting for the parent to try to check if it can be timed out, as we can't be sure the parent hasn't died and will never recover.

Is remoting permanent-connection oriented? Can we rely on a notification in remoting to say that the connection between the application servers has broken as a result of a server crash?

Unfortunately no - there is no contract specifying that a connection has to remain active for the life of the server, or even for the life of a transaction. Connections can be dropped and reestablished between nodes, and the transaction can be completed as if nothing happened. Also, the connection may disappear due to network issues, temporarily or permanently.

In normal operation a user would be expected to keep a single connection active, but that's going to be a best practice, not a requirement.

I understand the scenario. We do this in JTS. Still do it without a "fudge factor"

Tom Jenkinson wrote:

I don't think I am explaining this well enough.

Say there is a root TM and a subordinate TM with a timeout of, say, 2 seconds.

time T+0: root TM starts transaction with a timeout of 2

time T+1: tx flows to subordinate TM (using the remaining amount of timeout at root TM as the timeout for subordinate TM, i.e. the timeout value appears to be 1 at the subordinate TM)

time T+2: subordinate TM times out and forgets about the transaction AND root TM times out. The root TM timeout cascades the abort to all XAResources (remember, in the model we are going for, subordinate transaction managers are registered as XAResources, so each subordinate gets a call to abort from the root TM)

time T+3: the subordinate transaction manager receives the abort message from the root TM via the proxy XAResource but by now has cleaned up the transaction, so it will return an error indicating that it can't find the transaction

It's not needed. If the subordinate times out "early", there are only two scenarios when the coordinator eventually times out too:

1: it hadn't been prepared, so the "I can't find this transaction" message that comes from the subordinate is OK, i.e., the coordinator knows that the subordinate rolled back anyway.

2: if it had prepared and decided to roll back then the "Heuristic Law" states that the subordinate needs to remember the fact and it can't ever say "I can't find this transaction". Instead it needs to say "I found the transaction and rolled back" or "I found the transaction and committed".

Case 2 has to be considered even without user-level transaction timeouts, since it's a basic heuristic capability.
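The two cases can be written down as a small decision function on the coordinator side. This is just a sketch to pin down the reasoning: RollbackReplies, Reply and Verdict are invented names, and treating "not found" from a prepared subordinate as a protocol violation is one reading of the rule that a prepared subordinate must remember its outcome.

```java
final class RollbackReplies {
    enum Reply { ROLLED_BACK, COMMITTED, NOT_FOUND }
    enum Verdict { OK, HEURISTIC, PROTOCOL_VIOLATION }

    private RollbackReplies() {}

    /** How the coordinator might read a subordinate's answer to a rollback request. */
    static Verdict interpret(boolean subordinatePrepared, Reply reply) {
        switch (reply) {
            case ROLLED_BACK:
                return Verdict.OK;
            case COMMITTED:
                // the subordinate went the other way: a heuristic outcome
                return Verdict.HEURISTIC;
            case NOT_FOUND:
                // case 1: never prepared, so it must have rolled back anyway
                // case 2: prepared, so it should have remembered the outcome
                return subordinatePrepared ? Verdict.PROTOCOL_VIOLATION : Verdict.OK;
            default:
                throw new AssertionError("unexpected reply: " + reply);
        }
    }
}
```

The point of the sketch is that the early-timeout scenario lands in the first NOT_FOUND branch, which needs no extra time at the subordinate to be handled cleanly.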

David Lloyd wrote:

Tom Jenkinson wrote:

I don't think I am explaining this well enough.

Say there is a root TM and a subordinate TM with a timeout of, say, 2 seconds.

time T+0: root TM starts transaction with a timeout of 2

time T+1: tx flows to subordinate TM (using the remaining amount of timeout at root TM as the timeout for subordinate TM, i.e. the timeout value appears to be 1 at the subordinate TM)

time T+2: subordinate TM times out and forgets about the transaction AND root TM times out. The root TM timeout cascades the abort to all XAResources (remember, in the model we are going for, subordinate transaction managers are registered as XAResources, so each subordinate gets a call to abort from the root TM)

time T+3: the subordinate transaction manager receives the abort message from the root TM via the proxy XAResource but by now has cleaned up the transaction, so it will return an error indicating that it can't find the transaction

Yeah I get the potential issue. The extra time added has to be greater than the expected latency between the time the root controller issues the timeout notice and the time that the subordinate can receive and process it.

That said, we need some level of tolerance for the case where the given extra time is not sufficient and the subordinate node is unable to abort the transaction because it's already gone, because it will happen at some point. Ideally, this transaction abort would be idempotent.

Einstein shows that simultaneity isn't possible, and Lamport shows the problems of trying to do synchronization in a distributed system (check out the "Time, Clocks and the Ordering of Events" paper from the late 1970s).

I know the idea of a "fudge factor" seems appealing, but as I already said, it's going to be difficult to manage across different types of infrastructure (please, not yet another configuration option!). 10ms may be "enough" for traditional ethernet, but may be overkill for microchannel. And when timeouts are typically measured in seconds anyway, it's probably not going to do much in narrowing the window of vulnerability.

Tom Jenkinson wrote:

Ideally the subordinate will never need to time out the transaction as the parent will have done it for it. If the parent times out the transaction at the subordinate, it is removed from the subordinate's reaper, so that is ideal.

As you say, in certain scenarios a delay of <insert your fudgefactor here> is realistic (say a stop-the-world GC in the parent); at that point we will get the heuristic scenario in the parent where the child has already timed out the tx. I don't think we can safely leave the transaction lying around indefinitely in the subordinate waiting for the parent to try to check if it can be timed out, as we can't be sure the parent hasn't died and will never recover.

Is remoting permanent-connection oriented? Can we rely on a notification in remoting to say that the connection between the application servers has broken as a result of a server crash?