1. For all log files, gather all committing Xids
2. For each resource, call recover to get xids
3. Commit each resource for xids found in log files
4. For each xid left over for each resource, call rollback
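To make the four steps concrete, here is a minimal sketch of the commit/rollback decision for a single resource. It is not taken from any real transaction manager: `RecoveryPlan` and `plan` are hypothetical names, and plain strings stand in for Xids so the example runs without a real XAResource. It also assumes step 3 only commits Xids that the resource actually reported from recover.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper: decides what to do with the Xids one resource reports.
public class RecoveryPlan {
    public final List<String> toCommit = new ArrayList<>();
    public final List<String> toRollback = new ArrayList<>();

    /**
     * @param committingXids Xids gathered from all log files (step 1)
     * @param recoveredXids  Xids returned by one resource's recover call (step 2)
     */
    public static RecoveryPlan plan(Set<String> committingXids, List<String> recoveredXids) {
        RecoveryPlan p = new RecoveryPlan();
        for (String xid : recoveredXids) {
            if (committingXids.contains(xid)) {
                p.toCommit.add(xid);    // step 3: the log says this one was committing
            } else {
                p.toRollback.add(xid);  // step 4: left over, no log entry
            }
        }
        return p;
    }
}
```

The rollback branch is where the clustering problem shows up: an Xid looks "left over" simply because the node running recovery has not seen the log that mentions it.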

There's really no other way to do it, right? That's the way it is supposed to be done?

The reason I ask is that this becomes very tricky with clustering.

One and only one server in the cluster should do recovery. This is because one server may have logs that another server does not have, so a server recovering without those logs may cause unnecessary rollbacks.

Another potential problem is: How can even an HA Clustered Singleton perform recovery when any other node is live? Let's say a node fails in the cluster and is later brought back up. It has a recovery log. It passes the recovery log to the cluster singleton. Couldn't the singleton possibly roll back live transactions? Does there have to be a stop-the-world pause to do recovery?

This may be made easier if the Xid had some information in the global id about which node did the transaction. Then you don't roll back Xids returned by the XAResource that don't match that node identifier.
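A rough sketch of that idea, assuming a made-up layout for the global transaction id: one length byte, the node identifier, then an 8-byte sequence number. `NodeTaggedGtrid`, `encode` and `ownedBy` are hypothetical names, not part of JTA; the only real constraint used is that an Xid's global transaction id may be at most 64 bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical layout: [1 byte node-id length][node id bytes][8 byte sequence]
public class NodeTaggedGtrid {
    private static final int MAX_GTRID = 64; // XA limit on the global transaction id

    public static byte[] encode(String nodeId, long sequence) {
        byte[] node = nodeId.getBytes(StandardCharsets.UTF_8);
        if (1 + node.length + 8 > MAX_GTRID) {
            throw new IllegalArgumentException("node id too long for a gtrid");
        }
        ByteBuffer buf = ByteBuffer.allocate(1 + node.length + 8);
        buf.put((byte) node.length).put(node).putLong(sequence);
        return buf.array();
    }

    // True only if the gtrid carries exactly this node's identifier.
    public static boolean ownedBy(byte[] gtrid, String nodeId) {
        byte[] node = nodeId.getBytes(StandardCharsets.UTF_8);
        if (gtrid.length < 1 + node.length || gtrid[0] != node.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOfRange(gtrid, 1, 1 + node.length), node);
    }
}
```

During recovery, a node would then skip the rollback for any recovered Xid whose global id fails `ownedBy` for its own identifier, leaving it to whichever node created it.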

You need to differentiate imported transactions (OTS/JCA inbound), which will be recovered by the TM that propagated the tx.

Yes. The simplest way to get it to work correctly is to use a GID. Especially when you might want to run multiple instances on the same server.

The host/sequence number doesn't work:
1) You could have multiple servers on the same host
2) The sequence number can't go backwards

Another solution to the sequence number problem is to persist it in the log, but that is inefficient and requires early recovery/reading of the logs.

The way to sidestep all problems is to use a singleton transaction manager in the cluster.