I am looking for a good strategy of dealing with database deadlocks from within a Java 6 application; several parallel threads could, potentially, write into the same table at the same time. The database (Ingres RDMBS) will randomly kill one of the sessions if it detects a deadlock.

What would be an acceptable technique to deal with the deadlock situation, given the following requirements?

the total elapsed time should be kept
as small as reasonably possible

killing a session will incur a
significant (measurable) rollback

time threads have no way to
communicate with each other i.e. the
strategy should be autonomous

Can it be improved in any way? e.g. waiting for a fixed amount (magic number) of seconds.
Is there a different strategy that will produce better results?

Note: Several database level techniques will be used to ensure deadlocks are, in practice, very rare. Also, the application will attempt to avoid scheduling threads that write into the same table at the same time. The situation above will be just a “worst case scenario”.

Note: The table in which records are inserted is organised as a heap partitioned table and has no indexes; each thread will insert records in it's own partition.

5 Answers
5

A commonly used approach is some form of exponential back-off. Rather than your 1000*attempts+random aproach, make the delay an exponential function of the number of attempts. This ensures minimal latency in the first one or two attempts, where it might have just been bad luck that you deadlocked, but gives you much bigger delays later, when it is clear that the connection really is congested.

Of course, another approach would be to try to arrange your database accesses so that deadlocks are less likely to occur. But without knowing what your queries do (and how, and when they're executed), it's impossible to say if that can be done

Exponential function sounds good - I will try to simulate it! As I mentioned in the note, the application design will aim to avoid deadlocks, including arranging the database access favourably. But there is no guarantee that, in corner cases, a large number of deadlocks will occur.
–
AdrianDec 8 '09 at 11:26

Can you provide any technical reference for the "exponential back-off" technique you mention?
–
AdrianDec 11 '09 at 22:08

2

It is commonly used in network protocols to avoid congestion. Check the wiki article: en.wikipedia.org/wiki/Exponential_backoff But the basic idea is simple. You simply use some kind of exponential function to determine the delay at each retry attempt. The exact details can be tweaked to suit your purposes. Of course the simplest possible implementation would be to delay for 2^n ms where n is the number of retries so far. But perhaps you think that grows too slowly to begin with, or starts too low, or grows too quickly. Then you just add a multiplier, or add something to n
–
jalfDec 11 '09 at 22:25

The basic idea is simply that the first 1-3 tries or so should be cheap and have very short delays. Most likely they were just bad luck, and you can retry again almost immediately and you'll avoid the deadlock. But if you keep failing, it means that connection is congested, and you have to back off and retry less often, to reduce the pressure on the connection. If the line is congested, all clients will do this, and so the congestion will resolve itself as all clients wait longer and longer until you find a balance.
–
jalfDec 11 '09 at 22:26

Also important: add random jitter to the delay! Otherwise you might have the bad luck of making your ith attempt at almost the exact same time as the other guy makes his ith attempt, for ever-increasing i...
–
j_random_hackerOct 14 '13 at 16:35

That's the way we did it. Loop and retry the transaction until it finishes.

We didn't mess with random delays.

Also, we did the commit inside the try block and the rollback in the exception handler.

When you have multiple lockable resources and multiple concurrent transactions, deadlock is unavoidable. It's a logical consequence of contention for locks.

If you avoid contention for locks (i.e., pessimistic table-level locking) then you also tend to prevent concurrency. If you can define transaction which don't contend for locks, you can avoid deadlock. Concurrent access to the same table, however, is pretty much the definition of deadlock.

When loading, inserts (especial in a HEAP table) can (often) proceed in parallel without many contention issues. If you delay building the indices, then there's no other updates going on during the insert.

So, you may be able to avoid by dropping the indexes, changing the organization to a heap, loading with multiple concurrent processes (or threads, it's usually faster to have multiple processes), then build your indices (and possibly reorganize the table), you may be able to avoid deadlocks.

In my simulations (20 threads) a zero delay will trigger a huge number of deadlocks; a large delay will actually improve significantly the overall elapsed time. autocommit is set to on - but a database deadlock will also involve an automatic rollback
–
AdrianDec 8 '09 at 11:21

@Adrian: We had fairly complex error handling, so we did the rollback "just to be sure". Also we were working in C, so it was a "virtual exception". Finally, we had some C/Ingres n00bs doing the coding, so we did that in case they had hidden little logic errors.
–
S.LottDec 8 '09 at 11:25

@Adrian: In our actual practice, the presence of absence of a delay didn't make any practical difference. A heavily loaded OS introduces it's own randomized delays. YMMV. I'm not debating your simulation. I'm telling you what we did. We didn't mess with random delays.
–
S.LottDec 8 '09 at 11:28

@S.Lott We do use a heap table (actually a partitioned heap table) with no indexes
–
AdrianDec 8 '09 at 11:34

If you don't need to have concurrent access to the database a simple solution might be to remove it and use a task processing queue to update the database instead, serialising access to the database via the queue. I realise this will introduce an asynchronous element to your application, and so would not be suitable for most user initiated applications or online web-apps, but might be worth considering for a batch/offline type application (I realise probably not the answer your looking for though).

Processing queue will slow down the application considerably - avoiding deadlocks at all costs it is not an option. Production server is a 4 processor Unix server, should handle some number of parallel threads easy
–
AdrianDec 8 '09 at 11:26

With a database like Ingres you will always get some deadlocks, so you have to assume that any insert, update or delete will fail and have a retry strategy in place (as in your example).
You should design your database so that contention is minimised and deadlocks only happen rarely. If you are continually getting transactions failing even after several retries, then this is a sign that you'll have to do some major database redesign (or move to a system like Oracle where it is usually possible to design applications to avoid deadlocks by suitable use of row-level locking).