2) When deadlocks go unnoticed…

OK, big buzzword in the title – I’m sure that just the word ‘deadlock’ will get me the audience this time 😉

Before we start with the actual point, let’s get clear about what is meant by ‘deadlock’.
Among the huge variety of possible locking and hanging situations that parallel working systems can get into, deadlocks are very specific. The point of a deadlock is not that the system hangs for a long time, but that it is impossible for the processes involved in the deadlock to resolve it by themselves.

Since locking situations can be thought of as (mathematical) graphs, a deadlock can be defined as a cycle in the wait-for graph: each process waits for a resource that is held by the next process in the cycle.
The simplest example would look like this:

In this case, PA and PB each wait for the other to release the requested resource. But since both are waiting, neither process can actually release a lock – this is a deadlock.

Of course, deadlocks can be way more complex, including many resources, more processes and sometimes even multiple application layers (these are really nasty, since usually there is no coherent view of these cross-layer locks).

One advantage of this rather abstract view of deadlocks is that it makes them easier to recognize.
That is what’s behind the deadlock detection feature of current DBMSs.

Whenever a process has been waiting for a resource for a long time (say, one second or so), the DBMS looks for such a deadlock graph and, if it finds one, ‘resolves’ the situation by telling one of the waiting processes that it won’t get the lock.
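The detection step itself is just a cycle search over the wait-for graph. Here is a minimal sketch in Python – purely illustrative, not MaxDB's actual implementation; the function name and the simplification that each process waits for at most one other process are my assumptions:

```python
# Toy wait-for graph: maps each process to the process it is waiting for.
# A deadlock is a cycle in this graph.

def find_deadlock(waits_for):
    """Return the list of processes forming a wait cycle, or None."""
    for start in waits_for:
        seen = []
        current = start
        while current in waits_for:
            if current in seen:
                # Cycle found: everyone from 'current' onwards is deadlocked.
                return seen[seen.index(current):]
            seen.append(current)
            current = waits_for[current]
    return None

# The simplest example from above: PA waits for PB, PB waits for PA.
print(find_deadlock({"PA": "PB", "PB": "PA"}))  # -> ['PA', 'PB']
print(find_deadlock({"PA": "PB"}))              # -> None (PB isn't waiting)
```

A real DBMS would then pick one process on the cycle as the victim and roll its transaction back, which breaks the cycle for everybody else.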

The general idea behind the feature is of course not to prevent deadlocks. Deadlocks are usually design errors, bugs in the application program, and these cannot be fixed automatically.

However, it is important for heavy-duty databases to keep running as long as possible.

To make this possible, deadlock detection and resolution help a great deal.
Once a deadlock is resolved, the whole system can continue its work, while only one transaction gets an error.

So far the story is rather nice, isn’t it?

The DBMS checks for deadlocks and makes sure that the system will stay responsive even if the application designers made a mistake.

Unfortunately, nothing is perfect – and neither is the deadlock detection in MaxDB.
As you may know (or learn now) MaxDB knows different kinds of SQL locks:

Table locks

Row locks

Dictionary/Catalog locks

As long as the deadlock involves only table/row locks, everything works just as expected:

Besides our two already known row_exclusive locks on tables LOCKA and LOCKB we also find one for SYSDDLHISTORY and a row_share lock for SYS%CAT2.

What are those about?
Well, the lock on SYSDDLHISTORY is for an insert statement that MaxDB >= 7.7 performs automatically whenever a DDL statement is issued.
The SYSDDLHISTORY table thereby contains all committed DDL statements – a neat feature, but it has nothing to do with what we want to do here.
The SYS%CAT2 lock, in turn, is the catalog lock mentioned above.

Now, this is in fact a deadlock, but MaxDB does not do anything about it.

The reason for that is simple: the deadlock detection does not include share locks!

To be precise, for share locks the kernel does not maintain a list of session IDs, but only a single counter.
Based on this counter it is not possible to find out which session is holding or waiting for a specific share lock, and in consequence the kernel cannot tell which task to roll back.
In this case, one user task needs to be cancelled manually, or the lock timeout will deny the first request.
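To see why a bare counter rules out victim selection, compare the two bookkeeping styles side by side. This little Python sketch is purely illustrative (the class and method names are my invention, not MaxDB kernel code):

```python
class CounterShareLock:
    """Counter-only bookkeeping: we know how many holders, not who."""
    def __init__(self):
        self.holders = 0
    def acquire(self, session_id):
        self.holders += 1            # session_id is thrown away!
    def release(self, session_id):
        self.holders -= 1
    def holding_sessions(self):
        return None                  # unknown - all we have is a number

class ListShareLock:
    """Bookkeeping that would allow picking a rollback victim."""
    def __init__(self):
        self.holders = set()
    def acquire(self, session_id):
        self.holders.add(session_id)
    def release(self, session_id):
        self.holders.discard(session_id)
    def holding_sessions(self):
        return self.holders          # candidates for rollback

counter_lock = CounterShareLock()
list_lock = ListShareLock()
for sid in (17, 42):                 # two sessions take the share lock
    counter_lock.acquire(sid)
    list_lock.acquire(sid)

print(counter_lock.holders)             # 2 - but 2 of *whom*?
print(counter_lock.holding_sessions())  # None - no victim can be chosen
print(list_lock.holding_sessions())     # {17, 42}
```

With only the counter, the kernel can tell that a share lock is held, but not which sessions to roll back – which is exactly why such deadlocks go unnoticed.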

Although this is an ugly limitation of the deadlock detection, it’s not really that bad in day-to-day DB usage.
The reason is simply that there are usually only a few DDL commands running in parallel – especially when it’s not the upgrade weekend.

3) The dead walk – how deleted rows reappear

Ok, one last thing 🙂

It’s a simple effect that I found to be surprising while I was playing around with locks during the ‘research’ phase for this blog.

#### Session 1
select * from locktest

THE_ROW THE_ROW2
1 ?
10 ?
2 ?
3 ?
4 ?
5 x
6 x
7 x
8 x
9 x

delete from locktest where the_row >='5'
More than one row updated or deleted. Affected Rows: 5

-> SEE: no commit here!

#### Session 2
select * from locktest

THE_ROW THE_ROW2
1 ?
10 ?
2 ?
3 ?
4 ?

Where is the data?

#### Session 1
rollback

#### Session 2
select * from locktest

THE_ROW THE_ROW2
1 ?
10 ?
2 ?
3 ?
4 ?
5 x
6 x
7 x
8 x
9 x

There it is!

This is a really nasty ‘feature’ if you come from another DBMS like Oracle.
MaxDB currently (!) does not support consistent-view concurrency and does not reconstruct deleted rows.
Since deletions are done in place during statement execution (and not at commit time), the deleted rows are really just gone when the second session looks into the table.
There’s nothing there to tell the second session to look for old data – the data is simply gone.

If your application really relies on a consistent view of the data, without data-access phenomena like ‘dirty reads’, ‘non-repeatable reads’ etc., then you either need to use a higher transaction isolation level (but lose scalability by that) or make your application aware of this behavior.

Looking back

As we’ve seen, locking is not really something that is ‘just there’.
It can become pretty important to be able to differentiate between what locking can do for you and what it won’t do.

One important thing I have not yet mentioned explicitly: I’ve only been writing about SQL locks. But MaxDB (like other DBMSs) also relies on multiple different shared resources that need to be protected/serialized as well.

Resources

Marketing

If you’re not already booked for October 27-29 this year, and you happen to be in Vienna, and you keep asking yourself what to do … then get your ticket for SAP TechEd 2009 and make sure to attend my MaxDB session!

In addition to the presentation there will be an expert session in the afternoon, where I’ll await your questions – which I hopefully can answer.
It’s session EXP349 MaxDB Q&A, Tuesday, 2:30 p.m., that you should register for.