Conor vs. Isolation Level Upgrade on UPDATE/DELETE Cascading RI

A customer recently asked our support organization about an unexplained series of range locks that they were seeing on a DELETE to a table with a foreign key.

Here is the repro:

```sql
create table Foo (FooId int not null primary key)
create table Bar (FooId int not null, BarId int not null)

alter table Bar
add constraint PK_Bar
primary key (FooId, BarId)

alter table Bar
add constraint FK_Bar_Foo
foreign key (FooId) references Foo(FooId)
on delete cascade

insert into Foo values (1)
insert into Bar values (1, 1)

set transaction isolation level read committed
begin tran
delete from Foo where FooId = 1
commit tran
```

So the basic idea is to delete a row from a table in such a way that it causes a cascading delete to occur. (The same thing happens for updates with ON UPDATE CASCADE.) The customer saw that we were taking key range locks, which are normally taken only in the serializable isolation level (SQL Server’s safest but “most blocking” isolation mode).

Locking is a very important aspect of database applications, but it is not well understood because it happens almost entirely implicitly. If you look at the plan for this delete, you don’t see much in terms of locking being explained:

(So, I think this is probably something we need to do a better job of documenting in the future. 😉)

To understand the customer problem, we need to understand how locks get taken by the system. Internally, the Storage Engine will lock everything that is scanned by the Query Processor. So, if the QP generates a seek plan, the SE will lock each structure touched as part of the seek (the index object, then each intermediate page, and then the leaf row, assuming row-level locking granularity). Likewise, a scan will acquire locks for each row it touches, but it may touch many more rows. At a certain point, SQL Server will try to escalate to a coarser-grained lock so it can work more efficiently on large operations.
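To make that lock hierarchy concrete, here is a sketch of how you might observe it yourself using the `sys.dm_tran_locks` DMV (the table names come from the repro above; the session ID in the filter is a placeholder you would fill in):

```sql
-- Sketch: observe the lock hierarchy for a seek-based delete (assumes the repro above).
-- Session 1: leave the transaction open so the locks are still held.
begin tran
delete from Bar where FooId = 1 and BarId = 1

-- Session 2: list the locks held by session 1. You would typically see an
-- object-level IX lock, a page-level IX lock, and a KEY X lock on the deleted row.
select resource_type, request_mode, request_status, resource_description
from sys.dm_tran_locks
where request_session_id = <session 1 spid>  -- placeholder: session 1's @@SPID

-- Session 1: clean up.
rollback tran
```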

Now, what happens when we have a delete that actually touches two tables? The semantic properties of a cascading delete are:

1. Delete from the Foo table implies that all associated rows in Bar get deleted (the cascading part)

2. At the end of the statement, the Foreign Key property is still valid

So, if we just used nice read committed, we would hold X locks on the things we modify until the transaction commits (good), but we would not prevent someone else from inserting a row while the delete is being processed. This is the classic definition of why you would need a SERIALIZABLE transaction… This diagram shows what is happening on the plan temporally:
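The race is easy to sketch with two sessions (this is a hypothetical interleaving to illustrate the hazard, not engine output):

```sql
-- Sketch of the race that range locks prevent (assumes the Foo/Bar repro above).

-- Session 1: starts the cascading delete.
begin tran
delete from Foo where FooId = 1
--   ...internally: delete the matching Bar rows, then delete the Foo row...

-- Session 2: between those two internal steps, tries to add a new child row.
insert into Bar values (1, 2)
-- With the range locks SQL Server actually takes, this insert blocks until
-- session 1 commits (and then fails the FK check, since Foo(1) is gone).
-- Without them, the insert could slip in and leave an orphaned Bar row.

-- Session 1:
commit tran
```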

(Remember that SQL Server often has many threads executing at once. So, operations against different access paths will not happen at identical times. There is a small time gap between the operations, and this is where you can get race conditions).

So, rather than require that customers set the serializable isolation level for operations like this, SQL Server automatically upgrades the isolation level of certain operations to block other operations from inserting a row that would violate the foreign key property. In this case, it takes range locks as if those parts of the query were running at the serializable isolation level, without upgrading the whole statement. This is one way to minimize the blocking associated with the server’s requirement to avoid data corruption.

SERIALIZABLE locks are typically held until the end of the transaction. If you run multiple statements in the same transaction, it could take a while until those locks get released. Since the server knows exactly why it took these locks, it also knows that it is safe to release them at the end of the statement. So, these internal serializable range locks are held only until the statement finishes.
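You can check this behavior yourself: run the delete inside an open transaction and then query `sys.dm_tran_locks` before committing. A sketch:

```sql
-- Sketch: confirm the range locks are gone at statement end (assumes the repro above).
begin tran
delete from Foo where FooId = 1
-- The statement has finished, but the transaction is still open.
-- At this point you would expect to see KEY X locks on the deleted rows,
-- but no RangeX-X locks: those were released when the statement completed.
-- (To catch the range locks themselves, you would need to trace lock-acquired
-- events while the statement is running.)
select resource_type, request_mode, resource_description
from sys.dm_tran_locks
where request_session_id = @@spid
commit tran
```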

This same condition applies to indexed view maintenance, but I’ll save that for another day :).

I was playing around with (NOLOCK) and isolation levels – I work in a shop that slaps (NOLOCK) on every query, without exception – looking at the locking that occurs, and from this testing I concluded that UPDATE with (NOLOCK) converts a serializable or repeatable read isolation level into read committed. Will (NOLOCK) alter the locking behavior in this case as well?

Note: I do not agree with my shop’s use of the NOLOCK hint. I’m asking out of curiosity. Plus if you had anything to say about the use (and rampant overuse) of NOLOCK I’m sure there are people who would find your opinion useful and persuasive. 🙂