Select For Update – In What Order are the Rows Locked?

21112011

November 21, 2011

A recent thread in the comp.databases.oracle.server usenet group asked whether or not a SELECT FOR UPDATE statement locks rows in the order specified by the ORDER BY clause. Why might this be an important question? Possibly if the SKIP LOCKED clause is implemented in the SELECT FOR UPDATE statement? Possibly if a procedure is hanging, and it is important to know at what point an enqueue happened? Any other reasons?

Without testing, I would have suggested that the rows are locked as the rows are read from the table blocks, and not after the ORDER BY clause alters the order of the rows. Why? Oracle Database is fundamentally lazy, or put another way, fundamentally efficient – in general it does not perform unnecessary work. Locking the rows after applying the ORDER BY clause would require a second visit to the table blocks (possibly having to visit each block multiple times to lock different rows in the same block) in the order specified by the ORDER BY clause. Such an approach could be incredibly inefficient and also error prone (what happens if a row was locked by a second session while the first session was sorting the rows per the ORDER BY clause?).

We could guess, but why guess when we are able to easily test the theory? Let’s create a simple table with 10,000 rows:

Now we need 2 sessions (I will call them Session 1 and Session 2). We will execute the same SELECT FOR UPDATE statement in both sessions, with Session 1 first selecting the table rows in ascending order and then Session 2 selecting the table rows in descending order. If the ORDER BY clause determines the order in which the rows are locked, the row with a C1 value of 10,000 should be identified as the row that caused the enqueue because that is the first row that should be returned to Session 2.

In Session 1:

SELECT
C1,
C2
FROM
T1
WHERE
MOD(C1,100)=0
ORDER BY
C1
FOR UPDATE;

In Session 2:

SELECT
C1,
C2
FROM
T1
WHERE
MOD(C1,100)=0
ORDER BY
C1 DESC
FOR UPDATE;

(Session 2 is hung)

Let’s try to identify the row that Session 2 is waiting to lock. There are a couple of ways to accomplish this task (on Oracle Database 10.1 and higher the join to the V$SESSION_WAIT view is unnecessary because that information is found in V$SESSION).

We have the object ID (71913 – technically we need the DATA_OBJECT_ID, not the OBJECT_ID for the DBMS_ROWID.ROWID_CREATE call, but the two values will be identical in this test case), absolute file number (4), block (4262), and row (18) in the block that caused the enqueue. Let’s select that row from the table:

The row with a C1 value of 100 is the first row that was returned by Session 1 (it will be the last row requested to be returned by Session 2), but it is also the first row that matched the WHERE clause predicates for Session 2’s SQL statement as the blocks were read (you could confirm the order in which the blocks are read by flushing the buffer cache before the select, setting the DB_FILE_MULTIBLOCK_READ_COUNT parameter at the session level to 1, enabling a 10046 trace, and then executing the SELECT FOR UPDATE statement).

Let’s retrieve the execution plan for Session 2 to determine why the original poster (OP) might have throught that the rows were locked after the ORDER BY is applied (note that the SQL_ID and SQL_CHILD_NUMBER columns only exist in V$SESSION as of Oracle Database 10.1, so technically the join to V$SESSION_WAIT is unnecessary; however, for consistency with the previous SQL statement that determined the locked row, the join to the V$SESSION_WAIT view is included):

SELECT
S.SQL_ID,
S.SQL_CHILD_NUMBER
FROM
V$SESSION_WAIT SW,
V$SESSION S
WHERE
S.USERNAME IS NOT NULL
AND SW.SID=S.SID
AND SW.EVENT NOT LIKE '%SQL*Net%'
AND SW.EVENT NOT IN ('Streams AQ: waiting for messages in the queue', 'wait for unread message on broadcast channel');
SQL_ID SQL_CHILD_NUMBER
------------- ----------------
2dnpymtj0rc1r 0

As mentioned by the OP, the execution plan appears to be slightly misleading – unless of course you remember that locking the rows after sorting the rows based on the ORDER BY clause would require revisiting the rows in the table blocks. The situation could be different in this test case if there was an index on column C1. In such a case the index could be read in descending order, thus making it appear that the rows were attempted to be locked in the order described by the ORDER BY clause.

Let’s slightly adapt the original test case to test the second theory, that it could appear that the rows are locked in the order specified by the ORDER BY clause.

In Session 1:

ROLLBACK;

In Session 2:

ROLLBACK;

Now we will create an index on column C1.

In Session 1:

CREATE INDEX IND_T1_C1 ON T1(C1);

Now the revised test begins.

In Session 1:

SELECT /*+ INDEX(T1) */
C1,
C2
FROM
T1
WHERE
C1<=100
ORDER BY
C1
FOR UPDATE;

In Session 2:

SELECT /*+ INDEX(T1) */
C1,
C2
FROM
T1
WHERE
C1<=100
ORDER BY
C1 DESC
FOR UPDATE;

Notice in the above output, now block number 4259 is identified, while in the earlier test script block number 4262 was identified by the above SQL statement. We have the object ID (we actually need the DATA_OBJECT_ID, but the values will be the same in this test case), the absolute file number, the block number, and the row number in the block (0 in this case). Let’s select that row from the table:

The row with a C1 value of 1 is the first row that was returned by Session 1 (it will be the last row requested to be returned by Session 2), but it is also the first row that matched the WHERE clause predicates for Session 2’s SQL statement as the blocks were read. This is the same result as we saw before – maybe it does not matter whether an index access path is used to avoid the sort operation that would be otherwise required to satisfy the ORDER BY clause – is the second theory false? Before making that determination, let’s take a look at the execution plan for Session 2’s SQL statement:

The index that we created was definitely used, but notice that there is still a SORT ORDER BY operation in the execution plan. The rows in the index were read in ascending order, not descending order! Let’s try again using an INDEX_DESC hint in the SQL statement for Session 2.

In Session 1:

ROLLBACK;

In Session 2:

ROLLBACK;

In Session 1:

SELECT /*+ INDEX(T1) */
C1,
C2
FROM
T1
WHERE
C1<=100
ORDER BY
C1
FOR UPDATE;

In Session 2:

SELECT /*+ INDEX_DESC(T1) */
C1,
C2
FROM
T1
WHERE
C1<=100
ORDER BY
C1 DESC
FOR UPDATE;

Notice in the above execution plan that there is no longer a SORT ORDER BY operation in the execution plan (replaced by a BUFFER SORT operation) and the INDEX RANGE SCAN operation was also replaced by an INDEX RANGE SCAN DESCENDING operation. Simply because the index range scan is performed in the same order as specified by the ORDER BY clause, the rows are locked in the same order as is specified by the ORDER BY clause – that is, after all, the order in which the rows were touched.

Any other ideas for a demonstration of the order in which rows are locked when a SELECT FOR UPDATE is used?

I think that you have a good suggestion, but I am having trouble putting together a full picture of what you are suggesting (I have a feeling that your comment ties in with Mark’s comment below). Are you describing a situation where the resultset of a SQL statement like this:

SELECT
C1,
C2
FROM
T1
ORDER BY
DBMS_RANDOM.VALUE;

(Note in the above that the FOR UPDATE clause is not specified.) Would be used in a loop to update the rows one at a time in table T1? I could see this arrangement resulting in a deadlock if two sessions perform the action at roughly the same time. A related side note: if an ORDER BY clause is not specified, it could very well be the case that the rows for two sessions will be returned in different sort orders if the execution plans were not identical for the same SQL statement executed by the two sessions, and I suspect that could lead to the same problem described above with the ORDER BY DBMS_RANDOM.VALUE.

I’m referring to just a plain SELECT … FOR UPDATE ORDER BY DBMS_RANDOM.VALUE. Some time ago I easily got deadlock in such a case with 2 sessions running the same statement. But I can’t reproduce it now, so maybe I’m wrong.

An interesting thing about testing for deadlocks is that you can sometimes drive deadlocks with procedures consistently, but some other techniques are only prone to deadlocking, while yet others are not prone to deadlocking but may still be susceptible to deadlocks in certain race conditions under heavy load.

I think you were trying to demonstrate an easy way to show that if you don’t pay attention to order it is trivial to get deadlocks. Don’t be confident your pair of random updates won’t ever deadlock because they didn’t once. That just demotes your test to one that won’t always deadlock. I suspect your earlier test had some additional wrinkle (or was on a different data set) if it consistently deadlocked.

Of course there are plenty of ways to intentionally generate deadlocks consistently. The goal of avoiding deadlocks is a bit trickier and Charles’ demonstration of a way to influence the order of locking as of current releases is useful toward that goal.

For now it seems a very useful technique. Like any other detail of current physical access method and order, it may change in the future. Unlike eschewing an order by in V5 when you had the matching group by because you knew order only had a sort based aggregation and it actually completely re-sorted the result set, Charles’ process won’t break your query if the order of acquiring locks changes in the future due to some change in the access order. It will just lose its current positive effect on avoiding deadlocks. And while hash aggregation methods for group by were well understood computer science just waiting to reach a positive tradeoff for advantages versus cost of implementation and testing, acquiring locks in order when there is an order in place and supported by an index seems solid as long as parallel access methods are not considered. (And multiple threads of parallel DML against the same objects is just asking for trouble as far as deadlocks go. I suggest you either run one thread as parallel as you need or as many threads as you need each not parallel rather than multiple threads somewhat less in parallel.)

a) brilliant and useful post
b) Additional reason why it is useful: A key strategy contributing to minimizing application driven deadlocks is defining an order of update across the organization. The first step in getting this right is to always have the tables in the same order in multistatement logical units of work. Getting the same order of update by rows is the next piece. While I *think* you can still get interdigitated sets of rows ultimately ending up in a deadlock because of range overlaps with different filters so you get different winners in the race for different rows, getting the locks in the same order within each table plus getting the tables in the same order will stamp out the vast majority of deadlocks pre-emptively.

Thank you for the compliment and the very useful comments related to avoiding deadlocks.

On an unrelated side note, I wondered how Oracle Database would handle a Top-N type query with a FOR UPDATE clause – would it have to revisit the table’s blocks to lock just the rows that were to be returned, or would it lock all of the rows in the table as the rows were read from the blocks? I adjusted the first SELECT FOR UPDATE query so that without the FOR UPDATE clause it would only return the first 5 rows when sorted by column C1. I then added a FOR UPDATE clause:

SELECT
*
FROM
(SELECT
C1,
C2
FROM
T1
WHERE
MOD(C1,100)=0
ORDER BY
C1)
WHERE
ROWNUM<=5
FOR UPDATE;
(SELECT
*
ERROR at line 4:
ORA-02014: cannot select FOR UPDATE from view with DISTINCT, GROUP BY, etc.

It looks like the programmers anticipated this particular problem, but that is a bit of an incorrect/vague error message description.

I don’t know if that qualifies as a bug in Oracle Database, but not only is the order not in descending order, but we lost the row with C1 value 10000 – we ended up with the *bottom* 9,999 rather than the *top* 9,999.

Hints for Posting Code Sections in Comments

********************
When the spacing of text in a comment section is important for readability (execution plans, PL/SQL blocks, SQL, SQL*Plus output, etc.) please use a <pre> tag before the code section and a </pre> tag after the code section:

<pre>

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;
TODAY
---------
01-MAR-12

</pre>
********************
When posting test case samples, it is much easier for people to reproduce the test case when the SQL*Plus line prefixes are not included - if possible, please remove those line prefixes. This:

SELECT
SYSDATE TODAY
FROM
DUAL;

Is easier to execute in a test case script than this:

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;

********************
Greater than and Less than signs in code sections are often interpretted as HTML formatting commands. Please replace these characters in the code sections with the HTML equivalents for these characters: