Enqueue Experimentations

6122009

December 6, 2009

Over the years I have participated in several discussions about enqueue waits. Below are some of those discussions.

http://forums.oracle.com/forums/thread.jspa?threadID=638246
ENQ: TX – ROW LOCK CONTENTION – this is a row level lock caused by a possible primary key or unique index violation if the first session commits its current transaction. A second session attempting to insert the value that will result in the primary key violation will sit in this wait event until the first session commits or rolls back.

Problems with initrans would show up as ENQ: TX – ALLOCATE ITL ENTRY waits in 10g/11g. But, such problems should be rare with tablespaces using ASSM.

After fixing the formatting of the top 5 wait events (total duration unknown):

I now have a test table with 1,000,000 rows. I start monitoring the changes in the wait events roughly every 60 seconds, and V$SESSION_WAIT and V$LOCK roughly 4 times per second.

Back in session 1:

UPDATE
T1
SET
C1=-C1
WHERE
C1<=100;

I have now modified the first 100 rows that were inserted into the table, time to make the pot of coffee.

In session 2, I try to insert a row with a primary key value of -10:

INSERT INTO T1 VALUES (
-10,
10);

Session 2 hangs.

If I take the third 60 second snap of the system wide wait events as the zero point, and the 11th snap as the end point. There were 149 waits on ENQ: TX – ROW LOCK CONTENTION, 148 time outs, 446.62 seconds of total time in the wait event, with an average wait time of 2.997450 seconds.

Rolling down to the session level wait events, SID 208 (my session 2) had 149 waits on ENQ: TX – ROW LOCK CONTENTION, for a total time of 446.61 seconds with an average wait time of 2.997383 seconds. All of the 149 waits and the wait time was in this one session that was locked up for the full duration of this time period because session 1 was making a pot of coffee.

Rolling down to V$SESSION_WAIT (sampled roughly 4 times per second): At the start of the third time interval, SID 208 has been in the ENQ: TX – ROW LOCK CONTENTION wait event for 39 seconds and is actively waiting trying to execute SQL with a hash value of 1001532423, the wait object is -1, wait file is 0, wait block is 0, wait row is 0, P1 is 1415053316, P2 is 196646, P3 is 4754.
At the end of the 11th time interval: , SID 208 has been in the ENQ: TX – ROW LOCK CONTENTION wait event for 483 seconds and is actively waiting trying to execute SQL with a hash value of 1001532423, the wait object is -1, wait file is 0, wait block is 0, wait row is 0, P1 is 1415053316, P2 is 196646, P3 is 4754.

Rolling down to V$LOCK (sampled roughly 4 times per second): I see that SID 214 (session 1) is blocking SID 208 (session 2). SID 214 has a TX lock in mode 6 with ID1 of 196646 and ID2 of 4754. SID 208 is requesting a TX lock in mode 4 with ID1 of 196646 and ID2 of 4754.

So, it seems that I need a faster coffee pot rather than an additional index on my table. It could be that the above process would have found that the application associated with SID 214 was abandoned or crashed and for some reason the lock was not released for a long period of time, a little less than 10.44 hours in your case.

In the above, notice that there are 58 time outs for the enqueue. The timeout period for an enqueue is 3 seconds, which means that you potentially had a single wait that lasted 58*3 (+ up to 2.99) seconds, which totals 174 (+ up to 2.99) seconds. To make the calculation simple, assume that this was a single wait lasting exactly 174 seconds. The remaining 47 seconds of the wait divided by 25 (83-58) means that the remaining average wait time is 1.88 seconds. With Oracle 9.0.1 it is more difficult to see how much time was spent in each of the different types of enqueues compared to Oracle 10g and above.

It appears that all of the enqueue activity is of the type TX, while are row level enqueues – usually meaning that someone updated a row, and there was a pause between the time that the update was performed and a commit or rollback was executed. There are multiple causes for TX enqueues which may not be found with a Statspack report. From my notes (possibly based on the Oracle Performance Tuning Guide):
* TX enqueues are acquired exclusive when a transaction initiates its first change and held until the transaction COMMITs or ROLLBACK.
* Waits for TX in mode 6 occurs when a session is waiting for a row level lock that is already held by another session. The wait event for mode 6 TX enqueues is ‘enq: TX – row lock contention’ on 10g and above.
* Waits for TX in mode 4 can occur when a session wants to lock a row in the block, but one or more other sessions have rows locked in the same block and there is no free Interested Transaction LIST slot in the block. Oracle will usually dynamically add another ITL slot if there is sufficient space in the block. To fix this problem, increase INITRANS or MAXTRANS for the table. This type of TX enqueue corresponds to the wait event ‘enq: TX – allocate ITL’ on 10g and above.
* Waits for TX in mode 4 can also occur if the session is waiting for potential duplicates in a UNIQUE index, which would occur if a second session inserts a value before the first tries to insert the same value, and the second session has not yet committed. This type of enqueue corresponds to the wait event: ‘enq: TX – row lock contention’ on 10g and above.
* ‘Waits for TX in mode 4 can also occur when a transaction is inserting a row into an index and must wait for the end of an index block split being done by another transaction. This type of TX enqueue wait corresponds to the wait event ‘enq: TX – index’ on 10g and above.
You might want to look at the SQL statements sorted by CPU utilization based on the above description.

The first of the two looks interesting. I am a little surprised by the small number for the CPU utilization considering the number of buffer gets. The WHERE clause “WHERE :b1 < 600” seems a little unusual.

Hi Charles…
Nice post as always , with lots of details demonstrations , really helps readers.
I would like to add one more scenario …of enq: TX row contention , which is pretty tough to interpret when it occurred but at the end it turned out to be a good one to investigate.
Scenario:
– enq: TX row contention for a transaction ( merge load) — ROW_WAIT_FILE# (from v$session) – pointing to undo file, whereas ROW_WAIT_OBJ# points to 0 (which essential means an undo segment).
– no user spawned sid exists that is rolling back this transaction, so I interpreted the USN from P2 value of the waitevent and could find out the status of rollback (how many blocks rolled back and how many blocks total blocks left for rollback) and found that smon was rolling back the transaction.
– the load job which completes in maximum 5 minutes was running for 200 minutes.
– so it was obvious that the load job processes were waiting on enqueue held by smon on undo segments.

Later came to know that someone was doing a big update on the table and when it was running long , he killed the update statement and asked the DBA to make the rollback fast. So , the DBA killed the user session from the OS , to force SMON to do the rollback and changed the parameter fast_start_parallel_rollback to high which made smon to spawn 256 slaves.

I traced the SMON (event 10500 , level 10) and could find , it was not doing this rollback dedicatedly because it has to do other chores time to time like drop temp segment , shrink undo etc…
It was wait and watch thing but was a good learning… Just wanted to share.
Later , googling got me this link and it matched ditto with my case:http://www.freelists.org/post/oracle-l/How-much-rollback-left-to-apply
Question was raised by Kyle Hailey and the best explanation I could see was from J Lewis.

Hints for Posting Code Sections in Comments

********************
When the spacing of text in a comment section is important for readability (execution plans, PL/SQL blocks, SQL, SQL*Plus output, etc.) please use a <pre> tag before the code section and a </pre> tag after the code section:
<pre>

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;
TODAY
---------
01-MAR-12

</pre>
********************
When posting test case samples, it is much easier for people to reproduce the test case when the SQL*Plus line prefixes are not included - if possible, please remove those line prefixes. This:

SELECT
SYSDATE TODAY
FROM
DUAL;

Is easier to execute in a test case script than this:

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;

********************
Greater than and Less than signs in code sections are often interpretted as HTML formatting commands. Please replace these characters in the code sections with the HTML equivalents for these characters: