To ensure data consistency, Oracle uses different types of locks to manage access to shared resources
such as a table definition or a set of rows in a table. When a session requests a lock
on a shared resource that another session already holds a lock on, it may have to wait.
This leads to a blocking situation in which the waiter sessions make no progress
until the blocker session releases the lock.

Depending on how long the blocking situation lasts and how many waiter sessions get involved,
the impact of a blocking issue varies. Many databases implement monitoring mechanisms
that watch for blocking situations to prevent them from persisting over an extended period of time.
The DBA is alerted about an ongoing blocking issue and takes action, such as killing the blocker session, to resolve the conflict.
Diagnostic SQL scripts querying real-time database performance data (v$lock and v$session)
are very helpful for the DBA to pinpoint blocker and waiter sessions.
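A minimal sketch of such a script, assuming a version of v$session that exposes the blocking_session column (10g and later), might look like this:

```sql
-- Show sessions currently being blocked, together with the session
-- blocking them, the event they wait on, and the SQL they run.
select sid,
       serial#,
       blocking_session,
       event,
       seconds_in_wait,
       sql_id
  from v$session
 where blocking_session is not null
 order by seconds_in_wait desc;
```

Joining blocking_session back to v$session (or querying v$lock for the same SIDs) then identifies the blockers themselves.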

However, a blocking situation sometimes seizes a database swiftly. For example, a very busy production database hit
a blocking issue all of a sudden, and the issue resolved itself after 5 minutes.
Many customers noticed the system was 'hung' during that short period of time, after which
things came back to normal. When the DBA was called in to investigate the issue one hour later,
the real-time database performance data showed no issue, leaving
many questions unanswered: "what really happened?", "will it happen again?", etc.

For troubleshooting a historical blocking issue, the DBA can always resort to AWR and ASH data.
Oracle's ASH report has a 'top blocking session' section; however, the information there is too sparse
to construct a blocker and waiter diagram and support further investigation.

DBspeed is designed to display the blocker
and waiter diagram, with the corresponding wait events and SQL statements, on a single page.
Supplemented by AWR top wait events and SQL statistics data, DBspeed offers a deeper understanding of the root cause of a blocking issue
and enables us to apply effective solutions.

Case Study Summary

ASH data contains information about a session and its blocking session if the session is being blocked at ASH sampling time.
It is therefore possible to use ASH data to construct the blocker and waiter diagram for a blocking situation.
The case described here explains how ASH data was used to dig out the blocker sessions and understand what caused the blocking.
AWR data such as top wait events and SQL statistics are also used to crosscheck the findings from the ASH data.
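A query in this spirit (a sketch, not DBspeed's actual implementation; the time-window binds :begin_time and :end_time are placeholders) shows how ASH samples can be grouped by blocker:

```sql
-- Rank blocker sessions in historical ASH data by how many
-- distinct waiter sessions they blocked in the problem window.
select blocking_session           as blocker_sid,
       blocking_session_serial#   as blocker_serial#,
       event,
       count(distinct session_id) as waiter_count
  from dba_hist_active_sess_history
 where sample_time between :begin_time and :end_time
   and blocking_session is not null
 group by blocking_session, blocking_session_serial#, event
 order by waiter_count desc;
```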

Step 1 : Check DB Time to determine when performance bottleneck started

The first step is always to identify the time window of the performance bottleneck.
DBspeed's 'AWR AAS' report uses DB Time to do that job.

Step 2 : Check AWR top 5 events to see which event was moving up to the top during performance bottleneck

AWR shows the top wait events changed to 'enq: TM - contention' and 'enq: TX - row lock contention'.
These two wait events mean there were two different types of locks involved.

TM locks are acquired on a base table or partition when Oracle performs operations such as enabling referential constraints
(Metalink note 34664.1). A TX lock is acquired when a transaction initiates its first change and is held
until the transaction does a COMMIT or ROLLBACK (Metalink note 62354.1).
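The TX behaviour is easy to reproduce in two sessions; in this sketch the table and column names are illustrative only:

```sql
-- Session 1: the first change starts a transaction and acquires a TX lock.
update US_SALES set amount = amount + 1 where sale_id = 100;
-- (no COMMIT yet, so the TX lock is still held)

-- Session 2: updating the same row now waits on
-- 'enq: TX - row lock contention' until session 1 finishes.
update US_SALES set amount = amount + 2 where sale_id = 100;

-- Session 1: COMMIT (or ROLLBACK) releases the TX lock
-- and lets session 2 proceed.
commit;
```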

Given these two wait events, we can be sure that the database ran into a blocking issue.

Step 3 : Check ASH top wait events and see if there were corresponding blocking sessions

With a blocking issue in mind, we turned to the ASH data. DBspeed has an 'ASH Top Events with Blocking Session' report to
rank the ASH top wait events together with their corresponding blocking sessions.

As we can see here, there are two outstanding blocker sessions. Let us start sketching the blocker and waiter diagram from them:

SID 659 SERIAL# 29653

SID 695 SERIAL# 61429

Step 4 : Check ASH top blocking sessions to see which sessions were the top blockers.

DBspeed has a 'Top Blocking Sessions' report that ranks blocker sessions by the number of waiter sessions they block.

As we can see here, the top three outstanding blocker sessions are :

SID 659 SERIAL# 29653: 855 waiter sessions were blocked by this session

SID 1743 SERIAL# 53855: 92 waiter sessions were blocked by this session

SID 695 SERIAL# 61429: 91 waiter sessions were blocked by this session

Two of the blocker sessions are confirmed by the previous ASH top wait events report.
(The results do not fully match because a different 'Activity %' threshold was used as the 'top' criterion
in the two reports; we can adjust the 'Activity %' threshold to make the results match.)

In DBspeed's 'Top Blocking Sessions' report, the blocker session is the base dimension.
We can add more dimensions to explore whether there is a correlation between a blocker session
and a specific wait event, a SQL statement, or a set of waiter sessions.

As we can see here, we added the 'wait event' dimension.
We can confirm the blocker sessions and the wait events on which they caused the waiter sessions to wait.

Step 6 : Construct blocker and waiter diagram

Let us add more dimensions in to construct the blockers and waiters diagram:

The lock type, mode and object requested can also help us understand why the blocking was happening.
For wait event 'enq: TM - contention', the P1 value encodes the lock type and mode requested,
while the P2 value is the object ID of the object requested.

DBspeed has an 'ASH Top Event P1/P2/P3 Values' report to display the P values for a specific event.
On this report, the 'SQL' dimension can also be correlated with the specified wait event.
As we can see here, we can identify the waiter sessions, the
SQL statements they were trying to run, the wait events on which they were being blocked, and the P values of those wait events.

DBspeed has an 'AWR Object Search' report to retrieve the object name from the P2 value.
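Underneath, such a lookup amounts to resolving the object ID against the data dictionary, for example:

```sql
-- Resolve the object referenced by the wait event's P2 value
-- (:p2 is a placeholder for the observed P2 value).
select owner, object_name, object_type
  from dba_objects
 where object_id = :p2;
```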

To further break the P1 value down into lock type and mode, a script (check_lock.sql) is used.
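For enqueue waits, Oracle packs the two-character lock name into the high-order bytes of P1 and the requested mode into the low-order bytes. A decode along these lines, using the constants from Oracle's wait event reference, is what such a script typically does (the time-window binds are placeholders):

```sql
-- Decode an enqueue P1 value into lock name and requested mode.
select chr(bitand(p1, -16777216) / 16777215) ||
       chr(bitand(p1,  16711680) / 65535)    as lock_name,
       bitand(p1, 65535)                     as lock_mode
  from dba_hist_active_sess_history
 where event like 'enq:%'
   and sample_time between :begin_time and :end_time;
```

For example, a P1 value of 1414332421 (hex 544D0005) decodes to lock name 'TM' and mode 5.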

Step 10 : Understand blocker and waiter diagram

With all the information we have so far, let us add details to the blocker and waiter diagram.

By checking the foreign keys and indexes on table US_SALES, we can confirm that an unindexed foreign key
column explains why session SID 659 SERIAL# 29653 requested TM lock mode 5 on table US_SALES.

With a TM lock mode 5 request pending on table US_SALES, all subsequent DML operations
on the same table would request TM lock mode 3 and be blocked.
This explains how session SID 695 SERIAL# 61429, with its 'insert into US_SALES..' statement, got blocked by session SID 659 SERIAL# 29653.

However, when we searched the ASH data, we could find very little about blocker session SID 1743 SERIAL# 53855.
The only SQL statement ('SELECT .. from US_SALES .. ') related to this session during the problem time window was not believed
to cause blocking. We need to keep in mind that a blocker session may issue a sequence of SQL statements,
including DML on objects, and not all of them will be recorded in ASH (Metalink note 729727.1).
It is a reasonable assumption that blocker session SID 1743 SERIAL# 53855 had an uncommitted transaction
on table US_SALES, which blocked the request for TM lock mode 5 on the same table.

Step 12 : Draw conclusions and provide solutions

SID 1743 SERIAL# 53855,
SQL = a DML statement on table US_SALES,
currently holds a lock on table US_SALES

    --->

SID 659 SERIAL# 29653,
SQL = 'delete from PRODUCTS...',
requested TM lock mode 5 on table US_SALES

    --->

SID 695 SERIAL# 61429,
SQL = 'insert into US_SALES...',
requested TM lock mode 3 on table US_SALES

By tracing down the blocker and waiter diagram, we concluded that the request for TM lock mode 5 on table US_SALES
was the most significant factor causing the blocking. It happened because table US_SALES
had a foreign key column that was not indexed. To avoid this type of lock, and the same type of blocking issue, in the future,
we provided the solution of indexing the foreign key column.
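A common way to find such columns is to compare foreign key constraints against existing indexes. The sketch below handles the simple case of single-column foreign keys matched against index leading columns; the index and column names in the fix are illustrative only:

```sql
-- List single-column foreign keys whose column is not the leading
-- column of any index (a common cause of TM mode-5 requests).
select c.table_name,
       cc.column_name,
       c.constraint_name
  from user_constraints  c
  join user_cons_columns cc
    on cc.constraint_name = c.constraint_name
 where c.constraint_type = 'R'
   and not exists (select 1
                     from user_ind_columns ic
                    where ic.table_name      = c.table_name
                      and ic.column_name     = cc.column_name
                      and ic.column_position = 1);

-- The fix: index the foreign key column
-- (index and column names here are illustrative).
-- create index US_SALES_FK_IDX on US_SALES (PRODUCT_ID);
```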