3- What might be wrong with the free buffer waits event for me to get the picture below?

This is the free buffer waits graph, filtered on that event only, around 12:30 on the 27th. (I only show the graph of free buffer waits because our actual problem was different: it was caused by a missing index on the main table, which also caused free buffer waits after the problem was sorted, but that is not the point of this topic, so please ignore the 28th :) )

Update 18:44 05-01-11: Think of this system as a brand new system for you. All you knew was that the main 500GB table had been partitioned with dbms_redefinition and the partitioning had finished. You got a call that everything is slow, and the main DBA was on holiday.

Update 10:46 07-01-11: I got very impressive and very interesting answers. I really like Charles's way of thinking and Radoslav's approach.

When it comes to what I did, I checked what sessions 961 and 1057 were doing.

Like this:
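A minimal sketch of an equivalent check, assuming only the standard V$SESSION and V$SQL views (the actual check was done with the @swact script described further below):

-- Not the @swact script itself; just a plain-views sketch showing what
-- two specific sessions are waiting on and which SQL they are running:
SELECT s.sid, s.status, s.event, s.seconds_in_wait,
       s.blocking_session, q.sql_text
  FROM v$session s
  LEFT JOIN v$sql q
    ON q.sql_id = s.sql_id
   AND q.child_number = s.sql_child_number
 WHERE s.sid IN (961, 1057);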


First, I would have checked the contentions and the wait events, and we have both a contention and a wait event in this very example. The row lock contention could have happened because of the application itself. The free buffer waits are there because DBWn cannot cope with the dirty buffers fast enough. Jonathan Lewis posted a nice article here –> http://jonathanlewis.wordpress.com/2006/11/21/free-buffer-waits/
So either the objects were pinned in the buffer cache or DBWn was not fast enough. However, I also see in the chart that we had a vicious log file switch (checkpoint incomplete) wait event for a while near Dec 28, 09:00 AM. I believe there is something wrong with the log_buffer and the online redo log sizes. If they can be tuned, then the problem may disappear before digging into a real analysis.
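As a hedged illustration of that first check (standard dynamic performance views only, not necessarily what was actually run), the system-wide picture of these waits and the redo log sizes could be pulled like this:

-- System-wide totals for the wait events discussed above:
SELECT event, total_waits, time_waited
  FROM v$system_event
 WHERE event IN ('free buffer waits',
                 'enq: TX - row lock contention',
                 'log file switch (checkpoint incomplete)');

-- Online redo log sizes, to sanity-check the checkpoint-incomplete theory:
SELECT group#, bytes/1024/1024 AS size_mb, status FROM v$log;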

There really is not enough information to make an accurate guess of what is happening in this case. Is swact.sql Tanel’s session watcher script?

If I had to make a guess, I would look at the SEC_IN_STATE column and see that there are several really big numbers there, the largest at 174,744. The script seems to be showing that only one session is on the CPU (probably the session executing swact.sql), while all other sessions are listed as waiting.

So, that puts the first row lock contention wait at roughly 11 AM on December 25. Something, probably a scheduled job, likely started at 11 AM on Christmas day and attempted to insert a row using the SQL statement with SQL_ID 903wwaubvn68k, causing a possible primary key violation if session 1259 committed. Three other sessions also started being blocked when attempting to execute the same SQL statement (all had different CHILD_NUMBER values, one additional session was attempting to execute SQL ID drjnqxby1927s). This might suggest that the primary key is not generated by a sequence, but instead by something like:

SELECT MAX(ID)+1 NEW_ID FROM T1;
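To make the blocking mechanism concrete, here is a hypothetical two-session sketch (T1 and ID are illustrative names from the snippet above, not objects from the actual system):

-- Session 1 inserts and does not commit yet:
INSERT INTO t1 (id) SELECT MAX(id)+1 FROM t1;   -- computes, say, ID = 100

-- Session 2 runs the same statement before session 1 commits; it also
-- computes 100, then blocks on 'enq: TX - row lock contention' until
-- session 1 commits (duplicate key error) or rolls back (insert succeeds):
INSERT INTO t1 (id) SELECT MAX(id)+1 FROM t1;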

The likely cause of that wait is probably an end-user failing to click OK to indicate that a set of documents printed (or something similar) so that a COMMIT could be performed, before leaving for the Christmas holiday.

The ‘enq: KO – fast object checkpoint’ waits typically indicate that a direct path read is about to happen, possibly due to a parallel query execution or a serial direct read of a table. The first wait of that type shown started at about 9:12 AM on December 27, and the last one at 11:20 AM. At roughly 11:20 AM you yelled at the SAN causing it to panic (blogs.sun.com/brendan/entry/unusual_disk_latency) which then triggered additional waits for blocks to be read from the SAN. The decrease in the ‘free buffer waits’ is simply an indication of less pressure on the buffer cache because all of the other sessions are still waiting for their previous IO requests to complete.
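If one wanted to confirm that while it was happening, a hedged sketch using only V$SESSION_WAIT would be:

-- Sessions currently waiting on the KO enqueue or on direct path reads:
SELECT sid, event, seconds_in_wait, state
  FROM v$session_wait
 WHERE event LIKE 'enq: KO%'
    OR event LIKE 'direct path read%';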

I can say you added a different dimension to my own analysis, which is good; I am learning more things from my own post :)

The script is a slightly modified version of SW.sql (the modification is for buffer busy reads, nothing to do with this post), called for active foreground sessions under a newer name, @swact.

This server was completely “new” to me, apart from the fact that I knew the massive 500GB core table had been partitioned with dbms_redefinition and the “partitioning” operation had finished, reported as a success. But, as I said, the main index was missing, and that was causing the other sessions to suffer.
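As a side note, a hedged way to spot that kind of gap after an online redefinition (BIG_TABLE is a purely hypothetical name standing in for the real table) is to list the indexes on the redefined table and compare them with the expected set, since dependents such as indexes are only carried over if DBMS_REDEFINITION.COPY_TABLE_DEPENDENTS was run:

-- Hypothetical check: which indexes exist on the redefined table?
SELECT index_name, status
  FROM dba_indexes
 WHERE table_name = 'BIG_TABLE';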

All I did was check with @swact, do one other check, and take an action. The answer I am after is that action and the other check.

I am still trying to understand/digest all the explanations, so excuse me if I am shooting in the dark, but did your “another check” involve checking what session 961 was doing? It seems the session with SID 961 is causing the “free buffer waits”.

Hm, I’m not a very experienced troubleshooter, but I would probably check the statements doing full scans through the buffer cache (db file scattered read), especially 96b6hrt8csw55 (a full scan on the partitioned table)
and 903wwaubvn68k.
903wwaubvn68k is very interesting: it contains a full scan (scattered reads), requests exclusive locks (so it is modifying data), and it is also doing index reads (sequential reads).
Is it a MERGE statement? Or some UPDATE statement with IN/EXISTS?
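One hedged way to answer that directly (standard views and DBMS_XPLAN only, nothing specific to this system):

-- Pull the statement text and the cursor plan for the suspicious SQL_ID:
SELECT sql_text FROM v$sql WHERE sql_id = '903wwaubvn68k';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('903wwaubvn68k', NULL, 'TYPICAL'));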