How could the query of the 4 tables cause TKPROF to output the above information, and is there anything wrong with the information provided? To get you started, what would cause a 2,302.70 second wait for a multi-block read?

Don’t worry too much if the above does not make too much sense. At 1180445875.641860 seconds there was a fetch call that required 0 CPU seconds, 0.000131 elapsed seconds, 15 rows were retrieved, there was one consistent get, and 0 physical reads. The next fetch happened 0.00091 (5.642770 – 5.641860) seconds later with two short duration wait events between the two fetch calls (0.000704 seconds on SQL*Net message from client and 0.000001 seconds on SQL*Net message to client). Take a closer look at the waits between the fetch call at 1180445875.643630 seconds and the fetch call at 1180445875.643760 seconds (0.00013 later).

The thread contains other interesting details, but it does not look like the OP ever received an explanation for why his query required 45 minutes to execute. My money is still on a faulty RAID 10 floppy array.

Actions

Information

6 responses

The only reasons that are preventing me from saying “This is PLAIN WRONG” are
1) It is ORACLE after all ; no shortage of strange bugs
2) Your last statement (the concepts of which are Latin to me :) )

I have not read the forum post but I find it almost impossible to believe that the execution plan is of the query which is based on 4 “tables”. I do not think all of them are tables. At least, MASTER and CASE (or CASE_SIZE) look like (non-mergeable) views or materialized views. How else do you justify the presense of SYSTEM (steps 16 to 18), ITEM_SUPPLIER_COUNTRY(Steps 20, 24 to 27) ?
Also, the plan from step 22 onwards looks impossible with plain tables.

When you read the thread, you will find that the claim of there being 4 tables named Case, Item, Master, and Warehouse was quickly determined to be incorrect – for the very reasons that you identified in your comment.

The RAID 10 floppy array was a joke that I wrote in the thread as an explanation for the nearly 68 minute wait for the single block read (DBA 1 to DBA2: “quick, find and mount floppy #1765799 so that the query will finish executing”).

Considering the nearly 68 minute wait for the single block read written to the trace file, how could the OP in the thread explain the 45 minute query execution time. Maybe the server was stuck in a time warp? Maybe an accuracy warp? Maybe someone was constantly re-syncing the clock on the server?

After posting my comment, I read the thread. Apart from technical details, I must say how much I appreciate you and Jonathan for your persistence. If what OP mentioned in the thread is true, I believe his(or her) organization was sitting on a time bomb. :) First, OP hardly knew anything about oracle and he (or she) claimed they do not have a DBA. Now coming to the problem, I would say the biggest root cause is poor design and/or poor capacity planning. The query is retrieving over 12 million records with 15 records at a time. For what ? Also, looking at the way OP was providing trace details, it appears he (or she) was executing some kind of procedure (from SQL*Plus) that involved many DML statements, before executing this query. It might be possible that OP was using some kind of logic to populate/update table data before querying it, which may result in stale statistics. Now, you did mention that OP might be using what appear to be some standard softwares but OP did not confirm the same. And as Hemant suggested (I guess), if the underlying tables were being updated simultaneously, I must say somebody has messed up the project big time.
I think this thread is a perfect example of how NOT to ask questions and how one can spend endless time, in vain, resolving performance issues due to lack of vital information.

“the nearly 68 minute wait for the single block read “.
Haven’t gone through the thread in exact detail, but it appears the initial report of 45 minutes was the first ‘problem’ run and the trace was on a subsequent execution, which could have taken much longer.
Reading the thread, it appears the slow read was on a block in an UNDO tablespace.

Didn’t see any exact description of hardware, but I wonder if there is anything that would case a process to be completely de-prioritised /suspended by the OS.

Narendra, I think that your final sentence summarizes the entire thread. The OP begins the thread with inaccurate information, and that causes confusion and inconsistencies that are brought up later in the thread. But, in support of the OP, at times it is very difficult to determine what information may be relevant to solving a particular problem. This is the case regardless of whether that problem is “where did I leave my car keys” or “why is it taking a long time to retrieve 12 million rows from the database when I am fetching 15 rows at a time”. I suspect that in many cases prior experience helps a lot when trying to determine what information needs to be provided. Suppose that the DBA on site (or possibly a consultant in the case of the OP) stated quite simply “I have been a DBA for 25 years and have never needed to look at a raw 10046 trace file – just pass the file through TKPROF, or use explain plan.” In such a case, prior experience may make it very difficult to find any answers. Jonathan definitely demonstrated considerable skill in that thread, and I think that I managed to extract a little bit of information from the OP to find some of the performance issues.

Gary, you make a good point that enabling a 10046 trace could have had an impact on the execution time. The OP’s disk sub-system seemed to be heavily loaded, causing long waits for disk reads, and the writing of the 10046 trace file could have further complicated the disk performance problem. I had not thought about the OS suspending the process as a possible cause, but as I understand it, the session checks the server’s current time, and then makes a disk read request – at that point operating system kernel level code is responsible for fetching the block – once the fetch completes the process again checks the server’s current time to calculate the elapsed time of the disk read. In the same response where I made the joke about the RAID 10 floppy array, I stated the following:

“There is a possible explanation for the 4179109066us time for the SQL*Net message from client wait and the 4063882583us time for the db file sequential read. The clue is found on page 148 of “Optimizing Oracle Performance” in the pseudo-code for procedure wevent. What if the value of ela1 is less than ela0 due to a timer resolution problem with gettimeofday? If I recall correctly, I believe that it is possible that the operating system does not check the hardware clock on every time call (on some platforms), so it is theoretically possible that the operating system occasionally corrects its “software” clock with the hardware clock (on some Windows systems, this drift may not be corrected until a reboot is performed, or until the time is synchronized with the domain).

This is just a guess, but a little mathematics with powers of 2, assuming that the OP is using a 32 bit system [note that if the software clock time drift was significant, the long duration wait time reported may be caused by an overflow condition that was caused by a negative value in the unsigned integer value):
4294967296us = 2^32us = 32 bit unsigned integer max value
4179109066us = OP’s long SQL*Net time
4063882583us = OP’s long single block read time

Browsing through Metalink finds a couple documents that may be of interest (yes these are old, but there may be others), that might identify other causes:
Bug No. 777326 USING POST WAIT THE CPU TIME STATISTICS ARE COMPLETELY WRONG
Bug No. 1704769 V$SESSION_WAIT.SECONDS_IN_WAIT SHOWS IMPOSSIBLY HIGH VALUES
Bug No. 2717586 UNREALISTIC HUGE VALUES IN V$SYSTEM_EVENT AND V$SESSION_EVENT

It is, however, odd that the tim= values were not thrown off in the same manner in that portion of the trace file.”

With the above in mind, the actual wait time may have been very short, but the server’s current time at the end of the wait may have been earlier than the server’s current time at the start of the wait if the server re-synchronized its software clock with the hardware clock (or some external time source) between the time the disk read started and the disk read completed.

Hints for Posting Code Sections in Comments

********************
When the spacing of text in a comment section is important for readability (execution plans, PL/SQL blocks, SQL, SQL*Plus output, etc.) please use a <pre> tag before the code section and a </pre> tag after the code section:

<pre>

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;
TODAY
---------
01-MAR-12

</pre>
********************
When posting test case samples, it is much easier for people to reproduce the test case when the SQL*Plus line prefixes are not included - if possible, please remove those line prefixes. This:

SELECT
SYSDATE TODAY
FROM
DUAL;

Is easier to execute in a test case script than this:

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;

********************
Greater than and Less than signs in code sections are often interpretted as HTML formatting commands. Please replace these characters in the code sections with the HTML equivalents for these characters: