High Value for MBRC Causes High BCHR, High CPU Usage, and Slow Performance

27122009

December 27, 2009

What follows is part of a presentation that I gave in 2008.

A sudden unexpected spike in server CPU usage was noticed while running a batch process in an ERP package that normally has most of its time captured in the SQL*Net message from client wait event. My Toy Project for Performance Tuning was actively logging database performance when the problem was noticed. Below is a marked up view of the program as it captured the spike in CPU activity:In this 83 second time period, the database instance performed 7.8 million consistent gets, and not a single physical read. The database had a 100% buffer cache hit ratio during this time period, and that is good, right? The database instance read 505.6 million rows by full table scan in the 83 seconds with relatively few rows read by index lookup, which might be a sign of a problem.

Clicking on the Rows Read by Table Scan statistic, we see the running history of the number of rows read by a full table scan, with the current statistic value in blue:If we dig a bit more, knowing that the number of rows retrieved by table scan seems high, we see that we had a quiet period, and then the number of rows read by full table scan jumped considerably. At the same time, the CPU usage also jumped. We should dig into the system a bit more deeply to understand why this is a problem.

Switching over the the SQL Monitor, we start to investigate:If we look at the SQL statements as they are executed, we see one SQL statement that seems to be performing a large number of buffer gets, or logical reads – about 5.1 million per minute when fetching 123 times. The execution plans for the SQL statement are shown in the bottom pane. Note that there are two execution plans for this SQL statement, with the second created due to a bind mismatch – the initial parse call did not initialize the bind variables which caused Oracle to treat the bind variable as a VARCHAR2(2000), the second parse initialized the bind variable as a CHAR.

Zooming in on the grid at the top:If we look at the SQL statements as they are executed, we see one SQL statement that seems to be performing a large number of buffer gets, or logical reads – about 5.1 million per minute when fetching 123 times with 120 executions. Something seems a bit odd.

Zooming in on the execution plan at the bottom:
Taking a look at the execution plan, we see that the query optimizer decided to perform a full table scan when executing this SQL statement, rather than using the selective index on the PART_ID column. Oracle is applying a filter against the table rows to find only those with a specific PART_ID having a QTY greater than the COSTED_QTY, and with a TYPE of O.

If we submit the SQL statement with the STATISTICS_LEVEL parameter set to ALL at the session level, and retrieve the actual execution timing by passing in ‘ALLSTATS LAST’ as the format parameter for DBMS_XPLAN.DISPLAY_CURSOR, we see the following:The above slide shows that a single execution of the query required roughly 0.5 seconds (10% of the estimated time), required 42,751 consistent gets, and returned 1 row when the optimizer predicted that 943 rows would be returned. The good news is that the BCHR is 100%, or maybe that is not good news, depending on who you ask.

The final slide shows the expected execution plan output by DBMS_XPLAN.DISPLAY_CURSOR:The above output shows that the query should be executing in 0.01 seconds or less with the same value specified for the PART_ID, with just 16 consistent gets. While not checked, the clustering factor for the X_INV_TRANS_1 index on the PART_ID column will certainly be high, perhaps close to the number of rows in the table.

What caused the above problem? Assume that the OPTIMIZER_INDEX_COST_ADJ parameter is left at its default value of 100, and system statistics were collected with a command like the following:

EXEC DBMS_STATS.GATHER_SYSTEM_STATS('interval',interval=>30)

The above command captures system (CPU) statistics with a 30 minute time interval. The statistics describe the performance of the server, including CPU speed (possibly a measure of the number of consistent gets per second), single block physical read time, multi-block physical read time, average number of blocks read by multi-block reads, as well as several other statistics – these statistics are visible in the SYS.AUX_STATS$ view. Assume that the data is stored in ASSM tablespaces with AUTO segment space management, which gradually ramps up the extent size from 64KB (for the first 1MB of data) to 1MB and beyond. If DB_FILE_MULTIBLOCK_READ_COUNT is set to allow 1MB multi-block reads (or auto-tuned to allow 1MB multi-block reads), it is quite possible that the MBRC statistic in SYS.AUX_STATS$ could be set to a very high value, possibly close to 100, if system statistics are gathered shortly after bouncing the database. It is also possible that a clever DBA might have executed something like the following to force the MBRC statistic to a value that is close to what would be expected if full table scans involving physical reads might be required (DB_FILE_MULTIBLOCK_READ_COUNT was auto-tuned to 128):

EXEC DBMS_STATS.SET_SYSTEM_STATS('MBRC',100)

Since the optimizer assumes that every block needed for a query will result in a physical read, odd side-effects, such as high CPU usage with long execution times, may happen when every block needed is satisfied by blocks already in the buffer cache, and a popular value (possibly identifying 0.98% of the rows in the table) is specified on the initial hard parse when bind peeking is enabled (the default starting with Oracle 9.0.1).

6 responses

It is a program that I custom developed over several years, primarily to help identify problems and find solutions. The program is NOT for sale, and is not available for general usage by the public.

The use of the program in the examples of the blog articles is intended to inspire people to think outside the box, so to speak, and allow *you* to put together your own solution for detecting performance problems. Note that there are articles on this blog that show how to bring Oracle data into Excel – it is then possible to create an auto-scrolling chart in Excel that would then, for instance, show a running history of the last 60 minutes for a specific statistic. In a later blog article I may show how to create such an auto-scrolling chart.

Late follow-up:
This particular performance problem highlighted a couple of weaknesses in my program, which were correctly quickly after this performance problem was identified. The program’s capture periods are variable in length, with those captures being triggered either when an elapsed time timer expires or an event happens in the database (such as unexpected, excessive CPU consumption). At the time of the capture, the program did not line-level the statistics shown in the graph to an average rate per minute, which very likely means that the first tall green bar does not represent a problem.

A second weakness appears in the SQL Monitor that was originally designed to work with Oracle 8i databases. Prior to Oracle 9i R2, Oracle did not provide a CPU_TIME or ELAPSED_TIME statistic in V$SQL. Since this portion of my program was not significantly updated since originally developed, those two key statistics that could have been used to directly tie one or two SQL statements back to the spike in overall CPU usage were not readily available without manually sampling V$SQL (or one of the other views).

I am looking for a Performance tools which can give me the Charts of all my Database & OS performance . I have tried using OS Watcher but i am looking for a tool which has both Database & OS Performance integrated .

My company cannot afford money to buy a tool :) this is the major constraint.

Any recommendations which comes to your mind will be appreciated , either it can be a ready maded tool or technical tips to gather stats or AWR reports to excel and develop charts or what ever you recommend .

Keep in mind that you cannot legally access AWR reports (or AWR/ADDM data) unless you have the Enterprise Edition of Oracle AND a Diagnostic Pack License. If you do have a Diagnostic Pack License, you should be able to just use the Performance tab in Enterprise Manager Database Control and review ADDM reports.

Hints for Posting Code Sections in Comments

********************
When the spacing of text in a comment section is important for readability (execution plans, PL/SQL blocks, SQL, SQL*Plus output, etc.) please use a <pre> tag before the code section and a </pre> tag after the code section:

<pre>

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;
TODAY
---------
01-MAR-12

</pre>
********************
When posting test case samples, it is much easier for people to reproduce the test case when the SQL*Plus line prefixes are not included - if possible, please remove those line prefixes. This:

SELECT
SYSDATE TODAY
FROM
DUAL;

Is easier to execute in a test case script than this:

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;

********************
Greater than and Less than signs in code sections are often interpretted as HTML formatting commands. Please replace these characters in the code sections with the HTML equivalents for these characters: