Watching Consistent Gets – 10200 Trace File Parser

24012011

January 24, 2011

It happened again, another blog article that forced me to stop, think, and … hey, why did Oracle Database 11.2.0.2 do something different than Oracle Database 10.2.0.5? What is different, even when the OPTIMIZER_FEATURES_ENABLE parameter is set to 10.2.0.4 (or 10.2.0.5)? The number of consistent gets for a SQL statement is significantly different – we did see a similar difference between release version before, but for a different reason. We need the help of Oracle Database trace event 10200 to determine why there is a difference. Once we have the trace file, we need an easy way to process the trace file.

Excel Macro that will work with a trace file produced by Oracle Database running on Windows (also works in Microsoft Visual Basic 6.0 and earlier; for an Oracle Database running on Unix/Linux, open the trace file with Wordpad first, and then save the trace file using Wordpad):(Replace c:\or10s_ora_4256_watch_consistent.trc with the actual name of the generated trace file – the script as written seems to work with 10.2.0.x and 11.2.0.x)

Private Sub Read_10200_Trace1()
Dim intFileNum As Integer '10200 trace file
Dim intFileNum2 As Integer 'Output file
Dim strInput As String 'Line read from the 10200 trace file
Dim strOutput As String 'Line to be written to the output file
Dim strBlock(2000) As String 'Block read from the trace file
Dim strBlockCounter(2000) As Integer 'Number of times read
Dim intBlocks As Integer 'Total number of blocks
Dim i As Integer 'Loop counter
Dim intFound As Integer 'Indicates whether or not the block was found
intFileNum = FreeFile
Open "c:\or10s_ora_4256_watch_consistent.trc" For Input As #intFileNum
intFileNum2 = FreeFile
Open "c:\watch_consistent.txt" For Output As #intFileNum2
Do While Not EOF(intFileNum)
Line Input #intFileNum, strInput
If InStr(strInput, "started for block") > 0 Then
strOutput = Trim(Right(strInput, Len(strInput) - InStr(InStr(strInput, "started for block"), strInput, ":")))
'Find the number of times the block was accessed
intFound = 0
For i = 1 To intBlocks
If strOutput = strBlock(i) Then
intFound = i
strBlockCounter(i) = strBlockCounter(i) + 1
Exit For
End If
Next i
'If the block was not found, record it
If intFound = 0 Then
intBlocks = intBlocks + 1
intFound = intBlocks
strBlockCounter(intBlocks) = 1
strBlock(intBlocks) = strOutput
End If
Print #intFileNum2, strOutput; vbTab; strBlockCounter(intFound)
End If
Loop
Print #intFileNum2, ""
For i = 1 To intBlocks
Print #intFileNum2, strBlock(i); vbTab; strBlockCounter(i)
Next i
Close #intFileNum
Close #intFileNum2
End Sub

Excel Macro equivalent that will work with a trace file produced by Oracle Database running on Windows/Unix/Linux (also works in Microsoft Visual Basic 6.0 and earlier):(Replace c:\or10s_ora_4256_watch_consistent.trc with the actual name of the generated trace file)

Private Sub Read_10200_Trace2()
Dim strInput As String 'Line read from the 10200 trace file
Dim strOutput As String 'Line to be written to the output file
Dim strBlock(2000) As String 'Block read from the trace file
Dim strBlockCounter(2000) As Integer 'Number of times read
Dim intBlocks As Integer 'Total number of blocks
Dim i As Integer 'Loop counter
Dim intFound As Integer 'Indicates whether or not the block was found
Dim objFSO As Object 'FileSystemObjects
Dim objFile1 As Object 'The FileSystemObjects handle to the raw 10020 trace file
Dim objFile2 As Object 'The FileSystemObjects handle to the output file
Const ForReading = 1
Const ForWriting = 2
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile1 = objFSO.OpenTextFile("c:\or10s_ora_4256_watch_consistent.trc", ForReading)
Set objFile2 = objFSO.CreateTextFile("c:\watch_consistent.txt", True)
Do While Not (objFile1.AtEndOfStream)
strInput = objFile1.ReadLine
If InStr(strInput, "started for block") > 0 Then
strOutput = Trim(Right(strInput, Len(strInput) - InStr(InStr(strInput, "started for block"), strInput, ":")))
'Find the number of times the block was accessed
intFound = 0
For i = 1 To intBlocks
If strOutput = strBlock(i) Then
intFound = i
strBlockCounter(i) = strBlockCounter(i) + 1
Exit For
End If
Next i
'If the block was not found, record it
If intFound = 0 Then
intBlocks = intBlocks + 1
intFound = intBlocks
strBlockCounter(intBlocks) = 1
strBlock(intBlocks) = strOutput
End If
objFile2.Write strOutput & vbTab & strBlockCounter(intFound) & vbCrLf
End If
Loop
objFile2.Write "" & vbCrLf
For i = 1 To intBlocks
objFile2.Write strBlock(i) & vbTab & strBlockCounter(i) & vbCrLf
Next i
objFile1.Close
objFile2.Close
End Sub

VBS Script Equivalent that will work with a trace file produced by Oracle Database running on Windows/Unix/Linux (also works in Excel and Microsoft Visual Basic 6.0 and earlier):(Replace c:\or10s_ora_4256_watch_consistent.trc with the actual name of the generated trace file)

In the above, there were 2 consistent gets for the PAR_I1 index, 4 consistent gets for the T1 table, 202 consistent gets for the CHI_I1 index, and 200 consistent gets for the T2 table. While it might not be obvious from the above, the BLEVEL for both indexes is 1 (HEIGHT = 2 – see the quiz article linked to at the start of this article for an explanation). When I first saw the quiz that is linked to at the start of this article, I mentally assumed that there would be about 400 consistent gets for the CHI_I1 index – for every Start of the INDEX UNIQUE SCAN operation, I expected the index root block and the index leaf block to count as a consistent get, while the above showed that did not happen. Let’s trace the consistent gets to see why there were only 202 consistent gets and not roughly 400:

If we then process the resulting 10200 trace file through one of the above trace file parsers, we might see output like what is listed below (the RDBA in hex is listed first, followed by the number of times that block had been accessed by a consistent get to that point in the trace file):

At the bottom of the output is a summary that shows (in order) RDBA 0206e214 was accessed a total of 1 time, RDBA 0206e215 was accessed 1 time, RDBA 01c0000c was accessed 1 time, RDBA 01c72e14 was accessed 2 times, RDBA 01c72e15 was accessed 200 times, etc. Nice, but what do those RDBA numbers represent? We will get to that later.

Inside the raw 10200 trace file we might see something like this (I am able to identifysome items that appear in the raw trace file, but I do not yet fully understand the file):

Now that we see the RDBA numbers again, I suppose that it is time to try to determine the objects that are referenced by the RDBA numbers. We can try dumping the index structure to see which blocks are read, but first need to find the OBJECT_IDs for the two indexes:

Taking the above HEADER_FILE, HEADER_BLOCK, and MAX_BLOCKS numbers and dumping the block contents to a trace file (this will work in this test case script because all of the extents for the table blocks are probably close to each other – looking back, it probably would have been a better idea to use DBA_EXTENTS rather than DBA_SEGMENTS and just dump the first extent for each object):

Those datafile dumps can be quite time consuming, is there anything else we can try?

We could try to find the RDBA for the ten blocks (note that there is a risk here if the first extent is only eight blocks in length) in the first extent of each segment using the DBMS_UTILITY.MAKE_DATA_BLOCK_ADDRESS function (items appearing in bold were identified in the trace file summary):

Or, we could try working from the opposite direction. With the knowledge that the lower 22 bits of the RDBA is the block number and the upper ten bits of the RDBA is the relative file number, we can manually calculate the relative file number and the block number from the RDBA and then look up the object name associated with the file and block. First, we need the decimal equivalent of (binary) 1111111111111111111111:

(binary) 1111111111111111111111 = (decimal) 4194303

So, if we BITAND the RDBA with 4194303 we can obtain the block number, and if we divide the RDBA by 4194304 we can determine the relative file number for two of the RDBA numbers that were listed in the trace file summary, as shown below:

Remembering the number 4194303 might be challenging, so we can just use the DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE and DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK functions instead. Looking up the associated object names can be slow, unless we are able to limit the object names to a list of specific objects (ideally, we would also specify the DE.OWNER column in the WHERE clause):

Notice in the above that the 202 consistent gets that we saw for the CHI_I1 index in Oracle Database 10.2.0.5 oddly only required 9 consistent gets in Oracle Database 11.2.0.2. But that is not the only change. If we process the 10200 trace file through one of the trace file parsers, we might see something like this:

Interesting – it appears that Oracle Database 11.2.0.2 writes the DATA_OBJECT_ID that is related to the block, directly into the trace file so that we no longer need to execute several SQL statements to determine the object names related to the blocks.

Inside the raw 10200 trace file from 11.2.0.2 we might see something like the following:

The 10200 trace file in 11.2.0.2 provided the DATA_OBJECT_ID for the consistent reads, while the 10.2.0.5 trace file did not. We can use this information to determine which objects were accessed, and in which order by pulling in the unique OBJD values from the summary:

I think that you are right. There is a statistic with the name ‘buffer is pinned count’, and the amount by which this increased for the session on 10.2.0.5 and 11.2.0.2 was different – I ran out of time and was not able to spend much time comparing the differences, but I will check again later. Tanel Poder has an excellent write up on this statistic, part of which is quoted below:http://blog.tanelpoder.com/2010/01/15/beyond-oracle-wait-interface-part-2/

“However, check the value for buffer is pinned count statistic! This counter is updated every time we go to a buffer to get some data from it AND it already happens to be open by my session! Oracle keeps buffers pinned and relevant buffer handles cached during a database call in some cases (like nested loop joins and index scans) to avoid reopening the buffer again (getting a buffer again if its closed would mean another logical IO).”

It appears that 11.2.0.2 is much more intelligent with how index blocks are pinned, quickly pinning the index leaf blocks of the index belonging to the inner table in the nested looks join, while 10.2.0.5 did not pin these blocks. This increased willingness to pin observation of course is based on very limited testing, and might not apply to other situations. I welcome other people to test and report their findings – maybe it only happens in small index structures?

The above shows that the ‘buffer is pinned count’ increased by 593 on Oracle Database 10.2.0.5, and 498 (95 less) on Oracle Database 11.2.0.2. I would have expected that statistic would be about 190 higher on 11.2.0.2 than on 10.2.0.5.

So, according to your results our guesses related to ‘buffer is pinned count’ might be not right. I was running the test in my desktop (Windows XP running 11.2.0.1) and i got some results:

1) even when the OPTIMIZER_FEATURES_ENABLE is set to ‘10.2.0.4’, the query returned 215 BG instead of 408 BG in my tests, still don’t know why.
2) reading http://blog.tanelpoder.com/2009/11/19/finding-the-reasons-for-excessive-logical-ios/, tanel says the quote below:
“These buffer get reason counters are not maintained properly in Oracle 11g, probably due an optimization effort and some changes for faster pinning of buffer cache buffers (there’s a parameter called _fastpin_enable which is set to 1 in 11g and it enables so called fastpath buffer gets. If you see v$sesstat statistics such “consistent gets from cache (fastpath) or “db block gets from cache (fastpath)” being inremented, then fastpath buffer gets/pins are used. Note that I do have a script which works also on 11g but I’ll write about that one some time in the future🙂 ”

Said that, i changed the ‘_fastpin_enable’ to 0 and still got the same results.

This write up took the better part of a day, and I was a bit tired as I went back through writing and then editing the comments that appear between the code blocks. I keep running out of ideas for blog articles, but seemingly like clockwork I see something written by someone else that triggers another idea for an article.

Good that someone is actually trying the test. Sorry if it was not clear if you need to change the value of OPTIMIZER_FEATURES_ENABLE for the 11.2.0.2 execution. Yes, that setting is necessary to eliminate the second NESTED LOOPS join. So, the script to run the execution plan (somewhere near the top of this article is:

Please provide the missing sections in a comment and I will edit your comment above. Thanks for taking the time to post an awk script.

On this blog the <pre> and </pre> codes seem to work well when posting code, but < and > symbols may cause code to disappear.
Before posting code, please replace:
< symbols with &lt ; (without a space before the ; )
> symbols with &gt ; (without a space before the ; )

OK, I know, that this post is over 3 years old, though since I stumbled upon it by chance only now, I thought I’d chime in with a perl “oneliner” version of the event 10200 trace file parser, as perl is installed these days pretty much everywhere, and I personally like it better than awk🙂

(well, it was originally a one-liner, but then I wanted to make the formatting nicer, so it grow to a multi-liner)

1) version for 10g, no object IDs
Lists the following columns in the output, seperated by colons (:): rdba:hex:dec:count
Output sorted descending by count.
+ adds a line at the end of the listing, with the count(distinct(rdba))

2) Version for 11g (containing the object ID in the tracefile)
Contains 2 separate listings:
a) Lists the following columns in the output, seperated by colons (:): rdba:hex:dec:dec(obj#):count
Output sorted descending by count.
+ adds a line at the end of the listing, with the count(distinct(rdba))
b) Lists the following columns in the output, seperated by colons (:): obj:hex:dec:count
Output sorted descending by count.
+ adds a line at the end of the listing, with the count(distinct(obj))

Josef,
Yes, this is an old blog article. However, the invitation for readers to provide their translation of the 10020 trace using their tool of choice was still open. Thank you for providing the translation to Perl. The only code I wrote in Perl is an extension for Nagios to allow it to verify that a database instance is up, and willing to process simple SQL statements.

Also, thank you to sz for providing the awk translation.

I am in the process of reading the book Troubleshooting Oracle Performance, Second Edition”, and from what I have found so far, discussion of 10020 traces is the only performance related item not mentioned in that book (I did not realize that until I saw your comment on this blog article).

Hints for Posting Code Sections in Comments

********************
When the spacing of text in a comment section is important for readability (execution plans, PL/SQL blocks, SQL, SQL*Plus output, etc.) please use a <pre> tag before the code section and a </pre> tag after the code section:
<pre>

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;
TODAY
---------
01-MAR-12

</pre>
********************
When posting test case samples, it is much easier for people to reproduce the test case when the SQL*Plus line prefixes are not included - if possible, please remove those line prefixes. This:

SELECT
SYSDATE TODAY
FROM
DUAL;

Is easier to execute in a test case script than this:

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;

********************
Greater than and Less than signs in code sections are often interpretted as HTML formatting commands. Please replace these characters in the code sections with the HTML equivalents for these characters: