Friday, November 18, 2011

Cache buffer chain (CBC) latch contention is a common top Oracle wait event. There are a number of interrelated causes but also a number of solutions. The trick is to diagnose the problem properly, which narrows the possibilities to a short list of solutions.

The CBCs are created as a hashing structure and are primarily used to determine if a block currently resides in the buffer cache. As you can imagine, even the smallest Oracle systems ask, "Is a block in the buffer cache?" a ga-zillion times each day. If CBC access continues to intensify, at some point the time to acquire the desired CBC latch will be a performance problem.

Is there a CBC issue?

When CBC latch contention is raging, your system is likely to have a crippling CPU bottleneck because the application SQL is concurrently and repeatedly checking if specific blocks reside in the buffer cache. If the answer to, "Is the block in the buffer cache?" is usually, "Yes" then IO reads are minimized and memory structure access is maximized... hence the crippling CPU bottleneck and CBC latch contention.

The OraPub System Monitor (OSM) script I use to interactively determine the overall time situation is rtpctx.sql. Here's an example of a 707 second interval.

Figure 1. Typical CBC latch contention result based on rtsysx.sql, response time report.

In Figure 1, the "% WT" column of the rtsysx.sql output shows the percentage of wait time by wait event over the report interval. The "% RT" column shows the percentage of the total response time (CPU time plus wait time). The values in Figure 1 are typical when there is a very serious CBC latch contention issue. An AWR or Statspack report will tell a similar story: the top wait event is "latch: cache buffers chains" and most of the system's CPU resources are consumed by Oracle.
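To make the two percentages concrete, here's a minimal sketch in Python of how "% WT" and "% RT" relate to each other. This is not the OSM script; the function name and the input shape (CPU seconds plus a dictionary of wait seconds per event) are my own assumptions for illustration, and the numbers in the demo are made up.

```python
def response_time_breakdown(cpu_secs, wait_secs_by_event):
    """Return (total_rt, rows) where each row is (event, secs, pct_wt, pct_rt).

    pct_wt: the event's share of total wait time.
    pct_rt: the event's share of total response time (CPU time + wait time).
    """
    total_wait = sum(wait_secs_by_event.values())
    total_rt = cpu_secs + total_wait
    rows = []
    # Largest wait first, the way a top-wait report is usually read.
    ordered = sorted(wait_secs_by_event.items(), key=lambda kv: kv[1], reverse=True)
    for event, secs in ordered:
        pct_wt = 100.0 * secs / total_wait if total_wait else 0.0
        pct_rt = 100.0 * secs / total_rt if total_rt else 0.0
        rows.append((event, secs, pct_wt, pct_rt))
    return total_rt, rows


if __name__ == "__main__":
    # Illustrative numbers only, not taken from Figure 1.
    total, rows = response_time_breakdown(
        600.0,
        {"latch: cache buffers chains": 300.0, "db file sequential read": 100.0},
    )
    for event, secs, pct_wt, pct_rt in rows:
        print(f"{event:<32} {secs:8.1f} {pct_wt:6.1f} {pct_rt:6.1f}")
```

Because "% RT" includes CPU time in the denominator, an event can dominate the wait time yet still be a modest slice of response time when the system is CPU bound, which is exactly the CBC situation described above.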

While there are a number of causes for CBC latch contention, I tend to see two CBC acquisition patterns. The first pattern is when many CBC latches are very active, that is, the access pattern is very dispersed. The second pattern is when a single CBC latch is very active. So once you know there is a significant CBC issue, the next step is to determine the acquisition pattern characteristic. Read on!

Determining CBC Wait Pattern

To determine the CBC wait pattern, you can run a very simple script like this:

Notice that all four sessions are sleeping (i.e., they are posting the wait event) while trying to acquire the same CBC latch (note the latch address is identical). While the above snippet and the result are interesting, you could easily be misled by this single sample. A more statistically sound method is to gather multiple samples. Using my OSM script, latchchild.sql, we can take a sample each second and then glean the pattern from statistical analysis. The latchchild.sql script essentially does this:

Below is some actual output. The "delta" columns are simply the difference between the beginning and ending values for gets and sleeps. Notice there is not a massive gap between the delta_gets and delta_sleeps, and there is not a single latch (or a few latches) significantly more active than the others. This would be classified as dispersed CBC latch contention.
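The delta calculation itself is simple. Here's a minimal Python sketch, assuming you have already captured two snapshots of the per-latch gets and sleeps counters, keyed by latch address. The function name and the snapshot layout are hypothetical; this is not the latchchild.sql script.

```python
def latch_deltas(begin, end):
    """Compute per-latch deltas between two snapshots.

    begin, end: dict mapping latch address -> (gets, sleeps).
    Returns a list of (address, delta_gets, delta_sleeps), most sleeps first.
    Latches missing from the beginning snapshot are skipped because no
    delta can be computed for them.
    """
    rows = []
    for addr, (end_gets, end_sleeps) in end.items():
        if addr not in begin:
            continue
        begin_gets, begin_sleeps = begin[addr]
        rows.append((addr, end_gets - begin_gets, end_sleeps - begin_sleeps))
    # Sort by sleeps, then gets, so the most contended latches come first.
    rows.sort(key=lambda r: (r[2], r[1]), reverse=True)
    return rows


if __name__ == "__main__":
    # Made-up addresses and counters, for illustration only.
    begin = {"A1": (100, 1), "B2": (200, 5)}
    end = {"A1": (150, 4), "B2": (260, 6)}
    for addr, d_gets, d_sleeps in latch_deltas(begin, end):
        print(addr, d_gets, d_sleeps)
```

Working from deltas rather than raw counters matters because the counters accumulate from instance startup, so the raw values say little about what happened during the interval you care about.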

To really grasp the situation, a visual histogram based on the sleeps is very useful.

Figure 2. Histogram of the number of CBC latch addresses and their respective sleep activity.

Figure 2 is a standard histogram I copied from the statistical analysis Mathematica notebook (you can download below). Just over 600 CBC latches have zero sleeps while only two CBC latches had eight sleeps. Notice that while there are differences in the number of sleeps, we don't see a pattern with a massive jump like: 0,0,0,1,2,3,5,6,7,1021. Again, this is an example of dispersed CBC latch contention. But sometimes the contention is not dispersed at all, but concentrated on just a couple or perhaps even one single CBC latch!

Figure 3. Histogram of the number of CBC latch addresses and their respective sleep activity.

Figure 3 is the result, as we'll see, of three very popular buffers, each related to a different buffer chain. While nearly 1000 CBC latches have zero sleeps (far left vertical bar in the histogram), just as important, there is an obvious large delta_sleeps gap near the most active CBC latches. For some people, the histogram tells a better story, but for others the below numeric snippet better captures the situation. (I personally like to use both.)
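If you'd rather have a quick programmatic check for that "massive jump," here's a rough Python sketch (this is not my script output, just an illustration). It flags any latch whose delta sleeps exceed a multiple of the median across all sampled latches. The function names, the factor of 10, and the floor of 1 on the baseline are all arbitrary assumptions, so treat the result as a hint, not a verdict.

```python
from statistics import median


def hot_latches(delta_sleeps, factor=10):
    """Return latch addresses whose sleeps dwarf the typical latch.

    delta_sleeps: dict mapping latch address -> sleeps over the interval.
    A latch is flagged when its sleeps exceed factor * baseline, where the
    baseline is the median over all latches, floored at 1 so that a mostly
    idle set of latches does not flag every tiny non-zero value.
    """
    if not delta_sleeps:
        return []
    baseline = max(1, median(delta_sleeps.values()))
    flagged = [a for a, s in delta_sleeps.items() if s > factor * baseline]
    return sorted(flagged, key=lambda a: delta_sleeps[a], reverse=True)


def classify(delta_sleeps, factor=10):
    """Crude label: 'singular' if any latch is flagged, else 'dispersed'."""
    return "singular" if hot_latches(delta_sleeps, factor) else "dispersed"


if __name__ == "__main__":
    # The jump pattern mentioned earlier: 0,0,0,1,2,3,5,6,7,1021
    values = [0, 0, 0, 1, 2, 3, 5, 6, 7, 1021]
    sample = {f"latch{i}": v for i, v in enumerate(values)}
    print(classify(sample), hot_latches(sample))
```

A threshold like this will misfire on some distributions, which is why I still look at the histogram and the raw numbers before deciding which pattern I'm dealing with.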

Solutions for Dispersed CBC Latch Contention

Figure 2 and the code snippet directly above Figure 2 show a typical example of what you'll likely see when many CBC latches are active enough to cause a significant performance problem. Here's the likely situation: there is intense CBC latch contention along with a raging CPU bottleneck, and you can probably easily spot the SQL driving the heavy logical IO (the v$sysstat statistic "session logical reads") as well. There are a number of solutions, some of which are listed below.

An Oracle-focused solution is to increase the number of CBC latches by increasing the hidden instance parameter, _db_block_hash_latches.

An application-focused solution is to find the most logical IO intensive SQL and reduce the LIOs by executing it less often or tuning it. Either way, your objective is to reduce the LIOs generated during times of critical performance.

An operating system-focused solution is to increase CPU resources by removing CPU-consuming processes if possible, adding more CPU cores, or increasing CPU speed.

There are of course other solutions, but I think you get the idea.

Solutions for Singular CBC Latch Contention

Figure 3 and the output snippet directly below it are typical when there is intense CBC contention focused on one or perhaps a few CBC latches. When this is the situation, additional diagnosis is needed to determine specifically why the intense singular activity is occurring. This is the topic of my next posting...

Summary

The Cache buffer chain (CBC) structure is used to answer the question, "Is this block in the buffer cache?" At some point, this question can get asked enough to cause significant performance problems known as CBC latch contention. While there are a number of causes for CBC latch contention, I tend to see two CBC acquisition patterns: The first pattern is when many CBC latches are very active, that is, the access pattern is very dispersed. The second pattern is when a single CBC latch is very active. So once you know there is a significant CBC issue, the next step is to determine the acquisition pattern characteristic.

In this posting I focused on how to determine the CBC latch contention acquisition pattern: dispersed or singular. I then presented some dispersed CBC latch contention solutions. In my next posting I'll focus on additional steps to diagnose singular CBC latch contention, two common situations, and some possible solutions.

Friday, November 11, 2011

Suppose you want to check if a specific Oracle block is an index root block. Why? Here are two very real situations. You notice a specific block is very active and want to know if it's an index root block. Even more commonly, there is a very active cache buffer chain latch related to a specific block/buffer and you want to know if this hot buffer is an index root block. Besides these very real examples, it's also an interesting journey into Oracle internals!

Folklore States...

Some very respectable blogs and a simple test I ran indicate an index root block is the block after its segment header block.

Figure 1 is a diagram of an Oracle index segment. If it weren't for the index root block, Figure 1 would be a good diagram for any Oracle segment. The light blue colored block is the segment header block. Notice the orange colored index root block follows the segment header block.

As mentioned above, folklore says if the segment is indeed an index, then the orange block will be the index root block. And not just now, but for the life of the index! Wow... This is a pretty strong statement and one that needs to be tested. So that's what I did and what this posting is all about.

It's Kind of Complicated

We need to determine if the block following an index segment header block is the index root block... for always and forever until the index is dropped. First, just dump the index and locate the root block's data block address (DBA). Second, get the DBA for the block following the index segment header block. And finally, compare them. If they match, then we have shown a situation where the block following the index segment header block is indeed the index root block. So let's do that.

Once we get the object_id from dba_objects, here's how to dump an index:

The first/top mentioned "branch" block is the index's root block. In this case, the index root block has a data block address (DBA) of 5002337. Now let's get the data block address for the block after the index's segment header block. But first we need to get the file number and block number of the index segment header block.

Do you see it? The DBA just above (with the header block + 1) matches the first/top "branch" block's DBA (5002337) from the index trace file! So now we know how to check if the block following the index's segment header block is truly the index root block.
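If you want to check the arithmetic outside the database, here's a minimal Python sketch of the DBA encoding. It assumes the usual smallfile layout, where the relative file number sits in the top 10 bits and the block number in the lower 22 bits of a 32-bit value (bigfile tablespaces differ). The function names are mine; inside Oracle, dbms_utility.make_data_block_address and its companions data_block_address_file and data_block_address_block do the same job.

```python
BLOCK_BITS = 22                       # low 22 bits hold the block number
BLOCK_MASK = (1 << BLOCK_BITS) - 1
MAX_FILE_NO = (1 << 10) - 1           # top 10 bits hold the relative file number


def make_dba(file_no, block_no):
    """Combine a relative file number and a block number into a DBA."""
    if not 0 <= file_no <= MAX_FILE_NO:
        raise ValueError("file number out of range for a smallfile DBA")
    if not 0 <= block_no <= BLOCK_MASK:
        raise ValueError("block number out of range for a smallfile DBA")
    return (file_no << BLOCK_BITS) | block_no


def split_dba(dba):
    """Split a DBA into (relative file number, block number)."""
    return dba >> BLOCK_BITS, dba & BLOCK_MASK


def is_block_after_header(header_file, header_block, candidate_dba):
    """True if candidate_dba is the block right after the segment header."""
    return make_dba(header_file, header_block + 1) == candidate_dba


if __name__ == "__main__":
    # 5002337 is the root block DBA from the index dump above.
    print(split_dba(5002337))
```

Under that assumption, 5002337 decodes to relative file 1, block 808033, so the matching segment header would be block 808032 of the same file.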

Now the question becomes, does it always remain this way? For example, what if we create the table, create the index, and then insert rows into the table? Or what if we create the table, then insert rows, and finally create the index? If that's not enough, how about this: What if the index grows and splits? Or how about if we delete all the table's rows, then insert rows until the index splits? Or how about if we truncate the related table? As you can see, there are an infinite number of possibilities and there is no way we can test all of them.

The Experimental Setup

I created a number of tests that could be repeatedly run and easily modified and extended. There are two related scripts. The driving script is a SQL script called doRbExpr.sql, which takes a single argument, called the prefix. This prefix is the beginning of the name of every object the script creates. This allows you to quickly and easily re-run the script without first removing all the objects from the previous run. The second script, getIdxRtBlk.sql, retrieves the index root block's DBA from both the data dictionary and by dumping the index, and then nicely displays them so you can easily see if there is a difference. I also show the index depth (blevel) as an added test to help ensure I'm looking at the current statistics.

The Experimental Results

Click here to see the results. As you can see, in every case the DBA of the index segment header block plus one matches the index trace file's root block. I have rerun this test many times, and the results are always the same.

What Does This Prove?

Actually, the experiments prove very little, yet they yield tremendous value. The experiments clearly and repeatedly demonstrate that I have not found a way to disprove that an index root block is the block immediately following its segment header block. All it would take is just one of my experiments to break the "block after" rule... but I could not break the rule! If you can devise a situation to break the rule, please let me know and I'll post it.

So next time you need to check if a particular block is an index root block, simply get its segment header file and block number, add one to the block number, and compare. In my opinion, that's much easier and faster than dumping the index, parsing it, etc.

P.S. If you want me to respond with a comment or you have a question,
please feel free to email me directly at craig@orapub .com. I use a
challenge-response spam blocker, so you'll need to open the challenge
email and click on the link or I will not receive your email. Another
option is to send an email to OraPub's general email address, which is
currently orapub.general@gmail .com.