Buffer Cache Hit Ratios: Useful or Not ? December 16, 2007

The question of whether the Buffer Cache Hit Ratio (BCHR) is a useful metric or not is one of those discussions in Oracle that seems to crop up on a regular basis. I thought it might be worth briefly mentioning the topic here.

The BCHR represents the percentage of logical I/Os (LIOs) for which the required block is already loaded in the buffer cache(s). The higher the BCHR value, the greater the percentage of blocks accessed directly from memory and the lower the corresponding ratio of physical reads. A “higher” BCHR is generally considered a good thing, as it’s a good thing to find required blocks in memory (right ?), while a “lower” BCHR is generally considered not such a good thing, as it’s bad to perform a higher ratio of physical reads (right ?).

The first problem with the BCHR of course is that it’s often miscalculated, with many not appreciating, for example, the impact of direct reads on the physical read figures used in BCHR calculations.
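For example, the commonly quoted formulation subtracts the direct-path read statistics (direct reads bypass the buffer cache entirely, so they shouldn’t count as cache misses) from total physical reads. A quick sketch, with the statistic names mirroring V$SYSSTAT but the numbers entirely invented:

```python
# BCHR calculation that excludes direct-path reads, which bypass the
# buffer cache. Statistic names mirror V$SYSSTAT; values are made up.
stats = {
    "db block gets":                 120_000,
    "consistent gets":             4_880_000,
    "physical reads":                600_000,
    "physical reads direct":         450_000,
    "physical reads direct (lob)":    50_000,
}

logical_reads = stats["db block gets"] + stats["consistent gets"]

# Only physical reads that actually load blocks into the buffer cache
# should count as cache misses.
cache_misses = (stats["physical reads"]
                - stats["physical reads direct"]
                - stats["physical reads direct (lob)"])

bchr = 100 * (1 - cache_misses / logical_reads)
print(f"BCHR = {bchr:.1f}%")   # → BCHR = 98.0%
```

Note that the naive formula, using raw physical reads, would have reported 88.0% for the same workload — a ten-point miscalculation before we even start arguing about what the number means.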

Assuming the BCHR is correctly calculated, the second problem with the BCHR is that it represents a database wide average. Averages are always dangerous things as they generally completely hide specific details and anomalies. Unless the average represents and can guarantee some consistent database metric or behaviour, then the average quickly becomes just a number, without any actual or inherent meaning.

A specific BCHR value at any specific point in time doesn’t actually tell us anything meaningful about the performance of specific tasks within the database. A database generally performs 100s or 1000s or 10000s of tasks at any given point of time. Unless all tasks or a significant percentage of tasks exhibit the same performance issue, then a single database-wide metric will be unable to determine issues with these specific tasks. The “average” figure hides details of the specific issue.
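To illustrate how the average masks a specific problem task, here’s a sketch with purely hypothetical per-task cache statistics:

```python
# Hypothetical per-task cache statistics: (logical reads, cache misses).
tasks = {
    "oltp_orders":  (900_000, 45_000),   # 95% hit ratio
    "oltp_billing": (800_000, 40_000),   # 95% hit ratio
    "batch_report": (100_000, 80_000),   # 20% hit ratio -- the real problem
}

total_lio  = sum(lio for lio, _ in tasks.values())
total_miss = sum(miss for _, miss in tasks.values())

# The database-wide average looks perfectly "healthy" ...
overall = 100 * (1 - total_miss / total_lio)
print(f"database-wide BCHR: {overall:.1f}%")   # → 90.8%

# ... while one task is missing the cache 80% of the time.
for name, (lio, miss) in tasks.items():
    print(f"{name:>12}: {100 * (1 - miss / lio):.1f}%")
```

The database-wide figure sits at a respectable-looking 90.8% while the batch report limps along at a 20% hit ratio — exactly the kind of detail an average quietly swallows.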

Therefore, at what point or at what value does an average figure provide meaning ?

The third problem with the BCHR is that these specific tasks within the database sometimes perform activities that are “good” and “efficient” but can result in the BCHR going up, going down or remaining unchanged. Sometimes these activities can be “bad” and “inefficient” but can likewise result in the BCHR going up, going down or remaining unchanged. Therefore, without understanding what these specific activities might be, it’s simply impossible to know whether a specific change in the BCHR is good or bad.

Let’s assume we have a BCHR of 90%. Is this a good thing or is this a bad thing ? Is database performance an issue with a BCHR at 90% or is database performance good ? Is database performance good or bad generally, or are there specific processes within the database that are problematic ? The answer of course is that it entirely “depends”, and a value of 90% or 95% or 50% can’t in and of itself answer any of these questions.

We can have a BCHR at 90% and performance can be terrible. It can be terrible at a database wide level due to any number of problems or issues or it can be terrible for specific users using specific applications or processes.

We can have a BCHR at 90% and performance can be perfect, with all users achieving optimal response times.

A BCHR of 90% is useless on its own. We need to go and perform all manner of additional checks to ensure the database is “healthy”.

However, even those who claim the BCHR is a meaningful and useful tuning metric generally agree and admit the BCHR on its own has no inherent usefulness and that it needs to be used in combination with other database “checks”. They generally claim that it’s the BCHR when monitored and used from a historical point of view with fluctuations of its value over time that makes the BCHR useful.

Really ?

Let’s again assume the BCHR has been at 90% for the past few weeks (or whatever time-frame) and it has now dropped to 85%. The Quest Spotlight monitor is flashing red and obviously something has “changed”. However, has it changed for the better, for the worse or has it had no noticeable impact on the “health” of the database (perhaps simply the specific workload has changed) ?

Well again, without looking at other specific metrics, one can’t possibly answer these questions. Perhaps we have an important process (or processes) that has suddenly started performing expensive, inefficient Full Table Scans. That’s not good, as the extra logical and physical IOs have impacted response times detrimentally. Things have indeed got worse. Or perhaps we have a process that was performing an extremely inefficient nested loop operation, reading the same cached table numerous times, that is now performing the same function much more efficiently, reducing LIOs significantly. Response times may have improved and things are indeed better. Or perhaps there’s a large batch program or report that needs to be urgently run during normal business hours, resulting in lots of physical IOs to the database but not actually impacting the more important online transactional systems. Actually, the health of the database may not have changed at all.

Therefore, having a BCHR that has dropped to 85% (or whatever value) doesn’t actually tell us much other than something may have changed. But it may have changed for the worse or the better or not significantly at all. There’s no way of knowing without performing further database checks.

Let’s assume the BCHR has gone from 90% to 95% (or whatever higher value). The Quest Spotlight monitor is flashing nice and green but something appears to have changed. However, has it changed for the better, for the worse or has it had no noticeable impact on the “health” of the database ?

Well again, without looking at other specific metrics, one can’t possibly answer these questions. Perhaps we have a key process (or processes) that was previously performing expensive, inefficient Full Table Scans and is now performing efficient index scans. That’s good, as the reduction in logical and physical IOs has impacted response times positively. Things have indeed got better. Or perhaps we have a process that was previously performing efficiently but has suddenly started to perform extremely inefficient nested loop operations, reading the same cached table numerous times, increasing LIOs significantly and causing the overall BCHR to increase as well. Response times may have plummeted and things are much worse. Or perhaps there’s a series of large batch programs or reports that usually run during normal business hours, but the reporting section are at their Xmas lunch and haven’t bothered to run them today, resulting in a reduction of physical IOs to the database but not actually impacting the more important online transactional systems. Actually, the health of the database may not have changed.

Therefore, having a BCHR that has increased to 95% (or whatever value) doesn’t actually tell us much other than something may have changed. But it may have changed for the worse or the better or not significantly at all. There’s no way of knowing without performing further database checks.

Let’s assume the BCHR has not changed at all and is still sitting at 90% (or whatever value). The Quest Spotlight monitor is flashing nice and green and nothing appears to have changed. However, has nothing really changed, or could things now be seriously wrong with the database ?

Well again, without looking at other specific metrics, one can’t possibly answer these questions. Perhaps we have a key process (or processes) that was previously performing expensive, inefficient Full Table Scans and is now performing efficient index scans. That’s good, as the reduction in logical and physical IOs has impacted response times positively. Things have indeed got better, but because the BCHR is a database-wide metric, this improvement may have gone unnoticed. Or perhaps at the same time we have a process that was previously performing efficiently but has suddenly started to perform extremely inefficient nested loop operations, reading the same cached table numerous times, increasing LIOs, causing response times to plummet and making key business processes much worse. But again, because the BCHR is a database-wide metric, the overall BCHR may not have been impacted.

Or of course, any one of hundreds of other things may have caused serious database performance issues while the BCHR remains totally unchanged …

Therefore, having a BCHR that remains unchanged doesn’t actually tell us much either. The database may be running better than previously, the database may be having serious problems, or the “health” of the database may remain unaltered.

So, the BCHR at any specific value doesn’t tell us much without having to check other database metrics as well.

The BCHR increasing doesn’t tell us much without having to check other database metrics as well.

The BCHR decreasing doesn’t tell us much without having to check other database metrics as well.

The BCHR remaining unchanged doesn’t tell us much without having to check other database metrics as well.

Note the database metrics we would need to check (for example, those queries using the most LIOs, those queries using the most PIOs, those queries using the most CPU, those queries being executed the most, those queries with excessive response times, causes of the most significant waits in the database, specific application/process response times, etc. etc. etc…) are exactly the same in all the above scenarios.

The biggest problem of all with the BCHR is that regardless of its value, or whether it goes up or down or remains unchanged, we need to perform precisely the same database checks, as it doesn’t tell us whether the “health” of the database has improved, got worse or remains unchanged.

If we need to perform the same actions regardless of the BCHR, then I suggest the answer to whether the BCHR is useful or not is a big and resounding no.


Agreed on the usefulness (or not) of the BCHR. Except that in general (and normally I hate generalisations and averages – the very thing the post is describing 😉), although the BCHR changing, going higher, going lower or staying the same does not in itself tell us much about the health of the database, we wouldn’t normally expect to see an ongoing BCHR of 10%, as it would imply an abnormally large number of blocks being read from disk.

That said, a periodic drop to 10% while a long-running batch job that performs large selects is running might be perfectly normal and expected.

So although higher is not necessarily better, too low for too long might indicate a potential issue.

A 10% BCHR is a classic example of where again, the BCHR is totally useless.

Again, let’s assume you have existing database checks that monitor and ensure your database is healthy, which are able to detect performance problems when the BCHR doesn’t help. The BCHR is unchanged but performance is appalling for any number of reasons. We have processes and checks in place to detect these issues, right ?

Now if the BCHR were to plummet to 10%, how long do you wait until you begin to suspect maybe this dramatic drop isn’t a result of a one-off event such as a large batch job, but an actual symptom of something more serious that’s impacting performance ? A couple of minutes, 5 minutes perhaps, an hour maybe as one doesn’t necessarily sit there all day looking at BCHRs ?

If database performance were terrible, I would suggest you would know there’s something wrong way before you noticed any change in the BCHR. And if indeed performance worsens significantly, if the BCHR has been sitting at 10% for a long time and it was really an actual symptom, I would suggest there’s something really wrong with your database checking processes if they don’t pick up the performance issue.

So again, what additional help has the 10% BCHR been in detecting and resolving the performance issue ?

I like the BCHR and I think it is a very valuable piece of information. I am somewhat vexed at what seems to have been close to a decade of people dismissing it or even attacking its use, either because you can fiddle its value with scripts (I think I first saw Connor McDonald provide a script that could increase your BCHR to a value of your choice via looping nested joins) or because scenarios and examples can be put forward where a change is not indicative of a problem or improvement. Just as many can be put forward where a significant change in the BCHR is indicating something significant is happening.

“However, even those who claim the BCHR is a meaningful and useful tuning metric generally agree and admit the BCHR on its own has no inherent usefulness and that it needs to be used in combination with other database “checks”. They generally claim that it’s the BCHR when monitored and used from a historical point of view with fluctuations of its value over time that makes the BCHR useful.

Really ?”

Yes, really . If I am working on a system that I am used to, a significant change in the BCHR is like a fire alarm ringing. Chances are it will be nothing, but in my metaphor, I am going to feel foolish sitting at my desk as smoke pours around me and the fire brigade is asking me what the hell I am doing. What is significant? When the gap between the BCHR and 100% has halved or doubled for that point in the day is a good starting point, but like almost all performance monitoring, is not a hard rule. As you point out, you do not know what has changed but it is something impacting the whole system. It is worth seeing what else is changed.

If I do not know the system, then the BCHR is a very easy piece of information to get which helps me get a feel for the environment. For example, if the system is predominantly OLTP, the block buffer cache is an order of magnitude or two greater than the size of the database, and the BCHR is below 90%-ish, I’d know that there is something causing a lot of physical IO for that system and I would keep this in mind.

Giving hard and fast rules for absolute values of the BCHR which are significant is, of course, futile, as someone will come up with an example that contradicts it, but the value gives you part of the feel of the system. When you have seen many systems and their BCHRs and you start to understand the new system you are looking at, you get a feel for whether the BCHR is within the bounds you would expect. If it is within those bounds then you worry less that general physical IO is excessive; if not, then there is more likely going to be something significant that is easy to find.

The BCHR as an absolute value is actually of use as well. When I look at SQL execution in depth, how do I know if it is performing well within the context of that system? Its own logical-to-physical ratio as compared to the BCHR is again a good general indication. If it is lower, this statement is contributing more than average to physical I/O. This again is neither good nor bad, but is another factor to consider. And it is often bad.
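As a sketch of that comparison (invented numbers, with the field names borrowed from V$SQL’s BUFFER_GETS and DISK_READS columns):

```python
# Compare one statement's own hit ratio against the system-wide BCHR.
# Field names mirror V$SQL columns; all numbers are invented.
system_bchr = 95.0

stmt = {"buffer_gets": 50_000, "disk_reads": 10_000}
stmt_hit = 100 * (1 - stmt["disk_reads"] / stmt["buffer_gets"])

print(f"statement hit ratio: {stmt_hit:.1f}%")  # → 80.0%
if stmt_hit < system_bchr:
    # Not necessarily bad -- but worth a closer look.
    print("statement contributes more than average to physical I/O")
```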

BCHR is not a performance silver bullet or even a bronze one, but it is part of your ammunition and can be used effectively. I’d love to see more people promote that it can be used within context rather than dismiss it totally.

[…] https://richardfoote.wordpress.com/2007/12/16/buffer-cache-hit-ratios-useful-or-not/ (a very detailed blog entry) “The biggest problem of all with the BCHR is that regardless of it’s values, or whether it goes up or down or remains unchanged, we need to perform precisely the same database checks regardless as it doesn’t tell us whether the ‘health’ of the database has improved, got worse or remains unchanged.” […]