JL Computer Consultancy

Cache Advice – what could it be doing?

May 2005

I received an email recently asking me what the connection was between the
buffer cache hit ratio (BCHR) and the buffer cache advisory
feature (v$db_cache_advice) in Oracle 9i and 10g. The answer is simple
– as much, or as little, as you want to pretend.

On one hand, you could start with the claim that the buffer cache hit
ratio is a number that compares the count of latched buffer
acquisitions (logical I/Os) with the count of Oracle blocks requested from the
file system (physical reads) and then point out that the v$db_cache_advice is
trying to tell you what this fraction would have been if the buffer cache had
been a different size. Starting from this viewpoint, you can certainly read a
line from v$db_cache_advice and say “my
buffer cache hit ratio is X, and if I had had a buffer cache of size M, then my
buffer cache hit ratio would have been
Y”. All you have to do is take the estd_physical_reads column from v$db_cache_advice
and put it into your chosen formula for the buffer cache hit ratio
in the place where you would normally put the actual physical read count.
(That’s assuming you’ve picked a version of the BCHR formula that
actually counts physical reads the same way that Oracle is counting physical
reads).
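As a sketch of that substitution (the figures are hypothetical, and I'm using the common "1 - physical/logical" form of the formula – pick whichever variant matches the way your site counts physical reads):

```python
def bchr(logical_io, physical_reads):
    """One common form of the buffer cache hit ratio: the fraction of
    block requests satisfied without a physical read."""
    return 1.0 - physical_reads / logical_io

# Hypothetical figures for one interval:
logical_io          = 1_000_000
physical_reads      = 50_000    # actual reads at the current cache size
estd_physical_reads = 20_000    # v$db_cache_advice estimate for a larger cache

current_ratio   = bchr(logical_io, physical_reads)        # 0.95
predicted_ratio = bchr(logical_io, estd_physical_reads)   # 0.98
```

The "prediction" is nothing more than dropping the estimated read count into the slot where the actual read count usually goes.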

On the other hand, you could point out that all you get from the buffer
cache hit ratio is a simple piece of arithmetic based on counts of logical
I/Os and physical reads. If you try to work towards the v$db_cache_advice
from this viewpoint you have to ask where, in those counts, is the information
that could allow an advisor to come up with indications that (for example)
256MB of extra cache won’t make much difference to physical I/O levels
but 288MB could cut the physical I/O by 90%. Even if you were to bounce the
database every 24 hours, increasing the buffer cache by 16MB every time, you
could find a virtually flat trend line in the buffer cache hit ratio
– and still be told that a specific value for the cache would make a big
difference. The work of the advisor is much more subtle and detailed than
anything you could emulate by playing about with the buffer cache hit
ratio. (Although you could do something quite useful with v$segstat
– segment level statistics – but I’ll have to leave that
topic for another day).

So if someone tells you that the buffer cache hit ratio must
be a good thing because Oracle has based their v$db_cache_advice technology
on it, then they are displaying a lack of understanding about the deficiencies
of the buffer cache hit ratio in particular, and how LRU
(least recently used) caching mechanisms work in general.

But if someone tells you that you may be able to improve your buffer cache
hit ratio by using the information in v$db_cache_advice, they are correct
– because it’s been designed to help you choose the best place to
allocate the memory you have, given the workload you typically see, if
your target is a particular buffer cache hit ratio (measured since
startup).

Personally, of course, I would rather be told that I could reduce physical
I/O to relieve the stress on the I/O subsystem by X%, or reduce the logical I/O
by Y% to reduce CPU cycles and latch contention, thereby improving response
times on critical actions by Z%, rather than hear someone say that the target
was simply to change an inherently meaningless number.

Design of a cache advisor

There are two significant features to the design: the conceptual bit, which
is very easy, and the mechanical implementation bit, which is very difficult if
you want to do something which is both useful and efficient.

Conceptual Requirement – the easy bit:

First you have to understand that Oracle runs an LRU chain through the
list of buffer headers (x$bh), and implements a touch-count
algorithm that indicates the number of visits that a buffer header has
received. To keep things simple I am going to pretend that there is only one
such chain, even though the buffer cache is usually split into multiple working
data sets (x$kcbwds) with an LRU chain (and LRU latch) per set.

When Oracle needs a buffer in order to read a new block into memory it
checks the touch count of the buffer header at the end of the LRU
chain – if this shows that people have been using the related
block, the buffer header is moved (relinked) to the top of the LRU chain (and
its touch count is reset). If the touch count shows that the block was not
popular the buffer is cleared and re-used.
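The tail-end decision can be sketched in a few lines of Python. The names and the threshold value here are purely my own illustration – Oracle's real touch-count algorithm involves timestamps and hidden parameters and is not this simple:

```python
from collections import deque

HOT_THRESHOLD = 2   # illustrative value only - not Oracle's actual threshold

def find_free_buffer(lru, touch_count):
    """Scan from the tail of the LRU chain: relink popular buffers to the
    head (resetting their touch count), and reuse the first unpopular one."""
    while lru:
        buf = lru.pop()                    # buffer header at the tail
        if touch_count[buf] >= HOT_THRESHOLD:
            touch_count[buf] = 0           # reset the count ...
            lru.appendleft(buf)            # ... and promote to the head
        else:
            return buf                     # cold buffer: clear and reuse
    return None
```

So a buffer with a high touch count survives at the cost of being demoted back to the head with its count reset, while an unpopular buffer is sacrificed on the spot.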

Obviously, if you clear block X from the tail-end buffer, you may have to
re-read that block at some later stage. The function of the cache advisory is
to let you know whether you could have avoided a later re-read of block X if
the buffer cache had been a little larger – just large enough that block
X hadn’t got to the tail-end position before someone wanted to re-read
it. Conversely, the cache advisory can also let you know that you could safely
reduce the buffer cache size without putting block X into jeopardy.

If you want to implement an advisor, the easiest thing to do is to keep two
copies of the buffer headers, each with their own LRU chain running through
them, and make the second copy much bigger than the primary copy but strip out
all the stuff that isn’t relevant to block identities. So when a buffer
is dumped from the primary copy its reference could still exist in the
secondary copy. Now all you have to do on a block request is compare what would
have happened on the secondary LRU with what actually happens on the real
(primary) LRU. Let’s do a worked
example:

Imagine the buffer cache (hence LRU chain) is 1,000 blocks.

Assume the advisory is supposed to report the effects of
having buffer caches sized at 500, 1,000, 1,500 and 2,000 blocks.

We want to read block X, so we check the secondary LRU chain – and
happen to find the block at location 1,800. Check the primary LRU, the block is
(obviously) not in memory, so we read it. Because we have done a physical read
on the primary LRU, we have to promote the block on the secondary LRU (but
where to?). Now we record some counts: Since the block was in the secondary LRU
at location 1,800, we increment the ‘estimated physical reads’
counter for the three cache sizes less than 1,800 but don’t increment the
counter for the largest cache size.

A little later we want to read block X again, so we check the secondary LRU
chain – we find that the block is at location 750. Check the primary LRU,
the block is in memory, so we don’t need to read it. But we would
have needed to read it if the buffer cache had been only 500 blocks. So
we increment the ‘estimated physical reads’ counter for the 500
block buffer cache size.

That’s it. The concept is very easy – but then you have to
implement it correctly and efficiently.
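By way of illustration, the whole concept can be caricatured in a few lines of Python. This is strictly my own sketch – a single over-sized secondary chain with a per-size miss counter – and not a description of Oracle's actual code:

```python
from collections import OrderedDict

class CacheAdviceSketch:
    """Toy cache advisory: keep one secondary chain as large as the biggest
    cache size to be modelled; each block request counts as an 'estimated
    physical read' for every simulated size smaller than the block's
    current position in that chain."""

    def __init__(self, sizes):
        self.sizes = sorted(sizes)
        self.chain = OrderedDict()          # last entry = most recently used
        self.estd_physical_reads = {s: 0 for s in self.sizes}

    def read_block(self, block_id):
        pos = None
        if block_id in self.chain:
            keys = list(self.chain)
            pos = len(keys) - keys.index(block_id)   # 1 = most recent
            self.chain.move_to_end(block_id)         # promote to the head
        else:
            self.chain[block_id] = True              # brand new block
            if len(self.chain) > max(self.sizes):
                self.chain.popitem(last=False)       # age out the LRU entry
        for size in self.sizes:
            if pos is None or pos > size:            # this size would have missed
                self.estd_physical_reads[size] += 1
```

With sizes (500, 1,000, 1,500, 2,000), a block found at position 1,800 increments the counters for the three smaller sizes and leaves the largest alone – exactly the arithmetic of the worked example. Note that I've quietly promoted every referenced block straight to the head of the chain, which is one of the awkward simplifications discussed below.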

An alternative strategy – which may, in fact, be the one followed by
Oracle – is to have N (in our example 4) copies of the buffer headers,
where each copy is a different size (but all copies share the same LRU latches)
so that you can deal with the whole LRU/TCH algorithm properly for each copy
individually. I suspect that Oracle
is not doing this because when I’ve checked the content of x$kcbsh, the block addresses
recorded in the structure haven’t shown any signs of overlapping or
interleaving.

The difficult bit(s):

You couldn’t run two copies of the LRU with a pretend double-size
buffer without paying a significant performance penalty. In my conceptual
description, every buffer request now does twice the work it used to because
two LRU chains have to be searched for every single block request. So
there has to be a cunning sampling strategy that trades precision with cost.
You can get a clue about this from x$kcbsh (Kernel Cache Buffer
Simulated Headers?) which is broken into sets (set_id) and segments (segidx).
The sets correspond to the working data sets (x$kcbwds) of the buffer
cache, and each segment holds one row for every 128 rows in the corresponding
set from x$bh – a scale factor probably controlled by the hidden
parameter _db_cache_advice_batch_size.

If you are going to use a sampling mechanism, then you probably need a clever
strategy for deciding how to sample – a round-robin (e.g. every 13th
block in the file) is the easiest and least CPU intensive (as in - I only check
the secondary LRU if mod(block_id, 13) = 0; and only push a buffer header into
the secondary LRU on the same condition). But unless the sampling strategy was
a little devious, some systems could end up sampling blocks that gave a very
misleading image of how a change in cache size would affect things.
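A tiny Python sketch shows how badly the naive mod() filter can be fooled – the divisor and the two workloads here are my own illustration, not anything Oracle does:

```python
def sampled(block_id, divisor=13):
    """The simple round-robin filter from the text: only track a block in
    the secondary LRU when mod(block_id, divisor) = 0."""
    return block_id % divisor == 0

# Two workloads whose stride shares a factor with the divisor are seen
# either completely or not at all - exactly the bias a more devious
# sampling strategy would need to avoid:
hot_a = list(range(0, 1300, 13))    # every block lands in the sample
hot_b = list(range(1, 1300, 13))    # no block lands in the sample

coverage_a = sum(sampled(b) for b in hot_a) / len(hot_a)   # 1.0
coverage_b = sum(sampled(b) for b in hot_b) / len(hot_b)   # 0.0
```

Two workloads of identical intensity, one fully visible to the simulator and one totally invisible – which is why a straight round-robin on block id is too cheap a trick.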

Oracle uses a modified LRU mechanism that loads blocks into the middle of
the LRU chain and promotes them to the top of the LRU chain only if they are
known to be useful by the time they reach the end of the LRU chain. This makes
it much harder to emulate fairly what would happen with a
different sized buffer cache. In my example, the block at location 1,800 ought
to have its touch count (which isn’t in the secondary buffer header)
incremented – in fact I had to promote it, and decided to promote it to
the top of the secondary LRU. Maybe I should have promoted it to position 500
– which is where it would have been loaded in the primary LRU – or
position 1,000, which is where it would have been loaded in a buffer cache
matching the size of the next larger advisory
limit.

Similar, but opposite, problems arise with the simulation of caches that are
smaller than the current cache – you cannot simulate promotions that would
have taken place in a reduced cache because the simulated buffer header
has to stay where it is for the benefit of all the larger simulation sizes.

The problems of promotion and touch count from the last two paragraphs
explain why I made the comment in the concepts section above that Oracle may be
maintaining N (where N = 20) copies of the buffer headers – it would
certainly allow for a more realistic simulation of block promotion. However, if
they are keeping 20 copies, then each copy holds only 1 row for every 128 rows
in the real x$bh – so the scope for anomalies based on data
distribution goes up.

Conclusion

Even though the idea is simple, the technology, and intelligence, behind the
db cache advisories (and the other advisories based on LRU activity) is vastly
more sophisticated than the simple minded division of a couple of counters.
Don’t be fooled into thinking that the existence of the advisories retrospectively
justifies the use of cache hit ratios as targets for performance
analysis.

Afternote (21st May 2005)

If you are interested in hacking around with Oracle’s implementation,
here are a couple of (probably) related details.

Apart from the parameter mentioned above, there is another parameter labeled
_db_cache_advice_sample_factor
which may have something to do with the frequency with which an event is
sampled – perhaps Oracle only plays cache games on every fourth physical
read request.

In v$latch_children, you can find at least two latches that are
probably relevant, the simulator hash latch (I had 32 of
these on an Oracle 9.2 system with 1024 cache buffers chains latches), and
the simulator
lru latch (and my system showed the same number for this as it showed
for the cache buffers lru chain latch).

Addendum (23rd Nov 2005)

I’ve just received an email warning me about a bug in 9.2 that can
cause problems on highly active systems. If your system is a very busy one and
you use alter system to switch db_cache_advice to ON
after you have resized the cache then you may crash the instance.

In earlier versions of Oracle there used to be a bug that stopped you from
enabling cache advice because the simulator code needed to allocate some memory
from the shared pool and could not acquire it. The work around was to ensure
that you always started the database with db_cache_advice set to READY
so that the memory was pre-allocated though unused. Of course, if you then
increased the size of the cache, that did raise the question of how the extra
memory got allocated.

In this latest bug, the crash can occur even if the db_cache_advice is being
changed from READY to ON.