Increasing disk I/O throughput by increasing the buffer cache size

If the read and write buffer cache hit rates
(%rcache and %wcache) reported by
sar -b (or mpsar -b for SMP)
show consistently low values,
you can improve disk I/O performance by
increasing the size of the buffer cache.
This is particularly worth doing if the number of kilobytes
of data transferred per second between the buffer cache and disk
(bread/s + bwrit/s) is high.
You can also examine the
benefit to disk I/O performance
using sar -d as described in
``Viewing disk and other block I/O activity''.
This should show improved
%busy, avque, and avwait
figures for disks containing regularly accessed filesystems
as the buffer cache size is increased.
Even if the impact on disk I/O is not significant,
requesting processes benefit by not having to perform as many
waits because of cache misses.
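
As a rough illustration of the checks described above, the following Python sketch reads a saved sar -b report and adds bread/s and bwrit/s for each sample. The column layout and the sample figures are assumptions made for this example, and parse_sar_b and disk_traffic_kb are hypothetical helper names, so adjust the sketch to match the output on your own system.

```python
# Hypothetical sample in the assumed sar -b column layout; the figures are invented.
SAMPLE = """\
12:00:00 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
12:20:00     120     600      80      90     180      50       0       0
"""

def parse_sar_b(text):
    """Return one dict per data line, keyed by the column names in the header."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header = lines[0][1:]              # drop the timestamp column
    rows = []
    for fields in lines[1:]:
        values = [float(v) for v in fields[1:]]
        rows.append(dict(zip(header, values)))
    return rows

def disk_traffic_kb(row):
    """Kilobytes per second moved between the buffer cache and disk."""
    return row["bread/s"] + row["bwrit/s"]

rows = parse_sar_b(SAMPLE)
for row in rows:
    print(row["%rcache"], row["%wcache"], disk_traffic_kb(row))
```

Low hit rates combined with a high total in the last column are the case in which a larger buffer cache is most likely to help.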

You should also note that increasing the
size of the buffer cache directly reduces the amount of memory
available for user processes. If free memory is reduced, the
system may be more susceptible to paging out and swapping.
If you increase the buffer cache size, you should
monitor paging and swapping as well as buffer cache activity.

If a compromise cannot be reached between these
resources and the applications being run
cannot be tuned to reduce disk access,
then the only alternative is either to add more
memory or to improve the throughput of the disk drives.

The following table is a summary of the commands
that can be used to view buffer cache activity:

Viewing buffer cache activity

Command      Field      Description

[mp]sar -b   %rcache    percentage by volume of data read from block
                        devices satisfied using the buffer cache

             %wcache    percentage by volume of data written to block
                        devices satisfied using the buffer cache

To increase the size of the buffer cache, first determine the
number of I/O buffers as outlined in the subsection
``Viewing buffer cache activity''.
The number of buffers can then be changed by modifying the
NBUF kernel parameter.

It is not possible to recommend values of %rcache and
%wcache for which you should aim. The values depend to a
great extent on the mix of applications that your system is
running, the speed of its disk subsystems, and on the amount of
memory available. Lower limits can be quoted such as 90% for
%rcache and 65% for %wcache, but
you should not assume that these are ideal for your system. Ideal
values would be 100% for both hit rates but you are unlikely
to see these on a real system.
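
The lower limits quoted above can be turned into a simple check. This is only a sketch: low_hit_rates is a hypothetical helper, and its default floors of 90% and 65% are the example figures from the text rather than targets for any particular system.

```python
def low_hit_rates(rcache, wcache, rcache_floor=90.0, wcache_floor=65.0):
    """Return the names of the hit rates that fall below their floor."""
    low = []
    if rcache < rcache_floor:
        low.append("%rcache")
    if wcache < wcache_floor:
        low.append("%wcache")
    return low

print(low_hit_rates(80.0, 50.0))   # both rates are below the example floors
print(low_hit_rates(95.0, 70.0))   # neither rate is below the example floors
```

Pass different floors as arguments once you know what values are realistic for your own mix of applications.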

The maximum possible value of %rcache depends on how
often new files are accessed whose data has not already been
cached. Applications which read files sporadically or randomly
will tend to have lower values for %rcache.
If files are read which are not then subsequently re-read,
this has the additional disadvantage of removing possibly useful
buffers from the cache for reading and writing.

The effectiveness of caching blocks for write operations depends
on how often applications need to modify data within the same blocks
and how long delayed-write buffers can remain in the buffer cache
before their contents are written to disk.
The average time that data remains in memory before being flushed
is NAUTOUP + (BDFLUSHR / 2). This is 25
seconds given the default values of these parameters.
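
The formula above can be read as follows: a buffer must be at least NAUTOUP seconds old before it is written, and it then waits on average half of the BDFLUSHR interval for the flushing daemon's next run. The sketch below works through the arithmetic. It assumes defaults of 10 seconds for NAUTOUP and 30 seconds for BDFLUSHR, which is one combination consistent with the 25-second figure; check the values configured on your own system.

```python
def average_flush_delay(nautoup, bdflushr):
    """Average seconds a delayed-write buffer stays in memory before being flushed."""
    return nautoup + bdflushr / 2

# Assumed defaults: NAUTOUP = 10 seconds, BDFLUSHR = 30 seconds.
print(average_flush_delay(10, 30))   # 25.0
```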

If applications tend to write to the same
blocks on a time scale that is greater than this,
the same buffers will be flushed to disk more
often. If applications append to files
but do not modify existing buffers, the write hit rate will be
low and the newly written blocks will tend to remove possibly
useful buffers from the cache.
If you are running such applications on your system,
increasing the buffer cache size may adversely affect system
performance whenever the buffer flushing daemon runs.
When this happens,
applications may appear to stop working temporarily (hang)
although most keyboard input will continue to be echoed to the screen.
Applications such as vi(C) and telnet(TC),
which process keyboard input in user mode, may
appear to stop accepting keystrokes.
The kernel suspends the activity of all
user processes until the flushing daemon has written the delayed-write
buffers to disk. On a large buffer cache, this could take several seconds.
To improve this situation, spread out the disk activity
over time in the following ways:

Decrease the value of BDFLUSHR so that the flushing daemon runs
more often. This will reduce the peak demand on disk I/O at the
possible expense of a slight increase in context switching activity.

Decrease the value of NAUTOUP so that fewer delayed-write buffers
accumulate in the cache. Potentially useful data remains in the buffers
that have been marked clean until they are reused. Do not reduce
NAUTOUP too much or caching may become ineffective.

Use caching disk controllers (with battery backup if you are concerned
about the integrity of your data).

Some applications such as database management systems provide their own
buffer caching strategy. This usually operates through the raw disk device
and so does not use the operating system buffer cache.

You cannot independently tune the read and write hit rates
(%rcache and %wcache).
If the number of kilobytes of data read per second into the buffer
cache from disk (bread/s) is much higher than
the number written to disk (bwrit/s), you should
attach more significance to the value of %rcache.
On most systems, you will find that there is more data read
from than written to disk.
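
One way to judge which hit rate deserves more attention is to look at the share of physical transfers that are reads. The sketch below uses a hypothetical helper, read_share, with invented figures. It is an illustration only, not a measure reported by sar.

```python
def read_share(bread_per_s, bwrit_per_s):
    """Fraction of cache-to-disk traffic that is reads, or None if there is no traffic."""
    total = bread_per_s + bwrit_per_s
    if total == 0:
        return None
    return bread_per_s / total

share = read_share(300.0, 100.0)   # invented figures in KB per second
print(share)                       # 0.75
```

A share well above 0.5 suggests that %rcache is the more significant of the two hit rates.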

Increasing the value of NBUF has most effect for low
cache hit rates. For high cache hit rates, the hit rates start
to level off (saturate) as NBUF grows, and you need a large
increase in NBUF to produce a small increase in the hit rate.
For example, to increase the read hit rate (%rcache)
from 90% to 95%, a relative increase of 5.6%,
you might need to double the value of NBUF. Although
the read hit rate increases by only 5.6%, the amount of data
that needs to be read from disk has been reduced by 50%.
If disk I/O is a problem and your system is also
not short of memory, you may consider it worthwhile
to increase the size of the buffer cache.
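
The arithmetic in this example is easier to follow in terms of the miss rate, which falls from 10% to 5%. The following sketch reproduces both figures. The function names are hypothetical and are used only for this illustration.

```python
def relative_increase(old_hit, new_hit):
    """Relative increase in the hit rate, as a percentage of the old hit rate."""
    return (new_hit - old_hit) / old_hit * 100

def disk_read_reduction(old_hit, new_hit):
    """Percentage reduction in data read from disk, derived from the miss rates."""
    old_miss = 100 - old_hit
    new_miss = 100 - new_hit
    return (old_miss - new_miss) / old_miss * 100

print(round(relative_increase(90, 95), 1))   # 5.6
print(disk_read_reduction(90, 95))           # 50.0
```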

If your system has a large amount of memory and shows no
swapping or significant paging out activity at peak load,
you may wish to try increasing the size of the buffer cache.
Provided that you do not allocate too much memory to buffers
(so causing the system to page out and swap), this should
reduce I/O activity and improve the interactive
performance of applications.
You should do this as an iterative
process while monitoring the buffer cache hit rate and the
amount of physical memory available to user processes.

If the amount of free memory drops drastically and the
system begins to page out and swap, you should
reduce the size of the buffer cache.
See
``Tuning memory resources''
for more information.

Overriding the size of the buffer cache at boot time

You can use the nbuf bootstring to set a different size for
the buffer cache when the system is booted.
The value supplied as the argument to nbuf
overrides the value of NBUF configured into the kernel.
For example, the following command to boot(HW)
sets the buffer cache size to 150KB in addition to using the
default bootstring:

defbootstr nbuf=150

If NHBUF is set to 0, the number of hash queues
will automatically be adjusted for the new buffer cache size.