At this point, when all the tuning options have been exhausted, there
are only two alternatives: reduce the demand for memory by modifying
the work load, or add memory to the system.

The cost to add memory to a system has decreased significantly over
time. This trend will likely continue.

For many modern systems, adding memory is the most cost effective way
to address performance problems. For older systems, the cost of adding
memory may be significantly higher than for newer systems, but it may
still be less than the cost of a system manager performing many hours
of system analysis and tuning, and the additional time it may take to
achieve better performance. All relevant costs need to be taken into
account when deciding if working with the existing hardware will be
less expensive than adding hardware.

If you conclude you need to add memory, you must then determine how
much to add. Add as much memory as you can afford. If you need to
establish the amount more scientifically, try the following empirical
technique:

Determine or estimate a paging rate you believe would represent a
tolerable level of paging on the system. (If many applications share
memory, make allowances for global valid faults by deducting the global
valid fault rate from the total page fault rate.)

Turn off swapper trimming (set SWPOUTPGCNT to the maximum value
found for WSQUOTA).

Give the processes large enough working set quotas so that you
achieve the tolerable level of paging on the system while it is under
load.

The amount of memory required by the processes that are outswapped
represents an approximation of the amount of memory your system would
need to obtain the desired performance under load conditions.
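
A minimal DCL sketch of the second and third steps (turning off swapper
trimming and raising working set quotas) might look like the following;
the parameter value, user name, and quota are hypothetical, and changes
made to the active parameter set last only until the next reboot:

   $ RUN SYS$SYSTEM:SYSGEN
   SYSGEN> USE ACTIVE                  ! work on the active parameter set
   SYSGEN> SET SWPOUTPGCNT 16384       ! largest WSQUOTA in use (hypothetical)
   SYSGEN> WRITE ACTIVE                ! turns off swapper trimming
   SYSGEN> EXIT
   $ RUN SYS$SYSTEM:AUTHORIZE
   UAF> MODIFY SMITH /WSQUOTA=16384    ! hypothetical user and quota
   UAF> EXIT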

Once you add memory to your system, be sure to invoke AUTOGEN so that
new parameter values can be assigned on the basis of the increased
physical memory size.
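
For example, the following command runs AUTOGEN from the data
collection phase through reboot; the CHECK_FEEDBACK option lets AUTOGEN
judge whether existing feedback data is still valid after the
configuration change:

   $ @SYS$UPDATE:AUTOGEN GETDATA REBOOT CHECK_FEEDBACK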

If you identify certain disks as good candidates for improvement, check
for excessive use of the disk resource by one or more processes. The
best way to do this is to use the MONITOR playback feature to obtain a
display of the top direct I/O users during each collection interval.
The direct I/O operations reported by MONITOR include all user disk I/O
as well as direct I/O to other device types. In many cases, disk
I/O represents the vast majority of direct I/O activity on OpenVMS
systems, so you can use this technique to obtain information on
processes that might be generating excessive disk I/O activity.
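
For example, assuming a previously recorded data file named
MONITOR.DAT (a hypothetical name), the following command plays back the
top direct I/O users one collection interval at a time:

   $ MONITOR /INPUT=MONITOR.DAT /VIEWING_TIME=1 PROCESSES /TOPDIO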

If it appears that one or two processes are consistently the top direct
I/O users, you may want to obtain more information about which images
they are running and which files they are using. Because this
information is not recorded by MONITOR, it can be obtained in any of
the following ways:

Run MONITOR in live mode and enter DCL SHOW commands when the
situation recurs.

When you have identified a process that consistently issues a
significant number of direct I/O requests, use the SHOW
PROCESS/CONTINUOUS DCL command to look for more information about the
process and the image being run.

In addition, you can use the SHOW DEVICE/FILES command to display all
open files on particular disk volumes. Knowing the names of heavily
used files is important when you perform the offloading and
load-balancing operations. (For more information, see Sections
12.1.3 and 12.1.4.)
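
For example (the process identification and device name are
hypothetical):

   $ MONITOR PROCESSES /TOPDIO
   $ SHOW PROCESS /CONTINUOUS /ID=2040011A
   $ SHOW DEVICE /FILES DUA1: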

The system uses the disk I/O subsystem for three activities: paging,
swapping, and XQP operations. This kind of disk I/O is a good place to
start when setting out to trim disk I/O load. All three types of system
I/O can be reduced readily by offloading to memory. Swapping I/O is a
particularly data-transfer-intensive operation, while the other types
tend to be more seek-intensive.

Page Read I/O Rate, also known as the hard fault rate,
is the rate of read I/O operations necessary to satisfy page faults.
Since the system attempts to cluster several pages together whenever it
performs a read, the number of pages actually read will be greater than
the hard fault rate. The rate of pages read is given by the Page Read
Rate.

Use the following equation to compute the average transfer size (in
bytes) of a page read I/O operation (the factor of 512 assumes that
the rates are expressed in 512-byte pagelet units):

   average transfer size = (Page Read Rate / Page Read I/O Rate) x 512

Most page faults are soft faults. Such faults require
no disk I/O operation, because they are satisfied by mapping to a
global page or to a page in the secondary page cache (free-page list
and modified-page list). An effectively functioning cache is important
to overall system performance. A guideline that may be applied is that
the rate of hard faults---those requiring a disk I/O
operation---should be less than 10% of the overall page fault rate,
with the remaining 90% being soft faults. Even if the hard fault rate
is less than 10%, you should try to reduce it further if it represents
a significant fraction of the disk I/O load on any particular node or
individual disk (see Section 7.2.1.2).
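
A rough way to make this check from a MONITOR display is sketched
below; the 10% threshold is the guideline stated above:

   $ MONITOR PAGE /AVERAGE
   $ ! hard fault percentage = (Page Read I/O Rate / Page Fault Rate) x 100
   $ ! values above 10% of the overall fault rate warrant further attention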

Note that the number of hard faults resulting from image activation can
be reduced only by curtailing the number of image activations or by
exercising LINKER options such as /NOSYSSHR (to reduce image
activations) and reassignment of PSECT attributes (to increase the
effectiveness of page fault clustering).

This guideline is provided to direct your attention to a potentially
suboptimal configuration parameter that may affect the overall
performance of your system. The nature of your system may make this
objective unachievable or render changes to the parameter ineffective.
Upon investigating the secondary page cache fault rate, you may
determine that the secondary page cache size is not the only limiting
factor. Manipulating the size of the cache may not affect system
performance in any measurable way. This may be due to the nature of the
workload, or bottlenecks that exist elsewhere in the system. You may
need to upgrade memory, the paging disk, or other hardware.

The Page Write I/O Rate represents the rate of disk I/O operations to
write pages from the modified-page list to backing store (paging and
section files). As with page read operations, page write operations are
clustered. The rate of pages written is given by the Page Write Rate.

Use the following equation to compute the average transfer size (in
bytes) of a page write I/O operation (again assuming 512-byte pagelet
units):

   average transfer size = (Page Write Rate / Page Write I/O Rate) x 512

The frequency with which pages are written depends on the page
modification behavior of the work load and on the size of the
modified-page list. In general, a larger modified-page list needs to be
written less often than a smaller one.

Swapping I/O should be kept as low as possible. The Inswap Rate item of
the I/O class lists the rate of inswap I/O operations. In typical
cases, each inswap is accompanied by a corresponding outswap operation.
Try to keep the inswap rate no greater than 1. This is not to say that
swapping should always be eliminated; swapping, as implemented by the
active memory reclamation policy, is desirable for forcing inactive
processes out of memory.

Swap I/O operations are very large data transfers; they can cause
device and channel contention problems if they occur too frequently.
Enter the DCL command SHOW MEMORY/FILES/FULL to list the swapping files
in use. If you have disk I/O problems on the channels servicing the
swapping files, attempt to reduce the swap rate. (Refer to
Section 11.13 for information about converting to a system that rarely
swaps.)
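
For example, to check the current inswap rate and locate the installed
page and swap files:

   $ MONITOR IO /AVERAGE          ! Inswap Rate appears in the I/O class display
   $ SHOW MEMORY /FILES /FULL     ! lists the paging and swapping files in use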

To determine the rate of I/O operations issued by the XQP on a nodewide
basis, do the following:

Add the Disk Read Rate and Disk Write Rate items of the FCP class
for each node.

Compare this number to the sum of the I/O Operation Rate figures
for all disks on that same node. If the XQP I/O rate represents a
significant fraction of the disk I/O on that node, attempt to make
improvements by addressing one or more of the following three sources
of XQP disk I/O operations: cache misses, erase operations, and
fragmentation.
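
For example, the following command displays both classes together so
that the FCP disk read and write rates can be compared with the
per-disk I/O operation rates:

   $ MONITOR FCP, DISK /AVERAGE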

Check the FILE_SYSTEM_CACHE class for the level of activity (Attempt
Rate) and Hit Percentage for each of the seven caches maintained by the
XQP. The categories represent types of data maintained by the XQP on
all mounted disk volumes. When an attempt to retrieve an item from a
cache misses, the item must be retrieved by issuing one or more disk
I/O requests. It is therefore important to supply memory caches large
enough to keep the hit percentages high and disk I/O operations low.
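
For example:

   $ MONITOR FILE_SYSTEM_CACHE /AVERAGE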

Cache sizes are controlled by the ACP/XQP system parameters. Data items
in the FILE_SYSTEM_CACHE display correspond to ACP/XQP parameters as
follows:

   FILE_SYSTEM_CACHE Item    ACP/XQP Parameters
   ----------------------    --------------------------
   Dir FCB                   ACP_SYSACC, ACP_DINDXCACHE
   Dir Data                  ACP_DIRCACHE
   File Hdr                  ACP_HDRCACHE
   File ID                   ACP_FIDCACHE
   Extent                    ACP_EXTCACHE, ACP_EXTLIMIT
   Quota                     ACP_QUOCACHE
   Bitmap                    ACP_MAPCACHE

The values determined by AUTOGEN should be adequate. However, if hit
percentages are low (less than 75%), you should increase the
appropriate cache sizes (using AUTOGEN), particularly when the attempt
rates are high.

If you decide to change the ACP/XQP cache parameters, remember to
reboot the system to make the changes effective. For more information
on these parameters, refer to the OpenVMS System Management Utilities Reference Manual.
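
For example, to enlarge the file header cache, you might add a line
such as the following (the value is hypothetical) to
SYS$SYSTEM:MODPARAMS.DAT and then run AUTOGEN through the reboot phase:

   MIN_ACP_HDRCACHE = 600

   $ @SYS$UPDATE:AUTOGEN GETDATA REBOOT FEEDBACK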

If your system is running with the default HIGHWATER_MARKING attribute
enabled on one or more disk volumes, check the Erase Rate item of the
FCP class. This item represents the rate of erase I/O requests issued
by the XQP to support the high-water marking feature. If you did not
intend to enable this security feature, see Section 2.2 for
instructions on how to disable it on a per-volume basis.
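
For example, the following command disables high-water marking on a
mounted volume (the device name is hypothetical):

   $ SET VOLUME /NOHIGHWATER_MARKING DUA1: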

When a disk becomes seriously fragmented, it can cause additional XQP
disk I/O operations and consequent elevation of the disk read and disk
write rates. You can restore contiguity for badly fragmented files by
using the Backup (BACKUP) and Convert (CONVERT) utilities, the
COPY/CONTIGUOUS DCL command, or the Compaq File Optimizer for OpenVMS,
an optional software product. It is a good performance management
practice to do the following:

Perform image backups of all disks periodically, using the output
disk as the new copy. BACKUP consolidates allocated space on the new
copy, eliminating fragmentation.

Test individual files for fragmentation by entering the DCL command
DUMP/HEADER to obtain the number of file extents; an example follows
this list. The fewer the extents, the lower the level of fragmentation.

Pay particular attention to heavily used indexed files, especially
those from which records are frequently deleted.

Use the Convert utility (CONVERT) to reorganize the index file
structure.
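
The following command displays the header of a hypothetical file
without dumping its contents; the map area of the display lists the
file's extents:

   $ DUMP /HEADER /BLOCKS=(COUNT:0) DISK$DATA:[APP]CUSTOMER.IDX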

To avoid excessive disk I/O, enable RMS local and global buffers at the
file level. This allows processes to share data in file caches, which
reduces the total memory requirement and the I/O load for data already
in memory.

Global buffering is enabled on a per-file basis with the SET
FILE/GLOBAL_BUFFER=n DCL command. You can also set systemwide default
values for RMS through the SET RMS_DEFAULT
command and check current values with the SHOW RMS_DEFAULT command. For more
information on these commands, refer to the OpenVMS DCL Dictionary. Background
material on this topic is available in the Guide to OpenVMS File Applications.

Note that file buffering can also be controlled programmatically by
applications (see the description of XAB$_MULTIBUFFER_COUNT in the
OpenVMS Record Management Services Reference Manual). Therefore, your DCL command settings may be overridden.
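
The following commands illustrate the preceding suggestions (the file
name, buffer count, and block count are hypothetical):

   $ SET FILE /GLOBAL_BUFFER=100 DISK$DATA:[APP]CUSTOMER.IDX
   $ SET RMS_DEFAULT /SYSTEM /BLOCK_COUNT=32   ! default multiblock count
   $ SHOW RMS_DEFAULT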

Install frequently used images to save memory and decrease the
number of I/O operations required during image activation; example
commands follow this list. (See Section 2.4.)

Decompress library files (especially HELP files) to decrease the
number of I/O operations and reduce the CPU time required for library
operations. Users will experience faster response to DCL HELP commands.
(See Section 2.1.)

Use global data buffers (if your system has sufficient memory) for
the following system files: VMSMAIL_PROFILE.DATA, SYSUAF.DAT, and
RIGHTSLIST.DAT.

Tune applications to reduce the number of I/O requests by improving
their buffering strategies. However, you should make sure that you have
adequate working sets and memory to support the increased buffering.
This approach will decrease the number of accesses to the volume at the
expense of additional memory requirements to run the application.
The following are suggestions of particular interest to application
programmers:

Read or write more data per I/O operation.

For sequential files, increase the multiblock count to move more
data per I/O operation while maintaining proper process working set
sizes.

Turn on deferred write for sequential access to indexed and
relative files; an I/O operation then occurs only when a bucket is
full, not on each $PUT. For example, without deferred write enabled, 10
$PUTs to a bucket that holds 10 records require 10 I/O operations. With
deferred write enabled, the 10 $PUTs require only a single I/O
operation.

Enable read ahead/write behind for sequential files. This provides
for the effective use of the buffers by allowing overlap of I/O and
buffer processing.

Given ample memory on your system, consider having a deeper index
tree structure with smaller buckets, particularly with shared files.
This approach sometimes reduces the amount of search time required for
buckets and can also reduce contention for buckets in high-contention
index file applications.

For indexed files, try to cache the entire index structure in
memory by manipulating the number and size of buckets.

If it is not possible to cache the entire index structure, you may
be able to reduce the index depth by increasing the bucket size. This
will reduce the number of I/O operations required for index information
at the expense of increased CPU time required to scan the larger
buckets.
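
The following commands illustrate three of the earlier suggestions:
installing a frequently used image, decompressing the system libraries,
and enabling global buffers on a frequently accessed system file (the
image name and buffer count are hypothetical):

   $ INSTALL ADD SYS$SYSTEM:MYAPP.EXE /OPEN /HEADER_RESIDENT /SHARED
   $ @SYS$UPDATE:LIBDECOMP.COM
   $ SET FILE /GLOBAL_BUFFER=200 SYS$SYSTEM:SYSUAF.DAT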

The objective of disk I/O load balancing is to minimize contention for
use of the following resources:

Disk heads available to perform seek operations

Channels available to perform data transfer operations

You can accomplish that objective by moving files from one disk to
another or by reconfiguring the assignment of disks to specific
channels.

Contention causes increased response time and, ultimately, increased
blocking of the CPU. In many systems, contention (and therefore
response time) for some disks is relatively high, while for others,
response time is near the achievable values for disks with no
contention. By moving some of the activity on disks with high response
times to those with low response times, you will probably achieve
better overall response.

Use the guidelines in Section 8.2 to identify disks with excessively
high response times that are at least moderately busy and attempt to
characterize them as mainly seek intensive or data-transfer intensive.
Then use the following techniques to attempt to balance the load by
moving files from one disk to another or by moving an entire disk to a
different physical channel:

Distribute seek-intensive activity evenly among the disks available
for that purpose.

Distribute data-transfer-intensive activity evenly among the disks
available for that purpose (on separate channels where possible).

Note

When using Array Controllers (HSC, HSJ, HSZ, or other network or RAID
controllers), the channels on the controller should also be balanced.
You can use the controller console to obtain information on the
location of the disks.

To move files from one disk to another, you must know, in general, what
each disk is used for and, in particular, which files are ones for
which large transfers are issued. You can obtain a list of open files
on a disk volume by entering the SHOW DEVICE/FILES DCL command.
However, because the system does not maintain transfer-size
information, your knowledge of the applications running on your system
must be your guide.

Use search lists to move read-only files, such as images, to
different disks. This technique is not well suited for write operations
to the target device, because the write will take place to the first
volume/directory for which you have write access.

Define volume sets to distribute access to files requiring read and
write access. This technique is particularly helpful for applications
that perform many file create and delete operations, because the file
system will allocate a new file on the volume with the greatest amount
of free space.

Move paging and swapping activity off the system disk by creating,
on a less heavily utilized disk, secondary page and swap files that
are significantly larger than the primary ones on the system disk; a
sketch follows this list. This technique is particularly important for
a shared system disk in an OpenVMS Cluster, which tends to be very busy.

Move frequently accessed files off the system disk. Use logical
names or, where necessary, other pointers to access them. (See
Section 2.6 for a list of frequently accessed system files.) This
technique is particularly effective for a shared system disk in an
OpenVMS Cluster.
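
The following SYSGEN sketch creates and installs a secondary page file
on another disk (the device, directory, and size are hypothetical); add
a matching INSTALL command to SYS$MANAGER:SYPAGSWPFILES.COM so the file
is installed again at each reboot:

   $ RUN SYS$SYSTEM:SYSGEN
   SYSGEN> CREATE DUA2:[SYSEXE]PAGEFILE1.SYS /SIZE=500000
   SYSGEN> INSTALL DUA2:[SYSEXE]PAGEFILE1.SYS /PAGEFILE
   SYSGEN> EXIT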

All the tuning solutions for performance problems based on I/O
limitations involve using memory to relieve the I/O subsystem. The five
most accessible mechanisms are the Virtual I/O or extended file cache,
the ACP caches, RMS buffering, file system caches, and RAM disks.

The virtual I/O cache (VIOC) is a clusterwide, write-through,
file-oriented disk cache that can reduce the number of disk I/O
operations and increase performance. The virtual I/O cache increases
system throughput by reducing file I/O response times with minimum
overhead. The virtual I/O cache operates transparently to system
management and application software, and maintains system reliability
while it significantly improves virtual disk I/O read performance.

The Extended File Cache (XFC) is a virtual block data cache provided
with OpenVMS Alpha Version 7.3 as a replacement for the Virtual I/O
Cache. Similar to the Virtual I/O Cache, the XFC is a clusterwide, file
system data cache.

Both file system data caches are compatible and coexist in an OpenVMS
Cluster. You can use only one cache (XFC or VIOC) on each node. XFC is
available only on Alpha systems.
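
On OpenVMS Alpha Version 7.3, the VCC_FLAGS system parameter determines
which cache is loaded at boot time. The following MODPARAMS.DAT line is
a sketch; verify the values for your release:

   VCC_FLAGS = 2     ! 2 selects XFC; 1 selects VIOC; 0 disables file caching

After editing MODPARAMS.DAT, run AUTOGEN and reboot so that the new
setting takes effect.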