Tuning HADB

The Application Server uses the high-availability database
(HADB) to store persistent session state data. To optimize performance, tune the HADB
according to the load of the Application Server. The data volume, transaction frequency,
and size of each transaction can affect the performance of the HADB, and consequently
the performance of the Application Server.

Disk Use

This section discusses how to calculate HADB data device size and explains the
use of separate disks for multiple data devices.

Calculating HADB Data Device Size

When creating the HADB database, specify the number and size of its data
devices. These devices must have room for all the user data to be stored. In addition,
allocate extra space to account for internal overhead, as discussed in the following section.

If the database runs out of device space, HADB returns error code 4593
or 4592 to the Application Server.

HADB also writes these
error messages to the history files. In this case, HADB blocks any client requests to
insert or update data. However, it still accepts delete operations.

HADB stores session states as binary data. It serializes the session state and
stores it as a BLOB (binary large object). It splits each BLOB into chunks of approximately
7KB and stores each chunk as a database row (in this context, row is synonymous with tuple
or record) in 16KB pages.

There is a small storage overhead for each row (approximately 30 bytes). With
the most compact allocation of rows (BLOB chunks), two rows are stored per page.
Internal fragmentation can result in each page containing only one row, so on average,
50% of each page contains user data.
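The arithmetic above can be sketched as follows; the chunk size (7KB), page size (16KB), and 50% average page utilization come from the text, while the session size is a made-up example:

```shell
# Estimate the row count and on-disk footprint for one serialized session state.
# SESSION_BYTES is a hypothetical example; chunk and utilization figures are
# the ones given above (~7KB chunks, ~50% of each 16KB page is user data).
SESSION_BYTES=50000
CHUNK_BYTES=7000

# Ceiling division: number of ~7KB rows (BLOB chunks) for this session
ROWS=$(( (SESSION_BYTES + CHUNK_BYTES - 1) / CHUNK_BYTES ))

# With 50% average page utilization, the footprint is roughly twice
# the raw session data volume.
FOOTPRINT_BYTES=$(( SESSION_BYTES * 2 ))

echo "rows=$ROWS footprint_bytes=$FOOTPRINT_BYTES"
```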

For availability in case of node failure, HADB always replicates user data.
An HADB node stores its own data, plus a copy of the data from its mirror node. Hence,
all data is stored twice. Since 50% of the space on a node is user data (on average),
and each node is mirrored, the data devices must have space for at least four times
the volume of the user data.

In the case of data refragmentation, HADB keeps both the old and the new versions
of a table while the refragmentation operation is running. All application requests
are performed on the old table while the new table is being created. Assuming that
the database is primarily used for one huge table containing BLOB data for session
states, this means the device space requirement must be multiplied by another factor
of two. Consequently, if you add nodes to a running database, and want to refragment
the data to use all nodes, you must have eight times the volume of user data available.

Additionally, account for the device space that HADB reserves for its internal use (four times
the LogBufferSize). HADB uses this disk space for temporary
storage of the log buffer during high-load conditions.
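The space factors described above can be combined into a back-of-the-envelope estimate; the user-data volume and LogBufferSize below are made-up example values:

```shell
# Device-size estimate from the rules above:
#   x2 for average page utilization (~50% user data per page)
#   x2 for mirroring                -> 4x the raw user data
#   x2 again during refragmentation -> 8x
#   plus 4x LogBufferSize reserved for HADB internal use
USER_DATA_MB=2048     # hypothetical raw session-data volume
LOG_BUFFER_MB=48      # hypothetical LogBufferSize

MIN_DEVICE_MB=$(( USER_DATA_MB * 4 + LOG_BUFFER_MB * 4 ))
REFRAG_DEVICE_MB=$(( USER_DATA_MB * 8 + LOG_BUFFER_MB * 4 ))

echo "minimum=${MIN_DEVICE_MB}MB with_refragmentation=${REFRAG_DEVICE_MB}MB"
```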

Tuning Data Device Size

To increase the size of the HADB data devices, use the hadbm set command to raise the value of the DataDeviceSize attribute.

Placing HADB files on Physical Disks

For best performance,
data devices should be allocated on separate physical disks. This applies if there
are nodes with more than one data device, or if there are multiple nodes on the same
host.

Place devices belonging to different nodes on different disks. Doing this
is especially important on Red Hat AS 2.1, because HADB nodes have been observed
to wait for asynchronous I/O when the same disk holds devices belonging to more
than one node.

An HADB node writes information, warnings, and errors to the history file synchronously,
rather than asynchronously, as output devices normally do. Therefore, HADB behavior
and performance can be affected any time the disk waits when writing to the history
file. This situation is indicated by the following message in the history file:

BEWARE - last flush/fputs took too long

To avoid this problem, keep the HADB executable files and the history files
on physical disks different from those of the data devices.

Memory Allocation

It is essential to allocate sufficient memory for HADB,
especially when it is co-located with other processes.

The HADB Node
Supervisor Process (NSUP) tracks the time elapsed since it last performed
monitoring. If the time exceeds a specified maximum (2500 ms by default), NSUP restarts
the node. This situation is likely to occur when other processes in the system
compete for memory, causing swapping and multiple page faults. When the blocked node
restarts, all active transactions on that node are aborted.

If Application Server throughput slows and requests abort or time out, make sure
that swapping is not the cause. To monitor swapping activity on Unix systems, use
this command:

vmstat -S

In addition, look for the following message in the HADB history files. It is written
when the HADB node is restarted, where M is greater than N:

Process blocked for M sec, max block time is N sec

Aborted transactions are signaled by one of these error messages:

HADB00224: Transaction timed out

HADB00208: Transaction aborted

Performance

For best performance, all HADB processes (clu_xxx_srv) must
fit in physical memory. They should not be paged or swapped. The same applies to the
shared memory segments in use.

You can configure the size of some of the shared memory segments. If these segments
are too small, performance suffers, and user transactions are delayed or even aborted.
If the segments are too large, then the physical memory is wasted.

DataBufferPoolSize

The HADB stores data on data devices, which are allocated on disks. The data
must be in the main memory before it can be processed. The HADB node allocates a portion
of shared memory for this purpose. If the allocated database buffer is small compared
to the data being processed, then disk I/O will waste significant processing capacity.
In a system with write-intensive operations (for example, frequently updated session
states), the database buffer must be big enough that the processing capacity used
for disk I/O does not hamper request processing.

The database buffer is
similar to a cache in a file system. For good performance, the cache must be used
as much as possible, so there is no need to wait for a disk read operation. The best
performance is when the entire database contents fits in the database buffer. However,
in most cases, this is not feasible. Aim to have the “working set” of
the client applications in the buffer.

Also monitor the disk I/O. If HADB performs many disk read operations, this
means that the database is low on buffer space. The database buffer is partitioned
into blocks of size 16KB, the same block size used on the disk. HADB schedules multiple
blocks for reading and writing in one I/O operation.
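If monitoring shows a persistently high miss rate, the buffer can be enlarged by setting the DataBufferPoolSize attribute with hadbm set; the size shown below (in MB) is only an illustration, not a recommended value:

```shell
hadbm set DataBufferPoolSize=512
```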

Use the hadbm deviceinfo command to monitor disk use. For
example, hadbm deviceinfo --details produces output similar
to this:

NodeNo  TotalSize  FreeSize  Usage
0       512        504       1%
1       512        504       1%

The columns in the output are:

TotalSize: size of device in MB.

FreeSize: free size in MB.

Usage: percent used.

Use the hadbm resourceinfo command to monitor resource usage. For example, the following command displays
data buffer pool information:

hadbm resourceinfo --databuf

The columns in the output are:

Free: Free size, when the data volume is larger than the buffer. (The
entire buffer is used at all times.)

Access: Number of times blocks in the buffer have been accessed.

Misses: Number of block requests that “missed the cache”
(the user had to wait for a disk read).

Copy-on-write: Number of times a block has been modified while it
was being written to disk.

For a well-tuned system, the number of misses
(and hence the number of reads) must be very small compared to the number of accesses.
For example, 200 million accesses with 8 million misses is a miss rate of about 4%.
Whether such figures are acceptable depends on the client application requirements.
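As a worked example of the miss-rate arithmetic (the access and miss counts are the illustrative figures mentioned above):

```shell
# Miss rate as a percentage of buffer accesses (illustrative figures)
ACCESSES=200000000
MISSES=8000000
MISS_RATE_PCT=$(( MISSES * 100 / ACCESSES ))
echo "miss rate: ${MISS_RATE_PCT}%"
```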

LogBufferSize

HADB logs all record operations, such as inserting, deleting, updating, or reading
data, before it executes them. It places log records describing
the operations in a portion of shared memory referred to as the (tuple) log buffer.
HADB uses these log records to undo operations when transactions are aborted,
to recover in case of a node crash, and to replicate between mirror nodes.

The log records remain in the buffer until they have been processed locally and shipped
to the mirror node. The log records are kept until the outcome (commit or abort) of
the transaction is certain. If the HADB node runs low on tuple log space, user transactions
are delayed, and possibly timed out.

Tuning LogBufferSize

Begin with the default value. Look for HIGH LOAD informational
messages in the history files. All the relevant messages will contain tuple
log or simply log, and a description of the internal
resource contention that occurred.

Under normal operation, the log is reported as 70 to 80% full. This is because
space reclamation is lazy: HADB keeps as much data in
the log as possible, to aid recovery from a possible node crash.

Use the following command to display information on log buffer size and use:

hadbm resourceinfo --logbuf

For example, output might look like this:

Node No.  Avail  Free Size
0         44     42
1         44     42

The columns in the output are:

Node No.: The node number.

Avail: Size of the buffer, in MB.

Free Size: Free size, in MB, when the data volume is larger than the
buffer. The entire buffer is used at all times.

InternalLogbufferSize

The node internal log (nilog) contains information about
physical (as opposed to logical, row-level) operations at the local node, for example,
disk block allocations and deallocations, and B-tree block splits.
This buffer is maintained in shared memory, and is also checkpointed
to disk (to a separate log device) at regular intervals. The page size of this buffer
and the associated log device is 4096 bytes.

Large BLOBs necessarily allocate many disk blocks, and thus create a high load
on the node internal log. This is normally not a problem, since each entry in the nilog is small.

Tuning InternalLogbufferSize

Begin with the default value. Look out for HIGH LOAD informational
messages in the history files. The relevant messages contain nilog,
and a description of the internal resource contention that occurred.

Use the following command to display node internal log buffer information:

hadbm resourceinfo --nilogbuf

If the size of the nilog buffer is changed, the associated
log device (located in the same directory as the data devices) changes as well. The size
of the internal log buffer must equal the size of the internal log device; the
command hadbm set InternalLogbufferSize ensures this requirement.
It stops a node, increases the InternalLogbufferSize, reinitializes
the internal log device, and brings the node back up. This sequence is performed on all
nodes.

NumberOfLocks

Each row-level operation requires a lock in the database. Locks are held until a transaction commits or rolls back.
Locks are set at the row (BLOB chunk) level, which means that a large session state
requires many locks. Locks are needed for both primary and mirror node operations.
Hence, a BLOB operation allocates the same number of locks on two HADB nodes.

When a table refragmentation is performed, HADB needs extra lock resources.
Thus, ordinary user transactions can only acquire half of the locks allocated.

Calculating the number of locks

To calculate the number of locks needed, estimate the following parameters:

Number of concurrent users that request session data to be stored
in HADB (one session record per user)

Maximum size of the BLOB session

Persistence scope (max session data size in case of session/modified
session and maximum number of attributes in case of modified session). This requires setAttribute() to be called every time the session data is modified.

If:

x is the maximum number of concurrent users,
that is, x session data records are present in the HADB, and

y is the maximum size of the session data (the BLOB), in bytes,

then record operations such as insert, delete, update, and read use one lock
per record.

Note –

Locks are held for both primary records and hot-standby records. Hence,
for insert, update and delete operations a transaction will need twice as many locks
as the number of records. Read operations need locks only on the primary records.
During refragmentation and creation of secondary indices, log records for the involved
table are also sent to the fragment replicas being created. In that case, a transaction
needs four times as many locks as the number of involved records. (Assuming all queries
are for the affected table.)

Summary

If refragmentation is performed, the number of locks to be configured is:

Nlocks = 4x (y/7000
+ 2) = 2xy/3500 + 8x

Otherwise, the number of locks to be configured is:

Nlocks = 2x (y/7000
+ 2) = xy/3500 + 4x
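The two formulas can be evaluated as follows; x and y are the quantities defined above, and the values used here are made-up examples:

```shell
# NumberOfLocks estimate from the formulas above.
X=1000       # hypothetical max concurrent users (session records)
Y=14000      # hypothetical max session state size in bytes

CHUNKS=$(( Y / 7000 ))                    # ~7000-byte rows per session

NLOCKS_NORMAL=$(( 2 * X * (CHUNKS + 2) )) # no refragmentation
NLOCKS_REFRAG=$(( 4 * X * (CHUNKS + 2) )) # with refragmentation

echo "normal=$NLOCKS_NORMAL refragmentation=$NLOCKS_REFRAG"
```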

Tuning NumberOfLocks

Start with the default value. Look for exceptions with the indicated error codes
in the Application Server log files. Remember that under normal operations (no ongoing
refragmentation) only half of the locks might be acquired by the client application.

To get information on allocated locks and locks in use, use the following command:

hadbm resourceinfo --locks

For example, the output displayed by this command might look something like
this:

Node No.  Avail  Free   Waits
0         50000  50000  na
1         50000  50000  na

Avail: Number of locks available.

Free: Number of locks currently free (not in use).

Waits: Number of transactions that have waited for a lock; “na”
(not applicable) if all locks are available.

Timeouts

This section describes some of the timeout
values that affect performance.

JDBC connection pool timeouts

These values govern how much time the server waits for a connection from the
pool before it times out. In most cases, the default values work well. For detailed
tuning information, see Tuning JDBC Connection Pools.

Load Balancer timeouts

Some values that may affect performance are:

response-timeout-in-seconds: The time for which the load balancer
plug-in waits for a response before it declares an instance dead and fails over
to the next instance in the cluster. Make this value large enough to accommodate the
maximum latency for a request from the server instance under the worst (high load)
conditions.

health checker interval-in-seconds: Determines how frequently the
load balancer pings an instance to see if it is healthy. The default value is 30 seconds.
If response-timeout-in-seconds is optimally tuned, and the server doesn’t
have too much traffic, the default value works well.

health checker timeout-in-seconds: How long the load balancer waits
for a response after “pinging” an instance. The default value is 100 seconds.

The combination of the health checker’s interval-in-seconds and timeout-in-seconds
values determine how much additional traffic goes from the load balancer plug-in to
the server instances.

HADB timeouts

Operating System Configuration

This section describes operating system configuration.

Semaphores

If the number of semaphores
is too low, HADB can fail and display this error message:

No space left on device

This can occur either while starting the database, or during run time. Since
the semaphores are provided as a global resource by the operating system, the configuration
depends on all processes running on the host, and not the HADB alone. In Solaris,
configure the semaphore settings by editing the /etc/system file.

To run NNODES nodes (the number of nodes given implicitly through the --hosts option to HADB) and NCONNS connections per host (HADB configuration parameter NumberOfSessions, default value 100), the semaphore settings must be sized accordingly.

If you plan to run multiple nodes per host, make sure semmap = NNODES. Use the sysinfo and sysdef commands
to inspect the settings.
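The required entries go in /etc/system. The parameter names below are the standard Solaris semaphore tunables, but the values are assumptions shown for illustration only; derive real values from NNODES and NCONNS for your deployment:

```
* Illustrative values only -- size these from NNODES and NCONNS
set semsys:seminfo_semmap=16
set semsys:seminfo_semmni=16
set semsys:seminfo_semmns=128
set semsys:seminfo_semmnu=1000
```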

Shared Memory

Set the maximum shared
memory size to the total amount of physical RAM. Additionally, set the maximum number
of shared memory segments per process to six or more to accommodate the HADB processes.
Set the number of system-wide, shared memory identifiers based on the number of nodes
running on the host.

Solaris

In Solaris 9, because of
kernel changes, the shmsys:shminfo_shmseg variable is obsolete.
In Solaris 8, set the shared memory parameters by adding shmsys:shminfo settings to the /etc/system file.
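For illustration only (the parameter names are the standard Solaris shared memory tunables; the values are assumptions, with shmseg set to six per the guidance above), the entries take this form:

```
* Illustrative values only -- set shmmax from the host's physical RAM
set shmsys:shminfo_shmmax=0xffffffff
set shmsys:shminfo_shmseg=6
set shmsys:shminfo_shmmni=16
```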