Upgrading the High Availability Database

Pre-upgrade Tasks/Data Migration

Before You Begin

Users should keep the HADB history files, management agent configuration
files, log files and repository, and all the data devices outside the installation
path. If any of these reside inside the installation path, move them out before
the upgrade. To move the management repository and configuration files:

Stop all the old management agents and keep the HADB nodes running.

On each host, move the repository directory to the new location.

On each host, copy the dbconfig directory
to the new location.

On each host, update the mgt.cfg file, setting
the correct paths for the dbconfig and repository directories.

Start the management agents using the updated mgt.cfg file.
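
For example, a minimal sketch of the move and copy steps on one host (all
paths are illustrative and must be adapted to the actual installation):

mv /opt/SUNWhadb/4.4/rep /var/opt/SUNWhadb/rep
cp -r /opt/SUNWhadb/4.4/dbconfig /var/opt/SUNWhadb/dbconfig

Afterwards, update the repository and dbconfig paths in mgt.cfg as described
above.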

Upgrade Procedure

To upgrade from HADB version 4.4.x to version 4.4.2-7, perform the following
steps:

Perform the pre-upgrade tasks mentioned above as necessary.

Install HADB version 4.4.2-7 on all HADB hosts (on a different path
from that of version 4.4.x, for instance /opt/SUNWhadb/4.4.2-7).

Install the HADB 4.4.2-7 version on the hadbm client
hosts, if they are different from the HADB hosts.

Stop all management agents running on all HADB hosts.

Start the management agent processes using the new version's software,
but with the old configuration files. In the remaining steps, use the hadbm command found in the new version's bin directory.

Register the package in the management domain (default package
name becomes V4.4, so another package name may be required to avoid conflicts
with existing packages having the same name):

hadbm registerpackage --packagepath=/opt/SUNWhadb/4.4.2-7 V4.4.2-7

Run the hadbm listpackages command and check
that the new package is registered in the domain.
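
For example, from the new version's bin directory:

hadbm listpackages

The newly registered package (here V4.4.2-7) should appear in the list.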

Restart the database with the new hadbm version
4.4.2-7. If the devices and history files must be moved, run the online
upgrade combined with setting new paths for the devices and history files in one
single operation, as sketched below.
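
A sketch of that combined operation, assuming the devicepath and historypath
database attributes (the attribute names, paths, and database name are
illustrative; verify them against the hadbm documentation):

hadbm set packagename=V4.4.2-7,devicepath=/new/device/path,historypath=/new/history/path database_name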

Otherwise, if the devices and history files are already outside of the
installation directory, run the following command, which only does a rolling
restart of the nodes:

hadbm set packagename=V4.4.2-7 database_name

Check that the database status is “running” (using
the hadbm status command) and that it functions normally,
serving client transactions.
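
For example (the database name is illustrative):

hadbm status -n database_name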

If everything is working, the old installation can be removed
later. Before unregistering the old package, remove all references to the
old package from the ma repository; otherwise, hadbm
unregisterpackage will fail with “package in use.” A
dummy reconfiguration operation, for instance, hadbm set connectiontrace=<same_as_previous_value>, removes all references
to the old package.
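
For instance, assuming connectiontrace is currently false (the value and
the database name are illustrative; read the current value first):

hadbm get connectiontrace database_name
hadbm set connectiontrace=false database_name

Now, unregister the old package: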

hadbm unregisterpackage [--hosts=host-list] old_package_name

Remove the old installation from the file system.

Testing the Upgrade

On Solaris, test that the upgrade was performed properly, as follows:

Ensure that the running processes use the new binaries. Check
the following on all HADB nodes:

new-path/bin/ma -v
new-path/bin/hadbm -v

Check whether the database is running. The following command should
show that all the HADB nodes are in a “running” state.

new-path/bin/hadbm status -n

Ensure that the products using HADB have been updated
to point to the new HADB path.

The products using HADB can run their own upgrade tests to verify
that the HADB upgrade is working.

After an online upgrade, if
the new version does not work properly, you can go back to using the previous HADB
version. However, if there has been a change to the management agent repository,
only the HADB itself can be downgraded; the new management agent must be kept
running.

Special Deployment and Upgrade Information

This section lists additional information about HADB deployment and
upgrading.

Deployment

Store device, log, and history files on local disks only; do
not use remote-mounted file systems.

If more than one node is placed on a host, it is recommended
to keep the devices belonging to each node on different disks. Otherwise,
disk contention will reduce performance. Symptoms of this problem
appear in the history files as messages such as BEWARE
- last flush/fputs took too long. When a single node has more
than one data device file, it is recommended to use separate disks for these
device files.

Use local disks (preferably a disk separate from the one used
for data devices) to install HADB binaries on HADB hosts. NFS delays or disk
contention may cause node restarts with the warning Process blocked
for nnn, max block time is nnn in the history files.

Do not place the HADB devices, history files, management agent
directories and agent configuration files in the HADB package path. This will
cause problems when upgrading to newer versions and deleting the old package
path.

This release of HADB is officially supported for a maximum
of 28 nodes: 24 active data nodes plus 4 spares.

We recommend using the same version for the JDBC driver and
the HADB server.

We do not support IPv6, only IPv4.

The command line length on Windows is restricted to 2048 bytes.

The network must be configured for UDP multicast.

Due to excessive swapping observed in Red Hat Enterprise Linux
3.0, updates 1 through 3, we do not recommend it as a deployment platform.
The problem is fixed in Red Hat Enterprise Linux 3.0 update 4.

Running NSUP with real-time priority

The node supervisor (NSUP) processes
(clu_nsup_srv) ensure the high availability of HADB
by exchanging “heartbeat” messages in a timely manner.
This timing is disturbed when an NSUP is colocated with
other processes that cause resource starvation. The consequence is false network
partitioning and node restarts (preceded by the warning “Process blocked
for n seconds” in the history files), resulting in aborted
transactions and other exceptions.

To solve this problem, clu_nsup_srv (found in installpath/lib/server)
must have the setuid bit set and the file must be owned by
root. Set this manually with the following commands:

# chown root clu_nsup_srv
# chmod u+s clu_nsup_srv
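
To verify the result (illustrative listing; the remaining ls columns are
elided):

# ls -l clu_nsup_srv
-rwsr-xr-x   1 root   ...   clu_nsup_srv

The s in the owner-execute position confirms the setuid bit, and root confirms
the ownership.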

This causes the clu_nsup_srv process to run as the
root user when started, which in turn allows the process
to give itself real-time priority automatically after startup. To limit the
security impact of using setuid, the real-time priority
is set at the very beginning, and the process falls back to the effective uid
once the priority has been changed. Other HADB processes will lower their
priority to timeshare priority.

If NSUP cannot set the real-time
priority, it issues the warning “Could not set realtime priority”
(on UNIX, errno is set to EPERM), writes it to
the ma.log file, and continues without real-time priority.

There are cases where it is not possible to set real-time priorities;
for example:

When installed in Solaris 10 non-global zones

When the PRIV_PROC_LOCK_MEMORY (allow a process
to lock pages in physical memory) and/or PRIV_PROC_PRIOCNTL privileges
are revoked in Solaris 10

When users turn off the setuid permission

When users install the software from tar files (the non-root install option
for the Application Server)

The clu_nsup_srv process is not CPU intensive, its
footprint is small, and running it with real-time priority does not impact
performance.

Sun recommends that Solaris hosts running
HADB be set up with network multipathing to ensure the highest possible
network availability. Network multipathing setup is covered in detail in the IP Network Multipathing Administration Guide. If you decide to
use multipathing with HADB, refer to the Administering Network Multipathing
section of that guide and set up multipathing before you proceed with
adapting the multipathing setup for HADB as described below. The IP Network Multipathing
Administration Guide is part of the Solaris 9 System Administrator
Collection and can be downloaded from http://docs.sun.com.

Set network interface failure detection
time

For HADB to properly support multipathing failover,
the network interface failure detection time must not exceed 1000 milliseconds
as specified by the FAILURE_DETECTION_TIME parameter in /etc/default/mpathd. Edit the file and change the value of this
parameter to 1000 if the original value is higher:

FAILURE_DETECTION_TIME=1000

In order for the change to take effect, issue the following command:

pkill -HUP in.mpathd

IP addresses to use with HADB

As described in the Solaris IP Network Multipathing Administration
Guide, multipathing involves grouping physical network interfaces
into multipath interface groups. Each physical interface in such a group has
two IP addresses associated with it: a physical interface address and a test
address. Only the physical interface address can be used for transmitting
data, while the test address is for Solaris internal use only. When hadbm
create --hosts is run, each host should be specified with only one
physical interface address from the multipath group.

Example

Assume
that Host 1 and Host 2 each have two physical network interfaces. On each
host, these two interfaces are set up as a multipath group, and running ifconfig -a yields output like the following.
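
A hypothetical excerpt for Host 1 (the flags, netmasks, and test-address
values are illustrative assumptions):

bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 129.159.115.10 netmask ffffff00 broadcast 129.159.115.255
        groupname mp0
bge0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
        inet 129.159.115.11 netmask ffffff00 broadcast 129.159.115.255
bge1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 129.159.115.12 netmask ffffff00 broadcast 129.159.115.255
        groupname mp0
bge1:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 3
        inet 129.159.115.13 netmask ffffff00 broadcast 129.159.115.255

Host 2 would look similar, with 129.159.115.20 as the address on its bge0.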

Here, the physical network interfaces on both hosts are the ones listed
as bge0 and bge1. The ones listed as bge0:1 and bge1:1 are multipath test interfaces
(they are thus marked as DEPRECATED in the ifconfig output),
as described in the IP Network Multipathing Administration Guide.

To set up HADB in this environment, select one physical interface
address from each host. In this example, we choose 129.159.115.10 from
Host 1 and 129.159.115.20 from Host 2. To create a database
with one database node per host, use the following argument to hadbm
create:

--hosts 129.159.115.10,129.159.115.20

To create a database with two database nodes on each host, use the following
argument:

--hosts 129.159.115.10,129.159.115.20,129.159.115.10,129.159.115.20

In both cases, the ma.server.mainternal.interfaces variable
on both hosts should be set to 129.159.115.0/24.
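
Putting this together, a minimal sketch of creating the one-node-per-host
database (the database name is illustrative, and other required options are
omitted):

hadbm create --hosts 129.159.115.10,129.159.115.20 database_name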

Online Upgrade from 4.4.1 to 4.4.2

It is not possible to upgrade online from version 4.2 or 4.3
to 4.4. However, 4.4 supports online upgrade for future
versions. To upgrade from 4.4.1 to 4.4.2, perform the following steps:

Install 4.4.2 on all HADB hosts (on a different path from that
of 4.4.1, for instance /opt/SUNWhadb/4.4.2-6).

Install the new version on the hadbm client hosts.

Stop all management agents running on the HADB hosts.

Start the management agent processes using the new version's
software, but with the old configuration files. In the remaining steps,
use the hadbm command found in the new version's bin directory.

Register the package in the management domain (default package
name here becomes V4.4, so another package name may be
required to avoid conflicts with existing packages having the same name):

hadbm registerpackage --packagepath=/opt/SUNWhadb/4.4.2-6 V4.4.2

Restart the database with the new version (the following command
does a rolling restart of the nodes):

hadbm set packagename=V4.4.2 database_name

Check that the database status is “running” (using
the command hadbm status) and that it functions normally,
serving client transactions.

If everything works, the old installation can be removed later.

Before unregistering the old package, remove all references to the old
package from the ma repository; otherwise, hadbm
unregisterpackage will fail with “package in use.” A
dummy reconfiguration operation, for instance, hadbm set connectiontrace=<same_as_previous_value>, will remove all references to the old
package. Now, unregister the old package:
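
hadbm unregisterpackage [--hosts=host-list] old_package_name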