You can use SunPlex™ Manager to install and configure
this data service. See the SunPlex Manager online help for details.

Overview of the Installation and Configuration Process for Sun Cluster HA for Oracle

The following table summarizes the tasks for installing and configuring Sun Cluster HA for Oracle.
The table also provides cross-references to detailed instructions for performing the
tasks. Perform these tasks in the order that they are listed.

Planning the Sun Cluster HA for Oracle Installation and Configuration

This section contains the information that you need to plan your Sun Cluster HA for Oracle installation
and configuration.

Configuration Requirements

Caution –

Your data service configuration might not be supported if you do not
adhere to these requirements.

Use the requirements in this section to plan the installation and configuration
of Sun Cluster HA for Oracle. These requirements apply to Sun Cluster HA for Oracle only. You must meet these
requirements before you proceed with your Sun Cluster HA for Oracle installation and configuration.

Oracle application files – These
files include Oracle binaries, configuration files, and parameter files. You can install
these files on the local file system, on the highly available local file system,
or on the cluster file system.

Database-related files – These files include the control file, redo logs, and data files. You must
install these files on the highly available local file system or the cluster file
system as either raw devices or regular files.

Preparing the Nodes and Disks

This section contains the procedures that you need to prepare the nodes and
disks.

How to Prepare the Nodes

Use this procedure to prepare for the installation and configuration of Oracle
software.

Caution –

Perform all of the steps in this section on all of the nodes. If you
do not perform all of the steps on all of the nodes, the Oracle installation is incomplete.
An incomplete Oracle installation causes Sun Cluster HA for Oracle to fail during startup.

Note –

Consult the Oracle documentation before you perform this procedure.

The following steps prepare your nodes and install the Oracle software.

If you use the Solstice DiskSuite™/Solaris Volume Manager software, configure the Oracle
software to use UNIX file system (UFS) logging on mirrored metadevices or raw-mirrored
metadevices. See the Solstice DiskSuite/Solaris Volume Manager documentation for more information about how to configure
raw-mirrored metadevices.

Prepare the $ORACLE_HOME directory
on a local or multihost disk.

Note –

If you install the Oracle binaries on a local disk, use a separate disk
if possible. Installing the Oracle binaries on a separate disk prevents the binaries
from being overwritten if the operating environment is reinstalled.

On each node, create an entry for the database administrator
(DBA) group in the /etc/group file, and add potential users
to the group.

You typically name the DBA group dba.
Verify that the root and oracle users
are members of the dba group, and add entries as necessary
for other DBA users. Ensure that the group IDs are the same on all of the nodes that
run Sun Cluster HA for Oracle, as the following example illustrates.

dba:*:520:root,oracle

You can create group entries in a network name service (for example, NIS or
NIS+). If you create group entries in this way, also add your entries to the local /etc/group file to eliminate dependency on the network name service.

On each node, create an entry for the Oracle user
ID (oracle).

You typically name the Oracle
user ID oracle. The following command updates the /etc/passwd and /etc/shadow files with an entry for
the Oracle user ID.

# useradd -u 120 -g dba -d /Oracle-home oracle

Ensure that the oracle user entry is the same on
all of the nodes that run Sun Cluster HA for Oracle.
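Because the dba group ID and the oracle user entry must match on every node, a cross-node comparison can catch drift before the installation. The following Python sketch assumes that you have already collected the text of /etc/group and /etc/passwd from each node (for example, by copying the files to one host). The function names are hypothetical and are not part of Sun Cluster.

```python
from __future__ import annotations


def find_entry(text: str, name: str) -> list[str] | None:
    """Return the colon-separated fields of the first entry named `name`."""
    for line in text.splitlines():
        fields = line.strip().split(":")
        if fields[0] == name:
            return fields
    return None


def check_nodes(groups: dict[str, str], passwds: dict[str, str]) -> list[str]:
    """Report dba group and oracle user inconsistencies between nodes.

    groups and passwds map a node name to that node's /etc/group or
    /etc/passwd contents. An empty result means no problems were found.
    """
    problems: list[str] = []
    gids: dict[str, str] = {}
    for node, text in groups.items():
        entry = find_entry(text, "dba")
        if entry is None or len(entry) < 4:
            problems.append(f"{node}: no dba group entry")
            continue
        gids[node] = entry[2]
        members = set(entry[3].split(","))
        for required in ("root", "oracle"):
            if required not in members:
                problems.append(f"{node}: {required} is not a member of dba")
    if len(set(gids.values())) > 1:
        problems.append(f"dba group ID differs between nodes: {gids}")

    users: dict[str, str] = {}
    for node, text in passwds.items():
        entry = find_entry(text, "oracle")
        if entry is None:
            problems.append(f"{node}: no oracle user entry")
        else:
            users[node] = ":".join(entry)  # compare the whole entry
    if len(set(users.values())) > 1:
        problems.append(f"oracle user entry differs between nodes: {users}")
    return problems
```

For example, passing the files from two nodes whose dba entries are dba:*:520:root,oracle and dba:*:521:root,oracle reports that the group ID differs.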

How to Configure Oracle Database Access With Solstice DiskSuite

Use this procedure to configure the Oracle database with Solstice DiskSuite volume
manager.

Steps

Configure the disk devices for the Solstice DiskSuite software
to use.

Regardless
of where you install the Oracle software, modify the /etc/system file on each node as you would in standard Oracle installation procedures. Then reboot the node.

Log in as oracle before you install the Oracle software, to ensure that the oracle user owns the
entire directory. See the appropriate Oracle installation
and configuration guides for instructions about how to install Oracle software.

(Optional) If you are using Sun Cluster HA for Oracle with
Oracle 10g, prevent the Oracle cssd daemon from being started.

Remove the entry for the Oracle cssd daemon from the /etc/inittab file on the node where the Oracle software is installed. To
remove this entry, remove the following line from the /etc/inittab file:

h1:23:respawn:/etc/init.d/init.cssd run >/dev/null 2>&1 </dev/null

Sun Cluster HA for Oracle does not require the Oracle cssd daemon.
Therefore, removal of this entry does not affect the operation
of Oracle 10g with Sun Cluster HA for Oracle. If your Oracle installation changes so that
the Oracle cssd daemon is required, restore the entry for this
daemon to the /etc/inittab file.

Caution –

If you are using Oracle 10g Real Application Clusters, do not remove
the entry for the cssd daemon from the /etc/inittab file.

Removing the entry for the Oracle cssd daemon from the /etc/inittab file prevents unnecessary error messages from being displayed.
Otherwise, an attempt by the init(1M) command
to start the Oracle cssd daemon might cause such error messages
to be displayed. These error messages are displayed if the Oracle binary files are
installed on a highly available local file system or on the cluster file system. The
messages are displayed repeatedly until the file system where the Oracle binary files
are installed is mounted.

If you are using Sun Cluster HA for Oracle on the x86 platform, unnecessary error messages
about the unavailability of the UNIX Distributed Lock Manager (Oracle UDLM) might also be displayed.

These messages are displayed if the following events occur:

A node is running in noncluster mode. In this situation, file systems
that Sun Cluster controls are never mounted.

A node is booting. In this situation, the messages are displayed repeatedly
until Sun Cluster mounts the file system where the Oracle binary files are installed.

Oracle is started on or fails over to a node where the Oracle installation
was not originally run. In such a configuration, the Oracle binary
files are installed on a highly available local file system. In this situation, the
messages are displayed on the console of the node where the Oracle installation was
run.

Verifying the Oracle Installation and Configuration

This section contains the procedure that you need to verify the Oracle installation
and configuration.

How to Verify the Oracle Installation

This procedure does not verify that your application is highly available because
you have not yet installed your data service.

Steps

Confirm that the owner, group, and mode of the $ORACLE_HOME/bin/oracle file are as follows:

Owner: oracle

Group: dba

Mode: -rwsr-s--x

# ls -l $ORACLE_HOME/bin/oracle

Verify that the listener binaries exist in the $ORACLE_HOME/bin directory.
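The owner, group, and mode check above can also be scripted. The following Python sketch uses only the standard library (os.stat, pwd, grp, and stat.filemode, which formats a mode the way ls -l does). The function names are hypothetical.

```python
from __future__ import annotations

import grp
import os
import pwd
import stat

EXPECTED_OWNER = "oracle"
EXPECTED_GROUP = "dba"
EXPECTED_MODE = "-rwsr-s--x"  # setuid and setgid set; octal 6751


def check_attrs(mode: int, owner: str, group: str) -> list[str]:
    """Compare an st_mode value, owner name, and group name with the expected values."""
    problems = []
    if owner != EXPECTED_OWNER:
        problems.append(f"owner is {owner}, expected {EXPECTED_OWNER}")
    if group != EXPECTED_GROUP:
        problems.append(f"group is {group}, expected {EXPECTED_GROUP}")
    actual = stat.filemode(mode)  # renders the mode as ls -l shows it
    if actual != EXPECTED_MODE:
        problems.append(f"mode is {actual}, expected {EXPECTED_MODE}")
    return problems


def check_oracle_binary(oracle_home: str) -> list[str]:
    """Stat $ORACLE_HOME/bin/oracle and check it; an empty list means it passes."""
    st = os.stat(os.path.join(oracle_home, "bin", "oracle"))
    owner = pwd.getpwuid(st.st_uid).pw_name
    group = grp.getgrgid(st.st_gid).gr_name
    return check_attrs(st.st_mode, owner, group)
```

For example, check_oracle_binary(os.environ["ORACLE_HOME"]) returns an empty list when all three attributes are as listed above.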

Next Steps

Creating an Oracle Database

This section contains the procedure to configure and create the initial Oracle
database in a Sun Cluster environment. If you create and configure additional databases,
omit the procedure How to Create an Oracle Database.

How to Create an Oracle Database

Steps

Prepare database configuration files.

Place all of the database-related files (data files, redo log files, and control files)
either on shared raw global devices or on the cluster file system. See Preparing the Nodes and Disks for information about installation locations.

Within the init$ORACLE_SID.ora or config$ORACLE_SID.ora file, you might need to modify the assignments for control_files and background_dump_dest to specify the locations of
the control files and alert files.

Note –

If you use Solaris authentication for database logins, set the remote_os_authent variable in the init$ORACLE_SID.ora file
to True.

Create the database by using one of the following
utilities:

The Oracle installer

The Oracle sqlplus(1M) command

During creation, ensure that all of the database-related files are placed in
the appropriate location, either on shared global devices or on the cluster file system.

Verify that the file names of your control files
match the file names in your configuration files.
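To compare the control file names in init$ORACLE_SID.ora with the files that the database actually uses, you could parse the control_files assignment, as in the following Python sketch. It assumes a simple file with one control_files assignment, optionally parenthesized and quoted. The list of actual names might come, for example, from querying the V$CONTROLFILE view. The function names are hypothetical.

```python
from __future__ import annotations

import re

# Matches "control_files = (...)" (possibly spanning lines) or a one-line value.
CONTROL_FILES_RE = re.compile(
    r"^\s*control_files\s*=\s*(\([^)]*\)|[^\n#]+)", re.IGNORECASE | re.MULTILINE
)


def parse_control_files(init_ora: str) -> list[str]:
    """Return the control file names assigned in the init.ora text."""
    match = CONTROL_FILES_RE.search(init_ora)
    if match is None:
        return []
    value = match.group(1).strip().strip("()")
    names = (part.strip().strip("'\"") for part in value.split(","))
    return [name for name in names if name]


def missing_control_files(init_ora: str, actual_names: list[str]) -> list[str]:
    """Names configured in init.ora that do not match any actual control file."""
    return sorted(set(parse_control_files(init_ora)) - set(actual_names))
```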

Create the v$sysstat view.

Run the catalog scripts that create the v$sysstat view. The Sun Cluster HA for Oracle fault
monitor uses this view.

prefix is the setting of the os_authent_prefix parameter. The default setting of this parameter is ops$.

user is the user for whom you are enabling
Solaris authentication. Ensure that this user owns the files under the $ORACLE_HOME directory.

Note –

Do not type a space between prefix and user.
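The following Python sketch illustrates how the externally identified user name is formed from prefix and user, and how the Connect_string value differs when you use Solaris authentication. It assumes the default ops$ prefix and that, as with any unquoted Oracle identifier, the name is stored in uppercase. The function names are hypothetical.

```python
from __future__ import annotations

DEFAULT_PREFIX = "ops$"  # default value of os_authent_prefix


def external_user_name(user: str, prefix: str = DEFAULT_PREFIX) -> str:
    """Concatenate prefix and user with no space, as an unquoted Oracle name."""
    if not user or any(ch.isspace() for ch in prefix + user):
        raise ValueError("user must be non-empty and no white space is allowed")
    # Oracle stores unquoted identifiers in uppercase.
    return (prefix + user).upper()


def connect_string(user: str | None = None, password: str | None = None) -> str:
    """Connect_string value: "/" for Solaris authentication, else user/passwd."""
    if user is None:
        return "/"
    if password is None:
        raise ValueError("a password is required unless Solaris authentication is used")
    return f"{user}/{password}"
```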

Configure Net8 for the Sun Cluster software.

The listener.ora file must be accessible from all of the
nodes that are in the cluster. Place this file either under the cluster file system
or in the local file system of each node that can potentially run the Oracle resources.

Note –

If you place the listener.ora file in a location
other than the /var/opt/oracle directory or the $ORACLE_HOME/network/admin directory, you must specify the TNS_ADMIN variable
or an equivalent Oracle variable in a user-environment file. For information about
Oracle variables, see the Oracle documentation. You must also run the scrgadm(1M) command to set the User_env resource extension property,
which sources the user-environment file. See SUNW.oracle_listener Extension Properties or SUNW.oracle_server Extension Properties for format details.

Sun Cluster HA for Oracle imposes no restrictions on the listener name. The listener name can be any
valid Oracle listener name.

The following code sample identifies the lines
in listener.ora that are updated.
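Whatever the exact layout of your listener.ora file, the HOST entry should name the network resource (the logical hostname or shared address) rather than a physical node. The following Python sketch renders one common form of a minimal TCP listener definition and flags HOST values that name physical nodes. The listener name, the logical hostname, and port 1521 (Oracle's customary default) are example values, not requirements, and the helper names are hypothetical.

```python
from __future__ import annotations

import re

HOST_RE = re.compile(r"\(\s*HOST\s*=\s*([^)\s]+)\s*\)", re.IGNORECASE)


def listener_entry(name: str, logical_host: str, port: int = 1521) -> str:
    """Render a minimal TCP listener definition that listens on logical_host."""
    return (
        f"{name} =\n"
        "  (DESCRIPTION_LIST =\n"
        "    (DESCRIPTION =\n"
        f"      (ADDRESS = (PROTOCOL = TCP)(HOST = {logical_host})(PORT = {port}))\n"
        "    )\n"
        "  )\n"
    )


def hosts_in(ora_text: str) -> list[str]:
    """Return every HOST value in listener.ora (or tnsnames.ora) text."""
    return HOST_RE.findall(ora_text)


def physical_hosts_used(ora_text: str, physical_nodes: list[str]) -> list[str]:
    """HOST values that name a physical node instead of the network resource."""
    nodes = {node.lower() for node in physical_nodes}
    return [host for host in hosts_in(ora_text) if host.lower() in nodes]
```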

Next Steps

Installing the Sun Cluster HA for Oracle Packages

If you did not install the Sun Cluster HA for Oracle packages during your
initial Sun Cluster installation, perform this procedure to install the packages.
Perform this procedure on each cluster node where you are installing the Sun Cluster HA for Oracle packages.
To complete this procedure, you need the Sun Cluster Agents CD-ROM.

Install the Sun Cluster HA for Oracle packages by using one of the following installation
tools:

The Web Start program

The scinstall utility

Note –

If you
are using Solaris 10, install these packages only in the global
zone. To ensure that these packages are not propagated to any local zones that are
created after you install the packages, use the scinstall utility
to install these packages. Do not use the Web Start program.

How to Install the Sun Cluster HA for Oracle Packages by Using the Web Start Program

You can run the Web Start program with a command-line interface
(CLI) or with a graphical user interface (GUI). The content and sequence of instructions
in the CLI and the GUI are similar. For more information about the Web Start program,
see the installer(1M) man page.

Steps

On the cluster node where you are installing
the Sun Cluster HA for Oracle packages, become superuser.

(Optional) If you intend
to run the Web Start program with a GUI, ensure that your DISPLAY environment variable is set.

Insert the Sun Cluster Agents CD-ROM into the CD-ROM
drive.

If the Volume Management daemon vold(1M) is
running and configured to manage CD-ROM devices, it automatically mounts the CD-ROM
on the /cdrom/cdrom0 directory.

Change to the Sun Cluster HA for Oracle component
directory of the CD-ROM.

The Web Start program for the Sun Cluster HA for Oracle data
service resides in this directory.

# cd /cdrom/cdrom0/components/SunCluster_HA_Oracle_3.1

Start the Web Start program.

# ./installer

When you are prompted, select the type of installation.

To install only
the C locale, select Typical.

To install other locales, select Custom.

Follow the instructions on the screen
to install the Sun Cluster HA for Oracle packages on the node.

After
the installation is finished, the Web Start program provides an installation
summary. This summary enables you to view logs that the Web Start program
created during the installation. These logs are located in the /var/sadm/install/logs directory.

Exit the Web Start program.

Remove the Sun Cluster Agents CD-ROM from the CD-ROM
drive.

To ensure that the CD-ROM is not being used, change to a directory that
does not reside on the CD-ROM.

SUNW.oracle_server Extension Properties describes
the extension properties that you can set for the Oracle server. For the Oracle server,
you are required to set only the following extension properties:

ORACLE_HOME

ORACLE_SID

Alert_log_file

Connect_string

How to Register and Configure Sun Cluster HA for Oracle

Use this procedure to configure Sun Cluster HA for Oracle as a failover data service. This
procedure assumes that the data service packages are already installed.
If you did not install the Sun Cluster HA for Oracle packages as part of your initial Sun Cluster installation,
first go to Installing the Sun Cluster HA for Oracle Packages to install them.

You must have the following information to perform this procedure.

The names of the cluster nodes that master the data service.

The network resource that clients use to access the data service.
Normally, you set up this IP address when you install the cluster. See the Sun Cluster Concepts Guide for Solaris
OS for details about network resources.

The path to the Oracle application binaries for the resources that
you plan to configure.

Steps

Become superuser on a cluster member.

Run the scrgadm command to register
the resource types for the data service.

For Sun Cluster HA for Oracle, you register
two resource types, SUNW.oracle_server and SUNW.oracle_listener, as follows.

Create a failover resource group to hold the network
and application resources.

You can optionally select the set of nodes
on which the data service can run with the -h option, as follows.

# scrgadm -a -g resource-group [-h nodelist]

-g resource-group

Specifies the name of the resource group. This name can be your choice
but must be unique for resource groups within the cluster.

-h nodelist

Specifies an optional comma-separated list of physical node names
or IDs that identify potential masters. The order here determines the order in which
the nodes are considered as primary during failover.

Note –

Use the -h option to specify the order of the node
list. If all of the nodes that are in the cluster are potential masters, you do not
need to use the -h option.
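If you script cluster setup, building the scrgadm argument list in code makes the node order explicit and lets you review the command before you run it (for example, with subprocess.run). The following Python sketch only builds and quotes the command. It uses no options beyond the -a, -g, and -h options shown above, and the helper names are hypothetical.

```python
from __future__ import annotations

import shlex


def create_rg_argv(resource_group: str, nodelist: list[str] | None = None) -> list[str]:
    """Build 'scrgadm -a -g resource-group [-h nodelist]' without running it.

    The order of nodelist is preserved because it sets the order in which
    nodes are considered as primary during failover.
    """
    argv = ["scrgadm", "-a", "-g", resource_group]
    if nodelist:
        if len(set(nodelist)) != len(nodelist):
            raise ValueError("a node appears more than once in nodelist")
        if any(not node or "," in node for node in nodelist):
            raise ValueError("node names must be non-empty and contain no commas")
        argv += ["-h", ",".join(nodelist)]
    return argv


def as_command_line(argv: list[str]) -> str:
    """Quote argv so that the command can be reviewed or pasted into a shell."""
    return shlex.join(argv)
```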

Verify that all of the network resources that you
use have been added to your name service database.

You should have performed
this verification during the Sun Cluster installation.

Note –

Ensure that all of the network resources are present in the /etc/inet/hosts file on
both the server and the clients to avoid failures caused by
name service lookups.

Add a network resource to the failover resource group.

# scrgadm -a -L -g resource-group -l logical-hostname [-n netiflist]

-l logical-hostname

Specifies a network resource. The network resource is the logical
hostname or shared address (IP address) that clients use to access Sun Cluster HA for Oracle.

-n netiflist

Specifies an optional, comma-separated list that identifies the IP Networking Multipathing groups
that are on each node. Each element in netiflist must be
in the form of netif@node. netif can be given
as an IP Networking Multipathing group name, such as sc_ipmp0. The node can be
identified by the node name or node ID, such as sc_ipmp0@1 or sc_ipmp0@phys-schost-1.

Note –

Sun Cluster does not currently support the use of the adapter name for netif.
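The netif@node format can be checked before you run scrgadm. The following Python sketch validates a netiflist and builds, but does not run, the command shown above. The format check cannot tell an IPMP group name from an adapter name, so it does not enforce the restriction in the preceding note. The helper names are hypothetical.

```python
from __future__ import annotations

import re

# One netiflist element: an IPMP group name, "@", then a node name or node ID.
ELEMENT_RE = re.compile(r"^[A-Za-z0-9_.-]+@[A-Za-z0-9_.-]+$")


def bad_elements(netiflist: str) -> list[str]:
    """Return the elements of netiflist that are not in the form netif@node."""
    return [e for e in netiflist.split(",") if not ELEMENT_RE.match(e.strip())]


def parse_netiflist(netiflist: str) -> list[tuple[str, str]]:
    """Split netiflist into (netif, node) pairs, rejecting malformed elements."""
    bad = bad_elements(netiflist)
    if bad:
        raise ValueError(f"not in the form netif@node: {bad}")
    return [tuple(e.strip().split("@")) for e in netiflist.split(",")]


def add_logical_host_argv(resource_group: str, logical_host: str,
                          netiflist: str | None = None) -> list[str]:
    """Build 'scrgadm -a -L -g resource-group -l logical-hostname [-n netiflist]'."""
    argv = ["scrgadm", "-a", "-L", "-g", resource_group, "-l", logical_host]
    if netiflist:
        parse_netiflist(netiflist)  # validate before the command is built
        argv += ["-n", netiflist]
    return argv
```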

The AffinityOn extension property must be set to TRUE, and
the local file system must reside on global disk groups, for the file system to fail over.

Run the scswitch command
to complete the following tasks and bring the resource group oracle-rg online on a
cluster node.

Caution –

Be sure to switch only at the resource group level. Switching at the
device group level confuses the resource group, causing it to fail over.

Move the resource group into a MANAGED state.

Bring the resource group online.

This node becomes the primary for device group ora-set1 and
raw device /dev/global/dsk/d1. This node also becomes the primary for the device groups that are associated
with file systems such as /global/ora-inst and /global/ora-data/logs.

-g resource-group

Specifies the name of the resource group into which the resources
are to be placed.

-t SUNW.oracle_server/listener

Specifies the type of the resource to add.

-x Alert_log_file=path-to-log

Sets the path under $ORACLE_HOME for the server
message log.

-x Connect_string=user/passwd

Specifies the user and password that the fault monitor uses to connect
to the database. These settings must agree with the permissions that you set up in How to Set Up Oracle Database Permissions. If you use
Solaris authentication, type a slash (/) instead of the user name
and password.

-x ORACLE_SID=instance

Sets the Oracle system identifier.

-x LISTENER_NAME=listener

Sets the name of the Oracle listener instance. This name must match
the corresponding entry in listener.ora.

-x ORACLE_HOME=Oracle-home

Sets the path to the Oracle home directory.

-x Restart_type=entity-to-restart

Specifies the entity that the server fault monitor restarts when the
response to a fault is restart. Set entity-to-restart as
follows:

To specify that only this resource is restarted, set entity-to-restart to RESOURCE_RESTART. By default,
only this resource is restarted.

To specify that all resources in the resource group that contains
this resource are restarted, set entity-to-restart to RESOURCE_GROUP_RESTART.

If you set entity-to-restart to RESOURCE_GROUP_RESTART, all other resources (such
as Apache or DNS) in the resource group are restarted, even if they are not faulty.
Therefore, include in the resource group only the resources that you require to be
restarted when the Oracle server resource is restarted.
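The options above can be combined programmatically. The following Python sketch builds, but does not run, an scrgadm command that adds a SUNW.oracle_server resource, and checks for the four extension properties that SUNW.oracle_server Extension Properties lists as required. The -j option, which names the resource, does not appear in this section; it is taken from the scrgadm(1M) man page, so verify it against your release. The helper names and example values are hypothetical.

```python
from __future__ import annotations

REQUIRED_SERVER_PROPS = ("ORACLE_HOME", "ORACLE_SID", "Alert_log_file", "Connect_string")
RESTART_TYPES = ("RESOURCE_RESTART", "RESOURCE_GROUP_RESTART")


def missing_server_props(props: dict[str, str]) -> list[str]:
    """Required SUNW.oracle_server extension properties that props lacks."""
    present = {key.lower() for key in props}
    return [p for p in REQUIRED_SERVER_PROPS if p.lower() not in present]


def server_resource_argv(resource: str, resource_group: str,
                         props: dict[str, str]) -> list[str]:
    """Build an scrgadm command that adds a SUNW.oracle_server resource."""
    missing = missing_server_props(props)
    if missing:
        raise ValueError(f"missing required extension properties: {missing}")
    restart_type = props.get("Restart_type", "RESOURCE_RESTART")  # the default
    if restart_type not in RESTART_TYPES:
        raise ValueError(f"Restart_type must be one of {RESTART_TYPES}")
    # -j names the resource (see scrgadm(1M)); -x sets each extension property.
    argv = ["scrgadm", "-a", "-j", resource, "-g", resource_group,
            "-t", "SUNW.oracle_server"]
    for key, value in props.items():
        argv += ["-x", f"{key}={value}"]
    return argv
```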

Where to Go From Here

Verifying the Sun Cluster HA for Oracle Installation

Perform the following verification tests to make sure that you have correctly
installed Sun Cluster HA for Oracle.

These sanity checks ensure that all of the nodes that run Sun Cluster HA for Oracle can
start the Oracle instance and that the other nodes in the configuration can access
the Oracle instance. Perform these sanity checks to isolate any problems in starting
the Oracle software from Sun Cluster HA for Oracle.

How to Verify the Sun Cluster HA for Oracle Installation

Steps

Log in as oracle to the
node that currently masters the Oracle resource group.

Set the environment variables ORACLE_SID and ORACLE_HOME.

Confirm that you can start the Oracle instance from
this node.

Confirm that you can connect to the Oracle instance.

Use the sqlplus command with the user/password variable that is defined in the connect_string property.

# sqlplus user/passwd@tns_service

Shut down the Oracle instance.

The Sun Cluster software
restarts the Oracle instance because the Oracle instance is under Sun Cluster control.

Switch the resource group that contains the Oracle
database resource to another cluster member.

The following example shows
how to complete this step.

# scswitch -z -g resource-group -h node

Log in as oracle to the
node that now contains the resource group.

Repeat Step 3 and Step 4 to confirm interactions
with the Oracle instance.

Oracle Clients

Clients must always refer to the database by using the network resource, not
the physical hostname. The network resource is an IP address that can move between
physical nodes during failover. The physical hostname is a machine name.

For example, in the tnsnames.ora file, you must specify
the network resource as the host on which the database instance is running. The network
resource is a logical hostname or a shared address. See How to Set Up Oracle Database Permissions.

Note –

Oracle client-server connections cannot survive a Sun Cluster HA for Oracle switchover.
The client application must be prepared to handle disconnection and reconnection or
recovery as appropriate. A transaction monitor might simplify the application. Further, Sun Cluster HA for Oracle node
recovery time is application dependent.
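One way for a client to cope with a switchover is to retry the whole unit of work on a new connection, backing off between attempts while the network resource moves. The following Python sketch assumes a hypothetical connect callable and uses the built-in ConnectionError as a stand-in for whatever disconnect exception your Oracle client library raises. Retrying is safe only if the unit of work is idempotent, or if the application first checks whether the interrupted transaction committed.

```python
import time


def run_with_reconnect(connect, work, attempts=5, first_delay=1.0, sleep=time.sleep):
    """Call work(connect()) and retry on ConnectionError with exponential backoff.

    connect and work are supplied by the application; sleep is a parameter
    only so that tests can avoid real delays.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = first_delay
    for attempt in range(attempts):
        try:
            return work(connect())
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            sleep(delay)
            delay *= 2
```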

Location of Sun Cluster HA for Oracle Log Files

Each instance of the Sun Cluster HA for Oracle data service maintains log files in subdirectories
of the /var/opt/SUNWscor directory.

The /var/opt/SUNWscor/oracle_server directory contains
log files for the Oracle server.

The /var/opt/SUNWscor/oracle_listener directory
contains log files for the Oracle listener.

These files contain information about actions that the Sun Cluster HA for Oracle data
service performs. Refer to these files to obtain diagnostic information for troubleshooting
your configuration or to monitor the behavior of the Sun Cluster HA for Oracle data service.

Tuning the Sun Cluster HA for Oracle Fault Monitors

Fault monitoring for the Sun Cluster HA for Oracle data service is provided by the following
fault monitors:

The Oracle server fault monitor

The Oracle listener fault monitor

Each fault monitor is contained in a resource
whose resource type is shown in the following table.

Table 2 Resource Types for Sun Cluster HA for Oracle Fault Monitors

Fault Monitor      Resource Type
Oracle server      SUNW.oracle_server
Oracle listener    SUNW.oracle_listener

System properties and extension properties of these resources
control the behavior of the fault monitors. The default values of these properties
determine the preset behavior of the fault monitors. The preset behavior should be
suitable for most Sun Cluster installations. Therefore, you should tune the Sun Cluster HA for Oracle fault
monitors only if you need to modify this preset behavior.

Tuning the Sun Cluster HA for Oracle fault monitors involves the following tasks:

Operation of the Oracle Server Fault Monitor

The fault monitor for the Oracle server uses a request to the server to query
the health of the server.

The server fault monitor is started through pmfadm to make
the monitor highly available. If the monitor is killed for any reason, the Process
Monitor Facility (PMF) automatically restarts the monitor.

The server fault monitor consists of the following processes.

A main fault monitor process

A database client fault probe

Operation of the Main Fault Monitor

The main fault monitor determines that an operation is successful if the database
is online and no errors are returned during the transaction.

Operation of the Database Client Fault Probe

The database client fault probe performs the following operations:

Monitoring the partition for archived redo logs

If the partition is healthy, determining whether the database is operational

The probe uses the timeout value that is set in the Probe_timeout extension property to determine how much time to allocate to successfully probe
Oracle.

Operations to Monitor the Partition for Archived Redo Logs

The database client fault probe queries the dynamic performance view v$archive_dest to determine all possible destinations for archived redo
logs. For every active destination, the probe determines whether the destination is
healthy and has sufficient free space for storing archived redo logs.

If the destination is healthy, the probe determines the amount of
free space in the destination's file system. If the amount of free space is less than
10% of the file system's capacity and is less than 20 Mbytes, the probe prints a message
to syslog.

If the destination is in ERROR status, the probe
prints a message to syslog and disables operations to determine
whether the database is operational. The operations remain disabled until the error
condition is cleared.
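The free-space rule above can be expressed as a small Python check. This sketch is an illustration, not the fault monitor's code: it uses shutil.disk_usage, treats 20 Mbytes as 20 * 1024 * 1024 bytes (an assumption), and the function names are hypothetical.

```python
import shutil

MBYTE = 1024 * 1024          # assumption: 1 Mbyte = 2**20 bytes
MIN_FREE_FRACTION = 0.10     # 10% of the file system's capacity
MIN_FREE_BYTES = 20 * MBYTE  # 20 Mbytes


def space_is_low(free_bytes: int, total_bytes: int) -> bool:
    """True only when free space is below BOTH thresholds."""
    return free_bytes < MIN_FREE_FRACTION * total_bytes and free_bytes < MIN_FREE_BYTES


def archive_destination_is_low(path: str) -> bool:
    """Apply the rule to the file system that contains path."""
    usage = shutil.disk_usage(path)
    return space_is_low(usage.free, usage.total)
```

Because both conditions must hold, a large file system that still has more than 20 Mbytes free does not trigger a message even when it is more than 90% full.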

Operations to Determine Whether the Database Is Operational

If the partition for archived redo logs is healthy, the database client fault
probe queries the dynamic performance view v$sysstat to obtain
database performance statistics. Changes to these statistics indicate that the database
is operational. If these statistics remain unchanged between consecutive queries,
the fault probe performs database transactions to determine if the database is operational.
These transactions involve the creation, updating, and dropping of a table in the
user table space.
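The order of these checks can be summarized as a decision function. The following Python sketch is a simplified model: it takes the STATUS value of each active archive destination and two consecutive v$sysstat snapshots (represented here as dictionaries). Its treatment of the first probe, when no earlier snapshot exists, is an assumption, and the function name is hypothetical.

```python
from __future__ import annotations


def next_probe_step(dest_status: dict[str, str],
                    previous: dict[str, int] | None,
                    current: dict[str, int]) -> str:
    """Return "skip", "operational", or "transaction" for one probe cycle."""
    # A destination in ERROR status disables the operational checks.
    if any(status == "ERROR" for status in dest_status.values()):
        return "skip"
    # Changed statistics show that the database is doing work.
    if previous is not None and current != previous:
        return "operational"
    # Unchanged (or not yet comparable) statistics: test with a transaction
    # that creates, updates, and drops a table.
    return "transaction"
```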

The database client fault probe performs all its transactions as the Oracle
user. The ID of this user is specified during the preparation of the nodes as explained
in How to Prepare the Nodes.

Actions by the Server Fault Monitor in Response to a Database Transaction Failure

If the action requires an external program to be run, the program is run as
a separate process in the background.

Possible actions are as follows:

Ignore. The server fault monitor
ignores the error.

Stop monitoring. The server fault
monitor is stopped without shutting down the database.

Restart. The server fault monitor
stops and restarts the entity that is specified by the value of the Restart_type extension property:

If the Restart_type extension property is set
to RESOURCE_RESTART, the server fault monitor restarts the database
server resource. By default, the server fault monitor restarts the database server
resource.

If the Restart_type extension property is set to RESOURCE_GROUP_RESTART, the server fault monitor restarts the database server
resource group.

Note –

The number of attempts to restart might exceed the value of the Retry_count resource property within the time that the Retry_interval resource property specifies. If this situation occurs, the server fault
monitor attempts to switch over the resource group to another node.

Switch over. The server fault monitor
switches over the database server resource group to another node. If no nodes are
available, the attempt to switch over the resource group fails. If the attempt to
switch over the resource group fails, the database server is restarted.
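The following Python sketch models how repeated restarts escalate to a switchover, and how a failed switchover falls back to a restart. It is a simplified model for illustration, not the Sun Cluster implementation; in particular, the sliding-window bookkeeping and the handling of the window boundary are assumptions, and the names are hypothetical.

```python
from collections import deque


class RestartEscalation:
    """Restart until Retry_count restarts occur within Retry_interval
    seconds, then ask for a switchover instead."""

    def __init__(self, retry_count: int, retry_interval: float):
        self.retry_count = retry_count
        self.retry_interval = retry_interval
        self.restarts = deque()  # times of recent restarts, oldest first

    def on_failure(self, now: float) -> str:
        # Forget restarts that fall outside the sliding Retry_interval window.
        while self.restarts and now - self.restarts[0] > self.retry_interval:
            self.restarts.popleft()
        if len(self.restarts) >= self.retry_count:
            return "SWITCH"  # another restart would exceed Retry_count
        self.restarts.append(now)
        return "RESTART"


def effective_action(action: str, other_node_available: bool) -> str:
    """A switchover with no available node falls back to a restart."""
    if action == "SWITCH" and not other_node_available:
        return "RESTART"
    return action
```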

Scanning of Logged Alerts by the Server Fault Monitor

The Oracle software logs alerts in an alert log file. The absolute path of this
file is specified by the Alert_log_file extension property of the SUNW.oracle_server resource. The server fault monitor scans the alert log
file for new alerts at the following times:

When the server fault monitor is started

Each time that the server fault monitor queries the health of the
server

If an action is defined for a logged alert that the server fault monitor detects,
the server fault monitor performs the action in response to the alert.
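The following Python sketch shows one way to scan only new alert log text on each pass. It remembers a byte offset, holds back a partially written last line until it is complete, and starts over if the file shrinks. The rule patterns and actions in the test are hypothetical. Starting from the beginning of the file and letting the first matching pattern win are assumptions, not documented behavior of the server fault monitor.

```python
from __future__ import annotations

import os
import re


class AlertLogScanner:
    """Scan only the alert log text that was appended since the previous scan."""

    def __init__(self, path: str, rules: list[tuple[str, str]]):
        self.path = path
        # Each rule pairs a regular expression with a (hypothetical) action name.
        self.rules = [(re.compile(pattern), action) for pattern, action in rules]
        self.offset = 0      # bytes of the file consumed so far
        self.pending = b""   # incomplete last line, kept until it is finished

    def feed(self, data: bytes) -> list[tuple[str, str]]:
        """Match each complete line in data against the rules."""
        complete, _, self.pending = (self.pending + data).rpartition(b"\n")
        actions = []
        for line in complete.decode(errors="replace").splitlines():
            for regex, action in self.rules:
                if regex.search(line):
                    actions.append((action, line))
                    break  # assumption: the first matching rule wins
        return actions

    def scan(self) -> list[tuple[str, str]]:
        """Read what was appended to the file and return the triggered actions."""
        if os.path.getsize(self.path) < self.offset:
            # The log shrank (truncated or replaced): start from the beginning.
            self.offset, self.pending = 0, b""
        with open(self.path, "rb") as log:
            log.seek(self.offset)
            data = log.read()
        self.offset += len(data)
        return self.feed(data)
```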

Operation of the Oracle Listener Fault Monitor

The Oracle listener fault monitor checks the status of an Oracle listener.

If the listener is running, the Oracle listener fault monitor considers a probe
successful. If the fault monitor detects an error, the listener is restarted.

Note –

The listener resource does not provide a mechanism for setting the listener
password. If Oracle listener security is enabled, a probe by the listener fault monitor
might return Oracle error TNS-01169. Because the listener is able to respond, the
listener fault monitor treats the probe as a success. This behavior does not allow a
failure of the listener to go undetected, because a failure of the listener returns a
different error or causes the probe to time out.

The listener probe is started through pmfadm to make the
probe highly available. If the probe is killed, PMF automatically restarts the probe.

If a problem occurs with the listener during a probe, the probe tries to restart
the listener. The value that is set in the resource property Retry_count determines the maximum number of times that the probe attempts the restart.
If, after trying for the maximum number of times, the probe is still unsuccessful,
the probe stops the fault monitor and does not switch over the resource group.
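The listener probe's behavior can be modeled as follows. In this Python sketch, probe is a hypothetical callable that returns None on success or an Oracle error code such as TNS-12541 on failure, and restart is a hypothetical callable that restarts the listener. The sketch treats TNS-01169 as a success, as the preceding note describes, and simplifies the real monitor, which probes periodically rather than in a tight loop.

```python
from __future__ import annotations

from typing import Callable, Optional

# Listener answered, but listener security is enabled; treated as success.
SECURED_LISTENER_ERROR = "TNS-01169"


def probe_succeeded(error: Optional[str]) -> bool:
    """None means the status request succeeded outright."""
    return error is None or error == SECURED_LISTENER_ERROR


def monitor_listener(probe: Callable[[], Optional[str]],
                     restart: Callable[[], None],
                     retry_count: int) -> str:
    """Probe once; on failure restart up to retry_count times, then give up."""
    if probe_succeeded(probe()):
        return "healthy"
    for _ in range(retry_count):
        restart()
        if probe_succeeded(probe()):
            return "restarted"
    # Out of retries: stop monitoring; the resource group is not switched over.
    return "stop_monitor"
```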

Customizing the Sun Cluster HA for Oracle Server Fault Monitor

Customizing the Sun Cluster HA for Oracle server fault monitor enables you to modify the
behavior of the server fault monitor as follows:

Overriding the preset action for an error

Specifying an action for an error for which no action is preset

Caution –

Before you customize
the Sun Cluster HA for Oracle server fault monitor, consider the effects of your customizations,
especially if you change an action from restart or switch over to ignore or stop monitoring.
If errors remain uncorrected for long periods, the errors might cause problems with
the database. If you encounter problems with the database after customizing the Sun Cluster HA for Oracle server
fault monitor, revert to using the preset actions. Reverting to the preset actions
enables you to determine if the problem is caused by your customizations.

Specifying the custom action file that a server fault monitor should
use

Defining Custom Behavior for Errors

The Sun Cluster HA for Oracle server
fault monitor detects the following types of errors:

DBMS errors that occur during a probe of the database by the server
fault monitor

Alerts that Oracle logs in the alert log file

Timeouts that result from a failure to receive a response within the
time that is set by the Probe_timeout extension property

To define custom behavior for these types of errors, create a custom action
file.

Custom Action File Format

A custom action file is a plain
text file. The file contains one or more entries that define the custom behavior of
the Sun Cluster HA for Oracle server fault monitor. Each entry defines the custom behavior for
a single DBMS error, a single timeout error, or several logged alerts. A maximum of
1024 entries is allowed in a custom action file.

Note –

Each entry in a custom action file overrides the preset action for an
error, or specifies an action for an error for which no action is preset. Create entries
in a custom action file only for the preset actions that you
are overriding or for errors for which no action is preset. Do not create
entries for actions that you are not changing.

An entry in a custom action file consists of a sequence of keyword-value pairs
that are separated by semicolons. Each entry is enclosed in braces.

You can use white space between keyword-value pairs and between entries
to format the file.

The meaning and permitted values of the keywords in a custom action file
are as follows:

ERROR_TYPE

Indicates the type of the error that the server fault monitor has
detected. The following values are permitted for this keyword:

DBMS_ERROR

Specifies that the error is a DBMS error.

SCAN_LOG

Specifies that the error is an alert that is logged in the alert log
file.

TIMEOUT_ERROR

Specifies that the error is a timeout.

The ERROR_TYPE keyword is optional. If you omit this keyword,
the error is assumed to be a DBMS error.

ERROR

Identifies the error. The data type and the meaning of error-spec are determined by the value of the ERROR_TYPE keyword
as shown in the following table.

ERROR_TYPE      Data Type                   Meaning
DBMS_ERROR      Integer                     The error number of a DBMS error that is generated by Oracle
SCAN_LOG        Quoted regular expression   A string in an error message that Oracle has logged to the Oracle alert log file
TIMEOUT_ERROR   Integer                     The number of consecutive timed-out probes since the server fault monitor was last started or restarted

You must specify the ERROR keyword. If you
omit this keyword, the entry in the custom action file is ignored.

ACTION

Specifies the action that the server fault monitor is to perform in
response to the error. The following values are permitted for this keyword:

NONE

Specifies that the server fault monitor ignores the error.

STOP

Specifies that the server fault monitor is stopped.

RESTART

Specifies that the server fault monitor stops and restarts the entity
that is specified by the value of the Restart_type extension property
of the SUNW.oracle_server resource.

SWITCH

Specifies that the server fault monitor switches over the database
server resource group to another node.

The ACTION keyword is optional. If you omit this keyword,
the server fault monitor ignores the error.

CONNECTION_STATE

Specifies the required state of the connection between the database
and the server fault monitor when the error is detected. The entry applies only if
the connection is in the required state when the error is detected. The following
values are permitted for this keyword:

*

Specifies that the entry always applies, regardless of the state of
the connection.

co

Specifies that the entry applies only if the server fault monitor
is attempting to connect to the database.

on

Specifies that the entry applies only if the server fault monitor
is online. The server fault monitor is online if it is connected to the database.

di

Specifies that the entry applies only if the server fault monitor
is disconnecting from the database.

The CONNECTION_STATE keyword is optional. If you omit this
keyword, the entry always applies, regardless of the state of the connection.

NEW_STATE

Specifies the state of the connection between the database and the
server fault monitor that the server fault monitor must attain after the error is
detected. The following values are permitted for this keyword:

*

Specifies that the state of the connection must remain unchanged.

co

Specifies that the server fault monitor must disconnect from the database
and reconnect immediately to the database.

di

Specifies that the server fault monitor must disconnect from the database.
The server fault monitor reconnects when it next probes the database.

The NEW_STATE keyword is optional. If you omit this keyword,
the state of the database connection remains unchanged after the error is detected.

MESSAGE

Specifies an additional message that is printed to the resource's log file when this error is detected. The message must be enclosed in double quotes. This message is printed in addition to the standard message that is defined for the error.

The MESSAGE keyword is optional. If you omit this keyword,
no additional message is printed to the resource's log file when this error is detected.
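The following minimal sketch shows how the defaults combine. The two alternative entries should behave identically, because every keyword except ERROR falls back to the default described above. The sketch assumes that each keyword-value pair is written as KEYWORD=value followed by a semicolon. The error number 20001 is a hypothetical placeholder.

```
{ERROR=20001;}

{
ERROR_TYPE=DBMS_ERROR;
ERROR=20001;
ACTION=NONE;
CONNECTION_STATE=*;
NEW_STATE=*;
}
```

In a real custom action file, use only one of these forms for a given error.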

Changing the Response to a DBMS Error

The action that the server fault monitor performs in response to each DBMS error
is preset as listed in Table 1.
To determine whether you need to change the response to a DBMS error, consider the effect of DBMS errors on your database and decide whether the preset actions are appropriate. For examples, see the subsections that follow.

To change the response to a DBMS error, create an entry in a custom action file
in which the keywords are set as follows:

ERROR_TYPE is set to DBMS_ERROR.

ERROR is set to the error number of the DBMS error.

ACTION is set to the action that you require.

Responding to an Error Whose Effects Are Major

If an error that the server fault monitor ignores affects more than one session,
action by the server fault monitor might be required to prevent a loss of service.

For example, no action is preset for Oracle error 4031: unable to allocate num-bytes bytes of shared memory. However, this Oracle error indicates that the shared global area (SGA) has insufficient memory, is badly fragmented, or both. If this error affects only a single session, ignoring the error might be appropriate. However, if this error affects more than one session, consider specifying that the server fault monitor restart the database.

The following example shows an entry in a custom action file for changing the
response to a DBMS error to restart.

Example 2 Changing the Response to a DBMS Error to Restart

This example shows an entry in a custom action file that overrides the preset
action for DBMS error 4031. This entry specifies the following behavior:

In response to DBMS error 4031, the action that the server fault monitor
performs is restart.

This entry applies regardless of the state of the connection between
the database and the server fault monitor when the error is detected.

The state of the connection between the database and the server fault
monitor must remain unchanged after the error is detected.

The following message is printed to the resource's log file when this
error is detected:

Insufficient memory in shared pool.
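An entry with this behavior might look like the following sketch, which assumes the KEYWORD=value; syntax. CONNECTION_STATE and NEW_STATE are set to * explicitly, although omitting them would give the same defaults.

```
{
ERROR_TYPE=DBMS_ERROR;
ERROR=4031;
ACTION=RESTART;
CONNECTION_STATE=*;
NEW_STATE=*;
MESSAGE="Insufficient memory in shared pool.";
}
```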

Ignoring an Error Whose Effects Are Minor

If the effects of an error to which the server fault monitor responds are minor,
ignoring the error might be less disruptive than responding to the error.

For example, the preset action for Oracle error 4030: out of process memory when trying to allocate num-bytes bytes is restart. This Oracle error indicates that the server fault monitor could not allocate private heap memory. One possible cause of this error is that insufficient memory is available to the operating system. If this error affects more than one session, restarting the database might be appropriate. However, this error might not affect other sessions if those sessions do not require further private memory. In this situation, consider specifying that the server fault monitor ignore the error.

The following example shows an entry in a custom action file for ignoring a
DBMS error.

Example 3 Ignoring a DBMS Error

This example shows an entry in a custom action file that overrides the preset
action for DBMS error 4030. This entry specifies the following behavior:

The server fault monitor ignores DBMS error 4030.

This entry applies regardless of the state of the connection between
the database and the server fault monitor when the error is detected.

The state of the connection between the database and the server fault
monitor must remain unchanged after the error is detected.

No additional message is printed to the resource's log file when this
error is detected.
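The following sketch shows a matching entry, again assuming the KEYWORD=value; syntax. The MESSAGE keyword is omitted so that no additional message is logged.

```
{
ERROR_TYPE=DBMS_ERROR;
ERROR=4030;
ACTION=NONE;
CONNECTION_STATE=*;
NEW_STATE=*;
}
```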

Changing the Response to Logged Alerts

The Oracle software logs alerts in a file that is identified by the Alert_log_file extension property. The server fault monitor scans this file
and performs actions in response to alerts for which an action is defined.

Logged alerts for which an action is preset are listed in Table 2. Change the response to logged alerts to change the preset action,
or to define new alerts to which the server fault monitor responds.

To change the response to logged alerts, create an entry in a custom action
file in which the keywords are set as follows:

ERROR_TYPE is set to SCAN_LOG.

ERROR is set to a quoted regular expression that
identifies a string in an error message that Oracle has logged to the Oracle alert
log file.

ACTION is set to the action that you require.

The server
fault monitor processes the entries in a custom action file in the order in which
the entries occur. Only the first entry that matches a logged alert is processed.
Later entries that match are ignored. If you are using regular expressions to specify
actions for several logged alerts, ensure that more specific entries occur before
more general entries. Specific entries that occur after general entries might be ignored.

For example, a custom action file might define different actions for errors
that are identified by the regular expressions ORA-65 and ORA-6. To ensure that the entry that contains the regular expression ORA-65 is not ignored, ensure that this entry occurs before the entry that
contains the regular expression ORA-6.
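The following sketch illustrates the ordering rule. It assumes the KEYWORD=value; syntax. The SWITCH and RESTART actions are hypothetical choices for illustration. Any alert text that contains ORA-65 also contains ORA-6. If the ORA-6 entry came first, it would match those alerts, and the ORA-65 entry would never be processed.

```
{
ERROR_TYPE=SCAN_LOG;
ERROR="ORA-65";
ACTION=SWITCH;
}

{
ERROR_TYPE=SCAN_LOG;
ERROR="ORA-6";
ACTION=RESTART;
}
```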

The following example shows an entry in a custom action file for changing the
response to a logged alert.

Example 4 Changing the Response to a Logged Alert

This example shows an entry in a custom action file that overrides the preset
action for logged alerts about internal errors. This entry specifies the following
behavior:

In response to logged alerts that contain the text ORA-00600:
internal error, the action that the server fault monitor performs is restart.

This entry applies regardless of the state of the connection between
the database and the server fault monitor when the error is detected.

The state of the connection between the database and the server fault
monitor must remain unchanged after the error is detected.

No additional message is printed to the resource's log file when this
error is detected.
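Assuming the KEYWORD=value; syntax, the entry might be written as follows. The regular expression is quoted, as the SCAN_LOG row of the ERROR table requires.

```
{
ERROR_TYPE=SCAN_LOG;
ERROR="ORA-00600: internal error";
ACTION=RESTART;
CONNECTION_STATE=*;
NEW_STATE=*;
}
```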

Changing the Maximum Number of Consecutive Timed-Out Probes

By default, the server fault monitor restarts the database after the second
consecutive timed-out probe. If the database is lightly loaded, two consecutive timed-out
probes should be sufficient to indicate that the database is hanging. However, during
periods of heavy load, a server fault monitor probe might time out even if the database
is functioning correctly. To prevent the server fault monitor from restarting the
database unnecessarily, increase the maximum number of consecutive timed-out probes.

Caution –

Increasing the maximum number of consecutive timed-out probes increases
the time that is required to detect that the database is hanging.

To change the maximum number of consecutive timed-out probes allowed, create one entry in a custom action file for each allowed consecutive timed-out probe except the first.

Note –

You are not required to create an entry for the first timed-out probe.
The action that the server fault monitor performs in response to the first timed-out
probe is preset.

For the last allowed timed-out probe, create an entry in which the keywords
are set as follows:

ERROR_TYPE is set to TIMEOUT_ERROR.

ERROR is set to the maximum number of consecutive
timed-out probes that are allowed.

ACTION is set to RESTART.

For each remaining consecutive timed-out probe except the first timed-out probe,
create an entry in which the keywords are set as follows:

ERROR_TYPE is set to TIMEOUT_ERROR.

ERROR is set to the sequence number of the timed-out
probe. For example, for the second consecutive timed-out probe, set this keyword to
2. For the third consecutive timed-out probe, set this keyword to 3.

ACTION is set to NONE.

Tip –

To facilitate debugging, specify a message that indicates the sequence
number of the timed-out probe.

The following example shows the entries in a custom action file for increasing the maximum number of consecutive timed-out probes to five.

Example 5 Increasing the Maximum Number of Consecutive Timed-Out Probes

This example shows the entries in a custom action file that increase the maximum number of consecutive timed-out probes to five. These entries specify the following behavior:

The server fault monitor ignores the second consecutive timed-out
probe through the fourth consecutive timed-out probe.

In response to the fifth consecutive timed-out probe, the action that
the server fault monitor performs is restart.

The entries apply regardless of the state of the connection between
the database and the server fault monitor when the timeout occurs.

The state of the connection between the database and the server fault
monitor must remain unchanged after the timeout occurs.

When the second consecutive timed-out probe through the fourth consecutive
timed-out probe occurs, a message of the following form is printed to the resource's
log file:

Timeout #number has occurred.

When the fifth consecutive timed-out probe occurs, the following message
is printed to the resource's log file:

Timeout #5 has occurred. Restarting.
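The following sketch shows the four entries, assuming the KEYWORD=value; syntax. Each entry matches one sequence number, and the MESSAGE values reproduce the log messages listed above.

```
{
ERROR_TYPE=TIMEOUT_ERROR;
ERROR=2;
ACTION=NONE;
CONNECTION_STATE=*;
NEW_STATE=*;
MESSAGE="Timeout #2 has occurred.";
}

{
ERROR_TYPE=TIMEOUT_ERROR;
ERROR=3;
ACTION=NONE;
CONNECTION_STATE=*;
NEW_STATE=*;
MESSAGE="Timeout #3 has occurred.";
}

{
ERROR_TYPE=TIMEOUT_ERROR;
ERROR=4;
ACTION=NONE;
CONNECTION_STATE=*;
NEW_STATE=*;
MESSAGE="Timeout #4 has occurred.";
}

{
ERROR_TYPE=TIMEOUT_ERROR;
ERROR=5;
ACTION=RESTART;
CONNECTION_STATE=*;
NEW_STATE=*;
MESSAGE="Timeout #5 has occurred. Restarting.";
}
```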

Propagating a Custom Action File to All Nodes in a Cluster

A server fault monitor must behave consistently on all cluster nodes. Therefore, the custom action file that the server fault monitor uses must be identical on all cluster nodes. After you create or modify a custom action file, propagate the file to all cluster nodes. Use the method that is most appropriate for your cluster configuration:

Locating the file on a file system that all nodes share

Locating the file on a highly available local file system

Copying the file to the local file system of each cluster node by
using operating system commands such as the rcp(1) command
or the rdist(1) command

Specifying the Custom Action File That a Server Fault Monitor Should Use

To apply customized actions to a server fault monitor, you must specify the
custom action file that the fault monitor should use. Customized actions are applied
to a server fault monitor when the server fault monitor reads a custom action file.
A server fault monitor reads a custom action file when you specify the file.

Specifying a custom action file also validates the file. If the file contains
syntax errors, an error message is displayed. Therefore, after modifying a custom
action file, specify the file again to validate the file.

Caution –

If syntax errors in a modified custom action file are detected, correct
the errors before the fault monitor is restarted. If the syntax errors remain uncorrected
when the fault monitor is restarted, the fault monitor reads the erroneous file,
ignoring entries that occur after the first syntax error.

How to Specify the Custom Action File That a Server Fault Monitor Should Use

Steps

On a cluster node, become superuser.

Set the Custom_action_file extension
property of the SUNW.oracle_server resource.

Set this property to the absolute path of the custom action file.

# scrgadm -c -j server-resource \
-x custom_action_file=filepath

-j server-resource

Specifies the SUNW.oracle_server resource

-x custom_action_file=filepath

Specifies the absolute path of the custom action file

Upgrading Sun Cluster HA for Oracle Resource Types

The resource types for the Sun Cluster HA for Oracle data service are as follows:

SUNW.oracle_listener, which represents an Oracle listener

SUNW.oracle_server, which represents an Oracle server

Upgrade these resource types if all conditions in the following list apply:

You are upgrading from an earlier version of the Sun Cluster HA for Oracle data
service.

Upgrading the SUNW.oracle_listener Resource Type

The information that you require to complete the upgrade of the SUNW.oracle_listener resource type is provided in the subsections that follow.

Information for Registering the New Resource Type Version

The relationship between
the version of the SUNW.oracle_listener resource type and the release
of Sun Cluster data services is shown in the following table. The release of Sun Cluster data
services indicates the release in which the version of the resource type was introduced.
The table also summarizes the changes that were introduced in each new version.

To determine the version of the resource type that is registered, use one command
from the following list:

scrgadm -p

scrgadm -pv

The resource
type registration (RTR) file for this resource type is /opt/SUNWscor/oracle_listener/etc/SUNW.oracle_listener.

Information for Migrating Existing Instances of the Resource Type

The information that you require to edit each instance of the SUNW.oracle_listener resource type is as follows:

You can perform the migration at any time.

If you need to use the features of the SUNW.oracle_listener resource type that were introduced in the Sun Cluster 3.1 4/04 release, the required value of the Type_version property is 4.

If you need to use the features of the SUNW.oracle_listener resource type that were introduced in the Sun Cluster 3.1 8/05 release, the required value of the Type_version property is 5.

If you need to specify the timeout value in seconds that the fault
monitor uses to probe an Oracle listener, set the Probe_timeout extension
property. For more information, see SUNW.oracle_listener Extension Properties.

Note –

If you are using version 4 of the SUNW.oracle_listener resource type, upgrade to version 5 only if you require the new default values. If the default values in version 4 are satisfactory, you do not need to upgrade.

The following example shows a command for editing an instance of the SUNW.oracle_listener resource type.

Example 6 Editing an Instance of the SUNW.oracle_listener Resource Type

# scrgadm -cj oracle-lrs -y Type_version=4 \
-x probe_timeout=60

This command edits a SUNW.oracle_listener resource as follows:

The SUNW.oracle_listener resource is named oracle-lrs.

The Type_version property of this resource is set
to 4.

The timeout value in seconds that the fault monitor uses to probe
an Oracle listener is set to 60 seconds.

Upgrading the SUNW.oracle_server Resource Type

The information that you require to complete the upgrade of the SUNW.oracle_server resource type is provided in the subsections that follow.

Information for Registering the New Resource Type Version

The relationship between
the version of the SUNW.oracle_server resource type and the release
of Sun Cluster data services is shown in the following table. The release of Sun Cluster data
services indicates the release in which the version of the resource type was introduced.
The table also summarizes the changes that were introduced in each new version.

If you are using version 4 of the SUNW.oracle_server resource type, upgrade to version 5 only if you require the new default values. If the default values in version 4 are satisfactory, you do not need to upgrade.

The following example shows a command for editing an instance of the SUNW.oracle_server resource type.