About Oracle Clusterware

Oracle Real Application Clusters (Oracle RAC) uses Oracle Clusterware as the infrastructure that binds together multiple nodes that then operate as a single server. Oracle Clusterware is a portable cluster management solution that is integrated with Oracle Database. In an Oracle RAC environment, Oracle Clusterware monitors all Oracle components (such as instances and listeners). If a failure occurs, Oracle Clusterware automatically attempts to restart the failed component and also redirects operations to a surviving component.

Oracle Clusterware includes a high availability framework for managing any application that runs on your cluster. Oracle Clusterware manages applications to ensure they start when the system starts. Oracle Clusterware also monitors the applications to make sure that they are always available. For example, if an application process fails, then Oracle Clusterware attempts to restart the process based on scripts that you customize. If a node in the cluster fails, then you can program application processes that typically run on the failed node to restart on another node in the cluster.

Oracle Clusterware includes two important components: the voting disk and the OCR. The voting disk is a file that manages information about node membership, and the OCR is a file that manages cluster and Oracle RAC database configuration information.

The Oracle Clusterware installation process creates the voting disk and the OCR on shared storage. If you select normal redundancy for these files during installation, then Oracle Clusterware automatically maintains redundant copies of them so that neither file becomes a single point of failure. The normal redundancy feature also eliminates the need for third-party storage redundancy solutions. When you use normal redundancy, Oracle Clusterware automatically maintains two copies of the OCR file and three copies of the voting disk file.

About Backing Up and Recovering Voting Disks

High availability configurations have redundant hardware and software that maintain operations by avoiding single points of failure. When a component is down, Oracle Clusterware redirects its managed resources to a backup component.

The voting disk records node membership information. A node must be able to access more than half of the voting disks at any time. To avoid simultaneous loss of multiple voting disks, each voting disk should be on a storage device that does not share any components (controller, interconnect, and so on) with the storage devices used for the other voting disks.

For example, if you have five voting disks configured, then a node must be able to access at least three of them at all times. If a node cannot access the minimum required number of voting disks, it is evicted, or removed, from the cluster. After the cause of the failure has been corrected and access to the voting disks has been restored, you can instruct Oracle Clusterware to recover the failed node and restore it to the cluster.
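The majority rule is a simple integer calculation. The following Bash sketch is illustrative only; the function names min_votedisks_required and node_has_quorum are hypothetical and are not part of Oracle Clusterware.

```shell
#!/usr/bin/env bash
# Minimum number of voting disks a node must reach: strictly more than half
# of the configured voting disks (integer division, then add one).
min_votedisks_required() {
  local configured=$1
  echo $(( configured / 2 + 1 ))
}

# Exit status 0 if a node that reaches $2 of $1 configured voting disks
# meets the majority requirement, 1 if it would be evicted.
node_has_quorum() {
  local configured=$1 reachable=$2
  [ "$reachable" -ge "$(min_votedisks_required "$configured")" ]
}
```

For example, min_votedisks_required 5 prints 3, and node_has_quorum 5 2 fails. Because the rule is integer division plus one, four voting disks still require three reachable disks, so a fourth disk tolerates no more failures than three disks do.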

Backing Up Voting Disks

Because the node membership information does not usually change, you do not need to back up the voting disk every day. However, back up the voting disks at the following times:

After installation

After adding nodes to or deleting nodes from the cluster

After performing voting disk add or delete operations

When you use the dd command to back up the voting disk, you can perform the backup while the Cluster Ready Services (CRS) process is active; you do not need to stop the crsd.bin process before taking the backup.

To make a backup copy of the voting disk:

Use the Linux dd command, as shown in the following example, where voting_disk_name is the name of the active voting disk and backup_file_name is the name of the file to which you want to back up the voting disk contents:

dd if=voting_disk_name of=backup_file_name

Perform this operation on every voting disk as needed.

If your voting disk is stored on a raw device, use the device name in place of voting_disk_name, for example:

dd if=/dev/sdd1 of=/tmp/voting.dmp
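When several voting disks are configured, the dd step can be repeated in a loop. This is a minimal Bash sketch, assuming you pass the voting disk paths yourself; the function name backup_votedisks and the backup file naming scheme (votedisk followed by the path with slashes turned into underscores, ending in .dmp) are hypothetical.

```shell
#!/usr/bin/env bash
# Back up every voting disk given after the backup directory, using the same
# dd if=<voting disk> of=<backup file> form shown above.
# Hypothetical naming: /dev/sdd1 -> <dir>/votedisk_dev_sdd1.dmp, so two disks
# with the same file name in different directories do not collide.
backup_votedisks() {
  local backup_dir=$1
  shift
  if [ "$#" -eq 0 ]; then
    echo "usage: backup_votedisks BACKUP_DIR VOTING_DISK..." >&2
    return 2
  fi
  mkdir -p "$backup_dir" || return 1
  local disk target
  for disk in "$@"; do
    target="$backup_dir/votedisk${disk//\//_}.dmp"
    if ! dd if="$disk" of="$target"; then
      echo "backup of $disk failed" >&2
      return 1
    fi
    echo "$disk -> $target"
  done
}
```

For example, backup_votedisks /backup/vote /dev/sdd1 /dev/sde1 would write /backup/vote/votedisk_dev_sdd1.dmp and /backup/vote/votedisk_dev_sde1.dmp, where the device and directory names are placeholders.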

Recovering Voting Disks

If a voting disk is damaged and no longer usable by Oracle Clusterware, you can recover it if you have a backup file.

To recover the voting disk from a backup:

Run the following command, where backup_file_name is the name of the voting disk backup file and voting_disk_name is the name of the active voting disk:

dd if=backup_file_name of=voting_disk_name
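A slightly more defensive version of the restore step checks that the backup is not empty before overwriting the voting disk and compares the result afterward. The function name restore_votedisk is hypothetical, and the byte comparison with cmp -n assumes GNU cmp (or another cmp that supports -n).

```shell
#!/usr/bin/env bash
# Restore one voting disk from a dd backup file. Refuses a missing or empty
# backup, then checks that the target now starts with the backup's bytes.
restore_votedisk() {
  local backup=$1 disk=$2
  if [ ! -s "$backup" ]; then
    echo "backup '$backup' is missing or empty; nothing restored" >&2
    return 1
  fi
  dd if="$backup" of="$disk" || return 1
  # Compare only as many bytes as the backup holds, because a raw device
  # target can be larger than the backup file.
  local size
  size=$(( $(wc -c < "$backup") ))
  if ! cmp -s -n "$size" "$backup" "$disk"; then
    echo "restored data on $disk does not match $backup" >&2
    return 1
  fi
}
```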

Adding and Removing Voting Disks

You can dynamically add and remove voting disks after installing Oracle RAC by using the following commands, where path is the fully qualified path of the voting disk to add or remove. If the new voting disk is stored on a Network File System (NFS) mount, then first create an empty voting disk file at that location with the correct owner and permissions, and then run the command to add the voting disk.

To add or remove a voting disk:

Run the following command as the root user to add a voting disk:

crsctl add css votedisk path

Run the following command as the root user to remove a voting disk:

crsctl delete css votedisk path

Note:

If your cluster is down, then you can use the -force option with either of these commands to modify the voting disk configuration without interacting with active Oracle Clusterware daemons. However, using the -force option while any cluster node is active can corrupt your cluster configuration.
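For the NFS case, the empty file can be created before running crsctl add css votedisk. In this sketch the owner, group, and mode are parameters you must supply, because the correct values depend on your installation; prepare_nfs_votedisk is a hypothetical helper that prints the crsctl command rather than running it.

```shell
#!/usr/bin/env bash
# Create an empty voting disk file on an NFS mount with the owner, group, and
# mode you pass in, then print the crsctl command to run next (as root).
# No owner/group/mode is assumed; take them from your installation's settings.
prepare_nfs_votedisk() {
  if [ "$#" -ne 4 ]; then
    echo "usage: prepare_nfs_votedisk PATH OWNER GROUP MODE" >&2
    return 2
  fi
  local path=$1 owner=$2 group=$3 mode=$4
  if [ -e "$path" ]; then
    echo "refusing to reuse existing file: $path" >&2
    return 1
  fi
  : > "$path" || return 1                  # empty file at the new location
  chown "$owner:$group" "$path" || return 1
  chmod "$mode" "$path" || return 1
  echo "crsctl add css votedisk $path"
}
```

Setting an arbitrary owner with chown requires root, which matches the requirement to run crsctl add css votedisk as the root user.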

About Backing Up and Recovering the Oracle Cluster Registry

Oracle Clusterware automatically creates OCR backups every 4 hours. At any one time, Oracle Clusterware retains the 3 most recent of these backups, as well as a backup that is 1 day old and a backup that is 1 week old.

You cannot customize the backup frequencies or the number of files that Oracle Clusterware retains. You can use any backup software to copy the automatically generated backup files at least once daily to a different device from the one where the primary OCR file resides. The default location for generating backups on Red Hat Linux systems is CRS_home/cdata/cluster_name, where cluster_name is the name of your cluster and CRS_home is the home directory of your Oracle Clusterware installation.
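A daily copy job might look like the following sketch. It assumes the automatic backups are the *.ocr files in the CRS_home/cdata/cluster_name directory and that the destination is on a different device; copy_ocr_backups and the dated-subdirectory layout are hypothetical.

```shell
#!/usr/bin/env bash
# Copy the automatically generated OCR backups (assumed to be the *.ocr files
# in CRS_home/cdata/cluster_name) into a dated subdirectory on another device.
# Prints the destination directory on success.
copy_ocr_backups() {
  local src=$1 dest_root=$2
  local stamp=${3:-$(date +%Y%m%d)}
  local dest="$dest_root/$stamp"
  local files=("$src"/*.ocr)
  # Without nullglob an unmatched pattern stays literal, so test the first entry.
  if [ ! -e "${files[0]}" ]; then
    echo "no *.ocr backup files found in $src" >&2
    return 1
  fi
  mkdir -p "$dest" || return 1
  cp -p "${files[@]}" "$dest"/ || return 1
  echo "$dest"
}
```

For example, a daily cron job could call copy_ocr_backups "$CRS_HOME/cdata/$CLUSTER_NAME" /mnt/ocr_copies, where CRS_HOME, CLUSTER_NAME, and /mnt/ocr_copies stand in for your own values.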

Viewing Available OCR Backups

Use the ocrconfig utility to view the backups generated automatically by Oracle Clusterware.

To find the most recent backup of the OCR:

Run the following command on any node in the cluster:

ocrconfig -showbackup

Backing Up the OCR

Because of the importance of OCR information, Oracle recommends that you use the ocrconfig utility to make copies of the automatically created backup files at least once a day.

In addition to using the automatically created OCR backup files, you should also export the OCR contents to a file before and after making significant configuration changes, such as adding or deleting nodes from your environment, modifying Oracle Clusterware resources, or creating a database. Exporting the OCR contents to a file lets you restore the OCR if your configuration changes cause errors. For example, if you have unresolvable configuration problems, or if you are unable to restart your cluster database after such changes, then you can restore your configuration by importing the saved OCR content from the valid configuration.

To export the contents of the OCR to a file:

Log in as the root user.

Use the following command, where backup_file_name is the name of the OCR backup file you want to create:

[root]# ocrconfig -export backup_file_name
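To keep before-and-after exports easy to tell apart, you might wrap the export in a helper that adds a label and a timestamp to the file name. This is a hypothetical Bash sketch; export_ocr, the ocr_<label>_<timestamp>.dmp naming, and the OCRCONFIG and STAMP override variables are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Export the OCR to <dir>/ocr_<label>_<timestamp>.dmp (hypothetical naming).
# STAMP overrides the timestamp; OCRCONFIG overrides the command, e.g.
# OCRCONFIG=echo prints the ocrconfig arguments instead of running them.
export_ocr() {
  local dir=$1 label=$2
  if [ -z "$dir" ] || [ -z "$label" ]; then
    echo "usage: export_ocr DIR LABEL" >&2
    return 2
  fi
  local stamp=${STAMP:-$(date +%Y%m%d_%H%M%S)}
  local file="$dir/ocr_${label}_${stamp}.dmp"
  if [ -e "$file" ]; then
    echo "export file already exists: $file" >&2
    return 1
  fi
  "${OCRCONFIG:-ocrconfig}" -export "$file"
}
```

Run it as root, for example export_ocr /backup/ocr before_addnode before adding a node and export_ocr /backup/ocr after_addnode afterward.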

About Recovering the OCR

There are two methods for recovering the OCR. The first method uses automatically generated OCR file copies and the second method uses manually created OCR export files.

Checking the Status of the OCR

In the event of a failure, before you attempt to restore the OCR, confirm that the OCR is actually unavailable.

To check the status of the OCR:

Run the following command:

ocrcheck

If this command does not display the message 'Device/File integrity check succeeded' for at least one copy of the OCR, then both the primary OCR and the OCR mirror have failed. You must restore the OCR from a backup.

If there is at least one copy of the OCR available, you can use that copy to restore the other copies of the OCR.
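The decision above depends only on whether ocrcheck reports the success message for at least one copy, so it can be scripted by searching the output. This Bash sketch reads ocrcheck output on standard input; the function names and the two result strings are hypothetical.

```shell
#!/usr/bin/env bash
# Count OCR copies that ocrcheck output (read on stdin) reports as passing.
count_healthy_ocr_copies() {
  grep -c 'Device/File integrity check succeeded'
}

# Apply the rule above: no passing copy means restore from a backup;
# otherwise a surviving copy can be used to restore the others.
ocr_recovery_advice() {
  local healthy
  healthy=$(count_healthy_ocr_copies)
  if [ "$healthy" -eq 0 ]; then
    echo "restore-from-backup"
  else
    echo "use-surviving-copy"
  fi
}
```

On a cluster node, you would run ocrcheck | ocr_recovery_advice.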

Restoring the OCR from Automatically Generated OCR Backups

When restoring the OCR from automatically generated backups, you first have to determine which backup file you will use for the recovery.

To restore the OCR from an automatically generated backup on a Red Hat Linux system:

Log in as the root user.

Identify the available OCR backups using the ocrconfig command:

[root]# ocrconfig -showbackup

Review the contents of the backup using the following ocrdump command, where file_name is the name of the OCR backup file:

[root]# ocrdump -backupfile file_name

As the root user, stop Oracle Clusterware on all the nodes in your Oracle RAC cluster by executing the following command:

[root]# crsctl stop crs

Repeat this command on each node in your Oracle RAC cluster.

As the root user, restore the OCR by applying the OCR backup file that you identified with the ocrconfig -showbackup command, using the following command, where file_name is the name of that backup file. Before running this command, make sure that the OCR devices specified in the OCR configuration exist and are valid.

[root]# ocrconfig -restore file_name

As the root user, restart Oracle Clusterware on all the nodes in your cluster by restarting each node, or by running the following command:

[root]# crsctl start crs

Repeat this command on each node in your Oracle RAC cluster.

Use the Cluster Verification Utility (CVU) to verify the OCR integrity. Run the following command, where the -n all argument retrieves a list of all the cluster nodes that are configured as part of your cluster:

[root]# cluvfy comp ocr -n all [-verbose]
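Because the stop and start steps must run on every node while the restore runs only once, it can help to print the procedure as a per-node checklist before starting. The following Bash sketch only prints commands and does not contact any node; the function name ocr_restore_plan is hypothetical, and running the restore and verification on the first node listed is an assumption, since the procedure does not name a node.

```shell
#!/usr/bin/env bash
# Print the restore procedure as an ordered, per-node checklist. Nothing is
# executed. Assumption: the restore and the CVU check run on the first node.
ocr_restore_plan() {
  local backup=$1
  shift
  if [ -z "$backup" ] || [ "$#" -eq 0 ]; then
    echo "usage: ocr_restore_plan BACKUP_FILE NODE..." >&2
    return 2
  fi
  local node
  for node in "$@"; do echo "[$node] crsctl stop crs"; done
  echo "[$1] ocrconfig -restore $backup"
  for node in "$@"; do echo "[$node] crsctl start crs"; done
  echo "[$1] cluvfy comp ocr -n all -verbose"
}
```

For a two-node cluster with placeholder node names, ocr_restore_plan "$CRS_HOME/cdata/$CLUSTER_NAME/backup00.ocr" rac1 rac2 prints six lines: both stops, the restore, both starts, and the verification.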

Recovering the OCR from an OCR Export File

The ocrconfig -export command creates a backup of the OCR, enabling you to restore the OCR using the -import option if your configuration changes cause errors.

To restore the previous configuration stored in the OCR from an OCR export file:

Place the OCR export file that you created previously using the ocrconfig -export command in an accessible directory on disk.

As the root user, stop Oracle Clusterware on all the nodes in your Oracle RAC cluster by executing the following command:

[root]# crsctl stop crs

Repeat this command on each node in your Oracle RAC cluster.

As the root user, restore the OCR data by importing the contents of the OCR export file using the following command, where file_name is the name of the OCR export file:

[root]# ocrconfig -import file_name

As the root user, restart Oracle Clusterware on all the nodes in your cluster by restarting each node, or by running the following command:

[root]# crsctl start crs

Repeat this command on each node in your Oracle RAC cluster.

Use the CVU to verify the OCR integrity. Run the following command, where the -n all argument retrieves a list of all the cluster nodes that are configured as part of your cluster:

[root]# cluvfy comp ocr -n all [-verbose]

Note:

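You cannot use the ocrconfig -import command to import an OCR backup file; it accepts only an OCR export file. To recover from an automatically generated backup, use ocrconfig -restore instead.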
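Choosing the wrong option is easy when backups and export files sit in the same directory. This Bash sketch suggests -restore or -import from the file name. It assumes automatic backups keep their default names (backup00.ocr, backup01.ocr, backup02.ocr, day.ocr, day_.ocr, week.ocr, and week_.ocr), which is an assumption not stated in this guide; a renamed backup would be misclassified as an export file, so treat the result as a hint to check rather than a decision. The function name ocr_recover_command is hypothetical.

```shell
#!/usr/bin/env bash
# Suggest the ocrconfig option for a recovery file. Assumption: automatic
# backups keep their default names; any other name is treated as an export
# file. Check the suggestion before running it.
ocr_recover_command() {
  local file=$1
  case "$(basename "$file")" in
    backup0[0-2].ocr|day.ocr|day_.ocr|week.ocr|week_.ocr)
      echo "ocrconfig -restore $file" ;;
    *)
      echo "ocrconfig -import $file" ;;
  esac
}
```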

About Changing the Oracle Cluster Registry Configuration

This section describes how to administer the OCR. The OCR contains information about the cluster node list, which instances are running on which nodes, and information about Oracle Clusterware resource profiles for applications that have been modified to be managed by Oracle Clusterware.

The operations in this section affect the OCR for the entire cluster. However, the ocrconfig command cannot modify OCR configuration information for nodes that are shut down or for nodes on which Oracle Clusterware is not running. Avoid shutting down nodes while modifying the OCR using the ocrconfig command.

Adding an OCR Location

You can add an OCR location after an upgrade or after completing the Oracle RAC installation. If you already have a mirror of the OCR, then you do not need to add an OCR location; Oracle Clusterware automatically manages two OCRs when you configure normal redundancy for the OCR. Oracle RAC environments do not support more than two OCRs, a primary OCR and a secondary OCR.

To add a primary or secondary OCR location:

Run the following command using either destination_file or disk to designate the target location of the primary OCR:

ocrconfig -replace ocr destination_file
ocrconfig -replace ocr disk

Run the following command using either destination_file or disk to designate the target location of the secondary OCR:

ocrconfig -replace ocrmirror destination_file
ocrconfig -replace ocrmirror disk
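The primary and secondary forms differ only in the ocr or ocrmirror keyword. This hypothetical Bash helper, ocr_replace_command, builds either command from a role name; leaving out the target yields the bare form used to remove that copy (ocrconfig -replace ocr removes the primary OCR, and ocrconfig -replace ocrmirror removes the mirror).

```shell
#!/usr/bin/env bash
# Build "ocrconfig -replace" for the primary (ocr) or secondary (ocrmirror)
# OCR. With no target, prints the bare form that removes that copy.
ocr_replace_command() {
  local role=$1 target=${2:-}
  local keyword
  case "$role" in
    primary) keyword=ocr ;;
    secondary) keyword=ocrmirror ;;
    *) echo "role must be primary or secondary" >&2; return 2 ;;
  esac
  echo "ocrconfig -replace $keyword${target:+ $target}"
}
```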

Replacing an OCR

If you need to change the location of an existing OCR, or move a failed OCR to a new, working location, you can use the following procedure as long as one OCR file remains online.

To change the location of an OCR:

Use the OCRCHECK utility to verify that a copy of the OCR other than the one you are going to replace is online, using the following command:

ocrcheck

Note:

The OCR that you are replacing can be either online or offline.

Use the following command to verify that Oracle Clusterware is running on the node on which you are going to perform the replace operation:

crsctl check crs

Run the following command to replace the primary OCR using either destination_file or disk to indicate the target OCR location:

ocrconfig -replace ocr destination_file
ocrconfig -replace ocr disk

Run the following command to replace a secondary OCR using either destination_file or disk to indicate the target OCR location:

ocrconfig -replace ocrmirror destination_file
ocrconfig -replace ocrmirror disk

If any node that is part of your current Oracle RAC cluster is shut down, then run the following command on the stopped node to let that node rejoin the cluster after the node is restarted:

ocrconfig -repair ocr [device_name]
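The first check in this procedure, that a copy other than the one being replaced is online, can be automated by parsing ocrcheck output. This Bash sketch assumes each configured copy is reported as a Device/File Name line followed by its Device/File integrity check succeeded line, as in the code comment; that layout is an assumption to verify against your own ocrcheck output, and the function name other_ocr_copy_healthy is hypothetical. The same check applies before removing an OCR.

```shell
#!/usr/bin/env bash
# Exit 0 only if ocrcheck output on stdin shows a passing OCR copy other than
# the device named in $1. Assumed per-copy layout:
#   Device/File Name         : /dev/raw/raw1
#                              Device/File integrity check succeeded
other_ocr_copy_healthy() {
  local replacing=$1
  awk -v skip="$replacing" '
    /Device\/File Name/ {
      sub(/^.*: */, "")      # keep only the path after the colon
      sub(/ +$/, "")         # drop trailing spaces
      dev = $0
    }
    /Device\/File integrity check succeeded/ {
      if (dev != "" && dev != skip) ok = 1
    }
    END { exit(ok ? 0 : 1) }
  '
}
```

On a cluster node, ocrcheck | other_ocr_copy_healthy /dev/raw/raw2 would succeed only if some copy other than the placeholder /dev/raw/raw2 passed its integrity check.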

Repairing an OCR Configuration on a Local Node

You may need to repair an OCR configuration on a particular node if your OCR configuration changes while that node is stopped. For example, you may need to repair the OCR on a node that was shut down while you were adding, replacing, or removing an OCR.

To repair an OCR configuration:

Run the following command on the node on which you have stopped the Oracle Clusterware daemon:

ocrconfig -repair ocrmirror [device_name]

Note:

You cannot perform this operation on a node on which the Oracle Clusterware daemon is running.

This operation changes the OCR configuration only on the node from which you run this command.

For example, if the OCR mirror is on a disk named /dev/raw1, then use the following command to repair its OCR configuration:

ocrconfig -repair ocrmirror /dev/raw1
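Because the repair must not run while the Oracle Clusterware daemon is active on the node, a wrapper can refuse to proceed if it finds a running daemon process. In this sketch the process names crsd.bin, ocssd.bin, and evmd.bin are assumptions (only crsd.bin is named in this guide), as are the function name repair_local_ocr and the OCRCONFIG override variable.

```shell
#!/usr/bin/env bash
# Run "ocrconfig -repair ocr|ocrmirror [device]" only if none of the assumed
# Oracle Clusterware daemon processes is running on this node.
# OCRCONFIG overrides the command (OCRCONFIG=echo gives a dry run).
repair_local_ocr() {
  local which=$1 device=${2:-}
  case "$which" in
    ocr|ocrmirror) ;;
    *) echo "first argument must be ocr or ocrmirror" >&2; return 2 ;;
  esac
  if ! command -v pgrep >/dev/null 2>&1; then
    echo "pgrep not found; cannot confirm the daemons are stopped" >&2
    return 1
  fi
  local daemon
  for daemon in crsd.bin ocssd.bin evmd.bin; do
    if pgrep -x "$daemon" >/dev/null 2>&1; then
      echo "$daemon is running; stop Oracle Clusterware on this node first" >&2
      return 1
    fi
  done
  "${OCRCONFIG:-ocrconfig}" -repair "$which" ${device:+"$device"}
}
```

For the example above, repair_local_ocr ocrmirror /dev/raw1 would run ocrconfig -repair ocrmirror /dev/raw1 once no matching daemon is found.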

Removing an OCR

To remove an OCR location, at least one OCR must be online. You can remove an OCR location to reduce OCR-related overhead or to stop mirroring your OCR because you moved the OCR to a redundant storage system, such as a redundant array of independent disks (RAID).

To remove an OCR location from your Oracle RAC cluster:

Use the OCRCHECK utility to ensure that at least one OCR other than the OCR that you are removing is online.

ocrcheck

Note:

Do not perform this OCR removal procedure unless there is at least one active OCR online.

Run the following command on any node in the cluster to remove one copy of the OCR:

ocrconfig -replace ocr

This command updates the OCR configuration on all the nodes on which Oracle Clusterware is running.

About Troubleshooting the Oracle Cluster Registry

This section covers troubleshooting the Oracle Cluster Registry (OCR).

About the OCRCHECK Utility

The OCRCHECK utility displays the data block format version used by the OCR, the available space and used space in the OCR, the ID used for the OCR, and the locations you have configured for the OCR. The OCRCHECK utility calculates a checksum for all the data blocks in all the OCRs that you have configured to verify the integrity of each block. It also returns an individual status for each OCR file as well as a result for the overall OCR integrity check. The following is a sample of the OCRCHECK output: