11.1 Introduction to Oracle Clusterware

Oracle Clusterware manages the availability of user applications and Oracle databases in a clustered environment. In an Oracle Real Application Clusters (RAC) environment, Oracle Clusterware manages all of the Oracle database processes automatically. Anything managed by Oracle Clusterware is known as a cluster resource, which could be a database instance, a listener, a virtual IP (VIP) address, or an application process.

Oracle Clusterware was initially created to support Oracle RAC. Its benefits, however, are not limited to Oracle RAC; it can also be used to manage other applications through add-on modules (or action scripts). It is this flexibility and extensibility in Oracle Clusterware that forms the basis of a high availability solution for Oracle Fusion Middleware.

For more information about Oracle Clusterware, see the Oracle Clusterware Administration and Deployment Guide. You can find this guide on the Oracle Technology Network.

11.2 Cluster Ready Services and Oracle Fusion Middleware

Oracle Clusterware includes a high availability framework that offers protection to any resource with the help of resource-specific add-on modules. In Oracle Clusterware terminology, a resource refers to an object that is created by Oracle Clusterware to identify the entity to be managed, such as an application, a virtual IP or a shared disk. Oracle Clusterware monitors a resource to make sure it is always available by frequently checking its state, and attempting to restart it if it is down. If restarting fails, the resource will be started on a new node, a process called failover. Resource switchover, an intentional switch of the operating environment of resources, is also allowed through the proper user interface.

With this high availability framework, Oracle Clusterware manages resources through user-provided add-on modules. For example, to create a resource for an application that runs as a single process, the user must supply a module that starts and stops the process and checks its state. If this application fails, Oracle Clusterware attempts to restart it using this module. If the node on which this application is currently running fails, Oracle Clusterware attempts to restart it on another node, provided the application and resource are configured properly. You can configure the monitoring frequency for a resource and define its relationships to other resources.

Because Oracle Fusion Middleware is a critical and complex application environment, it is beneficial to relieve users of writing their own add-on modules and to give them easy access to Oracle Clusterware's high availability features. This is the task of Application Server Cluster Ready Service (ASCRS).

ASCRS consists of a frontend and a backend. The frontend is a command line interface, ascrsctl, with which you can perform administrative tasks, such as resource creation, deletion, update, start, stop or switchover between cluster nodes. The backend is logic for the life cycle management of the various Fusion Middleware resources. The frontend and backend have their own separate log files.

Oracle Clusterware and ASCRS provide a means to improve the survivability of the various resources when their hosting environment is corrupted or lost. However, they do not protect against disk corruption, or against application malfunction caused by disk corruption.

ASCRS supports Oracle Clusterware version 10.2.0.4, or version 11.1.0.7 and higher. It has online help that can be invoked using the following command:

ascrsctl help -c command -t resource_type

As an example, the following command shows the help for creating a virtual IP resource:
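For the virtual IP resource type, following the help syntax above, this takes the form:

ascrsctl help -c create -t vip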

11.3 Installing and Configuring ASCRS

As an extension of CRS, ASCRS must be installed within the CRS home on each node of the cluster and configured separately before it can be used.

With the ASCRS command line tool ascrsctl, you can give ASCRS control of various middleware components. Once a component is controlled by CRS, its runtime state is closely monitored and CRS takes the proper actions if the component fails. With ascrsctl, you can create a CRS resource, and once a resource is created, you can perform start, stop, update, switch, status, and delete operations on it.

11.3.1 Installing ASCRS

Install ASCRS on each node of the cluster. To successfully install ASCRS on a particular node, check the following:

Operating system: ASCRS is supported on Unix platforms and on Windows Server 2008. The system version and patch level should be compatible with the CRS version supported on that platform.

CRS and version: CRS is installed on this system, started, and functioning correctly. The CRS version must be 10.2.0.4 or higher. For information about installing Oracle Clusterware and CRS, see the Oracle Clusterware Installation Guide for Linux.

User account: The ASCRS installation user should be the same as the owner of the CRS home. It should also be the same as the owner of the application resource being managed. On Windows, this user must have administrator privilege.

Note:

To install ASCRS for CRS 10.2.0.4, Sun JDK (or JRE) 1.5 or higher must be installed on the local system. It is needed by ascrsctl, the command line tool.

To install ASCRS:

Log in as the CRS owner.

Insert the Oracle Fusion Middleware Companion CD and run the following commands to unzip and install the ascrs.zip file:

cd CRS_HOME
unzip Disk1/ascrs/ascrs.zip
cd ascrs/bin
setup

If the CRS version is 10.2.0.4, run the following command after JDK (or JRE) 1.5 or higher is installed:

setup -j JDK/JRE_HOME

When the installation is complete, the ASCRS directories, such as bin, config, log, and public, are created under CRS_HOME/ascrs.

11.3.2 Configuring ASCRS with Oracle Fusion Middleware

After you install ASCRS, it is ready for use with the default configuration. To customize logging locations, logging levels, or the default CRS properties, edit the config.xml file located in the CRS_HOME/ascrs/config directory.

The config.xml file contains the configuration for both ascrsctl and ASCRS agent logging. To change either of their locations, specify an existing path name or a path name within this CRS home using the ORACLE_HOME prefix. The available logging levels, in decreasing order of verbosity, are ALL, FINEST, FINER, FINE, INFO, WARNING, and SEVERE. Each resource has its own agent log file that rolls over after its size exceeds rollover_size bytes.

CRS properties are grouped into policies. A policy name describes the characteristics of the CRS property values under that policy. A policy can be normal or fast; the fast policy means more frequent resource health checking and less delay in failover.

The config.xml file shipped with ASCRS contains default values for these logging and policy settings.

Consult Oracle Clusterware documentation for the definitions of these parameters before editing their values.

Since computing environments vary in speed, Oracle recommends measuring the application's start and stop latency before setting the script, start, and stop timeout values. A reasonable rule of thumb is to set these values to twice the observed latencies.

11.4 Using ASCRS to Manage Resources

With the ascrsctl command line you manage CRS resources created for Fusion Middleware components. With this tool you can create, update, start, stop, switch and delete resources.

As mentioned in a previous section, a resource refers to an object that is created by CRS to identify the entity to be managed, such as an application, a virtual IP, or a shared disk. If auto start is set to 1 for a resource, CRS ensures that the resource starts when CRS starts. Since Fusion Middleware resources depend on each other, starting or stopping one resource may affect other resources. Resource dependencies are declared at resource creation through the ascrsctl syntax, and at runtime CRS uses this dependency information when starting and stopping resources.

CRS resources created with the ascrsctl command line follow a naming convention. Follow this naming convention to ensure that the resources function correctly. To avoid unexpected errors, Oracle recommends using the CRS installation exclusively for Oracle Fusion Middleware, so that all the CRS managed resources are created with ascrsctl.

Under this naming convention, the canonical name for a resource has the following format:

ora.name.cfctype

Where name refers to the short name of the resource, for example, sharedisk, or myvip, and type refers to one of the resource types, such as vip, disk, db, dblsnr or as.

For example, on Linux, the following command creates a virtual IP resource named ora.myvip.cfcvip from the IP address 192.168.1.10 on network interface eth0 with netmask 255.255.255.0:
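A sketch of that command follows. The -n, -type, -ip, and -if parameter names match the update example shown later in this chapter; the -netmask parameter name is an assumption for illustration:

ascrsctl create -n myvip -type vip -ip 192.168.1.10 -if eth0 -netmask 255.255.255.0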

11.4.1.2 Creating a Shared Disk Resource

In a Fusion Middleware environment, shared disks are the disk storage used to hold the Oracle database software, the database data files, WebLogic servers, OPMN managed components, and their Oracle homes. Shared disks allow the same data to be used when application resources are switched among the nodes within a cluster.

When creating a shared disk resource, carefully consider the following:

On Unix:

Before creating a shared disk resource, create an empty signature file named .ascrssf on the root of the shared disk. The owner of the CRS home should own this file. This file is used by CRS after the resource is created.

You can specify nop for either the mount or the unmount command. Use nop for the mount command if the shared disk is never offline; if the disk does go offline for some reason, CRS detects it and marks the resource as down. Use nop for the unmount command if the disk does not need to be unmounted by CRS. In such a case, be absolutely sure that the disk never needs to be unmounted, because there are potential disk corruption issues if a shared disk is mounted on two nodes without protection. In either case, the signature file is always needed on the shared disk.

The unmount command may fail if there are active processes using the shared disk. To prevent this failure, avoid accessing this disk from other applications while the disk resource is in the online state.

For complex mount and unmount commands, encapsulate the logic in executable scripts and specify the full path of these scripts as the mount and unmount commands. A proper unmount script is capable of killing other processes that are using the disk, to ensure a successful and clean unmount. If the unmount command is in a script, it can also do some basic file system checking, such as running an fsck command. Such a script should return 0 for success and 1 for failure.
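As an illustration only (the mount point /sharedisk1 and device /dev/sda1 are assumptions taken from the examples later in this chapter), an unmount script might look like this:

#!/bin/sh
# Kill any processes still using the shared disk, then unmount it.
fuser -km /sharedisk1
sleep 2
if umount /sharedisk1
then
    # Optional: basic read-only file system check after a clean unmount.
    fsck -n /dev/sda1
    exit 0
else
    exit 1
fi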

A shared disk resource is a system resource. Create, update, or delete commands generate scripts that must be executed as root to complete the operation. Follow the instructions in the screen output.

If the signature file is at the mount point of the shared disk, the start/stop operation may fail. Having the signature file at the mount point signals ASCRS that the disk is mounted, even if it is not.

Validate the mount/unmount command before using it in the mc or umc parameters or in the script file. There is no validation from ASCRS for the commands.

If the shared disk is not protected by a cluster file system, it could be corrupted if it is mounted from multiple nodes. To avoid this, before creating the ASCRS resource, mount the disk only on the node where you create the resource.

On Windows Server 2008:

Open Microsoft Disk Management and take note of the shared disk number. A disk number is a non-negative integer, such as 0, 2, or 5.

Create an empty mount directory on the system drive on each cluster node, such as c:\oracle\asdisk.

Ensure this disk is no longer used by any application on any node.

From one node, in Disk Management, right-click the drive and bring it online. Remove all partitions on it, create a single partition on the drive, and format it with NTFS. Remove any drive letter that may be assigned to it, and mount it to the directory you just created. Then right-click the drive again and take it offline.

On each of the other nodes, open Microsoft Disk Management, bring this drive online, remove the drive letter, if any, and mount the drive to the directory you just created. Then right-click the drive and take it offline.

Go to the node where you will create the disk resource and bring the disk online.

The disk root should now be accessible through the mount directory.

Create an empty signature file named .ascrssf on the root of the shared disk. The CRS home owner should own this file. This file is used by CRS after the resource is created.

The mount command is "diskmgr online disknumber" and the unmount command is "diskmgr offline disknumber", where diskmgr is an ASCRS built-in command.

To create a shared disk resource, on Unix, run the following ascrsctl command that includes a valid mount point, a mount command, and an unmount command:
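A sketch of such a command follows. The -mc and -umc parameter names appear earlier in this section; the resource name mydisk and the mount-point parameter name -mp are assumptions for illustration:

ascrsctl create -n mydisk -type disk -mp /sharedisk1 -mc "/bin/mount /dev/sda1 /sharedisk1" -umc "/bin/umount /sharedisk1"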

Before creating the Oracle database resource, carefully check the following:

On Windows, ensure that the built-in user SYSTEM is in DBA_GROUP.

The database home is installed on a shared disk. The data files of this database are on the same or different shared disk(s). CRS resources have been created for all these shared disks with ascrsctl and started.

A CRS resource has been created for the database listener with an ascrsctl command, and the resource is started.
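As with the other resource types, the parameters for creating a database listener resource are described in the ascrsctl online help, using the dblsnr resource type:

ascrsctl help -c create -t dblsnr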

For online help information for creating an Oracle database resource, use the following command:

ascrsctl help -c create -t db

Note:

In this current release, ASCRS does not manage the Oracle Database Console and job scheduler processes.

11.4.1.5 Creating a Middleware Resource

OPMN instances and WebLogic servers are collectively called Application Server (AS) components and are managed by separate resources. Specifically, all OPMN managed components must be managed by one resource, and all servers under a WebLogic domain must be managed by a different resource. Due to the complexity of the WebLogic environment, Section 11.4.8 describes its resource creation procedure. This section covers only OPMN.

The following information is needed for creating a resource for an OPMN managed instance:

A valid instance home for the OPMN managed components.

A disk resource name for the instance home

A disk resource name for the instance's Oracle home, if it is on a different shared disk

The names of the OPMN managed applications for inclusion in the resource. If you plan to include only a subset of the components, the remaining components are not managed by CRS and should not be started outside of CRS. By default, all the components are included.

Before creating the OPMN resource, carefully check the following:

Creating OPMN resources is only supported for Unix platforms.

The Oracle home is installed on a shared disk. The OPMN instance is on the same or a different shared disk. CRS resources have been created for all these shared disks with ascrsctl and started. Shut down all OPMN managed applications.

The instance has been CFC enabled.

The following is a syntax example for creating the resource (the Oracle home and instance home are on the same disk, and all components are included):

ascrsctl create -n myopmn -type as -ch /cfcas -disk ohdisk

For online help information for creating an OPMN instance resource, use the following command:

ascrsctl help -c create -t as

11.4.2 Updating Resources

You can update resources created with ascrsctl using the update command. Depending on the resource type, you can update the resource profile by specifying the appropriate parameter through the update command line. You can perform updates only when the resource is in the offline state.

For example, to update the virtual IP resource created in the last section with a new IP address and a different interface, use the following command:

ascrsctl update -n myvip -type vip -ip 192.168.1.20 -if eth1

Note:

If you want to change the set of nodes hosting a particular resource, you must stop all dependent resources and then update the cluster nodes for each resource with the same node set and ordering. To find related resources, run the ascrsctl status command for this resource.

11.4.3 Starting Up Resources

When a resource is started, it is put under the control of CRS and its runtime status is monitored continuously by CRS. If the resource depends on other resources, starting this resource automatically starts the resources it depends on. Refer to the Oracle Clusterware documentation for information about the role of the resource placement policy during resource startup. The ascrsctl start command maps to the CRS command crs_start.

For example, to start the virtual IP resource, use the following command:

ascrsctl start -n ora.myvip.cfcvip

Note:

If a resource depends on more than one resource, before starting that resource, be sure that the resources it depends on, if online, are targeted on the same node.

11.4.4 Shutting Down Resources

When a resource is stopped, it is brought down and put in the offline state, and CRS stops monitoring its runtime status. If the resource depends on other resources that are online, those resources are not stopped unless you confirm at the prompt or the -np option is specified. Refer to the Oracle Clusterware documentation for more information about the implications of resource dependency during resource stop. The ascrsctl stop command maps to the CRS command crs_stop.

For example, to stop the virtual IP resource, run the following command:

ascrsctl stop -n ora.myvip.cfcvip

11.4.5 Resource Switchover

Resource switchover is the process of shutting down a resource on the node on which it is running and restarting it on another node. The new node, if not specified, is determined by CRS based on the placement policy. If the resource to be switched depends on other resources, or there are online resources that depend on it, the resource must be switched with the -np flag.

To switch over a resource to another available node in the cluster, run the following command:

ascrsctl switch -n ora.myvip.cfcvip

11.4.6 Deleting Resources

You can delete a resource from CRS control. After a resource is deleted, the corresponding application or component's functionality is not affected, but CRS no longer monitors that resource. If a resource has dependent resources, it cannot be removed.

To delete a resource from CRS control, run the following command:

ascrsctl delete -n ora.myvip.cfcvip

Note:

If you delete a resource from CRS, the log directory and the log files for the deleted resource are NOT automatically removed. If you don't plan to reuse them in the future, you should delete them manually. The log files are located in the ORA_CRS_HOME/ascrs/log directory by default.

11.4.7 Checking Resource Status

You can check resource status with the ascrsctl status command. With this command, you can view the states of all resources and their dependents. If a particular resource is specified, the status command shows its CRS profile, its direct and indirect dependency relationships, and its current state information.

For example, to check the status of a resource, run the following command:

ascrsctl status -n ora.myvip.cfcvip

Assuming the virtual IP resource is used by a database listener resource and the listener resource is in turn required by a database resource, all of the dependency information is shown in the status output in a tree structure, along with other status information.

11.4.8 Configuring the Oracle WebLogic Environment

Creating a CRS resource for a WebLogic domain requires more preparation than other resource types. Due to its complexity, the procedure is divided into the following sections:

Basic Setup

Node Manager Setup

Administration Server Setup

Creating a Resource

Basic Setup

Before starting the basic setup, be sure that WebLogic is installed on shared disk(s). The WebLogic Server software and the domain instance can be installed on the same shared disk or on separate shared disks.

In addition, ensure that the WebLogic Server environment is CFC enabled. See Section 10.2.2.1, "Administration Server Topology 1" for details on enabling WebLogic Server for CFC. Once CFC is enabled, you can manually start and stop the server on the original node and on the failover node(s) without any noticeable difference.

To create the dependency resources:

Create a CRS resource for each shared disk and start it on the node on which it was created.

Create a CRS resource for the virtual IP with the ascrsctl command and start it on the same cluster node.

Node Manager Setup

To set up the Node Manager:

For Windows Server 2008, on each node, create the Node Manager Windows service, if it does not already exist, by executing the following command from the WL_HOME/server/bin directory:

installNodeMgrSvc.cmd

From Windows Service Manager, make sure this service is in manual start mode.

If you have not yet done so, change Node Manager's username and password. The initial password is randomly generated. To change the Node Manager password, in the WebLogic Server Administration Console, select Domain, Security, General, and then Advanced. Enter the new password and click Save.

If you have changed anything in steps 1 or 2, restart the Node Manager.

On Unix, start the Node Manager using the following command from the WL_HOME/server/bin directory:

startNodeManager.sh

On Windows, start the Node Manager from the service manager.

Start the WebLogic Scripting Tool from the WL_HOME/common/bin directory. To persist the Node Manager's user login information in the ascrscf.dat and ascrskf.dat files, use the following commands:
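The following is a sketch of such a WLST session; the connect URL and credentials are placeholders, and the target file locations are assumptions. In WLST, storeUserConfig with a third argument of 'true' stores the Node Manager user information:

connect('weblogic', 'admin_password', 't3://adminhost:7001')
storeUserConfig('/crshome/ascrs/config/ascrscf.dat', '/crshome/ascrs/config/ascrskf.dat', 'true')
disconnect()
exit()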

For Unix platforms, copy CRS_HOME/ascrs/public/cfcStartNodeManager.sh to the WL_HOME/server/bin directory, and make the script executable.

Note:

To keep the setup consistently in sync, step 4 must be performed whenever the Node Manager passwords or usernames are changed.

After you have started Node Manager for the first time, you can edit the nodemanager.properties file to set the StartScriptEnabled property. The nodemanager.properties file does not exist until Node Manager is started for the first time.

In the WL_HOME/common/nodemanager directory, set the StartScriptEnabled property in the nodemanager.properties file to true.

StartScriptEnabled=true

Check the nodemanager.properties file to ensure no value is assigned to ListenAddress, and that a valid port number is assigned to ListenPort.

When this property is set in the nodemanager.properties file, you no longer need to define it in the JAVA_OPTIONS environment variable.

Administration Server Setup

All WebLogic servers listen on the virtual IP. To ensure this is configured correctly, log in to the WebLogic Server Administration Console, navigate to the server's listen address page, verify that the virtual IP and the port number are both set correctly, and click Save.

The Administration Server must also listen on localhost. To ensure this is configured correctly, log in to the WebLogic Server Administration Console and do the following:

In the Domain tree, select Environment, Servers, server_name, Protocols, and then Channels.

Click the Lock & Edit button.

Click New, enter a channel name, select protocol t3, and continue to the next screen.

Enter localhost for both the Listen Address and External Listen Address.

Enter the port number for both the Listen Port and External Listen Port. This port number must be exactly the same as the port number used with the virtual IP.

Continue to the next screen and verify that Enabled is selected.

Click Finish.

Click Activate Changes.

If this is the Administration Server, ensure the DOMAIN_HOME/servers/<admin server name>/security directory exists. This directory should contain the boot.properties file. If this file does not exist, create it and include the following properties:

If this domain does not have an Administration Server, ensure the DOMAIN_HOME/servers/myserver/security directory exists. This directory should contain the boot.properties file. If this file does not exist, create it and include the following properties:
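In either case, the boot.properties file holds the server's boot identity in the following form. The values below are placeholders for the real administrator credentials; WebLogic Server encrypts the file contents the first time the server starts with it:

username=weblogic
password=admin_password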

11.5 Example Topologies

Figure 11-1 illustrates the CRS Example 1 topology. In this example, Oracle HTTP Server and SOA are installed in a two-node cluster. Oracle HTTP Server is managed by OPMN. The SOA installation has a WebLogic server running that hosts four Java EE applications.

Assumptions:

Operating Environment: This is a Linux two-node cluster with node1.company.com and node2.company.com as its members. Node 1 is designated as the primary node and Node 2 as the failover node. CRS has been installed on both nodes in the /crshome directory and is started. ASCRS has been installed on both nodes and has been configured.

One shared disk has been allocated for both the Oracle HTTP Server (Oracle home and instance home) and the WebLogic installation (server software and the domain home of the SOA server). It is a SCSI drive identified as /dev/sda1, with an ext2 file system on it, and is mounted on /sharedisk1. Assume this shared disk is not used for other purposes.

Both Oracle HTTP Server and WebLogic use the virtual IP 192.168.1.10 for their public listen addresses. On each node, two network interface controllers, eth0 and eth1, are available for binding the virtual IP. The netmask is 255.255.255.0.

The Oracle HTTP Server Oracle home is /sharedisk1/ohsoh. The instance home is /sharedisk1/ohinst.

WebLogic Server software is installed in the /sharedisk1/fmw directory, and the domain directory is /sharedisk1/fmw/user_projects/domains/asdomain.

Under these assumptions, the following procedure describes the Cold Failover Clusters automation setup:

Installing WebLogic software and enabling Cold Failover Clusters:

If the shared disk /dev/sda1 is mounted on Node 2, unmount it. Mount the shared disk on /sharedisk1 on Node 1 if it is not yet mounted.

Create an empty signature file .ascrssf in /sharedisk1. Create this file only after this shared disk is mounted.

Bind the virtual IP to eth0 using /sbin/ifconfig.
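For example, assuming the alias interface eth0:1 is free (an assumption for illustration), a standard ifconfig alias command binds the virtual IP:

/sbin/ifconfig eth0:1 192.168.1.10 netmask 255.255.255.0 up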

Install SOA and OHS on the shared disk. Perform the CFC enabling procedure for both SOA and OHS using the virtual IP. After CFC enabling, shut down all processes belonging to the SOA server and the OHS instance. To do a basic check of the CFC enabling, unmount the shared disk on Node 1, mount it on Node 2, and try to start SOA and OHS on Node 2. If the startup fails, fix the problem before you proceed.

After this check is done, shut down all OHS and SOA processes, unmount the disk on Node 2, and mount it on Node 1.

Follow the procedure in Section 11.4.8 to configure the WebLogic server in the SOA installation for ASCRS.
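Based on the commands shown earlier in this chapter, the resources for this topology can then be created and started along these lines. The resource short names, the -ch value, and the -mp and -netmask parameter names are illustrative assumptions; the -mc, -umc, -ip, -if, and -disk parameters follow the earlier examples:

ascrsctl create -n asdisk -type disk -mp /sharedisk1 -mc "/bin/mount /dev/sda1 /sharedisk1" -umc "/bin/umount /sharedisk1"
ascrsctl create -n asvip -type vip -ip 192.168.1.10 -if eth0 -netmask 255.255.255.0
ascrsctl create -n ohsinst -type as -ch /sharedisk1/ohinst -disk asdisk
ascrsctl start -n ora.ohsinst.cfcas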

Figure 11-2 illustrates the CRS Example 2 topology. In this example topology, WebLogic Server and the Oracle database are installed on a two-node cluster with the following characteristics:

The WebLogic Administration Server is the only WebLogic server running on this cluster. The WebLogic software and the domain home reside on the same shared disk.

The Administration Server, along with Oracle Enterprise Manager, run in a WebLogic Java EE container with the first node as its primary node.

The database software and its data files reside on two other shared disks with the second node as its primary node.

The goal of this topology is to provide a failover solution for both the WebLogic Administration Server and the database instance.

Assumptions:

Operating Environment: This is a Linux two-node cluster with node1.company.com and node2.company.com as its members. Node 1 is designated as the primary node for WebLogic Server and Node 2 as its failover node. Node 2 is designated as the primary node for the Oracle database, and Node 1 as its failover node. CRS is installed on both nodes in the /crshome directory and is started. ASCRS is installed on both nodes and is configured.

Three shared disks are allocated. They are all SCSI drives, identified as /dev/sda1, /dev/sda2, and /dev/sda3, with ext2 file systems on them. /dev/sda1 is used for the WebLogic software and the domain home. /dev/sda2 is used for the Oracle database software. /dev/sda3 is used for the database data files. They are mounted on /sharedisk1, /sharedisk2, and /sharedisk3, respectively.

WebLogic Server uses virtual IP 192.168.1.10 for its public listen address, and 7001 for its listen port. On each node, two network interface controllers, eth0 and eth1 are available for binding this virtual IP.

The database listener uses the virtual IP 192.168.1.20 for its listen address, and 1521 for its listen port. On each node, the network interface controller eth2 is used for binding this virtual IP.

The netmask is 255.255.255.0 for both virtual IPs.

WebLogic Server is installed on the shared disk in the /sharedisk1/fmw directory, and the domain directory is /sharedisk1/fmw/user_projects/domains/asdomain.

The database is installed on the shared disk in the /sharedisk2/dbhome directory, and the data files are created in the /sharedisk3/dbdata directory. Assume orcl is the Oracle SID name and LISTENER is the listener name.

Under these assumptions, the following describes the procedure for automating Cold Failover Clusters:

Install WebLogic software and enable Cold Failover Clusters:

If the shared disk /dev/sda1 is mounted on Node 2, unmount it. Mount the shared disk on /sharedisk1 on Node 1 if it is not yet mounted.

Create an empty signature file .ascrssf in /sharedisk1. Create this file only after this shared disk is mounted.

Bind the virtual IP 192.168.1.10 to eth0 using /sbin/ifconfig.

Install WebLogic on the shared disk. Perform the Cold Failover Clusters enabling procedure for this installation using this virtual IP.

Start all the database related resources on Node 2. Since the database resource depends on all the other resources directly or indirectly, starting the database resource automatically starts the others as well.

ascrsctl start -n ora.asdb.cfcdb -node node2

11.6 Troubleshooting ASCRS

ASCRS relies on logging for diagnosing unexpected issues. To get more diagnostic information, you can increase the verbosity of the log level by changing the ASCRS configuration file config.xml.

In addition, you can also check CRS daemon logs for basic CRS issues.
