Registering a Resource Type

A resource type provides specification of common properties and callback methods
that apply to all of the resources of the given type. You must register a resource
type before you create a resource of that type. For details about resource types,
see Chapter 1, Planning for Sun Cluster Data Services.

How to Register a Resource Type

Note –

Perform this procedure from any cluster node.

Before You Begin

Ensure that you have the name for the resource type that you plan to register.
The resource type name is an abbreviation for the data service name. For information
about resource type names of data services that are supplied with Sun Cluster, see
the release notes for your release of Sun Cluster.

Steps

Become superuser on a cluster member.

Register the resource type.

# scrgadm -a -t resource-type

-a

Adds the specified resource type.

-t resource-type

Specifies the name of the resource type to add. See the release notes
for your release of Sun Cluster to determine the predefined name to supply.

Verify that the resource type has been registered.

# scrgadm -pv -t resource-type

Example 2–1 Registering a Resource Type

The following example registers the SUNW.iws resource type,
which represents the Sun Java System Web Server application in a Sun Cluster configuration.
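
Based on the steps in this procedure, the commands for this example would be as follows:

# scrgadm -a -t SUNW.iws
# scrgadm -pv -t SUNW.iws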

Upgrading a Resource Type

Upgrading a resource type enables you to use new features that are introduced
in the new version of the resource type. A new version of a resource type might differ
from a previous version in the following ways.

Default settings of resource type properties might change.

New extension properties of the resource type might be introduced.

Existing extension properties of the resource type might be withdrawn.

The set of standard properties that are declared for the resource
type might change.

The attributes of resource properties such as min, max, arraymin, arraymax, default, and tunability might change.

The set of declared methods might differ.

The implementation of methods or the fault monitor might change.

Upgrading a resource type involves the tasks that are explained in the following
sections:

How to Install and Register an Upgrade
of a Resource Type

The instructions that follow explain how to use the scrgadm(1M) command
to perform this task. However, you are not restricted to using the scrgadm command for this task. Instead of the scrgadm command,
you can use SunPlex Manager or the Resource Group option of the scsetup(1M) command to perform this task.

Before You Begin

Consult the documentation for the resource type to determine what you must do
before installing the upgrade package on a node. One action from the following list
will be required:

You must reboot the node in noncluster mode.

You may leave the node running in cluster mode, but you must turn
off monitoring of all instances of the resource type.

You may leave the node running in cluster mode and leave monitoring
turned on for all instances of the resource type.

If you must reboot the node in noncluster mode, prevent a loss of service by
performing a rolling upgrade. In a rolling upgrade, you install the package on each
node individually while leaving the remaining nodes running in cluster mode.

Steps

Become superuser or assume an equivalent role.

Install the package for the resource
type upgrade on all cluster nodes where instances of the resource type are to be brought
online.

Register the new version of the resource
type.

To ensure that the correct version of the resource type is registered,
you must specify the following information:

The resource type name

The RTR file that defines the new version of the resource type

If necessary, set the Installed_nodes property to the nodes where the package for the resource type upgrade is
installed.

You must perform this step if the package for the resource
type upgrade is not installed on all cluster nodes.

The nodelist property of all resource groups that contain instances of the resource
type must be a subset of the Installed_nodes property of the resource
type.

# scrgadm -c -t resource-type -h installed-node-list

How to Migrate Existing Resources to a
New Version of the Resource Type

The instructions that follow explain how to use the scrgadm(1M) command
to perform this task. However, you are not restricted to using the scrgadm command for this task. Instead of the scrgadm command,
you can use SunPlex Manager or the Resource Group option of the scsetup(1M) command to perform this task.

Before You Begin

Consult the instructions for upgrading the resource type to determine when
you can migrate resources to a new version of the resource type. The instructions
specify one of the following conditions:

Any time

Only when the resource is unmonitored

Only when the resource is offline

Only when the resource is disabled

Only when the resource group is unmanaged

The instructions might state that you cannot upgrade your existing version of
the resource. If you cannot migrate the resource, consider the following alternatives:

Deleting the resource and replacing it with a new resource of the
upgraded version

Leaving the resource at the old version of the resource type

Steps

Become superuser or assume an equivalent role.

For each resource of the resource
type that is to be migrated, change the state of the resource or its resource group
to the appropriate state.

If you can migrate the resource at any time, no action is required.

If you can migrate the resource only when the resource is unmonitored,
type the following command:

# scswitch -M -n -j resource

If you can migrate the resource only when the resource is offline, type
the following command:

# scswitch -n -j resource

Note –

If other resources depend on the resource that you are migrating, this
step fails. In this situation, consult the error message that is printed to determine
the names of the dependent resources. Then repeat this step, specifying a comma-separated
list that contains the resource that you are migrating and any dependent resources.

If you can migrate the resource only when the resource is disabled, type
the following command:

# scswitch -n -j resource

Note –

If other resources depend on the resource that you are migrating, this
step fails. In this situation, consult the error message that is printed to determine
the names of the dependent resources. Then repeat this step, specifying a comma-separated
list that contains the resource that you are migrating and any dependent resources.

If you can migrate the resource only when the resource group is unmanaged,
type the following commands:
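
Based on the option descriptions that follow and on the procedure for moving a
resource group into the UNMANAGED state later in this chapter, the commands would
take this form:

# scswitch -n -j resource-list
# scswitch -F -g resource-group
# scswitch -u -g resource-group

-j resource-list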

Specifies a comma-separated list of all resources in the resource
group that is to be unmanaged.

-g resource-group

Specifies the resource group that is to be unmanaged.

Note –

You can specify the resources in resource-list in
any order. The scswitch command disables the resources in the order
that is required to satisfy dependencies between the resources, regardless of their
order in resource-list.

For each resource of the resource
type that is to be migrated, change the Type_version property to
the new version.
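
Following the form that is shown in Example 2–3, the command would be:

# scrgadm -c -j resource -y Type_version=new-version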

If necessary, set other properties of the same resource
to appropriate values in the same command. To set these properties, specify additional -x options or -y options in the command.

To determine
whether you are required to set other properties, consult the instructions for upgrading
the resource type. You might be required to set other properties for the following
reasons:

An extension property has been introduced in the new version of the
resource type.

The default value of an existing property has been changed in the
new version of the resource type.

If the existing version of the resource type does not support upgrades
to the new version, this step fails.

Restore the previous state of the
resource or resource group by reversing the command that you typed in Step 2.

If you can migrate the resource at any time, no action is required.

Note –

After migrating a resource that can be migrated at any time, the resource
probe might not display the correct resource type version. In this situation, disable
and re-enable the resource's fault monitor to ensure that the resource probe displays
the correct resource type version.

If you can migrate the resource only when the resource is unmonitored,
type the following command:

# scswitch -M -e -j resource

If you can migrate the resource only when the resource is offline, type
the following command:

# scswitch -e -j resource

Note –

If, in Step 2, you disabled other resources that depend on the resource that you are migrating,
enable the dependent resources also.

If you can migrate the resource only when the resource is disabled, type
the following command:

# scswitch -e -j resource

Note –

If, in Step 2, you disabled other resources that depend on the resource that you are migrating,
enable the dependent resources also.

If you can migrate the resource only when the resource group is unmanaged,
type the following commands:
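
A sequence that reverses the commands that you typed in Step 2 would take this form:

# scswitch -e -j resource-list
# scswitch -o -g resource-group
# scswitch -z -g resource-group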

Example 2–2 Migrating a Resource That Can Be Migrated Only When Offline

This example shows the migration of a resource that can be migrated only when
the resource is offline. The new resource type package contains methods that are located
in new paths. Because the methods are not overwritten during the installation, the
resource does not need to be disabled until after the upgraded resource type is installed.

The characteristics of the resource in this example are as follows:

The new resource type version is 2.0.

The resource name is myresource.

The resource type name is myrt.

The new RTR file is in /opt/XYZmyrt/etc/XYZ.myrt.

No dependencies on the resource that is to be migrated exist.

The resource that is to be migrated can be taken offline while leaving
the containing resource group online.

This example assumes that the upgrade package is already installed on all cluster
nodes according to the supplier's directions.
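
Based on the steps in this procedure, the operations in this example would be
performed as follows:

# scrgadm -a -t myrt -f /opt/XYZmyrt/etc/XYZ.myrt
# scswitch -n -j myresource
# scrgadm -c -j myresource -y Type_version=2.0
# scswitch -e -j myresource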

Example 2–3 Migrating a Resource That Can Be Migrated Only When Unmonitored

This example shows the migration of a resource that can be migrated only when
the resource is unmonitored. The new resource type package contains only the monitor
and RTR file. Because the monitor is overwritten during installation, monitoring of
the resource must be disabled before the upgrade package is installed.

The characteristics of the resource in this example are as follows:

The new resource type version is 2.0.

The resource name is myresource.

The resource type name is myrt.

The new RTR file is in /opt/XYZmyrt/etc/XYZ.myrt.

The following operations are performed in this example.

Before the upgrade package is installed, the following command is
run to disable monitoring of the resource:

# scswitch -M -n -j myresource

The upgrade package is installed on all cluster nodes according to
the supplier's directions.

To register the new version of the resource type, the following command
is run:

# scrgadm -a -t myrt -f /opt/XYZmyrt/etc/XYZ.myrt

To change the Type_version property to the new
version, the following command is run:

# scrgadm -c -j myresource -y Type_version=2.0

To enable monitoring of the resource after its migration, the following
command is run:

# scswitch -M -e -j myresource

Downgrading a Resource Type

You can downgrade a resource to an older version of its resource type. The conditions
for downgrading a resource to an older version of the resource type are more restrictive
than the conditions for upgrading to a newer version of the resource type. The resource
group that contains the resource must be unmanaged.

How to Downgrade a Resource to an Older Version
of Its Resource Type

The instructions that follow explain how to use the scrgadm(1M) command
to perform this task. However, you are not restricted to using the scrgadm command for this task. Instead of the scrgadm command,
you can use SunPlex Manager or the Resource Group option of the scsetup(1M) command to perform this task.

Steps

Become superuser or assume an equivalent role.

Switch offline the resource group that contains
the resource that you are downgrading.

# scswitch -F -g resource-group

Disable all resources in the resource group that
contains the resource that you are downgrading.

# scswitch -n -j resource-list

Note –

You can specify the resources in resource-list in
any order. The scswitch command disables the resources in the order
that is required to satisfy dependencies between the resources, regardless of their
order in resource-list.

If other resources
depend on any resource in resource-list, this step fails.
In this situation, consult the error message that is printed to determine the names
of the dependent resources. Then repeat this step, specifying a comma-separated list
that contains the resources that you originally specified and any dependent resources.

Unmanage the resource group that contains the
resource that you are downgrading.

# scswitch -u -g resource-group

If necessary, reregister the old version of the
resource type to which you are downgrading.

Perform this step only if
the version to which you are downgrading is no longer registered. If the version to
which you are downgrading is still registered, omit this step.

# scrgadm -a -t resource-type-name

For the resource that you are downgrading, set
the Type_version property to the old version to which you are downgrading.
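
Following the form that is shown in the migration examples, the command would be:

# scrgadm -c -j resource -y Type_version=old-version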

If necessary, edit other properties of the same resource to appropriate
values in the same command.

Bring to a managed state the resource group that contains the resource
that you downgraded.

# scswitch -o -g resource-group

Bring online the resource group that contains the resource that you downgraded.

# scswitch -z -g resource-group

Creating a Resource Group

A resource group contains a set of resources, all of which are brought online
or offline together on a given node or set of nodes. You must create an empty resource
group before you place resources into it.

The two resource group types are failover and scalable. A failover resource group can be online on one
node only at any time, while a scalable resource group can be online on multiple nodes
simultaneously.

The following procedure describes how to use the scrgadm(1M) command to register and configure your data
service.

How to Create a Scalable Resource Group

A scalable resource group is used with scalable services. The shared address
feature is the Sun Cluster networking facility that enables the multiple instances
of a scalable service to appear as a single service. You must first create a failover
resource group that contains the shared addresses on which the scalable resources
depend. Next, create a scalable resource group, and add scalable resources to that
group.

Note –

Perform this procedure from any cluster node.

Steps

Become superuser on a cluster member.

Create the failover resource group
that holds the shared addresses that the scalable resource is to use.
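
Create the scalable resource group.

Based on the option descriptions that follow, the command would take this general form:

# scrgadm -a -g resource-group -y Maximum_primaries=m \
-y Desired_primaries=n -y RG_dependencies=depend-resource-group -h nodelist

-g resource-group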

Specifies your choice of the name of the scalable resource group to
add.

-y Maximum_primaries=m

Specifies the maximum number of active primaries for this resource
group.

-y Desired_primaries=n

Specifies the number of active primaries on which the resource group
should attempt to start.

-y RG_dependencies=depend-resource-group

Identifies the resource group that contains the shared address resource
on which the resource group that is being created depends.

-h nodelist

Specifies an optional list of nodes on which this resource group is
to be available. If you do not specify this list, the value defaults to all of the
nodes.

Verify that the scalable resource group has been
created.

# scrgadm -pv -g resource-group

Example 2–5 Creating a Scalable Resource Group

This example shows the addition of a scalable resource group (resource-group-1) to be hosted on two nodes (phys-schost-1, phys-schost-2). The scalable resource group depends on the failover resource
group (resource-group-2) that contains the shared addresses.
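
Based on the preceding option descriptions, the commands for this example would
take the following form:

# scrgadm -a -g resource-group-1 -y Maximum_primaries=2 \
-y Desired_primaries=2 -y RG_dependencies=resource-group-2 \
-h phys-schost-1,phys-schost-2
# scrgadm -pv -g resource-group-1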

Adding Resources to Resource Groups

A resource is an instantiation of a resource type. You must add resources to
a resource group before the RGM can manage the resources. This section describes
the following three resource types.

Logical hostname resources

Shared-address resources

Data service (application) resources

Always add logical hostname resources and shared address resources to failover
resource groups. Add data service resources for failover data services to failover
resource groups. Failover resource groups contain both the logical hostname resources
and the application resources for the data service. Scalable resource groups contain
only the application resources for scalable services. The shared address resources
on which the scalable service depends must reside in a separate failover resource
group. You must specify dependencies between the scalable application resources and
the shared address resources for the data service to scale across cluster nodes.
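
To add a logical hostname resource to a resource group, based on the option
descriptions that follow, the command would take this general form. The -L option
specifies the logical hostname form of the command.

# scrgadm -a -L [-j resource] -g resource-group -l hostnamelist [-n netiflist]

-j resource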

Specifies an optional resource name of your choice. If you do not
specify this option, the name defaults to the first hostname that is specified with
the -l option.

-g resource-group

Specifies the name of the resource group in which this resource resides.

-l hostnamelist, …

Specifies a comma-separated list of UNIX hostnames (logical hostnames)
by which clients communicate with services in the resource group.

-n netiflist

Specifies an optional, comma-separated list that identifies the IP Networking Multipathing groups
that are on each node. Each element in netiflist must be
in the form of netif@node. netif can be given
as an IP Networking Multipathing group name, such as sc_ipmp0. The node can be
identified by the node name or node ID, such as sc_ipmp0@1 or sc_ipmp0@phys-schost-1.

Note –

Sun Cluster does not support the use of the adapter name for netif.

Verify that the logical hostname resource has been
added.

# scrgadm -pv -j resource

Example 2–6 Adding a Logical Hostname Resource to a Resource Group

This example shows the addition of logical hostname resource (resource-1) to a resource group (resource-group-1).
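
Based on the preceding command form, and with the hostname schost-1 as an
illustrative value, the commands for this example would be:

# scrgadm -a -L -j resource-1 -g resource-group-1 -l schost-1
# scrgadm -pv -j resource-1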

Troubleshooting

Adding a resource causes the Sun Cluster software to validate
the resource. If the validation fails, the scrgadm command prints
an error message and exits. To determine why the validation failed, check the syslog on each node for an error message. The message appears on the node
that performed the validation, not necessarily the node on which you ran the scrgadm command.
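
To add a shared address resource to a failover resource group, based on the option
descriptions that follow, the command would take this general form. The -S option
specifies the shared address form of the command.

# scrgadm -a -S [-j resource] -g resource-group -l hostnamelist \
[-X auxnodelist] [-n netiflist]

-j resource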

Specifies an optional resource name of your choice. If you do not
specify this option, the name defaults to the first hostname that is specified with
the -l option.

-g resource-group

Specifies the resource group name.

-l hostnamelist, …

Specifies a comma-separated list of shared address hostnames.

-X auxnodelist

Specifies a comma-separated list of physical node names or IDs that
identify the cluster nodes that can host the shared address but never serve as primary
if failover occurs. These nodes are mutually exclusive with the nodes identified
as potential masters in the resource group's node list.

-n netiflist

Specifies an optional, comma-separated list that identifies the IP Networking Multipathing groups
that are on each node. Each element in netiflist must be
in the form of netif@node. netif can be given
as an IP Networking Multipathing group name, such as sc_ipmp0. The node can be
identified by the node name or node ID, such as sc_ipmp0@1 or sc_ipmp0@phys-schost-1.

Note –

Sun Cluster does not support the use of the adapter name for netif.

Verify that the shared address resource has been
added and validated.

# scrgadm -pv -j resource

Example 2–8 Adding a Shared Address Resource to a Resource Group

This example shows the addition of a shared address resource (resource-1) to a resource group (resource-group-1).
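
Based on the preceding command form, and with the hostname schost-1 as an
illustrative value, the commands for this example would be:

# scrgadm -a -S -j resource-1 -g resource-group-1 -l schost-1
# scrgadm -pv -j resource-1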

Troubleshooting

Adding a resource causes the Sun Cluster software to validate
the resource. If the validation fails, the scrgadm command prints
an error message and exits. To determine why the validation failed, check the syslog on each node for an error message. The message appears on the node
that performed the validation, not necessarily the node on which you ran the scrgadm command.
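
To add a failover application resource to a resource group, based on the option
descriptions that follow, the command would take this general form:

# scrgadm -a -j resource -g resource-group -t resource-type \
[-x extension-property=value, …] [-y standard-property=value, …]

-j resource

Specifies your choice of the name of the resource to add.

-g resource-group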

Specifies the name of a failover resource group. This resource group
must already exist.

-t resource-type

Specifies the name of the resource type for the resource.

-x extension-property=value, …

Specifies a comma-separated list of extension properties that you
are setting for the resource. The extension properties that you can set depend on
the resource type. To determine which extension properties to set, see the documentation
for the resource type.

-y standard-property=value, …

Specifies a comma-separated list of standard properties that you are
setting for the resource. The standard properties that you can set depend on the resource
type. To determine which standard properties to set, see the documentation for the
resource type and Appendix A, Standard Properties.

Verify that the failover application resource has
been added and validated.

# scrgadm -pv -j resource

Example 2–9 Adding a Failover Application Resource to a Resource Group

This example shows the addition of a resource (resource-1)
to a resource group (resource-group-1). The resource depends on
logical hostname resources (schost-1, schost-2),
which must reside in the same failover resource groups that you defined previously.
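
Based on the preceding command form, and with the resource type name
resource-type-1 as an illustrative value, the command for this example would
take this form:

# scrgadm -a -j resource-1 -g resource-group-1 -t resource-type-1 \
-y Network_resources_used=schost-1,schost-2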

Troubleshooting

Adding a resource causes the Sun Cluster software to validate
the resource. If the validation fails, the scrgadm command prints
an error message and exits. To determine why the validation failed, check the syslog on each node for an error message. The message appears on the node
that performed the validation, not necessarily the node on which you ran the scrgadm command.
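
To add a scalable application resource to a resource group, based on the option
descriptions that follow, the command would take this general form:

# scrgadm -a -j resource -g resource-group -t resource-type \
-y Network_resources_used=network-resource[,network-resource...] \
-y Scalable=True [-x extension-property=value, …] [-y standard-property=value, …]

-j resource

Specifies your choice of the name of the resource to add.

-g resource-group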

Specifies the name of a scalable service resource group that you previously
created.

-t resource-type

Specifies the name of the resource type for this resource.

-y Network_resources_used=network-resource[,network-resource...]

Specifies the list of network resources (shared addresses) on which
this resource depends.

-y Scalable=True

Specifies that this resource is scalable.

-x extension-property=value, …

Specifies a comma-separated list of extension properties that you
are setting for the resource. The extension properties that you can set depend on
the resource type. To determine which extension properties to set, see the documentation
for the resource type.

-y standard-property=value, …

Specifies a comma-separated list of standard properties that you are
setting for the resource. The standard properties that you can set depend on the resource
type. For scalable services, you typically set the Port_list, Load_balancing_weights, and Load_balancing_policy properties.
To determine which standard properties to set, see the documentation for the resource
type and Appendix A, Standard Properties.

Verify that the scalable application resource has
been added and validated.

# scrgadm -pv -j resource

Example 2–10 Adding a Scalable Application Resource to a Resource Group

This example shows the addition of a resource (resource-1)
to a resource group (resource-group-1). Note that resource-group-1 depends on the failover resource group that contains the network addresses
that are in use (schost-1 and schost-2 in the
following example). The resource depends on shared address resources (schost-1, schost-2), which must reside in one or more failover
resource groups that you defined previously.
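
Based on the preceding command form, and with the resource type name
resource-type-1 as an illustrative value, the command for this example would
take this form:

# scrgadm -a -j resource-1 -g resource-group-1 -t resource-type-1 \
-y Network_resources_used=schost-1,schost-2 -y Scalable=True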

Troubleshooting

Adding a resource causes the Sun Cluster software to validate
the resource. If the validation fails, the scrgadm command prints
an error message and exits. To determine why the validation failed, check the syslog on each node for an error message. The message appears on the node
that performed the validation, not necessarily the node on which you ran the scrgadm command.

Bringing Online Resource Groups

To enable resources to begin providing HA services, you must perform the following
operations:

Enabling the resources in their resource groups

Enabling the resource monitors

Making the resource groups managed

Bringing online the resource groups

You can perform these tasks individually or by using a single command.

After you bring online a resource group, it is configured and ready for use.
If a resource or node fails, the RGM switches the resource group online on
alternate nodes to maintain availability of the resource group.

How to Bring Online Resource Groups

Perform this task from any cluster node.

Steps

On a cluster member, become superuser or assume an
equivalent role.

Type the command to bring online the resource groups.
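
To enable all resources and fault monitors, make the resource groups managed, and
bring the resource groups online with a single command, as shown in Example 2–11,
type the following command:

# scswitch -Z -g resource-group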

If you have intentionally disabled a resource or a fault monitor that
must remain disabled, type the following command:
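
# scswitch -z -g resource-group

Unlike the -Z form, the -z form does not enable resources or fault monitors
that you have intentionally disabled.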

Verify that each resource group that you specified
in Step 2 is online.

The output from this command indicates on which nodes each resource group is online.

# scstat -g

Example 2–11 Bringing Online a Resource Group

This example shows how to bring online the resource group resource-group-1 and verify its status. All resources in this resource group and their fault monitors
are also enabled.

# scswitch -Z -g resource-group-1
# scstat -g

Next Steps

If you brought resource groups online without enabling
their resources and fault monitors, enable the fault monitors of any resources that
you require to be enabled. For more information, see How to Enable a Resource Fault Monitor.

Disabling and Enabling Resource Monitors

The following procedures disable or enable resource fault monitors, not the
resources themselves. A resource can continue to operate normally while its fault
monitor is disabled. However, if the fault monitor is disabled and a data service
fault occurs, automatic fault recovery is not initiated.
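
How to Disable a Resource Fault Monitor

Steps

Become superuser on a cluster member.

Disable the resource fault monitor.

# scswitch -n -M -j resource

-n

Disables a resource or resource monitor

-M

Disables the fault monitor for the specified resource

-j resource

Specifies the name of the resource

Verify that the resource fault monitor has been disabled.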

Run the following command on each cluster node, and
check for monitored fields (RS Monitored).

# scrgadm -pv

Example 2–12 Disabling a Resource Fault Monitor

This example shows how to disable a resource fault monitor.

# scswitch -n -M -j resource-1
# scrgadm -pv
...
RS Monitored: no...

How to Enable a Resource Fault Monitor

Steps

Become superuser on a cluster member.

Enable the resource fault monitor.

# scswitch -e -M -j resource

-e

Enables a resource or resource monitor

-M

Enables the fault monitor for the specified resource

-j resource

Specifies the name of the resource

Verify that the resource fault monitor
has been enabled.

Run the following command on each cluster node, and
check for monitored fields (RS Monitored).

# scrgadm -pv

Example 2–13 Enabling a Resource Fault Monitor

This example shows how to enable a resource fault monitor.

# scswitch -e -M -j resource-1
# scrgadm -pv
...
RS Monitored: yes...

Removing Resource Types

You do not need to remove resource types that are not in use. However, if you
want to remove a resource type, follow this procedure.

Note –

Perform this procedure from any cluster node.

How to Remove a Resource Type

Removing a resource type involves disabling and removing all resources of that
type in the cluster before unregistering the resource type.

Before You Begin

To identify all instances of the resource type that you are removing, type the
following command:

# scrgadm -pv

Steps

Become superuser on a cluster member.

Disable each resource of the resource type that you
are removing.

# scswitch -n -j resource

-n

Disables the resource

-j resource

Specifies the name of the resource to disable

Remove each resource of the resource type that you
are removing.

# scrgadm -r -j resource

-r

Removes the specified resource

-j resource

Specifies the name of the resource to remove

Unregister the resource type.

# scrgadm -r -t resource-type

-r

Unregisters the specified resource type.

-t resource-type

Specifies the name of the resource type to remove.

Verify that the resource type has been removed.

# scrgadm -p

Example 2–14 Removing a Resource Type

This example shows how to disable and remove all of the resources of a resource
type (resource-type-1) and then unregister the resource type. In
this example, resource-1 is a resource of the resource type resource-type-1.
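
Based on the steps in this procedure, the commands for this example would be as follows:

# scswitch -n -j resource-1
# scrgadm -r -j resource-1
# scrgadm -r -t resource-type-1
# scrgadm -p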

Switching the Current Primary of a Resource Group

Use the following procedure to switch over a resource group from its current
primary to another node that is to become the new primary.

How to Switch the Current Primary of a Resource Group

Note –

Perform this procedure from any cluster node.

Before You Begin

Ensure that the following conditions are met:

You have the following information:

The name of the resource group that you are switching over

The names of the nodes where the resource group is to be brought
online or to remain online

The nodes where the resource group is to be brought online or to remain
online are cluster nodes.

These nodes have been set up to be potential masters of the resource
group that you are switching.

To see a list of potential primaries for the resource group, type the following
command:

# scrgadm -pv

Steps

On a cluster member, become superuser or assume an
equivalent role.

Switch the resource group to a new set of primaries.

# scswitch -z -g resource-group -h nodelist

-z

Switches the specified resource group to a new set of primaries.

-g resource-group

Specifies the name of the resource group to switch.

-h nodelist

Specifies a comma-separated list of the names of the nodes on which
the resource group is to be brought online or is to remain online. The list may contain
one node name or more than one node name. This resource group is switched offline
on all of the other nodes.

Disabling Resources and Moving Their Resource Group Into
the UNMANAGED State

At times, you must bring a resource group into the UNMANAGED state
before you perform an administrative procedure on it. Before you move a resource group
into the UNMANAGED state, you must disable all of the resources
that are part of the resource group and bring the resource group offline.

How to Disable a Resource and Move Its Resource Group Into
the UNMANAGED State

Note –

When
a shared address resource is disabled, the resource might still be able to respond
to ping(1M) commands from some
hosts. To ensure that a disabled shared address resource cannot respond to ping commands, you must bring the resource's resource group to the UNMANAGED state.

Before You Begin

Ensure that you have the following information.

The name of the resources to be disabled

The name of the resource group to move into the UNMANAGED state

To determine the resource and resource group names that you need for this procedure,
type the following command:

# scrgadm -pv

Steps

Become superuser on a cluster member.

Disable all resources in the resource group.

# scswitch -n -j resource-list

-n

Disables the resources

-j resource-list

Specifies a comma-separated list of the resources in the resource
group

Note –

You can specify the resources in resource-list in
any order. The scswitch command disables the resources in the order
that is required to satisfy dependencies between the resources, regardless of their
order in resource-list.

Run the following command to switch
the resource group offline.

# scswitch -F -g resource-group

-F

Switches a resource group offline

-g resource-group

Specifies the name of the resource group to take offline

Move the resource group into the UNMANAGED state.

# scswitch -u -g resource-group

-u

Moves the specified resource group into the UNMANAGED state

-g resource-group

Specifies the name of the resource group to move into the UNMANAGED state

Verify that the resources are disabled and the resource
group is in the UNMANAGED state.

# scrgadm -pv -g resource-group

Example 2–18 Disabling a Resource and Moving the Resource Group Into the UNMANAGED State

This example shows how to disable the resource (resource-1)
and then move the resource group (resource-group-1) into the UNMANAGED state.
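
Based on the steps in this procedure, the commands for this example would be as follows:

# scswitch -n -j resource-1
# scswitch -F -g resource-group-1
# scswitch -u -g resource-group-1
# scrgadm -pv -g resource-group-1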

Before you perform administrative procedures on resources, resource groups,
or resource types, view the current configuration settings for these objects.

Note –

You can view configuration settings for resources, resource groups, and
resource types from any cluster node.

The scrgadm command provides the following levels of configuration
status information.

With the -p option, the output shows a very limited
set of property values for resource types, resource groups, and resources.

With the -pv option, the output shows more details
about other resource type, resource group, and resource properties.

With the -pvv option, the output provides a detailed
view, including resource type methods, extension properties, and all of the resource
and resource group properties.

You can also use the -t, -g, and -j (resource
type, resource group, and resource, respectively) options, followed by the name of
the object that you want to view, to check status information about specific resource
types, resource groups, and resources. For example, the following command specifies
that you want to view specific information about the resource apache-1 only.
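
# scrgadm -pv -j apache-1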

Resources also have extension properties, which are predefined for the data
service that represents the resource. For a description of the extension properties
of a data service, see the documentation for the data service.

To determine whether you can change a property, see the Tunable entry for the
property in the description of the property.

The following procedures describe how to change properties for configuring resource
types, resource groups, and resources.

How to Change Resource Type Properties

Note –

Perform this procedure from any cluster node.

Before You Begin

Ensure that you have the following information.

The name of the resource type to change.

The name of the resource type property to change. For resource types,
you can change only certain properties. To determine whether you can change a property,
see the Tunable entry for the property in Resource Type Properties.

Note –

You cannot change the Installed_nodes property explicitly.
To change this property, specify the -h installed-node-list option of the scrgadm command.

Steps

Become superuser on a cluster member.

Run the scrgadm command to determine
the name of the resource type that you need for this procedure.

# scrgadm -pv

Change the resource type property.

For
resource types, you can change only certain properties. To determine whether you can
change a property, see the Tunable entry for the property in Resource Type Properties.
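
For example, the following command, which is shown earlier in this chapter,
changes the Installed_nodes property of a resource type:

# scrgadm -c -t resource-type -h installed-node-list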

Example 2–22 Changing an Extension Resource Property
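
A command of the following form, with illustrative resource and property names,
would change an extension property of a resource:

# scrgadm -c -j resource-1 -x extension-property=new-value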

How to Modify a Logical Hostname Resource
or a Shared Address Resource

By default, logical hostname resources and shared address resources use name
services for name resolution. You might configure a cluster to use a name service
that is running on the same cluster. During the failover of a logical hostname resource
or a shared address resource, a name service that is running on the cluster might
also be failing over. If the logical hostname resource or the shared address resource
uses the name service that is failing over, the resource fails to fail over.

Note –

Configuring a cluster to use a name server that is running on
the same cluster might impair the availability of other services on the cluster.

To prevent such a failure to fail over, modify the logical hostname resource
or the shared address resource to bypass name services. To modify the resource to
bypass name services, set the CheckNameService extension property
of the resource to false. You can modify the CheckNameService property at any time.

Note –

If your version of the resource type is earlier than 2, you must upgrade
the resource type before you attempt to modify the resource. For more information,
see Upgrading a Preregistered Resource Type.

Steps

Become superuser on a cluster member.

Change the resource property.

# scrgadm -c -j resource -x CheckNameService=false

-j resource

Specifies the name of the logical hostname resource or shared address
resource that you are modifying

-x CheckNameService=false

Sets the CheckNameService extension property of
the resource to false

Clearing the STOP_FAILED Error Flag
on Resources

When the Failover_mode resource property is set to NONE or SOFT, a failure of the resource's STOP method causes the following effects:

The individual resource goes into the STOP_FAILED state.

The resource group that contains the resource goes into the ERROR_STOP_FAILED state.

In this situation, you cannot perform the following operations:

Bringing online the resource group on any node

Adding resources to the resource group

Removing resources from the resource group

Changing the properties of the resource group

Changing the properties of resources in the resource group

How to Clear the STOP_FAILED Error Flag
on Resources

Note –

Perform this procedure from any cluster node.

Before You Begin

Ensure that you have the following information.

The name of the node where the resource is STOP_FAILED

The name of the resource and resource group that are in STOP_FAILED state

Steps

Become superuser on a cluster member.

Identify which resources have gone into the STOP_FAILED state and on which nodes.

# scstat -g

Manually stop the resources and their monitors on
the nodes on which they are in STOP_FAILED state.

This step might require that you kill processes or run commands that are specific
to resource types or other commands.

Manually set the state of these resources to OFFLINE on all of the nodes on which you manually stopped the resources.

# scswitch -c -h nodelist -j resource -f STOP_FAILED

-c

Clears the flag.

-h nodelist

Specifies a comma-separated list of the names of the nodes where the
resource is in the STOP_FAILED state. The list may contain one
node name or more than one node name.

-j resource

Specifies the name of the resource to switch offline.

-f STOP_FAILED

Specifies the flag name.

Check the resource group state on the nodes where
you cleared the STOP_FAILED flag in Step 4.

# scstat -g

The resource group state should now be OFFLINE or ONLINE.

The resource group remains in the ERROR_STOP_FAILED state
in the following combination of circumstances:

The resource group was being switched offline when the STOP method failure occurred.

The resource that failed to stop had a dependency on other resources
in the resource group.

If the resource group remains in the ERROR_STOP_FAILED state,
correct the error as follows.

Switch the resource group offline on the appropriate nodes.

# scswitch -F -g resource-group

-F

Switches the resource group offline on all of the nodes that can master
the group

Upgrading a Preregistered Resource Type

In Sun Cluster 3.1 9/04, the following preregistered resource types
are enhanced:

SUNW.LogicalHostname, which represents a logical
hostname

SUNW.SharedAddress, which represents a shared address

The purpose of these enhancements is to enable you to modify logical hostname
resources and shared address resources to bypass name services for name resolution.

Upgrade these resource types if all conditions in the following list apply:

You are upgrading from an earlier version of Sun Cluster.

You need to use the new features of the resource types.

For general instructions that explain how to upgrade a resource type, see Upgrading a Resource Type. The information
that you need to complete the upgrade of the preregistered resource types is provided
in the subsections that follow.

Information for Registering the New Resource
Type Version

The
relationship between the version of each preregistered resource type and the release
of Sun Cluster is shown in the following table. The release of Sun Cluster indicates
the release in which the version of the resource type was introduced.

Resource Type            Resource Type Version    Sun Cluster Release

SUNW.LogicalHostname     1.0                      3.0
                         2                        3.1 9/04

SUNW.SharedAddress       1.0                      3.0
                         2                        3.1 9/04

To determine the version of the resource type that is registered, use one command
from the following list:

scrgadm -p

scrgadm -pv

Example 2–23 Registering a New Version of the SUNW.LogicalHostname Resource Type

This example shows the command for registering version 2 of the SUNW.LogicalHostname resource type during an upgrade.

# scrgadm -a -t SUNW.LogicalHostname:2

Information for Migrating Existing Instances
of the Resource Type

The information that you need to migrate an instance of a preregistered resource
type is as follows:

You can perform the migration at any time.

If you need to use the new features of the
preregistered resource type, the required value of the Type_version property
is 2.

If you are modifying the resource to bypass name services, set the CheckNameService extension property of the resource to false.

Example 2–24 Migrating a Logical Hostname Resource

This example shows the command for migrating the logical hostname resource lhostrs. As a result of the migration, the resource is modified to bypass
name services for name resolution.

# scrgadm -c -j lhostrs -y Type_version=2 -x CheckNameService=false

Reregistering Preregistered Resource Types After
Inadvertent Deletion

The resource types SUNW.LogicalHostname and SUNW.SharedAddress are preregistered. All of the logical hostname and shared address resources
use these resource types. You never need to register these two resource types, but
you might inadvertently delete them. If you have deleted resource types inadvertently,
use the following procedure to reregister them.

Adding or Removing a Node to or From a Resource
Group

The procedures in this section enable you to perform the following tasks.

Configuring a cluster node to be an additional master of a resource
group

Removing a node from a resource group

The procedures are slightly different, depending on whether you plan to add
or remove the node to or from a failover or scalable resource group.

Failover resource groups contain network resources that both failover and scalable
services use. Each IP subnetwork connected to the cluster has its own network resource
that is specified and included in a failover resource group. The network resource
is either a logical hostname or a shared address resource. Each network resource includes
a list of IP Networking Multipathing groups that it uses. For failover resource groups, you must
update the complete list of IP Networking Multipathing groups for each network resource that the
resource group includes (the netiflist resource property).

The procedure for scalable resource groups involves the following steps:

Repeating the procedure for failover groups that contain the network
resources that the scalable resource uses

Adding a Node to a Resource Group

The procedure to follow to add a node to a resource
group depends on whether the resource group is a scalable resource group or a failover
resource group. For detailed instructions, see the following sections:

The output of the command line for nodelist and netiflist identifies the nodes by node name. To identify node IDs, run the
command scconf -pv | grep -i node-id.

Update netiflist for the network
resources that the node addition affects.

This step overwrites the previous
value of netiflist, and therefore you must include all of the IP Networking Multipathing groups
here.

# scrgadm -c -j network-resource -x netiflist=netiflist

-c

Changes a network resource.

-jnetwork-resource

Specifies the name of the network resource (logical hostname or shared
address) that is being hosted on the netiflist entries.

-xnetiflist=netiflist

Specifies a comma-separated list that identifies the IP Networking Multipathing groups
that are on each node. Each element in netiflist must be
in the form of netif@node. netif can be given
as an IP Networking Multipathing group name, such as sc_ipmp0. The node can be
identified by the node name or node ID, such as sc_ipmp0@1 or sc_ipmp0@phys-schost-1.

If the AffinityOn extension property of the HAStorage or HAStoragePlus resource equals True, add the node
to the appropriate disk set or device group.

If you are using Solstice DiskSuite or Solaris Volume Manager, use the metaset command.

# metaset -s disk-set-name -a -h node-name

-s disk-set-name

Specifies the name of the disk set on which the metaset command
is to work

-a

Adds a drive or host to the specified disk set

-h node-name

Specifies the node to be added to the disk set

SPARC: If you are using VERITAS Volume Manager, use the scsetup utility.

On any active cluster member, start the scsetup utility.

# scsetup

The Main Menu is displayed.

On the Main Menu, type the number that corresponds to the option for device
groups and volumes.

On the Device Groups menu, type the number that corresponds to the option
for adding a node to a VxVM device group.

Respond to the prompts to add the node to the VxVM device group.

Update the node list to include all of the nodes
that can now master this resource group.

This step overwrites the previous
value of nodelist, and therefore you must include all of the nodes
that can master the resource group here.

# scrgadm -c -g resource-group -h nodelist

-c

Changes a resource group

-g resource-group

Specifies the name of the resource group to which the node is being
added

-h nodelist

Specifies a comma-separated list of the names of the nodes that can
master the resource group

Removing a Node From a Resource Group

The procedure to follow to remove a node from a resource
group depends on whether the resource group is a scalable resource group or a failover
resource group. For detailed instructions, see the following sections:

Name(s) of the resource group or groups from which you plan to remove
the node

# scrgadm -pv | grep “Res Group Nodelist”

Names of the IP Networking Multipathing groups that are to host the network resources
that are used by the resource groups on all of the nodes

# scrgadm -pvv | grep “NetIfList.*value”

Additionally, be sure to verify that the resource group is not mastered on the node that you are removing. If the resource group is mastered on the node that you are removing, run the scswitch command to switch the resource group offline from that node. The
following scswitch command brings the resource group offline from
a given node, provided that new-masters does not contain
that node.

# scswitch -z -g resource-group -h new-masters

-g resource-group

Specifies the name of the resource group that you are switching offline.
This resource group is mastered on the node that you are removing.

If you plan to remove a node from all of the resource groups, and you
use a scalable services configuration, first remove the node from the scalable resource
groups. Then, remove the node from the failover groups.

How to Remove a Node From a Scalable Resource Group

A scalable service is configured as two resource groups, as follows.

One resource group is a scalable group that contains the scalable
service resource.

One resource group is a failover group that contains the shared address
resources that the scalable service resource uses.

Additionally, the RG_dependencies property of the scalable
resource group is set to configure the scalable group with a dependency on the failover
resource group. For information about this property, see Appendix A, Standard Properties.

Removing a node from the scalable resource group causes the scalable service
to no longer be brought online on that node. To remove a node from the scalable resource
group, perform the following steps.

Steps

Remove the node from the list of nodes
that can master the scalable resource group (the nodelist resource
group property).

# scrgadm -c -g scalable-resource-group -h nodelist

-c

Changes a resource group

-g scalable-resource-group

Specifies the name of the resource group from which the node is being
removed

-h nodelist

Specifies a comma-separated list of the names of the nodes that can
master this resource group

(Optional) Remove the
node from the failover resource group that contains the shared address resource.

How to Remove a Node From a Failover Resource Group

Perform the following steps to remove a node from a failover resource group.

Caution –

If you plan to remove a node from all of the resource groups, and you
use a scalable services configuration, first remove the node from the scalable resource
groups. Then use this procedure to remove the node from the failover groups.

Steps

Update the node list to include all of the nodes
that can now master this resource group.

This step removes the node and
overwrites the previous value of the node list. Be sure to include all of the nodes
that can master the resource group here.

# scrgadm -c -g failover-resource-group -h nodelist

-c

Changes a resource group

-g failover-resource-group

Specifies the name of the resource group from which the node is being
removed

-h nodelist

Specifies a comma-separated list of the names of the nodes that can
master this resource group

Display the current list of IP Networking Multipathing groups that
are configured for each resource in the resource group.

# scrgadm -pvv -g failover-resource-group | grep -i netiflist

Update netiflist for network resources
that the removal of the node affects.

This step overwrites the previous
value of netiflist. Be sure to include all of the IP Networking Multipathing groups
here.

# scrgadm -c -j network-resource -x netiflist=netiflist

Note –

The output of the preceding command line identifies the nodes by node
name. Run the command line scconf -pv | grep “Node ID” to
find the node ID.

-c

Changes a network resource.

-j network-resource

Specifies the name of the network resource that is hosted on the netiflist entries.

-x netiflist=netiflist

Specifies a comma-separated list that identifies the IP Networking Multipathing groups
that are on each node. Each element in netiflist must be
in the form of netif@node. netif can be given
as an IP Networking Multipathing group name, such as sc_ipmp0. The node can be
identified by the node name or node ID, such as sc_ipmp0@1 or sc_ipmp0@phys-schost-1.

To modify the auxnodelist of the shared address resource,
you must remove and re-create the shared address resource.

If you remove the node from the failover group's node list, you can continue
to use the shared address resource on that node to provide scalable services. To continue
to use the shared address resource, you must add the node to the auxnodelist of the shared address resource. To add the node to the auxnodelist, perform the following steps.

Note –

You can also use the following procedure to remove the node from the auxnodelist of the shared address
resource. To remove the node from the auxnodelist, you must delete
and re-create the shared address resource.

Steps

Switch the scalable service resource
offline.

Remove the shared address resource
from the failover resource group.

Create the shared address resource.

Add the node ID or node name of the node that you removed from the failover
resource group to the auxnodelist.

Synchronizing the Startups Between Resource Groups
and Disk Device Groups

After a cluster boots or services fail over to another node, global devices
and cluster file systems might require time to become available. However, a data service
can run its START method before global devices and cluster file
systems come online. If the data service depends on global devices or cluster file
systems that are not yet online, the START method times out. In
this situation, you must reset the state of the resource groups that the data service
uses and restart the data service manually.

To avoid these additional administrative tasks, use the HAStorage resource
type or the HAStoragePlus resource type. Add an instance of HAStorage or HAStoragePlus to
all resource groups whose data service resources depend on global devices or cluster
file systems. Instances of these resource types perform the following operations:

Monitoring global devices and cluster file systems

Forcing the START method of the other resources
in the same resource group to wait until global devices and cluster file systems become
available

How to Set Up HAStorage Resource Type
for New Resources

HAStorage might not be supported in a future
release of Sun Cluster software. Equivalent functionality is supported by HAStoragePlus.
For instructions for upgrading from HAStorage to HAStoragePlus,
see Upgrading From HAStorage to HAStoragePlus.

In the following example, the resource group resource-group-1 contains
the following data services.

Set resource-group-1 to the MANAGED state, and bring resource-group-1 online.

# scswitch -Z -g resource-group-1

Affinity Switchovers

The HAStorage resource type contains another extension
property, AffinityOn, which is a Boolean that specifies whether HAStorage must
perform an affinity switchover for the global devices and cluster file systems that
are defined in ServicePaths. For details, see the SUNW.HAStorage(5) man page.

Note –

HAStorage and HAStoragePlus do
not permit AffinityOn to be set to True if the
resource group is scalable. HAStorage and HAStoragePlus check
the AffinityOn value and internally reset the value to False for a scalable resource group.

How to Set Up HAStorage Resource Type
for Existing Resources

HAStorage might not be supported in a future
release of Sun Cluster software. Equivalent functionality is supported by HAStoragePlus.
For instructions for upgrading from HAStorage to HAStoragePlus,
see Upgrading From HAStorage to HAStoragePlus.

Set up the dependency for each of
the existing resources, as required.

# scrgadm -c -j resource -y Resource_Dependencies=hastorage-1

Verify that you have correctly configured
the resource dependencies.

# scrgadm -pvv -j resource | egrep strong

Upgrading From HAStorage to HAStoragePlus

HAStorage might not be supported in a future release of
Sun Cluster software. Equivalent functionality is supported by HAStoragePlus.
For instructions for upgrading from HAStorage to HAStoragePlus,
see the subsections that follow.

How to Upgrade From HAStorage to HAStoragePlus When
Using Device Groups or CFS

The following example uses a simple HA-NFS resource that is active with HAStorage.
The ServicePaths property contains the disk group nfsdg, and
the AffinityOn property is True. Furthermore,
the HA-NFS resource has Resource_Dependencies set to the HAStorage resource.

Steps

Remove the dependencies that the application resources have on HAStorage.

# scrgadm -c -j nfsserver-rs -y Resource_Dependencies=""

Disable the HAStorage resource.

# scswitch -n -j nfs1storage-rs

Remove the HAStorage resource
from the application resource group.

# scrgadm -r -j nfs1storage-rs

Unregister the HAStorage resource
type.

# scrgadm -r -t SUNW.HAStorage

Register the HAStoragePlus resource
type.

# scrgadm -a -t SUNW.HAStoragePlus

Create the HAStoragePlus resource.

Note –

Instead of using the ServicePaths property of HAStorage,
you must use the FilesystemMountPoints property or GlobalDevicePaths property of HAStoragePlus.

To specify the mount point of a file system, type the following command.
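
For example, the following command, with illustrative resource, resource group,
and mount point names, would create the resource:

# scrgadm -a -j nfs1-hastp-rs -g nfs1-rg -t SUNW.HAStoragePlus \
-x FilesystemMountPoints=/global/nfsdata -x AffinityOn=True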

The FilesystemMountPoints extension property must
match the sequence that is specified in /etc/vfstab.

Set up the dependencies between the application
server and HAStoragePlus.

# scrgadm -c -j nfsserver-rs -y Resource_Dependencies=nfs1-hastp-rs

How to Upgrade From HAStorage With
CFS to HAStoragePlus With Failover File System

The following example uses a simple HA-NFS resource that is active with HAStorage.
The ServicePaths property contains the disk group nfsdg, and
the AffinityOn property is True. Furthermore,
the HA-NFS resource has Resource_Dependencies set to the HAStorage resource.

Steps

Remove the dependencies that the application resource has on the HAStorage resource.

# scrgadm -c -j nfsserver-rs -y Resource_Dependencies=""

Disable the HAStorage resource.

# scswitch -n -j nfs1storage-rs

Remove the HAStorage resource
from the application resource group.

# scrgadm -r -j nfs1storage-rs

Unregister the HAStorage resource
type.

# scrgadm -r -t SUNW.HAStorage

Modify /etc/vfstab to
remove the global flag and change “mount at boot” to “no”.

Create the HAStoragePlus resource.

Note –

Instead of using the ServicePaths property of HAStorage,
you must use the FilesystemMountPoints property or GlobalDevicePaths property of HAStoragePlus.

To specify the mount point of a file system, type the following command.
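
For example, the following command, with illustrative resource, resource group,
and mount point names, would create the resource:

# scrgadm -a -j nfs1-hastp-rs -g nfs1-rg -t SUNW.HAStoragePlus \
-x FilesystemMountPoints=/global/local-fs/nfs -x AffinityOn=True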

The FilesystemMountPoints extension property must
match the sequence that is specified in /etc/vfstab.

Set up the dependencies between the
application server and HAStoragePlus.

# scrgadm -c -j nfsserver-rs -y Resource_Dependencies=nfs1-hastp-rs

Enabling Highly Available Local File Systems

Using a highly available local file system improves the performance of I/O intensive
data services. To make a local file system highly available in a Sun Cluster environment,
use the HAStoragePlus resource type.

The instructions for each Sun Cluster data service that is I/O intensive explain
how to configure the data service to operate with the HAStoragePlus resource
type. For more information, see the individual Sun Cluster data service guides.

Do not use the HAStoragePlus resource
type to make a root file system highly available.

Configuration Requirements for Highly Available Local File
Systems

Any file system on multihost disks must be accessible from any host that is
directly connected to those multihost disks. To meet this requirement, configure the
highly available local file system as follows:

Ensure that the disk partitions of the local file system reside on
global devices.

Set the AffinityOn extension property of the HAStoragePlus resource
that specifies these global devices to True.

Create the HAStoragePlus resource in a failover
resource group.

Ensure that the failback settings for the device groups and the resource
group that contains the HAStoragePlus resource are identical.

Note –

The use of a volume manager with the global devices
for a highly available local file system is optional.

Format of Device Names for Devices Without a Volume Manager

If you are not using a volume manager, use the appropriate format for the name
of the underlying storage device. The format to use depends on the type of storage
device as follows:

For block devices: /dev/global/dsk/dDsS

For raw devices: /dev/global/rdsk/dDsS

The replaceable items in these names are as follows:

D is an integer that specifies the device
ID (DID) instance number.

S is an integer that specifies the slice
number.
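For example, the block device for slice 2 of the device that has DID instance
number 5 is /dev/global/dsk/d5s2, and the corresponding raw device is
/dev/global/rdsk/d5s2. The instance and slice numbers are illustrative.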

Sample Entries in /etc/vfstab for Highly
Available Local File Systems

The following examples show entries in the /etc/vfstab file
for global devices that are to be used for highly available local file systems.

Example 2–27 Entries in /etc/vfstab for a Global
Device Without a Volume Manager

This example shows entries in the /etc/vfstab file for
a global device on a physical disk without a volume manager.
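A representative entry, in which DID instance 1, slice 0, and the mount point
/global/local-fs/nfs are illustrative values, might look as follows. The entry
omits the global mount option and sets the mount at boot field to no:

/dev/global/dsk/d1s0 /dev/global/rdsk/d1s0 /global/local-fs/nfs ufs 5 no logging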

Example 2–29 Entries in /etc/vfstab for a Global
Device With VxVM
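This example shows a representative entry for a file system that resides on a
VxVM volume. The disk group nfsdg, the volume vol01, and the mount point are
hypothetical names:

/dev/vx/dsk/nfsdg/vol01 /dev/vx/rdsk/nfsdg/vol01 /global/local-fs/nfs ufs 5 no logging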

How to Set Up the HAStoragePlus Resource
Type for an NFS-Exported File System

The HAStoragePlus resource type performs the same functions
as HAStorage, and synchronizes the startups between resource
groups and disk device groups. The HAStoragePlus resource type
has an additional feature to make a local file system highly available. For background
information about making a local file system highly available, see Enabling Highly Available Local File Systems. To use both of these features,
set up the HAStoragePlus resource type.

Note –

These
instructions explain how to use the HAStoragePlus resource type
with the UNIX file system. For information about using the HAStoragePlus resource
type with the Sun StorEdge QFS file system, see your Sun StorEdge QFS documentation.

The following example uses a simple NFS service that exports home directory
data from a locally mounted directory /global/local-fs/nfs/export/home.
The example assumes the following:

The mount point /global/local-fs/nfs is used to
mount a UFS local file system on a Sun Cluster global device partition.

The /etc/vfstab entry for the /global/local-fs/nfs file system should omit the global option and specify that the mount at
boot flag is no.

The path-prefix directory is on the root directory of the same file
system that is to be mounted, for example, /global/local-fs/nfs.
The path-prefix directory is the directory that HA-NFS uses to maintain administrative
information and status information.

Steps

Become superuser on a cluster member.

Determine whether the HAStoragePlus resource
type and the SUNW.nfs resource type are registered.

The following command prints a list of registered resource types.

# scrgadm -p | egrep Type

If necessary, register the HAStoragePlus resource
type and the SUNW.nfs resource type.
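For example, the following commands register both resource types:

# scrgadm -a -t SUNW.HAStoragePlus
# scrgadm -a -t SUNW.nfs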

Create the resource nfs-hastp-rs of type SUNW.HAStoragePlus.

You can use the FilesystemMountPoints extension property
to specify a list of one or more mount points for file systems. This list can consist
of mount points for both local file systems and global file systems. The mount at
boot flag is ignored by HAStoragePlus for global file systems.
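The exact command depends on your configuration. A representative sketch, which
assumes that the failover resource group nfs-rg already exists, is as follows:

# scrgadm -a -j nfs-hastp-rs -g nfs-rg -t SUNW.HAStoragePlus \
-x FilesystemMountPoints=/global/local-fs/nfs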

Bring online the resource group nfs-rg on a cluster
node.

The node where the resource group is brought online becomes the
primary node for the /global/local-fs/nfs file system's underlying
global device partition. The file system /global/local-fs/nfs is
then locally mounted on this node.

# scswitch -Z -g nfs-rg

Create the resource nfs-rs of type SUNW.nfs and specify its resource dependency
on the resource nfs-hastp-rs.

The file dfstab.nfs-rs must be present in /global/local-fs/nfs/SUNW.nfs.

Before you can set the dependency in the nfs-rs resource,
the nfs-hastp-rs resource must be online.
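A representative sketch of this command, using the resource and group names from
this example, is as follows:

# scrgadm -a -j nfs-rs -g nfs-rg -t SUNW.nfs \
-y Resource_dependencies=nfs-hastp-rs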

Take offline the resource group nfs-rg.

# scswitch -F -g nfs-rg

Bring online the nfs-rg group
on a cluster node.

# scswitch -Z -g nfs-rg

Caution –

Ensure that you switch only the resource group. Do not attempt
to switch the device group. If you attempt to switch the device group, the states
of the resource group and the device group become inconsistent, causing the resource
group to fail over.

Whenever the service is migrated to a new node, the primary I/O path for /global/local-fs/nfs will always be online and colocated with the NFS servers.
The file system /global/local-fs/nfs is locally mounted before
the NFS server is started.

Modifying Online the Resource for a Highly
Available File System

You might need a highly available file system to remain available while you
are modifying the resource that represents the file system. For example, you might
need the file system to remain available because storage is being provisioned dynamically.
In this situation, modify the resource that represents the highly available file system
while the resource is online.

In the Sun Cluster environment, a highly available file system is represented
by an HAStoragePlus resource. Sun Cluster enables you to modify
an online HAStoragePlus resource as follows:

Adding file systems to the HAStoragePlus resource

Removing file systems from the HAStoragePlus resource

Note –

Sun Cluster does not enable you to rename a file system while the file
system is online.

How to Add File Systems to an Online HAStoragePlus Resource

When you
add a file system to an HAStoragePlus resource, the HAStoragePlus resource
treats a local file system differently from a global file system.

The HAStoragePlus resource always automatically
mounts a local file system.

The HAStoragePlus resource automatically mounts
a global file system only if the AffinityOn extension
property of the HAStoragePlus resource is True.

How to Remove File Systems From an Online HAStoragePlus Resource

Before removing a file system from an online HAStoragePlus resource,
ensure that no applications are using the file system. When you remove a file system
from an online HAStoragePlus resource, the file system might
be forcibly unmounted. If a file system that an application is using is forcibly unmounted,
the application might fail or hang.

Steps

On one node of the cluster, become superuser.

Retrieve the list of mount points for
the file systems that the HAStoragePlus resource already manages.
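For example, you might use the scha_resource_get command. The following sketch
assumes an HAStoragePlus resource that is named hasp-rs in a resource group that
is named hasp-rg, both hypothetical names:

# scha_resource_get -O Extension -R hasp-rs -G hasp-rg FileSystemMountPoints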

Modify the FileSystemMountPoints extension property of the HAStoragePlus resource.

# scrgadm -c -j resource -x FileSystemMountPoints="mount-point-list"

-j resource

Specifies the HAStoragePlus resource from which
you are removing file systems.

-x FileSystemMountPoints="mount-point-list"

Specifies a comma-separated list of mount points of the file systems
that are to remain in the HAStoragePlus resource. This list must not include the mount points of the file systems that you are removing.

Confirm that you have a match between the mount point list
of the HAStoragePlus resource and the list that you specified
in Step 3.

How to Recover From a Fault After Modifying
an HAStoragePlus Resource

If a fault occurs during a modification of the FileSystemMountPoints extension property, the status of the HAStoragePlus resource
is online and faulted. After the fault is corrected, the status of the HAStoragePlus resource
is online.

Steps

Determine the fault that caused the
attempted modification to fail.

# scstat -g

The status message of the faulty HAStoragePlus resource
indicates the fault. Possible faults are as follows:

The device on which the file system should reside does not exist.

An attempt by the fsck command to repair a file
system failed.

The mount point of a file system that you attempted to add does not
exist.

A file system that you attempted to add cannot be mounted.

A file system that you attempted to remove cannot be unmounted.

Correct the fault that caused the attempted
modification to fail.

Repeat the step to modify the FileSystemMountPoints extension property of the HAStoragePlus resource.

Upgrading the HAStoragePlus Resource
Type

In Sun Cluster 3.1 9/04, the HAStoragePlus resource
type is enhanced to enable you to modify highly available file systems online. Upgrade
the HAStoragePlus resource type if all conditions in the following
list apply:

You are upgrading from an earlier version of Sun Cluster.

You need to use the new features of the HAStoragePlus resource
type.

For general instructions that explain how to upgrade a resource type, see Upgrading a Resource Type. The information
that you need to complete the upgrade of the HAStoragePlus resource
type is provided in the subsections that follow.

Information for Registering the New Resource
Type Version

The relationship
between a resource type version and the release of Sun Cluster is shown in the following
table. The release of Sun Cluster indicates the release in which the version of
the resource type was introduced.

Resource Type Version     Sun Cluster Release

1.0                       3.0 5/02

2                         3.1 9/04

To determine the version of the resource type that is registered, use one command
from the following list:

scrgadm -p

scrgadm -pv

The RTR file for this resource
type is /usr/cluster/lib/rgm/rtreg/SUNW.HAStoragePlus.

Information for Migrating Existing Instances
of the Resource Type

The information that you need to migrate instances of the HAStoragePlus resource
type is as follows:

You can perform the migration at any time.

If you need to use the new features of the HAStoragePlus resource
type, the required value of the Type_version property is 2.
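For example, a hedged sketch of the migration command for a hypothetical resource
that is named hasp-rs is as follows:

# scrgadm -c -j hasp-rs -y Type_version=2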

Distributing Online Resource Groups Among
Cluster Nodes

For maximum availability or optimum performance, some combinations of services
require a specific distribution of online resource groups among cluster nodes. Distributing
online resource groups involves creating affinities between resource groups for the
following purposes:

Enforcing the required distribution when the resource groups are first
brought online

Preserving the required distribution after an attempt to fail over
or switch over a resource group

This section provides the following examples of how to use resource group affinities
to distribute online resource groups among cluster nodes:

Enforcing colocation of a resource group with another resource group

Specifying a preferred colocation of a resource group with another
resource group

Balancing the load of a set of resource groups

Specifying that a critical service has precedence

Delegating the failover or switchover of a resource group

Combining affinities between resource groups to specify more complex
behavior

Resource Group Affinities

An affinity between resource groups restricts on which nodes the resource groups
may be brought online simultaneously. In each affinity, a source resource group declares
an affinity for a target resource group or several target resource groups. To create
an affinity between resource groups, set the RG_affinities resource
group property of the source as follows:

-y RG_affinities=affinity-list

affinity-list

Specifies a comma-separated list of affinities between the source
resource group and a target resource group or several target resource groups. You
may specify a single affinity or more than one affinity in the list.

Specify each affinity in the list as follows:

operatortarget-rg

Note –

Do not include a space between operator and target-rg.

operator

Specifies the type of affinity that you are creating. For more information,
see Table 2–2.

target-rg

Specifies the resource group that is the target of the affinity that
you are creating.

Table 2–2 Types of Affinities
Between Resource Groups

Operator

Affinity Type

Effect

+

Weak positive

If possible, the source is brought online on a node or on nodes where the target
is online or starting. However, the source and the target are allowed to be online
on different nodes.

++

Strong positive

The source is brought online only on a node or on nodes where the target is
online or starting. The source and the target are not allowed
to be online on different nodes.

-

Weak negative

If possible, the source is brought online on a node or on nodes where the target
is not online or starting. However, the source and the target
are allowed to be online on the same node.

--

Strong negative

The source is brought online only on a node or on nodes where the target is
not online. The source and the target are not allowed to be online
on the same node.

The current state of other resource groups might prevent a strong affinity from
being satisfied on any node. In this situation, the resource group that is the source
of the affinity remains offline. If other resource groups' states change to enable
the strong affinities to be satisfied, the resource group that is the source of the
affinity comes back online.

Note –

Use caution when declaring a strong affinity on a source resource group
for more than one target resource group. If all declared strong affinities cannot
be satisfied, the source resource group remains offline.

Enforcing Colocation of a Resource Group
With Another Resource Group

A service that is represented by one resource group might depend so strongly
on a service in a second resource group that both services must run on the same node.
For example, an application that consists of multiple interdependent service daemons
might require that all daemons run on the same node.

In this situation, force the resource group of the dependent service to be colocated
with the resource group of the other service. To enforce colocation of a resource
group with another resource group, declare on the resource group a strong positive
affinity for the other resource group.

# scrgadm -c|-a -g source-rg -y RG_affinities=++target-rg

-g source-rg

Specifies the resource group that is the source of the strong positive
affinity. This resource group is the resource group on which
you are declaring a strong positive affinity for another resource group.

-y RG_affinities=++target-rg

Specifies the resource group that is the target of the strong positive
affinity. This resource group is the resource group for which
you are declaring a strong positive affinity.

A resource group follows the resource group for which it has a strong positive
affinity. However, a resource group that declares a strong positive affinity is prevented
from failing over to a node on which the target of the affinity is not already running.

Note –

Only failovers that are initiated by a resource monitor are prevented.
If a node on which the source resource group and target resource group are running
fails, both resource groups are restarted on the same surviving node.

For example, a resource group rg1 declares a strong positive
affinity for resource group rg2. If rg2 fails
over to another node, rg1 also fails over to that node. This failover
occurs even if all the resources in rg1 are operational. However,
if a resource in rg1 attempts to fail over rg1 to
a node where rg2 is not running, this attempt is blocked.

Example 2–33 Enforcing Colocation of a Resource
Group With Another Resource Group

This example shows the command for modifying resource group rg1 to
declare a strong positive affinity for resource group rg2. As a
result of this affinity relationship, rg1 is brought online only
on nodes where rg2 is running. This example assumes that both resource
groups exist.

# scrgadm -c -g rg1 -y RG_affinities=++rg2

Specifying a Preferred Colocation of a
Resource Group With Another Resource Group

A service that is represented by one resource group might use a service in a
second resource group. As a result, these services run most efficiently if they run
on the same node. For example, an application that uses a database runs most efficiently
if the application and the database run on the same node. However, the services can
run on different nodes because the reduction in efficiency is less disruptive than
additional failovers of resource groups.

In this situation, specify that both resource groups should be colocated if
possible. To specify preferred colocation of a resource group with another resource
group, declare on the resource group a weak positive affinity for the other resource
group.

# scrgadm -c|-a -g source-rg -y RG_affinities=+target-rg

-g source-rg

Specifies the resource group that is the source of the weak positive
affinity. This resource group is the resource group on which
you are declaring a weak positive affinity for another resource group.

-y RG_affinities=+target-rg

Specifies the resource group that is the target of the weak positive
affinity. This resource group is the resource group for which
you are declaring a weak positive affinity.

By declaring a weak positive affinity on one resource group for another resource
group, you increase the probability of both resource groups running on the same node.
The source of a weak positive affinity is first brought online on a node where the
target of the weak positive affinity is already running. However, the source of a
weak positive affinity does not fail over if a resource monitor causes the target
of the affinity to fail over. Similarly, the source of a weak positive affinity does
not fail over if the target of the affinity is switched over. In both situations,
the source remains online on the node where the source is already running.

Note –

If a node on which the source resource group and target resource group
are running fails, both resource groups are restarted on the same surviving node.

Example 2–34 Specifying a Preferred Colocation
of a Resource Group With Another Resource Group

This example shows the command for modifying resource group rg1 to
declare a weak positive affinity for resource group rg2. As a result
of this affinity relationship, rg1 and rg2 are
first brought online on the same node. But if a resource in rg2 causes rg2 to fail over, rg1 remains online on the node where
the resource groups were first brought online. This example assumes that both resource
groups exist.

# scrgadm -c -g rg1 -y RG_affinities=+rg2

Distributing a Set of Resource Groups Evenly
Among Cluster Nodes

Each resource group in a set of resource groups might impose the same load on
the cluster. In this situation, by distributing the resource groups evenly among cluster
nodes, you can balance the load on the cluster.

To distribute a set of resource groups evenly among cluster nodes, declare on
each resource group a weak negative affinity for the other resource groups in the
set.

# scrgadm -c|-a -g source-rg -y RG_affinities=neg-affinity-list

-g source-rg

Specifies the resource group that is the source of the weak negative
affinity. This resource group is the resource group on which
you are declaring a weak negative affinity for other resource groups.

-y RG_affinities=neg-affinity-list

Specifies a comma-separated list of weak negative affinities between
the source resource group and the resource groups that are the target of the weak
negative affinity. The target resource groups are the resource groups for which you are declaring a weak negative affinity.

By declaring a weak negative affinity on one resource group for other resource
groups, you ensure that a resource group is always brought online on the most lightly
loaded node in the cluster. That node is the node where the fewest other resource
groups are running and, therefore, where the smallest number of weak negative
affinities is violated.

Example 2–35 Distributing a Set of Resource Groups
Evenly Among Cluster Nodes

This example shows the commands for modifying resource groups rg1, rg2, rg3, and rg4 to ensure that these
resource groups are evenly distributed among the available nodes in the cluster. This
example assumes that resource groups rg1, rg2, rg3, and rg4 exist.
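One way to declare these affinities is for each resource group to declare a weak
negative affinity for the other three resource groups, as in the following sketch:

# scrgadm -c -g rg1 -y RG_affinities=-rg2,-rg3,-rg4
# scrgadm -c -g rg2 -y RG_affinities=-rg1,-rg3,-rg4
# scrgadm -c -g rg3 -y RG_affinities=-rg1,-rg2,-rg4
# scrgadm -c -g rg4 -y RG_affinities=-rg1,-rg2,-rg3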

Specifying That a Critical Service Has
Precedence

A cluster might be configured to run a combination of mission-critical services
and noncritical services. For example, a database that supports a critical customer
service might run in the same cluster as noncritical research tasks.

To ensure that the noncritical services do not affect the performance of the
critical service, specify that the critical service has precedence. By specifying
that the critical service has precedence, you prevent noncritical services from running
on the same node as the critical service.

When all nodes are operational, the critical service runs on a different node
from the noncritical services. However, a failure of the critical service might cause
the service to fail over to a node where the noncritical services are running. In
this situation, the noncritical services are taken offline immediately to ensure that
the computing resources of the node are fully dedicated to the mission-critical service.

To specify that a critical service has precedence, declare on the resource group
of each noncritical service a strong negative affinity for the resource group that
contains the critical service.

# scrgadm -c|-a -g noncritical-rg -y RG_affinities=--critical-rg

-g noncritical-rg

Specifies the resource group that contains a noncritical service.
This resource group is the resource group on which you are declaring
a strong negative affinity for another resource group.

-y RG_affinities=--critical-rg

Specifies the resource group that contains the critical service. This
resource group is the resource group for which you are declaring
a strong negative affinity.

A resource group moves away from a resource group for which it has a strong
negative affinity.

Example 2–36 Specifying That a Critical Service
Has Precedence

This example shows the commands for modifying the noncritical resource groups ncrg1 and ncrg2 to ensure that the critical resource
group mcdbrg has precedence over these resource groups. This example
assumes that resource groups mcdbrg, ncrg1,
and ncrg2 exist.
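One way to declare this precedence is for each noncritical resource group to
declare a strong negative affinity for mcdbrg, as in the following sketch:

# scrgadm -c -g ncrg1 -y RG_affinities=--mcdbrg
# scrgadm -c -g ncrg2 -y RG_affinities=--mcdbrg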

Delegating the Failover or Switchover of
a Resource Group

The source resource group of a strong positive affinity cannot fail over or
be switched over to a node where the target of the affinity is not running. If you
require the source resource group of a strong positive affinity to be allowed to fail
over or be switched over, you must delegate the failover to the target resource group.
When the target of the affinity fails over, the source of the affinity is forced to
fail over with the target.

Note –

You might need to switch over the source resource group of a strong positive
affinity that is specified by the ++ operator. In this situation,
switch over the target of the affinity and the source of the affinity at the same
time.

To delegate failover or switchover of a resource group to another resource group,
declare on the resource group a strong positive affinity with failover delegation
for the other resource group.

# scrgadm -c|-a -g source-rg -y RG_affinities=+++target-rg

-g source-rg

Specifies the resource group that is delegating failover or switchover.
This resource group is the resource group on which you are declaring
a strong positive affinity with failover delegation for another resource group.

-y RG_affinities=+++target-rg

Specifies the resource group to which source-rg delegates
failover or switchover. This resource group is the resource group for which
you are declaring a strong positive affinity with failover delegation.

A resource group may declare a strong positive affinity with failover delegation
for at most one resource group. However, a given resource group may be the target
of strong positive affinities with failover delegation that are declared by any number
of other resource groups.

A strong positive affinity with failover delegation is not fully symmetric.
The target can come online while the source remains offline. However, if the target
is offline, the source cannot come online.

If the target declares a strong positive affinity with failover delegation for
a third resource group, failover or switchover is further delegated to the third resource
group. The third resource group performs the failover or switchover, forcing the other
resource groups to fail over or be switched over also.

Example 2–37 Delegating the Failover or Switchover
of a Resource Group

This example shows the command for modifying resource group rg1 to
declare a strong positive affinity with failover delegation for resource group rg2. As a result of this affinity relationship, rg1 delegates
failover or switchover to rg2. This example assumes that both resource
groups exist.

# scrgadm -c -g rg1 -y RG_affinities=+++rg2

Combining Affinities Between Resource Groups

You can create more complex behaviors by combining multiple affinities. For
example, the state of an application might be recorded by a related replica server.
The node selection requirements for this example are as follows:

The replica server must run on a different node from the application.

If the application fails over from its current node, the application
should fail over to the node where the replica server is running.

If the application fails over to the node where the replica server
is running, the replica server must fail over to a different node. If no other node
is available, the replica server must go offline.

You can satisfy these requirements by configuring resource groups for the application
and the replica server as follows:

The resource group that contains the application declares a weak positive
affinity for the resource group that contains the replica server.

The resource group that contains the replica server declares a strong
negative affinity for the resource group that contains the application.

Example 2–38 Combining Affinities Between Resource
Groups

This example shows the commands for combining affinities between the following
resource groups.

Resource group app-rg represents an application
whose state is tracked by a replica server.

Resource group rep-rg represents the replica server.

In this example, the resource groups declare affinities as follows:

Resource group app-rg declares a weak positive
affinity for resource group rep-rg.

Resource group rep-rg declares a strong negative
affinity for resource group app-rg.
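One way to declare these affinities is shown in the following sketch:

# scrgadm -c -g app-rg -y RG_affinities=+rep-rg
# scrgadm -c -g rep-rg -y RG_affinities=--app-rg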

Prioritized Service Management (RGOffload) enables your cluster to automatically
free a node's resources for critical data services. Use RGOffload when
the startup of a critical failover data service requires a noncritical data service,
either scalable or failover, to be brought offline. Use RGOffload to
offload resource groups that contain noncritical data services.

Note –

The critical data service must be a failover data
service. The data service to be offloaded can be a failover or scalable
data service.

How to Set Up an RGOffload Resource

Steps

Become superuser on a cluster member.

Determine whether the RGOffload resource
type is registered.

The following command prints a list of resource types.

# scrgadm -p | egrep SUNW.RGOffload

If necessary, register the resource
type.

# scrgadm -a -t SUNW.RGOffload

Set the Desired_primaries property to
zero in each resource group that the RGOffload resource is to
offload.

# scrgadm -c -g offload-rg -y Desired_primaries=0

Add the RGOffload resource to the critical
failover resource group and set the extension properties.

Do not place
a resource group on more than one resource's rg_to_offload list.
Placing a resource group on multiple rg_to_offload lists might
cause the resource group to be taken offline and brought back online repeatedly.

Extension properties other than rg_to_offload are shown
with default values here. rg_to_offload is a comma-separated list
of resource groups that are not dependent on each other. This list cannot include
the resource group to which the RGOffload resource is being added.
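A representative sketch follows. The name critical-rg is a hypothetical name for
the critical failover resource group, and the extension properties other than
rg_to_offload are set to their default values:

# scrgadm -a -j rgoffload-resource -t SUNW.RGOffload -g critical-rg \
-x continue_to_offload=True -x max_offload_retry=15 \
-x rg_to_offload=offload-rg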

Enable the RGOffload resource.

# scswitch -e -j rgoffload-resource

Set the dependency of the critical
failover resource on the RGOffload resource.
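A representative sketch, in which critical-resource is a hypothetical name for
the critical failover resource, is as follows:

# scrgadm -c -j critical-resource -y Resource_dependencies=rgoffload-resource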

You can also use Resource_dependencies_weak. Using Resource_dependencies_weak on the RGOffload resource
type allows the critical failover resource to start even if errors are encountered
during offload of offload-rg.

Bring online the resource group that
is to be offloaded.

# scswitch -z -g offload-rg,offload-rg-2,... -h [nodelist]

The resource group remains online on all nodes where the critical resource group
is offline. The fault monitor prevents the resource group from running on the node
where the critical resource group is online.

In Step 4, Desired_primaries for resource groups that
are to be offloaded was set to 0. Therefore, the -Z option cannot bring
these resource groups online.

If the critical failover resource
group is not online, bring it online.

# scswitch -Z -g critical-rg

Example 2–39 Configuring an RGOffload Resource

This example shows how to configure the RGOffload resource rgofl as follows:

The critical resource group oracle_rg contains
the RGOffload resource.

The critical resource is oracle-server-rs.

The scalable resource groups IWS-SC and IWS-SC-2 are to be offloaded when the critical resource group comes online.

The resource groups oracle_rg, IWS-SC, and IWS-SC-2 can be mastered on any node of cluster triped, namely: phys-triped-1, phys-triped-2, or phys-triped-3.
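A hedged sketch of the commands that implement this configuration, following the
preceding procedure, is as follows:

# scrgadm -a -t SUNW.RGOffload
# scrgadm -c -g IWS-SC -y Desired_primaries=0
# scrgadm -c -g IWS-SC-2 -y Desired_primaries=0
# scrgadm -a -j rgofl -t SUNW.RGOffload -g oracle_rg \
-x rg_to_offload=IWS-SC,IWS-SC-2
# scswitch -e -j rgofl
# scrgadm -c -j oracle-server-rs -y Resource_dependencies=rgofl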

Configuring RGOffload Extension Properties

This section lists the extension properties that you can configure for RGOffload.
The Tunable entry indicates when you can update the property.

Typically, you use the command line scrgadm -x parameter=value to configure
extension properties when you create the RGOffload resource.

continue_to_offload (Boolean)

Specifies whether to continue offloading the remaining resource groups
in the rg_to_offload list after an error in offloading a resource
group.

This property is used only by the START method.

Default: True

Tunable: Any time

max_offload_retry (integer)

Specifies the number of attempts to offload a resource group during
startup if cluster reconfiguration or resource group reconfiguration causes a failure
to offload. The interval between successive retries is 10 seconds.

If max_offload_retry is too high, the START method
of the RGOffload resource might time out before the maximum offload
attempts are completed. To avoid this possibility, use the following formula to calculate max_offload_retry:

max-offload-retry < start-timeout / (num-rg × offload-retry-interval)

max-offload-retry

The value of the max_offload_retry extension property

start-timeout

The value of the Start_timeout property of the RGOffload resource

num-rg

The number of resource groups that are to be offloaded

offload-retry-interval

The interval between successive retries, which is 10 seconds
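For example, if start-timeout is 300 seconds and two resource groups are to be
offloaded, max_offload_retry must be less than 300 / (2 × 10), that is, less
than 15.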

This property is used only by the START method.

Default: 15

Tunable: Any time

rg_to_offload (string)

Specifies a comma-separated list of resource groups that are to be
offloaded on a node when a critical failover resource group starts on that node. This
property has no default and must be set.

This list should not contain resource groups that depend upon each other. RGOffload does
not check for dependency loops in the list of resource groups that are set in the rg_to_offload extension property.

For example, if resource group RG-B depends in some way on RG-A, do not include both resource groups in rg_to_offload.

Default: None

Tunable: Any time

Fault Monitor

The RGOffload fault monitor prevents noncritical resource
groups from being brought online on the node that masters the critical resource. The
fault monitor might detect that a noncritical resource group is online on the node
that masters the critical resource. In this situation, the fault monitor attempts
to start the resource group on other nodes. The fault monitor also brings offline
the resource group on the node that masters the critical resource.

Because Desired_primaries for noncritical resource groups
is set to 0, offloaded resource groups are not restarted on nodes that become available
later. Therefore, the RGOffload fault monitor attempts to start
noncritical resource groups on as many primaries as possible, until the Maximum_primaries limit is reached. However, the fault monitor keeps noncritical resource
groups offline on the node that masters the critical resource.

RGOffload attempts to start all offloaded resource groups
unless the resource groups are in the MAINTENANCE or UNMANAGED state. To place a resource group in an UNMANAGED state,
use the scswitch command.

# scswitch -u -g resourcegroup

The value of the RGOffload resource's Thorough_probe_interval property specifies the interval between fault monitor probes.

If you require
identical resource configuration data on two clusters, you can replicate the data
to the second cluster to save the laborious task of setting it up again. Use scsnapshot to propagate the resource configuration information from one
cluster to another cluster. To save effort, ensure that your resource-related
configuration is stable and that you do not need to make major changes to it
before you copy the information to a second cluster.

Configuration data for resource groups, resource types, and resources can be
retrieved from the Cluster Configuration Repository (CCR) and formatted as a shell
script. The script can be used to perform the following tasks:

Replicate configuration data on a cluster that does not have configured
resource groups, resource types, or resources

Upgrade configuration data on a cluster that has configured resource
groups, resource types, and resources

The scsnapshot tool retrieves configuration data that is
stored in the CCR. Other configuration data are ignored. The scsnapshot tool
ignores the dynamic state of different resource groups, resource types, and resources.

How to Replicate Configuration Data on a Cluster Without
Configured Resource Groups, Resource Types, and Resources

This procedure replicates configuration data on a cluster that does not
have configured resource groups, resource types, and resources. In this procedure,
a copy of the configuration data is taken from one cluster and used to generate the
configuration data on another cluster.

Steps

Using the system administrator role, log
in to any node in the cluster from which you want to copy the configuration data.

For example, node1.

The system administrator
role gives you the following role-based access control (RBAC) rights:

solaris.cluster.resource.read

solaris.cluster.resource.modify

Retrieve the configuration data from the cluster.

node1% scsnapshot -s scriptfile

The scsnapshot tool generates a script called scriptfile. For more information about using the scsnapshot tool, see the scsnapshot(1M) man page.

Edit the script to adapt it to the specific features
of the cluster where you want to replicate the configuration data.

For
example, you might have to change the IP addresses and host names that are listed
in the script.

Launch the script from any node in the cluster
where you want to replicate the configuration data.

The script compares
the characteristics of the local cluster to the cluster where the script was generated.
If the characteristics are not the same, the script writes an error and ends. A message
asks whether you want to rerun the script, using the -f option. The -f option forces the script to run, despite any difference in characteristics.
If you use the -f option, ensure that you do not create inconsistencies
in your cluster.

The script verifies that the Sun Cluster resource type
exists on the local cluster. If the resource type does not exist on the local cluster,
the script writes an error and ends. A message asks whether you want to install the
missing resource type before running the script again.

How to Upgrade Configuration Data on a Cluster With
Configured Resource Groups, Resource Types, and Resources

This procedure upgrades configuration data on a cluster that already has
configured resource groups, resource types, and resources. This procedure can also
be used to generate a configuration template for resource groups, resource types,
and resources.

In this procedure, the configuration data on cluster1 is
upgraded to match the configuration data on cluster2.

Steps

Using the system administrator role, log on to
any node in cluster1.

For example, node1.

The system administrator role gives you the following RBAC rights:

solaris.cluster.resource.read

solaris.cluster.resource.modify

Retrieve the configuration data from the cluster
by using the image file option of the scsnapshot tool:

node1% scsnapshot -s scriptfile1 -o imagefile1

When run on node1, the scsnapshot tool
generates a script that is called scriptfile1. The script
stores configuration data for the resource groups, resource types, and resources in
an image file that is called imagefile1. For more information
about using the scsnapshot tool, see the scsnapshot(1M) man page.
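Retrieve the configuration data from cluster2 in the same way. The following
sketch assumes that you log on to a node in cluster2 that is called node2, a
hypothetical name, and that the resulting image file is made available on node1:

node2% scsnapshot -s scriptfile2 -o imagefile2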

On node1, generate a script
to upgrade the configuration data on cluster1 with configuration
data from cluster2:

node1% scsnapshot -s scriptfile3 imagefile1 imagefile2

This step uses the image files that you generated in Step 2 and Step 3,
and generates a new script that is called scriptfile3.

Edit the script that you generated in Step 4 to adapt it to the specific features of cluster1, and to remove data that is specific to cluster2.

From node1, launch the script
to upgrade the configuration data.

The script compares the characteristics
of the local cluster to the cluster where the script was generated. If the characteristics
are not the same, the script writes an error and ends. A message asks whether you
want to rerun the script, using the -f option. The -f option
forces the script to run, despite any difference in characteristics. If you use the -f option, ensure that you do not create inconsistencies in your cluster.

The script verifies that the Sun Cluster resource type exists on the
local cluster. If the resource type does not exist on the local cluster, the script
writes an error and ends. A message asks whether you want to install the missing resource
type before running the script again.

Tuning Fault Monitors for Sun Cluster Data Services

Each data service that is supplied with the Sun Cluster product has a built-in
fault monitor. The fault monitor performs the following functions:

Detecting the unexpected termination of processes for the data service
server

Checking the health of the data service

The fault monitor is contained in the resource that represents the application
for which the data service was written. You create this resource when you register
and configure the data service. For more information, see the documentation for the
data service.

System properties and extension properties of this resource
control the behavior of the fault monitor. The default values of these properties
determine the preset behavior of the fault monitor. The preset behavior should be
suitable for most Sun Cluster installations. Therefore, you should tune a fault
monitor only if you need to modify this preset behavior.

Tuning a fault monitor involves the following tasks:

Setting the interval between fault monitor probes

Setting the timeout for fault monitor probes

Defining the criteria for persistent faults

Specifying the failover behavior of a resource

Perform these tasks when you register and configure the data service. For more
information, see the documentation for the data service.

Note –

A resource's fault monitor is started when you bring online the resource
group that contains the resource. You do not need to start the fault monitor explicitly.

Setting the Interval Between Fault Monitor Probes

To determine whether a resource is operating correctly, the fault monitor probes
this resource periodically. The interval between fault monitor probes affects the
availability of the resource and the performance of your system as follows:

The interval between fault monitor probes affects the length of time
that is required to detect a fault and respond to the fault. Therefore, if you decrease
the interval between fault monitor probes, the time that is required to detect a fault
and respond to the fault is also decreased. This decrease enhances the availability
of the resource.

Each fault monitor probe consumes system resources
such as processor cycles and memory. Therefore, if you decrease the interval between
fault monitor probes, the performance of the system is degraded.

The optimum interval between fault monitor probes also depends on the time that
is required to respond to a fault in the resource. This time depends on how the complexity
of the resource affects the time that is required for operations such as restarting
the resource.

To set the interval between fault monitor probes,
set the Thorough_probe_interval system property of the resource
to the interval in seconds that you require.
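For example, a hedged sketch that sets a 120-second probe interval for a
hypothetical resource that is named app-rs is as follows:

# scrgadm -c -j app-rs -y Thorough_probe_interval=120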

Setting the Timeout for Fault Monitor Probes

The timeout for fault monitor probes specifies the length of time that a fault
monitor waits for a response from a resource to a probe. If the fault monitor does
not receive a response within this timeout, the fault monitor treats the resource
as faulty. The time that a resource requires to respond to a fault monitor probe depends
on the operations that the fault monitor performs to probe the resource. For information
about operations that a data service's fault monitor performs to probe a resource,
see the documentation for the data service.

The time that is required for a resource to respond also depends on factors
that are unrelated to the fault monitor or the application, for example:

System configuration

Cluster configuration

System load

Amount of network traffic

To set the timeout for fault monitor probes, set the Probe_timeout extension property of the resource to the timeout in seconds
that you require.
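For example, a hedged sketch that sets a 60-second probe timeout for the same
hypothetical resource is as follows:

# scrgadm -c -j app-rs -x Probe_timeout=60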

Defining the Criteria for Persistent Faults

To minimize the disruption that transient faults in a resource cause, a fault
monitor restarts the resource in response to such faults. For persistent faults, more
disruptive action than restarting the resource is required:

For a failover resource, the fault monitor fails over the resource
to another node.

For a scalable resource, the fault monitor takes the resource offline.

A fault monitor
treats a fault as persistent if the number of complete failures of a resource exceeds
a specified threshold within a specified retry interval. Defining the criteria for
persistent faults enables you to set the threshold and the retry interval to accommodate
the performance characteristics of your cluster and your availability requirements.

Complete Failures and Partial Failures of
a Resource

A fault monitor treats some faults as a complete failure of
a resource. A complete failure typically causes a complete loss of service. The following
failures are examples of a complete failure:

Unexpected termination of the process for a data service server

Inability of a fault monitor to connect to a data service server

A complete failure causes the fault monitor to increase by 1 the count of complete
failures in the retry interval.

A fault monitor treats other faults as a partial failure of
a resource. A partial failure is less serious than a complete failure, and typically
causes a degradation of service, but not a complete loss of service. An example of
a partial failure is an incomplete response from a data service server before a fault
monitor probe is timed out.

A partial failure causes the fault monitor to increase by a fractional amount
the count of complete failures in the retry interval. Partial failures are still accumulated
over the retry interval.

The following characteristics of partial failures depend on the data service:

The types of faults that the fault monitor treats as partial failure

The fractional amount that each partial failure adds to the count
of complete failures

For information about faults that a data service's fault monitor detects, see
the documentation for the data service.

Dependencies of the Threshold and the Retry Interval
on Other Properties

The
maximum length of time that is required for a single restart of a faulty resource
is the sum of the values of the following properties:

Thorough_probe_interval system property

Probe_timeout extension property

To ensure that you allow enough time for the threshold to be reached within
the retry interval, use the following expression to calculate values for the retry
interval and the threshold:

retry-interval ≥ threshold × (thorough-probe-interval + probe-timeout)
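For example, if thorough-probe-interval is 60 seconds, probe-timeout is 30
seconds, and the threshold is two complete failures, set the retry interval to
at least 2 × (60 + 30) = 180 seconds.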

System Properties for Setting the Threshold and
the Retry Interval

To set the threshold and the retry interval, set the following system properties
of the resource:

To set the threshold, set the Retry_count system property to the maximum allowed number of complete failures.

To set the retry interval, set the Retry_interval system property to the interval in seconds that you require.
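For example, a hedged sketch that applies the values from the preceding
calculation to a hypothetical resource that is named app-rs is as follows:

# scrgadm -c -j app-rs -y Retry_count=2 -y Retry_interval=180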

Specifying the Failover Behavior of a Resource

The failover behavior of a resource determines how the RGM responds to
the following faults:

Failure of the resource to start

Failure of the resource to stop

Failure of the resource's fault monitor to stop

To specify the failover behavior of a resource,
set the Failover_mode system property of the resource. For information
about the possible values of this property, see the description of the Failover_mode system property in Resource Properties.
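For example, a hedged sketch that sets this property for the same hypothetical
resource is as follows. SOFT is one of the permitted values:

# scrgadm -c -j app-rs -y Failover_mode=SOFT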