Description

The metaset command administers sets of disks in named disk sets. Named disk
sets include any disk set that is not in the local set. While
disk sets enable a high-availability configuration, Solaris Volume Manager itself does not actually
provide a high-availability environment.

A single-owner disk set configuration manages storage on a SAN or fabric-attached storage,
or provides namespace control and state database replica management for a specified set
of disks.

In a shared disk set configuration, multiple hosts are physically connected to the
same set of disks. When one host fails, another host has exclusive access
to the disks. Each host can control a shared disk set, but
only one host can control it at a time.

When you add a new disk to any disk set, Solaris Volume
Manager checks the disk format. If necessary, it repartitions the disk to ensure
that the disk has an appropriately configured reserved slice 7 (or slice 6
on an EFI labelled device) with adequate space for a state database replica.
The precise size of slice 7 (or slice 6 on an EFI labelled
device) depends on the disk geometry. For tradtional disk sets, the slice is
no less than 4 Mbytes, and probably closer to 6 Mbytes, depending on
where the cylinder boundaries lie. For multi-owner disk sets, the slice is a
minimum of 256 Mbytes. The minimal size for slice 7 might change in
the future. This change is based on a variety of factors, including the
size of the state database replica and information to be stored in the
state database replica.

For use in disk sets, disks must have a dedicated slice (six
or seven) that meets specific criteria:

The slice must start at sector 0

The slice must include enough space for disk label

The state database replicas cannot be mounted

The slice does not overlap with any other slices, including slice 2

If the existing partition table does not meet these criteria, or if the
-L flag is specified, Solaris Volume Manager repartitions the disk. A small portion
of each drive is reserved in slice 7 (or slice 6 on
an EFI labelled device) for use by Solaris Volume Manager. The remainder of
the space on each drive is placed into slice 0. Any existing data
on the disks is lost by repartitioning.

After you add a drive to a disk set, it can be
repartitioned as necessary, with the exception that slice 7 (or slice 6 on
an EFI labelled device) is not altered in any way.

After a disk set is created and metadevices are set up within
the set, the metadevice name is in the following form:

/dev/md/setname/{dsk,rdsk}/dnumber

where setname is the name of the disk set, and number is the
number of the metadevice (0-127).

If you have disk sets that you upgraded from Solstice DiskSuite software, the
default state database replica size on those sets is 1034 blocks, not the
8192 block size from Solaris Volume Manager. Also, slice 7 on the
disks that were added under Solstice DiskSuite are correspondingly smaller than slice 7 on
disks that were added under Solaris Volume Manager.

If disks you add to a disk set have acceptable slice 7s
(that start at cylinder 0 and that have sufficient space for the state
database replica), they are not reformatted.

where setname is the name of the disk set, and hot_spare_pool is the
name of the hot spare pool associated with the disk set.

Multi–node Environment

To create and work with a disk set in a multi—node environment,
root must be a member of Group 14 on all hosts, or
the /.rhosts file must contain an entry for all other host names.
This is not required in a SunCluster 3.x enviroment.

Tagged data

Tagged data occurs when there are different versions of a disk set's replicas.
This tagged data consists of the set owner's nodename, the hardware serial number
of the owner, and the time it was written out to the
available replicas. The system administer can use this information to determine which replica contains
the correct data.

When a disk set is configured with an even number of storage
enclosures and has replicas balanced across them evenly, it is possible that up
to half of the replicas can be lost (for example, through a power
failure of half of the storage enclosures). After the enclosure that went down
is rebooted, half of the replicas are not recognized by SVM. When the
set is retaken, the metaset command returns an error of “stale databases”, and
all of the metadevices are in a read-only state.

Some of the replicas that are not recognized need to be deleted.
The action of deleting the replicas also causes updates to the replicas that
are not being deleted. In a dual hosted disk set environment, the second
node can access the deleted replicas instead of the existing replicas when it
takes the set. This leads to the possibility of getting the wrong replica
record on a disk set take. An error message is displayed, and user
intervention is required.

Use the -q to query the disk set and the -t, -u,
and -y, options to select the tag and take the disk set. See
OPTIONS.

Mediator Configuration

SVM provides support for a low-end HA solution consisting of two hosts that
share only two strings of drives. The hosts in this type of
configuration, referred to as mediators or mediator hosts, run a special daemon, rpc.metamedd(1M). The
mediator hosts take on additional responsibilities to ensure that data is available in
the case of host or drive failures.

A mediator configuration can survive the failure of a single host or a
single string of drives, without administrative intervention. If both a host and a
string of drives fail (multiple failures), the integrity of the data cannot be
guaranteed. At this point, administrative intervention is required to make the data accessible.
See mediator(7D) for further details.

Use the -m option to add or delete a mediator host. See OPTIONS.

Options

The following options are supported:

-adrivename

Add drives or hosts to the named set. For a drive to be accepted into a set, the drive must not be in use within another metadevice or disk set, mounted on, or swapped on. When the drive is accepted into the set, it is repartitioned and the metadevice state database replica (for the set) can be placed on it. However, if a slice 7 (or slice 6 on an EFI labelled device), starts at cylinder 0, and is large enough to hold a state database replica, then the disk is not repartitioned. Also, a drive is not accepted if it cannot be found on all hosts specified as part of the set. This means that if a host within the specified set is unreachable due to network problems, or is administratively down, the add fails.

Specify a drive name in the form cnumtnumdnum. Do not specify a slice number (snum). For drives in a Sun Cluster, you must specify a complete pathname for each drive. Such a name has the form:

/dev/did/[r]dsk/dnum

-a | -d | -mmediator_host_list

Add (-a) or delete (-d) mediator hosts to the specified disk set. A mediator_host_list is the nodename (established by svc:/system/identity:nodesmf(5) service) of the mediator host to be added and (for adding) up to two other aliases for the mediator host. The nodename and aliases for each mediator host are separated only by commas. Up to three mediator hosts can be specified for the named disk set. Specify only the nodename of that host as the argument to -m to delete a mediator host.

In a single metaset command you can add or delete up to three mediator hosts. See EXAMPLES.

-A{enable | disable}

Specify auto-take status for a disk set. If auto-take is enabled for a set, the disk set is automatically taken at boot, and file systems on volumes within the disk set can be mounted through /etc/vfstab entries. Only a single host can be associated with an auto-take set, so attempts to add a second host to an auto-take set or attempts to configure a disk set with multiple hosts as auto-take fails with an error message. Disabling auto-take status for a specific disk set causes the disk set to revert to normal behavior. That is, the disk set is potentially shared (non-concurrently) among hosts, and unavailable for mounting through /etc/vfstab.

-b

Insure that the replicas are distributed according to the replica layout algorithm. This can be invoked at any time, and does nothing if the replicas are correctly distributed. In cases where the user has used the metadb command to manually remove or add replicas, this command can be used to insure that the distribution of replicas matches the replica layout algorithm.

-C{take | release | purge}

Do not interact with the Cluster Framework when used in a Sun Cluster 3 environment. In effect, this means do not modify the Cluster Configuration Repository. These options should only be used to fix a broken disk set configuration.

take

Take ownership of the disk set but do not inform the Cluster Framework that the disk set is available. This option is not for use with a multi-owner disk set.

release

Release ownership of the disk set without informing the Cluster Framework. This option should only be used if the disk set ownership was taken with the corresponding -Ctake option. This option is not for use with a multi-owner disk set.

purge

Remove the disk set without informing the Cluster Framework that the disk set has been purged. This option should only be used when the disk set is not accessible and requires rebuilding.

-ddrivename

Delete drives or hosts from the named disk set. For a drive to be deleted, it must not be in use within the set. The last host cannot be deleted unless all of the drives within the set are deleted. Deleting the last host in a disk set destroys the disk set.

Specify a drive name in the form cnumtnumdnum. Do not specify a slice number (snum). For drives in a Sun Cluster, you must specify a complete pathname for each drive. Such a name has the form:

/dev/did/[r]dsk/dnum

This option fails on a multi-owner disk set if attempting to withdraw the master node while other nodes are in the set.

-f

Force one of three actions to occur: takes ownership of a disk set when used with -t; deletes the last disk drive from the disk set; or deletes the last host from the disk set. Deleting the last drive or host from a disk set requires the -d option.

When used to forcibly take ownership of the disk set, this causes the disk set to be grabbed whether or not another host owns the set. All of the disks within the set are taken over (reserved) and fail fast is enabled, causing the other host to panic if it had disk set ownership. The metadevice state database is read in by the host performing the take, and the shared metadevices contained in the set are accessible.

You can use this option to delete the last drive in the disk set, because this drive would implicitly contain the last state database replica.

You can use -f option to delete hosts from a set. When specified with a partial list of hosts, it can be used for one-host administration. One-host administration could be useful when a host is known to be non-functional, thus avoiding timeouts and failed commands. When specified with a complete list of hosts, the set is completely deleted. It is generally specified with a complete list of hosts to clean up after one-host administration has been performed.

-hhostname...

Specify one or more host names to be added to or deleted from a disk set. Adding the first host creates the set. The last host cannot be deleted unless all of the drives within the set have been deleted. The host name is not accepted if all of the drives within the set cannot be found on the specified host. The host name is the nodename established by the svc:/system/identity:nodesmf(5) service.

-j

Join a host to the owner list for a multi-owner disk set. The concepts of take and release, used with traditional disk sets, do not apply to multi-owner sets, because multiple owners are allowed.

As a host boots and is brought online, it must go through three configuration levels to be able to use a multi-owner disk set:

It must be included in the cluster nodelist, which happens automatically in a cluster or single-node sitatuion.

It must be added to the multi-owner disk set with the -a-h options documented elsewhere in this man page

It must join the set. When the host is first added to the set, it is automatically joined.

On manual restarts, the administrator must manually issue

metaset -smultinodesetname-j

to join the host to the owner list. After the cluster reconfiguration, when the host reenters the cluster, the node is automatically joined to the set. The metaset-j command joins the host to all multi-owner sets that the host has been added to. In a single node situation, joining the node to the disk set starts any necessary resynchronizations.

-L

When adding a disk to a disk set, force the disk to be repartitioned using the standard Solaris Volume Manager algorithm. See DESCRIPTION.

-llength

Set the size (in blocks) for the metadevice state database replica. The length can only be set when adding a new drive; it cannot be changed on an existing drive. The default (and maximum) size is 8192 blocks, which should be appropriate for most configurations. Replica sizes of less than 128 blocks are not recommended.

-M

Specify that the disk set to be created or modified is a multi-owner disk set that supports multiple concurrent owners.

This option is required when creating a multi-owner disk set. Its use is optional on all other operations on a multi-owner disk set and has no effect. Existing disk sets cannot be converted to multi-owner sets.

-o

Return an exit status of 0 if the local host or the host specified with the -h option is the owner of the disk set.

-P

Purge the named disk set from the node on which the metaset command is run. The disk set must not be owned by the node that runs this command. If the node does own the disk set, the command fails.

If you need to delete a disk set but cannot take ownership of the set, use the -P option.

-q

Displays an enumerated list of tags pertaining to ``tagged data'' that can be encountered during a take of the ownership of a disk set.

This option is not for use with a multi-owner disk set.

-r

Release ownership of a disk set. All of the disks within the set are released. The metadevices set up within the set are no longer accessible.

This option is not for use with a multi-owner disk set.

-ssetname

Specify the name of a disk set on which metaset works. If no setname is specified, all disk sets are returned.

-t

Take ownership of a disk set safely. If metaset finds that another host owns the set, this host is not be allowed to take ownership of the set. If the set is not owned by any other host, all the disks within the set are owned by the host on which metaset was executed. The metadevice state database is read in, and the shared metadevices contained in the set become accessible. The -t option takes a disk set that has stale databases. When the databases are stale, metaset exits with code 66, and prints a message. At that point, the only operations permitted are the addition and deletion of replicas. Once the addition or deletion of the replicas has been completed, the disk set should be released and retaken to gain full access to the data.

This option is not for use with a multi-owner disk set.

-utagnumber

Once a tag has been selected, a subsequent take with -utagnumber can be executed to select the data associated with the given tagnumber.

w

Withdraws a host from the owner list for a multi-owner disk set. The concepts of take and release, used with traditional disk sets, do not apply to multi-owner sets, because multiple owners are allowed.

Instead of releasing a set, a host can issue

metaset -s multinodesetname -w

to withdraw from the owner list. A host automatically withdraws on a reboot, but can be manually withdrawn if it should not be able to use the set, but should be able to rejoin at a later time. A host that withdrew due to a reboot can still appear joined from other hosts in the set until a reconfiguration cycle occurs.

metaset-w withdraws from ownership of all multi-owner sets of which the host is a member. This option fails if you attempt to withdraw the master node while other nodes are in the disk set owner list. This option cancels all resyncs running on the node. A cluster reconfiguration process that is removing a node from the cluster membership list effectively withdraws the host from the ownership list.

-y

Execute a subsequent take. If the take operation encounters ``tagged data,'' the take operation exits with code 2. You can then run the metaset command with the -q option to see an enumerated list of tags.

Examples

Example 1 Defining a Disk Set

This example defines a disk set.

# metaset -s relo-red -a -h red blue

The name of the disk set is relo-red. The names of the first
and second hosts added to the set are red and blue, respectively.
(The hostname is the nodename established by svc:/system/identity:nodesmf(5) service.) Adding the first
host creates the disk set. A disk set can be created with just
one host, with the second added later. The last host cannot be
deleted until all of the drives within the set have been deleted.

Example 2 Adding Drives to a Disk Set

This example adds drives to a disk set.

# metaset -s relo-red -a c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0

The name of the previously created disk set is relo-red. The names of
the drives are c2t0d0, c2t1d0, c2t2d0, c2t3d0, c2t4d0, and c2t5d0. There is
no slice identifier (“sx”) at the end of the drive names.

Example 3 Adding Multiple Mediator Hosts

The following command adds three mediator hosts to the specified disk set.

The name of the disk set is blue. The names of the first
and second hosts added to the set are hahost1 and hahost2, respectively.
The hostname is the nodename established by svc:/system/identity:nodesmf(5) service. Adding the first
host creates the multi-owner disk set. A disk set can be created with
just one host, with additional hosts added later. The last host cannot be
deleted until all of the drives within the set have been deleted.