Monday, January 26, 2009

Today we're going to take a look at a subject we've covered a bit in the past. If, after reading today's post, you want to do things the quick and dirty way, check out our previous post on manually editing the main.cf file.

Today's task, although it sounds grim (since it involves VCS, mostly ;), is actually fairly simple to do and can be done in at least two different ways.

For today's purposes, we're going to show one of those ways and we're going to assume (per the title) that we're going to add an NFS resource to an existing VCS cluster on Linux or Unix (with special notes for any OS-specific steps). We're also going to assume that we're dealing with a two-node cluster since that's easier to write about and cuts down on my over-explanation ;) Also, we've disabled all NFS services on the host OS, meaning that we've stopped them and removed them from the init scripts so that the OS won't try to manage them itself! A simple "ps -ef|grep nfs" check before you proceed should show you any NFS programs that are still running, and you should kill those.

1. The simple way (according to me :)

a. Bring down your entire VCS cluster (all nodes) from the node on which you'll be modifying the main.cf file. It's VERY IMPORTANT that you use the -force flag when running the "hastop -all" command, as this will cause VCS to go down, but won't affect any of the running applications. To put it another way, if you just run "hastop -all" without the -force flag, VCS will go down cleanly on all of your cluster's nodes and bring down all the resources it manages with it. We want to do this as covertly as possible.

host # hastop -all -force

b. Now, proceed to edit your main.cf file. To totally cover our arses, we're going to make a quick backup of main.cf and copy the existing version into a new directory to do our edits:
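The backup step can be sketched roughly like this. It's a minimal sketch, assuming the default config directory and the tmp subdirectory used later in this post; the function name backup_main_cf and the main.cf.bak file name are just my own placeholders, so pick whatever suits you:

```shell
# Sketch only: backup_main_cf and main.cf.bak are hypothetical names.
# Pass a directory as $1 to rehearse somewhere other than the default.
backup_main_cf() {
    conf_dir="${1:-/etc/VRTSvcs/conf/config}"

    # Keep an untouched copy of the working configuration next to the original.
    cp -p "$conf_dir/main.cf" "$conf_dir/main.cf.bak" || return 1

    # Create the scratch directory and copy main.cf into it for editing.
    mkdir -p "$conf_dir/tmp" || return 1
    cp -p "$conf_dir/main.cf" "$conf_dir/tmp/main.cf"
}
```

Calling backup_main_cf with no argument works on /etc/VRTSvcs/conf/config; all of the editing then happens on the copy in the tmp directory, so the original is never touched until you're ready.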

c. Make the changes and save (Note that you don't have to add any additional "includes" - like types.cf - to the file, as the NFS resource is already supported in the existing types.cf include file). All additions here should be made within the service group that the NFS resource will be a member of. Our service group is going to be called SG (there I go, getting all creative again ;). Add the following lines within the service group definition. These should be added after the SystemList definition, which I've kept in the example (but you don't need to modify), here:

You'll note that I set all of the new entries to non-critical (Critical = 0), since VCS's default is to make the resource critical, which would mean that, if we made a mistake, the NFS resource bombing would cause a failover that we don't want. You can, and should, remove this line once you know everything is good. Also, check out the 3 new dependency-tree lines I've added. You don't need to modify the pretty commented version that VCS creates automatically; just the 3 basic dependencies (or requirements) that your NFS resource should rely on.

d. Now, we just need to check our new configuration to make sure it's syntactically correct:

host # hacf -verify /etc/VRTSvcs/conf/config/tmp/main.cf

If this shows any errors, fix them and run this command again until it just returns you to a blank prompt. Although it can be aggravating, VCS's hacf command doesn't tell you anything if it thinks the main.cf is fine ;)
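If the silent treatment bothers you, a small wrapper can say something on success. This is only a sketch: verify_main_cf is a hypothetical helper name, and it assumes that hacf exits with a non-zero status when it finds a problem, so test that on your own version before trusting it:

```shell
# Sketch only: verify_main_cf is a hypothetical helper.
# Assumption: hacf returns a non-zero exit status when verification fails.
verify_main_cf() {
    if hacf -verify "$1"; then
        # hacf prints nothing on success, so announce it ourselves.
        echo "main.cf syntax OK: $1"
    else
        echo "hacf reported problems in $1" >&2
        return 1
    fi
}
```

You'd call it with the same path you handed to hacf above, for example verify_main_cf /etc/VRTSvcs/conf/config/tmp/main.cf.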

e. Since everything went well, we just need to move our new main.cf file into its proper location (/etc/VRTSvcs/conf/config/). We've already backed up the original, earlier, so - if things get nuts, or the addition causes problems - we can switch back to the working configuration fairly quickly.

host # pwd
/etc/VRTSvcs/conf/config/tmp
host # mv main.cf ../.

f. Now, we just need to start up VCS again.

IMPORTANT NOTE: Only start VCS on the node on which you did your edits. Do not start it on any other nodes until the node you're working on is up and running VCS without errors. You can check the status of your host's VCS processes using either "hastatus -sum" over and over again, or just "hastatus", which provides information in real time, but is slightly more confusing to read than the output of "hastatus -sum". If you get an error about the "lock" file, just touch (that is, create it yourself) the file you named as your lockfile in the new main.cf. You can touch it even if it exists, just to be sure, without causing any issues:

host # touch /opt/VRTSvcs/lock
host # hastart

g. Now that we've established that VCS is running fine on our main node, we can start up the other node(s). "hastatus -sum" and "hastatus" will show them (at an early point) in a REMOTE_BUILD state. This indicates that they're updating their main.cf's to match the one that we just changed.

Voila! All set. Once you're happy, be sure to set the NFS resource to "critical," if you want that. It's as easy as typing these commands on the command line (again, on your main node, if possible):
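Flipping the Critical attribute back on from the command line can be sketched like this. The make_critical helper and the resource names in the usage note are hypothetical (use the names from your own main.cf); the haconf and hares calls are the standard way to open the configuration, modify an attribute, and write it back out:

```shell
# Sketch only: make_critical is a hypothetical helper; pass your own resource names.
make_critical() {
    # Open the cluster configuration for writing.
    haconf -makerw || return 1

    status=0
    for res in "$@"; do
        # Mark each resource critical again.
        hares -modify "$res" Critical 1 || status=1
    done

    # Dump the configuration to main.cf and return it to read-only,
    # even if one of the modifications failed.
    haconf -dump -makero || status=1
    return "$status"
}
```

For example, make_critical NFS_res Share_res would set both of those (made-up) resources to critical in one pass.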

A special note for Solaris 10 users: If you want to use VCS to manage your NFS resources, you'll need to delete certain resources from the SMF framework. The following commands should do the trick for you:
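As a hedged sketch of what that looks like: I'm assuming here that the SMF instances in question are the NFS server, status, and lock manager services, so check "svcs -a | grep nfs" on your own box before deleting anything. The helper name remove_nfs_from_smf is hypothetical:

```shell
# Sketch only: remove_nfs_from_smf is a hypothetical helper.
# Assumption: these three FMRIs are the ones VCS needs out of SMF's hands;
# confirm against your own system before running.
remove_nfs_from_smf() {
    for fmri in \
        svc:/network/nfs/server:default \
        svc:/network/nfs/status:default \
        svc:/network/nfs/nlockmgr:default
    do
        # -f forces the delete even if the instance is currently online.
        svccfg delete -f "$fmri" || return 1
    done
}
```

Keep in mind that deleting the instances takes them away from SMF entirely, so only do this on hosts where VCS really is going to own NFS.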