
Device Numbering

The device major/minor numbers are embedded in the NFS filehandle, so filehandles go stale if those numbers change during failover. If you are using DRBD, you can adjust your configuration so that the numbers come out the same on both nodes; if you have shared disks, you can use either EVMS or LVM to create a shared volume that has the same major/minor numbers everywhere. You can download EVMS at http://evms.sourceforge.net/ and LVM at http://www.sistina.com/products_lvm.htm
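To see whether the numbers actually match, compare the device nodes on both servers before and after failover (the DRBD device name below is illustrative):

ls -l /dev/drbd0    # for a block device, ls prints "major, minor" where the file size normally appears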

An alternative way to deal with the NFS device numbering problem is afforded by later versions of NFS (commonly included with 2.6 Linux kernels). These versions let you specify an integer to be used in place of the major/minor numbers of the mount device through the fsid export option. For more details see exports(5).
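For example, an export line that pins the filehandle to fsid 1, making it independent of the underlying device numbers, might look like this (the path is illustrative):

/hafs/data *(rw,sync,fsid=1)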

Quick Setup

For those who want a quick setup, here are the steps to prepare manual NFS failover. Once this is tested and working, you should consult Heartbeat for setting up automatic failover.

NFS v4: Keeping rpc_pipefs local

For NFS v4, rather than taking down the rpc_pipefs file system mount and all the services that use it on every failover, you can keep the rpc_pipefs directory local to each server to get started. Do this before setting up the sharing of /var/lib/nfs described in the next section.

Note: This is a shortcut, which may not work in complex environments using all the services mentioned above.

The following procedure was tested on Red Hat Enterprise Linux 5.2. The servers were not NFS clients and did not use GSS. You may need to adjust details for your own distribution. Make the changes on both nodes.

Change all occurrences of /var/lib/nfs/rpc_pipefs to /var/lib/rpc_pipefs:
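One blunt way to do this (a sketch; the file locations assume RHEL-style init scripts and configuration):

grep -rl /var/lib/nfs/rpc_pipefs /etc/init.d /etc/sysconfig /etc/idmapd.conf | xargs sed -i 's|/var/lib/nfs/rpc_pipefs|/var/lib/rpc_pipefs|g'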

Editing the init scripts in place like this is of course not recommended. Rather, scan the top of each init script for any configuration files that are sourced (on RHEL, typically under /etc/sysconfig) and edit those instead.
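Once the path has been updated everywhere, create the new mount point and mount rpc_pipefs there on each node (a sketch; sunrpc is the conventional pseudo-device name):

mkdir -p /var/lib/rpc_pipefs
mount -t rpc_pipefs sunrpc /var/lib/rpc_pipefs

The equivalent /etc/fstab entry would be:

sunrpc /var/lib/rpc_pipefs rpc_pipefs defaults 0 0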

Start services, on the primary node only:

service nfs start
service nfslock start

For heartbeat integration, make sure that both of these services are removed from the native init.d startup sequence and placed under heartbeat control instead, i.e., in the haresources file for heartbeat v1 or the CIB for v2; see the sketch below. The correct ordering of the two services is not entirely settled: most examples have nfslock subordinate to nfs, started after nfs and stopped before it, though some init.d configurations reverse the order (notably RHEL 5.x).
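For illustration, taking the services out of init control and a heartbeat v1 haresources line might look like this (the node name, DRBD resource, device, mount point, and IP address are all hypothetical; nfslock is listed after nfs so that it starts after and stops before it):

chkconfig nfs off
chkconfig nfslock off

node1 drbddisk::r0 Filesystem::/dev/drbd0::/hafs::ext3 10.0.0.100 nfs nfslock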

Export the user directory. Add the following line to /etc/exports:

/hafs/data *(rw,sync)

and run

exportfs -va

Either sync /etc/exports to the other node, or symlink it on both nodes into the shared disk, perhaps into /var/lib/nfs:
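For the symlink approach, with /var/lib/nfs already on the shared disk, something like this would do (a sketch):

mv /etc/exports /var/lib/nfs/exports      # once, on the node currently holding the shared disk
ln -s /var/lib/nfs/exports /etc/exports   # on both nodes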

Initiate heartbeat migration for all of the resources configured above, e.g., for a heartbeat v2 configuration:

crm_resource -r drbdgroup -M
crm_mon; sleep 5; crm_mon

On the secondary server: bring up NFS services:

service nfslock start
service nfs start

On the client, a simple loop (ideally started before the migration) shows whether the mount stays responsive throughout the failover:
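A minimal example, assuming the export is mounted on the client at /mnt/hafs (the path is hypothetical):

while true; do date; ls /mnt/hafs > /dev/null && echo ok; sleep 1; done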

To migrate back, perform the analogous steps:

On the secondary server: take down NFS services:

service nfslock stop
service nfs stop

On the primary server:

crm_resource -r drbdgroup -U
crm_mon; sleep 5; crm_mon

... and bring up NFS services:

service nfslock start
service nfs start

Hints

NFS-mounting any filesystem on your NFS servers is highly discouraged. DaveDykstra wanted both servers to NFS-mount the replicated filesystem from the active server, and through a lot of trouble mostly got it working but still saw scenarios where "NFS server not responding" could interfere with heartbeat failovers and he finally gave up on it. The biggest problem was with the fuser command hanging. For more details see the archives of a mailing list discussion from 2005 and another from 2006.

If your kernel defaults to using TCP for NFS (as 2.6 kernels do), switch to UDP with the 'udp' mount option. If you don't, you won't be able to switch quickly from server "A" to "B" and back to "A", because "A" will hold the TCP connection in TIME_WAIT state for 15-20 minutes and refuse to reconnect.
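For example (the server name and paths are illustrative):

mount -o udp server:/hafs/data /mnt/hafs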

For failover between the NFS server nodes to succeed, the shared nfs directory must be handed over cleanly and unconditionally: unmounted on the outgoing server and mounted on the incoming one before NFS services start there.

On Red Hat Enterprise Linux 5.x, nfsd occasionally refuses to die upon service nfs stop. To remedy the situation, patch the init script; the patch referenced here was against /etc/init.d/nfs from nfs-utils-1.0.9-33.el5, so adapt as necessary for your distribution.
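One way to force the issue (a sketch only, not the original patch) is to have the stop branch of the init script ask the kernel directly to stop all nfsd threads and then wait for them to exit:

rpc.nfsd 0                             # set the kernel's nfsd thread count to zero
for i in 1 2 3 4 5; do                 # wait up to five seconds for the threads to exit
    pgrep -x nfsd > /dev/null || break
    sleep 1
done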

Locking

NFS locking is a cooperative enterprise. Lock migration is coordinated by rpc.statd(8). Locks are stored in /var/lib/nfs/statd/sm and /var/lib/nfs/statd/sm.bak. Having /var/lib/nfs on shared storage as outlined above will enable cooperative lock migration upon failover, as initiated by rpc.statd on the new server.

For debugging, rpc.statd(8) allows signal-initiated recovery:

SIGUSR1 causes rpc.statd to re-read the notify list from disk and send notifications to clients. This can be used in High Availability NFS (HA-NFS) environments to notify clients to reacquire file locks upon takeover of an NFS export from another server.
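To trigger this by hand, send the signal to the running daemon (assuming a single rpc.statd instance):

kill -USR1 $(pidof rpc.statd)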

On some kernels or in some situations, locks may not survive HA failovers without the steps above. As a workaround for those situations, it is recommended that you mount NFS filesystems with the "nolock" option. For more details see a mailing list post from 2005 and a confirmation from 2007 using a more recent kernel.
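For example (the server name and path are illustrative; note that nolock disables NFS locking entirely for that mount):

mount -o nolock server:/hafs/data /mnt/hafs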