In any event, after extensively field-testing the methods espoused in the earlier posts referenced above, a deployment of clustered servers to an offsite location ran into an issue that we weren't able to anticipate (or reproduce in previous, cookie-cutter-similar deployments). For some reason, when we rolled this cluster out, NFS just refused to work in a failover capacity. Actually, it failed in one specific way: the main node couldn't mount the NFS resource once it had failed over to the failover node. This problem seems pedestrian (even still ;) - the only odd thing was that it had never happened before under identical circumstances.

Here's what we figured out along the way (and how to fix it, too ;) For our purposes today (and the way it was then) the NFS cluster component works fine on node-b, but node-a can't mount the NFS resource when it fails over to node-b.

1. The first thing most people do in any investigation is to see if the basic stuff is all up and running. We don't like to be different, so we duly checked that all of the required VCS resources were up and online. They were, which explained the puzzling ONLINE state ;)
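For anyone following along at home, a quick way to eyeball VCS state is the summary command below. (The service-group name is a hypothetical placeholder - substitute your own.)

```shell
node-b # hastatus -sum                <-- summary of cluster, group, and resource states
node-b # hagrp -state nfs_sg          <-- state of one service group, e.g. our NFS group
```

If everything reports ONLINE on the node that's supposed to be hosting the NFS resource, move on to step 2.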

2. We then proceeded to ensure that, in fact, node-b was sharing out the NFS resource. Commands like showmount indicated that it, indeed, was. A little research into the subject showed that the issue we ended up having can indicate an RPC failure at this point, as well, but it's best to try step 3, too, just to be sure the problem isn't confined to a single server (although the fix for it is the same no matter which way your story goes ;)
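The check in step 2 looks something like the following (hostname and share path are illustrative):

```shell
node-a # showmount -e node-b          <-- list what node-b is currently exporting
export list for node-b:
/export/nfs_share (everyone)
```

If this were the RPC-failure variant of the problem, showmount would instead come back with something like "RPC: Rpcbind failure" rather than an export list - but either way, the fix in step 5 is the same.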

3. Then we finally struck gold, and got an actual error, when we tried to hit the mount from node-a:
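The attempt looked something like this (the share path is an assumption on our part, but the error is the "Unspecified" flavor discussed in step 4):

```shell
node-a # mount -F nfs node-b:/export/nfs_share /mnt
nfs mount: node-b:/export/nfs_share: Unspecified error
```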

4. Unspecified errors are the best kind of errors you can get, since there's a much wider variety of possible solutions you can come up with... Or maybe I have that backwards... There's really not much more to step 4. This step is an exercise in surrealism ;)

5. It turns out that the answer lay in setting rpcbind properties away from the defaults on both servers. The answer to the problem (or the fix, if you will) actually makes more sense than the way things "usually" work. What we ended up doing was setting rpcbind to "global" on both nodes; by default, it was set to "local_only." Oddly enough, we double-confirmed that local_only is still the setting on other cluster setups we have running, in which everything is hunky-dory. You need to do these steps on both nodes (or all nodes) in your cluster; here, we're only showing what we typed on the active NFS resource-sharing node. First, check the current setting:

node-b # svcprop network/rpc/bind:default | grep local_only <-- See if the local_only property is set
config/local_only boolean true <-- and there it is!

6. Then move on to fixing the problem (again, on both nodes) by setting the rpcbind configuration to global (which, in the case of rpcbind, actually means setting the local_only attribute to "false"):
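On Solaris 10 and later, the standard SMF way to flip that property is with svccfg, followed by a refresh so rpcbind picks it up (shown on node-b; repeat on node-a):

```shell
node-b # svccfg -s network/rpc/bind setprop config/local_only=false
node-b # svcadm refresh network/rpc/bind
node-b # svcprop network/rpc/bind:default | grep local_only <-- verify the change took
config/local_only boolean false
```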

7. Finally, just make sure you can mount your NFS resource from whichever node isn't currently hosting it. You don't necessarily have to test from both nodes once you've applied the fix on both, but why risk the near-future embarrassment?
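The verification is just the mount from step 3, which should now succeed (share path assumed, as before):

```shell
node-a # mount -F nfs node-b:/export/nfs_share /mnt
node-a # ls /mnt      <-- sanity-check that the mount is readable
node-a # umount /mnt  <-- clean up after the test
```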

And that's that. You should be good to go :) Since all's well that ends well, we'll try not to leave you with any clichés in our farewell. Parting is, after all, such sweet sorrow. At least until tomorrow :)

Cheers,

Mike
