What version of the code is this? A fix I added on 11/12 appears
to be missing.
Here's what should happen:
- 5 nodes in the cluster (A,B,X,Y,Z), all running fenced, dlm and gfs
- kill 3 (X,Y,Z) leaving 2 (A,B)
- cluster loses quorum, meaning all services are suspended on A+B
(this includes fencing, dlm and gfs services; no fencing should
occur, no dlm recovery should occur and no gfs journal replay
should occur, until...)
- X is brought back into the cluster
- Y+Z are left inactive
- 3 of 5 nodes (A+B+X) now satisfies quorum
- on A+B, the services are now unsuspended and they do recovery:
* first, fence domain recovery: any node that was in the cluster
but is not any longer is fenced. This means that Y+Z are fenced.
X, which was also killed but has just rejoined the cluster,
should /not/ be fenced. Here's where the bug is: the fencing
daemon was incorrectly fencing node X in addition to correctly
fencing Y+Z. I fixed this on 11/12/04 in response to a report
by Patrick on cluster-list.
[Note: I should add an additional delay before fencing Y+Z in this
situation, in the hope that they'll have enough time to rejoin
the cluster and avoid being fenced, just as X does. A rough sketch
of this victim-selection idea appears after this list.]
* second, dlm recovery occurs
* third, gfs recovery occurs; A and B are responsible for recovering
the journals used by X, Y and Z. X will not be allowed to mount
until these recoveries are complete.
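
Not the real fenced code, but here's a minimal sketch of the
victim-selection idea above, using the 5-node scenario. The helper
names (is_cluster_member, fence_node, REJOIN_GRACE_SECS) are made up
for illustration, not the actual cman/fenced interface; the grace loop
is the extra delay suggested in the note:

    /* sketch only: fence every previous fence-domain member that has not
     * rejoined the cluster by the time quorum is regained, after giving
     * stragglers a short grace period to rejoin */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define REJOIN_GRACE_SECS 3   /* extra delay so killed nodes can rejoin */

    /* cluster members after quorum is regained: A, B and the rejoined X */
    static const char *cluster[] = { "A", "B", "X", NULL };

    /* fence-domain members before the failure */
    static const char *prev_members[] = { "A", "B", "X", "Y", "Z", NULL };

    static int is_cluster_member(const char *name)
    {
        int i;
        for (i = 0; cluster[i]; i++)
            if (!strcmp(cluster[i], name))
                return 1;
        return 0;
    }

    static void fence_node(const char *name)
    {
        printf("fencing %s\n", name);   /* stand-in for the fence agent */
    }

    int main(void)
    {
        int i, waited;

        for (i = 0; prev_members[i]; i++) {
            const char *name = prev_members[i];

            /* re-check membership during the grace period instead of
             * fencing a node that is about to rejoin (like X) */
            for (waited = 0; waited < REJOIN_GRACE_SECS; waited++) {
                if (is_cluster_member(name))
                    break;
                sleep(1);
            }

            if (is_cluster_member(name)) {
                printf("%s is a member, not fencing\n", name);
                continue;
            }
            fence_node(name);           /* Y and Z end up here */
        }
        return 0;
    }

With this fixed membership it skips A, B and X and fences only Y and Z;
in the real daemon the membership would come from cman and the grace
value would have to be tuned against how long a rebooted node needs to
rejoin.
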
I don't have an explanation for the gfs assertion. There was another
obscure bug I fixed on 11/15/04 where a machine would be allowed to
mount gfs while it was /joining/ the fencing domain, but was not yet a
full domain member. This could result in a machine having gfs
mounted, being killed, but not getting fenced. It's a possible
explanation for the assertion, although I'm not sure of the details.
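
For what it's worth, that 11/15 fix boils down to a membership-state
check before a mount is permitted. Here's a rough sketch of that kind
of guard, with made-up state and function names rather than the real
fence-domain interface:

    /* sketch only: refuse a gfs mount until the node is a full fence-domain
     * member, since a node that is merely joining can end up holding a gfs
     * mount yet die without ever being fenced */
    #include <stdio.h>
    #include <errno.h>

    enum domain_state {
        DOMAIN_NONE,        /* not in the fence domain at all */
        DOMAIN_JOINING,     /* join in progress, not yet a member */
        DOMAIN_MEMBER       /* full member, eligible to be fenced */
    };

    static enum domain_state get_domain_state(void)
    {
        return DOMAIN_JOINING;  /* stand-in for querying the fencing daemon */
    }

    static int allow_gfs_mount(void)
    {
        if (get_domain_state() != DOMAIN_MEMBER) {
            fprintf(stderr, "mount refused: not a full fence domain member\n");
            return -EAGAIN;     /* caller retries once the join completes */
        }
        return 0;
    }

    int main(void)
    {
        return allow_gfs_mount() ? 1 : 0;
    }

In the real code the state would come from the fencing daemon and the
mount path would be the caller; the names here are only to make the
ordering constraint concrete.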

Dave,
You mentioned a possible additional delay in the comment above so that
nodes Y and Z can avoid being fenced like X. Did that ever go in? I
still see cases where I have 6 nodes, 3 get shot and brought back
within a few seconds of each other, and one or two of them end up fenced.