Shoal's GroupManagementService
(GMS) provides a client API that allows client components within a
process to receive callbacks when group events occur. One such group
event is the Automatic Selection of a live member for Initiating
Recovery operations on a failed member's resources when the failure
is confirmed.

This selection event is only
notified to the registered component on the selected member, while a
record of this selection is shared with all members through an entry
in the GMS implementation of Distributed State Cache. In order
to have such a selection performed and notification delivered, one or
more components in each member should register with the Group
Management Service. For this, the client component has to
implement two interfaces, namely, FailureRecoveryActionFactory and
FailureRecoveryAction. FailureRecoveryActionFactory produces a
FailureRecoveryAction which consumes a FailureRecoverySignal, which
is the selection event object handed by GMS on the member that is
selected.

When
Failure OccursGMS's
failure handing core would determine if the recovery member selection
algorithm needs to be run and if there are recipients for such a
resultant notification. If so, the algorithm is run and in the member
that was selected, the client component's
FailureRecoveryActionFactory's
produceAction() method is invoked, followed by invoking the
produced FailureRecoveryAction's
consumeSignal() method. GMS passes
the FailureRecoverySignal
implementation packaging the failed member's identity token,
timestamps, etc.

The recovery selection algorithm
currently depends on an identically ordered list of members on all
surviving members. The list prior to failure is consulted, and the
member that was next in the list to the failed member, is selected
for recovery operations. This makes it a more deterministic, simple
bit scalable solution for recovery operations.

Note that the implementation of
FailureRecoveryAction's
consumeSignal() method has to follow
the precedence norms set out in the Shoal
Design Document. In other words, the implementation in
consumeSignal() method should first
acquire the Signal by invoking FailureRecoverySignal.acquire().

Note also that the acquire()
call here has special meaning in that Shoal's protective Failure
Fencing kicks in when this call is made, i.e. internally, the
Distributed State Cache implementation is called and an entry is
placed in it recording the recovery member identity, the failed
member identity, the recovery state (appointed or in-progress) and a
timestamp of the entry. Once recovery is completed, the
consumeSignal() implementation should
call Signal.release() which results
in the fence being lowered . i.e. the entry in the Distributed State
Cache is removed.

Failure Fencing
Explained

When
a remote/delegate member is performing recovery operations on a
failed member's resources, this operation has to be protected from
contention or race conditions. The contention or race condition could
occur when the failed member restarts before the recovery operations
by the delegate member is completed and attempts to regain control of
its resources. In order to enforce such a protection, GMS provides a
protocol that allows processes to raise a protective fence until
recovery operations are complete, at which point, the fence is
lowered by the recovery member. The lowering of the fence is
automatically done when consumeSignal()implementation
calls FailureRecoverySignal.release().

When
a failed member restarts, the protocol requires that the member check
with the group as to whether any other member is performing recovery
operations on its resources. This portion of the protocol is termed
Netiquette. This is provided through GroupHandle.isFenced()
API.

The
client in the restarting failed member, may have predetermined
policies for handling such situations. One policy example could be to
continue its startup without reclaiming control over the resources,
and reclaim ownership when notified about completion of the recovery.
GMS currently does not provide notifications when the fence is raised
or lowered. Typically, clients can call the GroupHandle.isFenced()
API to make such a determination. Another policy example would be to
block start up until the recovery operations are completed and
continue startup after regaining control of resources. This is left
to the client's implementation.

Clients Performing
Recovery on Startup

There
are situations where certain clients perform recovery operations on
their resources during startup with the implicit assumption that the
startup was preceded by a failure. In such cases, and particularly
so when delegated recovery notification is registered for, such
clients during startup must :

first
check if a fence has been raised on their member identity (through
GroupHandle.isFenced())

if
not fenced, and if performing recovery, raise a fence (through
GroupHandle.raiseFence()). This
results in an entry in the Distributed State Cache recording the
recovery member identity, the failed member identity, the recovery
state (appointed or in-progress) and a timestamp of the entry

when
recovery operations are done, lower the fence (through
GroupHandle.lowerFence()). This
results in the entry in the
Distributed State Cache being removed