Hi guys,
I found a corner case where calling fence_tool -w leave can hang.
My setup is a two-node cluster:
- both nodes are up
- power off the first node -> OK
- reboot the second node -> OK
- when the second node comes back up, "cman_tool services" shows:
    fence 0 default 00040001 JOIN_START_WAIT
Since the first node is "dead", the fence group never completes the switch to
state "none". If you call fence_tool -w leave in this state, it hangs forever.
In my init scripts I changed fence_stop() to use the usual "wait up to 10
seconds, then kill it" loop:
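fence_tool -w leave &
# track only the fence_tool started here, not every fence_tool on the system
pid=$!
for sec in $(seq 1 10); do
    # fence_tool has returned: nothing left to wait for
    kill -0 "$pid" 2>/dev/null || break
    if [ "$sec" = 10 ]; then
        # still running after ~10 seconds: give up and kill it
        kill "$pid" > /dev/null 2>&1
    else
        sleep 1
    fi
done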
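If the same pattern is needed elsewhere, it can be wrapped in a small helper.
This is only an illustration under the assumption of a plain POSIX sh;
run_with_timeout is a hypothetical name, not part of the cluster tools. Where
GNU coreutils timeout(1) is available, "timeout 10 fence_tool -w leave" does
much the same job.

```shell
#!/bin/sh
# Hypothetical helper: run a command in the background and kill it if it is
# still running after $1 seconds. Returns the command's own exit status, or
# 124 if it had to be killed (the same convention as coreutils timeout(1)).
# Plain POSIX sh has no `local`, so secs/pid/elapsed are global variables.
run_with_timeout() {
    secs=$1
    shift
    "$@" &
    pid=$!
    elapsed=0
    # Poll once per second until the command exits or the limit is reached.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$secs" ]; then
            kill "$pid" 2>/dev/null
            wait "$pid" 2>/dev/null
            return 124
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # The command finished on its own: collect and return its exit status.
    wait "$pid"
}
```

With this, fence_stop() could simply call "run_with_timeout 10 fence_tool -w leave".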
Regards
Fabio
PS I spotted this problem while updating the Ubuntu init scripts, but the
upstream init script appears to suffer from exactly the same problem. Note also
that I am not checking for fenced to exit, only for fence_tool to return.
--
I'm going to make him an offer he can't refuse.