Hi Artem -
It looks like you selected the wrong (or not exactly the right) product
for your project. The Heartbeat (Linux-HA) cluster was designed to provide
High Availability features for applications, not for distributed
computations. In an HA cluster, in most cases a node that loses its
connection to the cluster has to be killed (see the STONITH feature) to
prevent data corruption on shared devices in a split-brain situation.
Take a look at this product: http://oscar.openclustergroup.org
Maybe it fits your needs better.
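
To illustrate the split-brain point, here is a small toy model in Python.
It is only a sketch and not Heartbeat code: the function names, the link
table, and the "reference host" tie-breaker (for example a gateway that
every node can normally reach) are assumptions made up for this example.
It shows why, with two nodes and one broken link, each side reports the
other as OFFLINE, so neither view alone tells you which node is isolated.

```python
# Toy model only; this does not call any Heartbeat API.
# Node names and the link table are hypothetical.

def peer_views(nodes, links):
    """Return what each node would report about every node.

    links is a set of frozensets {x, y}, each meaning that x and y
    can currently exchange heartbeat packets.
    """
    views = {}
    for me in nodes:
        views[me] = {
            other: "online"
            if other == me or frozenset((me, other)) in links
            else "OFFLINE"
            for other in nodes
        }
    return views


def likely_isolated(node, views, reaches_reference):
    """Guess whether `node` is the one cut off from the network.

    Assumption: a node that sees every peer as OFFLINE and also cannot
    reach an independent reference host is probably the isolated one.
    """
    peers_down = all(
        state == "OFFLINE"
        for other, state in views[node].items()
        if other != node
    )
    return peers_down and not reaches_reference


nodes = ["A", "B"]
healthy = peer_views(nodes, {frozenset(("A", "B"))})  # link A<->B works
split = peer_views(nodes, set())                      # link A<->B is down

print(split["A"])  # A's view of the cluster
print(split["B"])  # B's view of the cluster

# After "ifdown" on B, B cannot reach the reference host but A still can.
print(likely_isolated("A", split, reaches_reference=True))
print(likely_isolated("B", split, reaches_reference=False))
```

The two views are mirror images of each other, which matches what crm_mon
showed on A and B. Some third point of reference is needed to break the
tie, and that is the problem fencing and quorum address in an HA cluster.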
On 10/26/07, Artem Pervin <ArtemPervin at botik.ru> wrote:
> Hello, all,
>
> I'm new to the heartbeat project and I'm experiencing some problems
> in setting things up. I will be very grateful for any help.
>
> I want to use heartbeat to detect the disconnection of a node in a
> computational cluster. I also need to detect when the node is back on
> the network. On both events a custom script should be started either on
> the head node or on both the head node and the computational node.
>
> I've managed to set up heartbeat on two nodes (A and B) and I can
> observe the nodes' status with the help of crm_mon. When I simulate the
> loss of connectivity on node B (using ifdown), after some time I can see
> the status of this node change: on node A crm_mon reports that B is
> "OFFLINE" and A is "online", and on node B crm_mon reports that A is
> "OFFLINE" and B is "online". In this case I simply cannot determine from
> the crm_mon data (or cib.xml) which node is actually down and where I
> should shut down my processes.
>
> Moreover, when I turn the network interface on node B back on, the
> status of the node doesn't change at all. Only when I restart the
> heartbeat service on either node B or node A does the status of node B
> return to "online". That's pretty odd behavior, I think.
>
> My questions are the following:
> 1) Is heartbeat an appropriate solution for my task, or should I use
> something else?
> 2) If heartbeat is fine, then what am I doing wrong? Why doesn't the
> status of node B return to the "online" state when I turn the network
> interface back on?
> 3) Is there any API to check the status of a node? Parsing cib.xml
> is not very convenient.
>
> Here's my configuration:
> CentOS, kernel 2.6.18, x86
> heartbeat version is 2.0.1 installed as rpm package
>
> ha.cf:
> ----------------------------------------------------------------------------------
> use_logd yes
> bcast eth0
> node A B
> crm on
> auto_failback on
> ---------------------------------------------------------------------------------
>
> authkeys
> ---------------------------------------------------------------------------------
> auth 1
> 1 sha1 helloworld
> ---------------------------------------------------------------------------------
>
> logd.cf
> ---------------------------------------------------------------------------------
> debugfile /var/log/ha-debug
> logfile /var/log/ha-log
> logfacility daemon
> ---------------------------------------------------------------------------------
>
> Thank you for your help.
>
>
> --
> Artem Pervin
> _______________________________________________
> Linux-HA mailing list
> Linux-HA at lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
--
Serge Dubrouski.