Description

A zookeeper server under minimum load, with a number of clients watching exactly one node will fail to maintain the connection when the machine is subjected to moderate IO load.

In a specific test example we had three zookeeper servers running on dedicated machines with 45 clients connected, watching exactly one node. The clients would disconnect after moderate load was added to each of the zookeeper servers with the command:

dd if=/dev/urandom of=/dev/mapper/nimbula-test

The dd command transferred data at a rate of about 4Mb/s.

The same thing happens with

dd if=/dev/zero of=/dev/mapper/nimbula-test

It seems strange that such a moderate load should cause instability in the connection.

Very few other processes were running, the machines were setup to test the connection instability we have experienced. Clients performed no other read or mutation operations.

Although the documents state that minimal competing IO load should present on the zookeeper server, it seems reasonable that moderate IO should not cause problems in this case.