acceptedEpoch not handling zxid rollover in lower 32 bits

Details

Workaround: there is a simple workaround for this issue. Force a leader re-election before the lower 32 bits of the zxid reach 0xffffffff.

Most users won't even see this given the number of writes on a typical installation. Say you are doing 500 writes/second: you'd see this after ~3 months if the quorum is stable. Any changes (such as upgrading the server software) would cause the lower 32 bits of the zxid to be reset, thereby staving off this issue for another period.
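
The ~3 month figure can be checked with a little arithmetic. The sketch below is illustrative only: the class and method names are hypothetical (they are not part of ZooKeeper), and it assumes a constant write rate with one zxid consumed per write and no leader change in between.

```java
public class ZxidRolloverEstimate {
    // Number of distinct values the lower 32 bits of a zxid can hold.
    static final long COUNTER_SPACE = 1L << 32;

    // Seconds until the lower 32 bits are exhausted at a constant write rate
    // (assumption: one zxid per write, starting from a freshly reset counter).
    static double secondsToRollover(double writesPerSecond) {
        return COUNTER_SPACE / writesPerSecond;
    }

    // Same estimate expressed in days.
    static double daysToRollover(double writesPerSecond) {
        return secondsToRollover(writesPerSecond) / 86_400.0;
    }

    public static void main(String[] args) {
        // The 500 writes/second example from the text.
        System.out.printf("%.1f days%n", daysToRollover(500.0));
    }
}
```

At 500 writes/second this comes to about 99 days, which matches the rough "~3 months" estimate. A heavier write load shortens the window proportionally.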

Description

A zxid is a 64-bit number in which the upper 32 bits are considered the epoch number. When the lower 32 bits of a zxid "roll over", the epoch number (upper 32 bits) is incremented and the lower 32 bits start at 0 again.
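
A minimal sketch of that bit layout, to make the rollover concrete. The class and helper names here are hypothetical and chosen for illustration; this is not the server's actual code. It only shows that a plain increment of a zxid whose lower 32 bits are 0xffffffff carries into the epoch half.

```java
public class ZxidLayout {
    // Mask selecting the lower 32 bits of a zxid.
    static final long LOW_MASK = 0xffffffffL;

    // Upper 32 bits: the epoch number.
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    // Lower 32 bits: the part that rolls over.
    static long lowBitsOf(long zxid) {
        return zxid & LOW_MASK;
    }

    // Combine an epoch and a lower-32-bit value into one 64-bit zxid.
    static long makeZxid(long epoch, long low) {
        return (epoch << 32) | (low & LOW_MASK);
    }

    // True when the next increment would carry into the epoch bits.
    static boolean aboutToRollOver(long zxid) {
        return lowBitsOf(zxid) == LOW_MASK;
    }

    public static void main(String[] args) {
        long last = makeZxid(5, LOW_MASK); // last zxid of epoch 5
        long next = last + 1;              // plain increment carries into the epoch
        System.out.println(aboutToRollOver(last));
        System.out.println(Long.toHexString(next));
    }
}
```

The point of `aboutToRollOver` is that the epoch can change purely through arithmetic on the zxid, without an election, which is the case this issue says the acceptedEpoch/currentEpoch files do not account for.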

This should work fine; however, as far as I can tell, in the current 3.4/3.5 code the acceptedEpoch/currentEpoch files are not being updated for this case.

Activity

Patrick Hunt
added a comment - 12/Nov/11 01:27 I just tested this with my test from ZOOKEEPER-1277, and it fails without the hzxid change in ZooKeeperServer. However, even with that patch it still fails, I assume because the acceptedEpoch and related files are not being updated properly.
Camille can you take a look?

Patrick Hunt
added a comment - 23/Mar/12 00:36 This turns out to be a duplicate of ZOOKEEPER-1277: that patch causes the leader to be re-elected just prior to rollover. ZOOKEEPER-1277 was applied to 3.3, 3.4, and 3.5 (trunk).