Consistency in zookeeper

Hello everyone,

From the ZooKeeper website I understand that ZooKeeper does not provide strict consistency at every instant in time (http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkGuarantees).
Has anyone ever considered making ZooKeeper strictly consistent at all times? What I mean is that any time a value is updated in ZooKeeper, any client that retrieves the value from any follower should get a consistent result. Is it feasible to improve the ZooKeeper core so that it delivers strict consistency rather than eventual consistency?

Best,
Yasin

Re: Consistency in zookeeper

Hi Yasin,

I assume you mean "linearizability" by "strict consistency".

ZooKeeper provides "sequential consistency". This is weaker than
linearizability but is still very strong, much stronger than "eventual
consistency". In addition, all update operations are linearizable, as
they are sequenced by the leader. With sequential consistency, a reader
never "goes back in time": even if you read from a different follower
every time, you'll never see version 3 of the data after seeing
version 4.

ZooKeeper also provides a sync command. If you invoke a sync command
and then a read, the read is guaranteed to see at least the last write
that completed before the sync started. So if you always do "sync +
read" instead of just "read", you get linearizability. But you pay in
performance, since these reads are no longer served purely from the
local state of the follower to which you're connected: the sync is sent
to the leader. That's why ZooKeeper gives you the option of doing a
fast read that is consistent but may retrieve a slightly old version,
or a sync + read that is more consistent but slower.

Alex
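
The difference between a plain read and sync + read can be illustrated with a toy model. This is only a sketch in Python, not ZooKeeper code: the `Leader` and `Follower` classes, their method names, and the use of a log position as a stand-in for a zxid are all assumptions made for illustration.

```python
class Leader:
    """Toy leader: orders every write into a single committed log."""

    def __init__(self):
        self.log = []  # list of (path, value); position + 1 stands in for a zxid

    def write(self, path, value):
        self.log.append((path, value))
        return len(self.log)


class Follower:
    """Toy follower: applies the leader's log in order, possibly lagging."""

    def __init__(self, leader):
        self.leader = leader
        self.applied = 0  # number of log entries applied so far
        self.data = {}

    def catch_up(self, upto=None):
        target = len(self.leader.log) if upto is None else upto
        while self.applied < target:
            path, value = self.leader.log[self.applied]
            self.data[path] = value
            self.applied += 1

    def read(self, path):
        # Fast path: served from local state, which may be stale.
        return self.data.get(path)

    def sync(self):
        # Slow path: find out where the leader's log ends right now and
        # apply everything up to that point before returning.
        self.catch_up(upto=len(self.leader.log))


leader = Leader()
follower = Follower(leader)

leader.write("/x", 4)
follower.catch_up()
leader.write("/x", 5)        # committed, but not yet applied by this follower

stale = follower.read("/x")  # plain read: an older but consistent value
follower.sync()
fresh = follower.read("/x")  # sync + read: sees the write that preceded the sync
print(stale, fresh)
```

Running it prints `4 5`: the plain read returns the older value, while the read issued after `sync()` sees the write that completed before the sync.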

RE: Consistency in zookeeper

Hi Yasin,

Adding one more point.

ZooKeeper provides different ways of keeping data in sync. As Alex & Vladimir explained, the sync() API is one way, and it comes with a performance overhead.

Another approach is to define Watchers, which also help keep data in sync between clients. Internally, ZooKeeper notifies clients of events asynchronously. Watchers are very lightweight, and the user/client should define specific watchers to maintain a synchronized view of the data.

ZK supports various events such as NodeDataChanged and NodeChildrenChanged. Since notification is asynchronous, there will be a slight latency in receiving the events.
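
The watcher approach can be sketched with a toy model. This is not the ZooKeeper client API: `ZNode` and `CachingClient` are hypothetical names, and the callback is invoked synchronously here, whereas real notifications arrive asynchronously. The sketch assumes the one-time-trigger behaviour of standard data watches, which is why the callback re-registers itself when it re-reads the value.

```python
class ZNode:
    """Toy znode with one-shot data watches (not the real client API)."""

    def __init__(self, value):
        self.value = value
        self._watchers = []

    def get(self, watch=None):
        if watch is not None:
            self._watchers.append(watch)
        return self.value

    def set(self, value):
        self.value = value
        # Each registered watch fires once and is then discarded.
        fired, self._watchers = self._watchers, []
        for callback in fired:
            callback("NodeDataChanged")


class CachingClient:
    """Keeps a local copy of the znode value, refreshed on each event."""

    def __init__(self, node):
        self.node = node
        self.events = []
        self.cached = node.get(watch=self._on_event)

    def _on_event(self, event):
        self.events.append(event)
        # Re-read and re-register, since the previous watch was consumed.
        self.cached = self.node.get(watch=self._on_event)


node = ZNode(4)
client = CachingClient(node)
node.set(5)
node.set(6)
print(client.cached, client.events)
```

The client ends with the cached value 6 after two NodeDataChanged events. In a real deployment the cached copy lags the server by the notification latency mentioned above.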

Reference:

http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkWatches
Section: "The data for which the watch was set"

http://zookeeper.apache.org/doc/r3.2.2/zookeeperTutorial.html#sc_producerConsumerQueues

-Rakesh

Re: Consistency in zookeeper

It's possible, but what sync gets you is that the read will see at
least the writes that completed before the sync started, and possibly
later writes too. Actually, this is true only under some timing
assumptions. As was previously discussed on the list, in order to
really guarantee this property even with leader failures, the leader
would have to broadcast sync commands just like updates, which it
currently doesn't do for some reason.

Alex

> Will sync and read really help to achieve what Yasin wants? Is it not
> possible for the value to change between sync and read?
>
> Thanks
> Kishore G

Re: Consistency in zookeeper

Let me add a couple of points to this thread. Yasin didn't ask about a concrete use case; it sounds more like an exploratory question than a question about how to solve a particular problem. If there is a use case behind the question, it would be great to hear about it.

One reason for serving read requests locally is the assumption that ZooKeeper traffic is dominated by reads. By processing read requests locally, we can increase read throughput capacity by adding more servers.

The consistency guarantee that ZooKeeper provides is not eventual in the sense I'm used to, in which replicas can diverge but eventually converge. ZK replica servers don't diverge, but they can be arbitrarily behind in applying updates that have already been decided upon. We can control to some extent how far behind a follower can be by changing syncLimit.
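
The "behind but never divergent" property can be pictured as every replica holding a prefix of one committed log. The following is a toy sketch only: the log contents, the replica names, and the `state_after` helper are made up for illustration.

```python
# Order of updates as decided by the leader (hypothetical contents).
committed = [("/x", 4), ("/x", 5), ("/y", 1), ("/x", 6)]


def state_after(log, applied):
    """State of a replica that has applied the first `applied` log entries."""
    state = {}
    for path, value in log[:applied]:
        state[path] = value
    return state


# Three hypothetical replicas at different positions in the same log.
positions = {"leader": 4, "follower_a": 3, "follower_b": 1}
states = {name: state_after(committed, n) for name, n in positions.items()}

for name, state in states.items():
    print(name, state)
```

Each replica's state is one the leader itself passed through earlier. A lagging follower returns an old value, never a value that was not part of the agreed history.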

Re: Consistency in zookeeper

I am trying to build a system that is always consistent for every client. For example, a client sends a write request to ZooKeeper to update x from x=4 to x=5, and the ZooKeeper leader forwards this write to the followers. In the meantime, the same client wants to read x, and it gets the old value (x=4) from some follower that has not yet applied the update. I understand that the client will get x=5 if it syncs before reading; this is the consistency model that ZooKeeper provides. In that case, however, performance will decrease.

Re: Consistency in zookeeper

For the same client (same ZooKeeper connection handle), that is already
guaranteed. The only case in which read-after-write is not guaranteed
is when you get disconnected after writing and then connect to another
ZooKeeper server for the read.

You can probably work around this by doing a sync in the callback for
the SyncConnected event.
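
The scenario and the suggested workaround can be sketched with a toy model. This is not the ZooKeeper client: `Ensemble`, `Session`, `reconnect`, and the `sync_on_connect` flag are hypothetical names. The sketch assumes that the server handling a session's write has applied it before the write is acknowledged, and it ignores any other safeguards a real client may have.

```python
class Ensemble:
    """Toy ensemble: one committed log, per-server apply positions."""

    def __init__(self, servers):
        self.log = []
        self.applied = [0] * servers  # entries applied by each server

    def write_via(self, server, value):
        self.log.append(value)
        # Assumption: the serving node applies the write before acknowledging.
        self.applied[server] = len(self.log)

    def sync(self, server):
        self.applied[server] = len(self.log)

    def read_from(self, server):
        n = self.applied[server]
        return self.log[n - 1] if n else None


class Session:
    def __init__(self, ensemble, server, sync_on_connect=False):
        self.ensemble = ensemble
        self.server = server
        self.sync_on_connect = sync_on_connect

    def reconnect(self, server):
        self.server = server
        if self.sync_on_connect:
            # Stand-in for calling sync in the connection-event callback.
            self.ensemble.sync(server)

    def set_x(self, value):
        self.ensemble.write_via(self.server, value)

    def get_x(self):
        return self.ensemble.read_from(self.server)


def run(sync_on_connect):
    ensemble = Ensemble(2)
    ensemble.write_via(0, 4)
    ensemble.sync(1)               # both servers start with x = 4
    session = Session(ensemble, server=0, sync_on_connect=sync_on_connect)
    session.set_x(5)
    same_server = session.get_x()  # read-your-writes on the same connection
    session.reconnect(1)           # server 1 has not applied x = 5 yet
    other_server = session.get_x()
    return same_server, other_server


print(run(False))
print(run(True))
```

Without the sync on reconnect the toy returns `(5, 4)`, and with it `(5, 5)`.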

Re: Consistency in zookeeper

> Yasin,
>
> If the two clients are connected to two different ZooKeeper servers in the
> cluster, then, yes.
>
> Generally, if you're worried that there may be another client working on
> the same key path, then you should sync() before reading.
>
> Best Regards,
> Martin Kou
>
> On Fri, Mar 1, 2013 at 1:38 PM, Yasin <[hidden email]> wrote:
>
>> So, if the read request is made by some other client, it will not get the
>> updated value without sync, right?

Re: Consistency in zookeeper

Yes. Sync doesn't guarantee that a read is up to date; it guarantees an
ordering. If event A involves a ZK update and you can guarantee that A
occurs before the sync, then any read on a client C that is done after
a sync on C will see the result of A or a successor state.

> Even if you do the sync, another client can make a change before you do
> the subsequent read.
>
> -JZ

RE: Consistency in zookeeper

sync() guarantees that the data on the ZK server the client is connected to is synchronized as of the time 't' at which sync() is issued.

Say we have two clients, both working on the same key path.

The first client, C1, updates the value of x at 't1', 't2' and 't3' as follows:
at t1, value of x = 4
at t2, update value of x = 5
at t3, update value of x = 6

The second client, C2, calls sync() at 't2' and issues a read() request at 't3'.
(Ignoring the race between C1's update and C2's sync at t2, assume here that C1's update happened first.) C2 will then see at least x=5 from the ZK server it is connected to (leader or follower), but C2 is not guaranteed to see x=6, as that update happened after the sync() call.
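
The timeline above can be replayed in a toy model to show that sync gives a lower bound, not the latest value. The `Leader` and `Follower` classes and the `scenario` helper are hypothetical and only mimic the ordering described in the post.

```python
class Leader:
    def __init__(self):
        self.log = []  # committed values of x, in order

    def write(self, value):
        self.log.append(value)


class Follower:
    def __init__(self, leader):
        self.leader = leader
        self.applied = 0  # how many log entries this server has applied

    def sync(self):
        # Catch up to whatever the leader has committed at this moment.
        self.applied = max(self.applied, len(self.leader.log))

    def apply_pending(self):
        # Normal replication: the follower eventually applies new entries.
        self.applied = len(self.leader.log)

    def read(self):
        return self.leader.log[self.applied - 1] if self.applied else None


def scenario(replicated_before_read):
    leader = Leader()
    server = Follower(leader)  # the server C2 is connected to
    leader.write(4)            # t1: C1 sets x = 4
    leader.write(5)            # t2: C1 sets x = 5
    server.sync()              # t2: C2 calls sync() after that update
    leader.write(6)            # t3: C1 sets x = 6
    if replicated_before_read:
        server.apply_pending()
    return server.read()       # t3: C2 reads x


print(scenario(False), scenario(True))
```

Depending on whether the t3 update has reached C2's server by the time of the read, C2 sees 5 or 6, but never 4.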
