On Thu, Feb 28, 2013 at 3:35 PM, Yasin <[EMAIL PROTECTED]> wrote:
> Hello everyone,
>
> From the ZooKeeper website I understand that ZooKeeper does not provide
> strict consistency at every instant in time
> (http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkGuarantees).
> Has anyone ever considered making ZooKeeper strictly consistent at all
> times? What I mean is that any time a value is updated in ZooKeeper, any
> client that retrieves the value from any follower should get a consistent
> result. Is it feasible to improve the ZooKeeper core so that ZooKeeper
> delivers strict consistency rather than eventual consistency?
>
> Best
> Yasin

ZooKeeper provides "sequential consistency". This is weaker than linearizability but is still very strong, much stronger than "eventual consistency". In addition, all update operations are linearizable, as they are sequenced by the leader. With sequential consistency, a reader never "goes back in time": even if you read from a different follower every time, you'll never see version 3 of the data after seeing version 4.
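One way to picture the "never goes back in time" property is the toy model below. It is only a sketch, not ZooKeeper code: it assumes that every update carries an increasing transaction id (called zxid here), that each follower has applied some prefix of the updates, and that a client session remembers the newest id it has observed. The `Replica` and `Session` classes and their fields are hypothetical names made up for this illustration.

```python
class Replica:
    """A follower that has applied a prefix of the leader's ordered updates."""

    def __init__(self, name, applied_zxid, value):
        self.name = name
        self.applied_zxid = applied_zxid  # id of the last update applied here
        self.value = value


class Session:
    """Client session that remembers the newest state it has observed."""

    def __init__(self):
        self.last_seen_zxid = 0

    def read(self, replica):
        # Refuse a replica that is older than what this session already saw.
        if replica.applied_zxid < self.last_seen_zxid:
            raise ConnectionError(f"{replica.name} is behind this session")
        self.last_seen_zxid = replica.applied_zxid
        return replica.value


fresh = Replica("follower-a", applied_zxid=4, value="v4")
stale = Replica("follower-b", applied_zxid=3, value="v3")

session = Session()
first = session.read(fresh)       # observes version 4
try:
    second = session.read(stale)  # would be version 3, so it is rejected
except ConnectionError:
    second = None
```

In this model the session that has seen version 4 can still be served by any replica at version 4 or later, which is why reads may be stale relative to the leader but never move backwards.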

ZooKeeper also provides a sync command. If you invoke a sync command and then a read, the read is guaranteed to see at least the last write that completed before the sync started. So if you always do "sync + read" instead of just "read", you get linearizability. But you pay in performance, since these reads are no longer served purely locally by the follower to which you're connected: the sync is sent to the leader. That's why ZooKeeper gives you the option of doing a fast read that is consistent but may retrieve a slightly old version, or a sync + read that is more consistent but slower.
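The trade-off between a plain read and "sync + read" can be sketched with a small in-memory model. This is an illustration under simplifying assumptions, not the ZooKeeper implementation: the `Leader` and `Follower` classes are hypothetical, the follower only catches up when `sync()` is called, and a single list stands in for the replicated update log.

```python
class Leader:
    def __init__(self):
        self.log = []                 # committed updates, in order

    def write(self, value):
        self.log.append(value)


class Follower:
    def __init__(self, leader):
        self.leader = leader
        self.log = []                 # local copy, may lag behind the leader

    def read(self):
        # Fast path: answered from local state only.
        return self.log[-1] if self.log else None

    def sync(self):
        # Slow path: contact the leader and apply everything committed so far.
        self.log = list(self.leader.log)


leader = Leader()
follower = Follower(leader)

leader.write(4)
follower.sync()                # follower now has x=4
leader.write(5)                # not yet applied on the follower

stale = follower.read()        # plain read returns the old value
follower.sync()
current = follower.read()      # sync + read returns the latest committed value
```

The plain read costs nothing beyond the local lookup, while the sync costs a round trip to the leader; that is the performance price described above.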

Alex


ZooKeeper provides different ways of keeping data in sync. As Alex and Vladimir explained, the sync() API is one way, and it carries a performance overhead.

Another approach is to define Watchers, which also help keep the data in sync between clients. Internally, ZooKeeper notifies clients of events asynchronously. Watchers are very lightweight, and the client should define specific watchers to maintain a synchronized view of the data.

ZK supports various events, such as NodeDataChanged and NodeChildrenChanged. Since notification is asynchronous, there will be a slight latency in receiving the events.
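A rough sketch of the watcher pattern follows. It is a toy model rather than the ZooKeeper client API: the `Node` class is hypothetical, it assumes the one-time-trigger behaviour described in the programmer's guide (a watch fires once and must be set again), and it delivers the notification synchronously, whereas a real client receives it asynchronously and therefore with some delay.

```python
class Node:
    """Toy znode whose data watches fire once and are then discarded."""

    def __init__(self, data):
        self.data = data
        self._watchers = []

    def get_data(self, watcher=None):
        if watcher is not None:
            self._watchers.append(watcher)
        return self.data

    def set_data(self, data):
        self.data = data
        # One-time trigger: clear the registered watchers before notifying.
        pending, self._watchers = self._watchers, []
        for watcher in pending:
            watcher("NodeDataChanged")


events = []
cache = {}
node = Node(4)


def on_change(event):
    events.append(event)
    # Re-read the data and register the watch again to keep following changes.
    cache["x"] = node.get_data(watcher=on_change)


cache["x"] = node.get_data(watcher=on_change)
node.set_data(5)
node.set_data(6)
```

Because the callback re-registers itself on every notification, the cached value tracks each change; a client that forgets to re-register would stop at the first update.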


Reference:

http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkWatches
Section: "The data for which the watch was set"

http://zookeeper.apache.org/doc/r3.2.2/zookeeperTutorial.html#sc_producerConsumerQueues

-Rakesh
________________________________________
> From: Alexander Shraer [[EMAIL PROTECTED]]
> Sent: Friday, March 01, 2013 5:19 AM
> Subject: Re: Consistency in zookeeper
>
> Hi Yasin,
>
> I assume you mean "linearizability" by "strict consistency".

It's possible, but what sync gets you is that the read will see at least the writes that completed before the sync started, and possibly later writes too. Actually, this is true only under some timing assumptions. As was previously discussed on the list, in order to really guarantee this property even with leader failures, the leader would have to broadcast sync commands just like updates, which it currently doesn't do for some reason.

Alex

On Fri, Mar 1, 2013 at 9:49 AM, kishore g <[EMAIL PROTECTED]> wrote:
> Will sync and read really help to achieve what Yasin wants? Is it not
> possible for the value to change between sync and read?
>
> Thanks
> Kishore G

Let me add a couple of points to this thread. Yasin didn't mention a concrete use case; it sounds more like an exploratory question than a question about how to solve a particular problem. If there is a use case behind the question, it would be great to hear about it.

One reason we had to serve read requests locally comes from the assumption that ZooKeeper traffic is dominated by reads. By processing read requests locally, we can increase throughput capacity by adding more servers.

The consistency guarantee that ZooKeeper provides is not eventual in the sense I'm used to, in which replicas can diverge but eventually converge. ZK replica servers don't diverge, but they can be arbitrarily behind on the application of updates that have been decided upon. We can control to some extent how far behind a follower can be by changing syncLimit.

-Flavio
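The difference between "behind" and "diverged" can be made concrete with a small check. This is only an illustration with made-up data: it models each replica's state as a list of applied updates and treats a replica as merely lagging when that list is a prefix of the leader's list. The helper name `is_prefix` is hypothetical.

```python
def is_prefix(short, full):
    """True if `short` is an initial segment of `full`."""
    return full[:len(short)] == short


leader_log = ["x=4", "x=5", "x=6"]
lagging_follower = ["x=4"]           # behind, but agrees with the leader so far
diverged_replica = ["x=4", "x=7"]    # the kind of state ZK replicas do not reach

lagging_ok = is_prefix(lagging_follower, leader_log)
diverged_ok = is_prefix(diverged_replica, leader_log)
```

A lagging follower can always catch up by applying the missing suffix, whereas a diverged replica would need some form of conflict resolution.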


I am trying to build a system that is always consistent for any client. For example, a client sends a write request to ZooKeeper to update x from x=4 to x=5, and the ZooKeeper leader sends this write request to the followers. In the meantime, the same client wants to read x, and it gets the old value (x=4) from some follower that has not yet applied the update. I understand the client will get x=5 if it syncs before the read. This is the consistency model that ZooKeeper provides. In this case the performance will decrease.


For the same client (same ZooKeeper connection handle), that is already guaranteed. The only case where read-after-write is not guaranteed is when you get disconnected after writing and then connect to another ZooKeeper server for the read.

You can probably work around this by doing a sync in the SyncConnected event callback.


Even if you do the sync, another client can make a change before you do the subsequent read.

-JZ

On Mar 1, 2013, at 1:50 PM, Martin Kou <[EMAIL PROTECTED]> wrote:

> Yasin,
>
> If the two clients are connected to two different ZooKeeper servers in the
> cluster, then, yes.
>
> Generally, if you're worried that there may be another client working on
> the same key path, then you should sync() before reading.
>
> Best Regards,
> Martin Kou
>
> On Fri, Mar 1, 2013 at 1:38 PM, Yasin <[EMAIL PROTECTED]> wrote:
>
>> So, if the read request is made by some other client, it will not get the
>> updated value without sync, right?

Yes. Sync doesn't guarantee that a read is up to date. It guarantees an ordering: if event A involves a ZK update and you can guarantee that A occurs before the sync, then any read on a client C that is done after a sync on C will see a successor state of A.


sync() guarantees that the data is synchronized between the ZK servers as of the time 't' at which the sync is performed.

Say we have two clients and both are working on the same key path:

The first client, C1, updates the value of x at 't1', 't2' and 't3' as follows:
  at t1, value of x = 4
  at t2, update value of x = 5
  at t3, update value of x = 6

The second client, C2, does a sync() at 't2' and issues a read() request at 't3'. (Ignore the race between C1's update and C2's sync at t2; assume C1's update happened first.) C2 will then see at least x=5 from the ZK server it is connected to (leader or follower), but C2 is not guaranteed to see x=6, because that update happened after the sync() call.
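The timeline above can be replayed in a few lines. This is a sketch under stated assumptions, not ZooKeeper code: the leader's history is a plain list, and the hypothetical helper `read_after_sync` returns what a follower would answer given how many updates it had at sync time and how many more it happened to apply before the read.

```python
def read_after_sync(leader_log, synced_upto, applied_later):
    """Value returned by a follower that held `synced_upto` updates when the
    sync completed and has applied `applied_later` further updates since."""
    applied = min(synced_upto + applied_later, len(leader_log))
    return leader_log[applied - 1]


leader_log = [4, 5, 6]   # value of x after the updates at t1, t2 and t3

# C2 syncs at t2, after the update to 5 was committed (two updates so far),
# and reads at t3.
slow = read_after_sync(leader_log, synced_upto=2, applied_later=0)
fast = read_after_sync(leader_log, synced_upto=2, applied_later=1)
```

Both outcomes are allowed: the read is never older than the state at the sync, but whether it already reflects the update at t3 depends on how quickly the follower applies it.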
