I'm hosting an intern this summer. One project I've been thinking about is decoupling zab from zookeeper. There are many use cases where you need quorum-based replication, but the hierarchical data model doesn't work well. A smallish (~1GB?) replicated key-value store with millions of entries is one such example. The goal of the project is to decouple the consensus algorithm (zab) from the data model (zookeeper) more cleanly so that users can define their own data models and use zab to replicate the data.

I have 2 questions:

1. Are there any caveats that I should be aware of? For example, transactions need to be idempotent to allow fuzzy snapshotting.
2. Is this useful? Personally, I've seen many use cases where this would be very useful, but I'd like to hear what you guys think.
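
To make the idempotence caveat concrete, here is a minimal Java sketch with hypothetical names (Txn, IncrementTxn, SetTxn and TxnDemo are illustrative, not taken from ZooKeeper or any zab codebase). A fuzzy snapshot may already contain the effect of some transactions that get replayed again from the log during recovery; an increment applied twice diverges, while a transaction that records the resulting value does not.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical transaction abstraction for a replicated key-value store.
interface Txn {
    void apply(Map<String, Long> state);
}

// NOT idempotent: applying it twice (e.g. once before a fuzzy snapshot
// and once during log replay) adds delta twice.
final class IncrementTxn implements Txn {
    private final String key;
    private final long delta;

    IncrementTxn(String key, long delta) {
        this.key = key;
        this.delta = delta;
    }

    @Override
    public void apply(Map<String, Long> state) {
        state.merge(key, delta, Long::sum);
    }
}

// Idempotent: the leader resolves the request into the resulting value
// before proposing it, so applying it once or twice yields the same state.
final class SetTxn implements Txn {
    private final String key;
    private final long value;

    SetTxn(String key, long value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public void apply(Map<String, Long> state) {
        state.put(key, value);
    }
}

final class TxnDemo {
    // Applies the given transactions, in order, to an empty state.
    static Map<String, Long> applyAll(Txn... txns) {
        Map<String, Long> state = new HashMap<>();
        for (Txn t : txns) {
            t.apply(state);
        }
        return state;
    }
}
```

Replaying the same IncrementTxn twice leaves x at 10 instead of 5, whereas replaying SetTxn leaves x at 5 either way.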

1- You'd like to be able to plug in new algorithms, or at least make a clear separation between the replication protocol and the logic of the service.
2- You'd like to have an implementation of Zab that you could use for other things, like a kv store.

I think you're focusing more on 2. You can definitely use Zab for other things, and I'm all for it. It would probably be better to just implement the protocol from scratch rather than extract it from ZooKeeper. In fact, it might be worth having a look at ZOOKEEPER-30 (old one, huh?).

In the case of reimplementing it, it might be worth doing it outside ZooKeeper, as a separate project. It could be an incubated project.

On 31 May 2014 14:29, Michi Mutsuzaki wrote:

I think this is super useful. As Flavio said, I think there are two approaches: having ZAB as a library first, or carving out the ZAB bits and having a generic interface to plug in other protocols.

From the ZooKeeper project's PoV, I think the latter would be awesome, because we could clean up a lot of code as it happens.

From an intern project's PoV, it sounds like working on an independent ZAB implementation (libzab?) from scratch is easier to target (and will have no impedance; getting huge changes merged into ZooKeeper takes time...).

-rgs

Thank you, Flavio and Raul. Thank you for pointing me to ZOOKEEPER-30. Yes, I was focused more on 2, but it's definitely a good idea to have a generic interface for atomic broadcast so that you can plug in different algorithms. It seems like the project can be broken into 3 pieces:

1. Define an interface for atomic broadcast. I'm not sure how things like the session tracker and dynamic reconfig fit into this.
2. Add a ZAB implementation of the interface.
3. Create a simple reference implementation of a service (maybe a simple key-value store or a benchmark tool).
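
As a rough illustration of piece 1, the interface could be as small as the following Java sketch. AtomicBroadcast, StateMachine, LoopbackBroadcast and LogRecorder are all hypothetical names. The loopback implementation has no quorum and no failure handling; it only shows the contract (every registered replica sees the same transactions in the same order) that a ZAB implementation would have to provide behind the same interface. The zxid layout, with the epoch in the high 32 bits and a counter in the low 32 bits, mirrors ZooKeeper's.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical service-facing callback: the service only sees an
// ordered stream of committed transactions.
interface StateMachine {
    void deliver(long zxid, byte[] txn);
}

// Hypothetical atomic broadcast interface that ZAB (or another protocol)
// could implement.
interface AtomicBroadcast {
    void register(StateMachine sm);

    void broadcast(byte[] txn);
}

// In-process stand-in for testing the contract only: no quorum, no failures.
final class LoopbackBroadcast implements AtomicBroadcast {
    private final List<StateMachine> replicas = new ArrayList<>();
    private final long epoch;
    private long counter = 0;

    LoopbackBroadcast(long epoch) {
        this.epoch = epoch;
    }

    @Override
    public synchronized void register(StateMachine sm) {
        replicas.add(sm);
    }

    // Assigns the next zxid and delivers the txn to every replica in order.
    @Override
    public synchronized void broadcast(byte[] txn) {
        counter++;
        long zxid = (epoch << 32) | counter;
        for (StateMachine sm : replicas) {
            sm.deliver(zxid, txn);
        }
    }
}

// Trivial "service" that records what it was delivered, as "zxidHex:payload".
final class LogRecorder implements StateMachine {
    final List<String> seen = new ArrayList<>();

    @Override
    public void deliver(long zxid, byte[] txn) {
        seen.add(Long.toHexString(zxid) + ":" + new String(txn, StandardCharsets.UTF_8));
    }
}
```

Keeping the service behind StateMachine is what would let the same key-value store or benchmark tool from piece 3 run on top of different broadcast implementations.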

I agree with both of you that it's better to do this as a separate project. Also, it might be better to do this as an incubator project from the beginning. I think that makes it easier for people from different organizations to collaborate. I'm willing to champion the project.

The use case this project is going after is durably replicating in-memory state. I think this project can differentiate itself from BookKeeper in a few ways:

1. BookKeeper is pretty heavyweight, as you need to deploy ZooKeeper and bookies. I think there are use cases where you don't need the horizontal scalability BookKeeper provides, and you'd prefer a lightweight library for replicating state. ZooKeeper is one such example :)
2. Please correct me if I'm wrong, but BookKeeper is not designed for maintaining multiple in-memory replicas. A ledger can't be opened for reading if it's already open for writing, and you need to recover by restoring from a snapshot and replaying log entries if the writer goes down.
3. ZOOKEEPER-30, which I wasn't initially aware of, is another motivation. I think there is value in having a common interface for consensus algorithms so that services can plug in different implementations. This makes it easier to benchmark and test the correctness of various implementations.

On Sun, Jun 1, 2014 at 3:05 AM, Ivan Kelly wrote:

I'm not sure it is worth turning this discussion into bk vs. zk/zab. I think the spaces they target are different, although they both deal with replication. It does sound worth having a separate zab implementation, but it isn't clear that it is worth separating zab out of the zookeeper code base.

There seem to be some misconceptions here, so here are some clarifications:

- Zab itself doesn't deal with snapshots; it essentially replicates a log. The use of snapshots is an optimization to speed up recovery, and sure, it fits well into the framework of the protocol.
- BookKeeper indeed relies on zk because it requires a component for configuration and ledger metadata. By relying on a separate configuration component, the pool of bookies can grow and shrink arbitrarily, and such changes do not affect write performance the way they would with zk. The configuration component, however, needs the properties of a protocol like zab, so we still need something like zab.
- Calling BK heavyweight is a bit of a stretch. Bookies + zk are only two components! These are not production numbers, but I don't see a deployment with fewer than 10 machines (5 for ZK + 5 bookies) being very interesting. If that's a significant fraction of your overall server footprint, then sure, it is heavy for you.
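
To illustrate the first point, here is a minimal Java sketch (LogEntry, Snapshot and Recovery are hypothetical names) in which the replicated log alone determines the state and a snapshot only shortens replay. It assumes idempotent set-style transactions and a log sorted by zxid. A fuzzy snapshot records the last zxid it is guaranteed to contain, and recovery replays everything after it, so transactions that slipped into the snapshot are simply applied again.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One committed transaction from the replicated log: "set key to value".
final class LogEntry {
    final long zxid;
    final String key;
    final String value;

    LogEntry(long zxid, String key, String value) {
        this.zxid = zxid;
        this.key = key;
        this.value = value;
    }
}

// A (possibly fuzzy) snapshot: every txn with zxid <= lastZxid is included,
// and some later txns may be included too.
final class Snapshot {
    final long lastZxid;
    final Map<String, String> state;

    Snapshot(long lastZxid, Map<String, String> state) {
        this.lastZxid = lastZxid;
        this.state = state;
    }
}

final class Recovery {
    // Rebuilds state from a snapshot plus the log entries that follow it.
    // Assumes `log` is sorted by zxid and all transactions are idempotent.
    static Map<String, String> recover(Snapshot snap, List<LogEntry> log) {
        Map<String, String> state = new HashMap<>(snap.state);
        for (LogEntry e : log) {
            if (e.zxid > snap.lastZxid) {
                state.put(e.key, e.value);
            }
        }
        return state;
    }
}
```

Recovering from an empty snapshot with lastZxid 0 replays the whole log and reaches the same state, which is the sense in which the snapshot is only an optimization.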

Thank you for the clarifications, Flavio. I guess 'heavyweight' is a relative term. A typical use case I deal with is replicating a small amount of data (<1GB) among 3 ~ 5 servers, and having access to zab would be very useful.

I didn't mean to suggest separating zab in the zookeeper code base. I referred to ZOOKEEPER-30 to highlight the usefulness of having a common interface for replication protocols.

I think that reconfig should be the responsibility of the atomic broadcast / replicated log implementation (if supported by the specific implementation). Client management and sessions seem application dependent.

I agree that reconfiguration is a responsibility of the atomic broadcast. I feel that session management might need to rely on the atomic broadcast exposing additional primitives. For example, right now ZooKeeper forwards session information to the leader by piggybacking it on the quorum ping packets.
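
A rough sketch of what such a primitive might look like, in Java and with entirely hypothetical names (FollowerSessionTouches, PingPacket, LeaderSessionTracker); this is not ZooKeeper's actual packet format. The follower accumulates session activity between pings and hands it over on its next ping, and the leader refreshes session expirations from it. The broadcast layer would only need to carry the extra payload on its heartbeats; interpreting it stays with the service.

```java
import java.util.HashMap;
import java.util.Map;

// Follower side: remembers which sessions were active since the last ping.
final class FollowerSessionTouches {
    private final Map<Long, Integer> touched = new HashMap<>();

    // Records activity for a session along with its timeout in milliseconds.
    synchronized void touch(long sessionId, int timeoutMs) {
        touched.put(sessionId, timeoutMs);
    }

    // Hands over pending touches for the next ping and clears them.
    synchronized Map<Long, Integer> drain() {
        Map<Long, Integer> out = new HashMap<>(touched);
        touched.clear();
        return out;
    }
}

// Heartbeat exchanged by the broadcast layer, carrying an extra service payload.
final class PingPacket {
    final Map<Long, Integer> sessionTouches;

    PingPacket(Map<Long, Integer> sessionTouches) {
        this.sessionTouches = sessionTouches;
    }
}

// Leader side: extends the deadline of every session reported as active.
final class LeaderSessionTracker {
    private final Map<Long, Long> expiresAtMs = new HashMap<>();

    void onPing(PingPacket ping, long nowMs) {
        for (Map.Entry<Long, Integer> e : ping.sessionTouches.entrySet()) {
            expiresAtMs.put(e.getKey(), nowMs + e.getValue());
        }
    }

    // Unknown sessions are treated as expired.
    boolean isExpired(long sessionId, long nowMs) {
        Long deadline = expiresAtMs.get(sessionId);
        return deadline == null || deadline <= nowMs;
    }
}
```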

Let me know if you know of good open source libraries to use as references. So far I've looked at ZooKeeper and goraft.

I was thinking from the point of view that if you want to provide ZAB as a library, then the library will have to provide an RPC mechanism for talking to other members of the quorum, and a means to persist updates to disk before responding, and _then_ provide a ZAB implementation somewhere in between. This doesn't seem much lighter than BK.
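
The "persist updates to disk before responding" requirement is easy to get subtly wrong. A minimal Java sketch, assuming a simple length-prefixed record format (DurableLog and Follower are hypothetical names; only the java.nio calls are real): the follower appends the proposal and forces it to stable storage before returning the zxid it would ACK.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Append-only transaction log; record = zxid (8 bytes) + length (4 bytes) + payload.
final class DurableLog implements AutoCloseable {
    private final FileChannel channel;

    DurableLog(Path path) throws IOException {
        channel = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    // Appends one record and forces it to stable storage before returning.
    void append(long zxid, byte[] txn) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES + Integer.BYTES + txn.length);
        buf.putLong(zxid).putInt(txn.length).put(txn);
        buf.flip();
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
        // force(true) also flushes metadata such as the file length, which
        // matters here because the file grows on every append.
        channel.force(true);
    }

    long size() throws IOException {
        return channel.size();
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}

final class Follower {
    private final DurableLog log;

    Follower(DurableLog log) {
        this.log = log;
    }

    // Handles a proposal: returns the zxid to ACK only once the txn is durable.
    long onProposal(long zxid, byte[] txn) throws IOException {
        log.append(zxid, txn);
        return zxid;
    }
}
```

A real implementation would batch several proposals per force to amortize the fsync cost; this sketch forces on every append for clarity.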

I think it's a worthwhile thing to pursue, but I disagree that a separate project is a better way of doing it. If this is an intern project, expecting them to reimplement ZAB might be a bit of a large ask (depending on the internship length and the intern themselves). An investigation into splitting the user interface layer of zookeeper and ZAB seems itself to be a nice chunk to work on, and it has the advantage that even if the changes don't get merged into trunk, there will be a clearer picture as to why they can't be split.

You can read from a ledger while it is being written to, but right now the reader has to poll. Twitter are working on some changes to make it more notification-like, to reduce the latency between the primary writing and the secondary reading.

- I don't see a reason for tying the releases of an independent implementation of Zab to ZooKeeper.
- The set of developers (and committers) interested in an independent implementation of Zab might be different from ZooKeeper's; it could really be a separate community.
- It really feels like a parallel effort along the lines of Curator and BookKeeper, so I see it following similar steps.

Regarding the effort of an intern, I guess it depends on how far you want the initial stretch to go. An initial implementation contributed to Apache, followed by community activity, might get it going.

I agree with Flavio about keeping this a separate project. Having said that, I'm not 100% sure yet whether the intern will implement ZAB completely from scratch or start from a fork of the ZooKeeper code base. Right now I'm somewhat leaning towards using the ZooKeeper code base as a starting point. As Ivan pointed out, it's pretty ambitious to implement ZAB correctly in a short amount of time, and it would be good to have something demonstrable at the end of the internship.

On Mon, Jun 2, 2014 at 9:19 AM, FPJ wrote:

It would be great to do a clean implementation of Zab. We have added a lot of crap for backward compatibility, and the reconfig stuff, although a great feature and properly implemented, didn't improve the state of the code. Also, an implementation of the Zab protocol, perhaps putting snapshots aside for v0.1, shouldn't take more than a few weeks.

On 3 June 2014 12:44, Flavio Junqueira wrote:

A clean-room implementation of ZAB could indeed be awesome for multiple purposes. Reasoning about the current implementation is sometimes challenging for those of us missing the historical context.

Yisheng has been working on this project for about 5 weeks of his 12-week internship. Here is the current status:

- First of all, let me thank Flavio and Hongchao for their help. I don't think the project would be where it is right now without their support.
- We have a more or less functional implementation of zab in Java. You can check out the code here: https://github.com/ZK-1931/javazab
- There is a simple reference server. It's an HTTP-based key-value store that uses javazab for replicating state: https://github.com/ZK-1931/zabkv
- The implementation is missing 2 major features: dynamic reconfiguration and snapshotting. Yisheng is about to start working on dynamic reconfiguration.

It's fairly easy to run the reference server. It would be great if you could play around with it and give us feedback.