If the server is not up, the subsequent select() call returns 0 because the fd
is not ready, and future calls to zookeeper_interest always return 0 instead of
the expected ZCONNECTIONLOSS. An upstream client therefore never becomes aware
that the connection is lost.

I don't think this is the expected behavior. I have temporarily patched the zk
C client such that zookeeper_interest will return ZCONNECTIONLOSS if it's still
unable to connect after session_timeout has been exceeded.
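
The patch itself is not included in this report. As a rough sketch of the kind of check described above, and not the actual change, the decision could look like the following. The helper name `connect_status`, its parameters, and the `SKETCH_` constants are all hypothetical; the constants are stand-ins that mirror the values of ZOK and ZCONNECTIONLOSS so the snippet compiles without zookeeper.h.

```c
/* Stand-ins so the sketch compiles without zookeeper.h; they mirror the
 * values of the client's ZOK and ZCONNECTIONLOSS error codes. */
enum { SKETCH_ZOK = 0, SKETCH_ZCONNECTIONLOSS = -4 };

/* Hypothetical helper: what an interest-style call could report while the
 * client is still trying to connect. All times are in milliseconds.
 * connect_start_ms is assumed to be when the current connect attempt began. */
static int connect_status(long long now_ms,
                          long long connect_start_ms,
                          long long session_timeout_ms)
{
    /* Still within the session timeout: keep reporting "ok, keep polling". */
    if (now_ms - connect_start_ms <= session_timeout_ms)
        return SKETCH_ZOK;

    /* Timeout exceeded without a connection: surface the loss to the caller. */
    return SKETCH_ZCONNECTIONLOSS;
}
```

With a 30000 ms session timeout, for instance, a caller polling at 1000 ms would still see the "ok" value, while a poll at 30001 ms would see the connection-loss value.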

Flavio Junqueira
added a comment - 29/Apr/14 09:11
I initially missed that the bug report concerns the single-threaded C client. I checked the multi-threaded client, and the CONNECTED->CONNECTING->CONNECTED state transitions in my client code work fine as I stop and restart the server. I'll try to reproduce the problem with the single-threaded client.

Flavio Junqueira
added a comment - 20/May/16 21:38
I'm not convinced that this is an issue. I have tested the single-threaded client and I get the state changes correctly via the default watcher. That is the right way to observe state changes, rather than relying on the return value of zookeeper_interest. A call to zookeeper_process makes sure the events are delivered. See this for an example:
https://github.com/apache/zookeeper/blob/trunk/src/c/src/cli.c
I could use a second pair of eyes to confirm this, perhaps Yunong Xiao if still around.
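
To illustrate the watcher-based approach, here is a minimal sketch of a default watcher that tracks connection state from session events. It is not taken from cli.c. The `conn_view` struct, the `session_watcher` name, and the `SKETCH_` constants are hypothetical; the constants are stand-ins for ZOO_SESSION_EVENT, ZOO_CONNECTING_STATE, ZOO_CONNECTED_STATE and ZOO_EXPIRED_SESSION_STATE so the snippet compiles without zookeeper.h. The context pointer is assumed to be the one handed to zookeeper_init.

```c
#include <stddef.h>

/* Stand-ins so the sketch compiles on its own. A real client would include
 * zookeeper.h and use the ZOO_* constants named in the text above. */
typedef struct _zhandle zhandle_t;
enum {
    SKETCH_SESSION_EVENT = -1,
    SKETCH_CONNECTING_STATE = 1,
    SKETCH_CONNECTED_STATE = 3,
    SKETCH_EXPIRED_SESSION_STATE = -112
};

/* Hypothetical application-side view of the connection. */
struct conn_view {
    int connected; /* 1 while the session is connected to a server */
    int expired;   /* 1 once the session has expired */
};

/* Watcher with the same parameter list as the C client's watcher callback.
 * For session events, `state` carries the new connection state. */
static void session_watcher(zhandle_t *zh, int type, int state,
                            const char *path, void *ctx)
{
    struct conn_view *view = ctx;
    (void)zh;
    (void)path;

    /* Ignore node events; only session events describe the connection. */
    if (view == NULL || type != SKETCH_SESSION_EVENT)
        return;

    if (state == SKETCH_CONNECTED_STATE) {
        view->connected = 1;
    } else if (state == SKETCH_CONNECTING_STATE) {
        view->connected = 0; /* server went away; client is reconnecting */
    } else if (state == SKETCH_EXPIRED_SESSION_STATE) {
        view->connected = 0;
        view->expired = 1;   /* a new handle would be needed */
    }
}
```

In a single-threaded client, the watcher only runs from inside zookeeper_process, so the application's loop has to keep calling zookeeper_interest, select() and zookeeper_process for `conn_view` to stay current.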

Michael Han
added a comment - 22/May/16 00:21
Flavio Junqueira I can confirm that the watcher delivers the expected connection events. Here is my test case using the single-threaded ZK library: https://goo.gl/hql4B1. I started and stopped the server before and after my tests ran, and the state changes seen by the watcher were as expected. The return code of zookeeper_interest, though, is not reliable (for example, zookeeper_interest returns OK when the server is dead). Maybe we should add some documentation for zookeeper_interest indicating that its return code should NOT be used to decide the connection status between client and server.

Flavio Junqueira
added a comment - 22/May/16 13:55
Michael Han You're right; unfortunately we don't have good online documentation for the C client, and I'm all for documenting it. I'm thinking that perhaps we should have a documentation jira for the C client, with one of the tasks being to document the use of the single-threaded C client. I'd say we resolve this issue so that it is no longer a blocker for 3.5.2. Alternatively, we can keep it open and link it to the documentation jira, but drop the priority. What do you think?
Btw, I used this code to test the behavior of the single-threaded C client:
https://github.com/fpj/zookeeper-book-example/blob/master/src/main/c/master.c