we can't get stats for the new node when the rebalance fails
if not self.swap_orchestrator:
    ClusterOperationHelper._wait_warmup_completed(self, [master], bucket, wait_time=600)
i = 0
# we expect the rebalance to fail
while rest._rebalance_progress_status() == "running" and i < 60:
    self.log.info("rebalance progress: {0}".format(rest._rebalance_progress()))
    time.sleep(1)
    i += 1
self.log.info("rebalance progress status: {0}".format(rest._rebalance_progress_status()))
if rest._rebalance_progress_status() == "running":
    self.log.info("rebalance is still running even after restarting memcached")
    continue
knownNodes = rest.node_statuses()
self.log.info("nodes are still in cluster: {0}".format(knownNodes))

Chiyoung Seo
added a comment - 10/Jan/13 4:08 PM - edited
Andrei,
Aliaskey A, Farshid, and I discussed the test case that you ran and identified some things we can improve in it.
Basically, we should wait long enough for the first rebalance to fail instead of waiting 60 seconds in three iterations.
Therefore, the suggestion is:
1) Remove the for loop in the code.
2) Wait for one hour as the timeout for the first rebalance failure.
3) If the rebalance status has not changed even after an hour, print an error message and exit the test case.
Can you rerun the test case after adapting the test code?
Thanks,
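The suggested change could be sketched roughly like this. This is a minimal sketch, not the actual test code: `wait_for_rebalance_failure` and its parameters are hypothetical names, and `rest` is assumed to expose the same `_rebalance_progress_status()` helper used in the snippet above.

```python
import time

def wait_for_rebalance_failure(rest, timeout=3600, poll_interval=10):
    """Poll the rebalance status until it leaves the "running" state or
    the one-hour timeout expires, instead of looping 60 times with a
    1-second sleep as the current test does."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = rest._rebalance_progress_status()
        if status != "running":
            # terminal state reached: the rebalance failed or completed
            return status
        time.sleep(poll_interval)
    # status never changed within the timeout: report the error and bail out
    raise AssertionError(
        "rebalance status did not change within {0} seconds".format(timeout))
```

With this shape there is no outer retry loop; the test either observes a terminal status or exits with an error after the timeout, which is what suggestions 1-3 describe.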

So the rebalance failed about 5 minutes after we killed memcached (13:47:53-13:52:36), and during that whole time the rebalance progress kept changing (~40% -> 50%).

I would rather not change the logic of this test, since it has been working for a long time. Also, I have some questions about your recommendations:
1) Why do I need to remove the loop?
2) Why do we need to wait an hour for the first rebalance failure? In our case the failure came about 5 minutes after we killed memcached.
3) 'If the rebalance status is not changed even after an hour' - do you mean it is stuck?


Farshid Ghods (Inactive)
added a comment - 11/Jan/13 12:16 PM
>>I would not want to change the logic of this test, so it works for a long time. In addition, I have a question about your recommendations:
If the test logic is incorrect and the test has been passing, we still need to fix the test logic.
>>2) why do we need to wait an hour for the first rebalance failure? In our case it turns out that we were waiting for the fall of about 5 minutes after the killing memcached
Currently we only wait up to 60 seconds for the rebalance to fail or complete, and that is not enough. The rebalance could continue even though memcached was shut down on one of the nodes. Consider a case where the rebalance has already rebalanced node X out of a 6-node cluster and is now removing node Y: if the test shuts down memcached on X, that would not impact the rebalance operation, and it would succeed.
So the test should wait for the rebalance to finish and ignore whether the rebalance succeeded or failed after the memcached shutdown.
>>1) Remove the for loop in the code
I actually don't know why there is a for loop there. Can you take a look and see why we needed the for loop there?
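The "wait for the rebalance to finish and ignore the outcome" recommendation could look something like this sketch. `finish_rebalance_ignoring_outcome` is a hypothetical helper; `rest.monitorRebalance()` is assumed to block until the rebalance reaches a terminal state and raise on failure, matching the helper mentioned in this thread, not a verified testrunner signature.

```python
def finish_rebalance_ignoring_outcome(rest, log=print):
    """Block until the rebalance reaches a terminal state and report the
    outcome, but do not fail the test either way: after a memcached
    shutdown, both success and failure of the rebalance are acceptable."""
    try:
        rest.monitorRebalance()
        log("rebalance completed despite the memcached shutdown")
        return True
    except Exception as e:
        log("rebalance failed after the memcached shutdown: {0}".format(e))
        return False
```

The design point is that the returned boolean is informational only; the test's pass/fail decision comes from later stats checks, not from this call.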

Farshid Ghods (Inactive)
added a comment - 11/Jan/13 12:17 PM
>>>3)'If the rebalance status is not changed even after an hour' - do you mean it stuck?
Yes. I think we already have code in monitorRebalance that detects when rebalanceProgress is not changing, declares that the rebalance is stuck, and fails it.
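The stuck detection described here could be sketched as follows. This is an illustrative sketch, not the actual monitorRebalance implementation; `get_progress` stands in for a call like `rest._rebalance_progress()` from the snippet at the top.

```python
import time

def monitor_rebalance_progress(get_progress, stuck_timeout=600, poll_interval=5):
    """Poll rebalance progress and declare the rebalance stuck if the
    reported percentage stops changing for stuck_timeout seconds."""
    last_progress = None
    last_change = time.time()
    while True:
        progress = get_progress()
        if progress >= 100:
            return True  # rebalance finished
        if progress != last_progress:
            # progress moved: remember the new value and reset the timer
            last_progress = progress
            last_change = time.time()
        elif time.time() - last_change > stuck_timeout:
            raise AssertionError(
                "rebalance stuck at {0}% for {1} seconds".format(
                    progress, stuck_timeout))
        time.sleep(poll_interval)
```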
