We looked at this pretty closely today. The issue here is that the client as designed relies on the get() from the caller to trigger the timeout. An operation will, somewhat correctly, never transition to isDone() or isCancelled() unless someone cares to use it.

The scenario that was likely in play over the WAN here is that the request was in flight to the server while the config was in flight down to the client. It arrives at the server, but is never responded to. Since the get() is never called, it'll never time out and transition to the canceled state.

We recommend you change the test code to use the queue more like a queue and just get() each one. Iterating through the queue is a bit funny in the first place, but if using the get() on the Future objects, you'll still have asynchronous behavior and much of the time the get() will be returning since the data is already there.

Matt Ingenthron
added a comment - 24/Oct/12 8:44 AM We looked at this pretty closely today. The issue here is that the client as designed relies on the get() from the caller to trigger the timeout. An operation will, somewhat correctly, never transition to isDone() or isCancelled() unless someone cares to use it.
The scenario that was likely in play over the WAN here is that the request was in flight to the server while the config was in flight down to the client. It arrives at the server, but is never responded to. Since the get() is never called, it'll never time out and transition to the canceled state.
We recommend you change the test code to use the queue more like a queue and just get() each one. Iterating through the queue is a bit funny in the first place, but if using the get() on the Future objects, you'll still have asynchronous behavior and much of the time the get() will be returning since the data is already there.

I would not try this test manually.. the use case in more detail is as follows:

- Single CouchbaseClient object
- 20 user threads. 10 setting and 10 getting the same sorts of kv
- Operations are done asynchronously. They are submitted into a queue which is then checked periodically for isDone/isCancelled.
- 4 node cluster. Nodes are removed, connections are broken

The issue is those polling methods never returning true, unless they are retrieved synchronously (i.e. ft.get()).. which is actually an accidental detail

Mark Nunberg
added a comment - 05/Oct/12 9:56 AM Michael,
I would not try this test manually.. the use case in more detail is as follows:
- Single CouchbaseClient object
- 20 user threads. 10 setting and 10 getting the same sorts of kv
- Operations are done asynchronously. They are submitted into a queue which is then checked periodically for isDone/isCancelled.
- 4 node cluster. Nodes are removed, connections are broken
The issue is those polling methods never returning true, unless they are retrieved synchronously (i.e. ft.get()).. which is actually an accidental detail

This issue is a blocker for executing more integration tests on java sdk. are there workarounds to avoid this use case or a fix on the way ?
Please assign this back to Mark if more information or logs needed for this issue

Farshid Ghods (Inactive)
added a comment - 03/Oct/12 2:21 PM Matt/Rags,
This issue is a blocker for executing more integration tests on java sdk. are there workarounds to avoid this use case or a fix on the way ?
Please assign this back to Mark if more information or logs needed for this issue