Activity

This doesn't appear to be strictly related to Java7. I was able to get similar code to work on Java7. I even simulated loading a Play2 environment with other JARs that could potentially cause conflicts. It's possible that my simulation of the Play2 environment was insufficient and that Play2 does other stuff...

Michael Nitschinger
added a comment - 27/Nov/12 9:27 AM - edited Are you running on jdk7 on mac? Also, when you use play 2.1 it should work. Play2-related issues more were because of netty version incompatibilities.

I tried using a different netty, but no luck. What worked for me was moving calls to "new CouchbaseClient" out of static "object" initializers, since they appeared to be getting called in the Netty I/O thread (I got this exception in new CouchbaseClient: java.lang.IllegalStateException: await*() in I/O thread causes a dead lock or sudden performance drop. Use addListener() instead or call await*() from a different thread).

Quinn Slack
added a comment - 27/Nov/12 2:24 PM This is jdk7 on Arch Linux and Play 2.1-RC1.
I tried using a different netty, but no luck. What worked for me was moving calls to "new CouchbaseClient" out of static "object" initializers, since they appeared to be getting called in the Netty I/O thread (I got this exception in new CouchbaseClient: java.lang.IllegalStateException: await*() in I/O thread causes a dead lock or sudden performance drop. Use addListener() instead or call await*() from a different thread).

Quinn Slack
added a comment - 28/Nov/12 6:30 PM They may be related--or may just have similar symptoms. I was not getting timeout exceptions, but I was seeing view requests hang indefinitely until I made the fix described above.

Michael Nitschinger
added a comment - 13/Mar/13 7:46 AM Hehe no worries.
Can you do me a favor and remove the factory for a second and just use CouchbaseClient() without anything else? Let me know if the problem still persists or not! Thanks

Hi Michael,
First i want to thank you for the quick feed-back. Your owesome !!!

Before using the factory i used the CouchbaseClient(uris, bucketName, "") which is also timing out. I added the factory to be able increase the timeout times for the connection but still the same problem.

dragos
added a comment - 13/Mar/13 7:53 AM Hi Michael,
First i want to thank you for the quick feed-back. Your owesome !!!
Before using the factory i used the CouchbaseClient(uris, bucketName, "") which is also timing out. I added the factory to be able increase the timeout times for the connection but still the same problem.

Michael Nitschinger
added a comment - 13/Mar/13 8:01 AM Hmm okay, so its not the factory (we had some issues with default values in the past, thats why I'm asking).
Can you pin me down your operating system and exact java version you're using?Thanks for your help!

Also, since you're so responsive (I hope we can pin this down finally!).. Can you run the same code with DEBUG log enabled (see https://code.google.com/p/spymemcached/wiki/Logging).. just use Logger.getLogger("com.couchbase.client").setLevel(Level.FINEST); instead of Logger.getLogger("net.spy.memcached").setLevel(Level.FINEST);

Michael Nitschinger
added a comment - 13/Mar/13 8:06 AM Also, since you're so responsive (I hope we can pin this down finally!).. Can you run the same code with DEBUG log enabled (see https://code.google.com/p/spymemcached/wiki/Logging ).. just use Logger.getLogger("com.couchbase.client").setLevel(Level.FINEST); instead of Logger.getLogger("net.spy.memcached").setLevel(Level.FINEST);
Thanks!

My network infrastructure is like this:
In an AWS virtual private center (VPC) i have a subnet and in this subnet there are 2 machines. The client machine and the Couchbase server. The machines have identical OS and java version as mentioned above. I have open the ports and also checked the network ACL. The traffic is free to go,no restriction. If i am using a rest query with curl for the view i receive the result, the problem is only in the java SDK.
If needed more information please ask. I will gladly offer it.

dragos
added a comment - 13/Mar/13 8:11 AM This is my test environment for the client which will be similar for production.
OS Ubuntu Server 11.10 x64
java -version result:
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
My network infrastructure is like this:
In an AWS virtual private center (VPC) i have a subnet and in this subnet there are 2 machines. The client machine and the Couchbase server. The machines have identical OS and java version as mentioned above. I have open the ports and also checked the network ACL. The traffic is free to go,no restriction. If i am using a rest query with curl for the view i receive the result, the problem is only in the java SDK.
If needed more information please ask. I will gladly offer it.

Michael Nitschinger
added a comment - 13/Mar/13 9:05 AM Hi, thanks for the log..
interestingly, there is nothing unusual in this.. I'll investigate further and come back to you if I need more info..
thanks so far!

The time out is displayed after 5 seconds. As you suggested i removed the factory and the detailed logged above i obtained with using CouchbaseClient(uris, mappedName, ""). Do you have some way of testing the httpCore-nio ? Should i try to look for latest version of httpCore-nio.. ?

dragos
added a comment - 13/Mar/13 9:17 AM The time out is displayed after 5 seconds. As you suggested i removed the factory and the detailed logged above i obtained with using CouchbaseClient(uris, mappedName, ""). Do you have some way of testing the httpCore-nio ? Should i try to look for latest version of httpCore-nio.. ?

yes you could try that.. I ran the test suite with 4.2.3 which is the latest one and there were no errors, so our code should work with it. You need to exchange the httpcore and httpcore-nio with the latest 4.2.3 jars

I don't think this is 100% the issue, but it helps us get rid of one more factor!

Michael Nitschinger
added a comment - 13/Mar/13 9:58 AM Hi,
yes you could try that.. I ran the test suite with 4.2.3 which is the latest one and there were no errors, so our code should work with it. You need to exchange the httpcore and httpcore-nio with the latest 4.2.3 jars
I don't think this is 100% the issue, but it helps us get rid of one more factor!
Thanks!
Michael

SO_TIMEOUT is: Defines the socket timeout (SO_TIMEOUT) in milliseconds, which is the timeout for waiting for data or, put differently, a maximum period inactivity between two consecutive data packets).

and CONNECTION_TIMEOUT is: Determines the timeout in milliseconds until a connection is established.

Dragos, can it be the case that it takes more than 5 seconds to return a result when using curl or so? That would explain the "fail fast" timeout here.. not saying that these times are okay this way, just to find the root cause..

Michael Nitschinger
added a comment - 13/Mar/13 10:21 AM Okay, the 5 second period can come from this:
HttpParams params = new SyncBasicHttpParams();
params.setIntParameter(CoreConnectionPNames.SO_TIMEOUT, 5000)
.setIntParameter(CoreConnectionPNames.CONNECTION_TIMEOUT, 5000)
....
SO_TIMEOUT is: Defines the socket timeout (SO_TIMEOUT) in milliseconds, which is the timeout for waiting for data or, put differently, a maximum period inactivity between two consecutive data packets).
and CONNECTION_TIMEOUT is: Determines the timeout in milliseconds until a connection is established.
Dragos, can it be the case that it takes more than 5 seconds to return a result when using curl or so? That would explain the "fail fast" timeout here.. not saying that these times are okay this way, just to find the root cause..
Thanks!

Hi Michael,
I've made a test with httpcore-nio-4.2.jar and with httpcore-4.2.1.jar and your intuition is good. The httpcore is not the problem because the timeout is still there. With curl the results is back fast, in less then half a second. So the problem must be somewhere in the java SDK in the way the view is retrieved.

dragos
added a comment - 13/Mar/13 11:01 AM Hi Michael,
I've made a test with httpcore-nio-4.2.jar and with httpcore-4.2.1.jar and your intuition is good. The httpcore is not the problem because the timeout is still there. With curl the results is back fast, in less then half a second. So the problem must be somewhere in the java SDK in the way the view is retrieved.

I've moved the Couchbase server to the same machine as the client to have all in same environment for testing. Now all is working fine but it is not possible for our product to stay on the same machine as Couchbase server for obvious reasons. In production we will have more then one app machine which we will try to access the Couchbase cluster.

dragos
added a comment - 13/Mar/13 11:23 AM I've moved the Couchbase server to the same machine as the client to have all in same environment for testing. Now all is working fine but it is not possible for our product to stay on the same machine as Couchbase server for obvious reasons. In production we will have more then one app machine which we will try to access the Couchbase cluster.

okay so its definitely something with those timeouts.. The thing is we can increase them for sure, but the question is if it would work for your application given the long network latency - and if your application stays performant that way.

Michael Nitschinger
added a comment - 13/Mar/13 11:56 AM Hi Dragos,
okay so its definitely something with those timeouts.. The thing is we can increase them for sure, but the question is if it would work for your application given the long network latency - and if your application stays performant that way.
When you're running curl on a request, how long does it take?

The curl request is very fast (a few milliseconds, and less then 0.5 seconds i am sure) and the network latency it is not a problem. Some how the couchbase client is taking to much time connecting. When it need to connect to a network machine in the same lan it takes more then 10 seconds. I experimented this in EC2 environment and on my local machine with a VM. If couchbase is local with the test app, the couchbase client is connecting fast.
Do you have any points on how a connection should be setup for couchbase client ? I am trying to use the factory builder and the connection factory from java sdk but still with no success.

dragos
added a comment - 14/Mar/13 4:05 AM The curl request is very fast (a few milliseconds, and less then 0.5 seconds i am sure) and the network latency it is not a problem. Some how the couchbase client is taking to much time connecting. When it need to connect to a network machine in the same lan it takes more then 10 seconds. I experimented this in EC2 environment and on my local machine with a VM. If couchbase is local with the test app, the couchbase client is connecting fast.
Do you have any points on how a connection should be setup for couchbase client ? I am trying to use the factory builder and the connection factory from java sdk but still with no success.

I can think of two separate things to try. One is to increase the SO_TIMEOUT to 8000, CONNECTION_TIMEOUT to 12000. Identify what's being hit.

Probably more effective would be to use Wireshark to capture all traffic on the client machine, and see what is really happening. Start the capture (with tshark command-line tool, for example), then run the test client, let it time out, then run the curl command, let it succeed, then stop the capture. Gzip the capture data and attach it to this issue.

Tim Smith (Inactive)
added a comment - 14/Mar/13 11:00 AM Dragos,
I can think of two separate things to try. One is to increase the SO_TIMEOUT to 8000, CONNECTION_TIMEOUT to 12000. Identify what's being hit.
Probably more effective would be to use Wireshark to capture all traffic on the client machine, and see what is really happening. Start the capture (with tshark command-line tool, for example), then run the test client, let it time out, then run the curl command, let it succeed, then stop the capture. Gzip the capture data and attach it to this issue.
Tim

Hi Tim,
I've started some load tests to be able to evaluate Couchbase, so this means i moved Couchbase server on the same machine as the load test client. I tried to use a pool of couchbase clients always connected to the server, and after a while the test failed because couldn't create more connections. From my initial research i seen that couchbase client is managing his own connection can you tell me how this is done or what are the policies of shutting down connections ? Can you point me to some best practices for couchbase client connectivity ?

dragos
added a comment - 14/Mar/13 11:52 AM Hi Tim,
I've started some load tests to be able to evaluate Couchbase, so this means i moved Couchbase server on the same machine as the load test client. I tried to use a pool of couchbase clients always connected to the server, and after a while the test failed because couldn't create more connections. From my initial research i seen that couchbase client is managing his own connection can you tell me how this is done or what are the policies of shutting down connections ? Can you point me to some best practices for couchbase client connectivity ?

I will handle that question about connection pooling, etc., separately, since it is unrelated to this bug about timeouts.

On this specific bug report, the current state as I understand it is that: curl returns very quickly with the correct results. The Java client times out after 5 seconds without returning the results.

My suggestion is to set up a very simple Java client that performs a single view query, and to get a packet capture of both curl and the Java client, to identify what the two are doing differently and why the curl succeeds but the Java client fails.

Tim Smith (Inactive)
added a comment - 14/Mar/13 1:19 PM I will handle that question about connection pooling, etc., separately, since it is unrelated to this bug about timeouts.
On this specific bug report, the current state as I understand it is that: curl returns very quickly with the correct results. The Java client times out after 5 seconds without returning the results.
My suggestion is to set up a very simple Java client that performs a single view query, and to get a packet capture of both curl and the Java client, to identify what the two are doing differently and why the curl succeeds but the Java client fails.

Any update on this issue?
I faced the same issue and here is my environment:
JRuby 1.7.3
Oracle JDK 1.7(with 1.6 the same thing happened)
Couchbase 2.0.1
Java SDK 1.1.3(with 1.1.4 the same thing happened)
And latest netty and http-core and other jar files.

Hi All,
Finally after a long time i found the workaround needed for the java client to not timeout. As has been suggested by the Couchbase team to use a host name for the Couchabase server i applied the same fix but to the client machine. Meaning on the client machine were the java Couchbase client was timing out ( in Amazon VPC ) we added a hostname for the Couchbase server. Our test URI from http://10.0.7.164:8091/pools has become http://couchbase:8091/pools. This workaround is also fixing a 10 seconds connectivity for the client.

On Ubuntu the hostname file is at: /etc/hosts
On Windows 7 the hostname file is at: C:\Windows\system32\drivers\etc\hosts

dragos
added a comment - 01/Apr/13 1:04 AM Hi All,
Finally after a long time i found the workaround needed for the java client to not timeout. As has been suggested by the Couchbase team to use a host name for the Couchabase server i applied the same fix but to the client machine. Meaning on the client machine were the java Couchbase client was timing out ( in Amazon VPC ) we added a hostname for the Couchbase server. Our test URI from http://10.0.7.164:8091/pools has become http://couchbase:8091/pools . This workaround is also fixing a 10 seconds connectivity for the client.
On Ubuntu the hostname file is at: /etc/hosts
On Windows 7 the hostname file is at: C:\Windows\system32\drivers\etc\hosts
Example of host name entry in client machine:
10.0.7.146 couchbase

Michael Nitschinger
added a comment - 02/Apr/13 12:38 AM Thanks very much dragos for tracking this down!
Does this mean that in your environment after the change, you're not seeing any performance impact anymore? Everything works as expected and performant?
Thanks!

Hi Michael,
In this moment the client does not time out and the time taken by java couchbase client to connect takes around 1.5 seconds from 10 seconds which we can consider a normal behavior. As for performance with this workaround we can continue the POC and evaluate the performance. As a first impression i can say that we see a good speed but i will keep my reservation until final POC is tested and we can achieve a full load test.

What i want to emphasize that this is a welcomed workaround but it is not a solution on a longer term for our cloud production site. In a scenario where we need to create new machines and maybe new couchbase clusters it involves manual or specific automatic changes to the hosts file and of course extra managing of this. Bottom line is that we are waiting the fix on the couchbase java sdk 1.1.5 which will make this workaround obsolete.

Also i feel the need to mention that this workaround was found in collaboration with the Couchbase team which provided the basic information for the workaround to be found, so thank you guys.

dragos
added a comment - 02/Apr/13 12:59 AM Hi Michael,
In this moment the client does not time out and the time taken by java couchbase client to connect takes around 1.5 seconds from 10 seconds which we can consider a normal behavior. As for performance with this workaround we can continue the POC and evaluate the performance. As a first impression i can say that we see a good speed but i will keep my reservation until final POC is tested and we can achieve a full load test.
What i want to emphasize that this is a welcomed workaround but it is not a solution on a longer term for our cloud production site. In a scenario where we need to create new machines and maybe new couchbase clusters it involves manual or specific automatic changes to the hosts file and of course extra managing of this. Bottom line is that we are waiting the fix on the couchbase java sdk 1.1.5 which will make this workaround obsolete.
Also i feel the need to mention that this workaround was found in collaboration with the Couchbase team which provided the basic information for the workaround to be found, so thank you guys.

I'm not sure if there is a thing that we can fix inside the client when this is a DNS/hostname resolution issue with the OS and amazon?

We would definitely need to investigate further, but all that the client does is try to use the hostname/IP provided during bootstrap. If nothing comes back or the host can't be found, we can't do much about it. Anyway, I think we need to do some more investigation here to really see whats the problem.

Michael Nitschinger
added a comment - 02/Apr/13 1:21 AM Hi Dragos,
I'm not sure if there is a thing that we can fix inside the client when this is a DNS/hostname resolution issue with the OS and amazon?
We would definitely need to investigate further, but all that the client does is try to use the hostname/IP provided during bootstrap. If nothing comes back or the host can't be found, we can't do much about it. Anyway, I think we need to do some more investigation here to really see whats the problem.

Java 1.6.0_27 (and also tried 1.7)[ERROR] getView | Exception while doing getView: Timed out waiting for operation[ERROR] Exception while doing getView: I/O reactor has been shut down
It surely has something to do with the client, because when I launch second client it works properly for a while.
Also, it seems to happen under high loads more frequently

aboyev
added a comment - 09/Sep/13 4:06 AM Hello, Michael!
I have the same problem. Here is my setup:
Couchbase 2.1.1
Java SDK 1.1.9
Java 1.6.0_27 (and also tried 1.7)
[ERROR] getView | Exception while doing getView: Timed out waiting for operation
[ERROR] Exception while doing getView: I/O reactor has been shut down
It surely has something to do with the client, because when I launch second client it works properly for a while.
Also, it seems to happen under high loads more frequently

How did u change the "Meaning on the client machine were the java Couchbase client was timing out ( in Amazon VPC ) we added a hostname for the Couchbase server. Our test URI from http://10.0.7.164:8091/pools has become http://couchbase:8091/pools. This workaround is also fixing a 10 seconds connectivity for the client." in AWS .. I have EC2 instance for which i want to set the hostname and connect through it like you did... Can you tell me the exact steps for that

gaurang jhawar
added a comment - 24/Oct/13 11:31 PM How did u change the "Meaning on the client machine were the java Couchbase client was timing out ( in Amazon VPC ) we added a hostname for the Couchbase server. Our test URI from http://10.0.7.164:8091/pools has become http://couchbase:8091/pools . This workaround is also fixing a 10 seconds connectivity for the client." in AWS .. I have EC2 instance for which i want to set the hostname and connect through it like you did... Can you tell me the exact steps for that

are you also exhibiting a delay on connect? Or during operations? I think these two may be related but should be looked at separately and defined. What environment, client, workload and so on are you running?

Michael Nitschinger
added a comment - 25/Oct/13 12:24 AM Hi Gaurang,
are you also exhibiting a delay on connect? Or during operations? I think these two may be related but should be looked at separately and defined. What environment, client, workload and so on are you running?

java.net.ConnectException: Connection timed out: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:423)
at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:261)
at com.couchbase.client.CouchbaseConnection.run(CouchbaseConnection.java:288)
2013-10-24 22:14:01.893 WARN com.couchbase.client.CouchbaseConnection: Closing, and reopening

One thing to note here is that this is not a good idea. You have "the internet" between your computer (client) and the server which adds latency. Can you try running the application also in an AWS instance and see if the error goes away?

Michael Nitschinger
added a comment - 25/Oct/13 12:43 AM One thing to note here is that this is not a good idea. You have "the internet" between your computer (client) and the server which adds latency. Can you try running the application also in an AWS instance and see if the error goes away?

I gave the elastic amazon IP in the url and also tried what dragos tried but that also didn`t work for me .. I made ALL TCP ports on firewall ON ... And I am not sure why it reconnects to the IP or fails to connect and load data in the bucket.

But its the same case as dragos ... for production that`s not viable .. since not everything can be on AWS.

Also I can connect to the same instance via telnet and http(browser) .. I don`t know why I am facing this issue with Java 1.7.

gaurang jhawar
added a comment - 25/Oct/13 12:44 AM - edited I gave the elastic amazon IP in the url and also tried what dragos tried but that also didn`t work for me .. I made ALL TCP ports on firewall ON ... And I am not sure why it reconnects to the IP or fails to connect and load data in the bucket.
But its the same case as dragos ... for production that`s not viable .. since not everything can be on AWS.
Also I can connect to the same instance via telnet and http(browser) .. I don`t know why I am facing this issue with Java 1.7.
"2013-10-24 22:14:01.893 WARN com.couchbase.client.CouchbaseConnection: Closing, and reopening
{QA sa=172.31.9.124/172.31.9.124:11210, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0}
, attempt 1.
2013-10-24 22:14:05.903 INFO com.couchbase.client.CouchbaseConnection: Reconnecting
{QA sa=172.31.9.124/172.31.9.124:11210, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0}
2013-10-24 22:14:09.632 INFO com.couchbase.client.ViewConnection: Added 172.31.9.124 to connect queue
2013-10-24 22:14:09.632 INFO com.couchbase.client.CouchbaseClient: viewmode property isn't defined. Setting viewmode to production mode
in INIT **************.
Here is the entity set : users
2013-10-24 22:14:09.868 WARN com.couchbase.client.CouchbaseConnection: Node expected to receive data is inactive. This could be due to a failure within the cluster. Will check for updated configuration. Key without a configured node is: 0.
30 Seconds: Load is in progress
2013-10-24 22:14:09.993 INFO com.couchbase.client.CouchbaseConnection: Added
{QA sa=/172.31.9.124:11210, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0}
to connect queue
2013-10-24 22:14:12.037 INFO com.couchbase.client.CouchbaseConnection: Connection state changed for sun.nio.ch.SelectionKeyImpl@6dae04e2
2013-10-24 22:14:12.037 INFO com.couchbase.client.CouchbaseConnection: Reconnecting due to failure to connect to
{QA sa=172.31.9.124/172.31.9.124:11210, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0}
java.net.ConnectException: Connection timed out: no further information"
NOTE here it tried to reconnect to the client and says its inactive and I am not sure if this is a firewall issue or something else.

Michael Nitschinger
added a comment - 25/Oct/13 12:51 AM Can you provide debug log information for the connect process? Maybe this helps pinning down where the issue is. (you can read here how to do it if you don't know already) : http://nitschinger.at/Logging-with-the-Couchbase-Java-Client
thanks

Michael Nitschinger
added a comment - 25/Oct/13 1:08 AM - edited Thanks, can you also share the code you use?
In the code it looks like you are creating the CouchbaseClient object twice.
Please note that the bootstrapping succeeds it is different from the ticket reported here.
Are you sure port 11210 is open from client to server? It looks like something is blocking it, http seems to work fine.

gaurang jhawar
added a comment - 25/Oct/13 1:14 AM http://s000.tinyupload.com/index.php?file_id=34200463437173886967
ya the port is opened.
http://s000.tinyupload.com/index.php?file_id=06508245784961590671
Both inbound / outbound on aws as well as my local machine firewall ports are open

couchbase-user
added a comment - 28/Oct/13 4:07 AM hey, actually I'm running them all on a local small network.
I am using ubuntu 12.4 on all machines and running java application on headnode to get view data and store them locally.

Michael Nitschinger
added a comment - 28/Oct/13 4:19 AM Hm, I wonder why you are hitting this, I'm not sure its related.. do you get the same exception as the top of the ticket here? Can you provide debug logs so we can see at which point it is happening?
Maybe we need to tweak socket timeouts.

In your logs I can see that 192.168.0.101 is changed to Cloud-assistant.local. Could it be that this is a DNS issue in your environment?
Also please make sure that ports 8091, 8092 and 11210 are reachable on each node.

Both the socket and connection timeout for views are set to 5 second which is already "quite high" if you want to run a real workload over it.

Michael Nitschinger
added a comment - 28/Oct/13 4:42 AM - edited In your logs I can see that 192.168.0.101 is changed to Cloud-assistant.local. Could it be that this is a DNS issue in your environment?
Also please make sure that ports 8091, 8092 and 11210 are reachable on each node.
Both the socket and connection timeout for views are set to 5 second which is already "quite high" if you want to run a real workload over it.

I'm not sure what the DNS has to do with it or how to check it, something to note here is that sometimes it works fine (very rare).
I have also just tried to change the switch but still the same problem.
I'm working on investigating the DNS and firewall issue. thanks

couchbase-user
added a comment - 28/Oct/13 4:53 AM - edited I'm not sure what the DNS has to do with it or how to check it, something to note here is that sometimes it works fine (very rare).
I have also just tried to change the switch but still the same problem.
I'm working on investigating the DNS and firewall issue. thanks

Michael Nitschinger
added a comment - 28/Oct/13 5:09 AM Okay, if you go directly to the view url, how long does it take you to? like with curl... so we can see how far you are over the 5 seconds and if you are under it then there's something else going on.

couchbase-user
added a comment - 28/Oct/13 6:46 AM I have added different servers names to hosts files in all servers and disabled the firewall on all of them and now it seems to work fine.
Thanks a lot for your help Michael.

Michael Nitschinger
added a comment - 29/Oct/13 12:47 AM Since we fixed the suspected issue with "couchbase-user" in kind of the same way,
ca you first verify that
All ports are reachable from all the boxes (11210, 8091, 8092)
DNS settings are configured properly, this is especially important in EC2
If you look through the thread here, some of the people reporting this thing fixed once they made sure their environmental settings were okay. Can you double check?

We have been facing the same issue for couples of months - with Couchbase on RHEL, and we have just found out the solution that works for us.

We have 2 Tomcat nodes running; 1 with Java 1.7 and another with 1.6. Couchbase was deployed on another 4 servers in the same network. We simply defined all Couchbase nodes in /etc/hosts file in both Tomcat nodes as you normally would (with x.x.x.x hostname), then it magically works for us - without having to restart Tomcat, nor Couchbase nodes.

CathodTech
added a comment - 13/Dec/13 8:24 AM We have been facing the same issue for couples of months - with Couchbase on RHEL, and we have just found out the solution that works for us.
We have 2 Tomcat nodes running; 1 with Java 1.7 and another with 1.6. Couchbase was deployed on another 4 servers in the same network. We simply defined all Couchbase nodes in /etc/hosts file in both Tomcat nodes as you normally would (with x.x.x.x hostname), then it magically works for us - without having to restart Tomcat, nor Couchbase nodes.

Thanks for pointing this out, I'll see if I can get it into the official docs to aid other people. To me this really looks like java-related and network-related. Interestingly only a few people are hitting it.

Michael Nitschinger
added a comment - 13/Dec/13 8:35 AM Thanks for pointing this out, I'll see if I can get it into the official docs to aid other people. To me this really looks like java-related and network-related. Interestingly only a few people are hitting it.

For the issue I faced, it looks more like Java API related. Perhaps, something to do with DNS (e.g., something like gethostbyname() / gethostbyaddr()), and how the API handles the case when IP has no hostname. Note that we only use IP address to add nodes into Couchbase cluster and Java on other servers also connect to Couchbase with IP address, not hostname.

From the trace, we found that java code has no problem connecting (via connection pool) with Couchbase nodes that has IP/hostname pair in /etc/hosts. But for Couchbase nodes that were not defined in /etc/hosts, it attempted to connect, then the connection dropped, and returned timeout issue.

My team also found that later when he restarted java to re-deploy, Couchbase connection attempts at start-up were much faster (on both Java 1.6 and 1.7).

For this issue, we didn't face it when deploying Java and Couchbase on the same server as, perhaps, it is common to have "127.0.0.1 localhost" defined in /etc/hosts.

We hope that the solution we found would help you and others as well. Not sure if others are facing the same scenario, though.

CathodTech
added a comment - 14/Dec/13 1:00 AM For the issue I faced, it looks more like Java API related. Perhaps, something to do with DNS (e.g., something like gethostbyname() / gethostbyaddr()), and how the API handles the case when IP has no hostname. Note that we only use IP address to add nodes into Couchbase cluster and Java on other servers also connect to Couchbase with IP address, not hostname.
From the trace, we found that java code has no problem connecting (via connection pool) with Couchbase nodes that has IP/hostname pair in /etc/hosts. But for Couchbase nodes that were not defined in /etc/hosts, it attempted to connect, then the connection dropped, and returned timeout issue.
My team also found that later when he restarted java to re-deploy, Couchbase connection attempts at start-up were much faster (on both Java 1.6 and 1.7).
For this issue, we didn't face it when deploying Java and Couchbase on the same server as, perhaps, it is common to have "127.0.0.1 localhost" defined in /etc/hosts.
We hope that the solution we found would help you and others as well. Not sure if others are facing the same scenario, though.

I've ran into a very similar/the same situation with the latest couchbase (2.2.0) server and java client (1.2.3), I thought I share my experiences.

The problem I faced was when trying to call CouchbaseClient.getView(). This apparently goes to one node in the couchbase cluster and queries some view metadata over HTTP (roughly). However I got "ERROR com.couchbase.client.ViewNode$EventLogger: Connection timed out" and that pretty much killed my client there. Seeing the solutions here with the hosts file workaround, I tried that and it indeed worked, but it's not really a suitable solution for us, so I decided to see into this a little bit more. Here is what I found.

(Disclaimer: this worked for me, and although I found other ways around this they may not work for you, and I did these attempting to solve this particular issue and didn't really care about anything else these changes may break, so be careful!)

(0) Baseline, the environment specific thing that actually causes the issue is that call java api call InetAddress.getHostFromNameService(InetAddress addr, boolean check) returns very slowly if there's no hostname to be found for a given IP. I have no idea why is that in my environment, I suppose could be because of a bunch of different reasons. Slow here means that it takes 4-5 secs for me for this call to return with an unknown hostname

This comes into the picture when the couchbase client tries to get a view. It uses Apache Http client to do this, and the following things happen:
(1) When the request for the view gets created, eventually we get to BaseIOReactor.execute which instantiates SessionHandle, and registers the time when this happened (this will cause the timeout)
(2) Somewhere in the request processing, we actually DO hit getHostFromNameService. More specifically this happens because a RequestTargetHost interceptor runs and this injects the "Host" header into the HTTP request.
(3) some time after (2) we hit a timeout check which compares the timestamp recorded in (1) and the current time, obviously resulting in a timeout and the error in the client if (2) takes a long time as indicated

The ideal solution would obviously be to fix whatever is wrong with my DNS configs, but I'm not (yet) sure what's wrong (running Ubuntu 13.10 btw). And also make sure that our live environments won't have the same issue...

One workaround I found that might be generally usable is to override the JVM's DNS resolver, by starting the client with: Dsun.net.spi.nameservice.provider.1=dns,sun This worked for me, but I don't know what side-effects it may have (generally this may not be a great idea). Additionally this resulted in a significant increase of startup speed as the couchbase client seems to do a lot of ip>hostname lookups when it's starting (are these really necessary?)

Also I'd like to ask from the Couchbase folks if it's really necessary to do this reverse DNS lookup for every HTTP request. I cloned your client from git, and removed the RequestTargetHost interceptor from these calls; that also solved the timeout problem, and all seems to work fine. I didn't do exhaustive testing on it, and this, too, can be environment specific - running my couchbase cluster in local VMs at the moment.

David Borsos
added a comment - 18/Dec/13 11:43 AM I've ran into a very similar/the same situation with the latest couchbase (2.2.0) server and java client (1.2.3), I thought I share my experiences.
The problem I faced was when trying to call CouchbaseClient.getView(). This apparently goes to one node in the couchbase cluster and queries some view metadata over HTTP (roughly). However I got "ERROR com.couchbase.client.ViewNode$EventLogger: Connection timed out" and that pretty much killed my client there. Seeing the solutions here with the hosts file workaround, I tried that and it indeed worked, but it's not really a suitable solution for us, so I decided to see into this a little bit more. Here is what I found.
(Disclaimer: this worked for me, and although I found other ways around this they may not work for you, and I did these attempting to solve this particular issue and didn't really care about anything else these changes may break, so be careful!)
(0) Baseline, the environment specific thing that actually causes the issue is that call java api call InetAddress.getHostFromNameService(InetAddress addr, boolean check) returns very slowly if there's no hostname to be found for a given IP. I have no idea why is that in my environment, I suppose could be because of a bunch of different reasons. Slow here means that it takes 4-5 secs for me for this call to return with an unknown hostname
This comes into the picture when the couchbase client tries to get a view. It uses Apache Http client to do this, and the following things happen:
(1) When the request for the view gets created, eventually we get to BaseIOReactor.execute which instantiates SessionHandle, and registers the time when this happened (this will cause the timeout)
(2) Somewhere in the request processing, we actually DO hit getHostFromNameService. More specifically this happens because a RequestTargetHost interceptor runs and this injects the "Host" header into the HTTP request.
(3) some time after (2) we hit a timeout check which compares the timestamp recorded in (1) and the current time, obviously resulting in a timeout and the error in the client if (2) takes a long time as indicated
The ideal solution would obviously be to fix whatever is wrong with my DNS configs, but I'm not (yet) sure what's wrong (running Ubuntu 13.10 btw). And also make sure that our live environments won't have the same issue...
One workaround I found that might be generally usable is to override the JVM's DNS resolver, by starting the client with: Dsun.net.spi.nameservice.provider.1=dns,sun This worked for me, but I don't know what side-effects it may have (generally this may not be a great idea). Additionally this resulted in a significant increase of startup speed as the couchbase client seems to do a lot of ip >hostname lookups when it's starting (are these really necessary?)
Also I'd like to ask from the Couchbase folks if it's really necessary to do this reverse DNS lookup for every HTTP request. I cloned your client from git, and removed the RequestTargetHost interceptor from these calls; that also solved the timeout problem, and all seems to work fine. I didn't do exhaustive testing on it, and this, too, can be environment specific - running my couchbase cluster in local VMs at the moment.
Thanks.

thanks for the heads-up, that gives me a better idea of whats going on. I basically revamped the whole view connection management for the next 1.3 release which hopefully should address this to. Would you want to try it out based on a preliminary build?

The thing is that you can't remove the RequestTargetHost, but it only does the DNS lookup if HTTP.TARGET_HOST is not set, but we can infer that from our configuration. So the new code basically sets it before the request is put in the queue, once we've determined which node to ask, then it should be skipped.

I already have that locally, so if you want to try it out we can quickly see if it fixes the issue.

Michael Nitschinger
added a comment - 19/Dec/13 12:44 AM - edited Hi David,
thanks for the heads-up, that gives me a better idea of whats going on. I basically revamped the whole view connection management for the next 1.3 release which hopefully should address this to. Would you want to try it out based on a preliminary build?
The thing is that you can't remove the RequestTargetHost, but it only does the DNS lookup if HTTP.TARGET_HOST is not set, but we can infer that from our configuration. So the new code basically sets it before the request is put in the queue, once we've determined which node to ask, then it should be skipped.
I already have that locally, so if you want to try it out we can quickly see if it fixes the issue.

That sounds great. I'd be happy to help you verify your new code; send it over or let me know how can I get it, I'll switch to it and see if it solves the problem. If we can get the new version in early January (I saw it's scheduled for the 7th) that's cool too, we won't be doing much during the holidays anyway

David Borsos
added a comment - 19/Dec/13 3:34 AM Michael,
That sounds great. I'd be happy to help you verify your new code; send it over or let me know how can I get it, I'll switch to it and see if it solves the problem. If we can get the new version in early January (I saw it's scheduled for the 7th) that's cool too, we won't be doing much during the holidays anyway
Thanks

Oded Hassidi
added a comment - 21/Jul/14 4:15 AM Was this fixed, since we are having issues with timeot on Java SDK v1.4.x, and we are using JDK 7?
This is not happen all the time, if I run a thread that runs many request it occur every 2-4 min

Thanks for the quick response.
The timeout are not at startup it happens during runtime. every 2-3 minutes I get the followed:

SEVERE: The RuntimeException could not be mapped to a response, re-throwing to the HTTP container
java.lang.RuntimeException: Timed out waiting for operation
at com.couchbase.client.internal.HttpFuture.get(HttpFuture.java:75)
at com.couchbase.client.CouchbaseClient.query(CouchbaseClient.java:778)
Caused by: java.util.concurrent.TimeoutException: Timed out waiting for operation
at com.couchbase.client.internal.HttpFuture.waitForAndCheckOperation(HttpFuture.java:93)
at com.couchbase.client.internal.ViewFuture.get(ViewFuture.java:65)
at com.couchbase.client.internal.ViewFuture.get(ViewFuture.java:49)
at com.couchbase.client.internal.HttpFuture.get(HttpFuture.java:72)
... 45 more

Oded Hassidi
added a comment - 21/Jul/14 4:52 AM Thanks for the quick response.
The timeout are not at startup it happens during runtime. every 2-3 minutes I get the followed:
SEVERE: The RuntimeException could not be mapped to a response, re-throwing to the HTTP container
java.lang.RuntimeException: Timed out waiting for operation
at com.couchbase.client.internal.HttpFuture.get(HttpFuture.java:75)
at com.couchbase.client.CouchbaseClient.query(CouchbaseClient.java:778)
Caused by: java.util.concurrent.TimeoutException: Timed out waiting for operation
at com.couchbase.client.internal.HttpFuture.waitForAndCheckOperation(HttpFuture.java:93)
at com.couchbase.client.internal.ViewFuture.get(ViewFuture.java:65)
at com.couchbase.client.internal.ViewFuture.get(ViewFuture.java:49)
at com.couchbase.client.internal.HttpFuture.get(HttpFuture.java:72)
... 45 more