Description

Just realized that without this HBASE-4805 is broken.
I.e. there's no point keeping a persistent HConnection around if it can be rendered permanently unusable if the ZK connection is lost temporarily.
Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to backport).

Lars Hofhansl
added a comment - 02/Apr/12 21:37 Let me dig into 0.96 after I get this into 0.94... Wanna cut RC1 soon.
From the past comments here I see no objections to the posted patch... Will commit soon. Please speak up if you disagree.

Nicolas Liochon
added a comment - 02/Apr/12 20:24 In 0.96 this should work, with the restriction that the logic is that you can get a non-working connection, which will get fixed when you try to use it. It's a different mechanism than the one for HBaseAdmin, as HBaseAdmin first checks the connection. The ZK mechanism is more efficient (you save a remote call to check that the connection is really working), but is more complex.
However it seems it does not work in the end:
What I saw in 0.96 is that the client was blocked for a very long time (gave up after a few minutes), even though I had set all timeouts to low values. This is also deadly in an app server setting. Might be a simple fix there, didn't dig deeper.
@lars What did you exactly do? I can do the fix on 0.96.

Lars Hofhansl
added a comment - 02/Apr/12 20:15 I think this is as good as we can get in 0.94.
Removed the exception handling from ensureZookeeperTrackers; none of these methods throw.
Added getZookeeperWatcher to two methods that just need a ZKW.
The key is that an HConnection will never be left in a permanently useless state. Can file another jira for better timeouts.

Lars Hofhansl
added a comment - 02/Apr/12 20:09 Yeah, my comment was wrong. It's not generally doing that.
What I do find is if the ZK quorum is down, none of getZookeeperWatcher(), masterAddressTracker.start(), and rootRegionTracker.start() actually fail. They just retry and then happily return, which is as designed, because they are asynchronous.
Would be nice to have a isAlive or waitForConnect method on ZKW that would throw if the connection could not be established.
The attached patch is still a vast improvement, but it could be made better (even with the zk timeout set to 100ms and retries to 3, it still takes 22s for ensureZookeeperTrackers to finish).
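A minimal sketch of the waitForConnect idea suggested above. This is purely hypothetical: no such method exists on ZooKeeperWatcher in 0.94, and the connection state is modeled here by a plain BooleanSupplier rather than a real ZK handle.

```java
import java.io.IOException;
import java.util.function.BooleanSupplier;

// Hypothetical waitForConnect helper; the real ZooKeeperWatcher has no
// such method, and the connection state is faked with a BooleanSupplier.
public class WaitForConnect {

  // Polls the (modeled) connection state until it reports connected,
  // or throws IOException once the timeout elapses.
  public static void waitForConnect(BooleanSupplier isConnected,
                                    long timeoutMs) throws IOException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!isConnected.getAsBoolean()) {
      if (System.currentTimeMillis() > deadline) {
        throw new IOException(
            "Could not connect to ZooKeeper within " + timeoutMs + "ms");
      }
      try {
        Thread.sleep(10); // back off briefly between checks
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IOException("Interrupted while waiting for connection", e);
      }
    }
  }
}
```

With something like this, ensureZookeeperTrackers could fail fast instead of happily returning while still disconnected.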

Lars Hofhansl
added a comment - 02/Apr/12 19:35 One other strangeness I found is that none of the ZKUtil methods actually throw exceptions. They retry (via RecoverableZooKeeper) and then just log a message if there is a failure. This is especially annoying with ZooKeeperWatcher, because there is no way of telling from the outside whether the connection succeeded or not.

Lars Hofhansl
added a comment - 02/Apr/12 18:30 Presumably close is not needed since the connection is known to be down in this case. To be safe, I'll add that, and make sure it doesn't cause another hang.
I think this is better than HBASE-5153, because it attempts to reconnect when the connection is needed and not when it was lost (in which case it is likely that the next retry will fail as well, leading to long hangs with no chance for the caller to notice).

Jieshan Bean
added a comment - 02/Apr/12 10:39 Everything seems good to me. Only a minor doubt: is it necessary to close zooKeeper before setting it to null?
If HConnectionImplementation#managed is true, HConnectionImplementation#abort doesn't set closed to true, it just calls the close method. That makes sense to me. So the retry logic introduced in HBASE-5153 seems redundant.
If one wants to manage the connection himself and the connection is aborted, we should suggest recreating the HConnection and HTable, right?

Lars Hofhansl
added a comment - 02/Apr/12 07:08 Patch that removes the log statement Stack mentioned (had it in there for earlier debugging, forgot to remove it).
Also adds a simple test with an HConnection that is created before the mini-cluster is started to prove that initialization is indeed lazy.
(can't test with stopping and restarting the minicluster as new random ports are used each time).

stack
added a comment - 01/Apr/12 22:50 On commit, change this '+ LOG.debug("Abort", t);' to include the passed in msg?
Else, +1 on the patch. Let me ask N if he thinks TRUNK can pick up anything from this patch (maybe his keepalive should do this auto-reconnect, but maybe it doesn't need it). What were you doing with it when it was taking a long time to recover?

Lars Hofhansl
added a comment - 01/Apr/12 06:55 You think this should go into 0.92?
Probably. I guess most folks have clients that they restart frequently, use thrift, or asynchbase. But in its current form, using the standard HBase client in an app server is very error-prone if the HBase/ZK cluster is ever serviced without bringing the app server down in lock step.
Didn't we add a check for if the connection is bad?
Yeah, with hbase-5153, but in 0.90 only. At some point we decided the fix there wasn't good and Ram patched it up for 0.90.
This should subsume HBASE-5153. I'm happy to even put this in 0.90, but that's up to Ram.
I'm interested in problems you see in hbase-5153 or issues you have w/ the implementation there, that being the 0.96 client.
What I saw in 0.96 is that the client was blocked for a very long time (gave up after a few minutes), even though I had set all timeouts to low values. This is also deadly in an app server setting. Might be a simple fix there, didn't dig deeper.

Lars Hofhansl
added a comment - 01/Apr/12 06:44 Upped to "critical". Without this the HBase client is pretty much useless in an AppServer setting where client can outlive the HBase cluster and ZK ensemble.
(Testing within the Salesforce AppServer is how I noticed the problem initially.)

Lars Hofhansl
added a comment - 01/Apr/12 06:42 - edited Found the problem.
The ClusterId could remain null permanently if HConnection.getZookeeperWatcher() was called. That would initialize HConnectionImplementation.zookeeper, and hence not reset clusterid in ensureZookeeperTrackers.
TestZookeeper.testClientSessionExpired does that.
Also in TestZookeeper.testClientSessionExpired the state might be CONNECTING rather than CONNECTED depending on timing.
Upon inspection I also made clusterId, rootRegionTracker, masterAddressTracker, and zooKeeper volatile, because they can be modified by a different thread, but are not exclusively accessed in a synchronized block (existing problem).
New patch that fixes the problem, passes all tests.
TestZookeeper seems to have good coverage. If I can think of more tests, I'll add them there.
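To illustrate the volatile point: one thread clears a tracker field in abort() while another rebuilds it in a synchronized ensure method, and readers outside any lock must still see the latest reference. This toy class models the pattern; the names are illustrative, not HBase's actual code.

```java
// Minimal model of why those fields need to be volatile: abort() nulls
// a field from one thread, ensure() rebuilds it under a lock, and an
// unsynchronized reader must still observe the current reference.
public class LazyTracker {
  private volatile Object tracker; // stands in for e.g. rootRegionTracker

  // Called from the aborting thread: drop the stale tracker.
  public void abort() {
    tracker = null;
  }

  // Called before each use: re-create the tracker if it was dropped.
  // synchronized so only one thread rebuilds; the volatile write
  // publishes the new reference to unsynchronized readers.
  public synchronized void ensure() {
    if (tracker == null) {
      tracker = new Object();
    }
  }

  public Object get() {
    return tracker; // unsynchronized volatile read
  }
}
```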

Lars Hofhansl
added a comment - 01/Apr/12 04:27 I am not envisioning any API changes, just that the HConnection would no longer be ripped out from under any HTables when there is a ZK connection loss.
I ran all tests again, and TestReplication and TestZookeeper have some failures that are related. Looking.

stack
added a comment - 01/Apr/12 01:26 The problem I have seen in 0.94/0.92 without this patch even with managed connections is that after HConnection times out, it is unusable and even getting a new HTable does not fix the problem since behind the scenes the same HConnection is retrieved.
Didn't we add a check for if the connection is bad?
Will think about an automated test. Do you like the version better that always does the recheck (and hence all the conditional for "managed" go away)?
How does this work in trunk? In trunk the work has been done so we don't really keep open a zk session any more. For the sake of making tests run smoother, we'll do keep alive on zk session and hold it open 5 minutes and let it go if unused.
I'm +1 on making our stuff more resilient. Reusing a dud hconnection, either because the connection is dead or the zk session died, is hard to figure.
How will this change a user's perception about how this stuff is used? If your answer is that it helps in the extreme where the connection goes dead, and that's the only change a user perceives, then let's commit. But we should include a test? If you describe one, I can try to help write it?
You think this should go into 0.92?

Lars Hofhansl
added a comment - 01/Apr/12 00:34 The more I look at it, the more I like the patch that changes the behavior in all cases.
It's simple and low risk: Just recheck the ZK trackers before they are needed.

Lars Hofhansl
added a comment - 31/Mar/12 23:46 "Perversion" is a hard word. It is just rechecking before each use whether the trackers are still usable. The timeout is handled through the HConnection's abort().
The testing I've done:
ZK down, HBase down, start a client. Then start ZK, then HBase.
ZK up, HBase down, start client. Then start HBase
both ZK and HBase up, start client, kill HBase, restart HBase
both ZK and HBase up, start client, kill ZK and HBase restart
The client just creates a new HTable and then tries to get some rows in a loop.
In all cases the client should successfully be able to reconnect when both ZK and HBase are up.
The problem I have seen in 0.94/0.92 without this patch even with managed connections is that after HConnection times out, it is unusable and even getting a new HTable does not fix the problem since behind the scenes the same HConnection is retrieved.
Will think about an automated test. Do you like the version better that always does the recheck (and hence all the conditional for "managed" go away)?
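The manual test client described above can be sketched roughly as follows. FlakyTable is a stand-in that fails while the cluster is "down" and succeeds once it is "back"; it is not an HBase class, just a model of the retry loop.

```java
import java.io.IOException;

// Toy model of the manual test: the "table" throws while the cluster is
// down, and the client keeps retrying its get in a loop until it works.
public class RetryLoopDemo {

  static class FlakyTable {
    private int failuresLeft;
    FlakyTable(int failures) { this.failuresLeft = failures; }

    String get(String row) throws IOException {
      if (failuresLeft-- > 0) {
        throw new IOException("connection down"); // cluster still down
      }
      return "value-for-" + row; // cluster back up
    }
  }

  // Retries the get until it succeeds, as the manual test client did.
  // Assumes maxAttempts >= 1.
  static String getWithRetry(FlakyTable table, String row, int maxAttempts)
      throws IOException {
    IOException last = null;
    for (int i = 0; i < maxAttempts; i++) {
      try {
        return table.get(row);
      } catch (IOException e) {
        last = e; // cluster down; loop and try again
      }
    }
    throw last; // give up after maxAttempts
  }
}
```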

stack
added a comment - 31/Mar/12 23:11 This is a perversion.
If we pass in a connection from outside, down in the guts, we do special handling that makes the connection and zookeeper handling reconnect. It's like we should be passing an Interface made at a higher level of abstraction, and then in the implementation it did this fixup when the connection breaks.
With that out of the way, do whatever you need to make it work. Patch looks fine. How did you test? Would it be hard to make a unit test of it? A unit test would be good codifying this perversion, since it will be brittle, being not what's expected.
I'm against changing the behavior of the default case in 0.92/0.94. I'm interested in problems you see in hbase-5153 or issues you have w/ the implementation there, that being the 0.96 client.

Lars Hofhansl
added a comment - 31/Mar/12 17:42 Thanks Ted.
The last question is: should we do this for all HConnections (not just for unmanaged ones)? It means that HConnection would be able to recover from loss of the ZK connection, and the abort() method would only clear out the ZK trackers and never close or abort the connection. I'd be in favor of that.
@Ram and @Jieshan: Since this would be a more robust version of HBASE-5153, could you have a look at this?

Lars Hofhansl
added a comment - 31/Mar/12 06:19 The gist of this change is that (1) the ZK connection is re-checked in all calls where it is needed and re-established if needed, and (2) if the connection is down the client can find out quickly (by setting timeouts accordingly) and report via IOException to the calling thread.
This is only done for unmanaged HConnections (those that were created with HConnectionManager.createConnection(...) and are hence not reference counted; reference-counted HConnections are treated as before).
This is needed to safely use the HConnection in a multithreaded, long-lived AppServer setting.
(In my tests I found that even 0.96 needs some more work here, but that's for a different jira.)
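The two points above can be modeled in a few lines. This is only a sketch of the mechanism, not the actual HConnectionImplementation code: the trackers and the factory are stand-ins, and the class name is invented.

```java
import java.io.IOException;
import java.util.function.Supplier;

// Rough model of the patch's gist: an unmanaged connection rechecks its
// ZK "trackers" before each use and re-establishes them if they were
// dropped, while abort() merely clears them instead of poisoning the
// connection. All names here are illustrative.
public class RecoveringConnection {
  private final boolean managed;
  private final Supplier<Object> trackerFactory; // returns null on failure
  private volatile Object trackers;

  RecoveringConnection(boolean managed, Supplier<Object> factory) {
    this.managed = managed;
    this.trackerFactory = factory;
  }

  // For unmanaged connections, abort just drops the trackers so the
  // next use rebuilds them; the connection itself stays usable.
  public void abort() {
    trackers = null;
  }

  private synchronized void ensureTrackers() throws IOException {
    if (trackers == null) {
      Object t = trackerFactory.get();
      if (t == null) {
        // fail fast so the calling thread sees the outage quickly
        throw new IOException("ZooKeeper unavailable");
      }
      trackers = t;
    }
  }

  public Object use() throws IOException {
    if (!managed) {
      ensureTrackers(); // recheck before every use (the patch's change)
    }
    return trackers;
  }
}
```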

Lars Hofhansl
added a comment - 31/Mar/12 06:06 Thanks Ted.
The condition is different because that is what it did before. I.e. if the connection is managed, the trackers are set up only at construction and during abort in the specific case of SessionExpiredException. If the connection is unmanaged, on the other hand, the trackers are rechecked before they are needed, and hence abort becomes a no-op for any KeeperException. So the condition is exactly reversed. This part is the key of the patch, actually.
The space wasn't there before. I actually had the space added and then removed it again. I'll add it back.

Ted Yu
added a comment - 31/Mar/12 05:31 For abort():
    + if (managed) {
    +   // if the connection is managed attempt to reconnect immediately
    +   ensureZookeeperTrackers();
the condition for calling ensureZookeeperTrackers() is different from other calls in the patch. Please explain.
    + private synchronized void ensureZookeeperTrackers()
    +     throws ZooKeeperConnectionException{
Please add a space before the opening curly brace.

Lars Hofhansl
added a comment - 31/Mar/12 05:01 Slightly better patch.
No need for waitForRootRegion to wait on ZK longer than RecoverableZookeeper retries.
With this, recovery is clean. The client can control via timeouts how soon it would surface a connection problem up to the application layer.
Since this patch is only against 0.94 I'll run the tests locally.
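For example, a client that wants failures surfaced quickly might lower the ZK settings in hbase-site.xml. The property names below (zookeeper.session.timeout, zookeeper.recovery.retry) are real 0.94-era settings, but the values are illustrative only:

```xml
<!-- hbase-site.xml fragment; values are a sketch, tune for your setup -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>5000</value>
</property>
<property>
  <name>zookeeper.recovery.retry</name>
  <value>1</value>
</property>
```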

Lars Hofhansl
added a comment - 31/Mar/12 04:01 A little bit more background:
I put HBASE-4805 in place to be able to use the standard HBase client in long running app server processes. The idea is that one can manage an HConnection and a ThreadPool per app server and then cheaply create HTables when needed.
For that setup it is vital that the HConnection does not become unusable when there are temporary network outages, or when the HBase cluster is temporarily taken down.
In that case the clients should timeout quickly to allow the application to react to it, and if the network/cluster has recovered the application should be able to recover.

Lars Hofhansl
added a comment - 31/Mar/12 01:04 - edited Here's a patch.
Please have a careful look. I can upload to RB too.
The idea is that if this is an unmanaged Connection (see HBASE-4805), a ZK connection is re-established whenever needed (if it was lost before).
This patch is somewhat more complicated than I'd like, because I did not want to change the behavior for managed (default) connections.
If we like I can make this the default behavior... Seems much more robust than the current behavior.
I tested this manually, and the connection (if created with HConnectionManager.createConnection, and hence unmanaged) recovers from losing both the HBase and ZK connections.
(Interestingly in plain HBase 0.94 the client never recovers from this - even with the default connection behavior.)