HBase client is blocked forever

Description

After the client recovered from a temporary network failure, I found that my client thread was blocked.
The stack traces and logs below show that an invalid CatalogTracker is used in the function "tableExists".

ZooKeeperNodeTracker does not propagate the KeeperException to higher levels.
As a result, CatalogTracker assumes that ZooKeeperNodeTracker started successfully and
continues processing.
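A minimal sketch of the failure mode described above, assuming simplified behavior. NodeTracker, ZkReader and ConnectionLossException are hypothetical stand-ins, not the real HBase ZooKeeperNodeTracker/CatalogTracker or ZooKeeper client API. When start() swallows the connection-loss error, the caller sees a normal return, but the tracker holds no data and, in this sketch, nothing ever delivers it, so an unbounded wait would never return (the demo uses a finite timeout to show this). When start() propagates the error, the caller can fail fast or retry.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-ins for illustration only; these are NOT the real
// HBase ZooKeeperNodeTracker/CatalogTracker or ZooKeeper client classes.
public class TrackerStartSketch {

    /** Stand-in for ZooKeeper's KeeperException.ConnectionLossException. */
    static class ConnectionLossException extends Exception {
        ConnectionLossException(String path) {
            super("KeeperErrorCode = ConnectionLoss for " + path);
        }
    }

    /** Stand-in for a ZooKeeper read that would also register a watch. */
    interface ZkReader {
        byte[] getDataAndWatch(String path) throws ConnectionLossException;
    }

    static class NodeTracker {
        private final ZkReader zk;
        private final String path;
        private byte[] data; // remains null if the initial read failed

        NodeTracker(ZkReader zk, String path) {
            this.zk = zk;
            this.path = path;
        }

        /** Swallowing variant: the error is only logged, so callers assume start() succeeded. */
        synchronized void startSwallowing() {
            try {
                data = zk.getDataAndWatch(path);
            } catch (ConnectionLossException e) {
                System.out.println("WARN Unable to get data of znode " + path);
            }
        }

        /** Propagating variant: callers learn that start() failed. */
        synchronized void startPropagating() throws ConnectionLossException {
            data = zk.getDataAndWatch(path);
        }

        /** Would be invoked by a watch callback; after a failed read, no caller exists in this sketch. */
        synchronized void nodeDataChanged(byte[] newData) {
            data = newData;
            notifyAll();
        }

        /** Waits up to timeoutMs for data; returns null on timeout. */
        synchronized byte[] blockUntilAvailable(long timeoutMs) throws InterruptedException {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            while (data == null) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    return null;
                }
                wait(remainingMs); // loop also guards against spurious wakeups
            }
            return data;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulates the outage: every read fails with connection loss.
        ZkReader failing = p -> { throw new ConnectionLossException(p); };

        NodeTracker swallowing = new NodeTracker(failing, "/hbase/root-region-server");
        swallowing.startSwallowing(); // returns normally despite the failure
        byte[] got = swallowing.blockUntilAvailable(100);
        System.out.println("swallowing: data " + (got == null ? "missing after timeout" : "present"));

        NodeTracker propagating = new NodeTracker(failing, "/hbase/root-region-server");
        try {
            propagating.startPropagating();
            System.out.println("propagating: started");
        } catch (ConnectionLossException e) {
            System.out.println("propagating: start failed: " + e.getMessage());
        }
    }
}
```

The design point the sketch illustrates is that only the propagating variant lets a higher layer (here standing in for CatalogTracker) distinguish "started with data" from "started without data".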

[WriteHbaseThread33]2011-12-16 17:07:33,153[WARN ] | hconnection-0x334129cf6890051-0x334129cf6890051-0x334129cf6890051 Unable to get data of znode /hbase/root-region-server | org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:557)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/root-region-server
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:931)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:73)
at org.apache.hadoop.hbase.catalog.CatalogTracker.start(CatalogTracker.java:136)
at org.apache.hadoop.hbase.client.HBaseAdmin.getCatalogTracker(HBaseAdmin.java:111)
at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:162)
at com.huawei.hdi.hbase.HbaseFileOperate.checkHtableState(Unknown Source)
at com.huawei.hdi.hbase.HbaseReOper.reCreateHtable(Unknown Source)
at com.huawei.hdi.hbase.HbaseFileOperate.writeToHbase(Unknown Source)
at com.huawei.hdi.hbase.WriteHbaseThread.run(Unknown Source)

[WriteHbaseThread33]2011-12-16 17:07:33,361[ERROR] | hconnection-0x334129cf6890051-0x334129cf6890051-0x334129cf6890051 Received unexpected KeeperException, re-throwing exception | org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.keeperException(ZooKeeperWatcher.java:385)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/root-region-server
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:931)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:73)
at org.apache.hadoop.hbase.catalog.CatalogTracker.start(CatalogTracker.java:136)
at org.apache.hadoop.hbase.client.HBaseAdmin.getCatalogTracker(HBaseAdmin.java:111)
at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:162)
at com.huawei.hdi.hbase.HbaseFileOperate.checkHtableState(Unknown Source)
at com.huawei.hdi.hbase.HbaseReOper.reCreateHtable(Unknown Source)
at com.huawei.hdi.hbase.HbaseFileOperate.writeToHbase(Unknown Source)
at com.huawei.hdi.hbase.WriteHbaseThread.run(Unknown Source)

[WriteHbaseThread33]2011-12-16 17:07:33,361[FATAL] | Unexpected exception during initialization, aborting | org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1351)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/root-region-server
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:931)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:73)
at org.apache.hadoop.hbase.catalog.CatalogTracker.start(CatalogTracker.java:136)
at org.apache.hadoop.hbase.client.HBaseAdmin.getCatalogTracker(HBaseAdmin.java:111)
at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:162)
at com.huawei.hdi.hbase.HbaseFileOperate.checkHtableState(Unknown Source)
at com.huawei.hdi.hbase.HbaseReOper.reCreateHtable(Unknown Source)
at com.huawei.hdi.hbase.HbaseFileOperate.writeToHbase(Unknown Source)
at com.huawei.hdi.hbase.WriteHbaseThread.run(Unknown Source)

Hadoop QA added a comment - 19/Dec/11 09:51

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12507874/HBASE-5060_trunk.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
-1 javadoc. The javadoc tool appears to have generated -152 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 76 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/540//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/540//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/540//console
This message is automatically generated.