Our HBase master server shut down with the following message. HBase is running in distributed mode on a single node. I checked that GC completed in a very short time around the moment the WARN was logged. In addition, another system running on the same architecture doesn't output the following WARN message and works well. So I think that this is not due to a long GC pause.

On 02/08/2013 03:55 AM, So Hibino wrote:
> Our hbase-master-server was shutdown with following message.
> Hbase is running in Distributed mode in a single node.

Can you share your .conf files?

> I checked that GC completed in a very short time at the time of output the
> WARN.
> In addition the other system that is running in the same architecture
> doesn't output the following WARN message and works well.
> So I think that this is not due to a long GC pause.
>
> Do you have any idea about the problem?
>
> 2013-01-30 03:07:48,582 WARN org.apache.hadoop.hbase.util.Sleeper: We slept
> 28970ms instead of 1000ms, this is likely due to a long garbage collecting
> pause and it's usually bad, see
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired

Did you check the link?
Todd wrote a series of posts on Cloudera's blog about long Java GC pauses, HBase and ZooKeeper.
It's a great read:
http://www.cloudera.com/blog/2011/02/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-1/
http://www.cloudera.com/blog/2011/02/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-2/

> 2013-01-30 03:07:48,583 WARN org.apache.hadoop.hbase.util.Sleeper: We slept
> 36902ms instead of 10000ms, this is likely due to a long garbage collecting
> pause and it's usually bad, see
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2013-01-30 03:07:48,585 INFO org.apache.zookeeper.ClientCnxn: Client session
> timed out, have not heard from server in 39989ms for sessionid
> 0x13c84cebfce0000, closing socket connection and attempting reconnect
> 2013-01-30 03:07:48,586 INFO org.apache.zookeeper.ClientCnxn: Client session
> timed out, have not heard from server in 39987ms for sessionid
> 0x13c84cebfce0001, closing socket connection and attempting reconnect
> 2013-01-30 03:07:52,779 INFO org.apache.zookeeper.ClientCnxn: Opening socket
> connection to server VM_11/192.168.152.1:2181
> 2013-01-30 03:07:52,789 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to VM_11/192.168.152.1:2181, initiating session
> 2013-01-30 03:07:52,777 INFO org.apache.zookeeper.ClientCnxn: Opening socket
> connection to server VM_11/192.168.152.1:2181
> 2013-01-30 03:07:52,793 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to VM_11/192.168.152.1:2181, initiating session
> 2013-01-30 03:07:52,794 INFO org.apache.zookeeper.ClientCnxn: Unable to
> reconnect to ZooKeeper service, session 0x13c84cebfce0001 has expired,
> closing socket connection
> 2013-01-30 03:07:52,794 INFO
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> This client just lost it's session with ZooKeeper, trying to reconnect.
> 2013-01-30 03:07:52,794 INFO
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> Trying to reconnect to zookeeper.
> 2013-01-30 03:07:52,795 INFO org.apache.zookeeper.ZooKeeper: Initiating
> client connection, connectString=VM_11:2181 sessionTimeout=180000
> watcher=hconnection
> 2013-01-30 03:07:52,812 INFO org.apache.zookeeper.ClientCnxn: Unable to
> reconnect to ZooKeeper service, session 0x13c84cebfce0000 has expired,
> closing socket connection
> 2013-01-30 03:07:52,813 FATAL org.apache.hadoop.hbase.master.HMaster:
> master:60000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000
> master:60000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000
> received expired from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired
> at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:361)
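To check the "GC was short" observation against the log, it can help to pull the overshoot out of each Sleeper warning and compare it with the pause times in the GC log for the same timestamps. The following is only a rough sketch: the function name `sleeper_overshoots` and the `SAMPLE` list are made up for illustration, and it assumes the log lines are unwrapped (one event per line, as in the log file itself rather than as wrapped in this mail).

```python
import re

# Matches the Sleeper warning text as it appears in the log quoted above.
SLEEPER_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) WARN "
    r"org\.apache\.hadoop\.hbase\.util\.Sleeper: We slept "
    r"(?P<slept>\d+)ms instead of (?P<expected>\d+)ms"
)


def sleeper_overshoots(lines):
    """Return (timestamp, slept_ms, expected_ms, overshoot_ms) for each warning."""
    results = []
    for line in lines:
        match = SLEEPER_RE.match(line)
        if match:
            slept = int(match.group("slept"))
            expected = int(match.group("expected"))
            results.append((match.group("ts"), slept, expected, slept - expected))
    return results


# Lines taken from the log in this thread, joined back onto single lines.
SAMPLE = [
    "2013-01-30 03:07:48,582 WARN org.apache.hadoop.hbase.util.Sleeper: "
    "We slept 28970ms instead of 1000ms, this is likely due to a long garbage "
    "collecting pause and it's usually bad, see "
    "http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired",
    "2013-01-30 03:07:48,583 WARN org.apache.hadoop.hbase.util.Sleeper: "
    "We slept 36902ms instead of 10000ms, this is likely due to a long garbage "
    "collecting pause and it's usually bad, see "
    "http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired",
    "2013-01-30 03:07:48,585 INFO org.apache.zookeeper.ClientCnxn: Client session "
    "timed out, have not heard from server in 39989ms for sessionid "
    "0x13c84cebfce0000, closing socket connection and attempting reconnect",
]

if __name__ == "__main__":
    for ts, slept, expected, overshoot in sleeper_overshoots(SAMPLE):
        print(f"{ts}: slept {slept}ms, expected {expected}ms, overshoot {overshoot}ms")
```

If the GC log shows no pause anywhere near the overshoot reported for the same second, the stall most likely came from outside the JVM.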

The master doesn't have memstores, so this wouldn't help. In fact, it's pretty rare that we see the master with GC issues. I recall seeing issues with time travelling (the machine's clock is too slow and ntpd resets it) or on EC2, where sometimes you'd see random machine pauses out of nowhere (although that was a long time ago and I haven't used EC2 since).
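One rough way to tell those two cases apart is to time a sleep with both the wall clock and a monotonic clock. This is only a sketch under my own assumptions: the names `classify_sleep` and `watch` and the one-second tolerance are made up, and it assumes the warning is computed from wall-clock timestamps taken before and after the sleep. If only the wall clock disagrees, the clock was probably stepped; if the monotonic clock overshoots too, the process really did stall.

```python
import time


def classify_sleep(expected_s, mono_elapsed_s, wall_elapsed_s, tolerance_s=1.0):
    """Classify one sleep interval as 'ok', 'pause' or 'clock_step'.

    'pause'      : the monotonic clock overshot, so the process was stalled.
    'clock_step' : only the wall clock is off, so the system time was changed.
    """
    if mono_elapsed_s - expected_s > tolerance_s:
        return "pause"
    if abs(wall_elapsed_s - expected_s) > tolerance_s:
        return "clock_step"
    return "ok"


def watch(period_s=1.0, iterations=5):
    """Sleep repeatedly and report any interval that is not 'ok'."""
    for _ in range(iterations):
        mono_start = time.monotonic()
        wall_start = time.time()
        time.sleep(period_s)
        mono_elapsed = time.monotonic() - mono_start
        wall_elapsed = time.time() - wall_start
        verdict = classify_sleep(period_s, mono_elapsed, wall_elapsed)
        if verdict != "ok":
            print(f"{verdict}: monotonic {mono_elapsed:.3f}s, wall {wall_elapsed:.3f}s")


if __name__ == "__main__":
    watch()
```

Left running on the affected host, something like this only narrows things down. How a stall of the whole guest shows up in the guest's clocks can differ between hypervisors, so treat a "pause" verdict as a hint rather than proof.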

Hi,

> The master doesn't have memstores so this wouldn't help. In fact it's
> pretty rare that we see the master with GC issues. I recall seeing
> issues with time travelling (machine clock's too slow and ntpd resets
> it) or on EC2 where sometimes you'd see random machine pauses out of
> nowhere (although that was a long time ago and haven't used EC2
> since).

We don't use EC2, but this server runs on KVM.
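Since the server is a KVM guest, one thing that might be worth watching is CPU steal time, which Linux reports in the aggregate `cpu` line of `/proc/stat`. The sketch below is only an illustration; the helper names `parse_cpu_line`, `steal_fraction` and `read_cpu_line` are made up. Steal time reflects the host giving the CPU to someone else, so a high value hints at host contention, but it would not necessarily catch a complete freeze of the guest.

```python
# Field order of the aggregate "cpu" line in /proc/stat (values in clock ticks).
CPU_FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    values = [int(v) for v in fields[1:1 + len(CPU_FIELDS)]]
    # Very old kernels report fewer fields; zip simply stops at the shorter list.
    return dict(zip(CPU_FIELDS, values))


def steal_fraction(before, after):
    """Fraction of CPU time between two samples that was stolen by the host."""
    total = sum(after[k] - before.get(k, 0) for k in after)
    if total <= 0:
        return 0.0
    return (after.get("steal", 0) - before.get("steal", 0)) / total


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as stat_file:
        return stat_file.readline()


if __name__ == "__main__":
    import time

    first = parse_cpu_line(read_cpu_line())
    time.sleep(5)
    second = parse_cpu_line(read_cpu_line())
    print(f"steal fraction over 5s: {steal_fraction(first, second):.3f}")
```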

First, in the world of Hadoop, "if it ain't broke, don't fix it" may not be the best advice. HBase is still evolving at a good pace and you want to stay close to the latest releases. CDH4 is stable, so I would agree that going to CDH4 would be best.

Second, you are running this as a single machine within the VM.

What does the hardware look like? Number of cores, physical or virtual? How much memory?

What type of disks and where are they (attached or SAN)?

Sent from a remote device. Please excuse any typos...