RegionServer abort fails when AbstractFSWAL.shutdown hangs

Details

Type: Bug

Status: Open

Priority: Major

Resolution: Unresolved

Affects Version/s: None

Fix Version/s: None

Component/s: None

Labels: None

Environment:

HBase 2.1.2

Hadoop 3.1.x

CentOS 7.4

Description

We run HBase 2.1.2. When a RegionServer is under heavy QPS, it aborts with an error like "Caused by: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync result after 300000 ms for txid=36380334, WAL system stuck?"
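The error message describes a bounded wait: the caller waits up to 300000 ms for the WAL sync of a given txid and gives up if it has not completed by then. A minimal sketch of that pattern, assuming the sync result is exposed as a `CompletableFuture`. `SyncWaitSketch` and `waitForSync` are hypothetical names, not HBase code, and a plain `IOException` stands in for HBase's `TimeoutIOException`:

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SyncWaitSketch {
    // Hypothetical helper: block until the sync for txid completes, or give up after timeoutMs.
    static long waitForSync(CompletableFuture<Long> syncFuture, long txid, long timeoutMs)
            throws IOException {
        try {
            return syncFuture.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The report shows HBase raising TimeoutIOException with this message;
            // a plain IOException stands in for it in this sketch.
            throw new IOException("Failed to get sync result after " + timeoutMs
                + " ms for txid=" + txid + ", WAL system stuck?", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IOException("Interrupted while waiting for sync", e);
        } catch (ExecutionException e) {
            throw new IOException(e.getCause()); // the sync itself failed
        }
    }

    public static void main(String[] args) {
        // A future that is never completed models a stuck WAL sync.
        CompletableFuture<Long> stuck = new CompletableFuture<>();
        try {
            waitForSync(stuck, 36380334L, 100);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a 100 ms timeout instead of 300000 ms, `main` prints `Failed to get sync result after 100 ms for txid=36380334, WAL system stuck?`.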

The abort itself then fails to complete because AbstractFSWAL.shutdown hangs.

jstack output consistently shows the RegionServer stuck in AbstractFSWAL.shutdown, waiting to acquire a ReentrantLock:

parking to wait for <0x00007f18a49b2bb8> (a java.util.concurrent.locks.ReentrantLock$FairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.shutdown(AbstractFSWAL.java:815)
at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.shutdown(AbstractFSWALProvider.java:168)
at org.apache.hadoop.hbase.wal.RegionGroupingProvider.shutdown(RegionGroupingProvider.java:221)
at org.apache.hadoop.hbase.wal.WALFactory.shutdown(WALFactory.java:239)
at org.apache.hadoop.hbase.regionserver.HRegionServer.shutdownWAL(HRegionServer.java:1445)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1117)
at java.lang.Thread.run(Thread.java:745)
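The trace shows the shutdown thread parked in `ReentrantLock.lock()` on a fair lock, so another thread is holding that lock and not releasing it. The sketch below reproduces that shape under an assumption the trace does not confirm: the lock holder is itself blocked on the stuck WAL sync. It also shows one hypothetical mitigation, `tryLock` with a timeout, so shutdown can give up and let the abort proceed. All class and method names are illustrative, not HBase APIs, and this is not the project's fix:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class WalShutdownHangSketch {
    // Fair lock, matching the ReentrantLock$FairSync in the jstack output.
    private final ReentrantLock lock = new ReentrantLock(true);
    // Never counted down: models a WAL sync that never completes (assumption).
    private final CountDownLatch syncDone = new CountDownLatch(1);

    // Hypothetical writer-side operation that holds the lock while waiting on a stuck sync.
    void holdLockWhileStuck(CountDownLatch lockAcquired) throws InterruptedException {
        lock.lock();
        try {
            lockAcquired.countDown();
            syncDone.await(); // blocks until interrupted; the lock stays held meanwhile
        } finally {
            lock.unlock();
        }
    }

    // Hypothetical bounded shutdown: gives up instead of parking forever like lock() would.
    boolean shutdownWithTimeout(long timeoutMs) throws InterruptedException {
        if (!lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            return false; // caller can log the failure and continue aborting
        }
        try {
            // hypothetical: close WAL writers here
            return true;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        WalShutdownHangSketch wal = new WalShutdownHangSketch();
        CountDownLatch acquired = new CountDownLatch(1);
        Thread writer = new Thread(() -> {
            try {
                wal.holdLockWhileStuck(acquired);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.setDaemon(true); // do not keep the JVM alive
        writer.start();
        acquired.await(); // make sure the writer owns the lock first
        System.out.println("shutdown completed: " + wal.shutdownWithTimeout(200));
    }
}
```

While the stuck thread holds the lock, `main` prints `shutdown completed: false` after about 200 ms, whereas a plain `lock()` call would park indefinitely, as in the jstack output above.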