hbase-issues mailing list archives

[jira] [Created] (HBASE-14889) Region stuck in transition in OPEN state indefinitely in corner scenario

Date: Thu, 26 Nov 2015 07:32:11 GMT

Abhishek Singh Chouhan created HBASE-14889:
----------------------------------------------
Summary: Region stuck in transition in OPEN state indefinitely in corner scenario
Key: HBASE-14889
URL: https://issues.apache.org/jira/browse/HBASE-14889
Project: HBase
Issue Type: Bug
Affects Versions: 0.98.14
Reporter: Abhishek Singh Chouhan
During a failure scenario, when an RS dies and the bulk assigner (BA) is assigning its regions
to other RSs, if another RS dies (one to which some of these regions are being moved, with a
region in PENDING_OPEN state on it), we end up in a situation where two bulk assigners try to
assign the same region to the same RS.
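For readers less familiar with ZK-based assignment in 0.98, here is a rough, simplified sketch of the normal open sequence that this race disrupts. The master creates the znode in M_ZK_REGION_OFFLINE and marks the region PENDING_OPEN, the RS moves the znode to RS_ZK_REGION_OPENED, and the master marks the region OPEN, deletes the OPENED znode, and only then takes the region out of transition. The class, fields and `run` method are hypothetical and only mimic the states named in this report; they are not HBase APIs. The intermediate RS_ZK_REGION_OPENING/OPENING step is included as an assumption about the usual flow, not something shown in this report.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of a normal ZK-based region open (not HBase code).
public class NormalRegionOpen {
    enum MasterState { OFFLINE, PENDING_OPEN, OPENING, OPEN }
    enum Znode { NONE, M_ZK_REGION_OFFLINE, RS_ZK_REGION_OPENING, RS_ZK_REGION_OPENED }

    MasterState master = MasterState.OFFLINE;
    Znode znode = Znode.NONE;
    final Set<String> regionsInTransition = new HashSet<>();

    static NormalRegionOpen run(String region) {
        NormalRegionOpen m = new NormalRegionOpen();
        // Master: create the OFFLINE znode, mark PENDING_OPEN, send the OPEN RPC.
        m.znode = Znode.M_ZK_REGION_OFFLINE;
        m.master = MasterState.PENDING_OPEN;
        m.regionsInTransition.add(region);
        // RS: move the znode to OPENING; the master follows (assumed step).
        m.znode = Znode.RS_ZK_REGION_OPENING;
        m.master = MasterState.OPENING;
        // RS: move the znode to OPENED. The master accepts the event only when the
        // region is PENDING_OPEN/OPENING, marks it OPEN, and removes it from
        // transition only after deleting a znode that is still OPENED.
        m.znode = Znode.RS_ZK_REGION_OPENED;
        if (m.master == MasterState.PENDING_OPEN || m.master == MasterState.OPENING) {
            m.master = MasterState.OPEN;
            if (m.znode == Znode.RS_ZK_REGION_OPENED) {
                m.znode = Znode.NONE;
                m.regionsInTransition.remove(region);
            }
        }
        return m;
    }

    public static void main(String[] args) {
        NormalRegionOpen m = run("r1");
        System.out.println(m.master + " " + m.znode + " inRIT=" + m.regionsInTransition.contains("r1"));
    }
}
```

Running `main` prints `OPEN NONE inRIT=false`: on the normal path the region ends up OPEN and leaves transition.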
The following happened:
1. While one BA is opening the region, the second one sees it in PENDING_OPEN state, retries,
and calls unassign(...), thereby sending a CLOSE RPC to the RS.
2. Meanwhile, the RS has already opened the region and changed the znode state to
RS_ZK_REGION_OPENED, which triggers an event on the master.
3. On the master, after the unassign succeeds, we go on to delete the znode, change the region
state to PENDING_OPEN, and send an OPEN RPC to the RS.
4. The event triggered earlier now sees the state as PENDING_OPEN and happily changes it to
OPEN, but it is unable to delete the znode, which by this time is no longer in RS_ZK_REGION_OPENED
state but in M_ZK_REGION_OFFLINE state. Hence the region remains in transition in the OPEN state.
5. The RS goes on transitioning the znode and successfully opens the region (changing the znode
state to RS_ZK_REGION_OPENED).
6. This again triggers an event on the master, but this time, since the state is OPEN, the
following code path is taken:
{noformat}
case RS_ZK_REGION_OPENED:
  // Should see OPENED after OPENING but possible after PENDING_OPEN.
  if (regionState == null
      || !regionState.isPendingOpenOrOpeningOnServer(sn)) {
    LOG.warn("Received OPENED for " + prettyPrintedRegionName
      + " from " + sn + " but the region isn't PENDING_OPEN/OPENING here: "
      + regionStates.getRegionState(encodedName));
    if (regionState != null) {
      // Close it without updating the internal region states,
      // so as not to create double assignments in unlucky scenarios
      // mentioned in OpenRegionHandler#process
      unassign(regionState.getRegion(), null, -1, null, false, sn);
    }
    return;
  }
{noformat}
Here unassign is called with transitionInZK=false and state=null.
7. The RS closes the region but doesn't update ZK, and the region state is not changed on the
master either. The region remains in transition in the OPEN state even though it is actually
closed. We have to restart the RS, after which the region opens correctly on some other RS.
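The seven steps above can be replayed in a small toy model to show where the master state, the znode and the RS diverge. This is a hypothetical sketch, not HBase code. `Hbase14889Replay`, its fields and `replay()` are illustrative names, and both bulk assigners are collapsed into direct state changes. The CLOSE sent in step 1 is assumed to have taken effect by step 3, and the intermediate RS_ZK_REGION_OPENING state is omitted.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy replay of the interleaving in this report (not HBase code).
// Tracks the master's in-memory state, the znode state, regions-in-transition
// (RIT) membership, and whether the RS really hosts the region.
public class Hbase14889Replay {
    enum MasterState { PENDING_OPEN, OPENING, OPEN }
    enum Znode { NONE, M_ZK_REGION_OFFLINE, RS_ZK_REGION_OPENED }

    static final String REGION = "r1";
    MasterState master = MasterState.PENDING_OPEN; // first BA has already sent OPEN
    Znode znode = Znode.M_ZK_REGION_OFFLINE;
    boolean rsHostsRegion = false;
    final Set<String> rit = new HashSet<>(Set.of(REGION));

    // RS finishes opening and moves the znode to OPENED (steps 2 and 5).
    void rsOpens() {
        znode = Znode.RS_ZK_REGION_OPENED;
        rsHostsRegion = true;
    }

    // Master handling of an RS_ZK_REGION_OPENED event (steps 4 and 6).
    void handleOpened() {
        if (master != MasterState.PENDING_OPEN && master != MasterState.OPENING) {
            // Step 6: unassign(..., transitionInZK=false, state=null). The RS closes
            // the region, but neither the znode nor the master state changes.
            rsHostsRegion = false;
            return;
        }
        master = MasterState.OPEN;
        // Step 4: the znode is deleted (and the region leaves RIT) only if it is
        // still in RS_ZK_REGION_OPENED.
        if (znode == Znode.RS_ZK_REGION_OPENED) {
            znode = Znode.NONE;
            rit.remove(REGION);
        }
    }

    static Hbase14889Replay replay() {
        Hbase14889Replay s = new Hbase14889Replay();
        // Step 1: the second BA sees PENDING_OPEN and sends CLOSE.
        // Step 2: the RS had already opened the region; OPENED event #1 is queued.
        s.rsOpens();
        // Step 3: unassign succeeds (CLOSE took effect); the master deletes the znode,
        // sets PENDING_OPEN and sends OPEN, leaving the znode in M_ZK_REGION_OFFLINE.
        s.rsHostsRegion = false;
        s.znode = Znode.M_ZK_REGION_OFFLINE;
        s.master = MasterState.PENDING_OPEN;
        // Step 4: stale event #1 is handled: state becomes OPEN, delete finds OFFLINE.
        s.handleOpened();
        // Step 5: the RS opens the region again; OPENED event #2 fires.
        s.rsOpens();
        // Step 6: event #2 sees OPEN and takes the warn-and-unassign branch.
        s.handleOpened();
        return s;
    }

    public static void main(String[] args) {
        Hbase14889Replay s = replay();
        // Step 7: master says OPEN and in transition, but the RS no longer hosts it.
        System.out.println(s.master + " " + s.znode + " inRIT=" + s.rit.contains(REGION)
                + " hosted=" + s.rsHostsRegion);
    }
}
```

Running `main` prints `OPEN RS_ZK_REGION_OPENED inRIT=true hosted=false`, which matches the stuck state described in step 7.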
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)