Rollback of split could cause closed region to be opened again

Details

Description

If master sending close region to rs and region's split transaction concurrently happen,
it may cause closed region to opened.

See the detailed code in SplitTransaction#createDaughters

List<StoreFile> hstoreFilesToSplit = null;
try{
hstoreFilesToSplit = this.parent.close(false);
if (hstoreFilesToSplit == null) {
// The region was closed by a concurrent thread. We can't continue// with the split, instead we must just abandon the split. If we
// reopen or split this could cause problems because the region has
// probably already been moved to a different server, or is in the
// process of moving to a different server.
thrownew IOException("Failed to close region: already closed by " +
"another thread");
}
} finally {
this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
}

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

Hadoop QA
added a comment - 28/Dec/11 08:40 -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12508728/hbase-5100.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
-1 javadoc. The javadoc tool appears to have generated -151 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 77 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.TestDrainingServer
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/607//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/607//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/607//console
This message is automatically generated.

@Zhihong
If we remove the try/finally construct, when encountering excepiton in this.parent.close(false), the rollback of split would not do this.parent.initialize() because of no JournalEntry.CLOSED_PARENT_REGION.

chunhui shen
added a comment - 29/Dec/11 01:28 @Zhihong
If we remove the try/finally construct, when encountering excepiton in this.parent.close(false), the rollback of split would not do this.parent.initialize() because of no JournalEntry.CLOSED_PARENT_REGION.

@Zhihong
If the initial boolean value is false, when encountering excepiton in this.parent.close(false), it will be false all the same which causes JournalEntry.CLOSED_PARENT_REGION not added in this.journal

chunhui shen
added a comment - 29/Dec/11 01:40 @Zhihong
If the initial boolean value is false, when encountering excepiton in this.parent.close(false), it will be false all the same which causes JournalEntry.CLOSED_PARENT_REGION not added in this.journal

Jieshan Bean
added a comment - 29/Dec/11 05:03 The patch is good. If region has been closed by other thread, just abondon the split.That region should not be online again while rolling back. Meanwhile, we just need to clean the splitDir.

I thought I like Java until I understood what createDaughters() should be doing - with Chunhui's patch, i.e.

The case is that we need to handle two types of exceptions: the one coming out of parent.close(false) call and the IOE thrown from within the try block.

I got some idea when I was driving this morning. 5100.txt is my proposed form where I tried to disambiguate the two types of exceptions by removing finally block.
closedByOtherException is a singleton so the overhead of my proposal vs. introducing a boolean is negligible.

Ted Yu
added a comment - 29/Dec/11 16:56 I thought I like Java until I understood what createDaughters() should be doing - with Chunhui's patch, i.e.
The case is that we need to handle two types of exceptions: the one coming out of parent.close(false) call and the IOE thrown from within the try block.
I got some idea when I was driving this morning. 5100.txt is my proposed form where I tried to disambiguate the two types of exceptions by removing finally block.
closedByOtherException is a singleton so the overhead of my proposal vs. introducing a boolean is negligible.
I ran split-related tests and they passed:
806 mt -Dtest=TestCoprocessorInterface
807 mt -Dtest=TestSplitTransaction
808 mt -Dtest=TestSplitTransactionOnCluster
809 mt -Dtest=TestEndToEndSplitTransaction
810 mt -Dtest=TestHRegion
Please share your comments.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

Hadoop QA
added a comment - 29/Dec/11 19:15 -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12508859/5100.txt
against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
-1 javadoc. The javadoc tool appears to have generated -151 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 76 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.coprocessor.TestMasterObserver
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/627//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/627//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/627//console
This message is automatically generated.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

Hadoop QA
added a comment - 29/Dec/11 22:46 -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12508876/5100.txt
against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
-1 javadoc. The javadoc tool appears to have generated -151 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 76 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/631//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/631//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/631//console
This message is automatically generated.

chunhui shen
added a comment - 30/Dec/11 01:51 @Zhihong:
I like the modify of 5100.txt,
However,is it any possible to throw other exception in this.parent.close(false) ,such as NullPointerException? Maybe it is impossible and needn't worry.

Patch looks fine but seems a bit involved. I tried my hand at it and ended up w/ the below which is not much better so +1 on commit:

Index: src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
===================================================================
--- src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java (revision 1221978)
+++ src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java (working copy)
@@ -246,18 +246,24 @@
this.journal.add(JournalEntry.CREATE_SPLIT_DIR);
List<StoreFile> hstoreFilesToSplit = null;
- try{
+ boolean someoneElseClosedRegion = false;
+ try {
hstoreFilesToSplit = this.parent.close(false);
if (hstoreFilesToSplit == null) {
// The region was closed by a concurrent thread. We can't continue- // with the split, instead we must just abandon the split. If we
- // reopen or split this could cause problems because the region has
- // probably already been moved to a different server, or is in the
- // process of moving to a different server.
+ // with the split. Instead we must abandon the split. If we reopen
+ // or split this could cause problems because the region has probably
+ // already been moved to a different server, or is in the process of
+ // moving to a different server. Set boolean. See finally clause
+ // for how its handled.
+ someoneElseClosedRegion = true;
+ }
+ } finally {
+ if (someoneElseClosedRegion) {
+ // If closed by someone else, don't add journal entry.
thrownew IOException("Failed to close region: already closed by " +
- "another thread");
+ another thread");
}
- } finally {
this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
}

stack
added a comment - 30/Dec/11 05:14 This is a good catch Chunhui; nice one.
Patch looks fine but seems a bit involved. I tried my hand at it and ended up w/ the below which is not much better so +1 on commit:
Index: src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
===================================================================
--- src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java (revision 1221978)
+++ src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java (working copy)
@@ -246,18 +246,24 @@
this .journal.add(JournalEntry.CREATE_SPLIT_DIR);
List<StoreFile> hstoreFilesToSplit = null ;
- try {
+ boolean someoneElseClosedRegion = false ;
+ try {
hstoreFilesToSplit = this .parent.close( false );
if (hstoreFilesToSplit == null ) {
// The region was closed by a concurrent thread. We can't continue
- // with the split, instead we must just abandon the split. If we
- // reopen or split this could cause problems because the region has
- // probably already been moved to a different server, or is in the
- // process of moving to a different server.
+ // with the split. Instead we must abandon the split. If we reopen
+ // or split this could cause problems because the region has probably
+ // already been moved to a different server, or is in the process of
+ // moving to a different server. Set boolean . See finally clause
+ // for how its handled.
+ someoneElseClosedRegion = true ;
+ }
+ } finally {
+ if (someoneElseClosedRegion) {
+ // If closed by someone else , don't add journal entry.
throw new IOException( "Failed to close region: already closed by " +
- "another thread" );
+ another thread");
}
- } finally {
this .journal.add(JournalEntry.CLOSED_PARENT_REGION);
}

Reviewing the other, older patch (Ted just wrote me to review it instead), it looks cleaner (hbase-5100.patch). I don't know what closedJE is. I'm +1 on commit but suggest we use a better name – perhaps closedParent – on commit or the above suggested long name (I don't think it too long)

stack
added a comment - 30/Dec/11 05:32 Reviewing the other, older patch (Ted just wrote me to review it instead), it looks cleaner (hbase-5100.patch). I don't know what closedJE is. I'm +1 on commit but suggest we use a better name – perhaps closedParent – on commit or the above suggested long name (I don't think it too long)

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

Hadoop QA
added a comment - 30/Dec/11 08:07 -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12508916/5100-v2.txt
against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
-1 javadoc. The javadoc tool appears to have generated -151 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 76 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/639//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/639//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/639//console
This message is automatically generated.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

Hadoop QA
added a comment - 30/Dec/11 12:30 -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12508926/5100-double-exeception.txt
against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
-1 javadoc. The javadoc tool appears to have generated -151 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 76 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/641//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/641//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/641//console
This message is automatically generated.