[availability] Skip recovered.edits files with edits we know are older than what the region currently has

Description

While testing 0.92, I crashed all the servers. Another bug prevents WALs from being cleaned up, so I had 7000 regions to replay. The distributed split code did a nice job and the cluster came back, but interestingly some hot regions ended up with loads of recovered.edits files (tens if not hundreds) to replay against the region. (Can we bulk load recovered.edits instead of replaying them?) Each recovered.edits file takes about a second to process, though there seem to be only about 30 edits per file. The region is unavailable during this time.

stack
added a comment - 16/Nov/11 18:41 Thinking some more on this, we don't need to rename recovered.edits files. The files are named for the first sequenceid in the file, so we could just list the files and sort the result. Then we'd have a range of sequenceids per file. We could then skip the files whose edits are all smaller than the region's current seqid.

Todd Lipcon
added a comment - 16/Nov/11 19:00 Plus, it looks like we're burning a lot of time with synchronous updates to the region opening "twiddle". Perhaps add a little timestamp in there so that we only twiddle it every 5 seconds (or do it async).
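One way to picture the "only twiddle it every 5 seconds" idea is the minimal sketch below. It is not code from HBase: the class name ThrottledProgress, the tick() method, and the injected clock are all hypothetical, and the actual znode update is stood in for by a plain Runnable.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not part of HBase): runs a progress update at most
 * once per minIntervalMs, no matter how often tick() is called.
 */
class ThrottledProgress {
  private final long minIntervalMs;
  private final LongSupplier clock;   // e.g. System::currentTimeMillis
  private final Runnable update;      // stands in for the znode "twiddle"
  private long lastUpdateMs;
  private boolean updatedOnce = false;

  ThrottledProgress(long minIntervalMs, LongSupplier clock, Runnable update) {
    this.minIntervalMs = minIntervalMs;
    this.clock = clock;
    this.update = update;
  }

  /** Call after each replayed file; returns true if the update actually ran. */
  boolean tick() {
    long now = clock.getAsLong();
    // Skip the update if the previous one happened too recently.
    if (updatedOnce && now - lastUpdateMs < minIntervalMs) {
      return false;
    }
    update.run();
    lastUpdateMs = now;
    updatedOnce = true;
    return true;
  }
}
```

With an interval of 5000 ms and System::currentTimeMillis as the clock, a replay loop that handles one file per second would run the update roughly once every five files instead of after every file. The clock is injected only so the behaviour can be checked without sleeping.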

stack
added a comment - 21/Nov/11 18:51 Thanks Jimmy for taking this on. Looks like you don't have to rename the files; just sort them and figure out which set to apply (and do what Todd suggests: rewrite the znode less often, or asynchronously).

Jimmy Xiang
added a comment - 21/Nov/11 19:16 Yes, that's what I was thinking. The file name has the start seq id. If there are multiple files, there should be multiple start seq ids. Once the files are sorted, each start seq id implies an upper bound on the max seq id of the file before it. I can use this information to filter out some files safely.

jiraposter@reviews.apache.org
added a comment - 21/Nov/11 22:39
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2906/
-----------------------------------------------------------
Review request for hbase, Todd Lipcon and Michael Stack.
Summary
-------
If there are multiple recovered edits files, I use each file's name to find its initial sequence id. After the files are sorted, a file's maximum possible sequence id can be derived from the next file's initial sequence id. If that maximum is smaller than the region's current sequence id, the whole recovered edits file is old and is ignored.
This addresses bug HBASE-4797 .
https://issues.apache.org/jira/browse/HBASE-4797
Diffs
src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 8b89661
src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java 5daa02b
Diff: https://reviews.apache.org/r/2906/diff
Testing
-------
Added a test case to TestHRegion; all the tests in that class pass.
Thanks,
Jimmy
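The filtering described in the summary above can be sketched roughly as follows. This is a simplified illustration, not the code from the patch: the class RecoveredEditsFilter and the method filesToReplay are hypothetical, file names are treated as plain strings rather than filesystem paths, and each name is assumed to be the zero-padded first sequence id of the file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper; an illustration only, not the HBASE-4797 patch. */
class RecoveredEditsFilter {

  /**
   * Returns, in replay order, the files that may still hold edits the
   * region has not seen.
   *
   * @param fileNames   recovered.edits file names, each assumed to be the
   *                    zero-padded first sequence id in that file
   * @param regionSeqId the region's current sequence id
   */
  static List<String> filesToReplay(Iterable<String> fileNames, long regionSeqId) {
    // Fixed-width zero-padded names sort lexicographically in numeric order.
    TreeSet<String> sorted = new TreeSet<>();
    for (String name : fileNames) {
      sorted.add(name);
    }

    List<String> toReplay = new ArrayList<>();
    for (String name : sorted) {
      String next = sorted.higher(name);
      // The next file's first sequence id bounds this file's edits from
      // above. The last file has no successor, so its bound is unknown.
      long maxPossibleSeqId =
          (next == null) ? Long.MAX_VALUE : Long.parseLong(next);
      if (maxPossibleSeqId < regionSeqId) {
        continue; // every edit in this file is older than the region's state
      }
      toReplay.add(name);
    }
    return toReplay;
  }
}
```

For example, with files starting at sequence ids 10, 20 and 30 and a region sequence id of 25, the file starting at 10 is skipped because its edits cannot exceed 20, while the other two are kept. The last file is always kept, since nothing bounds its maximum sequence id.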

jiraposter@reviews.apache.org
added a comment - 21/Nov/11 22:49
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2906/#review3409
-----------------------------------------------------------
Very nice patch.
In future, would suggest you confine your change just to what you are adding. The white space cleanup is nice but it distracts from your patch. It also bloats it and makes it look intimidating to review (smile).
Minor fixups only.
src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
< https://reviews.apache.org/r/2906/#comment7635 >
So, are these already sorted in right order from oldest edit to newest?
src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
< https://reviews.apache.org/r/2906/#comment7636 >
Possilbe should be Possible.
I'd be more assertive in this message. "Maximum possible sequenceid for this log is " + + ", skipping ..
src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
< https://reviews.apache.org/r/2906/#comment7637 >
Good.
src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
< https://reviews.apache.org/r/2906/#comment7638 >
Any more asserts we can do in here? Assert we replayed N of the M files?
Michael

Kannan Muthukkaruppan
added a comment - 21/Nov/11 22:51 The "title" for the bug can be updated given that we are no longer renaming the files in recovered.edits. [That concerned me initially -- but reading through the details, looks like you have come up with a way to avoid new name format. That's always smoother for upgrades and such..]

stack
added a comment - 21/Nov/11 22:56 The region opening is tried periodically. The waiting interval is about 1/3 of the assignment time out. I think that's fine.
From the log snippet above though, Jimmy, it seems like we are updating the znode almost every second. That's too much?

jiraposter@reviews.apache.org
added a comment - 21/Nov/11 23:23
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2906/#review3413
-----------------------------------------------------------
src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
< https://reviews.apache.org/r/2906/#comment7642 >
maxSedId should be named maxSeqId
Ted

jiraposter@reviews.apache.org
added a comment - 22/Nov/11 00:03
On 2011-11-21 23:23:07, Ted Yu wrote:
> src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 2468
> < https://reviews.apache.org/r/2906/diff/2/?file=59652#file59652line2468 >
>
> maxSedId should be named maxSeqId
Good catch.
Jimmy

jiraposter@reviews.apache.org
added a comment - 22/Nov/11 00:32
On 2011-11-21 22:47:55, Michael Stack wrote:
> src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 2456
> < https://reviews.apache.org/r/2906/diff/2/?file=59652#file59652line2456 >
>
> So, are these already sorted in right order from oldest edit to newest?
All these files are under the same folder. If they follow the name pattern defined in HLog, String.format("%019d", seqid), then yes, they are sorted in the right order based on the sequence id.
If this is not true, then the order in which these edits are reapplied is already wrong.
On 2011-11-21 22:47:55, Michael Stack wrote:
> src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 2475
> < https://reviews.apache.org/r/2906/diff/2/?file=59652#file59652line2475 >
>
> Possilbe should be Possible.
>
> I'd be more assertive in this message. "Maximum possible sequenceid for this log is " + + ", skipping ..
Sure, I will fix it.
On 2011-11-21 22:47:55, Michael Stack wrote:
> src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java, line 2855
> < https://reviews.apache.org/r/2906/diff/2/?file=59653#file59653line2855 >
>
> Any more asserts we can do in here? Assert we replayed N of the M files?
Sure, I added more test cases.
Jimmy
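To illustrate the point about the name pattern: because String.format("%019d", seqid) pads every non-negative sequence id to 19 digits (the width of Long.MAX_VALUE), plain string ordering of the names matches numeric ordering of the ids. The snippet below is only a sketch; the class PaddedNameOrder and its methods are hypothetical and not part of HLog.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical demo class; only the "%019d" pattern comes from the thread. */
class PaddedNameOrder {

  /** Builds a file name from a sequence id using the zero-padded pattern. */
  static String editsFileName(long seqId) {
    return String.format("%019d", seqId);
  }

  /** Returns the padded names for the given ids in plain string order. */
  static List<String> sortedNames(long... seqIds) {
    List<String> names = new ArrayList<>();
    for (long id : seqIds) {
      names.add(editsFileName(id));
    }
    Collections.sort(names); // lexicographic sort of the strings
    return names;
  }
}
```

Calling sortedNames(100, 9, 20) yields the names for 9, 20 and 100, in that order. Without the padding, the strings "100", "20" and "9" would sort in exactly the opposite order, and the replay order would be wrong.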

jiraposter@reviews.apache.org
added a comment - 22/Nov/11 00:34
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2906/
-----------------------------------------------------------
(Updated 2011-11-22 00:32:48.813017)
Review request for hbase, Todd Lipcon and Michael Stack.
Changes
-------
Revised patch with changes per review.
Thanks,
Jimmy

jiraposter@reviews.apache.org
added a comment - 22/Nov/11 00:54
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2906/#review3416
-----------------------------------------------------------
Ship it!
Looks good to me.
Thanks for fixing the whitespace too (although it made the patch harder to read).
You also left some whitespace in testSkipRecoveredEditsReplay.
Lars

jiraposter@reviews.apache.org
added a comment - 22/Nov/11 01:02
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2906/
-----------------------------------------------------------
(Updated 2011-11-22 01:02:17.373022)
Review request for hbase, Todd Lipcon and Michael Stack.
Changes
-------
Removed white spaces in TestHRegion.java
Thanks,
Jimmy

Hadoop QA
added a comment - 22/Nov/11 02:09 -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12504687/0001-HBASE-4797-%5Bavailability%5D-skip-older-edits.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 4 new or modified tests.
-1 patch. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/326//console
This message is automatically generated.

jiraposter@reviews.apache.org
added a comment - 22/Nov/11 04:58
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2906/#review3425
-----------------------------------------------------------
+1 on patch
ramkrishna

stack
added a comment - 22/Nov/11 17:43 @Jimmy Just FYI, since you are new, to trigger the build again, you need to re-upload the original patch or a new one (which you did), then (I think) you need to cancel and resubmit the patch.