[ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198376#comment-13198376
]
jiraposter@reviews.apache.org commented on HBASE-5128:
------------------------------------------------------
bq. On 2012-01-25 18:01:32, Ted Yu wrote:
bq. > We should deprecate clearRegionFromTransition().
done.
bq. On 2012-01-25 18:01:32, Ted Yu wrote:
bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 202
bq. > <https://reviews.apache.org/r/3435/diff/2/?file=68922#file68922line202>
bq. >
bq. > We should set interrupt flag.
replaced with Thread.currentThread().interrupt();
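For reference, the usual shape of that change is sketched below. This is only an illustration of restoring the interrupt flag, not the actual HBaseFsckRepair code; the class name InterruptSketch and the helper sleepQuietly are made up for the example.

```java
public class InterruptSketch {
    /**
     * Sleeps for the given time. Returns false if the sleep was interrupted,
     * after restoring the thread's interrupt flag.
     */
    static boolean sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
            return true;
        } catch (InterruptedException e) {
            // Catching InterruptedException clears the flag; set it again so
            // callers further up the stack can still observe the interrupt.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt();   // simulate a pending interrupt
        boolean completed = sleepQuietly(10);
        System.out.println("completed=" + completed
            + " interrupted=" + Thread.currentThread().isInterrupted());
    }
}
```

Returning a boolean instead of swallowing the exception lets the caller decide whether to keep retrying or to stop.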
bq. On 2012-01-25 18:01:32, Ted Yu wrote:
bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 197
bq. > <https://reviews.apache.org/r/3435/diff/2/?file=68922#file68922line197>
bq. >
bq. > success is local variable.
bq. > Why don't we change return type to boolean and return its value ?
I've cleaned this up to reuse the connection from an HBaseAdmin. v3 already has this update
in some places -- this is one of the places missed.
bq. On 2012-01-25 18:01:32, Ted Yu wrote:
bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1636
bq. > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line1636>
bq. >
bq. > This TODO has been implemented, so we can remove it.
removed
bq. On 2012-01-25 18:01:32, Ted Yu wrote:
bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1131
bq. > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line1131>
bq. >
bq. > How about naming this method hasHdfsOnlyEdits() ?
renamed to containsOnlyHdfsEdits
bq. On 2012-01-25 18:01:32, Ted Yu wrote:
bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1081
bq. > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line1081>
bq. >
bq. > This sentence should be moved before ' from ...'
That code has been refactored in v3 but the message was a bit off. I've updated it.
bq. On 2012-01-25 18:01:32, Ted Yu wrote:
bq. > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1083
bq. > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line1083>
bq. >
bq. > We should handle potential exception from this method.
bq. >
bq. > Maybe we should check the availability of this rpc outside the loop and set
a flag indicating whether Master supports this RPC.
I had noted this as something to handle in the next rev -- check out v3,
I think it addresses the concern.
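A rough sketch of the "probe once, remember the answer" idea from the comment above follows. It is not the v3 code: the Master interface, its probe() and offline() methods, and the class name RpcProbeSketch are all hypothetical stand-ins, and how a real master reports a missing RPC is an assumption here.

```java
import java.util.Arrays;
import java.util.List;

public class RpcProbeSketch {
    /** Hypothetical stand-in for the master RPC surface; not the real HMasterInterface. */
    interface Master {
        /** Hypothetical cheap call that throws if the master lacks the newer RPC. */
        void probe() throws Exception;

        /** Hypothetical per-region RPC. */
        void offline(String regionName) throws Exception;
    }

    /** Returns how many regions were handled through the RPC. */
    static int offlineAll(Master master, List<String> regions) {
        boolean rpcSupported;
        try {
            master.probe();          // checked once, outside the loop
            rpcSupported = true;
        } catch (Exception e) {
            rpcSupported = false;    // older master: remember this and skip the RPC
        }

        int handled = 0;
        for (String region : regions) {
            if (!rpcSupported) {
                continue;            // a real tool would take a fallback repair path here
            }
            try {
                master.offline(region);
                handled++;
            } catch (Exception e) {
                // Handle the per-call failure instead of aborting the whole loop.
                System.err.println("offline failed for " + region + ": " + e);
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        Master oldMaster = new Master() {
            public void probe() throws Exception {
                throw new UnsupportedOperationException("no such RPC");
            }
            public void offline(String regionName) { }
        };
        System.out.println(offlineAll(oldMaster, Arrays.asList("r1", "r2")));
    }
}
```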
- jmhsieh
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/#review4591
-----------------------------------------------------------
On 2012-01-25 17:24:41, jmhsieh wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/3435/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-01-25 17:24:41)
bq.
bq.
bq. Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. I'm posting a preliminary version that I'm currently testing on real clusters. The tests
are flaky on the 0.90 branch (so there is something async that I didn't synchronize properly),
and there are a few more TODOs I want to knock out before this is ready for full review to
be considered for committing. It's got some problems I need some advice figuring out.
bq.
bq. Problem 1:
bq.
bq. In the unit tests, I have a few cases where I fabricate new regions and try to force
the overlapping regions to be closed. For some of these, I cannot delete a table after it
is repaired without causing subsequent tests to fail. I think this is due to a few things:
bq.
bq. 1) The disable table handler uses in-memory assignment manager state while delete uses
the assignment information in META.
bq. 2) Currently I'm using the sneaky closeRegion that purposely doesn't go through the master
and in turn doesn't modify in-memory state, so disable uses out-of-date in-memory region assignments.
If I use the unassign method instead, it sends RIT transitions to the master, which then ends up
attempting to assign the region again, causing timing/transient states.
bq.
bq. What is a good way to clear the HMaster's assignment manager's assignment data for particular
regions or to force it to re-read from META (without modifying the 0.90 HBase installations it is
meant to repair)?
bq.
bq. Problem 2:
bq.
bq. Sometimes tests fail reporting HOLE_IN_REGION_CHAIN and SERVER_DOES_NOT_MATCH_META. This
means the old and new regions are confused with each other and basically something is still
happening asynchronously. I think this is because the new region is being assigned and is still
transitioning. Sound about right? To make the unit test deterministic, should hbck wait for these
to settle or should just the unit test wait?
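Whichever side ends up waiting, the wait would probably look like a bounded poll. The sketch below is generic Java, not HBase API: the "settled" condition is a placeholder for whatever check is used (for example, no regions left in transition), and SettleWaitSketch and waitUntil are hypothetical names.

```java
import java.util.function.BooleanSupplier;

public class SettleWaitSketch {
    /**
     * Polls the condition until it holds or the timeout elapses.
     * Returns true if the condition was observed, false on timeout.
     */
    static boolean waitUntil(BooleanSupplier settled, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (settled.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;        // give up rather than hang the test
            }
            Thread.sleep(pollMs);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final int[] polls = {0};
        // Placeholder condition: pretend the cluster settles on the third poll.
        boolean ok = waitUntil(() -> ++polls[0] >= 3, 1000, 10);
        System.out.println("settled=" + ok + " after " + polls[0] + " polls");
    }
}
```

A bounded wait keeps the test deterministic when things settle and still fails cleanly when they never do.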
bq.
bq.
bq. This addresses bug HBASE-5128.
bq. https://issues.apache.org/jira/browse/HBASE-5128
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java c56b3a6
bq. src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 9520b95
bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java f7ad064
bq. src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d
bq. src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b
bq. src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2
bq. src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION
bq. src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 7138d63
bq. src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57
bq. src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 2c4a79e
bq. src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8
bq. src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151
bq.
bq. Diff: https://reviews.apache.org/r/3435/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. All unit tests pass sometimes. Some fail sometimes (generally the cases that fabricate
new regions).
bq.
bq. Not ready for commit.
bq.
bq.
bq. Thanks,
bq.
bq. jmhsieh
bq.
bq.
> [uber hbck] Enable hbck to automatically repair table integrity problems as well as region
consistency problems while online.
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-5128
> URL: https://issues.apache.org/jira/browse/HBASE-5128
> Project: HBase
> Issue Type: New Feature
> Components: hbck
> Affects Versions: 0.90.5, 0.92.0
> Reporter: Jonathan Hsieh
> Assignee: Jonathan Hsieh
>
> The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and
table integrity invariant violations. However, with '-fix' it can only automatically repair
region consistency cases having to do with deployment problems. This updated version should
be able to handle all cases (including a new orphan regiondir case). When complete, it will likely
deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.
> Here's the approach (from the comment at the top of the new version of the file).
> {code}
> /**
> * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
> * table integrity.
> *
> * Region consistency checks verify that META, region deployment on
> * region servers and the state of data in HDFS (.regioninfo files) all are in
> * accordance.
> *
> * Table integrity checks verify that all possible row keys can resolve to
> * exactly one region of a table. This means there are no individual degenerate
> * or backwards regions; no holes between regions; and that there are no
> * overlapping regions.
> *
> * The general repair strategy works in these steps.
> * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
> * 2) Repair Region Consistency with META and assignments
> *
> * For table integrity repairs, the tables' region directories are scanned
> * for .regioninfo files. Each table's integrity is then verified. If there
> * are any orphan regions (regions with no .regioninfo files), or holes, new
> * regions are fabricated. Backwards regions are sidelined as well as empty
> * degenerate (endkey==startkey) regions. If there are any overlapping regions,
> * a new region is created and all data is merged into the new region.
> *
> * Table integrity repairs deal solely with HDFS and can be done offline -- the
> * hbase region servers or master do not need to be running. This phase can be
> * used to completely reconstruct the META table in an offline fashion.
> *
> * Region consistency requires three conditions -- 1) valid .regioninfo file
> * present in an hdfs region dir, 2) valid row with .regioninfo data in META,
> * and 3) a region is deployed only at the regionserver that it was assigned to.
> *
> * Region consistency requires hbck to contact the HBase master and region
> * servers, so the connect() must first be called successfully. Much of the
> * region consistency information is transient and less risky to repair.
> */
> {code}
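To make the table integrity invariant in the comment above concrete, here is a small sketch of a check over a table's regions. It is a simplification under stated assumptions, not the HBaseFsck implementation: keys are plain Strings rather than byte arrays, an empty key is assumed to mean the start or end of the table, and the Region class and the problem labels are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RegionChainSketch {
    /** Hypothetical region holder: covers [start, end); "" means table start/end. */
    static final class Region {
        final String start;
        final String end;
        Region(String start, String end) { this.start = start; this.end = end; }
        @Override public String toString() { return "[" + start + "," + end + ")"; }
    }

    /** Returns a description of every violation found; empty means the chain is intact. */
    static List<String> check(List<Region> regions) {
        List<String> problems = new ArrayList<>();
        List<Region> valid = new ArrayList<>();
        for (Region r : regions) {
            if (!r.end.isEmpty() && r.start.equals(r.end)) {
                problems.add("DEGENERATE " + r);       // endkey == startkey
            } else if (!r.end.isEmpty() && r.start.compareTo(r.end) > 0) {
                problems.add("BACKWARDS " + r);        // startkey after endkey
            } else {
                valid.add(r);
            }
        }
        valid.sort(Comparator.comparing((Region r) -> r.start));

        String covered = "";            // keys below this are covered so far
        boolean coveredToEnd = false;   // set once a region reaches the table end
        for (Region r : valid) {
            if (coveredToEnd || r.start.compareTo(covered) < 0) {
                problems.add("OVERLAP " + r);
            } else if (r.start.compareTo(covered) > 0) {
                problems.add("HOLE [" + covered + "," + r.start + ")");
            }
            if (r.end.isEmpty()) {
                coveredToEnd = true;
            } else if (r.end.compareTo(covered) > 0) {
                covered = r.end;
            }
        }
        if (!coveredToEnd) {
            problems.add("HOLE [" + covered + ",)");   // nothing reaches the table end
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Region> regions = new ArrayList<>();
        regions.add(new Region("", "b"));
        regions.add(new Region("c", ""));
        System.out.println(check(regions));
    }
}
```

Sorting by start key and tracking the highest end key seen so far is enough to spot holes and overlaps in one pass; the repair step (fabricating or merging regions) is deliberately left out.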