We would like to give an update on the status of the HBASE-10070 work, and open up a discussion about further development.

We seem to be at a point where we have the core functionality of the region replica, as described in HBASE-10070, working. As pointed out under the section "Development Phases" in the design doc posted on the jira HBASE-10070, this work was divided into two broad phases. The first phase introduces the region replica concept, the new consistency model, and corresponding RPC implementations. All of the issues for Phase 1 can be found under [3]. Phase 2 is still in the works, and contains the proposed changes listed under [4].

With all the issues committed in the HBASE-10070 branch in svn, we think that "phase-1" is complete. The user documentation on HBASE-10513 gives an accurate picture of what has been done in phase-1 and what the impact of using this feature is, APIs, etc. We have added a couple of IT tests as part of this work, and we have tested the work we did in "phase-1" of the project quite extensively in Hortonworks' infrastructure.

In summary, with the code in the branch, you can create tables with region replicas, do gets / multi gets and scans using TIMELINE consistency with high availability. Region replicas periodically scan the files of the primary and pick up flushed / committed files. The RPC paths, assignment, balancing, etc. are pretty stable. However, some more performance analysis and tuning is needed. More information can be found in [1] and [2].

As a reminder, the development has been happening in the branch hbase-10070 (https://github.com/apache/hbase/tree/hbase-10070). We have been pretty diligent about getting more than one committer's +1 on the branch commits; for almost all the subtasks in HBASE-10070 there is more than one +1, except for test fixes or more trivial issues, where there may be only one +1. Enis/Nicolas/Sergey/Devaraj/Nick are the main contributors of code in phase-1, and many patches have already been reviewed by people outside this group (mainly Stack, Jimmy).
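To make the usage concrete, here is a minimal sketch of what the client side looks like with the branch. It assumes the APIs discussed in the HBASE-10070 subtasks (HTableDescriptor.setRegionReplication, Consistency.TIMELINE on Get, Result.isStale); the table/family/row names are made up, and it needs a running cluster with the branch code:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Consistency;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class TimelineReadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Create a table with 3 replicas per region: 1 primary + 2 secondaries.
    // Table name "t1" and family "f1" are hypothetical.
    HTableDescriptor htd = new HTableDescriptor(TableName.valueOf("t1"));
    htd.setRegionReplication(3);
    htd.addFamily(new HColumnDescriptor("f1"));
    admin.createTable(htd);

    // A TIMELINE get may be answered by a secondary replica if the
    // primary is slow or down, trading strong consistency for availability.
    HTable table = new HTable(conf, "t1");
    Get get = new Get(Bytes.toBytes("row1"));
    get.setConsistency(Consistency.TIMELINE);
    Result result = table.get(get);
    if (result.isStale()) {
      // Served by a secondary; the data may lag the primary by the
      // interval at which secondaries pick up flushed/committed files.
    }
    table.close();
    admin.close();
  }
}
```

The default (STRONG) consistency path is unchanged; only reads explicitly marked TIMELINE can be served from secondaries, and the client can inspect isStale() to know which case it got.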

For Phase 2, we think that we can deliver on the proposed design incrementally over the next couple of months. However, we think that it might be ok to merge the changes from phase 1 first, then do a commit-as-you-go approach for the remaining items. Therefore, we would like to propose to merge the branch to trunk, and continue the work over there. This might also result in more reviews.

Alternatively, we can continue the work in the branch, and do the merge at the end of Phase 2, but that will make the review process a bit more tricky, which is why we would prefer to merge sooner.


I would be in favor of a merge to trunk, but some time next week, after all issues slated for the 0.98.3 release have been committed through trunk. Otherwise the rebasing, new review, and probable new test failures would mean the RC misses the deadline significantly.

On Wednesday, May 21, 2014, Enis Söztutar <[EMAIL PROTECTED]> wrote:

On Thu, May 22, 2014 at 9:21 AM, Jimmy Xiang <[EMAIL PROTECTED]> wrote:

Sorry this fell through the cracks. The code is rolling-restart compatible. The only limitation is that you should not create a new table with region replicas >1 until the rolling restart is complete. If you do, the region replicas cannot be opened and keep bouncing (the region name cannot be parsed), but it still should not cause issues.

Thanks Stack. On the testing, we have been adding unit tests for all patches. We have also tested the patches quite extensively on real clusters via IT tests (existing and newly added), and issues discovered have been fixed. AFAICT, and from the tests run, this feature shouldn't destabilize the mainline. There are some limitations when region-replica is used — better integration with hbck (HBASE-10674) and support for split/merge — and these are being worked on as part of phase 2.

On May 23, 2014 9:56 PM, "Stack" <[EMAIL PROTECTED]> wrote:

Agreed with Devaraj. I do not think that these changes will destabilize the code base much. All of the code committed to the branch has been unit tested, and integration tests, both specific to region replicas and others, have been running for some time.

For testing the feature itself, HBASE-10572, HBASE-10616, and HBASE-10818 add tests for get/multi-get/scan and bulk load, which have been running with CM. HBASE-10817 adds a test for region replicas + replication. HBASE-10791 changes PerformanceEvaluation to be able to work with region replicas. Nicolas is also working on further perf testing and improvements.

For 1.0, I think it should be fine to include this as an experimental feature. Otherwise, it will be very hard to add this to the 1.x line.

Are there any more concerns? If not, I would like to raise a VOTE soon.

I took the tip of HBASE-10070 for a spin just now. It started up fine over an existing dataset. At first it threw me off because it is missing trunk features, but after I got over that, it looked fine. No strange errors in logs using my config from a pre-HBASE-10070 startup. That's good.

Running a pure randomread test on a small cluster, I seem to see a slight slowdown (14.2k/second vs 13.7k/second out of bucketcache). Have you seen that? Just wondering.