> Waiting on HBASE-9612 jenkins build but starting in making a new RC. It
> takes a few hours if all goes well. Please no commits on 0.96 branch till
> the all clear is sounded. Thanks.

All clear, but please only important bug fixes for 0.96 branch; nothing that might destabilize. If you do commit one, mark it fixed in version 0.96.1.

Thanks,
St.Ack

Thanks Stack for doing this. We had a lot of churn between RC3 and 4 (new modules etc). Agreed that we should go easy on the risky patches even if this RC fails.

Enis

On Sat, Oct 5, 2013 at 5:00 PM, Stack <[EMAIL PROTECTED]> wrote:

> On Fri, Oct 4, 2013 at 2:20 PM, Stack <[EMAIL PROTECTED]> wrote:
>
> > Waiting on HBASE-9612 jenkins build but starting in making a new RC. It
> > takes a few hours if all goes well. Please no commits on 0.96 branch
> > till the all clear is sounded. Thanks.
>
> All clear, but please only important bug fixes for 0.96 branch; nothing
> that might destabilize. If you do commit one, mark it fixed in version
> 0.96.1.
> Thanks,
> St.Ack

> Thanks Stack for doing this. We had a lot of churn between RC3 and 4 (new
> modules etc). Agreed that we should easy on the risky patches even if this
> RC fails.

Yeah. Sorry about that. We put out a bunch of development releases but downstreamers only seem to have started paying attention now that we are in RC state. The addition of the new test module was to give hbase hadoop's form so downstreamers could depend on our test tools/cluster explicitly and all dependencies would get pulled in (mvn resolve is wonky for the *-test jars).

> On Sat, Oct 5, 2013 at 6:01 PM, Enis Söztutar <[EMAIL PROTECTED]> wrote:
>
> > Thanks Stack for doing this. We had a lot of churn between RC3 and 4 (new
> > modules etc). Agreed that we should easy on the risky patches even if
> > this RC fails.
>
> Yeah. Sorry about that. We put out a bunch of development releases but
> downstreamers only seemed to have started paying attention now we are in RC
> state.

That's because if you have data you care about you don't want to start playing with it until the project team says "this is what you are about to get".

Non-storage related projects can get faster pickup, but even there it's late beta before you get the flood of bugreps.

> The addition of the new test module was to make hbase have hadoops'
> form so downstreamers could depend on our test tools/cluster explicitly and
> all dependencies would get pulled in (mvn resolve is wonky for the *-test
> jars).
>
> St.Ack

--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.

Just a check to make sure I am pulling down the right version from staging: what is the sha1 of the latest RC?

On 6 October 2013 01:00, Stack <[EMAIL PROTECTED]> wrote:

> On Fri, Oct 4, 2013 at 2:20 PM, Stack <[EMAIL PROTECTED]> wrote:
>
> > Waiting on HBASE-9612 jenkins build but starting in making a new RC. It
> > takes a few hours if all goes well. Please no commits on 0.96 branch
> > till the all clear is sounded. Thanks.
>
> All clear, but please only important bug fixes for 0.96 branch; nothing
> that might destabilize. If you do commit one, mark it fixed in version
> 0.96.1.
> Thanks,
> St.Ack


> just a check to make sure i am pulling down the right version from staging:
> what is the sha1 of the latest RC?

You saw the *.mds files up here: http://people.apache.org/~stack/hbase-0.96.0RC4/ ? SHA1 is in them. (I just compared them to what I have on the build box here.)

Does that answer your question mighty Steve?

St.Ack
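For anyone checking a download the same way, here is a minimal sketch in Python of comparing a local file against a published SHA1. It is an illustration only, not part of the release tooling: the function names are made up for this example, and it assumes the digest is copied by hand out of the relevant *.mds file, where it may appear in upper case and in space-separated groups.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the lowercase hex SHA1 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def normalize_digest(text):
    """Drop all whitespace and lowercase, so grouped upper-case hex compares equal."""
    return "".join(text.split()).lower()


def matches_published(path, published_hex):
    """True if the file's SHA1 equals the digest copied from the published listing."""
    return sha1_of_file(path) == normalize_digest(published_hex)
```

Calling `matches_published("some-artifact.tar.gz", "<digest pasted from the .mds file>")` with a hypothetical file name would then report whether the local copy matches what is in staging.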


You know what they say, sixth time is the charm.

Enis

On Tue, Oct 8, 2013 at 2:43 PM, Enis Söztutar <[EMAIL PROTECTED]> wrote:

> HEADS UP:
> I think we may have to sink this one. Our tests ITBLL and ITLAV with CM
> fails consistently, and we suspect a problem with HBASE-9612 (although not
> confirmed yet)
>
> More details are coming soon after more digging into logs.
> Enis
>
> On Mon, Oct 7, 2013 at 1:46 PM, Stack <[EMAIL PROTECTED]> wrote:
>
>> On Mon, Oct 7, 2013 at 10:05 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:
>>
>> > go
>> >
>> > well, those are the .gz files, not the JARs, I'll have to download and
>> > check...
>>
>> Give me list of jars you want a sha for and I'll run them for you against
>> the build RC (and publish it)
>>
>> > BTW, there's a new Hadoop RC out in staging: 2.2.0
>> >
>> > The RC is available at:
>> > http://people.apache.org/~acmurthy/hadoop-2.2.0-rc0
>> > The RC tag in svn is here:
>> > http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.2.0-rc0
>>
>> Yeah. Hopefully our RC works against it (I didn't try it).
>>
>> St.Ack

It would be really nice to avoid committing large changes to 96 henceforth before we have RC5. Otherwise it will never stabilize at this rate.

On Wed, Oct 9, 2013 at 10:45 AM, Enis Söztutar <[EMAIL PROTECTED]> wrote:


> On Wed, Oct 9, 2013 at 10:45 AM, Enis Söztutar <[EMAIL PROTECTED]> wrote:
>
> > I just committed https://issues.apache.org/jira/browse/HBASE-9730 for
> > this.
> > Time for another RC, what do you think?
> >
> > You know what they say, sixth time is the charm.
>
> I can cut one no problem. Just say.
>
> Does your test rig pass? Ours hasn't yet because of HBASE-9563; master is
> killed and won't come back though restarted and tests fail.
>
> Do we want HBASE-9696 in there? It is currently under test.
>
> And HBASE-9724 Failed region split is not handled correctly by AM?
>
> But if you fellas need me to put up a new one, just say. Just takes a few
> hours.
>
> St.Ack


> 9696 looks a little bit scary... did you guys test it on your rig?
>
> On Wed, Oct 9, 2013 at 11:54 AM, Stack <[EMAIL PROTECTED]> wrote:

> It's testing now. :)
>
> On Wed, Oct 9, 2013 at 12:42 PM, Sergey Shelukhin <[EMAIL PROTECTED]> wrote:


At this point I think that we should have real clean IT test runs before cutting another release. And we can't really get that until the master always comes back up (the issue Stack was working on yesterday) and until merging is stable. I would like to see those two things fixed before 0.96.

On Wed, Oct 9, 2013 at 1:38 PM, Devaraj Das <[EMAIL PROTECTED]> wrote:

I am not sure I agree with this though. The reason being: HBASE-9696 was raised on Saturday and we have cut an RC after that. So why not another one now? For the 0.96.0 version, can we not say that "merge" should be used with caution? Also, it is not guaranteed that we will not face any new IT issues after 9696 goes in, right?

Let's cut 0.96.0 now and fix remaining issues in 0.96.1. Thoughts?

On Wed, Oct 9, 2013 at 2:17 PM, Elliott Clark <[EMAIL PROTECTED]> wrote:


I prefer to have 9696 in. It's not just about merging. I am also trying to make sure splitting is good. Currently, if a region is splitting, the two daughters are written to meta first. CM could move them around before master knows about these two new regions. So they could be double-assigned for a short while. That could be one reason why ITBLL still shows data loss somewhere.

I think we should make sure ITBLL runs well with no data loss before we release 0.96.0. Data loss is a big concern to me.

On Wed, Oct 9, 2013 at 2:33 PM, Devaraj Das <[EMAIL PROTECTED]> wrote:

On Wed, Oct 9, 2013 at 2:33 PM, Devaraj Das <[EMAIL PROTECTED]> wrote:
> For the 0.96.0 version, can we not say that "merge" should be used
> with caution.

I would feel very uncomfortable with that. Telling people to just hope that the servers don't crash while a merge is going on seems like an unwise strategy. Crashes or power failures are completely beyond users' control. Since we have a proposed fix, it seems better to me that we hold off on this. Get the tests done. Then get the patch in, and start another round of testing.

Also the master not coming back up, while not a known data loss issue like 9696, is very concerning. We should get to the bottom of this. It's making TestMTTR fail, along with others sporadically.

We've taken > 1.5 years on this release and we're on the home stretch. We should make sure this is a really stable and quality release and not try and rush it. Right now we're failing IT tests left and right. We can't even pass an ingest test that lasts 4 hours. That's something I can't see myself recommending to anyone in its current state. So that seems to me something that we shouldn't release. And if we put up an RC now then we just know that it's going to fail IT tests and so will probably be a failed RC.

I want this release out as badly as anyone else but I'd rather we have something that people can really and truly trust and not just something we have rushed.

> I prefer to have 9696 in. It's not just about merging. I am also trying
> to make sure splitting is good. Currently, if a region is splitting, the
> two daughters are wrote to meta at first. CM could move them around before
> master knows about these two new regions. So they could be double-assigned
> for a short while. It could be a cause why ITBLL still shows data loss
> somewhere.

Is this really the case? If a client learns the daughter regions from meta before master learns about the split, even if they call HMaster.moveRegion(), they would get UnknownRegionException, no?

HBASE-9563 is already committed to 0.96. That leaves only HBASE-9696 and HBASE-9724 under discussion. I am holding off on committing 9724 for the time being. Are there any more issues that might be a blocker against this release?

After 1.5 years without a major release, and the RC process nearing 40 days, I think we should only accept absolute blockers at this point. As far as I am concerned, neither 9724 nor 9696 is a blocker against 0.96. Merge is a new feature, and nothing critical depends on it. We can release saying that merge is experimental (which is how it was originally introduced, AFAIK) and disable merge in CM for now if it makes tests flaky. We did not identify a root cause that would point to 9696 although we have been running tests with CM for some time. We can still fix the merge and do a quick 0.96.1, in the release train model that proved to be so successful for 0.94. We do not have to delay 0.96 another month just to fix a corner case for a new feature.

As for our testing, we have been testing the 95 and 96 branches for a couple of months. We still see some sporadic failures for CM tests, but no blockers at this point. Most of the issues have been fixed so far. Our nightlies run ITBLL, ITLAV, both with and without CM running for ~3 hours, ITMTTR, and many other ITs. My manual runs for longer intervals also succeed for now. Remember that none of these ITs would run even once for earlier versions of 0.94 or before.

Elliott, what are the root causes for the failures you are seeing? There are no blockers raised as far as I can see. Let's decide whether HBASE-9696 is a blocker, and do the new candidate based on that unless there are more blockers.


> HBASE-9563 is already committed to 0.96. That leaves only HBASE-9696 and
> HBASE-9724 under discussion. I am holding on committing 9724 for the time
> being. Are there any more issues that might be a blocker against this
> release?

As mentioned above, HBASE-9563 makes it so our hbase-it suite does not complete. We've not had a successful run in weeks on our end. This issue is our current stumbling block. Let me designate it a blocker while we are digging and discussing. You fellas are not running into this?

HBASE-9563 is trivial enough and it is already in 0.96. We may have run into that at some point, but not lately. Do you see your tests succeeding with HBASE-9563 and HBASE-9696?

On Wed, Oct 9, 2013 at 5:13 PM, Stack <[EMAIL PROTECTED]> wrote:


> HBASE-9563 is trivial enough and it is already in 0.96. We may have run
> that into some point, but not lately. Do you see your tests succeeding with
> HBASE-9563 and HBASE-9696?

Both are under test in independent rigs. For HBASE-9563, we are trying to repro the clash of the masters to see if the patch helped. We've also instrumented the rig so we can get more data when we hit the hang again.

Anyways, if you fellas can't wait anymore, just say and we'll figure out something.

> Anyways, if you fellas can't wait anymore, just say and we'll figure out
> something.

As I see it, HBASE-9563 is committed, and HBASE-9696 is not a blocker against 0.96. But if you argue that 9696 is indeed a blocker, let's raise it as such. There is no point in creating an RC and immediately sinking it if we cannot verify the RC for a +1. We don't run into data loss issues anymore, which is why I still think we can release 0.96 even without 9696 and 9724. Nothing is preventing us from releasing 0.96.1, with this and more fixes, in let's say a couple of weeks or months.

I guess let's wait for tomorrow to see whether there is any progress on 9563 and 9696.

Enis

On Wed, Oct 9, 2013 at 5:55 PM, Stack <[EMAIL PROTECTED]> wrote:


> > Anyways, if you fellas can't wait anymore, just say and we'll figure out
> > something.
> As I see it, HBASE-9563 is committed,

It is still open and committed with qualification "Stack:...Was going to try this first but likely needs more..." and "Elliott: +1 I think it's an improvement even if it doesn't 100% fix the master issue."

> and HBASE-9696 is not a blocker
> against 0.96. But if you argue that 9696 is indeed a blocker, let's raise
> it as such.

Agree.

> There is no point in creating an RC, an immediately sinking it
> if we cannot verify the RC for a +1. We don't run into data loss issues
> anymore which is why I still think we can release 0.96 even without 9696
> and 9724. Nothing is preventing us to release 0.96.1, with this and more
> fixes in let's say a couple of weeks or months.
>
> I guess let's wait for tomorrow to see whether there is any progress on
> 9563 and 9696.

Yes. Let's take this up tomorrow. Elliott and I are on the master issue, HBASE-9563, this evening.

To me it feels like HBASE-9724 should go into 0.96. We're not releasing a new RC tonight; it seems really weird to hold up a bug fix to try and hit some unknown, and un-agreed upon, deadline.

On Wed, Oct 9, 2013 at 8:08 PM, Stack <[EMAIL PROTECTED]> wrote:


It looks like HBASE-9724 got committed. Was it the final patch for it? It's small and hopefully safe.

If a patch is large and risky, and the feature it fixes is semi-experimental, like HBASE-9696, IMHO it should not be a blocker for the release. The concern is that we keep making large changes to AM that fix some bugs but may introduce more bugs (like it happened with the last one), so it's hard to tell when it will stabilize at all.


Can we agree that if the IT tests are green for a certain number of runs in a row, then it's stable?

On Thu, Oct 10, 2013 at 10:08 AM, Sergey Shelukhin <[EMAIL PROTECTED]> wrote:


> We're not releasing a new rc tonight, it seems really weird to hold up a
> bug fix to try and hit some unknown, and un-agreed upon, deadline.

There is no deadline, but we cannot hold the RC for non-blocker patches, especially once the release candidate process has begun. We are not going to solve every bug in existence (look at HBASE-9721 for example) in HBase. In the usual case for a release, once an RC is cut, you do not want to destabilize by adding risky patches and continue this kind of cat-and-mouse game. Note that 9696 did not hold up the cut for the previous RC, so why should it hold it up now?

> To me it feels like HBASE-9724 should go into 0.96

I am fine with committing HBASE-9724. I did not commit that yesterday since I thought we should be doing an RC and I did not want last-minute fixes to AM.


> Can we agree if the IT tests are green for a certain number of runs in a
> row, then it's stable?

What do you mean by IT tests are green? Ours are mostly green lately
(except for recently fixed bugs).
Can you please share some investigation details? Maybe file bugs with
description of symptoms, like logs and stuff; are you sure you are hitting
9696 in particular?
9696 is a very big patch too, it can introduce more bugs and will require
more fixing.
We do need to have some deadline where large/risky changes cannot go imho.

> Can we agree if the IT tests are green for a certain number of runs in a
> row, then it's stable?
>
> On Thu, Oct 10, 2013 at 10:08 AM, Sergey Shelukhin <[EMAIL PROTECTED]> wrote:
>
>> It looks like HBASE-9724 got committed. Was it the final patch for it? It's
>> a small and hopefully safe.
>>
>> If patch is large and risky, and the feature it fixes is semi-experimental,
>> like HBASE-9696, IMHO it should not be blocker for the release.
>> The concern is that we keep making large changes to AM that fix some bugs
>> but may introduce more bugs (like it happened with the last one), so it's
>> hard to tell when it will stabilize at all.
>>
>> On Wed, Oct 9, 2013 at 9:22 PM, Elliott Clark <[EMAIL PROTECTED]> wrote:
>>
>>> To me it feels like HBASE-9724 should go into 0.96. We're not releasing a
>>> new rc tonight, it seems really weird to hold up a bug fix to try and hit
>>> some unknown, and un-agreed upon, deadline.
>>>
>>> On Wed, Oct 9, 2013 at 8:08 PM, Stack <[EMAIL PROTECTED]> wrote:
>>>
>>>> On Wed, Oct 9, 2013 at 7:14 PM, Enis Söztutar <[EMAIL PROTECTED]> wrote:
>>>>
>>>>>> Anyways, if you fellas can't wait anymore, just say and we'll figure
>>>>>> out something.
>>>>>
>>>>> As I see it, HBASE-9563 is committed,
>>>>
>>>> It is still open and committed with qualification "Stack:...Was going to
>>>> try this first but likely needs more..." and "Elliott: +1 I think it's
>>>> an improvement even if it doesn't 100% fix the master issue."
>>>>
>>>>> and HBASE-9696 is not a blocker
>>>>> against 0.96. But if you argue that 9696 is indeed a blocker, let's raise
>>>>> it as such.
>>>>
>>>> Agree.
>>>>
>>>>> There is no point in creating an RC, an immediately sinking it
>>>>> if we cannot verify the RC for a +1. We don't run into data loss issues
>>>>> anymore which is why I still think we can release 0.96 even without 9696
>>>>> and 9724. Nothing is preventing us to release 0.96.1, with this and more
>>>>> fixes in let's say a couple of weeks or months.
>>>>>
>>>>> I guess let's wait for tomorrow to see whether there is any progress on
>>>>> 9563 and 9696.
>>>>
>>>> Yes. Lets take this up tomorrow. Elliott and I are on the master issue,
>>>> HBASE-9563, this evening.
>>>>
>>>> Thanks Enis,
>>>> St.Ack
>>
>> --
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity to
>> which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the reader
>> of this message is not the intended recipient, you are hereby notified that
>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender immediately
>> and delete it from your system. Thank You.


>> Can we agree if the IT tests are green for a certain number of runs in a
>> row, then it's stable?
>
> What do you mean by IT tests are green? Ours are mostly green lately
> (except for recently fixed bugs).
> Can you please share some investigation details? Maybe file bugs with
> description of symptoms, like logs and stuff; are you sure you are hitting
> 9696 in particular?

We've been trying to keep up HBASE-9696 w/ ongoing notes. We should do
better for sure but big picture is that we have evidence that what is in
HBASE-9696 is an improvement over what we have now, having had two sustained
runs w/o data loss. The fix is needed so we can do long-running hbase-it
suites; w/o it we were just crash-landing a few hours in.

> 9696 is a very big patch too, it can introduce more bugs and will require
> more fixing.
> We do need to have some deadline where large/risky changes cannot go imho.

Agree but after reviews, I do not know how to avoid it (see 9696 and its RB)

I suggest we commit hbase-9696 as is since it is an incompatible change with
its introduction of two new states, states that we do not seem to be able
to do without. Then I cut an RC. If there are further issues in 9696, we can
fine tune/bug-fix post release.

On another note, a rig run that has been going for almost 24 hours has gone
further than any run of the last few weeks. That is good.

> On Thu, Oct 10, 2013 at 6:39 PM, Sergey Shelukhin <[EMAIL PROTECTED]> wrote:
>
>>> Can we agree if the IT tests are green for a certain number of runs in a
>>> row, then it's stable?
>>
>> What do you mean by IT tests are green? Ours are mostly green lately
>> (except for recently fixed bugs).
>> Can you please share some investigation details? Maybe file bugs with
>> description of symptoms, like logs and stuff; are you sure you are hitting
>> 9696 in particular?
>
> We've been trying to keep up HBASE-9696 w/ ongoing notes. We should do
> better for sure but big picture is that we have evidence that what is in
