On May 2, 2011, at 10:58 AM, Doug Cutting wrote:
> The patch selection process for this branch did not appear to be a
> community process. A massive patch set was committed en-masse with no
> public discussion before or after about its specific composition.

Let's review:

# You proposed to release off the Yahoo security patchset first in April, 2010: http://s.apache.org/5Gv
# I started this discussion again in Jan, 2011: http://s.apache.org/uf
# We went through several iterations:
 - I first committed a jumbo patch upon which some reservations were expressed.
 - Owen went ahead and broke it up to commit individual patches to incorporate the provided feedback.
# Roy clarified the way forward: http://s.apache.org/tD4 (which Owen has since incorporated by breaking into individual patches).

Your current stance, given the history, is surprising, to say the least... we have already discussed this. It is clear that the community (including downstream Apache projects like Pig, Hive and HCatalog) will substantially benefit from an Apache release of this improved codebase.

Roy Fielding wrote:
> a) break the changes down into a sequence of patches, create jira
> issues for each one (or append to the existing issue), and then
> provide the group with a list of the issue links so that people
> can quickly +1 each one. When it seems worthwhile to you, create
> a branch off of some prior Apache release point in svn and commit
> each patch to it until the branch is identical to (or, in your own
> opinion, better than) the source code that you have tested locally.
> Then RM a tarball and start a release vote. Since all of this is
> being done in jira and svn, others can help you do all but the
> first part (breaking down the big patch).
>
> or
>
> b) create a branch off of some prior Apache release point in svn
> and replay the internal Y! commits on that branch until the branch
> source code is identical to what you have tested locally. Then
> RM a tarball based on that branch and start a release vote.
> Since the history is now in svn, others could do the RM bit if
> you don't have time.
>
> or
>
> c) create a branch off of some prior Apache release point in svn
> and apply one big ugly patch to it. Then RM a tarball based
> on that branch and ask for a release vote.
>
> You will note that none of the above requires a discussion on this
> list prior to the release vote, though (a) would likely result in
> more +1s than (b), and (b) would likely receive more +1s than (c).
> Regardless, the release vote is a lazy majority decision.
>
> [ ... ]
>
> When the release vote happens, encourage folks to test and +1
> the release. If it passes, woohoo! If not, then listen to the
> reasons given by the other PMC members and see if you can make
> enough changes to the release to get those extra +1s.

I believe that Owen chose (b). We're now at the release vote and I am a PMC member giving reasons for my vote.

Also note that, on the common-dev thread, Eli & Tom have both noted a number of inconsistencies between this set of patches and trunk, 0.22 and even prior 0.20 branches and releases. In addition to the lack of community involvement in patch selection, these issues concern me.

I cannot in good conscience vote for this release as a community product.

On May 2, 2011, at 1:40 PM, Doug Cutting wrote:
>
> Also note that, on the common-dev thread, Eli & Tom have both noted a
> number of inconsistencies between this set of patches and trunk, 0.22
> and even prior 0.20 branches and releases. In addition to the lack of
> community involvement in patch selection, these issues concern me.
>
> I cannot in good conscience vote for this release as a community
> product.

As I noted before, you were the first one to propose this release off the Yahoo security patch-set in April, 2010: http://s.apache.org/5Gv

What has changed since? Clearly, the same situation exists today.

Also, please note that of the ~450 commits in the branch, only 30-odd jiras are yet to be committed to trunk: http://s.apache.org/7Pe. So it's incorrect to state 'lack of community involvement'.

Assuming the technical inconsistencies are sorted out, are you willing to withdraw your objection?

On 05/02/2011 02:05 PM, Arun C Murthy wrote:
> As I noted before you were the first one to propose this release off
> Yahoo security patch-set in April, 2010:
> http://s.apache.org/5Gv
>
> What has changed since? Clearly, the same situation exists today.

I have absolutely no objection in principle to an Apache 0.20 release including security. I object to the fact that this patchset started from an arbitrary point and unilaterally applied a large set of patches that are not well correlated with Jira, trunk or other 0.20 branches.

> Also, please note that of the ~450 commits in the branch, only 30 odd
> jiras are yet to be committed to trunk:
> http://s.apache.org/7Pe. So it's incorrect to state 'lack of community
> involvement'.

This should be easily discoverable from Jira: issues should use the "fix-for" field to indicate which branches they've been merged to. This standard practice has not been observed for over 400 patches included in this release candidate.

> Assuming the technical inconsistencies are sorted out, are you willing
> to withdraw your objection?

These are not just technical concerns. How I vote on any future release candidate will in part depend on how the community is involved in its production.

> On 05/02/2011 02:05 PM, Arun C Murthy wrote:
>> As I noted before you were the first one to propose this release off
>> Yahoo security patch-set in April, 2010:
>> http://s.apache.org/5Gv
>>
>> What has changed since? Clearly, the same situation exists today.
>
> I have absolutely no objection in principle to an Apache 0.20 release
> including security. I object to the fact that this patchset started
> from an arbitrary point and unilaterally applied a large set of patches
> that are not well correlated with Jira, trunk or other 0.20 branches.

Completely untrue.

This patchset started from 0.20.1 and is a complete superset of 0.20.1.

We will work towards ensuring it is a complete superset of the last stable release: 0.20.2.

>> Also, please note that of the ~450 commits in the branch, only 30 odd
>> jiras are yet to be committed to trunk:
>> http://s.apache.org/7Pe. So it's incorrect to state 'lack of community
>> involvement'.
>
> This should be easily discoverable from Jira: issues should use the
> "fix-for" field to indicate which branches they've been merged to. This
> standard practice has not been observed for over 400 patches included in
> this release candidate.

This seems like a parliamentary stalling procedure... sure, they don't have 'fix-for' fields, but they've been verified by external committers:

Are you simply asking for someone to go through the 450 odd jiras and set 'fix-for' fields?

>> Assuming the technical inconsistencies are sorted out, are you willing
>> to withdraw your objection?
>
> These are not just technical concerns. How I vote on any future release
> candidate will in part depend on how the community is involved in its
> production.

I understand they aren't technical concerns.

I asked if you were willing to withdraw your objection if the technical concerns are satisfied. I think you answered my question - you will not withdraw your objection even if it's a technical issue.

On 05/02/2011 02:33 PM, Arun C Murthy wrote:
> On May 2, 2011, at 2:21 PM, Doug Cutting wrote:
>> I have absolutely no objection in principle to an Apache 0.20 release
>> including security. I object to the fact that this patchset started
>> from an arbitrary point and unilaterally applied a large set of patches
>> that are not well correlated with Jira, trunk or other 0.20 branches.
>
> Completely untrue.

'Completely'? Really? Not a true bit in there? Wow!

> This patchset started from 0.20.1 and is a complete superset of 0.20.1.

0.20.1 isn't a branch, it's a tag. The 0.20 branch includes many post-0.20.1 patches that are not in this candidate. Releases in a series normally share a branch.

> I asked if you were willing to withdraw your objection if the technical
> concerns are satisfied. I think you answered my question - you will not
> withdraw your objection even if it's a technical issue.

That is not what I said. If this release does not get enough votes then perhaps another 0.20.203 release candidate will be proposed. Its process and contents will be different and I will judge it on the basis of those when I vote.

> On May 3, 2011, at 7:33 AM, Arun C Murthy wrote:
>
>> This patchset started from 0.20.1 and is a complete superset of 0.20.1.
>>
>> We will work towards ensuring it is a complete superset of the last
>> stable release: 0.20.2.
>
> so are you intending to make it a superset for 203? or for a future
> release?

On 05/02/2011 02:33 PM, Arun C Murthy wrote:
> Are you simply asking for someone to go through the 450 odd jiras and
> set 'fix-for' fields?

Every other release we've made is well-correlated with Jira. It should not be difficult to achieve that for this one. We could write a script to take all 450 bug IDs from the change log and use Jira's command-line tool to set the "fix-for" to be this 0.20+security release. Would you like help with that?

On 05/03/2011 06:01 PM, Arun C Murthy wrote:
> On May 3, 2011, at 5:17 PM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:
>
>> On 05/02/2011 02:33 PM, Arun C Murthy wrote:
>>> Are you simply asking for someone to go through the 450 odd jiras and
>>> set 'fix-for' fields?
>>
>> Every other release we've made is well-correlated with Jira. It should
>> not be difficult to achieve that for this one. We could write a script
>> to take all 450 bug IDs from the change log and use Jira's command-line
>> tool to set the "fix-for" to be this 0.20+security release. Would you
>> like help with that?
>
> Yes please, that would be great. Thanks!

Please find below a script that will add a fix-version to issues.

Doug

#!/bin/bash

# reads bug ids from standard input
# and adds the fixVersion named on command line
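The shape of such a script can be illustrated with a dry-run sketch. This is only an illustration under stated assumptions, not the script that was posted: `update_issue` and `add_fix_version` are hypothetical names, and `update_issue` merely prints what would be changed, since the actual Jira command-line invocation is not shown in this thread.

```shell
#!/bin/bash
# Dry-run sketch: reads bug ids from standard input and reports the update
# that would be made for the fixVersion named as the first argument.

# Hypothetical placeholder, NOT a real Jira command. Replace the echo with
# the Jira command-line call used at your site.
update_issue() {
  echo "would set fixVersion=$2 on $1"
}

add_fix_version() {
  version=$1
  while read -r id; do
    case "$id" in
      # Accept only lines that start with a known project key and a digit.
      HADOOP-[0-9]*|HDFS-[0-9]*|MAPREDUCE-[0-9]*)
        update_issue "$id" "$version" ;;
      # Skip blank lines and anything else.
      *) ;;
    esac
  done
}
```

With the ids saved one per line in a file (the file name here is an assumption), it would be run as `add_fix_version 0.20.203.0 < bug-ids.txt`.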

Most points in this thread are valid, having to do with the process of how the contribution was assembled; and specific technical aspects of it, e.g. JIRAs missing from branch 0.20.203 relative to branch 0.20. However,

> From: Doug Cutting <[EMAIL PROTECTED]>
> > Assuming the technical inconsistencies are sorted out,
> > are you willing to withdraw your objection?
>
> These are not just technical concerns. How I vote on any future
> release candidate will in part depend on how the community is
> involved in its production.

What strikes me, as an observer to this discussion, is that here "community" does not seem equated with Yahoo by implication. Perhaps I misread. Nevertheless, Yahoo retains a good percentage of active Core developers with standing as both committers and high scale users, and these people produced the contribution that is branch 0.20.203, and therefore by definition "the community" was entirely involved in its production.

Yahoo should be commended for advancing the state of branch 0.20 with an obvious commitment to donating the results to Apache. As a community we are lucky to have a strong contributor. Their security enhancements allow us and many others the option of strong authentication and user isolation for multitenant deployments.

A commercial vendor's product already incorporates Yahoo's donated security enhancements. It would be regrettable if nontechnical factors ultimately prevent Apache from incorporating the value of these contributions into an official release.

Some technical concerns seem reasonable. Regarding that:

> From: Stack <[EMAIL PROTECTED]>
> How hard would it be to get the patches Tom lists below into
> branch-0.20-security-203? I'd think it'd be an easier
> sell if it were a superset of all in 0.20, especially since it
> bears its name.

This suggestion makes a lot of sense. In addition, filing JIRAs for and posting the diffs of the remaining differences could help the process as well, and would be good faith actions of an active contributor.

> It would be regrettable if
> nontechnical factors ultimately prevent Apache from
> incorporating the value of these contributions into an
> official release.

To:

It would be regrettable if nontechnical factors ultimately prevent Apache from incorporating the value of these contributions into an official release OF 0.20. There are some not yet ready to take the leap to 0.22, who do not consider it proven.

So in this regard I do not wish to minimize concerns about distracting from the success of 0.22 or later releases.

Best regards,

- Andy

On May 2, 2011, at 3:05 PM, Andrew Purtell wrote:
> Some technical concerns seem reasonable. Regarding that:
>
>> From: Stack <[EMAIL PROTECTED]>
>> How hard would it be to get the patches Tom lists below into
>> branch-0.20-security-203? I'd think it'd be an easier
>> sell if it were a superset of all in 0.20, especially since it
>> bears its name.
>
> This suggestion makes a lot of sense. In addition, filing JIRAs for
> and posting the diffs of the remaining differences could help the
> process as well, and would be good faith actions of an active
> contributor.

Agreed, I'm starting the effort to ensure the differences from 0.20.2 are resolved.

On Mon, May 2, 2011 at 3:15 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
>
> On May 2, 2011, at 3:05 PM, Andrew Purtell wrote:
>
>> Some technical concerns seem reasonable. Regarding that:
>>
>>> From: Stack <[EMAIL PROTECTED]>
>>> How hard would it be to get the patches Tom lists below into
>>> branch-0.20-security-203? I'd think it'd be an easier
>>> sell if it were a superset of all in 0.20, especially since it
>>> bears its name.
>>
>> This suggestion makes a lot of sense. In addition, filing JIRAs for and
>> posting the diffs of the remaining differences could help the process as
>> well, and would be good faith actions of an active contributor.
>
> Agreed, I'm starting the effort to ensure the differences from 0.20.2 are
> resolved.
>
> From my msg on this thread to common-dev@:
>
>> # Remaining for 0.20.203
>> * HADOOP-5611
>> * HADOOP-5612
>> * HADOOP-5623
>> * HDFS-596
>> * HDFS-723
>> * HDFS-732
>> * HDFS-579
>> * MAPREDUCE-1070
>> * HADOOP-6315
>> * MAPREDUCE-1163
>
> Suresh has kindly agreed to help me, appreciate help from others -
> particularly on the 0.20.3 changes.

On 05/02/2011 03:05 PM, Andrew Purtell wrote:
> What strikes me, as an observer to this discussion, is that here
> "community" does not seem equated with Yahoo by implication. Perhaps
> I misread. Nevertheless, Yahoo retains a good percentage of active
> Core developers with standing as both committers and high scale
> users, and these people produced the contribution that is branch
> 0.20.203, and therefore by definition "the community" was entirely
> involved in its production.

Whether or not a subset of contributors acts as the community depends on whether others outside that subset have a reasonable opportunity to become involved. Until this release vote was called it wasn't entirely clear to all what was happening with these branches. Wider community involvement is now starting as folks work to rationalize this 450-issue patch with respect to past and future releases, Jira, etc.

> Yahoo should be commended for advancing the state of branch 0.20 with
> an obvious commitment to donating the results to Apache. As a
> community we are lucky to have a strong contributor. Their security
> enhancements allow us and many others the option of strong
> authentication and user isolation for multitenant deployments.

+1

> A commercial vendor's product already incorporates Yahoo's donated
> security enhancements. It would be regrettable if nontechnical
> factors ultimately prevent Apache from incorporating the value of
> these contributions into an official release.

I think it's a good idea to release hadoop-0.20.203. It moves Apache Hadoop a step forward.

Looks like the technical difficulties are resolved now with Arun's latest commits.
Being a superset of hadoop-0.20.2, it can be considered based on one of the official Apache releases.
I don't think there was a lack of discussions on the lists about the issues included in the release candidate. Todd did a thorough review of the entire security branch. Many developers participated in discussions.
Agreeing with Stack, I wish HBase was considered a primary target for Hadoop support. But it is not realistic to have it in hadoop-0.20.203.
I have some experience running a version of this release candidate on a large cluster. It works. I would add a couple of patches, which make it run on Windows for me, like HADOOP-7110, HADOOP-7126. But those are not blockers.

> On May 3, 2011, at 9:58 AM, Arun C Murthy wrote:
>
> >> Owen, Suresh and I have committed everything on this list except
> >> HADOOP-6386 and HADOOP-6428. Not sure which of the two are relevant/
> >> necessary, I'll check with Cos. Other than that hadoop-0.20.203 now a
> >> superset of hadoop-0.20.2.
> >
> > Missed adding HADOOP-5759 to that list, I'll check with Amareshwari
> > before committing.
> >
> > Arun
>
> Thanks for doing this so fast Arun.

I think we still need to incorporate the patches currently checked into branch 0.20. For example, Owen identified a major bug (BooleanWritable's comparator is broken) and filed a jira (HADOOP-6928) to put it in branch-0.20, where I reviewed it and checked it in, so this bug would be fixed in the next stable release. However this change is not in branch-0.20-security-203. Unless we put the delta from branch-0.20 into this release, it is missing important bug fixes that will cause it to regress against 20.3 (if it ever is released).

I am also nervous about changes like the one identified by HADOOP-7255. It looks like this change caused a significant regression in TestDFSIO throughput. It changes the core Task class, the commit log is a single line, and as far as I can tell it was not discussed or reviewed by anyone in the community. Don't changes like this at least deserve a jira before we release them?

Thanks,
Eli

On Tue, May 3, 2011 at 1:39 AM, Konstantin Shvachko <[EMAIL PROTECTED]> wrote:
> I think its a good idea to release hadoop-0.20.203. It moves Apache Hadoop a
> step forward.
>
> Looks like the technical difficulties are resolved now with latest Arun's
> commits.
> Being a superset of hadoop-0.20.2 it can be considered based on one of the
> official Apache releases.
> I don't think there was a lack of discussions on the lists about the issues
> included in the release candidate. Todd did a thorough review of the entire
> security branch. Many developers participated in discussions.
> Agreeing with Stack I wish HBase was considered a primary target for Hadoop
> support. But it is not realistic to have it in hadoop-0.20.203.
> I have some experience running a version of this release candidate on a
> large cluster. It works. I would add a couple of patches, which make it run
> on Windows for me like HADOOP-7110, HADOOP-7126. But those are not blockers.
>
> Thanks,
> --Konstantin
>
> On Mon, May 2, 2011 at 5:12 PM, Ian Holsman <[EMAIL PROTECTED]> wrote:
>
>> On May 3, 2011, at 9:58 AM, Arun C Murthy wrote:
>>
>> >> Owen, Suresh and I have committed everything on this list except
>> >> HADOOP-6386 and HADOOP-6428. Not sure which of the two are relevant/
>> >> necessary, I'll check with Cos. Other than that hadoop-0.20.203 now a
>> >> superset of hadoop-0.20.2.
>> >
>> > Missed adding HADOOP-5759 to that list, I'll check with Amareshwari
>> > before committing.
>> >
>> > Arun
>>
>> Thanks for doing this so fast Arun.

Just to gauge what amount of stuff is in branch-0.20-security-203 I wrote a quick script which does a comparison based on JIRAs mentioned in the commit log. It output the following list of JIRAs that are in the branch but not committed to trunk. I've marked many as N/A meaning that they don't apply to trunk:
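A comparison of this kind can be sketched in a few lines of shell. This is a minimal sketch under stated assumptions, not the script that produced the list: the helper names `jira_ids` and `only_in_first` are hypothetical, the inputs are assumed to be plain `git log` output saved to two files, and an issue is treated as "in" a branch whenever its key appears anywhere in that branch's log.

```shell
#!/bin/bash
# Hypothetical helper: print the unique JIRA keys mentioned in the text on
# stdin, normalising the occasional "HADOOP:1234" spelling to "HADOOP-1234".
jira_ids() {
  grep -oE '(MAPREDUCE|HDFS|HADOOP)[-:][0-9]+' | tr ':' '-' | sort -u
}

# Hypothetical helper: keys mentioned in the first log file but not in the
# second. Whole-line fixed-string matching keeps HADOOP-1 from hiding
# HADOOP-12.
only_in_first() {
  second_ids=$(mktemp)
  jira_ids < "$2" > "$second_ids"
  jira_ids < "$1" | grep -vxFf "$second_ids"
  rm -f "$second_ids"
}
```

For example, after saving `git log origin/branch-0.20-security-203` to `branch.log` and the trunk log to `trunk.log` (file names assumed), `only_in_first branch.log trunk.log` lists the candidates. A key appearing in a log does not prove the same change was applied, so the output is a starting point for manual review rather than a verdict.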

Certainly some of these are new test cases, benchmark improvements, or system tests. But many others are large new features (e.g. new metrics framework, separate JobHistory service). Others also introduce new configurations (e.g. new JT based limits). In the list above there are 58 that seem to be applicable, probably at least half of which are non-test code.

This list above doesn't include 192 patches that were committed to the branch without reference to any JIRA in the commit message:

todd@todd-w510:~/git/hadoop-common$ for x in $(git rev-list origin/branch-0.20..origin/branch-0.20-security-203 -- src) ; do git log -n1 $x | egrep -q '(MAPREDUCE|HDFS|HADOOP)[-:][0-9]+' || echo $x ; done | wc -l
192

Browsing through these, many have already been forward ported, or at least had corresponding JIRAs opened. But it's very difficult to match them up and evaluate which ones have been committed. Eli pointed out one earlier this week that was done by a non-committer with no public review that introduced an apparent performance regression; it's difficult to know whether there might be others as well.
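To list the offending commits rather than only count them, the same filter can be applied to one-line-per-commit log output. The following is a sketch, assuming input of the form produced by `git log --format='%H %s'`; the function name `commits_without_jira` is hypothetical. Unlike the loop above, it inspects only the subject line of each commit, so its count may differ.

```shell
#!/bin/bash
# Hypothetical helper: read "<hash> <subject>" lines on stdin and print the
# hashes of commits whose subject mentions no JIRA key.
commits_without_jira() {
  grep -vE '(MAPREDUCE|HDFS|HADOOP)[-:][0-9]+' | cut -d' ' -f1
}
```

It could be fed with `git log --format='%H %s' origin/branch-0.20..origin/branch-0.20-security-203 -- src | commits_without_jira`, and piping the result to `wc -l` gives a count.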

Rather than being a "maintenance release" (as is usually expected when incrementing the third component of a version string) this is essentially a separate trunk off of 0.20. I agree that the advancements in this branch are many, and are a great set of contributions for the community. User limits and security are two such that have been cited in this thread; unfortunately the new improvements in limits haven't been committed to trunk, and the security in trunk has a known root exploit. Do users really want to see us putting these things in 20 without making sure they'll also show up in future releases?

Looking at recent history of 204 it seems some more patches have gone in there before going into trunk as well - for example MR-2429. Arun and Sid are working on forward-porting it, and it's obviously not due to any kind of bad intent that it was missed, but it underscores the dangers of having essentially two trunks in ASF. I completely agree that there should be long-term maintenance branches at the ASF, but we need to establish a clear process to make sure that "maintenance" doesn't diverge into something else.

Here are two requests that others have made but I haven't seen an answer to yet:

- Document the criteria by which developers can judge whether an improvement should be included in branch-0.20-security. The inclusion criteria for the branch as it stands are not clear -- given this branch's lineage, it clearly used to be "things important for the Yahoo clusters", but that doesn't seem like a reasonable community criterion. Up until now in Hadoop's history, the criterion has always been "compatible bug fixes only", which doesn't describe this branch either.

- Clearly establish the process that all patches must either be committed to trunk first (and then backported), or include a comment on the JIRA explaining why this is not necessary. Additionally we should decide whether patches must be backported "through" 0.22 or if they may skip back from trunk to 20-security. (I'm assuming 21 is dead here)

Perhaps this could go on a wiki page (or web site page) regarding the currently active branches?

I am in favor of releasing hadoop-0.20.203.
And we run a version of this release on a large cluster at eBay. I know it works.
I understand the controversy behind it. I regret it hasn't been developed in a true community way.
I think it nevertheless adds value to Apache Hadoop.
Let's just make sure it passes the tests.

Congrats and thanks to all the developers for the passion and hard work they put into a very important project.

+1 for getting a new 0.20 release out with security. It is great news for users that a new Apache release is finally coming out. I hope most of the technical as well as non-technical issues will be resolved in this or the next release.

There are many important issues raised in this thread and these are crucial for the future of Apache Hadoop. I wanted to echo one of them in particular: community is the most important aspect of the project and triumphs over the rest.
I am sure the process will be much smoother going forward as we have more frequent releases. This is probably the first real test for the develop-a-large-feature-on-a-branch-and-merge process. Discussion here would certainly lead to important improvements to the process.

Looking at it from a different angle, Hadoop has a very enviable problem: there is so much development that it is very hard to co-ordinate and scale. It has already scaled up a few times before, and with the leaders it has, it is doing it again.

Here's an updated release candidate for 0.20.203.0. I've incorporated the feedback and included all of the patches from 0.20.2, which is the last stable release. I also fixed the eclipse-plugin problem.

On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
> Here's an updated release candidate for 0.20.203.0. I've incorporated the feedback and included all of the patches from 0.20.2, which is the last stable release. I also fixed the eclipse-plugin problem.
>
> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/
>
> Please download it, inspect it, compile it, and test it. Clearly, I'm +1.
>
> -- Owen

Hey Owen,

Thanks for incorporating all the feedback and additional changes. It's great that this release won't be a regression against our previous stable release.

I would like to call out that we are not just voting to adopt a particular release: we are starting a new version scheme for the project, doing new feature development on maintenance release branches (before trunk), and saying it's OK to release software that hasn't been reviewed by the community.

I'd like to hear from our development community not just that we want to do a release from this branch but that we want to adopt these other changes as well. Here's a summary of the major *remaining* issues and a recommendation on how to proceed:

1. There are about 50 changes that have jiras that are committed to the branch that are not yet in trunk. The next release (0.22) will be a regression against this release with respect to these particular changes. Recommendation: we should get these changes in trunk before releasing so that new features do not show up in maintenance branches first.

2. There are 192 patches that were committed to the branch without reference to any Jira in the commit message. Some of these may have already been forward ported, but it is very difficult to match them up and evaluate which ones have been committed. Some are troublesome: when spot checking the commits I found some that were done by non-committers with no public review and that introduced an apparent performance regression (e.g. see HADOOP-7255). Recommendation: we should update the commit log to make sure there is a jira for each issue, and all changes have been reviewed/committed. This is the way we've always done releases.

3. In the new versioning scheme, major.minor.point.X, the new "X" component allows for new feature development on point releases. Recommendation: we should discuss in a separate thread whether we want to do new feature development on maintenance branches and, if so, whether to adopt this new version scheme.
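The audit described in items 1 and 2 above (finding branch commits that name no Jira, and Jira keys on the branch that never appear in trunk) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the set of project prefixes, and the sample log lines are hypothetical and are not taken from the actual branch history or from any tooling used on this thread.

```python
import re

# Assumed set of project prefixes; abbreviated forms such as "MR-2411"
# would not be matched and would need manual review.
JIRA_KEY = re.compile(r"\b(?:HADOOP|HDFS|MAPREDUCE)-\d+\b")


def jira_keys(message):
    """Return the set of Jira keys mentioned in one commit message."""
    return set(JIRA_KEY.findall(message))


def audit(branch_messages, trunk_messages):
    """Return (branch commits with no Jira key, branch keys never seen in trunk)."""
    trunk_keys = set()
    for msg in trunk_messages:
        trunk_keys |= jira_keys(msg)

    no_jira = []        # commit messages with no Jira reference at all
    branch_keys = set()
    for msg in branch_messages:
        keys = jira_keys(msg)
        if keys:
            branch_keys |= keys
        else:
            no_jira.append(msg)

    # A key counts as "in trunk" if any trunk commit message mentions it,
    # including mentions such as "includes HADOOP-9002".
    not_in_trunk = sorted(branch_keys - trunk_keys)
    return no_jira, not_in_trunk


if __name__ == "__main__":
    # Hypothetical log lines with made-up issue numbers.
    branch = [
        "MAPREDUCE-9001. Example feature committed on the branch only.",
        "HADOOP-9002. Example fix later folded into another issue.",
        "Example change committed without any Jira reference.",
    ]
    trunk = ["HADOOP-9003. Example trunk commit (includes HADOOP-9002)."]
    no_jira, not_in_trunk = audit(branch, trunk)
    print("No Jira in message:", no_jira)
    print("Not found in trunk:", not_in_trunk)
```

A text match like this can only produce a candidate list. A change that reached trunk under a different issue number would still be reported as missing, so each entry needs a manual check.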

This is a release vote, let's stay focused. On this thread I think appropriate responses are either

+1 and some short commentary (assuming you've tried it and it works)

or

-1 and some short commentary. It would also be cool if you noted if you've tried it.

----

In the spirit of my feedback, I'll respond to this under another subject.

Thanks,

E14

On May 4, 2011, at 12:17 PM, Eli Collins wrote:

> [ ... ]

> Here's an updated release candidate for 0.20.203.0. I've incorporated the feedback and included all of the patches from 0.20.2, which is the last stable release. I also fixed the eclipse-plugin problem.
>
> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/
>
> Please download it, inspect it, compile it, and test it. Clearly, I'm +1.

Am I misreading this, or are the MR protocols out of sync between 0.20.203 and 0.21? It would also appear that this is marked stable in 0.21. What is the user impact?

> Am I misreading this, or are the MR protocols out of sync between 0.20.203 and 0.21? It would also appear that this is marked stable in 0.21. What is the user impact?

The names of the protocols were changed, but the names of the protocols aren't user-facing. The protocols themselves also changed, as with all Hadoop major versions. (We need to switch to protobuf or something for RPC to provide wire compatibility.)

On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
> Here's an updated release candidate for 0.20.203.0. I've incorporated the feedback and included all of the patches from 0.20.2, which is the last stable release. I also fixed the eclipse-plugin problem.
>
> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/
>
> Please download it, inspect it, compile it, and test it. Clearly, I'm +1.
>
> -- Owen

While rc2 is an improvement on rc1, I am -1 on this particular rc. Rationale:

This rc contains many patches not yet committed to trunk. This would cause the next major release (0.22) to be a feature regression against our latest stable release (203), were 0.22 released soon.

This rc contains many patches not yet reviewed by the community via the normal process (jira, patch against trunk, merge to a release branch). I think we should respect the existing community process that has been used for all previous releases.

This rc introduces a new development and branching model (new feature development outside trunk) and a new Hadoop versioning scheme without sufficient discussion or proposal of these changes with the community.

We should establish a new process before the release; a release is not the appropriate mechanism for changing our review and development process or versioning.

I do support a release from branch-0.20-security that follows the existing, established community process.

With my Cloudera hat on..

When we went through the 10x and 20x patches we only pulled a subset of them, primarily for security and the general improvements that we thought were good. We found both incompatible changes and some sketchy changes that we did not pull in from a quality perspective. There is a big difference between a patch set that's acceptable for Yahoo!'s user base and one that's a more general artifact.

When we evaluated the YDH patch sets we were using that frame of mind. I'm now looking at it in terms of an Apache release, and the place to review changes for an Apache release is on jira.

CDH3 is based on the latest stable Apache release (20.2) so it doesn't regress against it. I'm nervous about rebasing future releases on 203 because of the compatibility and quality implications.

Thanks,
Eli

On Wed, May 4, 2011 at 3:06 PM, Suresh Srinivas <[EMAIL PROTECTED]> wrote:
> Eli,
>
> How many of these patches that you find troublesome are in CDH already?
>
> Regards,
> Suresh
>
> On 5/4/11 3:03 PM, "Eli Collins" <[EMAIL PROTECTED]> wrote:
>> [ ... ]

@Eli

>> This rc contains many patches not yet committed to trunk.

If you've compiled this list, can you post it?

On Wed, May 4, 2011 at 3:24 PM, Eli Collins <[EMAIL PROTECTED]> wrote:
> With my Cloudera hat on..
>
> [ ... ]

The list seems highly inaccurate. Checked the first few N/A items. All are false positives.

< HADOOP-6304 N/A -- fixed in trunk via HADOOP-7110 (Todd, it was fixed by you. Forgot?)
< HADOOP-6598 N/A -- moved to HADOOP-6763 and committed to trunk
< HADOOP-6653 N/A -- not applicable in trunk
< HADOOP-6716 N/A -- as part of HADOOP-6815 which was committed to trunk
< HADOOP-6718 N/A -- Incorporated in HADOOP-6706 for 0.22.
< HADOOP-6776 N/A -- Tom White said "This is fixed in trunk, so can be closed."

On Wed, May 4, 2011 at 4:11 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
> On May 4, 2011, at 4:09 PM, Tsz Wo (Nicholas), Sze wrote:
>> The list seems highly inaccurate. Checked the first few N/A items. All
>> are false positives.
>
> Also, can you please provide a list on features which are not related to
> gridmix benchmarks or herriot tests?

Here are a few I quickly pulled up:
MAPREDUCE-2316 (docs for improved capacity scheduler)
MAPREDUCE-2355 (adds new config for heartbeat dampening in MR)

" BZ-4182948. Add statistics logging to Fred for better visibility into startup time costs. (Matt Foley)"
- I believe I saw a note from Matt on the JIRA yesterday about this feature, where he decided that the version done in 203 wasn't a good approach, and it's done differently in trunk (not sure if done yet).

MAPREDUCE-2364 (important bug fix for localization)
- in fact most of localization is different in this branch compared to trunk due to inclusion of MAPREDUCE-2378, the trunk version of which is still on the "yahoo-merge" branch.

"New cunters for FileInput/OutputFormat. New Counter MAP_OUTPUT_MATERIALZIED_BYTES. Related bugs: 4241034, 3418543, 4217546"
- not sure which JIRA this is, I think I've seen a JIRA for trunk, but not committed.

- MAPREDUCE-1904, committed without JIRA as:
" . Reducing new Path(), RawFileStatus() creation overhead in LocalDirAllocator"
not in trunk

+ BZ4101537 . When a queue is built without any access rights we explain the
+ problem. (dking, rvw ramach) [attachment of 2010-11-24]
seems to be on trunk as MR-2411, but not committed, best I can tell, despite the JIRA there being resolved (based on looking at QueueManager in trunk)

" . Remove unnecessary reference to user configuration from TaskDistributedCacheManager causing memory leaks"
Not in trunk, not sure which JIRA it might be.. probably part of 2178.

Major new feature: MAPREDUCE-323 - very large rework of how job history files are managed
Major change: MAPREDUCE-1100/MAPREDUCE-1176: unresolved on trunk, though probably will be attacked by different JIRAs
Major new ops-visible feature: "metrics2" system
Major new ops-visible feature: MAPREDUCE-291 job history can be viewed from a separate server
Major new set of user-visible configurations: MAPREDUCE-1943 and friends which implement new limits in MapReduce (e.g. MAPREDUCE-1872 as well)

I have code to work on, so I won't keep going, but this is from looking at the last couple months of 203.

Let's stay focused. Let's take the other threads onto other threads. This is a vote.

To the extent naming is a problem, let's take that to a thread and find an acceptable proposal.

To the extent folks want to collaborate on certifying the release for total lack of regression or collaborate on the cleanest possible merge, I think all interested parties should take these topics to another thread and divide up the work.

If you've voted, you don't need to comment further on this thread, no matter what company you work for!

Thanks,

---
E14 - typing on glass

On May 4, 2011, at 4:46 PM, "Todd Lipcon" <[EMAIL PROTECTED]> wrote:

> On Wed, May 4, 2011 at 4:11 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
>
>> On May 4, 2011, at 4:09 PM, Tsz Wo (Nicholas), Sze wrote:
>>
>>> The list seems highly inaccurate. Checked the first few N/A items. All
>>> are false positives.
>>
>> Also, can you please provide a list on features which are not related to
>> gridmix benchmarks or herriot tests?
>
> Here are a few I quickly pulled up:
> MAPREDUCE-2316 (docs for improved capacity scheduler)
> MAPREDUCE-2355 (adds new config for heartbeat dampening in MR)
>
> " BZ-4182948. Add statistics logging to Fred for better visibility into
> startup time costs. (Matt Foley)"
> - I believe I saw a note from Matt on the JIRA yesterday about this feature,
> where he decided that the version done in 203 wasn't a good approach, and
> it's done differently in trunk (not sure if done yet).
>
> MAPREDUCE-2364 (important bug fix for localization)
> - in fact most of localization is different in this branch compared to trunk
> due to inclusion of MAPREDUCE-2378, the trunk version of which is still on
> the "yahoo-merge" branch,.
>
> "New cunters for FileInput/OutputFormat. New Counter
> MAP_OUTPUT_MATERIALZIED_BYTES. Related bugs: 4241034, 3418543,
> 4217546"
> - not sure which JIRA this is, I think I've seen a JIRA for trunk, but not
> committed.
>
> - MAPREDUCE-1904, committed without JIRA as:
> " . Reducing new Path(), RawFileStatus() creation overhead in
> LocalDirAllocator"
> not in trunk
>
> + BZ4101537 . When a queue is built without any access rights we explain
> the
> + problem. (dking, rvw ramach) [attachment of 2010-11-24]
> seems to be on trunk as MR-2411, but not committed, best I can tell, despite
> the JIRA there being resolved (based on looking at QueueManager in trunk)
>
> " . Remove unnecessary reference to user configuration from
> TaskDistributedCacheManager causing memory leaks"
> Not in trunk, not sure which JIRA it might be.. probably part of 2178.
>
> Major new feature: MAPREDUCE-323 - very large rework of how job history
> files are managed
> Major change: MAPREDUCE-1100/MAPREDUCE-1176: unresolved on trunk, though
> probably will be attacked by different JIRAs
> Major new ops-visible feature: "metrics2" system
> Major new ops-visible feature: MAPREDUCE-291 job history can be viewed from
> a separate server
> Major new set of user-visible configurations: MAPREDUCE-1943 and friends
> which implement new limits in MapReduce (eg MAPREDUCE-1872 as well)
>
> I have code to work on, so I won't keep going, but this is from looking at
> the last couple months of 203.
>
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera

Good suggestion. It would be helpful to hash out the issues around compatibility, feature branches, version numbers, and how to contribute at Apache before putting up new votes, i.e. the vote would go much smoother if all the issues with the previous vote were addressed before starting a new one.

Thanks,
Eli

On Wed, May 4, 2011 at 5:05 PM, Eric Baldeschwieler <[EMAIL PROTECTED]> wrote:
> [ ... ]

Eli,

I think the intent from the email was to just vote on this thread, which I agree with. Discussions should be done in separate threads. Hopefully we can all stick to just voting!

thanks
mahadev

On Wed, May 4, 2011 at 5:22 PM, Eli Collins <[EMAIL PROTECTED]> wrote:
> [ ... ]

The point is that these discussions should be sorted out first, i.e. you don't change your development and release model on a release VOTE thread, you change it on a DISCUSSION thread.

I.e. before we release this we should understand what that means. What is being proposed is not just another release from branch-0.20 or branch-0.22.

Thanks,
Eli

On Wed, May 4, 2011 at 5:30 PM, Mahadev Konar <[EMAIL PROTECTED]> wrote:
> [ ... ]

I tend to agree. Changing the release model of the Apache Hadoop train isn't something that should be done in haste or as a part of release voting.

If these questions aren't addressed, let's postpone the vote and discuss all the complications and implications until they are sorted out or a consensus/compromise is reached.

Cos

On Wed, May 4, 2011 at 17:39, Eli Collins <[EMAIL PROTECTED]> wrote:
> [ ... ]


On Wed, May 4, 2011 at 6:18 PM, Eric Baldeschwieler <[EMAIL PROTECTED]> wrote:
> Ok. I'll bite.
>
> The point of a vote is to learn what everyone thinks. So far we have learned:
>
> 1 - the team that is trying to contribute code and do a release thinks it is ready.
>
> 2 - Cloudera does not think the release is a good idea.
>

I don't think that's true. There's a difference between not supporting a given rc and not supporting a release from this branch in general.

With both of my hats on, I want code to be reviewed before being released, I want releases not to regress against previous releases, I don't want the next major release to regress against a stable release, and I want the community to discuss new version schemes and development models rather than adopting them by accident just because we voted on a particular release.

> The point is that these discussion should be sorted out, ie you don't
> change your development and release model on a release VOTE thread,
> you change it on a DISCUSSION thread.

That is no different than saying you have a right to veto a release until the issue is addressed, which you don't have.

A release vote is a majority decision. If the majority decides to release, then whatever gets released will define the new norm by which policies are assumed. If not released, then I suggest collaborating more on the policies before trying to vote again.

Either way, we don't hold up a vote for the sake of a policy discussion because voting is a more efficient means of discovering if the policy really matters.

> On May 4, 2011, at 5:39 PM, Eli Collins wrote:
>
> > The point is that these discussion should be sorted out, ie you don't
> > change your development and release model on a release VOTE thread,
> > you change it on a DISCUSSION thread.
>
> That is no different than saying you have a right to veto a
> release until the issue is addressed, which you don't have.
>
> A release vote is a majority decision. If the majority
> decides to release, then whatever gets released will define
> the new norm by which policies are assumed. If not released,
> then I suggest collaborating more on the policies before
> trying to vote again.
>
> Either way, we don't hold up a vote for the sake of a
> policy discussion because voting is a more efficient
> means of discovering if the policy really matters.
>
> ....Roy

--
Connect to me at http://www.facebook.com/dhruba

I'm really not sure yet how to vote here. I was going to vote +1 for what I was told by a number of Yahoo! committers would be a one time release as Yahoo! "comes back to Apache" after a hiatus last fall/winter and ended their own distribution. Clearly this code was not all developed as a community process, but I was going to support a one time release of what they had developed in exclusion.

Then I read Roy's email, which confused me. Why would he or I or anyone else support this release setting precedent or policy, since it would walk all over our bylaws, community process, and the consensus nature of our foundation? This release vote is a lazy majority of the PMC, but other decisions rolled up in this are supposed to be lazy majority of active committers or, in the case of code changes, a lazy consensus. Setting policy by this release means any sufficiently large group of committers could go off and develop on their own and then commit it to a branch and call a release.

Furthermore, it now sounds like this is possibly the first in a line of feature releases off this branch. Bug fix releases, sure. But feature releases? What's wrong with trunk?

Nige

On May 4, 2011, at 6:56 PM, Roy T. Fielding wrote:

> On May 4, 2011, at 5:39 PM, Eli Collins wrote:
>
>> The point is that these discussion should be sorted out, ie you don't
>> change your development and release model on a release VOTE thread,
>> you change it on a DISCUSSION thread.
>
> That is no different than saying you have a right to veto a
> release until the issue is addressed, which you don't have.
>
> A release vote is a majority decision. If the majority
> decides to release, then whatever gets released will define
> the new norm by which policies are assumed. If not released,
> then I suggest collaborating more on the policies before
> trying to vote again.
>
> Either way, we don't hold up a vote for the sake of a
> policy discussion because voting is a more efficient
> means of discovering if the policy really matters.
>
> ....Roy
>

-1.

As Roy says, "whatever gets released will define the new norm by which policies are assumed", and I certainly don't want this project to change its norms to accommodate bad practices. In particular, Eli presented three very reasonable technical objections to this release. To summarize:

1) Let's get the JIRAs that are going into this release into trunk first.
2) Let's create a JIRA for each issue in the release.
3) Let's stick to the release numbering conventions established for this project.

I know the folks at Yahoo! are all professional engineers and have done tremendous work to help get the project to this point. There's no doubt in my mind they understand the validity of the above three technical objections. In fact, many of them helped author our "How to Contribute" page, which established these conventions: wiki.apache.org/hadoop/HowToContribute. We develop new features against trunk, we create JIRAs for each issue, we review code before it goes into trunk, and we only update old releases with bug fixes.

I couldn't be more excited to have Yahoo! once again doing development in Apache, and I hope that we can work together to get the work that you've done in this branch into one of our upcoming feature releases.

I hope those who voted +1 before Roy clarified what a release vote will mean for future project norms will reconsider their votes.

While there may be many competing agendas in this community, we all wish to see Apache Hadoop releases of the highest quality. Changing our norms to allow huge, unreviewed patch sets introducing new features into a past release is a step in the wrong direction.

With a little bit of elbow grease, we can get the work done in this branch into trunk, get 0.22 out the door, and be ready for a great 0.23 release.



I'm not going to cast a vote, but I'm concerned about this for the same reasons Eli brought up -- in particular, compatibility with 0.22. I'm an author of several patches that have gone into 0.21 and trunk, only to stay on hiatus for 2 years because the project hasn't made a stable release since 0.20. (Today, many of these patches are being used through CDH, which is great, but it would be nice to see them in an Apache release too.) This push of features into 0.20.203 makes a widely used 0.22 seem even more distant. Can we at least get a confirmation that these changes will be included in 0.22, as well as a timeline?

To support a vibrant developer community, Apache Hadoop should not just be a mechanism for Yahoo and Cloudera to publish patches. It should include a well-defined process for smaller third-party contributors to push changes that will make it into a stable release within a reasonable time horizon. The lack of such a process has been a major cause for the slowdown in the project in my perspective.

Matei

On May 4, 2011, at 10:47 PM, Eric Sammer wrote:

> (non-binding) -1 for similar reasons to what Jeff and others have laid out,> and certainly if we're going to change the development process as a side> effect of a release vote.> > On Wed, May 4, 2011 at 9:54 PM, Jeff Hammerbacher <[EMAIL PROTECTED]>wrote:> >> -1.>> >> As Roy says, "whatever gets released will define the new norm by which>> policies are assumed", and I certainly don't want this project to change>> its>> norms to accommodate bad practices. In particular, Eli presented three very>> reasonable technical objections to this release. To summarize:>> >> 1) Let's get the JIRAs that are going into this release into trunk first.>> 2) Let's create a JIRA for each issue in the release.>> 3) Let's stick to the release numbering conventions established for this>> project.>> >> I know the folks at Yahoo! are all professional engineers and done>> tremendous work to help get the project to this point. There's no doubt in>> my mind they understand the validity of the above three technical>> objections. In fact, many of them helped author our "How to Contribute">> page, which established these conventions:>> wiki.apache.org/hadoop/HowToContribute. We develop new features against>> trunk, we create JIRAs for each issue, we review code before it goes into>> trunk, and we only update old releases with bug fixes.>> >> I couldn't be more excited to have Yahoo! once again doing development in>> Apache, and I hope that we can work together to get the work that you've>> done in this branch into one of our upcoming feature releases.>> >> I hope those who voted +1 before Roy clarified what a release vote will>> mean>> for future project norms will reconsider their votes.>> >> While there may be many competing agendas in this community, we all wish to>> see Apache Hadoop releases of the highest quality. 
Changing our norms to>> allow huge, unreviewed patch sets introducing new features into a past>> release is a step in the wrong direction.>> >> With a little bit of elbow grease, we can get the work done in this branch>> into trunk, get 0.22 out the door, and be ready for a great 0.23 release.>> >> Later,>> Jeff>> >> On Wed, May 4, 2011 at 9:17 PM, Nigel Daley <[EMAIL PROTECTED]> wrote:>> >>> I'm really not sure yet how to vote here. I was going to vote +1 for>> what>>> I was told by a number of Yahoo! committers would be a one time release>> as>>> Yahoo! "comes back to Apache" after a hiatus last fall/winter and ended>>> their own distribution. Clearly this code was not all developed as a>>> community process, but I was going to support a one time release of what>>> they had developed in exclusion.>>> >>> Then I read Roy's email, which confused me. We would he or I or anyone>>> else support this release setting precedent or policy since it would walk>>> all over our bylaws, community process, and the consensus nature of our


> " BZ-4182948. Add statistics logging to Fred for better visibility into
> startup time costs. (Matt Foley)"
> - I believe I saw a note from Matt on the JIRA yesterday about this feature,
> where he decided that the version done in 203 wasn't a good approach, and
> it's done differently in trunk (not sure if done yet).

Could anyone elaborate on what this "Fred" is that has been coming up on these threads a few times now?

And is there something like a RELEASE NOTES draft that I could look over? I try to follow these mailing lists as best as I can, but I have lost track of all the branches and features being worked on, and I can't imagine I'm the only one.

It would be nice to get an overview of what this release is all about, where work is being done, etc.

With Cloudera hat on, I agree with Eli's assessment.

With Apache hat on, I don't see how this is at all relevant to the task at hand. I would make the same arguments against taking CDH3 and releasing it as an ASF artifact -- we'd also have a certain amount of work to do to make sure that all of the patches are in trunk, first. Additionally, I'd want to outline what the inclusion criteria would be for that branch.

> With my Cloudera hat on..>> When we went through the 10x and 20x patches we only pulled a subset> of them, primarily for security and the general improvements that we> thought were good. We found both incompatible changes and some> sketchy changes that we did not pull in from a quality perspective.> There is a big difference between a patch set that's acceptable for> Yahoo!'s user base and one that's a more general artifact.>> When we evaluated the YDH patch sets we were using that frame of mind.> I'm now looking it in terms of an Apache release. And the place to> review changes for an Apache release is on jira.>> CDH3 is based on the latest stable Apache release (20.2) so it doesn't> regress against it. I'm nervous about rebasing future releases on 203> because of the compatibility and quality implications.>> Thanks,> Eli>>> On Wed, May 4, 2011 at 3:06 PM, Suresh Srinivas <[EMAIL PROTECTED]>> wrote:> > Eli,> >> > How many of these patches that you find troublesome are in CDH already?> >> > Regards,> > Suresh> >> >> > On 5/4/11 3:03 PM, "Eli Collins" <[EMAIL PROTECTED]> wrote:> >> >> On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley <[EMAIL PROTECTED]>> wrote:> >>> Here's an updated release candidate for 0.20.203.0. I've incorporated> the> >>> feedback and included all of the patches from 0.20.2, which is the last> >>> stable release. I also fixed the eclipse-plugin problem.> >>>> >>> The candidate is at:> http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/> >>>> >>> Please download it, inspect it, compile it, and test it. Clearly, I'm> +1.> >>>> >>> -- Owen> >>> >> While rc2 is an improvement on rc1, I am -1 on this particular rc.> Rationale:> >>> >> This rc contains many patches not yet committed to trunk. 
This would> >> cause the next major release (0.22) to be a feature regression against> >> our latest stable release (203), were 0.22 released soon.> >>> >> This rc contains many patches not yet reviewed by the community via> >> the normal process (jira, patch against trunk, merge to a release> >> branch). I think we should respect the existing community process that> >> has been used for all previous releases.> >>> >> This rc introduces a new development and braching model (new feature> >> development outside trunk) and Hadoop versioning scheme without> >> sufficient discussion or proposal of these changes with the community.> >>> >> We should establish new process before the release, a release is not> >> the appropriate mechanism for changing our review and development> >> process or versioning .> >>> >> I do support a release from branch-0.20-security that follows the> >> existing, established community process.> >>> >> Thanks,> >> Eli> >> >>

Security Enhancements

As one of the primary contributors and largest production users of Hadoop, Yahoo! publishes the source tree for the version of Hadoop that they run on their production clusters. We are pleased to announce that we have merged Yahoo's source tree into CDH3b3. This merge brings many improvements developed at Yahoo! into CDH, including improvements for MapReduce scalability on 1000+-node clusters and several new tools for benchmarking and testing Hadoop.
------

It would be great if you could list how many of the 192 changes were reviewed and became part of CDH.

Your -1 vote essentially blocks the changes that are already available in CDH from being available from Apache open source!

On 5/4/11 3:30 PM, "Todd Lipcon" <[EMAIL PROTECTED]> wrote:

> With Cloudera hat on, I agree with Eli's assessment.> > With Apache hat on, I don't see how this is at all relevant to the task at> hand. I would make the same arguments against taking CDH3 and releasing it> as an ASF artifact -- we'd also have a certain amount of work to do to make> sure that all of the patches are in trunk, first. Additionally, I'd want to> outline what the inclusion criteria would be for that branch.> > -Todd> > On Wed, May 4, 2011 at 3:24 PM, Eli Collins <[EMAIL PROTECTED]> wrote:> >> With my Cloudera hat on..>> >> When we went through the 10x and 20x patches we only pulled a subset>> of them, primarily for security and the general improvements that we>> thought were good. We found both incompatible changes and some>> sketchy changes that we did not pull in from a quality perspective.>> There is a big difference between a patch set that's acceptable for>> Yahoo!'s user base and one that's a more general artifact.>> >> When we evaluated the YDH patch sets we were using that frame of mind.>> I'm now looking it in terms of an Apache release. And the place to>> review changes for an Apache release is on jira.>> >> CDH3 is based on the latest stable Apache release (20.2) so it doesn't>> regress against it. I'm nervous about rebasing future releases on 203>> because of the compatibility and quality implications.>> >> Thanks,>> Eli>> >> >> On Wed, May 4, 2011 at 3:06 PM, Suresh Srinivas <[EMAIL PROTECTED]>>> wrote:>>> Eli,>>> >>> How many of these patches that you find troublesome are in CDH already?>>> >>> Regards,>>> Suresh>>> >>> >>> On 5/4/11 3:03 PM, "Eli Collins" <[EMAIL PROTECTED]> wrote:>>> >>>> On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley <[EMAIL PROTECTED]>>> wrote:>>>>> Here's an updated release candidate for 0.20.203.0. I've incorporated>> the>>>>> feedback and included all of the patches from 0.20.2, which is the last>>>>> stable release. 
I also fixed the eclipse-plugin problem.>>>>> >>>>> The candidate is at:>> http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/>>>>> >>>>> Please download it, inspect it, compile it, and test it. Clearly, I'm>> +1.>>>>> >>>>> -- Owen>>>> >>>> While rc2 is an improvement on rc1, I am -1 on this particular rc.>> Rationale:>>>> >>>> This rc contains many patches not yet committed to trunk. This would>>>> cause the next major release (0.22) to be a feature regression against>>>> our latest stable release (203), were 0.22 released soon.>>>> >>>> This rc contains many patches not yet reviewed by the community via>>>> the normal process (jira, patch against trunk, merge to a release>>>> branch). I think we should respect the existing community process that>>>> has been used for all previous releases.>>>> >>>> This rc introduces a new development and braching model (new feature>>>> development outside trunk) and Hadoop versioning scheme without>>>> sufficient discussion or proposal of these changes with the community.>>>> >>>> We should establish new process before the release, a release is not>>>> the appropriate mechanism for changing our review and development

> Your -1 vote essentially blocks the changes that are already available in
> CDH to be available from Apache open source!

As Eric mentioned, this thread is about an Apache release, not CDH.

My -1 vote does not block these changes from being released via Apache. You cannot veto a release. Releases are lazy majority; the release is only blocked if there are more -1 votes than +1 votes.

If these changes are contributed on jira, discussed and reviewed, and committed to trunk, I'm happy to support the release. There's a big difference between asking that a release respect the Apache community process and blocking it. If you want to get the release out, how about contributing the work via the normal means so the community can review it like we review all other code changes?
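
For readers less familiar with the voting rule mentioned above, here is a rough Python sketch of a lazy-majority tally. It is an illustration only, not an Apache tool. The function name `release_passes` and the `min_plus_ones` threshold of three binding +1 votes are assumptions made for this sketch; the message above only states that a release is blocked when -1 votes outnumber +1 votes, so the treatment of ties and low turnout here is a guess at common practice rather than something this thread confirms.

```python
from typing import Iterable


def release_passes(binding_votes: Iterable[int], min_plus_ones: int = 3) -> bool:
    """Tally binding votes (+1, 0, or -1) under an assumed lazy-majority rule.

    Assumptions (hypothetical, not taken from the thread):
      * at least `min_plus_ones` binding +1 votes are required, and
      * +1 votes must strictly outnumber -1 votes, so a tie does not pass.
    A single -1 is never a veto; it is just one vote in the count.
    """
    votes = list(binding_votes)
    plus = sum(1 for v in votes if v > 0)   # count of +1 votes
    minus = sum(1 for v in votes if v < 0)  # count of -1 votes
    return plus >= min_plus_ones and plus > minus


if __name__ == "__main__":
    # One -1 among several +1 votes does not block the release.
    print(release_passes([+1, +1, +1, +1, -1]))
    # More -1 votes than +1 votes blocks it.
    print(release_passes([+1, +1, +1, -1, -1, -1, -1]))
```

Abstentions (0) are ignored by the count, which matches the "lazy" part of the rule: only the votes actually cast matter.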

+1

Downloaded, verified, tested on single node cluster to my satisfaction. We've also brought this release up on a sizable cluster and checked its basic sanity.

Regardless of the difficult path we've had over the past year, this is a good chunk of code to get out to the community. I'd much rather explain a convoluted numbering system or what is or isn't in this release than continue to apologize for having no release at all.

> +1
>
> Downloaded, verified, tested on single node cluster to my
> satisfaction. We've also brought this release up on a sizable cluster
> and checked its basic sanity.

All of you people doing single node tests are missing stuff. For example, the regression in how the secondary namenode addr stuff works vs. 0.20.

By far, the biggest problem we've found is that the capacity scheduler documentation doesn't actually match what the code does. I have a hunch that the unit tests were written/changed to match the outcome, rather than test what is supposed to happen. For us, this breakage makes it unusable out of the box and we'll likely either go back to our (relatively stable) backport of 0.21's cap sched, try to fix the 0.20.203 code, or maybe even switch to a completely different scheduler.

Can you provide some more details into what issues you are seeing with the capacity scheduler? Is it just the docs don't match the code, or are you seeing real issues with job scheduling?

Thanks

ToddP

On 5/6/11 5:49 PM, "Allen Wittenauer" <[EMAIL PROTECTED]> wrote:

>>On May 5, 2011, at 1:56 PM, Jakob Homan wrote:>>> +1>> >> Downloaded, verified, tested on single node cluster to my>> satisfaction. We've also brought this release up on a sizable cluster>> and checked its basic sanity.>> All of you people doing single node tests are missing stuff. For>example, the regression in how the secondary namenode addr stuff works>vs. 0.20. >> By far, the biggest problem we've found is that the capacity scheduler>documentation doesn't actually match what the code does. I have a hunch>that the unit tests were written/change to match the outcome, rather than>test what is supposed to happen. For us, this breakage makes it unusable>out of the box and we'll likely either go back to our (relatively stable)>backport of 0.21's cap sched, try to fix the 0.20.203 code, or maybe even>switch to a completely different scheduler.

> Allen,> > Can you provide some more details into what issues you are seeing with the> capacity scheduler? Is it just the docs don't match the code, or are you> seeing real issues with job scheduling?

Jobs are definitely not getting the maximum number of task slots they should be getting. I'm suspecting a bug with how max-limit of -1 queues are handled. I'll actually be in the office next week to try and see if I can figure out where things are going haywire.

[I filed a bug on this a few weeks ago before I left for vacation. It was basically ignored.]

Allen, there are per job limits, and per user limits in this branch. (So, max capacity of -1 is for the queue, but within the queue, the per user limits come into picture.) If I remember right, the defaults were based on a certain assumption of how many users would be on a queue simultaneously. Of course this would need to be set in the site-specific config.

>>On May 6, 2011, at 6:43 PM, Todd Papaioannou wrote:>>> Allen,>> >> Can you provide some more details into what issues you are seeing with>>the>> capacity scheduler? Is it just the docs don't match the code, or are you>> seeing real issues with job scheduling?>> Jobs are definitely not getting the maximum number of task slots they>should be getting. I'm suspecting a bug with how max-limit of -1 queues>are handled. I'll actually be in the office next week to try and see if>I can figure out where things are going haywire.>> [I filed a bug on this a few weeks ago before I left for vacation. It>was basically ignored.]

> Allen, there are per job limits, and per user limits in this branch. (So,> max capacity of -1 is for the queue, but within the queue, the per user> limits come into picture.) If I remember right, the defaults were based on> a certain assumption of how many users would be on a queue simultaneously.> Of course this would need to be set in the site-specific config.

Yes, I'm aware of the changes. What I'm basically saying is that even with those new limits taken into consideration, the math doesn't seem to hold up.

On Wed, May 4, 2011 at 15:06, Suresh Srinivas <[EMAIL PROTECTED]> wrote:> Eli,>> How many of these patches that you find troublesome are in CDH already?

How is that relevant to the release vote and discrepancies listed in Eli's email?

> Regards,> Suresh>>> On 5/4/11 3:03 PM, "Eli Collins" <[EMAIL PROTECTED]> wrote:>>> On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley <[EMAIL PROTECTED]> wrote:>>> Here's an updated release candidate for 0.20.203.0. I've incorporated the>>> feedback and included all of the patches from 0.20.2, which is the last>>> stable release. I also fixed the eclipse-plugin problem.>>>>>> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/>>>>>> Please download it, inspect it, compile it, and test it. Clearly, I'm +1.>>>>>> -- Owen>>>> While rc2 is an improvement on rc1, I am -1 on this particular rc. Rationale:>>>> This rc contains many patches not yet committed to trunk. This would>> cause the next major release (0.22) to be a feature regression against>> our latest stable release (203), were 0.22 released soon.>>>> This rc contains many patches not yet reviewed by the community via>> the normal process (jira, patch against trunk, merge to a release>> branch). I think we should respect the existing community process that>> has been used for all previous releases.>>>> This rc introduces a new development and braching model (new feature>> development outside trunk) and Hadoop versioning scheme without>> sufficient discussion or proposal of these changes with the community.>>>> We should establish new process before the release, a release is not>> the appropriate mechanism for changing our review and development>> process or versioning .>>>> I do support a release from branch-0.20-security that follows the>> existing, established community process.>>>> Thanks,>> Eli>>

> Here's an updated release candidate for 0.20.203.0. I've > incorporated the feedback and included all of the patches from > 0.20.2, which is the last stable release. I also fixed the eclipse- > plugin problem.>> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/>> Please download it, inspect it, compile it, and test it. Clearly, > I'm +1.>

This candidate has lots of patches that are not in trunk, potentially adding regressions to 0.22 and 0.23. This should be addressed before we release from 0.20-security. We should also not move to four-component version numbering. A release from the 0.20-security branch should perhaps be called 0.20.100.

Doug

On 05/04/2011 10:31 AM, Owen O'Malley wrote:> Here's an updated release candidate for 0.20.203.0. I've incorporated the feedback and included all of the patches from 0.20.2, which is the last stable release. I also fixed the eclipse-plugin problem. > > The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/> > Please download it, inspect it, compile it, and test it. Clearly, I'm +1.> > -- Owen

-1 for the same reasons I outlined in my email yesterday. This is not a community artifact following the community's processes, and thus should not be an official release until those issues are addressed.

> -1>> This candidate has lots of patches that are not in trunk, potentially> adding regressions to 0.22 and 0.23. This should be addressed before we> release from 0.20-security. We should also not move to four-component> version numbering. A release from the 0.20-security branch should> perhaps be called 0.20.100.>> Doug>> On 05/04/2011 10:31 AM, Owen O'Malley wrote:> > Here's an updated release candidate for 0.20.203.0. I've incorporated the> feedback and included all of the patches from 0.20.2, which is the last> stable release. I also fixed the eclipse-plugin problem.> >> > The candidate is at:> http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/> >> > Please download it, inspect it, compile it, and test it. Clearly, I'm +1.> >> > -- Owen>

+1 based on some single node tests I did (with security ON).

On 5/4/11 10:31 AM, "Owen O'Malley" <[EMAIL PROTECTED]> wrote:

Here's an updated release candidate for 0.20.203.0. I've incorporated the feedback and included all of the patches from 0.20.2, which is the last stable release. I also fixed the eclipse-plugin problem.

>+1 based on some single node tests I did (with security ON).>>>On 5/4/11 10:31 AM, "Owen O'Malley" <[EMAIL PROTECTED]> wrote:>>Here's an updated release candidate for 0.20.203.0. I've incorporated the>feedback and included all of the patches from 0.20.2, which is the last>stable release. I also fixed the eclipse-plugin problem.>>The candidate is at:>http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/>>Please download it, inspect it, compile it, and test it. Clearly, I'm +1.>>-- Owen>

I'm +1 on releasing rc1. The signature and hashes match on the artifact, ran some of the more aggressive MR tests. Reviewed changes from rc0.

It looks like we need a FAQ for this release, if only to prevent the same questions from being asked and answered across different threads and lists. Reservations, regressions, and pending work can also be documented there.

Right now, Apache Hadoop releases are not recommended by its community. Instead, not only our end users, but other Apache projects run Cloudera's distribution. From all those wearing their Apache hat, I would like to see more effort directed toward a release that we can recommend soon and less time spent compiling tasks to delay it.

Releasing this will complicate the documented process. However, that process *has not produced a usable release* for the last two out of six years. This is failure. Entertaining concerns like a one-to-one correspondence between commits and JIRA issues is bizarre in this context. Let's find a way to make progress instead of tossing pharisaic accusations of illegitimacy. -C

On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley <[EMAIL PROTECTED]> wrote:> Here's an updated release candidate for 0.20.203.0. I've incorporated the feedback and included all of the patches from 0.20.2, which is the last stable release. I also fixed the eclipse-plugin problem.>> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/>> Please download it, inspect it, compile it, and test it. Clearly, I'm +1.>> -- Owen

I did download it and checked it out, but when I look at the documentation I see it says "Hadoop 0.20 documentation" in the tab on top. From what I can tell this isn't the branch 0.20 so I think it's an error and from a user point of view this looks more like something I would call 0.22 (although yes I understand this is 0.20 +security +whatever).

Why would a single company push so hard to go against the "normal" release process just for "the benefit of putting our work in the hands of all hadoop users" is beyond me. It's not like people were begging on the mailing lists to be able to get their hands on such a release to the point where an emergency point release including tons of new features is needed.

So to me the more logical reason would be monetary gains, that I would understand better from a for-profit company. But then why go through the hurdles of having such an ASF release when Y! isn't even selling anything remotely related to Hadoop services? And why now?

But then there's this spinoff thing and it suddenly makes a lot more sense.

E14 said earlier that "That is how apache works."

I would say yes, maybe this is how it works, but I'm not sure I want to see it working like _that_. The ASF shouldn't be the vehicle for a single (future) company's wishes.

J-D

On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley <[EMAIL PROTECTED]> wrote:> Here's an updated release candidate for 0.20.203.0. I've incorporated the feedback and included all of the patches from 0.20.2, which is the last stable release. I also fixed the eclipse-plugin problem.>> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/>> Please download it, inspect it, compile it, and test it. Clearly, I'm +1.>> -- Owen

> Non-biding -1.> > I did download it and checked it out, but when I look at the> documentation I see it says "Hadoop 0.20 documentation" in the tab on> top. From what I can tell this isn't the branch 0.20 so I think it's> an error and from a user point of view this looks more like something> I would call 0.22 (although yes I understand this is 0.20 +security> +whatever).> > Why would a single company push so hard to go against the "normal"> release process just for "the benefit of putting our work in the hands> of all hadoop users" is beyond me. It's not like people were begging> on the mailing lists to be able to get their hands on such a release> to the point where an emergency point release including tons of new> features is needed.> > So to me the more logical reason would be monetary gains, that I would> understand better from a for-profit company. But then why go through> the hurdles of having such an ASF release when Y! isn't even selling> anything remotely related to Hadoop services? And why now?> > But then there's this spinoff thing and it suddenly makes a lot more sense.> > E14 said earlier that "That is how apache works."> > I would say yes, maybe this is how it works, but I'm not sure I want> to see it working like _that_. The ASF shouldn't be the vehicle for a> single (future) company's wishes.

The ASF is a vehicle for whomever wishes to collaborate on a given project. Collaboration means helping do the work. Those who do the work may do so for whatever reasons that they think are good, whether it is because they feel like being charitable today, they get paid a salary and the big boss said "work on this part", or because they just have an itch worth scratching.

Apache does not care why people choose to collaborate or how they choose to apply their own intellectual efforts. We welcome all forms of contribution under the terms of our license.

What we do require is a certain amount of civility regarding our voting procedures and an emphasis on individual responsibility for your votes. Anyone caught *voting* a particular way just because the boss says so will be dealt with severely. Votes are how we do quality control and make decisions, and no other company can be allowed to make decisions for our non-profit.

just as a Tally we have
6 +1's (andy.. is yours binding?? if so 7)
and 3 -1's.

so according to the votes so far we are releasing.. but according to our bylaws.. we need to wait 7 days for everyone to chime in.
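The tally above can be checked mechanically. What follows is only a minimal Python sketch, not the project's actual tooling: the `Vote` class, the voter names, and the function names are hypothetical. It counts binding votes only, applies the rule stated earlier in this thread (more +1 than -1), and holds the result until the 7-day window mentioned above has elapsed. The minimum of three binding +1 votes is an assumption taken from general Apache voting guidance, not from this thread.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Vote:
    voter: str      # hypothetical identifier, for illustration only
    value: int      # +1 or -1
    binding: bool   # True for PMC members, False otherwise


def tally(votes):
    """Return (binding +1 count, binding -1 count)."""
    plus = sum(1 for v in votes if v.binding and v.value > 0)
    minus = sum(1 for v in votes if v.binding and v.value < 0)
    return plus, minus


def release_passes(votes, opened, now,
                   min_plus=3,                  # assumption: general ASF guidance
                   window=timedelta(days=7)):   # voting period cited in the thread
    """Lazy-majority check: vote window closed, enough +1s, more +1 than -1."""
    if now - opened < window:
        return False  # still waiting for everyone to chime in
    plus, minus = tally(votes)
    return plus >= min_plus and plus > minus
```

With six binding +1 votes and three binding -1 votes, `tally` returns `(6, 3)` and `release_passes` becomes true only once seven days have passed since the vote opened; non-binding votes do not change the outcome.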

--I

On May 5, 2011, at 12:22 PM, Roy T. Fielding wrote:

> On May 4, 2011, at 6:24 PM, Jean-Daniel Cryans wrote:> >> Non-biding -1.>> >> I did download it and checked it out, but when I look at the>> documentation I see it says "Hadoop 0.20 documentation" in the tab on>> top. From what I can tell this isn't the branch 0.20 so I think it's>> an error and from a user point of view this looks more like something>> I would call 0.22 (although yes I understand this is 0.20 +security>> +whatever).>> >> Why would a single company push so hard to go against the "normal">> release process just for "the benefit of putting our work in the hands>> of all hadoop users" is beyond me. It's not like people were begging>> on the mailing lists to be able to get their hands on such a release>> to the point where an emergency point release including tons of new>> features is needed.>> >> So to me the more logical reason would be monetary gains, that I would>> understand better from a for-profit company. But then why go through>> the hurdles of having such an ASF release when Y! isn't even selling>> anything remotely related to Hadoop services? And why now?>> >> But then there's this spinoff thing and it suddenly makes a lot more sense.>> >> E14 said earlier that "That is how apache works.">> >> I would say yes, maybe this is how it works, but I'm not sure I want>> to see it working like _that_. The ASF shouldn't be the vehicle for a>> single (future) company's wishes.> > The ASF is a vehicle for whomever wishes to collaborate on a> given project. Collaboration means helping do the work. Those> who do the work may do so for whatever reasons that they think> are good, whether it is because they feel like being charitable> today, they get paid a salary and the big boss said "work on> this part", or because they just have an itch worth scratching.> > Apache does not care why people choose to collaborate or> how they choose to apply their own intellectual efforts. 
We> welcome all forms of contribution under the terms of our license.> > What we do require is a certain amount of civility regarding> our voting procedures and an emphasis on individual responsibility> for your votes. Anyone caught *voting* a particular way just> because the boss says so will be dealt with severely. Votes> are how we do quality control and make decisions, and no other> company can be allowed to make decisions for our non-profit.> > ....Roy

--- On Wed, 5/4/11, Ian Holsman <[EMAIL PROTECTED]> wrote:> just as a Tally we have> 6+1's (andy.. is yours binding?? if so 7)> and 3 -1's.> > so according to the votes so far we are releasing.. but> according to our bylaws.. we need to wait 7 days for> everyone to chime in.> > --I

I'm a committer not PMC, so it's non binding.

Why:
-looks and works OK on my desktop
-we've been using Y! releases, and this brings their branch back into the apache fold, meaning we can say "the official Apache release of Apache Hadoop is something you can use in production".
-I'm confident that the Y! team have tested this well.

I understand Eli's concerns that putting stuff in there that hasn't gone into trunk yet is danger. However, as the team makes no guarantees of 100% compatibility between releases, I don't think it's critical. It's just something that needs to be addressed -which can be done after this release has shipped.

-Steve

and they are happy to exploit any perception of division, lack of forward progress, and such. Perhaps this is additional, and I pray sufficient, motivation to cease the proxy battles, bury the hatchet, etc.

- Andy

--- On Fri, 5/6/11, Steve Loughran <[EMAIL PROTECTED]> wrote:

> From: Steve Loughran <[EMAIL PROTECTED]>> Subject: Re: [VOTE] Release candidate 0.20.203.0-rc1> To: [EMAIL PROTECTED]> Date: Friday, May 6, 2011, 4:52 AM> > Vote: +1> > I'm a committer not PMC, so it's non binding.> > > Why:> -looks and works OK on my desktop> -we've been using Y! releases, and this brings their branch> back into the apache fold, meaning we can say "the official> Apache release of Apache Hadoop is something you can use in> production".> -I'm confident that the Y! team have tested this well.> > I understand Eli's concerns that putting stuff in there> that hasn't gone into trunk yet is danger. However, as the> team makes no guarantees of 100% compatibility between> releases, I don't think it's critical. It's just something> that needs to be addressed -which can be done after this> release has shipped.> > > -Steve> >

On Wed, May 4, 2011 at 7:22 PM, Roy T. Fielding <[EMAIL PROTECTED]> wrote:> The ASF is a vehicle for whomever wishes to collaborate on a> given project. Collaboration means helping do the work. Those> who do the work may do so for whatever reasons that they think> are good, whether it is because they feel like being charitable> today, they get paid a salary and the big boss said "work on> this part", or because they just have an itch worth scratching.>> Apache does not care why people choose to collaborate or> how they choose to apply their own intellectual efforts. We> welcome all forms of contribution under the terms of our license.

I don't think I was arguing against the contribution of the code in that branch, it's very welcome, but I'm questioning (and ranting about) the motivation for releasing a version that even just by name is a weird hulla-hoop around the usual development practices that Hadoop has had in the past (not that it's set in stone).

So I wanted to contribute my negative non-binding vote to highlight that this release is probably very confusing for the general user. This is 0.20, but it's not. Also it has more numbers, and it starts at 203. Why doing this at all instead of just moving on with 0.22? Or is 0.22 bound to be like 0.21? It almost begs the question if this should be called 0.22.0 then.

>> What we do require is a certain amount of civility regarding> our voting procedures and an emphasis on individual responsibility> for your votes. Anyone caught *voting* a particular way just> because the boss says so will be dealt with severely. Votes> are how we do quality control and make decisions, and no other> company can be allowed to make decisions for our non-profit.

Yeah I don't think that's a problem here, everyone seems to have their very own strong opinions.

> Roy,> > On Wed, May 4, 2011 at 7:22 PM, Roy T. Fielding <[EMAIL PROTECTED]> wrote:>> The ASF is a vehicle for whomever wishes to collaborate on a>> given project. Collaboration means helping do the work. Those>> who do the work may do so for whatever reasons that they think>> are good, whether it is because they feel like being charitable>> today, they get paid a salary and the big boss said "work on>> this part", or because they just have an itch worth scratching.>> >> Apache does not care why people choose to collaborate or>> how they choose to apply their own intellectual efforts. We>> welcome all forms of contribution under the terms of our license.> > I don't think I was arguing against the contribution of the code in> that branch, it's very welcome, but I'm questioning (and ranting> about) the motivation for releasing a version that even just by name> is a weird hulla-hoop around the usual development practices that> Hadoop has had in the past (not that it's set in stone).

Yes, and I said that kind of questioning is not appropriate. You are not responsible for other peoples' motivation.

> So I wanted to contribute my negative non-binding vote to highlight> that this release is probably very confusing for the general user.> This is 0.20, but it's not. Also it has more numbers, and it starts at> 203. Why doing this at all instead of just moving on with 0.22? Or is> 0.22 bound to be like 0.21? It almost begs the question if this should> be called 0.22.0 then.

Yes, I already made that same point. You don't need to talk about motivation in order to do so.

If I had a vote, I would have voted -1 just because version numbers do matter to users, the three number form is well-known, and minting new versions is far cheaper than adding extra numbers. I'd have cut the release candidate as 0.30. However, I did not do that work. The person who did the work chose 0.20.203.0. Anyone who doesn't like that should vote accordingly and, preferably, make your communication about such things more open in the future so that you don't waste others' time on extra builds. And if the majority thinks releasing these bits are more important than my concerns, then I have to accept that as the will of the project.

Please note also that policies are not technical discussions. Likewise, version numbers are not technical. If those were technical changes then anyone on the PMC could veto them, which would effectively mean anyone could veto a release and the project would quickly devolve into tyranny by minority.

Likewise, just because I said that a successful release defines its own set of precedents (and therein policy), that doesn't mean the project can't vote on a new policy the next day or make another release that sets it again moving forward. Progress is in the doing.

> Here's an updated release candidate for 0.20.203.0. I've > incorporated the feedback and included all of the patches from > 0.20.2, which is the last stable release. I also fixed the eclipse- > plugin problem.>> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/>> Please download it, inspect it, compile it, and test it. Clearly, > I'm +1.>> -- Owen

But I can't vote for a 0.20.clusterbomb release that railroads over precedent compounding further the existing confusion that already exists around the state of Hadoop.

Thanks,
St.Ack

On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley <[EMAIL PROTECTED]> wrote:> Here's an updated release candidate for 0.20.203.0. I've incorporated the feedback and included all of the patches from 0.20.2, which is the last stable release. I also fixed the eclipse-plugin problem.>> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/>> Please download it, inspect it, compile it, and test it. Clearly, I'm +1.>> -- Owen

Speculation either on the motives of those objecting to a release or of those making contributions or proposing a release does not advance progress. The accusations and counter-accusations seen on this thread are regrettable and I feel less and less confident in the future of Apache Hadoop as time goes on. As a strong believer in and advocate of open source as an answer to technical and architectural challenges, I am pained to see the members of what should be a vibrant community litigating in an ultimately self-defeating way. If only this energy put into argument could be channeled into code or patches...

In open source, if opinions were code we would rule the world.

So what of this candidate?

Artifact looks good, DFS tests are good, MR tests are good. Looked over some of the documentation and found no errors. To my knowledge this is now a superset of branch-0.20, addressing the reasonably determined deficit of rc0.

There seems no reason other issues cannot be addressed subsequently.

There has not been a release of Apache Hadoop 0.20 since at least Feb 6 2010 yet since this time important security enhancements have been contributed, but in the form of an Apache product these are only available as patches on a non-release branch. Forward progress of the Apache product seems more important than achieving the perfect release in all eyes.

For example, append features remain on a non-release branch. I would really have liked to see the append changes included in this candidate, but this is not grounds for objection merely regret, and I hope this can be covered by a subsequent release, perhaps soon.

After security and append features are in 0.20, in my personal humble opinion the 0.20 release in total is sufficient and all attention should be paid to the next release (0.22 or whatever), except for critical bug fixes.