>> Should we release
>> http://people.apache.org/~omalley/hadoop-0.20.203.0-rc0/?
>
> The patch selection process for this branch did not appear to be a
> community process. A massive patch set was committed en-masse with no
> public discussion before or after about its specific composition.

guys...
1. do we agree this is an issue
2. if it is, how do we get the communication & discussion on list?

what do people think are the major issues that are stopping people talking about stuff on list?

> As previously discussed, all parties are welcome to champion
> alternative releases from Apache if they want to invest in making
> Apache Hadoop better.

I do not believe that different organizations should release their own versions of Hadoop posing as Apache releases. If folks wish to release their own versions, then they should call them something else and release them themselves. The Apache Hadoop project should create releases collaboratively, through an open process. The standard means is to start a branch from trunk or a prior release and propose patches to that branch, one-by-one. This candidate diverged sufficiently from this pattern that, for me, it doesn't qualify as a community release.

I guess I am concerned as a user of hadoop that the only way to get an “endorsed” up-to-date version of hadoop is to abandon the community and “trust” a commercial release with its special sauce.

I am just hoping that the community can put together a nice stable up-to-date patched version. That’d be nice. It probably won’t change my commercial deploy, but it would give me something to compare with :)

Just my $0.02 (CND)

Cheers
James.

On 2011-05-02, at 2:51 PM, Doug Cutting wrote:

> On 05/02/2011 01:05 PM, Eric Baldeschwieler wrote:
>> As previously discussed, all parties are welcome to champion
>> alternative releases from Apache if they want to invest in making
>> Apache Hadoop better.
>
> I do not believe that different organizations should release their own
> versions of Hadoop posing as Apache releases. If folks wish to release
> their own versions, then they should call them something else and
> release them themselves. The Apache Hadoop project should create
> releases collaboratively, through an open process. The standard means
> is to start a branch from trunk or a prior release and propose patches
> to that branch, one-by-one. This candidate diverged sufficiently from
> this pattern that, for me, it doesn't qualify as a community release.
>
> Cheers,
>
> Doug

> moving this thread to general@
>
> On May 3, 2011, at 3:58 AM, Doug Cutting wrote:
>
>>> Should we release
>>> http://people.apache.org/~omalley/hadoop-0.20.203.0-rc0/?
>>
>> The patch selection process for this branch did not appear to be a
>> community process. A massive patch set was committed en-masse with no
>> public discussion before or after about its specific composition.
>
> guys...
> 1. do we agree this is an issue

Of course it is an issue. Anyone can make it an issue -- no agreement is necessary.

> 2. if it is, how do we get the communication & discussion on list?

By communicating and discussing on list. Like, for example, by proposing a release vote and people objecting to it, followed by a polite collaboration on ways to reduce objections if that is needed to get a release out the door.

> what do people think are the major issues that are stopping people talking about stuff on list?

The fact that people can vote on individual issues via jira, which means that there is effectively no discussion of the product as a whole on list. I am constantly amazed at how quiet it is in this project, at least until I remember that most of the work is done exclusively via jira, unlike any of my other followed projects that use jira. I'd suggest that the right place to hold any discussion is on the dev list, but I am not on that list because it receives way too many automated notifications. Maybe it would help discussion on dev if notices were sent elsewhere and only discussions were held on dev.

By all means, produce a tarball and let the entire PMC vote on it as the next release. My personal preference is to not allow anything that deviates from the major.minor.patch release numbering that most software projects follow, but I don't have a vote here.

It is perfectly reasonable for Doug (or anyone else) to vote on a release based on a lack of version history, adequate description of the sweet meats, or anything else that others might consider non-technical. This is a release vote! It does not require consensus. It requires minimal review (usually meaning three +1s) and a majority opinion of those on the PMC who choose to review the proposed release and vote.

> It is perfectly reasonable for Doug (or anyone else) to vote
> on a release based on a lack of version history, adequate
> description of the sweet meats, or anything else that others
> might consider non-technical. This is a release vote!
> It does not require consensus. It requires minimal review
> (usually meaning three +1s) and a majority opinion of those
> on the PMC who choose to review the proposed release and vote.

Roy,

Thanks for reminding everyone that a release does not require consensus.

Regarding this release, I think anyone who runs a multi-tenant Hadoop cluster will appreciate the user-limits feature that goes a long way to ensure that an errant job does not take the entire cluster down. Your operations and support people will thank you for deploying this release.

Recently I was discussing with operations folks at a company that operates a Hadoop cluster based on a commercial distribution of Hadoop, and they were excited to hear that they will have a way of making sure that their cluster will not be taken down by an errant user/job, because that's one big fear that keeps them awake.

FWIW, I am +1 for this release.

Arun, can you include a document that gives more details about what the limits are, and how to modify your jobs to stay below these limits (I know it is a cut-paste for you :-)?

> quiet it is in this project, at least until I remember that
> most of the work is done exclusively via jira, unlike any of
> my other followed projects that use jira. I'd suggest that
> the right place to hold any discussion is on the dev list,
> but I am not on that list because it receives way too many
> automated notifications. Maybe it would help discussion on
> dev if notices were sent elsewhere and only discussions were
> held on dev.

(pause: bisecting their list shows that in 1.mar.06 they forked JIRA to a separate list to hide the details of ongoing work)

In some ways it's a means of dealing with a large and fast moving codebase: you subscribe to the issues that matter to you, all the discussions on a specific feature are archived, etc.

However, it has some flaws
-discouragement of community, you become a group of people working on JIRA issues, rather than on a large integrated project
-with work spread across common, hdfs and mapreduce JIRAs and mailing lists, it's hard to keep all the things in your head -it is pretty much a full time job to do so. And I don't know about the others, but I don't have the time.
-we need a way of gently moving people from those who use hadoop to those who develop it. To me, every end user is a warm engineering resource we just need to point at a problem that they care about. The scale of the project, its complexity, JIRA change rate and testing difficulties are all barriers to entry
-you end up needing a team of people
  * someone to track all the issues and keep the design in their head
  * 1+ person to test
  * 1+ person to code

I don't know about others, but I can't do this on my own.

The attempt to split up into HDFS+MAPREDUCE was one tactic to deal with this, but it hasn't worked; we just have more mailing lists to track (or in my case, fall behind on).

votewise:

-I'm in favour of shipping an apache release of 20.x that has the patches that Y! and others have added to deal with scale and availability -and which has been tested by them. This will provide an apache release for people to use in production systems -because the official apache releases have lagged the CDH and Y! releases.

-I'd like to see all the changes integrated into trunk too, as it doesn't make sense for a patch in this branch not to be in trunk.

Voting +1 for a release means that you have downloaded the source code package, verified its signatures, compiled it on your platform of choice, and checked to your satisfaction that it matches the source code we have in subversion and that it is better (in your opinion) than the last Apache release of the same name.

The ASF relies on that minimum amount of peer review to make sure that we don't release trojan horses, license violations, or other things that might get us sued as a foundation or as individuals. If you don't have time to do it yourself, then vote +0 (with happy feelings) and hope that there are at least three members of the PMC who do have that time.

DO NOT +1 a release just because it seems like progress. Progress is in the doing, not the talking.

....Roy
