...and just pay attention to the Hadoop project over the last 3-4 years. It's operating as a single project, that's masking separate communities that themselves are really separate ASF projects.

At the ASF, this has been a problem area called "umbrella" projects, and over the years all I've seen from them is wasted bandwidth, artificial barriers, and the invention of new ways to perform process mongering and to reduce the fun in developing software at this fantastic foundation.

I've talked about umbrella projects enough. We've diverted conversation enough. Enough people have tried to act like there is some technical mumbo jumbo that is preventing the eventual act of higher power that I myself hope comes should these discussions prove unfruitful through normal means.

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

Over the course of this discussion I've become convinced it is time to split up Hadoop. Pig, Hive, Zookeeper, HBase and other Hadoop graduates all seem to have been plagued by fewer meta-discussions, bylaw fights, etc. since they graduated from Hadoop. Board members have been advising us to do this for years. With 1.0 stable and 2.0 on the way, now seems like a good time to do it.

With Mavenization done and the advent of BigTop and multiple third-party Hadoop distro packagers, there is little doubt that people concerned about consuming the work of the distinct projects will be able to get them to work together.

On Aug 28, 2012, at 7:33 PM, Mattmann, Chris A (388J) wrote:

> [decided to minimize traffic and to simply put this in one thread]
>
> Hi Guys,
>
> See the recent discussion on these threads:
>
> YARN as its own Hadoop "sub project": http://s.apache.org/WW1
> Maintain a single committer list for the Hadoop project: http://s.apache.org/Owx
>
> [...snip...]
>
> *these. are. separate. projects.*
> *there.are.not.blocker.issues.from.spinning.out.these.projects.as.their.own.communities*
>
> In this email: http://s.apache.org/rSm
>
> And in the 2 subsequent follow ons in that thread, I've outlined a process that I'll copy
> through below for splitting these projects into their own TLPs:
>
> -----snip
> Process:
>
> 0. [DISCUSS] thread for <TLP name> in which you talk about #1 and #2 below, potentially draft resolution too.
>
> 1. Decide on an initial set of *PMC* members. I urge each new TLP to adopt PMC==C. See reasons I've
> already discussed.
>
> 2. Decide on a chair. Try not to VOTE for this explicitly, see if can be discussed and consensus
> can be reached (just a thought experiment). VOTE if necessary.
>
> 3. [VOTE] thread for <TLP name>
>
> 4. Create Project:
> a. paste resolution from #0 to board@ or;
> b. go to general@incubator and start new Incubator project.
>
> 5. infrastructure set up.
> MLs moving; new UNIX groups; website setup;
> SVN setup like this:
>
> svn copy -m "MR TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool MR name>; or
> svn copy -m "YARN TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool YARN name>; or
> svn copy -m "HDFS TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool HDFS name>
>
> After all 3 have been created run:
>
> svn remove -m "Remove Hadoop umbrella TLP. Split into separate projects." https://svn.apache.org/repos/asf/hadoop
>
> 6. (TLPs if 4a; Incubator podling if 4b;) proceed, collaborate, operate as distinct communities, and try to solve the code duplication/dependency
> issues from there.
>
> 7. If 4b; then graduate as TLP from Incubator.
>
> -----snip
>
> So that's my proposal.
>
> Thanks guys.
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: [EMAIL PROTECTED]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

IMO a pre-requisite to this is to figure out how we'll handle the following:

* Where does common stuff live?
* What are the public interfaces of each project (towards the other projects)?
* How do we do development/releases? In tandem? Separate? How will this work in practice? Currently we are constantly tweaking things across projects, sometimes in the same JIRAs, sometimes in follow-up JIRAs.

Thoughts?

Thxs.

On Tue, Aug 28, 2012 at 7:33 PM, Mattmann, Chris A (388J) <[EMAIL PROTECTED]> wrote:
> [...snip...]

Alejandro

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

> IMO a pre-requisite to this is to figure out how we'll handle the following:

To be honest, I don't think any of the below are prereqs. They are technical issues that can be dealt with after the fact: just SVN copy Hadoop as it stands today into each of the new TLPs, per my SVN commands, and then use that as a starting point for addressing the below as part of the natural evolution of the project code.
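For reference, the copy-then-remove sequence from the proposal could be scripted roughly as follows. This is a sketch that nobody has run against the real repository: it assumes a POSIX shell and an authenticated `svn` client, `split_umbrella` and `run` are hypothetical helper names, and the three new project names are left as arguments because the proposal leaves them as `<insert cool ... name>`. Since each `svn copy`/`svn remove` against a URL is an immediate commit, the helper only prints the commands unless `DRY_RUN=0` is set.

```shell
# Sketch: split the umbrella tree with server-side copies, then remove it.
# Assumptions: POSIX sh, an authenticated `svn` client, hypothetical helper
# names; the new project names are supplied by the caller.

REPO="https://svn.apache.org/repos/asf"
SRC="$REPO/hadoop/"

# Print the command (arguments joined by spaces) unless DRY_RUN=0,
# in which case actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# split_umbrella <mr-name> <yarn-name> <hdfs-name>
split_umbrella() {
  if [ "$#" -ne 3 ]; then
    echo "usage: split_umbrella <mr-name> <yarn-name> <hdfs-name>" >&2
    return 1
  fi
  # One copy per new TLP; stop before the remove if any copy fails.
  run svn copy -m "MR TLP." "$SRC" "$REPO/$1" || return 1
  run svn copy -m "YARN TLP." "$SRC" "$REPO/$2" || return 1
  run svn copy -m "HDFS TLP." "$SRC" "$REPO/$3" || return 1
  # Only after all three copies exist, remove the umbrella tree.
  run svn remove -m "Remove Hadoop umbrella TLP. Split into separate projects." "$REPO/hadoop"
}
```

Usage would be something like `DRY_RUN=0 split_umbrella "$MR_NAME" "$YARN_NAME" "$HDFS_NAME"`, where the three variables are placeholders. Because URL-to-URL copies in SVN are cheap server-side copies, each new TLP would start with the full history of the Hadoop tree.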

That being said, if I had to guess what the TLPs would do to address the below once they are created:

> * Where does common stuff lives?

This usually happens over time, depending on how often things release and on other things cited elsewhere in this thread and in other discussions in Hadoop over the past years. You guys clearly have a good handle on things like this.

I would just encourage the subsequent TLPs not to worry about doing everything perfectly, and to realize that if you start out with the same code base, you can selectively and then iteratively make things cleaner and better refactored, and the answers to questions like this will emerge naturally during that evolution.

> * What are the public interfaces of each project (towards the other projects)?

This is something that each distinct community can answer once they are bootstrapped as TLPs. You can decide what portion of the code is really under charter and then work as a community to figure this out. Sorry I can't be more specific than that.

> * How do we do development/releases? In tandem? Separate?

In tandem across communities never really works. Releases should occur separately, per community and TLP, on their own schedule. Code that depends on other projects either has to wait for those communities/TLPs/projects to fix things, add new features, or whatever, or it has to insulate itself: keep the fixes locally in your project's SVN until they can be pushed upstream and included in the other communities' releases.

Ask yourself this: if you guys have a dependency on, e.g., Tomcat, and there is some critical bug or new feature you want in Tomcat, how would you deal with that? I would posit that you could deal with this situation the same way: keep the fix to Tomcat locally in your project, and work to get that fix upstream and included in some subsequent Tomcat release.

> How this will work in practice, currently we are constantly tweaking things inter-projects, sometimes in the same JIRAs, sometimes in follow up JIRAs.

Technically you are doing that, but community-wise it's not working out, and it hasn't really been working for years. I've been around Hadoop since its inception (I was a Nutch committer before Hadoop existed), and though it's been hugely successful, really awesome and super great (congrats, everyone, BTW!), the community issues have always cropped up b/c it's one big umbrella project, and that doesn't work at Apache.

I personally am for splitting up the projects. I think there is a lot of potential that each of the projects could have on their own, and I expect to see them evolve in new and interesting ways when the projects are not tied directly together.

But in order to get there, we need to address the issues that made the first split attempt fail. First, we need to look at all API calls that MR, YARN, or HDFS make into common that are not @Stable, and either promote them to @Stable or remove the need for those calls. Second, while we are doing that, we need to look at the visibility of those APIs. How many APIs really need to be @LimitedPrivate, or should they be @Public? How many of the APIs have no designation at all? Third, we need to get truly serious about maintaining binary compatibility on @Stable APIs. Fourth, we need to start splitting the projects up, starting with common. I think it would be cool to call it liBig, but I digress. Once common has been split out and has been on its own for a few releases, we start splitting out HDFS, YARN, and MapReduce. For each of those we need to do a similar audit between the projects and fix the interdependencies between them. This is mostly dependencies between YARN and MR.
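A crude first pass at the annotation audit described above might simply count Java sources per designation. This is only a sketch under several assumptions: GNU or BSD `grep` (for `-r`, `-L` and `--include`), Hadoop's `InterfaceAudience`/`InterfaceStability` annotations written out in full as `@InterfaceAudience.X`/`@InterfaceStability.Y` (files using the short `@Public`/`@Stable` forms would be missed or counted as unannotated), and hypothetical helper names. It counts files, not individual API calls or call sites, so it can only point at where a real audit should look.

```shell
# Rough audit of Hadoop-style API annotations in a source tree (sketch).
# Assumptions: GNU or BSD grep (-r, -L, --include); annotations written as
# @InterfaceAudience.X / @InterfaceStability.Y. Counts files, not call sites.

# Number of .java files under $1 whose text matches the extended regex $2.
count_matching() {
  grep -rlE --include='*.java' -- "$2" "$1" | wc -l | tr -d ' '
}

# Number of .java files under $1 that never mention @InterfaceAudience.
count_unannotated() {
  grep -rL --include='*.java' -- '@InterfaceAudience' "$1" | wc -l | tr -d ' '
}

# Print one key=count line per category for the tree rooted at $1.
audit_annotations() {
  dir=$1
  echo "public=$(count_matching "$dir" '@InterfaceAudience\.Public')"
  echo "limited_private=$(count_matching "$dir" '@InterfaceAudience\.LimitedPrivate')"
  echo "private=$(count_matching "$dir" '@InterfaceAudience\.Private')"
  echo "stable=$(count_matching "$dir" '@InterfaceStability\.Stable')"
  echo "not_stable=$(count_matching "$dir" '@InterfaceStability\.(Evolving|Unstable)')"
  echo "no_audience=$(count_unannotated "$dir")"
}
```

For example, `audit_annotations hadoop-common-project/hadoop-common/src/main/java` would print one `key=count` line per category; that directory path is an assumption about the source layout. Running it before and after promoting APIs to @Stable would give a rough sense of progress, nothing more.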

As part of this we also need to have a clear set of rules about what it takes to become a committer or PMC member for the new projects when they split off. I am fine with all committers becoming PMC members, but if we merge the lists now and simply say all previous committers become committers on the new TLPs, there will be a lot of committers/PMC members that have no real desire to be on those projects. I would propose that we merge the committer lists, but that all committers on the current project receive an invitation to become a committer on the new projects. ATM convinced me that committers know their boundaries and will self-censor. I believe that many committers will decline to become committers on the new projects, either because it is out of their area of expertise or because they are not involved with Hadoop any more, and will simply ignore the invitation.

I fear that just voting and doing an svn copy -m will result in the same thing that happened last time. Someone will want to make a large change. This will require making a change to something in common, but because it cannot easily be done in a backwards-compatible way, or because it will take three steps to complete the change instead of one, we will get frustrated. If this happens enough, we will get really frustrated and try to merge the projects back together again. This is because the projects are too tightly coupled right now to really stand on their own. Just look at all of the security and token work that has been done recently. It has touched every single project, and it has been a bit of a nightmare. It would be even worse if the projects were completely split apart.

I also want us to think about the timing of this. Do we really want to do this before 2.0 is GA? Doing this properly is probably going to be a several-month effort for one or two people, plus a concerted effort by everyone not to break things while they work. If we have to rearchitect something so that the APIs can be marked stable, it may take a lot longer than that. Is it worth pushing the GA of 2.0 off by an entire quarter? For me I would say yes, but I know others have different opinions and different schedules.

@Chris,

I can see your desire to do the split now and then deal with the fallout as we adapt to the changes. I think that would work, assuming that we are all completely committed to making the changes necessary. But the fact that we are having this discussion at all seems to indicate that we are not all completely committed to this, and I also feel that dealing with the fallout is going to take a lot longer if we don't try to address some of the problems up front. Putting on my Yahoo! hat, I want to avoid as many problems and delays as I can, because my customers want a stable release of Hadoop with the features that are in 2.0. The longer it is delayed, the longer we stay on branch-0.23. A one-quarter delay because of this I am sure I can swing; more than that and I will start to get more pressure to pull in new features, which will probably mean that we then have to fork, which is something that I really do not want to do.

So I am +1 on merging the committer list, and +1 on splitting the projects. I would encourage us to at least do some planning and legwork up front before splitting. I am even +1 for setting a deadline date on which the svn copy -m will happen whether we are ready or not.

On 8/28/12 10:50 PM, "Alejandro Abdelnur" <[EMAIL PROTECTED]> wrote:

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

> IMO a pre-requisite to this is to figure out how we'll handle the following:

Good points - I'd recommend we keep Common and HDFS in the same project. Yes, MR/YARN will need some changes in Common occasionally, but core pieces like RPC have been maintained by HDFS folks over time anyway, e.g., the move to ProtoBufs was led by Sanjay, Suresh, Todd, Jitendra et al.

We can move SequenceFile into MR if necessary and keep the same package names for compatibility.

We should, of course, stop tweaking things in different projects in the same jira - we've been reasonably good at not doing that.

Thoughts?

Arun

> [...snip...]

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

I am +1 for splitting up the projects. This is the step in the right direction. There will be challenges along the way. I am confident we can solve them.

Robert and Alejandro have brought up good questions. Here are my thoughts:
- For the first one or two releases, all the projects can coordinate and do the releases together. This should help simplify the immediate work needed, and should also help us meet the release timelines that we are working towards. As the split makes progress, this cross-project coordination will no longer be necessary. I volunteer to RM these releases and do the needed coordination from HDFS.
- As regards APIs, currently we have LimitedPrivate APIs for related projects. These have been used by HBase as well. We need to think about a timeline by when we can mark these APIs stable. They should remain LimitedPrivate. Any rare changes to these APIs then require only coordination among the projects, and no user applications (which we have no control over) are affected.
- I agree with Arun that common can move with HDFS.

> On Aug 28, 2012, at 8:50 PM, Alejandro Abdelnur wrote:
>
>> Chris, thanks for initiating the discussion.
>
> Likewise, thanks Chris!
>
> [...snip...]

http://hortonworks.com/download/

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

> I am +1 for splitting up the projects. This is the step in the right
> direction. There will be challenges along the way. I am confident we can
> solve them.
>
> Robert and Alejandro have brought up good questions. Here are my thoughts:
> - For first one or two releases all the projects can coordinate and do the
> releases together. This should help simplify the immediate work needed.
> This should also help in us meeting the release timelines that we are
> working towards. As the split makes progress, this cross project
> coordination will no longer be necessary. I volunteer to RM these releases
> and do the needed co-ordination from HDFS.

+1 seems like a reasonable first step. Thanks for volunteering, Suresh.

> - As regards to APIs, currently we have LimitedPrivate APIs for related
> projects. This has been used by HBase as well. We need to think about a
> timeline by when we can mark these APIs stable. They should remain
> LimitedPrivate. Any rare changes to APIs requires only co-ordination among
> the projects and no user applications (which we have not control over) is
> affected.

Agreed.

Arun

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

On Wed, Aug 29, 2012 at 10:02 AM, Suresh Srinivas <[EMAIL PROTECTED]> wrote:
> - As regards to APIs, currently we have LimitedPrivate APIs for related
> projects. This has been used by HBase as well. We need to think about a
> timeline by when we can mark these APIs stable. They should remain
> LimitedPrivate. Any rare changes to APIs requires only co-ordination among
> the projects and no user applications (which we have not control over) is
> affected.
> - I agree with Arun that the common can move with HDFS.

So, this would mean that a bunch of common functionality needed by other TLPs (YARN, MR, HBase) which is not required by HDFS will end up in HDFS. I'm not necessarily against that, but it should be well understood/expected/accepted by the HDFS TLP, right?

Thx

Alejandro

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

> I personally am for splitting up the projects. I think there is a lot of
> potential that each of the projects could have on their own, and I expect
> to see them evolve in new and interesting ways when the projects are not
> tied directly together.
>
> But, in order to get there we need to address the issues that made the
> first split attempt fail. [..snip..]

Sorry I snipped the above, but mainly I just don't buy the argument that there is a bunch of technical things that *block* splitting the projects.

Today, right now, I could propose a new Incubator project and call it BooDoopADoop. I could add 5-7 (or 4) people that I think I would work well with. I could invite others to join in the Incubator as part of the initial PPMC list and committer list. We could write in our proposal that the existing Hadoop community is technically amazing, but over time has been mired by a bunch of community issues, and we'd like to take our crack at the source code in a brand new Apache project called BooDoopADoop.

Then for the code portion of the Incubator proposal, I could say, I will svn copy all of Hadoop into BooDoopADoop and then start from there.

So, given that I could do that (as could others), I would also have to be readily prepared for the community bad-will and general ASF bad-will that may cause. It may not cause ASF bad-will, b/c in general the foundation doesn't care about competing projects or technologies. It does care about splintering communities and the like, though. Moreover, beyond the Foundation concerns, I would also have to concern myself with pissing you guys off, and all the downstream organizations, companies and individuals that are part of the Hadoop ecosystem that may be pissed off about the way we injected code into BooDoopADoop. But again, nothing is stopping me from doing that.

I'd like to point out that in the above scenario, I don't have to worry about release schedules, and this, or that, and the other. Or APIs, or whatever. I have BooDoopADoop, and so does the new community around it in the Incubator, and we simply "go". Then, if others upstream or downstream find BooDoopADoop useful, they take it and incorporate it into their project. Perhaps Hadoop HDFS finds our improvements to BooDoopADoop and its distributed file system better, and perhaps we did some Maven magic and made our jar file better or more attractive to use, and it saved Hadoop HDFS coding, and time, and whatever. So Hadoop HDFS integrates it.

See how this could work?

So, take me out of BooDoopADoop and replace that with the Hadoop PMC, and the specific subsets of you guys that are actually really distinct PMC members of distinct communities living within the Hadoop ecosystem. Sure, you want to technically work together on releases, and APIs, and whatever, but those are *inter-community* issues, more so than *intra-community* across the Foundation. Sure, it's good to try and coordinate, b/c you guys all have $dayjobs, and the software you build at those $dayjobs is contributed upstream into the ASF, and then others depend on it (and then others downstream of the ASF and even downstream of your companies depend on it, and so on and so forth). However, as far as the foundation is concerned, communities and projects (1:1 ideally) coordinate releases on an inter-community level, not intra-*. The intra-* is usually just icing, and way more difficult.

> As part of this we also need to have a clear set of rules about what it
> takes to become a committer or PMC member for the new projects when they
> split off. I am fine with all committers become PMC members,

+1 me too, and your suggestion below about "if we merge..." is one option for doing so. But there could be others, and discussing them and putting them up on a list is probably a good idea.

I would honestly suggest someone(s) taking a stab at the lists of the new PMC members for the new TLPs and then putting something out there, and then -'ing people or adding them, as needed.

And yes, I fully agree that the PMC lists should not simply be the full Hadoop PMC per new TLP -- then we've just replicated the inherent problem 3x over instead of 1x over :)

However, I don't know the ins and outs well enough to say who should be on those lists for HDFS, MR and YARN. I bet you guys do, though, so someone, step up and throw something out there for others to shoot down....errr I mean improve! :)

[...snip...]

See my BooDoopADoop. I don't think that someone in new TLP X wanting to make a change in their copy of common will matter to TLP Y. It shouldn't. It *can*, over time, if there is coordination between X and Y, but it doesn't have to. Get what I mean?

This is *not* a technical issue :) This is a community issue. It's independent of the technical issues. This is about how to fix the community issues.

But yes, if you guys want to release some upcoming version first or whatever, fine and dandy if the community agrees, but it shouldn't be a gate to fixing community issues.

This happens in the Incubator all the time. The big question with a project releasing and then having a graduation VOTE near that release (before or after) -- do we wait to graduate? I'm always a fan of just moving forward on graduation b/c it's independent of the technical stuff.

Dealing with Hadoop technical problems is probably not my forte anymore (if it ever was : ) ). I'm here as a Foundation member trying to help with the community problems.

In the end, forking is what you guys should do :) You should just do it at Apache. "Fork" the current Hadoop uber project into the actual communities that actually exist. You can fork directly out as TLPs, or incubate the forks. But doing it here would be great :)

Thanks for your thoughts, Bobby. Hope that explains where I am coming from.

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

> > - I agree with Arun that the common can move with HDFS.
>
> So, this would mean that a bunch of common functionality needed by
> other TPLs (YARN, MR, HBASE) which is not required by HDFS will end up
> in HDFS. I'm not necessary against that but it should be well
> understood/expected/accepted by HDFS TPL, right?

RPC is the main common functionality (not used by HBase). The others are some utilities related to native I/O, Configuration and other helper utils. Other than RPC, we can move utils specific to a project into that project. In some cases, if there is code duplication, that is fine. We can make a call on those on a case-by-case basis.

> +1> > Over the course of this discussion I've become convinced it is time to split up Hadoop. Pig, Hive, Zookeeper, HBase and other Hadoop graduates all seem to have been plagued by fewer meta-discussions and bi-law fights., etc since they graduated from Hadoop. Board members have been advising us to do this for years. With 1.0 stable and 2.0 on the way, now seems like a good time to do it.> > With mavenization done and the advent of BigTop and multiple 3rd party hadoop distro packagers, there is little doubt that people concerned about consuming the work of the distinct projects will be able to get them to work together.> > > > On Aug 28, 2012, at 7:33 PM, Mattmann, Chris A (388J) wrote:> >> [decided to minimize traffic and to simply put this in one thread]>> >> Hi Guys,>> >> See the recent discussion on these threads:>> >> YARN as its own Hadoop "sub project": http://s.apache.org/WW1>> Maintain a single committer list for the Hadoop project: http://s.apache.org/Owx>> >> ...and just pay attention to the Hadoop project over the last 3-4 years. It's operating>> as a single project, that's masking separate communities that themselves are really>> separate ASF projects. >> >> At the ASF, this has been a problem area called "umbrella" projects and over the years, >> all I've seen from them is wasted bandwidth, artificial barriers and the inventions of >> new ways to perform process mongering and to reduce the fun in developing software>> at this fantastic foundation.>> >> I've talked about umbrella projects enough. We've diverted conversation enough.>> Enough people have tried to act like there is some technical mumbo jumbo that is>> preventing the eventual act of higher power that I myself hope comes should these>> discussions prove unfruitful through normal means. >> >> *these. are. separate. 
projects.*>> *there.are.not.blocker.issues.from.spinning.out.these.projects.as.their.own.communities*>> >> In this email: http://s.apache.org/rSm>> >> And in the 2 subsequent follow ons in that thread, I've outlined a process that I'll copy>> through below for splitting these projects into their own TLPs:>> >> -----snip>> Process: >> >> 0. [DISCUSS] thread for <TLP name> in which you talk about #1 and #2 below, potentially draft resolution too.>> >> 1. Decide on an initial set of *PMC* members. I urge each new TLP to adopt PMC==C. See reasons I've>> already discussed.>> >> 2. Decide on a chair. Try not to VOTE for this explicitly, see if can be discussed and consensus >> can be reached (just a thought experiment). VOTE if necessary.>> >> 3. [VOTE] thread for <TLP name>>> >> 4. Create Project:>> a. paste resolution from #0 to board@ or;>> b. go to general@incubator and start new Incubator project.>> >> 5. infrastructure set up.>> MLs moving; new UNIX groups; website setup; >> SVN setup like this:>> >> svn copy -m "MR TLP." https://svn.apache.org/repos/asf/hadoop/https://svn.apache.org/repos/asf/<insert cool MR name>; or >> svn copy -m "YARN TLP." https://svn.apache.org/repos/asf/hadoop/https://svn.apache.org/repos/asf/<insert cool YARN name>; or>> svn copy -m "HDFS TLP." https://svn.apache.org/repos/asf/hadoop/https://svn.apache.org/repos/asf/<insert cool HDFS name>>> >> After all 3 have been created run:>> >> svn remove -m "Remove Hadoop umbrella TLP. Split into separate projects." https://svn.apache.org/repos/asf/hadoop>> >> 6. (TLPs if 4a; Incubator podling if 4b;) proceed, collaborate, operate as distinct communities, and try to solve the code duplication/dependency>> issues from there.>> >> 7. If 4b; then graduate as TLP from Incubator.>> >> -----snip>> >> So that's my proposal. 
>> >> Thanks guys.>> >> Cheers,>> Chris>> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>> Chris Mattmann, Ph.D.>> Senior Computer Scientist>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
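In step 5 of the quoted process, the archive fuses the source and destination URLs of each `svn copy` into a single string. Below is a minimal Python dry-run sketch of the intended command sequence. It only builds and prints the commands; it runs nothing. The target directory names are hypothetical placeholders, since the thread leaves them as `<insert cool ... name>`. It assumes the URL-to-URL form `svn copy -m MSG SRC DST`, which commits directly to the repository, and it follows the process's ordering: remove the umbrella tree only after every copy has been made.

```python
import shlex

ASF_SVN = "https://svn.apache.org/repos/asf"


def split_plan(new_names: dict[str, str]) -> list[list[str]]:
    """Build the step-5 svn commands as argument lists (dry run only).

    new_names maps a label such as "MR" to a new top-level directory name.
    These names are hypothetical: the thread does not choose them.
    """
    source = f"{ASF_SVN}/hadoop/"
    # One server-side copy per new TLP; source and destination are separate args.
    commands = [
        ["svn", "copy", "-m", f"{label} TLP.", source, f"{ASF_SVN}/{name}"]
        for label, name in new_names.items()
    ]
    # Only after all copies exist is the umbrella tree removed.
    commands.append([
        "svn", "remove", "-m",
        "Remove Hadoop umbrella TLP. Split into separate projects.",
        f"{ASF_SVN}/hadoop",
    ])
    return commands


if __name__ == "__main__":
    # Hypothetical directory names, for illustration only.
    plan = split_plan({"MR": "mr-example", "YARN": "yarn-example", "HDFS": "hdfs-example"})
    for argv in plan:
        print(shlex.join(argv))  # shell-quoted command line, for review before running
```

Keeping each command as an argument list avoids quoting mistakes in the commit messages and makes the plan easy to review before anyone runs it by hand.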

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

On Wed, Aug 29, 2012 at 5:31 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
> On Aug 28, 2012, at 8:50 PM, Alejandro Abdelnur wrote:
>
>> Chris, thanks for initiating the discussion.
>
> Likewise, thanks Chris!
>
>> IMO a pre-requisite to this is to figure out how we'll handle the following:
>
> Good points - I'd recommend we keep Common and HDFS in the same project.

That seems reasonable. The alternative would be to have a Common TLP, which we shouldn't necessarily dismiss, since more important than the size of the codebase is that there's a community to support the codebase, as there certainly is here. Having said that, a Common TLP lacks a clear 'mission' since it doesn't offer any standalone services. Also, it may diminish in utility over time if pieces are moved into HDFS, MapReduce and YARN.

> Yes, MR/YARN will need some changes in Common occasionally, but core pieces like RPC have been maintained by HDFS folks over time anyway e.g. move to ProtoBufs were led by Sanjay, Suresh, Todd, Jitendra et al.

Does the work to use versioned protocol buffers for RPC mean that different releases of HDFS and MapReduce can work together yet? If not, this is something we should be working towards (although that shouldn't block a move to TLPs).

>> We can move SequenceFile into MR if necessary and keep same package names for compatibility.

There are also Hadoop tools like distcp, Hadoop archives, Streaming, etc., which should go with MapReduce.

Cheers,
Tom

>> We should, of course, stop tweaking things in different projects in the same jira - we've been reasonably good at not doing that.>> Thoughts?>> Arun>>> * Where does common stuff lives?>> * What are the public interfaces of each project (towards the other projects)?>> * How do we do development/releases? In tandem? Separate? How this>> will work in practice, currently we are constantly tweaking things>> inter-projects, sometimes in the same JIRAs, sometimes in follow up>> JIRAs.>>>> Thoughts?>>>> Thxs.>>>> On Tue, Aug 28, 2012 at 7:33 PM, Mattmann, Chris A (388J)>> <[EMAIL PROTECTED]> wrote:>>> [decided to minimize traffic and to simply put this in one thread]>>>>>> Hi Guys,>>>>>> See the recent discussion on these threads:>>>>>> YARN as its own Hadoop "sub project": http://s.apache.org/WW1>>> Maintain a single committer list for the Hadoop project: http://s.apache.org/Owx>>>>>> ...and just pay attention to the Hadoop project over the last 3-4 years. It's operating>>> as a single project, that's masking separate communities that themselves are really>>> separate ASF projects.>>>>>> At the ASF, this has been a problem area called "umbrella" projects and over the years,>>> all I've seen from them is wasted bandwidth, artificial barriers and the inventions of>>> new ways to perform process mongering and to reduce the fun in developing software>>> at this fantastic foundation.>>>>>> I've talked about umbrella projects enough. We've diverted conversation enough.>>> Enough people have tried to act like there is some technical mumbo jumbo that is>>> preventing the eventual act of higher power that I myself hope comes should these>>> discussions prove unfruitful through normal means.>>>>>> *these. are. separate. 
projects.*>>> *there.are.not.blocker.issues.from.spinning.out.these.projects.as.their.own.communities*>>>>>> In this email: http://s.apache.org/rSm>>>>>> And in the 2 subsequent follow ons in that thread, I've outlined a process that I'll copy>>> through below for splitting these projects into their own TLPs:>>>>>> -----snip>>> Process:>>>>>> 0. [DISCUSS] thread for <TLP name> in which you talk about #1 and #2 below, potentially draft resolution too.>>>>>> 1. Decide on an initial set of *PMC* members. I urge each new TLP to adopt PMC==C. See reasons I've>>> already discussed.>>>>>> 2. Decide on a chair. Try not to VOTE for this explicitly, see if can be discussed and consensus>>> can be reached (just a thought experiment). VOTE if necessary.

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

> There are also Hadoop tools like distcp, Hadoop archives, Streaming,
> etc, which should go with MapReduce.

Good point. I agree.

> The alternative would be to have a Common TLP,
> which we shouldn't necessarily dismiss, since more important than the
> size of the codebase is that there's a community to support the
> codebase, as there certainly is here.

I guess the question is who would want to be on that project? I don't think the current bundle of stuff in Common would form a good kernel for a community. The lack of a coherent community for Common has always been a problem with the project split, IMO. I could see folks deciding to build a community around a really good RPC stack, or some other chunk of Common, but frankly I think it is premature to do that. Proposals are welcome of course, but I think the HDFS folks will want a copy of the RPC stuff in their project, and most of the rest of the stuff in Common is too small to merit a project and is more easily handled via duplication followed by sorting it out / dead code elimination.

On Aug 29, 2012, at 10:30 AM, Tom White wrote:

> On Wed, Aug 29, 2012 at 5:31 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:>> >> On Aug 28, 2012, at 8:50 PM, Alejandro Abdelnur wrote:>> >>> Chris, thanks for initiating the discussion.>> >> Likewise, thanks Chris!>> >>> >>> IMO a pre-requisite to this is to figure out how we'll handle the following:>>> >> >> >> Good points - I'd recommend we keep Common and HDFS in the same project.> > That seems reasonable. The alternative would be to have a Common TLP,> which we shouldn't necessarily dismiss, since more important than the> size of the codebase is that there's a community to support the> codebase, as there certainly is here. Having said that, a Common TLP> lacks a clear 'mission' since it doesn't offer any standalone> services. Also, it may diminish in utility over time if pieces are> moved into HDFS, MapReduce and YARN.> >> Yes, MR/YARN will need some changes in Common occasionally, but core pieces like RPC have been maintained by HDFS folks over time anyway e.g. move to ProtoBufs were led by Sanjay, Suresh, Todd, Jitendra et al.> > Does the work to use versioned protocol buffers for RPC mean that> different releases of HDFS and MapReduce can work together yet? If> not, this is something we should be working towards (although that> shouldn't block a move to TLPs).> >> >> We can move SequenceFile into MR if necessary and keep same package names for compatibility.> > There are also Hadoop tools like distcp, Hadoop archives, Streaming,> etc, which should go with MapReduce.> > Cheers,> Tom> >> >> We should, of course, stop tweaking things in different projects in the same jira - we've been reasonably good at not doing that.>> >> Thoughts?>> >> Arun>> >>> * Where does common stuff lives?>>> * What are the public interfaces of each project (towards the other projects)?>>> * How do we do development/releases? In tandem? Separate? 
How this>>> will work in practice, currently we are constantly tweaking things>>> inter-projects, sometimes in the same JIRAs, sometimes in follow up>>> JIRAs.>>> >>> Thoughts?>>> >>> Thxs.>>> >>> On Tue, Aug 28, 2012 at 7:33 PM, Mattmann, Chris A (388J)>>> <[EMAIL PROTECTED]> wrote:>>>> [decided to minimize traffic and to simply put this in one thread]>>>> >>>> Hi Guys,>>>> >>>> See the recent discussion on these threads:>>>> >>>> YARN as its own Hadoop "sub project": http://s.apache.org/WW1>>>> Maintain a single committer list for the Hadoop project: http://s.apache.org/Owx>>>> >>>> ...and just pay attention to the Hadoop project over the last 3-4 years. It's operating>>>> as a single project, that's masking separate communities that themselves are really>>>> separate ASF projects.>>>> >>>> At the ASF, this has been a problem area called "umbrella" projects and over the years,>>>> all I've seen from them is wasted bandwidth, artificial barriers and the inventions of

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

On Aug 29, 2012, at 10:04 AM, Arun C Murthy wrote:
> On Aug 29, 2012, at 10:02 AM, Suresh Srinivas wrote:
>>
>> Robert and Alejandro have brought up good questions. Here are my thoughts:
>> - For first one or two releases all the projects can coordinate and do the
>> releases together. This should help simplify the immediate work needed.
>> This should also help in us meeting the release timelines that we are
>> working towards. As the split makes progress, this cross project
>> coordination will no longer be necessary. I volunteer to RM these releases
>> and do the needed co-ordination from HDFS.
>
>
> +1 seems like a reasonable first step. Thanks for volunteering Suresh.

Also, I'd say we make at least 3-4 alpha/beta releases in this shape.

I volunteer to RM for MR/YARN releases and work with Suresh.

Arun

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

Hi Chris and all,

Thanks for initiating the discussion. Can I say something from the perspective of a contributor rather than a committer or PMC member?

First, I have a feeling that the current Hadoop project process works well for contributors delivering a bug fix but not so well for delivering a big feature. My experience with bug fixing has been great: committers respond quickly and the fixes get checked in. However, I feel a little frustrated delivering a feature (~5K LOC, very important for Hadoop running well on virtualization infrastructure) across Common, HDFS, MapReduce and YARN. First, you have to figure out which committers to turn to for help on each component, then convince them of your ideas and work with them on reviewing and committing the code. Each committer has to understand the complete story and learn both the code pending review and the code already checked in. If some committers are super busy, the feature seems to be pending forever. So, based on my experience, I have to say this process is not so friendly to contributors who come from different organizations with different backgrounds but share the same wish to contribute more to Apache Hadoop.

Based on this, if Hadoop sub-projects are spun out as TLPs, I would be glad to see a concise committer list for each project, so committers can be more focused (and maybe have more bandwidth) and contributors know whom to turn to for a quick response and help. On the other hand, I am concerned that it may add complexity to dependencies for features that cross sub-projects today, as you have to figure out branches for each TLP, and it is hard to estimate when the code will come alive in each TLP's branch (this may add similar complexity for committers as well). I don't have many good suggestions, but I would be glad to see the process become smoother for contributors, no matter what decision we make today. Just my 2 cents.

Another way around this is to produce more than one Common artifact, which would provide some logical split for the downstream projects like MR and so on.

Cos

On Wed, Aug 29, 2012 at 10:26AM, Suresh Srinivas wrote:
> > > - I agree with Arun that the common can move with HDFS.
> >
> > So, this would mean that a bunch of common functionality needed by
> > other TPLs (YARN, MR, HBASE) which is not required by HDFS will end up
> > in HDFS. I'm not necessary against that but it should be well
> > understood/expected/accepted by HDFS TPL, right?
> >
>
> RPC is the main common functionality (not used by HBase). Others are some
> utilities related to native i/o, Configuration and other helper utils.
> Other than RPC projects we can move utils specific to a project into that
> project. In some cases if there is code duplication, that is fine. We can
> make a call on those on case by case basis.
>
> --
> http://hortonworks.com/download/

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

I think it makes sense to have Common live in HDFS at least for now, since it's at the bottom of the stack / dependency chain and its code is the most intertwined with Common, and, per Arun, we tend to work on Common stuff more than MR people do. The HDFS project is really a lot more than HDFS, e.g. it has all the hadoop commands, non-HDFS file system source, etc., but that seems like an OK starting point. We need to figure out the committers and PMC though, since the goal is to have just the HDFS community (vs. the current Hadoop people) but the project will contain non-HDFS stuff. I'd like to hear from the current Hadoop committers and PMC members who mostly work on MR and YARN: are you guys OK losing your current privileges on the HDFS repo? Otherwise we haven't made much progress (i.e. HDFS still has multiple communities).

We also need to address the areas where it's not so cut and dry, e.g. where there is a single Hadoop project:
- The Hadoop trademark: I assume this lives in the HDFS project if Common does?
- The user community, e.g. the user lists that we *just* merged: shall we still keep one list?
- We should move the global stuff like "how to get started" docs to Bigtop, which can point to individual project resources.
- Hadoop 1.x is in maintenance mode, though it still actively gets patches, so we need to consider it. The surgery necessary to split v1 Hadoop is probably not suitable for a sustaining release and not worth it at this point in the lifetime of this branch. I assume the HDFS project will then host the Hadoop 1.x branches? This implies only members of the HDFS project can commit and release.

Thanks,
Eli

On Tue, Aug 28, 2012 at 7:33 PM, Mattmann, Chris A (388J)<[EMAIL PROTECTED]> wrote:> [decided to minimize traffic and to simply put this in one thread]>> Hi Guys,>> See the recent discussion on these threads:>> YARN as its own Hadoop "sub project": http://s.apache.org/WW1> Maintain a single committer list for the Hadoop project: http://s.apache.org/Owx>> ...and just pay attention to the Hadoop project over the last 3-4 years. It's operating> as a single project, that's masking separate communities that themselves are really> separate ASF projects.>> At the ASF, this has been a problem area called "umbrella" projects and over the years,> all I've seen from them is wasted bandwidth, artificial barriers and the inventions of> new ways to perform process mongering and to reduce the fun in developing software> at this fantastic foundation.>> I've talked about umbrella projects enough. We've diverted conversation enough.> Enough people have tried to act like there is some technical mumbo jumbo that is> preventing the eventual act of higher power that I myself hope comes should these> discussions prove unfruitful through normal means.>> *these. are. separate. projects.*> *there.are.not.blocker.issues.from.spinning.out.these.projects.as.their.own.communities*>> In this email: http://s.apache.org/rSm>> And in the 2 subsequent follow ons in that thread, I've outlined a process that I'll copy> through below for splitting these projects into their own TLPs:>> -----snip> Process:>> 0. [DISCUSS] thread for <TLP name> in which you talk about #1 and #2 below, potentially draft resolution too.>> 1. Decide on an initial set of *PMC* members. I urge each new TLP to adopt PMC==C. See reasons I've> already discussed.>> 2. Decide on a chair. Try not to VOTE for this explicitly, see if can be discussed and consensus> can be reached (just a thought experiment). VOTE if necessary.>> 3. [VOTE] thread for <TLP name>>> 4. Create Project:> a. paste resolution from #0 to board@ or;> b. 
go to general@incubator and start new Incubator project.>> 5. infrastructure set up.> MLs moving; new UNIX groups; website setup;> SVN setup like this:>> svn copy -m "MR TLP." https://svn.apache.org/repos/asf/hadoop/https://svn.apache.org/repos/asf/<insert cool MR name>; or

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

On Wed, Aug 29, 2012 at 11:22 AM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
> On Aug 29, 2012, at 10:04 AM, Arun C Murthy wrote:
>> On Aug 29, 2012, at 10:02 AM, Suresh Srinivas wrote:
>>>
>>> Robert and Alejandro have brought up good questions. Here are my thoughts:
>>> - For first one or two releases all the projects can coordinate and do the
>>> releases together. This should help simplify the immediate work needed.
>>> This should also help in us meeting the release timelines that we are
>>> working towards. As the split makes progress, this cross project
>>> coordination will no longer be necessary. I volunteer to RM these releases
>>> and do the needed co-ordination from HDFS.
>>
>>
>> +1 seems like a reasonable first step. Thanks for volunteering Suresh.
>
> Also, I'd say we make at least 3-4 alpha/beta releases in this shape.
>
> I volunteer to RM for MR/YARN releases and work with Suresh.
>

I volunteer to RM HDFS releases as well. I think we should coordinate releases, but I don't think we should gate HDFS releases on MR and YARN releases; that will be one of the benefits of becoming a TLP. Unlike parts of MR and YARN, HDFS wasn't completely rewritten and so should be released on its own cycle, e.g. I think we'll be able to release a non-alpha/beta 2.0 much sooner than MR or YARN.

Thanks,
Eli

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

Eric - I agree with Common being included in HDFS. That's what I meant by Common not having a clear enough mission to be a TLP by itself.

Arun - I'm happy to RM some of the upcoming MR releases too, and also to help out with the work on audience annotations and compatibility.

Cheers,
Tom

On Wed, Aug 29, 2012 at 7:22 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
> On Aug 29, 2012, at 10:04 AM, Arun C Murthy wrote:
>> On Aug 29, 2012, at 10:02 AM, Suresh Srinivas wrote:
>>>
>>> Robert and Alejandro have brought up good questions. Here are my thoughts:
>>> - For first one or two releases all the projects can coordinate and do the
>>> releases together. This should help simplify the immediate work needed.
>>> This should also help in us meeting the release timelines that we are
>>> working towards. As the split makes progress, this cross project
>>> coordination will no longer be necessary. I volunteer to RM these releases
>>> and do the needed co-ordination from HDFS.
>>
>>
>> +1 seems like a reasonable first step. Thanks for volunteering Suresh.
>
> Also, I'd say we make at least 3-4 alpha/beta releases in this shape.
>
> I volunteer to RM for MR/YARN releases and work with Suresh.
>
> Arun
>

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

On Wed, Aug 29, 2012 at 1:34 PM, Tom White <[EMAIL PROTECTED]> wrote:> Eric - I agree with Common being included in HDFS. That's what I meant> by Common not having a clear enough mission to be a TLP by itself.>> Arun - I'm happy to RM some of the upcoming MR releases too. Also to> help out with the work on audience annotations and compatibility.>> Cheers,> Tom>> On Wed, Aug 29, 2012 at 7:22 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:>> On Aug 29, 2012, at 10:04 AM, Arun C Murthy wrote:>>> On Aug 29, 2012, at 10:02 AM, Suresh Srinivas wrote:>>>>>>>> Robert and Alejandro have brought up good questions. Here are my thoughts:>>>> - For first one or two releases all the projects can coordinate and do the>>>> releases together. This should help simplify the immediate work needed.>>>> This should also help in us meeting the release timelines that we are>>>> working towards. As the split makes progress, this cross project>>>> coordination will no longer be necessary. I volunteer to RM these releases>>>> and do the needed co-ordination from HDFS.>>>>>>>>> +1 seems like a reasonable first step. Thanks for volunteering Suresh.>>>> Also, I'd say we make at least 3-4 alpha/beta releases in this shape.>>>> I volunteer to RM for MR/YARN releases and work with Suresh.>>>> Arun>>

-- Alejandro

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

The issues in our community, which I think Chris is referring to, do not generally revolve around project boundaries. It's not the case that the HDFS community wants to go one way and the MR/YARN community wants to go another, and we get into a conflict around it. If it were, then splitting into separate TLPs would make a ton of sense.

Instead, the issues are usually _within_ a component. So, if we split into 3 TLPs, then we'll just have 3 TLPs, each of which is just as contentious as before.

Let's just embrace contention as a fact of life on a high-profile, high-stakes project and get back to work.

I wasted nearly a month undoing the mess of the last attempt, and I don't see why this time it would go any better. -1 from my perspective on splitting again at this point. Perhaps if we get to the point that we're never making cross-project commits it makes sense, but we're still not there.

-Todd

On Wed, Aug 29, 2012 at 1:40 PM, Alejandro Abdelnur <[EMAIL PROTECTED]> wrote:> I volunteer to help cleanup/normalize Maven stuff.>> Thx>> On Wed, Aug 29, 2012 at 1:34 PM, Tom White <[EMAIL PROTECTED]> wrote:>> Eric - I agree with Common being included in HDFS. That's what I meant>> by Common not having a clear enough mission to be a TLP by itself.>>>> Arun - I'm happy to RM some of the upcoming MR releases too. Also to>> help out with the work on audience annotations and compatibility.>>>> Cheers,>> Tom>>>> On Wed, Aug 29, 2012 at 7:22 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:>>> On Aug 29, 2012, at 10:04 AM, Arun C Murthy wrote:>>>> On Aug 29, 2012, at 10:02 AM, Suresh Srinivas wrote:>>>>>>>>>> Robert and Alejandro have brought up good questions. Here are my thoughts:>>>>> - For first one or two releases all the projects can coordinate and do the>>>>> releases together. This should help simplify the immediate work needed.>>>>> This should also help in us meeting the release timelines that we are>>>>> working towards. As the split makes progress, this cross project>>>>> coordination will no longer be necessary. I volunteer to RM these releases>>>>> and do the needed co-ordination from HDFS.>>>>>>>>>>>> +1 seems like a reasonable first step. Thanks for volunteering Suresh.>>>>>> Also, I'd say we make at least 3-4 alpha/beta releases in this shape.>>>>>> I volunteer to RM for MR/YARN releases and work with Suresh.>>>>>> Arun>>>>>>> --> Alejandro

-- 
Todd Lipcon
Software Engineer, Cloudera

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

>
> I think it makes sense to have Common live in HDFS at least for now,
> since it's at the bottom of the stack / dependency chain and it's code
> is the most intertwined with common, and, per Arun, we tend to work on
> common stuff more than MR people. The HDFS project is really a lot
> more than HDFS, eg has all the hadoop commands, non-HDFS file system
> source, etc but that seems like an OK starting point. We need to
> figure out the committers and PMC though since the goal is to just
> have the HDFS community (vs the current Hadoop people) but the project
> will contain non-HDFS stuff. I'd like to hear from the current Hadoop
> committers and PMC members that mostly work on MR and YARN - are you
> guys OK losing your current privileges on the HDFS repo?

Rather than asking the former question that way, I would simply put up a list of proposed HDFS PMC folks (yes, I keep using PMC ^_^), then iterate on that.

> Otherwise we
> haven't made much progress (ie HDFS still has multiple communities).

ACK.

>
> We also need to address the areas where it's not so cut and dry, eg
> where there is a single Hadoop project:
> - The Hadoop trademark, assume this lives in the HDFS project if Common does?

Apache owns the Hadoop trademark, and the PMC helps to enforce it. Projects don't own trademarks.

> - The user community, eg the users lists that we *just* merged, shall
> we still keep one list?

That's a good question -- maybe ask users to opt in. Yes, this is intrusive, but I bet you'd find the real users of the specific projects if they have to resubscribe. Just my 2c.

> - We should move the global stuff like "how to get started" docs to
> Bigtop, which can point to individual projects resources

Sounds cool to me.

> - Hadoop 1.x is is maintenance mode, though it still actively gets
> patches so we need to consider it. The surgery necessary to split v1
> Hadoop is probably not suitable for a sustaining release and not worth
> it at this point in the lifetime of this branch. I assume the HDFS
> project will then host the Hadoop 1.x branches? This implies only
> members of the HDFS project can commit and release.

Arun, great work below. Concrete, and an actual proposal of PMC lists.

What do folks think?

Cheers,
Chris

On Aug 29, 2012, at 11:48 AM, Arun C Murthy wrote:

>
> On Aug 28, 2012, at 7:33 PM, Mattmann, Chris A (388J) wrote:
>
>> Process:
>>
>> 0. [DISCUSS] thread for <TLP name> in which you talk about #1 and #2 below, potentially draft resolution too.
>>
>
> How about something like this... please provide your feedback.
>
> This is a very early draft, I'll post this on our wiki after discussion.
>
> ----
>
> Proposal: Apache Hadoop HDFS as a TLP
>
> I propose we graduate HDFS as a TLP named 'Apache Hadoop HDFS'.
>
> I think the simplest way is to have all existing HDFS committers be committers and PMC members of the new project. That list is found in the asf-authorization-template which has:
>
> hadoop-hdfs = acmurthy,atm,aw,boryas,cdouglas,cos,cutting,daryn,ddas,dhruba,eli,enis,eric14,eyang,gkesavan,hairong,harsh,jitendra,jghoman,johan,knoguchi,kzhang,lohit,mahadev,matei,mattf,molkov,nigel,omalley,ramya,rangadi,sharad,shv,sradia,stevel,suresh,szetszwo,tanping,todd,tomwhite,tucu,umamahesh,yhemanth,zshao
>
>
> ----
>
>
> Proposal: Apache Hadoop MapReduce as a TLP
>
> I propose we graduate MapReduce as a TLP named 'Apache Hadoop MapReduce'.
>
> I think the simplest way is to have all existing MR committers be committers and PMC members of the new project. That list is found in the asf-authorization-template which has:
>
> hadoop-mapreduce = acmurthy,amareshwari,amarrk,aw,bobby,cdouglas,cos,cutting,daryn,ddas,dhruba,enis,eric14,eyang,gkesavan,hairong,harsh,hitesh,jeagles,jitendra,jghoman,johan,kimballa,knoguchi,kzhang,llu,lohit,mahadev,matei,mattf,nigel,omalley,ramya,rangadi,ravigummadi,schen,sharad,shv,sradia,sreekanth,sseth,stevel,szetszwo,tgraves,todd,tomwhite,tucu,vinodkv,yhemanth,zshao
>
>
> ----
>
>
> Proposal: Apache Hadoop YARN as a TLP
>
> I propose we graduate YARN as a TLP named 'Apache Hadoop YARN'.
>
> I re-propose, based on the previous discussion that the YARN committer list and initial PMC list be:
>
> hadoop-yarn = acmurthy,cdouglas,ddas,hitesh,jeagles,llu,mahadev,sharad,sseth,tgraves,tomwhite,tucu,vinodkv
>
> ----
>
> Thoughts?
>
> thanks,
> Arun
>
>

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
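Arun's draft takes the initial PMC lists directly from the `hadoop-hdfs`, `hadoop-mapreduce` and `hadoop-yarn` lines of the asf-authorization-template. Chris's worry is that giving every new TLP the same big list just replicates the umbrella problem, so it can help to see how much the proposed lists overlap. Below is a minimal Python sketch. It assumes each group sits on a single `name = user1,user2,...` line, as in the lines quoted above. `parse_groups` and `overlap_report` are hypothetical helpers, not part of any ASF tooling.

```python
from itertools import combinations


def parse_groups(template_text: str) -> dict[str, set[str]]:
    """Parse 'group = user1,user2,...' lines into {group: {users}}.

    Assumes one group per line, as quoted in Arun's draft. Lines without
    '=' (blank lines, section headers) and '#' comments are skipped.
    """
    groups: dict[str, set[str]] = {}
    for line in template_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, members = line.partition("=")
        groups[name.strip()] = {m.strip() for m in members.split(",") if m.strip()}
    return groups


def overlap_report(groups: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Return the members shared by every pair of groups (pairs in sorted order)."""
    return {
        (a, b): groups[a] & groups[b]
        for a, b in combinations(sorted(groups), 2)
    }


if __name__ == "__main__":
    # Abbreviated excerpt of the lists quoted above, for illustration only.
    sample = """
    hadoop-hdfs = acmurthy,atm,eli,suresh,todd,tomwhite
    hadoop-yarn = acmurthy,hitesh,tomwhite,vinodkv
    """
    for pair, shared in overlap_report(parse_groups(sample)).items():
        print(pair, sorted(shared))
```

If you paste in the full lists, the report shows for each pair of proposed TLPs which people would sit on both PMCs. That gives a concrete starting point for the "who should be on which list" discussion rather than a final answer.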

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

On Wed, Aug 29, 2012 at 11:19PM, Mattmann, Chris A (388J) wrote:
> Hi Eli,
>
> On Aug 29, 2012, at 11:41 AM, Eli Collins wrote:
>
> > Thanks for writing up a proposal Chris.
>
> NP.
>
> >
> > I think it makes sense to have Common live in HDFS at least for now,
> > since it's at the bottom of the stack / dependency chain and its code
> > is the most intertwined with common, and, per Arun, we tend to work on
> > common stuff more than MR people. The HDFS project is really a lot
> > more than HDFS, eg has all the hadoop commands, non-HDFS file system
> > source, etc but that seems like an OK starting point. We need to
> > figure out the committers and PMC though since the goal is to just
> > have the HDFS community (vs the current Hadoop people) but the project
> > will contain non-HDFS stuff. I'd like to hear from the current Hadoop
> > committers and PMC members that mostly work on MR and YARN - are you
> > guys OK losing your current privileges on the HDFS repo?
>
> Rather than ask the former question that way, I would just simply put up
> a list of proposed HDFS PMC folks (yes, I keep using PMC ^_^). Then,
> iterate on that.
>
> > Otherwise we
> > haven't made much progress (ie HDFS still has multiple communities).
>
> ACK.
>
> >
> > We also need to address the areas where it's not so cut and dry, eg
> > where there is a single Hadoop project:
> > - The Hadoop trademark, assume this lives in the HDFS project if Common does?
>
> Apache owns the Hadoop trademark, and the PMC helps to enforce it. Projects
> don't own trademarks.
>
> > - The user community, eg the users lists that we *just* merged, shall
> > we still keep one list?
>
> That's a good question -- maybe ask users to opt-in. Yes, this is intrusive, but
> I bet you'd find the real users of the specific projects if they have to resubscribe.
> Just my 2c.
>
> > - We should move the global stuff like "how to get started" docs to
> > Bigtop, which can point to individual projects resources
>
> Sounds cool to me.
>
> > - Hadoop 1.x is in maintenance mode, though it still actively gets
> > patches so we need to consider it. The surgery necessary to split v1
> > Hadoop is probably not suitable for a sustaining release and not worth
> > it at this point in the lifetime of this branch. I assume the HDFS
> > project will then host the Hadoop 1.x branches? This implies only
> > members of the HDFS project can commit and release.
>
> Why not put the 1.x stuff in Bigtop since it's global or whatever?

Wearing my BigTop hat now, I encourage this audience to rush something like this to BigTop. If I am reading you correctly, you are asking BigTop to host 1.x branches of Hadoop, aren't you? I don't see how it fits in there, actually. But this is a separate issue that needs to involve the BigTop community.

> Have we not learned our lessons from the last attempts to split?
>
> The issues in our community, which I think Chris is referring to, do
> not generally revolve around project boundaries. It's not the case
> that the HDFS community wants to go one way and the MR/YARN community
> wants to go another, and we get into a conflict around it. If it were,
> then splitting into separate TLPs would make a ton of sense.

You're right, it's not project boundaries, it's poor community behavior, and general umbrella-project-ness.

One aspect I've seen is the exclusivity in allowing people to become PMC members on the project, and the separation of PMC from C. Other things I've seen are the use of technical justifications or complexity issues as an excuse for the exclusivity, as an excuse for drawing boundaries between project committers and PMC members, and then between specific products that the project and community as a whole releases. Finally, other things I've seen include external interests influencing the way that business is done around here (need for releases in downstream companies, or projects driving upstream Apache decisions, which are supposed to be independent of any lone company, or set of companies -- it's individuals here that do the work).

The above is not a discrete thing that's happened once, or twice, or that happened three times, but was fixed later. It's never been fixed.

>
> Instead, the issues are usually _within_ a component. So, if we split
> into 3 TLPs, then we'll just have 3 TLPs, each of which is just as
> contentious as before.

I doubt that. Creating TLPs either directly by going to the board, or via going to the Incubator, should involve a set of members of the committee (PMC) that desire to work together; that ideally trust one another; that are inclusive to others who jump on the list and discuss things; and that collect these principles into the "Apache way", and build and deliver software at no cost to the public via this Foundation.

Currently, the Apache Hadoop project isn't doing that. Something needs to be done to fix it. Just because an attempt to split the projects in the past didn't work doesn't mean that the Hadoop community should just accept "this is a popular project; it's going to be contentious; nothing to see here folks".

It's more than that.

>
> Let's just embrace contention as a fact of life on a high-profile
> high-stakes project and get back to work.

-1 to that. Apache projects shouldn't be contentious, whether you are a billion dollar industry like Hadoop, or whether you are the US govt, or whether you are Joe Blow, Mom and Pop, building software to deliver to food truck vendors. It doesn't matter. Period.

>
> I wasted nearly a month undoing the mess of the last attempt, and I
> don't see why this time it would go any better. -1 from my perspective
> on splitting again at this point. Perhaps if we get to the point that
> we're never making cross-project commits it makes sense, but we're not
> there still.

Again, technical issues cited for community problems. *there are not technical issues*.

>> Sounds cool to me.
>>
>>> - Hadoop 1.x is in maintenance mode, though it still actively gets
>>> patches so we need to consider it. The surgery necessary to split v1
>>> Hadoop is probably not suitable for a sustaining release and not worth
>>> it at this point in the lifetime of this branch. I assume the HDFS
>>> project will then host the Hadoop 1.x branches? This implies only
>>> members of the HDFS project can commit and release.
>>
>> Why not put the 1.x stuff in Bigtop since it's global or whatever?
>
> Wearing my BigTop hat now, I encourage this audience to rush something like
> this to BigTop. If I am reading you correctly, you are asking BigTop to host
> 1.x branches of Hadoop, aren't you? I don't see how it fits in there,
> actually. But this is a separate issue that needs to involve BigTop community.

Agreed that it would totally involve the BigTop community, and that that part is up to them. You guys would know this way better than me, so thanks for mentioning this issue Cos. I just kinda threw this out there but it's not a blocker for me -- whatever makes sense here, and it's a good point raised by Eli that can probably be solved a number of different (easily solvable and documentable) ways :)

So I would propose:

atm,daryn,ddas,eli,eyang,hairong,harsh,jitendra,mahadev,mattf,shv,sradia,stevel,suresh,szetszwo,todd,tomwhite,tucu,umamahesh

and listing the others as Emeritus, who could easily regain committer status if they started contributing again.

>>>
>>> ----
>>>
>>> Proposal: Apache Hadoop MapReduce as a TLP
>>>
>>> I propose we graduate MapReduce as a TLP named 'Apache Hadoop MapReduce'.
>>>
>>> I think the simplest way is to have all existing MR committers be committers and PMC members of the new project. That list is found in the asf-authorization-template which has:
>>>
>>> hadoop-mapreduce = acmurthy,amareshwari,amarrk,aw,bobby,cdouglas,cos,cutting,daryn,ddas,dhruba,enis,eric14,eyang,gkesavan,hairong,harsh,hitesh,jeagles,jitendra,jghoman,johan,kimballa,knoguchi,kzhang,llu,lohit,mahadev,matei,mattf,nigel,omalley,ramya,rangadi,ravigummadi,schen,sharad,shv,sradia,sreekanth,sseth,stevel,szetszwo,tgraves,todd,tomwhite,tucu,vinodkv,yhemanth,zshao
>>

On Wed, Aug 29, 2012 at 11:32PM, Mattmann, Chris A (388J) wrote:
> Hi Cos,
>
> On Aug 29, 2012, at 4:27 PM, Konstantin Boudnik wrote:
>
> >> Sounds cool to me.
> >>
> >>> - Hadoop 1.x is in maintenance mode, though it still actively gets
> >>> patches so we need to consider it. The surgery necessary to split v1
> >>> Hadoop is probably not suitable for a sustaining release and not worth
> >>> it at this point in the lifetime of this branch. I assume the HDFS
> >>> project will then host the Hadoop 1.x branches? This implies only
> >>> members of the HDFS project can commit and release.
> >>
> >> Why not put the 1.x stuff in Bigtop since it's global or whatever?
> >
> > Wearing my BigTop hat now, I encourage this audience to rush something like
> > this to BigTop. If I am reading you correctly, you are asking BigTop to host
> > 1.x branches of Hadoop, aren't you? I don't see how it fits in there,
> > actually. But this is a separate issue that needs to involve BigTop community.
>
> Agreed that it would totally involve the BigTop community, and that that part
> is up to them. You guys would know this way better than me, so thanks for
> mentioning this issue Cos. I just kinda threw this out there but it's not a blocker

I think this might be a good idea really, but we need to think it over. I will start a thread on bigtop-dev@ to discuss what it means for us and how it can be done.

> for me -- whatever makes sense here and it's a good point raised by Eli that
> can probably be solved a number of different (easily solvable and documentable) ways :)

Exactly. As a general observation: there's always more than one solution for people who are willing to do stuff instead of pontificating.

No doubt there's bad behavior. But splitting into smaller projects won't help anything. We'll still have the exact same behavior inside the smaller projects.

>
> One aspect I've seen is that exclusivity of allowing people to become
> PMC members on the project, and the separation of PMC from C.
> Other things I've seen are the use of technical justifications or complexity
> issues as an excuse for the exclusivity, as an excuse for drawing boundaries
> between project committers and PMC members, and then between specific
> products that the project and community as a whole releases, and finally
> other things I've seen include external interests influencing the way that
> business is done around here (need for releases in downstream companies,
> or projects driving upstream, Apache decisions, which are supposed to be
> independent of any lone company, or set of companies -- it's individuals here
> that do the work).
>

It's individuals that do the work, but the individuals get paid by companies, so individuals acting in their best interests are going to tend to align with their company. They also often know details about their customer bases that they can't share directly, which can be frustrating, but it's a fact of life. I'm sure we'd see the same if we were 20 independent consultants each with our own priorities, etc.

> The above is not a discrete thing that's happened once, or twice, or that
> happened three times, but was fixed later. It's never been fixed.
>

IMO it's massively improved since a couple years ago. We're making good progress on the 2.0 line, we no longer have divergent forks, and I haven't seen an issue get vetoed in recent memory. Please provide some recent examples where you think that splitting into smaller granularity projects would help anything.

>>
>> Instead, the issues are usually _within_ a component. So, if we split
>> into 3 TLPs, then we'll just have 3 TLPs, each of which is just as
>> contentious as before.
>
> I doubt that. Creating TLPs either directly by going to the board, or
> via going to the Incubator should involve a set of members of the
> committee (PMC) that desire to work together; that ideally trust one another; that
> are inclusive to others who jump on the list and discuss things; and that
> collect these principles into the "Apache way", and build and deliver software at
> no cost to the public via this Foundation.

Just because we argue doesn't mean we don't desire to work together. Smart passionate people will argue. I argue with my colleagues here at Cloudera, I argue with Hortonworkers, and I argue with Facebookers - it doesn't really matter much. I still enjoy getting beers with them when I end up at conferences. No hard feelings, we're all adults, right?

>
> Currently, the Apache Hadoop project isn't doing that. Something needs
> to be done to fix it. Just because an attempt to split the projects in the past
> didn't work doesn't mean that the Hadoop community should just accept
> "this is a popular project; it's going to be contentious; nothing to see here
> folks".

Again, please provide examples. From my vantage point, I see a lot of progress being made on critical features: we've done federation, HA namenode, massive performance improvements, YARN, a practically rewritten NameNode, and more in the last couple years. Hardly an unproductive community.

>
> It's more than that.
>
>> Let's just embrace contention as a fact of life on a high-profile
>> high-stakes project and get back to work.
>
> -1 to that. Apache projects shouldn't be contentious, whether you are a billion dollar
> industry like Hadoop, or whether you are the US govt, or whether you are Joe Blow,
> Mom and Pop, building software to deliver to food truck vendors. It doesn't matter.
> Period.

I guess we'll have to agree to disagree.

...says the guy who isn't on the hook to stitch it all back together into a deliverable for demanding customers, maintain green Jenkins builds, etc. You can say these aren't technical issues, but if you're not dealing with the project on a technical basis, I don't think you're well qualified to judge. I certainly appreciate the work you've done way back in the Nutch days and your continued evangelism, but this whole thread just seems like it's stirring up trouble and not going to accomplish anything except a bunch of wasted man-hours. (I've already wasted about 45 minutes today on it, oops!)

-Todd

Todd Lipcon
Software Engineer, Cloudera

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

> But I still think this discussion is silly, and we're not ready to do it.>

+1

Despite many allusions to problems that this project split proposal would purport to solve, I honestly don't see the problems. Yes, Hadoop has had community problems in the past, but from my observation these have largely been addressed or are improving. We've been adding committers and PMC members, making more frequent releases, making sure that features show up on trunk first before other branches, generally been collaborating better, etc. Do we disagree from time to time? Sure. Are these disagreements across the sub-project boundaries? Not in my experience. Given that, what _actual problems_ will a project split solve?

I _do_ see plenty of problems that a project split would create, such as difficulties with changes that span the projects, difficulties maintaining the interfaces of code that's shared by the projects, difficulties of a split user@ mailing list, etc. All of _these_ problems are well known to us from the previous "project split", which just split the mailing lists, code repos, and issue trackers. In the last few months, we've thought better of 2/3 of those decisions and actually merged back the repos and mailing lists. It's quite surprising to me to see many folks on this thread who supported these merges actually being in favor of splitting them again.

Chris, you can dismissively say that these are "technical difficulties", but all of these problems directly impact the community as well. When the project repos were split, I personally helped many struggling users just getting their work environment set up to _compile_ the code. This was a pain for everyone, so we undid it. When the lists were split, users struggled to know where they should email their questions, and there was a lot of wasted effort telling folks to go ask this list or that. This was a pain for everyone, so we undid it. I think both of these changes have been tremendous _positive_ impacts on the community, and the haste with which we're rushing to undo them is very surprising to me.

--
Aaron T. Myers
Software Engineer, Cloudera

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

> On Wed, Aug 29, 2012 at 4:29 PM, Mattmann, Chris A (388J)
> <[EMAIL PROTECTED]> wrote:
>
>> You're right, it's not project boundaries, it's poor community behavior,
>> and general umbrella-project-ness.
>
> No doubt there's bad behavior. But splitting into smaller projects
> won't help anything. We'll still have the exact same behavior inside
> the smaller projects.
>
>> [..snip...]
>
>> The above is not a discrete thing that's happened once, or twice, or that
>> happened three times, but was fixed later. It's never been fixed.
>>
>
> IMO it's massively improved since a couple years ago. We're making
> good progress on the 2.0 line, we no longer have divergent forks, and
> I haven't seen an issue get vetoed in recent memory. Please provide
> some recent examples where you think that splitting into smaller
> granularity projects would help anything.

Please provide examples that show umbrella projects work. I've been at this Foundation a lot longer than you have. I've seen them not work and have been involved in ones that don't work. See splits from Lucene, the same threads (with different names, different products, different software but the exact same issues). See your own splits from Hadoop cited elsethread. See the friggin' Apache board minutes discussing why umbrella projects are bad.

I don't know what else to tell you. I'm not going to go look up all the threads. I'm not Google nor do I care to. All I can say is that I've seen it before and so have others. In your own project.

>
>>>
>>> Instead, the issues are usually _within_ a component. So, if we split
>>> into 3 TLPs, then we'll just have 3 TLPs, each of which is just as
>>> contentious as before.
>>
>> I doubt that. Creating TLPs either directly by going to the board, or
>> via going to the Incubator should involve a set of members of the
>> committee (PMC) that desire to work together; that ideally trust one another; that
>> are inclusive to others who jump on the list and discuss things; and that
>> collect these principles into the "Apache way", and build and deliver software at
>> no cost to the public via this Foundation.
>
> Just because we argue doesn't mean we don't desire to work together.
> Smart passionate people will argue. I argue with my colleagues here at
> Cloudera, I argue with Hortonworkers, and I argue with Facebookers -
> it doesn't really matter much. I still enjoy getting beers with them
> when I end up at conferences. No hard feelings, we're all adults,
> right?

You still point to arguing as the contention -- it's more than that, Todd. The project's policies for inclusivity have nothing to do with arguing about technical issues.

>
>>
>> Currently, the Apache Hadoop project isn't doing that. Something needs
>> to be done to fix it. Just because an attempt to split the projects in the past
>> didn't work doesn't mean that the Hadoop community should just accept
>> "this is a popular project; it's going to be contentious; nothing to see here
>> folks".
>
> Again, please provide examples. From my vantage point, I see a lot of
> progress being made on critical features: we've done federation, HA
> namenode, massive performance improvements, YARN, practically
> rewritten NameNode, and more in the last couple years. Hardly an
> unproductive community.

Technical issues, again.

> [..snip..]
>
>>>
>>> I wasted nearly a month undoing the mess of the last attempt, and I
>>> don't see why this time it would go any better. -1 from my perspective
>>> on splitting again at this point. Perhaps if we get to the point that
>>> we're never making cross-project commits it makes sense, but we're not
>>> there still.
>>
>> Again, technical issues cited for community problems. *there are not technical issues*.
>
> ...says the guy who isn't on the hook to stitch it all back together
> into a deliverable for demanding customers, maintain green Jenkins

Dude, you have to do that regardless, that has nothing to do with *Apache Hadoop*. Take your Cloudera hat off and put your *Apache Software Foundation* hat on. Is your #1 priority developing software here to stitch code back together, turn it into a deliverable for your customers (I'm guessing Cloudera customers, right? B/c Apache doesn't have specific customers?) and to maintain green Jenkins builds?

Also tell me how the 4 SVN commands I suggested will stop you from doing the above. At Apache?

At Cloudera, tell me also how it will stop you? I think you can quote me several times in this same thread and else-thread saying I'm not technically astute with Hadoop anymore :) Admitted.

However, I *am* astute with the aspects of this Software Foundation.

You had fun during those 45 mins, don't lie :)

P.S. I appreciate you and am still one of your biggest fans. Just trying to help you see the bigger picture here and to wear your Apache hat.

The code bases are tightly intertwined. We pulled out Pig/Hive/HBase because they were substantial codebases that didn't share much code with the rest, and thus could reasonably be expected to release independently.

We could get HDFS and MR to that point, but we haven't yet, because they rely so much on Common.

If we copy-paste forked Common, we'd be doubling our maintenance work on this shared code. We basically did this with the IPC code for HBase, and then we had double the work to protobuf-ify both HBase and HDFS/MR earlier this year. I know because I spent a bunch of hours on both.

> I've been
> at this Foundation a lot longer than you have. I've seen them not work
> and have been involved in ones that don't work. See splits from Lucene,
> the same threads (with different names, different products, different software
> but the exact same issues). See your own splits from Hadoop cited elsethread.
> See the friggin' Apache board minutes discussing why umbrella projects
> are bad.
>
> I don't know what else to tell you. I'm not going to go look up all the threads.
> I'm not Google nor do I care to. All I can say is that I've seen it before and
> so have others. In your own project.
>

What's one concrete example of where it would be better if we split? I can't think of any. We'd still have competing interests in HDFS, and we'd still get in the same arguments.

To say that all ASF projects should work the same seems pretty bizarre to me. The ASF provides license protection, infrastructure, and a set of guidelines for what makes successful projects. But I don't think it is the foundation's place to dictate what its projects should do "from above" if the projects themselves do not see a problem.

If the project is so messed up, then maybe some folks should fork it into the incubator like you've suggested? What's wrong with the anarchic "let the best project succeed" philosophy, which I've also heard from Apache?

> You still point to arguing to contention -- it's more than that Todd. The project's
> policies for inclusivity have nothing to do with arguing about technical issues.

I'm absolutely for meritocracy. I just have a high bar for what should be considered "merit". Perhaps the PMC as a whole has a high bar. For a system that stores my data, I'm pretty happy about that.

>
> Dude, you have to do that regardless, that has nothing to do with *Apache Hadoop*.
> Take your Cloudera hat off and put your *Apache Software Foundation* hat on. Is your
> #1 priority developing software here to stitch code back together, turn it into a deliverable
> for your customers (I'm guessing Cloudera customers, right? B/c Apache doesn't have
> specific customers?) and to maintain green Jenkins builds?

Yes? I think so? If we do a bad release and it loses substantial data, our user base would disappear quite quickly.

>
> Also tell me how the 4 SVN commands I suggested will stop you from doing the above?
> At Apache?

If the projects are on separate release schedules, this means that cross-project changes have to be staged across the projects in such a way that neither project breaks in the interim. All of our internal APIs become public APIs. We worked like this for around a year during the "project split" period. It was super complicated and our builds were often red, we wasted a lot of time, and new users couldn't figure out how to contribute.

In the absence of a reasonable *technical* strategy to release independently, and a lot of work to stabilize internal APIs around security and IPC in particular, doing it again would cause the same problems it caused the first time.

It also makes the users' lives much more difficult, or forces them to only consume via downstream packagers. Earlier in this thread, you seemed to think that downstream packagers indicated an issue with the community: fracturing the releases would only serve to make the ASF download page even less useful for someone who just wants to get going fast.

If the projects were on different release schedules, then we'd be more likely to have to do a lot of local patching to get stuff to "fit together" right. Version compatibility is a difficult problem - it multiplies the QA matrix, complicates deployment, etc. It's not insurmountable, but unless there's something to be gained (what is it, again, that you think we'd gain, specifically?) I don't see why we'd take on this additional hassle.

Thanks for that. As for Apache vs Cloudera hat: I think they're well aligned here. Both hats want the project to be easy for people to contribute to, and want to avoid a bunch of wasted time spent on new technical issues that this would create. I want to spend that time making the product better, for our users' benefit. Whether the users are Apache community users, or Cloudera customers, or Facebook's data scientists, they all are going to be happier if I spend a month improving our HA support compared to spending a month figuring out how to release three separate projects which somehow stitch together in a reasonable way at runtime without jar conflicts, tons of duplicate configuration work, byzantine version dependencies, etc.

-Todd

Todd Lipcon
Software Engineer, Cloudera

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

Who's "we"? You? Would you expect to be a PMC member/committer in all split projects?

Also, are you the only person working on the project? And the "we" would include others, right? Who may or may not be committers on the other projects?

I'm not proposing SVN copy and then all PMC members x N projects. Figure out who are on the PMCs for the distinct communities that are operating on this hydra.

>>
>> I don't know what else to tell you. I'm not going to go look up all the threads.
>> I'm not Google nor do I care to. All I can say is that I've seen it before and
>> so have others. In your own project.
>>
>
> What's one concrete example of where it would be better if we split?

Training people out of bad community practices is difficult, I'll agree with you on that. Hopefully if these new projects went the Incubator route, you could get some other fuddy-duddies like me that have been around and seen a lot at the Foundation helping the new projects really understand the community aspects.

>
> To say that all ASF projects should work the same seems pretty bizarre
> to me.

Please show me where I said the above sentence?

> The ASF provides license protection, infrastructure, and a set> of guidelines for what makes successful projects.

> But I don't think it
> is the foundation's place to dictate what its projects should do "from
> above" if the projects themselves do not see a problem.

No, but it's the Foundation's (and its members') responsibility to ensure that its projects are behaving in that loosely coupled set of principles and guidelines that we call the Apache way. Apache Hadoop is doing great technically. Not so sure about the Apache way part.

>
> If the project is so messed up, then maybe some folks should fork it
> into the incubator like you've suggested? What's wrong with the
> anarchic "let the best project succeed" philosophy, which I've also
> heard from Apache?

Yeah, I proposed that too. We'll see if it happens. Concretely, I think all of the current Hadoop "sub projects" should take a spin through the Incubator and see how they are doing as projects. If nothing is afoul, I'm sure it would be a pretty quick process, right? Add some new PPMC members/committers, make a release or two, make sure all software is ALv2 and compatible. You guys are already doing that, right?

>
>> You still point to arguing to contention -- it's more than that Todd. The project's
>> policies for inclusivity have nothing to do with arguing about technical issues.
>
> I'm absolutely for meritocracy. I just have a high bar for what should
> be considered "merit". Perhaps the PMC as a whole has a high bar. For
> a system that stores my data, I'm pretty happy about that.

You won't be pretty happy about it when your high bar leaves you as one of the only people in the world maintaining a 100M line code base. Especially as you get older, have kids (or not), have a family, go on to do even bigger and better things, and care even less about reading emails like this.

You're going to see eventually (as will others) that the way that you grow around this Foundation (and in software in general) is to teach others how to do your job, and to attract people to your project, and not to shoo them away with exclusivity. You call it a "high bar" to "protect your data". I call it "enjoy maintaining the software forever and never taking a vacation". It's called scalability Todd. Of course, because 1 release kills a project right? And of course there weren't 30-some-odd releases before that one bad one that someone could roll back to, right? Huh?? Because this is what happens with Tomcat, or whatever other dependencies you guys have in your modularized project right? You guys call up the Tomcat PMC whenever there is a release and make sure that your Hadoop-specific need is included in it right? Or that they include some bug fix that you really need?

C'mon, you know that's not the way stuff works. It's called insulation. I agree there should be a plan to technically work to make sure the independent TLPs (or podlings->TLPs eventually, whatever) sync up or line up -- that would be ideal. What if it doesn't happen? Will the world end? Probably not. Because there are good people hanging around that will get stuff done and make sure new TLP software foo bar technically works great, as they have always done.

No it doesn't. That's orthogonal?

Nah, I was talking about downstream "companies" and their interests, not packagers.

Why is that? Isn't that what *Apache* Big Top (incubating) is for (which also has an *Apache* download page?).

+1, this could be the case.

Yep, agree. As for the gain, I think what you'd gain is fewer arguments about who to add to the PMC and how to add them, less maintenance of lame ASF authorization templates within *the same project*, fewer meta-discussions, less company politics spillover, and hopefully more beer to be shared by all.

Note, I said *I think*. I'm only truly psychic sometimes.

That's a fair statement Todd. But that's why it's not Apache Todd, or Apache Todooop. And why there are others at the Foundation that you have to rely on, others within your project that you have to rely on, and why not everyone has the same interests. Some people'

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

> Hi Cos,
>
> On Aug 29, 2012, at 4:27 PM, Konstantin Boudnik wrote:
>
>>> Sounds cool to me.
>>>
>>>> - Hadoop 1.x is in maintenance mode, though it still actively gets
>>>> patches so we need to consider it. The surgery necessary to split v1
>>>> Hadoop is probably not suitable for a sustaining release and not worth
>>>> it at this point in the lifetime of this branch. I assume the HDFS
>>>> project will then host the Hadoop 1.x branches? This implies only
>>>> members of the HDFS project can commit and release.
>>>
>>> Why not put the 1.x stuff in Bigtop since it's global or whatever?
>>
>> Wearing my BigTop hat now, I encourage this audience to rush something like
>> this to BigTop. If I am reading you correctly, you are asking BigTop to host
>> 1.x branches of Hadoop, aren't you? I don't see how it fits in there,
>> actually. But this is a separate issue that needs to involve BigTop community.
>
> Agreed that it would totally involve the BigTop community, and that that part
> is up to them. You guys would know this way better than me, so thanks for
> mentioning this issue Cos. I just kinda threw this out there but it's not a blocker
> for me -- whatever makes sense here and it's a good point raised by Eli that
> can probably be solved a number of different (easily solvable and documentable) ways :)

I agree with Eli that we can solve it a number of ways within the new TLPs. I'm also pretty sure it doesn't make sense to involve BigTop. I'd rather not waste bandwidth going down that alley.

thanks,
Arun

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

> On Wed, Aug 29, 2012 at 4:47 PM, Konstantin Boudnik <[EMAIL PROTECTED]> wrote:>> I am curious where the arbitrary number 5 is coming from: is it reflected in>> the bylaws?> > Nope, I picked it based on Arun's earlier picking of the same number> in the YARN thread. We have no bylaws about what would happen in the> eventual TLP-ification of subcomponents, of course.

I'm sure you just missed it, but I want to set the record straight: I picked 20+ patch contributions or 10+ reviews/commits since *project inception*. Your pick seems to be just commits in the last 12 months. I have put forth one proposal; please put forth another if you like. However, please do include patches, not just commits.
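To make the proposed bar concrete, here is a minimal Python sketch of the "20+ patch contributions or 10+ reviews/commits since project inception" rule. The `Contributor` type, its field names and the three people are hypothetical; real counts would have to come from JIRA and svn history, which the sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass
class Contributor:
    # Hypothetical activity counts per person; a real list would be built
    # from JIRA and svn history, which this sketch does not do.
    name: str
    patches: int             # patch contributions since project inception
    reviews_or_commits: int  # reviews/commits since project inception


def meets_bar(c: Contributor) -> bool:
    """20+ patch contributions OR 10+ reviews/commits since inception."""
    return c.patches >= 20 or c.reviews_or_commits >= 10


people = [
    Contributor("alice", patches=25, reviews_or_commits=0),
    Contributor("bob", patches=3, reviews_or_commits=12),
    Contributor("carol", patches=19, reviews_or_commits=9),
]

eligible = [c.name for c in people if meets_bar(c)]
print(eligible)  # ['alice', 'bob']
```

Under a commits-only rule, a patch-heavy contributor like the hypothetical `alice` (25 patches, 0 commits) would drop out entirely, which is why the request is to count patches too.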

For example, I'd propose we add llu@ to HDFS since he's done a ton of work on metrics2 recently. My bad for missing that initially; apologies, Luke. I might have missed others, so please ping me or add yourself. I've put my proposal up on http://wiki.apache.org/hadoop/HDFS_MR_YARN_TLP_Proposal.

We could also revisit issues like emeritus status after the split, to allow each project to figure out its own norms. I'd urge that option.

thanks,
Arun

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

On Wed, Aug 29, 2012 at 04:44PM, Todd Lipcon wrote:> On Wed, Aug 29, 2012 at 4:29 PM, Mattmann, Chris A (388J)> <[EMAIL PROTECTED]> wrote:> ...> > I doubt that. Creating TLPs either directly by going to the board, or> > via going to the Incubator should involve a set of members of the> > committee (PMC) that desire to work together; that ideally trust one another; that> > are inclusive to others who jump on the list and discuss things; and that> > collect these principles into the "Apache way", and build and deliver software at> > no cost to the public via this Foundation.> > Just because we argue doesn't mean we don't desire to work together.> Smart passionate people will argue. I argue with my colleagues here at> Cloudera, I argue with Hortonworkers, and I argue with Facebookers -> it doesn't really matter much. I still enjoy getting beers with them> when I end up at conferences. No hard feelings, we're all adults,> right?

(sorry for snipping...)

That's truly amazing, Todd, and you certainly are lucky to be working in such a great environment!

(The following isn't a stab at you personally, so please don't take it that way.) I was "terminated" from my previous job because I was expressing my opinions on this list all too freely. And those opinions happened to be misaligned with the "official party line" of my then-employer. Or was it because my opinions were hurting somebody else whom my employer didn't want to piss off at the time? Hmm... is my memory getting vague? Hardly. And that's exactly when I added the disclaimer below to my Apache email account's signature. But no hard feelings, I guess, right?

So let's put it straight: politics is hurting this community, but in the pursuit of the 'best interest', haven't we become a bit too complacent?

Is there a way around it? I am sure there is! Will we find that way? Only time will tell, I guess.

Regards,
Cos

2CAC 8312 4870 D885 8616 6115 220F 6980 1F27 E622

Disclaimer: Opinions expressed in this email are those of the author, and do not necessarily represent the views of any company the author might be affiliated with at the moment of writing.

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

On Wed, Aug 29, 2012 at 4:19 PM, Mattmann, Chris A (388J)<[EMAIL PROTECTED]> wrote:> Hi Eli,>> On Aug 29, 2012, at 11:41 AM, Eli Collins wrote:>>> Thanks for writing up a proposal Chris.>> NP.>>>>> I think it makes sense to have Common live in HDFS at least for now,>> since it's at the bottom of the stack / dependency chain and it's code>> is the most intertwined with common, and, per Arun, we tend to work on>> common stuff more than MR people. The HDFS project is really a lot>> more than HDFS, eg has all the hadoop commands, non-HDFS file system>> source, etc but that seems like an OK starting point. We need to>> figure out the committers and PMC though since the goal is to just>> have the HDFS community (vs the current Hadoop people) but the project>> will contain non-HDFS stuff. I'd like to hear from the current Hadoop>> committers and PMC members that mostly work on MR and YARN - are you>> guys OK losing your current privileges on the HDFS repo?>> Rather than ask the former question that way, I would just simply put up> a list of proposed HDFS PMC folks (yes, I keep using PMC ^_^). Then,> iterate on that.>>> Otherwise we>> haven't made much progress (ie HDFS still has multiple communities).>> ACK.>>>>> We also need to address the areas where it's not so cut and dry, eg>> where there is a single Hadoop project:>> - The Hadoop trademark, assume this lives in the HDFS project if Common does?>> Apache owns the Hadoop trademark, and the PMC helps to enforce it. Projects> don't own trademarks.

But which PMC does "the PMC" refer to, though, given that there is no longer a Hadoop PMC?

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

On Wed, Aug 29, 2012 at 4:20 PM, Mattmann, Chris A (388J)<[EMAIL PROTECTED]> wrote:> Arun, great work below. Concrete, and an actual proposal of PMC lists.>> What do folks think?

I don't see how it helps. This substantially *increases* the size of the PMC for HDFS; I don't even recognize a bunch of names on this list. Unless we're actually going to try to make the HDFS project represent the people who actually contribute and run the project, we're just replicating the current situation across 3 projects. 5+ HDFS patches in the last year seems like a pretty low bar to me.

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

>>> [..snip..]>>> - The Hadoop trademark, assume this lives in the HDFS project if Common does?>> >> Apache owns the Hadoop trademark, and the PMC helps to enforce it. Projects>> don't own trademarks.> > But which PMC does "the PMC" refer to though given that there is no> longer a Hadoop PMC?

Probably the collective set of PMCs that are created, along with trademarks@, and along with other members of the Foundation.

On Wed, Aug 29, 2012 at 11:06 PM, Mattmann, Chris A (388J)<[EMAIL PROTECTED]> wrote:> Hey Eli,>> On Aug 29, 2012, at 10:38 PM, Eli Collins wrote:>>>>> [..snip..]>>>> - The Hadoop trademark, assume this lives in the HDFS project if Common does?>>>>>> Apache owns the Hadoop trademark, and the PMC helps to enforce it. Projects>>> don't own trademarks.>>>> But which PMC does "the PMC" refer to though given that there is no>> longer a Hadoop PMC?>> Probably the collective set of PMCs that are created, along with trademarks@,> and along with other members of the Foundation.>

But what are we enforcing as the "Hadoop" trademark if there is no longer a Hadoop product release?

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

> On Wed, Aug 29, 2012 at 4:20 PM, Mattmann, Chris A (388J)> <[EMAIL PROTECTED]> wrote:>> Arun, great work below. Concrete, and an actual proposal of PMC lists.>> >> What do folks think?> > I don't see how it helps. This substantially *increases* the size of> the PMC for HDFS, I don't even recognize a bunch of names on this> list. Unless we're actually going to try to make the HDFS project> represent the people who actually contribute and run the project we're> just replicating the current situation across 3 projects. 5+ hdfs> patches in the last year seems like a pretty low bar to me.

Fine. Could you please provide us with an alternate for consideration?

thanks,
Arun

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

> On Wed, Aug 29, 2012 at 11:06 PM, Mattmann, Chris A (388J)> <[EMAIL PROTECTED]> wrote:>> Hey Eli,>> >> On Aug 29, 2012, at 10:38 PM, Eli Collins wrote:>> >>>>> [..snip..]>>>>> - The Hadoop trademark, assume this lives in the HDFS project if Common does?>>>> >>>> Apache owns the Hadoop trademark, and the PMC helps to enforce it. Projects>>>> don't own trademarks.>>> >>> But which PMC does "the PMC" refer to though given that there is no>>> longer a Hadoop PMC?>> >> Probably the collective set of PMCs that are created, along with trademarks@,>> and along with other members of the Foundation.>> > > But what are we enforcing as the "Hadoop" trademark if there is no> longer a Hadoop product release?

Well, Hadoop as a trademark, registered by the ASF, will remain. It doesn't go away, whether or not there is an explicit Hadoop TLP or a product that TLP releases. I'd imagine that, as a PMC member once on the Hadoop TLP before it went away, you could choose to enforce the Hadoop trademarks by working with trademarks@ in the same way that you currently do, or don't, or whatever.

And "enforce" is a loose word, since everyone's idea of "enforce" with respect to Apache PMCs and trademarks and so forth somewhat differs.

> Have we not learned our lessons from the last attempts to split?>> Let's just embrace contention as a fact of life on a high-profile> high-stakes project and get back to work.>>+1.

Having worked on, and wasted cycles on, the earlier project split, I agree with Todd.

IMO these are not mature enough to fly off independently. Making that happen needs a good amount of upfront investment in terms of build/repo/JIRA/wiki/mailing lists, and then recurring pain (duplicate code, cross-stack testing and whatnot) until the interfaces are stabilized. It is a big mess with very little gain. There are much more pressing problems to solve, and a TLP for each of these projects is just not worth it.

We are making great progress in terms of Hadoop 1.0 and 2.0. I believe we should not derail these efforts unnecessarily.

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

On Wed, Aug 29, 2012 at 11:31 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:>> On Aug 29, 2012, at 10:46 PM, Eli Collins wrote:>>> On Wed, Aug 29, 2012 at 4:20 PM, Mattmann, Chris A (388J)>> <[EMAIL PROTECTED]> wrote:>>> Arun, great work below. Concrete, and an actual proposal of PMC lists.>>>>>> What do folks think?>>>> I don't see how it helps. This substantially *increases* the size of>> the PMC for HDFS, I don't even recognize a bunch of names on this>> list. Unless we're actually going to try to make the HDFS project>> represent the people who actually contribute and run the project we're>> just replicating the current situation across 3 projects. 5+ hdfs>> patches in the last year seems like a pretty low bar to me.>>> Fine. Could you please provide us with an alternate for consideration?>

Todd's list seems more in line with the goal of reducing project members to reflect the actual community.

I see Chris' point about the community issues; however, I also see Todd's point that splitting the projects does not address these issues, while bringing real overhead and rolling back things we've done recently to un-split the projects (per the vote thread, I'm in favor of combining the committer lists even if we later split projects). In short, I'm open to a project split and willing to discuss it, but I don't yet see sufficient benefits to provide a concrete proposal myself.

Thanks,
Eli

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

I'm for the split only after we sort out how to deal with the technical issues mentioned in this thread. IMO, unless we have a clear plan/understanding for them, this split will go sour from a technical point of view.

Chris, I know you disagree on this, but given the current state of the code/interfaces, I think this is a blocker for the split.

Thx

On Thu, Aug 30, 2012 at 12:02 AM, Eli Collins <[EMAIL PROTECTED]> wrote:> [..snip..]

-- Alejandro

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

On Wed, Aug 29, 2012 at 11:31 PM, Mattmann, Chris A (388J)<[EMAIL PROTECTED]> wrote:> Hey Eli,>> On Aug 29, 2012, at 11:18 PM, Eli Collins wrote:>>> On Wed, Aug 29, 2012 at 11:06 PM, Mattmann, Chris A (388J)>> <[EMAIL PROTECTED]> wrote:>>> Hey Eli,>>>>>> On Aug 29, 2012, at 10:38 PM, Eli Collins wrote:>>>>>>>>> [..snip..]>>>>>> - The Hadoop trademark, assume this lives in the HDFS project if Common does?>>>>>>>>>> Apache owns the Hadoop trademark, and the PMC helps to enforce it. Projects>>>>> don't own trademarks.>>>>>>>> But which PMC does "the PMC" refer to though given that there is no>>>> longer a Hadoop PMC?>>>>>> Probably the collective set of PMCs that are created, along with trademarks@,>>> and along with other members of the Foundation.>>>>>>> But what are we enforcing as the "Hadoop" trademark if there is no>> longer a Hadoop product release?>> Well Hadoop as a trademark, registered by the ASF, will remain. It doesn't go away,> whether there is an explicit Hadoop TLP or product that TLP releases or not. I'd imagine as a PMC member once> on the Hadoop TLP before it went away, you could choose to enforce the Hadoop trademarks by working with> trademarks@ in the same way that you currently do, or don't, or whatever.>> And "enforce" is a loose word, since everyone's idea of "enforce" with respect to Apache> PMCs and trademarks and so forth somewhat differs.>> My 2c.>

I get that part; I'm just not sure what we'd be enforcing. A concrete proposal will need to figure out what this means once there is no such thing as a Hadoop release. See http://wiki.apache.org/hadoop/Defining%20Hadoop for some relevant background on an old proposal that didn't go anywhere.

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

On Wed, Aug 29, 2012 at 4:54 PM, Mattmann, Chris A (388J)<[EMAIL PROTECTED]> wrote:> OK I lied and said I wouldn't reply :)

Long thread. I just picked one of Chris's emails (as the initiator) at random to reply to.

Chris,

You are basically saying there's been a history of community problems in the Hadoop project, and proposing a technical solution: split the project by replicating the source base under three new names, implying that this will solve the community problems we (the Hadoop community) are facing.

I see several issues.

1. There are other ways to split the project. We essentially have a "natural" split of the project already in place. Hadoop 1, Hadoop 2, Hadoop 0.23 and trunk are in a sense competing projects by themselves, with their own contributors and release cycles.

2. From a technical (not community) viewpoint, your "svn copy" is an ugly approach, as it creates a lot of code duplication and will result in a maintenance nightmare and/or will require many man-months to fix. My point is that you cannot neglect "technical issues" when you solve community problems.

3. I am as skeptical as Todd that the community problems will be solved by simply TLP-ing the three projects. Two years ago Hadoop was in crisis as vendors were producing their own releases and calling them Hadoop. I think this was solved, but "poor community behavior" and contention remained, embrace them or not.

4. Having said the above, separating the projects seems reasonable (see timing, though). HDFS will inevitably have to inherit and maintain most of Common. I totally understand the frustration of people who just put a huge effort into merging the sources back under a common root.

5. Timing is important. Waiting until Hadoop 2 is stable, as Arun suggested earlier, would probably be too long. Doing it next week, without discussing and solving the technical issues listed in the thread, would be premature. I think the Hadoop 0.23.3 release, backed by Yahoo production, has the potential to become the next stable version, letting the project move ahead off the four-year-old code base. We should help that happen first, and do the necessary preparations for the split in the meantime.

Thanks,
--Konstantin

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

On Aug 30, 2012, at 3:12 AM, Konstantin Shvachko wrote:> > 5. Timing is important.> Waiting until Hadoop 2 is stable as Arun suggested earlier would> probably be too long.> Doing it next week, without discussing and solving technical issue> listed in the thread would be premature.> I think Hadoop 0.23.3 release backed by Yahoo production has a> potential to become> the next stable version, letting the project to move ahead off the> four year old code base.> We should help that happen first, and do necessary preparations for> the split in the mean time.

Agreed. This seems very reasonable; it is along the lines of what I was proposing when I said we should split *before* we declare hadoop-2 GA (not after, Konst, no worries).

Arun

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

> 2. From technical (not community) viewpoint your "svn copy" is an ugly> approach,> as it creates a lot of code duplication and will result in a> maintenance nightmare or / and> will require many man-months to fix. My point is that you cannot> neglect "technical issues" when you solve community problems.

Agreed, Konstantin. I don't think Chris was being serious here; it was merely *one* way forward.

There are, easily, better ways to solve this.

The big cross-project dependencies are IPC/RPC, Security and Metrics2. Others include the network topology APIs, etc. They need to be marked Public/Stable. We need to maintain compatibility across a major (stable) release anyway. This is true for every other Public/Stable API.

So, *technically*, the requirements are:
a) Ensure projects only use Public/Stable APIs.
b) Maintain compatibility for Public/Stable APIs within a major release.
c) Clearly, key components like IPC, Metrics2, Security etc. *should* be marked stable by the time the ersatz hadoop-2 codebase is declared 'stable'.
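Requirement a) is essentially a mechanical check. Hadoop tags its classes with `@InterfaceAudience` and `@InterfaceStability` annotations (in `org.apache.hadoop.classification`); the Python sketch below only models that idea with a hand-written table. The API names and the levels assigned to them are hypothetical placeholders, not the actual state of any release, and a real check would have to inspect the Java annotations themselves.

```python
from enum import Enum


class Audience(Enum):
    PUBLIC = "Public"
    LIMITED_PRIVATE = "LimitedPrivate"
    PRIVATE = "Private"


class Stability(Enum):
    STABLE = "Stable"
    EVOLVING = "Evolving"
    UNSTABLE = "Unstable"


# Hypothetical annotation table for a few upstream APIs; illustrative only,
# it does not reflect how any real Hadoop release annotates these classes.
API_ANNOTATIONS = {
    "fs.FileSystem": (Audience.PUBLIC, Stability.STABLE),
    "ipc.RPC": (Audience.LIMITED_PRIVATE, Stability.EVOLVING),
    "metrics2.MetricsSystem": (Audience.PUBLIC, Stability.EVOLVING),
}


def violations(used_apis):
    """Return the APIs a downstream project uses that are not Public/Stable."""
    bad = []
    for api in used_apis:
        # Unannotated APIs are treated as Private/Unstable (conservative default).
        audience, stability = API_ANNOTATIONS.get(
            api, (Audience.PRIVATE, Stability.UNSTABLE))
        if audience is not Audience.PUBLIC or stability is not Stability.STABLE:
            bad.append(api)
    return bad


print(violations(["fs.FileSystem", "ipc.RPC", "metrics2.MetricsSystem"]))
# ['ipc.RPC', 'metrics2.MetricsSystem']
```

In this toy table, requirement c) amounts to moving entries like `ipc.RPC` and `metrics2.MetricsSystem` to Public/Stable before the codebase is declared stable, after which the check passes for them.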

None of these seem like the fashionably *scary* technical issues some people are using to justify blocking the way forward.

And, no, YARN/MR aren't the only downstream projects in this mix: HBase, for example, uses Hadoop metrics2 and our security APIs. We need to support compatibility for HBase anyway. There are several other projects in the same boat. Pig/Hive need the FileSystem, Security & MR APIs. This is just *reality* when you are at the bottom of the stack.

Yes, there is work left - but that work is something we need to do with or without the split.

Furthermore, yes, the previous split/unsplit was painful. However, beyond that, we have made progress across several dimensions which should make this one smoother:
a) Mavenization has helped a *lot*.
b) Unlike the previous attempt, HDFS2 & YARN (v/s HDFS1 & MR1) no longer share the same run-time scripts etc.
c) We have been fairly good at following through on our stability/visibility guarantees on APIs.

As a result, I don't buy the *this is technically impossible* argument.

As Konstantin suggested, we could spend the next few weeks/months preparing. Even after the split, we would be in the alpha/beta stage, whereby we can recover from mistakes at the cost of a few extra HDFS alpha/beta releases for the sake of the MR/YARN projects. That seems like an acceptable cost, given that there are several volunteers to RM releases.

Last but not least, the previous split failed because the overall community did not invest in ensuring its success. That is clearly *not* the case this time around. I'm very confident of that.

Arun

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

> On Wed, Aug 29, 2012 at 11:31 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:>> >> On Aug 29, 2012, at 10:46 PM, Eli Collins wrote:>> >>> On Wed, Aug 29, 2012 at 4:20 PM, Mattmann, Chris A (388J)>>> <[EMAIL PROTECTED]> wrote:>>>> Arun, great work below. Concrete, and an actual proposal of PMC lists.>>>> >>>> What do folks think?>>> >>> I don't see how it helps. This substantially *increases* the size of>>> the PMC for HDFS, I don't even recognize a bunch of names on this>>> list. Unless we're actually going to try to make the HDFS project>>> represent the people who actually contribute and run the project we're>>> just replicating the current situation across 3 projects. 5+ hdfs>>> patches in the last year seems like a pretty low bar to me.>> >> >> Fine. Could you please provide us with an alternate for consideration?>> >

Ok, I'll bite - I find learned helplessness very frustrating.

I modified my proposal to keep the current distinction of Committers v/s PMC for all projects, i.e. all projects keep the list of committers I had, but the PMC is restricted to the intersection of the current PMC and the respective project's committer list:
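The rule just described (each project keeps its committer list; its PMC is that list intersected with the current Hadoop PMC) can be sketched as a set intersection. The rosters below are hypothetical placeholders, since the actual lists are not reproduced in this archive.

```python
# Hypothetical rosters, for illustration only.
current_pmc = {"alice", "bob", "carol", "dave"}

committers = {
    "hdfs": {"alice", "bob", "erin"},
    "mapreduce": {"carol", "frank"},
    "yarn": {"carol", "dave", "frank"},
}

# Each project keeps its full committer list; its PMC is the subset of those
# committers who are already on the current Hadoop PMC.
proposed_pmc = {proj: members & current_pmc for proj, members in committers.items()}

for proj in sorted(proposed_pmc):
    print(proj, sorted(proposed_pmc[proj]))
# hdfs ['alice', 'bob']
# mapreduce ['carol']
# yarn ['carol', 'dave']
```

A property of this rule is that no new TLP's PMC can contain anyone who is not already on the current PMC, and each PMC is always a subset of that project's committers.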

> Have we not learned our lessons from the last attempts to split?>> The issues in our community, which I think Chris is referring to, do> not generally revolve around project boundaries. It's not the case> that the HDFS community wants to go one way and the MR/YARN community> wants to go another, and we get into a conflict around it. If it were,> then splitting into separate TLPs would make a ton of sense.>> Instead, the issues are usually _within_ a component. So, if we split> into 3 TLPs, then we'll just have 3 TLPs, each of which is just as> contentious as before.>> Let's just embrace contention as a fact of life on a high-profile> high-stakes project and get back to work.>> I wasted nearly a month undoing the mess of the last attempt, and I> don't see why this time it would go any better. -1 from my perspective> on splitting again at this point. Perhaps if we get to the point that> we're never making cross-project commits it makes sense, but we're not> there still.>> -Todd>> On Wed, Aug 29, 2012 at 1:40 PM, Alejandro Abdelnur <[EMAIL PROTECTED]>> wrote:> > I volunteer to help cleanup/normalize Maven stuff.> >> > Thx> >> > On Wed, Aug 29, 2012 at 1:34 PM, Tom White <[EMAIL PROTECTED]> wrote:> >> Eric - I agree with Common being included in HDFS. That's what I meant> >> by Common not having a clear enough mission to be a TLP by itself.> >>> >> Arun - I'm happy to RM some of the upcoming MR releases too. Also to> >> help out with the work on audience annotations and compatibility.> >>> >> Cheers,> >> Tom> >>> >> On Wed, Aug 29, 2012 at 7:22 PM, Arun C Murthy <[EMAIL PROTECTED]>> wrote:> >>> On Aug 29, 2012, at 10:04 AM, Arun C Murthy wrote:> >>>> On Aug 29, 2012, at 10:02 AM, Suresh Srinivas wrote:> >>>>>> >>>>> Robert and Alejandro have brought up good questions. Here are my> thoughts:> >>>>> - For first one or two releases all the projects can coordinate and> do the> >>>>> releases together. 
This should help simplify the immediate work> needed.> >>>>> This should also help in us meeting the release timelines that we are> >>>>> working towards. As the split makes progress, this cross project> >>>>> coordination will no longer be necessary. I volunteer to RM these> releases> >>>>> and do the needed co-ordination from HDFS.> >>>>> >>>>> >>>> +1 seems like a reasonable first step. Thanks for volunteering Suresh.> >>>> >>> Also, I'd say we make at least 3-4 alpha/beta releases in this shape.> >>>> >>> I volunteer to RM for MR/YARN releases and work with Suresh.> >>>> >>> Arun> >>>> >> >> >> > --> > Alejandro>>>> --> Todd Lipcon> Software Engineer, Cloudera>

> On Wed, Aug 29, 2012 at 4:54 PM, Mattmann, Chris A (388J)> <[EMAIL PROTECTED]> wrote:>> OK I lied and said I wouldn't reply :)> > Long thread. I just picked a random Chris's (as the initiator) email to reply.> > Chris,> You are basically saying there's been a history of community problems> in Hadoop project,> and proposing a technical solution to split the project by replicating> the source base under three new names,> implying that this will solve the community problems we (the Hadoop> community) are facing.

Well, actually, the replication of the source code is just a small part of what I was proposing (and one that I don't really care about, and that isn't crucial to what I'm saying). Breaking the project up into groups of individuals who actually share similar views, who can reach consensus on things (besides technical issues), and who work in the Apache way is what I was really proposing.

> > I see several issues.> > 1. There are other ways to split the project.> We essentially have a "natural" split of the project already in place.> Hadoop 1, Hadoop 2, Hadoop 0.23, the Trunk> are in a sense competing projects by themselves, with own contributors> and release cycles.

+1, that's a great split too. I'm not wed to simply splitting the project along components, or systems, or whatever.

Whatever makes sense to get communities of people working together at Apache is what I'm after. Community != technical.

+1, it totally is ugly. I used it for illustration in the hope that the Hadoop technical experts could come up with a better one and stop using it as an excuse for not fixing the community problems.

> > 3. I am as skeptical as Todd that the community problems will be> solved by simply TLP-ing the three projects.> Two years ago Hadoop was in crises as vendors were producing their own> releases calling it Hadoop.> I think this was solved, but "poor community behavior" and contentions> remained, embrace them or not.

Vendors still produce their own releases on top of Hadoop, whether they call them Hadoop or not. That problem isn't fixed, and won't be fixed; it's grown too much.

> > 4. Having said the above, separating the projects seems reasonable.> (See timing though)> HDFS will inevitable have to inherit and maintain most of Common.> Totally understand frustration of people who just put a huge effort> into merging> the sources back under common root.

Me too, which is why I'm not urging this or that, or how to solve these types of things. I'm not sure, but I also know that it's most important to get projects that understand how things work here at Apache.

> > 5. Timing is important.> Waiting until Hadoop 2 is stable as Arun suggested earlier would> probably be too long.> Doing it next week, without discussing and solving technical issue> listed in the thread would be premature.> I think Hadoop 0.23.3 release backed by Yahoo production has a> potential to become> the next stable version, letting the project to move ahead off the> four year old code base.> We should help that happen first, and do necessary preparations for> the split in the mean time.

As a direct Apache software product consumer and sometimes contributor, I also experienced firsthand the pain of the project splits. It was not possible to build an installable release. It may have been many days or weeks before that was cured by a re-merge. I gave up after burning too many hours on it, went back to the 1.0 code base, and came back only after the damage was repaired.

It's also frustrating to hear, even if it's just one person's proposal, that now maybe 0.23 will be the new stable, when we have spent months preparing to stabilize our next production deployment based on the 2.0 branch with the expectation that it will be the new stable. 0.23 is quite backward in comparison and is missing all of the critical HA HDFS work.

This thread seems to be becoming a competition for which is the more radical proposal to snatch defeat from the jaws of success.

These proposals seem to be made with a total lack of care for the end user.

From my point of view, things were going reasonably well until this sudden turn into lunacy. I am positive this kind of "foundation" / PMC / project / administrivia tinkering is what will fragment or disband the Hadoop community of users and contributors, not disagreements between committers. A Hadoop competitor couldn't be happier.


I could not agree more with everything Andrew has written below. Things have been running really quite smoothly for months (a year?) now. We've had one rather small disagreement, which we're about to have cleared up, and now suddenly we're talking about rearranging the whole thing. I still fail to see how this could serve to help Hadoop.

> As a direct Apache software product consumer and sometimes contributor, I> also experienced firsthand the pain of the project splits. It was not> possible to build an installable release. It may have been many days or> weeks before that was cured by a re-merge. I gave up after burning too many> hours on it, went back to the 1.0 code base, and came back only after the> damage was repaired.>> It's also frustrating to hear, even if just one person's proposal, that we> have spent months preparing to stabilize our next production deployment> based on the 2.0 branch, with the expectation that it will be the new> stable, but now maybe 0.23 will be the new stable. 0.23 is quite backwards> in comparison and missing all of the critical HA HDFS work.>> This thread seems to be becoming a competition for which is the more> radical proposal to snatch defeat from the jaws of success.>> These proposals seem to be made with a total lack of care for the end user.>> From my point of view, things were going reasonably well until suddenly> there is this sudden turn into lunacy. I am positive this kind of> "foundation" / PMC / project / administrivia tinkering is what will> fragment or disband the Hadoop community of users and contributors, not> disagreements between committers. A Hadoop competitor couldn't be happer.>> On Thu, Aug 30, 2012 at 1:12 PM, Konstantin Shvachko> <[EMAIL PROTECTED]>wrote:>> > On Wed, Aug 29, 2012 at 4:54 PM, Mattmann, Chris A (388J)> > <[EMAIL PROTECTED]> wrote:> > > OK I lied and said I wouldn't reply :)> >> > Long thread. I just picked a random Chris's (as the initiator) email to> > reply.> >> > Chris,> > You are basically saying there's been a history of community problems> > in Hadoop project,> > and proposing a technical solution to split the project by replicating> > the source base under three new names,> > implying that this will solve the community problems we (the Hadoop> > community) are facing.> >> > I see several issues.> >> > 1. 
There are other ways to split the project.> > We essentially have a "natural" split of the project already in place.> > Hadoop 1, Hadoop 2, Hadoop 0.23, the Trunk> > are in a sense competing projects by themselves, with own contributors> > and release cycles.> >> > 2. From technical (not community) viewpoint your "svn copy" is an ugly> > approach,> > as it creates a lot of code duplication and will result in a> > maintenance nightmare or / and> > will require many man-months to fix. My point is that you cannot> > neglect "technical issues" when you solve community problems.> >> > 3. I am as skeptical as Todd that the community problems will be> > solved by simply TLP-ing the three projects.> > Two years ago Hadoop was in crises as vendors were producing their own> > releases calling it Hadoop.> > I think this was solved, but "poor community behavior" and contentions> > remained, embrace them or not.> >> > 4. Having said the above, separating the projects seems reasonable.> > (See timing though)> > HDFS will inevitable have to inherit and maintain most of Common.> > Totally understand frustration of people who just put a huge effort> > into merging> > the sources back under common root.> >> > 5. Timing is important.> > Waiting until Hadoop 2 is stable as Arun suggested earlier would> > probably be too long.> > Doing it next week, without discussing and solving technical issue> > listed in the thread would be premature.> > I think Hadoop 0.23.3 release backed by Yahoo production has a

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

As an observer, user, and sometimes contributor, I feel as though the project has been going smoothly over the past year. As such, I was quite surprised when this popped up.

On Thu, Aug 30, 2012 at 9:23 AM, Aaron T. Myers <[EMAIL PROTECTED]> wrote:
> +1
>
> I could not agree more with everything Andrew has written below. Things
> have been running really quite smoothly for months (a year?) now. We've had
> one rather small disagreement, that we're about to have cleared up, and now
> suddenly we're talking about rearranging the whole thing. I still fail to
> see how this could serve to help Hadoop.

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

The primary problem I see with umbrellas is that the PMC isn't able to accurately represent the developer community. Hadoop used to have that problem, when HBase, etc. were subprojects and most PMC members were not involved in those subprojects. Currently this is less of a problem. Many PMC members are involved in several different parts of the project and most PMC members follow all the developer mailing lists. Hadoop at present thus has some semblance to an umbrella but is by no means a classic umbrella.

> One aspect I've seen is that exclusivity of allowing people to become
> PMC members on the project, and the separation of PMC from C.
> Other things I've seen are the use of technical justifications or complexity
> issues as an excuse for the exclusivity, as an excuse for drawing boundaries
> between project committers and PMC members, and then between specific
> products that the project and community as a whole releases, and finally
> other things I've seen include external interests influencing the way that
> business is done around here (need for releases in downstream companies,
> or projects driving upstream, Apache decisions, which are supposed to be
> independent of any lone company, or set of companies -- it's individuals here
> that do the work).

I am unconvinced that splitting Hadoop into three projects is a panacea for these issues. For example, adding committers to the sub-lists has been contentious even among the members of those sublists.

Splitting is perhaps a better long-term structure for the project. But it should be done slowly and carefully. Moving too quickly could cause a lot of extra work for a lot of people, both in the project and downstream. A series of incremental steps should prove less painful. For example, the YARN developers might propose that they fork to a new TLP. The YARN code could then be removed from the mother project's trunk but remain in branches for compatible bugfix releases. Downstream projects could start adding a dependency on the YARN project once it makes releases.

Doug

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

I am a user and a big fan of Hadoop. I can see a lot of great discussion here. Most of it is about access rights and the technical problems of the split. It looks like many members are not contributing to the projects but still have access. If they are not contributing, why is access required? I generally watch the mails in the community. Some people have not even published anything, and I can see their names in the list you have proposed.

I am curious to know how so many people got access to MapReduce/HDFS. If you treat them as separate projects, Todd's proposed list is closer to the above links and looks like the true contributors. Take the correct information rather than messing things up...

I too think it is good to split, to have lists like the one Todd proposed, and to take the list for YARN from Arun. This is just my thought; I may not know many things here. Please ignore this mail if I have misunderstood something about the community.

>> On Aug 29, 2012, at 4:48 PM, Todd Lipcon wrote:>> > On Wed, Aug 29, 2012 at 4:47 PM, Konstantin Boudnik <[EMAIL PROTECTED]>> wrote:> >> I am curious where the arbitrar numbery 5 is coming from: is it> reflected in> >> the bylaws?> >> > Nope, I picked it based on Arun's earlier picking of the same number> > in the YARN thread. We have no bylaws about what would happen in the> > eventual TLP-ification of subcomponents, of course.>> I'm sure you just missed it - but, I want to set the record straight: I> picked 20+ patch contributions or 10+ review/commits since *project> inception*.> Your pick seems to be just commits in last 12 months. I have put forth> one, please put forth another proposal if you like. However, please, do> include patches, not just commits.>> For e.g. I'd propose we add llu@ for HDFS since he's done a ton of work> on metrics2 recently. My bad for missing that initially - apologies Luke. I> might have missed more, pls ping me or add yourself. I've put my proposal> up on http://wiki.apache.org/hadoop/HDFS_MR_YARN_TLP_Proposal.>> We could also revisit issues like emeritus after the split to allow each> project to figure it's own norms - I'd urge for that option.>> thanks,> Arun>>

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

Agreed. The contributions from the beginning of the project need to be considered.

Clearly YARN was heavily influenced by and borrowed heavily from MapReduce. The fact that a NodeManager isn't called a TaskTracker doesn't mean it doesn't do the same things using some of the same code. Based on that, I'd propose that the MapReduce committer list be cloned as the Yarn committer list too.

-- Owen

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

+1 for splitting the projects
+1 for adding all MR contributors to Yarn

I may have missed its mention in this thread, but maintaining the 1.x branch is probably the most awkward technical hurdle. I'm not sure how that should be managed if the projects are split. One strategy would be to leave it with Common+HDFS until 2.x stabilizes.

The tasks that are simpler in a unified project (releases, cross-project patches, etc.) are relatively rare, but all dev has paid a tax. That acknowledged, as Arun points out: the half-measures that have made the split painful can be fixed, and enthusiasm/resources appear to be available for that. As long as TLPs are reconciled quickly and decisively, this can be successful. Without dedicated resources, we can expect the same result as before.

As for what this accomplishes: each subproject is more approachable on its own. I don't think it will alleviate political tensions, nor are such tensions inherently unhealthy. But a split can limit their scope to the particular subproject and its interests. It's also easier for collaborators to engage the subset of contributors charged with its roadmap: Pig/Hive should be able to wrangle MapReduce and Yarn folks on their dev list, as HBase should engage HDFS without importing extra context.

As another practical matter: we should change the bylaws so emeritus PMC members/committers can reinstate themselves without a vote. I expect many people, including myself, would have no problem signaling periods of inactivity if project politics were out of the equation.

-C

On Tue, Aug 28, 2012 at 7:33 PM, Mattmann, Chris A (388J)<[EMAIL PROTECTED]> wrote:> [decided to minimize traffic and to simply put this in one thread]>> Hi Guys,>> See the recent discussion on these threads:>> YARN as its own Hadoop "sub project": http://s.apache.org/WW1> Maintain a single committer list for the Hadoop project: http://s.apache.org/Owx>> ...and just pay attention to the Hadoop project over the last 3-4 years. It's operating> as a single project, that's masking separate communities that themselves are really> separate ASF projects.>> At the ASF, this has been a problem area called "umbrella" projects and over the years,> all I've seen from them is wasted bandwidth, artificial barriers and the inventions of> new ways to perform process mongering and to reduce the fun in developing software> at this fantastic foundation.>> I've talked about umbrella projects enough. We've diverted conversation enough.> Enough people have tried to act like there is some technical mumbo jumbo that is> preventing the eventual act of higher power that I myself hope comes should these> discussions prove unfruitful through normal means.>> *these. are. separate. projects.*> *there.are.not.blocker.issues.from.spinning.out.these.projects.as.their.own.communities*>> In this email: http://s.apache.org/rSm>> And in the 2 subsequent follow ons in that thread, I've outlined a process that I'll copy> through below for splitting these projects into their own TLPs:>> -----snip> Process:>> 0. [DISCUSS] thread for <TLP name> in which you talk about #1 and #2 below, potentially draft resolution too.>> 1. Decide on an initial set of *PMC* members. I urge each new TLP to adopt PMC==C. See reasons I've> already discussed.>> 2. Decide on a chair. Try not to VOTE for this explicitly, see if can be discussed and consensus> can be reached (just a thought experiment). VOTE if necessary.>> 3. [VOTE] thread for <TLP name>>> 4. Create Project:> a. paste resolution from #0 to board@ or;> b. 
go to general@incubator and start new Incubator project.>> 5. infrastructure set up.> MLs moving; new UNIX groups; website setup;> SVN setup like this:>> svn copy -m "MR TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool MR name>; or> svn copy -m "YARN TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool YARN name>; or

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

Andrew's points are fair IMHO. In general, I think it makes sense to have the TLPs but we aren't there yet (as others have pointed out). I'd propose that we should think about the timelines (maybe an appropriate time is when we have Hadoop-2.0 GA'ed).

On Aug 30, 2012, at 7:11 AM, Andrew Purtell wrote:

> As a direct Apache software product consumer and sometimes contributor, I> also experienced firsthand the pain of the project splits. It was not> possible to build an installable release. It may have been many days or> weeks before that was cured by a re-merge. I gave up after burning too many> hours on it, went back to the 1.0 code base, and came back only after the> damage was repaired.> > It's also frustrating to hear, even if just one person's proposal, that we> have spent months preparing to stabilize our next production deployment> based on the 2.0 branch, with the expectation that it will be the new> stable, but now maybe 0.23 will be the new stable. 0.23 is quite backwards> in comparison and missing all of the critical HA HDFS work.> > This thread seems to be becoming a competition for which is the more> radical proposal to snatch defeat from the jaws of success.> > These proposals seem to be made with a total lack of care for the end user.> > From my point of view, things were going reasonably well until suddenly> there is this sudden turn into lunacy. I am positive this kind of> "foundation" / PMC / project / administrivia tinkering is what will> fragment or disband the Hadoop community of users and contributors, not> disagreements between committers. A Hadoop competitor couldn't be happer.> > On Thu, Aug 30, 2012 at 1:12 PM, Konstantin Shvachko> <[EMAIL PROTECTED]>wrote:> >> On Wed, Aug 29, 2012 at 4:54 PM, Mattmann, Chris A (388J)>> <[EMAIL PROTECTED]> wrote:>>> OK I lied and said I wouldn't reply :)>> >> Long thread. I just picked a random Chris's (as the initiator) email to>> reply.>> >> Chris,>> You are basically saying there's been a history of community problems>> in Hadoop project,>> and proposing a technical solution to split the project by replicating>> the source base under three new names,>> implying that this will solve the community problems we (the Hadoop>> community) are facing.>> >> I see several issues.>> >> 1. 
There are other ways to split the project.>> We essentially have a "natural" split of the project already in place.>> Hadoop 1, Hadoop 2, Hadoop 0.23, the Trunk>> are in a sense competing projects by themselves, with own contributors>> and release cycles.>> >> 2. From technical (not community) viewpoint your "svn copy" is an ugly>> approach,>> as it creates a lot of code duplication and will result in a>> maintenance nightmare or / and>> will require many man-months to fix. My point is that you cannot>> neglect "technical issues" when you solve community problems.>> >> 3. I am as skeptical as Todd that the community problems will be>> solved by simply TLP-ing the three projects.>> Two years ago Hadoop was in crises as vendors were producing their own>> releases calling it Hadoop.>> I think this was solved, but "poor community behavior" and contentions>> remained, embrace them or not.>> >> 4. Having said the above, separating the projects seems reasonable.>> (See timing though)>> HDFS will inevitable have to inherit and maintain most of Common.>> Totally understand frustration of people who just put a huge effort>> into merging>> the sources back under common root.>> >> 5. Timing is important.>> Waiting until Hadoop 2 is stable as Arun suggested earlier would>> probably be too long.>> Doing it next week, without discussing and solving technical issue>> listed in the thread would be premature.>> I think Hadoop 0.23.3 release backed by Yahoo production has a>> potential to become>> the next stable version, letting the project to move ahead off the>> four year old code base.>> We should help that happen first, and do necessary preparations for

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

> The tasks that are simpler in a unified project- releases,
> cross-project patches, etc- are relatively rare, but all dev has paid
> a tax.

Agreed.

> That acknowledged, as Arun points out: the half-measures that
> have made the split painful can be fixed and enthusiasm/resources
> appear to be available for that. As long as TLPs are reconciled
> quickly and decisively, this can be successful. Without dedicated
> resources, we can expect the same result as before.

I am willing to volunteer myself to help with whatever it takes to accomplish this. Even the last time around, I did my bit to make the split happen, the fruits of which aren't all lost today.

Looking at the voting, it appears YARN wants to become a TLP RIGHT NOW but at the price of the complete decoherence of the Apache Hadoop platform. For all of us who have invested in the Apache Hadoop platform, how does this benefit us? Certainly our interests seem to get little consideration with this plan to just blow everything up tomorrow.

How does a downstream project that imports HDFS and MapReduce coordinate the shared dependencies with those new projects? For example, Guava. One could have a multi-way library incompatibility problem; this has already happened in the large with HDFS, HBase, and Pig. It's DLL hell magnified 3 or 4 times just in the smoking ruins of "core". The obvious answer is: once these pieces are moving in different trajectories at different rates, end users and downstream projects will be forced to negotiate with many parties, and those parties explicitly won't care about the issues concerning another, according to this discussion. YARN must have broken our minicluster-based MapReduce tests 5 times over the last year. HDFS took up a certain version of Guava and this required us to refactor some code to match that version. We had a coherent group of committers to assist us then, but that would go away. Proponents of the split seem to want exactly this situation. BigTop was suggested as a vehicle for addressing that concern but then explicitly rejected on this thread. A commercial vendor looking to torpedo the ability of anyone to build something on Apache Hadoop directly couldn't come up with a better plan, because only a full-time operation can be expected to have the resources to harmonize the pieces plus all of their dependencies with build patches, code wrangling, testing, testing, testing. Volunteer contributor and committer time is a precious gift. I wonder if the many professional full-time Hadoop devs voting here have lost sight of this. Pushing your integration work downstream doesn't mean resources will be there to pick it up. Downstream projects could be forced to reluctantly abandon working with Apache releases for a commercial distribution such as CDH, or the MapR platform. Or, they will be unable to move from a "known good" combination in the face of a combinatorial explosion of dependency changes, so their general utility to the end user steadily declines.
Maybe the consensus is that is acceptable, but I would find that kind of a sad ending to this remarkable project.

Best regards,
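To make the multi-way incompatibility argument above concrete, here is a minimal Python sketch under stated assumptions: each independently released component is modeled as a map from shared library to pinned version, and a downstream build is assumed to be able to load only one version of each library (no shading). The component names and the version labels `ver-A`/`ver-B` are hypothetical placeholders, not actual Hadoop pins. `find_conflicts` reports every library pinned to more than one version, and `combinations_to_test` shows how the number of release combinations a downstream project might need to validate grows as the product of each component's release count.

```python
from collections import defaultdict


def find_conflicts(component_deps):
    """Return {library: {version: [components]}} for each shared library
    that is pinned to more than one version across the components."""
    by_lib = defaultdict(lambda: defaultdict(list))
    for component, deps in component_deps.items():
        for lib, version in deps.items():
            by_lib[lib][version].append(component)
    # Keep only libraries where the components disagree on the version.
    return {
        lib: dict(versions)
        for lib, versions in by_lib.items()
        if len(versions) > 1
    }


def combinations_to_test(releases_per_component):
    """Number of release combinations a downstream project could face if
    every component ships on its own schedule (product of release counts)."""
    total = 1
    for count in releases_per_component:
        total *= count
    return total


if __name__ == "__main__":
    # Hypothetical pins: component names and version labels are placeholders only.
    deps = {
        "hdfs": {"guava": "ver-A"},
        "mapreduce": {"guava": "ver-A"},
        "yarn": {"guava": "ver-B"},
    }
    print(find_conflicts(deps))
    print(combinations_to_test([3, 3, 3]))
```

In this hypothetical setup the sketch flags `guava`, because one component pins a different version than the other two. Three components with three releases each already give 27 combinations to validate, and each additional independently released component multiplies that count again.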

How many new Apache Foundation *members* has the Hadoop PMC added over the past 3-4 years, and by whom (the answer to this question might surprise you)?

The thing you and others continue not to see is that the ASF isn't about the most superior technical solutions, or the best refactorings to prevent Google Guava dependencies; the ASF is about *community* _over_ *code*.

Period. The metrics that the Foundation and its members are interested in are the metrics that demonstrate the health of the project. Technical prowess and market share are great, as are diverse, hungry, downstream user communities. But the ASF is here to create communities, communities that work together to develop code for public good at no charge to the public. Scope out Board resolutions to create projects and read the repetitive text in them -- there's a pattern there that elucidates this.

Also, the project members and community members here could slice and dice the project into 50 different Top Level Projects, but it doesn't mean that Hadoop would be at its "ending".

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

> Hi Andrew,
>
> How many new Apache Foundation *members* has the Hadoop PMC added over the past
> 3-4 years, and by whom (the answer to this question might surprise you)?

To rephrase the above:

How many members of the Apache Hadoop PMC have been elected as members of the Apache Software Foundation in the past 3-4 years?

(is what I meant to say). For reference, the Apache Software Foundation membership is elected by the existing membership [1] at annual members meetings. However, input into membership and nominations is typically provided by ASF members who are (we hope) parts of those Apache communities (existing PMC members that are also ASF members, or other ASF members who also care and watch but who themselves are not on the project's PMC).

Successful and healthy ASF projects typically add members to the Foundation's ranks through the standard Foundation processes.

If Apache Hadoop -- as an umbrella or sum of its parts -- isn't practical to develop end applications or downstream projects on, the community will disappear. I don't follow your logic. I deal with the technical realities of actually trying to use an Apache Hadoop distribution, the pieces released as source from the ASF, directly in production, and your position is dismissive if not hostile to my concerns as an end user. What "community" do you mean then? Vendors? Academics? People who like to tinker with things they can't actually use?

And you can't just hand-wave that this will all work out if done RIGHT NOW, especially with something as inelegant as an SVN copy.


> If Apache Hadoop -- as an umbrella or sum of its parts -- isn't practical
> to develop end applications or downstream projects on, the community will
> disappear.

Sure, the end-user community might disappear, but the point I'm trying to make is that the community is more than that. It's developers that build code together ("community over code"); it's folks who write documentation who are part of the project's committee of folks working together to develop software for the public good at this Foundation. It's folks who write unit tests as part of that. It's also people that fly by on the lists and that need help, or that may throw up a patch, or whatever. It's other members of the Apache Software Foundation that are charged with caring and giving a rip about the Foundation's projects.

It's also downstream users of the software too -- they just aren't the only folks who are the community, that's all.

> I don't follow your logic. I deal with the technical realities
> of actually trying to use an Apache Hadoop distribution, the pieces
> released as source from the ASF, directly in production, and your position
> is dismissive if not hostile to my concerns as an end user.

Sorry, I wasn't trying to be dismissive. But at the same time I want to suggest that the community is broader than simply the technical folks who use the project.

> What
> "community" do you mean then? Vendors? Academics? People who like to tinker
> with things they can't actually use?

Yeah, the community I'm talking about is the larger whole that makes up the community of the project.

>
> And you can't just hand waive that this will all work out if done RIGHT
> NOW, especially with something as inelegant as a SVN copy.

Well, the project's health is something that ought to be fixed, and it ought to be done under a timeline. *Right now* isn't probably going to be a reality. But I am doing my job as a member of the Foundation in helping to discuss, further root out, and educate the folks around here as to the way that projects work at the Foundation.

The end user community might disappear, and you are OK with this? I'm simply astonished. Who are these people showing up to help, document, be on lists, whatever, if not current or prospective end users? Who the hell shows up to write unit tests? Who is this "public" in public good? Looks to me like a small cabal of commercial concerns in this case.

I guess the only thing we are going to agree on is that confidence in Apache Hadoop project stewardship at the ASF isn't currently warranted. And here I thought things were going so well. Who knew this torpedo lurked beneath the waters? I guess just members of the cabal. There's nothing more for me to say, just maybe a few hard decisions to make, depending on how this turns out.


> Eric - I agree with Common being included in HDFS. That's what I meant by Common not having a clear enough mission to be a TLP by itself.

That makes sense too. Even better if you could do JIRAs/commits to the same codebase together.

> Arun - I'm happy to RM some of the upcoming MR releases too. Also to help out with the work on audience annotations and compatibility.
>
> Cheers,
> Tom
>
> On Wed, Aug 29, 2012 at 7:22 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
> > On Aug 29, 2012, at 10:04 AM, Arun C Murthy wrote:
> >> On Aug 29, 2012, at 10:02 AM, Suresh Srinivas wrote:
> >>>
> >>> Robert and Alejandro have brought up good questions. Here are my thoughts:
> >>> - For first one or two releases all the projects can coordinate and do the releases together. This should help simplify the immediate work needed. This should also help in us meeting the release timelines that we are working towards. As the split makes progress, this cross project coordination will no longer be necessary. I volunteer to RM these releases and do the needed co-ordination from HDFS.
> >>
> >> +1 seems like a reasonable first step. Thanks for volunteering Suresh.
> >
> > Also, I'd say we make at least 3-4 alpha/beta releases in this shape.
> >
> > I volunteer to RM for MR/YARN releases and work with Suresh.
> >
> > Arun

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

Andrew,

I agree with you that the DLL/CLASSPATH issues are one huge concern that needs to be addressed before we can really move forward with a valid long-term split. There is hope on the horizon for that, though, with some of the OSGi work that Tom White has been doing.

Chris,

I completely agree with Andrew here. There are very *REAL* technical issues that need to be addressed before a *CLEAN* split can happen. We can make a messy one, but the ramifications are far from trivial. If we simply go in blindly it will, at a minimum, take months to stabilize and get back to where we are now. You may be OK with that, but many of us are not. Simply dismissing others' concerns as invalid is not good for the community. Many of us, as individuals, have a huge vested interest in having a stable version of Hadoop with new features in it regularly released. That is why we are part of this community. It frankly baffles me that "community over code" can be used to dismiss concerns about an issue that many of us see as something that will hurt the community. I am +1 for the split, and I am +1 for doing it soon, but I am -1 on doing it without at least having a plan as to how we will tease apart the different pieces of Hadoop.

--Bobby


-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

I agree with Bobby and Andrew here. As has been said on this thread, I think the technical issues should be addressed. Just going ahead and doing the split will be counterproductive. I am all for the projects becoming TLPs (sooner rather than later), but I think we need to work through a plan on when/how that addresses the issues brought up by folks in the thread.

thanks
mahadev

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

Sorry, I think both of you are still missing my point (maybe I'm wrong). And sorry that I've failed to explain it in such a way that you guys understand; that's as much my issue as anyone else's.

My point is this: technical issues, such as how to pull apart components and modules, are difficult, and my svn copy suggestion, and moreover my overall suggestion to figure out how to split up the umbrella project of Hadoop, had less to do with technically pulling apart any of its software product components than with suggesting a split in the members of the project management committee of the Apache Hadoop project.

The svn copy I suggested was merely to provide said new committees with code to work from (the same code base they have now, in fact). Put simply: I think you guys know a whole lot better than I do how to deliver your software product to the community. So I'm not even trying to say that I know the ins and outs of what splitting MR, YARN and HDFS entails, nor am I even trying to say "hey, you HAVE to do that part". That's the technical part.

I am saying that the current members of the Apache Software Foundation's Hadoop Project Management Committee exhibit the characteristics (not just during discrete events; it's been happening for a long time) of folks who in reality shouldn't belong to the same project management committee. Note: this is NOT a bad thing. There are probably plenty of (sub-)sets of groups at Apache and elsewhere that folks wouldn't fit into. I've enumerated some of those characteristics that you can see sometimes spill over (meta-discussions about moving things around, or drawing arbitrary lines around pieces of code that really have nothing to do with technical stuff and more to do with insulation and control), but there are also other concerns, such as frameworks put into place (exclusivity, amongst others), that are themselves pretty strong indicators that this is an umbrella project. There are social memes *around* code that certainly have an impact on the code, but are not the code itself.

*That* is what I am talking about. If the code splits or whatever make sense as part of the internal navel-gazing I'm suggesting regarding the *committee* of this project, then so be it. However, I have no direct say in any of that (nor would I expect to, without having the merit in the code to have a say).

Hope that helps explain where I was coming from better.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project


+1 on the general idea of splitting the projects, predicated on fixing the issues that made the last split so painful and resolving technicalities like dependencies, etc.

Here's a perspective from a downstream producer of a distribution built on top of Hadoop: I firmly believe that, at least with Hadoop 2.0, we've reached a point where HDFS and YARN/MapReduce being standalone, loosely coupled projects would make much more sense. The user community of Bigtop has expressed interest in being able to mix-n-match versions of MR and HDFS, and I believe this to be a very valid (and achievable!) use case. It is less clear what to do with the Hadoop 1.X code line, but my perception so far has been that it is mainly in maintenance mode and thus could be dealt with as an exceptional case.

I've heard some integration concerns on this thread, and while I appreciate them, I still believe that individual projects shouldn't be burdened by them, to the extent that they can maintain reasonable compatibility of the APIs. It is my personal opinion that HDFS and YARN/MapReduce of Hadoop 2.0 are ready to do that. Bigtop is there to keep them honest, provided that folks are willing to help us with that mission.

Thanks,
Roman.

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

On Fri, Aug 31, 2012 at 8:09 AM, Mattmann, Chris A (388J) <[EMAIL PROTECTED]> wrote:
> I am saying that the current members of the Apache Software Foundation's Hadoop Project Management Committee exhibit the characteristics (not just during discrete events; it's been happening for a long time) of folks who in reality shouldn't belong to the same project management committee. Note: this is NOT a bad thing. There are probably plenty of (sub-)sets of groups at Apache and elsewhere that folks wouldn't fit in to. I've enumerated some of those characteristics that you can see sometimes spill over (meta thought discussions about moving things around; or drawing arbitrary lines around pieces of code that really have nothing to do with technical stuff, and more to do about insulating and control;),

Hadoop's community is not perfect. But the divisions in the community are not primarily aligned with subcomponent boundaries. A project split will thus not likely fix the majority of these community imperfections. It may fix some, but ought to be pursued carefully so that it doesn't cause more harm than good.

> but there are also other concerns such as frameworks put in to place (exclusivity amongst others) that themselves are pretty high indicators that this is an umbrella project.

The partitioning of committers has now been removed in a separate vote. Hadoop is not a classic umbrella project.

Doug

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

> On Fri, Aug 31, 2012 at 8:09 AM, Mattmann, Chris A (388J) <[EMAIL PROTECTED]> wrote:
>> I am saying that the current members of the Apache Software Foundation's Hadoop Project Management Committee exhibit the characteristics (not just during discrete events; it's been happening for a long time) of folks who in reality shouldn't belong to the same project management committee. Note: this is NOT a bad thing. There are probably plenty of (sub-)sets of groups at Apache and elsewhere that folks wouldn't fit in to. I've enumerated some of those characteristics that you can see sometimes spill over (meta thought discussions about moving things around; or drawing arbitrary lines around pieces of code that really have nothing to do with technical stuff, and more to do about insulating and control;),
>
> Hadoop's community is not perfect. But the divisions in the community are not primarily aligned with subcomponent boundaries. A project split will thus not likely fix the majority of these community imperfections. It may fix some, but ought to be pursued carefully so that it doesn't cause more harm than good.

My own personal opinion is that yeah, they aren't necessarily aligned with subcomponent boundaries either, so +1, I agree with you.

>> but there are also other concerns such as frameworks put in to place (exclusivity amongst others) that themselves are pretty high indicators that this is an umbrella project.
>
> The partitioning of committers has now been removed in a separate vote. Hadoop is not a classic umbrella project.

Despite my thinking that's a band-aid, it's probably at least a good start. Let's hope it leads to some better interactions amongst the community members and to better health overall.

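-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

How about a proposal to just spin YARN off as a TLP? Rationale:

1. YARN started as a separate project and has a more independent community than Common/HDFS/MR (per below, these communities do not divide at sub-project boundaries) that appears to want to be even more independent.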

2. YARN is technically much easier to separate from the rest of the code base (than separating Common and HDFS, for example). Separating it out will also help accelerate other efforts like MR2 support for Apache Mesos.

3. It sidesteps a number of thorny issues (how to handle branch-1, how to handle what Hadoop is wrt enforcing trademark, which people to remove from the Hadoop PMC, etc.) that haven't been addressed in any of these proposals.

4. It's a proof point: if you can't make the case for YARN, then there's no way we're going to make a case for splitting the other projects (this thread).

I.e., this doesn't have to be an all-or-nothing proposition for all sub-projects, since the communities don't fall on sub-project boundaries.

Thanks,
Eli


-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

The problem there is that YARN depends on Common, and MapReduce depends on YARN, so we would either have a circular dependency or we would have to split off MapReduce too.

--Bobby

On 8/31/12 11:54 AM, "Eli Collins" <[EMAIL PROTECTED]> wrote:

>How about a proposal to just spin YARN off as a TLP? Rationale:
>
>1. YARN started as a separate project and has a more independent community than Common/HDFS/MR (per below these communities do not divide at sub-project boundaries) that appears to want to be even more independent.
>
>2. YARN is technically much easier to separate from the rest of the code base (than separating Common and HDFS for example). Separating it out will also help accelerate other efforts like MR2 support for Apache Mesos.
>
>3. It side steps a number of thorny issues (how to handle branch-1, how to handle what Hadoop is wrt enforcing trademark, who to remove people from the Hadoop PMC, etc) that haven't been addressed in any of these proposals.
>
>4. It's a proof point - if you can't make the case for YARN then there's no way we're going to make a case for splitting the other projects (this thread).
>
>Ie this doesn't have to be an all-or-nothing proposition for all sub-projects, since the communities don't fall on sub-project boundaries.
>
>Thanks,
>Eli
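Bobby's objection to a YARN-only spin-out is a property of the dependency graph: assign each module to a project, collapse module-level edges into project-level edges, and check for a cycle. The following is a minimal Python sketch, assuming only a simplified four-module graph (YARN on Common and MapReduce on YARN, as stated in this thread; HDFS on Common and MapReduce on Common are added as assumptions). The project names and the helpers `project_graph` and `has_cycle` are hypothetical illustrations, not part of any Hadoop build tooling.

```python
# Simplified module dependency graph (assumption): each module maps to the
# set of modules it depends on, following the dependencies named in this thread.
MODULE_DEPS = {
    "common": set(),
    "hdfs": {"common"},
    "yarn": {"common"},
    "mapreduce": {"yarn", "common"},
}


def project_graph(module_deps, assignment):
    """Collapse module-level edges into edges between projects.

    `assignment` maps each module to the (hypothetical) project that would
    own it after a split. Edges between modules of the same project are dropped.
    """
    graph = {}
    for module, deps in module_deps.items():
        owner = assignment[module]
        edges = graph.setdefault(owner, set())  # every owning project is a node
        for dep in deps:
            dep_owner = assignment[dep]
            if dep_owner != owner:
                edges.add(dep_owner)
    return graph


def has_cycle(graph):
    """Return True if the directed graph has a cycle (three-color DFS)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY  # node is on the current DFS path
        for nxt in graph.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:  # back edge: we reached a node already on the path
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK  # fully explored without finding a cycle
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)


# Split scenarios from the thread; project names are hypothetical.
YARN_ONLY = {"common": "hadoop", "hdfs": "hadoop",
             "yarn": "yarn-tlp", "mapreduce": "hadoop"}
YARN_AND_MR = {"common": "hadoop", "hdfs": "hadoop",
               "yarn": "yarn-tlp", "mapreduce": "yarn-tlp"}
SEPARATE_YARN_AND_MR = {"common": "hadoop", "hdfs": "hadoop",
                        "yarn": "yarn-tlp", "mapreduce": "mr-tlp"}

for name, assignment in [("YARN only", YARN_ONLY),
                         ("YARN + MapReduce", YARN_AND_MR),
                         ("YARN and MapReduce separately", SEPARATE_YARN_AND_MR)]:
    print(f"{name}: cycle={has_cycle(project_graph(MODULE_DEPS, assignment))}")
```

In this toy model, spinning out YARN alone leaves the remaining Hadoop project depending on the new YARN project (through MapReduce) while YARN still depends on Hadoop (through Common), which is the circular dependency Bobby describes. Moving MapReduce together with YARN, or into its own project, leaves the project graph acyclic. A real split would need the actual module graph rather than this four-node simplification.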

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

On Thu, Aug 30, 2012 at 11:50 PM, Mattmann, Chris A (388J)
<[EMAIL PROTECTED]> wrote:
> Hi Andrew,
>
> On Aug 30, 2012, at 11:42 PM, Andrew Purtell wrote:
>
>> If Apache Hadoop -- as an umbrella or sum of its parts -- isn't practical
>> to develop end applications or downstream projects on, the community will
>> disappear.
>
> Sure, the end-user community might disappear, but the point I'm trying to make is
> that the community is more than that. It's developers that build code together
> ("community over code"); it's folks who write documentation who are part of the
> project's committee of folks working together to develop software for the public
> good at this Foundation. It's folks who write unit tests as part of that. It's also people
> that fly by on the lists and that need help; or that may throw up a patch, or
> whatever. It's other members of the Apache Software Foundation that are
> charged with caring and giving a rip about the Foundation's projects.

Well, speaking as one of the developer community who hasn't been a traditional user of Hadoop since my previous job in 2008: if the end-user community started to languish, I (and 80% of the other most involved contributors) would probably stop working on the project pretty quickly. We're here because a user community exists, which funds our employers, who fund us.

Another point I'll make is that I've talked to a number of former contributors (from the 0.20 days) who pretty much stopped contributing because of the code base churn around the prior project split. It became too much effort to forward and back port patches from their internal branches, so their cost/reward tradeoff dipped negative. So there are real community costs associated with what seem like "technical" changes.

I don't know who came up with the original "community over code" mantra, or whether the ASF truly thinks these are hard and fast rules rather than principles and guidelines. But, if I may be so bold, I would much prefer the mantra of "community around code". Without the code at the center of any project, we'd just be a bunch of nerds shooting the shit. The code's what ties us together, and the pressure of keeping a centralized codebase that we can all feel good about shipping is what allows us to get past our differences and produce high quality software.

> Note: While there is not an official list, the following six principles have been cited as the core beliefs of The Apache Way:
> - collaborative software development
> - commercial-friendly standard license
> - consistently high quality software
> - respectful, honest, technical-based interaction
> - faithful implementation of standards
> - security as a mandatory feature

Maybe you disagree, but from my perspective, we're doing reasonably well on all of them. You may not think there's much collaboration, but in the last 2-3 weeks, I have collaborated on Hadoop-related work with developers from Trend Micro, Facebook, Calxeda, Hortonworks, and interacted with users from a much wider variety of organizations.

As Andrew said, I thought we were going along pretty well before this thread.

As for technical things we need to do to get to a feasible split: big +1 that classpath pollution issues are near top of the list. We need a reasonable classloader strategy, and I think Tom's OSGi stuff is a good start in that direction. But it's going to be quite some time before that's all integrated and pulled into dependent projects, etc. So let's work on it but not be rash in our decisions.
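To make "classloader strategy" concrete, here is a minimal sketch of one common technique, a child-first (parent-last) class loader. It is a swapped-in simpler approach, not Tom's OSGi work and not a Hadoop API: the class name `ChildFirstClassLoader` and the list of protected "system" package prefixes are assumptions. Classes found on the loader's own URLs (for example, a job's or downstream project's jars) shadow same-named classes on the framework classpath, while JDK classes always come from the parent.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical child-first (parent-last) class loader. Classes found on this
 * loader's own URLs shadow same-named classes on the parent classpath; JDK
 * classes are always delegated to the parent so they are never duplicated.
 */
public class ChildFirstClassLoader extends URLClassLoader {

    // Assumed list of package prefixes that must never be shadowed.
    private static final String[] SYSTEM_PREFIXES = {"java.", "javax.", "jdk.", "sun."};

    public ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // Already defined (or initiated) by this loader?
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (isSystemClass(name)) {
                    // Normal parent-first delegation for JDK classes.
                    return super.loadClass(name, resolve);
                }
                try {
                    // Child first: look in this loader's own URLs.
                    c = findClass(name);
                } catch (ClassNotFoundException notLocal) {
                    // Not ours: fall back to standard parent-first lookup,
                    // which throws ClassNotFoundException if nobody has it.
                    return super.loadClass(name, resolve);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    private static boolean isSystemClass(String name) {
        for (String prefix : SYSTEM_PREFIXES) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Each job or downstream project would get its own loader over its own jars, e.g. `new ChildFirstClassLoader(jobJarUrls, frameworkLoader)`. The usual caveat of child-first loading applies: a class loaded by both loaders is two distinct types, so interfaces shared across the boundary should live only on the parent classpath.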

-Todd
--
Todd Lipcon
Software Engineer, Cloudera

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

On Fri, Aug 31, 2012 at 9:58 AM, Robert Evans <[EMAIL PROTECTED]> wrote:
> The problem there is that YARN depends on Common, and MapReduce depends on
> YARN, so we would either have a circular dependency or we would have to
> split off MapReduce too.

I haven't been in the MR codebase much of late, so I'll defer to your judgment here: would it be feasible to have an abstraction layer for "cluster manager" separated out into a pile of interfaces? Then we could leave MR inside Hadoop, and Yarn would have an "MR->Yarn binding" module. I'm not sure where the line would be drawn, but one possibility would be to separate out the MR _task_ code from the MR scheduling code (AM, Job Submission, etc).

Again, it would be a large project, but as Eli said, it would help make MR more "relocatable" onto other cluster schedulers like Mesos (or even a traditional grid scheduler). Another possible boon there would be something I've discussed with Arun a few times: it would be cool if we could get the new MR task code (in particular the rewritten reduce, but also some of the new exciting work that Tsuyoshi and Mariappan are doing) running in the context of an MR1 cluster.
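The "pile of interfaces" idea can be sketched as follows. All names here (`ClusterManager`, `InProcessClusterManager`, `SimpleJobRunner`) are hypothetical, not existing Hadoop classes; the sketch only shows MR scheduling code depending on an interface, with each cluster scheduler (YARN, Mesos, an MR1 cluster) supplying its own binding module.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical "cluster manager" abstraction that MR code would program against. */
interface ClusterManager {
    /** Runs one task somewhere in the cluster and returns an opaque container id. */
    String launch(String taskName, Runnable task);
}

/**
 * Hypothetical in-process binding, handy for tests. A YARN, Mesos, or MR1
 * binding would be another implementation living in its own module.
 */
class InProcessClusterManager implements ClusterManager {
    private final List<String> containers = new ArrayList<>();

    @Override
    public String launch(String taskName, Runnable task) {
        String id = "container_" + containers.size() + "_" + taskName;
        task.run(); // runs synchronously in the calling thread
        containers.add(id);
        return id;
    }

    List<String> containers() {
        return Collections.unmodifiableList(containers);
    }
}

/** Hypothetical MR-side scheduling code: it only knows about ClusterManager. */
class SimpleJobRunner {
    private final ClusterManager cluster;

    SimpleJobRunner(ClusterManager cluster) {
        this.cluster = cluster;
    }

    /** Launches one map task per input split; returns the container ids in order. */
    List<String> runMaps(List<String> splits, Consumer<String> mapFn) {
        List<String> ids = new ArrayList<>();
        for (String split : splits) {
            ids.add(cluster.launch("map-" + split, () -> mapFn.accept(split)));
        }
        return ids;
    }
}
```

Under this split, the dependency arrow points from each scheduler binding to the MR interfaces rather than from MR to YARN, which is the inversion that would avoid the circular dependency Bobby describes.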

-Todd


--
Todd Lipcon
Software Engineer, Cloudera

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

> As for technical things we need to do to get to a feasible split: big
> +1 that classpath pollution issues are near top of the list. We need a
> reasonable classloader strategy, and I think Tom's OSGi stuff is a
> good start in that direction. But it's going to be quite some time
> before that's all integrated and pulled into dependent projects, etc.
> So let's work on it but not be rash in our decisions.

Seriously, this is a MUST. Until we address this, splitting is like a broken pen.

Thx

-- Alejandro

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

On Fri, Aug 31, 2012 at 10:10 AM, Alejandro Abdelnur <[EMAIL PROTECTED]> wrote:
> On Fri, Aug 31, 2012 at 9:59 AM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
>
>> As for technical things we need to do to get to a feasible split: big
>> +1 that classpath pollution issues are near top of the list. We need a
>> reasonable classloader strategy, and I think Tom's OSGi stuff is a
>> good start in that direction. But it's going to be quite some time
>> before that's all integrated and pulled into dependent projects, etc.
>> So let's work on it but not be rash in our decisions.
>
> Seriously, this is a MUST. Until we address this, splitting is like a
> broken pen.
>
> Thx
>
> --
> Alejandro

-- Alejandro

-

RE: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

> As for technical things we need to do to get to a feasible split: big
> +1 that classpath pollution issues are near top of the list. We need a
> reasonable classloader strategy, and I think Tom's OSGi stuff is a
> good start in that direction. But it's going to be quite some time
> before that's all integrated and pulled into dependent projects, etc.
> So let's work on it but not be rash in our decisions.

Just a quick comment regarding the OSGi specification: Eclipse plugins use OSGi 'bundles', and this is the most excruciatingly painful aspect of building plugins for Eclipse. I am sure there are other experts here who can chime in, but google for Eclipse plugin classpath problems, and you will get an earful...

Jagane

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

That would be wonderful to have. +1 I would love to see MR run on more than just HDFS/YARN, so people can pick what execution environment makes sense for them, just like what MPI does, or something like what HDFS does with FileSystem. My perspective was just from the current state of things; if we want to invert the relationship, that fixes the problem. I would be happy to help with doing that.
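The FileSystem comparison suggests how a pluggable execution environment might be chosen: Hadoop resolves a FileSystem implementation from the URI scheme (via the `fs.<scheme>.impl` configuration key). The sketch below applies the same idea to execution environments. It is hypothetical throughout: the `ExecutionEnvironment` interface, the registry, and the `mr.execution.environment` key are illustrative names, not Hadoop APIs, and `java.util.Properties` stands in for Hadoop's `Configuration`.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;
import java.util.function.Supplier;

/** Hypothetical SPI: where do MR tasks actually run? */
interface ExecutionEnvironment {
    String name();
}

/**
 * Hypothetical registry that resolves an ExecutionEnvironment from a
 * configuration key, loosely analogous to FileSystem's scheme lookup.
 */
final class ExecutionEnvironments {
    /** Illustrative configuration key; not a real Hadoop property. */
    static final String KEY = "mr.execution.environment";

    private static final Map<String, Supplier<ExecutionEnvironment>> FACTORIES =
            new TreeMap<>();

    private ExecutionEnvironments() {}

    static synchronized void register(String name, Supplier<ExecutionEnvironment> factory) {
        FACTORIES.put(name, factory);
    }

    static synchronized ExecutionEnvironment fromConfig(Properties conf) {
        String name = conf.getProperty(KEY, "local"); // default when unset
        Supplier<ExecutionEnvironment> factory = FACTORIES.get(name);
        if (factory == null) {
            throw new IllegalArgumentException(
                    "No execution environment registered for '" + name
                            + "'; known: " + FACTORIES.keySet());
        }
        return factory.get();
    }
}
```

A real design would more likely discover bindings with `java.util.ServiceLoader` than with a static map; the map just keeps the sketch self-contained.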

--Bobby

On 8/31/12 12:06 PM, "Todd Lipcon" <[EMAIL PROTECTED]> wrote:

>On Fri, Aug 31, 2012 at 9:58 AM, Robert Evans <[EMAIL PROTECTED]> wrote:
>> The problem there is that YARN depends on Common, and MapReduce depends on
>> YARN, so we would either have a circular dependency or we would have to
>> split off MapReduce too.
>
>I haven't been in the MR codebase much of late, so I'll defer to your
>judgment here: would it be feasible to have an abstraction layer for
>"cluster manager" separated out into a pile of interfaces? Then we
>could leave MR inside Hadoop, and Yarn would have an "MR->Yarn
>binding" module. I'm not sure where the line would be drawn, but one
>possibility would be to separate out the MR _task_ code from the MR
>scheduling code (AM, Job Submission, etc)
>
>Again would be a large project, but as Eli said, it would help make MR
>more "relocatable" onto other cluster schedulers like Mesos (or even a
>traditional grid scheduler). Another possible boon there would be
>something I've discussed with Arun a few times: it would be cool if we
>could get the new MR task code (in particular the rewritten reduce,
>but also some of the new exciting work that Tsuyoshi and Mariappan are
>doing) running in the context of an MR1 cluster.
>
>-Todd
>
>>
>> On 8/31/12 11:54 AM, "Eli Collins" <[EMAIL PROTECTED]> wrote:
>>
>>>How about a proposal to just spin YARN off as a TLP? Rationale:
>>>
>>>1. YARN started as a separate project and has a more independent
>>>community than Common/HDFS/MR (per below these communities do not
>>>divide at sub-project boundaries) that appears to want to be even more
>>>independent.
>>>
>>>2. YARN is technically much easier to separate from the rest of the
>>>code base (than separating Common and HDFS for example). Separating it
>>>out will also help accelerate other efforts like MR2 support for
>>>Apache Mesos.
>>>
>>>3. It side steps a number of thorny issues (how to handle branch-1,
>>>how to handle what Hadoop is wrt enforcing trademark, who to remove
>>>people from the Hadoop PMC, etc) that haven't been addressed in any of
>>>these proposals.
>>>
>>>4. It's a proof point - if you can't make the case for YARN then
>>>there's no way we're going to make a case for splitting the other
>>>projects (this thread).
>>>
>>>Ie this doesn't have to be an all-or-nothing proposition for all
>>>sub-projects, since the communities don't fall on sub-project
>>>boundaries.
>>>
>>>Thanks,
>>>Eli
>>>
>>>On Tue, Aug 28, 2012 at 7:33 PM, Mattmann, Chris A (388J)
>>><[EMAIL PROTECTED]> wrote:
>>>> [decided to minimize traffic and to simply put this in one thread]
>>>>
>>>> Hi Guys,
>>>>
>>>> See the recent discussion on these threads:
>>>>
>>>> YARN as its own Hadoop "sub project": http://s.apache.org/WW1
>>>> Maintain a single committer list for the Hadoop project: http://s.apache.org/Owx
>>>>
>>>> ...and just pay attention to the Hadoop project over the last 3-4 years. It's operating
>>>> as a single project, that's masking separate communities that themselves are really
>>>> separate ASF projects.
>>>>
>>>> At the ASF, this has been a problem area called "umbrella" projects and over the years,
>>>> all I've seen from them is wasted bandwidth, artificial barriers and the inventions of
>>>> new ways to perform process mongering and to reduce the fun in developing software
>>>> at this fantastic foundation.
>>>>
>>>> I've talked about umbrella projects enough. We've diverted conversation enough.
>>>> Enough people have tried to act like there is some technical mumbo

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

How often do you call for Emeriti lists? Otherwise, if the list simply keeps growing, people may be surprised like this. Some people (Eli Collins) also raised concerns about the growing lists above, right? But in reality, many of those people may not be active and may not have looked at the project in a long time. Keeping them on the active list will not help Hadoop, right? If they really want to become active and help the project again, they can regain access at that time. Otherwise, people may think twice about adding new people, since the list is already big, as above. No?

> On Thu, Aug 30, 2012 at 12:33 PM, Inder.dev Java <[EMAIL PROTECTED]>
> wrote:
> > I am curious to know how that many people got access in Map Reduce/HDFS.
>
> Many of these are folks who were more active in the past. Hadoop is
> now 6.5 years old.
>
> At Apache, merit does not expire:
>
> http://www.apache.org/dev/committers.html#committer-set-term
>
> Doug

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

On Fri, Aug 31, 2012 at 12:00 PM, Inder.dev Java <[EMAIL PROTECTED]> wrote:
> How often you call for Emeriti lists.
> Otherwise , if list is simply growing, then people may surprise like this.
> And also some people(Eli Collins) showed concerns about growing lists above
> right. But in reality all that people may not be active and not looking to
> project from long. Having them in active list will not help to Hadoop
> right. If they really want to active and help to project, they can regain
> at that time. Otherwise people may think to add new people, as list already
> big like above. no?

Keeping people who are no longer active on the committer list shouldn't cause problems. No quorum is required for votes. Emeritus is used for folks who no longer follow the project at all. Some committers may no longer be contributing code regularly but they should still be reading the developer mailing lists and may vote. More active contributors primarily determine the current technical direction of the project by making contributions.

Doug

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

I'd be fascinated to hear more from folks who have led other projects at Apache about how Hadoop's (and Lucene's, same DNA) committer management process compares, and about some good and bad lessons learned from those projects. Chris M has mentioned his experience from other projects. Can others comment?

Many projects do seem to have Emeriti lists / processes. How do people feel about that?

How does the committer list size of Hadoop compare to other major Apache projects?

Other lessons learned?

On Aug 31, 2012, at 1:44 PM, Doug Cutting wrote:

> On Fri, Aug 31, 2012 at 12:00 PM, Inder.dev Java <[EMAIL PROTECTED]> wrote:
>> How often you call for Emeriti lists.
>> Otherwise , if list is simply growing, then people may surprise like this.
>> And also some people(Eli Collins) showed concerns about growing lists above
>> right. But in reality all that people may not be active and not looking to
>> project from long. Having them in active list will not help to Hadoop
>> right. If they really want to active and help to project, they can regain
>> at that time. Otherwise people may think to add new people, as list already
>> big like above. no?
>
> Keeping people who are no longer active on the committer list
> shouldn't cause problems. No quorum is required for votes. Emeritus
> is used for folks who no longer follow the project at all. Some
> committers may no longer be contributing code regularly but they
> should still be reading the developer mailing lists and may vote.
> More active contributors primarily determine the current technical
> direction of the project by making contributions.
>
> Doug

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

Lots of good points raised here. I remain convinced that the time to split Hadoop into TLPs is here, but I think we should also consider the practical concerns raised.

Hadoop 2.0 has been years of work in the making and is finally relatively close. I think it would be a mistake to throw another impediment in the way of getting a stable version of 2.0 done, as many folks have pointed out. So I'd suggest that we plan to do a split once there is a broad consensus that 2.0 is stable and widely deployed.

Perhaps folks interested in planning a split or concerned that a split might impact them can meet to refine a proposal that we can all consider implementing once 2.0 is stable.

What do folks think?

Thanks,

E14

On Aug 31, 2012, at 9:08 AM, Mattmann, Chris A (388J) wrote:

> Hey Doug,
>
> On Aug 31, 2012, at 9:00 AM, Doug Cutting wrote:
>
>> On Fri, Aug 31, 2012 at 8:09 AM, Mattmann, Chris A (388J)
>> <[EMAIL PROTECTED]> wrote:
>>> I am saying that the current members of the Apache Software Foundation's Hadoop
>>> Project Management Committee exhibit the characteristics (not just during
>>> discrete events; it's been happening for a long time) of folks who in reality
>>> shouldn't belong to the same project management committee. Note: this is
>>> NOT a bad thing. There are probably plenty of (sub-)sets of groups at Apache
>>> and elsewhere that folks wouldn't fit in to. I've enumerated some of
>>> those characteristics that you can see sometimes spill over
>>> (meta thought discussions about moving things around; or drawing arbitrary
>>> lines around pieces of code that really have nothing to do with technical
>>> stuff, and more to do about insulating and control;),
>>
>> Hadoop's community is not perfect. But the divisions in the community
>> are not primarily aligned with subcomponent boundaries. A project
>> split will thus not likely fix the majority of these community
>> imperfections. It may fix some, but ought to be pursued carefully so
>> that it doesn't cause more harm than good.
>
> My own personal opinion of this is that yeah they aren't necessarily
> aligned subcomponent boundaries too so +1 agree with you.
>
>>
>>> but there are also other
>>> concerns such as frameworks put in to place (exclusivity amongst others)
>>> that themselves are pretty high indicators that this is an umbrella project.
>>
>> The partitioning of committers has now been removed in a separate
>> vote. Hadoop is not a classic umbrella project.
>
> Despite me thinking that's a band-aid it's probably at least a good start.
> Let's hope it leads to some better interactions amongst the community
> members and to better health overall.
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: [EMAIL PROTECTED]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

> Hadoop 2.0 has been years of work in the making and is finally relatively
> close. I think it would be a mistake to throw another impediment in the
> way of getting a stable version of 2.0 done, as many folks have pointed
> out. So I'd suggest that we plan to do a split once there is a broad
> consensus that 2.0 is stable and widely deployed.

+1 for revisiting the split once we have stable Hadoop 2.0 with large-scale and wide deployments.

-

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

As a downstream user of and contributor to BigTop (though only once, I know we need to do better, Roman), it would be awesome to see the community rally around it as an integration point if the project splits into finer grained components yet.

On Friday, August 31, 2012, Roman Shaposhnik wrote:

> On Tue, Aug 28, 2012 at 7:33 PM, Mattmann, Chris A (388J)
> <[EMAIL PROTECTED]> wrote:
> > [decided to minimize traffic and to simply put this in one thread]
> >
> > Hi Guys,
> >
> > See the recent discussion on these threads:
> >
> > YARN as its own Hadoop "sub project": http://s.apache.org/WW1
> > Maintain a single committer list for the Hadoop project: http://s.apache.org/Owx
> >
> > ...and just pay attention to the Hadoop project over the last 3-4 years. It's operating
> > as a single project, that's masking separate communities that themselves are really
> > separate ASF projects.
> >
> > At the ASF, this has been a problem area called "umbrella" projects and over the years,
> > all I've seen from them is wasted bandwidth, artificial barriers and the inventions of
> > new ways to perform process mongering and to reduce the fun in developing software
> > at this fantastic foundation.
> >
> > I've talked about umbrella projects enough. We've diverted conversation enough.
> > Enough people have tried to act like there is some technical mumbo jumbo that is
> > preventing the eventual act of higher power that I myself hope comes should these
> > discussions prove unfruitful through normal means.
> >
> > *these. are. separate. projects.*
> > *there.are.not.blocker.issues.from.spinning.out.these.projects.as.their.own.communities*
> >
> > In this email: http://s.apache.org/rSm
> >
> > And in the 2 subsequent follow ons in that thread, I've outlined a process that I'll copy
> > through below for splitting these projects into their own TLPs:
> >
> > -----snip
> > Process:
> >
> > 0. [DISCUSS] thread for <TLP name> in which you talk about #1 and #2 below, potentially draft resolution too.
> >
> > 1. Decide on an initial set of *PMC* members. I urge each new TLP to adopt PMC==C. See reasons I've
> > already discussed.
> >
> > 2. Decide on a chair. Try not to VOTE for this explicitly, see if can be discussed and consensus
> > can be reached (just a thought experiment). VOTE if necessary.
> >
> > 3. [VOTE] thread for <TLP name>
> >
> > 4. Create Project:
> > a. paste resolution from #0 to board@ or;
> > b. go to general@incubator and start new Incubator project.
> >
> > 5. infrastructure set up.
> > MLs moving; new UNIX groups; website setup;
> > SVN setup like this:
> >
> > svn copy -m "MR TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool MR name>; or
> > svn copy -m "YARN TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool YARN name>; or
> > svn copy -m "HDFS TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool HDFS name>
> >
> > After all 3 have been created run:
> >
> > svn remove -m "Remove Hadoop umbrella TLP. Split into separate projects." https://svn.apache.org/repos/asf/hadoop
> >
> > 6. (TLPs if 4a; Incubator podling if 4b;) proceed, collaborate, operate as distinct communities, and try to solve the code duplication/dependency
> > issues from there.
> >
> > 7. If 4b; then graduate as TLP from Incubator.
> >
> > -----snip
> >
> > So that's my proposal.
>
> +1 on the general idea of splitting the projects predicated on
> fixing the issues that made the last split so painful and resolving
> technicalities like dependencies, etc.
>
> Here's a perspective of a downstream producer of a distribution
> built on top of Hadoop: I firmly believe that at least with Hadoop 2.0
> we've reached a point where HDFS and YARN/Mapreduce being
> standalone loosely coupled projects would make much more sense.
> The user community of Bigtop has expressed interest in being able to

Best regards,

Or resurrect MR(v1) in Apache Hadoop as Apache YARN becomes a TLP, and let the new YARN TLP decide if they want to use the Hadoop MR artifacts and/or contribute patches that harmonize the implementation with theirs, or pursue an alternate MR implementation within their larger framework.

I'd imagine such an MR(v1) in Hadoop, if this happened, would concentrate on performance improvements, maybe such things as alternate shuffle plugins. Perhaps an HA JobTracker for parity with HDFS. But we could expect a clear separation where next generation framework work would be continued in and centered upon YARN, while Hadoop remains... well, Hadoop.

On Friday, August 31, 2012, Robert Evans wrote:

> The problem there is that YARN depends on Common, and MapReduce depends on
> YARN, so we would either have a circular dependency or we would have to
> split off MapReduce too.
>
> --Bobby
>
> On 8/31/12 11:54 AM, "Eli Collins" <[EMAIL PROTECTED]> wrote:
>
> >How about a proposal to just spin YARN off as a TLP? Rationale:
> >
> >1. YARN started as a separate project and has a more independent
> >community than Common/HDFS/MR (per below these communities do not
> >divide at sub-project boundaries) that appears to want to be even more
> >independent.
> >
> >2. YARN is technically much easier to separate from the rest of the
> >code base (than separating Common and HDFS for example). Separating it
> >out will also help accelerate other efforts like MR2 support for
> >Apache Mesos.
> >
> >3. It side steps a number of thorny issues (how to handle branch-1,
> >how to handle what Hadoop is wrt enforcing trademark, who to remove
> >people from the Hadoop PMC, etc) that haven't been addressed in any of
> >these proposals.
> >
> >4. It's a proof point - if you can't make the case for YARN then
> >there's no way we're going to make a case for splitting the other
> >projects (this thread).
> >
> >Ie this doesn't have to be an all-or-nothing proposition for all
> >sub-projects, since the communities don't fall on sub-project
> >boundaries.
> >
> >Thanks,
> >Eli
> >
> >On Tue, Aug 28, 2012 at 7:33 PM, Mattmann, Chris A (388J)
> ><[EMAIL PROTECTED]> wrote:
> >> [decided to minimize traffic and to simply put this in one thread]
> >>
> >> Hi Guys,
> >>
> >> See the recent discussion on these threads:
> >>
> >> YARN as its own Hadoop "sub project": http://s.apache.org/WW1
> >> Maintain a single committer list for the Hadoop project: http://s.apache.org/Owx
> >>
> >> ...and just pay attention to the Hadoop project over the last 3-4 years. It's operating
> >> as a single project, that's masking separate communities that themselves are really
> >> separate ASF projects.
> >>
> >> At the ASF, this has been a problem area called "umbrella" projects and over the years,
> >> all I've seen from them is wasted bandwidth, artificial barriers and the inventions of
> >> new ways to perform process mongering and to reduce the fun in developing software
> >> at this fantastic foundation.
> >>
> >> I've talked about umbrella projects enough. We've diverted conversation enough.
> >> Enough people have tried to act like there is some technical mumbo jumbo that is
> >> preventing the eventual act of higher power that I myself hope comes should these
> >> discussions prove unfruitful through normal means.
> >>
> >> *these. are. separate. projects.*
> >> *there.are.not.blocker.issues.from.spinning.out.these.projects.as.their.own.communities*
> >>
> >> In this email: http://s.apache.org/rSm
> >>
> >> And in the 2 subsequent follow ons in that thread, I've outlined a process that I'll copy
> >> through below for splitting these projects into their own TLPs:
> >>
> >> -----snip
> >> Process:
> >>
> >> 0. [DISCUSS] thread for <TLP name> in which you talk about #1 and #2 below, potentially draft resolution too.
> >>
> >> 1. Decide on an initial set of *PMC* members. I urge each new TLP to adopt PMC==C. See reasons I've