It's been nearly a year since we merged Hadoop YARN into trunk and we have made several releases since.

It's exciting to see various open-source communities (both in the ASF and externally), such as Apache Hama, Apache Giraph, Apache S4 and Spark, start to explore integration with YARN. This promises to help us realize our hopes of making Apache Hadoop a much more general data processing platform (& storage, of course) and not tied to MapReduce alone for processing data. Furthermore, we already have people contributing interesting prototypes such as DistributedShell and PaaS on YARN.

Given this, I think it would be useful to make YARN a sub-project of Apache Hadoop along with Common, HDFS & MapReduce. I believe this would help other communities realize that they could consider using YARN as a general-purpose resource management layer and help us enhance YARN beyond its humble beginnings.

Clearly, YARN and MapReduce are different enough that they can and will attract a diverse community.

I'd like to clarify that this proposal *does not* mean we move the code base out of the hadoop/common/ tree. It just elevates hadoop-yarn alongside hadoop-common, hadoop-hdfs & hadoop-mapreduce in hadoop/trunk. Also, there would be *no changes* to release cycles - YARN would be co-released with Common, HDFS & MapReduce.

Thoughts?

----

What does it mean to the Hadoop developer community?

# Project dependencies

The change is that Hadoop would now have 4 sub-projects: Common, HDFS, YARN & MapReduce. As is the case today, the dependencies *do not change*:
- Common is the base
- HDFS depends only on Common
- YARN depends only on Common & HDFS
- MapReduce depends on Common, HDFS & YARN.
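As an illustrative sketch only (not part of the Hadoop build), the dependency list above can be written down as a small graph and checked for a valid build order. The names `DEPENDENCIES` and `build_order` are hypothetical, and the sketch assumes Python 3.9+ for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Sub-project -> the sub-projects it depends on, as listed above.
DEPENDENCIES = {
    "Common": set(),
    "HDFS": {"Common"},
    "YARN": {"Common", "HDFS"},
    "MapReduce": {"Common", "HDFS", "YARN"},
}


def build_order(deps):
    """Return the sub-projects ordered so each one follows its dependencies.

    TopologicalSorter raises graphlib.CycleError if the graph has a cycle,
    which would mean the sub-projects could not be layered this way.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(build_order(DEPENDENCIES))
```

Because each sub-project here depends on all the ones before it, the graph admits exactly one ordering, which is the layering described in the list.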

# Jira & Mailing lists

We would have a separate YARN jira project and a yarn-dev@ mailing list.

We already use separate MAPREDUCE jira issues for making changes to YARN (ResourceManager, NodeManager) and to the MapReduce framework (MapReduce ApplicationMaster, MapReduce runtime etc.). Hence, this isn't much of a change.

# Subversion

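Not much at all! YARN has, since the beginning, been developed with the understanding that it is very independent of MapReduce, and the code-bases are already independent, i.e. hadoop-mapreduce-project/hadoop-yarn and hadoop-mapreduce-project/hadoop-mapreduce-client.

Essentially the change would be:
$ svn mv hadoop-mapreduce-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn
... and the necessary, albeit small, changes to our maven build infrastructure.

# Release Cycles

No changes.

YARN would be co-released with Common, HDFS & MapReduce, as is the case today.

thanks,
Arun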

IMHO, it sounds like you guys might be better off proposing a new project for the Apache Incubator. Looking at the things you list below the ---, it looks like an Incubator proposal minus the initial committer list, and affiliations and mentors/champions ;)

If you don't want to go to that level, I don't think you guys need anyone's permission, and/or etc., right? If YARN is a product of the Apache Hadoop PMC, you guys, as the PMC, can develop it and evolve it (it = the software and the community) how you guys see fit.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

> Given this, I think it would be useful to make YARN a sub-project of Apache Hadoop along with Common, HDFS & MapReduce. I believe this would help other communities realize that they could consider using YARN as a general-purpose resource management layer and help us enhance YARN beyond it's humble beginnings.
>
> Clearly, YARN and MapReduce are different enough that they can and will attract a diverse community.

+1

--
Best Regards, Edward J. Yoon
@eddieyoon


> Hi Arun,
>
> IMHO, it sounds like you guys might be better off proposing a new project for the Apache Incubator.
> Looking at the things you list below the ---, it looks like an Incubator proposal minus the initial committer
> list, and affiliations and mentors/champions ;)

Fair point, thanks for chiming in, Chris. However, I think we should revisit that when everything in Apache Hadoop (Common, HDFS, YARN & MapReduce) can fly out of the nest as separate projects. I think it is too early for that, and keeping Common, HDFS, YARN & MapReduce together has value in ensuring that Hadoop continues to move along at a fair clip.

> If you don't want to go to that level, I don't think you guys need anyone's permission, and/or etc., right?
> If YARN is a product of the Apache Hadoop PMC, you guys, as the PMC, can develop it and evolve it
> (it = the software and the community) how you guys see fit.

Agreed. Which is why I'm trying to gather consensus among the Hadoop community.

thanks,
Arun

> Hi Chris,
>
> On Jul 25, 2012, at 7:03 PM, Mattmann, Chris A (388J) wrote:
>
>> Hi Arun,
>>
>> IMHO, it sounds like you guys might be better off proposing a new project for the Apache Incubator.
>> Looking at the things you list below the ---, it looks like an Incubator proposal minus the initial committer
>> list, and affiliations and mentors/champions ;)
>
> Fair point, thanks for chiming in Chris. However, I think we should revisit that when everything in Apache Hadoop (Common, HDFS, YARN & MapReduce) can fly out of the nest as separate projects.

Yep the way I've seen them managed, IMHO, they should be separate projects.

> That, I think, is too early and also that keeping Common, HDFS, YARN & MapReduce together has value in ensuring that Hadoop continues to move along at a fair clip.

I realize I'm asking a hard question here: why *aren't* they separate projects? What's the barrier? They seem to be operating that way (and have been for a while). And I don't see how Hadoop still couldn't move along at a fair clip with them as official TLPs themselves.

>> If you don't want to go to that level, I don't think you guys need anyone's permission, and/or etc., right?
>> If YARN is a product of the Apache Hadoop PMC, you guys, as the PMC, can develop it and evolve it
>> (it = the software and the community) how you guys see fit.
>
> Agreed. Which is why I'm trying to gather consensus among the Hadoop community.

Yeah I know you are doing great -- my point is, technically, what consensus is required -- you develop code at Apache as individuals -- code is committed -- as are patches, etc. The PMC is there to regulate that, but it sounds like code-wise you are proposing an svn mv command -- do you need an email thread to discuss that? Why not just do it, and if someone has a problem, *then* discuss? Dunno, that's just my opinion.

The things that you are proposing that are new (e.g., mailing lists) will serve to splinter (at least the discussion in) the community IMHO -- this is spoken from experience in 2 situations (Nutch, Lucene) where we had an umbrella project with tons of virtual "sub projects" that in the end have thrived as their own individual projects. If you are going to go that far, why not create a new Incubator project and just do it clean from the start?

Cheers,
Chris



> I realize I'm asking a hard question here: why *aren't* they separate
> projects? What's the barrier? They seem
> to be operating that way (and have been for a while). And I don't see how
> Hadoop still couldnt' move along at
> a fair clip with them as official TLPs themselves.

I'm opposed to this if for no other reason than that it makes it difficult to make logically-individual changes which span the projects. As much as we might like it to be the case, it is not presently true that Common is so independent and stable from HDFS and MR/YARN that Common could reasonably be separate and have its own release schedule. I think this view is supported by the fact that we once had separate SVN repos for Common, HDFS, and MR, but we undid that because having to make coordinated commits across the several repos, and the complex build dependencies it induced, was too onerous.

The main reason I'm opposed to making them separate projects is that I don't think their internal interfaces are so stable that they could reasonably release independently. Though we've been pretty good at maintaining the stability of the external interfaces, we routinely make changes in the internal interfaces of Common/HDFS/MR that make the projects fairly tightly-coupled. Note that Arun's proposal specifically calls out that the sub-projects would still release together, which I support.

> Yeah I know you are doing great -- my point is, technically, what consensus
> is required -- you develop code at Apache
> as individuals -- code is committed -- as are patches, etc. The PMC is
> there to regulate that, but it sounds like code wise
> you are proposing an svn mv command -- do you need an email thread to
> discuss that? Why not just do it, and if someone
> has a problem, *then* discuss? Dunno, that's just my opinion.

I for one really appreciate Arun having this discussion beforehand. Making a change like this, even if it ends up being uncontroversial, will at least be quite disruptive to the developers working on Hadoop daily. I think it's great that Arun sought out feedback first to make sure folks agree that it's a worthwhile change to make.

> The things that you are proposing that are new (e.g., mailing lists) will
> serve to splinter (at least the discussion in) the community IMHO --
> this is spoken from experience in 2 situations (Nutch, Lucene) where we
> had an umbrella projects with tons of virtual "sub projects" that
> in the end have thrived as their own individual projects. if you are going
> to go that far, why not create a new Incubator project and just do
> it clean from the start?

We recently discussed (and approved) merging all of the Hadoop *-user@ mailing lists, so as to not splinter the user community, and make the project more approachable for users. In my experience, I've seen most developers (myself included) subscribe to all of the *-dev@ mailing lists. Even though I personally subscribe to all of them, I still prefer to have them separate, so that I can easily set up email filters/labels.

--
Aaron T. Myers
Software Engineer, Cloudera

On Wed, Jul 25, 2012 at 9:40 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
> Folks,
>
> It's been nearly a year since we merged Hadoop YARN into trunk and we have made several releases since.
>
> It's exciting to see various open-source communities (both in the ASF and externally) start to explore integration with YARN such as Apache Hama, Apache Giraph, Apache S4, Spark etc. This promises to help us realize our hopes of making Apache Hadoop a much more general data processing platform (& storage, of course) and not tied to MapReduce alone for processing data. Furthermore, we already have people contributing interesting prototypes such as DistributedShell and PaaS on YARN.
>
> Given this, I think it would be useful to make YARN a sub-project of Apache Hadoop along with Common, HDFS & MapReduce. I believe this would help other communities realize that they could consider using YARN as a general-purpose resource management layer and help us enhance YARN beyond it's humble beginnings.
>
> Clearly, YARN and MapReduce are different enough that they can and will attract a diverse community.
>
> I'd like to clarify that this proposal *does not* mean we move the code base out of hadoop/common/ tree. It just alleviates hadoop-yarn alongside hadoop-common, hadoop-hdfs & hadoop-mapreduce in hadoop/trunk. Also, there would be *no changes* to release cycles - YARN would be co-released with Common, HDFS & MapReduce.
>
> Thoughts?

+1 to the direction.

> ----
>
> What does it mean to the Hadoop developer community?
>
> # Project dependencies
>
> The change is that Hadoop would now have 4 sub-projects: Common, HDFS, YARN & MapReduce. As today, the dependencies *do not change*:
> - Common is the base
> - HDFS depends only on Common
> - YARN depends only on Common & HDFS
> - MapReduce depends on Common, HDFS & YARN.

To be clear, these are runtime dependencies - YARN and MapReduce should not have any compile-time dependencies on HDFS. See MAPREDUCE-4147 and MAPREDUCE-4148.

> # Jira & Mailing lists
>
> We would have a separate YARN jira project and a yarn-dev@ mailing list.
>
> We already use separate MAPREDUCE jira issues for making changes to YARN (ResourceManager, NodeManager) and to the MapReduce framework (MapReduce ApplicationMaster, MapReduce runtime etc.). Hence, this isn't a much of a change.
>
> # Subversion
>
> Not much at all! YARN has, since the beginning, been developed with the understanding that it is very independent of MapReduce and the code-bases are already independent i.e. hadoop-mapreduce-project/hadoop-yarn and hadoop-mapreduce-project/hadoop-mapreduce-client.
>
> Essentially the change would be:
> $ svn mv hadoop-mapreduce-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn
> ... and the necessary, albeit small, changes to our maven build infrastructure.

It would be good to eliminate the resulting redundant level in the hierarchy at the same time: i.e. hadoop-mapreduce-project/hadoop-mapreduce-client -> hadoop-mapreduce-project.

Cheers,
Tom

> # Release Cycles
>
> No changes.
>
> YARN would be co-released with Common, HDFS & MapReduce, as is the case today.
>
> thanks,
> Arun

+1 for what Aaron said. The projects are not ready to split yet; see MAPREDUCE-3300, for example. YARN cannot display a UI for aggregated container logs unless we also have the MR History Server up and running. If we do want to split all of the projects (HDFS, COMMON, YARN, and MAPREDUCE), it will take some feature and design work to get the APIs to a point where there are no more @LimitedPrivate APIs. I personally would like to see this happen eventually, but it is not something on my priority list.

--Bobby Evans


> On Wed, Jul 25, 2012 at 7:30 PM, Mattmann, Chris A (388J) <[EMAIL PROTECTED]> wrote:
>
>> I realize I'm asking a hard question here: why *aren't* they separate
>> projects? What's the barrier? They seem
>> to be operating that way (and have been for a while). And I don't see how
>> Hadoop still couldnt' move along at
>> a fair clip with them as official TLPs themselves.
>
> I'm opposed to this if for no other reason than that it makes it difficult
> to make logically-individual changes which span the projects. As much as we
> might like it to be the case, it is not presently true that Common is so
> independent and stable from HDFS and MR/YARN that Common could reasonably
> be separate and have its own release schedule. I think this view is
> supported by the fact that we once had separate SVN repos for Common, HDFS,
> and MR, but we undid that because having to make coordinated commits across
> the several repos, and the complex build dependencies it induced, was too
> onerous.

Fair enough.

> The main reason I'm opposed to making them separate projects is that I
> don't think their internal interfaces are so stable that they could
> reasonably release independently.
> Though we've been pretty good at
> maintaining the stability of the external interfaces, we routinely make
> changes in the internal interfaces of Common/HDFS/MR that make the projects
> fairly tightly-coupled. Note that Arun's proposal specifically calls out
> that the sub-projects would still release together, which I support.

Sub projects are not a good thing at Apache. Well, "official" sub projects that have their own committees, mailing lists, etc. You guys aren't talking about sub projects (though you call them that) -- in reality you are talking about *products* that the Apache Hadoop PMC releases. They may have different names, be on different release schedules, have different mailing lists even (which I still think is not the right thing to do), but they are not *projects*.

I guess that's one thing that got me confused with Arun's original proposal: in it there is talk of different sub-*projects* and making YARN a new sub-*project*, and discussion of it and MapReduce each attracting a diverse (implied: different) community.

If you guys are talking about *products* that themselves have different *communities*, then pretty much at Apache those are different *projects*.

If you are talking about different *products* that themselves have *the same community* who releases those *products*, then we are talking about a single *project* at Apache that has different *products* that it releases (am I confusing you yet?) :)

Regardless, I guess in the end what I was questioning was that if you look at the net of Arun's proposal minus Project Dependencies (which is really code level things -- at Apache code is one thing, but we are dealing with *communities*), and Release Cycles (no changes), the proposal boils down to:

1. Creating separate mailing lists for YARN
2. An svn mv command

My advice on #1 was to be careful about splitting mailing lists; I've seen that cause trouble (even before Hadoop existed, and in the other Apache projects I've cited). Then on #2, why not execute the svn mv command and just move forward? You all are on the Hadoop PMC and I assume trust Arun (and that he trusts you guys, since you've given each other the commit bit), so move forward on it.

As for #2, your point about being happy Arun brought this up as it would have impact on the build cycle/etc etc., that makes sense and is a good reason to DISCUSS it.

>> Yeah I know you are doing great -- my point is, technically, what consensus is required -- you develop code at Apache as individuals -- code is committed -- as are patches, etc. The PMC is there to regulate that, but it sounds like code wise you are proposing an svn mv command -- do you need an email thread to discuss that? Why not just do it, and if someone

Yep thanks. This is good validation for #2 above then.

Yeah, that's cool. I do the same myself and that makes sense. It just seemed like a formal proposal to create a project, minus the creating project thing, so I thought I'd ask.

> +1 for what Aaron said. The projects are not ready to split yet. MAPREDUCE-3300 for example. YARN cannot display a UI for aggregated container logs unless we also have the MR History Server up and running. If we do want to split all of the projects HDFS, COMMON, YARN, and MAPREDUCE it will take some feature and design work to get the APIs to a point that there are no more @LimitedPrivate APIs. I personally would like to see this happen eventually, but it is not something on my priority list.
>
> --Bobby Evans
>
> On 7/26/12 1:16 AM, "Aaron T. Myers" <[EMAIL PROTECTED]> wrote:
>
>> On Wed, Jul 25, 2012 at 7:30 PM, Mattmann, Chris A (388J) <[EMAIL PROTECTED]> wrote:
>>
>>> I realize I'm asking a hard question here: why *aren't* they separate projects? What's the barrier? They seem to be operating that way (and have been for a while). And I don't see how Hadoop still couldn't move along at a fair clip with them as official TLPs themselves.
>>
>> I'm opposed to this if for no other reason than that it makes it difficult to make logically-individual changes which span the projects. As much as we might like it to be the case, it is not presently true that Common is so independent and stable from HDFS and MR/YARN that Common could reasonably be separate and have its own release schedule. I think this view is supported by the fact that we once had separate SVN repos for Common, HDFS, and MR, but we undid that because having to make coordinated commits across the several repos, and the complex build dependencies it induced, was too onerous.
>>
>> The main reason I'm opposed to making them separate projects is that I don't think their internal interfaces are so stable that they could reasonably release independently. Though we've been pretty good at maintaining the stability of the external interfaces, we routinely make changes in the internal interfaces of Common/HDFS/MR that make the projects fairly tightly-coupled. Note that Arun's proposal specifically calls out that the sub-projects would still release together, which I support.
>>
>>> Yeah I know you are doing great -- my point is, technically, what consensus is required -- you develop code at Apache as individuals -- code is committed -- as are patches, etc. The PMC is there to regulate that, but it sounds like code wise you are proposing an svn mv command -- do you need an email thread to discuss that? Why not just do it, and if someone has a problem, *then* discuss? Dunno, that's just my opinion.
>>
>> I for one really appreciate Arun having this discussion beforehand. Making a change like this, even if it ends up being uncontroversial, will at least be quite disruptive to the developers working on Hadoop daily. I think it's great that Arun sought out feedback first to make sure folks agree that it's a worthwhile change to make.
>>
>>> The things that you are proposing that are new (e.g., mailing lists) will serve to splinter (at least the discussion in) the community IMHO -- this is spoken from experience in 2 situations (Nutch, Lucene) where we had umbrella projects with tons of virtual "sub projects" that in the end have thrived as their own individual projects. If you are going to go that far, why not create a new Incubator project and just do it clean from the start?
>>
>> We recently discussed (and approved) merging all of the Hadoop *-user@ mailing lists, so as to not splinter the user community, and make the project more approachable for users. In my experience, I've seen most developers (myself included) subscribe to all of the *-dev@ mailing lists. Even though I personally subscribe to all of them, I still prefer to have them separate, so that I can easily set up email filters/labels.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

+1 on moving hadoop-yarn to trunk/ level. As part of that, can we flatten the internal hierarchy so there are not multiple nested modules within the hadoop-yarn module? Just one level, as in common, hdfs & tools? This will make the build more consistent and will allow us to consolidate logic in the POMs. This flattening would also apply to the MR modules.

Also, does this mean we'll be creating a new JIRA project 'YARN'? My problem with the current multi-project approach is that you cannot do umbrella JIRAs with subtasks spanning different projects; all subtasks must be in the same project. Does anybody know if there is a config in JIRA to enable cross-project subtasks within a set of projects?

Thx.

On Thu, Jul 26, 2012 at 7:23 AM, Tom White <[EMAIL PROTECTED]> wrote:

> On Wed, Jul 25, 2012 at 9:40 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
> > Thoughts?
>
> +1 to the direction.
>
> > # Project dependencies
> >
> > The change is that Hadoop would now have 4 sub-projects: Common, HDFS, YARN & MapReduce. As today, the dependencies *do not change*:
> > - Common is the base
> > - HDFS depends only on Common
> > - YARN depends only on Common & HDFS
> > - MapReduce depends on Common, HDFS & YARN.
>
> To be clear, these are runtime dependencies - YARN and MapReduce should not have any compile-time dependencies on HDFS. See MAPREDUCE-4147 and MAPREDUCE-4148.
>
> > # Subversion
> >
> > Not much at all! YARN has, since the beginning, been developed with the understanding that it is very independent of MapReduce and the code-bases are already independent i.e. hadoop-mapreduce-project/hadoop-yarn and hadoop-mapreduce-project/hadoop-mapreduce-client.
> >
> > Essentially the change would be:
> > $ svn mv hadoop-mapreduce-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn
> > ... and the necessary, albeit small, changes to our maven build infrastructure.
>
> It would be good to eliminate the resulting redundant level in the hierarchy at the same time: i.e. hadoop-mapreduce-project/hadoop-mapreduce-client -> hadoop-mapreduce-project.
>
> Cheers,
> Tom
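The layering in the dependency list quoted above can be sketched as a small graph check. This is an illustration only, not part of any Hadoop build tooling: the lowercase project names, the `RUNTIME_DEPS` table, and the `layering` and `depends_on` helpers are all made up for this note, and the edges model the runtime dependencies stated in the proposal, not compile-time ones.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Runtime dependencies as listed in the proposal (names are illustrative).
# Each key maps to the set of sub-projects it depends on.
RUNTIME_DEPS = {
    "common": set(),
    "hdfs": {"common"},
    "yarn": {"common", "hdfs"},
    "mapreduce": {"common", "hdfs", "yarn"},
}


def layering(deps):
    """Return the sub-projects ordered so that dependencies come first.

    Raises graphlib.CycleError if the graph contains a cycle, i.e. if the
    sub-projects do not form a strict hierarchy.
    """
    return list(TopologicalSorter(deps).static_order())


def depends_on(deps, project, other):
    """Return True if `project` reaches `other` via any dependency chain."""
    seen, stack = set(), list(deps[project])
    while stack:
        current = stack.pop()
        if current == other:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(deps[current])
    return False


if __name__ == "__main__":
    print(layering(RUNTIME_DEPS))
    print(depends_on(RUNTIME_DEPS, "yarn", "mapreduce"))
```

With these edges YARN never reaches MapReduce, which is the property the strict hierarchy is meant to guarantee; adding a back edge such as `"yarn": {"mapreduce"}` would make `layering` raise a `CycleError`.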


If the goal is to clearly partition the scheduling layer from the app layer, and you think it helps isolate changes, then yes

+1

Forcing that strict hierarchy does ensure that you really do have a clean separation of modules, and emphasises that it is more than just MapRed - as people add more applications I can see that the separation would get their needs addressed. Having a separate project could also allow Yarn to do a point release in sync with those other projects, as well as do co-ordinated releases with Hadoop itself.

It should also make clear that Yarn is designed to be a topology-aware underpinning of a datacentre, interesting in its own right. Which reminds me, I'd better get my topology stuff in.

> As part of that, can we flatten the internal hierarchy so there are not multiple nested modules within hadoop-yarn module? just one level as in common, hdfs & tools? this will make the build more consistent and will allow to consolidate logic in the POMs. This flattening would also apply to MR modules.

You need to start a project using Gradle as its build tool. Your life will be better, and you can stop worrying about how Maven handles things.

Otherwise, +1 to doing something about the POMs, though that's very much an artifact of Maven's world view. Bigtop is similarly complex.

The main question is: is this a good idea, without considering the details of how easy/hard it is to do? I think it is a good idea and we should move in this direction. If we all agree on this, let's discuss the main issues that need to be resolved to split YARN into a separate project. As others have suggested, we should ensure this is done smoothly, does not disrupt the project, and does not make day-to-day work for contributors very hard.

On Thu, Jul 26, 2012 at 7:28 AM, Robert Evans <[EMAIL PROTECTED]> wrote:

> +1 for what Aaron said. The projects are not ready to split yet. MAPREDUCE-3300 for example. YARN cannot display a UI for aggregated container logs unless we also have the MR History Server up and running. If we do want to split all of the projects HDFS, COMMON, YARN, and MAPREDUCE it will take some feature and design work to get the APIs to a point that there are no more @LimitedPrivate APIs. I personally would like to see this happen eventually, but it is not something on my priority list.
>
> --Bobby Evans

I'm not sure what the goal of that is. If this is an Apache organizational/political thing then I am oblivious.

If the point is that YARN should not be a subproject of MapReduce, then I agree completely. Any argument by which YARN is a subproject of MR could also be made that YARN should be a subproject of MPI, Spark, etc. And obviously it cannot be a subproject of all of them.

To that end, YARN should be a peer of core and hdfs. I prefer that MR remain a peer of those as well, but since the current approach seems to prefer over-factoring things with painfully deep hierarchies, the consistent thing to do would be to make MR a subproject of YARN (blech). I prefer simple flat trees, though.

jay


> Sub projects are not a good thing at Apache. Well, "official" sub projects that have their own committees, mailing lists, etc. You guys aren't talking about sub projects (though you call them that) -- in reality you are talking about *products* that the Apache Hadoop PMC releases. They may have different names, be on different release schedules, have different mailing lists even (which I still think is not the right thing to do), but they are not *projects*. <snip>

Yea, sounds like we have a bit of a terminology problem here. We've always called them "sub-projects", but in fact they're all managed by a single PMC, released as a single artifact, live in a single source repository, will soon have a single user mailing list, and have a largely overlapping set of committers. The things they do maintain separately are *-dev@/*-issues@/*-commits@ mailing lists, and separate "JIRA projects." I think these separations are worth maintaining.

Anyway, I think that having totally separate TLPs may one day make sense, but I think it would be premature to do so now.



+1 for the idea. I think separating the framework from the MR application makes sense.

Tom

Thanks Arun! +1, this organization makes sense. Also, what will be the strategy for applications other than MapReduce going forward? Will they be part of YARN or separate sub-projects like MapReduce? They now live inside hadoop-yarn-applications. I think they can remain there, and when they are mature enough, they can either become separate sub-projects, or even TLPs, based on how large and independent they are. Thoughts?

Best Regards
Ahmed


> Folks,> > It's been nearly a year since we merged Hadoop YARN into trunk and we have made several releases since.> > It's exciting to see various open-source communities (both in the ASF and externally) start to explore integration with YARN such as Apache Hama, Apache Giraph, Apache S4, Spark etc. This promises to help us realize our hopes of making Apache Hadoop a much more general data processing platform (& storage, of course) and not tied to MapReduce alone for processing data. Furthermore, we already have people contributing interesting prototypes such as DistributedShell and PaaS on YARN.> > Given this, I think it would be useful to make YARN a sub-project of Apache Hadoop along with Common, HDFS & MapReduce. I believe this would help other communities realize that they could consider using YARN as a general-purpose resource management layer and help us enhance YARN beyond it's humble beginnings. > > Clearly, YARN and MapReduce are different enough that they can and will attract a diverse community.> > I'd like to clarify that this proposal *does not* mean we move the code base out of hadoop/common/ tree. It just alleviates hadoop-yarn alongside hadoop-common, hadoop-hdfs & hadoop-mapreduce in hadoop/trunk. Also, there would be *no changes* to release cycles - YARN would be co-released with Common, HDFS & MapReduce.> > Thoughts?> > ----> > What does it mean to the Hadoop developer community?> > # Project dependencies> > The change is that Hadoop would now have 4 sub-projects: Common, HDFS, YARN & MapReduce. 
As today, the dependencies *do not change*: > - Common is the base> - HDFS depends only on Common> - YARN depends only on Common & HDFS > - MapReduce depends on Common, HDFS & YARN.> > # Jira & Mailing lists> > We would have a separate YARN jira project and a yarn-dev@ mailing list.> > We already use separate MAPREDUCE jira issues for making changes to YARN (ResourceManager, NodeManager) and to the MapReduce framework (MapReduce ApplicationMaster, MapReduce runtime etc.). Hence, this isn't a much of a change.> > # Subversion> > Not much at all! YARN has, since the beginning, been developed with the understanding that it is very independent of MapReduce and the code-bases are already independent i.e. hadoop-mapreduce-project/hadoop-yarn and hadoop-mapreduce-project/hadoop-mapreduce-client. > > Essentially the change would be:> $ svn mv hadoop-mapreduce-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn> ... and the necessary, albeit small, changes to our maven build infrastructure.> > # Release Cycles> > No changes.> > YARN would be co-released with Common, HDFS & MapReduce, as is the case today.> > thanks,> Arun

As others have noted we should probably stop using the term "subproject" for these, as that's most often used at Apache for things that are released independently. Better terms might be "components" or "modules". Addressing that might also require restructuring the website.

Doug



Looks like the feedback has been very positive, I'll start a vote to formalize it.

thanks,
Arun


MR is still MR, while YARN is a resource scheduler (generic, agnostic of 'MR').

MR1 ran over JobTracker and TaskTrackers, while MR2 runs from an AM and runs tasks via YARN.

It would not make sense to rename MR to YARN as these are separate things, and calling YARN as MR2 only adds to the confusion.

On Fri, Jul 27, 2012 at 9:11 AM, Zizon Qiu <[EMAIL PROTECTED]> wrote:
> why not naming MAPREDUCE to YARN, as in hadoop 2.0 MR2 is a implementation of YARN?

I think the service lifecycle stuff (inner start/stop methods) is actually a layer below YARN and could go into common, though there are some things I'd like to fix there first (the state machine doesn't let you stop without starting, implementations' state checks happen after subclasses exec start/stop transitions, &c). There is no reason why other services such as the NN and DN can't adopt the same lifecycle, and it would unify some management operations to have a consistent state view of all Hadoop services.
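To make the two complaints concrete, here is a rough sketch of what a fixed lifecycle could look like. It is not the existing YARN service code: the names LifecycleSketch, BaseService, RecordingService, innerInit/innerStart/innerStop and the four states are all made up for illustration. The base class checks the state before it calls into the subclass hook, and stop() is accepted from any state, including before start().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; none of these names are the real Hadoop/YARN classes.
class LifecycleSketch {

    enum State { NOTINITED, INITED, STARTED, STOPPED }

    abstract static class BaseService {
        private State state = State.NOTINITED;

        public final synchronized State getState() {
            return state;
        }

        public final synchronized void init() {
            ensureState(State.NOTINITED, "init"); // check first, then run subclass code
            innerInit();
            state = State.INITED;
        }

        public final synchronized void start() {
            ensureState(State.INITED, "start");   // check first, then run subclass code
            innerStart();
            state = State.STARTED;
        }

        // stop() is legal from any state and is a no-op once already stopped.
        public final synchronized void stop() {
            if (state == State.STOPPED) {
                return;
            }
            state = State.STOPPED;
            innerStop(); // subclasses must tolerate being stopped before init/start
        }

        private void ensureState(State expected, String operation) {
            if (state != expected) {
                throw new IllegalStateException(
                    "Cannot " + operation + " in state " + state);
            }
        }

        // Subclass hooks (assumed names); only called after the state check passes.
        protected void innerInit() { }
        protected void innerStart() { }
        protected void innerStop() { }
    }

    // Toy service that records which hooks actually ran.
    static class RecordingService extends BaseService {
        final List<String> events = new ArrayList<>();

        @Override protected void innerInit() { events.add("init"); }
        @Override protected void innerStart() { events.add("start"); }
        @Override protected void innerStop() { events.add("stop"); }
    }

    public static void main(String[] args) {
        RecordingService neverStarted = new RecordingService();
        neverStarted.stop(); // allowed even though start() was never called
        System.out.println(neverStarted.getState() + " " + neverStarted.events);

        RecordingService service = new RecordingService();
        try {
            service.start(); // rejected before innerStart() gets a chance to run
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        service.init();
        service.start();
        service.stop();
        System.out.println(service.getState() + " " + service.events);
    }
}
```

Making the public init/start/stop methods final is one way to keep the state checks in a single place, so a NN or DN adopting the same base class could not skip them by accident.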
