For discussion, please see previous thread "[PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack".

This vote consists of three separate items:

1. Contributors shall be allowed to use Python as a platform-independent scripting language for build-time tasks, and add Python as a build-time dependency.
Please vote +1, 0, -1.

2. Contributors shall be encouraged to use Maven tasks in combination with either plug-ins or Groovy scripts to do cross-platform build-time tasks, even under ant in Hadoop-1.
Please vote +1, 0, -1.

3. Contributors shall be allowed to use Python as a platform-independent scripting language for run-time tasks, and add Python as a run-time dependency.
Please vote +1, 0, -1.

Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to use Maven plug-ins or Groovy as the only means of cross-platform build-time tasks, or to simply continue using platform-dependent scripts as is being done today.

Vote closes at 12:30pm PST on Saturday 1 December.
---------
Personally, my vote is +1, +1, +1.
I think #2 is preferable to #1, but still has many unknowns in it, and until those are worked out I don't want to delay moving to cross-platform scripts for build-time tasks.

Best regards,
--Matt

-


Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

> For discussion, please see previous thread "[PROPOSAL] introduce Python as
> build-time and run-time dependency for Hadoop and throughout Hadoop stack".
>
> This vote consists of three separate items:
>
> 1. Contributors shall be allowed to use Python as a platform-independent
> scripting language for build-time tasks, and add Python as a build-time
> dependency.
> Please vote +1, 0, -1.

+1

> 2. Contributors shall be encouraged to use Maven tasks in combination with
> either plug-ins or Groovy scripts to do cross-platform build-time tasks,
> even under ant in Hadoop-1.
> Please vote +1, 0, -1.

+1

My feelings on Maven are well known, but Groovy can mitigate things. And I'm not going to advocate post-M2 build tools such as Gradle.

It's ironic that Maven's utter inflexibility forces people to use scripting languages to get their work done, but Groovy is fairly nimble here, and easy to learn for any Java programmer. "Groovy in Action" is the book to own.

> 3. Contributors shall be allowed to use Python as a platform-independent
> scripting language for run-time tasks, and add Python as a run-time
> dependency.
> Please vote +1, 0, -1.

+1. I look forward to never having to debug shell script env variable inheritance ever again.

This does not mean that I advocate writing big bits of the system in .py; as someone who is debugging OpenStack request throttling this weekend, I know that Python is not "the solution" to problems. For Hadoop it has a role, but the role should be ('better than bash') and ('streaming integration').

> Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to
> use Maven plug-ins or Groovy as the only means of cross-platform build-time
> tasks, or to simply continue using platform-dependent scripts as is being
> done today.
>
> Vote closes at 12:30pm PST on Saturday 1 December.
> ---------
> Personally, my vote is +1, +1, +1.
> I think #2 is preferable to #1, but still has many unknowns in it, and
> until those are worked out I don't want to delay moving to cross-platform
> scripts for build-time tasks.
>
> Best regards,
> --Matt
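As an illustration of the remark about env variable inheritance in the reply above, here is a minimal sketch of how a Python launcher could hand a child process an explicitly constructed environment instead of relying on whatever the calling shell exported. The helper name `run_with_env` and the variable `DEMO_HEAPSIZE` are hypothetical and do not come from the thread or from Hadoop.

```python
import os
import subprocess
import sys


def run_with_env(args, overrides):
    """Run a child process with an explicitly built environment.

    Hypothetical helper: the parent's environment is copied, the overrides
    are applied to the copy, and only the copy is passed to the child.
    """
    env = dict(os.environ)  # copy, so the parent environment is not mutated
    env.update(overrides)
    proc = subprocess.Popen(args, env=env, stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return proc.returncode, out.decode().strip()


if __name__ == "__main__":
    # DEMO_HEAPSIZE is an illustrative variable name only.
    child = [sys.executable, "-c",
             "import os; print(os.environ['DEMO_HEAPSIZE'])"]
    print(run_with_env(child, {"DEMO_HEAPSIZE": "1000"}))
```

Because the child's environment is an ordinary dictionary, it can be logged or inspected before launch, which is the kind of visibility that is hard to get from nested shell scripts.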

-


Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Also, it feels like maybe the discussion should have been kept open a little longer; the Thanksgiving holidays last week meant that people may have missed it.

Cheers,
Adam

On Nov 26, 2012, at 10:16 AM, Robert Evans wrote:

> +1, +1, 0
>
> On 11/24/12 2:13 PM, "Matt Foley" <[EMAIL PROTECTED]> wrote:
>
>> For discussion, please see previous thread "[PROPOSAL] introduce Python as
>> build-time and run-time dependency for Hadoop and throughout Hadoop
>> stack".

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Also, let's please clearly define the versions of Python we support if we do choose to go this route. Something like 2.4+ would be reasonable. The process launching APIs in particular changed a lot in those early 2.x releases.
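To make the version question concrete, here is a minimal sketch of a guard that a script could run before doing anything else. The floor of (2, 4) simply mirrors the "something like 2.4+" suggestion above and is an assumption, not an agreed value; the function names are hypothetical. For context, the `subprocess` module first appeared in Python 2.4, which is one reason the process launching APIs differ across early 2.x releases.

```python
import sys

# Assumed floor, taken from the "2.4+" suggestion in the thread;
# no minimum version had actually been agreed.
MINIMUM_VERSION = (2, 4)


def is_supported(version_info=None, minimum=MINIMUM_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


def require_supported_python():
    """Exit with a clear message when the interpreter is too old."""
    if not is_supported():
        sys.stderr.write("Python %d.%d or later is required\n" % MINIMUM_VERSION)
        sys.exit(1)


if __name__ == "__main__":
    require_supported_python()
```

The guard itself sticks to syntax that old interpreters accept, so it can fail with a readable message rather than a syntax error.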

best,
Colin

On Sat, Nov 24, 2012 at 12:13 PM, Matt Foley <[EMAIL PROTECTED]> wrote:
> For discussion, please see previous thread "[PROPOSAL] introduce Python as
> build-time and run-time dependency for Hadoop and throughout Hadoop stack".

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Declaring 2.4 to be the minimum supported version sounds like a great idea. I've worked with CentOS distributions that have a dependency on Python 2.4, and it was always awkward to get a later version on those machines.


-


Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

The scope of this vote seems different from what was discussed in the PROPOSAL thread.

In the PROPOSAL thread you indicated this was for Hadoop1 because it is ANT based. And the main reason was to remove saveVersion.sh.

Your #3 was not discussed in the proposal, was it?

It seems this vote is dragging in much more than was originally discussed. I think you should suspend the vote, recap the motivation and then restart the vote. As things are laid out at the moment my vote is:

-1 (It still seems overkill to introduce a new runtime requirement for building just to replace a script.)
+1 (I think this is the right way to simplify the build.)
-1 (AFAIK there is no such requirement at the moment, and if it comes it would be in the form of an AM, which I'd argue should live outside of Hadoop.)

> In the PROPOSAL thread you indicated this was for Hadoop1 because it is ANT
> based. And the main reason was to remove saveVersion.sh.
>
> Your #3 was not discussed in the proposal, was it?

It was part of the original proposal, but it was not discussed much because the language war was the more attractive option. Do you want a vote like this?

1. Using an external language vs. a Maven plugin for the build.
2. Using an external language for startup scripts vs. a JVM scripting language, such as Jython as used in WebSphere.
3. Choosing Python as the external language.

-

RE: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

For discussion, please see previous thread "[PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack".

This vote consists of three separate items:

1. Contributors shall be allowed to use Python as a platform-independent scripting language for build-time tasks, and add Python as a build-time dependency.
Please vote +1, 0, -1.

2. Contributors shall be encouraged to use Maven tasks in combination with either plug-ins or Groovy scripts to do cross-platform build-time tasks, even under ant in Hadoop-1.
Please vote +1, 0, -1.

>>> I believe 1 & 2 in combination make total sense. I ported a few scripts to Python, and so far it has proved up to the task and satisfied the cross-platform requirements. In my opinion, it is also important to agree on the version, as I've run into some breaking changes in version 3+.

3. Contributors shall be allowed to use Python as a platform-independent scripting language for run-time tasks, and add Python as a run-time dependency.

>>> This is a great aspirational goal! Maintaining two sets of scripts would be a real challenge.

Please vote +1, 0, -1.

Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to use Maven plug-ins or Groovy as the only means of cross-platform build-time tasks, or to simply continue using platform-dependent scripts as is being done today.

Vote closes at 12:30pm PST on Saturday 1 December.
---------
Personally, my vote is +1, +1, +1.
I think #2 is preferable to #1, but still has many unknowns in it, and until those are worked out I don't want to delay moving to cross-platform scripts for build-time tasks.

Best regards,
--Matt

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Let me repost my previous questions and a few more. I'd appreciate your answers, as it will help me understand the full impact this would have in Hadoop and related projects.

* Python as runtime requirement. Are you planning to migrate all BASH scripts provided by Hadoop (or dynamically created, i.e. launcher scripts) to Python?
* What else in the current build, besides saveVersion.sh, do you see as a candidate to be migrated to Python?
* How are you planning to define what Python modules can be used? Will developers have to install them manually?
* What kind of tasks do you envision Python scripts will enable that are not possible today?
* Will the requirement of Python be pushed to clients using the hadoop script? If so, this would affect all downstream projects that use the hadoop script in one way or another, right?

Is the main motivation of the proposal to make things easier for Windows, so there is no need for Cygwin? If that is the case, have you considered writing BAT scripts directly? Take Tomcat, for example: they have BAT scripts and SH scripts and things work quite nicely.

Personally, I wouldn't be thrilled to see the logic in the scripts get more complex; I'd rather go in the opposite direction. IMO, scripts should be trimmed to set env vars (with no voodoo logic), build the classpath (with no voodoo logic, just from a set of dirs) and call Java.
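For comparison, the "set env vars, build the classpath from a set of dirs, call Java" shape described above can be sketched in a few lines of platform-independent Python. This is only an illustration under assumptions: the function names are hypothetical, the main class `org.example.Main` is a placeholder, and nothing here reflects how the actual hadoop script is written.

```python
import glob
import os


def build_classpath(dirs):
    """Collect the .jar files directly under each directory, in sorted order,
    and join them with the platform's path separator (':' or ';')."""
    jars = []
    for d in dirs:
        jars.extend(sorted(glob.glob(os.path.join(d, "*.jar"))))
    return os.pathsep.join(jars)


def java_command(main_class, dirs, java="java"):
    """Build the argument list for launching a JVM; nothing is executed here."""
    return [java, "-cp", build_classpath(dirs), main_class]


if __name__ == "__main__":
    # 'lib' and 'org.example.Main' are placeholders for illustration only.
    print(java_command("org.example.Main", ["lib"]))
```

The only platform-specific detail, the classpath separator, is handled by `os.pathsep`, so the same file would serve on Linux and Windows.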

Finally, this is a code change, so I'm not sure why we are doing a vote.

> Matt, thanks for the clarification.
>
> I may have missed the main point of the PROPOSAL thread then. I personally
> want to continue the discussion before voting.
>
> * Python as runtime requirement. Are you planning to migrate all BASH
> scripts provided by Hadoop (or dynamically created, i.e. launcher scripts)
> to Python?
> * What else in the current build, besides saveVersion.sh, you see as
> candidate to be migrated to Python?
> * How are you planning to define what Python modules can be used? Will
> developers have to install them manually?
>
> Cheers
>
> On Thu, Nov 29, 2012 at 2:39 PM, Matt Foley <[EMAIL PROTECTED]> wrote:
>
>> Hi Alejandro,
>> Please see in-line below.
>>
>> On Mon, Nov 26, 2012 at 1:52 PM, Alejandro Abdelnur <[EMAIL PROTECTED]>
>> wrote:
>>
>> > Matt,
>> >
>> > The scope of this vote seems different from what was discussed in the
>> > PROPOSAL thread.
>> > In the PROPOSAL thread you indicated this was for Hadoop1 because it is
>> > ANT based. And the main reason was to remove saveVersion.sh.
>> > Your #3 was not discussed in the proposal, was it?
>>
>> The item #3 was in my original statement of the problem, with which I
>> started the proposal thread. In fact, the thread title was "[PROPOSAL]
>> introduce Python as build-time and run-time dependency for Hadoop and
>> throughout Hadoop stack". It is true that only one or two people chose to
>> discuss #3 further in that thread.
>>
>> The point is not just to replace a single script, but to provide a means
>> to do cross-platform scripts, which will over time replace many
>> non-platform-specific scripts written in platform-specific languages.
>>
>> > It seems this vote is dragging much more stuff it was originally
>> > discussed.
>> > I think you should suspend the vote, recap the motivation and then
>> > restart the vote.
>>
>> I respectfully disagree. I believe a careful reading of the cited
>> discussion thread, plus my own statement of the vote, provides sufficient
>> background for a thoughtful decision on the subject. Presumably so do the
>> ten other people who had already voted before you made that comment.
>>
>> If several other people want more discussion first, please speak up.
>> Thanks,
>> --Matt
>>
>> > As things are laid out at the moment my vote is:
>> >
>> > -1 (It still seems an overkill to introduce a new runtime requirement
>> > for building to replace a script.)
>> > +1 (I think this is the right way to simplify the build)

Alejandro

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

>> Python as runtime requirement. Are you planning to migrate all BASH scripts provided by Hadoop (or dynamically created, i.e. launcher scripts) to Python?

I don't intend to mandate use of Python. Rather, I want there to be a cross-platform option available. Things that are best done in a platform-specific manner should be done in shell for Linux and PowerShell for Windows. But things that are best done in a platform-independent way can be, with a lower long-term maintenance cost than using different scripts per platform.

This means that some, but not all, existing scripts may naturally migrate to Python as the overall system is ported to Windows. Hopefully when someone is porting a script that can be well done in a platform-independent way, they will be able to choose Python and write a single script that can replace the shell script and make it unnecessary to maintain two scripts (doing the same job but in different languages!) going forward.

>> What else in the current build, besides saveVersion.sh, you see as candidate to be migrated to Python?

I have a greatly improved version of src/docs/relnotes.py that I would like to submit, for auto-gen of release notes.
That's all that I have on my hotlist right now, although I anticipate that some of the shell scripts invoked by ant may be natural candidates.
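To give a feel for what a small cross-platform build-time script might look like, here is a minimal sketch that renders build metadata as text. It is not a port of saveVersion.sh or relnotes.py; the field names, the output format, and the function names are all assumptions made for illustration.

```python
import getpass
import time


def render_version_info(version, revision, user, date):
    """Render build metadata as simple key=value lines.

    The fields and format are hypothetical; the real build may record
    different information in a different form.
    """
    lines = [
        "version=%s" % version,
        "revision=%s" % revision,
        "user=%s" % user,
        "date=%s" % date,
    ]
    return "\n".join(lines) + "\n"


def current_build_info(version, revision="unknown"):
    """Fill in the user and UTC timestamp from the running environment."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
    return render_version_info(version, revision, getpass.getuser(), stamp)
```

Keeping the rendering separate from the environment lookups means the formatting can be checked on any platform without depending on who runs the build or when.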

>> How are you planning to define what Python modules can be used? Will developers have to install them manually?

That's something the community will work out, the same way they decide what library jars to include, and when to upgrade those versions. But first, let's get an agreement in principle that this is the direction we want to go.


-

RE: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Build-time scripts: Using a platform-independent language such as Python (or Maven in certain cases) will greatly help in reducing build breaks and improving build-script maintainability.

Run-time scripts: Most run-time scripts are end-user visible: they are run either by admins, for example to start/stop a Hadoop cluster (hadoop-daemons), or by developers submitting a job (hadoop.cmd). There seem to be two types of script files:

- Scripts intended for a cluster admin or an IT admin:
  - It is desirable to use a common set of Python scripts that work across all platforms. However, in a Windows enterprise environment IT admins won't like it if they have to run Python scripts to start/stop a cluster. So for these, there should be a PowerShell wrapper that accepts the right parameters and passes them down to the Python script. Hopefully, the PowerShell layer can be a simple pass-through. This way the Python scripts are like any other Java code hidden behind a well-known API surface. IT admins can't debug or modify them easily, but this is fine, since for scripts like the aforementioned there is no requirement that IT admins be able to easily view or modify the underlying code.
  - For Windows-specific things not supported by Python natively, such as setting ACLs or starting/stopping Windows services, it should be possible to refactor the code appropriately. But a little bit of PowerShell/cmd for these call-outs would be unavoidable.

- Scripts intended for developers/cluster users:
  - Most of these scripts (e.g. hadoop.cmd) would be behind another API surface such as WebHDFS, ODBC, JDBC, Templeton etc. So the advantage of having a common script across platforms outweighs the use of cmd/PowerShell as a native Windows feature. Again, it should also be possible to provide simple PowerShell wrappers for a Windows environment.
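The wrapper idea above relies on the shared Python script having a small, stable command-line surface that a native wrapper can forward its parameters to unchanged. A minimal sketch of such an entry point follows; the program name `cluster_ctl`, the `--daemon` option, and its default are hypothetical, and the sketch only reports what it would do rather than starting anything. It uses `argparse`, which needs Python 2.7 or later.

```python
import argparse
import sys


def parse_args(argv):
    """Parse the small, fixed surface a native wrapper would pass through."""
    parser = argparse.ArgumentParser(prog="cluster_ctl")  # hypothetical name
    parser.add_argument("action", choices=["start", "stop"])
    parser.add_argument("--daemon", default="namenode")
    return parser.parse_args(argv)


def main(argv):
    args = parse_args(argv)
    # A real script would launch or signal the daemon here;
    # this sketch only reports the request it received.
    return "%s %s" % (args.action, args.daemon)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(main(sys.argv[1:]))
```

With an interface like this, a per-platform wrapper has nothing to interpret: it forwards its arguments and returns the script's exit status.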


-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

> * What kind of tasks you envision Python scripts will enable that are
> not possible today?

The point isn't to open brave new worlds. The point is to avoid the nightmare of having to maintain multiple "parallel" scripts doing the SAME THING in multiple scripting languages. I know from experience that they never get maintained right. It's just a huge source of bugs, because when they are in different languages, it can be quite difficult to determine that they are *really* doing the same thing. And in a case like shell vs powershell, it will be very common to have contributors who are not experts in both.

I care deeply about having a high-quality release in both Linux and Windows. And having a cross-platform scripting language will make it much easier to maintain that quality over time, without "slip" between the two platforms.

> * Will the requirement of Python be pushed to clients using the
> hadoop script? If so, this would affect all downstream projects that use
> hadoop script in one way or the other, right?

If question #3 passes, then Python will become a run-time dependency for Hadoop. That means it would need to be installed as part of the Hadoop install preparation, just like all the other Hadoop run-time dependencies.

> Is the main motivation of the proposal to make things easier for window,
> so there is no need for cygwin? If that is the case, have you considered
> doing directly BAT scripts? If you take Tomcat for example, they have BAT
> scripts and SH scripts and things work quite nicely.

Of course it is sufficient, from the simple implementation perspective, to translate all the shell scripts into bat or (better) powershell scripts. That is, in fact, the most evident alternative to my proposals #1 and #3.

However, I ask -- beg! -- the community to consider it from the software engineering perspective. We aren't here to just implement something once and be done. It has to be maintained, as most of you on this list are well aware, for years and years, across multiple generations. And trying to maintain parallel scripts in multiple languages, when not necessitated by genuine platform-specific requirements, is just creating bug generators in the system.

> Personally, I wouldn't be thrilled to see the logic in the scripts to
> get more complex, but on the opposite direction; IMO, scripts should be
> trimmed to set env vars (with no voodoo logic), build the classpath (with
> no voodoo logic, just from a set of dirs) and call Java.

See the first item above. The point is to enable cross-platform scripting of the things we already have to script. IMO, scripts should get out of the env var business entirely, but that's unrelated to this question :-)

> Finally, this is code change, so I'm not sure why we are doing a vote.

I view this as a tools issue, that affects questions that go beyond the one-time choice of how to write (or re-write) saveVersion.sh. Also Aaron (atm) recommended that I bring it to the list. So here we are :-)

> Matt,
>
> Let me repost my previous questions and a few more. I'd appreciate your
> answers, as it will help me understand the full impact this would have in
> Hadoop and related projects.
>
> * Python as runtime requirement. Are you planning to migrate all BASH
> scripts provided by Hadoop (or dynamically created, i.e. launcher scripts)
> to Python?
> * What else in the current build, besides saveVersion.sh, you see as
> candidate to be migrated to Python?
> * How are you planning to define what Python modules can be used? Will
> developers have to install them manually?
> * What kind of tasks you envision Python scripts will enable that are not
> possible today?
> * Will the requirement of Python be pushed to clients using the hadoop
> script? If so, this would affect all downstream projects that use hadoop
> script in one way or the other, right?

-

RE: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

I think on one side we have Shell which is a script language and OS dependent, e.g. as in bash vs powershell; on the other side we have Java which is not a script language and OS independent. I would accept any script language that can fix the gap as an OS independent scripting language. Personally, I also prefer Python over Ruby.

> * What kind of tasks you envision Python scripts will enable that are
> not possible today?

The point isn't to open brave new worlds. The point is to avoid the nightmare of having to maintain multiple "parallel" scripts doing the SAME THING in multiple scripting languages. I know from experience that they never get maintained right. It's just a huge source of bugs, because when they are in different languages, it can be quite difficult to determine that they are *really* doing the same thing. And in a case like shell vs powershell, it will be very common to have contributors who are not experts in both.

I care deeply about having a high-quality release in both Linux and Windows. And having a cross-platform scripting language will make it much easier to maintain that quality over time, without "slip" between the two platforms.

We have had promising results for 1 and 2 when porting to Windows. 3 would allow us to remove platform dependencies from test code. Agree that there might be some nuanced operations that require OS specific environments but this would lead to keeping them at a minimum.

Bikas

On 11/29/12 7:22 PM, "Chuan Liu" <[EMAIL PROTECTED]> wrote:

>+1 +1 +1
>
>Agree with Matt on the code maintainability.
>
>I think on one side we have Shell which is a script language and OS
>dependent, e.g. as in bash vs powershell;
>on the other side we have Java which is not a script language and OS
>independent.
>I would accept any script language that can fix the gap as an OS
>independent scripting language.
>Personally, I also prefer Python over Ruby.
>
>Thanks,
>Chuan
>
>________________________________________
>From: [EMAIL PROTECTED] on behalf of Matt Foley
>Sent: Thursday, November 29, 2012 6:26 PM
>To: [EMAIL PROTECTED]
>Subject: Re: [VOTE] introduce Python as build-time and run-time
>dependency for Hadoop and throughout Hadoop stack
>
>Hello again. Crossed in the mail.
>
>> * What kind of tasks you envision Python scripts will enable that are
>> not possible today?
>
>The point isn't to open brave new worlds. The point is to avoid the
>nightmare of having to maintain multiple "parallel" scripts doing the SAME
>THING in multiple scripting languages. I know from experience that they
>never get maintained right. It's just a huge source of bugs, because when
>they are in different languages, it can be quite difficult to determine
>that they are *really* doing the same thing. And in a case like shell vs
>powershell, it will be very common to have contributors who are not
>experts in both.
>
>I care deeply about having a high-quality release in both Linux and
>Windows. And having a cross-platform scripting language will make it much
>easier to maintain that quality over time, without "slip" between the two
>platforms.
>
>> * Will the requirement of Python be pushed to clients using the
>> hadoop script? If so, this would affect all downstream projects that use
>> hadoop script in one way or the other, right?
>
>If question #3 passes, then Python will become a run-time dependency for
>Hadoop. That means it would need to be installed as part of the Hadoop
>install preparation, just like all the other Hadoop run-time dependencies.
>
>> Is the main motivation of the proposal to make things easier for Windows,
>> so there is no need for cygwin? If that is the case, have you considered
>> doing directly BAT scripts? If you take Tomcat for example, they have BAT
>> scripts and SH scripts and things work quite nicely.
>
>Of course it is sufficient, from the simple implementation perspective, to
>translate all the shell scripts into bat or (better) powershell scripts.
>That is, in fact, the most evident alternative to my proposals #1 and #3.
>
>However, I ask -- beg! -- the community to consider it from the software
>engineering perspective. We aren't here to just implement something once
>and be done. It has to be maintained, as most of you on this list are well
>aware, for years and years, across multiple generations. And trying to
>maintain parallel scripts in multiple languages, when not necessitated by
>genuine platform-specific requirements, is just creating bug generators in
>the system.
>
>> Personally, I wouldn't be thrilled to see the logic in the scripts to
>> get more complex, but on the opposite direction; IMO, scripts should be
>> trimmed to set env vars (with no voodoo logic), build the classpath (with
>> no voodoo logic, just from a set of dirs) and call Java.
>
>See the first item above. The point is to enable cross-platform scripting
>of the things we already have to script. IMO, scripts should get out of
>the env var business entirely, but that's unrelated to this question :-)
>
>> Finally, this is code change, so I'm not sure why we are doing a vote.
>
>I view this as a tools issue, that affects questions that go beyond the

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

> > Finally, this is code change, so I'm not sure why we are doing a vote.
>
> I view this as a tools issue, that affects questions that go beyond the
> one-time choice of how to write (or re-write) saveVersion.sh. Also Aaron
> (atm) recommended that I bring it to the list. So here we are :-)

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

I'd like to change my binding vote to -1, -0, -1.

Considering the hadoop stack/ecosystem as a whole, I think the best cross platform scripting language to adopt is jruby for the following reasons:

1. HBase already adopted jruby for HBase shell, which all current platform vendors support.
2. We can control the version of language implementation at a per release basis.
3. We don't have to introduce new dependencies in the de facto hadoop stack. (see 1).

I'm all for improving multi-platform support. I think the best way to do this is to have thin native script wrappers (using env vars) to call the cross-platform jruby scripts.

__Luke

On Fri, Nov 30, 2012 at 3:21 AM, Luke Lu <[EMAIL PROTECTED]> wrote:

> Thanks for the voting thread. Otherwise, many committers would have missed
> it.
>
> I agree that this is a superset of code change that has larger impact than
> typical code change.
>
> On Thu, Nov 29, 2012 at 6:26 PM, Matt Foley <[EMAIL PROTECTED]> wrote:
>
>> > Finally, this is code change, so I'm not sure why we are doing a vote.
>>
>> I view this as a tools issue, that affects questions that go beyond the
>> one-time choice of how to write (or re-write) saveVersion.sh. Also Aaron
>> (atm) recommended that I bring it to the list. So here we are :-)

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

> I'd like to change my binding vote to -1, -0, -1.
>
> Considering the hadoop stack/ecosystem as a whole, I think the best cross
> platform scripting language to adopt is jruby for following reasons:
>
> 1. HBase already adopted jruby for HBase shell, which all current platform
> vendors support.
> 2. We can control the version of language implementation at a per release
> basis.
> 3. We don't have to introduce new dependencies in the de facto hadoop
> stack. (see 1).

I don't see why these arguments should have any impact on using python at build time, as it doesn't introduce any dependencies downstream. Yes, you need python at build time, but that's no worse than having a protoc compiler, gcc and the automake toolchain.

> I'm all for improving multi-platform support. I think the best way to do
> this is to have a thin native script wrappers (using env vars) to call the
> cross-platform jruby scripts.

Were it not for the env-var configuration hierarchy mess that things are in today, I'd agree. Where do you set your env vars? hadoop-env.sh? Where does that come from? The hadoop conf dir? How do you find that? An env variable or a ../../conf from bin/hadoop.sh which breaks once you start symlinking to hadoop/bin; or do you assume a root installation in /etc/hadoop/conf, which points to /etc/alternatives/hadoop-conf, which can then point back to /etc/hadoop/conf.pseudo? And what about JAVA_HOME?

Those env vars are something I'd like to see the back of.
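
The lookup order questioned above could be written down once in a cross-platform function. The sketch below is only an illustration, not Hadoop's actual logic: the function name `find_conf_dir`, the use of a `HADOOP_CONF_DIR` variable, and the fallback order are all assumptions. It resolves symlinks before computing the relative `conf` path, which is the case described as breaking.

```python
import os


def find_conf_dir(script_path, env=None, system_default="/etc/hadoop/conf"):
    """Return the first existing configuration directory, or None.

    Hypothetical lookup order (an assumption, not Hadoop's real behaviour):
      1. an explicit HADOOP_CONF_DIR entry in the environment mapping,
      2. a "conf" directory next to the "bin" directory holding the script,
         located after resolving symlinks,
      3. a system-wide default directory.
    """
    env = os.environ if env is None else env

    candidates = []
    explicit = env.get("HADOOP_CONF_DIR")
    if explicit:
        candidates.append(explicit)

    # realpath() follows symlinks, so a link to bin/hadoop placed elsewhere
    # still resolves to the real installation directory.
    real_script = os.path.realpath(script_path)
    install_root = os.path.dirname(os.path.dirname(real_script))
    candidates.append(os.path.join(install_root, "conf"))

    candidates.append(system_default)

    for candidate in candidates:
        if os.path.isdir(candidate):
            return candidate
    return None
```

Passing the environment in as a mapping rather than reading `os.environ` directly keeps the precedence visible in one place and makes it testable.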

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

>> inline ant scripts
>>
>> =0. Ant's versioning is stricter; you can pull down the exact Jar versions,
>> and some of us in the Ant team worked very hard to get it going everywhere.
>> You don't gain anything by going to .py

there are sh scripts inside maven ant plugin stuff

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

There should be only two env vars (JAVA_HOME and HADOOP_HOME) to deal with in the native scripts (.bat on windows and .sh on unix platforms) to bootstrap jruby scripts, which deal with the rest of the envs.

__Luke

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

> Yes, you need python at build time, but that's no worse than having a
> protoc compiler, gcc and the automake toolchain.

The problem is that python is known to have _backward_ compatibility issues on various platforms. It would be very annoying/time consuming to deal with various support issues regarding python versions on various platforms.

I agree that autotools is a nightmare and should be converted (in branch-1 as well) to cmake (which has good versioning support :) The goal is to have fewer external dependencies, not more, again mostly due to support issues. If we want to introduce an external dependency, we need to pick something that is easy to support compatibility-wise.

__Luke

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Run- & build-time scripting should be limited to operations that are impossible in Java. These should not be complex nor should we encourage more complexity in them. A parallel set of simple .bat files for such operations seems preferable to adding a Python dependency.

Doug

On Sat, Nov 24, 2012 at 12:13 PM, Matt Foley <[EMAIL PROTECTED]> wrote:
> For discussion, please see previous thread "[PROPOSAL] introduce Python as
> build-time and run-time dependency for Hadoop and throughout Hadoop stack".
>
> This vote consists of three separate items:
>
> 1. Contributors shall be allowed to use Python as a platform-independent
> scripting language for build-time tasks, and add Python as a build-time
> dependency.
> Please vote +1, 0, -1.
>
> 2. Contributors shall be encouraged to use Maven tasks in combination with
> either plug-ins or Groovy scripts to do cross-platform build-time tasks,
> even under ant in Hadoop-1.
> Please vote +1, 0, -1.
>
> 3. Contributors shall be allowed to use Python as a platform-independent
> scripting language for run-time tasks, and add Python as a run-time
> dependency.
> Please vote +1, 0, -1.
>
> Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to
> use Maven plug-ins or Groovy as the only means of cross-platform build-time
> tasks, or to simply continue using platform-dependent scripts as is being
> done today.
>
> Vote closes at 12:30pm PST on Saturday 1 December.
> ---------
> Personally, my vote is +1, +1, +1.
> I think #2 is preferable to #1, but still has many unknowns in it, and
> until those are worked out I don't want to delay moving to cross-platform
> scripts for build-time tasks.
>
> Best regards,
> --Matt

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

>> inline ant scripts
>>
>>> =0. Ant's versioning is stricter; you can pull down the exact Jar versions,
>>> and some of us in the Ant team worked very hard to get it going everywhere.
>>> You don't gain anything by going to .py
>>
>> there are sh scripts inside maven ant plugin stuff

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

-1, 0, -1

IIUC the only platform we plan to add support for that we can't easily support today (w/o an emulation layer like cygwin) is Windows, and it seems like making the bash scripts simpler and having parallel bat files is IMO a better approach.

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

> -1, 0, -1
>
> IIUC the only platform we plan to add support for that we can't easily
> support today (w/o an emulation layer like cygwin) is Windows, and it
> seems like making the bash scripts simpler and having parallel bat
> files is IMO a better approach.

WinNT Bat/CMD files are the worst possible scripting language invented. At the very least, .py should be the language of choice there

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

>> inline ant scripts
>>
>>> =0. Ant's versioning is stricter; you can pull down the exact Jar versions,
>>> and some of us in the Ant team worked very hard to get it going everywhere.
>>> You don't gain anything by going to .py
>>
>> there are sh scripts inside maven ant plugin stuff

Which is because there are some things you can't do in Java - run rpmbuild to pick up file permissions and hanging symlinks that only become valid on deployment.

The reason Ant is used to start them is that Maven views trying to run native scripts as a forbidden action - probably popping up some patronising text "you are trying to run a shell script, please look at maven.apache.org/wiki/whymavenwontletyoudothings/ to understand this"; they also view building RPMs as not something to encourage either.

(but we digress into an ant vs maven argument. I do actually appreciate the consistent target naming across projects and the ability for the IDE to set up structure, it's just the entire underlying architecture and implementation that I dislike)

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Python has fairly inconsistent support across all major OS vendors. It is hard to get it right unless the scripts are all designed to make use of Python 2.4. However, Python 2.4 doesn't have the necessary OS features to make Python useful in a runtime or build environment unless you write a lot of custom modules. That defeats the purpose of using Python as an intermediate layer to do OS dependent work. Jruby may be a better choice.
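
One common way to contain the version problem described above is to fail fast with a clear message when the interpreter is too old. The following is a sketch, not something proposed in this thread: the helper name `check_python` is hypothetical and the 2.6 floor is only a placeholder, since the discussion does not settle on a minimum version. It also does nothing for features that are simply missing from an old release.

```python
import sys


def check_python(minimum=(2, 6), version_info=None):
    """Return None if the interpreter is new enough, else an error message.

    The (2, 6) floor is only a placeholder for illustration; the thread does
    not settle on a minimum version.
    """
    current = tuple((version_info or sys.version_info)[:2])
    if current >= tuple(minimum):
        return None
    return "Python %d.%d or newer is required, found %d.%d" % (
        minimum[0], minimum[1], current[0], current[1])


if __name__ == "__main__":
    # Exit with the message (and a non-zero status) on an unsupported version.
    problem = check_python()
    if problem:
        sys.exit(problem)
```

A guard like this reports the mismatch up front but does not remove the support burden of tracking which versions each platform ships.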

> Hello again. Crossed in the mail.
>
> > * What kind of tasks you envision Python scripts will enable that are
> > not possible today?
>
> The point isn't to open brave new worlds. The point is to avoid the
> nightmare of having to maintain multiple "parallel" scripts doing the SAME
> THING in multiple scripting languages.

+1, +1, +1

Couldn't agree more, I don't want to be in the business of having the same logic in multiple platform-specific scripts - doesn't make any sense.

Arun

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

It's not clear to me what kind of a vote this is. It seems closest to a code change vote, since it implies code changes, although without a specific patch yet proposed. As such it would follow lazy consensus rules. Or is it merely intended as a straw poll, to gauge community opinion?

Doug

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

It is intended to be a "technical discussion", in the sense of the bylaws statement (in section "Roles and Responsibilities: Committers"), "Committers may cast binding votes on any technical discussion regarding any subproject." I therefore intended it to be a majority vote of Committers.

Interestingly, this need to discuss tooling and other issues that go beyond a simple "code change" is not addressed in the "Decision Making: Actions" section of the bylaws. That need seems to have been overlooked in the current rev of that section. But I do not agree that such issues are "code changes"; it relates to the tools we depend on to make code changes, which is clearly qualitatively different.

> On Sat, Nov 24, 2012 at 12:13 PM, Matt Foley <[EMAIL PROTECTED]> wrote:
> > Vote closes at 12:30pm PST on Saturday 1 December.
>
> It's not clear to me what kind of a vote this is. It seems closest to
> a code change vote, since it implies code changes, although without a
> specific patch yet proposed. As such it would follow lazy consensus
> rules. Or is it merely intended as a straw poll, to gauge community
> opinion?
>
> Doug

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

On Mon, Dec 3, 2012 at 11:21 AM, Matt Foley <[EMAIL PROTECTED]> wrote:
> It is intended to be a "technical discussion", in the sense of the bylaws
> statement (in section "Roles and Responsibilities: Committers"), "Committers
> may cast binding votes on any technical discussion regarding any
> subproject." I therefore intended it to be a majority vote of Committers.

I'm not sure how you conclude that technical discussions are resolved with majority votes.

> Interestingly, this need to discuss tooling and other issues that go beyond
> a simple "code change" is not addressed in the "Decision Making: Actions"
> section of the bylaws. That need seems to have been overlooked in the
> current rev of that section. But I do not agree that such issues are "code
> changes"; it relates to the tools we depend on to make code changes, which
> is clearly qualitatively different.

I don't see a striking difference between this and a proposed code change. How is a -1 here fundamentally different than a veto on a patch submitted to HADOOP-9082?

Doug

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

> On Mon, Dec 3, 2012 at 11:21 AM, Matt Foley <[EMAIL PROTECTED]> wrote:
> > It is intended to be a "technical discussion", in the sense of the bylaws
> > statement (in section "Roles and Responsibilities: Committers"), "Committers
> > may cast binding votes on any technical discussion regarding any
> > subproject." I therefore intended it to be a majority vote of Committers.
>
> I'm not sure how you conclude that technical discussions are resolved
> with majority votes.
>
> http://www.apache.org/foundation/voting.html
>
> > Interestingly, this need to discuss tooling and other issues that go beyond
> > a simple "code change" is not addressed in the "Decision Making: Actions"
> > section of the bylaws. That need seems to have been overlooked in the
> > current rev of that section. But I do not agree that such issues are "code
> > changes"; it relates to the tools we depend on to make code changes, which
> > is clearly qualitatively different.
>
> I don't see a striking difference between this and a proposed code
> change. How is a -1 here fundamentally different than a veto on a
> patch submitted to HADOOP-9082?
>
> Doug

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

This may be a little atypical but I don't see any harm. The Hadoop PMC is willing to respect the veto of any committer as binding. I'd worry more if we tried to reduce vetoes to a subset of the PMC than extend it to a superset.

Do you think this is problematic?

Doug

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

No, but it speaks to whether the Hadoop bylaws can extend the Apache voting procedures and draw finer distinctions. For example, the Apache voting procedures only identify 3 types of votable issue, while the Hadoop bylaws identify 9 types of votable issues.

If we were forced to fit "development tools" into one of the three categories cited by the Apache voting procedures, it would be fitting a square peg in a round hole. Since we can instead look at the 9 categories provided by the Hadoop bylaws, we can acknowledge that "development tools" was an overlooked category. But in my opinion it certainly doesn't fit into the "code change" category. Tooling is a meta-issue regarding HOW we do what needs to be done. In this case, whether we allow a platform-independent solution, or force contributors to maintain parallel scripts in multiple platform-specific languages for no reason.

> On Mon, Dec 3, 2012 at 2:08 PM, Matt Foley <[EMAIL PROTECTED]> wrote:
> > The apache voting process contradicts the Hadoop bylaws:
> > http://www.apache.org/foundation/voting.html says that only PMC members can
> > make binding votes on code modification issues, but
> > http://hadoop.apache.org/bylaws.html says that Committers can make binding
> > votes on them. Does that mean the Hadoop bylaws have to change?
>
> This may be a little atypical but I don't see any harm. The Hadoop
> PMC is willing to respect the veto of any committer as binding. I'd
> worry more if we tried to reduce vetoes to a subset of the PMC than
> extend it to a superset.
>
> Do you think this is problematic?
>
> Doug

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Hadoop's bylaws do draw finer distinctions than the Apache voting guidelines document, but we follow the same general principles that are described there.

As I understand it, the rationale for using consensus for code is that everyone needs to agree on everything in the codebase or we've disenfranchised some. We share a single code repository and we need to all agree on what goes into it. A release does not require majority since if someone doesn't agree on the timing of a release they can choose to make another at a different time, but every change that goes into each release requires consensus. We also require consensus for committers and PMC member votes so that we have a group that's coherent and is able to reach consensus on code changes.

Re-writing bash scripts in Python is neither a release nor other procedural issue. It involves changes to the software we maintain and seems to fall clearly into the "code change" category.

If you disagree then perhaps you'd like to propose a change to the bylaws so that scripts have different rules than other kinds of software, but I don't yet see the rationale for such a change.

Doug

On Mon, Dec 3, 2012 at 5:22 PM, Matt Foley <[EMAIL PROTECTED]> wrote:
> No, but it speaks to whether the Hadoop bylaws can extend the Apache voting
> procedures and draw finer distinctions. For example, the Apache voting
> procedures only identify 3 types of votable issue, while the Hadoop bylaws
> identify 9 types of votable issues.
>
> If we were forced to fit "development tools" into one of the three
> categories cited by the Apache voting procedures, it would be fitting a
> square peg in a round hole. Since we can instead look at the 9 categories
> provided by the Hadoop bylaws, we can acknowledge that "development tools"
> was an overlooked category. But in my opinion it certainly doesn't fit
> into the "code change" category. Tooling is a meta-issue regarding HOW we
> do what needs to be done. In this case, whether we allow a
> platform-independent solution, or force contributors to maintain parallel
> scripts in multiple platform-specific languages for no reason.
>
> --Matt
>
> On Mon, Dec 3, 2012 at 3:57 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
>
>> On Mon, Dec 3, 2012 at 2:08 PM, Matt Foley <[EMAIL PROTECTED]> wrote:
>> > The apache voting process contradicts the Hadoop bylaws:
>> > http://www.apache.org/foundation/voting.html says that only PMC members can
>> > make binding votes on code modification issues, but
>> > http://hadoop.apache.org/bylaws.html says that Committers can make binding
>> > votes on them. Does that mean the Hadoop bylaws have to change?
>>
>> This may be a little atypical but I don't see any harm. The Hadoop
>> PMC is willing to respect the veto of any committer as binding. I'd
>> worry more if we tried to reduce vetoes to a subset of the PMC than
>> extend it to a superset.
>>
>> Do you think this is problematic?
>>
>> Doug

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Hi Doug,

I didn't read your email until this morning, but I spent time overnight thinking about the Apache Way and reached similar conclusions. While tooling is broader in scope than a single code change, it is a technical choice that we all have to live with.

More importantly, "Community over Code" would suggest that if only slightly less than 50% of the community is uncomfortable with adding Python to the mix which is the Hadoop stack, then we probably shouldn't do it, regardless of the technical merits.

Thanks to all who voted and contributed to the discussion.

Best regards,
--Matt

On Mon, Dec 3, 2012 at 8:50 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:

> Hadoop's bylaws do draw finer distinctions than the Apache voting
> guidelines document, but we follow the same general principles that
> are described there.
>
> As I understand it, the rationale for using consensus for code is that
> everyone needs to agree on everything in the codebase or we've
> disenfranchised some. We share a single code repository and we need
> to all agree on what goes into it. A release does not require
> majority since if someone doesn't agree on the timing of a release
> they can choose to make another at a different time, but every change
> that goes into each release requires consensus. We also require
> consensus for committers and PMC member votes so that we have a group
> that's coherent and is able to reach consensus on code changes.
>
> Re-writing bash scripts in Python is neither a release nor other
> procedural issue. It involves changes to the software we maintain and
> seems to fall clearly into the "code change" category.
>
> If you disagree then perhaps you'd like to propose a change to the
> bylaws so that scripts have different rules than other kinds of
> software, but I don't yet see the rationale for such a change.
>
> Doug
>
> On Mon, Dec 3, 2012 at 5:22 PM, Matt Foley <[EMAIL PROTECTED]> wrote:
> > No, but it speaks to whether the Hadoop bylaws can extend the Apache voting
> > procedures and draw finer distinctions. For example, the Apache voting
> > procedures only identify 3 types of votable issue, while the Hadoop bylaws
> > identify 9 types of votable issues.
> >
> > If we were forced to fit "development tools" into one of the three
> > categories cited by the Apache voting procedures, it would be fitting a
> > square peg in a round hole. Since we can instead look at the 9 categories
> > provided by the Hadoop bylaws, we can acknowledge that "development tools"
> > was an overlooked category. But in my opinion it certainly doesn't fit
> > into the "code change" category. Tooling is a meta-issue regarding HOW we
> > do what needs to be done. In this case, whether we allow a
> > platform-independent solution, or force contributors to maintain parallel
> > scripts in multiple platform-specific languages for no reason.
> >
> > --Matt
> >
> >
> > On Mon, Dec 3, 2012 at 3:57 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> >
> >> On Mon, Dec 3, 2012 at 2:08 PM, Matt Foley <[EMAIL PROTECTED]> wrote:
> >> > The apache voting process contradicts the Hadoop bylaws:
> >> > http://www.apache.org/foundation/voting.html says that only PMC

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

i've been playing around writing a couple of maven plugins, one to replace saveversion.sh and the other to invoke protoc. they both work in windows standard cmd (no cygwin required). together with hadoop-8887 they would remove most of the scripting done in the poms.

(they also work in linux and osx)

they are java based, and only require having SVN, GIT & PROTOC available in the PATH.
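
A rough sketch of the general idea, not the actual plugin code: a Java helper can call whatever tool is on the PATH through ProcessBuilder, which behaves the same in windows cmd, linux and osx. The class name ExternalTool, the run helper, and the specific git and protoc arguments are assumptions made for illustration only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not taken from the plugins described above. */
public class ExternalTool {

    /** Runs a command resolved via the PATH and returns its trimmed output. */
    public static String run(String... command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout so one reader drains both
        Process process = pb.start(); // throws IOException if the tool is not found
        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("command failed with exit code " + exitCode
                    + ": " + String.join(" ", command));
        }
        return output.toString().trim();
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed probes: the current git revision and the installed protoc version.
        String[][] probes = { {"git", "rev-parse", "HEAD"}, {"protoc", "--version"} };
        for (String[] command : probes) {
            try {
                System.out.println(command[0] + ": " + run(command));
            } catch (IOException e) {
                System.out.println(command[0] + ": unavailable (" + e.getMessage() + ")");
            }
        }
    }
}
```

Because no shell is involved, nothing here depends on bash, cygwin or cmd quoting rules; a missing tool simply surfaces as an IOException that the build can report.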

if cmake works in windows, i assume hadoop-8887 would be almost there.

this would leave the tar stitching, which is done as a script to handle SO symlinks. though i have an idea on how we could take care of it.

i'll be creating a jira momentarily.

thx

Alejandro

On Dec 4, 2012, at 12:28 PM, Matt Foley <[EMAIL PROTECTED]> wrote:

> Please close HADOOP-9073 as "will not fix", citing this discussion.
>
> I'm -1 on groovy in maven. That's worse, not better. Let it sit for a
> while and let people propose simplifications of the script situation.
>
> Thanks,
> --Matt
>
>
> On Tue, Dec 4, 2012 at 11:41 AM, Radim Kolar <[EMAIL PROTECTED]> wrote:
>
>> result of vote is to close https://issues.apache.org/jira/browse/HADOOP-9073
>> and write groovy in pom.xml variant (option number 2)?

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

> i've been playing around writing a couple of maven plugins, one to replace
> saveversion.sh and the other to invoke protoc. they both work in windows
> standard cmd (no cygwin required). together with hadoop-8887 they would
> remove most of the scripting done the poms.
>
> (they also work in linux and osx)
>
> they are java based, only require having SVN GIT & PROTOC avail in the
> PATH.
>
> if cmake works in windows, i assume hadoop-8887 would be almost there.
>
> this would leave the tar stitching, which is done as script to handle SO
> symlinks. though i have and idea on how we could take care of it.
>
> i'll be creating a jira momentarily.
>
> thx
>
> Alejandro
>
> On Dec 4, 2012, at 12:28 PM, Matt Foley <[EMAIL PROTECTED]> wrote:
>
> > Please close HADOOP-9073 as "will not fix", citing this discussion.
> >
> > I'm -1 on groovy in maven. That's worse, not better. Let it sit for a
> > while and let people propose simplifications of the script situation.
> >
> > Thanks,
> > --Matt
> >
> >
> > On Tue, Dec 4, 2012 at 11:41 AM, Radim Kolar <[EMAIL PROTECTED]> wrote:
> >
> >> result of vote is to close https://issues.apache.org/jira/browse/HADOOP-9073
> >> and write groovy in pom.xml variant (option number 2)?

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

On Sat, Dec 01, 2012 at 10:44AM, Steve Loughran wrote:
> On 1 December 2012 01:08, Eli Collins <[EMAIL PROTECTED]> wrote:
>
> > -1, 0, -1
> >
> > IIUC the only platform we plan to add support for that we can't easily
> > support today (w/o an emulation layer like cygwin) is Windows, and it
> > seems like making the bash scripts simpler and having parallel bat
> > files is IMO a better approach.
> >
>
> WinNT Bat/CMD files are the worst possible scripting language invented. At
> the very least, .py should be the language of choice there

Compared to the OS in question - it isn't _that_ bad ;)

-

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

On Sat, Dec 01, 2012 at 10:07PM, Eric Yang wrote:
> -1, +1, -1
>
> Python has fairly inconsistent support across all major OS vendors. It is
> hard to get it right unless the scripts are all designed to make use of
> Python 2.4. However, Python 2.4 doesn't have necessary OS features to make
> Python useful in runtime or build environment unless you write a lot of
> custom modules. Which defeats the purpose to use python as intermediate
> layer to do OS dependent work. Jruby may be a better choice.

JRuby? Really? Groovy is already there and it is really a Java dialect unlike JRuby. And yes - it is quite suitable for build things, considering the use of it in BigTop