From common-dev-return-80284-apmail-hadoop-common-dev-archive=hadoop.apache.org@hadoop.apache.org Wed Nov 21 19:26:02 2012
MIME-Version: 1.0
From: Alejandro Abdelnur
Date: Wed, 21 Nov 2012 11:25:04 -0800
Message-ID:
Subject: Re: [PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack
To: "common-dev@hadoop.apache.org", mattf@apache.org
Content-Type: multipart/alternative; boundary=089e0118499c87117804cf0650f4
--089e0118499c87117804cf0650f4
Content-Type: text/plain; charset=ISO-8859-1
Hey Matt,
We already require java/mvn/protoc/cmake/forrest (forrest is hopefully on
its way out with the move of docs to APT).
Why not write a Maven plugin to do that instead?
Colin already has something that uses a Maven plugin to simplify all the
cmake calls in the builds (https://issues.apache.org/jira/browse/HADOOP-8887).
We could do the same with protoc, thus simplifying the POMs.
saveVersion.sh seems like another prime candidate for a Maven plugin,
and in this case it would not require any external tools.
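To give a sense of how small the version-stamping logic is, here is a rough
sketch of it in portable form. It is written in Python only for brevity; a
Maven plugin would do the equivalent in Java. The function names, the field
names, and the key=value output layout are all illustrative assumptions, not
what saveVersion.sh actually produces. The only step that touches an external
tool is the revision lookup, which falls back to "Unknown" when git is not
available.

```python
import getpass
import subprocess
from datetime import datetime, timezone


def detect_revision(cwd="."):
    """Return the current git commit hash, or "Unknown" if it cannot be read."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=cwd, capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # git is missing, the directory does not exist, or it is not a checkout.
        return "Unknown"


def render_version_info(version, revision, user, date):
    """Format build metadata as key=value lines (hypothetical layout)."""
    fields = {"version": version, "revision": revision, "user": user, "date": date}
    return "".join(f"{key}={value}\n" for key, value in fields.items())


if __name__ == "__main__":
    # The version string below is a placeholder, not a real Hadoop version.
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")
    print(render_version_info("0.0.0-SNAPSHOT", detect_revision(),
                              getpass.getuser(), stamp), end="")
```

Nothing in there is platform specific, so the same logic would fit inside a
plugin without a shell on either Linux or Windows.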
Does this make sense?
Thanks.
On Wed, Nov 21, 2012 at 11:15 AM, Matt Foley wrote:
> This discussion started in HADOOP-8924, where it was proposed to replace the
> build-time utility "saveVersion.sh" with a Python script. This would require
> Python as a build-time dependency. Here's the background:
>
> Those of us involved in the branch-1-win port of Hadoop to Windows without
> use of Cygwin have faced the issue of frequent use of shell scripts
> throughout the system, both at build time (e.g., the utility
> "saveVersion.sh") and at run time (config files like "hadoop-env.sh" and the
> start/stop scripts in "bin/*"). Similar usages exist throughout the Hadoop
> stack, in all projects.
>
> The vast majority of these shell scripts do not do anything platform
> specific; they can be expressed in a posix-conforming way. Therefore, it
> seems to us that it makes sense to start using a cross-platform scripting
> language, such as python, in place of shell for these purposes. For those
> rare occasions where platform-specific functionality really is needed,
> python also supports quite a lot of platform-specific functionality on both
> Linux and Windows; but where that is inadequate, one could still
> conditionally invoke a platform-specific module written in shell (for
> Linux/*nix) or powershell or bat (for Windows).
>
> The primary motive for moving to a cross-platform scripting language is
> maintainability. The alternative would be to maintain two complete suites
> of scripts, one for Linux and one for Windows (and perhaps others in the
> future). We want to avoid the need to update dual modules in two different
> languages when functionality changes, especially given that many Linux
> developers are not familiar with powershell or bat, and many Windows
> developers are not familiar with shell or bash.
>
> Regarding the choice of python:
>
> - There are already a few instances of Python usage in Hadoop, such as
>   the utility (currently broken) "relnotes.py", and massive usage of Python
>   in the examples/ and contrib/ directories.
> - Python is also used at build time in Bigtop.
> - The Python language is available for free on essentially all platforms,
>   under an Apache-compatible license.
> - It is supported in Eclipse and similar IDEs.
> - Most importantly, it is widely accepted as a reasonably good OO
>   scripting language, and it is easily learned by anyone who already knows
>   shell or perl, or other common scripting languages.
> - On the Tiobe index of programming language popularity
>   <http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html>,
>   which seeks to measure the relative number of software engineers who know
>   and use each language, Python far exceeds Perl and Ruby. The only more
>   well-known scripting languages are PHP and Visual Basic, neither of which
>   seems a prime candidate for this use.
>
> For build-time usage, I think we should immediately approve Python as a
> build-time dependency, and allow anyone who is motivated to do so to open
> JIRAs for migrating existing build-time shell scripts to Python.
>
> For run-time, there is likely to be a lot more discussion. Lots of folks,
> including me, aren't real happy with use of active scripts for
> configuration, and various others, including I believe some of the Bigtop
> folks, have issues with the way the start/stop scripts work. Nevertheless,
> all those scripts exist today and are widely used. And they present an
> impediment to porting to Windows-without-cygwin.
>
> Nothing about run-time use of scripts has changed significantly over the
> past three years, and I don't think we should hold up the Windows port
> while we have a huge discussion about issues that veer dangerously into
> religious/aesthetic domains. It would be fun to have that discussion, but I
> don't want this decision to be dependent on it!
>
> So I propose that we go ahead and also approve python as a run-time
> dependency, and allow the inclusion of python scripts in place of current
> shell-based functionality. The unpleasant alternative is to spawn a bunch
> of powershell scripts in parallel to the current shell scripts, with a very
> negative impact on maintainability. The Windows port must, after all, be
> allowed to proceed.
>
> Let's have a discussion, and then I'll put both issues, separately, to a
> vote (unless we miraculously achieve consensus without a vote :-)
>
> I also encourage members of the other Hadoop-related projects to carry
> this discussion into those forums. It would be very cool to agree on a
> whole-stack solution for the scripting problem.
>
> Best regards,
> --Matt
>
--
Alejandro
--089e0118499c87117804cf0650f4--