I think there's general confusion right now about how the development process is supposed to work now that the project split has taken place. Or, at least, I am generally confused :) Here are a couple questions I have lingering:

- Whenever a task will need to touch both Common and one of the components (Mapred/HDFS) should there be two JIRAs or is it sufficient to have just one "HADOOP" JIRA with separate patches uploaded for the two repositories?

- If we're to do two separate JIRAs, is the best bet to use JIRA's "linking" feature to show the dependency between them? I often have HDFS or Mapred changes that require a small addition or change to o.a.h.util, for example. On its own, that util change might look silly (and "unused") but if we don't make it easy to put stuff in util, we're going to end up with a lot of code duplication between MR and HDFS.

- When we have one of these cross-project changes, how are we supposed to do builds? It looks like right now there's a core 21-dev jar checked into the MR and HDFS repositories. Does this mean that any time we have a cross-component change we'll need to get it committed to Common, do a jar build of common, copy that new jar into the component, and include that as part of the component commit? This seems very ugly to me, and completely inefficient while doing development. It will also make our git repository sizes balloon like crazy since we'll have hundreds of different versions of the core dev jar.

If someone could write up some kind of "post-split transition guide" on the Wiki I think that would be generally appreciated.

Todd Lipcon wrote:
> - Whenever a task will need to touch both Common and one of the components
> (Mapred/HDFS) should there be two JIRAs or is it sufficient to have just one
> "HADOOP" JIRA with separate patches uploaded for the two repositories?

Two Jiras, I think. In the long run, such issues should be few. E.g., we should not be changing the FileSystem API incompatibly much.

> - If we're to do two separate JIRAs, is the best bet to use JIRA's "linking"
> feature to show the dependency between them?

+1

> - When we have one of these cross-project changes, how are we supposed to do
> builds?

I talked with Owen & Nigel about this, and we thought that, for now, it might be reasonable to have the mapreduce and hdfs trunk each contain an svn external link to the current common jar. Then folks can commit new versions of hadoop-common-trunk.jar as they commit changes to common's trunk. We'd need to remove or update this svn external link when branching. Thoughts?

Todd Lipcon wrote:
>> If someone could write up some kind of "post-split transition guide"
>> on the Wiki I think that would be generally appreciated.

Here's something I wrote up for re-setting up Eclipse after the split in a way that gives relatively seamless access to the various projects' source code. If it works for other people, it could be part of the wiki.

Setting up Eclipse post-dev:

I believe my Eclipse dev environment is set up for near-seamless inter-project development following the Great Split of 2009. Here's how I did it, step-by-step, with no guarantee as to orthodoxy, correctness or appropriateness. Please let me know, or update, with corrections. Obligatory works-on-my-machine and YMMV.

With these instructions, you'll be able to work on all three projects at once. Without them, after the split, trying to look at the source of a class defined in Common, such as Text, will show you the bytecode in the included jar file. You can attach the source for review purposes, but this still leaves the problem of Eclipse only running the code from the jar, rather than from the other project, in the event that you've modified both. These instructions fix this problem so that any changes, say in Common, are also applied to HDFS or MapReduce.

These instructions assume svn, and specifically svn from the command line. They will work fine with git once those repos are set up, and will also work with the svn or git plugin from within Eclipse.

1. Import each of the new repositories. Here're the directories that I picked.

2. Start Eclipse. For each of the new directories, create a new java project using that directory name, allowing Eclipse to do its standard work of importing the project. It was previously necessary to change Eclipse's default build directory (bin) to something else to keep it from wiping out Hadoop's bin directory. At the moment, only the common project has a bin directory. However, I still changed each of my eclipse build directories (set on the second screen of the new project wizard) to build/eclipse-files, just in case there is a bin added in the future and as it's tidier.

4. From the Navigator window, right-click on build.xml and choose the second "Run as Ant build" option (the one that lets you pick targets). Among the targets, specify compile, compile-{common,hdfs,mapred}-test, and eclipse-files. Let Eclipse do its work; after it's done, each of the projects should compile successfully and be working correctly independently.

5. To allow the projects to call directly into each other's code, rather than relying on the bundled libraries, connect each of the projects as dependencies. For each project, set up the natural dependency (hdfs relies on common; mapred relies on common and hdfs). Right-click on each project and go to Build Path -> Configure Build Path -> Projects tab. For HDFS, add Common. For MapReduce, add Common and HDFS. Unfortunately, you can't just add everything to everything, as Eclipse detects this as a cycle and throws up errors.

6. Finally, in order to force Eclipse to look at the other projects' source code, rather than the included jar files, remove the jar files from the build path of each project, respectively. From HDFS, remove the common (core) jar from the build path. From MapReduce, remove the hdfs and common (core) jars from the build path. HDFS still has a dependency on MapReduce for tests, and I couldn't get Eclipse to allow me to remove the MapReduce jar from the HDFS project; if anyone figures out how, please update this document or let me know.

7. All should be well. Now, for instance, if you control-click on a class defined in Common from hdfs (say Text), you are brought to its definition in the common project, as expected. Also, if you modify code in common and run a test from hdfs, it'll pull in the modified code from common. This doesn't solve the problem of needing to generate patches for each of the projects if your changes affect each of them, however.

>> - When we have one of these cross-project changes, how are we supposed to
>> do builds?
>
> I talked with Owen & Nigel about this, and we thought that, for now, it
> might be reasonable to have the mapreduce and hdfs trunk each contain an svn
> external link to the current common jar. Then folks can commit new versions
> of hadoop-common-trunk.jar as they commit changes to common's trunk. We'd
> need to remove or update this svn external link when branching. Thoughts?

-1 to checking in jars. It's quite a bit of bloat in the repository (which admittedly affects the git.apache folks more than the svn folks), but it's also cumbersome to develop.

It'd be nice to have a one-liner that builds the equivalent of the tarball built by "ant binary" in the old world. When you're working on something that affects both common and hdfs, it'll be pretty painful to make the jars in common, move them over to hdfs, and then compile hdfs.

Could the build.xml in hdfs call into common's build.xml and build common as part of building hdfs? Or perhaps have a separate "top-level" build file that builds everything?
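As a rough illustration of that idea, here is a sketch of an hdfs-side build fragment; the sibling-directory layout, the "jar" target name, and the build/ output location are assumptions, not the actual Hadoop build files:

  <project name="hdfs-calls-common" default="build-common">
    <!-- Assumption: common is checked out next to this project -->
    <property name="common.src.dir" location="../hadoop-common"/>
    <property name="lib.dir" location="lib"/>

    <target name="build-common"
            description="Build common from source, then copy its jar into lib/">
      <!-- Delegate to common's own build.xml without leaking our properties -->
      <ant antfile="build.xml" dir="${common.src.dir}"
           target="jar" inheritAll="false"/>
      <copy todir="${lib.dir}" flatten="true">
        <fileset dir="${common.src.dir}/build" includes="*.jar"/>
      </copy>
    </target>
  </project>

A real hdfs compile target could then list build-common in its depends attribute, so a single ant invocation rebuilds both trees.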

My 5 cents about svn:externals. I could not live without it but... I always tend to forget to update our svn:externals and about once a month I wonder why I accidentally released bleeding edge code in our production environment *smile* (should've written that auto-branching script waaay back ago)....

Philip Zeyliger wrote:
> -1 to checking in jars. [...] Could the build.xml in hdfs call into common's
> build.xml and build common as part of building hdfs? Or perhaps have a
> separate "top-level" build file that builds everything?

Agree with Phillip here. Requiring a new jar to be checked in anywhere after every common commit seems unscalable and nonperformant. For git users this will make the repository size balloon like crazy (the jar is 400KB and we have around 5300 commits so far = 2GB!). For svn users it will still mean that every "svn update" requires a download of a new jar. Using svn externals to manage them also complicates things when trying to work on a cross-component patch with two dirty directories - you really need a symlink between your working directories rather than a link through the SVN tree.

I think it would be reasonable to require that developers check out a structure like:

working-dir/
  hadoop-common/
  hadoop-mapred/
  hadoop-hdfs/

We can then use relative paths for the mapred->common and hdfs->common dependencies. Those who only work on HDFS or only work on mapred will not have to check out the other, but everyone will check out common.
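To illustrate the relative-path dependency, a small sketch; the property name, the build/ output layout, and the sibling checkout are assumptions, and this only complements the build-common fragment sketched earlier:

  <project name="hdfs-relative-deps" default="show">
    <!-- Assumption: sibling checkout as in the working-dir/ layout above -->
    <property name="hadoop-common.dir" location="${basedir}/../hadoop-common"/>

    <!-- Compile classpath built from common's output rather than a checked-in jar -->
    <path id="common.classpath">
      <fileset dir="${hadoop-common.dir}/build" includes="*.jar"/>
      <pathelement location="${hadoop-common.dir}/build/classes"/>
    </path>

    <target name="show">
      <pathconvert property="common.cp" refid="common.classpath"/>
      <echo message="common classpath: ${common.cp}"/>
    </target>
  </project>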

Whether there exists a fourth repository (eg hadoop-build) that has a build.xml that ties together the other build.xmls is another open question IMO.

-1 for committing the jar.

Most of the various options proposed sound certainly better.

Can build.xml be updated such that Ivy fetches recent (nightly) build?

HDFS could have a build target that builds common jar from a specified source location for common.

Raghu.

Another option (one that is used by Hive) is to have an ant macro that can be overridden from the ant command line. This macro points to the location of the common.jar. By default, it is set to the same value as it is now. If a developer has a common jar that is built in his/her directory, he/she can set this macro from the command line while compiling hdfs.

For example,

  ant test

does the same as it does now, but

  ant -Dhadoop.common.jar=/home/dhruba/common/hadoop-common.jar test

will pick up the common jar from my home directory. Is this feasible?
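A sketch of how that overridable default could look in build.xml; the default jar name and the lib/ path here are guesses, not the real values:

  <project name="common-jar-override" default="compile">
    <!-- A value passed on the command line with -Dhadoop.common.jar=... wins,
         because Ant properties are immutable once set; this is only the fallback -->
    <property name="hadoop.common.jar"
              location="${basedir}/lib/hadoop-core-0.21.0-dev.jar"/>

    <target name="compile">
      <!-- A real build would put ${hadoop.common.jar} on the javac classpath -->
      <echo message="compiling against ${hadoop.common.jar}"/>
    </target>
  </project>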

That's feasible, but it will still require having a built jar in one or another repository after every new commit (yuck!) I imagine in hive's case it's reasonably rare that you have to import a new hadoop dev jar, since you mostly target existing stable releases. This is going to be happening all the time in MR/HDFS, at least for the foreseeable future imho.

On Wed, Jul 1, 2009 at 10:10 PM, Raghu Angadi wrote:
> -1 for committing the jar.
>
> Most of the various options proposed sound certainly better.
>
> Can build.xml be updated such that Ivy fetches recent (nightly) build?

This seems slightly better than actually committing the jars. However, what should we do when the nightly build has failed hudson tests? We seem to sometimes go weeks at a time without a "green" build out of Hudson.

> HDFS could have a build target that builds common jar from a specified
> source location for common.

This is still my preferred option. Whether it does this with a <javac> task or with some kind of <subant> or even <exec>, I think having the source trees "loosely" tied together for developers is a must.

-Todd

Raghu Angadi wrote:
> Can build.xml be updated such that Ivy fetches recent (nightly) build?
>
> HDFS could have a build target that builds common jar from a specified
> source location for common.

+1 This is the standard way to go. Ivy can fetch the latest nightly build or from the last successful build (default).

> For example,
> ant test
> does the same as it does now, but
> ant -Dhadoop.common.jar=/home/dhruba/common/hadoop-common.jar test will
> pick up the common jar from my home directory.

+1; we'll also need the same sort of thing for the daemons.

Regardless of whether or not jars are checked in, something like this is a must.

A "last Hudson-approved" maven target would be very useful, too, especially for folks wishing to depend on a bleeding-edge but not bloody version. But there's no need for those built binaries to go into the same repository as the source: Hudson could manage them in a separate repository (or even directory structure).

On Wed, Jul 1, 2009 at 6:45 PM, Todd Lipcon wrote:
> Agree with Phillip here. Requiring a new jar to be checked in anywhere after
> every common commit seems unscalable and nonperformant. For git users this
> will make the repository size balloon like crazy (the jar is 400KB and we
> have around 5300 commits so far = 2GB!).

This is silly. Obviously, just like the source, the jars compress across versions very well.

> I think it would be reasonable to require that developers check out a
> structure like:
>
> working-dir/
>   hadoop-common/
>   hadoop-mapred/
>   hadoop-hdfs/

-1 They are separate subprojects. In the medium term, mapreduce and hdfs should compile and run against the released version of common. Checking in the jars is a temporary step while the interfaces in common stabilize. Furthermore, I expect the volume in common should be much lower than in mapreduce or hdfs.

+1. Using ant command line parameters for Ivy, the hdfs and mapreduce builds can depend on the latest Common build from one of:

a) a local filesystem ivy repo/directory (ie. a developer build of Common that is published automatically to a local fs ivy directory)
b) a maven repo (ie. a stable published signed release of Common)
c) a URL

Option c can be a stable URL to the last successful Hudson build and is in fact what all the Hudson hdfs and mapreduce builds could be configured to use. An example URL would be something like:

http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/lastSuccessfulBuild/artifact/

Giri is creating a patch for this and will respond with more insight on how this might work.
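To make options a-c concrete, here is a rough ivysettings.xml sketch; the resolver names, repository patterns, and the idea of picking a resolver with "ant -Dresolver=..." are illustrative assumptions, not the contents of Giri's patch:

  <ivysettings>
    <!-- Picked up as an Ivy variable from the Ant command line, e.g. ant -Dresolver=local -->
    <property name="resolver" value="default" override="false"/>
    <settings defaultResolver="${resolver}"/>
    <resolvers>
      <!-- a) developer's local publish area -->
      <filesystem name="local">
        <ivy pattern="${user.home}/ivyrepo/[organisation]/[module]/[revision]/ivy-[revision].xml"/>
        <artifact pattern="${user.home}/ivyrepo/[organisation]/[module]/[revision]/[artifact]-[revision].[ext]"/>
      </filesystem>
      <!-- b) released artifacts from a maven repository -->
      <ibiblio name="maven2" m2compatible="true"/>
      <!-- c) jars from the last successful Hudson build, fetched over HTTP -->
      <url name="hudson">
        <artifact pattern="http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/lastSuccessfulBuild/artifact/[artifact].[ext]"/>
      </url>
      <chain name="default">
        <resolver ref="local"/>
        <resolver ref="maven2"/>
      </chain>
    </resolvers>
  </ivysettings>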

> This seems slightly better than actually committing the jars.
> However, what should we do when the nightly build has failed hudson
> tests? We seem to sometimes go weeks at a time without a "green"
> build out of Hudson.

Hudson creates a "lastSuccessfulBuild" link that should be used in most cases (see my example above). If Common builds are failing we need to respond immediately. Same for other sub-projects. We've got to drop this culture that allows failing/flaky unit tests to persist.

>> HDFS could have a build target that builds common jar from a
>> specified source location for common.
>
> This is still my preferred option. Whether it does this with a
> <javac> task or with some kind of <subant> or even <exec>, I think
> having the source trees "loosely" tied together for developers is a
> must.

-1. If folks really want this, then let's revert the project split. :-o

Nige

Nigel Daley wrote:
> +1. Using ant command line parameters for Ivy, the hdfs and mapreduce
> builds can depend on the latest Common build from one of:
> a) a local filesystem ivy repo/directory (ie. a developer build of
>    Common that is published automatically to local fs ivy directory)
> b) a maven repo (ie. a stable published signed release of Common)
> c) a URL

The standard approach to this problem is the above -- a local file system repository, with local developer build output, and a shared repository with build-system blessed content. A developer can choose which to use based on their needs.

For ease of use, there is always a way to trigger the dependency chain for a "full" build. Typically with Java this is a master ant script or a maven POM. The build system must either know to build all at once with the proper dependency order, or versions are decoupled and dependency changes happen only when manually triggered (e.g. Hdfs at revision 9999 uses common 9000, and then a check-in pushes hdfs 10000 to use a new common version). Checking in jars is usually very frowned upon. Rather, metadata is checked in -- the revision number and branch that can create the jar, and the jar can be fetched from a repository or built with that metadata.

AFAICS those are the only two options -- tight coupling, or strict separation. The latter means that changes to common aren't picked up by hdfs or mapreduce until the dependent version is incremented in the metadata (harder and more restrictive to devs), and the former means that all are essentially the same coupled version (more complicated on the build system side but easy for devs). Developers can span both worlds, but the build system has to pick only one.

I have been using ivy for a multi-module project which has similar independent modules built separately.

I think the way to go is something like:

1. Let the projects be checked out independently and in an arbitrary directory structure.
2. Define ivy repositories in the following order: local, maven.
3. Add ivy-publish-local targets which publish jars to the local ivy repository with publish.status="integration" (see the sketch below).
4. Add common as a dependency to mapred and hdfs in ivy.xml (not checked in as a jar, or svn:external):

   <dependency name="hadoop-common" rev="latest.integration" changing="true"/>

   changing="true" forces Ivy to refresh the cache in case a new version is deployed locally.
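A sketch of what the ivy-publish-local target in step 3 might look like; the revision, artifact pattern, and resolver name here are assumptions:

  <project name="publish-example" xmlns:ivy="antlib:org.apache.ivy.ant">
    <target name="ivy-publish-local">
      <!-- Assumes the module's jar has already been built under build/ -->
      <ivy:resolve file="ivy.xml"/>
      <ivy:publish resolver="local"
                   pubrevision="0.21.0-dev"
                   status="integration"
                   overwrite="true">
        <artifacts pattern="build/[artifact].[ext]"/>
      </ivy:publish>
    </target>
  </project>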

For example, when building mapred, if common is deployed locally it is used; if not, it is fetched from maven (assuming we have deployed to maven).

I'm not so sure whether rev="latest.integration" will work for fetching from maven, but if not, we can plug our own LatestStrategy into ivy.

Marcus Herou wrote:
> Hi.
>
> My 5 cents about svn:externals.
> I could not live without it but... I always tend to forget to update our
> svn:externals and about once a month I wonder why I accidentally released
> bleeding edge code in our production environment *smile* (should've written
> that auto-branching script waaay back ago)....

I have separate VMWare images for cutting releases; keeps things very isolated at the expense of yet another linux image to keep up to date, and even the OS updates can burn you if it decides to put gcj back on the path.

Todd Lipcon wrote:
> Whether there exists a fourth repository (eg hadoop-build) that has a
> build.xml that ties together the other build.xmls is another open question
> IMO.

1. you can have a build file on top that uses <ivy:buildlist> to create a correctly ordered list of child projects

For this to work you need a common set of build file targets (clean, release, tested)

Todd Lipcon wrote:
> This is still my preferred option. Whether it does this with a <javac> task
> or with some kind of <subant> or even <exec>, I think having the source
> trees "loosely" tied together for developers is a must.

Here's a stripped down version of how we order our child builds; ivy works out the order to build things. Anything in the subdir external/ gets built too, so I can play games with other projects and symbolic links.
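A minimal sketch of such a top-level build, assuming each child project has its own ivy.xml and a "jar" target (both assumptions, not the actual file):

  <project name="hadoop-all" default="build-all"
           xmlns:ivy="antlib:org.apache.ivy.ant">
    <target name="build-all">
      <!-- ivy:buildlist reads each child's ivy.xml and sorts the build files
           into dependency order (common before hdfs before mapreduce) -->
      <ivy:buildlist reference="ordered.builds">
        <fileset dir="." includes="hadoop-*/build.xml"/>
      </ivy:buildlist>
      <subant target="jar" buildpathref="ordered.builds"/>
    </target>
  </project>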

Owen O'Malley wrote:
> -1 They are separate subprojects. In the medium term, mapreduce and
> hdfs should compile and run against the released version of common.
> Checking in the jars is a temporary step while the interfaces in
> common stabilize.

There are various use cases here:

- people working in hdfs who don't need mapred (though they should for regression testing their work) but do need a stable common
- people working in mapred who need a working common/hdfs
- someone trying to work across all three (or in common, which is effectively that from a regression testing viewpoint)
- someone who just wants all the code for debugging/using mapreduce or other bits of hadoop

For anyone who is playing at the source level, where they are getting changing libraries, having the separate projects in subdirs with common targets is invaluable; ivy can do the glue. But at the same time, should you require everyone working on mapred to pull down and build common and hdfs?

Based on the discussions, we have the first version of the patch uploaded to jira as HADOOP-5107. This patch can be used for publishing and resolving hadoop artifacts for a repository.

1) Publishing/resolving common/hdfs/mapred artifacts to/from the local filesystem.

"ant ivy-publish-local" publishes the jars locally to ${ivy.repo.dir}, which defaults to ${user.home}/ivyrepo.

"ant -Dresolver=local" resolves artifacts from the local filesystem, i.e. from ${user.home}/ivyrepo.

2) Publishing artifacts to people.apache.org

An ssh resolver is configured which publishes common/hdfs/mapred artifacts to my home folder, /home/gkesavan/ivyrepo.

If someone can tell me about using the people repository, I can recreate the patch to publish ivy artifacts to the people server's standard repository.

Thanks,
Giri

I agree with you on this; we have some common places on the people server, people.apache.org/repo and people.apache.org/repository, but I've not seen any ivy artifacts being published there.

Owen O'Malley wrote:
> On Jul 16, 2009, at 3:54 AM, Giridharan Kesavan wrote:
>
> > 2) Publishing artifacts to the people.apache.org
> >
> > ssh resolver is configured which publishes common/hdfs/mapred
> > artifacts to my home folder /home/gkesavan/ivyrepo
>
> I think that publishing to your home directory is a mistake. We need
> some common repository.
>
> -- Owen

Giridharan Kesavan wrote:
> I agree with you on this;
> We have some common places on people server
> people.apache.org/repo and people.apache.org/repository but I've not seen
> any ivy artifacts being published there.

Writing things there causes them to be published to repo{1,2}.maven.org.

FWIW, I used Ivy's makepom Ant task and Ant's checksum task to create these files, then ran gpg manually to sign them, as with normal Apache releases, then used scp to post them to people.apache.org. I created the maven-metadata.xml files by hand, although I'm not sure they're required. YMMV.
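A rough reconstruction of the makepom/checksum part of that workflow, assuming an ivy.xml is present; the file names and revision are placeholders, and the gpg and scp steps are left out:

  <project name="mvn-files" default="mvn-files"
           xmlns:ivy="antlib:org.apache.ivy.ant">
    <property name="jar.file" location="build/hadoop-common-0.21.0-dev.jar"/>
    <property name="pom.file" location="build/hadoop-common-0.21.0-dev.pom"/>

    <target name="mvn-files">
      <ivy:resolve file="ivy.xml"/>
      <ivy:makepom ivyfile="ivy.xml" pomfile="${pom.file}"/>
      <!-- maven repositories expect .md5/.sha1 files next to each artifact -->
      <checksum file="${jar.file}" algorithm="MD5" fileext=".md5"/>
      <checksum file="${jar.file}" algorithm="SHA" fileext=".sha1"/>
      <checksum file="${pom.file}" algorithm="MD5" fileext=".md5"/>
      <checksum file="${pom.file}" algorithm="SHA" fileext=".sha1"/>
    </target>
  </project>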

I have not used the snapshot repositories, but they should behave similarly.

Doug
