Speed up ivy resolution in builds with clever caching

Details

Description

An awful lot of time is spent in the ivy:resolve parts of the build, even when all of the dependencies have been fetched and cached. Profiling showed this was in XML parsing. I have a sort-of-ugly hack which speeds up incremental compiles (and more importantly "ant test") significantly using some ant macros to cache the resolved classpaths.

There's still a bit of room to improve here - SLF4J is missing some uptodate check which cause an unnecessary javac pass.

The way this works is that the macro writes the resolved classpath into build/ivy/ivy-resolve-cache/<project>-<conf> if it doesn't already exist. If that does exist, it loads that file into the appropriate properties as if ivy had done the resolution anew.

This patch is not final quality - it outputs some debugging printouts and could be cleaned up a bit, just wanted to see what people thought before spending the time to make it prettier.

You may wonder why this is starting as a MAPREDUCE patch and not HADOOP - no particular reason. If people like this I will do the same for common and HDFS.

Todd Lipcon
added a comment - 15/Oct/09 06:26 Attaching patch which speeds up a contrib-skipping null test case run from 32 seconds to 15 seconds wallclock.
todd@todd-laptop:~/git/hadoop-mapreduce$ time ant test -Dskip.contrib=1 -Dtestcase=xxxx > /dev/null
real 0m11.360s
user 0m15.897s
sys 0m1.436s
todd@todd-laptop:~/git/hadoop-mapreduce$ git reset --hard 'HEAD^'
HEAD is now at 8ac54e1 MAPREDUCE-1113. Fix mumak to not compile aspects with skip.contrib is set
todd@todd-laptop:~/git/hadoop-mapreduce$ time ant test -Dskip.contrib=1 -Dtestcase=xxxx > /dev/null
real 0m25.222s
user 0m32.870s
sys 0m1.948s
There's still a bit of room to improve here - SLF4J is missing some uptodate check which cause an unnecessary javac pass.
The way this works is that the macro writes the resolved classpath into build/ivy/ivy-resolve-cache/<project>-<conf> if it doesn't already exist. If that does exist, it loads that file into the appropriate properties as if ivy had done the resolution anew.
This patch is not final quality - it outputs some debugging printouts and could be cleaned up a bit, just wanted to see what people thought before spending the time to make it prettier.
You may wonder why this is starting as a MAPREDUCE patch and not HADOOP - no particular reason. If people like this I will do the same for common and HDFS.

Todd Lipcon
added a comment - 15/Oct/09 09:21 SLF4J is missing some uptodate check which cause an unnecessary javac pass.
Sorry, I misread some output there. The uptodate check is due to avro-generate regenerating its .java output for every build. I'll attack that in a later JIRA next time I get aggravated by slow builds

The long build times are a problem and ivy's a big part of that, but I agree with your assessment: this is a hack. I don't think the 15 second payoff justifies the maintenance cost of a custom caching layer for ivy.

Chris Douglas
added a comment - 04/Dec/09 09:48 The patch is stale.
The long build times are a problem and ivy's a big part of that, but I agree with your assessment: this is a hack. I don't think the 15 second payoff justifies the maintenance cost of a custom caching layer for ivy.

I don't think the 15 second payoff justifies the maintenance cost of a custom caching layer for ivy.

Comparing the 15 second payoff to the full build time isn't particular important to me. For me, the ability to quickly iterate on code while recompiling and rerunning unit tests is the big payoff - so I look at this as a 60% speedup in my development cycle rather than a few % speedup in the full build.

I may be in the minority, though, as I don't use eclipse or anything other fancy IDE that does incremental compilation.

Todd Lipcon
added a comment - 04/Dec/09 22:41 I don't think the 15 second payoff justifies the maintenance cost of a custom caching layer for ivy.
Comparing the 15 second payoff to the full build time isn't particular important to me. For me, the ability to quickly iterate on code while recompiling and rerunning unit tests is the big payoff - so I look at this as a 60% speedup in my development cycle rather than a few % speedup in the full build.
I may be in the minority, though, as I don't use eclipse or anything other fancy IDE that does incremental compilation.
Anyone else care to chime in?

> I look at this as a 60% speedup in my development cycle rather than a few % speedup in the full build.

I agree with this logic. My most common development cycle is to run a single unit test. For Avro this takes just a few seconds, and I'm willing to wait without finding a new task to work on. With Hadoop this takes long enough that I switch to doing something else, lose my context, etc. Improving this significantly will significantly improve many developers productivity.

I wonder if we can simply check if build/ivy/lib/Hadoop-Hdfs/

{common,test}

exist, and, if they do, assumes they're up-to-date, and only runs Ivy otherwise. Might that be simpler?

Doug Cutting
added a comment - 04/Dec/09 23:08 > I look at this as a 60% speedup in my development cycle rather than a few % speedup in the full build.
I agree with this logic. My most common development cycle is to run a single unit test. For Avro this takes just a few seconds, and I'm willing to wait without finding a new task to work on. With Hadoop this takes long enough that I switch to doing something else, lose my context, etc. Improving this significantly will significantly improve many developers productivity.
I wonder if we can simply check if build/ivy/lib/Hadoop-Hdfs/
{common,test}
exist, and, if they do, assumes they're up-to-date, and only runs Ivy otherwise. Might that be simpler?

Doug: the slowness is actually in the resolve task which generates the various classpath properties in ant. Without caching those properties to disk, there's no way to get around running ivy that I can think of. This patch essentially persists them to disk between runs, since the majority of the time they don't change.

Todd Lipcon
added a comment - 04/Dec/09 23:22 Doug: the slowness is actually in the resolve task which generates the various classpath properties in ant. Without caching those properties to disk, there's no way to get around running ivy that I can think of. This patch essentially persists them to disk between runs, since the majority of the time they don't change.

Comparing the 15 second payoff to the full build time isn't particular important to me. For me, the ability to quickly iterate on code while recompiling and rerunning unit tests is the big payoff

As a vi user, I got that. I haven't argued that the long build times are unimportant, but that a hack introducing a custom caching layer for classpaths is not, in my mind, a justifiable tradeoff in complexity. Maintaining black magic in the build is tedious and avoidable.

the slowness is actually in the resolve task which generates the various classpath properties in ant

Aren't the classpaths named? Would there be a way to short-circuit the resolution if it created/checked for a file mapped to that path?

My most common development cycle is to run a single unit test. For Avro this takes just a few seconds, and I'm willing to wait without finding a new task to work on.

As a workaround: depending on how often I'm running it, adding a main to the unit test is sometimes worthwhile.

Chris Douglas
added a comment - 05/Dec/09 00:28 Comparing the 15 second payoff to the full build time isn't particular important to me. For me, the ability to quickly iterate on code while recompiling and rerunning unit tests is the big payoff
As a vi user, I got that. I haven't argued that the long build times are unimportant, but that a hack introducing a custom caching layer for classpaths is not, in my mind, a justifiable tradeoff in complexity. Maintaining black magic in the build is tedious and avoidable.
the slowness is actually in the resolve task which generates the various classpath properties in ant
Aren't the classpaths named? Would there be a way to short-circuit the resolution if it created/checked for a file mapped to that path?
My most common development cycle is to run a single unit test. For Avro this takes just a few seconds, and I'm willing to wait without finding a new task to work on.
As a workaround: depending on how often I'm running it, adding a main to the unit test is sometimes worthwhile.

Todd Lipcon
added a comment - 05/Dec/09 00:35 Aren't the classpaths named? Would there be a way to short-circuit the resolution if it created/checked for a file mapped to that path?
That is exactly what this patch does...

Todd Lipcon
added a comment - 05/Dec/09 02:06 When the classpath is resolved, it's written out to a text file named for that variable. Then when it needs to be resolved again, if that file exists, it's loaded rather than re-resolving.

I thought the bulk of the problem was re-resolving these properties during the same run. Is that mistaken? The current proposal also works across runs, which could be helpful, but again: maintaining the build is already a pain. Adding a cache to a bad idea is a well established software engineering practice, but I'd favor either fixing our use of ivy or replacing it if middling performance requires this.

Chris Douglas
added a comment - 05/Dec/09 02:38 I thought the bulk of the problem was re-resolving these properties during the same run. Is that mistaken? The current proposal also works across runs, which could be helpful, but again: maintaining the build is already a pain. Adding a cache to a bad idea is a well established software engineering practice, but I'd favor either fixing our use of ivy or replacing it if middling performance requires this.

Todd Lipcon
added a comment - 05/Dec/09 05:13 Ivy already caches the resolves done in the same run, in theory, but there are a lot of "different" resolves, I think? The gain here is from caching between runs as you surmised.

Konstantin Boudnik
added a comment - 10/Dec/09 03:10 May be I'm barking on a wrong tree but I've tried to run a couple of commands in current MR trunk:
% time ant ivy-resolve-common
Buildfile: build.xml
...
BUILD SUCCESSFUL
Total time: 3 seconds
real 0m4.513s
user 0m5.186s
sys 0m0.616s
I got very close result for % time ant ivy-resolve-tree
so it seems to me that resolver works pretty damn fast considering the number of artifacts it needs to check. May be the latency is hiding somewhere else?

The issue is that the build ends up spawning a lot of subants, which each re-resolve everything. I get a total of 22 ivy-resolves even if I skip contrib! Part of this is that skip.contrib=1 still resolves all of the contrib stuff (MAPREDUCE-1113)

Todd Lipcon
added a comment - 18/Dec/09 21:56 The issue is that the build ends up spawning a lot of subants, which each re-resolve everything. I get a total of 22 ivy-resolves even if I skip contrib! Part of this is that skip.contrib=1 still resolves all of the contrib stuff ( MAPREDUCE-1113 )
todd@todd-laptop:~/git/hadoop-mapreduce$ ant test -Dskip.contrib=1 -Dtestcase=xxx 2>&1 | grep 'ivy-resolve' | wc -l
22
todd@todd-laptop:~/git/hadoop-mapreduce$ ant test -Dskip.contrib=1 -Dtestcase=xxx 2>&1 | grep 'ivy-resolve-common' | wc -l
19
Some of these are within the same ant run, so they get cached. But 16 of them actually do some non-cached work:
todd@todd-laptop:~/git/hadoop-mapreduce$ ant test -Dskip.contrib=1 -Dtestcase=xxx 2>&1 | grep 'resolving dependencies' | wc -l
16

Well, build.xml has 7 'retrieves' in it. If you all add contrib to this it's gonna be total mess (e.g. 22 re-resolutions). IMO fixing ivy doesn't make much sense - we'd be better off focusing on moving towards Maven.

Some of these are within the same ant run, so they get cached. But 16 of them actually do some non-cached work [...]

If I understand you correctly, the punchline is that improvements to intra-build caching are not only tedious, but also not a sound way of reducing the build time, as most classpaths are independently defined. So without fundamentally changing how dependencies are resolved, attacking the problem as in the current patch is the only way to effect a meaningful reduction. Is that the argument?

fixing ivy doesn't make much sense - we'd be better off focusing on moving towards Maven.

Is there a JIRA tracking progress in removing ivy? If it's not happening in the near term, then something like the current patch may be worth keeping in trunk during interim 0.22 development.

Chris Douglas
added a comment - 19/Dec/09 01:38 Some of these are within the same ant run, so they get cached. But 16 of them actually do some non-cached work [...]
If I understand you correctly, the punchline is that improvements to intra-build caching are not only tedious, but also not a sound way of reducing the build time, as most classpaths are independently defined. So without fundamentally changing how dependencies are resolved, attacking the problem as in the current patch is the only way to effect a meaningful reduction. Is that the argument?
fixing ivy doesn't make much sense - we'd be better off focusing on moving towards Maven.
Is there a JIRA tracking progress in removing ivy? If it's not happening in the near term, then something like the current patch may be worth keeping in trunk during interim 0.22 development.

Todd Lipcon
added a comment - 19/Dec/09 01:40 So without fundamentally changing how dependencies are resolved, attacking the problem as in the current patch is the only way to effect a meaningful reduction
Yes, as far as I'm aware, that's the case. Thanks for the concise way of explaining it.
To be completely honest, I'm nowhere near an expert in ant/ivy/maven/etc. I'm just a developer who doesn't use an IDE, and waiting 30-40 seconds every time I need to rerun a testcase got aggravating
Is there a JIRA tracking progress in removing ivy?
I'm not aware of any such.

Konstantin Boudnik
added a comment - 19/Dec/09 06:48 Is there a JIRA tracking progress in removing ivy? If it's not happening in the near term, then something like the current patch may be worth keeping in trunk during interim 0.22 development.
I know that folks here and there are eager to move to Maven, but I don't know how fast it might actually happen. This said I'm completely fine with having such short term solution in place.