Patches applied in the last week (44)

Thursday, November 20, 2008

Reposting a recent darcs-users update from Ian (with some very slight adjustments on my part for the conversion to HTML)

Hi all,

I haven't written about camp in some time, and a lot has happened, so I figure I should send an e-mail. So, here's the first edition of the "Camp Irregular News", if you will :-)

Mailing list

Camp now has a mailing list. I'll probably continue to send things of more general interest to the darcs list, but camp-specific stuff will generally go to the camp list. For details, please see http://projects.haskell.org/camp/contact

Bug tracker

But the main reason that camp has acquired a mailing list is that camp also now has a bug tracker and I wanted somewhere for the ticket change messages to go. For now, this is really just a TODO list, with the major missing pieces listed.

Development

And some real work, too. At and around the sprint, I:

Implemented "chunky" hunks, which mean that we don't need to break a file up into lines and then join it back together again when applying hunk patches

Implemented primitive interactive patch selection. It's nothing fancy, but it makes it easier to work with than the all-or-nothing record that camp had before

General improvements, e.g. there is now a repository type, rather than just misusing FilePath

Worked out how to get pkg-config, libcurl and Cabal to play nicely on Windows/MSYS/mingw

Made a darcs2camp tool

Implemented the "get" command
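To make the "chunky" hunk idea from the list above concrete, here is a minimal sketch in Python (not camp's Haskell code). It assumes a hypothetical hunk representation of a byte offset plus the old and new bytes; camp's real patch format is not described in this mail, so the `Hunk` type and `apply_hunk` function are illustrative names only. The point is that the replacement is done by slicing the file contents directly, with no splitting into lines and joining back together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hunk:
    """Hypothetical hunk: replace `old` with `new` at byte `offset`."""
    offset: int
    old: bytes
    new: bytes


def apply_hunk(content: bytes, hunk: Hunk) -> bytes:
    """Apply a hunk by slicing bytes directly; no line splitting or joining."""
    end = hunk.offset + len(hunk.old)
    if hunk.offset < 0 or end > len(content):
        raise ValueError("hunk does not apply: offset out of range")
    if content[hunk.offset:end] != hunk.old:
        raise ValueError("hunk does not apply: old content mismatch")
    return content[:hunk.offset] + hunk.new + content[end:]


# Usage: patch a small file held in memory.
original = b'module Main where\nmain = putStrLn "hi"\n'
hunk = Hunk(offset=original.index(b'"hi"'), old=b'"hi"', new=b'"hello"')
patched = apply_hunk(original, hunk)
```

The check on the old content stands in for the consistency checks a real patch applier would need; it is an assumption of this sketch rather than a description of what camp does.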

darcs2camp

darcs2camp is currently fiddly to build, as it needs to be linked against some of darcs's sources. In the near future it will either use libdarcs, or I'll fork a copy of darcs and wibble it until it just builds darcs2camp.

Due to working on each primitive patch separately, darcs2camp isn't the fastest beast in the world; on the 19766 megapatch (359470 primitive patches) GHC repo it takes me 1 hour 47 mins to convert from darcs to camp format. Then again, the original git conversion took 3 days, so it could be worse! And it shows a patches-converted count to keep you entertained.

The disk usage for darcs's patches directory is:

disk usage: 115M

actual number of bytes: 49M

actual number of bytes when uncompressed: 204M

Meanwhile, camp's patch file weighs in at 214M (which is both the actual number of bytes and the disk usage, as it's all in one file). There are a number of things going on here:

camp currently doesn't store any meta-data, so it should be a little more than 214M.

Currently, if we store the primitive patch "name-3" inside the patch "name" then we store the full string "name-3", even though we don't have to.

We could easily compress individual patches. Presumably if we did this with gzip then we'd get down to about 50M.

With a little work we could compress clumps of patches. However, gzipping the whole file only gets us down to 46M, so there is little to be gained there. bzip2ing the whole file gets us down to 38M.
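The trade-off between compressing patches individually and compressing the whole file can be tried out with a small Python sketch. Everything here is an assumption for illustration: the patches are synthetic strings, not camp's on-disk format, and `compressed_sizes` is a made-up helper. Note that `zlib.compress` produces the zlib container rather than a gzip file, although both use DEFLATE. Tiny synthetic patches exaggerate the gap between the two approaches, so the output says nothing about the GHC repo figures quoted above.

```python
import bz2
import zlib


def compressed_sizes(patches):
    """Return byte counts for several ways of storing a list of patches."""
    whole = b"".join(patches)
    return {
        "raw": len(whole),
        # Each patch compressed on its own: pays per-patch overhead and
        # cannot share repeated strings between patches.
        "zlib_per_patch": sum(len(zlib.compress(p)) for p in patches),
        # The whole patch file compressed as one stream.
        "zlib_whole": len(zlib.compress(whole)),
        "bz2_whole": len(bz2.compress(whole)),
    }


# Synthetic stand-ins for primitive patches (hypothetical format).
patches = [
    f"hunk ./src/Module{i % 7}.hs {i}\n-old line {i}\n+new line {i}\n".encode()
    for i in range(200)
]
sizes = compressed_sizes(patches)
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

Per-patch compression keeps random access to individual patches cheap, which is the usual reason to accept a worse ratio than whole-file compression gives.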

"get"ing repos

And that means we can do timings etc for large repos easily.

Some timings for get and the ghc repo:

With darcs 1.0.9rc1, get takes around 5.5 seconds. However, I believe it's copying the pristine directory rather than actually applying the patches, which isn't safe if you can't lock the repo. "darcs check", on the other hand, takes 1 minute 45 seconds, and that does essentially the same work that "get" is supposed to.

I haven't looked at optimising get with camp yet, but one thing that should definitely make a big difference is batching up multiple changes to a single file. It is common to get a megapatch which contains a sequence of n patches, each of which changes a hunk in the same file. When applying such a megapatch, camp currently reads and writes the whole file n times, which obviously isn't optimal! IIRC that made a significant difference when we added it to darcs, and I expect it will for camp too.
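The batching idea can be sketched in Python as follows. This is an illustration under stated assumptions, not camp's or darcs's implementation: `CountingStore` is a toy in-memory file store that counts whole-file reads and writes, and a "change" is a hypothetical (old, new) pair that replaces the first occurrence of `old`. The batched version only merges consecutive changes to the same file, so it never reorders changes across files.

```python
from itertools import groupby


class CountingStore:
    """Toy in-memory 'file system' that counts whole-file reads and writes."""

    def __init__(self, files):
        self.files = dict(files)
        self.reads = 0
        self.writes = 0

    def read(self, path):
        self.reads += 1
        return self.files[path]

    def write(self, path, data):
        self.writes += 1
        self.files[path] = data


def apply_change(content, change):
    """Hypothetical primitive change: replace the first `old` with `new`."""
    old, new = change
    if old not in content:
        raise ValueError("change does not apply")
    return content.replace(old, new, 1)


def apply_naive(store, changes):
    """Read and write the whole file once per change (n reads, n writes)."""
    for path, change in changes:
        store.write(path, apply_change(store.read(path), change))


def apply_batched(store, changes):
    """Read and write each consecutive run of same-file changes only once."""
    for path, run in groupby(changes, key=lambda item: item[0]):
        content = store.read(path)
        for _, change in run:
            content = apply_change(content, change)
        store.write(path, content)


files = {"a.txt": b"1 2 3\n", "b.txt": b"x\n"}
changes = [
    ("a.txt", (b"1", b"one")),
    ("a.txt", (b"2", b"two")),
    ("a.txt", (b"3", b"three")),
    ("b.txt", (b"x", b"y")),
]

naive, batched = CountingStore(files), CountingStore(files)
apply_naive(naive, changes)
apply_batched(batched, changes)
print(naive.reads, naive.writes, batched.reads, batched.writes)
```

Both stores end up with identical contents; only the number of whole-file reads and writes differs.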

camp is also cheating slightly, as it doesn't do a syntactic-validity check of the patches it is given before applying them. This means that it'll fail less prettily than it ought to. However, I'm not sure if darcs also cheats, and I don't expect that it will make much difference to the time taken anyway.

What next?

The above is mostly development stuff, mainly due to being at the sprint. I plan to focus more on theory stuff next. As you may have seen on the darcs list, I've started thinking about conflict marking, and I also have some patch theory proofs in my head that I need to get written down in the paper.

Monday, November 17, 2008

This is deliberately a very modest release, containing as few changes from darcs 2.1.0 as we can manage. The changes we have included are GHC 6.10 support and a bugfix for Windows (notably, one which makes darcs help work from the DOS prompt without the need for MSYS or Cygwin). See the attached ChangeLog for details.

So what's next? We have been doing a lot of work and we're eager to show you some of the results. Our next release is scheduled for January of 2009, with performance improvements from the darcs hacking sprint, improved Windows support, a new optional Cabal-based build and a first cut at libdarcs.