Friday, June 6, 2008

Git Repack Parameters

A few people have asked me whether I chose the parameters for Git’s repack correctly. Shouldn’t I use a higher --depth value than the default? Why did I pick a --window value of 250? Shouldn’t I have repacked with the default values?

To answer the last question first: no. I did these conversions as well as I could, in order to make a fair comparison. My assumption is that anyone converting their repository to Bazaar, Git or Mercurial knows what he or she is doing, so why should I settle for less? Repacking a repository as tightly as I did is necessary only once, but it is an important step: git fast-import creates really bad packs in order to be fast, so a repack really helps. The manpage also suggests using a higher --window value than normal.

However, the question made me curious about how the parameters (--depth and --window) influence final repository size. First, I wanted to see whether changing --depth would have made a difference in final size. I repacked every repository with a depth of either 50 (the default) or 100, and varied the window parameter over the values [10, 20, 50, 100, 150, 200, 250].
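A sweep like this can be automated. The following is a hypothetical Python sketch, not necessarily how these measurements were taken: the names repack_command, pack_size_kib and sweep are made up for illustration, and it assumes git is on the PATH and that the repository can safely be repacked many times in a row. It runs git repack -adf once per depth/window pair and reads the resulting pack size from the size-pack line of git count-objects -v, which Git reports in KiB.

```python
import itertools
import subprocess
import sys
import time

# Parameter grid from the experiment: default depth (50) vs. 100, seven window sizes.
DEPTHS = [50, 100]
WINDOWS = [10, 20, 50, 100, 150, 200, 250]


def repack_command(depth: int, window: int) -> list[str]:
    """Full repack: -a packs everything into one pack, -d removes now-redundant packs,
    -f recomputes deltas instead of reusing the ones from the previous pack."""
    return ["git", "repack", "-adf", f"--depth={depth}", f"--window={window}"]


def pack_size_kib(repo: str) -> int:
    """Return the 'size-pack' value (KiB) reported by `git count-objects -v`."""
    out = subprocess.run(
        ["git", "count-objects", "-v"],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    for line in out.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "size-pack":
            return int(value)
    raise ValueError("no size-pack line in git count-objects output")


def sweep(repo: str) -> list[tuple[int, int, int, float]]:
    """Repack `repo` once per (depth, window) pair; return (depth, window, KiB, seconds)."""
    results = []
    for depth, window in itertools.product(DEPTHS, WINDOWS):
        start = time.perf_counter()
        subprocess.run(repack_command(depth, window), cwd=repo, check=True)
        elapsed = time.perf_counter() - start
        results.append((depth, window, pack_size_kib(repo), elapsed))
    return results


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python repack_sweep.py /path/to/repo")
    else:
        for depth, window, size_kib, seconds in sweep(sys.argv[1]):
            print(f"depth={depth:3d} window={window:3d} pack={size_kib} KiB time={seconds:.1f}s")
```

Because -f forces Git to recompute deltas, each run should not simply inherit the deltas of the previous repack. On a large repository the full grid of 14 repacks can take a long time, so it is worth trying it on a copy first.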

First, let’s look at how the depth parameter influences repository size.

As this figure shows, increasing the repack depth only affects repository size when repacking with a small window. Since I used a window size of 250, the depth parameter did not influence my results much.

It is also interesting to see how these parameters affect other measures, such as repack time.

Repack time keeps growing as the window size increases. Since a repository won’t be packed much tighter with a window of 250 than with a window of 100, you might as well choose a lower window value when doing an aggressive repack.
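The trade-off above comes down to picking the smallest window whose pack is nearly as small as the best one measured. Here is a hypothetical Python sketch of that selection rule, assuming you already have per-window measurements of pack size and repack time; the RepackResult type, the function name and the 1% default tolerance are illustrative assumptions, not values derived from these results.

```python
from typing import NamedTuple


class RepackResult(NamedTuple):
    window: int      # --window value used for the repack
    size_kib: int    # resulting pack size in KiB
    seconds: float   # wall-clock repack time


def smallest_adequate_window(results: list[RepackResult], tolerance: float = 0.01) -> int:
    """Return the smallest --window whose pack is within `tolerance` (a fraction,
    1% by default) of the tightest pack among `results`."""
    if not results:
        raise ValueError("need at least one measurement")
    best = min(r.size_kib for r in results)
    limit = best * (1 + tolerance)
    # Among all windows that are "close enough" to the best size, prefer the smallest.
    return min(r.window for r in results if r.size_kib <= limit)
```

Since repack time grows with the window, preferring the smallest acceptable window also tends to give the fastest acceptable repack; how much size you are willing to give up is entirely up to the tolerance you choose.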

However, there is a more interesting interaction going on: the effect of the window parameter depends on the size of your repository. Let’s look at repositories of different sizes (see “Meet the Candidates” for a description of the repositories):

As you can see, a higher window value only has an effect on repositories that are actually quite large, like the emacs repository. If you have a small repository, there is little point in repacking with anything higher than --window=50, but if your repository is several hundred MB, a larger window can skim off a few more megabytes.
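To decide which case applies, you need the repository’s packed size, which git count-objects -v reports on its size-pack line, in KiB. The hypothetical Python sketch below parses that output and maps it to a starting window. The 200 MiB cut-off and all function names are assumptions loosely based on the observation above, not measured thresholds.

```python
import subprocess

# Hypothetical cut-off between "small" and "large" repositories (an assumption, not a measurement).
LARGE_REPO_MIB = 200


def parse_size_pack_kib(count_objects_output: str) -> int:
    """Extract the size-pack value (KiB) from `git count-objects -v` output."""
    for line in count_objects_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "size-pack":
            return int(value)
    raise ValueError("no size-pack line found")


def suggested_window(size_pack_kib: int, large_repo_mib: int = LARGE_REPO_MIB) -> int:
    """Hypothetical rule of thumb: --window=50 for small repositories, 250 for large ones."""
    return 250 if size_pack_kib >= large_repo_mib * 1024 else 50


def suggested_window_for_repo(repo: str) -> int:
    """Run `git count-objects -v` in `repo` and apply the rule of thumb above."""
    out = subprocess.run(
        ["git", "count-objects", "-v"],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    return suggested_window(parse_size_pack_kib(out))
```

You would then pass the returned value to git repack -adf --window=N. Note that a freshly fast-imported repository has badly packed objects, so its size-pack figure overstates how large the repository will be after a tight repack.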

(Please note that the repack times were measured on an Intel iMac Core Duo, 2 GHz, with 2 GB of RAM, running OS X. Repacks were done with git repack -adf, which means that each repository is packed completely from scratch. A normal, incremental repack will be much faster.)

3 comments:

Personally, I tried this on 7 different Git repositories today, and it turned out that 'git gc --aggressive --prune' runs much faster than a repack with --window=250, *AND* the resulting repository size is about the same with both methods. So I'm sticking with the first method.

Hi, I wonder what happens if you set "pack.compression" to a higher value. There seems to be room for even more efficient compression. In practice I believe that the default is a good compromise between speed and compression.

You have gathered nice statistics there. Wouldn't it be a good idea to adjust the window parameter used by "git gc --aggressive" based on this information? I'd like to encourage you to suggest to the Git developers that "git gc --aggressive" be changed to use --window=50. It looks like it's only slightly slower than the current setting (--window=10, I think), and it would make gc --aggressive actually aggressive on big repositories too.

I think the benefit would be that in normal situations there wouldn't be any need to tweak "git repack": just run "git gc --aggressive" after converting a repository and that's it.