Using Git With Mozilla

October 23, 2010

This summer I went on a quest to improve my workflow. I wasn’t really happy with the standard Mercurial/mq approach used by most Mozilla developers. I spent a while experimenting with alternative ways of using Mercurial, and even did a fair amount of hacking on hg itself to fix some bugs and shortcomings. I wrote quite a long blog post about all of this and almost published it, but in the end I decided that it still wasn’t as good as I’d like it to be.

To its credit, Mercurial’s extension model made all this very doable, and I probably could have continued to cobble together a workflow that did what I wanted. However, I was pretty sure that Git did exactly what I wanted out of the box. So I gave it a shot, and it works even better than I’d hoped.

Brief Aside – A Ten-Second Introduction to Git

Git and Mercurial are quite similar – both use SHA-1 hashes to identify commits. The primary user-facing difference between Git and Mercurial is that Git branches are extremely lightweight. Git is essentially a user-space filesystem, where each commit is represented as a file named by its SHA-1 hash. A branch is nothing more than a smart alias for a hash identifier. So a Git repository consists of 3 primary things (this is a bit of an oversimplification, but it’s fine for our purposes):

An objects directory, which contains a soup of commit files, bucketed into sub-directories.

A refs/heads directory, which contains one file for each named branch. So if I have a branch called collectunderpants whose latest commit is 7bc99958bc164028b94ec47dbf1fb1ad9034c580, there’s a file called refs/heads/collectunderpants whose contents is simply 7bc99958bc164028b94ec47dbf1fb1ad9034c580. That’s all git needs.

A file called HEAD containing the name of the current branch. This is important, because when I make a commit, Git needs to know which branch should be scooted forward to point to the new commit.
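The three pieces above can be inspected directly in a throwaway repository. This is a minimal sketch, assuming git's default on-disk ref storage (loose files that have not been packed); the temporary directory, the file plan.txt and the identity settings are made up for illustration.

```shell
#!/bin/sh
# Scratch repository; paths, file names and identity are illustrative only.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Hacker"
git config user.email "hacker@example.invalid"

# Point HEAD at a branch named after the example in the text, then commit once.
git symbolic-ref HEAD refs/heads/collectunderpants
echo "phase 1" > plan.txt
git add plan.txt
git commit -q -m "Collect underpants"

head_contents=$(cat .git/HEAD)
branch_contents=$(cat .git/refs/heads/collectunderpants)
tip=$(git rev-parse HEAD)

echo "HEAD file:   $head_contents"
echo "branch file: $branch_contents"

# Loose objects are bucketed by the first two hex digits of their hash.
object_file=".git/objects/$(echo "$tip" | cut -c1-2)/$(echo "$tip" | cut -c3-)"
ls "$object_file"
```

The branch file holds nothing but the hash of the tip commit, and HEAD holds nothing but the name of the branch.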

Getting Started

Suspend your disbelief for the time being and assume that I have a git repository called /files/mozilla/link that contains an up-to-date mirror of mozilla-central in git form (I’ll explain how this is done later).

$ cd /files/mozilla
$ git clone link src

After waiting a few moments, I now have a full git repository named src. The default branch is master, which I can see immediately because of a neat shell prompt trick (works best when put in ~/.profile):
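One way to get a prompt like the one used in the examples below is sketched here. It is an assumption about how such a prompt could be built, not necessarily the exact snippet; the function name git_branch_prompt is made up, and the PS1 line assumes a shell such as bash that expands command substitutions in the prompt.

```shell
# Hypothetical helper: prints "(branch) " inside a git repository with a
# checked-out branch, and nothing elsewhere.
git_branch_prompt() {
    branch=$(git symbolic-ref --short -q HEAD 2>/dev/null) || return 0
    printf '(%s) ' "$branch"
}

# Single quotes keep the call unexpanded, so the shell re-runs it for every prompt.
PS1='$(git_branch_prompt)$ '
```

With this in place, a shell sitting in a checkout of master shows `(master) $` as its prompt.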

So I’m on master. Unfortunately, I check TBPL and it looks like the tree is burning as a result of another Jonas Sicking push-and-run. The last green commit was 5 changesets back, so I want to base my work off of that.

(master) $ git checkout -b master-stable master~5
(master-stable) $

This makes a new branch called master-stable based 5 commits back from the commit pointed to by master, and switches the working directory to it.
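To see the relative-revision syntax in isolation, here is a small sketch in a scratch repository. The six commits and the file name are invented for illustration; only the checkout line mirrors the command above.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Hacker"
git config user.email "hacker@example.invalid"
git symbolic-ref HEAD refs/heads/master

# Six commits, so that master~5 (five parents back from the tip) exists.
for n in 1 2 3 4 5 6; do
    echo "change $n" > file.txt
    git add file.txt
    git commit -q -m "changeset $n"
done

# New branch five commits behind master, checked out immediately.
git checkout -q -b master-stable master~5

current=$(git symbolic-ref --short HEAD)
subject=$(git log -1 --format=%s)
echo "now on $current at: $subject"
```

The master branch itself is untouched; master-stable is just a second alias pointing further back in the same history.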

I make a .mozconfig, set the objdir to /files/mozilla/build/main, make -f client.mk, and go shoot some nerf darts at dolske. A short while later, I’ve got a full build waiting for me in /files/mozilla/build/main.

The ability to reference revisions symbolically (relative to either heads or branches) is really nice, and is something that I missed with Mercurial. Edit: bz points out in the comments that this is actually possible with Mercurial.

Now suppose I get an idea for a quick one-off patch, and hack on a few files. To save this work (along with its ancestry), I create a branch off the current head:
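A sketch of those two steps, run in a scratch repository so that it is self-contained. The setup lines, the file widget.txt and the commit messages are assumptions made for illustration; the branch names come from the text.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Hacker"
git config user.email "hacker@example.invalid"
git symbolic-ref HEAD refs/heads/master-stable
echo "original" > widget.txt
git add widget.txt
git commit -q -m "Starting point"

# ...hack on a few files...
echo "quick fix" >> widget.txt

# New branch at the same commit; uncommitted changes come along for the ride.
git checkout -q -b oneoff
# Commit every modified tracked file, skipping the manual staging step.
git commit -q -a -m "One-off patch"

git log --format='%h %s' -2
```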

The first command creates a new branch called oneoff that points to the same commit as master-stable. The second creates a new commit containing the changes in the working directory. The reason for the -a option has to do with a git feature called the “index”, which is a staging area between your working directory and full-blown commits. I don’t want to digress too much, but you should definitely read more about it.

Remember that branches are just aliases to SHA-1 identifiers, which in turn are used to locate the actual commit in the soup. So oneoff is an alias for a SHA-1 identifier which points to the new commit. That commit knows the hash of its parent, which is the same hash pointed to by master-stable. Git commits are immutable, since their names are a cryptographic function of their contents (so if a commit changes, it’s really just a new commit). Furthermore, git is garbage collected when you call git gc. So objects in git are just like immutable objects in a garbage-collected language. For example, suppose we want to modify that commit we just made:

(oneoff) $ ...more hacking...
(oneoff) $ git commit -a --amend

Normally git commit makes a new child of the previous commit. However, the --amend option makes a new sibling that combines the previous commit with any working changes, and points the branch and head to it. The old commit is still there, but is now orphaned, and will eventually be removed by git gc (once the reflog entry that still references it has expired).
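The sibling relationship can be checked by recording the hash before and after amending. This is a minimal sketch in a scratch repository; the file name and commit messages are invented.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Hacker"
git config user.email "hacker@example.invalid"
git symbolic-ref HEAD refs/heads/oneoff
echo "base" > notes.txt
git add notes.txt
git commit -q -m "Base"

echo "first attempt" >> notes.txt
git commit -q -a -m "One-off patch"
old=$(git rev-parse HEAD)

# ...more hacking...
echo "more hacking" >> notes.txt
git commit -q -a --amend -m "One-off patch"
new=$(git rev-parse HEAD)

echo "old: $old"
echo "new: $new"
# Both commits share a parent, and the old one is still present in the object store.
git rev-parse "$old^" "$new^"
git cat-file -t "$old"
```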

Rebasing

I use one branch per bug, and one commit per patch. This allows me to model my patches as a DAG, where patches are descendants of work they depend on. Contrast this with the MQ model, where a linear ordering is forced upon possibly unrelated patches.

Suppose I’m doing some architectural refactoring in a bug called substrate, and using the clean new architecture in a feature bug called bling. Initially, I start work on bling as follows:

At this point, I’d really like to get back to working on bling, but unfortunately bling isn’t yet based on the latest patch in substrate. To fix this, we need to rebase:

(bling) $ git rebase --onto substrate bling~3 bling

This tells git to take all the changesets in the range (bling~3, bling] and apply them incrementally as commits on top of substrate. If there are conflicts, I’m given the opportunity to resolve them, or to abort the whole endeavor. Once the rebase is complete, the branch bling is updated to point to the new, rebased tip. Now I can reapply my work-in-progress and get back to business:
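The whole scenario can be replayed in a scratch repository. This sketch uses the `git rebase --onto <newbase> <upstream> <branch>` form, which moves the commits after `<upstream>` up to `<branch>`; the helper mkcommit, the file names and the number of commits are assumptions chosen to match the three bling patches in the text.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Hacker"
git config user.email "hacker@example.invalid"

# Hypothetical helper: one new file per commit, so nothing can conflict.
mkcommit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

git symbolic-ref HEAD refs/heads/substrate
mkcommit base
mkcommit substrate-1

# bling starts from substrate and gains three patches.
git checkout -q -b bling
mkcommit bling-1
mkcommit bling-2
mkcommit bling-3

# Meanwhile substrate moves on, leaving bling based on an old commit.
git checkout -q substrate
mkcommit substrate-2

# Replay the three bling patches on top of the new substrate tip.
git checkout -q bling
git rebase -q --onto substrate bling~3 bling

git log --format='%h %s' substrate~1..bling
```

After the rebase, the three bling commits sit directly on top of substrate-2.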

(bling) $ git stash pop

Rewriting History

My code is always perfect the first time I write it, but suppose for the sake of argument that Joe gets a bee in his bonnet and I have to alter patch 7 of 18 in bigbug to appease him. I could do it the long way:

So if I change the pick on the first line to edit (or just e), git brings me to that revision, lets me edit it, and does all the rebasing for me. Huzzah!
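The same flow can be scripted for demonstration purposes. In this sketch, GIT_SEQUENCE_EDITOR stands in for editing the todo list by hand, and sed rewrites the first pick into edit; the four commits and file names are invented, and `sed -i` is assumed to behave as in GNU sed.

```shell
#!/bin/sh
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Hacker"
git config user.email "hacker@example.invalid"
git symbolic-ref HEAD refs/heads/bigbug

for n in 0 1 2 3; do
    echo "patch $n" > "patch-$n.txt"
    git add "patch-$n.txt"
    git commit -q -m "patch $n"
done

# Stand-in for hand-editing the todo list: turn the first "pick" into "edit".
GIT_SEQUENCE_EDITOR="sed -i -e '1s/^pick/edit/'" git rebase -q -i HEAD~3

# git has stopped right after applying "patch 1"; revise it in place.
echo "patch 1 (revised)" > patch-1.txt
git commit -q -a --amend --no-edit

# Replay the remaining patches on top of the revised one.
GIT_EDITOR=true git rebase --continue

git log --format='%h %s' -4
```

In everyday use the todo list simply opens in your editor, and the only scripted part left is the amend followed by `git rebase --continue`.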

Pushing to Bugzilla

One nice bonus of git is an add-on developed by Owen Taylor called git-bz. I’ve made some modifications to it to make it more mozilla-friendly, and haven’t yet found the time to make them upstreamable. So in the meantime, I’d recommend that you grab my fork, git-bz-moz.

While it does a lot of things, my favorite part of git-bz is pushing to bugzilla. For credentials, git-bz uses the login cookie of your most recently opened Firefox profile – so if you’re already logged into BMO things should work seamlessly. Let’s say I want to attach all 18 patches of bigbug to bug 513681. I run:

(bigbug) $ git bz attach --no-add-url -e 513681 HEAD~18..HEAD

And then I’m presented with a sequence of 18 files to edit in my editor, each of which looks like the following:

This pulls the relevant data from the bug, and lets me do a lot in one edit. I can set the patch description, add a comment in the bug, edit the commit message (for facilitating hg qimport), obsolete other patches in the bug, flag for review, and grant self-review. I’ve found this to be a massive timesaver when working on many-part bugs.

When I want to push, I just qimportbz from the bug. This gives me an incentive to make sure that the patches committed are the ones on bugzilla.

Aside – I haven’t done much active development since the end of August, and git-bz just choked on the cookie database of a recent nightly when I tried it. A 3.6 profile still works fine though.

Edit – dwitte points out in the comments that this is due to a change in the sqlite database format, and should be fixed by upgrading to sqlite 3.7.x.

Multiple Working Directories

The ability to multitask is crucial to being productive in the Mozilla ecosystem. I can be waiting on tryserver results for one patch, guidance from bz on a second, review from Jeff on a third, and a dependent patch from Joe for a fourth. I need to be able to work on multiple patches at once, and context-switch quickly.

In theory, multitasking with git is quite simple: just do a git checkout of the branch you want to work on. However, some code changes require significant rebuilding. For example, if I have a patch that modifies nsDocument.h, context-switching between that patch and any other patch incurs a massive recompilation burden.

I’ve heard through the grape-vine that bz manages this problem by having 8 different mercurial repositories (each with its own object directory), and economizing on space via hardlinks. This eliminates the recompilation burden, but doesn’t allow work to be easily shared between repositories. For example, I might want to give both bling and substrate separate object directories, but still be able to rebase bling on top of new code in substrate.

This gives me a full working directory and a lightweight repository that is composed mostly of symlinks to files in ../src/.git/. Everything is shared seamlessly between them, and just about the only thing private to the new repository is the HEAD file, which specifies the checked-out branch. I can then make a .mozconfig pointing to a new object directory in /files/mozilla/build/a, and build away.
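Newer versions of git ship a built-in command, `git worktree`, that covers the same need. It is a different mechanism from the symlink-based setup described above: the extra working directory gets a small .git file pointing back at the main repository instead of a directory of symlinks. A minimal sketch, assuming git 2.5 or later; the directory names src and a echo the layout in the text, everything else is invented.

```shell
#!/bin/sh
set -eu
root=$(mktemp -d)
git init -q "$root/src"
cd "$root/src"
git config user.name "Example Hacker"
git config user.email "hacker@example.invalid"
git symbolic-ref HEAD refs/heads/substrate
echo "architecture" > core.txt
git add core.txt
git commit -q -m "substrate work"
git branch bling

# Second working directory for bling, sharing the object store of src.
git worktree add "$root/a" bling >/dev/null 2>&1

# A commit made in the new working directory...
cd "$root/a"
echo "shiny" > bling.txt
git add bling.txt
git commit -q -m "bling work"

# ...is immediately visible from the original one, with no push or pull.
cd "$root/src"
git log --format='%h %s' -1 bling
```

Each working directory can then get its own .mozconfig and object directory, while rebasing bling onto substrate still works from either one.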

The Link

Earlier in this post I promised to explain where /files/mozilla/link came from.

I initially started using git with a mirror maintained by Julien Rivaud. Unfortunately, there was some flakiness with the cron job, and the repository would often stop updating from mozilla-central. So I decided to generate my own mirror. Edit: Julien mentions in the comments that the repository should be reliable now. Give it a shot!

Long story short: don’t use hg-git. It chokes miserably on mozilla-central. Instead, use hg-fast-export. Let it run overnight, and it should be done in the morning. Incremental updates are also very fast (roughly linear in the number of new commits), so I don’t ever find myself waiting for it.

Other Thoughts

From a general zippiness standpoint, git seems about 5 times faster than Mercurial. Your mileage may vary.

Overall, I really like the garbage-collection model of git. With Mercurial, rewriting history involves stripping entries out of the repository, which can be very slow. With git, unwanted objects go away just by redirecting pointers, and they’re still recoverable (with careful munging) until the next git gc.

I’ve found that I’m spending a lot less time dealing with merge conflicts than I did when I was using hg/mq. Git seems to be pretty smart about these things, and I think it uses 3 lines of context internally. In contrast, it’s standard to use 8 lines of context for mq patches so they can be easily exported to bugzilla. I’ve modified git-bz to generate 8 lines of context when posting to bugzilla, which allows me to be more efficient locally while still sharing my work in the appropriate format.
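The amount of context in a generated patch is controlled by git's `-U<n>` diff option. The sketch below only illustrates that option on a scratch file; it does not reproduce the git-bz change itself, and the 30-line file is made up.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Hacker"
git config user.email "hacker@example.invalid"

# A 30-line file, committed, then changed on line 15 only.
n=1
while [ "$n" -le 30 ]; do
    echo "line $n"
    n=$((n + 1))
done > big.txt
git add big.txt
git commit -q -m "Add big.txt"
sed '15s/.*/line 15 (changed)/' big.txt > big.tmp
mv big.tmp big.txt

# Context lines in a unified diff start with a single space.
narrow=$(git diff -U3 | grep -c '^ ')
wide=$(git diff -U8 | grep -c '^ ')
echo "context lines with -U3: $narrow"
echo "context lines with -U8: $wide"
```

The same option is accepted by `git format-patch`, so wider context can be requested only at export time.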

There’s lots more to say about git, but I think that this is enough for now. Share your experiences in the comments!

I did use hg-fast-export for a while, but the sad thing about it is that it can’t export directly from a remote mercurial repository. You need to first create a local clone for it to start from.

Another way to get mercurial repositories in git is git-remote-hg, but I had problems with it (it kept starting over as if it never cloned anything). I’d like to get it to work, and to give it write/push support… It has the advantage of being much more integrated with git (the hg repo is then just a “normal” git remote)

I found the reason for the frequent errors of hg: it was caused by the g+s flag set on the files by the friend hosting the updater. It turns out that hg sometimes replaces files in its store with directories, and when doing so tries to preserve ownership and group; but it has no right to set the group to its wanted value, so it fails.

Now the cron job seems to only fail when hg.mozilla.org answers with 500 internal error. So no more flakiness. Yay!

I still have to check for those errors and reenable the update (which disables itself when there is an error to prevent corruption of the hg repo, which used to be very frequent due to a bug in mercurial).

Bobby, perhaps I can give you my git-fast-export status files so that we are certain we work from the same basis, and give you push rights on the repo.or.cz mirror, so that any of us can update it (we would just have to warn each other when we change our version of git-fast-export to ensure we generate the same SHAs for the same commits).

I really think it is important to give others a “canonical” mirror to base upon, for one because it is sometimes impractical to handle the conversion locally, and because having a canonical mirror ensures everybody uses the same SHAs for the commits and thus can share work/recover (I once lost my local mirror and had to redo it completely, which takes a long time).

I won’t be doing much active Mozilla work until August, so I’m probably not the best person to give push access to at the moment. If you’ve figured out the source of the lockup, perhaps the cron job is sufficient from here on out?

Hey, great post, thanks! First of all, I’m a big git fan and I loved your sharing your experiences. However, I just couldn’t make the switch from hg to git in my mozilla-related work because there’s a feature with the current hg/mq setup that I can’t easily recreate with git: it’s the fact that I can apply all of my patches just to create a “merged branch”, and build with all these patches together.

Let me explain. Clearly, if my patch B depends on my patch A, in hg I put patch B after patch A in my .hg/patches/series file, and in git, I just create a branch B from branch A and rebase accordingly (just like you said). Now let’s say I have a patch C that is in no way related to patch A or patch B. That’s a third branch. What if I want to build with patches A, B and C? With mq, it’s as simple as running hg qpush 2 (to apply the three patches). In git, I have to create a temporary branch and merge branch B and C together. Less convenient…

That’s a good point. The real answer is that this doesn’t come up all that often for me. mozilla-central is generally pretty stable, so the hydra approach works well for me. My small patches have a short enough development cycle that I can usually live without them for the few days that they take to land, and my big patches are invasive enough that I really do want them to be separate.

However, if I needed to do something like that, I’d probably just commit any common patches to master-stable (the common ancestor of most of my active work).