My Git Habits
Miles Gould asked his Twitter followers whether they used git-add
-p or git-commit -a and how often. My reply was too
long for Twitter, so here it is.

First the short version: I use git-add -p frequently, and
git-commit -a almost never. The exception is when I'm working
on the repo that holds my blog, where I rarely commit changes to more
than one or two files at a time. Then I'll usually just
git-commit -a -m ....

But I use git-add -p all the time. Typically what will happen
is that I will be developing some fairly complicated feature. It will
necessitate a bunch of changes and reshuffling elsewhere in the
system. I'll make commits on the topic branch as I go along without
worrying too much about whether the commits are neatly packaged.

Often I'll be in the middle of something, with a dirty work tree, when
it's time to leave for the day. Then I'll just commit everything with
the subject WIP ("work-in-progress"). First thing the next
morning I'll git-reset HEAD^ and continue where I left
off.
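
As a rough sketch of that end-of-day routine, here it is in a throwaway repository. The file name and the use of git add -A to sweep up everything are assumptions for illustration, not necessarily the exact commands I type:

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository for the demonstration
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "stable" > feature.txt
git add feature.txt
git commit -q -m "start feature"

# End of day: the work tree is dirty, so commit everything as WIP.
echo "half-finished" >> feature.txt
git add -A
git commit -q -m "WIP"

# Next morning: drop the WIP commit but keep its changes in the work tree.
git reset -q HEAD^

wip_status=$(git status --porcelain)    # the change is back, unstaged
last_subject=$(git log -1 --format=%s)  # the WIP commit is gone from the branch
echo "$wip_status"
echo "$last_subject"
```

The plain (mixed) reset moves the branch back one commit and leaves the files alone, so the work tree looks the way it did when I left.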

So the model is that the current head is usually a terrible mess,
accumulating changes as it moves forward in time. When I'm done, I
will merge the topic into master and run the tests.

If they pass, I am not finished. The merge I just created is only a
draft merge. The topic branch is often full of all sorts of garbage:
commits where I tried one approach, later found it didn't work, and
then tried a different approach; places where I committed debugging
code; and so on. So it is now time to clean up the topic branch. Only
the cleaned-up topic branch gets published.

Cleaning up messy topic branches

The core of the cleanup procedure is to reset the head back to the
last place that looks good, possibly all the way back to the merge-base
if that is not too long ago. This brings all the topic changes into
the working directory. Then:

1. Compose the commits: Repeat until the working tree is clean:

   a. Eyeball the output of git-diff
   b. Think of an idea for an intelligible commit
   c. Use git-add -p to stage the planned commit
   d. Use git diff --cached to make sure it makes sense
   e. Commit it

2. Order the commits: Use git-rebase --interactive

Notice that this separates the work of composing the commits from the
work of ordering them. This is more important than it might appear.
It would be extremely difficult to try to do these at the same time.
I can't know the sensible order for the commits until I know what the
commits are! But it's very hard to know what the commits are without
actually making them.

By separating these tasks, I can proceed something like this: I
eyeball the diff, and the first thing I see is something about the
penguin feature. I can immediately say "Great, I'll make up a commit
of all the stuff related to the penguin feature", and proceed to the
git-add -p step without worrying that there might be other
stuff that should precede the penguin feature in the commit sequence.
I can focus on just getting the penguin commit right without needing
to think about any of the other changes.

When the time comes to put the commits in order, I can do it well
because by then I have abstracted away all the details, and reduced
each group of changes to a single atomic unit with a one-line
description.

For the most complicated cases, I will print out the diffs, read them
over, and mark them up in six colors of highlighter: code to throw
away gets marked in orange; code that I suspect is erroneous is pink.
I make many notes in pen to remind me how I want to divide up the
changes into commits. When a commit occurs to me I'll jot a numbered
commit message, and then mark all the related parts of the diff with
that number. Once I have the commits planned, I'll reset the topic
ref and then run through the procedure above, using git-add
-p repeatedly to construct the commits I planned on paper. Since
I know ahead of time what they are I might do them in the right order,
but more likely I'll just do them in the order I thought of them and
then reorder them at the end, as usual.

For simple cases I'll just do a series of git-rebase
--interactive passes, pausing at any leftover WIP
commits to run the loop above, reordering the commits to squash
related commits together, and so on.

The very simplest cases of all require no cleanup, of course.

For example, here's my current topic branch, called c-domain,
with the oldest commits at the top:

3c5cdd4 (a) was the end-of-day state for yesterday; I made it and
pushed it just before I dashed out the door to go home. Such commits
rarely survive beyond the following morning, but if I didn't make them,
I wouldn't be able to continue work from home if the mood took me to
do that.

f64361f (b) is a prime candidate for later squashing. 5c218fb (c)
introduced a module with a "croak" method. This turned out to be a
stupid idea, because this conflicted with the croak function
from Perl's Carp module, which we use everywhere. I needed
to rename it. By then, the intervening commit already existed. I
probably should have squashed these right away, but I didn't think of
it at the time. No problem! Git means never having to say "If only
I'd realized sooner."

Similarly, 6083a97 (e) added a days_in_year function that I later
decided at 87f3b09 (d) should be in a utility module in a
different repository. 87f3b09 will eventually be squashed into
6083a97 so that days_in_year never appears in this code at all.

I don't know what is in the WIP commits c8dbf41 or 3cd9f3b, for which
I didn't invent commit messages. I don't know why those are left in
the tree, but I can figure it out later.

An example cleanup

Now I'm going to clean up this branch. First I git-checkout -b
cleanup c-domain so that if something goes awry I can start over
completely fresh by doing git-reset --hard c-domain. That's
probably superfluous in this case because origin/c-domain is
also pointing to the same place, and origin is my private
repo, but hey, branches are cheap.
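
A minimal sketch of that safety net, in a throwaway repository whose contents are invented for illustration (only the branch names come from the text above):

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository for the demonstration
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo one > f.txt
git add f.txt
git commit -q -m "one"
git checkout -q -b c-domain
echo two >> f.txt
git add -u
git commit -q -m "two"

# Do the cleanup on a separate branch so that c-domain stays untouched.
git checkout -q -b cleanup c-domain

# Pretend the cleanup went awry...
git reset -q --hard HEAD^

# ...and start over completely fresh.
git reset -q --hard c-domain

git rev-parse cleanup c-domain    # both names point at the same commit again
```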

The first order of business is to get rid of those WIP
commits. I'll git-reset HEAD^ to bring 3c5cdd4 into the
working directory, then use git-status to see how many
changes there are:

The git ix command at the end there is an alias for git diff
--cached: it displays what's staged in the index. The output
looks good, so I'll commit it:

% git commit -m 'mock OpenSRS object; add tests'

Now I want to see if those tests actually pass. Maybe I forgot
something!

% git stash
% make test
...
OK
% git stash pop

The git-stash command hides the unrelated changes from the
test suite so that I can see if the tests I just put into
t/consumer/domain.t work properly. They do, so I bring back
the stashed changes and continue. If they didn't, I'd probably amend
the last commit with git commit --amend and try again.
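
Here is a small sketch of what the stash is doing, using a throwaway repository and two made-up files. The place where make test would run is only marked with a comment, since this toy repository has no test suite:

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository for the demonstration
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo v1 > a.txt
echo v1 > b.txt
git add a.txt b.txt
git commit -q -m "base"

# Two changes in the work tree; only the one to a.txt is committed.
echo v2 > a.txt
echo unrelated > b.txt
git add a.txt
git commit -q -m "update a"

git stash -q                      # hide the leftover change to b.txt
during_stash=$(cat b.txt)         # b.txt is back to its committed content
# (this is where "make test" would run, seeing only committed work)
git stash pop -q                  # bring the leftover change back
after_pop=$(cat b.txt)

echo "$during_stash $after_pop"
```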

That last bit should have been part of the "mock OpenSRS object"
commit, but I forgot it. So I make a fixup commit, which I'll merge
into the main commit later on. A fixup commit is one whose subject
begins with fixup!. Did you know that you can name a commit
by writing :/text, and it names the most recent commit
whose message contains that text?
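
As a sketch of how the two fit together: git commit --fixup takes a commit name and writes the fixup! subject for you, and :/text is one way to supply that name. The repository and file names below are invented; only the commit subject is taken from the example above:

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository for the demonstration
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"
echo mock > mock.txt
git add mock.txt
git commit -q -m "mock OpenSRS object; add tests"
echo other > other.txt
git add other.txt
git commit -q -m "unrelated change"

# :/text resolves to the most recent commit whose message matches the text.
target=$(git rev-parse ':/mock OpenSRS')

# The bit I forgot: commit it as a fixup of the earlier commit.
echo forgotten >> mock.txt
git add -u
git commit -q --fixup=':/mock OpenSRS'

fixup_subject=$(git log -1 --format=%s)
echo "$fixup_subject"
```

Note that once the fixup commit exists, its own subject also contains the text, so :/mock would then name the fixup commit rather than the original.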

By this time all the remaining changes belong in the same commit, so I
use git-add -u to add them all at once. The working tree is
now clean. The history is as I showed above, except that in place of
the final WIP commit, I have:

Because of --autosquash, the git-rebase menu is
reordered so that the fixup commit is put just after
the commit it fixes up, and its default action is 'fixup' instead of
'pick'. So I don't need to edit the rebase instructions at all. But
I might as well take the opportunity to put the commits in the right
order. The result is:
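
Separately from the c-domain branch, here is a minimal sketch of the autosquash behavior in a throwaway repository. The file names are invented, and setting GIT_SEQUENCE_EDITOR=true is just a way to accept the generated rebase instructions unedited so that the example runs unattended:

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository for the demonstration
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"
echo mock > mock.txt
git add mock.txt
git commit -q -m "mock OpenSRS object"
echo other > other.txt
git add other.txt
git commit -q -m "unrelated change"
echo forgotten >> mock.txt
git add -u
git commit -q --fixup=':/mock OpenSRS'

# --autosquash moves the fixup commit to just after its target and marks it
# "fixup"; the no-op sequence editor accepts that plan as it stands.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~3

subjects=$(git log --format=%s | tr '\n' '|')
echo "$subjects"
```

Afterward the fixup commit is gone and its change is part of the commit it fixed up.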

I have two tools for dealing with cleaned-up
branches like this one. One is git-vee, which compares two branches. It's
just a wrapper around the command git log --decorate --cherry-mark
--oneline --graph --boundary A"..."B.
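
A wrapper along those lines could be as small as the shell function below. This is a sketch rather than the real git-vee: the function name vee and the toy branches are made up, and the real tool has defaults that this one lacks:

```shell
set -e

# Hypothetical stand-in for git-vee: compare two branches, A and B.
vee() {
    git --no-pager log --decorate --cherry-mark --oneline --graph --boundary "$1...$2"
}

repo=$(mktemp -d)                 # throwaway repository for the demonstration
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"

# "original" adds x and then y.
git checkout -q -b original
echo x > x.txt
git add x.txt
git commit -q -m "add x"
echo y > y.txt
git add y.txt
git commit -q -m "add y"

# "cleanup" restarts from the base, makes the same change to x, then adds z.
git checkout -q -b cleanup original~2
echo x > x.txt
git add x.txt
git commit -q -m "add x (cleaned up)"
echo z > z.txt
git add z.txt
git commit -q -m "add z"

out=$(vee original cleanup)
printf '%s\n' "$out"
```

The two "add x" commits make the same change, so --cherry-mark flags them as equivalent even though their messages and hashes differ.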

Here's a
comparison of the original c-domain branch and my new
cleanup version:

This clearly shows where the original and cleaned up branches diverge,
and what the differences are. I also use git-vee to compare
pre- and post-rebase versions of branches (with git-vee
ORIG_HEAD) and local branches with their remote tracking branches
after fetching (with git-vee remote or just plain
git-vee).

A cleaned-up branch should usually have the same final tree as the
tree at the end of the original branch. I have another tool, git-treehash,
which compares trees. By default it compares HEAD with
ORIG_HEAD, so after I use git-rebase to squash or to split
commits, I sometimes run "git treehash" to make sure that the tree
hasn't changed. In this example, I do:

which tells me that they are not the same. Most often this
happens because I threw away all the debugging code that I put in
earlier, but this time it was because of that line of superfluous code
I eliminated from HasDomain.pm. When the treehashes differ, I'll use
git-diff to make sure that the difference is innocuous:
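
The idea behind comparing trees can be sketched with git rev-parse and the ^{tree} suffix, which names the tree at a commit. The function name same_tree and the toy branches below are invented for illustration and are not the real git-treehash:

```shell
set -e

# Hypothetical stand-in for git-treehash: succeed if two commits have the
# same tree. Defaults to HEAD and ORIG_HEAD, as described above.
same_tree() {
    a=${1:-HEAD}
    b=${2:-ORIG_HEAD}
    [ "$(git rev-parse "$a^{tree}")" = "$(git rev-parse "$b^{tree}")" ]
}

repo=$(mktemp -d)                 # throwaway repository for the demonstration
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"

# "original" takes two messy commits to reach its final state.
git checkout -q -b original
printf 'code\ndebug\n' > f.txt
git add f.txt
git commit -q -m "WIP"
printf 'code\n' > f.txt
git add -u
git commit -q -m "remove debugging"

# "cleanup" reaches the same state in a single commit.
git checkout -q -b cleanup original~2
printf 'code\n' > f.txt
git add f.txt
git commit -q -m "add code"

if same_tree original cleanup; then verdict=same; else verdict=different; fi
echo "$verdict"

# When the trees differ, the diff shows whether the difference is innocuous.
git --no-pager diff original cleanup
```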

The output of git-treehash says that the tree at the end of
the wip-cleanup branch is identical to the one in the WIP
commit it is supposed to replace, so it's perfectly safe to rebase the
rest of the cleanup branch onto it, replacing the one WIP
commit with the four new commits in wip-cleanup. Now the
cleaned up branch looks like this:

git-vee marks a commit with an equals sign instead of a star
if it's equivalent to a commit in the other branch. The commits in
the middle marked with equals signs are the ones that weren't changed.
The upper WIP was replaced with five commits, and the lower one with
four.

I've been planning for a long time to write a tool to help me with
breaking up WIP commits like this, and with branch cleanup in general:
It will write each changed hunk into a file, and then let me separate
the hunk files into several subdirectories, each of which represents
one commit, and then it will create the commits automatically from the
directory contents. This is still only partly finished, but I think
when it's done it will eliminate the six-color diff printouts.

[ Addendum 20120404: Further observation has revealed that I almost
never use git-commit -a, even when it would be quicker to do
so. Instead, I almost always use git-add -u and then
git-commit the resulting index. This is just an observation,
and not a claim that my practice is either better or worse than using
git-commit -a. ]
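
A small sketch of that git-add -u habit, in a throwaway repository with made-up file names. It stages changes to files git already tracks and leaves untracked files alone:

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository for the demonstration
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo v1 > tracked.txt
git add tracked.txt
git commit -q -m "base"

echo v2 > tracked.txt             # change to a tracked file
echo new > untracked.txt          # brand-new file, not yet tracked

git add -u                        # stage changes to tracked files only
git --no-pager diff --cached      # look over the resulting index
git commit -q -m "update tracked file"

leftover=$(git status --porcelain)
echo "$leftover"
```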