Git for the real world

13 Jul 2008

Now that we’ve been using git at Twitter for a couple of months, we’ve
overcome several crippling problems and misunderstandings about how to use it
properly. There are dozens of “intros” and “tutorials” to git online, but at
some point you need to know more than just the basics of DVCS and the map to
svn commands – you need to know practical considerations of real-world usage.
None of the intros or tutorials had this stuff, so I thought I’d share what we
learned.

Git’s command-line interface is hands-down the worst of any DVCS (except the
archaic tla). There are inconsistencies: Some commands will expect you to
type “origin/master”, while others will want “origin master”. Other commands
should never ever be used, but are presented in the documentation as if
they’re part of a normal usage pattern. Some commands are useless in their
default form and need several command-line options to make them work right.

I ended up writing a wrapper script to cover up a lot of these flaws, which I
consider an “ultimate fail” for a UI. But I’m still not sure the script is a
good idea, since it may make me forget all the quirks I need to keep in mind
when the script isn’t around.

Don’t change history

Two commands you should avoid: git rebase and git reset. Some of
the tutorials will tell you that rebase is one of the first commands you
should learn. Lies! rebase is a way to trick you into creating merge
conflicts.

When you rebase, you are erasing every local commit you’ve made, and turning
them into patches (as if you were back on CVS). After syncing your repository
up with the remote one, your patches are re-applied one by one. Presto! you’ve
changed history.

The only reason I can think of for doing this is if you’re not comfortable
doing merges. But DVCS is all about merges, so you should just get used to
doing them. A merge provides a little signpost to everyone else about your
branch. Don’t fear the merge – love it! It records exactly how your local
work should be rectified with remote changes, without requiring you to keep
tweaking your patch.
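To make the difference concrete, here is a minimal sketch in a throwaway repository. The branch names (mywork, merged, rebased) and file names are made up for illustration. It records the ID of a local commit, then compares what a merge and a rebase each do to it.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name regardless of local defaults
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Shared starting point.
echo base > base.txt
git add base.txt
git commit -q -m "base"

# One local commit on a side branch.
git checkout -q -b mywork
echo local > local.txt
git add local.txt
git commit -q -m "L1"
original_l1=$(git rev-parse HEAD)

# Meanwhile, "remote" work lands on master.
git checkout -q master
echo remote > remote.txt
git add remote.txt
git commit -q -m "R1"

# Option 1: merge. L1 keeps its identity and becomes the merge's first parent.
git checkout -q -b merged mywork
git merge -q -m "M1" master
l1_after_merge=$(git rev-parse "HEAD^1")

# Option 2: rebase. L1 is replayed as a patch on top of R1 and gets a new ID.
git checkout -q -b rebased mywork
git rebase -q master
l1_after_rebase=$(git rev-parse HEAD)

echo "original:     $original_l1"
echo "after merge:  $l1_after_merge"
echo "after rebase: $l1_after_rebase"
```

The merged branch still contains the original L1 commit. The rebased branch contains a different commit with the same message, which is why anyone who already had the original will see the two histories as diverged.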

reset is even worse. It erases commits from your history, which will
very likely make your local repository different from everyone else’s, and
guarantee conflicts later on, or even leave you unable to push. Some
people will say you should learn reset so you can use it in a panic
situation, but if you’re panicking, you’re more likely to make things worse,
so stop. Calm down. You have time to think and solve the problem in a
rational way.

My problem with these two commands is that they violate a core philosophy of
DVCS: Everyone has their own view of the repository, but these views obey
entropy and flow in only one time direction. When they meet, they merge. Doing
a rebase or reset goes back in time and changes the past. They
should be in a separate tool, like “git-fix” or “git-hack”.

The story matters more than the chronology

Have you ever read a history book that said “In 1812, the British empire
shelled the tiny new American capital. Meanwhile, Napoleon marched across
Europe. In China, …”? Hopefully not; that would suck. Telling a thread of
the story from beginning to end is more important than placing every single
event in its exact chronological order. By default, git log sorts commits
by their exact date and time, newest first, so you need to be aware of
that and not get confused.

Say, for example, you made a local branch, and made 3 local commits: L1, L2,
and L3. Meanwhile, someone else is working on a different feature on their
branch, and does commits R1 and R2. After you merge (M1), git is likely to
show you a history like this:

M1 -> R2 -> L3 -> L2 -> R1 -> L1

Huh? What? Why are my local commits intermixed with my co-worker’s commits?
The merge must have messed up! Crap! Time to git reset and destroy
everything, right? No! Stop! Don’t do it. It’s a trap! Git is lying by
omission – it’s telling the literal, actual truth, but it’s telling it to you
in a way that makes it confusing. Git is re-ordering the history to make sure
every commit is shown in its actual time order, not the story order.
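If you want to see this for yourself, the following sketch reproduces the scenario in a scratch repository. The commit_at helper, the branch names (mine, theirs) and the timestamps are all invented for the example. The timestamps are forced so that the two branches really do interleave in time.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Hypothetical helper: commit_at <unix-timestamp> <message>
commit_at() {
  echo "$2" > "$2.txt"
  git add "$2.txt"
  GIT_AUTHOR_DATE="$1 +0000" GIT_COMMITTER_DATE="$1 +0000" \
    git commit -q -m "$2"
}

commit_at 1215000000 base
git branch theirs                 # the co-worker's branch starts here
git checkout -q -b mine           # and so does yours

commit_at 1215000100 L1
git checkout -q theirs; commit_at 1215000200 R1
git checkout -q mine;   commit_at 1215000300 L2
commit_at 1215000400 L3
git checkout -q theirs; commit_at 1215000500 R2

# Merge the co-worker's branch into yours, creating M1.
git checkout -q mine
GIT_AUTHOR_DATE="1215000600 +0000" GIT_COMMITTER_DATE="1215000600 +0000" \
  git merge -q -m "M1" theirs

# Plain git log: sorted by time, so the two branches are shuffled together.
default_order=$(git log --format=%s | tr '\n' ' ')
echo "default order: $default_order"

# The graph itself is intact: HEAD~2 is L2, and its parent HEAD~3 is still L1.
parent_of_l2=$(git log -1 --format=%s "HEAD~3")
echo "parent of L2:  $parent_of_l2"
```

Nothing is wrong with the merge. R1 shows up between L2 and L1 only because of when it was committed, not because it sits between them in the history.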

You should probably just go ahead and alias log to:

git log --topo-order --decorate

That tells git to show things in “topological” (story) order, and to also mark
where various branches are sitting. I usually find it useful to take that one
step further:

git log --topo-order --decorate --first-parent

That tells git to show things in story order and to tell that story from
my point of view. It’s sometimes interesting to see every commit that one of
your coworkers did in their branch, but often you just want to see the
merge-commit and move on. The --first-parent option tells git to follow only
the first parent of each merge, skipping the details of every merged-in branch.
Generally this means you’ll see a simplified history of what’s been going on,
without the intricacies of what happened on forked branches while they were
forked off.
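One way to set this up is with git's own aliases. Git ignores aliases that shadow a built-in command, so log itself can't be redefined this way. The names lg and story below are arbitrary picks for this sketch, and so is the save helper. The aliases are stored in a scratch repository here. Adding --global to git config would apply them everywhere.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Hypothetical helper: save <message> commits a new file named after the message.
save() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

save base
git checkout -q -b feature
save F1
save F2
git checkout -q master
save C1
git merge -q -m "M1" feature      # master moved too, so this is a true merge

# Repository-local aliases; use "git config --global" to set them for every repo.
git config alias.lg "log --topo-order --decorate"
git config alias.story "log --topo-order --decorate --first-parent"

git lg --oneline       # every commit, with each branch kept together
git story --oneline    # only master's own line of history

# Extra arguments are appended to the alias expansion.
everything=$(git lg --format=%s | tr '\n' ' ')
my_story=$(git story --format=%s | tr '\n' ' ')
```

With --first-parent the feature commits F1 and F2 are hidden behind the merge M1. Without it they are listed individually.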

If you want to see all the threads of history intertwined, I suggest using a
graphical tool like gitk instead of git log.

Don’t fast-forward – live every moment

This one is pretty confusing. And it sucks, because this concept doesn’t even
exist in other DVCS. I think it’s another symptom of “fear of merge”.
Basically, sometimes when you ask git to merge branch A into branch B, it will
decide that it doesn’t want to merge and it will instead turn A and B into
clones of each other.

For example, let’s pretend you made a branch of “master” called “feature” and
did a few commits on it, and are now ready to merge it back into master. If no
other work has happened on the master branch, git will try to out-clever you.
It thinks: “Well, nobody else has worked on the master branch, so I could just
make the feature branch become the new master branch and
that would be logically equivalent.” So after the merge, you’ll see every
single commit you made, as if you had done them directly on the master branch.
Git has cloned your feature branch into the new master branch.

This might not be so bad if there are only a couple of people working on the
project, but there are a few side effects: Your branch has effectively
vanished from history. There is no longer any indication that you were working
on a side branch; it looks like you were working directly on master. And if it
turns out that there were bugs in your new feature (which, you know, sometimes
happens), you can’t reverse the merge-commit because
there is no merge-commit. You will have to reverse every single commit you
made, in reverse order, or worse.

So really, you want git to always create a merge-commit when you do a merge.
For this, you have to ask it nicely:

git merge --no-ff

(Git calls this history-erasing behavior “fast-forwarding”.)
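Here is a small sketch of the difference, again in a throwaway repository. The save and tip_kind helpers and the ff-demo branch are made up for the example. It merges the same feature branch twice, once with fast-forwarding allowed and once with --no-ff, then backs the feature out with a single revert.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Hypothetical helper: save <message> commits a new file named after the message.
save() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

# Hypothetical helper: reports whether the current tip has a second parent.
tip_kind() {
  if git rev-parse -q --verify "HEAD^2" >/dev/null; then
    echo merge-commit
  else
    echo plain-commit
  fi
}

save base
git checkout -q -b feature
save F1
save F2

# Fast-forward: master has not moved, so git just slides the branch forward.
git checkout -q -b ff-demo master
git merge -q --ff feature
ff_tip=$(tip_kind)

# With --no-ff, git records a real merge commit on master.
git checkout -q master
git merge -q --no-ff -m "Merge branch 'feature'" feature
noff_tip=$(tip_kind)

echo "fast-forward tip: $ff_tip"
echo "--no-ff tip:      $noff_tip"

# Backing out the whole feature is now one revert of that merge commit.
# -m 1 names the first parent (master's side) as the line to keep.
git revert --no-edit -m 1 HEAD >/dev/null
```

After the revert, the files added by F1 and F2 are gone from master in one step. On the ff-demo branch there is no merge commit to point the revert at.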

A few other things

To remove a branch from a remote repository after it’s been merged and
deployed, you have to push the branch with a colon in front of the branch
name. This has become a running joke in the office: “Colon means delete.”
Look, don’t ask me, I’m not Linus. That’s just how it works.

git push origin :stale_completed_branch
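For anyone who wants to try this safely, the sketch below uses a local bare repository as a stand-in for the shared remote. The paths and file names are invented for the example.

```shell
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"      # stand-in for the shared remote
git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
git remote add origin "$tmp/origin.git"

echo hello > hello.txt
git add hello.txt
git commit -q -m "first commit"
git branch stale_completed_branch
git push -q origin master stale_completed_branch

# Nothing before the colon means "push nothing to this ref", which deletes it.
git push -q origin :stale_completed_branch

# Newer versions of git also accept a spelled-out form:
#   git push origin --delete stale_completed_branch

remaining=$(git ls-remote --heads origin)
echo "$remaining"
```

The colon form is a push refspec of the shape source:destination with an empty source, which is where "colon means delete" comes from.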

When other people remove branches, they won’t be removed from your local copy
of the repository. To take care of this housekeeping, you need to express a
fruit preference:

git remote prune origin

Again, don’t ask. I don’t know why. That’s just how it is.
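A rough sketch of the housekeeping, using two scratch clones of a local bare repository. The names alice and bob are placeholders. Bob's clone keeps listing the deleted branch until he prunes.

```shell
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"
git --git-dir="$tmp/origin.git" symbolic-ref HEAD refs/heads/master

# Alice publishes master plus a short-lived branch.
git init -q "$tmp/alice"
cd "$tmp/alice"
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice Example"
git config user.email "alice@example.com"
git remote add origin "$tmp/origin.git"
echo hello > hello.txt
git add hello.txt
git commit -q -m "first commit"
git branch stale_completed_branch
git push -q origin master stale_completed_branch

# Bob clones while the branch still exists.
git clone -q "$tmp/origin.git" "$tmp/bob"

# Alice deletes the branch on the remote.
git push -q origin :stale_completed_branch

# Bob's copy still lists origin/stale_completed_branch until he prunes.
cd "$tmp/bob"
before=$(git branch -r)
git remote prune origin >/dev/null
after=$(git branch -r)

echo "before prune:"; echo "$before"
echo "after prune:";  echo "$after"

# "git fetch --prune" does the fetch and the pruning in one step.
```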

I have a few ranty topics on how git is implemented and used, and how that
compares with the older DVCS (especially bazaar), but I’ll save that for some
other time. If you’re using git, hopefully this information is useful.