In the Q&A section after the git presentation, there was a lot of
heated debate in which it seemed that Jan and Jens were talking "past" each
other. As a git backer, I thought I'd try to
bring some clarity to things.

It seemed that Jens has one fundamental problem with git, which
itself is fundamental to its operation: commits are not
transferred to the remote module; instead, you need an explicit
git-push command to send all local changes to the remote
repository. Jens claimed three implications of this (that I remember):

git did not permit line-by-line authorship information, as with
cvs annotate or svn blame.

Developers would not see changes made by other developers as soon as
they happen.

QA and Release Engineering wouldn't be alerted as soon as developers
made any change on any child workspace.

The line-by-line authorship information is possible in git
with the git blame or git annotate commands (they are
synonyms for each other). I suspect I misinterpreted this part of the debate,
as all parties should have known that git supported this.
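For the record, here is a minimal sketch of what that looks like. It builds a throwaway repository with two commits by two different authors and then asks git who wrote each line. The file name, author names, and email addresses are all made up for illustration.

```shell
#!/bin/sh
set -e

# Throwaway repository; every name and address below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# First author adds the first line.
echo "first line" > notes.txt
git add notes.txt
git -c user.name="Alice Example" -c user.email=alice@example.com \
    commit -q -m "Add first line"

# Second author appends another line.
echo "second line" >> notes.txt
git -c user.name="Bob Example" -c user.email=bob@example.com \
    commit -q -a -m "Add second line"

# Show the commit, author, and date responsible for each line.
blame_output=$(git blame notes.txt)
echo "$blame_output"
```

Each output line starts with an abbreviated commit ID followed by the author and timestamp, so the first line is attributed to one author and the second line to the other.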

Which leaves the other two issues, which (again) are fundamental to
git: a commit does not send any data to the remote repository. Thus
we get to the title of this blog entry: this is a Good Thing™.
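To make the commit/push split concrete, here is a small sketch, assuming a local bare repository stands in for the remote server. The paths, file name, and identity are hypothetical. It commits locally, checks what the "remote" can see, and only then pushes.

```shell
#!/bin/sh
set -e

# A bare repository plays the role of the remote server (an assumption
# made so the example runs without network access).
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git clone -q "$work/remote.git" "$work/local" 2>/dev/null

cd "$work/local"
git config user.name "Example Dev"
git config user.email dev@example.com

# Commit locally; nothing is transferred yet.
echo "draft" > file.txt
git add file.txt
git commit -q -m "Work in progress"

# Count the refs the remote knows about before and after the push.
before=$(git ls-remote "$work/remote.git" | wc -l | tr -d ' ')
git push -q origin HEAD
after=$(git ls-remote "$work/remote.git" | wc -l | tr -d ' ')

echo "remote refs before push: $before"
echo "remote refs after push:  $after"
```

Before the push the remote has no refs at all, even though the commit already exists locally; only the explicit push makes it visible to anyone else.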

Local commits are world changing in a very small way: they're insanely
fast, much faster than Subversion. (For example, committing a one-line
change to a text file under a Subversion remote directory took me 4.775s;
a similar change under git is 0.246s -- 19x faster -- and this is a
small Subversion module, ~1.5MB, hosted on the ximian.com Subversion
repo, which never seems as loaded as the openoffice.org servers.)

What can you do when your commits are at least 19x faster? You
commit more often. You commit when you save your file (or soon
thereafter). You commit when your code is 99.995% guaranteed to be
WRONG.

Why do this? Because human memory is limited. The classic estimate
(George Miller's "magical number seven") is that the average person can
hold 7±2 items in short-term memory at a time before they start
forgetting things. This matters because a single bug may
require changes to multiple different files, and even within a single file
your memory will be filled with such issues as what's the scope of this
variable?, what's the type of this variable?, what's this
method do?, what bug am I trying to fix again?, etc.
Human short-term memory is very limited.

So what's the poor developer to do? Most bugs can be partitioned in some
way, e.g. into multiple methods or blocks of code, and each such
block/sub-problem is solved sequentially -- you pick one sub-problem, solve it,
test it (individually if possible), and continue to the next sub-problem.
During this process and when you're finished you'll review the patch
(is it formatted nicely?, could this code be cleaned up to be more
maintainable?), then finally commit your single patch to the
repository. It has to be done this way because if you commit at any
earlier point in time, someone else will get your intermediate (untested)
changes, and you'll break THEIR code flow. This is obviously bad.

During this solve+test cycle, I frequently find that I'll make a set of
changes to a file, save it, make other changes, undo them, etc. I
never close my file, because (and here's the key point) cvs
diff shows me too many changes. It'll show me the changes I made
yesterday as well as the changes I made 5 minutes ago, and I need to
keep those changes separate -- the ones from yesterday (probably) work, the
ones from 5 minutes ago (probably) don't, and the only way I can possibly
remember which is the set from 5 minutes ago is to hit Undo in my editor and
find out. :-)

So git's local commits are truly world-changing for me: I can
commit something as soon as I have it working for a (small) test
case, at which point I can move on to related code and fix that
sub-problem, even (especially) if it's a change in the same file. I need an
easy way to keep track of which are the solved problems (the stuff I fixed
yesterday) and the current problem. I need this primarily because the current
problem filled my 7±2 memory slots, and I'm unable to easily
remember what I did yesterday. (I'm only human! And "easily remember"
means "takes less than 0.1s to recall." If you need to think you've
already lost.)
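A rough sketch of that workflow, using a made-up file and made-up commit messages: yesterday's fix is checkpointed as a local commit, so the diff afterwards contains only the change from 5 minutes ago.

```shell
#!/bin/sh
set -e

# Throwaway repository; file contents and identity are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Dev"
git config user.email dev@example.com

printf 'alpha\nbeta\n' > code.txt
git add code.txt
git commit -q -m "Initial version"

# "Yesterday's" fix: it probably works, so checkpoint it locally.
printf 'alpha fixed\nbeta\n' > code.txt
git commit -q -a -m "Fix alpha (lightly tested)"

# The change from "5 minutes ago": still in progress, not committed.
printf 'alpha fixed\nbeta experiment\n' > code.txt

# The diff now covers only the uncommitted experiment.
current_diff=$(git diff)
echo "$current_diff"
```

The committed fix shows up only as unchanged context; the lines marked with - and + belong to the experiment alone, so there is nothing to remember and nothing to Undo.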

This is why I think the other two issues -- developers don't see other
changes instantly, and neither does QA -- are non-issues. The delay is a
feature.

So let's bring in a well-used analogy for programming: writing a book.
You write a paragraph, spell check it, save your document, go on to another
paragraph/chapter, repeat for a bit, then review what was written. At any
point in this process, you'll be ready to Undo your changes because you
changed your mind. Changes may need to occur across the entire manuscript.

Remote commits are equivalent to sending each saved manuscript to the
author's editor. If someone is going to review/use/depend upon your change,
you're going to Damn Well make sure that it Works/is correct before you send
that change.

Which brings us to the workflow dichotomy between centralized source code
managers (cvs, svn) and distributed managers (git
et al.). Centralized source managers by design require more developer
effort, because the developer needs to manually track all of the individual
changes of a larger work/patch before sending it upstream (as described
above).

Decentralized source managers instead help the developer with the
tedious effort of tracking individual changes, because the developer can commit
without those changes being seen/used by anyone else. The commit instead gets
sent when the developer is done with the feature.
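As an illustration (a sketch only, with a local bare repository standing in for the shared one, and hypothetical file and commit names), several small local commits can pile up unseen and then be published together with a single push once the feature is done.

```shell
#!/bin/sh
set -e

# A bare repository stands in for the shared upstream (an assumption
# so that the example needs no server).
work=$(mktemp -d)
git init -q --bare "$work/shared.git"
git clone -q "$work/shared.git" "$work/dev" 2>/dev/null

cd "$work/dev"
git config user.name "Example Dev"
git config user.email dev@example.com

# Three small checkpoints, none of them visible to anyone else yet.
for step in one two three; do
    echo "$step" >> feature.txt
    git add feature.txt
    git commit -q -m "Step $step"
done
local_count=$(git rev-list --count HEAD)

# The feature is done: publish all the checkpoints with one push.
branch=$(git rev-parse --abbrev-ref HEAD)
git push -q origin HEAD
shared_count=$(git --git-dir="$work/shared.git" rev-list --count "$branch")

echo "local commits: $local_count"
echo "commits in the shared repository after one push: $shared_count"
```

Other developers see nothing until the push, at which point the whole series arrives at once.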

This is why I prefer git to Subversion. git
allows me to easily work with my 7±2 short-term memory limitations,
by allowing me to commit "probably working but not fully tested" code so that
I don't need to review those changes at the next git diff for the
current problem I'm working on.