A clean commit history is one where each commit is a solid piece of work,
representing a milestone in your feature or fix. This might be the backend
for some part of the feature, or a component of the UI. It doesn’t have to be
a large amount of work, just some good chunk that, conceptually, stands alone.

An unclean commit history is often littered with commits like “Fixed a bug
in my previous commit” or “Oops, forgot this file” or “rewrite that class
again for the 3rd time.”

Ideally, you should strive for a series of commits that almost reads as a
story of how your feature came together.

A good example of a clean commit history is:

* Added the models and forms for potatoes.
* Added the API for interacting with potatoes, along with unit tests.
* Added the comment dialog for reviewing potatoes.

An example of an unclean commit history is:

* Added the models and forms for potatoes.
* Decided the is_spud field wasn't necessary and removed it.
* Forgot forms.py.
* Added the API for interacting with potatoes, along with unit tests.
* One of the tests failed, fixed it.
* Added the comment dialog for reviewing potatoes.
* Fixed a typo.
* Another typo.

Now, some degree of “Oops” commits tends to happen, but the goal is to
minimize this. If your commits are all local to your checkout, with nothing
pushed to any other repository, you can make this happen using the tricks
in this guide.

Some of the tricks in this guide will change your actual commit history,
which can cause you to lose commits if you’re not careful. While you often
can get your commits back, it’s a bit of extra work.

If you’re about to try something that will change history, you can keep
a “backup” of those commits by creating a branch or tag at the HEAD of your
branch. You can then switch back to your feature branch and then perform
the operation.

This will result in two branches, one with the newly revised history,
and one with the original. When you’re happy with your new history,
you can just delete the backup branch.
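
As a rough sketch of that backup workflow, the following runs in a throwaway repository. The branch names, file names, and commit messages are hypothetical, and `git branch <name>` is used because it creates the backup without switching away from the feature branch.

```shell
#!/bin/sh
set -eu

# Throwaway repository so the sketch does not touch real work.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > README
git add README
git commit -q -m "Added the README."

# Hypothetical feature branch with one commit we want to rewrite.
git checkout -q -b potato-models
echo "models" > models.py
git add models.py
git commit -q -m "Added the models for potatos."

# Keep a backup pointing at the current HEAD. We stay on potato-models.
git branch potato-models-backup

# History-changing operation: fix the typo in the last commit message.
git commit -q --amend -m "Added the models for potatoes."

# The two branches now point at different commits.
new_sha=$(git rev-parse potato-models)
backup_sha=$(git rev-parse potato-models-backup)
git log --oneline -1 potato-models
git log --oneline -1 potato-models-backup

# Happy with the new history: delete the backup (-D, since it is unmerged).
git branch -q -D potato-models-backup
```

If the rewrite had gone wrong instead, `git reset --hard potato-models-backup` on the feature branch would restore the original commits.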

Your work should always be done on a branch of your own, and never an upstream
branch. This means you should never make a commit on master or any
other branch with the same name as an origin branch. Instead, create your
own with a specific name of your choosing.

Committing to master or another upstream branch and then pushing to your
GitHub fork is the easiest way to complicate things and break your checkout.
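
A minimal sketch of working on your own branch, again in a throwaway repository. The file name and commit messages are made up for illustration.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > README
git add README
git commit -q -m "Added the README."

# Branch off master with a descriptive name instead of committing on master.
git checkout -q -b file-attachment master

echo "attachment code" > attachments.py
git add attachments.py
git commit -q -m "Added the file attachment models."

# master is untouched; the new commit lives only on file-attachment.
git log --oneline master
git log --oneline master..file-attachment
```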

Part of keeping things maintainable is making sure your branches and names are
clear and organized. A branch name should clearly describe the feature or
fix you are working on.

The following are good examples of branch names:

* file-attachment
* ui-rewrite
* search-api

And the following are bad:

* my-work
* enhancements
* bugfix

Good branch names matter most when the branch is going to be exposed to
the world, such as on your GitHub fork. If it’s a very temporary branch,
by all means call it whatever you like, but it’s still best to practice
good naming.

Another trick is to organize your branch names through /-separated
“namespaces.” This just means naming the branch in the form of
feature/specific-task.
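
As a rough illustration, the sketch below creates a few namespaced branches and lists one namespace. The names `attachments/models`, `attachments/api`, and `search/indexing` are hypothetical, chosen only to show the form.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > README
git add README
git commit -q -m "Added the README."

# Hypothetical branches grouped under two namespaces.
git branch attachments/models
git branch attachments/api
git branch search/indexing

# List only the branches in the "attachments" namespace.
git branch --list 'attachments/*'
```

One thing to keep in mind is that a namespace and a branch cannot share a name: once `attachments/models` exists, Git will refuse to create a branch called just `attachments`.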

Anyone looking at your commits should be able to easily determine what a
commit accomplished and why it was made. To ensure this, make sure every
commit message is clear and readable.

A good commit message is in the following form:

Summary (less than 80 characters)

Multi-line description

Your summary should be brief but should clearly summarize what the commit
was for. An example may be “Implemented the API for file attachments.”

Your description should be detailed, describing what changes you made and
how they work. While it shouldn’t be massively long, it should cover the
high points of the change, and perhaps why you did what you did (if you
think it could be confusing).

If there are any known problems you still intend to fix at the time of commit,
the description is a good place to list them. That list can even serve as a
To Do later, when you’re amending or rewriting history.
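
One way to write such a message from the command line is to pass `-m` more than once, since Git turns each `-m` into its own paragraph. The sketch below assumes a hypothetical `api.py` and an invented known problem.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "attachment api" > api.py
git add api.py

# First -m is the summary; second -m is the description. Git separates
# the two with a blank line.
git commit -q \
    -m "Implemented the API for file attachments." \
    -m "Adds a hypothetical api.py with the endpoints for uploading and
listing attachments.

Known problems: uploads are not yet size-limited."

# Show the full message as it was recorded.
git log -1 --format=%B
```

For longer descriptions, running `git commit` without `-m` and writing the message in your editor is usually more comfortable.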

It’s common to make more than one set of changes to a file before you commit,
possibly as you’re testing code or as you hit other regressions. These
changes may all be mixed in the same set of files, but that doesn’t mean
you have to commit them all at once.

Git makes it easy to commit only parts of your changes. This is known as
“patch adding.” Simply type:

$ git add -p <filename>

This will start going through all the individual changes made to the file,
asking if you want to stage each for commit.

There are a few handy keys you’ll want to learn.

y – Stage the change for commit

n – Skip it and leave it out of the commit

s – Split the chunk you’re looking at into smaller chunks,
if possible.

e – Edit the actual diff. Useful for getting rid of debug output.

q – Quit processing the rest of the changes. This is equivalent to
saying n to everything remaining.

There are other keys as well. You can check git help add for more.

If you’re going to be patch adding a bunch of files for one commit, you can
leave off the filename above:

$ git add -p

Git will loop through each modified file and begin the process for each.

One of the most powerful ways to clean up your history is to use
interactive rebasing. This is a way to take a history of commits and
quickly dispose of some, or merge them together, or reorder them. It’s
a powerful tool, and one that can bite you if you’re not careful, but
is well worth knowing.

To start this out, you want to run:

$ git rebase -i <parent>

Where <parent> is some parent branch or commit. Everything between
that branch/commit and HEAD will be included in the rebase list. (It’s
important to note that that parent itself won’t be included.) Often,
the parent will be master.

After typing this, your editor will come up with a list of the commits
in order. There will be some helpful instructions in there, but basically,
each line will have an operation and a commit summary. By changing the
operations or reordering/deleting lines in the editor, you’ll be changing
the commit history.

A good way to clean up history is to keep your “fixed blah blah” commits
simple, run git rebase -i <parent>, and then move your fix commit
below the commit it’ll be fixing, and change the operation to squash
or fixup.

squash will merge the commit with the one above it and allow you to
change the commit message (by default, both of the commits will have their
messages combined).

fixup will merge the commit with the one above it, but use the above
commit’s message. This is a bit faster to work with. Note that fixup
is a more recent addition and you may need a newer version of Git,
depending on which version your system’s package repository ships.
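
The sketch below shows a fixup end to end. Note that it swaps in a different route to the same result: instead of editing the list by hand, it records the fix with `git commit --fixup` and lets `git rebase -i --autosquash` reorder and mark the line, with `GIT_SEQUENCE_EDITOR=true` accepting the list without opening an editor. The branch, files, and messages are hypothetical.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > README
git add README
git commit -q -m "Added the README."

git checkout -q -b potato-models

echo "models" > models.py
git add models.py
git commit -q -m "Added the models and forms for potatoes."
models_commit=$(git rev-parse HEAD)

echo "api" > api.py
git add api.py
git commit -q -m "Added the API for interacting with potatoes."

# The "oops" commit: forms.py belongs in the models commit.
echo "forms" > forms.py
git add forms.py
git commit -q --fixup="$models_commit"

# --autosquash moves the fixup line below its target and marks it "fixup".
# GIT_SEQUENCE_EDITOR=true accepts that list as-is.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash master

# Two commits remain, and forms.py is now part of the models commit.
git log --oneline master..HEAD
```

Doing the same thing by hand means running `git rebase -i master`, moving the fix line under the models commit, and changing `pick` to `fixup`.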

Note

Like with amending commits, you can only change commits that have not
been pushed. Otherwise, you will complicate things for yourself and anyone
following your pushed branch.

It’s best to look at your branch in gitk before deciding whether
it’s safe to do an interactive rebase.

Git has two ways of staying up-to-date with other branches: merging, and
rebasing.

A merge takes a set of changes from the source branch and moves them into
your current branch, as a special commit. This commit generally includes
a commit message such as “Merge branch ‘master’ into foo”. It works like:

$ git checkout my-branch
$ git merge master

A rebase takes your current branch and rebuilds it on top of the source
branch, effectively rewriting history (like the interactive rebase above).
It works like:

$ git checkout my-branch
$ git rebase master

The advantage of a rebase over a merge is that you won’t get those extra
merge commits in your branch, cluttering things up. In general, if you have
a new branch with a few commits, you may want to do a rebase.
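
To compare the two side by side, the sketch below brings the same upstream change into two hypothetical branches, once with a merge and once with a rebase. The `--no-edit` flag is only there so the merge does not stop to open an editor.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > README
git add README
git commit -q -m "Added the README."

# Two hypothetical feature branches starting from the same point.
git branch merged-feature
git branch rebased-feature

for b in merged-feature rebased-feature; do
    git checkout -q "$b"
    echo "$b" > "$b.txt"
    git add "$b.txt"
    git commit -q -m "Added the $b work."
done

# Meanwhile, master moves ahead.
git checkout -q master
echo "upstream" > upstream.txt
git add upstream.txt
git commit -q -m "Added an upstream change."

# Merge: brings master in through an extra merge commit.
git checkout -q merged-feature
git merge -q --no-edit master

# Rebase: rebuilds the branch on top of master, leaving a linear history.
git checkout -q rebased-feature
git rebase -q master

git log --oneline --graph merged-feature
git log --oneline --graph rebased-feature
```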

However, there are a couple of reasons you would want a merge over a rebase.
A rebase will break things if the commits were already pushed, so you can
only rebase unpushed commits. Also, it can be harder to resolve conflicts if
your branch is old and a lot has changed in the branch you’re rebasing onto.

One strategy is to use rebasing until you do your initial push. After that,
you will want to always merge.

Don’t merge too often though. If you merge frequently, you’ll just clutter
your branch with merge lines. It’s best to merge either when you’re dependent
on a change that just went in, or you’re about to post your branch for
review.

When dealing with a remote repository, such as a GitHub fork, you should
be careful when you decide to push. Once you push a commit, treat it as
permanent: amending, rewriting, or deleting it afterward complicates things
for anyone following your branch. Therefore, you should only push when
you’re satisfied with the history of the commits you’re pushing.

That isn’t to say that you won’t find flaws in your commits that you wish
you could fix. That is bound to happen. However, by ensuring the history
is clean before you push, you will find it easier to reduce the number
of spurious commits in your branch.
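
One possible pre-push habit is to list exactly what is about to be published. The sketch below assumes a local bare repository standing in for the remote and a hypothetical `search-api` branch.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
remote="$work/remote.git"
repo="$work/checkout"

# A local bare repository plays the role of the GitHub fork.
git init -q --bare "$remote"
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$remote"

echo "base" > README
git add README
git commit -q -m "Added the README."
git push -q origin master

git checkout -q -b search-api
echo "search" > search.py
git add search.py
git commit -q -m "Added the search API."

# Review the commits that exist locally but not on the remote.
git log --oneline origin/master..search-api

# Satisfied with the history: publish the branch.
git push -q origin search-api
```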