A Rebase Workflow for Git

Update October, 2014: Lots of things have gotten easier over the years.
These days, the easy way to fix this set of things is with the Pull Request workflow, which is essentially the Integration Manager workflow discussed here (probably).

Use GitHub or Bitbucket or another host that makes the PR workflow easy

Delegate a person as integration manager, who will pull or comment on the PR

Require contributors to rebase their own PR branch before pulling if there are conflicts.

Update: Just for clarification, I'm not opposed to merges. I'm only opposed to unintentional merges (especially with a git pull). This followup article describes a simple way to rebase most of the time without even thinking about it. Also, for local development I love the git merge --squash method described by joachim below.

In this post I'm going to try to get you to adopt a specific rebase-based workflow, and to avoid (mostly) the merge workflow.

What is the Merge Workflow?

The merge workflow consists of:

git commit -m "something"
git pull # this does a merge from origin and may add a merge commit
git push # Push back both my commit and the (possible) merge commit

Note that you normally are forced to do the pull unless you're the only committer and you committed the last commit.
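To make the extra merge commit visible, here is a minimal sketch that plays both committers in a throwaway directory. The names alice and bob, the file names, and the temporary bare "origin" are assumptions for illustration only. The --no-rebase and --no-edit flags are added so the pull merges without prompting, whatever the local Git configuration says.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"

# A bare "origin" whose default branch is 7.x-1.x.
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/7.x-1.x

# alice creates the first commit and publishes the branch.
git clone -q "$work/origin.git" alice 2>/dev/null
cd alice
git symbolic-ref HEAD refs/heads/7.x-1.x
echo base > base.txt && git add base.txt && git commit -q -m "base"
git push -q -u origin 7.x-1.x
cd ..

# bob clones, then alice pushes again before bob does.
git clone -q "$work/origin.git" bob
cd alice
echo a > alice.txt && git add alice.txt && git commit -q -m "alice: something"
git push -q
cd ../bob

# The merge workflow from bob's point of view.
echo b > bob.txt && git add bob.txt && git commit -q -m "bob: something"
git pull -q --no-rebase --no-edit   # merges origin's work, adding a merge commit
git push -q                         # pushes bob's commit and the merge commit

merge_commits=$(git rev-list --merges --count HEAD)
echo "merge commits in history: $merge_commits"
git log --graph --oneline
```

The graph printed at the end shows the side loop that every such pull leaves behind.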

Why Don't I Want the Merge Workflow?

As we saw in Avoiding Git Disasters, the multiple-committer merge workflow has very specific perils due to the fact that every committer for a time has responsibility for what the other committers have committed.

These are the problems with the merge workflow:

It has the potential for disaster, as that merge and merge commit have to be handled correctly by every committer. That said, most committers will have no trouble with it and will not mess it up. But if you have lots of committers, and they don't all understand Git, or they are using a GUI that hides the actual results from them, watch out.

Your history becomes a mess. It has all kinds of inexplicable merge commits (which you typically don't look inside to see what's there) and the history (gitk) becomes useless.

Debugging with git bisect is massively confused by the merge commits.

When Is the Merge Workflow OK?

The merge workflow will do you no damage at all if you

Only have one committer (or a very small number of committers, and you trust them all)

and

You don't care much about reading your history.

OK, What is Rebasing?

First, definitions:

A branch is a separate line of work. You may have seen these before in other VCS's, but in Git they're so easy to use that they're addictive and life-altering. You can expose branches in the public repository (a public branch) or they may never get off of your machine (a topical branch).

A public branch is one that more than one person pulls from. In Drupal, 7.x-1.x for most modules and themes would be a public branch.

A topical branch (or feature branch) is a private branch that you alone are using, and will not be exposed in the public repository.

A tracking branch is a local branch that knows where its remote is, and that can push to and pull from that remote. Assuming a remote named "origin" and a public branch named "7.x-1.x", we could create a tracking branch with git branch --track 7.x-1.x origin/7.x-1.x, or with newer versions of git, git checkout --track origin/7.x-1.x
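As a rough sketch of what "knows where its remote is" means, the following builds a scratch repository, clones it, creates the tracking branch, and asks Git which upstream it follows. The directory names and the master branch are assumptions made for the demo.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"

# A scratch "remote" with two branches: master and 7.x-1.x.
git init -q remote_repo && cd remote_repo
git symbolic-ref HEAD refs/heads/master
echo base > base.txt && git add base.txt && git commit -q -m "base"
git branch 7.x-1.x
cd ..

# The clone checks out master; 7.x-1.x only exists as origin/7.x-1.x so far.
git clone -q "$work/remote_repo" clone && cd clone

# Create the local tracking branch (the first form from the text).
git branch -q --track 7.x-1.x origin/7.x-1.x
# The newer form, git checkout --track origin/7.x-1.x, would also switch to it.

tracked=$(git rev-parse --abbrev-ref '7.x-1.x@{upstream}')
echo "7.x-1.x tracks: $tracked"
```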

The fundamental idea of rebasing is that you make sure that your commits go on top of the "public" branch, that you "rebase" them so that instead of being related to some commit way back when you started working on this feature, they get reworked a little so they go on top of what's there now.

Don't do your work on the public branch (Don't work on master or 6.x-1.x or whatever). Instead, work on a "topical" or "feature" branch, one that's devoted to what you want to do.

When you're ready to commit something, you rebase onto the public branch, plopping your work onto the very tip of the public branch, as if it were a single patch you were applying.

Here's the approach. We'll assume that we already have a tracking branch 7.x-1.x for the public 7.x-1.x branch.

git checkout 7.x-1.x # Check out the "public" branch
git pull # Get the latest version from remote
git checkout -b comment_broken_links_101026 # topical branch
... # do stuff here.. Make commits.. test...
git fetch origin # Update your repository's origin/ branches from remote repo
git rebase origin/7.x-1.x # Plop our commits on top of everybody else's
git checkout 7.x-1.x # Switch to the local tracking branch
git pull # This won't result in a merge commit
git rebase comment_broken_links_101026 # Pull those commits over to the "public" branch
git push # Push the public branch back up, with my stuff on the top

There are ways to simplify this, but I wanted to show it explicitly. The fundamental idea is that I as a developer am taking responsibility to make sure that my work goes right in on top of everybody else's work. And that it "fits" there - that it doesn't require any magic or merge commits.

Using this technique, your work always goes on top of the public branch like a patch that is up-to-date with current HEAD. This is very much like the CVS patch workflow, and results in a clean history.
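The sequence can be tried end to end in a scratch directory. This is only a sketch: alice, bob, the file names, and the temporary bare "origin" are assumptions, and --ff-only is added to git pull purely to assert that no merge commit can sneak in.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"

git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/7.x-1.x
git clone -q "$work/origin.git" alice 2>/dev/null
cd alice
git symbolic-ref HEAD refs/heads/7.x-1.x
echo base > base.txt && git add base.txt && git commit -q -m "base"
git push -q -u origin 7.x-1.x
cd ..
git clone -q "$work/origin.git" bob

# bob starts a topical branch from an up-to-date public branch.
cd bob
git checkout -q 7.x-1.x
git pull -q --ff-only
git checkout -q -b comment_broken_links_101026
echo b > bob.txt && git add bob.txt && git commit -q -m "bob: fix broken links"

# Meanwhile alice pushes to the public branch.
cd ../alice
echo a > alice.txt && git add alice.txt && git commit -q -m "alice: something"
git push -q
cd ../bob

# bob puts his work on top of everybody else's and publishes it.
git fetch -q origin
git rebase -q origin/7.x-1.x
git checkout -q 7.x-1.x
git pull -q --ff-only                         # fast-forward only, no merge commit
git rebase -q comment_broken_links_101026     # fast-forwards 7.x-1.x to the topic tip
git push -q

merge_commits=$(git rev-list --merges --count HEAD)
tip_subject=$(git log -1 --format=%s)
echo "merge commits: $merge_commits, tip: $tip_subject"
git log --graph --oneline
```

Compared with the merge workflow, the resulting graph is a single straight line with bob's commit at the tip.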

For extra credit, you can use git rebase -i and munge your commits into a single commit which has an excellent commit message, but I'm not going to go there today.
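For the curious, here is one hedged sketch of that squash. Normally git rebase -i opens an editor where you change "pick" to "squash" or "fixup" by hand. To keep the example non-interactive, it sets GIT_SEQUENCE_EDITOR to a sed command that does the same edit, which is a demo convenience rather than a recommended habit. The file name and commit messages are made up.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"

git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/7.x-1.x
echo base > base.txt && git add base.txt && git commit -q -m "base"

# Three small work-in-progress commits on the topical branch.
git checkout -q -b comment_broken_links_101026
for n in 1 2 3; do
  echo "step $n" >> links.txt
  git add links.txt
  git commit -q -m "wip $n"
done

# Keep the first "pick" and turn the later ones into "fixup",
# which folds them into the first commit.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$s/^pick /fixup /'" git rebase -q -i 7.x-1.x

# Give the single remaining commit a proper message.
git commit -q --amend -m "Fix broken links in comments"

squashed=$(git rev-list --count 7.x-1.x..HEAD)
lines=$(wc -l < links.txt | tr -d ' ')
echo "commits on topic: $squashed, lines kept: $lines"
```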

Merging and Merge Conflicts

Any time you do a rebase, you may have a merge conflict, in which Git doesn't know how to put your work on top of the work others have done. If you and others are working in different spaces and have your responsibilities well separated, this will happen rarely. But still, you have to know how to deal with it.

Every OS has good merge tools available which work beautifully with Git. When a rebase stops on a conflict, you can run git mergetool from the command line to resolve it. We'll save that for another time.
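Until then, a minimal sketch of the conflict cycle without any merge tool: the rebase stops, you fix the file, git add it, and run git rebase --continue. The file greeting.txt and its contents are invented for the demo, and GIT_EDITOR=true is set only so that no editor opens.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"

git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/7.x-1.x
printf 'hello\n' > greeting.txt
git add greeting.txt && git commit -q -m "base"

# The topic branch and the public branch change the same line.
git checkout -q -b topic
printf 'hello from topic\n' > greeting.txt
git commit -q -am "topic change"
git checkout -q 7.x-1.x
printf 'hello from public\n' > greeting.txt
git commit -q -am "public change"

# The rebase stops on the conflict; a non-zero exit status is expected here.
git checkout -q topic
git rebase 7.x-1.x >/dev/null 2>&1 || echo "rebase stopped on a conflict"
conflicted=$(git diff --name-only --diff-filter=U)
echo "conflicted files: $conflicted"

# Resolve by hand (git mergetool would do this interactively), then continue.
printf 'hello from public and topic\n' > greeting.txt
git add greeting.txt
GIT_EDITOR=true git rebase --continue >/dev/null 2>&1

merge_commits=$(git rev-list --merges --count HEAD)
echo "merge commits after resolving: $merge_commits"
```

If the resolution looks hopeless, git rebase --abort returns the branch to where it was before the rebase started.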

Branch Cleanup

You can imagine that, using this workflow, you end up with all kinds of useless, abandoned topical branches. Yes you do. From time to time, clean them up with

git branch -d comment_broken_links_101026

or, if you haven't ever merged the topical branch (for example, if you just used it to prepare a patch)

git branch -D comment_broken_links_101026
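The difference between the two flags can be seen in a small sketch: -d refuses to delete a branch whose commits are not in the current branch, while -D deletes it regardless. The branch patch_only_topic is a hypothetical name for a branch that was only used to prepare a patch.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"

git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/7.x-1.x
echo base > base.txt && git add base.txt && git commit -q -m "base"

# A topical branch that gets brought into 7.x-1.x.
git checkout -q -b comment_broken_links_101026
echo fix > fix.txt && git add fix.txt && git commit -q -m "fix broken links"
git checkout -q 7.x-1.x
git merge -q --ff-only comment_broken_links_101026

# A topical branch that never gets merged.
git checkout -q -b patch_only_topic
echo idea > idea.txt && git add idea.txt && git commit -q -m "experiment"
git checkout -q 7.x-1.x

git branch -q -d comment_broken_links_101026     # fine: already in 7.x-1.x

if git branch -d patch_only_topic 2>/dev/null; then
  refused=no
else
  refused=yes                                    # -d protects unmerged work
fi
git branch -q -D patch_only_topic                # -D deletes it anyway

remaining=$(git for-each-ref --format='%(refname:short)' refs/heads)
echo "-d refused unmerged branch: $refused; branches left: $remaining"
```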

Objections

If you read the help for git rebase it will tell you "Be careful. You shouldn't rewrite history that will be exposed publicly because everybody will hate you." Note, though, that the way we're using rebase here, we only plop our commit(s) right on top, and then push. It does not change the public history. Of course there are other ways of using rebase that could change publicly-exposed history, and that is frowned upon.

Conclusion

This looks more complicated than the merge workflow. It is. It is not hard. It is valuable.

If you have improvements, suggestions, or alternate workflows to suggest, please post in the comments. If you find errors or things that can be stated more clearly or correctly, I'll fix the post.

I will follow up before long with a post on the "integration manager" workflow, which is essentially the github model. Everybody works in their own repositories, which are pseudo-private, and then when they have their work ready, they rebase it onto the public branch of the integration manager, push their work to the pseudo-private repo, and ask the integration manager to pull from it.

I'm not rebasing to release a feature. I'm in the "... # do stuff here.. Make commits.. test..." step, and i want the latest code. The detail in this "..." step is even more important if the feature branch will be a public branch also (re: Larry and yourself below), but regardless, the question is how do you recommend the feature branch we are working on follow the public 7.x-1.x in your example?

Any time you need your feature branch to catch up with the public branch it's following, do just what I listed as being done only at integration time:

git fetch origin
git rebase origin/7.x-1.x

That will update your feature branch to where it's right on schedule. Note that if (as one could hope) different committers are working on different features and not interfering with each other, this will be unnecessary.
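One way to see whether a catch-up is needed, and that it worked, is to count the commits the feature branch is missing before and after. This is a sketch with invented committers (alice, bob) and a temporary bare "origin".

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"

git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/7.x-1.x
git clone -q "$work/origin.git" alice 2>/dev/null
cd alice
git symbolic-ref HEAD refs/heads/7.x-1.x
echo base > base.txt && git add base.txt && git commit -q -m "base"
git push -q -u origin 7.x-1.x
cd ..
git clone -q "$work/origin.git" bob

# bob is in the middle of his feature...
cd bob
git checkout -q -b comment_broken_links_101026
echo b > bob.txt && git add bob.txt && git commit -q -m "bob: work in progress"

# ...when alice publishes something new.
cd ../alice
echo a > alice.txt && git add alice.txt && git commit -q -m "alice: something"
git push -q
cd ../bob

git fetch -q origin
behind_before=$(git rev-list --count HEAD..origin/7.x-1.x)
git rebase -q origin/7.x-1.x
behind_after=$(git rev-list --count HEAD..origin/7.x-1.x)
ahead=$(git rev-list --count origin/7.x-1.x..HEAD)
echo "behind before: $behind_before, behind after: $behind_after, ahead: $ahead"
```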

I tried setting rere.enabled to true after getting into the situation where rebasing resulted in too many conflicts - didn't work, I'm guessing I have to do this before the first conflict.

Sometimes, all you want to do is chuck your personal clone and start fresh from your remote. Seems to me that you rarely want to do otherwise. When I do "git rebase" and the code for my local head matches the code for the remote, I'm desperately hoping that the command will just erase my history and make it identical to that of the remote. I know "git rebase" is doing something much more elegant, but I want stupid and obvious.

My workflow is to delete and reclone whenever things get just a little hairy. That works great.

Just FYI, to throw away your branch just requires creating a new tracking branch and deleting the old one. You don't have to clone again. I use a new branch for every separate task, so my common workflow is just

- You are the only person working on a feature.
- You never need to share that feature with anyone else until it's done and "ready to be committed (pushed)"

If I'm working on something that takes a week to do, I absolutely don't want to leave it on my laptop and nowhere else for all that time. I want it backed up to a remote repo, I want my colleagues to be able to review it and point out what a stupid thing I just did, etc. That means pushing any but the smallest feature branches to the remote authoritative server at some point, after which rebasing is a no-no (for very good reasons).

You want your feature to be reviewable by others and copied safely onto the remote repo, but nobody else will be committing to it. In that case, you can just push the branch up and everything else stays the same.

git push origin comment_broken_links_101026
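A small sketch of that push against a temporary bare "origin", checking afterwards that the remote really holds the same commit as the local branch. The repository layout and file names are assumptions.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"

git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/7.x-1.x
git clone -q "$work/origin.git" dev 2>/dev/null
cd dev
git symbolic-ref HEAD refs/heads/7.x-1.x
echo base > base.txt && git add base.txt && git commit -q -m "base"
git push -q -u origin 7.x-1.x

# Work on the feature locally, then copy it to the remote for backup and review.
git checkout -q -b comment_broken_links_101026
echo wip > links.txt && git add links.txt && git commit -q -m "work in progress"
git push -q origin comment_broken_links_101026

local_sha=$(git rev-parse HEAD)
remote_sha=$(git ls-remote origin refs/heads/comment_broken_links_101026 | cut -f1)
echo "local:  $local_sha"
echo "remote: $remote_sha"
```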

On the other hand, if you want others to be able to commit to this feature branch you've just pushed, then treat it just like any other public branch (and branch off of it for each section you work on, then rebase back onto it). When you've done this, it's just another public branch, and I'd treat it just as we were treating 7.x-1.x before. And then when you want to merge it into 7.x-1.x, the procedure here is the same.

Right up until that last part. :-) If you have your master branch, and a big-new-feature branch that 5 people have been working on via local micro-branches while the master branch still had bug fixing going on, how would you handle finishing big-new-feature? It sounds like you're suggesting that one person rebase their copy onto master and then do a fast-forward merge, which would then invalidate everyone else's copy of big-new-feature. That violates what I understood to be the first rule of rebasing: Once you push, don't do that.

There is a time that communication has to happen. Yes, I think that when everybody has decided that the feature is done, one person should rebase it onto the original "master" (in the example 7.x-1.x) and delete the feature branch. In this case it would be

It would be great to see a more detailed breakdown of how this works. I'm not quite grasping things yet and I think this is the most common method of working:

1) Start working on feature X locally.
2) Somehow get changes onto staging server for client / co-worker review.
3) Make changes locally.
4) Repeat #2 and #3 as many times as necessary.
5) Get feature onto production branch and deploy.

And as Crell mentions this whole process may itself be on some mega feature that itself needs to get onto the real production branch.

According to everyone's claims Git is apparently great at merging. But it's unfortunate that it requires using three times as many commands as you would use with SVN. Many of which seem to be different words for the same thing (fetch, pull, and checkout all seem to update the current code with what's new in the repository). The increased confusion might not actually be worth it especially when there's people that need to work with the repository that don't have the abstract thinking skills of a developer (themers, clients that commit stuff, project managers).

When I'm ready to bring my feature branch into the main branch, I merge rather than rebase, though I must admit having just tried the rebase technique, it's identical.

What you can do with merge however, is squash your entire branch down to one commit. If your branch is for a single bug, and in exploring and fixing that bug you've made lots of local commits (which is what git lets you do and why it's great), you don't need the whole world to see your commits that fix typos, remove whitespace, remove swearwords you put in comments while you were headdesking and so on ;)

Once my feature has been rebased to the latest head of the main branch, I go:

git checkout main
git merge --squash feature

This puts the whole of the feature branch into the current tree and adds it, but does not commit it. It's equivalent to doing:

Your approach is quite reasonable and I think a great candidate for a workflow on Drupal.org. It's so nearly equivalent to the patch workflow that it fits nicely.

The rebase approach keeps everything in commits and still hides the fact that a bunch of work happened on a topic branch. If the topic branch was irrelevant to the long-term view of the production branch, and a clean history is important, then perhaps the rebase approach is cleaner. However, if the work that was done in the topic branch should be exposed as a branch, the merge is better. And a git merge --no-ff is better yet, because it makes the merge explicit.

Note that you can squash when rebasing in just the same way as you can squash when merging, but that the work stays in commits, rather than being dumped into the staging area at the last minute, as git merge --squash.

What confuses me about the final rebase is this: I reread the git man page for rebase yesterday and according to that (and some head-scratching), doing this git rebase comment_broken_links_101026 ought to replay the master branch on top of the feature branch, and hence put your local commits *before* the remote commits you've just pulled in.

... which having tried it on a setup with new commits on master, is what happens.

Now our topic branch is on a straight line with the 7.x-1.x branch, but two commits ahead:

git checkout 7.x-1.x
git rebase topic # now our 7.x-1.x is fast-forwarded to have the topic commits on it

Then we would have just fast-forwarded the 7.x-1.x to point at the same place as the topic branch had been:

Now instead of the last step we could have:

git merge --squash topic # as you suggest

that would leave us (after committing the results) with the topic branch dangling (OK) and no record of it in the 7.x-1.x branch:

which would have squashed the contents of the topic branch and dropped the resulting commit into the index ready to be committed. (This is not really a merge at all, so it's a bit hard to understand how it got into the merge command.) It works fine. You can then commit it with a good message.
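A sketch of the squash variant in a scratch repository, with invented file names and commit messages: after git merge --squash the changes are staged but not committed, and the commit you then make has a single parent, so nothing in 7.x-1.x records the topic branch.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"

git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/7.x-1.x
echo base > base.txt && git add base.txt && git commit -q -m "base"

# Two messy commits on the topic branch.
git checkout -q -b topic
echo "first try" > links.txt && git add links.txt && git commit -q -m "wip"
echo "typo fixed" >> links.txt && git commit -q -am "fix typo"

git checkout -q 7.x-1.x
git merge -q --squash topic >/dev/null       # stages the result, does not commit
staged=$(git diff --cached --name-only)
git commit -q -m "Fix broken links in comments"

parents=$(git log -1 --format=%P | wc -w | tr -d ' ')
total=$(git rev-list --count HEAD)
echo "staged by the squash: $staged; parents: $parents; commits on 7.x-1.x: $total"
```

Because the topic branch is never recorded as merged, cleaning it up afterwards needs git branch -D rather than -d.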

We could also have:

git merge topic

This would have had the same basic effect as the rebase:

and finally, we could have:

git merge --no-ff topic

This would explicitly document our merge, adding a merge commit and leaving the merge showing in the history (see gitk or git log --graph). This would be the preferred approach for a long-running public branch.
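To compare the plain merge with --no-ff side by side, this sketch merges the same topic branch into two scratch branches. The branch names with_ff and with_no_ff are made up for the comparison. It assumes the topic is already a straight line ahead of 7.x-1.x, as in the discussion above.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"

git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/7.x-1.x
echo base > base.txt && git add base.txt && git commit -q -m "base"

# topic is one commit ahead of 7.x-1.x on a straight line.
git checkout -q -b topic
echo work > topic.txt && git add topic.txt && git commit -q -m "topic work"

# Plain merge: Git just moves the branch pointer (fast-forward).
git checkout -q -b with_ff 7.x-1.x
git merge -q topic
ff_merges=$(git rev-list --merges --count HEAD)
ff_tip=$(git rev-parse HEAD)

# --no-ff: Git records an explicit merge commit with two parents.
git checkout -q -b with_no_ff 7.x-1.x
git merge -q --no-ff -m "Merge branch 'topic'" topic
noff_merges=$(git rev-list --merges --count HEAD)
noff_parents=$(git log -1 --format=%P | wc -w | tr -d ' ')

echo "fast-forward: $ff_merges merge commits"
echo "--no-ff:      $noff_merges merge commit with $noff_parents parents"
git log --graph --oneline with_no_ff
```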

What I was missing was that once the topic branch is on a straight line ahead of 7.x-1.x, then a rebase has the desired effect. Which as you say, is identical to a 'git merge topic'.

I think that saying merge for this operation is preferable, as it results in a clear mnemonic: rebase onto master branches, merge in topic branches.

But overall, --squash is the workflow I prefer for anything that's going to be more than a few hunks of diff. For me, the huge advantage is that on my topic branch I can work however I please: make tiny commits for typos, changes of variable names in my new code, and so on. I've often found that when working on a particularly thorny problem, I get to a point where it all *works* but it's a mess, and then I run the risk of accidentally breaking it again during my clean-up. With this workflow, I can just commit the messy version with a big 'Hurrah this works!' commit, and then get started on tidying up ;)

Did you do the merge on the original repo, or on the repo after the rebase? I was confused because you did not get a merge commit on the repo after your "git merge" without the --no-ff. However, when I checked out your example and performed the merges, "git merge topic" and "git merge --no-ff topic" gave absolutely the same result (which they should, since the branches diverged and a fast-forward is not possible).

The only way a fast-forward merge and a no-ff merge can differ is if one of the two commits is the merge base, as for example 7.x-1.x after the rebase (but not before).

So you won't want to rebase on public branches. That will confuse everybody. Rebasing is absolutely fine for private (or de-facto-private) branches.

If you're only bringing in a feature branch once you have a lot of options:

Use git merge --squash feature_branch on the main branch. This magically squashes the entire feature branch and brings it in as a commit ready to happen on the main branch.

Use git rebase when you're ready for the single final commit onto the main branch

If you're going to be bringing the feature branch in more than once over its lifespan, you'll probably want to consider not squashing at all.

I guess my experience in the time since this article was written is that most people won't end up dealing with the complexity of rebasing in public. Lots of people use it on private feature branches.

While I'm talking... IMO it's OK to treat a nominally public branch as a private one. If I'm working on a branch called rfay_fix_up_broken_tests, and my team has a convention that that's my branch because it has my name on it, then I consider it private (and available for rebase) even though it's pushed to a public repo.

So, consider a feature branch that has 5 commits, worked on by one dev, and pushed (so it's now a nominally public branch). Now suppose that the task has been assigned to another dev (me!) and I want to add a commit or two and squash the whole thing down to a single commit using `git rebase -i`.

In principle, this is changing history. And until I push, no harm done.

But *if* I can be sure that no one has branched off those 5 public commits in the public history, then is it fair to assume that a `git push --force` will not be problematic for any other devs? In particular, when the original dev does his own fetch/pull on this branch, he'll just see the "new" history and his old commits are - if not gone, then at least - not (easily) accessible. Is that right?

Yes, the github PR model, with throwaway forks and branches has changed all of our views, I think.

The issue with rebasing is not whether something is public, but what the rules are around it. Of course I can rebase and force-push a branch on my fork... where the rules are that nobody else has the right to build off the work. And of course I can abandon a branch and start a new one, which is mostly the same thing.
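As a hedged sketch of what the other developer actually sees after a rewrite, the script below plays both devs against a temporary bare "origin". For brevity the rewrite is a git commit --amend rather than a full git rebase -i squash, but the effect on the pushed branch is the same kind of history change. The names dev1, dev2 and the file tests.txt are assumptions.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"

git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/7.x-1.x

# dev1 starts the branch and pushes it.
git clone -q "$work/origin.git" dev1 2>/dev/null
cd dev1
git symbolic-ref HEAD refs/heads/7.x-1.x
echo base > base.txt && git add base.txt && git commit -q -m "base"
git push -q -u origin 7.x-1.x
git checkout -q -b rfay_fix_up_broken_tests
echo one > tests.txt && git add tests.txt && git commit -q -m "first try"
git push -q -u origin rfay_fix_up_broken_tests
cd ..

# dev2 takes over, rewrites the pushed commit, and has to force the push.
git clone -q "$work/origin.git" dev2 && cd dev2
git checkout -q --track origin/rfay_fix_up_broken_tests
echo two >> tests.txt
git commit -q -a --amend -m "Fix up broken tests"
if git push -q origin rfay_fix_up_broken_tests 2>/dev/null; then
  plain_push=accepted
else
  plain_push=rejected          # not a fast-forward, so origin refuses it
fi
git push -q --force origin rfay_fix_up_broken_tests

# dev1 fetches: the old commit is still on the local branch, which has diverged.
cd ../dev1
git fetch -q origin
only_local=$(git rev-list --count origin/rfay_fix_up_broken_tests..HEAD)
only_remote=$(git rev-list --count HEAD..origin/rfay_fix_up_broken_tests)
# To adopt the rewritten history, dev1 resets instead of pulling.
git reset -q --hard origin/rfay_fix_up_broken_tests
adopted=$(git log -1 --format=%s)
echo "plain push: $plain_push; diverged by $only_local/$only_remote; now at: $adopted"
```

So the original dev does not automatically "just see the new history": a plain pull would try to combine the old and new commits, which is why the team convention matters.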

Doesn't this workflow make it more difficult to revert some feature? How do you remember at which point in the history you started? Do I have to use a tag each time?

For now I'm using git-flow (merge based workflow) because I'm not alone, and I believe git-flow is easier to explain than this rebase style method. Furthermore, I use github as a means of synchronizing my work on many different computers. Wouldn't this workflow force me to do a git push -f feature-branch each time I rebase locally?

I also have difficulty understanding why having a linear history is better. For me it feels more like losing some information, as "a branch = a feature" is valuable information for me. Or maybe I didn't understand something about this workflow.

I would think it would be super-easy to revert a feature with this, since (if you squashed when rebasing) it's just one commit.
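A sketch of that revert, assuming the feature landed as one squashed commit with later work on top of it. The file names and commit messages are invented.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"

git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/7.x-1.x
echo base > base.txt && git add base.txt && git commit -q -m "base"

# The feature arrived as a single commit...
echo feature > feature.txt && git add feature.txt
git commit -q -m "Add broken-link checker"
feature_sha=$(git rev-parse HEAD)

# ...and other work has landed since.
echo other > other.txt && git add other.txt && git commit -q -m "unrelated work"

# Reverting the feature is one command against one commit.
git revert --no-edit "$feature_sha" >/dev/null

if [ -e feature.txt ]; then feature_present=yes; else feature_present=no; fi
if [ -e other.txt ]; then other_present=yes; else other_present=no; fi
echo "feature.txt present: $feature_present; other.txt present: $other_present"
```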

I'll look forward to learning more about git-flow and how it can help us.

If you are using github to sync between computers, then that's kind of like having multiple committers. Using this workflow should work perfectly. On the other hand, if you're always on just one computer (and never have commits pending on more than one) then you don't have to do anything - none of this. Just pull before you start working, and push when you're done. If you do have commits pending on more than one environment, then this will work perfectly, if you follow it.

I'm trying to do two things with this workflow:
1. Avoid unnecessary merge commits, which muddy the water and are potentially dangerous.
2. Make clean history

First of all let me thank you for the interesting articles on Drupal and Git,
I really appreciate you taking the time to publish your vision!

I am contacting you to ask what your approach would be to handle multiple
repositories, as follows:
- a developer uses releases from an upstream repository, say the Acquia git
repository
- on a development machine, the developer adds contrib and custom modules
- on a production server, the result of the development is published

What would be the best way to organize this workflow? Especially getting all
git config files right seems hard. From time to time I would want to be able
to quickly change something on the production server and merge it on the
development machine. This would break the "flow" from upstream repository to
development to production, and I am not sure how to avoid the merge workflow
you discussed earlier.

You're really talking about deployment, and there are conversations all over the place on drupal.org and elsewhere about it. And there's no consensus. People use drush make, the entire source tree checked into one repo, git submodules, and other techniques. But I don't think there's one true answer.

... # do stuff here.. Make commits.. test... ... this has taken some time, so:

git pull # since we only worked in the feature branch, this is fast forward of our 7.x-1.x branch
git rebase 7.x-1.x # Plop our commits on top of everybody else's
git checkout 7.x-1.x # Switch to the local tracking branch
git merge comment_101026 # Do a fast forward because feature branch is already rebased
git push # Push the public branch back up, with my stuff on the top

This gets rid of one pull and doesn't resort to fetch. It also uses a merge instead of the second rebase. I don't really get the consequences of rebasing the public branch, but probably it is equivalent to a fast forward merge.

Partly it is a design choice. If you are a sole developer, or in a very small team, then having lots of branches and merges, which create side loops in the history, is not a problem. You can see that the code 'just works' by inspection, and direct evaluation (running it). Your merges can include loads of fix ups and tweaks because everyone is involved. It could even be an evil merge (git style).

In a bigger environment, where some folk are working in separate areas (and so are behind what you are doing), or where you have QA and management, people won't simply 'believe' that your branch code applies cleanly to the latest code base. They won't allow anything other than a clean merge, which is the same as a fast forward for a rebased branch.

By asking the author to do a rebase first, and fix any clashes, and run tests (or whatever is required), it means that no-one is surprised by the 'merge' , because it is a 'no-op' (fast forward). The rebase isn't any more work, it is simply the work that would have had to be done for the branch merge anyway. Plus you get to make your history look good a/k/a "professional" ;-) [rebase --interactive]

I guess my basic concern is that wherever possible, don't have git doing textual merges without explicit (and necessary) involvement from a dev who really understands what's happening. So, wherever possible, play the game so you get fast-forward merges or their equivalent in rebases.

Really, the essence of this article is "Don't let people do merges who don't understand what's going on in the merge".

Thanks, I guess I never wrote that. It's pretty easy though: Team members work on a development branch and then request that the integration manager do the rebasing. The integration manager rebases it and deals with any fallout (or pushes it back to them). The key is that the branch to be rebased should be prepared for merge/rebase, and hopefully will not result in a merge commit.

Great article and I am convinced already of using a rebasing approach in order to have clean histories without all the checkpoint commits on the main branch.

One question: I don't entirely understand the workflow shown above with two complete sets of fetch/rebase before the final push back to mainline. It seems like it has to rebase the remote tracking branch in addition to the actual local working one. Can you please comment a little bit on that? Why is it necessary to mess with the remote tracking branch?

right so i want to create some wrapper scripts to make things somewhat dummy proof for contributors.

I love the forking PR model, because ultimately nobody but me can push stuff back to my repo without approval and discussion. This does complicate things a little more though, as now for any given feature there are essentially two origins and the local they are working on. And when *I* am working on stuff directly against the blessed repo I still want to use the rebase strategy, so I still kind of need to understand why you had to do two separate rebases in the above instructions... one from feature branch against origin/release and one against the working local release. That confused me a bit.

But then for contributors, presumably they would need to follow all of the above, except that now their local feature branch is referencing a remote on their forked repo. So the exact procedure for them is to fetch or pull from the blessed repo, while updating their own repo at the same time, and also rebasing their local working branch, and pushing back to their forked repo, as well as finally sending the pull request in order to get their change back to the blessed repo. That's where it all is a bit foggy to me at the moment and that is precisely why I want to make some scripts to make it dummy proof for any contributors.
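One possible shape for the contributor side, offered only as a sketch and not as the procedure any particular host requires: keep two remotes, here called origin (the fork) and upstream (the blessed repo), rebase the feature branch onto upstream, and force-push it to the fork before opening or updating the pull request. The remote names, the branch my_feature, and the local bare repositories standing in for hosted ones are all assumptions.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"

# Stand-ins for the hosted repositories: blessed.git and the contributor's fork.git.
git init -q --bare blessed.git
git --git-dir=blessed.git symbolic-ref HEAD refs/heads/7.x-1.x
git init -q maintainer && cd maintainer
git symbolic-ref HEAD refs/heads/7.x-1.x
echo base > base.txt && git add base.txt && git commit -q -m "base"
git push -q "$work/blessed.git" 7.x-1.x
cd ..
git clone -q --bare "$work/blessed.git" "$work/fork.git"    # "forking"

# Contributor setup: origin is the fork, upstream is the blessed repo.
git clone -q "$work/fork.git" contributor && cd contributor
git remote add upstream "$work/blessed.git"
git checkout -q -b my_feature
echo feature > feature.txt && git add feature.txt && git commit -q -m "my feature"
git push -q -u origin my_feature

# Meanwhile the blessed repo moves on.
cd ../maintainer
echo more > more.txt && git add more.txt && git commit -q -m "upstream work"
git push -q "$work/blessed.git" 7.x-1.x
cd ../contributor

# The part a wrapper script could run before opening or updating a PR.
git fetch -q upstream
git rebase -q upstream/7.x-1.x
git push -q --force origin my_feature     # the fork's branch is the contributor's own

upstream_tip=$(git rev-parse upstream/7.x-1.x)
if git --git-dir="$work/fork.git" merge-base --is-ancestor "$upstream_tip" my_feature; then
  fork_is_current=yes
else
  fork_is_current=no
fi
echo "fork branch sits on top of upstream: $fork_is_current"
```

The force push is acceptable here only under the rule discussed above: nobody else builds on the contributor's feature branch.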