Distribution-centric: changesets generally reflect a complete feature, so check-ins tend to be larger. This style is more user/maintainer friendly.

Rollback-centric: changesets are individual small steps, so the history can function like an incredibly powerful undo. Check-ins tend to be smaller. This style is more developer friendly.

I like to use my version control as a really powerful undo while I'm banging away at some stubborn code or bug. That way I'm not afraid to make drastic changes just to try out a possible solution. However, this seems to give me a fragmented file history with lots of "well, that didn't work" check-ins.

If I instead try to make each changeset reflect a complete feature, I lose the use of my version control software for experimentation. However, it is much easier for users and maintainers to figure out how the code is evolving, which has great advantages for code reviews, managing multiple branches, etc.

So what's a developer to do? Check in small steps or complete features?

7 Answers

The beauty of DVCSs is that you can have both, because in a DVCS, unlike in a CVCS, publishing is orthogonal to committing. In a CVCS, every commit is automatically published, but in a DVCS, commits are only published when they are pushed.

So, commit small steps, but only publish working features.

If you are worried about polluting your history, you can rewrite it. You might have heard that rewriting history is evil, but that is not quite true: only rewriting published history is evil. Since publishing and committing are separate, you can rewrite your unpublished history before publishing it.
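To make the commit/publish split concrete, here is a minimal Python sketch of a toy local branch. It is not how Git stores anything: `Commit`, `LocalBranch`, its `published` counter, and `squash` are hypothetical names, file changes are modeled as a plain path-to-contents dict (deletions are ignored), and "pushing" just records how many leading commits are now shared. Squashing is allowed only on the unpublished tail, which is the rule described above.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    message: str
    changes: dict[str, str]  # hypothetical toy model: path -> new file contents


@dataclass
class LocalBranch:
    commits: list[Commit] = field(default_factory=list)
    published: int = 0  # how many leading commits have been pushed (shared)

    def commit(self, message: str, changes: dict[str, str]) -> None:
        # Committing is purely local; nothing is shared until push().
        self.commits.append(Commit(message, dict(changes)))

    def push(self) -> None:
        # Publishing marks every commit made so far as shared history.
        self.published = len(self.commits)

    def tree(self) -> dict[str, str]:
        # Replay all commits in order to get the current file contents.
        files: dict[str, str] = {}
        for c in self.commits:
            files.update(c.changes)
        return files

    def squash(self, start: int, message: str) -> None:
        """Collapse commits[start:] into a single commit."""
        if start < self.published:
            # Rewriting commits that others may already have is the "evil" case.
            raise ValueError("refusing to rewrite published history")
        tail = self.commits[start:]
        if not tail:
            return
        merged: dict[str, str] = {}
        for c in tail:
            merged.update(c.changes)  # later commits win, as when replaying
        self.commits[start:] = [Commit(message, merged)]


b = LocalBranch()
b.commit("add parser", {"parser.py": "v1"})
b.push()
b.commit("try a regex", {"parser.py": "v2"})
b.commit("well that didn't work", {"parser.py": "v3"})
b.commit("add tests", {"test_parser.py": "t1"})

before = b.tree()
b.squash(b.published, "rewrite parser and add tests")
assert b.tree() == before  # same end result, cleaner history
print([c.message for c in b.commits])
# ['add parser', 'rewrite parser and add tests']
```

The squash leaves the final files unchanged; only the shape of the history differs. Calling `b.squash(0, ...)` at this point would raise, because the first commit has already been pushed.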

This is how Linux development works, for example. Linus Torvalds is very concerned with keeping the history clean. In one of the very early e-mails about Git, he said that the published history should look not like how you actually developed it, but like how you would have developed it if you were omniscient, could see into the future, and never made any mistakes.

Now, Linux is a little bit special: it has commits going in at a rate of 1 commit every 11 minutes, 24 hours a day, 7 days a week, 365 days a year, including nights, weekends, holidays and natural disasters. And that rate is still increasing. Just imagine how many more commits there would be if every single typo and brainfart resulted in a commit, too.

But the developers themselves in their private repositories commit however often they want.

So what's a developer to do? Check in small steps or complete features?

It's possible to get the best of both worlds, especially with git and other DVCSs that let you be selective about which history to publish. Here's a simple workflow that illustrates this.

Your project has master and release branches. Developers each maintain their own develop branches that they don't push.

You use develop to do your day-to-day work. Bite-sized commits appear here, representing incremental advances in the state of the project over time. You might make topic-* branches for working on longer features that span more than a few days or major refactorings. You commit to develop very frequently, perhaps several times an hour. It's like hitting "Save" in a document that you're editing.

When you have some commits that are suitable for the next release, you merge the relevant commits to release. release now has a bunch of individual commits that have selectively been taken from your develop branch. You commit to release whenever you reach a good stopping point. That's usually a few times a day.

When the release is ready to go, your lead developer squashes all the commits since the last merge to master into a single merge commit that appears on master. Then you tag this commit with the release identifier (e.g., v.1.0.4). This happens infrequently, perhaps once an iteration or every few weeks.
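The develop/release/master flow above can be sketched as a small Python toy model. It is a hedged illustration, not Git: `Commit`, `replay`, `pick`, and `release_to_master` are hypothetical names, selective merging is modeled as choosing whole commits by message (real cherry-picks apply diffs and can conflict), and the tag is just a dict entry. It shows the key property of the last step: one squashed commit on master has the same net effect as the release commits it stands for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    message: str
    changes: tuple[tuple[str, str], ...]  # toy model: (path, contents) pairs


def replay(commits: list[Commit]) -> dict[str, str]:
    """Apply commits in order and return the resulting files."""
    tree: dict[str, str] = {}
    for c in commits:
        tree.update(c.changes)
    return tree


def pick(develop: list[Commit], wanted: set[str]) -> list[Commit]:
    """Selectively copy commits from develop to release (stand-in for cherry-picking)."""
    return [c for c in develop if c.message in wanted]


def release_to_master(
    release: list[Commit], last_merged: int, version: str
) -> tuple[Commit, dict[str, Commit]]:
    """Squash release commits made since the last master merge into one tagged commit."""
    pending = release[last_merged:]
    squashed = Commit(f"Release {version}", tuple(replay(pending).items()))
    tags = {version: squashed}  # the tag points at the single squashed commit
    return squashed, tags


develop = [
    Commit("add login form", (("login.html", "v1"),)),
    Commit("experiment: inline styles", (("login.html", "v2"),)),
    Commit("revert experiment", (("login.html", "v1"),)),
    Commit("validate password", (("auth.py", "v1"),)),
]

release = pick(develop, {"add login form", "validate password"})
master_commit, tags = release_to_master(release, last_merged=0, version="v1.0.4")

assert replay([master_commit]) == replay(release)  # same net change, one commit
print(master_commit.message)     # Release v1.0.4
print(replay([master_commit]))   # {'login.html': 'v1', 'auth.py': 'v1'}
```

The experiment and its revert never reach release or master, yet the result matches replaying all of develop. The step-by-step history is still there on develop, and the individual picked commits remain on release.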

Here, you get to have your cake and eat it too. Prior to releasing, you can rollback changes that shouldn't have happened or that you don't want to go into the release, and you can do it one at a time. Developer-friendly! But users get what they want, too: big, globby commits on master that represent what's changed since the last release.

wilhelmtell: I think you might be misunderstanding. There's definitely a long time between changes to master, but pushes to release should happen anytime you think your commits on develop are significant enough to represent some work you want to share.
– John Feminella, Jun 15 '10 at 0:43

My two cents: in a lot of projects, the squashing happens between develop and release, not between release and master. The developer, not the integrator, knows how best to squash things into the "how it should have been done" history. Then they can push the clean result on into the integration pipeline (next -> master).
– Jefromi, Jun 15 '10 at 15:08

@Jefromi: The two approaches aren't mutually exclusive. As a developer, you're free to decide that a bunch of commits should have been rolled up into one thing. In this (extremely simple) workflow above, though, the integrator always squashes everything to a single commit on master when a release is ready. It's like a packaging of the entire release. (The other commits stay intact on release, though, if you want to see the full history.)
– John Feminella, Jun 15 '10 at 15:20

Small steps. There's a reason it's called revision control, and not release control :)

Commit as often as you like. Don't hold back. There should never be negative consequences to committing code on an "in progress" branch. Development shops that expect commits not to "break the build" are misusing the RCS. Likewise, ascribing any meaning whatsoever to a commit is a dangerous policy, simply because it conflicts with the purpose of revision control. Meaning should instead be ascribed to tags, branches, clones, stashes, or whatever your RCS calls them. These things have metadata (perhaps as minimal as a name) designed to convey their purpose. Revisions are simply a history of what you modified.

The last thing you want to do is institute a policy to discourage developers from committing their code, for any reason.

My recommendation would be to create a branch, or even a separate repository, for experimentation. Then, once the feature is complete, you can merge the code from the branch back into the main trunk. Hopefully, that would give you the best of both worlds.

I think a new repository for experimentation is a bit extreme, but pulling a branch for this type of work is an excellent use of branches. Don't run any continuous integration on the branch, so breaking check-ins won't affect anyone else, and feel free to pound away. The one caveat is to regularly pull in changes from the main line if others are making changes, so you don't get too far afield.
– Eric, Jun 14 '10 at 22:51

A new repository is the standard way to do this type of thing in Mercurial (or at least was a few years ago).
– Xiong Chiamiov, Jun 14 '10 at 22:55

The problem is that if you merge the changes back, Mercurial will merge the revision history as well. So even if the code all comes into the main repo at once, it's still broken up into small step-like commits instead of feature commits.
– deft_code, Jun 14 '10 at 23:15

One thing I really like about Git is that the repo in your dev environment is YOUR repo. It's a copy of the maintainer's repo. You're free to do whatever you want to that repo, and you won't tick off the maintainer unless you push up some crazy histories.

To that point, use branching and merging to your advantage as much as you can to aid your development and experimentation. Only push upstream the changes you are most comfortable with. Git even lets you squash your commit history into fewer changesets if needed, so a series of commits you made can be pushed up as a single commit.

This flexibility is extremely empowering, both for your personal workflow and for working within the policies your colleagues have in place.

Small steps are really great. You can always bundle them into larger steps in another repo. To do the opposite, you have to "rewrite history", which can be done in some systems (notably Git), but it's not as well supported as you might like.

Another reason I like small steps is so my colleagues can easily see what I've done. If I work for three or four hours it's often much more sensible for me to reel off half a dozen commits so that my colleagues can see the relevant diffs. (And I appreciate it that they extend me the same courtesy.)

Finally, small steps make it less likely that you'll have conflicts, or that when you do, they'll be smaller.

I use small steps even when working alone, on multiple branches.

Summary: For daily workflow, small steps have many advantages. If you want a distribution-centric workflow, create a repo and a branch just for distribution, and you can set up your big steps there exactly the way you want them.

Most of the time I can get away with the following rule of thumb: check in the smallest amount at a time that makes sense (while still being useful or an improvement). I find this helps me better plan out my work, which has several benefits including (but not limited to):

Better development estimates.

Better testing estimates.

Faster development time.

Fewer overall bugs.

Less coupling between modules.

Finding out sooner if my code unintentionally broke something else.

Many more.

There are times, however, when it is necessary to create a branch and then, when the work is done, merge it back into the mainline. Even while working on the branch, though, I still try to follow the rule, since being on a branch doesn't automagically waive all those benefits away.