I've heard in several places "Don't make large commits", but I've never actually understood what a "large" commit is. Is it large if you work on a bunch of files, even if they're related? How many parts of a project should you be working on at once?

To me, I have trouble trying to make "small commits" since I forget or create something that creates something else that creates something else. You then end up with stuff like this:

19 Answers

To me, I have trouble trying to make "small commits" since I forget or create something that creates something else that creates something else.

That is a problem. It sounds like you need to learn to break down your work into smaller, more manageable chunks.

The problems with large commits are:

In a multi-person project, there is a greater chance that your commits will cause conflicts for other developers to resolve.

It is harder to accurately describe what has been done in log messages.

It is harder to track the order that changes were made, and hence to understand the cause of problems.

It increases the probability of losing a lot of uncommitted work.

Sometimes large commits are unavoidable; e.g. if you have to change a major API. But that's not normally the case. And if you do find yourself in this situation, it is probably a good idea to create a branch and do your work in there ... with lots of small commits ... and reintegrate when you are finished.

(Another case is when you do an initial import, but that's NOT problematical from the perspective of the issues listed above.)

+1, definitely learn how to break it down into smaller chunks. When looking for a bug, smaller commits are easier to manage. After the first few times of staring at a large commit, you'll get the habit :P
– dr Hannibal Lecter, Oct 10 '10 at 11:15

If needed, at the end of the series of small commits you can add a label / tag which includes a long summary description. This effectively applies a baseline at the point where your large block of work is done, before you re-integrate or start on some form of more formal testing (should that be part of how you/your business works). I WOULD ADD: Making large-scale changes (as you seem to be suggesting) down a development branch is a hugely good idea. It prevents pollution of your main stream with great piles of crap and makes quick-fix service packs, etc. easier to create if they are needed.
– quickly_now, Jan 20 '11 at 0:15

Every source control commit should serve only one purpose. If you have to put the word "and" or "also" in your summary, you need to split it up.
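As a rough illustration of the "and"/"also" test, here is a minimal Python sketch. The function name looks_multi_purpose and the sample summaries are made up for this example; it is only a crude heuristic, not a feature of any version control tool.

```python
import re

# Hypothetical helper: flags a commit summary that contains the whole
# word "and" or "also", which hints that the commit serves several purposes.
def looks_multi_purpose(summary: str) -> bool:
    return re.search(r"\b(and|also)\b", summary, re.IGNORECASE) is not None

# Made-up summaries for illustration only.
print(looks_multi_purpose("Fix two-day date rule and change report sort order"))
print(looks_multi_purpose("Fix two-day date rule"))
```

The word-boundary match means a summary such as "Rename request handler" is not flagged just because "handler" contains the letters "and".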

It's very common to end up with lots of separate unrelated or semi-related changes in your working copy. This is called the "tangled working copy problem," and it's actually very hard to avoid even for disciplined developers. However, Git and Mercurial both give you tools to resolve it -- git add -p for Git, and chunk selection and Mercurial Queues in TortoiseHg for Mercurial.

Imagine that the client asked to have a particular change made - for example to add a rule that something or another can't be done within two days of the "whatever" date. Then, after you've made the change, they change their minds. You will want to roll back your commit. If it's all mushed in with some things about changing the sort order of unrelated reports, your life is a misery.

One work item, one changeset. One request from the client, one changeset. One thing you might change your mind about, one changeset. Sometimes that means it is one single line of code. Sometimes it is ten different files including the database schema. That's fine.

+1 for the nice summary of "One..., one changeset"
– Ida, Jun 20 '12 at 7:51

The only thing I'd add is that sometimes it makes sense to go even smaller than "one request, one changeset". Larger requests should be broken down into smaller, incremental changesets. (As mentioned in another answer, development on a branch might facilitate this.) Ultimately, I might adapt the aforementioned mantra as such: "One request, one (or more!) changesets".
– rinogo, Feb 25 '13 at 18:13

Large commits are when you have tons of changes that don't really all go in the same bucket. If I change the controller logic, then the database connection model, then some misc. scripts, I shouldn't be bundling it all under one commit.

Prevention is making commits according to what you're completing. In the above example I would commit after the controller logic, after the database work, and after the scripts. Don't put off committing simply because you know what changed. Other people will look back at your "Changed stuff" commit log message and wonder what you were smoking.

Initial imports are probably the biggest commits you should ever have. Setting up a system from scratch? Sure, have a few big commits. After you've leveled it out, though, it's time to keep things organized.

If you know beforehand that you are going to be working on a large chunk of code, I would suggest creating a branch for your specific feature while periodically pulling code down from the mainline to make sure your code stays in sync. When you're done working on the branch, merge all your changes back into the mainline. This way other team members will not be surprised and/or annoyed when they see a huge commit. Also, there's much less chance of breaking things. Keep practicing breaking things down into smaller commits. With time it will become second nature.

As a rule of thumb, describe the change in one sentence or one line of text. (Based on this rule, the commit should be broken into 10-15 smaller ones.) If you can't adequately comment a commit in one line, then it's already too large.

To practice smaller commits, take notes in your notepad (or in Notepad) of what you've already changed or added. Commit before it becomes a long list or before you make a code change unrelated to what you already have in the notepad.

In my field (physics modeling), I sometimes discover a bug in the output today that wasn't there with the repository as of 6 months ago. When this happens, I will do a binary search on revisions:

Run model from 3 months ago

If bug is still in output, run the model from 4.5 months ago

... repeat until I find the commit that yields bad output
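The binary search described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the revisions are ordered oldest to newest, the oldest is known to be good, the newest is known to be bad, and the bug stays present once introduced. The names first_bad_revision and is_bad and the revision labels are hypothetical; in practice is_bad would check out the revision, run the model and inspect the output.

```python
from typing import Callable, Sequence

def first_bad_revision(revisions: Sequence[str],
                       is_bad: Callable[[str], bool]) -> str:
    """Return the earliest revision whose output is bad.

    Assumes revisions[0] is good, revisions[-1] is bad, and every
    revision after the first bad one is also bad.
    """
    good, bad = 0, len(revisions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2      # halve the remaining range
        if is_bad(revisions[mid]):
            bad = mid                # bug was introduced at or before mid
        else:
            good = mid               # bug was introduced after mid
    return revisions[bad]

# Made-up history: revisions r100..r124, with the bug introduced in r117.
history = [f"r{n}" for n in range(100, 125)]
culprit = first_bad_revision(history, lambda rev: int(rev[1:]) >= 117)
print(culprit)  # r117
```

With 25 revisions this takes only five model runs. Git ships the same idea as the git bisect command, so with Git you would normally use that rather than hand-rolling the loop.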

When the bug was introduced in a monstrous commit, I have to sit with a fine-toothed comb to find the source of the problem. If the commit touched a small number of files, it's less painful to track down the line(s) of code that introduced the problem.

I would recommend breaking down your problem into a series of smaller tasks (ideally put each task in a bug tracker). Make a commit as you complete each task (and close that bug/feature in your bug tracker).

The thing to grasp here is that "large" in this context is about the number of distinct changes, not the physical size of the commit (although generally the two will go hand in hand).

It's not so much a question of "don't make large commits" as "do make small commits", the ideal being to commit small, self-contained changes.

It's clear from the changelog that you've got a series of things that could have been committed separately (and safely), and therefore it's fairly self-evident that it's too large.

The reason that this can be a problem is that your last commit is your reference point for the changes you're currently making and if, for example, you get the first bit right and then get the next bit wrong you've no easy way to roll your work back to the point where you started making mistakes (BTDTGTTS).

Of course, sometimes changes just are big (large-scale refactoring, for example), and as others have suggested, this is where you need to branch. That way, although your individual commits might notionally break things, they are separated from the main development trunk, so that isn't a problem, and you get to continue to commit early and often.

One more thing - if something comes along in the midst of your work that requires more immediate attention you need to change it separately (ideally in a completely distinct set of folders) and commit it separately.

The real challenge in all of this is not the mechanics, it's the mindset: a commit is not just a backup copy that you make every now and then; each commit is an inch-pebble along the way. There's nothing wrong with lots of small commits, and munging different things together in a mob commit is as bad as munging vaguely related bits of functionality together in a lump of code.

It isn't the size of the commit that really matters; it is the scope of the change that should determine how your commits are organized.

You might, for instance, change every instance of __macro1 to __macro2 in a large code base, which changes 200 files. 200 commits would not be sane in that case.

What you want to end up with is being able to pull the repository at any single revision and having the build work. Did you change from libfoo to libbar? I hope that change includes updating your build scripts and Makefiles as well (or whatever is applicable).

Sometimes, you might need to make a series of experimental changes that accomplish one thing, in which case, you have to determine what scope is more important to you if you need to revert later. Does one depend on the other? Commit them all at once in one single revision. Otherwise, in that case, I'd suggest one commit per change. You should be doing something like that in another branch, or in another repo anyway.

While yes, you do have the power to revert a single file to a previous revision (thus backing one file out of a larger commit), doing so really screws up tools like bisection later on down the road, and pollutes the history.

If you stop and think "Ok, tests pass, I think this works .. but if it goes bad, can I easily back it out?" .. you'll end up making sensible commits.

At the very least, train yourself to commit whenever you think to yourself "I like my progress so far, and I don't want to lose it if the changes I'm about to make are a disaster." Then you have the option to take advantage of the VCS to blow away any dead ends that you tried or special debugging code you added to track down a problem. (e.g. with git reset --hard or rm -rf *; svn update)

You have probably heard the saying that perfection is when you cannot take anything more away. That should also describe your standard for commit size.

It depends on your project where that "perfect" size is. If you are shipping to external customers, a good size might be the smallest increment that you would be comfortable shipping if you didn't finish the next one on time. If you are building internal, frequently deployed applications, the best size might be the smallest increment that doesn't break anything (and gets you closer to where you want to be).

Modern version control systems help you create good commits with easy branching, interactive rebasing, staging area, etc.

Sometimes you've been working all day on several different logically distinct changes, and you forgot to commit your code in between. Using git citool can be very helpful for breaking up your work into nice bite-sized chunks at the end of the day, even if you weren't so careful during the day while you were working.

git citool can let you select which specific hunks of a file (or which specific lines) to commit in a particular commit, so you can break up (non-overlapping) changes made to the same file into several commits.

(It seems that you use Subversion. I don't know of a tool that does this for Subversion, but you could look into using git-svn, the Subversion adapter for git, which will change your life.)

The larger the commit, the more likely you'll break the build and get paid out by the rest of your team. I try to commit changes twice a day: just before lunch and before I go home. So at 12pm and 4:30pm I try to get everything working and ready to commit. I find this practice works for me.

The company I work for forces a peer code review for every commit.
Therefore, any commit that makes it difficult for a peer to figure out what is going on and review it in a reasonable amount of time is too large.

1) For me the standard commit is considered big if it's doing more than one thing. By thing I mean fixing a bug or adding a feature.

2) Prevent such commits by making it a habit and a rule to commit whenever you finish something.

3) In the semi-early stages of development, I allow the commits to include the first creation of the files which will be used later.

I would like to note that by finished I mean that all the bugs you can identify have been fixed and you won't break the build by committing.

Yes, this generates a large number of commits, but it lets you roll back exactly what broke things instead of having to roll back a large series of changes that were committed at the same time when only one of them is causing an issue.

I would also like to point out that the rules change a bit for distributed version control systems (DVCS) like Mercurial and Git. If you're using one of those, you commit whenever you've made a change, even if you haven't tested it yet, and then push to the central repository when it's working. This is preferable because it lets you record more revisions of your code without worrying about breaking the build.

In my case I am trying to commit the files of a server to the repository system (SVN). This is the initial commit, and I don't want to download it as it is a really large project (a few GB), so I want to do the initial commit from the client's server.

The issue is that the client is on a shared server, where the svn client (or any other software) is killed if it runs for more than a minute.

One alternative would be to download the project to my computer and do the initial commit from there, but I am interested to know if there is an option in SVN to break the large commit into several smaller ones, something similar to transaction methods.