I work for a company that primarily builds Java applications, and I'm trying to convince everybody to stop checking binary files (dependencies and final products) in to SCM.

They know it is a bad practice, but they think that "it works" and that it is not really a problem, even though many of them know about Maven and other build tools besides Ant. Both PMs and programmers (around 50 people) are ready to listen to any argument against it, and they even acknowledge that it wastes backup space, but I want to be really convincing because changing the habit would involve a lot of effort. What arguments do you use to support such a change?

Edit: Okay, it makes sense to make a distinction between files that almost do not change, like dependencies, and generated files. Even so, I'm interested in reasons against the latter.

4 Answers

Storage space is cheap, and so that's not a very convincing argument for why you should or shouldn't check files in.

Instead, you can appeal to the purpose of SCM. Each file tracked by SCM represents some need to manage the parallel, distributed changes your team is making. None of that is really apparent until two team members try to change the same file. Resolving those conflicts is what SCM is really for: preventing accidental overwrites of another dev's work and, hopefully, automating the process of merging those changes.

Merging binary files is usually a real challenge, because there's no sane way for a generic merge tool to guess how a merged binary file should work. It can't know enough about how the indexes or offset pointers in the file work unless specially designed to recognize that particular file type.

That means it's up to the dev to merge the binary file by hand, and then tell the SCM that the file has been so merged. Since it's a dev doing it, the merge may not really cover all of the changes of both prior check-ins, and since the file is binary, there's no automated way to verify the merge.

For binary formats that really represent project sources, such as art assets, this is an unfortunate but necessary step. However, build outputs aren't sources. There's no need to merge them, because the sources can be merged and the resulting build output regenerated. Tracking and managing these changes is 100% waste. It wastes the SCM's resources, though not terribly much, and it also wastes developer time getting past the spurious merge failures. Developer time is very expensive, and anything that wastes it is a cancer.

On the other hand, there is one particular case where build outputs should be archived. Any version of the project that has ever been shipped or deployed should probably be retained indefinitely. Having an exact, byte-for-byte copy of the actual build that a customer is having issues with can make supporting that customer much easier, since you will have the exact version they have.

That backup probably shouldn't be in the same repository as the source code, since they will generally follow different schedules and have basically different structures.
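To make the "byte-for-byte" point concrete, here is a minimal sketch of how an archived release could be compared with the file a customer is actually running. The class name `ArtifactChecksum` and its methods are hypothetical names chosen for illustration, and using SHA-256 as the comparison is my assumption, not something the team's process prescribes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: compares an archived build with another copy by digest.
final class ArtifactChecksum {

    // Streams the file through SHA-256 and returns the digest as lowercase hex.
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // True when both files produce the same SHA-256 digest.
    static boolean sameBuild(Path archived, Path candidate)
            throws IOException, NoSuchAlgorithmException {
        return sha256Hex(archived).equals(sha256Hex(candidate));
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: ArtifactChecksum <archived-file> <candidate-file>");
            return;
        }
        Path archived = Paths.get(args[0]);
        Path candidate = Paths.get(args[1]);
        System.out.println(sameBuild(archived, candidate) ? "identical" : "different");
    }
}
```

Recording the digest alongside each archived release would let support confirm which shipped build a customer has without rebuilding anything.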

Dependencies, even in binary form, should be checked in so that when someone else pulls down the project, it just works. The main concern is not the type of file, but how the file is created. The rule of thumb that I use is that if it can be generated using another file, it doesn't get checked in - this means automatically generated documentation, binary files that I create, and so on.

One of the primary advantages of using an SCM is that you can reconstruct your system as it was at any time in the past. So there is no point in storing your final build in your SCM, because you can just check out the relevant revision and build it.

You mention dependencies...
Your SCM should be set up so that you can do a clean checkout to a new machine (with dev environment), hit build and you should be able to build your system without needing to install anything else. So keeping binary dependencies in your SCM is a good idea. Libraries rarely change so they won't take up much room.

OK, I agree: dependencies rarely change. But a 20 MB WAR file with one line of source code changed does not deserve to be checked in.
– Ither, Dec 26 '10 at 2:52

Why not? Are you going to run out of disk space? If you don't have the source and it's a required dependency, then you don't have any choice. If you do have the source, then it doesn't count as a binary and you can build it when you need it.
– Henry, Dec 26 '10 at 9:53

It seems redundant to include both source and object files (the source files are obviously required). Besides being unnecessary, the object files can take up a lot of space. If your firm is using a distributed SCM (Git, Hg, Bzr), then those binary files must be copied to and stored by every developer.
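As a rough back-of-the-envelope sketch of that cost, the snippet below multiplies it out. The 20 MB artifact size and the 50 people come from this thread; the 250 check-ins figure is purely hypothetical, and the calculation assumes the worst case where every check-in stores a full new copy with no delta compression, which a real SCM may well improve on. The class name `RepoGrowthEstimate` is made up for illustration.

```java
// Hypothetical worst-case estimate of space used by a repeatedly committed binary.
final class RepoGrowthEstimate {

    // Assumes each check-in adds one full copy of the artifact to every clone.
    static long worstCaseMegabytes(long artifactMb, long checkIns, long clones) {
        return artifactMb * checkIns * clones;
    }

    public static void main(String[] args) {
        long artifactMb = 20;  // WAR size mentioned in the comments
        long checkIns = 250;   // hypothetical number of check-ins of that WAR
        long developers = 50;  // team size mentioned in the question

        long perClone = worstCaseMegabytes(artifactMb, checkIns, 1);
        long acrossTeam = worstCaseMegabytes(artifactMb, checkIns, developers);

        System.out.println("Per clone: " + perClone + " MB");
        System.out.println("Across the team: " + acrossTeam + " MB");
    }
}
```

Under those assumptions, each clone carries about 5,000 MB of history for that one file, and the team as a whole stores about 250,000 MB. Disk may be cheap, but every fresh clone also has to transfer that history.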