When developing for embedded devices and other niche platforms, your build process is likely to depend on several proprietary binaries, each at a very specific version.
So the question is, are they part of your source control? My office goes by the rule of "checking out from source control includes everything you need to compile the code", and this has led to some serious arguments.

The main arguments I see against this are bloating the source control database and the inability to diff binary files (see prior questions on the subject). Weighed against that is the ability to check out and build, knowing you have the precise environment the previous developer intended, without hunting down the appropriate files (at specific versions, no less).

Alternatively, you can write a bash/python/perl/bat script that checks out the source and downloads all the other dependent components in a single step. However, I would still recommend checking binaries into your version control, just for the sake of keeping revisions. The only files that shouldn't be checked into the repository are files that can easily be regenerated from version-controlled files. Disk space is cheap and shouldn't be a major consideration.
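A minimal sketch of such a single-step fetch script, driven by a versioned manifest checked into source control. The manifest format, mirror location, and file names here are all invented for illustration; a real script would pull from your actual dependency store (network share, HTTP server, etc.):

```shell
#!/bin/sh
# Sketch: fetch every dependency at its pinned version in one step.
set -eu

MIRROR=/tmp/dep-mirror     # stands in for the real dependency store
DEST=/tmp/build-deps       # where the build expects the components

# Simulate the mirror so the sketch runs stand-alone.
mkdir -p "$MIRROR" "$DEST"
echo dummy > "$MIRROR/vendorlib-1.2.3.tar.gz"

# deps.txt (name + exact version) would be checked into source control.
printf 'vendorlib 1.2.3\n' > /tmp/deps.txt

# Copy each dependency, at exactly the pinned version, into place.
while read -r name version; do
    cp "$MIRROR/$name-$version.tar.gz" "$DEST/"
done < /tmp/deps.txt
```

The manifest lives next to the source, so every checkout of a given revision names exactly the binaries that revision was built with.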
– Lie Ryan, Sep 25 '11 at 12:25

10 Answers

The idea of VERSION CONTROL (misnomer: source control) is to allow you to roll back through history, recover the effect of changes, and see what changed and why. This is a range of requirements, some of which need binary things, some of which don't.

Example: For embedded firmware work, you will normally have a complete toolchain: either a proprietary compiler that cost a lot of money, or some version of gcc. In order to get the shipping executable you need the toolchain as well as the source.

Checking toolchains into version control is a pain; the diff utilities are horrible (if they work at all), but there is no alternative. If you want the toolchain preserved for the person who comes to look at your code in 5 years' time to figure out what it does, then you have no choice: you MUST have the toolchain under version control as well.

I have found over the years that the simplest method to do this is to make a ZIP or ISO image of the installation CD and check that in. The check-in comment needs to be the maker's specific version number of the toolchain. If it's gcc or similar, bundle up everything you are using into a big ZIP and do the same.
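That bundling step can be sketched as below. The answer describes ZIP/ISO images; this sketch uses tar so it runs anywhere, and the toolchain path and vendor version string are made up for illustration:

```shell
#!/bin/sh
# Sketch: bundle a toolchain into one archive named after the maker's
# exact version number, ready to check in.
set -eu

TOOLCHAIN_DIR=/tmp/toolchain
VERSION=4.9.2-vendor1            # the maker's exact version number

# Stand-in toolchain contents so the sketch runs stand-alone.
mkdir -p "$TOOLCHAIN_DIR/bin"
echo cc > "$TOOLCHAIN_DIR/bin/cc"

# One archive, one check-in, version number in the name and the comment.
tar czf "/tmp/toolchain-$VERSION.tar.gz" -C "$TOOLCHAIN_DIR" .

# Then, for example (repository URL is hypothetical):
#   svn import /tmp/toolchain-$VERSION.tar.gz \
#       svn://repo/toolchains/ -m "Toolchain $VERSION"
```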

The most extreme case I've done is Windows XP Embedded where the "toolchain" is a running Windows XP VM, which included (back then) SQL Server and a stack of configuration files along with hundreds and hundreds of patch files. Installing the whole lot and getting it up to date used to take about 2-3 days. Preserving that for posterity meant checking the ENTIRE VM into version control. Seeing as the virtual disk was made up of about 6 x 2GB images, it actually went in quite well. Sounds over the top, but it made life very easy for the person who came after me and had to use it - 5 years later.

Summary: Version control is a tool. Use it to be effective, don't get hung up about things like the meaning of words, and don't call it "source control", because it's bigger than that.

And when the VM needs to be updated, your repo balloons to 12 GB? Even with good binary diffs, you're still talking about a 10 GB+ repo.
– TheLQ, Sep 25 '11 at 17:55


Well, no. If you use VMWare you can use disk snapshots. These store the original baseline disk image and add new files containing only the deltas, which are quite small. You just need to remember to check in the newly created files. Last time I looked at this, an update added about 250 KB - chicken feed. Besides, worrying about repo size is pointless - disk is cheap.
– quickly_now, Sep 25 '11 at 22:59

Why keep binaries? Projects today depend on a swath of external tools and libraries. Let’s say you are using one of the popular logging frameworks (like Log4J or Log4Net). If you don’t build the binaries for that logging library as part of your build process, you should keep it in version control. That allows you to continue to build your software even if the framework or library in question disappears (or, more likely, introduces a breaking change in a new version). Always keep the entire universe required to build your software in version control (minus the operating system, and even that is possible with virtualization; see “Use Virtualization,” later in this chapter). You can optimize retaining binaries by both keeping them in version control and on a shared network drive. That way, you don’t have to deal with them on an hourly basis, but they are saved in case you need to rebuild something a year later. You never know if you will need to rebuild something. You build it until it works, then forget about it. It is panic inducing to realize you need to rebuild something from two years ago and don’t have all the parts.

I couldn't agree more; while this is arguably subverting the VCS for a task it wasn't designed for (keeping binaries), I think the benefits outweigh the potential drawbacks. But, as the author notes later, sometimes keeping the binaries in VCS might not be a practical solution, so other options should be considered, like keeping them on a mapped network drive.

If the binaries aren't too big, I would definitely keep them in VCS. This seems even more true in your case, since the binaries are probably small and you work with very specific versions. They might also be hard to find later, for a variety of reasons (the authors shut down their website, or the version you need is no longer listed for download). Although unlikely, you never know what will happen in a few years.

I wish I had read this book a few years ago, when I was working on a game using a graphics library (a DLL file); I interrupted development for a while, and when I wanted to continue I couldn't find the DLL again because the project had died.

Yes, this happens all too often. I have a hobby project where I rely on a scanner generator that was abandoned by its author 3-4 years ago. Luckily it has always been under version control.
– Christian Klauser, Sep 27 '11 at 8:25

Source control is for sources. Sources are what you're unable to build from other things. Some files that qualify as sources happen to be binaries.

My VCS has lots of binaries checked into it, but each one is the unit of release from some product I didn't write and don't maintain. This might be something like GNU ccRTP, which is released as a compressed tarball. That tarball is my source, and it's checked in along with whatever infrastructure I need to turn it into a finished product (a Makefile and an RPM spec in my case) in a single, automated step. When there's a new version of ccRTP, I treat the new tarball as changed source: it goes into a checked-out copy, gets built, tested and committed back to the VCS. I've done the same with commercial products that don't ship with source (compilers, libraries, etc.) and it works the same way. Instead of unpack-configure-compile-package, it's just unpack-package. The software that does the nightly builds doesn't know or care as long as it can run make and get finished products.
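The unpack-package pattern from this answer can be sketched as follows. The ccRTP name comes from the answer itself, but the paths, version number, and packaging step here are assumptions (a real setup would run rpmbuild or similar from the checked-in Makefile and spec):

```shell
#!/bin/sh
# Sketch: the vendor tarball is the checked-in "source"; one automated
# step unpacks it and packages the finished artifact.
set -eu

WORK=/tmp/vendor-build
mkdir -p "$WORK/src"

# Simulate the checked-in vendor tarball so the sketch runs stand-alone.
mkdir -p /tmp/ccrtp-2.0.0
echo lib > /tmp/ccrtp-2.0.0/libccrtp.so
tar czf "$WORK/ccrtp-2.0.0.tar.gz" -C /tmp ccrtp-2.0.0

# The "build": unpack, then package the result for release.
tar xzf "$WORK/ccrtp-2.0.0.tar.gz" -C "$WORK/src"
tar czf "$WORK/ccrtp-2.0.0-dist.tar.gz" -C "$WORK/src" ccrtp-2.0.0
```

A new upstream release is handled by committing the new tarball over the old one; the nightly build neither knows nor cares.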

Most VCSes have features that make human-readable source easier to deal with and more efficient to store, but to say that they aren't suited to binaries isn't really true if binaries put in come back out unmolested. How a VCS deals with binaries internally depends entirely on whether or not its authors thought attempting to only store differences was worth the effort. Personally, I think storing full copies of the ccRTP distribution at 600 KB a pop is more than made up for by the ability to tag a version of it along with all of my other sources.

This reminds me of the "jars in repository" problem Java had some time ago. People building Java apps used to push their dependencies (binary JAR files) into repositories. Everybody was happy with this, because you had a "one click" build system, and disk space is cheap, so who cares. Then came Maven, and you could get rid of all that binary cruft and, with a local cache-only repository, still maintain bulletproof builds. You still have a "one click" build system, but source control doesn't have to shuffle around binary files that make no sense there.

So yes, you can get binary files out of source control, but this will require you to tweak the build system to fetch them at build time. Without dedicated software (like Maven), that can be a lot of effort just to get them out.

In principle, I appreciate the "check everything you need to build into source control" camp, but dependency management has evolved quite a bit in the last few years, with tools like Maven, Ivy and NuGet.

Also, in practice, I find checking in binaries to create a number of unpleasant side effects. Git/Mercurial aren't really tuned for it, for example, and Subversion and Perforce can drive you nuts when merging branches that contain binaries.

With a dependency management solution, you specify in a source-controlled file in your project which package names and which versions your project depends on. Almost all dependency management tools allow you to create a private repository of your dependencies, following some sort of versioning and naming convention; when you do a build, the dependency management tool will resolve all of your open source and proprietary dependencies from a list of approved sources, then stuff them into your local cache. Next time you build with the same version dependencies, everything's already there and it goes much faster.
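As a concrete example of such a source-controlled dependency file, a Maven POM declares each package by name and exact version; the coordinates below are illustrative, not from the question:

```xml
<!-- Fragment of a hypothetical pom.xml; Maven resolves these from an
     approved repository and caches them locally at build time. -->
<dependencies>
  <dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>1.2.17</version>
  </dependency>
</dependencies>
```

Only this small text file is versioned and merged; the binaries themselves live in the repository manager and the local cache.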

Your private repository can then be backed up with conventional filesystem backup tools.

This avoids the slowdowns I've experienced when a ton of binaries are being pulled from the source tree, and prevents your repository from having lots of hard-to-diff files. There's only one location for any given dependency, by name and version number, so there's no merge conflicts to deal with, and the local filesystem caching means that you don't have to deal with the cost of evaluating whether your local copy has changed when you pull updates.

Your source control holds the sources to what you do. If a given binary blob can be reconstructed from the sources, it is not a source and should not go in the source code repository. Only non-recreatable blobs should go into source control.

You usually have a network folder of binary blobs you've built from the sources over time. These can be deployed to customers or used in projects (instead of building everything from scratch every time).

The "main program build" needs just a few binaries compared to the thousands of source code text files, so the binaries are checked into the repository. This works fine.

The installer build needs a lot of third-party components (some of them are just copied to the installation CD, like Adobe Reader). We don't put those into the repository. Instead, these components reside on a network drive (even older versions of them), and the build scripts copy them to the right place. Of course, for reproducible builds, everyone has to be careful not to change any folder where the third-party components are stored.
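The copy step for the installer build might look like the sketch below; all paths and component names here are invented for illustration:

```shell
#!/bin/sh
# Sketch: third-party components live on a network drive, and the build
# script copies the pinned versions into the installer staging area.
set -eu

SHARE=/tmp/thirdparty-share    # stands in for the network drive
STAGE=/tmp/cd-staging          # installation-CD staging area

# Simulate the share so the sketch runs stand-alone.
mkdir -p "$SHARE/AdobeReader-11.0" "$STAGE"
echo setup > "$SHARE/AdobeReader-11.0/setup.exe"

# Copy the exact component version into place for the installer build.
cp -R "$SHARE/AdobeReader-11.0" "$STAGE/"
```

Keeping the version number in the folder name is what makes old installer builds reproducible, as long as nobody rewrites those folders.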

Both strategies work fine and fulfil the "checking out from source control includes everything you need to compile the code" requirement.

You need to keep everything you will need to rebuild specific versions of the product at some point in the future.

However, you don't have to keep everything in Source Control.

One company kept a frozen server rack (because the OS only ran on that specific hardware, and the toolchain only ran on that OS, and the source was dependent on that toolchain). Can't check that into Source Control.

If you do need to split up the requirements for a build, then you have the bookkeeping problem of keeping two version control systems in sync: e.g. the hardware box in this closet, or the VM or binaries in this preserved backup volume, go with this SVN source code revision, and so on. This is messier than using a single source control system, but solvable.

To my mind, checking binaries into SCM invites chaos. I once ran a very complex project with a lot of dependencies on third-party libraries. The principles we adopted:

All source code is managed with SCM.

All dependencies are managed with Ivy, which has great Eclipse integration.

This works pretty well. We have a configuration file listing the version of each external library the source code compiles against. This configuration file is checked into SCM, so it evolves as the source code evolves. With this approach we can reproduce a build exactly, without messing around with versions of external libraries.
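A minimal ivy.xml along these lines pins each external library to an exact revision; the organisation, module names, and revisions below are invented for illustration:

```xml
<!-- Hypothetical ivy.xml; at build time Ivy resolves these revisions
     into a local cache, so the binary JARs stay out of SCM. -->
<ivy-module version="2.0">
  <info organisation="com.example" module="myapp"/>
  <dependencies>
    <dependency org="log4j" name="log4j" rev="1.2.17"/>
    <dependency org="commons-io" name="commons-io" rev="2.4"/>
  </dependencies>
</ivy-module>
```

Because this file is just text, it diffs and merges cleanly, and every source revision records exactly which library versions it builds against.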