Preparing a Git repository for open-sourcification

16 Dec 2013, 3 minute read

A project I’m involved with is going through the process of being open-sourced and released on GitHub. This is a great development, but of course we have had to go through and make sure that we’re able to release everything. The project started out life in a single Git repository, so it contains plenty of bootstrap data that’s owned by other groups within the University. Fortunately there’s an easy way in Git to go through your history and remove the offending articles.

Update 18/12/2013: For those who don’t want to re-clone their project, I’ve added the commands to flush the deleted refs from your local repository. I’ve also added a small section on identifying and removing space-hungry sections of the Git repository.

There are many ways to achieve the same goal, and we could arguably have retained the same repository. However, it’s a nice opportunity to clear out the 300MB which magically worked its way into the repository at some point, and to generally streamline our code.

A word of warning

The following will go through your entire tree and strip out every reference to the files you want gone, rewriting commits where necessary. As a result your local repository will likely differ wildly from the remote repository, so you’ll need to coordinate this with your fellow team members. If your repository is already in the wild, is there much point in causing this much pain? You can probably get away with just removing the content from your tags and branches, which is far less invasive.
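To make the warning concrete, here is a minimal sketch of the kind of rewrite in question, run against a throwaway repository. It assumes `git filter-branch` with an index filter, which is one common way of doing this; the directory name `bootstrap-data` and the file contents are hypothetical. The point to notice is that the hash of HEAD changes, which is exactly why everyone else's clone stops lining up with yours.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; nothing here touches a real project.
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical history: bootstrap-data/ arrives in the first commit.
mkdir bootstrap-data
echo "owned by another group" > bootstrap-data/seed.sql
echo "our code" > app.txt
git add .
git commit -q -m "Initial import"
echo "more of our code" >> app.txt
git commit -q -am "Extend app"
before=$(git rev-parse HEAD)

# Newer Git versions print a long warning before filter-branch runs;
# this variable silences it and is ignored by older versions.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Rewrite every commit on every ref, dropping bootstrap-data/ from the index.
git filter-branch --force \
  --index-filter 'git rm -r -q --cached --ignore-unmatch bootstrap-data' \
  --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1
after=$(git rev-parse HEAD)

echo "HEAD before rewrite: $before"
echo "HEAD after rewrite:  $after"
```

Git's own documentation now steers people towards other tools for this job, so treat the sketch as an illustration of the effect rather than a recommendation.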

A second word of warning

This is a destructive process that rewrites history. Make sure you’ve got a backup before attempting this, just in case things go pear-shaped.
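One way to take that backup is a mirror clone, which copies every branch and tag into a separate bare repository. This is a minimal sketch rather than the only option; the paths and the `v0.1` tag are hypothetical, and the sandbox repository is only there so the commands can be run as-is.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox standing in for the real project.
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Example User"
git config user.email "user@example.com"
echo "our code" > app.txt
git add app.txt
git commit -q -m "Initial import"
git tag v0.1

# Mirror clone: a bare copy carrying every branch, tag and other ref.
git clone -q --mirror "$work/repo" "$work/repo-backup.git"

# List what the backup holds, to confirm it is complete before rewriting.
git -C "$work/repo-backup.git" for-each-ref --format='%(refname)'
```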

Update 18/12/2013: GitHub’s documentation is pretty thorough, so it wasn’t surprising to find a page on removing sensitive data. I’ve updated this post to reflect some of the suggestions in that page.

The first thing to do is remove the offending content from your repository, and push that change. This is an important step because it leaves a commit at HEAD which isn’t modified by the following actions, so other users can pull the changes without having to nuke their repository.
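As a rough sketch of that first step, the removal is just an ordinary commit followed by a push. The example below builds a throwaway repository and a local bare "remote" so it can be run safely; the `bootstrap-data` directory, the branch name `main` and the commit messages are all hypothetical.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox so the sketch can run without touching a real project.
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical layout: bootstrap-data/ holds the content that cannot be released.
mkdir bootstrap-data
echo "not ours to publish" > bootstrap-data/seed.sql
echo "our code" > app.txt
git add .
git commit -q -m "Initial import"
git remote add origin "$work/remote.git"
git push -q origin HEAD:refs/heads/main

# The step itself: delete the content in a normal commit, then push it.
git rm -r -q bootstrap-data
git commit -q -m "Remove bootstrap data ahead of open-sourcing"
git push -q origin HEAD:refs/heads/main
```

At this point the content is gone from the tip of the branch but still present in the earlier commit, which is what the history rewrite deals with.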

Reapplying commits

It’s almost certainly the case that someone cloned your repository and made some changes while you were off rewriting history. If you’re now working on a new repository, the easiest way might be to create a patch from the old repo and then apply it to the new one. In the following command, “-1” refers to the last 1 commit, so you can use “-2”, “-3”, and so on to include more.
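For illustration, here is a minimal sketch of that patch round trip, assuming `git format-patch` to export the commit and `git am` to apply it. The two sandbox repositories, the file contents and the commit messages are hypothetical stand-ins for the old and new repos.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Hypothetical "old" repository containing a colleague's latest commit.
git init -q "$work/old"
cd "$work/old"
git config user.name "Example User"
git config user.email "user@example.com"
echo "our code" > app.txt
git add app.txt
git commit -q -m "Initial import"
echo "a colleague's fix" >> app.txt
git commit -q -am "Fix made while history was being rewritten"

# Hypothetical "new" repository that only has the shared starting point.
git init -q "$work/new"
cd "$work/new"
git config user.name "Example User"
git config user.email "user@example.com"
echo "our code" > app.txt
git add app.txt
git commit -q -m "Initial import"

# Export the last 1 commit from the old repo as a patch file...
mkdir "$work/patches"
git -C "$work/old" format-patch -1 -q -o "$work/patches"

# ...and replay it in the new repo, keeping author and commit message.
git am -q "$work/patches"/*.patch
```

Swapping `-1` for `-3` would export the last three commits as three numbered patch files, which `git am` applies in order.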