Removing unwanted files from Git repositories

You will often inherit a repository containing several binaries, and the repository grows rapidly whenever they are modified, because binaries usually compress and delta poorly, so each new version adds close to its full size to the history. A Git repository should not host binaries. In fact, no VCS should host those files.

Other times a developer (or you) pushes something by mistake that shouldn’t be in the repository.

In either case you’ll want to delete those files.

I tried several guides, but they never worked as expected because they don’t rewrite all the branches in the repository, so the file is never really deleted.

So let’s go:

First you should ask all your developers to commit & push before you start playing.

Then clone the repository with the --mirror option so we have a new copy with all the branches and tags (a mirror):

git clone --mirror MY_REPO_URL

Note this clone is a bare repository, not a working copy, so you won’t see its contents as checked-out files.
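
Although the mirror has no checked-out files, its contents can still be inspected with plumbing commands. A minimal sketch, where inspect_mirror is a hypothetical helper name and its argument is the directory created by the clone:

```shell
# Hypothetical helper: show what a bare/mirror clone contains.
# $1 is the path to the mirror directory (for example MY_REPO.git).
inspect_mirror() {
  # Every ref that was mirrored: branches, tags and anything else under refs/.
  git --git-dir="$1" for-each-ref --format='%(refname)'
  # Files recorded in the commit HEAD points to, read from the object store.
  git --git-dir="$1" ls-tree -r --name-only HEAD
}
```

Calling inspect_mirror MY_REPO.git prints the refs first and the file list after them.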

Now use git filter-branch to remove the path you want from all refs in history:
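
A commonly used form of this command, consistent with the notes that follow, is sketched here; treat it as an illustration rather than the exact invocation originally used. purge_path is a hypothetical wrapper name, and the PURGE_PATH variable exists only to pass the path to the filter safely:

```shell
# Hypothetical wrapper around git filter-branch; run it inside the mirror clone.
# $1 is MY_PATH, the path to remove, relative to the repository root.
purge_path() {
  # --index-filter rewrites each commit's index without checking files out;
  # git rm -r --cached removes the path recursively, and --ignore-unmatch
  # keeps commits that never contained it from failing the filter.
  # --prune-empty drops commits left empty, --tag-name-filter cat rewrites
  # tags to point at the new commits, and "-- --all" covers every ref.
  PURGE_PATH="$1" FILTER_BRANCH_SQUELCH_WARNING=1 \
    git filter-branch --force \
      --index-filter 'git rm -r --cached --ignore-unmatch --quiet -- "$PURGE_PATH"' \
      --prune-empty --tag-name-filter cat -- --all
}
```

Written inline, the equivalent would be git filter-branch --index-filter 'git rm -r --cached --ignore-unmatch MY_PATH' --prune-empty --tag-name-filter cat -- --all.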

Note the -r flag, which makes the removal recursive (you can drop it when removing a single file). Also, MY_PATH must be the complete path relative to the repository root.

Finally, expire the reflog and garbage-collect so the files are actually removed from disk:
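
Before moving on, it can be worth checking that the path is really gone from every branch and tag. A minimal sketch; path_in_history is a hypothetical helper name:

```shell
# Hypothetical helper: list commits reachable from branches and tags that touch $1.
# Empty output means the path no longer appears in their history.
path_in_history() {
  git log --branches --tags --oneline -- "$1"
}
```

The sketch uses --branches --tags rather than --all on purpose: git filter-branch keeps backups of the old refs under refs/original/, and those still contain the removed path.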

git reflog expire --expire=now --all
git gc --prune=now --aggressive
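
One caveat, assuming the rewrite was done with git filter-branch: it saves the pre-rewrite refs under refs/original/, and while those exist the old objects stay reachable, so gc will not prune them. The sketch below deletes those backup refs and is meant to run before the reflog and gc commands above; drop_filter_backups is a hypothetical helper name:

```shell
# Hypothetical helper: delete the backup refs git filter-branch leaves behind.
drop_filter_backups() {
  # Emit one "delete <ref>" instruction per backup ref and apply them in one batch.
  git for-each-ref --format='delete %(refname)' refs/original |
    git update-ref --stdin
}
```

Afterwards, git count-objects -vH reports the size of the object store, which gives a quick way to see whether space was actually reclaimed.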

Consider adding the path to .gitignore (and committing that change) if the file was added by mistake.
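
Because the mirror clone has no working tree, that commit has to be made in a regular clone; one option is to do it in a fresh clone once the rewritten history has been pushed. A minimal sketch, with ignore_path as a hypothetical helper name:

```shell
# Hypothetical helper: run inside a regular (non-bare) clone.
# $1 is the pattern to ignore, for example MY_PATH.
ignore_path() {
  # Append the pattern, stage the file, then commit only the .gitignore change.
  printf '%s\n' "$1" >> .gitignore
  git add .gitignore
  git commit -q -m "Ignore $1" -- .gitignore
}
```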

And push:

git push

Now your developers should rebase every local branch they have onto the rewritten history. However, the safest approach is for them to clone the repository again.

Tell them not to push anything until they have followed one of the previous procedures. Otherwise the old commits get pushed back alongside the rewritten ones, and you’ll have duplicate commits and a real mess.
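
For a branch with no unpushed local work, a developer can also realign an existing clone instead of rebasing. A minimal sketch, assuming the remote is called origin; resync_branch is a hypothetical helper name, and it discards any local commits and uncommitted changes on that branch:

```shell
# Hypothetical helper: make local branch $1 match the rewritten origin/$1.
# WARNING: local commits and uncommitted changes on that branch are lost.
resync_branch() {
  # Fetch the rewritten history; with the default refspec the
  # remote-tracking refs are force-updated.
  git fetch -q origin
  git checkout -q "$1"
  # Point the local branch at the rewritten remote branch.
  git reset -q --hard "origin/$1"
}
```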

Another option is the BFG Repo-Cleaner, a Java tool that handles the removal. It’s faster, but keep in mind that by default BFG will not modify the contents of HEAD, so if the file is still present at HEAD you need to delete it with Git, commit and push before starting.
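
A sketch of the BFG route, assuming the jar has already been downloaded and its location is stored in a BFG_JAR variable (a name made up here); bfg_delete is likewise a hypothetical wrapper. BFG’s --delete-files option matches file names rather than full paths, and --delete-folders is the counterpart for directories:

```shell
# Hypothetical wrapper around the BFG jar.
# $1 is a file name (not a path), $2 is the mirror clone directory.
bfg_delete() {
  java -jar "${BFG_JAR:-bfg.jar}" --delete-files "$1" "$2"
}
```

After it finishes, the same git reflog expire, git gc and git push steps are run from inside the mirror clone.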