Then I deleted the local release branch. First I tried git branch -d release, but git said "error: The branch 'release' is not an ancestor of your current HEAD.", which is true, so I then ran git branch -D release to force the deletion.

But my repository size, both locally and on GitHub, was still huge. So then I ran through the usual list of git commands, like git gc --prune=today --aggressive, with no luck.

By following Charles Bailey's instructions at SO 1029969 I was able to get a list of SHA1s for the biggest blobs. I then used the script from SO 460331 to find the blobs...and the five biggest don't exist, though smaller blobs are found, so I know the script is working.
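For reference, the verify-pack approach from SO 1029969 boils down to something like the following sketch (the setup lines just build a throwaway repo for demonstration; in a real repo you would run only the pipeline, from the repo root, after a `git gc` has packed the objects):

```shell
# Throwaway demo repo; skip this setup when running against a real repository.
cd "$(mktemp -d)"
git init -q .
printf 'some file contents\n' > big.bin
git add big.bin
git -c user.name=demo -c user.email=demo@example.com commit -qm 'add file'
git gc --quiet

# List the five largest objects in the packfiles.
# verify-pack -v columns: SHA-1, type, size, size-in-pack, offset-in-pack.
git verify-pack -v .git/objects/pack/pack-*.idx \
  | grep -E '^[0-9a-f]{40}' \
  | sort -k3 -n -r \
  | head -5

# To map a SHA-1 from that list back to a file name:
# git rev-list --objects --all | grep <sha1>
```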

I think these blobs are the binaries from the release branch, and they somehow got left around after the deletion of that branch. What's the right way to get rid of them?

This deserves more upvotes. It finally got rid of a lot of git objects that other methods would keep. Thanks!
– Jean-Philippe Pellet Oct 29 '13 at 17:33

Upvoted. Wow, I don't know what I just did but it seems to clean up a lot. Can you elaborate on what it does? I have the feeling it cleared out all my objects. What are those and why are they (apparently) irrelevant?
– Redsandro Jan 16 '14 at 21:52

@Redsandro, as I understand it, those "git remote rm origin", "rm" and "git update-ref -d" commands remove references to old commits for remotes and such, which might be preventing garbage collection. The options passed to "git gc" tell it not to hold on to various old commits, which it would otherwise keep for a while. E.g. gc.rerereResolved covers "records of conflicted merges you resolved earlier", kept for 60 days by default. Those options are in the git-gc manpage. I'm not an expert on git and don't know exactly what all these things do; I found them from manpages and by grepping .git for commit refs.
– Sam Watkins Jan 20 '14 at 5:23
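The kind of one-shot, zero-all-retention gc described in that comment can be sketched like this (config keys are from the git-gc and git-config manpages; "now"/0 values are chosen here to zero out every retention window, and the setup lines just build a scratch repo so the sketch is runnable):

```shell
# Scratch-repo setup; in a real repo, run only the final git command,
# and only when no other git processes are touching the repository.
cd "$(mktemp -d)"
git init -q .
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m init

# One-shot gc with the retention windows zeroed out:
# reflog entries expire immediately, rerere records kept 0 days,
# and unreferenced objects are pruned right away.
git -c gc.reflogExpire=now -c gc.reflogExpireUnreachable=now \
    -c gc.rerereResolved=0 -c gc.rerereUnresolved=0 \
    -c gc.pruneExpire=now gc
```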

A git object is a compressed file, tree or commit in your git repo, including old stuff from the history. git gc clears out unneeded objects; it keeps objects which are still needed for your current repo and its history.
– Sam Watkins Jan 20 '14 at 5:27
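To get a feel for what such objects look like, you can inspect them with git cat-file (scratch-repo demo; in a real repo, just run the cat-file lines):

```shell
cd "$(mktemp -d)"
git init -q .
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m init

git cat-file -t HEAD            # type of the object HEAD points at: "commit"
git cat-file -p HEAD            # pretty-print it: tree, author, message
git cat-file -t 'HEAD^{tree}'   # the commit's tree object
```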

Nothing else I tried worked, but when I ran this it immediately worked. Thanks!
– Erik Oct 31 '15 at 18:21

Now git has a safety mechanism: it does not delete unreferenced objects right away when running 'git gc'.
By default, unreferenced objects are kept around for a period of 2 weeks. This makes it easy for you to recover accidentally deleted branches or commits, and avoids a race where a just-created object that is not yet referenced could be deleted by a 'git gc' process running in parallel.

So to give that grace period to packed but unreferenced objects, the repack process pushes those unreferenced objects out of the pack into their loose form so they can be aged and eventually pruned.
Objects becoming unreferenced are usually not that many though. Having 404855 unreferenced objects is quite a lot, and being sent those objects in the first place via a clone is stupid and a complete waste of network bandwidth.
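You can check whether you are in this situation with git count-objects (scratch-repo demo; in your own repo, just run the last line):

```shell
cd "$(mktemp -d)"
git init -q .
printf 'hello\n' > f.txt
git add f.txt
git -c user.name=demo -c user.email=demo@example.com commit -qm init

# "count" is the number of loose objects, "in-pack" the number of packed ones;
# a huge "count" is the symptom described above.
git count-objects -v
```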

Anyway... To solve your problem, you simply need to run 'git gc' with the --prune=now argument to disable that grace period and get rid of those unreferenced objects right away (safe only if no other git activity is taking place at the same time, which should be easy to ensure on a workstation).
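That immediate cleanup can be sketched like this (scratch-repo demo that deliberately orphans a commit; the reflog-expiry line is an extra precaution, since the reflog can still pin "deleted" commits, as discussed further down):

```shell
# Scratch repo: create two commits, orphan the second, then prune immediately.
cd "$(mktemp -d)"
git init -q .
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m one
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m two
orphan=$(git rev-parse HEAD)          # remember the commit we are about to orphan

git reset -q --hard HEAD~1            # commit "two" is now unreferenced...
git reflog expire --expire=now --all  # ...except by the reflog, so expire that too
git gc --prune=now                    # safe only if nothing else is using the repo
```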

And BTW, if you use 'git gc --aggressive' with a later git version (or 'git repack -a -f -d --window=250 --depth=250'), you may first want to run 'git config pack.deltaCacheSize 1'.

That limits the delta cache size to one byte (effectively disabling it) instead of the default of 0, which means unlimited. With that I'm able to repack that repository using the above git repack command on an x86-64 system with 4GB of RAM and 4 threads (this is a quad core). Resident memory usage grows to nearly 3.3GB though.
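A concrete sketch of that memory-bounded aggressive repack (scratch-repo demo; the pack.threads value is an assumption matching the quad-core mentioned above, and the large --window/--depth values are only worthwhile on big repositories):

```shell
cd "$(mktemp -d)"
git init -q .
printf 'hello\n' > f.txt
git add f.txt
git -c user.name=demo -c user.email=demo@example.com commit -qm init

git config pack.deltaCacheSize 1   # 1 byte: effectively disables the delta cache
git config pack.threads 4          # assumption: a quad-core machine
git repack -a -f -d --window=250 --depth=250
```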

If your machine is SMP and you don't have sufficient RAM, then you can reduce the number of threads to only one:

git config pack.threads 1

Additionally, you can further limit memory usage with the --window-memory argument to 'git repack'. For example, using --window-memory=128M should keep a reasonable upper bound on the delta search memory usage, although this can result in less optimal delta matches if the repo contains lots of large files.
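For example (scratch-repo demo; the 128m figure is just the value from the text, and the option accepts k/m/g suffixes):

```shell
cd "$(mktemp -d)"
git init -q .
printf 'hello\n' > f.txt
git add f.txt
git -c user.name=demo -c user.email=demo@example.com commit -qm init

# Cap the memory used by the delta-search window during repack.
git repack -a -d --window-memory=128m
```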

On the filter-branch front, you can consider (with caution) this script:

#!/bin/bash
set -o errexit
# Author: David Underhill
# Script to permanently delete files/folders from your git repository. To use
# it, cd to your repository's root and then run the script with a list of paths
# you want to delete, e.g., git-delete-history path1 path2
if [ $# -eq 0 ]; then
    echo "Usage: git-delete-history path1 [path2 ...]"
    exit 0
fi
# make sure we're at the root of the git repo
if [ ! -d .git ]; then
    echo "Error: must run this script from the root of a git repository"
    exit 1
fi
# remove all paths passed as arguments from the history of the repo
files="$@"
git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch $files" HEAD
# remove the temporary history git-filter-branch otherwise leaves behind for a
# long time, and expire the reflog now so the removed objects can actually be pruned
rm -rf .git/refs/original/ && git reflog expire --expire=now --all && git gc --aggressive --prune=now

Hi VonC - I'd tried git gc --prune=now with no luck. It really looks like a git bug, in that I wound up with unreferenced blobs locally following a branch deletion, but these aren't there in a fresh clone of the GitHub repo...so it's just a local repo problem. But I have additional files that I want to clear out, so the script you referenced above is great - thanks!
– kkrugler Dec 16 '09 at 17:01

Each time your HEAD moves, git tracks this in the reflog. If you removed commits, you still have "dangling commits" because they are still referenced by the reflog for ~30 days. This is the safety-net when you delete commits by accident.
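A quick way to see those reflog entries (scratch-repo demo that moves HEAD backwards so the reset shows up):

```shell
cd "$(mktemp -d)"
git init -q .
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m one
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m two
git reset -q --hard HEAD~1

# Every HEAD movement is recorded here; the "deleted" commit "two"
# is still reachable through these entries for ~30 days.
git reflog
```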

You can use the git reflog command to remove specific commits, repack, etc., or just use the high-level command:

Now that is a scary command :) I'll have to give it a try when my git-fu feels stronger.
– kkrugler Dec 15 '09 at 14:36

You can say that again. I'm always wary of any command that manipulates a repository's history. Things tend to go very wrong when multiple people are pushing and pulling from that repository and suddenly a bunch of objects git is expecting aren't there.
– Jonathan Dumaine Aug 12 '11 at 19:54

Before doing git filter-branch and git gc, you should review the tags that are present in your repo. Any real system with automatic tagging for things like continuous integration and deployments will leave unwanted objects referenced by those tags, so gc can't remove them and you will keep wondering why the repo is still so big.
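To audit which tags might be pinning objects, and to drop one, you can do something like this (scratch-repo demo; the tag name and the remote in the commented line are made up):

```shell
cd "$(mktemp -d)"
git init -q .
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m init
git tag build-1234                  # stand-in for a CI-generated tag

# List every tag and the object it pins.
git for-each-ref --format='%(refname:short) %(objectname)' refs/tags

git tag -d build-1234               # drop it locally...
# ...and on the remote (hypothetical remote/tag name):
# git push origin :refs/tags/build-1234
```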

The best way to get rid of all the unwanted stuff is to run git filter-branch and git gc, and then push master to a new bare repo. The new bare repo will have the cleaned-up tree.
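The push-to-a-fresh-bare-repo step can be sketched like this (scratch demo with made-up paths; in real use the source is your cleaned-up repository):

```shell
# Stand-in for the cleaned-up repo.
src=$(mktemp -d); bare=$(mktemp -d)/clean.git
cd "$src"
git init -q .
git symbolic-ref HEAD refs/heads/master   # ensure the branch is named master
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m cleaned

# Create a brand-new bare repo and push only master into it; the bare repo
# receives just the objects reachable from master, i.e. the cleaned-up tree.
git init -q --bare "$bare"
git push -q "$bare" master
```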