I've got a git repo of 300 MB. My currently checked-out files weigh 2 MB, and the git repo weighs 298 MB. This is basically a code-only repo that should not weigh more than a few MB.

Most likely, somebody at some point committed some heavy files by accident (video, huge images, etc.), and then removed them... but not from git, so the history still carries useless large files. How can I track down the large files in the git history? There are 400+ commits, so going through them one by one would be time-consuming.

This answer was really helpful, because it sent me to the post above. While the post's script worked, I found it painfully slow. So I rewrote it, and it's now significantly faster on large repositories. Have a look: gist.github.com/nk9/b150542ef72abc7974cb
– Nick K9, Jun 23 '14 at 19:46


Please include full instructions in your answers, not just offsite links. What do we do when stubbisms.wordpress.com inevitably goes down, eh?
– ThorSummoner, Sep 3 '14 at 19:44

@NickK9 Interestingly, I get different output from your script and the other one. There's a bunch of bigger objects that yours seems to miss. Is there something I'm missing?
– UpAndAdam, Jan 5 '16 at 17:54

🚀 A blazingly fast shell one-liner 🚀

This shell script displays all blob objects in the repository, sorted from smallest to largest.

For my sample repo, it ran about 100 times faster than the other solutions found here.
On my trusty Athlon II X4 system, it handles the Linux kernel repository, with its 5.6 million objects, in just over a minute.
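The one-liner itself isn't included in this excerpt. A pipeline along these lines, built only from standard git plumbing and GNU coreutils, produces that sorted-by-size blob listing (this is a reconstruction, not necessarily the original's exact flags):

```shell
# List every object reachable from any ref, resolve each to
# (type, hash, size, path), keep only blobs, and sort by size ascending.
git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  sed -n 's/^blob //p' |          # keep blob objects only
  sort --numeric-sort --key=2 |   # sort by size (field 2), smallest first
  cut -c 1-12,41-                 # abbreviate the 40-char hash to 12 chars
```

Each output line is an abbreviated object hash, the blob's size in bytes, and the path it was committed under, with the largest offenders at the bottom.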

@Sridhar-Sarnobat Well, properly removing files from a repo can be challenging. See if the official checklist helps you. Alternatively, check the other question linked in this question.
– raphinesse, Oct 7 '17 at 9:15

That one-liner only works if you want the single biggest file (i.e., use tail -1). Newlines get in the way for anything bigger. You can use sed to convert the newlines so grep will play nice: git rev-list --objects --all | grep -E `git verify-pack -v .git/objects/pack/*.idx | sort -k 3 -n | tail -10 | awk '{print$1}' | sed ':a;N;$!ba;s/\n/|/g'`
– Throctukes, Jun 4 '14 at 13:58

The problem with this is that you can't just see what the big files are without actually removing them. I don't feel comfortable doing this without first doing a dry run that simply lists the big files.
– Sridhar-Sarnobat, Oct 3 '17 at 21:11

This produces a different answer from @raphinesse's, missing a bunch of the largest files on my repository. Also, when one large file has many modifications, only the largest size is reported.
– kristianp, Jul 19 '17 at 23:29

I think you have a nice general answer in the BFG suggestion, but you spoil it by not giving any details and then by suggesting a different third-party service (also without any explanation). Can you clean this up to provide a command-line example of the BFG usage?
– phord, Jun 22 at 16:32
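For anyone who, like the commenter above, wants a concrete BFG example: the tool's documented workflow is a single jar run against a fresh mirror clone, followed by a repack (the repository URL and the 10M threshold below are placeholders, not values from this thread):

```shell
# BFG requires a fresh bare --mirror clone to operate on
git clone --mirror git://example.com/my-repo.git

# Strip every blob larger than 10 MiB from history
# (files in the latest commit are protected by default)
java -jar bfg.jar --strip-blobs-bigger-than 10M my-repo.git

# Expire reflogs and repack so the deleted objects actually go away
cd my-repo.git
git reflog expire --expire=now --all
git gc --prune=now --aggressive
```

After verifying the result, a `git push` from the mirror propagates the rewritten history, and every collaborator must re-clone.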

I stumbled across this for the same reason as everyone else, but the quoted scripts didn't quite work for me. I've made one that's a hybrid of those I've seen; it now lives here: https://gitlab.com/inorton/git-size-calc