Lukas "lzap" Zapletal: Git auto fetch script I run every day

Git auto fetch script I run every day

I am “shutdowner”, meaning I always shutdown my laptop (now workstation) at the end of the day. I have a script to do that which sleeps 5 seconds (so I can change my mind – e.g. when I dig shell history incorrectly and quickly hit enter – it really happened yeah) and it is simple:

puts my monitors into standby mode

applies all OS updates

runs duplicity backup on my home folder

fetches git repos

filesystem sync call

fstrim root volume

poweroff

I learned a trick I want to write about today from colleague of mine Mirek Suchý, but I think he runs it from cron (not a “shutdowner” guy). The idea is simple:

find all directories containing .git/ and run on all of them:

git fetch –all

git gc

So every time I do git pull on a repo that I don’t use much (e.g. ruby language), I don’t need to wait seconds in order to pull all commits. Clever, now I’ve improved it a bit.

With my Ryzen 1700 8 core 16 threads CPU, I am able to leverage GNU parallel to do this in parallel. That will be faster. But how much? Let’s test against git repo I use the most: www.theforeman.org.

Initially I thought that running 16 GNU parallel worker processes of parallel will be fine, but git gc is really slow on one core (see above), so I usually end up with several very slow garbage tasks while all the others finished downloading. The sweet spot for git is around 4 threads where it always gives reasonable times even for bigger repos.

But I think little bit of CPU overcommit won’t kill, therefore I’ve decided to go with 8×4 which might sound crazy (32 threads in theory), but in practice garbage collect is executed only on few repositories I work regularly on.