Slaying dragons with git, bash, and ruby

07 September 2010

An often overlooked feature of git is the set of hooks available to you. They cover pre-applypatch, post-update, and just about everything in between or beyond. I suspect a lot of people were first introduced to them when integrating with a Continuous Integration server, as a way of telling it to test a new build, but they work equally well as a hidden monkey saving you from showing the world some of your more embarrassing mistakes.

Getting started with git hooks

In your cloned git repository you're most likely aware of the .git/ directory. Inside it is another directory called hooks/, which, surprise surprise, is where your git hooks live. You'll probably find a bunch of existing hooks in there with a .sample extension to stop them being executed; it's worth taking a look at them to get an overview of the various hooks and what is possible.

To get a hook to fire you need a file with the appropriate name (remove the .sample extension from each hook you want to run), and it needs execute permissions:

$ chmod +x .git/hooks/hook-name-here
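If you'd rather script that step, here is a minimal Ruby sketch that does the same rename and chmod. The method name enable_sample_hook is hypothetical, and it assumes the .sample file exists in the hooks directory you pass in.

```ruby
# Enable one of git's bundled sample hooks (hypothetical helper):
# rename it to drop the .sample extension, then add execute permission.
def enable_sample_hook(hooks_dir, name)
  sample = File.join(hooks_dir, "#{name}.sample")
  hook   = File.join(hooks_dir, name)

  File.rename(sample, hook)                 # pre-commit.sample -> pre-commit
  mode = File.stat(hook).mode & 0o7777      # keep only the permission bits
  File.chmod(mode | 0o111, hook)            # add x for user, group and other
  hook
end

# Usage, from the root of a cloned repository:
#   enable_sample_hook(".git/hooks", "pre-commit")
```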

Joining forces for real ultimate power

Coding in Ruby most of the day makes it the quickest language for me to throw a script together in. Thankfully you can write your hooks in Ruby, or just about any language really; just change the shebang line accordingly:

#!/usr/bin/env ruby

However, lots of things are much easier to do from the command line than in a Ruby script, so we'll stand on the shoulders of giants: use the underlying *nix tools for what they're best at, and use Ruby to keep things reusable and readable.
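As a rough sketch of that split (not a definitive structure), a Ruby hook can hand shell pipelines to /bin/sh with back ticks and exit with a non-zero status to make git abort the commit. The helper names run and abort_commit are hypothetical.

```ruby
#!/usr/bin/env ruby
# Skeleton for a git hook written in Ruby (sketch).

# Run a command and return its STDOUT as a string. When the string
# contains shell syntax such as pipes, Ruby runs it through /bin/sh,
# so the usual *nix tools can be chained together.
def run(command)
  `#{command}`
end

# Report a problem on STDERR and exit non-zero; for a pre-commit
# hook, a non-zero exit status makes git abort the commit.
def abort_commit(message)
  warn message
  exit 1
end

# Example check (hypothetical): refuse the commit if a pipeline finds anything.
#   abort_commit("Found a TODO in lib/") unless run("grep -rl TODO lib | head -n 1").empty?
```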

Catching out bad habits

One thing I've been guilty of in the past is hastily fixing a bug and accidentally leaving a debug breakpoint in the committed code. If that ever made it onto a production system, it would leave the app hanging and unresponsive, and even on other developers' machines it causes enough confusion. So, to make me look much more reliable than I really am, enter the git pre-commit hook.

Now whenever I try to commit, the hook first runs a recursive grep over the codebase to make sure I haven't left my debug statement in (I can be sure it always looks like "require 'ruby-debug'; debugger" because I have it bound to a shortcut).
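The original hook isn't reproduced here, so what follows is a minimal sketch of the same idea. It assumes the statement is always exactly "require 'ruby-debug'; debugger", as described above, and that the hook runs from the root of the working tree, which is where git starts it. Like a plain recursive grep, it checks the working tree rather than only the staged changes. The method name files_with_debugger is hypothetical, and the basename check at the bottom only lets the check run when the file is invoked as the actual pre-commit hook.

```ruby
#!/usr/bin/env ruby
# .git/hooks/pre-commit (sketch): refuse to commit a leftover debugger call.
require "shellwords"

DEBUG_STATEMENT = "require 'ruby-debug'; debugger"

# Files under +dir+ that contain the debug statement. grep flags:
# -r recurse, -l list file names only, -F match a fixed string (not a regex).
# .git is skipped so this hook file, which contains the string, isn't flagged.
def files_with_debugger(dir = ".")
  `grep -rlF --exclude-dir=.git #{Shellwords.escape(DEBUG_STATEMENT)} #{Shellwords.escape(dir)}`.split("\n")
end

# Only run when invoked as the real hook, so the method above can be
# loaded and tested on its own.
if File.basename($PROGRAM_NAME) == "pre-commit"
  offenders = files_with_debugger
  unless offenders.empty?
    warn "Debug statement found in:\n  #{offenders.join("\n  ")}"
    exit 1
  end
end
```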

Stopping an incomplete merge

There have been occasions where a particularly large rebase or merge created a lot of conflicts in a file, and one of them snuck through: rather than being resolved, the inline conflict markers were actually committed. Time to add another check to pre-commit, using egrep to scan recursively for the three different line markers that git uses to indicate a merge conflict.
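A sketch of that check in Ruby, assuming the markers sit at the start of a line as git writes them (seven `<`, `=` or `>` characters). It uses `grep -E`, which is equivalent to egrep; the constant and method names are hypothetical.

```ruby
require "shellwords"

# Lines git writes around a conflict: "<<<<<<< HEAD", "=======", ">>>>>>> topic".
CONFLICT_PATTERN = "^(<<<<<<<|=======|>>>>>>>)"

# Files under +dir+ with a conflict marker at the start of a line.
# -r recurse, -l list file names only, -E extended regex (like egrep).
def files_with_conflict_markers(dir = ".")
  `grep -rlE --exclude-dir=.git #{Shellwords.escape(CONFLICT_PATTERN)} #{Shellwords.escape(dir)}`.split("\n")
end
```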

If you try this, though, you'll probably discover that it doesn't quite work as expected, because some binary files happen to include these characters. More shell scripting to the rescue, then: we'll pipe the results into a couple of other commands to filter them out. First the list goes via xargs, which takes the file names from STDIN and passes them as arguments to file to find out what type of file we are dealing with. We then pipe that into egrep again to select only the script and text files.
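Those two stages might be sketched in Ruby like this. describe_files feeds the file names to `xargs file` on STDIN, and text_file_lines mirrors `egrep "script|text"`. Both names are hypothetical, and the exact wording of file's descriptions varies between systems, so check what yours prints.

```ruby
# Run `file` over a list of paths by feeding the names to xargs on
# STDIN, like `... | xargs file`. Returns one line per path, e.g.
# "app/models/user.rb: Ruby script, ASCII text". Assumes the names
# contain no spaces or quotes, since xargs splits on whitespace.
def describe_files(paths)
  return [] if paths.empty?
  IO.popen(["xargs", "file"], "r+") do |io|
    io.puts(paths)      # one file name per line
    io.close_write
    io.read.split("\n")
  end
end

# Equivalent of `egrep "script|text"`: keep only the lines that
# mention "script" or "text", dropping binaries such as "data".
def text_file_lines(file_output)
  file_output.select { |line| line =~ /script|text/ }
end
```

Because the pattern is matched against the whole line, as egrep does, a binary file with "text" or "script" in its name would still slip through.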

It would be nice at this point to know exactly which files are affected, without having to commit the above series of commands to memory, so we can output the list again, this time passing the result into awk to strip out just the filename.
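In the shell, `awk -F: '{ print $1 }'` is one way to keep only the part before the colon that file prints after each name. A Ruby sketch of the same step, plus a hypothetical helper that formats the hook's error message:

```ruby
# Keep just the file name from each "name: description" line, roughly
# like `awk -F: '{ print $1 }'`. file may pad the description with
# spaces after the colon, so split on the first colon plus whitespace.
def file_names(file_lines)
  file_lines.map { |line| line.split(/:\s+/, 2).first }
end

# Message to print before aborting the commit (hypothetical wording).
def conflict_report(file_lines)
  "Merge conflict markers found in:\n" +
    file_names(file_lines).map { |name| "  #{name}\n" }.join
end
```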

Helping your workflow

I'm a big fan of committing regularly in manageable amounts, but I want to ensure each commit is self-contained and has all the tests passing. I don't want to be in a state where I revert a commit and end up with a broken app. However, there are times when I'll be spiking something or refactoring a class and I'd like a temporary save point in case I make a mess of things and want to step back. To do that, I typically commit with a message like "WiP: Got Foo working, about to fix Bar." with the intention of coming back when it's complete and amending that commit to include the additional changes and a more meaningful message. Sometimes I forget to use --amend, though, and things don't go to plan. That's another one that is easy to avoid.

You might need to do a little tweaking on that one depending on your setup, so I'll break it down in the order the commands are executed to help you adapt it to your needs. First, I use git-config to return the email address of the current user:

$ git config --get-all user.email

I then pipe that into sed to return just the bit before the @ sign:

$ sed 's/@.*//'

That all runs in a sub-shell (I've backslash-escaped the back tick characters at each end of the command: "git config --get-all user.email | sed 's/@.*//'"). The result is passed into git-log to return the last 5 commits by that author:

$ git log --oneline --author=username_here -n 5

And finally, grep is called on the result to make sure none of those commits contains the string "wip", ignoring case:

$ grep -i wip
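Put back together as a Ruby pre-commit hook, the steps above might look like the sketch below. It uses --get rather than --get-all so that only one email address comes back, escapes the author name before handing it to git log, and uses hypothetical method names. Like grep -i wip, it matches "wip" anywhere in a commit's one-line summary, so a message such as "Fix wiping of cache" would also be caught.

```ruby
#!/usr/bin/env ruby
# .git/hooks/pre-commit (sketch): stop new commits while one of my
# last 5 commits is still marked "WiP".
require "shellwords"

# "jane.doe@example.com\n" -> "jane.doe", like sed 's/@.*//'.
def author_name(email)
  email.strip.sub(/@.*/, "")
end

# Lines of `git log --oneline` output mentioning "wip" in any case,
# like grep -i wip.
def wip_commits(log_output)
  log_output.lines.map(&:chomp).select { |line| line =~ /wip/i }
end

# Only run when invoked as the real hook.
if File.basename($PROGRAM_NAME) == "pre-commit"
  author  = author_name(`git config --get user.email`)
  log     = `git log --oneline --author=#{Shellwords.escape(author)} -n 5`
  pending = wip_commits(log)
  unless pending.empty?
    warn "Unfinished WiP commits (did you mean git commit --amend?):"
    pending.each { |line| warn "  #{line}" }
    exit 1
  end
end
```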

Ensuring you don't break the build

The hook that kicked it all off for me was one to make sure I didn't break the build, mostly as an attempt to claim moral superiority over anyone else found guilty of doing it themselves. Little did they know I had a secret weapon to protect my perfect performance ;)
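The build hook isn't reproduced here either, but at its simplest it runs the test suite and aborts the commit if that fails. A minimal sketch, assuming the suite runs with `rake` (TEST_COMMAND is a placeholder for whatever your project uses):

```ruby
#!/usr/bin/env ruby
# .git/hooks/pre-commit (sketch): refuse to commit when the tests fail.

# Assumption: swap in the command that runs your own test suite.
TEST_COMMAND = "rake"

# system returns true only when the command exits with status 0
# (false on failure, nil if it couldn't be run at all).
def tests_pass?(command = TEST_COMMAND)
  system(command) == true
end

# Only run when invoked as the real hook.
if File.basename($PROGRAM_NAME) == "pre-commit"
  unless tests_pass?
    warn "Tests failed; commit aborted (git commit --no-verify skips this hook)."
    exit 1
  end
end
```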

Making it more self-aware

This approach worked great for a couple of days, but I quickly got frustrated because I had to add the --no-verify option to commits quite regularly. I really only wanted to run all the tests when I was committing on master, before pushing changes upstream to everyone else. The other problem was that my "WiP" workflow meant I'd have to use --no-verify whenever I was amending a commit, and it struck me that the script should be intelligent enough to know I was trying to do the right thing.

Detecting master

Determining whether the current branch was master was relatively straightforward:

`git symbolic-ref HEAD | grep master` != ""

So just use that as the condition on any if statements you only want executed when you're on the master branch.
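One caveat with that one-liner: grep master also matches branch names such as master-fixes or remaster. A slightly stricter sketch compares the full branch name instead (the method names are hypothetical):

```ruby
# Branch name from `git symbolic-ref HEAD` output,
# e.g. "refs/heads/master\n" -> "master".
def current_branch(symbolic_ref)
  symbolic_ref.strip.sub(%r{\Arefs/heads/}, "")
end

# True only for exactly "master". On a detached HEAD, symbolic-ref
# fails and prints nothing to STDOUT, so this returns false.
def on_master?(symbolic_ref = `git symbolic-ref HEAD 2>/dev/null`)
  current_branch(symbolic_ref) == "master"
end
```

The master-only checks can then sit inside an `if on_master?` block.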

Detecting commit amend

Working out whether you are amending a commit is a little trickier. The options given to git-commit aren't passed through to your hook, so it takes a bit of process hackery in both Ruby and bash to find out if --amend was used. First we use Ruby's built-in $$ variable to get the process ID of the Ruby process, and use it with ps and grep to return all matching processes:

`ps -f | grep #{$$}`

We then pass that into awk to extract the third column, the parent process ID (PPID), assuming that the first matching line is our own Ruby process, so its PPID belongs to the git-commit process that launched the hook:

`ps -f | grep #{$$} | awk '{print $3}' | head -n 1`

Now that we have the process ID of the parent, it's back to ps and grep: we use the ID to return the full command and options that were passed to git-commit, then grep again to see if --amend was among them.
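Here is a sketch of the whole check. It swaps the ps/grep/awk step for Ruby's built-in Process.ppid, which returns the parent's process ID directly, then asks ps for that process's command line with `ps -o args= -p PID`. It assumes git launches the hook as a direct child of git commit, which is worth verifying on your setup; the method names are hypothetical.

```ruby
# Does a command line contain --amend as a separate argument?
def amend_flag?(command_line)
  command_line.split.include?("--amend")
end

# Command line of this process's parent (git commit, when running as
# a hook). `ps -o args=` prints just the command and its arguments;
# the trailing "=" suppresses the header line.
def parent_command_line
  `ps -o args= -p #{Process.ppid}`.strip
end

def amending?
  amend_flag?(parent_command_line)
end

# Usage inside a hook (run_wip_check is hypothetical):
#   run_wip_check unless amending?
```

Since ps doesn't preserve the original quoting, an --amend that merely appears as a word inside a -m message would still count.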

Wrapping it all up

All that would create a mess of if statements and duplication throughout your pre-commit hook, and any other hook you want to apply this logic to, so I've bundled it all up in a reusable class that I include in every project. I'll keep updating it as my needs develop; feel free to fork it and add features for other languages and frameworks.
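That class isn't reproduced here. Purely as an illustration of the idea, a hypothetical HookChecks class might register named checks, skip branch-specific ones on other branches, and report every failure at once before aborting the commit:

```ruby
# Hypothetical helper class: register checks once, then run them from
# any hook. Each check's block returns an error message, or nil when
# everything is fine.
class HookChecks
  Check = Struct.new(:name, :branch, :block)

  # branch: the current branch name, e.g. "master".
  def initialize(branch:)
    @branch = branch
    @checks = []
  end

  # Register a check; pass branch: "master" to run it only on that branch.
  def check(name, branch: nil, &block)
    @checks << Check.new(name, branch, block)
    self
  end

  # Run every applicable check and collect "name: message" strings.
  def failures
    applicable = @checks.select { |c| c.branch.nil? || c.branch == @branch }
    applicable.map do |c|
      message = c.block.call
      "#{c.name}: #{message}" if message
    end.compact
  end

  # Print all failures and exit 1, which makes git abort the commit.
  def run!
    errors = failures
    return if errors.empty?
    errors.each { |e| warn e }
    exit 1
  end
end

# Usage in .git/hooks/pre-commit (hypothetical):
#   branch = `git symbolic-ref HEAD`.strip.sub("refs/heads/", "")
#   checks = HookChecks.new(branch: branch)
#   checks.check("tests", branch: "master") { "test suite failed" unless system("rake") }
#   checks.run!
```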
