Blocking large file commits in git

21 Jun 2013

In my previous blog post on Locating large objects in a git
repository, I covered some of the git utilities that can be used to
identify large files in your repository. In this post I’d like to go over how
to create a server-side hook that will block large files before they get into
your repo. Some of the same utilities from the post referenced above will be
used in creating this hook.

What are server-side hooks?

Git hooks allow you to perform actions at different points in the
git push workflow. Server-side hooks are executed at certain specific phases of
the push process:

pre-receive

Fires before any refs are updated on the server, and is run once. This hook
receives a list of changes on stdin in the format of: old-ref new-ref
refname, one line for each ref to be updated. Both stderr and stdout are
forwarded back to the user for messaging. A non-zero exit will prevent any
refs from being updated.

update

Fires before the updating the refs on the server, and is executed once for
each ref that is being updated. It takes 3 arguments: ref-name old-ref
new-ref, and will pass stderr and stdout back to the client. A non-zero exit
status will only prevent the effected ref from being updated on the server.

post-receive

Fires once after all refs have been updated, and gets the same input on
stdin as the pre-receive hook. As with the hooks before it, stdout and
stderr are passed back to the client for messaging. Generally this type of
hook is only used for notification and/or messaging to clients about the push
that just occured.

post-update

As with the post-receive hook, this fires once after the refs have been
updated, and takes a variable number of arguments. Each parameter that is
passed in, is the name of a ref that was successfully updated. Similarly to
post-receive this hook is also used primarily for notification, as it cannot
effect the status of the push.

On the git server these hooks live inside the hooks directory of the bare
repository. They must to be named to match the type of hook they are.

Which hook type should I use?

Since we’re looking to block an actual commit from happening, we would want to
use either a pre-receive or update hook. The distinction here is that the
pre-receive will block all updates on a non-zero exit, whereas the
update will only block the specific update for the ref which returns a
non-zero exit. For this example I will choose to use pre-receive because I
want to block the entire push if we find an object that is over the file size
limit.

How does it work?

The basic flow of this hook is as follows:

Read stdin line by line for push information (old-ref new-ref refname)

For each line you receive, obtain a list of files that have changed.

Inspect each file’s object and get its size in bytes.

Compare that size to a limit that you have set.

Provide useful feedback to the user about any exceptions that are rasied.

Exit accordingly

We also need to be aware of a couple of exceptional cases: branch creation and
branch deletion. On a branch creation the old-ref will be a null SHA1
(0000000000000000000000000000000000000000). Similarly on a branch deletion
the new-ref will be a null SHA1. For the purposes of this hook, there should
be nothing happening in a branch deletion that is of any importance to us, so we
will want to skip looking at those types of operations.

For this example I will use a bash script which calls out to the git utilities.
I am choosing to do it this way so that we can discuss how each step works, and
what is happening. Similar hooks could be written using modules designed to
interact with git, such as GitPython or Grit.

What does the script look like?

Breakdown

Let’s explore the script a little bit, and go through the operations happening
to make all this work. The first git operation, which is at line 21 is
performing a git-diff with several options.

--stat is making diff display a diffstat

--name-only makes the output only contain the name of the files changed

--diff-filter=ACMRT limits the output of the diff to only contain additive
operations.

${oldref}..${newref} generates the diff based on changes to be applied to
oldref by newref.

The next git operation is on line 23, where git cat-file is used
to obtain the object size of a file in a commit. When we encounter a file that
is above our defined MAXSIZE (line 27), the script outputs some actionable
information to the user. When the filesize condition is met, the EXIT variable
is set to a non-zero value, and we exit with that at the end of execution.

Conclusion

As you can see, it’s trivial to put together a simple bash script that can
enforce file size restrictions on pushes. Alternatively, it would be a
relatively simple modification to allow this script to operate on total commit
size. Knowing the core concepts required to perform a task like this makes
it easier to port it to using a git module in a different scripting language.