Thursday, January 19, 2017

git contribution spans (emacs case study)

Let's play a little, just for the fun of it, and to gather some metainformation about a codebase.

Sometimes I'd like to find out the commit span of each author. I may not care about the number of commits, only about the first and the last one. This tells you whether a repo is mostly maintained by long-time committers, or whether the majority of contributors are one-off or one-feature-and-forget.

First, let's sharpen our Unix chainsaw a bit:

Here's a little helper that has proven very useful. It's similar to uniq, but you don't need to sort the input first, and it takes a parameter: the number of the column whose values must be unique. Hence the name, uc (Unique by Column).

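# uc N: keep only the first line seen for each distinct value of column N (default: 1)
function uc () {
  awk -v col="${1:-1}" '!seen[$col]++'
}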
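To get a feel for what uc does, here is a small sketch on made-up data. The sample function and its contents are only for illustration; the helper is repeated (with the column passed through awk -v) so the snippet can be pasted on its own.

```shell
# The helper again, so this snippet is self-contained.
uc () {
  awk -v col="${1:-1}" '!seen[$col]++'
}

# Hypothetical sample data: a name in column 1, a year in column 2.
sample () {
  printf '%s\n' 'alice 2001' 'bob 2001' 'alice 2005' 'carol 2005'
}

sample | uc 1   # unique by name: keeps "alice 2001", "bob 2001", "carol 2005"
sample | uc 2   # unique by year: keeps "alice 2001", "alice 2005"
```

Note that the first line seen for each value wins, which is exactly why the order of the input matters in the next step.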

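Then you only have to look at man git-log to find out that you can reverse the order of the log (git log --reverse). Since uc keeps the first line it sees for each value, running it over the log in both orders gives you the first and the last appearance of each committer.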
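The idea can be sketched roughly like this. It is not necessarily the exact pipeline used for the emacs run: the function names and the format string are my assumptions here. The author email (%ae) is used as the key because names contain spaces, which would spill over into several columns.

```shell
# Unique by column, repeated so the sketch stands alone.
uc () {
  awk -v col="${1:-1}" '!seen[$col]++'
}

# Hypothetical wrappers; run them inside a git repository.
# %ae is the author email, %ad the author date (shortened by --date=short).
first_commits () {
  # Oldest first, so the line kept for each email is that author's first commit.
  git log --reverse --date=short --format='%ae %ad' | uc 1
}

last_commits () {
  # Default order is newest first, so the line kept is that author's latest commit.
  git log --date=short --format='%ae %ad' | uc 1
}
```

Calling first_commits and last_commits inside a clone prints one "email date" line per author. Someone who committed under several email addresses will show up once per address.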

Here is the output file. It has some spurious parsing errors, but I think it still shows how much insight we can get with just a bit of bash, man and pipelines.

Just by chance, the day after I tried this, I discovered git-quick-stats, a utility for getting simple stats out of a git repo. It's great that I could add the same functionality there too via a Pull Request :).

Big thanks to every one of the emacs committers, whether their span is long or short, and whatever the amount of code they contributed. Thanks to y'all.