Tuesday, September 28, 2010

Stupid Git Tricks for PostgreSQL

Even before PostgreSQL switched to git, we had a git mirror of our old CVS repository. So I suppose I could have hacked up these scripts at any time. But I didn't get around to it until we actually made the switch. Here's the first one. It's a one-liner. For some definition of "one line".
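A minimal Python sketch of the same idea (not the original one-liner): run `git log --numstat` over a revision range, add up lines added plus lines deleted for each commit, and keep the top ten. The `>>>` header marker, the function names, and the assumption that the mainline branch is called `master` are illustrative choices, not taken from the original script.

```python
import subprocess
from collections import namedtuple

# Hypothetical header marker. --numstat lines always start with a digit or "-",
# so a line beginning with this prefix can only be a commit header.
HEADER = ">>>"
# %h = abbreviated hash, %x09 = a tab, %s = subject line.
LOG_FORMAT = HEADER + "%h%x09%s"

Commit = namedtuple("Commit", "sha subject lines")


def git_numstat(rev_range):
    """Run `git log --numstat` over rev_range and return the raw text output."""
    return subprocess.run(
        ["git", "log", "--numstat", "--format=" + LOG_FORMAT, rev_range],
        check=True, capture_output=True, text=True,
    ).stdout


def biggest_commits(numstat_text, n=10):
    """Return the n commits that touch the most lines (added + deleted)."""
    commits = []
    for line in numstat_text.splitlines():
        if line.startswith(HEADER):
            sha, _, subject = line[len(HEADER):].partition("\t")
            commits.append(Commit(sha, subject, 0))
        elif line.strip() and commits:
            added, deleted, _path = line.split("\t", 2)
            # Binary files are reported as "-\t-\tpath"; count them as 0 lines.
            touched = sum(int(x) for x in (added, deleted) if x != "-")
            commits[-1] = commits[-1]._replace(lines=commits[-1].lines + touched)
    # Largest first, matching the listing below.
    return sorted(commits, key=lambda c: c.lines, reverse=True)[:n]
```

From inside a checkout, something like `biggest_commits(git_numstat(base + "..master"))` would give the list, where `base` is the output of `git merge-base master REL9_0_STABLE` (or `origin/REL9_0_STABLE` if the branch exists only as a remote-tracking branch).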

This will show you the ten "biggest" commits since the REL9_0_STABLE branch was created, according to the number of lines of code touched. Of course, this isn't a great proxy for significance, as the output shows. Heavily abbreviated, largest first:

2746e5f21d4dc Introduce latches. A latch is a boolean variable, with the capability to wait until it is set (Heikki Linnakangas)

Of course, some of these are not-very-interesting commits that happen to touch a lot of lines of code, but a number of them represented significant refactoring work that can be expected to lead to good things down the line. In particular, latches are intended to reduce replication latency and eventually facilitate synchronous replication; and Tom's PARAM_EXEC refactoring is one step towards support for the SQL construct LATERAL().

This one shows you the total number of lines of code committed to 9.1devel, summed by committer. It has the same problem as the previous script: sometimes you change a lot of lines of code without actually doing anything terribly important. It has a further problem, too: it takes into account only the committer, rather than other important roles such as reporters, authors, and reviewers. Unfortunately, that information can't easily be extracted from the commit logs in a structured way. I would like to see us address that defect in the future, but we're going to need something more clever than git's Author field. Most non-trivial patches, in the form in which they are eventually committed, are the work of more than one person, and, at least IMO, crediting only the main author (if there even is one) would be misleading and unfair in many cases.
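A minimal Python sketch of the per-committer total, not the original script. It asks `git log` for the committer name (`%cn`) in each commit header and sums the `--numstat` counts under it, so it shares exactly the limitation described above: only the committer gets credit. The `>>>` marker and function names are hypothetical; "9.1devel" is assumed to mean the range from the REL9_0_STABLE branch point to `master`.

```python
import subprocess
from collections import Counter

# Hypothetical header marker; --numstat lines always start with a digit or "-".
HEADER = ">>>"


def lines_by_committer(numstat_text):
    """Sum lines added + deleted per committer from `git log --numstat`
    output produced with --format=">>>%cn" (%cn is the committer name)."""
    totals = Counter()
    committer = None
    for line in numstat_text.splitlines():
        if line.startswith(HEADER):
            committer = line[len(HEADER):]
            totals[committer] += 0  # keep committers whose commits touch no text lines
        elif line.strip() and committer is not None:
            added, deleted, _path = line.split("\t", 2)
            # Binary files show "-" instead of counts; treat them as 0.
            totals[committer] += sum(int(x) for x in (added, deleted) if x != "-")
    return totals


def committer_report(rev_range):
    """Return (committer, lines) pairs for rev_range, largest first."""
    out = subprocess.run(
        ["git", "log", "--numstat", "--format=" + HEADER + "%cn", rev_range],
        check=True, capture_output=True, text=True,
    ).stdout
    return lines_by_committer(out).most_common()
```

Swapping `%cn` for `%an` would credit git's Author field instead, which, as noted above, still records only one person per commit.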

I think the most interesting tidbit I learned from playing around with this stuff is that git merge-base can be used to find the branch point for a release. That's definitely handy.
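On the command line that is just `git merge-base master REL9_0_STABLE`, which prints the most recent commit the two branches share. A small Python wrapper, assuming the mainline branch is `master` and that the release branch has never been merged back into it (so the common ancestor really is the branch point), might look like this; the function names are hypothetical, and the `run` parameter exists only so the logic can be exercised without a repository.

```python
import subprocess


def branch_point(branch, mainline="master", run=subprocess.run):
    """Return the commit where `branch` forked from `mainline`, via git merge-base."""
    result = run(
        ["git", "merge-base", mainline, branch],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def since_branch_point(branch, mainline="master", run=subprocess.run):
    """Revision range covering mainline commits made after `branch` was created."""
    return branch_point(branch, mainline, run) + ".." + mainline
```

For example, `since_branch_point("REL9_0_STABLE")` yields a range like `<sha>..master`, suitable for passing to `git log`.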