Announcing reposloc

For some time now, I’ve nurtured a small hack I call sloc, which serves basically the same purpose as the venerable ohcount and David Wheeler’s sloccount, but is far simpler (about 300 lines) and much faster (over 10 times as fast as ohcount). For those not familiar with those tools, it counts source lines of code (non-blank, non-comment) in various languages. This doesn’t require fully parsing the language, of course, which is what makes it possible to do so quickly.
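To make "non-blank, non-comment" concrete, here is a minimal Python sketch of the idea. It is not sloc's actual code: the LINE_COMMENT table and count_sloc function are names made up for illustration, and it only recognises whole-line comments, so block comments and comment markers inside string literals are not handled.

```python
# Hypothetical table of line-comment prefixes, for illustration only.
LINE_COMMENT = {"python": "#", "perl": "#", "c": "//"}


def count_sloc(text, language):
    """Count lines that are neither blank nor whole-line comments."""
    prefix = LINE_COMMENT[language]
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        # A line with code followed by a trailing comment still counts.
        if stripped and not stripped.startswith(prefix):
            count += 1
    return count
```

Because each line is only stripped and checked against a prefix, no parser is involved, which is where the speed of this kind of counting comes from.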

The logical next step was reposloc, which is essentially just a quick Perl script that runs sloc at every point in the history of a repository to produce a graph. This is something ohloh does, but reposloc doesn’t require you to give a third party access to your repository, and it’s much, much faster thanks to the speed of sloc itself. reposloc is kept alongside sloc in the same repository, since it can’t be used separately.
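The general shape of "count at every point in history" can be sketched in Python for a git repository. This is an assumption-laden illustration, not how reposloc itself works: parse_log, git and history are hypothetical helpers, the counter is supplied by the caller, and spawning one git process per file is deliberately naive and far slower than a real tool would tolerate. Paths with unusual characters and submodules are not handled.

```python
import subprocess


def parse_log(output):
    """Turn 'hash timestamp' lines into (timestamp, hash) pairs, oldest first."""
    commits = []
    for line in output.splitlines():
        if line.strip():
            commit_hash, timestamp = line.split()
            commits.append((int(timestamp), commit_hash))
    return sorted(commits)


def git(repo, *args):
    """Run a git command inside repo and return its standard output."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        check=True, capture_output=True,
        encoding="utf-8", errors="replace",
    )
    return result.stdout


def history(repo, count_lines):
    """Yield (timestamp, total) per commit; count_lines is caller-supplied."""
    for timestamp, commit_hash in parse_log(git(repo, "log", "--format=%H %ct")):
        total = 0
        paths = git(repo, "ls-tree", "-r", "--name-only", commit_hash)
        for path in paths.splitlines():
            # Read the file as it existed at that commit, without a checkout.
            total += count_lines(git(repo, "show", f"{commit_hash}:{path}"))
        yield timestamp, total
```

The resulting (timestamp, total) pairs are exactly the points a graph of lines over time would plot.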

At the moment only mercurial and git repositories are supported, although adding support for more (as long as they’re distributed) is pretty much trivial. Adding support for non-distributed repositories is best done by first importing them to git (the fastest of the supported VCSs, at least when used by reposloc) and then running the analysis from there.

reposloc can generate graphs by language, by code vs comment, or just graphs of the total number of source lines of code over time. It can also generate graphs for multiple repositories at a time, treating them all as one big repository with shared history, even if the repositories are using different VCSs.
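One way to picture several repositories being treated as one is to merge their per-repository timelines into a single running total. The Python sketch below assumes each repository has already been reduced to a list of (timestamp, sloc) points; the combine function and that data shape are made up for illustration and are not taken from reposloc.

```python
def combine(series_by_repo):
    """Merge per-repository (timestamp, sloc) series into one total series."""
    events = sorted(
        (timestamp, name, sloc)
        for name, series in series_by_repo.items()
        for timestamp, sloc in series
    )
    latest = {}    # most recent count seen for each repository
    combined = []  # (timestamp, total across all repositories)
    for timestamp, name, sloc in events:
        latest[name] = sloc
        total = sum(latest.values())
        if combined and combined[-1][0] == timestamp:
            # Several repositories changed at the same instant: keep one point.
            combined[-1] = (timestamp, total)
        else:
            combined.append((timestamp, total))
    return combined
```

Since the merge only looks at timestamps and counts, it does not matter which VCS each series came from.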

The biggest outstanding bug is the handling of forks and other forms of shared patches. There’s no good way to detect and manage these, especially when multiple VCSs are involved, so the problem remains unsolved.

An Arch Linux package is in the AUR (sloc-git); others can install from the git repository like so: