Version String Management in Python: Introducing python-versioneer

What’s a good way to manage version numbers in a Python project? I don’t mean:

where it should be stored, so that other code can find it. PEP 8 tells us to use __version__, and distutils tells us to call setup() with a version= argument. The embedded string is particularly useful for reporting or recording a version in bug reports.

what format it should take: PEP 386 describes a format (N.N[.N]+[{a|b|c|rc}N[.N]+][.postN][.devN]) that enables comparison, so packaging tools can evaluate things like “dependency > 1.2.0”. (I happen to find this format really limiting, and this tool doesn’t necessarily produce PEP 386-compliant strings, but that’s not what this post is about.)
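To make the PEP 386 format concrete, here is a minimal sketch of a checker for the grammar quoted above. It is a hand-written regex, not the reference implementation from the PEP. It reads “[.N]+” as “any number of extra .N segments”, and the name is_pep386 is hypothetical.

```python
import re

# Hand-written reading of the PEP 386 grammar
#   N.N[.N]+[{a|b|c|rc}N[.N]+][.postN][.devN]
# (a sketch, not the reference implementation from the PEP).
PEP386_RE = re.compile(
    r"^\d+\.\d+(?:\.\d+)*"             # N.N, then any number of extra .N segments
    r"(?:(?:a|b|c|rc)\d+(?:\.\d+)*)?"  # optional pre-release tag: a1, b2, c1, rc2, rc1.1
    r"(?:\.post\d+)?"                  # optional .postN
    r"(?:\.dev\d+)?$"                  # optional .devN
)


def is_pep386(version: str) -> bool:
    """Return True if `version` fits the grammar above."""
    return PEP386_RE.match(version) is not None
```

Under this sketch, “1.3rc2” and “1.4.post8.dev0” pass, while a git-describe string like “1.4-8-gf7283c2” or the informal “1.2+” does not.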

What I do mean is:

how does the right version string get into the code?

what does a release manager need to do when it’s time to make a new release?

The traditional approach, ages old, is to have a static string embedded in the code. Each time you’re about to make a new release, you make a commit which updates this string. It’s nice and simple, but has some problems:

Confusing intermediate versions: users/developers who work from revision control between releases will get a stale version, causing confusing bug reports (“it says 1.2! Oh, but I pulled from SVN last Tuesday, maybe it’s closer to 1.3”). I got into the habit of committing a “1.2+” version just after making the 1.2 release, which fixes the confusion, but doesn’t remove the ambiguity (“it says 1.2+... when did you last update?”).

Post-RC code changes: projects large enough to have release candidates must accept (small) code changes between the last release candidate and the final release, to update the embedded version string one last time. This makes QA nervous.

Merge conflicts: embedded version strings work fine for linear histories, but modern DVCS systems make it easy to have extended release branches, so larger projects usually have parallel lines of development (dev, stabilization, release). It’s frequently a good idea to merge your release branch back into your main trunk, to make sure you don’t lose any of the fixes made during the release effort. But then you get merge conflicts between the version string on trunk (which now says something like 1.3-dev or 1.3+) and the older version on the release branch (maybe 1.2), every single release. Resolving this correctly each time (generally in favor of the trunk version) is a source of errors.

It adds an extra step to the release process (two if you use the “1.2+” approach), discouraging developers from releasing early and often.

Over the years, I’ve built a couple of ad-hoc mechanisms for updating the embedded version string just before each commit, or for adding an “update version” command/script that checks with the version-control tool for information. Tahoe uses setup.cfg to prepend an update-versions command in front of any setup.py command that might care. And I’ve had setup.py code which greps a _version.py for the current string (rather than importing it, to avoid committing to an unusably old dependency, and also for speed). But none of these have been very satisfying.
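The grep-instead-of-import trick can be sketched as a small helper. The names find_version and read_version_without_import are hypothetical (not part of Tahoe or Versioneer), and the sketch assumes _version.py contains a literal line of the form __version__ = "1.2".

```python
import re
from typing import Optional

# Hypothetical helper: matches a line such as  __version__ = "1.2"  (either quote style).
_VERSION_LINE_RE = re.compile(r"""^__version__\s*=\s*['"]([^'"]+)['"]""", re.MULTILINE)


def find_version(source: str) -> Optional[str]:
    """Return the first __version__ string assigned in `source`, or None."""
    match = _VERSION_LINE_RE.search(source)
    return match.group(1) if match else None


def read_version_without_import(path: str) -> Optional[str]:
    """Read _version.py as plain text instead of importing the package.

    Nothing in the package (or its dependencies) is executed, so setup.py can
    learn the version even when those dependencies aren't importable.
    """
    with open(path, encoding="utf-8") as f:
        return find_version(f.read())
```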

Thinking about how I use git these days, I realized that I want my release process to have one step: “git tag” (well, and a “git push” to tell the world about it). Everything else should be automated: building tarballs, uploading them to a release server, updating a web page, sending an announcement email, PyPI registration, etc. What really matters is the release manager making the decision to bless some well-tested revision id with a public name of some sort.

So I finally built a tool to accomplish this. It’s called “Versioneer”, and is available at https://github.com/warner/python-versioneer . It’s still pretty early, but seems to do the right thing. Here’s how it works:

To install, you copy versioneer.py into your tree and follow the instructions in the docstring. These will have you edit your setup.py to “import versioneer”, set a few variables to tell it about your tag-naming convention (PROJECT-VER or just VER) and where the _version.py ought to live. It also has you add a cmdclass= to your setup() arguments, which hooks into the “build” and “sdist” commands, and adds an “update_files” command. Then you run “setup.py update_files”, which creates your _version.py and modifies your __init__.py to include it (and define the PEP 8 __version__ variable). It also creates a .gitattributes file to arrange for variable-expansion during git-archive, described below. Then it does ‘git add’ on the relevant files and asks you to commit the changes.
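The tag-naming convention boils down to a prefix that is stripped from each tag to yield the version. A minimal sketch, with a hypothetical version_from_tag function and tag_prefix parameter (Versioneer’s own configuration variable names may differ):

```python
from typing import Optional


def version_from_tag(tag: str, tag_prefix: str = "") -> Optional[str]:
    """Strip a project's tag prefix to get a version string (hypothetical helper).

    With tag_prefix="myproject-", the tag "myproject-1.4" means version "1.4";
    with tag_prefix="" (plain VER tags), "1.4" is used as-is.
    """
    if not tag.startswith(tag_prefix):
        return None  # tag follows some other naming scheme; ignore it
    return tag[len(tag_prefix):] or None  # a bare prefix carries no version
```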

When setup.py needs a version (such as for building an sdist tarball, or registering with PyPI), it calls versioneer.get_versions(). At runtime, your package gets its __version__ from the generated _version.py.

In a checked-out source tree, both of these invoke “git describe” (with --tags --always --dirty) to come up with a fine-grained version string. If you’re sitting on a tag, you get just “1.4”. If you’re after a tag, you’ll get something like “1.4-8-gf7283c2”, which means there are 8 commits after the 1.4 tag, and the abbreviated SHA1 revision ID is f7283c2. And if your tree has uncommitted changes, you’ll get “1.4-8-gf7283c2-dirty”.
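Splitting that output back into its parts is mostly string work. The sketch below is an illustration, not Versioneer’s code; Describe and parse_describe are hypothetical names. It assumes the suffix is always “-N-gHEX”, and a tag made only of hex digits would be mistaken for the bare SHA1 that --always prints when no tag is reachable.

```python
import re
from typing import NamedTuple, Optional


class Describe(NamedTuple):
    tag: Optional[str]        # None when --always fell back to a bare abbreviated SHA1
    distance: int             # commits since the tag (0 when sitting on it)
    short_sha: Optional[str]  # abbreviated SHA1, or None when sitting on the tag
    dirty: bool               # True when the tree has uncommitted changes


# "<tag>-<distance>-g<abbreviated sha1>"; the greedy tag group makes the last
# "-N-gHEX" the suffix, so hyphenated tag names still parse.
_SUFFIX_RE = re.compile(r"^(?P<tag>.+)-(?P<distance>\d+)-g(?P<sha>[0-9a-f]+)$")
_BARE_SHA_RE = re.compile(r"^[0-9a-f]{4,40}$")


def parse_describe(output: str) -> Describe:
    """Split `git describe --tags --always --dirty` output into its parts (sketch)."""
    s = output.strip()
    dirty = s.endswith("-dirty")
    if dirty:
        s = s[: -len("-dirty")]
    m = _SUFFIX_RE.match(s)
    if m:  # e.g. "1.4-8-gf7283c2": 8 commits after tag 1.4
        return Describe(m.group("tag"), int(m.group("distance")), m.group("sha"), dirty)
    if _BARE_SHA_RE.match(s):  # no reachable tag; a hex-only tag name would be misread here
        return Describe(None, 0, s, dirty)
    return Describe(s, 0, None, dirty)  # exactly on a tag, e.g. "1.4"
```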

Using “git archive” to create a tarball or zipfile will expand some magic variables in _version.py, capturing the SHA1 revision id, and any tags that point to it. When unpacked, these strings are parsed to extract the most likely tag name. This allows tarballs generated by gitweb and GitHub’s “Download A Tarball” button to get useful version strings.
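The magic variables rely on git’s export-subst attribute: a .gitattributes line such as “mypkg/_version.py export-subst” (mypkg is a placeholder) tells git archive to replace $Format:%d$ with the ref names pointing at the commit and $Format:%H$ with its full SHA1. A minimal sketch of the unpacking side, with hypothetical variable and function names:

```python
from typing import Optional, Set

# Hypothetical placeholders as checked into _version.py. Because of the
# export-subst attribute, `git archive` rewrites them in the exported copy.
git_refnames = "$Format:%d$"  # becomes e.g. " (HEAD, 1.3rc2, 1.3, master)"
git_full = "$Format:%H$"      # becomes the full 40-hex-digit SHA1


def refs_from_keywords(refnames: str) -> Optional[Set[str]]:
    """Return the ref names from an expanded %d string, or None if unexpanded."""
    if refnames.startswith("$Format"):
        return None  # still a placeholder: this file did not come from `git archive`
    refnames = refnames.strip()
    if refnames.startswith("(") and refnames.endswith(")"):
        refnames = refnames[1:-1]
    refs = set()
    for ref in refnames.split(","):
        ref = ref.strip()
        if ref.startswith("tag: "):
            ref = ref[len("tag: "):]  # newer git releases mark tags explicitly
        if ref:
            refs.add(ref)
    return refs
```

The “tag: ” handling only matters for newer git releases; output without that marker mixes tags and branches, which is why they have to be told apart by guessing.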

Using “setup.py sdist” to make a tarball, or “setup.py build” to copy code into build/, will compute the version and replace _version.py with a short form that just has the computed strings.
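The build-time rewrite amounts to freezing the computed strings into a tiny module. A sketch, assuming a hypothetical layout with version and full variables and a get_versions() function; the real file Versioneer writes may look different:

```python
# Hypothetical layout for the frozen _version.py written at build/sdist time.
def render_short_version_py(version: str, full_sha: str) -> str:
    """Return the source of a tiny _version.py holding only precomputed strings."""
    return (
        "# This file was generated at build time from version-control data.\n"
        f"version = {version!r}\n"
        f"full = {full_sha!r}\n"
        "\n"
        "\n"
        "def get_versions():\n"
        "    return {'version': version, 'full': full}\n"
    )


def write_short_version_py(path: str, version: str, full_sha: str) -> None:
    """Overwrite `path` (e.g. build/lib/mypkg/_version.py) with the short form."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(render_short_version_py(version, full_sha))
```

Because the generated file is plain Python with no git calls, an installed copy reports the same version whether or not git or a .git directory is present.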

The result is that you get a useful fine-grained version string, updated every time you run your program, embedded into release products via the most common tarball-generation tools (setup.py sdist and git-archive). Developers will get detailed version information in their test logs (assuming you record __version__ in them, which you should), so other developers can reproduce their tree. Bug reports from end users will contain enough data (assuming they emit the version string) to reproduce their code, and to learn if they have local modifications or not. Release managers only need to run “git tag” when they decide to make a new release. And development/stabilization/release branches can be merged freely without worrying about what will happen to embedded version strings.

There are a few gotchas:

The tool only handles git so far. I plan to add support for other systems in the future. Git is nice because git-describe is so fast. The --dirty flag does require that it stat every file (and possibly hash the contents), but that’s still pretty fast (25 ms on my laptop for the Jetpack tree). Other VCSes should be similarly fast, except Darcs, which has no concise version string other than specific tags.

The git-archive expanded variables are abbreviated, and don’t distinguish between branch names and tag names, so sometimes we have to guess. Also, you might have multiple tags pointing at the current revision (very common when your last release candidate gets promoted to final: the code sees both “1.3rc2” and “1.3” and picks the shorter one). It works pretty well when you build a tarball from a release tag, but building from something between releases only gets you the full SHA1 (not the 1.4-8-gREVID form).
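One way to make that guess, sketched under assumptions: treat any ref name containing a digit as a tag (branch names like “master” usually have none), strip a tag prefix, and when several tags remain prefer the shortest, falling back to the full SHA1 when nothing looks like a tag. pick_version is a hypothetical name, and a branch such as “release-2.0” would fool the digit heuristic.

```python
import re
from typing import Iterable


def pick_version(refs: Iterable[str], full_sha: str, tag_prefix: str = "") -> str:
    """Guess a version from the ref names git archive recorded (hypothetical helper)."""
    candidates = []
    for ref in refs:
        if not ref.startswith(tag_prefix):
            continue
        version = ref[len(tag_prefix):]
        # Branches and tags are mixed together; assume anything with a digit is a tag.
        if re.search(r"\d", version):
            candidates.append(version)
    if not candidates:
        # No tag on this commit (a between-releases tarball): only the SHA1 is known.
        return full_sha
    # Several tags on one commit (e.g. "1.3rc2" promoted to "1.3"): take the
    # shortest, breaking ties alphabetically so the result is deterministic.
    return min(candidates, key=lambda v: (len(v), v))
```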

If you wind up in a git checkout but either /usr/bin/git or .git is unavailable (maybe your git is named something else, or you’ve deleted the .git directory), the code will report a version of “unknown”. This happened on the Flightdeck (aka Add-On Builder) site, where it turned out they were getting the Jetpack code in a git submodule and then copying everything but the .git directory to a deployment server. It also happens if you bridge the code into a different VCS (e.g. using hg-git or git-svn) and use the result, as with the git-to-hg bridged version of the Jetpack trunk, which is still used by some RelEng automation.
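The “unknown” fallback can be sketched as a subprocess call that treats every failure the same way. This is an illustration, not Versioneer’s code; describe_or_unknown and its git_cmd parameter are hypothetical, and capture_output assumes Python 3.7 or later.

```python
import subprocess


def describe_or_unknown(root: str, git_cmd: str = "git") -> str:
    """Run git describe in `root`; return "unknown" if that can't be done (sketch)."""
    try:
        result = subprocess.run(
            [git_cmd, "describe", "--tags", "--always", "--dirty"],
            cwd=root,
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # git binary missing or installed under another name
        # (FileNotFoundError is a subclass of OSError)
        return "unknown"
    if result.returncode != 0:
        # e.g. no .git directory: it was deleted, or the tree was copied or bridged
        return "unknown"
    return result.stdout.strip() or "unknown"
```

One caveat: git searches parent directories for a .git, so a tree copied inside some other repository could report that repository’s version instead of “unknown”.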

“1.4-8-gREVID” is not PEP 386-compliant: the format can tolerate an “-rNNN” suffix (for SVN-style revision IDs), but not the more general “-gHEX” suffix that git-describe provides, nor the “-8” revision counter. You may need to mangle the result if PEP 386 compatibility of between-release version strings is important to you.
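If you do need PEP 386-style strings between releases, one possible mangling maps the commit distance onto .postN and a dirty tree onto a trailing .dev0. This mapping is an assumption of this sketch, not something Versioneer does here. It drops the SHA1 (so the result no longer identifies the exact revision) and assumes the describe output starts from a tag rather than a bare SHA1; normalize_describe is a hypothetical name.

```python
import re

# "<tag>-<distance>-g<abbreviated sha1>" as printed by git describe
_SUFFIX_RE = re.compile(r"^(?P<tag>.+)-(?P<distance>\d+)-g[0-9a-f]+$")


def normalize_describe(describe: str) -> str:
    """Map git-describe output onto a PEP 386-style string (one possible mangling).

    Assumes the output starts with a tag; the SHA1 is dropped.
    """
    s = describe.strip()
    dirty = s.endswith("-dirty")
    if dirty:
        s = s[: -len("-dirty")]
    m = _SUFFIX_RE.match(s)
    if m:
        tag, distance = m.group("tag"), int(m.group("distance"))
    else:
        tag, distance = s, 0  # sitting exactly on the tag
    if distance == 0 and not dirty:
        return tag  # a clean release: keep the tag as-is
    version = f"{tag}.post{distance}"  # commits since the tag become .postN
    if dirty:
        version += ".dev0"  # arbitrary marker for uncommitted changes
    return version
```

Under PEP 386 ordering, “1.4.post8.dev0” sorts after “1.4.post7” but before “1.4.post8”, so a dirty tree lands just before the commit it was built on; any such mapping is a compromise.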

Jetpack uses Versioneer as of addon-sdk-1.4, with only a few stumbles so far. I’ve also switched python-ed25519 to use Versioneer. Let me know what you think!

This entry was posted on Tuesday, January 31st, 2012 at 6:08 pm and is filed under Jetpack, Python.

* you’ll get version=“unknown” in trees that are bridged into a different VCS. I learned the other day that Tahoe is being synced into a Launchpad-based “bzr” repo, and trees checked out from that won’t have any clue what their version should be.

* the (intermediate) version strings this makes aren’t parseable by setuptools: it wants a “normalized version” with a .postNN and maybe a .devNN, and won’t accept (hex) SHA1 revision IDs. In Tahoe, I used some external code to force __version__ into this normalized form (but I don’t like it very much). I’ll probably merge that code into Versioneer, so get_versions()["normalized"] will be available. (Dunno if being compatible with this is important or not... if Buildbot is already doing something similar, then normalized versions probably aren’t a constraint.)