I am glad to see discussions about the problem of distributing python programs in the wild. A recent post by Glyph articulates the main issues better than I could. The developers vs end-users focus is indeed critical, as is making the platform an implementation detail.

There is one solution that Glyph did not mention: the freeze tool in python itself. While not for the faint of heart, it allows building a single, self-contained executable. Since the process is not really documented, I thought I would do it here.

Setting up a statically linked python

The freeze tool is not installed by default, so you need to get it from the sources, e.g. one of the source tarballs. You also need to build python statically, which is itself a bit of an adventure.

I prepared a special build of static python on OS X that statically links sqlite (3.8.11) and ssl (1.0.2d), both from homebrew.

You should now have an executable called hello of approximately 7-8 MB. This binary should be relatively portable across machines, although in this case I built the binary on Yosemite, so I am not sure whether it would work on older OS X versions.

How does it work?

The freeze tool works by bytecompiling every dependent module, and creating a corresponding .c file containing the bytecode as a string. Every module is then statically linked into a new executable.
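To make the bytecode-as-a-C-string idea concrete, here is a minimal sketch of that one step. It is not the freeze tool's own code: the function name freeze_module and the exact layout of the generated C are assumptions made for illustration.

```python
import marshal


def freeze_module(name, source):
    """Return C source that embeds the marshalled bytecode of `source`.

    Hypothetical helper for illustration only; the array name and layout
    are assumptions, not the output format of the real freeze tool.
    """
    code = compile(source, name + ".py", "exec")  # byte-compile the module
    data = marshal.dumps(code)                    # serialize the code object
    c_name = "M_" + name.replace(".", "__")       # dots are not valid in C names
    lines = ["static unsigned char %s[] = {" % c_name]
    for start in range(0, len(data), 12):
        chunk = data[start:start + 12]
        # iterating over bytes yields integers in Python 3
        lines.append("    " + ",".join(str(byte) for byte in chunk) + ",")
    lines.append("};")
    return "\n".join(lines)


if __name__ == "__main__":
    print(freeze_module("hello", "print('Hello world')"))
```

Each generated array can then be compiled and linked like any other C file, which is how a frozen module ends up inside the executable.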

Limitations

I have used this process successfully to build non-trivial applications that depend on dozens of libraries. If you want a single executable, the main limitation is that your application cannot depend on C extensions.

More generally, the main limitations are:

1. you need to build python statically

2. you have to use unix

3. you cannot depend on C extensions

4. none of your dependencies uses shenanigans for package data or import

1 and 2 are linked. There is no reason why it should not work on windows, but statically linking python on windows is even less supported than doing it on unix. It would be nice for python itself to support static builds better.

3 is one of the problems that has been solved over and over by the multiple freezer tools. It would be nice to get a minimal, well-written library solving this problem. Alternatively, a way to load C extensions from within a file would be even better, but not every platform can do this.

4 is actually the main issue in practice, and it would be nice to have a good solution here. Something like pkg_resources, but more hackable and better tested.

I would argue that the pieces for a better deployment story in python are there: what is needed is taking the existing pieces to build a cohesive solution.

This is a quick post to show how to build NumPy/SciPy with OpenBlas on Mac OS X. OpenBlas is a recently open-sourced version of Blas/Lapack that is competitive with the proprietary implementations, without being as hard to build as Atlas.

Note: this is experimental, largely untested, and I would not recommend using this for anything worthwhile at the moment.

Building OpenBlas

After checking out the sources from github, I had the most luck building openblas with a custom-built clang (I used llvm 3.1). With the apple-provided clang, I got some errors related to unsupported opcodes (fsubp).

With the correct version of clang, building is a simple matter of running make (CPU is automatically detected).

Building NumPy/SciPy

I have just added initial support for customizable blas/lapack in the bento build of NumPy (and scipy). You will need a very recent clone of the NumPy git repo, and a recent bento. The single file distribution of bento is the simplest way to make this work:

It is that time of the year when packaging questions resurface in the open (on python-dev and by Armin).

Armin wrote an article on why he loves setuptools, and one of the main takeaways of his text is that one should not replace X with Y without understanding why X was created in the first place. There is another takeaway, though: none of the features Armin mentioned matters much to me. This is not to say they are not important: given the success of setuptools and pip, it would be stupid not to recognize that they fill an important gap for a lot of people.

About tradeoffs

But while those solutions provide a useful set of features, it is important to realize what they prevent as well. Nick touches on this topic a bit on python-dev, but I mean something a bit different here. Some examples:

First, the way setuptools installs eggs by adding entries to sys.path causes a lot of additional stat calls on the filesystem. In the scientific community (and in corporate environments as well), people often have to use NFS, where each of those calls is expensive. This can make imports take a long time (above 1 minute is not unheard of).

Setuptools monkey patches distutils. This has a serious consequence for people who have their own distutils extensions, since you essentially have to deal with two code paths for anything that setuptools monkey patches.

As mentioned by Armin, setuptools had to do the things it did to support multi-versioning. But this means that it has a significant cost for people who do not care about having multiple versions of the same package. This matters less today than it used to, though, thanks to virtualenv, and to pip, which installs things as non-eggs.

A similar argument can be made about monkey-patching: distutils is not designed to be extensible, especially because of how commands are tightly coupled together. You effectively can NOT extend distutils without monkey-patching it significantly.
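To give a feel for the sys.path cost in the first example, here is a rough counting model. It is only a sketch: the function name is hypothetical, and the assumption that the import system probes one package directory plus one file per suffix in every sys.path entry is a simplification (the exact behaviour depends on the python version, and newer versions cache directory listings).

```python
import importlib.machinery


def worst_case_probes(path_entries, suffixes=None):
    """Upper bound on filesystem probes for one failed top-level import.

    Simplified model (an assumption, not a measurement): each sys.path
    entry is checked once for a package directory and once per suffix.
    """
    if suffixes is None:
        # suffixes recognized by the running interpreter (.py, .pyc, .so, ...)
        suffixes = importlib.machinery.all_suffixes()
    per_entry = 1 + len(suffixes)
    return len(path_entries) * per_entry


if __name__ == "__main__":
    site = ["/usr/lib/python/site-packages"]
    # hypothetical egg directories, each adding one sys.path entry
    eggs = ["/usr/lib/python/site-packages/pkg%d.egg" % i for i in range(50)]
    print(worst_case_probes(site), worst_case_probes(site + eggs))
```

Under this model the number of probes grows linearly with the number of eggs, and on NFS every probe is a network round trip.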

Hackable solutions

A couple of years ago, I decided that I could not put up with numpy.distutils extensions and the aforementioned distutils issues anymore. I started working on Bento sometime around fall 2009, with the intent to bootstrap it by reusing the low-level distutils code, and getting rid of commands and distribution. I also wanted to experiment with simpler solutions to some of the more questionable setuptools designs, such as data resources with pkg_resources.

I think hackable solutions are the key to helping people solve their packaging problems. There is no solution that will work for everyone, because the use cases are so different and clash with each other. Personally, having a system that works like apt-get (reliable and fast metadata search, reliable install/uninstall, etc…) is the holy grail, but I understand that that’s not what other people are after.

What matters the most is to only put in the stdlib what is uncontroversial and battle-tested in the wild. The efforts of Tarek and the rest of the packaging team to specify and write PEPs around the metadata are a very good step in that direction. The PEP for metadata works well because it essentially specifies things that have been used successfully (and are relatively uncontroversial).

But an elusive PEP around compilers, as has been suggested, is not that interesting IMO: I could write something to point out every API issue with how compilation works in distutils, but that sounds pointless without a proposal for a better system. And I don’t want to design a better system, I want to be able to use one (waf, scons, fbuild, gyp, whatever). Writing bento is my way of discovering a good design to do just that.

From the beginning, it was clear that one of the major hurdles for bento would be the transition from distutils. This is a hard issue for any tool trying to improve on existing ones, but even more so for distribution/packaging tools, as it impacts everyone (developers and users of the tools).

Since almost day one, bento has had some basic facilities to convert existing distutils projects into bento.info. I have now added something to do the exact opposite, that is, maintaining some distutils extensions which are driven by bento.info. Concretely, it means that if you have a bento package, you can write something like:

import setuptools # this comes first so that setuptools does its monkey dance
import bento.distutils # this monkey patches on top of setuptools
setuptools.setup()

as your setup.py, and it will give the “illusion” of a distutils package. Of course, it won’t give you all the goodies given by bento (if it could, I would not have written bento in the first place), but it is good enough to enable the following:

installing through the usual “python setup.py install”

building source distributions

more significantly: it will make your package easy_install-able/pip-able

This feature will be in bento 0.0.5, which will be released very soon (before pycon 2011, where I will present bento). More details may be found in bento’s documentation.

I could not spend much time (if any) on bento the last few weeks of 2010, but I fortunately got back some time to work on it this month. It is a good time to describe a bit what I hope will happen in bento in the next few months.

Bento poster @ Pycon2011

First, my bento proposal has been rejected for PyCon 2011, so it will only be presented as a poster. It is a bit unfortunate because I think it would have worked much better as a talk than as a poster. Nevertheless, I hope it will help bring awareness of bento outside the scipy community, and give me a better understanding of people’s needs for packaging (a poster should be better for the latter point).

Bento 0.0.5

Bento 0.0.5 should be coming soon (mid-February). Contrary to the 0.0.4 release, this version won’t bring major user-visible features, but it has had a lot of internal redesigns to make bento easier to use:

Automatic command dependency

One does not need to run each command separately anymore. If you run “bentomaker install”, it will automatically run configure and build on its own, in the right order. What’s interesting about it is how dependencies are specified. In distutils, subcommand order is hardcoded inside the parent command, which makes it virtually impossible to extend them. Bento does not suffer from this major deficiency:

Dependencies are specified outside the classes: you just need to say which class must be run before/after

Class order is then computed at run time using a simple topological sort. Although the API is not there yet, this will enable arbitrary insertion of new commands between existing commands without the need to monkey patch anything
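The scheme above can be sketched in a few lines. This is not bento's API (which, as said, is not there yet): the function resolve_order, the run_after mapping and the build_docs command are all hypothetical names used to show how externally declared dependencies plus a topological sort give the run order.

```python
def resolve_order(targets, run_after):
    """Return the commands to run, dependencies first.

    targets:   command names requested by the user
    run_after: dict mapping a command to the set of commands that must
               run before it (declared outside the command classes)
    """
    order = []
    state = {}  # name -> "visiting" or "done"

    def visit(name):
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError("dependency cycle involving %r" % name)
        state[name] = "visiting"
        for dep in sorted(run_after.get(name, ())):
            visit(dep)
        state[name] = "done"
        order.append(name)  # appended only after all its dependencies

    for target in targets:
        visit(target)
    return order


if __name__ == "__main__":
    deps = {"build": {"configure"}, "install": {"build"}}
    print(resolve_order(["install"], deps))

    # Insert a new command between build and install without touching
    # either of them: only the dependency table changes.
    deps["build_docs"] = {"build"}
    deps["install"].add("build_docs")
    print(resolve_order(["install"], deps))
```

Because the ordering lives in a table rather than in the parent command, inserting a command is a matter of adding two entries, with no monkey patching.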

Virtualenv support

If a bento package is installed under virtualenv, the package will be installed inside the virtualenv by default:

Of course, if the install path has been customized (through prefix/eprefix), those take precedence over virtualenv.
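The precedence rule can be sketched as follows. This is not bento's implementation: the function names and the platform_default argument are assumptions for illustration. The detection relies on sys.real_prefix (set by the classic virtualenv tool) and on sys.base_prefix (set by newer pythons).

```python
import sys


def in_virtualenv():
    """Best-effort check for a virtual environment."""
    if hasattr(sys, "real_prefix"):  # classic virtualenv
        return True
    # newer pythons: base_prefix differs from prefix inside an environment
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


def resolve_prefix(user_prefix=None, platform_default="/usr/local"):
    """Pick the install prefix: explicit option, then virtualenv, then default.

    platform_default is a hypothetical placeholder for whatever the
    platform would use outside a virtualenv.
    """
    if user_prefix is not None:
        return user_prefix  # customization always wins
    if in_virtualenv():
        return sys.prefix   # install inside the environment
    return platform_default


if __name__ == "__main__":
    print(resolve_prefix())
    print(resolve_prefix("/opt/mypackage"))
```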

List files to be installed

The install command can optionally print the list of files to be installed and their actual installation path. This can be used to check where things are installed. By design, this list is exactly what bento would install, so it is more difficult to hit weird corner cases where the list and what is actually installed differ.

First steps toward uninstall

Initial “transaction-based” install is available: in this mode, a transaction log will be generated, which can be used to rollback an install. For example, if the install fails in the middle, already installed files will be removed to keep the system in a clean state. This is a first step toward uninstall support.
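A minimal sketch of the idea, assuming a log that simply records one installed path per line. The function names and the log format are hypothetical, not bento's, and the sketch ignores the case where an install overwrites a file that already existed.

```python
import os
import shutil


def rollback(targets):
    """Remove installed files, most recent first."""
    for target in reversed(targets):
        if os.path.exists(target):
            os.remove(target)


def install_with_rollback(files, log_path):
    """Copy (source, target) pairs, logging each installed target.

    If any copy fails, everything installed so far is removed and the
    error is re-raised, leaving the system as it was.
    """
    installed = []
    try:
        with open(log_path, "w") as log:
            for source, target in files:
                parent = os.path.dirname(target)
                if parent:
                    os.makedirs(parent, exist_ok=True)
                shutil.copy(source, target)
                installed.append(target)
                log.write(target + "\n")
                log.flush()  # keep the log usable even after a crash
    except Exception:
        rollback(installed)
        raise
    return installed


def rollback_from_log(log_path):
    """Undo a completed install from its transaction log."""
    with open(log_path) as log:
        rollback([line.rstrip("\n") for line in log if line.strip()])
```

The same log that allows rolling back a failed install is what an uninstall command would read later, which is why this is a first step in that direction.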

Refactoring to help using waf inside bento

Bento’s internals have been improved to enable easier customization of the build tool. I have a proof of concept where bento can be customized to use waf to build extensions. The whole point is to be able to do so without changing bento’s code itself, of course. The same scheme can be used to build extensions with distutils (for compatibility reasons, to help complex packages move to bento one step at a time).

Bentoshop: a framework to manage installed packages

I am hoping to have at least a proof of concept for a package manager based around bento for Pycon 2011. As already stated on this blog, there are a few non-negotiable features that the design must follow:

1. Robust by design: things that can be installed can be removed, and synchronisation issues between metadata and installed packages are avoided

2. Transparent: it should play well with native packaging tools and not get in the way of anyone’s workflow.

3. No support whatsoever for multiple versions: this can be handled with virtualenv for trivial cases, and through native “virtualization” schemes when virtualenv is not enough (chroot for fs “virtualization”, or actual virtual machines for more)

4. Efficient

This means PEP 376 is out of the question (it breaks points 1 and 4). I will build a first proof of concept following the haskell cabal and R (CRAN) systems, but backed with a db for performance.
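To illustrate what "backed with a db" could look like, here is a small sketch using sqlite. Nothing here is an existing design: the schema and the function names are assumptions. The point is only that a single table of (package, path) rows makes both "which files does this package own" and "remove everything this package installed" cheap and consistent.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the (hypothetical) installed-files database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "package TEXT NOT NULL, path TEXT NOT NULL UNIQUE)"
    )
    return db


def register(db, package, paths):
    """Record the files installed by a package, atomically."""
    with db:  # commits on success, rolls back on error
        db.executemany(
            "INSERT INTO files (package, path) VALUES (?, ?)",
            [(package, path) for path in paths],
        )


def files_of(db, package):
    rows = db.execute(
        "SELECT path FROM files WHERE package = ? ORDER BY path", (package,)
    )
    return [row[0] for row in rows]


def unregister(db, package):
    """Forget a package and return the files that should be removed."""
    paths = files_of(db, package)
    with db:
        db.execute("DELETE FROM files WHERE package = ?", (package,))
    return paths
```

The UNIQUE constraint on path means two packages cannot claim the same file, which is one way to avoid the metadata getting out of sync with what is on disk.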

The main design issue is point 2: ideally, one would want a user-specific, python-specific package manager to be aware of packages installed through the native system, but I am not sure it is really possible without breaking other points.