15 March 2010

Repeat after me: “Cabal is not a Package Manager”

It seems that every few weeks someone (who has usually either just started to use Haskell or doesn’t seem to be active in the Haskell community) comes out with the fallacy that Cabal is the “Haskell package manager”. I’ve gotten so sick of explaining why this isn’t the case on IRC, in blog comments, etc. that I’ve decided to write a blog post to set things straight.

Disclaimer: I am not a Cabal developer (I did submit a patch or two to fix a couple of documentation problems I found, but I don’t know if they were applied or if they were re-written, etc.) so this is not the official line, just my own opinion (hmmm… “IANACD” doesn’t quite trip off the tongue…).

Cabal /= cabal-install

First of all, there is the common misconception that Cabal provides the command line tool cabal. This is not the case: this tool is provided in the cabal-install package. This unfortunately (for comprehension’s sake) named tool is completely distinct from Cabal-the-library except for the fact that it uses that library. To help understand why there is this distinction (as opposed to other language-specific installation tools such as RubyGems), I highly recommend this video by Duncan Coutts. The short version is: Cabal depends directly on Haskell compilers and nothing else (and is indeed shipped with GHC); cabal-install needs other packages (for network access, etc.) as well.

For the rest of this blog post, I will assume that people are instead referring to cabal-install as a package manager (which is what most of them intend) or else the entire Hackage ecosystem (see below).

What is a package manager?

According to that most authoritative source Wikipedia, a Package management system (of which a package manager is merely one component, namely the tool that is used) is:

… a collection of tools to automate the process of installing, upgrading, configuring, and removing software packages from a computer.

Coupled with such a tool is also a large (well, it could be small but that kind of defeats the point) collection of packages used by the package manager. Together, these provide a way of (hopefully) seamlessly installing packages without end users having to consider what is needed to do so. For example, let us assume that a Gentoo user with no other Haskell-related software present is wishing to install XMonad (one of the most popular pieces of software written in Haskell). In that case, all they need to do is:

emerge xmonad

This will bring in GHC and all other related dependencies (even non-Haskell ones!) so that the user doesn’t have to worry about all that kind of stuff.

The Haskell “Package Management System”

The “equivalent” to a package management system for Haskell is the Hackage ecosystem: HackageDB + Cabal + cabal-install, which provide the list of packages, the build system and the command line tool respectively. However, this ecosystem is not a package management system:

HackageDB

HackageDB is the central repository of open-source Haskell software. However, it is limited solely to Haskell software that is installed using Cabal. As such, it is not closed under the dependency relation: entering the incantation cabal install xmonad into a prompt on a Haskell-free system will not bring in first GHC and then all other dependencies (actually, it is not even possible to utter that incantation since neither Cabal nor cabal-install will be available, unless a pre-built cabal-install is being used). Furthermore, Hackage is also unable to install any GUI libraries/applications that use the Gtk2Hs wrapper library around Gtk+, since Gtk2Hs isn’t Cabalised (to the great confusion of many who state that they do indeed have gtk+ installed, when Cabal and cabal-install are complaining about the gtk+ sub-library from Gtk2Hs).

As such, HackageDB cannot really fulfill the requirements of being a proper component of a package management system.
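To make “closed under the dependency relation” concrete, here is a minimal sketch of a toy model. The dependency edges and the names dependencies, repository, closure and missing are all illustrative assumptions of mine, not a real package index or any part of Cabal’s API. It computes everything a package transitively needs and then subtracts what a Haskell-only repository can supply.

```haskell
module Main where

import qualified Data.Map as Map
import qualified Data.Set as Set

type Package = String

-- Illustrative dependency edges only; not a real package index.
dependencies :: Map.Map Package [Package]
dependencies = Map.fromList
  [ ("xmonad", ["ghc", "X11", "mtl"])
  , ("X11",    ["ghc", "libX11"])
  , ("mtl",    ["ghc"])
  ]

-- What the (hypothetical) Haskell-only repository can supply.
repository :: Set.Set Package
repository = Set.fromList ["xmonad", "X11", "mtl"]

-- Everything reachable from a package, including the package itself.
closure :: Package -> Set.Set Package
closure root = go Set.empty [root]
  where
    go seen [] = seen
    go seen (p : rest)
      | p `Set.member` seen = go seen rest
      | otherwise =
          go (Set.insert p seen) (Map.findWithDefault [] p dependencies ++ rest)

-- Dependencies the repository cannot satisfy on its own.
missing :: Package -> Set.Set Package
missing root = closure root `Set.difference` repository

main :: IO ()
main = print (Set.toList (missing "xmonad"))
```

In this toy model the compiler and the C library end up in the missing set: a repository that only holds Cabalised Haskell packages cannot be closed under dependencies.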

Cabal

Cabal is the Common Architecture for Building Applications and Libraries, which not only acts as the build system (analogous to ./configure && make && make install) of choice for Haskell packages, but also provides package metadata to other libraries and applications that need it via a library interface. Nowadays, the only mainstream Haskell packages that don’t use Cabal are GHC itself (since it contains non-Haskell components; furthermore this would involve bootstrapping issues since a Haskell compiler is needed to build Cabal) and “legacy” libraries and tools such as Gtk2Hs (which it is currently not possible to build using the Cabal framework for various reasons).

Cabal obtains its information from two files:

A .cabal file that contains a human-readable description of the package including dependencies, exported modules, any libraries and executables it builds, etc.

A Setup.[l]hs file that is a valid Haskell program/script using the Cabal library and which performs the actual configuration, building and installation of the package; for most packages this is a mere two lines long (one to import the Cabal library, the second that states that it uses the default build setup).
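For reference, the conventional form of that two-line Setup.hs looks like the following. The comments and the type signature are my additions; defaultMain is exported by the Cabal library’s Distribution.Simple module.

```haskell
-- Setup.hs: the default build script for a Cabal package.
import Distribution.Simple (defaultMain)

-- Delegate configuring, building and installing to Cabal's default build setup.
main :: IO ()
main = defaultMain
```

With GHC and the Cabal library available, the script is typically driven by hand as runhaskell Setup.hs configure, then runhaskell Setup.hs build, then runhaskell Setup.hs install.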

As such, Cabal is extremely elegant, especially when compared to build systems such as Ant. However, it is not a valid specification format for packages that are part of a fully-fledged package management system: it cannot deal with all possibilities (e.g. Gtk2Hs) nor all dependencies (it allows you to state any C libraries needed at build time, but not any libraries needed from other languages nor any other tools or libraries needed at run-time; for example my graphviz library for Haskell really needs the “real” Graphviz tool suite installed to work properly but there is no way of telling Cabal that).

Why not XML?

Some people have stated that Cabal should really switch to an XML-based file format because it is a “standard”. Even if we assume that XML really is such a well-defined standard (though we’d need to define a Schema for use with Cabal), XML has one large failing: it is not human readable. One of the greatest features of Cabal is that its file format is perfectly readable and understandable by humans (even if it is at times imprecisely defined in some aspects), such that if I have to check dependencies, etc. for a Cabalised package I can quickly skim through its .cabal file and just read it without having to decode it (especially since usage of XML would remove any need for newlines, etc. which are used for readability purposes). Furthermore, it would require Cabal to use an XML parser, which means an extra dependency (whereas at the moment it needs only a compiler).

cabal-install

The choice of both package and executable name for cabal-install was unfortunate (if understandable) in that too many people confuse it for Cabal. Whilst it may use Cabal and act as a wrapper around both it and HackageDB, it is indeed a completely separate package. So remember, whilst you may do cabal install xmonad, you’re not using Cabal to do that but rather cabal-install.

cabal-install brings a convenient command-line interface, dependency resolution and downloading to the Hackage ecosystem. Assuming that you have GHC installed, cabal install xmonad will indeed determine, download and build all Haskell dependencies for XMonad. However, this does not always work.

As a wrapper around Cabal and HackageDB, cabal-install inherits all of their reasons why it is not a valid package manager. However, to that it brings in a few warts of its own: it only manages libraries. That doesn’t mean that it can’t install applications, because it can; it’s just that once it has installed an application it can’t tell that it has done so. That’s because rather than have its own record of what it has installed and what is available, cabal-install uses GHC’s library manager ghc-pkg to determine which libraries are installed. Since GHC doesn’t know which applications are installed, neither does cabal-install. As such, if one tries to install haskell-src-exts (or any package that depends upon it) on a “virgin” Haskell system, then it will fail since the parser generator happy isn’t installed. cabal-install (or rather Cabal) can detect if happy is available, but will not automatically offer to download and install it for you. Whilst this might be more a temporary limitation (in the sense that no-one has yet added support for build-time tools to cabal-install’s dependency resolution system) rather than a problem with cabal-install, it still requires extra user intervention.
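The “it only knows about libraries” problem can be illustrated with a small toy model. This is a sketch under my own assumptions: LibraryDb merely stands in for the ghc-pkg database, and none of these names come from Cabal, cabal-install or ghc-pkg.

```haskell
module Main where

import qualified Data.Set as Set

-- A package, and whether it provides a library (as opposed to only an executable).
data Package = Package
  { pkgName    :: String
  , hasLibrary :: Bool
  }

-- Stand-in for the ghc-pkg database: it records libraries and nothing else.
type LibraryDb = Set.Set String

-- Installing registers the library, if any; an executable is simply
-- copied to disk and leaves no trace in the database.
install :: Package -> LibraryDb -> LibraryDb
install pkg db
  | hasLibrary pkg = Set.insert (pkgName pkg) db
  | otherwise      = db

isKnown :: String -> LibraryDb -> Bool
isKnown = Set.member

-- Hypothetical install session: one tool, one library.
afterInstall :: LibraryDb
afterInstall =
  foldr install Set.empty
    [ Package "happy" False
    , Package "haskell-src-exts" True
    ]

main :: IO ()
main = do
  print (isKnown "haskell-src-exts" afterInstall)
  print (isKnown "happy" afterInstall)
```

Both packages were “installed” in this model, yet only the library can be found again afterwards, which is the situation described above for applications.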

There is a yet more serious impediment to being able to properly consider cabal-install a package manager (since other package managers may also require user intervention at times when installation fails for some reason). Let us once again consider that definition of a package management system, this time with some emphasis added:

… a collection of tools to automate the process of installing, upgrading, configuring, and removing software packages from a computer.

Did everyone spot that not-so-subtle hint? cabal-install can’t un-install Haskell software. Why not? Partially because Cabal doesn’t support uninstallation: whilst Cabal can un-register a library from ghc-pkg, it won’t remove any files it installed. Furthermore, because cabal-install doesn’t track which applications it has installed, it is definitely unable to uninstall them since it has no idea which files it needs to delete.
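As a rough illustration of the bookkeeping that uninstallation needs, here is a minimal sketch of an install manifest. It is purely hypothetical: neither Cabal nor cabal-install keeps such a record, and the names Manifest, recordInstall and planUninstall, along with the file paths, are invented for this example.

```haskell
module Main where

import qualified Data.Map as Map

-- Hypothetical install manifest: package name -> files it placed on disk.
type Manifest = Map.Map String [FilePath]

-- Remember which files an installation created.
recordInstall :: String -> [FilePath] -> Manifest -> Manifest
recordInstall = Map.insert

-- The files to delete plus the manifest without that package,
-- or Nothing when the package was never recorded.
planUninstall :: String -> Manifest -> Maybe ([FilePath], Manifest)
planUninstall name manifest = do
  files <- Map.lookup name manifest
  return (files, Map.delete name manifest)

main :: IO ()
main = do
  let manifest =
        recordInstall "xmonad" ["bin/xmonad", "share/doc/xmonad/README"] Map.empty
  print (fmap fst (planUninstall "xmonad" manifest))
  print (fmap fst (planUninstall "darcs" manifest))
```

Without a record of this kind, the second case is the only one available: there is nothing to look up, so there is no way to know which files to delete.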

Partially linked to this problem of uninstallation is another segment of that definition: upgrading. Whilst it can install a new version of a package, it cannot remove old versions. However, this isn’t why the cabal upgrade option has been disabled: GHC ships with several libraries upon which it itself depends; these are known as the boot libraries. Originally cabal-install offered to upgrade these libraries, with at times disastrous results.

But what if we fix cabal-install???

What if cabal-install starts recording which packages it installs and which files it installs? With that uninstallation will be possible, and if it can tell which libraries are boot libraries then upgrading should also be possible. As such, cabal-install could be considered a proper package management system, couldn’t it? Pretty please?

Unfortunately, no: as mentioned earlier, between them HackageDB and Cabal can only be used to install packages written in Haskell that are cabalised. As such, the set of available packages cannot be closed under dependencies, and cabal-install cannot install all necessary dependencies (both build-time and run-time).

Why you should use your distribution’s package management system

Many GNU/Linux users in the Haskell community express disdain for their distribution’s package management system and vehemently express their preference of cabal-install. Here are several reasons, however, that you should use your distribution’s package management system:

Proper dependencies: system packages can bring in all required dependencies, no matter what language they were written in, etc. How are they able to do this when cabal-install can’t? Because they are (hopefully) checked by the most clever computational device known: the human brain. Good system packages have all dependencies listed explicitly, and in the case of those that are marked as “stable” are usually tested on a larger number of machines, architectures and software configurations than the upstream developer is capable of.

Package patching: a common complaint with the current status of HackageDB and Cabal is that if there is a mistake with a package’s .cabal file (usually due to package maintainers being either too strict or too lax in terms of dependency version ranges), then users are forced to either manually download, edit and install (thus losing cabal-install’s dependency resolution) those packages or else wait for the package maintainer to release an update. System packages, however, are able to work around these problems by either providing ready-built binary versions of those packages or else patching the package to get it working. For example, when the duplicated instance problem arose between yi and data-accessor, I was able to edit the yi ebuild to remove its own instance definition so that it would not clash with data-accessor’s: as such, Gentoo users who wanted to install yi wouldn’t even know such a problem existed, which is how it should be.

It Just Works: linked to the above point, system packages are more likely to install and work on the first attempt than using cabal-install, because they have (hopefully) been tested to do so within that particular distribution. This also includes choosing the correct compile-time flags, etc.

Integration: when using system packages, the Haskell packages you have installed are first class citizens of your machine, just like every other package. Applications are installed into standard directories, as are libraries, documentation, etc. They will also interact with all the other packages on your system better: for example, there are various wrapper scripts that come with different packages that wrap around darcs; by using a system install of darcs then these wrapper scripts will work, whereas they might not if it’s installed in your home directory.

Done for you: someone has put in the effort to write the system package for that particular Haskell package; why shouldn’t you be grateful to them for doing so and use it?

There are of course various reasons why people prefer not to use system packages for Haskell packages:

Out of date packages;

Limited variety of packages;

Not built with the wanted options.

However, there are two possible solutions to these problems. The first one is that if you are a serious Haskell hacker and your distribution doesn’t support Haskell well, then why not try another distribution? Arch and Gentoo are usually recognised as being those with the best Haskell support, with Fedora seeming to have decent Haskell support for a more release- and ready-to-go-oriented distribution.

Alternatively, get more involved with your distribution’s Haskell packaging team or start one if there isn’t one there already: get what you want in your distribution and help other people out at the same time. It usually isn’t hard to at least make unofficial packages: I started off writing Haskell ebuilds for Gentoo by copying and editing ones that were already there; nowadays we have our hackport tool (available at app-portage/hackport in the Haskell overlay) that generates most of the ebuild for you, especially for simple packages that don’t need much tweaking. Failing that, just ask for a new/updated Haskell package.

So is cabal-install useless?

Not at all: cabal-install still serves four useful purposes:

Building and testing your own packages during development;

On OSs without a package management system (e.g. Windows);

You are unable to use your system’s package manager for some reason (e.g. need a custom build with different compile-time flags);

You are unable to install system-wide packages on your work/university computer (which is the situation I face at uni, unless I take the “manage your own computer and if it breaks don’t come to us crying” approach).

However, I still strongly recommend you use your system package manager whenever possible.

Conclusion

I have tried to set out at least some of my reasoning as to why I believe that cabal-install is not a package manager, as so many people seem to believe, and that the Hackage ecosystem overall is not a valid package management system. Furthermore, I have covered why I believe Cabal should not switch to XML-based files for its metadata and why you should strongly consider using your OS’s package management system (if it has one) over installing packages by hand with cabal-install.

If nothing else, it is my sincere hope that this blog post will at least stop people talking about Cabal when they mean cabal-install.


I agree that cabal has a number of limitations, and that it isn’t a real package manager, and can’t be.

However, I’m not entirely sure that I buy that the right solution is to turn to your platform’s package management system. You now have to worry about how far that lags behind hackage, and worse, from a library writer’s perspective, there are many different package managers out there.

Why should a library writer worry about external package managers? OK, to an extent this is a problem: xmonad has been stuck with using base-3 because the developers wanted to keep providing support to GHC 6.8.* on the Ubuntu LTS version whilst it was still being maintained. However, I think that this isn’t going to be a problem for most developers: just try to keep your package as compatible as you can, and if you can’t (since you need to use a new feature in a new version of GHC, etc.) don’t worry about it.

Furthermore, as I said, if people are dissatisfied with how much their package manager lags behind Hackage in providing Haskell support, then they should consider either switching distros or helping to get their distro’s Haskell packages up to scratch.

I think Cabal is highly overrated by the Haskell community. What’s worse is that there is no real alternative to it, at least if you want to distribute your code on Hackage.

The main problem I have with Cabal is that it absolutely refuses to work with tabs. I normally use tabs instead of spaces when writing Haskell code; although this is generally frowned upon in Haskell at least compilers support it. Cabal chooses to be different, however, by forcing the use of spaces instead of tabs.

I would guess that the main reason literal tab characters are discouraged in .cabal files is very simple: in Unix-based OSs (not sure if this includes OSX) a tab character is equivalent to 8 spaces; in Windows it’s equivalent to 4 spaces. As such, just because the individual lines might line up in your OS might not mean they’ll line up (and hence be parseable) in my OS.

The solution is simple: use an editor that will replace literal tab characters with spaces for you, so you can keep using the tab key for indentation but not have to worry about cross-platform compatibility.

Except for when you have an indentation-specific language and different operating systems use different tab widths. Consider this scenario:

Foo
[4 spaces]Bar
[tab char]Baz
[8 spaces]Blah

Is “Baz” at the same level of indentation as “Bar” or indented even further? How about “Blah” compared to “Baz”?
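The ambiguity is easy to demonstrate. The sketch below uses a helper of my own, indentColumn (not something Cabal provides), to compute the column each of those four lines starts at, assuming a tab advances to the next multiple of the tab width, once with 4-column tabs and once with 8-column tabs.

```haskell
module Main where

-- Column reached by a line's leading whitespace for a given tab width.
-- A tab advances to the next multiple of the tab width.
indentColumn :: Int -> String -> Int
indentColumn tabWidth = go 0
  where
    go col (' '  : rest) = go (col + 1) rest
    go col ('\t' : rest) = go (col + tabWidth - col `mod` tabWidth) rest
    go col _             = col

-- The four lines from the scenario above.
sampleLines :: [String]
sampleLines = ["Foo", "    Bar", "\tBaz", "        Blah"]

main :: IO ()
main = mapM_ report [4, 8]
  where
    report w =
      putStrLn (show w ++ ": " ++ show (map (indentColumn w) sampleLines))
```

This prints 4: [0,4,4,8] and 8: [0,4,8,8]. With 4-column tabs “Baz” is a sibling of “Bar”; with 8-column tabs it lines up with “Blah” instead.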

Now, if everyone can agree to the congruence between tab characters and spaces, this wouldn’t be a problem; however, we can’t even at the level of two people working on the same project let alone different operating systems. Even then, it isn’t too big a deal if everyone sticks to using tab characters for everything, but that doesn’t always happen (and I’ve never liked the argument of “tab characters for indentation, spaces for alignment”; shouldn’t the alignment be part of the indentation?).

Besides, is there any reason to use literal tab characters? Surely we’re not that parsimonious with disk space and network traffic that an extra couple of space characters as opposed to a tab character hurts (especially when network traffic is compressed)…

Very educational post. I have refrained from cursing any of the haskell tools since I have so far regarded the whole haskell experience as a bleeding edge education. However this is not a healthy long-term attitude, since the language has significant potential in its own right.

My preferred platform is macosx (10.5). I have certainly experienced a number of install problems of the sort mentioned, as well as others. When you say we should use the native package manager, which would you or the commenters regard as the native one for macosx: macports, fink, frameworks? Is there a way to support more than one?

I wouldn’t have a clue, as I don’t have a Mac (and to be honest can’t stand it when I have to use one). I think I’ve heard rumours that macports is getting back up to steam with respect to Haskell packages however…

I am thinking of trying to run multiple GHC versions (6.10 and 6.12) at the same time. Does Hackage/Cabal/cabal-install support this in theory and in practice? Or am I better off setting up another system or user altogether?

As a developer, one is interested in, well, development, rather than ensuring that one’s specific system’s package manager happens to include the most up-to-date required version of some package. At any rate, choosing which distro to work on based solely on how up-to-date its Haskell packages are doesn’t seem reasonable – there are many other applicable criteria. In theory, this issue could be fixed by some automatic gateway between Hackage and popular distros’ package managers (apt, RPM, etc.). This would be a “non-trivial” undertaking, with a significant per-distro effort.

Even so, one doesn’t always have the luxury of having root access to one’s machine (using Haskell in a non-small company with an actual IT department). It is easy (well, reasonably so) to set up cabal-install to use a local “user” repository. It is impossible (well, nearly so) to do the same for the “system’s package manager”. One could argue this is a fault in the system’s package manager, but it is one that seems to be shared by all of them.

The lack of integration between GHC’s “bootstrap” and cabal-install is another major pain. Consider for example that at the time I am writing this, it is impossible (without an extreme effort, anyway) to get containers-0.5.0.0 to work with GHC 7.4.2. This kind of issue isn’t something that can be solved by having the system’s package manager handle the dependencies, unless one is using a source-based system package manager and without much deeper integration between it and GHC – again, a “non-trivial” undertaking with significant per-distro effort.

Yes, it is true cabal-install can’t handle non-Haskell dependencies, but this doesn’t mean that it wouldn’t be a very useful thing to have a true cabal package manager that works across all distros within these limitations. Ruby’s “gem” command, for example, attempts to be a true package manager, independent of the system’s package manager. While it does have some warts and restrictions due to being unable to handle non-Ruby dependencies, it is still extremely useful.

Bottom line, it would be extremely useful to have a more powerful cabal-install which is better integrated with the GHC “bootstrap”, allowing proper management of Haskell packages.

I don’t think what you’re after is possible: GHC comes with the boot libraries, which are the libraries that it _needs_ to have installed because it itself uses them, and libraries that it exports use them. It is currently possible to install containers-0.5.0.0 with GHC-7.*, but you then get diamond dependency problems.

The only way I can see this being fixed is if GHC creates re-named copies of libraries like containers, etc. and thus the boot libraries that _need_ to ship with GHC (e.g. Cabal, ghc-prim) are munged to work with them; if they don’t expose any types from these re-named libraries then it should then be possible to use separate copies of containers and the like for everything else.

AFAIK emerge can’t uninstall packages either (or has that changed?), so it’s not a “real” Package Manager.

Now seriously, in principle I completely agree with this post, and this is what I try to do. But I can see why people switch to hand-building: you install something from your distro, it has a bug. You go ask on cafe, and you’re told: “Oh, but you’re not using my latest-n-greatest! Upgrade now!”

And distros can lag not only when their Haskell support is “bad”, but for fundamental distro-wide reasons. For instance, I track debian testing, which is now frozen. The freeze will take about half a year, going by past releases.
Other similar situations are transitions to a new libc.

I guess the basic problem is that the Haskell world just moves so fast. Faster than just about anything but the Linux kernel. And for the kernel I make an exception: I build my own, because I need the latest hardware support. In a weird way, the lowest and highest level software in a GNU/Linux distro is similar :-)

When _hasn’t_ emerge been able to uninstall packages? I don’t use Gentoo any more, but from when I started around 2005 I could always do “emerge -C”…

Now, I must admit that I am currently building most of my packages by hand using cabal-install; this is because I mean to add better Haskell support to Exherbo by letting the package manager natively understand .cabal files and get packages straight from Hackage (rather than manually or semi-automatically creating exheres), but that involves C++ hacking, which I’m avoiding :p

I know this post is pretty old, but what’s stopping the Haskell community from doing something like virtualenv / NPM? I’ve found that the dependency management, though disk-wasteful, allows me to easily maintain distinct versions of packages on many platforms.

It also significantly helps deployment in that I can simply and easily package an application with all requisite libraries, meaning that the environment is almost guaranteed to be constant between development, testing, and production systems.

We do have several sandboxing implementations available for Haskell, and the next major version of cabal-install will have native support for sandboxing (which is what I’m assuming you were alluding to with virtualenv).

If by “NPM” you mean the Node package manager, then my understanding is that it would have the same issues as cabal-install: it can only manage packages for the language it’s designed for, and thus won’t bring in any non-Node libraries/tools that are needed.

If, however, you mean that it bundles the entire dependency set into one package… then no, we don’t have that for Haskell… and I don’t think anyone is really planning anything like that, as it would be inherently fragile (it would depend rather strongly on the version of GHC you’re using for starters).

While it’s lamentable that cabal does not manage software uninstallation, it is not unique among package systems in that regard. Look at OS X packages, and easy_install before pip came along, for example. I prefer the way Puppet, a configuration manager that has to deal with packages from all corners of the earth, addresses this issue: be inclusive when using the terms ‘package manager’ and ‘package system’, but specific in the set of features that each package manager does or does not support (http://docs.puppetlabs.com/references/latest/type.html#package).

