Try Backpack: Cabal packages

This post is part two of a series about how you can try out Backpack, a new mixin package system for Haskell. In the previous post, we described how to use a new ghc --backpack mode in GHC to quickly try out Backpack's new signature features. Unfortunately, there is no way to distribute the input files to this mode as packages on Hackage. So in this post, we walk through how to assemble equivalent Cabal packages which have the same functionality.

GHC 8.2, cabal-install 2.0

Before you start on this tutorial, make sure you have up-to-date versions of GHC 8.2 and cabal-install 2.0. You can check what is installed by running ghc --version and cabal --version.

One obvious way to translate the Backpack file from the previous post into Cabal packages is to define a package per unit. However, we can also define a single package with many internal libraries, a new feature, independent of Backpack, which lets you define private helper libraries inside a single package. Since this approach involves less boilerplate, we'll describe it first, before "productionizing" the libraries into separate packages.

For all of these examples, we assume that the source code of the modules and signatures has been copy-pasted into appropriate hs and hsig files respectively. You can find these files in the source-only branch of backpack-regex-example.

Single package layout

In this section, we'll step through the Cabal file which defines each unit as an internal library. You can find all the files for this version at the single-package branch of backpack-regex-example. This package can be built with a conventional cabal configure -wghc-8.2 (replace ghc-8.2 with the path to where GHC 8.2 is installed, or omit it if ghc is already GHC 8.2) and then cabal build.

The header of the package file is fairly ordinary, but as Backpack uses new Cabal features, cabal-version must be set to >=1.25 (note that Backpack does NOT work with Custom setup):
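As a rough sketch, such a header might look like the following. The package name and version are placeholders chosen for illustration and may not match the example repository; only the cabal-version bound and the restriction on Custom setup come from the text above.

```cabal
-- Hypothetical package name and version, for illustration only
name:                regex-example
version:             0.1.0.0
-- Backpack does not work with build-type: Custom
build-type:          Simple
-- Lower bound required for the Backpack-related fields
cabal-version:       >=1.25
```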

Private libraries. str-bytestring, str-string and regex-types are completely conventional Cabal libraries that only have modules. In previous versions of Cabal, we would have had to make a package for each of them. However, with private libraries, we can simply list multiple library stanzas annotated with the internal name of the library:
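A sketch of what these three stanzas might look like. The library names and the Str module come from this post; the module name Regex.Types, the dependency lists, and the directory names are assumptions made for illustration.

```cabal
-- Each internal library gets its own named library stanza
library str-bytestring
  build-depends:       base, bytestring
  exposed-modules:     Str
  hs-source-dirs:      str-bytestring

library str-string
  build-depends:       base
  exposed-modules:     Str
  hs-source-dirs:      str-string

-- Regex.Types is an assumed module name
library regex-types
  build-depends:       base
  exposed-modules:     Regex.Types
  hs-source-dirs:      regex-types
```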

To keep the modules for each of these internal libraries separate, we give each a distinct hs-source-dirs. These libraries can be depended upon inside this package, but are hidden from external clients; only the public library (denoted by a library stanza with no name) is publicly visible.

Indefinite libraries. regex-indef is slightly different, in that it has a signature. But writing a library for it is not too different: signatures go in the aptly named signatures field:
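For instance, the stanza for regex-indef might be sketched as follows, assuming that its signature is named Str (the requirement discussed later in this post), that it exposes a single module Regex, and that its sources live in a directory named after the library.

```cabal
library regex-indef
  build-depends:       base, regex-types
  -- Str.hsig is a signature: a requirement to be filled in later
  signatures:          Str
  exposed-modules:     Regex
  -- Assumed directory name
  hs-source-dirs:      regex-indef
```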

With Cabal, instantiations of an indefinite library such as regex-indef are specified through a more indirect process of mix-in linking, whereby the dependencies of a package are "mixed together", with required signatures of one dependency being filled by exposed modules of another dependency. Before writing the regex-example executable, let's write a regex library, which is like regex-indef, except that it is specialized for String:
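A minimal sketch of such a library, assuming the reexported name is Regex.String (any name that does not clash with other modules in scope would do):

```cabal
library regex
  -- str-string exposes Str, which fills the Str requirement of regex-indef
  build-depends:       regex-indef, str-string
  -- Publish the instantiated module under a String-specific name (assumed)
  reexported-modules:  Regex as Regex.String
```

Note that the stanza has no modules of its own; it exists only to link its two dependencies together and rename the result.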

Here, regex-indef and str-string are mix-in linked together: the Str module from str-string fills the Str requirement from regex-indef. This library then reexports Regex under a new name that makes it clear it's the String instantiation.

We can easily do the same for a ByteString instantiated version of regex-indef:
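The ByteString counterpart might be sketched as below, followed by one way an executable could depend on both instantiations. The reexported name Regex.ByteString, the main-is file, and the source directory are assumptions for illustration.

```cabal
library regex-bytestring
  -- str-bytestring exposes Str, which fills the requirement of regex-indef
  build-depends:       regex-indef, str-bytestring
  reexported-modules:  Regex as Regex.ByteString

executable regex-example
  main-is:             Main.hs
  -- Both instantiations are visible under their distinct reexported names
  build-depends:       base, regex, regex-bytestring, regex-types
  hs-source-dirs:      regex-example
```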

In the root directory of the package, you can build the package with cabal configure followed by cabal build (make sure you pass -wghc-head to cabal configure!). Alternatively, you can use cabal new-build to the same effect.

There's more than one way to do it

In the previous code sample, we used reexported-modules to rename modules at declaration-time, so that they did not conflict with each other. However, this was possible only because we created extra regex and regex-bytestring libraries. In some situations (especially if we are actually creating new packages as opposed to internal libraries), this can be quite cumbersome, so Backpack offers a way to rename modules at use-time, using the mixins field. It works like this: any package declared in build-depends can be specified in mixins with an explicit renaming, specifying which modules should be brought into scope, with what name.

For example, str-string and str-bytestring both export a module named Str. To refer to both modules without using package-qualified imports, we can rename them as follows:
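A sketch of such a renaming, using a hypothetical executable named str-example (the executable name, main-is file, and source directory are assumptions):

```cabal
executable str-example
  main-is:             Main.hs
  build-depends:       base, str-string, str-bytestring
  -- Bring each Str module into scope under a distinct name
  mixins:              str-string     (Str as Str.String),
                       str-bytestring (Str as Str.ByteString)
  hs-source-dirs:      str-example
```

With this in place, Main.hs could import Str.String and Str.ByteString side by side, and the plain name Str would not be in scope at all.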

The semantics of the mixins field is that we bring only the modules explicitly listed in the import specification (Str as Str.String) into scope for import. If a package never occurs in mixins, then we default to bringing all modules into scope (giving us the traditional behavior of build-depends). This does mean that if you say mixins: str-string (), you can force a component to have a dependency on str-string, but NOT bring any of its modules into scope.

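It has been argued that package authors should avoid defining packages with conflicting module names. So suppose that we restructure str-string and str-bytestring to have unique module names, say Str.String and Str.ByteString. The regex and regex-bytestring libraries would then need to rename those modules back to Str, so that they still fill the requirement of regex-indef.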
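As a hedged sketch of that restructuring: the module names Str.String and Str.ByteString and the directory names below are assumed for illustration. Once the Str libraries are renamed this way, a shim library such as regex has to rename the module back to Str with mixins so that it lines up with the requirement of regex-indef.

```cabal
library str-string
  build-depends:       base
  exposed-modules:     Str.String
  hs-source-dirs:      str-string

library str-bytestring
  build-depends:       base, bytestring
  exposed-modules:     Str.ByteString
  hs-source-dirs:      str-bytestring

library regex
  build-depends:       regex-indef, str-string
  -- Rename Str.String to Str so it fills the Str requirement
  mixins:              str-string (Str.String as Str)
  reexported-modules:  Regex as Regex.String
```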

In fact, with the mixins field, we can avoid defining the regex and regex-bytestring shim libraries entirely. We can do this by declaring regex-indef twice in mixins, renaming the requirements of each separately:
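One way this could look, assuming str-string and str-bytestring expose Str.String and Str.ByteString respectively (the provided names Regex.String and Regex.ByteString are likewise chosen for illustration):

```cabal
executable regex-example
  main-is:             Main.hs
  build-depends:       base, regex-indef, str-string, str-bytestring, regex-types
  -- Each mention of regex-indef is a separate instantiation: the first
  -- parenthesized list renames what it provides, the list after
  -- "requires" renames what it needs.
  mixins:              regex-indef (Regex as Regex.String)
                          requires (Str as Str.String),
                       regex-indef (Regex as Regex.ByteString)
                          requires (Str as Str.ByteString)
  hs-source-dirs:      regex-example
```

Here the requirement Str of the first copy is renamed to Str.String, where it is filled by the module of that name from str-string; the second copy is handled the same way with str-bytestring.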

Note that requirement renamings are syntactically preceded by the requires keyword.

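The art of writing Backpack packages is still in its infancy, so it's unclear what conventions will win out in the end. But here is my suggestion: when defining a module intended to implement a signature, follow the existing convention of avoiding conflicting module names. However, add a reexport of your module under the name of the signature. This trick takes advantage of the fact that Cabal will not report that a module is ambiguous unless it is actually used. For example, str-string would expose Str.String and also reexport it as Str: a client that depends only on str-string can then fill a Str requirement directly, while a client that depends on both str-string and str-bytestring can still import Str.String and Str.ByteString without conflict.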
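A minimal sketch of this convention, assuming the implementation modules are named Str.String and Str.ByteString and the signature is named Str:

```cabal
library str-string
  build-depends:       base
  -- Unique name, following the no-conflict convention
  exposed-modules:     Str.String
  -- Also offered under the signature's name, for mix-in linking
  reexported-modules:  Str.String as Str
  hs-source-dirs:      str-string

library str-bytestring
  build-depends:       base, bytestring
  exposed-modules:     Str.ByteString
  reexported-modules:  Str.ByteString as Str
  hs-source-dirs:      str-bytestring
```

A component that lists both libraries in build-depends sees two modules named Str, but this only becomes an error if something actually imports Str.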

Separate packages

OK, so how do we actually scale this up into an ecosystem of indefinite packages, each of which can be used individually and maintained by separate individuals? The library stanzas stay essentially the same as above; just create a separate package for each one. Rather than reproduce all of the boilerplate here, the full source code is available in the multiple-packages branch of backpack-regex-example.

There is one important gotcha: the package manager needs to know how to instantiate and build these Backpack packages (in the single package case, the smarts were encapsulated entirely inside the Cabal library). As of writing, the only command that knows how to do this is cabal new-build (I plan on adding support to stack eventually, but not until after I am done writing my thesis; and I do not plan on adding support to old-style cabal install ever.)

Fortunately, it's very easy to use cabal new-build to build regex-example; just say cabal new-build -w ghc-head regex-example. Done!
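cabal new-build finds local packages through a cabal.project file in the repository root. Assuming each package lives in its own subdirectory named after the package (a hypothetical layout; the multiple-packages branch may be organized differently), a minimal project file might look like this:

```cabal
-- cabal.project (assumed layout: one subdirectory per package)
packages: str-string/
          str-bytestring/
          regex-types/
          regex-indef/
          regex-example/
```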

Conclusions

If you actually want to use Backpack for real, what can you do? There are a number of possibilities:

If you are willing to use GHC 8.2 only, and you only need to parametrize code internally (where the public library looks like an ordinary, non-Backpack package), using Backpack with internal libraries is a good fit. The resulting package will be buildable with Stack and cabal-install, as long as you are using GHC 8.2. This is probably the most pragmatic way you can make use of Backpack; the primary problem is that Haddock doesn't know how to deal with reexported modules, but this should be fixable.

If you are willing to use cabal new-build only, then you can also write packages which have requirements, and let clients decide however they want to implement their packages.

Probably the biggest "real-world" impediment to using Backpack, besides any lurking bugs, is subpar support for Haddock. But if you are willing to overlook this (for now, in any case), please give it a try!

Michal: Well, we can definitely do the ASCII tree view. Perhaps some sort of force directed layout would be possible too (would give a bit more flexibility than straight GraphViz; I experimented a little with drawing in GraphViz but it was really clunky).

Unfortunately, Backpack does not solve the orphan instance problem; if you don’t define the instance in T or C, Backpack is not going to let you define the instance in a non-orphan way. What Backpack *can* do is let you make use of an instance without defining it (put the instance in a signature).

I’m not suggesting to define an orphan instance, I’m saying can you define “instance C T” in P1 (which is not orphan, as T is defined in P1) without a dependency on P2 (which defines C). Can you just define P2’s interface in P1 without importing it directly?

As you suggested, the Binary type class is now encapsulated in the Data.Binary signature, without having a dependency on binary. It should also be clear that to actually write the instance, you ALSO have to put enough signatures in Data.Binary to support all the functions you need. That shouldn’t be too surprising.

Now the crux of the matter is, can we use MyAwesomeType without incurring a dependency on Data.Binary? Maybe. But if we want to run our code, we have to fill in the signature somehow. So we could make some bogus module implementation that fills everything with undefined. But we don’t have any way to stop people from trying to use the Binary instance, which always exists even if we didn’t really want it.

So… the good news is that you can define your library without a dependency on binary. The bad news is, without a preprocessor, your library ALWAYS claims to implement the instance in question.

I don’t understand exactly how backpack works, but the idea is that there are two types of users of my data type:

1. Those who want to use the data type “T” and also the class “C”. To do so, they will have to import both P1 and P2. In this case the signatures have implementations in scope.

2. Those who just want to use “T” but not “C”. In which case they’ll never use the instance “C T” (they can’t as they haven’t imported C). In this case, they won’t have to build the package P2 and all its dependencies.

For example, my data type might be “MonoFoldable”. But I don’t want to depend on “mono-traversable” because most of my users don’t even know what “mono-traversable” is and it’s not essential to the functioning of my package. I’d just like to add an instance for users that do use “mono-traversable” as a convenience, without forcing a dependency. So I create a signature, and if they use “mono-traversable” then there’s the implementation, and if they don’t it doesn’t matter as they can’t use that instance anyway.

Now, if you want to say, “I want to use MyThing, but I don’t care about Show”, you have to ALSO make sure prettyPrint doesn’t get used, because it’s using a function ‘showList’ which will only be available if you give a real implementation for ‘Show’.