Wednesday, October 26, 2016

Push and pull deployment of Nix packages

In earlier blog posts, I have covered various general concepts of the Nix package manager. For example, I have written three blog posts explaining Nix from a system administrator, programming language, and a sales perspective.

Furthermore, I have written a number of blog posts demonstrating how Nix can be used, such as managing a set of private packages, packaging binary-only software, and Nix's declarative nature -- the Nix package manager (as well as other tools in the Nix project), are driven by declarative specifications describing the structure of the system to deploy (e.g. the packages and its dependencies). From such a specification, Nix carries out all required deployment activities, including building the packages from its sources, distributing packages from the producer to consumer site, installation and uninstallation.

In addition to the execution of these activities, Nix provides a number of powerful non-functional properties, such as strong guarantees that dependency specifications are complete, that package builds can be reproduced, and that upgrades are non-destructive, atomic, and can always be rolled back.

Although many of these blog posts cover the activities that the Nix package manager typically carries out (e.g. the three explanation recipes cover building and the blog post explaining the user environment concepts discusses installing, uninstalling and garbage collection), I have not elaborated much about the distribution mechanisms.

A while ago, I noticed that people were using one of my earlier blog post as a reference. Despite being able to set up a collection of private Nix expressions, there were still some open questions left, such as fully setting up a private package repository, including the distribution of package builds.

In this blog post, I will explain Nix's distribution concepts and show how they can be applied to private package builds. As with my earlier blog post on managing private packages, these steps should be relatively easy to repeat.

Source and transparent binary deployments

As explained in my earlier blog post on packaging private software, Nix is in principle a source-based package manager. Nix packages are driven by Nix expressions describing how to build packages from source code and all its build-time dependencies, for example:

The above expression (mc.nix) describes how to build Midnight Commander from source code and its dependencies, such as pkgconfig, perl and glib. Because the above expression does not specify any build procedure, the Nix builder environment reverts to the standard GNU Autotools build procedure, that typically consists of the following build steps: ./configure; make; make install.

Besides describing how to build a package, we must also compose a package by providing it the right versions or variants of the dependencies that it requires. Composition is typically done in a second expression referring to the former:

In the above expression (default.nix), the mc attribute imports our former expression and provides the build-time dependencies as function parameters. The dependencies originate from the Nixpkgs collection.

To build the Nix expression shown above, it typically suffices to run:

The result of a Nix build is a Nix store path in which the build result is stored.

As may be noticed, the prefix of the package name (jal99995sk6rixym4gfwcagmdiqrwv9a) is a SHA256 hash code that has been derived from all involved build dependencies, such as the source tarball, build-time dependencies and build scripts. Changing any of these dependencies (such as the version of the Midnight Commander) triggers a rebuild and yields a different hash code. Because hash codes ensure that the Nix store paths to packages will be unique, we can safely store multiple versions and variants of the same packages next to each other.

In addition to executing builds, Nix also takes as many precautions to ensure purity. For example, package builds are carried out in isolated environments in which only the specified dependencies can be found. Moreover, Nix uses all kinds of techniques to make builds more deterministic, such as resetting the timestamps of all files to 1 UNIX time, making build outputs read-only etc.

The combination of unique hash codes and pure builds results in a property called transparent binary deployments -- a package with an identical hash prefix results in a (nearly) bit-identical build regardless on what machine the build has been performed. If we want to deploy a package with a certain hash prefix that already exists on a trustable remote machine, then we can also transfer the package instead of building it again.

Distribution models

The Nix package manager (as well as its sub projects) support two kinds of distribution models -- push and pull deployment.

Push deployment

Push deployment is IMO conceptually the simplest, but at the same time, infrequently used on package management level and not very well-known. The idea of push deployment is that you take an existing package build on your machine (the producer) and transfer it elsewhere, including all its required dependencies.

With Nix this can be easily accomplished with the nix-copy-closure command, for example:

The above command serializes the Midnight Commander store path including all its dependencies, transfers them to the provided target machine through SSH, and then de-serializes and imports the store paths into the remote Nix store.

An implication of push deployment is that the producer requires authority over the consumer machine. Moreover, nix-copy-closure can transfer store paths from one machine to another, but does not execute any additional deployment steps, such as the "installation" of packages (in Nix packages become available to end-users by composing a Nix user environment that is in the user's PATH).

Pull deployment

With pull deployment the consumer machine is control instead of the producer machine. As a result, the producer does not require any authority over another machine.

As with push deployment, we can also use the nix-copy-closure command for pull deployment:

The above command invocation is similar to the previous example, but instead copies a closure of the Midnight Commander from the producer machine.

Aside from nix-copy-closure, Nix offers another pull deployment mechanism that is more powerful and more commonly used, namely: channels. This is what people typically use when installing "end-user" packages with Nix.

The idea of channels is that they are remote HTTP/HTTPS URLs where you can subscribe yourself to. They provide a set of Nix expressions and a binary cache of substitutes:

$ nix-channel --add https://nixos.org/channels/nixos-unstable

The above command adds the NixOS unstable channel to the list of channels. Running the following command:

$ nix-channel --update

Fetches or updates the collection of Nix expressions from the channels allowing us to install any package that it provides. By default, the NIX_PATH environment variable is configured in such a way that they refer to the expressions obtained from channels:

With a preconfigured channel, we can install any package we want including prebuilt substitutes, by running:

$ nix-env -i mc

The above command installs the Midnight Commander from the set of Nix expressions from the channel and automatically fetches the substitutes of all involved dependencies. After the installation succeeds, we can start it by running:

$ mc

The push deployment mechanisms of nix-copy-closure and pull deployment mechanisms of the channels can also be combined. When running the following command:

The consumer machine first attempts to pull the substitutes of the dependency closure from the binary caches first, and finally the producer pushes the missing packages to the consumer machine. This approach is particularly useful if the connection between the producer and the consumer is slow, but the connection between the consumer and the binary cache is fast.

Setting up a private binary cache

With the concepts of push and pull deployment explained, you may probably wonder how these mechanisms can be used to your own private set of Nix packages?

Fortunately, with nix-copy-closure no additional work is required as it works on any store path, regardless of how it is produced. However, when it is desired to set up your own private binary cache, some additional work is required.

As channels/binary caches use HTTP as a transport protocol, we need to set up a web server with a document root folder serving static files. In NixOS, we can easily configure an Apache HTTP server instance by adding the following lines to a NixOS configuration: /etc/nixos/configuration.nix:

As may be observed by inspecting the document root folder, we now have a set of compressed NAR files representing the serialized forms of all packages involved and narinfo files capturing a package's metadata:

In addition to the binary cache, we also need to make the corresponding Nix expressions available. This can be done by simply creating a tarball out of the private Nix expressions and publishing the corresponding file through the web server:

We can install our custom Midnight Commander package by pulling the package from our own custom HTTP server without explicitly obtaining the set of Nix expressions or building it from source code.

Discussion

In this blog post, I have explained Nix's push and pull deployment concepts and shown how we can use them for a set of private packages, including setting up a private binary cache. The basic idea of binary cache distribution is quite simple: create a tarball of your private Nix expressions, construct a binary cache with nix-push and publish the files with an HTTP server.

In real-life production scenarios, there are typically more aspects you need to take into consideration beyond the details mentioned in this blog post.

For example, to make binary caches safe and trustable, it is also recommended to use HTTPS instead of plain HTTP connections. Moreover, you may want to sign the substitutes with a cryptographic key. The manual page of nix-push provides more details on how to set this up.

Some inconvenient aspects of the binary cache generation approach shown in this blog post (in addition to the fact that we need to set up an HTTP server), is that the approach is static -- whenever, we have a new version of a package built, we need to regenerate the binary cache and the package set.

To alleviate these inconveniences, there is also a utility called nix-serve that spawns a standalone web server generating substitutes on the fly.

Moreover, the newest version of Nix also provides a so-called binary cache Nix store. When Nix performs operations on the Nix store, it basically talks to a module with a standardized interface. When using the binary cache store module (instead of the standalone or remote Nix store plugin), Nix automatically generates NAR files for any package that gets imported into the Nix store, for example after the successful completion a build. Besides an ordinary binary cache store plugin, there is also plugin capable of uploading substitutes directly to an Amazon AWS S3 bucket.

Apart from the Nix package manager, also Nix-related projects use Nix's distribution facilities in some extent. Hydra, the Nix-based continuous integration server, also supports pull deployment as it can dynamically generate channels from jobsets. Users can subscribe to these channels to install the bleeding-edge builds of a project.

NixOps, a tool that deploys networks of NixOS configurations and automatically instantiates VMs in the cloud, as well as Disnix, a tool that deploys service-oriented systems (distributed systems that can be decomposed in a "distributable units", a.k.a. services) both use push deployment -- from a coordinator machine (that has authority over a collection of target machines) packages are distributed.

Concluding remarks

After writing this blog post and some thinking, I have become quite curious to see what a pull deployment variant of Disnix (and maybe NixOps) would look like.

Although Disnix and NixOps suit all my needs at the moment, I can imagine that when we apply the same concepts in large organizations with multiple distributed teams, it can no longer considered to be practical to work with a centralized deployment approach that requires authority over all build artefacts and the entire production environment.