Tuesday, November 29, 2011

On Nix, NixOS and the Filesystem Hierarchy Standard (FHS)

Our work on NixOS has been a topic of discussion within various free software/open source projects and websites. For example, we have been discussed on the Debian mailing list and on Linux Weekly News.

One of the criticisms I often receive is that we don't comply with the Filesystem Hierarchy Standard (FHS). In this blog post, I'd like to respond to this criticism and explain why NixOS deviates from the FHS.

What is the Filesystem Hierarchy Standard?

The purpose of the Filesystem Hierarchy Standard (FHS) is to define the main directories and their contents in Linux operating systems. More specifically:

It defines directory names and their purposes. For example, the bin directory is for storing executable binaries, sbin is for storing executable binaries intended for the super-user (system administration), and lib is for storing libraries.

It defines several hierarchies within the filesystem, which have separate purposes:

The primary hierarchy, /, contains essential components for the boot process and system recovery. The secondary hierarchy, /usr, contains components not relevant for booting and recovery. Moreover, files in this directory should be shareable across multiple machines, e.g. through a network filesystem. The tertiary hierarchy, /usr/local, is for the system administrator to install software locally.

Hierarchies and special purpose directories are usually combined. For example, the /bin directory contains executable binaries relevant for booting and recovery which can be used by anyone. The /usr/bin directory contains executable binaries not relevant for booting or recovery, such as a web browser, which can be shared across multiple machines. The /usr/local/bin directory contains local executables installed by the system administrator.

The FHS also defines the /opt directory for installing add-on application software packages. This directory can also be considered a separate hierarchy, although the standard does not define it as such. It also contains the same special-purpose directories, such as bin and lib, as the primary, secondary and tertiary hierarchies. Furthermore, the /opt/<package name> convention may be used, which stores the files of a particular application in a single directory.

It also makes a distinction between static and variable parts of a system. For example, the contents of the secondary hierarchy /usr are static and could be stored on a read-only mounted partition. However, many programs need to modify their state at runtime and store files on disk, such as cache and log files. These files are stored in variable storage directories, such as /var and /tmp.

It defines what the contents of some directories should look like. For example, it lists binaries that should reside in /bin, such as a Bourne-compatible shell.
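To make the combination of hierarchies and special-purpose directories described above more concrete, here is a minimal Python sketch that classifies an absolute path. The lookup tables and the `classify` and `is_static` functions are hypothetical and deliberately incomplete: they only cover the directories mentioned in this post, not the full standard.

```python
# Hypothetical, deliberately incomplete lookup tables based on the description
# above. Order matters: more specific prefixes must be checked first.
HIERARCHIES = [
    ("/usr/local/", "tertiary"),  # software installed locally by the administrator
    ("/usr/", "secondary"),       # not needed for boot/recovery, shareable
    ("/var/", "variable"),        # runtime state, e.g. caches and logs
    ("/tmp/", "variable"),
    ("/", "primary"),             # essential for booting and recovery
]

PURPOSES = {
    "bin": "executables",
    "sbin": "system administration executables",
    "lib": "libraries",
}


def classify(path: str) -> tuple[str, str]:
    """Return (hierarchy, purpose) for an absolute path."""
    for prefix, hierarchy in HIERARCHIES:
        if path.startswith(prefix):
            # The first component after the hierarchy prefix determines the purpose
            first = path[len(prefix):].split("/", 1)[0]
            return hierarchy, PURPOSES.get(first, "unspecified")
    raise ValueError(f"not an absolute path: {path}")


def is_static(path: str) -> bool:
    """Static data could live on a read-only partition; variable data cannot."""
    return classify(path)[0] != "variable"


print(classify("/usr/local/bin/myscript"))  # ('tertiary', 'executables')
```

The same special-purpose name (bin) maps to the same purpose in every hierarchy; only the prefix tells you whether the file matters for booting, is shareable, or was installed locally.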

The standard has some practical problems. Some aspects of the filesystem are undefined and need clarification, such as a convention for storing cross-compiled libraries. Furthermore, the standard is quite old, and newer Linux features, such as the /sys directory, are not defined in it. To cope with these issues, many distributions have additional clarifications and policies, such as the Debian FHS policy.

The FHS is part of the Linux Standard Base (LSB), which was developed because there are many Linux-based systems out there. What they all have in common is that they run the Linux kernel and, quite often, a set of GNU utilities (which is why the Free Software Foundation advocates the name GNU/Linux for these kinds of systems).

Apart from these similarities, these systems differ from each other in many ways, such as the package manager they use, the way the filesystem is organized, and the software they support.

The Linux Standard Base is designed to increase compatibility among these distributions, so that it becomes easier for software vendors to ship and run software on Linux, even in binary form. Many common Linux distributions, such as Debian, SuSE and Fedora, try to implement this standard, although it has several issues and has received criticism.

NixOS and the FHS

In NixOS, we have an informal policy to follow the FHS as closely as possible, but to deviate from it where necessary. An important aspect is that we can't follow the structure of the primary (/), secondary (/usr) and tertiary (/usr/local) hierarchies.

The main reason to refrain from using these hierarchies is that they don't provide isolation. For example, the /usr/lib directory contains a wide collection of shared libraries belonging to many packages. To determine which package a particular library belongs to, you need to consult the database of the package manager. Because most libraries can be found in the same location, e.g. /usr/lib, it is very easy to forget to specify a dependency: the library is still found implicitly.

The purpose of the Nix package manager is to achieve purity, which means that the build result of a package should depend exclusively on its source code and input parameters. Purity ensures that the build of a package can be reproduced anywhere and that a package can be transferred to any machine we want, with the guarantee that its dependencies are present and correct.

Nix achieves purity by using the filesystem as a database and storing packages in isolation in a special directory called the Nix store. For example:

/nix/store/r8vvq9kq18pz08v249h8my6r9vs7s0n3-firefox-8.0.1

The path above refers to the Mozilla Firefox web browser. The first part of the directory name, r8vvq9kq18pz08v249h8my6r9vs7s0n3, is a hash code derived from all build-time dependencies. With this naming convention, components can be stored safely in isolation from each other, because no two components share the same name. If Firefox is compiled with a different version of GCC or linked against a different version of the GTK+ library, the hash code differs, so the two variants do not interfere with each other.
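The idea behind this naming scheme can be illustrated with a simplified Python sketch. It is not Nix's actual algorithm: real Nix hashes a complete build description rather than this ad-hoc string, and its base-32 encoding orders the digits differently. What the sketch does show is the property the paragraph relies on: changing any input, such as the GCC version or the platform, yields a different store path. The `store_path` function and all version numbers are hypothetical.

```python
import hashlib

# Base-32 alphabet used in Nix store paths (it omits the letters e, o, t and u).
NIX_BASE32 = "0123456789abcdfghijklmnpqrsvwxyz"


def base32(data: bytes) -> str:
    """Encode bytes as base-32, 5 bits per character (simplified digit order)."""
    n = int.from_bytes(data, "big")
    chars = []
    for _ in range(len(data) * 8 // 5):
        chars.append(NIX_BASE32[n & 0x1F])
        n >>= 5
    return "".join(reversed(chars))


def store_path(name: str, inputs: dict[str, str]) -> str:
    """Derive a hypothetical store path from a package name and its build inputs."""
    h = hashlib.sha256()
    h.update(name.encode())
    # Sort the inputs so the same set of inputs always yields the same hash
    for key in sorted(inputs):
        h.update(f"\0{key}={inputs[key]}".encode())
    # Truncate the digest to 160 bits, which encodes to 32 base-32 characters
    return f"/nix/store/{base32(h.digest()[:20])}-{name}"


inputs = {"gcc": "4.6.2", "gtk+": "2.24.8", "system": "x86_64-linux"}
a = store_path("firefox-8.0.1", inputs)
b = store_path("firefox-8.0.1", {**inputs, "gcc": "4.5.3"})
print(a != b)  # True: a different compiler gives a different, non-conflicting path
```

Because the path is a deterministic function of the inputs, rebuilding with identical inputs gives the identical path, while any variant gets its own directory next to the others.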

Because of the naming scheme of the Nix store and the fact that we don't use global directories to store binaries and libraries, the build of a Nix package typically fails if a required build-time dependency is omitted. Conversely, an undeclared dependency cannot be picked up by accident to make a build succeed, which prevents incomplete dependency specifications.
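The difference can be simulated in a few lines of Python. The sketch models the filesystem as a dictionary and compares a global search path with one built only from declared inputs. The store paths (with shortened, made-up hashes), the package versions and the `find_library` and `nix_search_path` helpers are all hypothetical and merely stand in for what a linker does.

```python
# Simulated filesystem: directory -> files it contains (hypothetical contents).
FILESYSTEM = {
    "/usr/lib": {"libc.so", "libpng.so", "libz.so"},
    "/nix/store/hash1-zlib-1.2.5/lib": {"libz.so"},
    "/nix/store/hash2-libpng-1.5.4/lib": {"libpng.so"},
}

# Store paths of packages that could be declared as build inputs.
STORE = {
    "zlib": "/nix/store/hash1-zlib-1.2.5",
    "libpng": "/nix/store/hash2-libpng-1.5.4",
}


def find_library(lib, search_path):
    """Return the first match on the search path, or None (like a failing link step)."""
    for directory in search_path:
        if lib in FILESYSTEM.get(directory, set()):
            return f"{directory}/{lib}"
    return None


def nix_search_path(declared_inputs):
    """Only the lib/ directories of explicitly declared inputs are searched."""
    return [f"{STORE[name]}/lib" for name in declared_inputs]


# FHS-style: libpng is found in /usr/lib even though nobody declared it.
print(find_library("libpng.so", ["/usr/lib"]))  # /usr/lib/libpng.so
# Nix-style: forgetting to declare libpng makes the lookup fail.
print(find_library("libpng.so", nix_search_path(["zlib"])))  # None
```

Declaring both zlib and libpng makes the second lookup succeed, so the missing dependency surfaces as a build failure instead of going unnoticed.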

Another deviation from the FHS is that we do not follow the list of required binaries and libraries in /bin, /lib, etc. For example, the FHS requires a Bourne-compatible shell in /bin/sh, some mkfs.* utilities in /sbin and, for historical reasons, a sendmail executable in /usr/lib.

In NixOS, apart from /bin/sh (which is a symlink to the bash shell in the Nix store), we don't store any binaries in these mandatory locations, because it is unnecessary and would make builds impure.

Discussion

Now that I have explained why we deviate from the FHS on some points, you may wonder how NixOS achieves certain properties defined by the FHS:

How to determine files for the boot process and recovery? Because we don't use hierarchies, files relevant for the boot process can't be found in /bin and /lib. Instead, NixOS generates a small base system for this purpose, which is stored as a single component in the Nix store.

How to share components? Because we have no secondary hierarchy, you can't share components by storing the /usr directory on a network filesystem. In Nix, however, the entire Nix store is static and shareable. This is even more powerful, because the hash codes inside the component names prevent different variants of NixOS from using incorrect versions of a component. For example, you can safely share the same Nix store across 32-bit and 64-bit machines, because the target platform is reflected in the hash.

In fact, we already use sharing extensively in our build farm to generate virtual machines that run test cases. Apart from a Linux kernel image and an initial RAM disk, the components of the entire virtual network are shared through the host system's Nix store, which is mounted as a network filesystem.
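A minimal sketch of why such a shared store is safe, assuming (as a simplification) that the platform string is just one more input to the hash. The hex digest, the `store_path` helper and the `SharedStore` class are hypothetical; real Nix encodes the hash in base 32 and derives it from the full build description.

```python
import hashlib


def store_path(name: str, system: str, deps: list[str]) -> str:
    """Hypothetical store path: the platform is part of the hashed inputs."""
    key = "|".join([name, system, *sorted(deps)])
    return f"/nix/store/{hashlib.sha256(key.encode()).hexdigest()[:32]}-{name}"


class SharedStore:
    """A store shared by several machines, e.g. over a network filesystem."""

    def __init__(self) -> None:
        self.paths: set[str] = set()

    def build(self, name: str, system: str, deps: list[str]) -> str:
        path = store_path(name, system, deps)
        self.paths.add(path)  # adding an existing path is a no-op, never an overwrite
        return path


store = SharedStore()
deps = ["gtk+-2.24.8"]
p32 = store.build("firefox-8.0.1", "i686-linux", deps)
p64 = store.build("firefox-8.0.1", "x86_64-linux", deps)
# Both variants coexist, and each machine refers to its own variant by path.
print(len(store.paths))  # 2
```

A 32-bit machine can never accidentally pick up the 64-bit build, because it refers to a different path; building the same variant twice simply finds the existing path.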

How to find dependencies? Because we store all software packages in the Nix store, it is harder to refer to components in, e.g., scripts and configuration files: each component is stored in a separate directory, which isn't necessarily in the user's PATH.

I have to admit that this is inconvenient in some cases. However, in NixOS we usually don't edit configuration files or write scripts by hand. NixOS is unconventional in this sense, because its goal is to make the entire deployment process model-driven. Instead of manually installing packages and adapting configuration files in, e.g., /etc, we generate all static parts of a system from Nix expressions, including all configuration files and scripts. The Nix expression language provides the paths to the components.
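In NixOS this is done with Nix expressions. Purely as an illustration of the underlying idea, the Python sketch below shows a generator that receives packages whose store paths are already known and writes those paths into a script, so the result does not depend on the user's PATH. The `Package` class, the `generate_script` function, the package versions and the placeholder hashes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    name: str
    out: str  # store path of the built package (placeholder hash below)


def generate_script(bash: Package, coreutils: Package) -> str:
    """Generate a script in which every command is an absolute store path."""
    return (
        f"#!{bash.out}/bin/bash\n"
        f'{coreutils.out}/bin/echo "Generated at $({coreutils.out}/bin/date)"\n'
    )


bash = Package("bash-4.2", "/nix/store/" + "0" * 32 + "-bash-4.2")
coreutils = Package("coreutils-8.14", "/nix/store/" + "1" * 32 + "-coreutils-8.14")
print(generate_script(bash, coreutils))
```

Because the generator is the only place that knows the paths, the inconvenience of long store paths disappears for the user, and the generated file records exactly which component variants it depends on.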

Improvements

Although we have to deviate from the FHS for a number of reasons, there are also a few improvements we could make in the organization of the filesystem:

The directory names and purposes within the hierarchies could be used more consistently within Nix packages. Essentially, every component in the Nix store can be considered a separate hierarchy.

For example, I think it's a good thing to use /nix/store/<package>/bin for storing binaries, and /nix/store/<package>/lib for libraries. Sometimes installation scripts of packages do not completely adhere to the FHS and need some fixing. For example, some packages install libraries in the libexec directory, which is not defined by the FHS.

We could add a check to the generic builder of Nixpkgs that examines the structure of a package and reports inconsistencies.
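Such a check might look like the following Python sketch. It is not part of the Nixpkgs generic builder; the `check_layout` function and its allow-list, a subset of the subdirectories the FHS defines for /usr, are assumptions made for illustration.

```python
import os

# Hypothetical allow-list: a subset of the subdirectories the FHS defines for /usr.
FHS_DIRS = {"bin", "sbin", "lib", "include", "share"}


def check_layout(prefix: str) -> list[str]:
    """Report top-level directories of an installed package that the FHS does not define."""
    problems = []
    for entry in sorted(os.listdir(prefix)):
        if os.path.isdir(os.path.join(prefix, entry)) and entry not in FHS_DIRS:
            problems.append(f"{prefix}/{entry}: directory not defined by the FHS")
    return problems
```

Run against a package prefix containing bin, lib and libexec, it would report only libexec. Whether a report should fail the build or merely print a warning is a policy decision the sketch leaves open.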

For the variable parts of the system, we can adhere better to the FHS. For example, the current version of the FHS defines the /srv directory used for serving files to end-users, through a service (e.g. a web server). In NixOS, this directory is not used.

Because the FHS has some gaps, we could also define additional clarifications, as other Linux distributions do.

Conclusion

In this blog post, I've explained the idea behind the Filesystem Hierarchy Standard (FHS) and why we deviate from it in NixOS. The main reason is that the organization of specific parts of the filesystem conflicts with important aspects of the Nix package manager, such as the ability to store components in isolation and to guarantee correct and complete dependencies.

Furthermore, apart from the FHS, NixOS cannot completely implement other parts of the LSB either. For example, the LSB defines a subset of the RPM package manager as its default package manager, which imperatively modifies the state of a system.

If we implemented this in NixOS, we could no longer ensure that the deployment of a system is pure and reproducible. Therefore, we sometimes have to break away from the traditional organization and management of Linux systems. However, I sometimes get the impression that the FHS is considered a holy book, and that by breaking from it, we are considered heretics.

Related work

GoboLinux is another Linux distribution that does not obey the FHS and uses a custom filesystem organization. Its developers also share the vision that the filesystem can act as a database for organizing packages.

The Mancoosi project is also a research project investigating package management problems, like ours. Their research differs from ours, because they don't break away from the traditional filesystem organization and management of Linux systems. For example, much of their research deals with the maintainer scripts of packages, whose goal is to "glue" files from a package to files on the system, e.g. by imperatively modifying configuration files.

Keeping the traditional management intact introduces many challenges in making deployment efficient, reliable and reproducible. For example, they have investigated modeling techniques for maintainer scripts and system configurations, to simulate whether an upgrade will succeed and to determine inverse operations to perform rollbacks.

In our research, we achieve more reliable deployment by breaking from the traditional model.