Reproducible systems are valuable, and that value is driving a big push to build them. Because a reproducible system can be rebuilt from scratch, you can verify its authenticity bit for bit, and its behaviour down to every individual bug, simply by rebuilding it and comparing the result with the previous state of the computer. When enough people independently arrive at the same result (consensus), each of them can confirm that their system has not been compromised. Reproducible systems are also easier to reason about because they stay consistent between builds.

For DevOps, there is immense power in being able to verify that your compiled binary or system build is the same as what other users are already running, and that it behaves exactly the same across computers, down to every individual bug. In large deployments, this can ensure that every machine runs the same software stack, even across different cloud providers.
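As a rough sketch of the "is my build the same as yours?" check, assuming the build output is an ordinary file or directory tree on disk, you could fingerprint two builds with SHA-256 and compare the digests. The function names here are illustrative and not part of any Nix or SDF tooling. The tree hash covers only relative paths and file contents: it ignores permissions and empty directories, and it follows symlinks.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sha256_of_tree(root: Path) -> str:
    """Hash every file under root (relative path + content digest).

    Files are visited in sorted order so the result does not depend on
    directory listing order: identical trees always give the same digest.
    """
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        digest.update(rel.encode() + b"\0")
        digest.update(sha256_of_file(path).encode() + b"\n")
    return digest.hexdigest()


def builds_match(a: Path, b: Path) -> bool:
    """True if two build outputs (files or directories) are bit-for-bit identical."""
    hash_a = sha256_of_tree(a) if a.is_dir() else sha256_of_file(a)
    hash_b = sha256_of_tree(b) if b.is_dir() else sha256_of_file(b)
    return hash_a == hash_b
```

If two people build from the same inputs and `builds_match` returns `True`, their outputs are bit-for-bit identical; `False` tells you that something in the build, or on one of the machines, differs.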

For academia, it is a massive advantage if your peers can reproduce your results. If a computer played any part in your research, then giving your peers easy access to a reproducible environment, one in which you are sure your results can be reproduced, is the first step toward gaining that advantage.

For casual users, you can be sure that no matter what happens, the heavily customised environment you've built over time is just a few commands away. You get one-shot installation of all your favourite programs, and your battle-tested shell configuration, program settings, shortcuts etc. are always available for a quick restore.

A system is called reproducible if its final state is the same for the same input across any number of builds. Individual pieces of information, such as installed programs, configuration, user profiles or even the wallpaper, are all part of this "state", and a computer's state is completely (and only) defined by the contents of its memory: all of the bits and bytes it holds. If you combine all of the memory present on a computer, including RAM, CPU registers, ROMs, disk storage etc., then you have enough information to recreate the computer. If you can save and restore this state, you can recreate your computer setup at any time.
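The definition above can be turned into a small check. In this sketch the "final state" is modelled as a byte string and a build as a plain Python function, both simplifying assumptions: a build counts as reproducible for a given input only if repeated runs hash to exactly one digest. `nondeterministic_build` is a made-up example of a hidden input (a random build ID, standing in for things like embedded timestamps) leaking into the output.

```python
import hashlib
import secrets
from typing import Callable


def state_digest(state: bytes) -> str:
    """Fingerprint a final state (here simply bytes) with SHA-256."""
    return hashlib.sha256(state).hexdigest()


def is_reproducible(build: Callable[[bytes], bytes], source: bytes, runs: int = 5) -> bool:
    """Run the same build several times on the same input.

    The build is reproducible (for this input) only if every run yields a
    byte-identical final state, i.e. exactly one distinct digest.
    """
    digests = {state_digest(build(source)) for _ in range(runs)}
    return len(digests) == 1


def deterministic_build(source: bytes) -> bytes:
    # The output depends on nothing but the input.
    return b"built:" + source.upper()


def nondeterministic_build(source: bytes) -> bytes:
    # Embeds a random build ID, standing in for timestamps, random temp
    # paths and similar hidden inputs that break reproducibility.
    return b"built:" + source.upper() + secrets.token_bytes(8)
```

Note the asymmetry: a failed check proves a build is not reproducible, while a passing check only shows that the runs you tried agreed.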

Arguably, you could make a low-level, bit-for-bit copy of the disk to get a raw disk image that backs up all of your content. But such backups are impractical to manage: every backup needs another disk, so you end up managing a stack of disks made over time, and doing a partial restore from them is hard. Raw disk images also cannot be version controlled effectively, and the format is not very transparent because its content cannot be examined without special tools.

Until now, building such systems was out of reach for most people. But we finally have usable tools at our disposal to build them (at least on Linux). In this post I'm going to share how I set up my personal computer to be fully reproducible and show you how you can do it as well.

There are three steps in this process:

Reproducible base system setup

Reproducible user profile setup (optional)

Consistent application configuration

Setting the context

I use Nix/NixOS to build my base system and user profile. Nix gives you a reliable, declarative way to define your system state and application versions: you describe your system in a human-readable text file, pass that file to Nix, and it builds you a whole operating system from it. I chose Nix because defining the system in a text file lets you version control the system state, and a human-readable format makes it easy to approach.
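To illustrate why a text-based definition is easy to version control, here is a sketch using Python's standard `difflib` on two hypothetical revisions of a NixOS-style configuration (the option values are examples, not my actual configuration). Every change shows up as a readable line-by-line diff, which is the same kind of view Git gives you.

```python
import difflib

# Two hypothetical revisions of a declarative system description.
old_config = """\
{ pkgs, ... }:
{
  environment.systemPackages = with pkgs; [ git vim ];
  services.openssh.enable = false;
}
"""

new_config = """\
{ pkgs, ... }:
{
  environment.systemPackages = with pkgs; [ git vim firefox ];
  services.openssh.enable = true;
}
"""


def config_diff(old: str, new: str) -> str:
    """Return a unified diff between two revisions of a text configuration."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile="configuration.nix (old)",
        tofile="configuration.nix (new)",
    )
    return "".join(lines)


if __name__ == "__main__":
    print(config_diff(old_config, new_config))
```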

Nix has the concept of an "expression", which defines how to build a piece of software, and these expressions are deterministic. You can read more about them in the Nix documentation. A collection of these expressions is called a "channel", and channels are stored in the nixpkgs-channels repository. Every branch of this repository is a channel, and every user on NixOS can use a different channel because different channels serve different purposes.
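Determinism is what lets Nix identify a build by its inputs. The sketch below is a heavily simplified, hypothetical model of that idea: it hashes a package name plus a canonical JSON encoding of its inputs into a `/nix/store/<hash>-<name>`-style path. Real Nix hashes a serialised derivation and encodes the digest differently, so these paths will not match actual store paths.

```python
import hashlib
import json


def store_path(name: str, inputs: dict) -> str:
    """Derive a store-like path from a package name and its build inputs.

    Simplified illustration only, not Nix's real algorithm. Keys are sorted
    so that the same inputs always serialise (and hash) the same way.
    """
    canonical = json.dumps(
        {"name": name, "inputs": inputs}, sort_keys=True, separators=(",", ":")
    )
    digest = hashlib.sha256(canonical.encode()).hexdigest()[:32]
    return f"/nix/store/{digest}-{name}"


if __name__ == "__main__":
    # Hypothetical package and inputs, for illustration only.
    print(store_path("hello-2.10", {"src": "hello-2.10.tar.gz", "deps": ["gcc", "glibc"]}))
```

Identical inputs always map to the same path, and changing any input (a source tarball, a dependency) yields a different path, which is why two machines evaluating the same expressions end up with the same software.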

I use Sane DotFiles Manager (SDF) to version control the application configuration in my user profile. SDF wraps Git, so it offers a Git-like command-line interface and you don't have to learn another tool. It stores your configuration in a Git repository that you can host on GitHub, GitLab etc.

[Shameless plug: I wrote SDF]
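SDF's internals aren't described in this post, so the following is not SDF's code. It sketches one common way of wrapping Git for dotfiles: keep the repository metadata in a separate directory and point Git's `--git-dir` and `--work-tree` options at that directory and at the home directory. The `~/.dotfiles` location and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def dotfiles_git_command(args: Sequence[str], git_dir: Path, work_tree: Path) -> List[str]:
    """Build a git invocation that keeps repository data in `git_dir`
    while treating `work_tree` (typically the home directory) as the checkout."""
    return ["git", f"--git-dir={git_dir}", f"--work-tree={work_tree}", *args]


def dotfiles_git(*args: str) -> int:
    """Run git against the hypothetical ~/.dotfiles repository; return its exit code."""
    home = Path.home()
    cmd = dotfiles_git_command(args, git_dir=home / ".dotfiles", work_tree=home)
    return subprocess.run(cmd).returncode
```

For example, `dotfiles_git("status")` would show which tracked configuration files in your home directory have changed, and any other Git subcommand can be passed through the same way.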

So, basically, I manage software with Nix and configuration with SDF.

1. Reproducible base system setup

I keep my base system on the stable channel because I want it to be rock solid: you can usually recover from a crashing userland, but a borked, non-functional base system is much harder to recover from. So I use the latest stable channel (nixos-18.09 as of writing) to build my base system. If you want to replicate software down to exact version numbers, you can pin your channel to a particular commit.
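Pinning means referring to an exact commit instead of a moving branch such as `nixos-18.09`. As a small sketch, assuming the channel is fetched as a GitHub source archive, the helper below builds the archive URL for a full commit hash and rejects anything else. The commit in the example is a placeholder; a URL like this could then be handed to Nix, for instance via `builtins.fetchTarball`.

```python
import re

# A full Git commit hash: exactly 40 lowercase hex characters.
COMMIT_RE = re.compile(r"[0-9a-f]{40}")


def pinned_nixpkgs_url(commit: str, repo: str = "NixOS/nixpkgs-channels") -> str:
    """Return a GitHub archive URL for an exact commit of a nixpkgs repository.

    A branch such as nixos-18.09 moves as updates land; a full commit hash
    always refers to the same tree, which is what makes a pin reproducible.
    """
    if not COMMIT_RE.fullmatch(commit):
        raise ValueError(f"expected a full 40-character commit hash, got {commit!r}")
    return f"https://github.com/{repo}/archive/{commit}.tar.gz"


if __name__ == "__main__":
    # Placeholder commit, not a real nixpkgs revision.
    print(pinned_nixpkgs_url("0123456789abcdef0123456789abcdef01234567"))
```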

The configuration of my base system is publicly available in the shreyanshk/nixos-config repository. Read my previous article, Declarative Configuration with NixOS, to learn how to work with this configuration file. If you start from this configuration, please update the users.extraUsers field to suit your needs.

Before you can install it, you need to set up the partitions, which are defined in the hardware-configuration.nix file in the repository. To install this base configuration you need:

A "NixOS" labeled BTRFS partition, where the operating system will be installed.

An "EFI" labeled EFI partition with the boot flag set, where the bootloader will be installed.
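Before installing, you may want to double-check both labels. This sketch parses the JSON that `lsblk -J -o NAME,LABEL,FSTYPE` prints and checks for a BTRFS filesystem labeled `NixOS` and a FAT (`vfat`) filesystem labeled `EFI`, assuming the labels are filesystem labels; the FAT check reflects the usual requirement for an EFI system partition. It does not check the boot flag, which these `lsblk` columns don't show.

```python
import json
from typing import Iterator, List, Optional


def iter_devices(devices: list) -> Iterator[dict]:
    """Yield every block device, walking nested partitions ("children")."""
    for dev in devices:
        yield dev
        yield from iter_devices(dev.get("children") or [])


def find_by_label(lsblk_json: str, label: str) -> Optional[dict]:
    """Return the first device whose filesystem label matches, else None."""
    data = json.loads(lsblk_json)
    for dev in iter_devices(data.get("blockdevices") or []):
        if dev.get("label") == label:
            return dev
    return None


def check_partitions(lsblk_json: str) -> List[str]:
    """Return a list of problems; an empty list means both partitions look right."""
    problems = []
    root = find_by_label(lsblk_json, "NixOS")
    if root is None:
        problems.append('no partition labeled "NixOS"')
    elif root.get("fstype") != "btrfs":
        problems.append(f'"NixOS" is {root.get("fstype")}, expected btrfs')
    efi = find_by_label(lsblk_json, "EFI")
    if efi is None:
        problems.append('no partition labeled "EFI"')
    elif efi.get("fstype") != "vfat":
        problems.append(f'"EFI" is {efi.get("fstype")}, expected vfat')
    return problems
```

Run `lsblk -J -o NAME,LABEL,FSTYPE` from the installer and pass its output to `check_partitions`; an empty list means both partitions are where the configuration expects them.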

Please make sure you have a working internet connection, as this step downloads a lot of data. Download the NixOS installer, boot from it, and mount the partitions to install the base system:

This will set you up with a bootable system and once it's finished, you will reboot into a base configuration that matches what I'm running on my computer.

2. Reproducible user profile setup (optional)

I prefer to use the latest software, so I use the nixos-unstable channel in my user profile. As the name implies, this is the bleeding-edge channel, with more frequent updates than the stable channels, where updates are only pushed after automated testing. Note that "unstable" doesn't mean the channel ships unstable versions of software (release candidates, beta builds etc.); it means the channel receives less NixOS-specific testing.

As with the base configuration, you can pin versions here as well, but I don't, because I want the latest versions available. Without pinning, my profile setup file looks like this:

3. Consistent application configuration

With the application software in place, the configuration needs to be taken care of. An application's user-specific configuration resides in the user's home directory (in my case /home/shreyansh), so I only need to restore this directory. I manage my application configuration with SDF, which lets me easily back up and restore it from a Git repository.
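SDF restores configuration through Git. Purely to illustrate what restoring a home directory involves, including partial restores, which the raw-disk-image approach doesn't make easy, here is a sketch that swaps in a plain file copy from an already checked-out backup directory. `restore_config` and the directory layout are hypothetical and not how SDF works internally.

```python
import shutil
from pathlib import Path
from typing import Iterable, List, Optional


def restore_config(backup: Path, home: Path, only: Optional[Iterable[str]] = None) -> List[Path]:
    """Copy configuration files from a backup checkout into `home`.

    If `only` is given, restore just those relative paths (a partial restore);
    otherwise restore everything except the `.git` directory.
    Returns the destination files written.
    """
    if only is None:
        sources = [
            p for p in backup.rglob("*")
            if p.is_file() and ".git" not in p.relative_to(backup).parts
        ]
    else:
        sources = []
        for rel in only:
            src = backup / rel
            if src.is_dir():
                sources.extend(p for p in src.rglob("*") if p.is_file())
            elif src.is_file():
                sources.append(src)
            else:
                raise FileNotFoundError(src)
    written = []
    for src in sorted(sources):
        dest = home / src.relative_to(backup)
        dest.parent.mkdir(parents=True, exist_ok=True)  # recreate e.g. ~/.config/nvim
        shutil.copy2(src, dest)  # copies contents and metadata such as mtime
        written.append(dest)
    return written
```

For example, `restore_config(backup, Path.home(), only=[".bashrc"])` would bring back just one file, while omitting `only` restores the whole tracked configuration.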

To restore my configuration, all that needs to be done is to open a terminal and execute:

Conclusion

With Nix/NixOS and SDF, you now have all the tools and knowledge you need to build your own reproducible system. Try it in a virtual machine first or, if you're feeling lucky, go bare metal.