Linux Jail Packages

Introduction

This article on Linux jails continues our series on isolation and integrity techniques, which we outlined in the first entry. In the previous article, we reviewed several security building blocks found in the Linux Kernel; in this article, we will combine them in a more accessible and user-friendly way, using the Linux jail packages Firejail and Minijail. To recap, the series as a whole focuses on embedded Linux and Internet of Things (IoT) devices, and describes the security techniques we can use to isolate the services and applications running on them, so that attackers who take over a network-facing service do not end up compromising the entire device.

Normally, when a process or program is launched on Linux, it is given all of the abilities that its user has. In the case of the root user, this means unrestricted access and control of the system. If a program is compromised, the attacker inherits all of the launching user's privileges and abilities, often enabling them to take over the rest of the system.

Most applications do not require all or even most of the user's abilities to run. The purpose of a jail package like Firejail or Minijail is to create a restricted environment, known as a "jail" or a "sandbox", that limits the abilities of the target program to only those that it needs. Inside the sandbox, the program can only "see" the necessary system resources such as files, processes and network traffic. Everything outside of the sandbox is invisible to the program and cannot be modified or read.

In the event that the sandboxed program is compromised, the attacker is only able to access the resources in the sandbox, protecting the rest of the system from abuse. If a specific process is isolated, it is harder (or in some cases, impossible) to leverage it for further privilege escalation, lateral movement, or sensitive data leak attacks.

We have reviewed several isolation methods before, including control groups (cgroups), namespace isolation, the kernel capabilities feature, and seccomp. Now, we will present easier ways to configure and deploy them in combination, through jail packages. We choose to demonstrate Firejail as the more user-friendly jail package.

Firejail

According to its website, Firejail is a community project built by a team of volunteers, and is not affiliated with any company. It's a good example of an easy-to-use isolation program. Many Linux distributions, such as Debian, Ubuntu, and Raspbian, offer an official installation package for Firejail in their repositories, so installing it is just a matter of a few shell commands. Firejail comes with built-in profiles for several hundred widely used programs, and if you need to create or customize a profile for your application, it's easier than working with AppArmor or SELinux. We also found Firejail easier to integrate with systemd services, and therefore chose to focus on it in this guide.

Minijail

Minijail is developed and maintained by Google, and its source code is found on Google's Open Source Code Repository. You have to build it from source. Google uses it to sandbox programs and critical system services in Android, Chrome OS, and in its server farms and software testing infrastructure. In fact, wherever Google runs a system based on the Linux Kernel, it employs Minijail to sandbox and secure programs and processes. Examples of the processes that Google tries to sandbox are the Bluetooth daemon, the network manager, the DHCP daemon, and other processes that usually have unrestricted root system privileges.

Jail Security Mechanisms

Jails use several Linux security tools to create secure sandboxes. Unlike virtual machines and containers, these sandboxes do not require a completely or partially separate operating system; instead, they apply the standard security mechanisms that are available in the Linux Kernel. This means that no special daemon, process, or kernel module is required to create the sandbox, which in turn means the performance penalty is very small and no unsupported kernel modules are used.

Linux Namespaces

Namespaces are a way to abstract a system resource so that it appears to a process running within that namespace as its own isolated and unique instance of that resource. A process running in a namespace has no access to any resources that fall outside of its namespace. Jails use the following namespaces:

Mount Namespace - Firejail uses the mount namespace to create a temporary file system for each sandbox. This temporary filesystem is populated by using overlay filesystems and bind-mounting directories into it to give the jailed process a restricted view of the system filesystem. Through the use of blacklists and whitelists, the jailed process has parts of the system filesystem set as either hidden or read-only. The use of the mount namespace means that a compromised process is not able to read or alter any sensitive files that fall outside of its normal operations.

IPC Namespace - The IPC or Inter-Process Communication namespace limits kernel messaging to processes within the same namespace. A sandboxed process is unable to message processes outside of its namespace.

Network Namespace - This namespace creates a private network environment for the process. This means that the jailed process is able to connect to the internet but is unable to eavesdrop on any network data bound for a process outside of its namespace.

PID Namespace - The Process ID namespace creates a new PID 1 and process tree that only contains the processes belonging to the sandboxed program. The sandboxed program is unable to interfere with any other process outside of the namespace.
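To make the namespace idea concrete: the kernel exposes the namespaces that a process belongs to as symbolic links under /proc/<pid>/ns. The following is a minimal sketch and not part of Firejail; the helper name show_namespaces is hypothetical. Two processes share a namespace when the link targets printed for them are identical.

```shell
# Print the namespace identifiers of a process for the four namespace
# types discussed above. show_namespaces is a hypothetical helper name.
show_namespaces() {
    pid="${1:-self}"    # default to the calling process
    for ns in mnt ipc net pid; do
        # Each link target looks like "pid:[4026531836]"
        printf '%s -> %s\n' "$ns" "$(readlink "/proc/$pid/ns/$ns")"
    done
}
```

Comparing the output for an ordinary shell with the output for the PID of a sandboxed process (read as root) should show different identifiers for each namespace that the jail has separated.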

Linux Capabilities

Before Linux Kernel version 2.2, a process with root privileges had no restrictions on what it was able to do. The 2.2 kernel introduced capabilities, which split root's abilities into a set of separately named and controlled capabilities. This means that when we start a root process, we can choose which of root's abilities are assigned to it. Jails allow us to configure a sandbox that, when it contains a root process, is only given the abilities that process needs to run.
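The capability sets of a running process can be inspected through its status file in /proc, which is a convenient way to confirm that a sandbox really did reduce them. The following is a minimal sketch; show_caps is a hypothetical helper name, and the optional argument exists only so that a status file of another process can be passed in.

```shell
# Print the capability sets recorded for a process.
# show_caps is a hypothetical helper; the optional argument is the path to a
# status file and defaults to the status of the calling process.
show_caps() {
    # CapPrm = permitted, CapEff = effective, CapBnd = bounding set;
    # each value is a hexadecimal bitmask with one bit per capability.
    grep -E '^Cap(Inh|Prm|Eff|Bnd|Amb):' "${1:-/proc/self/status}"
}
```

For example, show_caps /proc/<PID>/status prints the masks for the process with that PID. If the capsh tool from libcap is installed, capsh --decode=<mask> translates a mask into capability names.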

Seccomp with BPF

Seccomp is a contraction of SECure COMPuting, and BPF stands for Berkeley Packet Filter. Seccomp is a way to filter and block the system calls that a process is able to make. System calls are how a process interacts with the system kernel. They may be hardware-related, like accessing RAM, or software-related, like creating new processes. The default for a root process on Linux is to be able to issue all available system calls. The BPF filter, which is a program that lives in the kernel, is attached to a process and prohibits any system calls that the process does not require to run. This reduces the means available for an attacker who has compromised the process to extend their control over the system, and can also deny some local kernel privilege escalation attacks.
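Whether a seccomp filter is attached to a process can also be read from its status file in /proc: the Seccomp field is 0 when seccomp is disabled, 1 in strict mode, and 2 when a BPF filter is in use. The following is a minimal sketch; seccomp_mode is a hypothetical helper name.

```shell
# Report the seccomp mode of a process in words.
# seccomp_mode is a hypothetical helper; the optional argument is the path to
# a status file and defaults to the status of the calling process.
seccomp_mode() {
    awk '/^Seccomp:/ {
        if ($2 == 0) print "disabled"
        else if ($2 == 1) print "strict"
        else if ($2 == 2) print "filter"
        else print "unknown"
    }' "${1:-/proc/self/status}"
}
```

A process started inside a sandbox with a system call filter enabled would be expected to report "filter".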

Linux MAC Integration

Linux MAC (Mandatory Access Control) systems limit a process or program's access to system resources such as files, memory objects, and network data. The most widely used MACs on Linux are SELinux and AppArmor. Firejail is able to work in SELinux and AppArmor environments and provides an integration configuration file for AppArmor. This article will not cover using Firejail with SELinux or AppArmor.

Prerequisites

Now that we have looked at the security techniques employed by jails, we will move on to provide a detailed guide with instructions for the installation and configuration of a custom sandboxed process. We will use Firejail for the examples, but the configuration for Minijail is very similar, as it uses the same building blocks, just with different command line switches.

As explained below, some of the steps in this guide will vary by Linux distribution or kernel version. To complete this article with practical guidance, we chose the Raspberry PI Model 3B running Raspbian Stretch Lite as the example device. This is a common example of a Linux IoT device, but in general Firejail and Minijail can run on any system with the following properties:

x86 64bit and 32bit (x86_64 and i686)

ARMv7 and later (Raspberry PI 3)

Kernel Version 3.x or newer

Firejail depends only on packages that are available on almost any Linux system. After you install Firejail on your system, you should be able to follow this guide to create a Firejail sandbox.

You will need to log in to your Raspberry PI as a user with sudo privileges to start working through this guide.

Installation

The Firejail developers provide three versions of Firejail: development, mainline, and LTS (long term support). Firejail mainline is available from the default Raspbian repositories. However, this version has many dependencies, including a full X11 desktop stack which is not required on a headless Raspberry PI. We will install the LTS release - the Firejail developers recommend this version for enterprise deployment because it contains only the core features, and is more stable and more secure.

We will install Firejail from source code, which can be obtained using Git. For this we need to install the git command line tool and the tools needed to compile the source code. The following command will install the packages:
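Before building, it can help to confirm that the required tools are actually present. The sketch below assumes that git, gcc, and make are the tools in question (on Debian-based systems such as Raspbian, the compiler and make are normally provided by the build-essential package); missing_tools is a hypothetical helper name.

```shell
# Report which of the given commands are not available on the PATH.
# missing_tools is a hypothetical helper name.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

Running missing_tools git gcc make prints nothing when everything is in place, and otherwise prints the name of each tool that still needs to be installed.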

Next, we are going to download the Firejail source code using Git from root's home directory:

$ git clone https://github.com/netblue30/firejail.git
$ cd firejail/

Finally, we need to compile the source code into a working program and install it with the following commands:

$ ./configure
$ make
$ sudo make install-strip

Firejail is now installed and ready to use. Next, we will look at how to use Firejail.

Using Firejail

The purpose of this guide is to demonstrate how to use Firejail to run a process in a security sandbox. We will do this by walking through the process for an example network-connected program. We will use lighttpd because it is an internet connected web server and so should be similar to any program that you may want to sandbox.

If you want to install lighttpd to follow along with the examples below, install it with the following command:

$ sudo apt install lighttpd

The syntax for creating a sandboxed process with Firejail is as follows:

firejail [OPTIONS] [program and arguments]

We can create a sandbox for lighttpd that uses Firejail's default options with the following command:

$ sudo firejail lighttpd -D -f /etc/lighttpd/lighttpd.conf

The lighttpd arguments that we have used have the following meanings:

-D - Stay in the foreground. We have to use this option so that the lighttpd process does not create a new process in the background that runs outside the sandbox.

-f /etc/lighttpd/lighttpd.conf - This option specifies which lighttpd configuration file to load. Here we have used the default configuration file.

When you run this command Firejail prints the configuration that it applies to the sandbox:

This information will be useful to ensure that Firejail is running the custom configuration that we will create later in this guide. The process will not return you to the prompt because we used the -D option with lighttpd. You need to use CTRL+C to kill the Firejail and lighttpd processes and regain control of the shell.

This command will start the lighttpd process inside a Firejail sandbox from a command prompt. It is also possible to start a sandboxed program using systemd. Instructions on how to do this are in the last section of this guide.

When a program is started with the default options, the sandbox provides an elevated level of security compared to the non-sandboxed program.

/root/ and /home/user are mounted as temporary file systems. All changes written there are discarded when the sandbox is stopped.

Firejail provides the tools to discover what Firejail sandboxes are running and also shut them down. If you run the sandboxed lighttpd command above and then, in a different terminal, run firejail --list, you will see the following:

The second line is the output of firejail --list where it reports the PID and program name. You can use this information to shut down the process by using the PID of the sandbox (in the above output, the PID is 23222):

$ sudo firejail --shutdown=23222
Sending SIGTERM to 23222
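When several sandboxes are running, picking the right PID out of the list by eye becomes error-prone. The following is a minimal sketch that filters the list by program name; it assumes that each line of firejail --list output is colon-separated with the PID in the first field, and sandbox_pids is a hypothetical helper name.

```shell
# Print the PIDs of sandboxes whose command line mentions the given program.
# Reads `firejail --list` output on standard input. sandbox_pids is a
# hypothetical helper; it assumes colon-separated lines with the PID first.
sandbox_pids() {
    awk -F: -v prog="$1" 'index($0, prog) { print $1 }'
}
```

A possible use is sudo firejail --list | sandbox_pids lighttpd, with each PID that is printed then passed to firejail --shutdown as shown above.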

This sandbox uses only the default settings. These are the easiest to set up but also the least secure because the default sandbox enables system calls and capabilities that are not used by lighttpd. Each of the steps outlined below will increase the security of the sandbox by limiting these resources to only those that lighttpd needs to run.

Now that we can use Firejail with the default settings, we can move on to create a customized sandbox.

Creating a Custom Profile

A Firejail profile is a configuration file that contains all the settings that will be applied to a program's sandbox. A Firejail installation comes with several hundred pre-configured profile files, located under /usr/local/etc/firejail/. Additional profiles can be installed easily without downloading the source code, using your distribution's package manager; see, for example, the Debian firejail-profiles package.

However, there is no pre-configured profile for lighttpd, so we will need to create one.

The recommended way to begin a custom profile is to copy the generic default and customize it. If you take a look at the output when you ran lighttpd with Firejail, you can see that the first configuration file that Firejail loads is /usr/local/etc/firejail/server.profile. This is the generic default that we will use to create a customized profile.

When a program is sandboxed with Firejail, Firejail checks for the existence of a matching profile in ~/.config/firejail. We will put our lighttpd configuration file under that directory. This ensures that it will not get overwritten if a lighttpd profile is supplied in a future release of Firejail.

First, create the Firejail configuration directory:

$ sudo mkdir -p /root/.config/firejail/

Now, copy the generic server.profile into /root/.config/firejail/, giving it the name lighttpd.profile, with the following command:

$ sudo cp /usr/local/etc/firejail/server.profile /root/.config/firejail/lighttpd.profile

Before we start editing this profile, we need to check that we can start lighttpd with the new profile we just created. This is also how we will start lighttpd after making any changes to the lighttpd.profile file, so we can test them.

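The command to start a sandbox for lighttpd using the custom profile is as follows:

$ sudo firejail --profile=/root/.config/firejail/lighttpd.profile lighttpd -D -f /etc/lighttpd/lighttpd.conf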

The first Reading profile line indicates that we are now using the custom profile.

If lighttpd is unable to start, then you will be immediately dropped back to the command prompt, with the following line printed out after the same output:

Parent is shutting down, bye...

This will also happen if lighttpd crashes during operation because it required access to a resource that Firejail is blocking.

If lighttpd is able to start, then the process will pause at:

Child process initialized

until you stop it.

Once lighttpd is running, you should check that lighttpd is working as expected, i.e. as a webserver, by opening a browser and browsing to the IP address of the Raspberry PI. You should see the lighttpd default page that lighttpd loads before a website has been configured. Now that we have a working baseline profile, we can start to customize it.

The process that we will use to customize the profile is the same process that we will use to create custom capabilities. We will make a change to the configuration file, save it, and then attempt to start lighttpd. If lighttpd crashes on startup or does not load a web page correctly, then we will know that the program requires the privilege that we have just removed, so we must re-enable it and test again.

This process is made much easier if you open two terminals to the Raspberry PI. In the first, open the profile file at /root/.config/firejail/lighttpd.profile with a text editor, and in the second, use firejail --profile=/root/.config/firejail/lighttpd.profile lighttpd -D -f /etc/lighttpd/lighttpd.conf to start and stop lighttpd.

Uncommenting a line (or adding a new one) in the profile will always remove an ability from the sandboxed program, making the profile more restrictive. This means you generally should not comment-out lines, as lighttpd functioned with those lines enabled, and they were making the sandbox more secure.

The man page for firejail-profile explains what each option does and is an excellent reference when creating a custom profile. You can view the man page with this command:

$ man 5 firejail-profile

While you are customizing the lighttpd.profile, the following line:

# netfilter /usr/local/etc/firejail/webserver.net

should not be uncommented unless you are sandboxing a web server, as the iptables rules that it contains only allow network traffic on ports 80 and 443. The network namespace section provides more information.

In the next section, we will restrict lighttpd to its own private network.

Enabling a Network Namespace

The default profile that we customized does not yet define a network namespace. A network namespace creates a virtual network interface that is used by the sandbox. This means that network traffic to and from the sandboxed program is isolated from all other traffic on the system. The program in the sandbox is not able to record or modify any network traffic bound for other programs outside of the sandbox.

We will enable the network namespace and attach it to a bridge interface so that the lighttpd sandbox can connect to the internet. The package that creates bridge network devices is not included by default in Raspbian, so we need to install it with the following command:

$ sudo apt install bridge-utils

Now that this package is installed, we need to create the bridge device and then set an iptables rule to forward traffic that arrives on the wireless or ethernet port to the sandbox.

NOTE: The following guidance assumes that your device is using iptables, rather than nftables, as its firewall.

The Firejail developers provide the following script to do this for us. You need to copy and paste the script shown below into a file under /root/.config/firejail/ with your favorite text editor. Here we will use nano:

The last line in the script forwards network traffic that arrives via one of the physical network interfaces on port 80 to the sandbox on port 80. You will need to edit and/or add additional lines for traffic arriving on other ports to get it forwarded. For example, if you also want to forward HTTPS traffic to the lighttpd sandbox, you will need to append the following line:
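The general shape of such a forwarding rule is a DNAT entry in the PREROUTING chain of the nat table. The following is a minimal sketch that prints the rule for a given port rather than applying it, so that it can be reviewed and pasted into the script. The helper name forward_rule is hypothetical, and the sandbox address 10.10.20.10 is an assumption that must be replaced with the address actually assigned to your sandbox.

```shell
# Print the iptables command that forwards one TCP port on the host to the
# same port inside the sandbox. forward_rule is a hypothetical helper, and
# the default sandbox address 10.10.20.10 is an assumption.
forward_rule() {
    port="$1"
    sandbox_ip="${2:-10.10.20.10}"
    printf 'iptables -t nat -A PREROUTING -p tcp --dport %s -j DNAT --to-destination %s:%s\n' \
        "$port" "$sandbox_ip" "$port"
}
```

For HTTPS, forward_rule 443 prints the rule for port 443; a different sandbox address can be given as a second argument.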

This lets you know that a network namespace has been applied to the sandbox and provides you with the network details that it is using.

Next, we need to set some iptables rules for the network namespace. We need these rules so that we can firewall the network traffic that arrives at the sandboxed program. In normal operation, lighttpd only needs traffic on ports 80 (HTTP), 443 (HTTPS) and 53 (DNS). All other traffic can be dropped, because it's not expected. Firejail provides a file containing the required iptables rules for webservers, like lighttpd, to create this firewall.

Running this file will delete the br0 network device and unset the forwarding rule. This will reset the network on your Raspberry PI to its original state.

We can now apply a network namespace to our sandboxed process and connect it to the internet. In the next section, we will create a custom list of Linux capabilities filters that we will apply to the lighttpd sandbox.

Creating a Custom Linux Capabilities Filter

Here we will create a custom whitelist of Linux capabilities that lighttpd needs to run. The capabilities that are not included in the list are blocked and not available in the sandbox.

Lighttpd must be started by the root user because it binds to a network port below 1024, which only root (or a process holding the net_bind_service capability) can do. The presence of a root process in the sandbox means that its capabilities should be reduced to only those it needs.

Only processes that are run as root will benefit from a capabilities filter list. If you are using this guide to sandbox a process that does not require root privileges to run, then you do not need to enable a capabilities filter list and you can skip this section.

We are unable to determine beforehand which capabilities are required by the program. On an x86 architecture, the bcc-capable tool may help, but it did not run on the ARM architecture we used for testing. The only way to find out the set of capabilities is to start with the complete list, and remove capabilities one at a time to observe if lighttpd is able to start and run without each one.

We will create a capabilities whitelist on the lighttpd profile. The line that we need to add to the profile has the form:

caps.keep capability,capability,capability

We can generate a complete list of the capabilities supported on Raspbian with the following command:

$ sudo firejail --debug-caps > caps-list-full

This will output the full list to a file called caps-list-full that looks like the following:

This command outputs 38 capabilities on a Raspberry PI/Raspbian Stretch system. The names of the capabilities are in the last column. You can use a text editor to put them into a single comma-separated line or use the one we created below.
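Building the comma-separated line by hand is tedious, so it can be scripted. The following is a minimal sketch that assumes each capability line in the saved output starts with the capability number and ends with the capability name; caps_keep_line is a hypothetical helper name.

```shell
# Turn saved `firejail --debug-caps` output (read from standard input) into a
# single caps.keep profile line. caps_keep_line is a hypothetical helper; it
# assumes capability lines start with a number and end with the name.
caps_keep_line() {
    awk '/^[0-9]/ { names = names (names ? "," : "") $NF }
         END { if (names) print "caps.keep " names }'
}
```

Running caps_keep_line < caps-list-full prints one line that can be pasted into the profile as the starting point for the elimination process.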

First, open the lighttpd profile with a text editor:

$ sudo nano /root/.config/firejail/lighttpd.profile

Then, comment out the caps line. We are replacing it with a custom list.

Next, stop lighttpd and remove a capability. Save the file and then restart it. If lighttpd is not able to start or crashes during operation, that capability is needed. Leave it in the list and move on to the next one. Keep going until no more capabilities are left to try removing.
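Editing a long comma-separated list by hand makes it easy to delete the wrong entry or leave a stray comma behind. The following is a minimal sketch of a helper that removes exactly one capability from such a list; without_cap is a hypothetical name, and the result still has to be written back to the caps.keep line and tested as described above.

```shell
# Remove one capability from a comma-separated list and print the result.
# without_cap is a hypothetical helper name.
without_cap() {
    # Split on commas, drop the exact match, and join the rest again.
    printf '%s\n' "$1" | tr ',' '\n' | grep -vxF -- "$2" | paste -s -d , -
}
```

For example, without_cap "chown,net_bind_service,setgid,setuid" chown prints the list with chown removed and the other entries in their original order.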

The following is what a startup crash looks like. Here the required setuid capability is removed:

After removing and testing all 38 capabilities, you will be left with only the following three for lighttpd:

caps.keep net_bind_service,setgid,setuid

Our lighttpd sandbox now has access to only the capabilities that it requires to run. Next, we will briefly examine why a custom system call filter list is not recommended for use in a production environment.

The Problems with a Custom System Call Filter

Firejail allows us to apply a custom filter of system calls to our sandbox just as we did the capabilities. This filter list limits the system calls that can be made from the program within the sandbox. This filter list can be customized so that only the system calls that lighttpd requires to run are allowed from within its sandbox.

A customized system call filter list makes the sandbox more secure, but it has a number of critical drawbacks.

The first problem is that the recommended tool for capturing the system calls that the target program makes is the Linux system tracing tool, strace. However, strace does not capture all of the system calls that the program can potentially make. Some calls may only be made when specific features are in use, so a single run of the program does not guarantee exhausting the entire system call list for the program. As a result, a required system call may be omitted from the whitelist.

The Firejail documentation recommends checking the audit log to find any system calls that are blocked, as Firejail will log them there. This is not possible on Raspbian because the kernel is not compiled with auditing support. This makes it very difficult to identify which system calls are being blocked and causing lighttpd to crash.

The second issue is that a system call filter list is not portable from the system it was created on. A filter list will only work on the exact system that it was created on.

The system calls required by the sandboxed program are dependent on:

The kernel running on the system.

The version of the program being sandboxed.

The versions of the libraries that the program uses.

A filter list cannot be created on an alternative distribution, such as Debian Stretch or Ubuntu, and used on Raspbian. Any updates to the Raspbian installation that update the kernel, the target program, or any library used by the target program may change the required system calls, making the filter list unusable.

If an update causes the target program to use a new system call that was not included in the whitelist, then the program will either fail to start or crash during operation.

This does not mean that it is not possible to benefit from system call filtering. Firejail provides a configuration option to employ a default filter list that will work with almost all applications. It is not as restrictive and therefore not as secure as a custom filter, but it is portable and will usually work without modification.

The default filter is enabled by uncommenting the seccomp option in the profile file.

To investigate what system calls your program makes (even if the list is not complete), you need to use the strace command. This will start lighttpd and record a list of the system calls. When you kill the process with CTRL+C, a ranked list of the system calls will be printed to the terminal, e.g.:

The name of the system call that you need to use in the profile file is in the last, "syscall" column.
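Collecting the names from that column can be scripted. The following is a minimal sketch that assumes the summary table format printed by strace -c (data rows beginning with a percentage, followed by a final "total" row) and emits a seccomp.keep line, the profile option for a system call whitelist in Firejail versions that support it. The helper name seccomp_keep_line is hypothetical, and for the reasons given above the result should be treated as an incomplete starting point, not as a production filter.

```shell
# Turn a saved `strace -c` summary (read from standard input) into a single
# seccomp.keep profile line. seccomp_keep_line is a hypothetical helper; it
# keeps rows whose first field is numeric and skips the final "total" row.
seccomp_keep_line() {
    awk '$1 ~ /^[0-9.]+$/ && $NF != "total" { names = names (names ? "," : "") $NF }
         END { if (names) print "seccomp.keep " names }'
}
```

One way to capture the summary is to add strace's -o option, which writes it to a file, for example strace -c -f -o strace-summary.txt followed by the lighttpd command, and then run seccomp_keep_line < strace-summary.txt.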

Now that you have a custom profile that allows the program to work, we will integrate Firejail with systemd. This integration will make managing the sandboxed program consistent with other services, and it will also be required for automated startups and shutdowns with the network namespace and the associated bridge service.

Integrating Firejail with Systemd

Systemd is the modern suite of tools for low-level system management, including starting and stopping programs like lighttpd. This integration will ensure that we can start, stop, restart, and enable lighttpd on boot without using non-standard Bash scripts or other workarounds.

Systemd manages programs with service files. These are files that contain the information that systemd uses to start and stop programs. We will replace the start-up command in the service file that was supplied by lighttpd with the Firejail command we have been using.

Instead of editing the supplied service file directly, which may get overwritten in a future update, we will create a copy in /etc/systemd/system/ that will get used instead of the supplied file. There are two ways to do this.

The first way is to copy the lighttpd service from /lib/systemd/system/ into /etc/systemd/system/:

We can now start and stop lighttpd in the Firejail sandbox using the standard commands, e.g.:

$ sudo systemctl start lighttpd
$ sudo systemctl stop lighttpd

If you don't need to create a second service file in order to choose between sandboxed and non-sandboxed operation, you can directly edit the custom service file in /etc/systemd/system by doing:

$ sudo systemctl edit --full lighttpd.service

Then, edit the file and save it as you normally would.

If you have also applied the network namespace described above, then you will also need to integrate the networking configuration. This will ensure that the bridge device is created when the Firejailed lighttpd is started with systemd and that it is removed when the program is stopped.

We do this by adding two lines to the lighttpd.service file. These will run the network setup script when lighttpd is started and run the teardown script when it is stopped.
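As a rough illustration, the relevant lines in the [Service] section of the unit file could look like the output of the sketch below. It relies on the standard systemd directives ExecStartPre and ExecStopPost, which run a command before the service starts and after it stops. The helper name service_lines is hypothetical, the two script paths are placeholders for wherever you saved the setup and teardown scripts, and the binary paths /usr/local/bin/firejail and /usr/sbin/lighttpd are assumptions based on a default source install and the Debian lighttpd package.

```shell
# Print the [Service] lines for a sandboxed lighttpd unit. service_lines is a
# hypothetical helper; the two arguments are placeholders for the paths of the
# bridge setup and teardown scripts. The binary paths are assumptions.
service_lines() {
    setup="$1"
    teardown="$2"
    cat <<EOF
ExecStartPre=$setup
ExecStart=/usr/local/bin/firejail --profile=/root/.config/firejail/lighttpd.profile /usr/sbin/lighttpd -D -f /etc/lighttpd/lighttpd.conf
ExecStopPost=$teardown
EOF
}
```

The printed lines are meant to be reviewed and merged into the copied service file by hand, followed by sudo systemctl daemon-reload so that systemd picks up the change.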

Our custom networking will now get automatically created and removed when lighttpd is started and stopped.

We now have a sandboxed and integrated program that we are able to use just like any other service on a Linux system.

Conclusion

We have now worked through the steps to create a restricted sandbox environment for an example web service. You can use all of these steps to create a sandbox for any other application that you want to work on in a limited and isolated environment, and on any other Linux distribution.

In theory, if successfully exploited by an attacker, a sandboxed application should remain isolated from the rest of the system. This is how jails thwart attempts to completely take over the device. In practice however, jails can be tricky to deploy, and require extensive study of the target program and careful configuration. Any mistake in the configuration, or an insufficiently restrictive profile, can expose a loophole through which the attacker can continue to spread their control through the device. In addition, if any kernel vulnerabilities are present on the system that the attacker can exploit from the compromised process, the attacker can break out of the jail completely.

As a counter to these issues, VDOO propose the combination of a real-time monitoring system with isolation features and active exploit blocking features. The VDOO ERA™ - Embedded Runtime Agent - has the benefit of being automatically configured based on a deep inspection of the device in question, creating a tightened, customizable security profile without the need for intricate configuration. The ERA whitelisting, integrity and protection features prevent and report known and unknown attacks, and stop exploits from spreading. Unlike sandboxing, ERA is also compatible with earlier Linux versions, as low as 2.6. This allows for an easy deployment experience with a robust protection layer.

Written By

Leo Dorrendorf

Leo Dorrendorf is a security researcher with experience in the academy and the industry, including a diversity of topics from reverse-engineering and breaking to designing and implementing connected systems. Currently part of the VDOO security team, Leo deals with creating engines for automated threat modeling, binary scanning, and requirement generation which incorporates a growing number of standards from the world of embedded security.