Inside NetBSD's CGD

NetBSD is well-known for its portability, but since the release of NetBSD 2.0, the project has also included tons of interesting and unique features. While waiting for the upcoming 3.0, Federico Biancuzzi interviewed Roland Dowdeswell, the author of the Crypto-Graphic Disk system. This is a must-read for any laptop owner (and paranoid androids)!

Could you introduce yourself?

Roland Dowdeswell: By day, I work for a major financial institution in New York. In the past, I have also consulted extensively and, like most IT people of my age, co-founded a startup in Silicon Valley. I've been involved in the NetBSD Project for a number of years, but honestly I devote the vast majority of my time to my job and don't contribute as much as I would like to.

What is CGD?

RD: CGD is an encrypted pseudo disk driver. It sits on top of another disk or partition and presents a new virtual disk to the rest of the operating system. Each disk write (read) to the pseudo disk is encrypted (decrypted) and then performed on the underlying disk. The new disk can then be partitioned and used for any purpose that a normal disk can. This is the same approach as used by many software RAID solutions.

Why did you choose to create CGD?

RD: Like many open source projects, I created CGD because I wanted it myself. At the time, the only BSD-licensed alternative was OpenBSD's svnd driver, which did not take steps to frustrate dictionary attacks and only supported a single cipher (Blowfish). So I decided that I should write my own.

Did anyone finance the development?

RD: No, I took a little time off from consulting to do it.

How does it work from a user standpoint?

RD: First, you set up the parameters file. This file contains the information that cgdconfig needs to configure the disk, including the encryption method, the key generation procedures, etc.:

# cgdconfig -g aes-cbc 256 > /etc/cgd/wd0f

The default key generation method is PKCS#5 PBKDF2, which generates a key from a pass phrase. It is also possible to set up n-factor authentication or simply store the key in the file.

Once you have the parameters file, you need to configure the disk for the first time:

# cgdconfig -V re-enter cgd0 /dev/wd0f

The -V re-enter flag tells cgdconfig to prompt for the pass phrase twice and check that both entries are the same. In general usage, cgdconfig will check that the disk contains either a valid disklabel or filesystem with the provided key. Of course, the first time there will be neither of these.

At this point, you have a disk, /dev/cgd0[a-h], which you can set up however you want; for example, label the disk and create filesystems. Once the disk is properly set up,

# cgdconfig -u cgd0

will unconfigure the disk and

# cgdconfig cgd0 /dev/wd0f

will configure it.

NetBSD's rc system supports configuring CGD disks on boot by adding them to /etc/cgd/cgd.conf.

If you want to provide a different pass phrase for another system administrator, you can create a different parameters file from the first one, which will decrypt the disk with a different pass phrase:

# cgdconfig -G old_params > new_params

This method can be used to change the pass phrase, if old_params is securely erased.

If I'm already using a disk full of data, how can I encrypt it? Should I use a new empty slice and then move data there?

RD: Yes. To encrypt an existing partition, you will need to create a new, encrypted partition and move the data to it.

How much space does the crypto use? Does it grow with data, or is it fixed based on disk size?

RD: The current encryption modes that CGD implements do not change the size of the data. The only extra storage is the parameters file, which specifies how the disk is encrypted and how the key is generated. This is generally a few hundred bytes.

What steps does CGD use to encrypt information, and which does it use for the inverse path?

RD: First, cgdconfig will read the parameters file to obtain configuration parameters such as the key generation method, the encryption algorithm, the key length, and the Initialization Vector generation method.

The default key generation method is PKCS#5 PBKDF2, which uses a pass phrase. PKCS#5 PBKDF2 is a salted iterated hash. The salt prevents offline dictionary attacks, so you cannot precompute a dictionary without having access to the parameters file. Iterating makes each guess in a dictionary attack more expensive. By default, the iteration count is chosen so that the algorithm takes 2 seconds on the current hardware.
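As an illustration of the salted, iterated derivation described above, here is a minimal Python sketch. It assumes HMAC-SHA1 as the underlying hash, and the function names, the 16-byte salt, and the probe count are hypothetical choices for the example, not values taken from cgdconfig.

```python
import hashlib
import os
import time


def derive_key(passphrase: str, salt: bytes, iterations: int, key_bits: int = 256) -> bytes:
    """Derive a disk key from a pass phrase with PBKDF2 (HMAC-SHA1 is an assumption)."""
    return hashlib.pbkdf2_hmac(
        "sha1", passphrase.encode("utf-8"), salt, iterations, dklen=key_bits // 8
    )


def calibrate_iterations(target_seconds: float, probe: int = 10_000) -> int:
    """Estimate an iteration count that takes roughly target_seconds on this machine."""
    start = time.perf_counter()
    derive_key("probe", b"\x00" * 16, probe)
    elapsed = time.perf_counter() - start
    return max(1, int(probe * target_seconds / elapsed))


# The salt is not secret; it would live in the parameters file next to the
# iteration count. A short target keeps this example fast.
salt = os.urandom(16)
iterations = calibrate_iterations(0.05)
key = derive_key("correct horse battery staple", salt, iterations)
print(len(key) * 8, "bit key derived with", iterations, "iterations")
```

Because the salt differs per disk, the same pass phrase yields a different key on each disk, and every guess costs the attacker the full iteration count.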

The currently supported encryption algorithms are AES (128-, 192-, and 256-bit keys), Blowfish (40- to 448-bit key), and 3DES (192-bit key). Providing multiple ciphers allows the user to make trade-offs between security and performance. If weaknesses are discovered in a cipher, the user can switch to another cipher without needing to upgrade to a new version of the operating system.

Each sector is encrypted independently. To encrypt a sector:

1. An Initialization Vector is generated via the configured IV method.

2. The sector is encrypted using the configured encryption algorithm with the IV generated in step 1.

Decryption operates in the same way.
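The two steps above can be sketched in Python. This is only an illustration under stated assumptions: the block cipher is a toy Feistel construction standing in for AES (it is NOT secure and is not what CGD uses), and the IV method shown, encrypting the sector number under the disk key, is an assumed example of an IV method rather than a description of CGD's implementation. All function names are hypothetical.

```python
import hashlib
import hmac

BLOCK = 16    # cipher block size in bytes (the same as AES)
SECTOR = 512  # sector size in bytes


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Round function of the toy Feistel cipher (stand-in only, not secure).
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[: BLOCK // 2]


def encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[: BLOCK // 2], block[BLOCK // 2 :]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[: BLOCK // 2], block[BLOCK // 2 :]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


def make_iv(key: bytes, sector_no: int) -> bytes:
    # Step 1 (assumed IV method): encrypt the sector number.
    return encrypt_block(key, sector_no.to_bytes(BLOCK, "little"))


def encrypt_sector(key: bytes, sector_no: int, plaintext: bytes) -> bytes:
    if len(plaintext) != SECTOR:
        raise ValueError("plaintext must be exactly one sector")
    prev = make_iv(key, sector_no)
    out = bytearray()
    for off in range(0, SECTOR, BLOCK):
        # Step 2: CBC mode, each block is XORed with the previous ciphertext.
        prev = encrypt_block(key, _xor(plaintext[off : off + BLOCK], prev))
        out += prev
    return bytes(out)


def decrypt_sector(key: bytes, sector_no: int, ciphertext: bytes) -> bytes:
    if len(ciphertext) != SECTOR:
        raise ValueError("ciphertext must be exactly one sector")
    prev = make_iv(key, sector_no)
    out = bytearray()
    for off in range(0, SECTOR, BLOCK):
        block = ciphertext[off : off + BLOCK]
        out += _xor(decrypt_block(key, block), prev)
        prev = block
    return bytes(out)


disk_key = b"\x01" * 32
sector = b"A" * SECTOR
print(encrypt_sector(disk_key, 0, sector) != encrypt_sector(disk_key, 1, sector))
```

Because the IV depends on the sector number, identical plaintext stored in two different sectors produces different ciphertext, and the output is exactly one sector long, so the encrypted disk is the same size as the underlying one.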

Once I have an encrypted slice, can I switch cipher or disable cgd?

RD: There is currently no way to change the encryption type of an existing partition in place. The procedure to accomplish this is to create a new partition and move the data to it. I have been considering writing a tool to do this, though.

Does it interact in a special way with hardware/software RAID?

RD: No.

Does it take advantage of cryptography hardware such as accelerators and random number generators?

RD: Not yet, but I've been planning to do this at some point.

Is there a RAM/CPU minimum requirement?

RD: There is no minimum RAM or CPU requirement, but one will find that slower CPUs will impose a larger performance penalty.

How much does it limit I/O performance?

RD: It depends on the speed of the CPU versus the speed of the disks. There are two main effects. The maximum throughput can be limited by the CPU speed. Also, CGD will impose an additional latency on each disk operation, since the encryption must occur before writes and the decryption must occur after reads.

For laptops with SpeedStep, CGD's performance will differ greatly between the different modes.

I think that every laptop owner is terrified by the idea of having it stolen. If such a bad event happens, what type of protection does CGD offer against an attacker with physical access to the disk?

RD: If the laptop is off, then the attacker would have to break the cryptography either by performing a dictionary attack on your pass phrase or by trying to brute-force the key.

A dictionary attack attempts to guess the pass phrase that is used to generate the key that encrypts the disk. It is generally the most feasible attack on any cryptographic system, since users do not choose pass phrases that are nearly as secure as modern ciphers. CGD uses PKCS#5 PBKDF2 (an iterated salted hash) to frustrate dictionary attacks. What this does is increase the computation required to guess each pass phrase. CGD also provides a mechanism for 2-factor authentication where additional key material can be stored on a USB device.

A brute-force attack on the key is guessing the key to the cipher. This should be infeasible for all of the ciphers that CGD supports: AES 128, AES 192, AES 256, 3DES, and Blowfish. All of these have keys that are over 100 bits long and would take an average of over 2^100 guesses.
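To put the size of these key spaces in perspective, the following Python sketch estimates the expected search time. The rate of 10^12 guesses per second is a hypothetical figure chosen only for illustration, and the function name is made up for the example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_search(key_bits: int, guesses_per_second: float) -> float:
    """Expected years to find a key by brute force (half the key space on average)."""
    expected_guesses = 2 ** (key_bits - 1)
    return expected_guesses / guesses_per_second / SECONDS_PER_YEAR


# Hypothetical attacker speed: 10^12 key guesses per second.
for bits in (128, 192, 256):
    print(f"{bits}-bit key: about {years_to_search(bits, 1e12):.2e} years")
```

Even at that assumed rate, the expected time for a 128-bit key is many orders of magnitude longer than the age of the universe, which is why the pass phrase is the weaker link.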

If the laptop is stolen while it is still on or in suspend mode, then an attacker might be able to retrieve the key from memory.

You said, "CGD also provides a mechanism for 2-factor authentication where additional key material can be stored on a USB device." How does it work?

RD: When setting up a CGD device, one creates a parameters file. This file contains all sorts of information, including the encryption types, the key lengths, and the methods for generating keys. Most of the time, one will generate a key from a pass phrase using PKCS#5 PBKDF2, but one can specify more than one key generation method. If multiple methods are specified, then the results of each will be XORed together to produce the final key. So, to use 2-factor authentication, one specifies two key generation methods: one will be a pass-phrase-based method such as PKCS#5 PBKDF2, and the other will be a simple key, which is just stored in the parameters file. Now, you just put the parameters file on a USB device. Without the USB device, knowledge of the pass phrase does not yield the resulting key.
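The XOR combination described above can be sketched in a few lines of Python. The helper names, the salt, the iteration count, and the use of HMAC-SHA1 are assumptions made for the example; only the idea of XORing the outputs of the key generation methods comes from the interview.

```python
import hashlib
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("key parts must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def combine_keys(parts: list) -> bytes:
    """XOR the output of every key generation method into the final key."""
    key = bytes(len(parts[0]))  # all zeros, the identity element for XOR
    for part in parts:
        key = xor_bytes(key, part)
    return key


salt = os.urandom(16)
# Factor 1: something you know (a key derived from the pass phrase).
passphrase_key = hashlib.pbkdf2_hmac("sha1", b"my pass phrase", salt, 100_000, dklen=32)
# Factor 2: something you have (a random key stored in the parameters file on USB).
stored_key = os.urandom(32)

disk_key = combine_keys([passphrase_key, stored_key])
```

Since the stored key is uniformly random, the final key is too: an attacker who knows the pass phrase but lacks the USB device learns nothing about `disk_key`.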

What happens if I lose my key? What if I forget my pass phrase?

RD: If you lose your key or forget your pass phrase, then you have lost your data. You are in the same position as the person who stole your laptop. If this is a concern, then you should back up your data (securely, of course).

cgdconfig provides multiple methods to verify the pass phrase. It can scan for a valid disklabel or a valid FFS filesystem. How likely is it that an invalid password would produce a valid disklabel or filesystem?

RD: The probability that an invalid password would verify correctly is actually quite small. Let's consider the disklabel. Currently, we validate four things: the 32-bit magic number, which appears twice; the 32-bit checksum; and the number of partitions. The number of partitions is stored in a 16-bit integer but is limited, depending on the architecture, to a maximum of 22. Since 22 fits in 5 bits, the upper 11 bits are zeros. So, we validate the values of 32 + 32 + 32 + 11 = 107 bits.

If we assume that typing the wrong pass phrase will result in an essentially random block, then the chance that the disklabel will pass the sanity test is 1/2^107. This is a reasonably small number, approximately 6.16 × 10^-33.
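The arithmetic behind this estimate is easy to reproduce. The short Python check below only restates the figures given in the answer; the variable names are made up for the example.

```python
from fractions import Fraction

MAGIC_BITS = 32            # magic number, checked in two places
CHECKSUM_BITS = 32
PARTITION_FIELD_BITS = 16  # width of the partition-count field
MAX_PARTITIONS = 22

# 22 needs 5 bits, so the remaining upper bits of the field must be zero.
zero_bits = PARTITION_FIELD_BITS - MAX_PARTITIONS.bit_length()

checked_bits = 2 * MAGIC_BITS + CHECKSUM_BITS + zero_bits
probability = Fraction(1, 2 ** checked_bits)

print(checked_bits, "bits checked; false-positive probability is about",
      f"{float(probability):.2e}")
```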

We could validate many more bits in the disklabel to get even more certainty; e.g., the sector size, the number of sectors per track, the partition placements, etc. There is more than enough known data in a disklabel that it is unlikely that there exists an incorrect pass phrase that would generate a sane and valid disklabel.

A similar analysis could be performed for the FFS superblock.

Is the code easily portable to other OSes?

RD: Most of the CGD code is in the userland configuration utility, which should be quite easy to port. Kernels tend to have larger differences in APIs and behaviors, so the kernel code would be slightly more difficult to port. CGD has been ported to OpenBSD 3.2 by Ted Unangst, but it was not integrated into the main OpenBSD tree.

Do you know any alternative software that you consider better or that includes features missing in cgd?

RD: CGD, and encrypted pseudo disks in general, such as OpenBSD's svnd, FreeBSD's GBDE, or Linux's Loop-AES, address a particular threat model. Encrypted filesystems offer an answer to a different threat model and contain different trade-offs. For some use cases, the use of an encrypted filesystem such as CFS or nCryptfs might be a better choice. But that said, it is important to understand what the trade-offs are. With CFS and nCryptfs, you gain the ability to have different keys for different users at the expense of exposing a fair amount of the filesystem metadata. Per-user encryption can be important over NFS, but on a laptop it is not important. Even on a server with locally connected disks, one must keep in mind that root can just read any user's keys out of memory--so an encrypted filesystem will not protect you from root, at least while you are logged in.

OpenBSD didn't import CGD even though Ted Unangst wrote a port some time ago. Do you think OpenBSD's svnd already offers the same features?

RD: In a sense, OpenBSD's svnd appears to offer some of the same features as CGD. Before I developed CGD, I examined svnd and determined that it has a number of deficiencies.

The biggest drawback of svnd is its lack of security in the general use case. It is vulnerable to an offline dictionary attack. That is, you can generate a database mapping known ciphertext blocks on the disk back into pass phrases that can be accessed in O(1) without even being in possession of the disk. What's even worse is that the same database will work on any svnd disk. It is possible--and perhaps even likely--that large agencies such as the NSA have constructed such a database and can crack a majority of the svnds in the world in less than a second. The way that one prevents an offline dictionary attack is to use a salt in conjunction with the pass phrase, and this is what I did when I wrote CGD by using PKCS#5 PBKDF2. Offline dictionary attacks have been well-known since at least the '70s, and salting the pass phrase has been standard practice for over 30 years.
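The difference a salt makes can be shown with a small Python sketch. It is a simplification under stated assumptions: the table maps derived keys (rather than known ciphertext blocks) back to pass phrases, the unsalted derivation is a hypothetical stand-in and not svnd's actual scheme, and the word list and function names are invented for the example.

```python
import hashlib
import os

WORDLIST = ["password", "letmein", "hunter2", "netbsd"]


def unsalted_key(passphrase: str) -> bytes:
    # Hypothetical unsalted derivation: same pass phrase, same key, on every disk.
    return hashlib.sha256(passphrase.encode("utf-8")).digest()


def salted_key(passphrase: str, salt: bytes) -> bytes:
    # Salted, iterated derivation in the style of PKCS#5 PBKDF2.
    return hashlib.pbkdf2_hmac("sha1", passphrase.encode("utf-8"), salt, 1000, dklen=32)


# Built once, before seeing any disk, and reusable against every unsalted disk.
precomputed = {unsalted_key(word): word for word in WORDLIST}


def crack_unsalted(key: bytes):
    """O(1) lookup of the pass phrase behind an unsalted key, or None."""
    return precomputed.get(key)


# Two disks protected by the same weak pass phrase.
print(crack_unsalted(unsalted_key("hunter2")))

salt_a, salt_b = os.urandom(16), os.urandom(16)
key_a = salted_key("hunter2", salt_a)
key_b = salted_key("hunter2", salt_b)
print(key_a != key_b, crack_unsalted(key_a))
```

With a salt, the attacker can still run a dictionary attack, but only after obtaining the salt for a specific disk, and the work must be repeated for every disk.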

OpenBSD's solution only supports Blowfish, whereas I wanted to ensure that CGD had the flexibility to support a small range of ciphers. This is important for a number of reasons, but mainly we want to provide our users with the ability to make cost-versus-risk decisions. Blowfish is fast, but probably less secure than AES. In some situations, users will decide that speed is more important than security, and in others the reverse will be true. Also, if security issues are discovered in one cipher that we support, then users can change their CGDs to use one of the other ciphers without needing to upgrade to a new version of the operating system. Blowfish also has a cipher block size of 64 bits, which for sufficiently large disks might be small enough to allow some level of structural analysis.

How does CGD compare to Linux's Loop-AES?

RD: I first looked at Linux's cryptoloop, which has all of the problems of OpenBSD's svnd and more. Loop-AES addresses these issues in a very different way than CGD does, but from a quick analysis it appears to solve the issues.

What type of differences do you see between CGD and FreeBSD's GBDE?

RD: FreeBSD's GBDE was developed at roughly the same time as CGD, although neither GBDE's author nor I was aware of the other's work. In an accident of fate, I think that I ended up committing CGD about a fortnight before GBDE. We took very different approaches and I was quite interested in the code once it was committed. I quite liked one of the features in GBDE, allowing multiple different pass phrases to decrypt the disk, so I implemented that functionality in CGD (although slightly differently).

GBDE is vulnerable to online dictionary attacks if its 2-factor authentication mechanisms are not used or if the second factor is somehow obtained. This is not quite as serious as the offline dictionary attack that I mentioned in OpenBSD's svnd, because an online dictionary attack must be performed separately for each drive encrypted with GBDE. It is still a rather serious disadvantage, as many FreeBSD users must be using GBDE without 2-factor authentication, falsely believing their security to be substantially more impressive than it actually is. And as I noted in my answer about OpenBSD's svnd, this is a solved problem. Even though I pointed this out to GBDE's author a number of months ago, it has not been addressed.

Another issue with GBDE is that it violates the atomicity assumptions that filesystems have about disks. When you write a sector to the disk, GBDE actually performs two distinct writes, in between which the disk is in an unrecoverable, indeterminate state. If the power goes out between these two operations, then the sector will be lost and you cannot recover it--at least not without cracking AES. GBDE is not resilient to either power outages or OS crashes of any variety. This would give me serious reservations about using it in a production environment.

FreeBSD recently introduced GELI. I have only had a chance to have a quick look at it, so I may be missing something, but so far it looks much better. It is simpler and therefore easier to analyze, and unlike GBDE it uses cryptographic techniques in standard, well-understood ways. It also doesn't appear to have the transactionality issues I just mentioned.

What improvements or new features do you plan to add in the future?

RD: I've been planning for a while to add hardware crypto support to CGD. Also, I've been thinking of adding another IV method, since there is a modular framework for them but only one is defined. Steven Bellovin has suggested ways that we might be able to relatively cheaply provide integrity checking, which sounds interesting--but it would involve maintaining a transaction log, so it would impact write performance substantially. Another feature that I've been considering would be to add support for creating CGDs to our installer.

Federico Biancuzzi is a freelance interviewer. His interviews have appeared in publications such as ONLamp.com, LinuxDevCenter.com, SecurityFocus.com, NewsForge.com, Linux.com, TheRegister.co.uk, ArsTechnica.com, the Polish print magazine BSD Magazine, and the Italian print magazine Linux&C.