
LibreSSL's PRNG is Unsafe on Linux [Update: LibreSSL fork fix]

The first version of LibreSSL portable, 2.0.0, was released a few days ago (followed soon after by 2.0.1).
Despite the 2.0.x version numbers, these are only preview releases and shouldn't be used in production yet,
but have been released to solicit testing and feedback.
After testing and examining the codebase, my feedback
is that the LibreSSL PRNG is not robust on Linux and is less safe than the OpenSSL PRNG that it replaced.

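Consider a test program, fork_rand. When linked with OpenSSL, a call to RAND_bytes in the grandparent process and a
call in the grandchild return different data, as expected. When linked with LibreSSL, the two calls return the same data.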

The problem is that LibreSSL provides no way to safely use the PRNG after a fork. Forking
and PRNGs are a thorny issue - since fork() creates a nearly-identical clone of the parent process,
a PRNG will generate identical output in the parent and child processes unless it is reseeded.
LibreSSL attempts to detect when a fork occurs by checking the PID (see line 122). If it differs from the last
PID seen by the PRNG, it knows that a fork has occurred and automatically reseeds.

This works most of the time. Unfortunately, the PID range is typically capped at a value that fits in 16 bits, so PIDs wrap around
fairly often. And while a process can never have the same PID as its parent, a process can
have the same PID as its grandparent. So a program that forks from a fork risks generating
the same random data as the grandparent process. This is what happens in the fork_rand program, which repeatedly
forks from a fork until it gets the same PID as the grandparent.
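
To make the failure mode concrete, here is a minimal sketch of PID-based fork detection. It is not LibreSSL's code: the struct and function names are hypothetical, and the current PID is passed in as a parameter (instead of calling getpid()) so that a wraparound can be simulated without forking thousands of times.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical PRNG state: only the parts relevant to fork detection. */
struct toy_prng {
    pid_t last_pid;   /* PID seen when the state was last (re)seeded */
    unsigned reseeds; /* number of reseeds triggered by fork detection */
};

/* Seed the toy PRNG in the process with the given PID. */
void toy_seed(struct toy_prng *p, pid_t pid)
{
    assert(p);
    p->last_pid = pid;
    p->reseeds = 0;
}

/*
 * Called before producing output. Returns true if a fork was detected
 * (and the state "reseeded"), false if the PID matches the last one seen.
 */
bool toy_check_fork(struct toy_prng *p, pid_t current_pid)
{
    assert(p);
    if (current_pid == p->last_pid)
        return false;           /* looks like the same process */
    p->last_pid = current_pid;  /* fork detected: reseed here */
    p->reseeds++;
    return true;
}
```

If the grandparent seeds the state with PID 2735, the intermediate child never touches the PRNG, and the grandchild ends up with PID 2735 after a wraparound, then toy_check_fork(&state, 2735) returns false in the grandchild and it continues from the grandparent's state.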

OpenSSL faces the same issue. It too attempts to be fork-safe,
by mixing the PID into the PRNG's output,
which works as long as PIDs don't wrap around. The difference is
that OpenSSL provides a way to explicitly reseed the PRNG by calling RAND_poll.
LibreSSL, unfortunately, has turned RAND_poll into a no-op (lines 77-81). fork_rand calls RAND_poll after forking,
as do all my OpenSSL-using programs in production, which is why fork_rand is safe under OpenSSL but not LibreSSL.

You may think that fork_rand is a contrived example or that it's unlikely in
practice for a process to end up with the same PID as its grandparent.
You may be right, but for security-critical code this is not a strong
enough guarantee. Attackers often find extremely creative ways to
manufacture scenarios favorable for attacks, even when those scenarios
are unlikely to occur under normal circumstances.

Bad chroot interaction

A separate but related problem is that LibreSSL provides no good way to use the PRNG from a process running inside a chroot jail.
Under Linux, the PRNG is seeded by reading from /dev/urandom upon the first use of RAND_bytes. Unfortunately,
/dev/urandom usually doesn't exist inside chroot jails. If LibreSSL fails to read entropy from /dev/urandom,
it first tries to get random data using the deprecated sysctl syscall, and if that fails (which will start happening
once sysctl is finally removed), it falls back to a truly scary-looking function (lines 306-517) that attempts to get entropy from sketchy
sources such as the PID, time of day, memory addresses, and other properties of the running process.

OpenSSL is safer for two reasons:

If OpenSSL can't open /dev/urandom, RAND_bytes returns an error code. Of course the programmer has to check
the return value, which many probably don't, but at least OpenSSL allows a competent programmer to use it securely,
unlike LibreSSL which will silently return sketchy entropy to even the most meticulous programmer.

OpenSSL allows you to explicitly seed the PRNG by calling RAND_poll, which you can do before entering the chroot
jail, avoiding the need to open /dev/urandom once in the jail. Indeed, this is how titus ensures it can use the PRNG from inside its highly-isolated chroot jail. Unfortunately, as discussed above, LibreSSL has turned RAND_poll into a no-op.
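
The general idiom behind this is to acquire the entropy source before entering the jail and only read from it afterwards. The following is a minimal sketch of that idiom using plain POSIX calls; it is not how OpenSSL implements RAND_poll, and the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Open the entropy device while it is still reachable.
 * Returns a file descriptor, or -1 on failure. */
int entropy_open(void)
{
    return open("/dev/urandom", O_RDONLY);
}

/* Fill buf with len bytes read from fd.
 * Returns 0 on success, -1 on error or unexpected end of file. */
int entropy_read(int fd, unsigned char *buf, size_t len)
{
    size_t done = 0;

    assert(buf);
    while (done < len) {
        ssize_t n = read(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            return -1;      /* a device should never report EOF */
        done += (size_t)n;
    }
    return 0;
}
```

A program would call entropy_open() before chroot() and entropy_read() afterwards; the descriptor stays valid even though /dev/urandom is no longer reachable by path. The caller still has to check both return values.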

What should LibreSSL do?

First, LibreSSL should raise an error if it can't get a good source of entropy. It can do
better than OpenSSL by killing the process instead of returning an easily-ignored error code. In fact, there is already a
disabled code path in LibreSSL (lines 154-156) that does this. It should be enabled.
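
As an illustration of what failing closed could look like, here is a minimal sketch. It is not the disabled LibreSSL code path; the helper names are hypothetical, and it assumes /dev/urandom is the only entropy source being tried.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Try to fill buf with len bytes from the device at path.
 * Returns 0 on success, -1 on any failure. */
int try_entropy(const char *path, unsigned char *buf, size_t len)
{
    size_t done = 0;
    int fd;

    assert(buf);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    while (done < len) {
        ssize_t n = read(fd, buf + done, len - done);
        if (n < 0 && errno == EINTR)
            continue;       /* interrupted by a signal: retry */
        if (n <= 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    close(fd);
    return 0;
}

/* Fail closed: either buf holds kernel entropy on return,
 * or the process does not continue at all. */
void entropy_or_die(unsigned char *buf, size_t len)
{
    if (try_entropy("/dev/urandom", buf, len) != 0) {
        raise(SIGKILL); /* cannot be caught or ignored */
        _exit(1);       /* not reached */
    }
}
```

The point of the design is that a caller who forgets to check a return value cannot end up with a buffer of weak or uninitialized bytes.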

Second, LibreSSL should make RAND_poll reseed the PRNG as it does under OpenSSL. This will allow the programmer to guarantee
safe and reliable operation after a fork and inside a chroot jail. This is especially important as LibreSSL aims to be a
drop-in replacement for OpenSSL. Many properly-written programs have come to rely on OpenSSL's RAND_poll behavior for safe
operation, and these programs will become less safe when linked with LibreSSL.

"The presence or need for a [RAND_poll] function should be considered a serious design flaw."

I agree that in a perfect world, RAND_poll would not be necessary, and that its need is evidence of a design flaw. However, it is evidence of a design flaw not in the cryptographic library, but in the operating system. Unfortunately, Linux provides no reliable way to detect that a process has forked, and exposes entropy via a device file instead of a system call. LibreSSL has to work with what it's given, and on Linux that means RAND_poll is an unfortunate necessity.

Workaround

If the LibreSSL developers don't fix RAND_poll, and you want your code to work safely with
both LibreSSL and OpenSSL, then I recommend putting the following code after you fork or before you chroot (i.e. anywhere you would currently need RAND_poll):

In essence, always follow a call to RAND_poll with a request for
one random byte. The RAND_bytes call will force LibreSSL to seed the
PRNG if it's not already seeded, making it unnecessary to later open
/dev/urandom from inside the chroot jail. It will also force LibreSSL
to update the last seen PID, fixing the grandchild PID issue. (Edit: the LibreSSL
PRNG periodically re-opens and re-reads /dev/urandom to mix in additional entropy,
so unfortunately this won't avoid the need to open /dev/urandom from inside the
chroot jail. However, as long as you have a good initial source of entropy, mixing in the sketchy
entropy later isn't terrible.)

I really hope it doesn't come to this. Programming with
OpenSSL already requires dodging numerous traps and pitfalls, often by deploying
obscure workarounds. The LibreSSL developers, through their well-intended effort
to eliminate the pitfall of forgetting to call RAND_poll,
have actually created a whole new pitfall with its own obscure workaround.

Update (2014-07-16 03:33 UTC): LibreSSL releases fix for fork issue

LibreSSL has released a fix for the fork issue! (Still no word on the chroot/sketchy entropy issue.) Their fix is to use pthread_atfork to register a callback that reseeds the PRNG when fork() is called. Thankfully, they've made this work without requiring the program to link with -lpthread.
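
For readers unfamiliar with pthread_atfork, here is a minimal sketch of the mechanism. It is not LibreSSL's implementation (in particular it does not reproduce their trick for avoiding -lpthread), and the function and variable names are hypothetical. The child handler only sets a flag; a real PRNG would check that flag before producing output and reseed if it is set.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set in the child by the atfork handler. */
static volatile sig_atomic_t need_reseed = 0;

/* Runs in the child process right after fork() returns there. */
static void on_fork_child(void)
{
    need_reseed = 1;
}

/* Register the handler. Returns 0 on success, an error number otherwise. */
int install_fork_handler(void)
{
    return pthread_atfork(NULL, NULL, on_fork_child);
}

/* Returns nonzero if this process was forked since the handler was installed. */
int reseed_pending(void)
{
    return need_reseed;
}

/* Demonstration helper: fork once and report whether the child saw the flag.
 * Returns 1 if it did, 0 if it did not, -1 on error. */
int child_sees_reseed_flag(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(reseed_pending() ? 0 : 1);
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

After install_fork_handler() succeeds, the flag is set in every child created through fork(), while the parent's copy stays clear.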

I have mixed feelings about this solution, which was discussed in a sub-thread on Hacker News. The fix is a huge step in the right direction but is not perfect - a program that invokes the clone syscall directly will bypass the atfork handlers (Hacker News commenter colmmacc suggests some legitimate reasons a program might do this). I still wish that LibreSSL would, in addition to implementing this solution, just expose an explicit way for the programmer to reseed the PRNG when unusual circumstances require it. This is particularly important since OpenSSL provides this facility and LibreSSL is meant to be a drop-in OpenSSL replacement.

Finally, though I was critical in this blog post, I really appreciate the work the LibreSSL devs are doing, especially their willingness to solicit feedback from the community and act on it. (I also appreciate their willingness to make LibreSSL work on Linux, which, despite being a Linux user, I will readily admit is lacking in several ways that make a CSPRNG implementation difficult.) Ultimately their work will lead to better security for everyone.

Hi, I'm Andrew. I'm the founder of SSLMate, an SSL certificate management service that lets you buy SSL certificates from the command line. I also develop open source projects like git-crypt and titus.


pthread_atfork() requires linking with libpthread, which a single-threaded program would not normally do. Otherwise, it's not a bad suggestion. Checking process creation time is a very interesting suggestion (not perfect due to clock changes but still pretty darn good). Can it be done without needing to read /proc, which wouldn't exist in a chroot jail?

Still, on top of everything LibreSSL does to automatically detect forks, it should still expose a way to explicitly reseed the PRNG in an OpenSSL-compatible way, since OpenSSL has made guarantees that certain functions will re-seed the PRNG, and there may be some scenarios where even the best automatic fork detection fails (imagine a program calling the clone syscall directly for whatever reason, in which case pthread_atfork handlers won't be called). Since LibreSSL is billed as a drop-in replacement for OpenSSL, you should not be able to write a valid program that's safe under OpenSSL's guarantees but not when linked with LibreSSL.

proc(5) says that the 22nd field of /proc/pid/stat, starttime, is clock ticks since system boot, so this wouldn't be influenced by clock changes. Looks like the only way to get that value is to read /proc though.
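
A minimal sketch of reading that field, to show what such a check would involve. The function names are hypothetical, and as noted it depends on /proc being mounted, so it does not help inside an empty chroot jail.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Extract field 22 (starttime) from the contents of /proc/<pid>/stat.
 * Returns 0 on success, -1 on malformed input.
 */
int parse_starttime(const char *stat_line, unsigned long long *out)
{
    const char *p;
    int field;

    assert(stat_line && out);

    /* Field 2 (comm) is parenthesised and may itself contain spaces or
     * parentheses, so resume parsing after the last ')'. */
    p = strrchr(stat_line, ')');
    if (p == NULL)
        return -1;
    p++;

    /* Skip fields 3 through 21 to land on field 22. */
    for (field = 3; field < 22; field++) {
        while (*p == ' ')
            p++;
        if (*p == '\0')
            return -1;
        while (*p != ' ' && *p != '\0')
            p++;
    }
    while (*p == ' ')
        p++;
    if (*p < '0' || *p > '9')
        return -1;

    *out = strtoull(p, NULL, 10);
    return 0;
}

/* Read this process's own starttime.
 * Returns 0 on success, -1 if /proc is unavailable or unparsable. */
int own_starttime(unsigned long long *out)
{
    char buf[1024];
    size_t n;
    FILE *f = fopen("/proc/self/stat", "r");

    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_starttime(buf, out);
}
```

A PRNG could store the value at seed time and compare it on each use; a different starttime would indicate a different process even if the PID is the same.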

The creators of LibreSSL have stated that the target platform is going to be OpenBSD, which does things quite differently from Linux. Although I haven't had a chance to test it myself, I would hazard a guess that these issues don't exist on that platform.

They indicated that a lot of work would be required to port LibreSSL to Linux due to the inherent lack of certain secure functions within Linux.

They haven't indicated that a lot of work is required to port LibreSSL to Linux. Most of the secure functions are actually pretty easy; they're entirely implemented in userspace and don't rely on any system specific or third party functionality. So it is a matter of bundling a copy of these functions with the portable distribution, and adding the appropriate checks in the configure system.

But you cannot get around the fact that Linux does not provide a reliable library-wrappable way to get entropy, except for the sysctl which is apparently deprecated. So they try to do what they can. Is it good enough? Hard to say, but it cannot really be worse than what OpenSSL did (if we for a moment ignore the bug with PID-based fork checking).

You are right in this analysis and the LibreSSL developers are wrong, if not outright frighteningly incompetent in these matters. Much history and exploits show the necessity for making RAND_poll forcibly reseed with a good source of entropy. Their "truly scary function" is cargo cult cryptography.

RAND_poll is rarely used at all, and correct use is very uncommon. Sure the API exists but looking in package source in Debian and OpenBSD I only found 5 programs using it safely: Net::SSLeay, libevent sample code (also present in the copy of libevent in mozilla trees), Tor, lldpd, dnscrypt-proxy.

But then given that OpenSSL doesn't document it (surprise!) and itself uses it unchecked (in the stub for RAND_screen), who can blame them?

I think people are missing the point of the "truly scary function". Read the comments, especially lines 130-153. This is more a case of "if your kernel doesn't provide a reliable means to fetch entropy, we'll try what we can, but there really isn't much we can do".

In my opinion, what is needed is an intent to continue maintaining the sysctl interface until an alternative safe kernel interface is available. This is far simpler than retrofitting the thousands of other programs that use libssl/libcrypto with RAND_poll calls.

That's nice research about RAND_poll. It's possible other programs are using another way of reseeding the PRNG, such as RAND_add. If any program forks without reseeding the OpenSSL PRNG, they are unsafe under OpenSSL and are unsafe under LibreSSL. What I object to are security-conscious programs (such as Tor) doing it right under OpenSSL now being unsafe when linked with an allegedly drop in replacement for OpenSSL. Nothing excuses that even if only a handful of programs are currently safe.

Read the comments, especially lines 130-153. This is more a case of "if your kernel doesn't provide a reliable means to fetch entropy, we'll try what we can, but there really isn't much we can do".

Line 142 suggests a much better alternative to using sketchy entropy: "Could raise(SIGKILL) resulting in silent program termination." They don't justify why silently returning sketchy entropy is better than SIGKILLing the process, except with: "This code path exists to bring light to the issue that Linux does not provide a failsafe API for entropy collection." Well, I agree that's an issue, but I don't think that's a good reason to potentially compromise the entropy gathering of a security critical library.

Also note that even if Linux retains sysctl or provides a safe alternative, it only fixes the chroot issue - forking will still be a problem unless LibreSSL rips out their userspace PRNG entirely and just passes RAND_bytes calls straight through to the syscall.

The rationale for not aborting is there: unsafe core files. Is it worse than sketchy entropy? I can't tell. It's a case by case thing really. Also, the sketchy looking code isn't necessarily all that bad. Keep this in mind people, rdrand is a rather recent addition and prior to that, most common systems weren't expected to have a hardware random generator. So we devised lots of ways to collect entropy from timing and supposedly unpredictable events. The in-kernel entropy generation has always seemed sketchy, just like the fallback function. Ugly it is, but not necessarily bad. Also, some of the kernel entropy is likely to be in the data the fallback uses; see AT_RANDOM, ASLR, PIE, etc. Hopefully though Linux will get a getentropy() call and we can get rid of the sketchy looking code. I think everyone should be happy with that.

Unsafe core files are only an issue if you use SIGABRT. SIGKILL leaves no core. While it's true that there's a certain degree of sketchiness inherent to any entropy gathering, the kernel has access to far better sources of entropy, like network traffic, disk latencies, and mouse movements, and bolsters it by mixing in a random seed that's carried over across reboots. getentropy_fallback does an impressive job considering the circumstances, but it's still not nearly as good as the kernel, and loses quite a bit of its effectiveness if a program is statically linked and addresses of functions are constant.

You are mistaken. Linux does have chroot(2) and therefore "chroot jails". The "chroot jail" concept dates back at least to SVr4 and is older than namespaces and cgroups. In fact, chroot is probably the original Unix "container" mechanism. The problems with chroot are well-known, in particular that it only affects filesystem access and that root can easily escape from a chroot jail. Other problems come from the isolation provided by chroot--if /dev isn't bind-mounted inside the chroot jail or a duplicate /dev/urandom created inside the jail, a jailed process does not have access to it. Similarly, data in /proc is inaccessible unless /proc is mounted inside the jail. Most programs that use chroot for security intentionally lock themselves into a very limited environment, thus the use case for a getentropy syscall.

As another commenter pointed out, Linux, like many Unixes, has chroot(). If you're going to start a program's execution from inside a chroot, it makes sense to set up a basic directory structure. However, a very common privilege separation technique is to start execution outside the jail, open all the resources you need, and then chroot into a completely empty unwritable directory. OpenSSL has the API to make this work (as do other crypto libraries such as libsodium); LibreSSL does not.

Could you clarify what you mean by "memory locks" in this context? As for checking times(), you can always make the fork detection better, and that's a good thing, but if it's not 100% you still need to provide an explicit way for the programmer to reseed.

By 'memory locks', Jayson means mlock(). Unfortunately this is not really enough: you can mlock(), sure, but detecting that the page is still locked in is hard. Neither munlock() nor mlock() error if asked to act on an already-locked page, and the only way to see if it is still locked (or if anything is still locked) or even paged in is to consult things in /proc/$pid/ -- and if you had access to that, you could reseed more conventionally.

I had a feeling it was mlock(), but I couldn't think of a good way to check if a page is locked. OpenBSD provides a really nifty way to detect forks: you can use the minherit() syscall with the INHERIT_ZERO argument to specify that a page should be replaced with zeros when forking. This experience has been a real eye opener to how an operating system can make it easier to write a secure crypto library.

A library RNG should probably accept its place in the universe as a thing that will get forked and rather than trying to second guess the system it should instead either mix in new entropy on every call or should direct the programmer to use more effective resources (operating system RNG service, instruction set RNG, etc.).

I don't think a linkable library is really the right place for an RNG. It's fine for providing a conservative way to access system entropy, but not fine for operating in isolation as a CSPRNG.

pid_t is not 16 bits, though often the default /proc/sys/kernel/pid_max is set to a value that fits in an unsigned 16-bit number (likely to keep ancient programs working).

The value can be raised to a little more than 4 million, and on most of the systems I influence, this number is raised as otherwise busy systems run out of IDs. With the limit set much higher, the chance of PID reuse is lessened, but not eliminated.

As I said in my article, this mess is evidence of a design flaw in Linux, so I kind of agree with the LibreSSL devs. But if they want to provide a portable version of LibreSSL, they need to provide the facilities to make it safe. OpenSSL does.

"""It can do better than OpenSSL by killing the process instead of returning an easily-ignored error code."""

Hello? Seriously?? Please, this is not the 1980s anymore. Killing a process on error is a major PITA to anyone writing library code, or bindings for higher-level languages (which usually have a proper exception propagation mechanism, meaning you can't ignore an error return by mistake, you know).

Frankly, anyone who thinks that "killing a process" is a legitimate response for non-catastrophic failures should be banned from writing anything else than throwaway scripts and enterprise frameworks.

From my own selfish perspective, I completely agree with you, since I'm a C++ programmer and I wrap calls to RAND_bytes in a function that checks the return value and throws an exception if it fails. I don't want my programs raising SIGKILL on error. However, I'm all too aware of how cryptographic libraries are used in practice, and since OpenSSL/LibreSSL is a C library, it's all too likely that programmers are going to ignore the return value of RAND_bytes. Fortunately, a missing /dev/urandom is a pretty exceptional error so raising SIGKILL is not too unreasonable.

it's not just a missing /dev/urandom, but also resource exhaustion (for example running out of fds) that can cause open("/dev/urandom", ...) to fail.
an attacker may find ways to make your application run out of fds (for example by creating many connections).

aborting the program from a library however is very bad and precludes usage of said library in a robust application.
they should rather just return an error code when getentropy() fails, so the library user can handle the error gracefully (and if he doesn't check the return value, it's neither the library's fault nor responsibility).
doing an abort/kill is only acceptable if the API is misdesigned in a way that prevents checking for such an error.

Some excellent catches here. The removal, or rather conversion to noop, for RAND_poll() is especially bizarre. The LibreSSL folks did a disservice to themselves and the integrity of the project by attempting to downplay the issues you've exposed.

If an application knows it is going to chroot its child away from accessing /dev/urandom, why doesn't the parent take responsibility to provide a named pipe in the chroot environment?

Part of the problem also seems to be that being a drop-in replacement for OpenSSL requires leaving the API the way it is. Hopefully if LibreSSL gains popularity, they will be able to revise the API to include such things as being able to specify the entropy quality, where the calling app decides if a scary function is an acceptable source of entropy or not.

Yes, part of the problem is that on one hand, the LibreSSL developers are trying to make LibreSSL a drop-in replacement for OpenSSL, but on the other hand they want to ignore parts of the API that they don't like. This is a problem even if they are right about those API parts being bad.

Still, even if LibreSSL were being designed from scratch, I'd still want it to provide a way to open /dev/urandom in advance. A long-standing privilege separation idiom is to start execution outside of the chroot, open needed resources, and then chroot into a completely empty directory. We shouldn't change the way we do chroot jails just because LibreSSL refuses to provide an API to make it possible. Even libsodium, a modern crypto library that is frequently lauded for its good design, provides an API to open /dev/urandom in advance. Fortunately, LibreSSL's API deficiency is easily worked around by just asking for 1 byte of random data.
