Posted by Soulskill on Wednesday July 16, 2014 @04:37PM
from the looking-forward-to-the-next-two-day-panic dept.

msm1267 writes: The OpenBSD project late last night rushed out a patch for a vulnerability in the LibreSSL pseudo random number generator (PRNG). The flaw was disclosed two days ago by the founder of secure backup company Opsmate, Andrew Ayer, who said the vulnerability was a "catastrophic failure of the PRNG." OpenBSD founder Theo de Raadt and developer Bob Beck, however, countered saying that the issue is "overblown" because Ayer's test program is unrealistic. Ayer's test program, when linked to LibreSSL, made two different calls to the PRNG and got the exact same data both times.

"It is actually only a problem with the author's contrived test program," Beck said. "While it's a real issue, it's actually a fairly minor one, because real applications don't work the way the author describes, both because the PID (process identification number) issue would be very difficult to have become a real issue in real software, and nobody writes real software with OpenSSL the way the author has set this test up in the article."

More like: "I see you're using the phone in a way we hadn't anticipated. Though we don't think that's the best way to use the phone, we'll make the appropriate changes to ensure it's safe for you to use in this manner."

To exploit this, you needed a program that was written like so:
1. Grandparent initializes SSL state, sends some data, then exits.
2. Parent forks a child
3. Child happens to get the same pid as the grandparent, and then uses the SSL connection.
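The failure mode in those steps comes down to the pid-based fork check. Here is a minimal toy model in Python (the class and method names are invented for illustration; this is not LibreSSL's actual code, and the pid is injectable only so the collision can be shown without actually wrapping the pid counter):

```python
import os

class PidCheckedPRNG:
    """Toy model of pid-based fork detection: reseed only when the
    observed pid differs from the pid that last seeded the state."""
    def __init__(self):
        self._seeded_pid = None
        self._state = None

    def rand(self, pid=None):
        # pid is injectable purely for demonstration; a real
        # implementation would always use os.getpid() here.
        if pid is None:
            pid = os.getpid()
        if pid != self._seeded_pid:       # the "did we fork?" heuristic
            self._seeded_pid = pid
            self._state = os.urandom(16)  # reseed from the kernel
        return self._state

rng = PidCheckedPRNG()
a = rng.rand(pid=1000)  # grandparent seeds the state at PID 1000, then exits
# ... the state survives two forks untouched, and the grandchild
# happens to be handed PID 1000 again ...
b = rng.rand(pid=1000)  # the check passes, no reseed: identical "random" data
c = rng.rand(pid=1001)  # any other pid would have forced a reseed
print(a == b, a == c)   # True False
```

The heuristic is sound for parent/child (fork() guarantees different pids) and only breaks across a generation gap with pid reuse, which is exactly the contrived setup in the test program.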

It's a program structure that doesn't make a whole lot of sense in the real world. Maybe it has happened somewhere.

The big issue is that the original discoverer found an easily filled molehill and somehow it got reported as a world-destroying volcano across the various tech sites. A minor flaw in the first public release of the test version of a library with no production users is not "catastrophic".

The LibreSSL developers apparently agreed that it was a bug that should be fixed, and fix it they did.

The discussion seems to center more around whether or not this was a "catastrophic" bug, or a "minor" bug. A bug in a library that has not yet seen a production release. So one really should ask, why not just report the bug and have it fixed, instead of seeking headlines?

There seem to be some people who would like to see the LibreSSL project fail. It makes one wonder why, as the OpenSSL near-monoculture has served the world so well.

I don't know about people wanting it to outright fail, but I do agree there are lots of people that don't see the point in forking it. They feel like it's a fork for the sake of being a fork. And like it or not, there's a whole history of open source and linux forks that do nothing but fragment things (people, distros, development, etc).
A guy heavy in the Linux field once said something to the effect of "If it's not broken, don't fix it. But if it is broken, fix it. Just don't fork it unless you really have to."

I don't know about people wanting it to outright fail, but I do agree there are lots of people that don't see the point in forking it.

At some point people are going to form opinions no matter what really and nothing will convince them that a fork is OK. In this case, the combination of bugs hanging around in RT for years (to the point where there were already unofficial distro forks with the bugs fixed) and the add-new-features-and-never-clean and the FIPS requirements meant that the OpenSSL end of things had reached the end of the line.

Kind of like Xorg versus XFree86.

I think this was one of the very much "had to" cases.

And in the intervening time, LibreSSL has done substantial rewrites, cleaned many things and fixed many previously hidden bugs, got it working on OpenBSD and made it portable. Meanwhile, over in OpenSSL land, the Linux Foundation signed on a lot of corporate sponsors who splattered logos all over a page, made announcements and maybe even appointed someone, and the OpenSSL people fixed a couple of bugs and posted a roadmap.

In this particular case, yes. There will always be non-exploitable bugs.

The problem is that when you begin to dismiss bugs as non-exploitable (whether you've fixed them or not) and their reports as "overblown," you put yourself in the unfortunate position of only needing to be wrong once. Specifically, dismissing bug reports with the notion that the bug would never be exploitable—not because the bug is "beyond the airtight hatchway," but because no one would be dumb enough to write an application in

The disclosure is very well written, says exactly why this is a big problem, and proposes a very implementable solution that would fix it. Nevertheless, the dev decided to try to dismiss the disclosure, called his daddy (de Raadt), and they both threw a tantrum, then fixed it in a way that is questionable (an update on the disclosure raises some good questions on why it is questionable).

Btw, forgetting about ssl for a minute (open/libressl are crypto libraries, not ssl libraries), a PRNG is either secur

Nice to see that I'm not the only one who thought the response was completely unprofessional.

What is it lately with all these OSS guys in key positions turning into Ulrich Drepper clones? Have they gotten so used to being told how great they are that when anybody dares point out a mistake they HAVE to go nuclear? You got Torvalds flipping off companies and throwing fits like a 14-year-old Halo player, you got de Raadt acting like a giant ass, now you got this guy... wtf? Has everybody forgotten what it me

OpenSSL's RNG is used in many places separately from the SSL communication protocol itself, sometimes just for encryption in general (S/MIME) or sometimes someone just wants really random bytes.

Many servers fork twice in order to reparent to init; repeated forking is a common idiom in unixland.

Apache with MPM-prefork forks a bunch of children from a master process, which is typically itself a descendant of apachectl. In apache's case, this shouldn't be a problem since the "master-process-rng" would have recognized the fork and reinitialized on the first openssl connection, so the children are protected because they cannot have the same PID as the master-process.

Where it would be a problem would be an application or daemon that starts up, initializes the RNG, forks twice, then without this fork touching the RNG, starts forking children to do something random (say, encrypting one file per process or establishing a single SSL connection per process or something). Without having the RNG reset by the master process, one in 65534 or so processes will have the exact same RNG, because it will have inherited the original RNG untouched and be assigned the PID that created the RNG.

Only if the master process quit after forking twice. This is not typical, since most of the time people will leave the master process around to clean up after the children to avoid zombies. It's such a strange case I think you would be hard pressed to find a real world application that behaved in a way that made it vulnerable to this exploit. I'm glad the OpenBSD guys got a patch for this, but even if they didn't I wouldn't be losing much sleep over it.

Only if the master process quit after forking twice. This is not typical

No, this IS typical. The double fork allows the original process to interact with the user ("Enter your private key password:"), then exit and return 0 to the init script so init can print [ OK ] on your console.

The middle process needs to close file descriptors and do other cleanup, then fork and die, causing the final process to become re-parented to init. Init then becomes responsible for cleaning it up if it dies, so it won't become a zombie.
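The double-fork-and-reparent dance is easy to demonstrate. Here is a sketch in Python (a toy, not production daemonization code: it skips setsid(), fd cleanup and the like; the function name is invented). The grandchild reports, through a pipe, who adopted it once the middle process has died:

```python
import os
import time

def double_fork_adopter():
    """Fork twice, let the middle process exit immediately, and return
    (top_pid, middle_pid, adopter_pid), where adopter_pid is whoever
    adopted the orphaned grandchild (classically init, PID 1)."""
    r, w = os.pipe()
    top = os.getpid()
    mid = os.fork()
    if mid == 0:                          # middle process
        mid_pid = os.getpid()
        if os.fork() == 0:                # grandchild
            os.close(r)
            deadline = time.monotonic() + 5
            # spin until the middle process dies and we get reparented
            while os.getppid() == mid_pid and time.monotonic() < deadline:
                time.sleep(0.01)
            os.write(w, os.getppid().to_bytes(8, "little"))
            os._exit(0)
        os._exit(0)                       # middle dies at once, orphaning the grandchild
    os.close(w)
    os.waitpid(mid, 0)                    # reap the middle process: no zombie left behind
    adopter = int.from_bytes(os.read(r, 8), "little")
    os.close(r)
    return top, mid, adopter

top, mid, adopter = double_fork_adopter()
print(adopter not in (top, mid))          # True: someone else adopted the grandchild
```

This is exactly the structure where the original process (the one that may have seeded SSL state) is gone by the time its descendants run, which is why pid reuse becomes thinkable at all.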

It's not a flaw in LibreSSL, it's a flaw in the portability layer that only happens in non-OpenBSD OSes in situations where the sysadmin blocks access to /dev/urandom like a tard. The only reason the portability implementation even supports this fallback is because there are a lot of stupid sysadmins using OpenSSL, and LibreSSL needs to be as much a drop-in replacement as possible. It's a flaw in the bandaid for a situation that shouldn't happen, but does.

Presumably for the same reason that they would block "ICMP would fragment", aka "the packet you are trying to send is too big to pass on, and you've told me not to split it in two, so please try again with a smaller packet, as I'm giving up on this one", causing downloads of over 1400 bytes to fail when the other end is behind a VPN.

His point was obviously that you couldn't accidentally write a program to exploit the flaw and that this exploit does not mean that all software using OpenSSL is vulnerable to the exploit as was the case with heartbleed. In fact, this flaw only means that your encryption is weak if you 1) decide to use LibreSSL in your software and 2) decide to intentionally break LibreSSL in your software. The end result is then weak encryption.

This is not a problem where an outside attacker can successfully attack the software. It is a problem where a malicious developer can attack his or her own software. So the vulnerability is not that an attacker can shoot at me with a gun, the vulnerability is that I can use my own gun to shoot myself in the foot. But only if I construct a clever framework that causes the anti-shoot-in-the-own-foot measures provided by the gun to be blocked.

Ok, so best-case scenario is that OpenBSD has additional sources of randomness and that issue simply weakened crypto instead of outright breaking it.

For the ignoramus that downmodded my GPP: all cryptographic functions heavily rely on random numbers being both unpredictable and computationally indistinguishable from true random. It can break two ways: first by broken seeding, where it becomes predictable; second by having an algorithm with a non-uniform distribution (e.g. some numbers have a higher chance than 1/u). Both of

I read the rationale; it wasn't compelling to me, but it's their project.

OpenBSD's motto: Do it correctly, or GTFO

The only reason it wouldn't be compelling is if you don't believe in doing it correctly in the first place. The entire bug is because of a bandaid in the portability layer to accommodate stupid admins. LibreSSL's stance is that the OS is responsible for crypto entropy; anything else is not recommended. Don't have access to /dev/urandom? Well, too bad. They were forced to add this because OpenSSL allowed bad practice.

They fixed a condition that was highly unlikely to be exploitable in real-world conditions, and made a big deal out of it. Just fix it and move on; the "While at first glance this only appears to be a major issue" bit is something I expect to hear from other camps.

The PID is used as an absolute last-ditch fallback in the case that no other sources of randomness are available. In order for this to happen, /dev/urandom needs to be inaccessible, the KERN_RANDOM sysctl needs to be unavailable, gettimeofday() needs to fail, and clock_gettime() needs to fail.

If you're running on a system that crippled, you've really only got two choices: try seeding using the PID, or use an unseeded RNG. Or follow Theo's advice and get yourself a real operating system.
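That cascade looks something like the following sketch (a hedged illustration of the fallback order described above, not LibreSSL's actual code; the function name and exact sources are invented):

```python
import os
import time

def last_ditch_seed():
    """Illustration of a seeding cascade: prefer the kernel CSPRNG and
    fall back to ever-weaker sources, ending at the PID."""
    try:
        return os.urandom(32)                   # preferred: /dev/urandom or getrandom(2)
    except OSError:
        pass
    try:                                        # clocks: guessable, but better than nothing
        return time.clock_gettime_ns(time.CLOCK_REALTIME).to_bytes(16, "little")
    except OSError:
        pass
    return os.getpid().to_bytes(4, "little")    # worst case: ~15 bits, easily guessed
```

The point of the comment stands: by the time you reach the final branch, the "seed" has at most a few guessable bits, so the real fix is to never let the earlier branches fail.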

Not an expert in OS design details, but I'm quite surprised there exists an OS which hands out the same PID a very recent process had. Don't PIDs monotonically increase until they wrap around? If not, why not? And why are they not based on adequately large integers? 32 bits for a minimum; why not 64? Yeah, it will uglify a ps display, but eyes on the security ball here. My 64-bit Arch Linux on kernel 3.15 is saying 15 bits (cat /proc/sys/kernel/pid_max = 32768).

The design is requiring the PID to not just be unique, but to be unpredictable. So after untangling the cords, you end up with the same requirement on your PID as you have on your RNG. Therefore the RNG design is wrong.

It's a program that exits the grandparent process and then forks in a loop until it happens to get the same process id as the grandparent. Which is of course something that will never happen in a real program. Expanding the size of the pid will just make it take longer.

You can always "echo 4194303 > /proc/sys/kernel/pid_max" on Linux if you want to wait longer for said program, though you will break old binaries...

Sure, it needs to be fixed, but it is not going to happen in most situations, and an attacker that can provoke it can already do far worse. That said, a competent user of LibreSSL will reseed after a fork anyway. You can do only so much for the incompetent ones.

The difference is that OpenSSL provides a way to explicitly reseed the PRNG by calling RAND_poll. LibreSSL, unfortunately, has turned RAND_poll into a no-op (lines 77-81). fork_rand calls RAND_poll after forking, as do all my OpenSSL-using programs in production, which is why fork_rand is safe under OpenSSL but not LibreSSL.
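That reseed-after-fork discipline can be automated. As an analogue in Python (using the stdlib random module as a stand-in for the SSL library's PRNG; the real fix with OpenSSL is the explicit RAND_poll() call described above), os.register_at_fork installs the reseed as a hook so no call site can forget it:

```python
import os
import random

# Reseed the child's PRNG state on every fork; the same defensive habit
# as calling RAND_poll() after fork() when using OpenSSL. (The stdlib
# `random` module stands in for the SSL library's PRNG here.)
os.register_at_fork(after_in_child=lambda: random.seed(os.urandom(16)))

random.seed(1234)                         # shared PRNG state before the fork
r, w = os.pipe()
if os.fork() == 0:                        # child: draw once and report back
    os.close(r)
    os.write(w, random.getrandbits(64).to_bytes(8, "little"))
    os._exit(0)
os.close(w)
parent_draw = random.getrandbits(64)      # parent's next draw from the old state
child_draw = int.from_bytes(os.read(r, 8), "little")
os.wait()                                 # reap the child
# Without the at-fork hook both processes would draw from identical
# state and the two values would be equal; with it, the child diverges.
print(parent_draw != child_draw)          # True
```

The design point is the same either way: the process that forks knows it forked, so hanging the reseed off the fork itself is more robust than having the library guess from the pid.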

The OpenSSL documentation says "Be careful when deferring to RAND_poll on some Unix systems because it does not seed the generator." As far as I can see, RAND_poll() is not mentioned in the official OpenSSL interface definition at https://www.openssl.org/docs/c... [openssl.org] at all, so it is more of an internal, hidden function, and should not be used by the library user. It is mentioned on http://wiki.openssl.org/index.... [openssl.org], but with the clear warning above. Also note that it says "OpenSSL will attempt to seed the random numbe

Or lack thereof, in this case. Just because a project is open source doesn't mean the code's been properly audited as you seem to be assuming. OpenSSL is notorious for its poor code quality and difficulty to understand.

Wait, when was LibreSSL a completely new SSL library? There was me thinking that they'd spent ages saying that they were only stripping out dead code and refactoring, and listening to BSD lovers on Slashdot saying how often "Theo" and his "boys" have done this kind of thing before... but now there's a vulnerability it's suddenly "a completely new SSL library"?

There's a lot of people saying it's a non-issue. It's a huge issue. The contract of a PRNG says it's to return a random value. Getting it to do otherwise (without providing the same seed) is tantamount to being able to make a collision in a hash function (in terms of severity) -- which means that it's fundamentally broken. This bug indicates that there is some underlying structural issue with this PRNG's initialization, and downplaying it demonstrates incompetence.

If it was a deterministic PRNG for the purpose of producing deterministic sequences, then it would be fine. But it is not that, and it is not fine. It is the random service in the crypto library, and you want this to provide numbers that meet the necessary properties of cryptographically secure random numbers.

If you fork a process and each process call my RNG, you'll get a different result, subject to the normal binomial collision distribution. This is how it should be.

Well, that was the requirement for this RNG too, wasn't it? But they had a bug where the pid was presumed to be unique within the foliation of process forking. Testing found that assumption to be incorrect (given maliciousness on the backend), and so the code was fixed. Seems perfectly fine to me. That's why there's testing: you can't see the errors in your assumptions through any amount of inspection.

That's not exactly the case, but it's close. The issue is that the SSL library has no way of knowing if the process forks other than checking the PID. If the SSL library detects a PID change, it has to reseed the RNG to avoid getting the same random values in both the parent and the child. Due to the way Unix PIDs work, you have a guarantee that the Parent and the Child will have different pids (fork() fails otherwise). However, if a grandparent forks a parent and then exits, and the parent then forks a child, there is nothing in Unix that outright prevents the child from getting the pid of the now deceased grandparent and foiling this detection so the SSL library doesn't know that a fork happened.

So it's a potential problem, but not one that likely exists in any production code. You could write test code that exploits it fairly easily by forkbombing the machine until the pid wraps before spawning the child, but in real production code it is unlikely to be an issue. Plus it was fixed.

A problem was found in a new library and fixed, this wasn't the PRNG itself, it was an interaction with the operating system. To quote (jandrese):
1. Grandparent initializes SSL state, sends some data, then exits.
2. Parent forks a child
3. Child happens to get the same pid as the grandparent, and then uses the SSL connection.
Why are you outraged? This was a subtle bug that was tricky to exploit and couldn't be used to hack into the computer. You should be outraged that the heartbleed bug remained exposed for years due to awful coding practices.

This: "Theo de Raadt and developer Bob Beck, however, countered saying that the issue is 'overblown' because Ayer's test program is unrealistic" is why it is bad. Bugs are allowed. But in security, RNGs are special and you do them right or you fail. They failed. Then they tried to claim it wasn't a biggie.

>You should be outraged that the heartbleed bug remained exposed for years due to awful coding practices

Who says I'm not? But that is a symptom of a bigger problem of using transport security to protect th

It isn't. Apparently it is an issue related to portability (aka Linux), and lack of permissions to access proper RNGs in real-world scenarios (no access to /dev/urandom). While this is definitely a bug, it *isn't* a biggie. It's an edge case where the implementation should have been more robust, that's it.

It's a shame that you don't realize that *modern* Intel is only a subset of the cpu market, and not even that relevant in network appliances. Have a look at http://en.wikipedia.org/wiki/R... [wikipedia.org], and you'll quickly see that the instruction you mention is about 2 years old. So, either you have the experience you say you have in other posts, and you're perfectly aware of this and are trolling, or you actually have no clue on the diversity of hardware out there.

Of course I know about other hardware RNGs. I already pointed to VIAs and the occasional one strapped to an ARM core. I put some of them in some of those chips. Back then I was into iterated hashes, but I've learned the error of my ways and these days it's block ciphers and field arithmetic all the way.

Rumor has it that I may know something about the RNG you just referenced. It may be two years old to you, but it didn't come into existence in 10 minutes. It doesn't really matter. These repeated crypto softw

Of course I know about other hardware RNGs. I already pointed to VIAs and the occasional one strapped to an ARM core. I put some of them in some of those chips.

So, you acknowledge they're still not mainstream, as you tried to imply in your previous comment.

It may be two years old to you, but it didn't come into existence in 10 minutes.

Yeah, it didn't. Crypto support in general purpose CPUs is not new, and as you mentioned, the VIA instructions are way older than the incarnation from Intel.

These repeated crypto software failures point to a holier than thou attitude of some crypto software writers that does the public no good. You can't play in this game without accepting that it's easy to be wrong and you'd better have things checked and cross checked by the smartest people you can find and don't get all defensive when you've been found to be wrong.

The whole point is, probably some of the critical systems running software implementations in userspace shouldn't. Cryptoprocessors have existed for a long time, and cryptocards and SoCs are quite common, well, everywhere. Bugs will always exist, but the attack surfa

I'm not. There are normal capability bits though. So software can be written to do the right thing on each platform.

The point is that even in a chroot jail with no access to /dev/urandom and a completely predictable PID, the instructions are still there on Intel CPUs, VIAs and some ARMs, yet the library ignores all those options, resulting in a collision case. It's certainly the right thing to do to mix cheap, fast sources into your CSPRNG state on each call. You don't have to trust the source and no harm wi
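The mix-in-cheap-sources idea can be sketched as a hash chain (a toy construction with invented names, not a vetted CSPRNG design): fold a cheap input into the state before every output, so state duplicated by fork() diverges as soon as the cheap inputs differ, while an entropy-free input costs nothing.

```python
import hashlib
import os
import time

def mix_and_output(state):
    """Fold a cheap, untrusted source into the state before every output
    (a toy hash-chain sketch). If the cheap input carries no entropy the
    output is no worse off; if it carries any, two states duplicated by
    fork() quickly stop matching."""
    cheap = time.monotonic_ns().to_bytes(8, "little") + os.getpid().to_bytes(4, "little")
    state = hashlib.sha256(state + cheap).digest()
    output = hashlib.sha256(b"out" + state).digest()  # keep output distinct from state
    return state, output
```

Even two processes that inherit byte-identical state produce different streams as soon as their clocks or pids disagree, which is the "no harm, possible gain" argument above in miniature.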

CSPRNGs are a fine component in a system. But it doesn't let anyone off the hook for gathering and extracting entropy.

Hardware vendors have to do it. Things are ok on PCs these days, but the plethora of amateur SoCs have re-opened the field for entropyless systems.

Something somewhere needs to implement policy, in terms of what is trusted to be entropic and combining and processing sources. A library can do that. But a CSPRNG as we have seen in this case, is particularly precarious in a user library because

I've spent the past 5 years of my life fully employed in the design, creation, testing, and deployment of secure RNGs.

Citation needed. Seriously, this is /., where everyone is a world-class programmer (except me, of course).

The world is full of bad PRNGs, NRNGs, CSPRNGs, DRBGs, TRNGs and any other form of RNG.

I will grant you that one.

LibreSSL doesn't have a leg to stand on. A good secure RNG will return unpredictable output.

Bzzzzt! Sorry, you lose. As I have already said, this is not a LibreSSL problem - it's a Linux PRNG problem. Unless I am mistaken, the same issue is non-existent under OpenBSD, because its PRNG is different from Linux's, better seeded, and because PIDs are randomized under that OS.

We know how to do these things. It isn't trivial, but it isn't hard either.

Bzzzzt! Sorry, you lose. As I have already said, this is not a LibreSSL problem - it's a Linux PRNG problem. Unless I am mistaken, the same issue is non-existent under OpenBSD, because its PRNG is different from Linux's, better seeded

Incorrect. OpenSSL manages its own entropy pool and retrieves entropy from the operating system as necessary on Linux and most UNIX systems by reading from /dev/urandom.

and because PIDs are randomized under that OS.

Who cares if PIDs are sequential or random? The chance of the same sequence of events remains with either scheme.

More amusingly, reuse happens quicker with a random algorithm than a sequential one, as the sequential one needs to wrap around first.
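That claim checks out in a toy simulation (a sketch over a hypothetical flat pid space; real kernels also skip pids that are still alive, which this ignores): sequential allocation cannot repeat any pid until the counter wraps the whole space, while random assignment collides after roughly the square root of the space, per the birthday bound.

```python
import random

def draws_until_first_repeat(space, randomized):
    """Count how many PIDs get handed out before any pid repeats, under
    sequential wrap-around vs uniformly random assignment (toy model)."""
    seen, n = set(), 0
    while True:
        pid = random.randrange(space) if randomized else n % space
        n += 1
        if pid in seen:
            return n
        seen.add(pid)

# Sequential allocation cannot repeat until the whole space wraps:
print(draws_until_first_repeat(32768, False))    # 32769
# Random assignment repeats after roughly sqrt(pi/2 * space) draws,
# around 227 on average for a 32768-wide pid space (birthday bound):
avg = sum(draws_until_first_repeat(32768, True) for _ in range(100)) / 100
```

Note this is reuse of *any* pid; the chance of reusing one *specific* pid (the grandparent's) stays at about 1/space per draw in the random scheme.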

As opposed to the magnificent job OpenSSL has done all these years, with information leakage, bug reports that went uncorrected for years and accumulated cruft for such modern OS as VMS, DOS and Windows 3.1?

I think you need to tone down the hysteria a notch right here.

Two wrongs don't make a right. Whatever shortcomings the OpenSSL project has does not excuse shortcomings in Lib

Some years ago I described "The Paranoid Entropy Trap". The tendency to get no entropy because you trusted none of your sources and turned them all off.

This is just such an example. If that computer he ran it on was less than a couple of years old, the hardware was almost certainly providing lots of entropy and the library was actively choosing to ignore it in the name of security.

Often entropy is used when seeding, not at every call to get a new random number. When you exactly duplicate a process, you get exactly the same state in both PRNGs. A PRNG library which is distinct from the operating system needs to rely on the operating system to allow it to know when its state has been duplicated. The bug was with this operating system interaction.

Now you may have a point that someone should apply entropy at every single iteration of the RNG, but that is often very expensive and thus

However, there were also major flaws in the way OpenSSL was doing this stuff. Using OpenSSL securely required that you know about the flaws it has and provide specific workarounds to avoid exactly the problem LibreSSL encountered. What LibreSSL did was attempt to make the library more idiot-proof, so that it would work even if you forgot to do RAND_poll() at key moments. The bug is that they did this wrong; but OpenSSL also did it wrong, as it was not fork-safe in all ports, requiring the

OpenSSL/LibreSSL are *not* security products. They are crypto middleware. They can be used to build security products, or to build completely unsecured products. They do nothing by themselves. Which is fun, because the LibreSSL Linux port actually required *extra* code so it would work with dumbass admins. And this extra code had the bug. True, Linux PID behaviour is not a security feature, but it is an entropy source. Maybe not a good one, granted. But it was used as a fallback. Want to bitch about it, go ahead.

The last time I looked, OpenSSL claimed to provide command line tools for managing certs

So, it generates prime numbers and does some math between them. If that is a security product, so is everything else capable of producing that kind of output - it includes both Excel and the C language, as an example.

OpenSSL recently greatly improved its RNG code

Define "recently" and "greatly". Because if this bug actually happened in OpenSSL, I suspect that we'd have to wait months for the proper patch from upstream.

>So, it generates prime numbers and does some math between them. If that is a security product, so is everything else capable of producing that kind of output - it includes both Excel and the C language, as an example.

I didn't know C and Excel had a native X.509 parser and cert management built into the language. I'll run and check my copy of K&R, but I'm pretty sure it's not in there. That's why libraries like OpenSSL exist.

>Define "recently"

In the last two years. Deployed in the mainstream in

I didn't know C and Excel had a native X.509 parser and cert management built into the language. I'll run and check my copy of K&R, but I'm pretty sure it's not in there.

If you configure any of them to that specific task, there is no technical limitation from their side. But I'm sure you wouldn't consider some scripted operations in Excel to generate and manage certificates a security product, right? That was my point.

In the last two years. Deployed in the mainstream in the last year.

And is consistent in every environment? Shall I expect the same quality and behaviour in OpenBSD, Linux and Windows 3.1? Because, you know, this is the actual problem.

Gave the option of using local high-rate entropy sources to ensure consistency in the random numbers from its service interface.

I've gone for bypassing the OS as best I can and delivering the entropy directly from hardware. OSes don't have the situational awareness to know whether or not what they have is really entropic. It works most of the time, until you try and run it on an ARM processor in a fully synchronous chip in a cheesy router, pulling random numbers at early boot time.