Saturday, May 3, 2014

A good idea with bad usage: /dev/urandom

Last week, I wrote twoarticles pointing out issues with unofficial porting efforts of LibreSSL. In these articles, I highlighted some issues that I currently see going on with some of these projects.

In the second article, I called attention to poor arc4random_buf() implementations being created, specifically saying: "Using poor sources of entropy like /dev/urandom on Linux, or worse, gettimeofday(), and using them to generate long-lived keys." Which seemed to irk some. Now for those that care to understand the quoted issue in its proper context should go look for the current LibreSSL porting projects they can find via Google, and review their source. However, most seem to be understanding this as some kind of argument about /dev/randomversus /dev/urandom. It isn't. So without further ado:

In order for many forms of cryptography to work properly, they depend on secrecy, unpredictability, and uniqueness. Therefore, we need a good way to use many unpredictable values.

Now randomness doesn't really exist. When we humans see something as random, it's only because we don't know or understand all the details. Therefore, any perceived randomness on your part is your inability to track all the variables.

Computers are very complex machines, where many different components are all working independently, and in ways that are hard to keep track of externally. Therefore, the operating system is able to collect variables from all the various hardware involved, their current operating conditions, how they handle certain tasks, how long they take, how much electricity they use, and so on. These variables can now be combined together using very confusing and ridiculous algorithms, which essentially throw all the data through a washing machine and the world's worst roller coaster. This result is what we call entropy.Now entropy might not be that large, and can only provide a small amount of unique random values to play with. However, a small amount of random values can be enough to create trillions of pseudo-random values. To do so, one uses some of the aforementioned ridiculous algorithms to produce two values. One value is never seen outside the pseudo-random number generator, and is used as part of the calculations for the next time a random value is requested, and the other value is output as the pseudo-random value generated for a single request.

A construct of this nature can allow an unlimited amount of "random" values to be generated. As long as this technique never repeats and is unpredictable, then it is cryptographically secure. Of course since the algorithm is known, the entropy seeding it must be a secret, otherwise it is completely predictable.Different algorithms have different properties. OpenBSD's arc4random set of functions are known to be able to create a very large amount of good random values from a very little amount of entropy. Of course the better entropy it is supplied with, the better it can perform, so you'll always want the best entropy possible. As with any random number generator, supply it with predictable values, and its entire security is negated.

arc4random and /dev/(u)random

So, how does one port the arc4random family to Linux? Linux is well known for inventing and supplying two default files, /dev/random and /dev/urandom (unlimited random). The former is pretty much raw entropy, while the latter is the output of a CSPRNG function like OpenBSD's arc4random family. The former can be seen as more random, and the latter as less random, but the differences are extremely hard to measure, which is why CSPRNGs work in the first place. Since the former is only entropy, it is limited as to how much it can output, and one needing a lot of random data can be stuck waiting a while for it to fill up the random buffer. Since the latter is a CSPRNG, it can keep outputting data indefinitely, without any significant waiting periods.

Now theoretically, one can make the arc4random_buf() function a wrapper around /dev/urandom, and be done with it. The only reason not to is because one may trust the arc4random set of algorithms more than /dev/urandom. In that case, would /dev/urandom be trusted enough to seed the arc4random algorithms, which is then in turn used for many other outputs, some of which end up in RSA keys, SSH keys, and so on? I'll leave that question to the cryptographic experts. But I'll show you how to use /dev/urandom poorly, how to attack the design, and corrective efforts.First, take a look at how one project decided to handle the situation. It tries to use /dev/urandom, and in a worse case scenario uses gettimeofday() and other predictable data. Also, in a case where the read() function doesn't return as much as requested for some reason, but perhaps returned a decent chunk of it, the gettimeofday() and getpid() calls will overwrite what was returned.

This very much reminds me of why the OpenBSD team removed the OpenSSL techniques in the first place. You do not want to use a time function as your primary source of entropy, nor with range limited values, and other dumb things. If you're going to use time, at the very least use clock_gettime() which provides time resolution that is 1,000 times better than gettimeofday(), and can provide both current time, and monotonic time (time which cannot go backwards). Additionally, both gettimeofday() and clock_gettime() on 64-bit systems will return values where the top half of their data will be zeroed out, so you'll want to ensure you throw that away, and not just use their raw data verbatim. Further, how this one project uses /dev/urandom, like most other projects, is terrible.

This tries to open the file, ensures it was opened, tries to read 40 bytes of data, and continues what it needs to do in each scenario. Note, 40 bytes is the amount arc4random wants, which also happens to be 8 bytes more than the Linux manual page says you should be using at a time from its random device.

The first common mistake here is using read() like this. read() can be interrupted. Normally it can't be interrupted for regular files, but this device is not a regular file. Some random device implementations specifically document that read() can be interrupted upon them.

With this improved approach, we know we're reading as much as possible, and if not, then we can try using lesser techniques to fill in the missing entropy required. So far so good, right?

Now, let me ask you, why would opening /dev/(u)random fail in the first place? First, it's possible the open was interrupted, as may happen on some implementations. So the open() call should probably be wrapped like read() is. In fact, you may consider switching to the C family fopen() and fread() calls which handle these kinds of problems for you. However, opening could be failing because the application has reached its file descriptor limit, which is even more prevalent with the C family of file functions. Another possibility is that the file doesn't even exist. Go ahead, try to delete the file as the superuser, nothing stops you. Also, you have to consider that applications may be running inside a chroot, and /dev/ entries don't exist.

I'll cover some alternative approaches for the above problems later. But if you managed to open and read all the data needed, everything is great, right? Wrong! How do you even know if /dev/(u)random is random in the first place? This may sound like a strange question, but it isn't. You can't just trust a file because of it's path. Consider an attacker ran the following:

Now both random devices are actually large sparse files with known data. This is worse than not having access to these files, in fact, the attacker is able to provide you with a seed of his own choosing! A strong cryptographic library should not just assume everything is in a proper state. In fact, if you're using a chroot, you're already admitting you don't trust what will happen on the file system, and you want to isolate some applications from the rest of the system.

So the next step is to ensure that the so called device you're opening is actually a device:

So now we're out of the woods, right? Unfortunately not yet. How do you know you opened the correct character device? Maybe /dev/urandom is symlinked to /dev/zero? You can run lstat() on /dev/urandom initially, but that has TOCTOU issues. We can add the FreeBSD/Linux extension O_NOFOLLOW to the open command, but then /dev/urandom can't be used when it's linked to /dev/random as it is on FreeBSD, or linked to some other location entirely as on Solaris. Furthermore, avoiding symlinks is not enough:

The Linux user manual page for the random devices explicitly informs the reader of these magic numbers, so hopefully they won't change. I have no official sources for the magic numbers on the other OSs. Now, you'll notice that I checked here that the file descriptor was open to either Linux random device. A system administrator may for some reason replace one with the other, so don't necessarily rely on a proper system using the expected device under the device name you're trying to use.

This brings us back to /dev/random versus /dev/urandom. Different OSs may implement these differently. On FreeBSD for example, there is only the former, and it is a CSPRNG. MirBSD offers five different random devices with all kinds of semantics, and who knows how a sysadmin may shuffle them around. On Linux and possibly others, /dev/urandom has a fatal flaw that in fact it may not be seeded properly, so blind usage of it isn't a good idea either. Thankfully Linux and NetBSD offer the following:

However, there may be a TOCTOU between the ioctl() and the read() call, I couldn't find any data on this, so take the above with a grain of salt. This also used to work on OpenBSD too, but they removed the ioctl() command RNDGETENTCNT a couple of versions back.

Linux has one other gem which may make you want to run away screaming. Look at the following from the manual for random device ioctl()s:

RNDZAPENTCNT, RNDCLEARPOOLZero the entropy count of all pools and add some system data (such as wall clock) to the pools.

If that last one does what I think it does, is any usage ever remotely safe?

/dev/(u)random conclusion

After all this, we now know the following:

This is hardly an interface which is easy to use correctly (and securely).

On some OSs there may not be any way to use it correctly (and securely).

Applications not running with Superuser privileges may have no way to access the random device.

An application at its file descriptor limit cannot use it.

Many portability concerns.

For these reasons, OpenBSD created arc4random_buf() in the first place. It doesn't suffer from these above problems. The other BSDs also copied it, although they may be running on older less secure implementations.

Alternatives

In addition to a CSPRNG in userspace, the BSDs also allow for a way to get entropy directly from the kernel:

KERN_RND may also be replaced with KERN_URND, KERN_ARND, and others on the various BSDs. FreeBSD also has a similar interface to determine if the kernel's CSPRNG is properly seeded, see the FreeBSD user manual page for more details. DragonFlyBSD also provides read_random() and read_random_unlimited() as direct no-nonsense interfaces to the underlying devices for /dev/random and /dev/urandom.

However, this interface was removed a couple of versions back. Kind of makes you wonder why there is no sane simple to use standardized API everywhere that doesn't depend on a house of cards. This is the kind of thing you would think would be standardized in POSIX.

Conclusion

There does not seem to be any good cross-platform techniques out there. Anything done with the current technology will probably require a ton of conditional compiles, annoying checks, and ridiculous workarounds. Although, there should be enough basis here to come up with something with a high confidence level. It'd be interesting to see what the LibreSSL team (or the OpenSSH team) comes up with when they get around to porting what's needed here. As I said before, until then, avoid the non-official ports out there, which use poor sources of entropy like an insecure use of /dev/urandom on Linux, which then falls back on gettimeofday(), and is used to generate long-lived keys.

44 comments:

I originally blew off the concerns about an attacker monkeying with /dev/urandom because an attacker would need to have administrative privileges to do so, and if they have that they can already do a lot more damage (Raymond Chen refers to this as requiring the attacker to be on the other side of an airtight hatch: https://www.google.com/search?q=airtight+hatch+oldnewthing ). However, the machine you're one may not be the attacker's final target, your SSL keys might be (or, rather, your customers' private data might be the final target, and the attacker hopes to get that data through administrative rights on the machine).

Max: I was already asked to write this yesterday, and was half way done by the time you wrote your first comment. So yeah, it seems quick :)

Now, regarding your points:Not necessarily an attacker either, The NetBSD manpage *recommends* you shuffle around your random devices to what you think is best. If you program to expect one design and get the other, you're going to get problems.

And don't forget that it may be seeded incorrectly. Thankfully you can use the ioctl() to check for that.

In any case, one shouldn't make assumptions in secure programs, they should be verified, and (/dev/urandom || gettimeofday()) is a very poor source of entropy. The first may fail for a multitude of reasons, and the last one, especially when done terribly, is a joke.

Hi, thanks for the feedback insane coder. I had posted this version more as a PoC to the OpenSSH mailing list, you can follow the discussion there: http://www.gossamer-threads.com/lists/openssh/dev/58537

I'll note your comments on the project issue tracker so that they're not lost.

I'm not saying you should follow any of these blindly, but take a look at:https://github.com/libevent/libevent/blob/master/arc4random.chttps://github.com/nmathewson/libottery/blob/master/src/ottery_entropy_urandom.c

ah, i thought you were asking if the device an fd was referencing could change between ioctl and read.

the state of the device can obviously change between syscalls unless it makes attempts to be opened by only one thing at a time, and isn't affected by something like hotplug or interrupts. however dup and fork kind of ruin that too.

Hmm after reading the article again, it seems you did not explicetly state this fact :-)

But:So, how does one port the arc4random family to Linux? Linux is well known for inventing and supplying two default files, /dev/random and /dev/urandom (unlimited random). The former is pretty much raw entropy, while the latter is the output of a CSPRNG function like OpenBSD's arc4random family

The thing with "pretty much raw entropy" caught my eye. According to the mentioned article, /dev/random is the same as /dev/urandom, but will block if the super-mega-magic entropy-measurement function thinks it does not have enough entropy.

I tried giving an explanation which should be simple for a layman to follow of what goes on. I wouldn't take it as precise fact, especially when there's a washing machine involved in the explanation ;)

The issues between /dev/random and /dev/urandom is slightly more complex than just blocking. How does /dev/urandom keep supplying an unlimited about of data? Will its algorithm ever start becoming predictable? How does this interact with seeding the algorithm in arc4random, which also reseeds from time to time?

As is well known, the primary issue with /dev/urandom is when you use it in a library which uses it once to seed, and is started at bootup, say within Apache. To combat the issues there, 1) I showed that the ioctl() can be used to see if /dev/urandom has entropy (whether that works properly or not is another matter). 2) I suggested that arc4random_buf() should in fact be a direct wrapper around /dev/urandom, so you never have to worry about some initial poor seed infecting an application's entire life time, as you'll just call it directly each time you need some random.

The RANDOM_UUID sysctl wasn't removed, but the syscall mode of access was simply made optional with the kernel configuration option CONFIG_SYSCTL_SYSCALL.

Ubuntu 14.01 still builds with sysctl syscalls, and so the method I posted still works, while Red Hat apparently removed them awhile ago. I haven't tested other distributions.

This is pretty lame on the part of Red Hat, because quite a few applications use the RANDOM_UUID method. You can still, for example, cat /proc/sys/kernel/random/uuid, you just cannot access it without /proc.

Here's a choice quote from the Tor source code:

/* On linux, sysctl is deprecated. Because proc is so awesome that you shouldn't _want_ to write portable code, I guess? */

Any randomness should be first ran through some algorithm to remove biases that a single source may carry.

If you're very interested in this, you may want to read:http://eprint.iacr.org/2013/338.pdfhttp://www.oreilly.com/catalog/secureprgckbk/chapter/ch11.pdf (this covers real world use too!)

There's also been some talk lately about a backdoor in Linux's RNG, read through some commit messages here: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/drivers/char/random.c?showmsg=1

Some random devices may not too secure, check out /dev/srandom and /dev/prandom on MirBSD. This page covers a lot of interesting material regarding different devices on different OSs and testing: https://calomel.org/entropy_random_number_generators.html

Lastly, you're assuming that arc4random is secure with a good random seed. I personally haven't seen any papers on its strengths and weaknesses, or how it compares to alternatives. I'd love if someone who really knows their stuff would determine how arc4random stacks up to Linux's method, and the pros and cons of one seeding the other. If arc4random is truly a real improvement that can be applied after a /dev/urandom step, then there's no reason why it shouldn't be added as the final pass to the output of /dev/(u)random itself.

Yet, I saw that a couple of days ago.Note, that is being constantly worked on, so my remarks may no longer apply by the time you read this.

1) It seems they noted some of the information presented here, so that's good.

2) They seem not to handle some issues mentioned here, or in my other article.

Notably, they're relying on /dev/urandom to be returning good data, which may not be true close to system start up. The approach they're taking if reading /dev/urandom fails should really be used in addition to /dev/urandom, not instead of. Combining data from all over (hashing it all together) will serve to improve the basic situation. Of course there is still the question as to whether not being able to combine in /dev/(u)random as one of the sources should result in an abort.

They're using the sysctl() call on Linux, which doesn't exist. I don't know which distribution has an actually system call there (even with what they're doing), but not a single one of my servers, which have multiple distros, some even 5 years old, supports it. Even if a distro does offer that function, I'm not sure it does what they think it does.

Since I don't have access to a system which supports it, I haven't actually tested the function to see what it returns. But based on some of the information I've read about it, it may only return a limited amount of data, and it won't be in the format you're expecting. Meaning, you really need to check how much you're receiving, and loop if need be. Also, you'll have to convert the encoding of the data to raw binary.

This may also be entirely moot, because if the function does work, it may very well just wrap to file system reads, in which case you're not really gaining anything, and you surely cannot trust the data you receive from it.

However, you may be able to mount proc in cases where you cannot mount devtmpfs (or whatever flavor alternate is available), in which case it can be worthwhile to read it via the file system directly, perhaps with some preads to ensure successive calls are returning different results.

3) There's a lot more they can be doing to gather entropy, especially for bad situations. As I described in my other article, grabbing hardware serial numbers, perhaps from the network devices can help some situations. SIOCGIFCONF and friends can help, and on some OSs, that will also return some real entropy from the network devices. There's getifaddrs() too, although that can be error prone if you're not careful, and you really need to test how you're parsing the returned data if certain whacky network devices are present (with SIOCGIFCONF, you can just hash the buffer as a whole, and the only parsing you need is perhaps for simple SIOCGIFHWADDR calls on OSs where available).

On x86 CPUs, the cpuid instruction can also return some unique to processor details, and on modern ones, entropy as well, as some cpuid calls return details about temperature and power.

4) Taking the address of main is a bad idea. Depending how the library is linked/loaded, that's going to fail.

5) Based on the idea of taking addresses, sbrk(0) should be mixed in. Address of environ can be good too. Linux has 3 pointers, etext, edata, end, on Solaris, _etext, _edata, _end, which point to different areas of the memory layout should be mixed in too. Some other C functions may also be in separate libraries and can move around a bit, for example, malloc, clock_gettime, and pthread_mutex_lock, so those are good in addition to printf.

So if clock_gettime() is present, it'll use the most accurate real time, and for those also supporting monotonic time, the most accurate one of that. Of course their approach of mixing in multiple of these is good, but they're not always hitting some of the more important ones.

7) The use of gettimeofday() when you have clock_gettime() is pretty pointless. However, alongside clock_gettime() of day, it can be useful to use clock_getres() for each timer, as those will differ system to system.

8) times() may have other useful data to mix in.

9) On Linux, there's probably a few more getauxval() values that can be useful to mix in. A fallback for implementing that function for older GLIBCs wouldn't be a bad idea either.

>You mentioned you have your own libraries, is there anything you do in your libraries you have not covered?

All the important things I do is more or less covered in the articles here, and the resources I link to.

As far as collecting entropy goes, the biggest difference between my implementation and LibreSSL's is that I have a list of the random device numbers for OSs where they do not fluctuate (the BSDs fluctuate), and on some OSs, use their APIs for getting the device numbers (for example NetBSD provides such an API). These device numbers are used to verify the devices being opened, and also attempt to mknod() them if they do not exist in the usual location.

For the OSs that support it, I also attempt to mount devfs or devtmpfs.

>If all your suggestions are implemented, would you yourself use the LibReSSL RAND API?

No, I wouldn't.

Today, with the proliferation of virtual machines, especially for hosting, traditional random APIs no longer cut it. As I mentioned in my other article, virtual machines can be rolled back to a particular state, or are sometimes cloned from the same state. These things happen in real hosting environments in response to malfunctions, attacks, or heavy load.

Mixing in the current time (and perhaps MAC address) into the entropy upon the initial machine boot is crucial (see my previous response regarding combining /dev/(u)random with the other functions). Now while that will guard against several problems, it wouldn't help against live state loading, nor cases where the VM hasn't updated the real time yet.

Buffered APIs like arc4random*() and RAND_*() layered over it will be generating identical data in these situations, and can be attacked. Mixing in the current time and any hardware RNGs if available on every use can help, but you really need to be using hedged cryptography.

The APIs provided by OpenBSD and LibreSSL also don't cover all the scenarios I need them to. For example, I sometimes need a uniform number of a particular type within a particular range, and arc4random_uniform() doesn't actually provide uniform results, doesn't simply allow for ranges, and only works with a single type.

When using a standalone command to generate RSA keys, I'd also prefer if it blocked on /dev/random, but this point is debatable.

What do you mean? I see this in the manual "arc4random_uniform() will return a uniformly distributed random number less than upper_bound. arc4random_uniform() is recommended over constructions like “arc4random() % upper_bound” as it avoids "modulo bias" when the upper bound is not a power of two."

"only works with a single type"

So what? You can easily extend the functions to other types as needed.

The essential of the idea is that you should ensure the randomness generated is generated specifically for whatever it is you're generating it for. So for example, if you're generating a random temporary password for a user, you mix the user's ID or user's name into the entropy pool before obtaining the random data. If you're generating a temporary symmetric encryption key for a remote connection, you can mix in the IP address, and so on. This ensures that if two different users or machines or whatever are working off the same state, they'll still get different results between the two of them.

This will need a slightly different API, and applications have to make use of it where applicable, but OpenBSD doesn't even provide an underlying API to accomplish this. They used to have an arc4random_addrandom() function which could theoretically get the job then, but they since removed it. In any case, that API was poorly designed, the hedged randomness function should combine the various functions as needed internally to prevent misuse, and ensure there aren't any gotchas occurring between the two calls.

>I see this in the manual "arc4random_uniform() will return a uniformly distributed random number less than upper_bound. arc4random_uniform() is recommended over constructions like “arc4random() % upper_bound” as it avoids "modulo bias" when the upper bound is not a power of two."

Yes, but that's not actually true. Like most naive implementations out there, the results aren't uniform and contain bias. Test it yourself if you don't believe me.

Unfortunately, with the UNIX principal of do just one thing and do it well, there's a severe lack of higher level functions in the various APIs. Sure, every now and then someone invents a higher level function like asprintf(), but for the most part, you find UNIX libraries filled with code repetition, and OpenBSD is no exception.

Take this example, which I noticed someone pointed out on OpenSSL Valhalla Rampage, see: http://www.openbsd.org/cgi-bin/cvsweb/src/lib/libssl/src/crypto/engine/eng_list.cIn revision 1.11, issetugid() was placed before every getenv() call throughout the library. This kind of usage already indicates there should be a combined function. If not in the C library, at least internal to LibreSSL itself. In fact, GLIBC already noticed this issue is frequent and supplies a function secure_getenv().Now look at revision 1.12, theo had to correct a mistake he made in using these two functions separately. That would likely not have happened if there was a combined function call available.

POSIX is finally starting to get it, and is adding more functions for atomicity reasons. Unfortunately this messed up development style and the ensuing issues and complications is quite prevalent throughout OpenBSD and other projects. As I mentioned in this article, read() is problematic, and generally needs to be wrapped in checks for EAGAIN, EINTR, and general looping. Yet these projects generally just employ the same code repetition everywhere leading to ugly bloated code and potential problems.

I'm all for not complicating the underlying C library, and providing new functions which may have questionable APIs ending up in public use. But something of LibreSSL's nature should have its own internal higher level functions to use wherever needed. I'd welcome another several thousand line source reduction by doing so.

Which can be used for generating temporary passwords using certain allowed characters, ensure the returned random string is hedged, and null terminated.

Now such functions could easily be built from more primitive functions, but we want to ensure our developers use these things correctly, and provide higher level functions for both ease and correctness. It's a good thing too, because shortly after creating these, a coworker of mine pointed out how these two functions had a subtle bug due to the interactions of the primitives used to build them. I'm glad I only had to fix that in one place, and not throughout a ton of code recreating these.

To give you an idea of what I'm talking about here, my random API has over 70 functions. My string APIs (which go way beyond the silly things like strcpy(), strncpy(), strlcpy()) has several hundred functions. With consistent naming and APIs which were well thought out, developers also remember the different ones, and how they work and don't need to keep looking them up. More than once, one of our developers came across a certain situation where they needed some particular string handling function, just typed in what they thought it would be called if it existed along with its parameters, and figured they'd go add it to our massive list, but turns out it was already put there by a previous developer, along with the API they thought it would have. Now that's intuitive and useful, but unfortunately rarely exists, and all too often, a lot of functions leads to confusing misdesigned APIs. So don't expect such things to work in anything as messed up as a public C library. More often than not, in such cases, you end up with the garbage Microsoft provides.

Note: I'm posting this from my business account because I can apparently only post with a Google account here, but my employer has nothing to do with what I write here. It's not that I can foresee any conflict of interest, but you can never be too careful with these things.

> Rip out the random function and replace with one of your own which returns 0, then 1, then 2, etc...>> Replace the loop with a failure mode.>> Collect statistics on returned data.

Assuming I understood you correctly, I did exactly that for a few different upper bounds (not exceeding 42) and found no differences between the expected and actual distributions. Could you clarify when the problem occurs? Does it only happen with large upper bounds? Could you post some sample code, perhaps? I've studied this function in serious detail and am absolutely convinced that it's 100% correct, so your claims that it does the wrong thing have me puzzled. Also, if this is (in your words) a naive implementation, what would a proper implementation look like?

I modified the function to use a uint8_t instead of uint32_t, upper bound of 255 (254 really the way it works), and collected data. Suffice it to say, it fails, although performed better than some other approaches I saw.

I can't comment on using it with 42, I'm not aware of any native integer size which provide a limit of that number.

The function is incorrect, because the distribution is not uniform. For every single upper bound, the amount of hits for each number below it should be equal to every other number. For the maximum, the value of every number should be exactly 1. For some values for various upper bounds, you'll have results which do not match that.

If in proper testing, it did work correctly, perhaps that was of a version other than the ones I looked at. What I said certainly held true of the latest version in CVS at the time I wrote the above. I can double check with latest CVS again when I get a chance.

If need be, I can write an article about proper testing and implementation.

That function did not change in the OpenBSD tree since 2012, although recently it was moved to a separate file, without any changes to the actual code.

I did not change the return type of the random function, but 42 was the largest value of the upper bound argument I supplied to the uniform distribution function. Larger values require a larger array to store the results and more time to verify them, so there is an upper limit to the values that are feasible (although it's obviously a lot higher than 42).

I you changed the type to uint8_t, you may have run afoul of integer promotions. I would check that first if I were you. Of course, if you post your code here, everyone can take a look at it to see what your approach was.

Well, I feel like we're going in circles. Yes, it might be interesting to write an article about how to handle this properly, but in the meantime there's a more immediate issue to take on.

You wrote that arc4random_uniform is not implemented correctly. If this is true, it's a serious problem that needs to be solved immediately, given that this function has become something of an informal standard that half the world is now using. However, if it's not true, we need to clear that up, so as not to cast unnecessary doubt on a function that's actually correct.

I agree that only an exhaustive test can serve as proof that a function is correct. In fact, even an exhaustive test isn't necessarily sufficient, given that there are differences between platforms, compilers, etc. I did not claim that I performed an exhaustive test, but I did try to keep the alterations to the function minimal. I was already convinced that the function was correct based on an in-depth analysis of its code, and running the test (based on my initial understanding of what you did) reassured me. It is, however, still theoretically possible that I'm wrong and that the function is defective. Given that I was the one who suggested the latest change to this function, after MUCH deliberation, I'd like to consider myself something of an expert ("in a very narrow field", to quote a recently released humorous movie about being the only engineer in business meetings). That's why I believe I may know the explanation for your test results.

What you seem to have done (again, this is based on my understanding of what you wrote here, so please do correct me if I'm wrong) is to replace uint32_t with uint8_t and test that version exhaustively. While you may have performed an exhaustive test on your altered version, you did not test the original function exhaustively, nor did you technically test the original function AT ALL. This is important, because it means that conclusions about the correctness of the altered function (no matter if they are positive or negative) do not necessarily carry over to the original function.

Allow me to elaborate. There's an implicit assumption in the current code that uint32_t is not a candidate for integer promotions, i.e. that the arithmetic will be carried out in 32 bits, not in 64 bits or some other number of bits that is not 32. This assumption holds for all platforms on which OpenBSD runs, and for the platforms that have imported this code it probably holds as well. However, similar assumptions do not hold for uint8_t. Such values will be promoted to 32 bits (signed, probably) and that will affect the result. So, if you replaced all occurrences of uint32_t with uint8_t, you broke the code. Results obtained by running the altered function will be incorrect. However, this does not say anything at all about the correctness of the original function.

Will you help me confirm my hypothesis? If so, and if my analysis of how you performed the tests is correct, then please correct your altered version and report back on your findings.

My testing framework includes many uniform functions found in various libraries and books (most of them considerably easier to correctly port to uint8_t and this API than OpenBSD's method). Nearly every implementation I've ran across fails an exhaustive test (and nearly every port was double and triple checked for correctness), and it seems in my haste to recently add OpenBSD's to it, I mistakenly casted the wrong component, and assumed that like many other implementations it carried similar mistakes.

I'm sorry for the false alarm, and can now confirm with exhaustive tests that OpenBSD is one of three implementations that I know of that are correctly implemented.

If one cannot admit when they make a mistake, they should not be allowed to write anything.

>What are the other two correct implementations, by the way?

The other implementations are libottery and a corrected version I wrote based upon the one found in the Secure Programming Cookbook (In the SPC itself, it mistakes > for >=, and mishandles the case where the specified range is the entire range of possible values, but fixing it is easy).

Since I now have 3 working implementations, I ran an additional statistic regarding reloop, and noticed that OpenBSD unlike the other two has optimal relooping in all scenarios. The other two both cut off a whole set of random values when the range is a power of 2 minus 1. Meaning their retry random value is a bit worse than 50% on power of 2 minus 1. Nothing terrible, but can be slightly slower to find a random value in those cases.

Additionally, OpenBSD and my corrected SPC implementation has a faster algorithm than libottery, as the last division operation for the formers is only ran when the value is valid, whereas the latter need to run a division upon each random value received, even if thrown away.

Interesting stuff. It looks like there would be more than enough material for a real article, if you can find the time for it. This subject is still poorly understood by many, but the awareness is growing. A whole generation of programmers has grown up with the modulo trick, without realizing that it's subtly (and sometimes not so subtly) wrong. They are now slowly waking up to the reality that they can't do it that way if the distribution has to be truly uniform.

> It looks like there would be more than enough material for a real article, if you can find the time for it.

Unfortunately I have a backlog of 3 articles that are partially completed, and I keep getting requests from others (many regarding LibreSSL, especially lately regarding fork() protection), and I still have real code that needs to be written.

> A whole generation of programmers has grown up with the modulo trick, without realizing that it's subtly (and sometimes not so subtly) wrong.

Indeed, however it's so much worse, and there's many libraries and books which claim to offer something better, and sometimes they are a bit better, but fall short of the ideal.

Of the ones I reviewed, the algorithms are usually far off the mark. The only reason why I corrected the one from SPC is because it was so close to correctness. Perhaps in later printings of the book they even corrected it, I know they used to publish a lot of errata.

As I mentioned above, there's also so many other issues with random related functions. Needing random strings is common, and some of the solutions I've seen are horribly broken. If you think uniformity is bad, you should see how some developers try to generate a password which matches minimum requirements of certain classes of characters (I saw one implementation which affixes A0! to the end of the string, and fills the rest with mersenne twister % 26 + 'a'). I've only ever seen one correct implementation for this. Nobody gets floating point random correct either in a portable manner, and I haven't come across a single ideal implementation yet.

Jorden, regarding your comment about people not necessarily knowing why modulo is wrong, I did write about it here with pertinent details and an example, along with a number of other issues: http://insanecoding.blogspot.com/2014/05/dealing-with-randomness.html

I tried to be brief with this and the other issues, leaving just enough to get the point across. If there needs to be more elaboration on certain issues or solutions, I can do so.

Yes, you mentioned but the problem, but not really the solution, apart from "So in the above scenario, if 30 or 31 occurs, a new random number should be chosen". While that is technically correct, I would not trust most programmers to write a correct implementation based on that sentence alone. ;)

I came up with this approach independently (although, in my case, it was an optimization of my earlier "just keep requesting random numbers until the result is in range" approach, which (for single bytes) isn't actually as unreasonable a solution as it may sound) and wrote my own implementation a bit more than two years ago, only to find out that I wasn't the first to think of that solution. I know of several implementations, but you seem to have done more in-depth research.

It seems that the results of your testing would be an interesting basis for an article. Of course, I understand that you may have other priorities.