(I repeated this for 31-bit safe prime generation and got almost the same proportions.)

Findings:

The OpenSSL output covered only about half of all safe primes in the given range (2^32 -> 3059799).

Also, within the range of the returned values, only 90% of the available safe primes were reported.

I would like to understand why OpenSSL's safe prime output is restricted in this way, and whether this is considered an optimal result, bearing in mind the crypto functions in the library that rely on the safe prime generator.

1 Answer

The smallest safe prime you got from OpenSSL was 3221226167 = 0xC00002B7. The largest was 4294967087 = 0xFFFFFF2F. This makes me hypothesize that OpenSSL is setting the two high bits to one and choosing the rest of the bits randomly. If that is accurate, it would explain the range of primes you observed.
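As a quick sanity check of that hypothesis, here is a minimal Python sketch. The helper `force_top_two_bits` is a hypothetical name used only for illustration; it is not OpenSSL's code, just a model of "set the two high bits, leave the rest free".

```python
def force_top_two_bits(x, bits=32):
    """Model of the hypothesis: keep the low bits of x, force the top two bits to 1.

    Hypothetical helper for illustration; not taken from OpenSSL.
    """
    mask = (1 << bits) - 1
    return (x & mask) | (0b11 << (bits - 2))

# Smallest and largest 32-bit values that can result under this model.
lo = force_top_two_bits(0)
hi = force_top_two_bits(0xFFFFFFFF)
print(hex(lo), hex(hi))  # 0xc0000000 0xffffffff

# The extremes reported in the question fall inside that window.
observed_min, observed_max = 3221226167, 4294967087
in_window = lo <= observed_min <= observed_max <= hi
print(in_window)  # True
```

Both observed extremes sit close to the edges of the window, which is what one would expect if the remaining 30 bits were drawn freely.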

As for why your sieve found more safe primes in the range between 0xC0000000 and 0xFFFFFFFF than OpenSSL ever returned, I don't have a good explanation. I would have predicted that your experiment with OpenSSL should turn up approximately 96.2% of all safe primes in that range. I don't know why you only saw 89.1% of them. Is it possible that OpenSSL has a slightly different definition of safe prime than the one you used in your exhaustive prime sieve?

Oh, why 96.2%, you ask? It has to do with the coupon collector's problem. If you have $n$ different objects, and you randomly draw $n$ times (sampling with replacement), you don't expect to observe all possible objects. Instead, some objects will have been chosen multiple times, and some will not have been chosen at all. In your case, there are apparently 3059799 different safe primes (objects). You draw $10^7$ times (with replacement). What fraction of safe primes will be missed? Well, let's fix a single safe prime. The probability that it is missed on any one draw is $1-1/3059799$. The draws are independent, so the probability that it is missed on all of the draws is $(1-1/3059799)^{10^7}$, which is approximately $0.038$. In other words, we'd expect a $0.038$ fraction of safe primes never to have been output by OpenSSL in your experiment, assuming OpenSSL samples uniformly at random from all possible safe primes. This corresponds to covering 96.2% of the safe primes during the $10^7$ draws you did.

Another possible explanation for your observed data is that maybe OpenSSL does not choose uniformly at random from all possible safe primes in that range; maybe some safe primes are slightly more likely to be chosen than others. For instance, one possible method for generating a prime number at random is to select an integer $m$ at random and test whether it is prime; if not, test $m+1$ for primality; if that's not prime, test $m+2$; and so on, continuing until you find a number that is prime. This procedure does not generate the uniform distribution on primes of a certain size: primes that are preceded by a large gap (a long stretch of composite numbers) will be more likely to be drawn than others, and primes $p$ where $p-2$ is also prime will be less likely to be chosen. It is possible that OpenSSL is using some sort of random prime generation strategy like this. If it is, then this affects the coupon collector's calculation, and we'd expect to see somewhat more repeats. So, your observed numbers are consistent with this hypothesis. If this is how OpenSSL is generating primes, it is a deviation from the uniform distribution on safe primes, but this deviation is not believed to cause any security problems. The number of possible safe primes is still ginormous (the entropy of the distribution is still plenty large), so there's no known way to take advantage of any non-uniformity that results.
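To make the bias concrete, here is a small Python sketch of the incremental search described above, run on ordinary primes in a tiny range rather than on safe primes. It is an illustration of the general procedure, not a model of what OpenSSL actually does; all function names are hypothetical.

```python
from collections import Counter

def is_prime(n):
    """Trial-division primality test; adequate for tiny illustrative inputs."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def next_prime_at_or_after(m):
    """Incremental search: test m, m+1, m+2, ... until a prime is found."""
    while not is_prime(m):
        m += 1
    return m

def incremental_search_counts(lo, hi):
    """For every starting point m in [lo, hi), count which prime the search returns.

    Enumerating all starts gives the exact distribution for a uniform random m.
    """
    counts = Counter()
    for m in range(lo, hi):
        counts[next_prime_at_or_after(m)] += 1
    return counts

counts = incremental_search_counts(90, 114)
print(sorted(counts.items()))
# [(97, 8), (101, 4), (103, 2), (107, 4), (109, 2), (113, 4)]
```

Out of 24 equally likely starting points, 97 (preceded by the composites 90 to 96) is returned 8 times, while 103 and 109 (each the larger member of a twin-prime pair) are returned only twice. Each prime's weight equals the gap in front of it, so the output is visibly non-uniform.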

This gives you some hypotheses and potential explanations you could test further. An authoritative answer might require spelunking through the OpenSSL code (which is not much fun at all!).

'not much fun at all' is the understatement of the century! I weep quietly every time I dive into the OpenSSL source.
– Reid Sep 14 '13 at 2:09

@DW: I had a slightly different hypothesis (it came to me in a dream last night ;-): maybe OpenSSL does prime generation within 2^31<p<0.5*2^32 and then tests 2p+1 for primality. That would fit the resulting safe prime range, and it would be consistent with your explanation via the coupon collector's problem (thanks for this in particular). And yes, the frequency with which each safe prime is chosen has a rather strange distribution (the histogram of chosen safe primes resembles a 1/x graph). (By the way: I looked into the OpenSSL source but nearly lost my confidence.)
– ABri Sep 14 '13 at 8:21