The following code comment comes from PHP, a free and open source project. I have done my own research and cannot find any evidence to support the argument made in this comment, so the only tool left to support or refute the statement is mathematics.

Prove or disprove that taking the modulus of a random number n by any value M such that M < n decreases entropy. To clarify, the operation is n mod M, and the constraint is 0 < M < n. The second part of the proof is to show that n' = a + n(b-a+1)/(M+1) does not decrease entropy under the same constraint, where n is any random value and M is any value smaller than n and greater than zero.

Or, to put it another way: how is the value produced by n' = a + n(b-a+1)/(M+1) more random than the value produced by n mod M, for the same values of n and M such that M < n?

Any helpful information on this topic will earn a +1 from me. Proving that n mod M skews the distribution of random values over a range would be a HUGE accomplishment, in my humble opinion.

Thank you for your time.

/*
* A bit of tricky math here. We want to avoid using a modulus because
* that simply tosses the high-order bits and might skew the distribution
* of random values over the range. Instead we map the range directly.
*
*
* We need to map the range from 0...M evenly to the range a...b
* Let n = the random number and n' = the mapped random number
*
* Then we have: n' = a + n(b-a)/M
*
* We have a problem here in that only n==M will get mapped to b which
*
* means the chances of getting b is much much less than getting any of
* the other values in the range. We can fix this by increasing our range
* artifically and using:
*
* n' = a + n(b-a+1)/M
*
*
* Now we only have a problem if n==M which would cause us to produce a
* number of b+1 which would be bad. So we bump M up by one to make sure
* this will never happen, and the final algorithm looks like this:
*
*
* n' = a + n(b-a+1)/(M+1)
*
* -RL
*/
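For readers who want to try the final formula from the comment, here is a minimal Python sketch. The function name `map_range` is hypothetical, and rounding the division down to an integer is an assumption of the sketch; the quoted comment does not say how the division is rounded.

```python
def map_range(n, a, b, M):
    """Map n in 0..M (inclusive) onto a..b (inclusive) by scaling.

    Assumption: the division is integer floor division.
    """
    return a + n * (b - a + 1) // (M + 1)


M = 7
# Every possible input 0..M mapped onto the range 1..3.
mapped = [map_range(n, 1, 3, M) for n in range(M + 1)]
print(mapped)
```

With these small values, n = 0 lands on a and n = M lands on b, so the result never reaches b+1. Whether the outputs are evenly spread is a separate matter.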

3 Answers

Suppose you choose a number n uniformly at random from {1,2,3} and then calculate n mod 2. Then you will get 1 twice as often as 0.
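The counting behind this example can be checked by enumerating the three equally likely inputs. This is only an illustrative Python sketch:

```python
from collections import Counter

# Tally n mod 2 over the three equally likely values of n.
counts = Counter(n % 2 for n in (1, 2, 3))
print(counts[1], counts[0])
```

Two of the three inputs give 1 and one gives 0, so the probabilities are 2/3 and 1/3.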

Another reason why code like this avoids taking the modulus is that for many old pseudorandom generator routines the lower-order bits were not as "random" as higher-order bits. This is no longer a problem with modern libraries.

Let's clean this up. First, assume a = 0 (if not, just add it on at the end). Second, don't deal with closed intervals (0 <= n <= M, 0 <= n' <= b) but with half-open intervals (0 <= n < M, 0 <= n' < b). This second simplification is exactly the 'bit of tricky math' in the comment, which just replaces M by M+1 and b by b+1.

So now we have a random number 0 <= n < M, and we want to transform it into a random number 0 <= n' < b, where b <= M. This is an interesting question (if not necessarily the one that the OP asked). The solution in the OP's code fragment is to take n' = nb/M, rounded down to an integer. Assuming that b doesn't divide M exactly (for then there is no problem), n' is not uniformly distributed: M mod b of its values occur with probability ⌈M/b⌉/M, and the remaining values occur with probability ⌊M/b⌋/M. Exactly the same is true if we take n' = n mod b; there is no statistical difference.
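To see this with concrete numbers, one can enumerate every n for a small M and count how often each output occurs under both mappings. The sketch below assumes the scaling is done with integer floor division; `outcome_counts` is a hypothetical helper name.

```python
from collections import Counter


def outcome_counts(M, b):
    """Count how many n in 0 <= n < M map to each output value."""
    scale = Counter(n * b // M for n in range(M))  # n' = floor(n*b/M)
    modulo = Counter(n % b for n in range(M))      # n' = n mod b
    return scale, modulo


M, b = 10, 4
scale, modulo = outcome_counts(M, b)
print(sorted(scale.values()), sorted(modulo.values()))
```

For M = 10 and b = 4, both mappings hit M mod b = 2 of the outputs three times and the other two outputs twice. The favoured outputs differ between the two mappings, but the amount of skew is the same.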

If M/b is big enough, then ⌈M/b⌉/M and ⌊M/b⌋/M are nearly equal, and this may be good enough for your application. If not, then the simplest solution is the following:

Loop:
    Generate random 0 <= n < M
    if (n < M - (M mod b)) return n mod b
    goto Loop

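The problem is that this algorithm is not guaranteed to halt within any fixed number of iterations! Each pass, however, is rejected with probability (M mod b)/M, which is less than 1/2 whenever b <= M, so the loop terminates with probability 1. If you must cap the number of passes, a bigger and bigger bound gives a closer and closer approximation to uniformity.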

Note that for this algorithm, it is important to use n mod b instead of nb/M, because the check for uniformity (n < M - (M mod b)) is simple.
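The loop above could be sketched in Python roughly as follows. The function name `uniform_below` and the `rng` parameter are hypothetical names introduced for illustration, and the sketch assumes that `rng(M)` returns a uniform integer with 0 <= n < M, as `random.randrange` does.

```python
import random


def uniform_below(b, M, rng=random.randrange):
    """Return an integer 0 <= n' < b drawn from a source of 0 <= n < M.

    Assumption: rng(M) is uniform on 0..M-1.
    """
    if not 0 < b <= M:
        raise ValueError("need 0 < b <= M")
    limit = M - (M % b)  # largest multiple of b that is <= M
    while True:
        n = rng(M)
        if n < limit:    # accept only values below the multiple of b
            return n % b
        # otherwise reject n and draw again


print(uniform_below(4, 10))
```

Values of n at or above `limit` are discarded, so each residue 0..b-1 is backed by the same number of accepted inputs.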

Yes, that's right, George. I do not know how this relates to practical issues in the original program or in your program, and I especially do not know what situation would allow one to define a value for my letter $l$ and then not allow my $k = l$. But, assuming you are required to take $k > l$, note that if $k$ is very much larger than $l$, the bias towards some numbers will be slight. Anyway, I make a standard offer: if you click on my name, there are instructions for finding my email address. You may write to me if that might help; I have done enough programming.
– Will Jagy, Aug 14 '10 at 21:55