random will always be random..., unless you go with a true round robin
– ratchet freak, Sep 1 '13 at 15:23


Do you understand variance in random distributions? What you've described is an example of a multinomial distribution, so the variance of each of your counts will be n · (1/3) · (2/3), where n is the number of trials. Your observed proportions will always jitter around the mean value of 1/3.
– Charles E. Grant, Sep 1 '13 at 16:18

For what it's worth, I just ran a variant of that algorithm using FreeBSD's rand() routine, and while it didn't look particularly balanced by eye, it never approached the skew in the results that centosuser posted.
– Aidan Cully, Sep 1 '13 at 17:25

Random means having a "lack of pattern or predictability in events". Your definition of random seems a little different. Given a trillion tries you'd expect a random distribution to be approximately 1/3, 1/3, 1/3, but only approximately. If with only 1000 tries you got 333, 333, and 334, I would strongly suspect that the random function was broken.
– Richard Tingle, Sep 2 '13 at 12:11

3 Answers

(This part of the question was associated with revision 2 and is still quite applicable to that problem)

The standard approach to doing this is to populate an array with the appropriate weights:

[x, x, x, x, x, x, y, y, y, y, z, z]
 1  2  3  4  5  6  1  2  3  4  1  2

and then pick one at random.

Please note that random is random. It is very unlikely that you will ever see a perfect distribution.

If you want to guarantee the distribution, you need to (mostly) do away with the randomness.

Instead, shuffle the above array. For example, one shuffling would give you:

[x, y, z, y, x, x, y, x, z, y, x, x]

And then loop over the array again and again, returning the element at the current index. It isn't random anymore, but it guarantees that the distribution always remains exactly what you are looking for. This is just a slightly more 'random' version of a round-robin distribution.

If this is attempting to load balance something, round robin is a better approach, and you might want to consider an evenly distributed round robin:

[x, y, x, y, x, z]

This has the same distribution as above, but tries to keep the picks evenly spaced so as not to saturate any one resource.

As an aside, on the question of rand being poor: you may be dealing with an older standard library. From the rand(3) man page:

The versions of rand() and srand() in the Linux C Library use the
same random number generator as random(3) and srandom(3), so the
lower-order bits should be as random as the higher-order bits.
However, on older rand() implementations, and on current
implementations on different systems, the lower-order bits are much
less random than the higher-order bits. Do not use this function in
applications intended to be portable when good randomness is needed.
(Use random(3) instead.)

This can be demonstrated with a C program that spits out the low byte and low nybbles of consecutive rand() calls. I happen to have a fairly modern system and don't have access to anything that would demonstrate this low-order non-randomness.

You can test it on your system with the following code which looks at the low byte, low 4 bits, and low 2 bits.

The reason for testing the low bits is that you are doing things with % 9, which is sensitive to patterns in the low bits. You may find that your random number generator is old, in which case you might need to implement your own (if that degree of randomness is something you need).

One approach would be, well, to implement your own. The Mersenne twister is quite well regarded, and you can find implementations for a wide range of languages quite easily.

The other option is to shift the random number you get right by 8 bits to discard the less-random low-order bits, and use what remains.

You can get an idea of the poor quality of rand in some generators from Random.org.

For example, rand() called by PHP on Windows produces the following:

(from Random.org)

That is quite certainly not good randomness.

There is a bit more to this: the math itself can produce an uneven distribution even when the underlying source is uniform.

The code:

double r = rand() / (double)RAND_MAX;

rand() returns a number in the range [0, 32767]. That range isn't evenly divisible by 9 (or 3), so some values will be oversampled. Assume we are dealing with % 100: the values in [0, 67] will come up more frequently than the values in [68, 99]. This introduces a bias into the random numbers, which is problematic.

No matter how you try to fix this with rand, you will run into the pigeonhole principle, which states that if you try to put n items into m containers where n > m, at least one container must hold more than one item. This seems obvious, but it's exactly what happens when you try to map rand's output onto a smaller range.

If you really only want a range of [0, 2], there are appropriate calls that can give you the uniform distribution you are after.

The following code (from the video linked below) uses the Mersenne twister to produce a uniform distribution:

Note that this is using a deterministic engine with a random seed. It is not cryptographically secure, but it does provide high-quality, fast random data. The key is that you are using the distribution correctly. You could use random_device for all your data, but that consumes entropy from your system that you probably don't need to spend there (see the questions about running out of entropy on Server Fault). The twister is good enough for random load balancing.

A really good watch on this is the video rand() Considered Harmful which addresses lots of things about rand. I've borrowed much of this section from that video.

Particular points from the video:

Non-uniform distribution (3:56 .. 5:50 )

(src * 1.0 / RAND_MAX) * 99 is hilariously non-uniform (6:12 .. 7:29)

2.0000001 problems (10:40)

Hello, Random World (12:58 .. 15:30)

If you really want things to be perfectly balanced (exactly 333, 333, and 334 events across 1000 trials of three choices), that isn't random. Use a round robin instead.

Round robin is good when the weights are equal, but when the weights are different there isn't a better solution than using randomness; the problem is that it is not accurate.
– user101114, Sep 1 '13 at 15:57

@centosuser whatever the weights, you can shuffle the weighted array and loop over that. It doesn't matter what the weights are. It is also possible that you've got a poor random number generator; I went into that area with this latest edit.
– MichaelT, Sep 1 '13 at 16:42

@CharlesE.Grant It's also possible that he's dealing with a poor random number generator; there are still quite a few of them out there. Combined with effectively looking at % 9, he's often looking at the low bits, which are often less than ideal. There are some approaches to addressing this (like shifting the low bits off or using a different random function).
– MichaelT, Sep 1 '13 at 16:44

It's probably not perfect (I think rand() returning RAND_MAX itself might give a Y instead of a Z, or something like that), but if the random number generator works properly you'd expect to see a good distribution of X, Y, and Z. The expected deviation from a uniform distribution is far smaller than the skew reported at the sample size you're using now.

This will guarantee that you always have an exact distribution, but the order will be as random as rand() can make it. This works well if the number of choices is low, and you know how many trials you will have ahead of time.

It depends on what you want. If you want something that gives a good weighted distribution over many trials, use @MichaelT's answer. If you want to guarantee an exact distribution and know the number of trials, then do something like this.