Eric Lippert's blog

Main menu

Post navigation

Producing permutations, part six

Last time in this series I presented an algorithm for generating a random number, and the time before I presented an algorithm that turns such a number into a permutation, so we can put them together to make an algorithm that returns a random permutation:

Is this actually a correct algorithm for generating a random permutation? Give it some thought.

Turns out, surprisingly, no, it is not correct. Now, there’s no flaw that I’m aware of in the logic or the mathematics that underlies it. Rather, the problem is that the Random type is a big fat lie. We’d like to have the property that a random permutation generator produces every possible permutation with equal likelihood; that is, it should be unbiased. This algorithm is deeply biased because Random is not a strong enough source of random numbers. Some permutations are less likely than others.

So what’s the big deal? If there were just a small bias, that would be unfortunate but not fatal. It’s actually worse than that. With this bad source of randomness, given a portion of a permutation, you can work out what the rest of the permutation likely is.

And that’s my challenge to you. I’ve posted some code below, putting together everything we’ve seen so far in this series. The first five card dealt when I ran the program were:

39: Ace of Clubs
11: Queen of Spades
29: Four of Diamonds
5: Six of Spades
20: Eight of Hearts

My challenge to you is: what are the next five cards that I generated? And…. go!

25 thoughts on “Producing permutations, part six”

Well, since you’re re-using the same instance of Random, the same sequence of numbers will be generated for a given seed. That’s actually a pretty useful property for online games. What I wonder is: if you create a new instance of Random in the RandomFactoradic method, and therefore get a time-based seed, will the bias still be there?

7: Eight of Spades
31: Six of Diamonds
8: Nine of Spades
47: Nine of Clubs
1: Deuce of Spades

KooKiz – While bias could be a problem in a better algorithm (see the common bias in incorrect implementations of Fisher-Yates), the larger problem here is that there are only a few billion possible decks of cards (the seed is a 32-bit integer), while there are 52! distinct shuffled decks of cards. 2^32 / 52! = 5.3249002 x 10^-59. That’s a teeny tiny percentage of possible decks.

There are 52*51*50*49*48 = 311875200 possible first five cards, so with 2^32 seeds we should see on average 13.77 seeds for a list of 5 starting cards. Given the sixth card, we should see about 0.29 seeds. The first seed I found that matched was 150272048. Will continue the hunt given the Trey of Harts.

You are correct that I underspecified the problem; I didn’t actually bother to check if there were collisions.

I presume that you removed the error checking to make it run faster.

I tried for a while to come up with a technique that was faster than brute force. I think there is one, but I was insufficiently clever. (Clearly there are much faster techniques if you have a less complicated shuffle algorithm, like Fischer-Yates but I couldn’t come up with one for this shuffle.)

Are you making the assumption that the Random ctor without constructor uses Environment.TickCount wich increment the value only by 15 or 16 (avarage 1000/64)? In this case, I think the sixth number isn’t needed.
Knowing Eric’s habits regarding PC reboots would help too.

15: Trey of Hearts
3: Four of Spades
16: Four of Hearts
33: Eight of Diamonds
43: Five of Clubs

The Random seed for this is -781599298.

It was done via a brute force search, but by rewriting the NthPermutation method to short-circuit as soon as the RandomFactoradic value (index parameter) was known to produce a bad permutation (for example, placing any of the numbers 40-51 in one of the first 5 positions of the permutation).
My final NthPermutation method didn’t actually produce a Permutation value, as that wasn’t necessary. It simply returned a bool if the index parameter would cause a proper permutation to be produced.

In case people were interested, here is the rewritten NthPermutation method, now called PermutationCheck. It uses the position from permutation.InsertAt() and validates that putting the value at that position results in a Permutation that gives the correct first 6 cards. Literals are off by 1 from expected due to InsertAt() being called with size – 1. Position values aren’t 0,1,2,3,4,5 because the larger values aren’t yet in the permutation when InsertAt is called for the smaller values.

Here, again, is the PermutationCheck() method. It uses the position from permutation.InsertAt() and validates that putting the value at that position results in a Permutation that gives the correct first 6 cards. Literals are off by 1 from expected due to InsertAt() being called with size – 1. Position values aren’t 0,1,2,3,4,5 because the larger values aren’t yet in the permutation when InsertAt is called for the smaller values.

I let it run all night, and I got the same result as Joel, except that for me the seed is *positive* 781599298, not negative… I didn’t optimize as heavily as Joel though, I just removed the error checking.

I wonder if there is a more efficient way than brute force to solve this problem. Knowing an approximate value of the system’s TickCount would help a lot of course, but TickCount doesn’t seem to always be accurate: I usually shut down my computer every night, and the TickCount reports a duration of more than 23 days…

Just had a look at the Random source code (http://www.dotnetframework.org) and I see the absolute value of the seed is used (or Int32.MaxValue in the case of Int32.MinValue). That may have been the easiest and best optimization I could have done.

Looking at the source code (thanks Joel, it is much faster than looking at ildasm) the pseudo-random algorithm isn’t all that good, at least if non-reversability is desirable, even tought there are a lot of operations to fill the SeedArray they all happen to same places for every seed, the InternalSample is weak as well, it is possible to brute force much cheaper than creating a Random instance for every possible seed by just producing and looking at needed values, also keeping a track of those operations may reveal how to reverse them, to some point, but I think than humans trying to find the reverse will take more time than computers trying to brute force…

I wonder if Neil Stephenson’s Cryptonomicon was an inspiration for this challenge, along with the detailed description of the Solitair encryption in the afterword to the novel. In any case, this random permutation stuff, complete with high-quality randomness, simply begs to be used for encryption.

Hi Eric, I’ve noted with much interest the behavior of Random as discussed here in your bog. Now what I have is more of a question than anything. If one wanted to minimize the bias, would using code such as this really suffice?