I'm just reviewing a program someone wrote to provide high-quality random data using an entropy collector and a hash-based whitening technique. I'll try to summarize the process:

Collect entropy from sources into separate arrays e.g.

milliseconds between key presses

milliseconds between mouse clicks

mouse movements x + y concatenated together

All the separate numbers are joined into a single array of values (a + b + c).

The array is mixed with a Knuth shuffle using random numbers from the programming language's pseudo-random source. I believe a Knuth shuffle works its way through an array in reverse order, swapping the current item with an item at a randomly chosen position at or before it.
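As I understand it, that shuffle step could be sketched like this in Python (using the language's built-in pseudo-random source, as the program apparently does; this is my illustration, not the reviewed code):

```python
import random

def knuth_shuffle(items):
    """Fisher-Yates (Knuth) shuffle: walk the array in reverse,
    swapping each element with one at a random index no greater
    than the current position."""
    for i in range(len(items) - 1, 0, -1):
        j = random.randint(0, i)  # 0 <= j <= i
        items[i], items[j] = items[j], items[i]
    return items
```

Note the shuffle only reorders the collected values; it cannot add entropy beyond what the PRNG and the inputs already contain.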

The items in the mixed array are concatenated together and converted to a single string.

Each 512 bits of the string is run through SHA-256 to give a uniformly distributed 256 bits of output per 512 bits of input. For example, if a total of 8192 bits of entropy was collected, the total output would be 4096 bits.
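The hashing step above might look something like this sketch (my illustration using Python's hashlib, assuming the concatenated string is already available as bytes):

```python
import hashlib

def whiten(data: bytes) -> bytes:
    """Hash each 64-byte (512-bit) block of input with SHA-256,
    producing 32 bytes (256 bits) of output per block."""
    out = b""
    for i in range(0, len(data), 64):
        block = data[i:i + 64]
        out += hashlib.sha256(block).digest()
    return out
```

With 8192 bits (1024 bytes) of input, this yields 16 blocks of 32 bytes each, i.e. 4096 bits of output, matching the ratio described.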

The quality of randomness probably depends on how random the input is, e.g. how randomly the user moved their mouse. It would seem difficult for a person to repeat the same mouse movement path in the same way twice. Let's assume for now the user moved it very randomly and not in a repeatable fashion.

A sane algorithm would be secure if the entire seed material contained enough (somewhere between 100 and 200 bits) entropy. Your algorithm is only secure if each 512 bit block contains enough entropy. So it's clearly a bad design.
– CodesInChaos Oct 6 '13 at 11:36

Personally I have a hard time understanding the behavior over time of the above. If you just want pseudo-random output, just supply the above entropy as an (additional) seed to an existing PRNG.
– Maarten Bodewes Oct 6 '13 at 11:59

The idea is probably to create a TRNG by parcelling out hardware noise (keyboard/mouse timings) in 256-bit blocks. I don't think the entropy source provides 256 bits of entropy per 256 bits of input, though.
– LateralFractal Oct 6 '13 at 12:02

1 Answer

The entropy of a PRNG depends on the quality and amount of "true" random noise used to initialise the PRNG and on how many bits the PRNG produces before being re-initialised. A fresh PRNG might be as high quality as your entropy collection above, albeit often seeded from a different source such as CPU heat.

The method you cite will not add entropy, but it will spread the existing entropy fairly evenly within each 256-bit SHA result (Step 3 can be dropped).

If you want to know the level of actual entropy, just save a lot of results from Step 2 and check with a test suite such as NIST's statistical tests.

The key concern you should focus on is not spreading the entropy (even a non-cryptographic hash like MurmurHash will suffice*) but improving the quality and bitrate of the random noise source.

* If the random source is truly random, as it should be, a cryptographic hash isn't needed. If it isn't truly random, a cryptographic hash makes it harder to infer the signal in the noise.

EDIT

Regarding Step 1 – if you want to stick with keyboard/mouse for random noise input, I suggest you use an approach similar to HotBits timing.

Only the difference between two or three timings (I forget which is used to cancel clock drift) is used to create a single bit of entropy; a mouse could use pixel travel distance per fixed interval.
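A rough sketch of the pairwise-comparison idea (my illustration, not HotBits' exact method; it assumes a list of inter-event intervals has already been captured):

```python
def timing_bits(timings):
    """Derive one bit from each pair of consecutive inter-event
    intervals: emit 1 if the first is longer, 0 if shorter,
    and discard equal pairs (ties carry no information)."""
    bits = []
    for i in range(0, len(timings) - 1, 2):
        t1, t2 = timings[i], timings[i + 1]
        if t1 == t2:
            continue  # discard ties
        bits.append(1 if t1 > t2 else 0)
    return bits
```

Comparing adjacent intervals rather than using raw values helps cancel slow clock drift, since both intervals in a pair are measured under nearly the same conditions.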

As the timings aren't as random as beta decay, you'll need a NIST test suite or similar to decide on how many bits you need to collect for each SHA-256 hash. Hint: It will be much more than 256 bits.

Great, thank you for your answer! If Step 3 can be removed, since it was only mixing the entropy, perhaps all the entropy collected in Step 1 should be put in a single array as each number arrives? That way the order of arrival will also influence the output hash due to the avalanche effect. Would a 2:1 or 3:1 input-to-output ratio on the hash be sufficient, do you think? For HotBits collection, can you explain how that works in simple terms? Instead of a hash, is Von Neumann whitening better or worse in this case?
– John Lynx Oct 7 '13 at 3:41

@JohnLynx I couldn't find an article on the average entropy of keyboard and mouse movements, so you'll need to test this yourself, especially as the entropy will usually be implementation-specific. Applying randomness tests after hashing will give a false sense of randomness due to the "whitening" of the hash; so collect a lot of Step 1/2 data, then run the test suite. I've no idea what the ratio would be, because humans are bad at guessing randomness.
– LateralFractal Oct 7 '13 at 4:22

OK thanks. Just out of interest, for randomness tests what input format is it expecting the data in? Do I load in an ASCII file of numbers?
– John Lynx Oct 7 '13 at 7:21

@JohnLynx I would refer you to the NIST test suite, but the current libertarian shutdown of the United States limits that option. You can download a non-authoritative mirror copy from here, but without reading the unavailable NIST help guide I couldn't say how exactly you'd use it. GSL (GNU Scientific Library) is a diluted, less cryptic alternative, but you'll need to install Cygwin if you are using Windows; and it requires gcc compilation, because of course it does, it's a #$%^ GNU app...
– LateralFractal Oct 7 '13 at 7:53