My current project, succinctly, involves the creation of "constrainably-random events". I'm basically generating a schedule of inspections. Some are based on strict schedule constraints: you perform an inspection once a week, on Friday at 10:00 AM. Other inspections are "random": there are basic configurable requirements such as "an inspection must occur 3 times per week", "the inspection must happen between the hours of 9 AM and 9 PM", and "there should not be two inspections within the same 8-hour period", but within whatever constraints were configured for a particular set of inspections, the resulting dates and times should not be predictable.

Unit tests and TDD, IMO, have great value in this system, as they can be used to build it incrementally while its full set of requirements is still incomplete, and to make sure I'm not "over-engineering" it to do things I don't currently know I need. The strict schedules were a piece-o'-cake to TDD.

However, I'm finding it difficult to really define what I'm testing when I write tests for the random portion of the system. I can assert that all times produced by the scheduler must fall within the constraints, but I could implement an algorithm that passes all such tests without the actual times being very "random". In fact that's exactly what happened: I found an issue where the times, though not exactly predictable, fell into a small subset of the allowable date/time ranges. The algorithm still passed all assertions I felt I could reasonably make, and I could not design an automated test that would fail in that situation but pass when given "more random" results. I had to demonstrate the problem was solved by restructuring some existing tests to repeat themselves a number of times, and visually checking that the times generated fell within the full allowable range.

Does anyone have any tips for designing tests that should expect non-deterministic behavior?

EDIT: Thanks to all for the suggestions. The main opinion seems to be that I need a deterministic test in order to get deterministic, repeatable, assertable results. Makes sense.

I created a set of "sandbox" tests that contain candidate algorithms for the constraining process (the process by which a byte array that could represent any long becomes a long between a min and max). I then run that code through a FOR loop that gives the algorithm several known byte arrays (values from 1 to 10,000,000 just to start) and has the algorithm constrain each to a value between 1009 and 7919 (I'm using prime numbers to ensure an algorithm wouldn't pass by some serendipitous GCF between the input and output ranges). The resulting constrained values are counted and a histogram produced. To "pass", all inputs must be reflected in the histogram (a sanity check to ensure we didn't "lose" any), and the difference between any two buckets in the histogram cannot be greater than 2 (it should really be <= 1, but stay tuned). The winning algorithm, if any, can be cut and pasted directly into production code, and a permanent test put in place for regression.
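As a rough illustration, here's what such a sandbox harness might look like as a Python sketch (the original project is .NET, so all names here are illustrative, and the run is shrunk to 1,000,000 inputs to keep it quick). The modulo approach, which ultimately won, stands in as the candidate algorithm:

```python
# Sandbox harness sketch: feed a candidate constraining algorithm known
# inputs and check the histogram of its outputs for uniformity.
from collections import Counter

LO, HI = 1009, 7919   # prime bounds, as in the post
RANGE = HI - LO + 1

def constrain(value: int) -> int:
    """Candidate algorithm: constrain by modulo division."""
    return LO + (value % RANGE)

# Feed known inputs through the candidate and build the histogram.
histogram = Counter(constrain(n) for n in range(1, 1_000_001))

assert sum(histogram.values()) == 1_000_000              # no inputs "lost"
assert min(histogram) == LO and max(histogram) == HI     # stays in bounds
assert max(histogram.values()) - min(histogram.values()) <= 2  # near-uniform
```

A winning candidate passes all three assertions; a biased one trips the bucket-difference check.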

LSB rejection (bit-shifting the number until it falls within the range) was TERRIBLE, for a very easy-to-explain reason: when you divide any number by 2 until it's less than a maximum, you quit as soon as it is, and for any non-trivial range, that will bias the results towards the upper third (as was seen in the detailed results of the histogram). This was exactly the behavior I saw from the finished dates: all of the times were in the afternoon, on very specific days.
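A stripped-down Python sketch of the effect (hypothetical names; the minimum bound is ignored for brevity): halving until the value fits means almost every result lands in the upper half of the range, starving the lower buckets:

```python
HI = 7919   # upper bound only; the minimum is ignored for brevity

def lsb_reject(value: int) -> int:
    """Shift right (halve) until the value fits under the maximum."""
    while value > HI:
        value >>= 1
    return value

results = [lsb_reject(n) for n in range(1, 100_001)]
# Any input above HI is halved into (HI // 2, HI]; only the few inputs
# that started out small can ever land in the lower half of the range.
low_half = sum(1 for r in results if r <= HI // 2)
assert low_half / len(results) < 0.05   # the lower half is starved
```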

MSB rejection (removing the most significant bit from the number, one bit at a time, until it's within the range) is better, but again, because you're chopping off very large numbers with each bit, it's not evenly distributed; you're unlikely to get numbers at the upper and lower ends, so you get a bias toward the middle third. That might benefit someone looking to "normalize" random data into a bell-ish curve, but a sum of two or more smaller random numbers (similar to throwing dice) would give you a more natural curve. For my purposes, it fails.

The only one that passed this test was to constrain by modulo division, which also turned out to be the fastest of the three. Modulo will, by its definition, produce as even a distribution as possible given the available inputs.
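A minimal Python sketch of the modulo approach (illustrative names). One caveat worth noting: when the number of inputs isn't an exact multiple of the output range, modulo leaves some buckets one count taller than others (the classic "modulo bias"), which is presumably why the histogram tolerance above is 2 rather than 0:

```python
LO, HI = 1009, 7919
RANGE = HI - LO + 1   # 6911 possible outputs

def constrain(value: int) -> int:
    """Constrain by modulo division: consecutive inputs cycle evenly
    through every bucket in [LO, HI]."""
    return LO + (value % RANGE)

assert constrain(0) == LO          # bottom of the range
assert constrain(RANGE - 1) == HI  # top of the range
assert constrain(RANGE) == LO      # wraps around evenly
```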

So, ultimately, you want a program that looks at the output of a random number generator and decides if it's random? As in "5,4,10,31,120,390,2,3,4" was random but "49,39,1,10,103,12,4,189" was not?
– psr, Feb 2 '12 at 22:02

No, but avoiding introducing bias between the actual PRNG and the end result would be nice.
– KeithS, Feb 2 '12 at 22:20

Then it sounds like mocking the PRNG would be O.K. You don't need actual random values to test that you don't mangle values. If you have a bug squeezing random values into too small a subset of allowable ranges, you must be getting some specific values wrong.
– psr, Feb 2 '12 at 22:31

You should also test for combinations. Having roughly equal expected inspections per hour won't guard against the case where, say, an 11 AM inspection on Tuesday is always followed by a 2 PM on Thursday and a 10 AM on Friday.
– David Thornley, Feb 3 '12 at 21:19

That is more a test of the PRNG itself; the test of the constraining mechanism(s), as structured above, would always fail such a test because it's being given a thoroughly non-random set of data. Assuming the constraint mechanism isn't endeavoring to "order" the random data I would call that "testing externals" which is something a unit test should not be doing.
– KeithS, Feb 3 '12 at 23:43

13 Answers

What you actually want to test here, I assume, is that given a specific set of results from the randomiser, the rest of your method performs correctly.

If that's what you're looking for then mock out the randomiser, to make it deterministic within the realms of the test.

I generally have mock objects for all kinds of non-deterministic or unpredictable (at the time of writing the test) data, including GUID generators and DateTime.Now.

Edit, from comments: You have to mock the PRNG (that term escaped me last night) at the lowest level possible, i.e. when it generates the array of bytes, not after you turn those into Int64s. Or even at both levels, so you can test that your conversion to an array of Int64s works as intended, and then test separately that your conversion to an array of DateTimes works as intended. As Jonathon said, you could just do that by giving it a set seed, or you can give it the array of bytes to return.

I prefer the latter because it won't break if the framework implementation of a PRNG changes. However, one advantage to giving it the seed is that if you find a case in production that didn't work as intended, you only need to have logged one number to be able to replicate it, as opposed to the whole array.
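The byte-array option might be sketched like this in Python (the post's code is .NET; `FakeByteSource` and `bytes_to_long` are made-up names for illustration):

```python
# Mock the PRNG at the byte level: a fake source returns scripted byte
# arrays, so the byte-to-long conversion can be tested deterministically.
class FakeByteSource:
    """Stands in for the real PRNG; returns scripted byte arrays."""
    def __init__(self, canned: list[bytes]):
        self._canned = list(canned)

    def next_bytes(self, n: int) -> bytes:
        data = self._canned.pop(0)
        assert len(data) == n
        return data

def bytes_to_long(data: bytes) -> int:
    """The conversion under test: 8 bytes -> unsigned 64-bit integer."""
    return int.from_bytes(data, byteorder="big")

source = FakeByteSource([b"\x00" * 8, b"\xff" * 8])
assert bytes_to_long(source.next_bytes(8)) == 0          # all-zero bytes
assert bytes_to_long(source.next_bytes(8)) == 2**64 - 1  # all-ones bytes
```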

All this said, you must remember that it's called a Pseudo Random Number Generator for a reason. There may be some bias even at that level.

No. What I want to test in this case is the randomizer itself, and assert that the "random" values generated by the randomizer fall within the specified constraints while still being "random", as in not biased towards an uneven distribution across the allowable time ranges. I can and do deterministically test that a particular date/time correctly passes or fails a particular constraint, but the real problem I encountered was that dates produced by the randomizer were biased and therefore predictable.
–
KeithSFeb 2 '12 at 21:11

The only thing I can think of is to have the randomizer spit out a bunch of dates and create a histogram, then assert that the values are relatively evenly distributed. That seems extremely heavy-handed, and still not deterministic as any truly random set of data can show an apparent bias that a larger set would then refute.
–
KeithSFeb 2 '12 at 21:21

That is a test that will break occasionally and unpredictably. You don't want that. I think you've misunderstood my point, to be honest. Somewhere inside what you are calling the randomiser, there must be a line of code which generates a random number, no? That line is what I am referring to as a randomiser, and the rest of what you are calling the randomiser (the distribution of dates based on "random" data) is what you want to test. Or am I missing something?
–
pdrFeb 2 '12 at 21:34

"some bias even at that level" If you use a good PRNG, then you will not be able to find any test(with realistic computational bounds) that can tell it apart from real randomness. So in practice one can assume that a good PRNG has no bias whatsoever.
–
CodesInChaosFeb 28 '13 at 9:38

This is going to sound like a stupid answer, but I'm going to throw it out there because this is how I've seen it done before:

Decouple your code from the PRNG - pass the randomization seed into all of the code that uses randomization. Then you can determine the 'working' values from a single seed (or multiple seeds of that would make you feel better). This will give you the ability to adequately test your code without having to rely on the law of large numbers.

It sounds inane, but this is how the military does it (either that or they use a 'random table' that isn't really random at all)
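The seed-passing idea might be sketched like this in Python (`pick_times` is a made-up stand-in for the real scheduling code):

```python
# Decouple the code from the PRNG by passing in a seeded generator:
# the same seed always reproduces the same "random" schedule.
import random

def pick_times(rng: random.Random, count: int) -> list[int]:
    """Pick `count` inspection hours within the 9:00-21:00 window."""
    return [rng.randrange(9, 21) for _ in range(count)]

a = pick_times(random.Random(12345), 3)
b = pick_times(random.Random(12345), 3)
assert a == b                       # same seed -> same schedule, every time
assert all(9 <= h < 21 for h in a)  # still respects the constraints
```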

Exactly: if you can't test an element of an algorithm, abstract it out and mock it
– Steve Greatrex, Feb 3 '12 at 17:30

I didn't specify a deterministic seed value; instead, I removed the "random" element entirely so I didn't even have to rely on the specific PRNG algorithm. I was able to test that, given a large range of evenly-distributed numbers, the algorithm I went with could constrain those to a smaller range without introducing bias. The PRNG itself should be adequately tested by whoever developed it (I'm using RNGCryptoServiceProvider).
– KeithS, Feb 3 '12 at 18:24

Regarding the "random table" approach, you can also use a test implementation that contains a "reversible" number-generating algorithm. This allows you to "rewind" the PRNG, or even query it to see what the last N outputs were. It would allow much more in-depth debugging in certain scenarios.
– Darien, Feb 3 '12 at 20:02

"Is it random (enough)" turns out to be an incredibly subtle question. The short answer is that a traditional unit test just won't cut it - you'll need to generate a bunch of random values and submit them to various statistical tests that give you a high confidence that they are random enough for your needs.

There will be a pattern - we're using pseudo-random number generators after all. But at some point things will be "good enough" for your application (where good enough varies a LOT between, say, games at one end, where relatively simple generators suffice, all the way up to cryptography, where you really need sequences to be infeasible to determine by a determined and well-equipped attacker).

For the unit tests, replace the random generator with a class that generates predictable results covering all corner cases. I.e. make sure your pseudo-randomizer generates the lowest possible value and the highest possible value, and the same result several times in a row.
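For example, such a scripted randomizer might look like this (a Python sketch with hypothetical names):

```python
# A "randomizer" that plays back a script, forcing the corner cases:
# the lowest value, the highest value, and the same result repeated.
class ScriptedRandom:
    """Stands in for the PRNG; returns a scripted sequence of values."""
    def __init__(self, values):
        self._values = list(values)

    def next(self, lo: int, hi: int) -> int:
        v = self._values.pop(0)
        assert lo <= v <= hi, "script must stay within the requested range"
        return v

LO, HI = 1009, 7919
rng = ScriptedRandom([LO, HI, HI, HI])  # lowest, highest, and a repeat
assert rng.next(LO, HI) == LO
assert rng.next(LO, HI) == HI
assert rng.next(LO, HI) == rng.next(LO, HI)  # same result twice in a row
```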

As soon as I saw the title of your question, I jumped in to propose the solution. My solution was the same as what several others have proposed: mock out your random number generator. After all, I have built several different programs that required this trick in order to write good unit tests, and I have begun making mockable access to random numbers a standard practice in all my coding.

But then I read your question. And for the particular issue that you describe, that is not the answer. Your problem was not that you needed to make predictable a process that used random numbers (so it would be testable). Rather, your problem was to verify that your algorithm mapped uniformly random output from your RNG to uniform-within-the-constraints output from your algorithm -- that if the underlying RNG was uniform it would result in evenly distributed inspection times (subject to the problem constraints).

That's a really hard (but fairly well-defined) problem. Which means it's an INTERESTING problem. I immediately began to think of some really great ideas for how to solve this. Back when I was a hotshot programmer I might have started doing something with these ideas. But I'm not a hotshot programmer anymore... I like to think that I am more experienced and more skilled now.

So instead of diving into the hard problem, I thought to myself: what is the value of this? And the answer was disappointing. Your bug is already solved, and you'll be diligent about this issue in the future. External circumstances cannot trigger the problem, only changes to your algorithm. The ONLY reason for tackling this interesting problem was to satisfy the practices of TDD (Test Driven Design). If there is one thing that I've learned, it is that blindly adhering to any practice when it isn't valuable causes problems. My suggestion is this: Just don't write a test for this, and move on.

=== SECOND ANSWER ===

Wow... what a cool problem!

What you need to do here is to write a test that verifies that your algorithm for selecting inspection dates and times will produce output that is uniformly distributed (within the problem constraints) if the RNG it uses produces uniformly distributed numbers. Here are several approaches, sorted by level of difficulty.

You can apply brute force. Just run the algorithm a WHOLE bunch of times, with a real RNG as input. Inspect the output results to see if they are uniformly distributed. Your test will need to fail if the distribution varies from perfectly uniform by more than a certain threshold, and to ensure you catch problems the threshold can't be set TOO low. That means that you will need a HUGE number of runs in order to be sure that the probability of a false positive (a test failure by random chance) is very small (well < 1% for a medium-sized code base; even less for a big code base).
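The uniformity check itself is typically a chi-squared statistic over the histogram buckets, compared against a threshold taken from a chi-squared table for the chosen degrees of freedom and false-positive rate. A minimal Python sketch of the statistic:

```python
def chi_squared(observed, expected: float) -> float:
    """Chi-squared statistic of a histogram against a uniform expectation."""
    return sum((o - expected) ** 2 / expected for o in observed)

# A perfectly uniform histogram scores 0...
assert chi_squared([100, 100, 100, 100], 100.0) == 0.0
# ...while a skewed one scores higher: (60-50)^2/50 + (40-50)^2/50 = 4.0
assert chi_squared([60, 40], 50.0) == 4.0
```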

Consider your algorithm as a function that takes the concatenation of all RNG output as an input, then produces inspection times as an output. If you know that this function is piecewise continuous, then there is a way to test your property. Replace the RNG with a mockable RNG and run the algorithm numerous times, producing uniformly distributed RNG output. So if your code required 2 RNG calls, each in the range [0..1], you might have the test run the algorithm 100 times, returning the values [(0.0,0.0), (0.0,0.1), (0.0,0.2), ... (0.0,0.9), (0.1,0.0), (0.1,0.1), ... (0.9,0.9)]. Then you could check whether the output of the 100 runs was (approximately) uniformly distributed within the allowed range.
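A one-dimensional version of that sweep might look like this (Python sketch; `pick_hour` is a made-up stand-in for the algorithm under test):

```python
def pick_hour(u: float) -> int:
    """Map a 'random' u in [0, 1) to an inspection hour from 9 to 20."""
    return 9 + int(u * 12)

# Sweep a uniform grid of fake RNG outputs instead of real random values.
hours = [pick_hour(i / 100) for i in range(100)]  # u = 0.00, 0.01, ..., 0.99
assert set(hours) == set(range(9, 21))  # every allowed hour is reachable
```

With uniform grid input, the output's distribution over the allowed range can be checked without any statistical tolerance.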

If you REALLY need to verify the algorithm in a reliable fashion and you can't make assumptions about the algorithm OR run a large number of times, then you can still attack the problem, but you might need some constraints on how you program the algorithm. Check out PyPy and their Object Space approach as an example. You could create an Object Space which, instead of actually executing the algorithm, instead just calculated the shape of the output distribution (assuming that the RNG input is uniform). Of course, this requires that you build such a tool and that your algorithm be built in PyPy or some other tool where it is easy to make drastic modifications to the compiler and use it to analyze the code.

Apart from validating that your code doesn't fail, or throws the right exceptions in the right places, you can create valid input/response pairs (even calculating them manually), feed the input in the test, and make sure it returns the expected response. Not great, but that's pretty much all you can do, imho.
However, in your case it's not really random: once you create your schedule, you can test for rule conformity - must have 3 inspections per week, between 9 and 9; there's no real need or ability to test for the exact times when an inspection occurred.

There's really no better way than running it a bunch of times and seeing if you get the distribution you want. If you have 50 allowed potential inspection schedules, you run the test 500 times and make sure each schedule is used close to 10 times. You can control your random generator seeds to make it more deterministic, but that will also make your tests more tightly coupled to the implementation details.

But if it's truly random, then occasionally, some schedule will not be used at all; and occasionally, some schedule will be used more than 20 times. I don't know how you intend to test that each schedule is used "close to 10 times", but whatever condition you test here, you'll have a test that sometimes fails when the program is working to spec.
– David Wallace, Feb 3 '12 at 6:54

It is not possible to test a nebulous condition that has no concrete definition. If the generated dates pass all tests, then theoretically your application is functioning correctly. The computer cannot tell you if the dates are "random enough" because it has no concrete criteria for such a test. If all tests pass but the behavior of the application is still not suitable, then your test coverage is empirically inadequate (from a TDD perspective).

In my view, your best bet is to implement some arbitrary date generation constraints so that the distribution passes a human smell test.

You can absolutely determine randomness via automated testing. You simply generate a sufficiently large number of samples and apply standard tests of randomness to detect biases in the system. This is a pretty standard undergrad programming exercise.
– Frank Szczerba, Feb 3 '12 at 20:31

A simple histogram approach is a good first step, but is not sufficient to prove randomness. For a uniform PRNG you would also (at the very least) generate a 2-dimensional scatter plot (where x is the previous value and y is the new value). This plot should also be uniform. This is complicated in your situation because there are intentional non-linearities in the system.
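A tiny Python sketch of the idea (illustrative names): collect consecutive-value pairs and check their coverage; a sequence can have a perfectly flat histogram and still be wildly non-random pair-wise:

```python
def lag_pairs(seq):
    """Collect (previous, next) pairs from a sequence."""
    return set(zip(seq, seq[1:]))

# 0,1,0,1,... has a perfectly flat histogram (50 of each value) but
# covers only 2 of the 4 possible pairs, exposing its predictability.
alternating = [0, 1] * 50
assert lag_pairs(alternating) == {(0, 1), (1, 0)}
```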

My approach would be:

validate (or take as a given) that the source PRNG is sufficiently random (using standard statistical measures)

verify that an unconstrained PRNG-to-datetime conversion is sufficiently random over the output space (this verifies a lack of bias in the conversion). Your simple first-order uniformity test should be sufficient here.

Just record the output of your randomizer (whether pseudo or quantum/chaotic or real world). Then save and replay those "random" sequences that fit your test requirements, or that expose potential issues and bugs, as you build your unit test cases.
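One way to sketch that record-and-replay wrapper in Python (class names are made up for illustration):

```python
# Record what the real RNG produced, then replay that exact sequence
# in a regression test once an interesting case is found.
import random

class RecordingRandom:
    """Wraps a real RNG and logs everything it hands out."""
    def __init__(self, rng):
        self._rng, self.log = rng, []

    def randrange(self, lo, hi):
        v = self._rng.randrange(lo, hi)
        self.log.append(v)
        return v

class ReplayRandom:
    """Replays a previously recorded sequence, for regression tests."""
    def __init__(self, log):
        self._log = list(log)

    def randrange(self, lo, hi):
        return self._log.pop(0)

rec = RecordingRandom(random.Random(7))
first = [rec.randrange(0, 100) for _ in range(5)]
rep = ReplayRandom(rec.log)
assert [rep.randrange(0, 100) for _ in range(5)] == first
```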

Your goal is not to write unit tests and pass them, but to make sure that your program fits its requirements. The only way you can do this is to precisely define your requirements in the first place. For example, you mentioned "three weekly inspections at random times". I'd say the requirements are: (a) 3 inspections (not 2 or 4), (b) at times that are not predictable by people who don't want to be inspected unexpectedly, and (c) not too close together - two inspections five minutes apart are probably pointless, and maybe they shouldn't be too far apart either.

So you write down the requirements more precisely than I did. (a) and (c) are easy. For (b), you might write some code, as clever as you can make it, that tries to predict the next inspection; to pass the unit test, that code must not be able to predict better than pure guessing.

And of course you need to be aware that if your inspections are truly random, any prediction algorithm could be correct by pure chance, so you must be sure that you and your unit tests don't panic if that happens. Maybe perform a few more tests. I wouldn't bother testing the random number generator, because in the end it's the inspection schedule that counts, and it doesn't matter how it was created.

No. Just no. The unit tests prove the program fits its requirements, so the two are one and the same. And I'm not in the business of writing predictive software to reverse-engineer random algorithms. If I were I wouldn't be telling you about it, I'd be making a killing cracking secure websites by predicting their keys and selling off the secrets to the highest bidder. My business is writing a scheduler that creates times that are constrainable but unpredictable within the constraints, and I need deterministic tests to prove I've done so, not probabilistic ones that say I'm pretty sure.
– KeithS, Mar 27 '14 at 21:50