Doing R&D work, I often find myself writing programs that have some large degree of randomness in their behavior. For example, when I work in Genetic Programming, I often write programs that generate and execute arbitrary random source code.
A problem with testing such code is that bugs are often intermittent and can be very hard to reproduce. This goes beyond just setting a random seed to the same value and starting execution over.
For instance, code might read a message from the kernel ring buffer and then branch on the message contents. Naturally, the ring buffer’s state will have changed by the time one later attempts to reproduce the issue.
Even though this behavior is a feature, it can trigger other code in unexpected ways, and thus often reveals bugs that unit tests (or human testers) don’t find.
Are there established best practices for testing systems of this sort? If so, some references would be very helpful. If not, any other suggestions are welcome!
It is useful to add hooks, as suggested, to recreate exact states. Also instrument the system so that it can dump its “seeds” (in your case, the PRNG seed as well as the kernel ring buffer contents, and any other sources of nondeterministic input).
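A minimal sketch of such instrumentation in Python, assuming the nondeterministic inputs can be serialized as JSON; `RecordedRun`, `record_input`, and `dump` are hypothetical names chosen for illustration:

```python
import json
import random
import time

class RecordedRun:
    """Capture every source of nondeterminism so a failing run can be replayed."""

    def __init__(self, seed=None):
        # If no seed is supplied, pick one from the clock, but remember it.
        self.seed = seed if seed is not None else int(time.time_ns())
        self.rng = random.Random(self.seed)
        self.external_inputs = []  # e.g. ring-buffer messages, in read order

    def record_input(self, message):
        # Log each external input as the program consumes it.
        self.external_inputs.append(message)
        return message

    def dump(self, path):
        # On failure, persist everything needed to reproduce the run.
        with open(path, "w") as f:
            json.dump({"seed": self.seed, "inputs": self.external_inputs}, f)
```

The dump file then becomes the "interesting case" fed back into the regression suite.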
Then run your tests both with true random input, and regression-style with any previously-discovered interesting cases.
In the particular case of your access to the kernel, I’d recommend making a mock in any case. Use the mock to force equivalence classes that are less likely to show up in practice, in the spirit of “empty” and “full” for containers, or “0, 1, 2^n, 2^n+1, many” for countable things. Then you can test with the mock and with the real thing, knowing that you have handled and tested the cases you’ve thought of so far.
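A sketch of the mocking idea with Python's `unittest.mock`, assuming a hypothetical `classify_buffer` function under test that takes a ring-buffer reader as a parameter; the equivalence classes ("empty", one fault message, many routine messages) are illustrative:

```python
from unittest import mock

# Hypothetical code under test: reacts to kernel ring-buffer messages.
def classify_buffer(read_ring_buffer):
    messages = read_ring_buffer()
    if not messages:
        return "empty"
    if any("oops" in m for m in messages):
        return "fault"
    return "normal"

# Equivalence classes forced via the mock, in the spirit of
# "empty" / "one" / "many" -- states the real kernel rarely produces on demand.
cases = {
    "empty":  [],
    "fault":  ["kernel: oops at 0xdeadbeef"],
    "normal": ["usb 1-1: new device"] * 100,
}

for expected, messages in cases.items():
    fake_read = mock.Mock(return_value=messages)
    assert classify_buffer(fake_read) == expected
```

The same `classify_buffer` can later be exercised against the real reader, so both the mocked edge cases and live behavior are covered.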
Basically, what I’m suggesting amounts to a mix of deterministic and nondeterministic inputs, with the deterministic ones being a mix of those you can think of and those you were surprised by.
One reasonable thing to do is to seed the random number generator with a constant value for the tests, so that you get a deterministic behavior.
I think statistical testing is the only way. Just as random number generators are “tested” for randomness by statistical tests, so must algorithms that rely on random behavior be.
Simply run the algorithm multiple times, with either the same or different inputs, and compare the results to each other. The drawback of this approach is the massive increase in the computational time required to finish the testing.
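A small sketch of this statistical style, using a Monte Carlo estimate of pi as a stand-in for any stochastic algorithm; the run count and tolerance band are illustrative choices:

```python
import random

def estimate_pi(n, rng):
    """Monte Carlo estimate of pi -- a stand-in for any stochastic algorithm."""
    hits = sum(1 for _ in range(n)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * hits / n

# Statistical test: run the algorithm many times with fresh seeds and check
# that the aggregate behavior lands inside a tolerance band. A single run may
# legitimately stray; the mean over many runs should not.
estimates = [estimate_pi(10_000, random.Random(seed)) for seed in range(30)]
mean = sum(estimates) / len(estimates)
assert abs(mean - 3.14159) < 0.05
```

Note the cost: thirty runs of ten thousand samples each, just to check one property, which is exactly the computational-time problem mentioned above.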
I’m not a specialist in this domain, but there is scientific literature on stochastic program testing.
If you cannot easily create test classes, a statistical test can be used, as @Euphoric said. Borning et al. compare a traditional approach and a statistical one. A generalisation of the statistical tests suggested by @Euphoric is the approach discussed by Whittaker: he suggested creating a stochastic model of the desired (in your case, stochastic) behavior and then generating specific test cases from this model (see his dedicated paper).
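A toy sketch of that model-based idea: a small Markov usage model whose states are operations on the system under test, walked to generate test sequences. The states, transition probabilities, and `generate_test_case` helper are all hypothetical, just to illustrate the shape of the technique:

```python
import random

# Toy Markov usage model: states are operations, edges carry transition
# probabilities that sum to 1.0 per state. Walking the model yields test
# sequences distributed like the expected stochastic usage.
MODEL = {
    "start": [("read", 0.7), ("write", 0.3)],
    "read":  [("read", 0.4), ("write", 0.4), ("stop", 0.2)],
    "write": [("read", 0.5), ("stop", 0.5)],
}

def generate_test_case(rng, max_steps=20):
    state, trace = "start", []
    for _ in range(max_steps):
        if state == "stop":
            break
        # Sample the next state from this state's transition distribution.
        r, cum = rng.random(), 0.0
        for nxt, p in MODEL[state]:
            cum += p
            if r <= cum:
                state = nxt
                break
        trace.append(state)
    return trace

rng = random.Random(0)
print(generate_test_case(rng))
```

Each generated trace is then executed against the system under test, with the seed recorded so any failing trace can be regenerated.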