protest: fault injection

Fault injection is one of those technologies that has always been just a bit out of reach for me in terms of its ability to deliver results. Some tools are out there: I actually won a personal copy of Security Innovation's awesome Holodeck product at a trade show, and we've got some internal tools that can be used here as well if you work at it hard enough. But rising above it all, I tend to agree with Hoglund and McGraw, who touch on fault injection in Exploiting Software: How to Break Code (another one for the required reading list). They seemed to come down on it pretty hard as a form of testing that just doesn't have the cost/benefit payoff of other forms of testing. I'm sure there are exceptions, but I get the feeling that in the majority of cases, at least for application-level software, when people do automate fault injection within a test suite, it is done with little utilities designed to suck up memory here, fill disk space there, or do something very specific that is a regression test against a known issue.

That still doesn't stop me from dreaming, because I think fault injection, especially when programmatically driven based on real-time system state, can have its place in the pursuit of automated defect detection. If I could jump 5 or 10 years into the future of software testing and pick something to bring back, it might be an integrated automated test harness consisting of (1) a test model created from multiple sources, including reflection, code attributes, usability study flight recorder data, and machine-readable specs, exposed in a layer simple enough that testers can easily write path traversal logic against it and add state verification code; (2) a dynamic, general-purpose data collection engine capable of collecting data from multiple event sources, saving crash dumps on failure, or even just pausing when a predefined threshold or condition is met, similar to how we use the debugger; and (3) programmatic fault injection support. In an even more utopian dream it would have a DSL (Domain Specific Language) to visualize it all, zoom in and out, and patch together the different levels of models. With some exception for those that are automatically generated from code, the test modeling tools I've tried always seem to have a hard time living above the abstractions or hiding the complexity of feature-level models when you're trying to look at a product-level model.

If you can integrate these types of data sources, automated tests can become a lot smarter. In particular, the above system could support a model traversal algorithm that uses Bayes' theorem to turn knowledge about the current state into path decisions that lead toward failure modes. This is like the successful boxer who internalizes the probabilities that certain opponents will react a particular way to a certain punch under certain conditions. He throws two punches in rapid succession: the first one causes the reaction, and the second one is aimed at wherever the opponent is most likely to be after reacting. It's almost like being able to predict the future, except for when probability fails and you punch into thin air. You could do similar things with fault injection and model-based testing. Inject a fault, and then use Bayes to select, from the paths available in the model, the test path most likely to put the system into a state that is outside of predefined tolerance. With a good data collection engine you can also watch something as simple as CPU utilization and flag when a function goes over its threshold, like an early warning system that you may be getting close to an error condition. That is information you would never see with automation that only finds an issue when something unexpectedly crashes or throws an exception.
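To make the Bayes idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the PathStats record, the path names, and the probabilities are made up, and in a real harness they would have to be estimated from data gathered by the collection engine over previous runs. It only shows how Bayes' theorem could rank the available paths after a fault has been injected and the system state observed.

```python
from dataclasses import dataclass


@dataclass
class PathStats:
    """Hypothetical per-path statistics gathered from earlier test runs."""
    name: str
    prior_failure: float          # P(failure) when this path is taken
    p_state_given_failure: float  # P(observed state | failure on this path)
    p_state_given_success: float  # P(observed state | no failure on this path)


def posterior_failure(p: PathStats) -> float:
    """Bayes' theorem: P(failure | observed state) for one path."""
    numerator = p.p_state_given_failure * p.prior_failure
    evidence = numerator + p.p_state_given_success * (1.0 - p.prior_failure)
    return numerator / evidence if evidence > 0 else 0.0


def pick_path(paths):
    """Choose the path most likely to end in a failure given the current state."""
    return max(paths, key=posterior_failure)


# Made-up numbers: the state observed after injecting a fault (say, low disk space)
# is strong evidence of failure on save_document, but uninformative on open_dialog.
paths = [
    PathStats("save_document", 0.05, 0.90, 0.10),
    PathStats("open_dialog", 0.20, 0.30, 0.30),
    PathStats("print_preview", 0.10, 0.50, 0.40),
]

best = pick_path(paths)
print(best.name, round(posterior_failure(best), 3))
```

Note that open_dialog has the highest prior failure rate, yet save_document wins once the observed state is taken into account. That is the second punch: the choice depends on the reaction, not just on the base rate.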

Most of the parts are already out there, but it's obviously going to be a bear to put it all together. If anyone in research is reading this and could just let me know when you have something ready for me, that would be great. I have to go investigate my failing tests now. Thanks. :)

Thanks for the comment! I like the sensor point concept. It seems like a step in the right direction to be tossing in generic stubs rather than just a single particular fault type. Then you can abstract out the actual fault injection and let it be polymorphic, in the sense that the decision about whether to inject a fault, and what type of fault to inject, is made in real time based on something like system state and time. The real smarts of model-based testing seem to be the traversal algorithms used to get code under test from point A to point B, but this seems to open the door for a similar but different category of algorithms that could create a parallax-type effect, in the sense that the fault injection model is dynamically changing the environmental system state the model walker is living on top of. :)
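Here is a rough sketch in Python of how I picture a sensor point as a generic stub. This is only my reading of the idea, not the commenter's actual design: SensorPoint, state_based_policy, save_file, and the state keys are all hypothetical names. The stub itself knows nothing about fault types; a pluggable policy looks at the current state and decides whether to inject anything, and what.

```python
from typing import Callable, Optional

# A policy inspects the current system state and returns the fault to inject
# (as an exception instance), or None to let the call proceed normally.
FaultPolicy = Callable[[dict], Optional[Exception]]


class SensorPoint:
    """Generic stub placed in code under test; the policy decides what happens."""

    def __init__(self, name: str, policy: FaultPolicy):
        self.name = name
        self.policy = policy

    def check(self, state: dict) -> None:
        fault = self.policy(state)
        if fault is not None:
            raise fault


def state_based_policy(state: dict) -> Optional[Exception]:
    """Hypothetical policy: the fault type depends on system state and time."""
    if state.get("free_disk_mb", float("inf")) < 100:
        return OSError("injected: disk full")
    if state.get("uptime_s", 0) > 3600:
        return TimeoutError("injected: operation timed out")
    return None


def save_file(state: dict, sensor: SensorPoint) -> str:
    """Stand-in for code under test with a sensor point before the real work."""
    sensor.check(state)
    return "saved"


sensor = SensorPoint("before_write", state_based_policy)
print(save_file({"free_disk_mb": 500, "uptime_s": 10}, sensor))
```

Swapping in a different policy changes the faults without touching the code under test, which is what would let a separate algorithm drive the injection side while the model walker drives the traversal side.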