Designing Experiments

I experience intellectual work, such as testing, as a web of interconnected activities. If I were to suggest what is at the center of the testing web, on my short list would be: designing experiments. A good test is, ultimately, an experiment.

I’ve been looking around online for some good references about how to design experiments (since most testers I talk to have a lot of trouble with it). Here is a good one.

If you know of any other straightforward description of the logic of experiments, please let me know. I have some good books. I just need more online material.


Comments

Wikipedia has a nice article on experiments (http://en.wikipedia.org/wiki/Experiments). Though it is not a step-by-step procedural guide for designing experiments, I find its classification of experiment types a nice list to review when I’m designing test cases, and especially when reviewing others’ test cases (a “now, what did they forget?” sort of thing).

One with a more procedural tone, and closer to my educational background, is Experimental Design (http://www.socialresearchmethods.net/kb/desexper.htm). The hyperlinks on the left side give additional details for each type of experiment. Though a bit of a stretch for some testers, the ideas are still sound. Consider their discussion of Randomized Block Designs (http://www.socialresearchmethods.net/kb/expblock.htm). This is especially useful for creating a matrix of application components and then testing each component with the same test cases. On large projects this is useful because bugs can be tracked by functional component to get an idea of error clusters.
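The randomized-block idea above can be sketched in a few lines of Python. This is only a hypothetical illustration (the component and test-case names are invented; the linked article contains no code): each application component acts as a block, and the same set of test cases is run against every block in a randomized order.

```python
import random

# Blocks: application components under test (names invented for this sketch).
components = ["login", "search", "checkout", "reports"]
# Treatments: the same test cases applied to every block.
test_cases = ["boundary values", "invalid input", "forced timeout"]

def build_test_matrix(blocks, treatments, seed=0):
    """Assign every treatment to every block, shuffling run order per block."""
    rng = random.Random(seed)
    matrix = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)  # randomize within-block run order
        matrix[block] = order
    return matrix

matrix = build_test_matrix(components, test_cases)
for component, order in matrix.items():
    print(component, "->", order)
```

Because every component sees every test case, failures can then be tallied per component to look for the error clusters mentioned above.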

[James’ Reply: It may look useful, but I think it’s more dangerous than useful. If you read it carefully, I think you will notice that it does not deal with the logic of testing at all. It presents a technique without justifying it, nor is its justification self-evident. It offers heuristics without acknowledging them as heuristics. It requires special skill and knowledge to implement the technique well, but that skill and knowledge is not identified.

It also contains some outrageous advice, such as telling us that all requirements must be agreed to by the project team before we test against them. What?! That’s equivalent to telling testers never to apply their own background knowledge, or to speculate, or to indulge curiosity, or to make any observations that relate to real risks unless those risks have been expressed in formal language. That’s simply a formula for very expensive testing that covers little ground.

At its core, I think what the article is telling us is that following the author’s pet technique results in test case documents that look good to some people. But how do we know they are good? How do we know they are better than they’d be if we used a much simpler test approach?

Hence the danger: When an article that is essentially an editorial about the testing aesthetics of one author is presented as an authoritative statement about a Good Way To Test, it may inspire compliance without comprehension among some testers. This mythologizes our craft and impedes our progress.]

Lately, the Mythbusters have taken feedback (ideas, advice, and criticism) about the ways they have either proven or disproven urban myths and designed follow-up experiments around it.

For example, the episode 64 preamble (titled “More Myths Revisited”) states: “Returning to the most controversial myths… Jamie and Adam try to clear their names. Watch as they repeat past experiments to see if their original answer was genuine or bogus.”

[James’ Reply: That is one of my favorite shows, but their budget (both time and money), their personal safety, and their biases tend to make for experiments that are pretty simplistic. For instance, a recent episode looking at the possibility of psychic communication with plants actually uncovered evidence of the effect in question, duplicating results from an earlier researcher. Since they were unable to explain the evidence, they ultimately decided to ignore it rather than systematically vary a wide variety of variables to explore the extent of the phenomenon. At the end of the show, they announced that the myth was busted, when in fact their experiments were merely inconclusive.

Mythbusters, however, is much better than the Sci-Fi Channel show “Sci-Fi Investigates”. On that show a paranormal team of investigators look into various mysteries such as Mothman. The team seems to have about the same skills and qualifications as do the Scooby Doo gang. One of their members is some guy they identify as “the skeptic”. In fact, he’s a scoffer, not a skeptic. A skeptic is someone who reserves judgment while pursuing a life of vigorous inquiry. In short, whereas a scoffer would say “No, that’s wrong.”, a skeptic would say “Maybe. Let’s look into it.”]

Two attempts to identify and verify Archimedes’ infamous weapon, which “burned the enemy fleet in the harbor,” are interesting.

Mythbusters decided it was a focusing mirror assembled from lots of little mirrors, and concluded it couldn’t work.

When this was set as the problem in a design innovation class at MIT, they too decided it was a focusing mirror assembled from lots of little mirrors, and it worked rather well.

There was video and commentary from each online at one time. The point is about assumptions. In addition to observations and a theory, many interesting theories have far too many independent variables to manage in a simple experiment. So you’re forced to rationalize them, fixing them at a given value or range.

I’m as fascinated by the stuff we refuse to take as given (for now) when “designing” experiments as by the stuff we don’t.

James, first, excuse me for spamming your post; after posting my first comment I realized I should not have done it here. And I am grateful that you took the time to share your thoughts on it. I see good points from you there that will help me learn how to separate the cream from the milk. I totally agree that the link talks about a technique/heuristic only.

Based on your comments, may I request you to elaborate more on what you mean by “it does not deal with the logic of testing” and on what “special skill and knowledge to implement the technique well” is required? And then there are two questions you raised in your comments: “But how do we know they are good? How do we know they are better than they’d be if we used a much simpler test approach?” What are your answers to them?

[James’ Reply: The logic of testing is the pattern of evidence gathering and inference that fulfills the mission of testing. For instance, it would be a bad test, probably, to test software by dunking a CD-ROM containing that software into a vat of boiling oil, because no evidence we could gather by doing that would help us evaluate the product. In the vernacular, we would say that there’s no logic behind that kind of test. Similarly, I ask you, where is the logic behind the test technique in the article you cited? There is some logic there, but it is never explained by the author. So the only message we are getting from the author is “do this because I say so.”

On the question of special skill, here’s one that the technique requires: the skill of decomposing a product into its constituent structures and variables. If you don’t know how to do that, and most testers who come to my classes aren’t very confident of their skill in this area, then you will not be able to perform the Use Case test technique well. There are numerous other skills required that are implied by the author of the article you cited. See if you can spot them for yourself.

James, if I understood correctly, you meant that “dunking a CD-ROM containing that software into a vat of boiling oil” is a technique that does not have any logic. It does not have any logic because that kind of test does not seem probable in the real world. But I don’t think there is no logic in that (even if it is weird logic): probably the CD-ROM contains a very important piece of software, and the person carries it to some place where it is possible for the CD-ROM to get dropped into a vat of boiling oil, and then he wants the software to still work. It would serve as a test of the CD-ROM: “Our CD-ROM retains your data even if put in a vat of boiling oil!”

[James’ Reply: What you have done is to specify a context where there is a logical connection between the test and the mission of testing. In other words, depending on the context, there may or may not be logic behind that test. But notice that you have made that logic explicit. You have “explained” the boiling oil test. Having explained it, we can debate the merits of it. The author of the article that you cited does not explain the connection between the particular technique advocated and the mission of testing. That’s why I said there is no logic. It’s just an editorial opinion. A more precise statement would be “if there is logic, it is not explained in a way that I recognized as such.”]

Now, coming to the article, I agree that the objective, the logic of using that particular technique, is not explained in the article. It does not talk explicitly about what kinds of issues could be found by that technique. The author does not say why this technique should be used. But I don’t think the author claims it is the single best technique, as I don’t see any comparison with other approaches or techniques in there; although at the end of the article some benefits are outlined, as anyone would do.

[James’ Reply: What do you think the author is claiming? Why do you think the author wrote the article? Clearly the technique is being advocated. Magic authority words are used (e.g. “formal”). No science is offered, however, just mythology. Right now I’m looking for articles that deal with the logical foundation of testing, not just with its cultural patterns.]

Again, on the skills required for this technique, I agree that they were not mentioned in the article. But then how do you think the skill of decomposing a product into its constituent structures and variables can be explained or taught, other than by taking one example, as the article did?

[James’ Reply: See the book Introduction to General Systems Thinking, by Gerald M. Weinberg, for examples of how it can be explained. See the material on my website, too. But that’s beside my point. My point is that the technique advocated in that article rests on some huge assumptions about skills, complexity, risk, cost, and alternative techniques. Not only is that technique not “the best”, it’s hard for me to imagine a context where that technique would be better than the available alternatives.]

By “What are your answers to it?” I meant your answers to “…how do we know they are good? How do we know they are better than they’d be if we used a much simpler test approach?” These were the questions you raised in the initial comment.

[James’ Reply: Those are questions that must be answered in context. I expect that someone who is a responsible advocate for a technique will provide some framework or evidence that would help us answer those questions in our own contexts. I don’t see that in the article. Do you?]

Here is a collection of design of experiments articles. Some good ones, based on what I think you are asking for, include (in my very biased opinion): “Teaching Engineers Experimental Design With a Paper Helicopter,” “What Can You Find Out From 8 and 16 Experimental Runs?,” “What Can You Find Out From 12 Experimental Runs?,” “101 Ways to Design an Experiment,” and “Planning Efficient Software Tests.”

[James’ Reply: Thanks, man.

These look like useful articles. But doesn’t it bug you that Design of Experiments is so easy to confuse with design of experiments? I have done little or no DOE, and yet I can certainly claim to have designed many experiments. There’s more to designing experiments than DOE likes to talk about. DOE was developed, what, in the 20’s and 30’s? But did not Galileo and Bacon do an experiment or two in their careers, somewhat earlier in history than that? In one of the articles on your site, the OFAT approach is contrasted with what the author calls a “designed experiment.” As if an OFAT experiment is not designed. That’s unnecessarily dismissive rhetoric. I think DOE would have been better named Multi-Variable Causality Analysis, or something like that, rather than to pretend to represent the entirety of scientific empirical investigation.

Still, I appreciate the content, once I choke past the parochial cigar smoke.]
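The OFAT-versus-designed-experiment contrast in the reply above can be made concrete with a small sketch. This is an invented illustration (the factor names are made up, and none of the cited articles contain this code): for three two-level factors, a one-factor-at-a-time plan yields four runs from a baseline, while a 2^3 full factorial plan yields all eight combinations and so can expose interactions between factors, which OFAT cannot see.

```python
from itertools import product

# Three two-level factors (hypothetical names for the sketch).
factors = {"cache": (0, 1), "compression": (0, 1), "parallelism": (0, 1)}

# OFAT: start from a baseline, then vary exactly one factor per run.
baseline = {name: levels[0] for name, levels in factors.items()}
ofat_runs = [dict(baseline)]
for name, levels in factors.items():
    run = dict(baseline)
    run[name] = levels[1]
    ofat_runs.append(run)

# Full factorial: every combination of levels, so interaction
# effects between factors become observable.
factorial_runs = [dict(zip(factors, combo))
                  for combo in product(*factors.values())]

print(len(ofat_runs))       # 4 runs: baseline plus one per factor
print(len(factorial_runs))  # 8 runs: 2 * 2 * 2
```

Both plans are “designed” in the everyday sense; the factorial plan simply buys information about interactions at the cost of more runs, which is the trade-off DOE formalizes.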

Experimental Design methods worked extremely well for me in that project, and other projects I have worked on. I could not imagine not using Experimental Design methods for designing automated tests for complex software.

After 8 years in the testing field, it finally hit me last week while testing: I’m designing and conducting experiments!

This post was the best source of resources I was quickly able to find, but I thought I’d ask whether you would like to add any references to this list, as the post is quite old already. I’d be especially interested in any good books on the subject.

As for an Amazon search, I already did that before finding your blog and posting here initially, and I was swamped by the myriad of titles, as you can imagine.

I was hoping (and I’m sure other visitors are as well) to get some guidance on Design of Experiments (DoE) books that would aid and enhance one’s discipline of practical software testing, which you greatly encourage in your approach.

So, if I may make a request of you again, could you be kind enough to share with us which titles on DoE you have on ‘your bookshelf’ that can help us? Thanks.