Thinking Outside the Boxes

We've all heard the terms "black box" and "white box" thrown around in a variety of contexts, but what do they really mean? Industry expert Bret Pettichord explains some of the fundamental problems with using these terms as shorthand labels for techniques and how they can lead to confusion.

As a writer on software testing, I get questions from testers who need help. Twice in the past week, people studying for software testing certification exams have written to me to get clarification about a column I wrote a while back for StickyMinds: "Five Ways to Think about Black Box Testing." I wrote that column after I noticed that several people on my project were using the term "black box testing" in fairly different ways. Recognizing a pattern in the various interpretations, I came up with even more definitions of black box testing that I considered to be just as valid. So I wrote the column for StickyMinds, and I revised the test plan for my project to be specific about how those terms were being used. That resolved the issue for my team, but how does it help someone who is looking for a definitive answer for an exam?

Fundamentally, the problem with terms like "black box" and "white box" is that they describe the thinking behind the tests, not the tests themselves. They describe what kind of information was or wasn't used in the design of the tests. Black box means we don't know what the code looks like when we write our tests. White box means we do. (Though logically, it should really be called "clear box.") So you can't tell by looking at the test itself whether it is a black box test or a white box test. You'd have to know how it was created.

To add to the confusion, both terms are commonly used—even by experts—as shorthand labels for techniques. Code coverage analysis is typically described as a white box technique, because it provides a method for assessing tests based on how they exercise (or cover) the code. This requires knowing what the code looks like. On the other hand, boundary value analysis is typically described as a black box technique, because it starts by noticing boundaries in external program behavior and then suggests focusing testing at those boundaries. It doesn't require knowledge of the internals of the code.
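To make the contrast concrete, here is a minimal sketch of boundary value analysis in Python. The free-shipping rule and its $100 threshold are invented for illustration; the point is that the test values come entirely from the stated external behavior, not from reading the code.

```python
# Hypothetical function under test: orders of $100 or more ship free.
def qualifies_for_free_shipping(order_total):
    return order_total >= 100

# Boundary value analysis: the spec says "100 or more," so we probe
# values at and immediately around that boundary -- the places where
# off-by-one mistakes (> vs >=) would show up.
boundary_cases = {
    99.99: False,   # just below the boundary
    100.00: True,   # exactly on the boundary
    100.01: True,   # just above the boundary
}

for total, expected in boundary_cases.items():
    assert qualifies_for_free_shipping(total) == expected
```

Nothing in these tests depends on how the function is implemented, which is why the technique is usually filed under black box. Point a code coverage tool at the same run, though, and you are suddenly using a "white box" technique on "black box" tests.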

But code coverage tools can be used with black box tests. What would you call that? And boundary value analysis can benefit from knowing where internal boundaries are in the code. Is this gray box testing?

Black box and white box are simply metaphors. They can be useful to highlight an important issue in testing: How much knowledge of the system do we need before we can apply a particular technique? But that's just the beginning of thinking about how to test software. The black box metaphor shouldn't be seen as implying that some testing is better when done out of ignorance. And both metaphors could be criticized for orienting to the code, rather than the customer. Sadly, to invert the metaphor, many programmers and development organizations treat the customers and users as black boxes, when they might be better off trying to understand the internal goals and thought processes that their users have when they are using the software.

Any metaphor can be either useful or misleading depending on how it is used. Black box and white box become particularly dangerous when they are mistaken for scientific terminology. Consider these questions from the preparation workbook for the QAI Certified Software Test Engineer exam that the testers asked me about:

Unit testing uses which test strategy or strategies?

a. Black box testing
b. Gray box testing
c. White box testing
d. Both b and c

The real answer is that unit testing, acceptance testing, and integration testing can all use black box, white box, or gray box techniques, but that's not given as a possible answer. Typically, unit testing is associated with white box testing and acceptance testing with black box testing, but there is no rule saying this always must be. For example, test-driven development is a programming method based on unit tests. With this method, a programmer first writes a test that fails—because the necessary code is missing. Only after writing this test does the programmer write the code that makes the test pass. Because these tests are written before the code, they are arguably black box unit tests—they can't be based on a knowledge of code, because the code hasn't been written yet! Indeed, the very goal of test-driven development is to encourage programmers to focus more on interfaces and behavior and less on structure and algorithms—to think more about how their code will be used by others.

It gets worse. If the exam questions mentioned only white and black box testing and acceptance and unit testing, then I could guess what the certifiers had intended the correct answers to be (even if I disagreed). But the answer choices include gray box testing, so I have no idea what was intended. Gray box testing isn't a scientific term either, but an attempt to suggest that it is often useful to gain partial knowledge of the system under test, its major components, and its architecture. I've heard the term used only with regard to system testing, when a full white box understanding would be nearly impossible, and yet a black box understanding is often insufficient. I can't even imagine what would be meant by gray box unit testing. And how much knowledge does it take before black box testing becomes gray box testing? Well, there's no way to draw a line. It's just another metaphor, not a scientific classification.

This brings us back to the testers who asked my advice. The fact that two people preparing for this test noticed the ambiguity and wrote to me about it shows they are better at thinking about the nuances of these issues than the certifiers. The sensitivity they show is critical to excellent software testing. This says more to me about their abilities than the exam could ever say. Good testers notice ambiguities in systems and press on them, just as these two did by questioning the exam.

It's too bad that this exam puts so much weight on such simple concepts. I hope other testers getting this certification won't be confused into thinking these terms mean more than they do. A lot of people would like to see software testers get more respect. But certifications that treat figures of speech as if they were scientific categories don't help.