Synthesize Your Test Data

In a society growing ever conscious of the benefits of organic materials, "synthetic" is a dirty word. But in this week's column, Danny R. Faught argues that when you're designing tests, synthetic data is the way to go. Read on to learn why it's important to be the master of your data.

I'm going to let you in on a secret. When I interview someone for a software testing job, I have one weed-out question that lets me know quickly whether someone understands the basics of testing:

"Tell me how you would test the operating system's feature that lists files on the hard drive. Choose the utility you're most familiar with to test, such as Finder, Explorer, ls, or dir."

Unfortunately, the difference between a good answer and one that leads me to send a résumé to the shredder is not an answer that you're likely to see taught explicitly in a book or a course. Here's the beginning of an all-too-frequently-heard bad answer: "Well, I would look around on the disk to see what files I can use to test with." Can you see what's wrong with that answer?

Good testers know that they need to control their test data; they don't limit themselves to what happens to be lying around on the disk. They use disciplined test design techniques, which usually means that they create the test data for their tests. They never would skip tests simply because they can't find the right data. Sometimes the only oracle that can tell if the tests pass is knowledge of how the data was created.

Check Your Timing The kind of test data I'm talking about here is the input that's put in place before starting a test, as a part of the test setup. This may be the contents of a text field that a Web browser automatically populates with previously given information, a file that's loaded into a word processor, or the contents of a database. When you design tests that need to have test data set up in advance, you need to define the procedure for putting it there, using one of these approaches:

On the fly -You create the data right before running each test or suite of tests, every time you run them. You can investigate how long it takes to set up the data, how hard the setup procedure is to automate, and how feasible it is to set up and tear down the data repeatedly. For a word processing file this could be as simple as copying the file, but for a database the task may be complex.

Created in advance -You may set up the test data once and leave it in place for days or months at a time, especially if it's difficult to set up. In this case, consider how you regularly will check that the data is still valid. What do you do if a test modifies the data in a way that will cause other tests to fail?

In both scenarios, think about any problems that might arise if two people are using the same data source.

The "created in advance" option is the most common because it's easier, especially if you're not validating the test data. But "on the fly" tends to be much more robust if you can find a feasible way to do it.

Test Data Mechanics You may have more than one way of building test data, either through the application's user interface or by directly creating a file or database. If you work through the user interface-entering data as the user would-you have the advantage of getting additional test coverage while you're building the data. However, this process may be so time consuming that it's not practical for building a large volume of data. Also, some kinds of relationships among the data may be impractical to set up this way, such as a series of