Agile in a Flash

Unit testing after the fact? We do just enough test-after development (TAD) to please our manager, and no more.

Just Ship It! Once the business realizes they have product in hand, there's always pressure to move on to the next feature.

Invariably, any "after the fact" process is given short shrift. This is why post-development code reviews don't reveal deep problems, and even if they do, we rarely go back and make the costly changes needed to truly fix the problems. This is why after-the-fact testers are always getting squeezed out of the narrow space between "done" software and delivery.

My Sh*t Don't Stink. I just wrote my code, and it works. I know so because I deployed it, cranked up the server, and ran through the application to verify it. Why would I waste any more time building unit tests that I hope to never look at again?

That Can't Possibly Break. Well, two-line methods and such. I can glance at those and think they look good. Lazy initialization? That couldn't break, could it?

That's Gonna Be a Pain. Holy crud, that's a 100-line method with 3+ levels of nesting, complex conditionals, and exceptional conditions (test those? you're kidding). It's going to take me longer to write a reasonably comprehensive set of tests than it took to write the method in the first place.

Worse, the method has numerous private dependencies on code that makes database or service calls. Or just dependencies that have nothing to do with what I'm testing. Just today I tried to instantiate a class, but failed because of a class two inheritance levels up with a dependency on another object being properly instantiated and configured. Figuring out how to test that is going to be a nightmare that eats up a lot of time.

Code written without immediate and constant consideration for how to unit test it is going to be a lot harder to test. Most of us punt when efforts become too daunting.

---

I hear that TAD coverage typically gets up to about 75% on average. Closer inspection reveals that this number is usually all over the map: 82% in one class, 38% in another, and so on. Even closer inspection reveals that classes with a coverage percent of, say, 38, often contain the riskiest (most complex) code. Why? Because they're the hardest to test.

If I was allowed to only do TAD and not TDD, I'd scrap it and invest more in end-to-end testing.

I borrowed the "brand" but Continuous Improvement In A Flash is a whole different kind of work than Agile In A Flash. It is essentially a guide for scrum masters to help them institute continuous improvement. It is a quick exposition of mindset and technique that should help any SM (even a part-time SM).

I've just made the first public release, and I hope to follow up with changes, expansions, and reductions as time allows.

I am also starting up a second LeanPub book as the further development of my Use Vim Like A Pro tutorial, which has lost its place on the internet with the closing of my old blog on blogsome.

We'll keep you up to date with further changes, such as the publication date of Jeff's new book, as time allows.

Red, green, refactor. The first step in the test-driven development (TDD) cycle is to ensure that your newly-written test fails before you try to write the code to make it pass. But why expend the effort and waste the time to run the tests? If you're following TDD, you write each new test for code that doesn't yet exist, and so it shouldn't pass.

But reality says it will happen--you will undoubtedly get a green bar when you expect a red bar from time to time. (We call this occurrence a premature pass.) Understanding one of the many reasons why you got a premature pass might help save you precious time.

Running the wrong tests. This smack-your-forehead event occurs when you think you were including your new test in the run, but were not, for one of myriad reasons. Maybe you forgot to compile it, link in the new test, ran the wrong suite, disabled the new test, filtered it out, or coded it improperly so that the tool didn't recognize it as a legitimate test. Suggestion: Always know your current test count, and ensure that your new test causes it to increment.

Testing the wrong code. You might have a premature pass for some of the same reasons as "running the wrong tests," such as failure to compile (in which case the "wrong code" that you're running is the last compiled version). Perhaps the build failed and you thought it passed, or your classpath is picking up a different version. More insidiously, if you're mucking with test doubles, your test might not be exercising the class implementation that you think it is (polymorphism can be a tricky beast). Suggestion: Throw an exception as the first line of code you think you're hitting, and re-run the tests.

Unfortunate test specification. Sometimes you mistakenly assert the wrong thing, and it happens to match what the system currently does. I recently coded an assertTrue where I meant assertFalse, and spent a few minutes scratching my head when the test passed. Suggestion: Re-read (or have someone else read) your test to ensure it specifies the proper behavior.

Invalid assumptions about the system. If you get a premature pass, you know your test is recognized and it's exercising the right code, and you've re-read the test... perhaps the behavior already exists in the system. Your test assumed that the behavior wasn't in the system, and following the process of TDD proved your assumption wrong. Suggestion: Stop and analyze your system, perhaps adding characterization tests, to fully understand how it behaves.

Suboptimal test order. As you are test-driving a solution, you're attempting to take the smallest possible incremental steps to grow behavior. Sometimes you'll choose a less-than-optimal sequence. You subsequently get a premature pass because the prior implementation unavoidably grew out a more robust solution than desired. Suggestions: Consider starting over and seeking a different sequence with smaller increments. Try to apply Uncle Bob's Transformation Priority Premise (TPP).

Linked production code. If you are attempting to devise an API to be consumed by multiple clients, you'll often introduce convenience methods such as isEmpty (which inquires about the size to determine its answer). These convenience methods necessarily duplicate code. If you try to assert against isEmpty every time you assert against size, you'll get premature passes. Suggestions: Create tests that document the link from the convenience method to the core functionality, demonstrating them. Or combine the related assertions into a single custom assertion (or helper method).

Overcoding. A different form of "invalid assumptions about the system," you overcode when you supply more of an implementation than necessary while test-driving. This is a hard lesson of TDD--to supply no more code or data structure than necessary when getting a test to pass. Suggestion: Hard lessons are best learned with dramatic solutions. Discard your bloated solution and try again. It'll be better, we promise.

Testing for confidence. On occasion, you'll know when you think a test will generate a premature pass. There's nothing wrong with writing a couple additional tests: "I wonder if it works for this edge case," particularly if those tests give you confidence, but technically you have stepped outside the realm of TDD and moved into the realm of TAD (test-after development). Suggestions: Don't hesitate to write more tests to give you confidence, but you should generally have a good idea of whether they will pass or fail before you run them.

Programmers have to consider cardinality in data. For instance, a simple mailing list program may need to deal with people having multiple addresses, or multiple people at the same address. Likewise, we may have a number of alternative implementations of an algorithm. Perhaps the system can send an email, or fax a pdf, or send paper mail, or SMS, or MMS, or post a Facebook message. It's all the same business, just different delivery means.

Non-programmers don't always understand the significance of these numbers:

Analyst: "Customers rarely use that feature, so it shouldn't be hard to code."

Program features are rather existential--they either have to be written or they don't. "Simplicity" is largely a matter of how few decisions the code has to make, and not how often it is executed.

The Rule of Zero: No Superfluous PartsWe have no unneeded or superfluous constructs in our system.

Building to immediate, current needs keeps our options open for future work. If we need some bit of code later, we can build it later with better tests and more immediate value.

Likewise, if we no longer need a component or method, we should delete it now. Don't worry, you can retrieve anything you delete from version control or even rewrite it (often faster and better than before).

The Rule of One: Occam's Razor Applied To SoftwareIf we only need one right now, we code as if one is all there will ever be.

We've learned (the hard way!) that code needs to be unique. That part of the rule is obvious, but sometimes we don't apply "so far" to the rule. Thinking that you might need more than one in a week, tomorrow, or even in an hour isn't enough reason to complicate the solution. If we have a single method of payment today, but we might have many in the future, we still want to treat the system as if there were only going to be one.

Future-proofing done now (see the "options" card) gets in the way of simply making the code work. The primary goal is to have working code immediately.

When we had originally written code with multiple classes and we later eliminate all but one, we can often simplify the code by removing the scaffolding that made "many" possible. This leaves us with No Superfluous Parts, which makes code simple again.

The Rule of Many: In For a Penny, In For a PoundCode is simpler when we write it to a general case, not as a large collection of special cases.

A list or array may be a better choice than a pile of individual variables--provided the items are treated uniformly. Consider "point0, point1, point2." Exactly three variables, hard-coded into a series of names with number sequences. If they had different meanings, they would likely have been given different names (for instance, X, Y, and Z). What is the clear advantage of saying 'point0' instead of point[0]?

It's usually easier to code for "many" than a fixed non-zero number. For example, a business rule requiring there are exactly three items is easily managed by checking the length of the array, and not so easily managed by coding three discrete conditionals. Iterating over an initialized collection also eliminates the need to do null checking when it contains no elements.

Non-zero numbers greater than one tend to be policy decisions, and likely to change over time.

When several possible algorithms exist to calculate a result we might be tempted to use a type flag and a case statement, but if we find a way to treat implementations uniformly we can code for "many" instead of "five." This helps us recognize and implement useful abstractions, perhaps letting us replace case statements with polymorphism.

Naturally, these aren't the only simple rules you will ever need. But simple, evolutionary design is well supported by the ZOM rules regardless of programming language, development methodology, or domain.

If you're reading this blog, you're probably a believer that a good agile process can make a difference. And maybe you've recognized someone on your team, on another team, or even in a different company that you think would benefit from a little covert mentoring.

We'd like to help! We believe getting these cards in the hands of the right people can make a real difference. We're willing to put that belief in action.

Here's how it works:

Email us at AgileInAFlash@mail.com, recommending one person who you think should receive a free deck. You don't have to name names, you can say "my boss," "our architect," "my dog," "my cousin," etc. You can even name yourself!

Tell us in one short, pithy line why you think that this person/team would benefit from Agile in a Flash.

We'll read the comments and pick our favorites.

If your entry is selected, we will contact you and get the particulars (names, addresses).

The person you recommended gets a deck of Agile in a Flash from us. No note, no card, no explanation.

To thank you for being so helpful, we send a second deck to you!

We'll put the winning comments on a soon-to-be-pubished Agile in a Flash blog entry. (You can choose to be attributed or anonymous.)

You can find many good blog posts on what to name your tests. We present instead an appropriate strategy for when and how to think about test naming.

Don't sweat the initial name. A bit of thought about what you're testing is essential, but don't expend much time on the name yet. Type in a name, quickly. Use AAA or Given-When-Then to help derive one. It might be terrible--we've named tests "DoesSomething" before we knew exactly what they needed to accomplish. We've also written extensively long test names to capture a spewn-out train of thought. No worries--you'll revisit the name soon enough.

Write the test. As you design the test, you'll figure out precisely what the test needs to do. You pretty much have to, otherwise you aren't getting past this step! :-) When the test fails, look at the combination of the fixture name, test method name, and assertion message. These three should (eventually) uniquely and clearly describe the intent of the test. Make any obvious corrections, like removing redundancy or improving the assertion message. Don't agonize about the name yet; it's still early in the process.

Get it to pass. Focus on simply getting the test to pass. This is not the time to worry about the test name. If you have to wait any significant time for your test run, start thinking about a more appropriate name for the test (see step 4).

Rename based on content. Once a test works, you mustrevisit its name. Re-read the test. Now that you know what it does, you should find it much easier to come up with a concise name. If you had an overly verbose test name, you should be able to eliminate some noise words by using more abstract or simpler terms. You may need to look at other tests or talk to someone to make sure you're using appropriate terms from the domain language.

Rename based on a holistic fixture view. In Eclipse, for example, you can do a ctrl-O to bring up an outline view showing the names for all related tests. However you review all the test names, make sure your new test's name is consistent with the others. The test is a member of a collection, so consider the collection as a system of names.

Rename and reorganize other tests as appropriate. Often you'll question the names of the other tests. Take a few moments to improve them, with particular focus given to the impact of the new test's name. You might also recognize the need to split the current fixture into multiple fixtures.

Reconsider the name with each revisit.Unit tests can act as great living documentation -- but only if intentionally written as such. Try to use the tests as your first and best understanding of how a class behaves. The first thing you should do when challenged with a code change is read the related tests. The second thing you should do is rename any unclear test names.

The test names you choose may seem wonderful and clear to you, but you know what you intended when you wrote them. They might not be nearly as meaningful to someone who wasn't involved with the initial test-writing effort. Make sure you have some form of review to vet the test names. An uninvolved developer should be able to understand the test as a stand-alone artifact - not having to consult with the test's author (you). If pair programming, it's still wise to get a third set of eyes on the test names before integrating.

Unit tests require a significant investment of effort, but renaming a test is cheap and safe. Don’t resist incrementally driving toward the best name possible. Continuous renaming of tests is an easy way of helping ensure that your investment will return appropriate value.

(Kudos to the great software guru Jeff Foxworthy for the card phrasing.)An effective unit test should follow the FIRST prescriptions in order to verify a small piece of code logic (aka “unit”). But what exactly does it mean for a unit test to be I for Isolated? Simply put, an isolated test has only a single reason to fail.If you see these symptoms, you may have an isolation problem:Can't run concurrently with any other. If your test can’t run at the same time as another, then they share a runtime environment. This occurs most often when your test uses global, static, or external data.A quick fix: Find code that uses shared data and extract it to a function that can replaced with a test double. In some cases, doing so might be a stopgap measure suggesting the need for redesign.Relies on any other test in any way. Should you reuse the context created by another test? For example, your unit test could assume a first test added an object into the system (a “generous leftover”). Creating test inter-dependencies is a recipe for massive headaches, however. Failing tests will trigger wasteful efforts to track down the problem source. Your time to understand what’s going on in any given test will also increase.Unit tests should assume a clean slate and re-create their own context, never depending on an order of execution. Common context creation can be factored to setup or a helper method (which can then be more easily test-doubled if necessary). You might use your test framework's randomizer mode (e.g. googletest’s --gtest_shuffle) to pinpoint tests that either deliberately or accidentally depend on leftovers.You might counter that having to re-execute the common setup twice is wasteful, and will slow your test run. Our independent unit tests are ultra-fast, however, and so this is never a real problem. See the next bullet.Relies on any external service. Your test may rely upon a database, a web service, a shared file system, a hardware component, or a human being who is expected to operate a simulator or UI element. Of these, the reliance on a human is the most troublesome.SSDD (same solution different day): Extract methods that interact with the external system, perhaps into a new class, and mock it. Requires a special environment. “It worked on my machine!” A Local Hero arises when you write tests for a specific environment, and is a sub-case of Relies on any external service. Usually you uncover a Local Hero the first time you commit your code and it fails during the CI build or on your neighbor’s dev box.The problem is often a file or system setting, but you can also create problems with local configuration or database schema changes. Once the problem arises, it’s usually not too hard to diagnose on the machine where the test fails.There are two basic mitigation strategies:

Check in more often, which might help surface the problem sooner

Periodically wipe out and reinstall (“pave”) your development environment

Can’t tell you why it fails. A fragile test has several ways it might fail, in which case it is hard to make it produce a meaningful error message. Good tests are highly communicative and terse. By looking at the name of the test class, the name of the method, and the test output, you should know what the problem is:

CSVFileHandling.ShouldToleratedEmbeddedQuotes - Expected "Isn't that grand" but result was "Isn"You shouldn't normally need to dig through setup code, or worse, production code, to determine why your test failed.The more of the SUT exercised by your test, the more reasons that code can fail and the harder it is to craft a meaningful message. Try focusing your test on a smaller part of the system. Ask yourself “what am I really trying to test here?”

Your test might be failing because it made a bad assumption. A precondition assertion might be prudent if you are at all uncertain of your test’s current context.

Mocks indirect collaborators. If you are testing public behavior exposed by object A, and object A interacts with collaborator B, you should only be defining test doubles for B. If the tests for A involve stubbing of B’s collaborators, however, you’re entering into mock hell.Mocks violate encapsulation in a sense, potentially creating tight coupling with implementation details. Implementation detail changes for B shouldn’t break your tests, but they will if your test involves test doubles for B’s collaborators.Your unit test should require few test doubles and very little preliminary setup. If setup becomes elaborate or fragile, it’s a sign you should split your code into smaller testable units. For a small testable unit, zero or one test doubles should suffice.

In summary, unit tests--which we get most effectively by practicing TDD--are easier to write and maintain the more they are isolated.

In our article Test Abstraction: Eight Techniques to Improve Your Tests (published by PragPub), we whittled down a convoluted, messy test into a few concise and expressive test methods, using this set of smells as a guide. We improved the abstraction of this messy test by emphasizing its key parts and de-emphasizing its irrelevant details.

Stripped down, the "goodness" of a test is largely a matter of how quickly these questions can be answered:

What's the point of this test?

Why is the assertion expecting this particular answer?

Why did this test just fail?

That's a nice short-form summary, but it is descriptive rather than prescriptive. In the Pragmatic Bookshelf article, a real-world code example was used to painstakingly demonstrate the techniques used to improve the clarity of tests by increasing their level of abstraction. Here we provide the "cheat sheet" version:

Unnecessary test code. Eliminate test constructs that clutter the flow of your tests without imparting relevant meaning. Most of the time, "not null" assertions are extraneous. Similarly, eliminate try/catch blocks from your tests (in all but negative-case tests themselves).

Missing abstractions. Are you exposing several lines of implementation detail to express test set-up (or verification) that represents a single concept? Think in terms of reading a test: "Ok, it's adding a new student to the system. Fine. Then it's creating an empty list, and then adding A to that list, then B, and ..., and then that list is then used to compare against the expected grades for the student." The list construction is ugly exposed detail. Reconstruct the code so that your reader, in one glance, can digest a single line that says "ensure the student's expected grades are A, A, B, B, and C."

Irrelevant data. "Say, why does that test pass the value '2' to the SUT? It looks like they pass in a session object, too... does it require that or not?" Every piece of data shown in the test that bears no weight on its outcome is clutter that your reader must wade through and inspect. "Is that value of '2' the reason we get output of '42'?" Well, no, it's not. Take it out, hide it, make it more abstract! (For example, we'll at times use constant names like ARBITRARY_CHARGE_AMOUNT.)

Bloated construction. It may take a bit of setup to get your SUT in the state needed for the test to run, but if it's not relevant to core test understanding, move the clutter out. Excessive setup is a design smell, showing that the code being tested needs rework. Don't cope with the problem by leaving extensive setup in your test, but tackle the problem by reworking the code as soon as you have reasonable test coverage. Of course, the problem of untestable code is largely eliminated through the use of TDD.

Irrelevant details. Is every step of the test really needed? For example, suppose your operational system requires the presence of a global session object, pre-populated with a few things. You may find that you can't even execute your unit test without having it similarly populate the session object. For most tests, however, the details of populating the session object have nothing to do with the goal of the test. There are usually better ways to design the code unit, and if you can't change your design to use one of those ways, you should at least bury the irrelevant details.

Multiple assertions. A well-abstracted unit test represents one case: "If I do this stuff, I observe that behavior." Combining several cases muddies the abstraction--which setup/execution concepts are relevant to which resulting behaviors? Which assertion represents the goal of the test case? Most tests with multiple assertions are ripe for splitting into multiple tests. Where it makes sense to keep multiple assertions in a single test, can you at least abstract its multiple assertions into a single helper method?

Misleading organization. You can easily organize tests using AAA (Arrange-Act-Assert). Remember that once green, we re-read any given test (as a document of SUT behavior) only infrequently--either when the test unexpectedly fails or when changes need to be made. Being able to clearly separate the initial state (Arrange) from the activity being tested (Act) from the expected result (Assert) will speed the future reader (perhaps yourself in 5 months) on his way. When setup, actions, and assertions are mixed, the test will require more careful study. Spare your team members the chore of repeatedly deciphering your tests down the road--spend the time now to make them clear and obvious.

Implicit meaning. It's not all just getting rid of stuff; abstraction requires amplifying essential test elements. Imagine a test that says "apply a fine of 30 cents on a book checked out December 7 and returned December 31." Why those dates? Why that amount? If we challenged you to tell us the rules that govern the outcome of this test, you'd likely get them wrong. A meaningful test would need to describe, in one manner or another, these relevant concepts and elements:

books are normally checked out for 21 days

Christmas--a date in the 21-day span following December 7--is a grace day (i.e. no fines are assessed)

The due date is thus December 29

December 31 is two days past the due date

Books are fined 20 cents for the first day late and 10 cents for each additional day.

30c = 20c + 10c

(Consider that each of these concepts could represent a separate unit test case, and that this detailed scenario represents more of an end-to-end test case.)
A unit test that requires you to dig through code to discern the reasons for its outcome is wasteful.

You could easily survive without our list of test abstraction smells--and figure out on your own what to do about your tests--as long as you keep one thought in your head: "My goal is to ensure that each unit test clearly expresses its intent, and nothing else, to the next person who must understand this system behavior."