Sunday, August 3, 2008

When I was younger, lighter and more of an adrenaline junkie, I took up rock climbing. It was an excellent way to push myself both mentally and physically, and it forced me to focus absolutely, completely on the moment. A successful day of climbing required thorough preparation, training and conditioning, the right equipment, a good sense of balance and timing (which often eluded me), and smooth teamwork with my climbing partners. One of the first things I learned was that unless you were a maniac or a supremely gifted super hero, you had to put in protection as you climbed, to ensure your safety and the safety of your climbing partners.

Building real software, software that is good and useful and meant to last, is equally challenging. I am not talking about one-off custom development work or small web sites, but real software products that companies run their businesses on. There is no one best practice or methodology, no perfect language or cool tool that will ensure that you will write good code. Instead, real software development demands discipline and focus and balance, and an intelligent defense-in-depth approach: consistently following good practices and avoiding stupid ones, hiring the best possible people and making sure they are well trained and have good tools, and carefully and conscientiously managing the team and your projects.

Having said this, if you need one place to start, if you had to choose one practice that could make the difference between a chance at success and the almost complete certainty of failure, start with building yourself a safety net: a strong regression test capability that you can run quickly and automatically before you release any new code. Without this safety net, every change, every fix that you make is dangerous.

I am surprised to find software product development shops with large code bases that do not have automated regression testing in place, and rely on black box testers (or their customers!) to find problems. Relying on your test team to manually catch regressions is error-prone - sooner or later a tester will run out of time, make a mistake or miss (or misunderstand) a test - and awfully, awfully expensive - it takes too many people too long to test even a moderately complex system. Regression testing is important, but you will get a lot more value from the test team if you free them up to do things like deep exploratory testing, destructive testing, stress testing and system testing, simulations and war games, and reviews and pair testing with the developers.

If you don’t have an automated test safety net today, then start building one tomorrow. Find someone on your team who understands automated unit testing, or bring in a consultant; start writing unit tests for new code, and add unit tests as you change existing code. Run the test suite for each build as part of your continuous integration environment (ok: if you don’t have CI set up already, you will have to do this too).
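A first regression test can be as small as a few assertions around one business rule. A minimal sketch using Python’s stdlib unittest - the function and its discount rule are hypothetical, purely for illustration:

```python
import unittest


def apply_discount(price, percent):
    """Hypothetical business rule: reduce a price by a percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (100 - percent) / 100.0, 2)


class ApplyDiscountTest(unittest.TestCase):
    def test_typical_discount(self):
        self.assertEqual(apply_discount(200.00, 25), 150.00)

    def test_zero_discount_leaves_price_unchanged(self):
        self.assertEqual(apply_discount(99.99, 0), 99.99)

    def test_invalid_percent_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(100.00, 150)
```

Run the suite with `python -m unittest` on every build; once it is wired into continuous integration, a regression fails the build instead of reaching a tester or a customer.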

Starting from nothing, there’s no point in trying to measure test coverage. So begin by counting the number of tests the team adds each week and track the team’s progress. As bugs are found in production or by the test team, make sure to write tests for the code that needs to be corrected, and spend some time to review whether other tests should be added in the rest of the code base to catch this type of failure. Monitor the trend to ensure that new tests continue to be added, that the team isn't taking a step backwards. Just like building software, take an incremental approach in adopting unit testing.
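Tests written for bug fixes are especially valuable because they pin down behavior that has already failed once. A sketch, assuming a hypothetical off-by-one bug in a pagination helper, with the test named after its (made-up) ticket number so the history stays traceable:

```python
import unittest


def page_count(total_items, page_size):
    """Pages needed to show all items. The original (buggy) code did
    total_items // page_size, under-counting when the last page was
    only partly full."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return (total_items + page_size - 1) // page_size


class Bug1234RegressionTest(unittest.TestCase):
    def test_partial_last_page_is_counted(self):
        # 101 items at 10 per page needs 11 pages, not 10.
        self.assertEqual(page_count(101, 10), 11)

    def test_exact_multiple_is_unchanged(self):
        self.assertEqual(page_count(100, 10), 10)
```

Writing the test first - and watching it fail against the old code - confirms that the test actually catches the bug before the fix goes in.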

Once you have a good base of tests in place, use a code coverage tool to identify important areas of code that are not tested. Take a risk-based approach: if a piece of code is not important, don’t bother writing tests just to hit a mandated code coverage goal. If a critical piece of code or a core business rule is not covered by a test, write some more tests as soon as you can. Then use a mutation tool like jumble or jester to validate the effectiveness of your test suite.
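Mutation testing makes the weakness concrete: the tool flips an operator in your code, and if no test fails, the mutant "survives" - your suite wasn’t really checking that behavior. A hand-rolled sketch of the idea (not jumble or jester themselves, and the business rule is invented):

```python
def is_adult(age):
    """Original rule: adults are 18 and over."""
    return age >= 18


def is_adult_mutant(age):
    """The kind of mutant a tool might generate: >= flipped to >."""
    return age > 18


# A test far from the boundary cannot tell the two apart,
# so it would let the mutant survive:
assert is_adult(30) == is_adult_mutant(30)

# A test at the boundary "kills" the mutant - the original and
# the mutant disagree, so the suite would catch the change:
assert is_adult(18) is True
assert is_adult_mutant(18) is False
```

A suite that kills most mutants is measuring behavior, not just line coverage; 100% coverage with weak assertions can still let every mutant live.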

Writing tests that do not materially reduce risk simply adds to the cost of building and maintaining the system. Some fundamentalists will disagree and demand 100% coverage (although with the failure of Agitar the hype on this subject has subsided somewhat). A recent post on achieving good ROI on Unit Testing explores how much unit testing is really necessary and valuable. Rather than writing unit tests on unimportant code, you could spend that time reviewing design and code, implementing static analysis tools to catch programming errors, helping with functional and integration testing, or, hey, why not, designing and writing more code, which is the point after all. For most projects, achieving 100% code coverage is not practical or even all that useful. But using good judgment to test high risk areas of the system is.

Test-first or not, work with developers to write useful unit tests, and try to review the tests to make sure you don’t end up with a dozen tests that check variations on the same boundary condition or something else equally silly. Writing and maintaining tests is expensive, so make each test count.
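One way to keep a dozen near-duplicate boundary tests from accumulating is to table-drive them: one test, one table of cases. A sketch, again with a hypothetical validation rule:

```python
import unittest


def valid_quantity(qty):
    """Hypothetical rule: order quantity must be 1 to 1000 inclusive."""
    return 1 <= qty <= 1000


class ValidQuantityTest(unittest.TestCase):
    def test_boundaries(self):
        # One table instead of four copy-pasted test methods; each row
        # still reports its own failure via subTest.
        cases = [(0, False), (1, True), (1000, True), (1001, False)]
        for qty, expected in cases:
            with self.subTest(qty=qty):
                self.assertEqual(valid_quantity(qty), expected)
```

Adding a new boundary case is now one line in the table, and a reviewer can see at a glance which edges are covered - and which are checked twice for no reason.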

While a good set of unit tests is extremely valuable, I don’t agree with some XP extremists who believe that once you have a comprehensive set of unit tests in place you are done with testing. Remember defense-in-depth: unit tests are an important part, but only a part, of what is needed to achieve quality. No one type of test or review is thorough enough to catch every error in a big system.

A good unit testing program requires investment in both the short-term and the long-term. Initial investments are needed to create the necessary infrastructure, train developers on the practice and tools, and of course there’s the actual time for the team to write and review the tests, and to integrate the tests into your build environment. In the longer-term, you will need to work with developers to continually reinforce the value of developer testing, especially as new people join the team, and ensure that the discipline of writing good tests is kept up; and the test suite will need to be constantly updated as code is changed or as new bugs are found.

Designing and writing good unit tests isn’t easy, especially if you are following a test-driven development approach. And not only do you have to sell the development team on the value of unit testing; you also need to convince management and your customers to give you the time and resources necessary to do a proper job. But without a solid testing safety net, changing code is like climbing without protection: sooner or later you or your buddy is going to make a mistake, and somebody’s going to get hurt.

About Me

I am an experienced software development manager, project manager and CTO focused on hard problems in software development, software quality and security. For the last 20 years I have managed teams building and operating high-performance financial platforms.
My special interest is how small teams can be most effective in building real software: high-quality, secure systems at the extreme limits of reliability, performance, and adaptability. Software that has to work, that is built right, and built to last.
I use this blog to explore ideas and problems in software development that are important to me. To reflect and to find new answers.