Thursday, April 19, 2012

I talk to a lot of people in both big and small software development organizations about how they manage software development, how they’re organized, what practices they follow and what practices actually work. Most people working on small teams that I talk to can’t justify having someone to just test their apps, because testers don’t actually build software, so they’re considered overhead. That means that developers need to test the software themselves – or the customer will have to do it.

What do testers do on an Agile team?

Quite a few Agile teams believe that you don’t need testers to deliver working software. Testers are looked upon as a relic from the waterfall days (requirements, design, code, then pass off to test). On XP teams, everyone is a developer, and developers are responsible and accountable for testing their own code, writing automated unit tests and then automating the acceptance tests that the Customer has defined. Scrum doesn’t explain how testing is done at all – the team is expected to figure it out as they “inspect and adapt” their way towards good practices.

If developers are already testing their own code (and maybe even pairing up to review code as it is written), then what do you need testers for?

Janet Gregory and Lisa Crispin wrote a big book to justify the role of testers on Agile teams and to explain to programmers and testers how testers can fit into Agile development, but this hasn’t changed the attitude of many teams, especially in “engineering-driven cultures” (startups founded by programmers).

One of their arguments is that Agile teams move too fast for testers, that black box testers writing up test plans and working through manual test scripts or constantly updating their Quality Center or Selenium UI regression tests can never catch up to a team delivering new features in short sprints. If the testers don’t have the technical skills to at least write acceptance tests in something like Fitnesse or Cucumber, or if they don’t have the business domain knowledge to help fill in for the Customer/Product Owner and answer developer questions, what are they good for?

This is taken to the extreme in Continuous Deployment, a practice made popular by companies like IMVU and Facebook, where developers review their work, write automated tests, and check the code and tests in; if the tests pass, the changes are immediately and automatically pushed to production.

Letting Customers test your work

Some shops look at Continuous Deployment as a chance to “crowdsource” their testing – by getting their customers to do their testing for them. It’s actually promoted as a competitive advantage. But it’s really hard – maybe impossible – to write secure and reliable software this way, as I have looked at before. For a critical review of the quality of a system continuously deployed to customers, read James Bach’s fascinating post on 20 minutes spent testing one of the poster child apps for Continuous Deployment, and the problems that they found in just that short time.

Other Continuous Deployment shops are more careful and follow Etsy/Flickr’s approach of dark launching: deploying changes continuously, but testing and reviewing them before turning them on progressively for customers and closely monitoring the outcome.

Regardless, it’s important to remember that there are some things that customers can test and in fact only customers should test: whether a feature is useful or not, whether a feature is usable, what kind of information they need to do a task properly, what the optimal workflow is. This is what A/B split testing is supposed to be about – experimenting with ideas and features and workflows, collecting usage data and finding out what customers use or like best and what they don’t. To evaluate alternatives and get feedback.
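The mechanics behind this kind of split testing can be very simple. Here’s a minimal sketch of deterministic A/B bucketing: each customer is consistently assigned to one variant based on a hash of their id, so usage data can be compared across variants. The class and variant names are illustrative, not taken from any particular framework.

```java
import java.util.List;

// Sketch of deterministic A/B bucketing: hashing the customer id means the
// same customer always sees the same variant, which keeps the experiment's
// usage data clean.
class AbBucketer {
    private final List<String> variants;

    AbBucketer(List<String> variants) {
        this.variants = variants;
    }

    String variantFor(String customerId) {
        // Math.floorMod keeps the index non-negative even when hashCode() is negative
        int bucket = Math.floorMod(customerId.hashCode(), variants.size());
        return variants.get(bucket);
    }
}
```

Real experimentation platforms add weighting, exposure logging and statistical analysis on top, but the core idea is just a stable mapping from customer to variant.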

But you don’t ask your customers to test whether something is finished or not, whether the code works or not, whether the system is stable and secure or whether it will perform under load.

What do you need from your test team?

Even the best, most responsible and experienced developers make mistakes. In our shop, everyone is an experienced developer – some of them have been working in this domain for 10-15 years or more. They carefully test their own work and update the automated unit/functional test suite for every check-in. These tests and static analysis checks are run in Continuous Integration – we’ve learned to lean heavily on the test suite (there are thousands and thousands of tests now with a high level of coverage) and on static analysis bug checking and security vulnerability checking tools to find common coding mistakes. All code changes are also reviewed by another senior developer – without exception.

Even with good discipline and good tools, good programmers still make mistakes: some subtle (inconsistencies, look-and-feel problems, data conversion and setup, missing edits) and some fundamental (run-time failures under load, concurrency problems, missed requirements, mistakes in rules, errors in error handling). I want to make sure that we find most (if not all) of these mistakes before the customers do. And so do the developers.

That’s where our test team comes in. We have a small, experienced and highly-specialized test team. One tester focuses on acceptance testing, validating functional requirements and usability and workflow with the business. Another tester works on functional regression and business rules correctness and coverage, looking for missing rules and for holes in the developers’ test suites, and automating our integration tests at the API level. And the third tester focuses on operational testing – stress testing for spikes and demand shocks, soak testing to look for leaks and GC issues, destructive system testing and bug hunting – actively trying to break the system. They all know enough to fill in for each other when someone is away, but they each have their own unique knowledge and skills and strengths, and their own ways of approaching problems.

When we were first building the system we started with a larger test team focused more on coverage and assurance, with test planning and traceability and detailed manual testing checklists, and writing automated regression tests at the UI level. But there was a lot of wasted time and effort working this way.

Now we depend more on automated tests written by the developers underneath the UI for functional coverage and regression protection. Our test team puts most of their effort into exploratory functional and system and operational testing, risk-based and customer-focused targeted tests to find the most important bugs, to find weaknesses and exploit them. They like this approach, I like it, and developers like it, because we find real and important bugs in testing, the kinds of problems that escape code reviews and unit testing.

They smoke test changes as soon as developers check them in, in different customer configurations. They pair up with developers to test through new features and run war games and simulations with the developers to try to find run-time errors and race conditions and timing issues and workflow problems under “real-world” conditions. They fail the system to make sure that the failure-detection and recovery mechanisms work. They test security features and setup and manage pen tests with consultants. They run the system through an operational day. Together with Ops they also handle integration certification with new customers and partners. They do this in short sprints with the rest of the team, releasing to production every 2 weeks (and sometimes more often).

The test team is also responsible for getting the software into production. They put together each release, check the dependencies, they decide when the release is done, what will make it into a release and what won’t, they check that we have done all of the reviews that the team agreed to, they test the roll-back and data conversion routines and then they work with Ops to deploy the release through to production. They don’t slow the team down, they don’t keep us from delivering software. They help us make sure that the software works and that it gets into production safely.

Testers find more than bugs

I’ve worked for a long time in high-assurance, high-integrity businesses where not having testers isn’t an option – the stakes of making mistakes are too high. But even outside of these environments, I don’t think that you can build real software without someone helping to test it. Unless you are an early stage startup pounding out a proof of concept, or you are a small team building something trivial for internal use (but then you probably won’t read this), you need help testing the system to make sure that it works.

It doesn’t matter how you are working, what method you follow – Agile or Waterfall doesn’t change the need for testers. If you’re moving fast and light, testers need to adapt to the pace and to the way that they get and share information. That’s ok. Good testers can do that.

I’m not naïve enough (any more) to think that the test team will find all of the bugs that might be in the system – or that this is their job. Of course, I hope that the testers will find any important or obvious bugs before customers do. What I need them to do is to help us answer some important questions: Are we ready to release? What’s too rough or unstable or incomplete, what needs to be backed-out, or what needs further review, or maybe a rewrite? What’s weak in the design? Where are we missing automated tests? Where do we need better test tools? What features are too hard to understand, or inconsistent, or too hard to set up? What error messages are missing or misleading? Are we trying to do too much, too fast? What do we need to change in the design, or the code, or the way that we design or code the system to make it better, more reliable?

Testing doesn’t provide all possible information, but it provides some. Good testing will provide lots of useful information.
– James Bach (Satisfice)

Without testers, not only do you put out code that you shouldn’t with bugs that you should have caught – you also lose a lot of important information about how good your software really is and what you need to do to make it better. If you care about building good software, this is an opportunity that you cannot pass up.

Friday, April 13, 2012

Our second interview in the "Ask the Expert" series on AppSec is with Dr. Chenxi Wang at Forrester Research, who looks at the same hard problems in secure software:

How big is the AppSec problem that we are all facing today? Why haven't we been able to solve the problem of writing secure software? Is the problem solvable? Is it really possible for developers to write secure software? If so, where should developers and businesses start? What are the first changes that they need to make?

Tuesday, April 10, 2012

When it comes to static analysis, Bill Pugh, software researcher and the father of Findbugs (the most popular static analysis tool for Java), is one of the few experts who is really worth listening to. He’s not out to hype the technology for commercial gain (Findbugs is a free, Open Source research project), and he provides a balanced perspective based on real experience working with lots of different code bases, including implementing Findbugs at Google.

Development is a zero-sum game

Any time spent reviewing and fixing bugs is time taken away from designing and implementing new features, or improving performance, or working with customers to understand the business better, or whatever else may be important. In other words:

“you shouldn’t try to fix everything that is wrong with your code”

At Google, they found thousands of real bugs using Findbugs, and the developers fixed a lot of them. But none of these bugs caused significant production problems. Why? Static analysis tools are especially good at finding stupid mistakes, but not all of these mistakes matter. What we need to fix is the small number of very scary bugs, at the “intersection of stupid and important”.

Working with different static analysis tools over the past 5+ years, we’ve found some real bugs, some noise, and a lot of other “problems” that didn’t turn out to be important. Like everyone else, we’ve tuned the settings and filtered out checks that aren’t important or relevant to us. Each morning a senior developer reviews the findings (there aren’t many), tossing out any false positives and “who cares” and intentional (“the tool doesn’t like it but we do it on purpose and we know it works”) results. All that is left are a handful of real problems that do need to be fixed each month, and a few more code cleanup issues that we agree are worth doing (the code works, but it could be written better).

Another lesson is that finding old bugs isn’t all that important or exciting. If the code has been running for a long time without any serious problems, or if people don't know about or are willing to put up with the problems, then there’s no good reason to go back and fix them – and maybe some good reasons not to. Fixing old bugs, especially in legacy systems that you don’t understand well, is risky: there’s a 5-30% chance of introducing a new bug while trying to fix the old one. And then there’s the cost and risks of rolling out patches. There’s no real pay back. Unless of course, you’ve been looking for a “ghost in the machine” for a long time and the tool might have found it. Or the tool found some serious security vulnerabilities that you weren’t aware of.

The easiest way to get developers to use static analysis is to focus on problems in the code that they are working on now – helping them to catch mistakes as they are making them. It’s easy enough to integrate static analysis checking into Continuous Integration and to report only new findings (all of the commercial tools that I have looked at can do this, and Findbugs does this as well). But it’s even better to give immediate feedback to developers – this is why commercial vendors like Klocwork and Coverity are working on providing close-to-immediate feedback to developers in the IDE, and why built-in checkers in IDEs like IntelliJ are so useful.
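The “report only new findings” approach can be reduced to a simple set difference against a stored baseline. The sketch below assumes findings are keyed as "rule-id:file:line" strings – a simplification, since real tools track findings more robustly across line-number churn.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of baseline filtering for static analysis in CI: keep a baseline of
// known findings from the last accepted run, and surface only what is new.
class NewFindingsFilter {
    private final Set<String> baseline;

    NewFindingsFilter(Set<String> baseline) {
        this.baseline = baseline;
    }

    Set<String> newFindings(Set<String> currentRun) {
        Set<String> fresh = new HashSet<>(currentRun);
        fresh.removeAll(baseline);  // anything already in the baseline is old news
        return fresh;
    }
}
```

After each accepted build, the current run becomes the new baseline, so developers only ever see findings introduced by their own changes.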

Getting more out of static analysis

Over a year ago my team switched static analysis engines for commercial reasons. We haven’t seen a fundamental difference between using one tool or the other, other than adapting to minor differences in workflow and presentation – each tool has its own naming and classification scheme for the same set of problems. The new tool finds some bugs the previous one didn’t, and it’s unfortunately missing a few checks that we used to rely on, but we haven’t seen a big difference in the number or types of problems found. We still use Findbugs as well, because Findbugs continues to find problems that the commercial engines don’t, and it doesn’t add to the cost of checking – it’s easy to see and ignore any duplicate findings.

Back in 2010 I looked at the state of static analysis tools for Java and concluded that the core technology had effectively matured – that vendors had squeezed as much as they could from static analysis techniques, and that improvements from this point on would be on better packaging and feedback and workflow, making the tools easier to use and understand. Over the past couple of years that’s what has happened. The latest versions of the leading tools provide better reporting and management dashboards, make it easier to track bugs across branches and integrate with other development tools, and just about everything is now available in the Cloud.

Checking engines are getting much faster, which is good when it comes to providing feedback to developers. But the tools are checking for the same problems, with a few tweaks here and there. Speed changes how the tools can be used by developers, but doesn’t change what the tools can do.

Based on what has happened over the past 2 or 3 years, I don’t expect to see any significant improvements in static analysis bug detection for Java going forward, in the kinds of problems that these tools can find – at least until/if Oracle makes some significant changes to the language in Java 8 or 9 or something and we’ll need new checkers for new kinds of mistakes.

Want more? Do it yourself…

Bill Pugh admits that Findbugs at least is about as good as it is going to get. In order to find more bugs or find bugs more accurately, developers will need to write their own custom rules checkers. Most if not all of the static analysis tools let you write your own checkers, using different analysis functions of their engines. Gary McGraw at Cigital agrees that a lot of the real power in static analysis comes from writing your own detectors:

In our experience, organizations obtain the bulk of the benefit in static analysis implementations when they mature towards customization. For instance, imagine using your static analysis tool to remind developers to use your secure-by-default web portal APIs and follow your secure coding standards as part of their nightly build feedback. (Unfortunately, the bulk of the industry's experience remains centered around implementing the base tool.)

If tool providers can make it simple and obvious for programmers to write their own rules, it opens up possibilities for writing higher-value, project-specific and context-specific checks. To enforce patterns and idioms and conventions. Effectively, more design-level checking than code-level checking.
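To give a flavour of what a project-specific check looks like: this is not a real Findbugs detector (those are written against its bytecode analysis API), just a minimal build-time scan that flags direct use of a raw API where the team’s secure-by-default wrapper should be used instead. The Runtime.exec and SafeExec names are stand-ins for whatever your own coding standard forbids and recommends.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Minimal illustration of a custom, project-specific rule: scan source lines
// and flag direct calls to a raw API that the team has wrapped with a
// secure-by-default alternative.
class ProjectRuleCheck {
    private static final Pattern FORBIDDEN =
            Pattern.compile("Runtime\\.getRuntime\\(\\)\\.exec");

    static List<String> check(List<String> sourceLines) {
        List<String> violations = new ArrayList<>();
        for (int i = 0; i < sourceLines.size(); i++) {
            if (FORBIDDEN.matcher(sourceLines.get(i)).find()) {
                violations.add("line " + (i + 1) + ": use SafeExec instead of Runtime.exec");
            }
        }
        return violations;
    }
}
```

A real detector would work on the parsed or compiled form of the code rather than raw text, but even a crude check like this, run in the nightly build, can steer developers towards the team’s conventions.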

Another way that static analysis tools can be extended and customized is by annotating the source code. Findbugs, Intellij, Coverity, Fortify, Klocwork (and other tools I’m sure) allow you to improve the accuracy of checkers by annotating your source code to include information that the tools can use to help track control flow or data flow, or to suppress checks on purpose.

If JSR-305 gains momentum (it was supposed to make it into Java 7, but didn’t) and tool suppliers all agree to follow common annotation conventions, it might encourage more developers to try it out. Otherwise you need to make changes to your code base tied to a particular tool, which is not a good idea.
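Here is a small sketch of what nullness annotations in the JSR-305 style look like. The @Nonnull and @CheckForNull types are defined inline so the example compiles without the jsr305 jar; with the real javax.annotation versions, a tool like Findbugs uses them to track data flow and warn about missing null checks.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Sketch of JSR-305-style nullness annotations. The annotation types are
// stand-ins defined locally; real code would use javax.annotation.Nonnull
// and javax.annotation.CheckForNull.
class NullnessSketch {
    @Retention(RetentionPolicy.CLASS) @interface Nonnull {}
    @Retention(RetentionPolicy.CLASS) @interface CheckForNull {}

    // @CheckForNull tells the checker: callers must handle a null return.
    @CheckForNull
    static String findOwner(@Nonnull String accountId) {
        return accountId.startsWith("ACCT-") ? "owner-of-" + accountId : null;
    }

    static String describe(String accountId) {
        String owner = findOwner(accountId);
        // This null check is exactly what the tool can verify is present.
        return owner != null ? owner : "unknown";
    }
}
```

The annotations cost almost nothing to write, but they let the checker flag any caller of findOwner that dereferences the result without a null check.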

But is it worth it?

It takes a lot of work to get developers to use static analysis tools and fix the bugs that the tools find. Getting developers to take extra time to annotate code or to understand and write custom code checkers is much more difficult, especially with the state of this technology today. It demands a high level of commitment and discipline and strong technical skills, and I am not convinced that the returns will justify the investment.

We’re back to the zero-sum problem. Yes, you will probably catch some more problems with customized static analysis tools, and you’ll have less noise to filter through. But you will get a much bigger return out of getting the team to spend that time on code reviews or pairing, or more time on design and prototyping, or writing better tests. Outside of high-integrity environments and specialist work done by consultants, I don’t see these ideas being adopted or making a real impact on software quality or software security.

Monday, April 9, 2012

Frank Kim and I are working on a series of posts where we ask experts on security and software development hard questions about the essential problems of building secure software. The first of these posts is an interview with Jeremiah Grossman, CTO of WhiteHat Security.

Jeremiah takes on some of the biggest and hardest questions: How big is the AppSec problem? The software community is made up of a lot of smart people. Why haven't we been able to solve the problem of writing secure software? And Is the problem solvable?

Wednesday, April 4, 2012

Sometimes a programmer will come to me and explain that they don’t like the design of something and that “we’re gonna need to do a whole bunch of refactoring” to make it right. Uh oh. This doesn’t sound good. And it doesn’t sound like refactoring either….

Refactoring, as originally defined by Martin Fowler and Kent Beck, is

A change made to the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behavior… It is a disciplined way to clean up code that minimizes the chances of introducing bugs.

Refactoring is done to fill in short-cuts, eliminate duplication and dead code, and to make the design and logic clear. To make better and clearer use of the programming language. To take advantage of information that you have now but that the programmer didn’t have then – or that they didn’t take advantage of then. Always to simplify the code and to make it easier to understand. Always to make it easier and safer to change in the future.

Fixing any bugs that you find along the way is not refactoring. Optimization is not refactoring. Tightening up error handling and adding defensive code is not refactoring. Making the code more testable is not refactoring – although this may happen as the result of refactoring. All of these are good things to do. But they aren’t refactoring.

Programmers, especially programmers maintaining code, have always cleaned up code as part of their job. It’s natural and often necessary to get the job done. What Martin Fowler and others did was to formalize the practices of restructuring code, and to document a catalog of common and proven refactoring patterns – the goals and steps.

Refactoring is simple. Protect yourself from making mistakes by first writing tests where you can. Make structural changes to the code in small, independent and safe steps, and test the code after each of these steps to ensure that you haven’t changed the behavior – it still works the same, just looks different. Refactoring patterns and refactoring tools in modern IDEs make refactoring easy, safe and cheap.
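A trivial worked example of this discipline – an Extract Method refactoring done in one safe step. The before and after versions are shown side by side here only so that a test can confirm the behavior is identical; the invoice names are made up for illustration.

```java
// Extract Method, the simplest refactoring: pull the summing logic out of
// the reporting routine into a named method. Same behavior, clearer intent,
// and the extracted piece is reusable and independently testable.
class InvoiceReport {
    // Before: one routine that both filters/sums and formats.
    static String reportBefore(double[] amounts, double minimum) {
        double total = 0;
        for (double a : amounts) {
            if (a >= minimum) total += a;
        }
        return "Total: " + total;
    }

    // After: the summing logic has a name of its own.
    static String reportAfter(double[] amounts, double minimum) {
        return "Total: " + totalAbove(amounts, minimum);
    }

    static double totalAbove(double[] amounts, double minimum) {
        double total = 0;
        for (double a : amounts) {
            if (a >= minimum) total += a;
        }
        return total;
    }
}
```

Running the same tests against both versions and getting the same results is the point: the code still works the same, it just looks different.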

Refactoring isn’t an end in itself

Refactoring is supposed to be a practice that supports making changes to code. You refactor code before making changes, so that you can confirm your understanding of the code and make it easier and safer to put your change in. Regression test your refactoring work. Then make your fix or changes. Test again. And afterwards maybe refactor some more of the code to make the intent of the changes clearer. And test everything again. Refactor, then change. Or change, then refactor.

You don’t decide to refactor, you refactor because you want to do something else, and refactoring helps you do that other thing.

The scope of your refactoring work should be driven by the change or fix that you need to make – what do you need to do to make the change safer and cleaner? In other words: Don’t refactor for the sake of refactoring. Don’t refactor code that you aren’t changing or preparing to change.

Scratch Refactoring to Understand

There’s also Scratch Refactoring from Michael Feathers’ Working Effectively with Legacy Code book; what Martin Fowler calls “Refactoring to Understand”. This is where you take code that you don’t understand (or can’t stand) and clean it up so that you can get a better idea of what is going on before you start to actually work on changing it for real, or to help in debugging it. Rename variables and methods once you figure out what they really mean, delete code that you don’t want to look at (or don’t think works), break complex conditional statements down, break long routines into smaller ones that you can get your head around.

Don't bother reviewing and testing all of these changes. The point is to move fast – this is a quick and dirty prototype to give you a view into the code and how it works. Learn from it and throw it away. Scratch refactoring also lets you test out different refactoring approaches and learn more about refactoring techniques. Michael Feathers recommends that you keep notes during this on anything that wasn’t obvious or that was especially useful, so that you can come back and do a proper job later - in small, disciplined steps, with tests.

What about “Large Scale” Refactoring?

You can get a big return in understandability and maintainability from making simple and obvious refactoring changes: eliminating duplication, changing variable and method names to be more meaningful, extracting methods to make code easier to understand and more reusable, simplifying conditional logic, replacing a magic number with a named constant, moving common code together.

There is a big difference between minor, inline refactoring like this, and more fundamental design restructuring – what Martin Fowler refers to as “Big Refactoring”. Big, expensive changes that carry a lot of technical risk. This isn’t cleaning up code and improving the design while you are working: this is fundamental redesign.

Some people like to call redesign or rewriting or replatforming or reengineering a system “Large Scale Refactoring” because technically you aren’t changing behavior – the business logic and inputs and outputs stay the same, it’s “only” the design and implementation that’s changing. The difference seems to be that you can rewrite code or even an entire system, and as long as you do it in steps, you can still call it “refactoring”, whether you are slowly Strangling a legacy system with new code, or making large-scale changes to the architecture of a system.

“Large Scale Refactoring” changes can be ugly. They can take weeks or months (or years) to complete, requiring changes to many different parts of the code. They need to be broken down and released in multiple steps, requiring temporary scaffolding and detours, especially if you are working in short Agile sprints. This is where practices like Branch by Abstraction come in to play, to help you manage changes inside the code over a long period of time.
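The shape of Branch by Abstraction can be sketched in a few lines: put an interface in front of the code being replaced, keep the old implementation live, and switch callers to the new one incrementally – here via a simple flag in one place. All of the names (PriceSource, PricingService and so on) are illustrative.

```java
// Branch by Abstraction in miniature: an abstraction layer lets old and new
// implementations coexist while the migration is in progress.
interface PriceSource {
    double priceFor(String symbol);
}

class LegacyPriceSource implements PriceSource {
    public double priceFor(String symbol) {
        return 100.0;  // stands in for the old code path
    }
}

class NewPriceSource implements PriceSource {
    public double priceFor(String symbol) {
        return 100.0;  // must match legacy behavior before the cut-over
    }
}

class PricingService {
    private final PriceSource source;

    // One place decides which implementation is live. When the migration is
    // complete, LegacyPriceSource and this flag are deleted.
    PricingService(boolean useNewPath) {
        this.source = useNewPath ? new NewPriceSource() : new LegacyPriceSource();
    }

    double quote(String symbol) {
        return source.priceFor(symbol);
    }
}
```

The scaffolding – the interface and the flag – is temporary by design; the trap described below is what happens when “temporary” becomes permanent.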

In the meantime you have to keep working with the old code and new code together, making the code harder to follow and harder to change, more brittle and buggy - the opposite of what refactoring is supposed to achieve. Sometimes this can go on forever – the transition work never gets completed because most of the benefits are realized early, or because the consultant who came up with the idea left to go on to something else, or the budget got cut, and you’re stuck maintaining a Frankensystem.

This is Refactoring - That Isn't

Mixing this kind of heavy project work up with the discipline of refactoring-as-you-go is wrong. They are fundamentally different kinds of work, with very different costs and risks. It muddies up what people think refactoring is, and how refactoring should be done.

Refactoring can and should be folded in to how you write and maintain code – a part of the everyday discipline of development, like writing tests and reviewing code. It should be done quietly, continuously and implicitly. It becomes part of the cost of doing work, folded in to estimates and risk assessments. Done properly, it doesn’t need to be explained or justified.

Refactoring that takes a few minutes or an hour or two as part of a change is just part of the job. Refactoring that can take several days or longer is not refactoring; it is rewriting or redesigning. If you have to set aside explicit blocks of time (or an entire sprint!) to refactor code, if you have to get permission or make a business case for code cleanup, then you aren’t refactoring – even if you are using refactoring techniques and tools, you’re doing something else.

Some programmers believe it is their right and responsibility to make fundamental and significant changes to code, to reimagine and rewrite it, in the name of refactoring and for the sake of the future and for their craft. Sometimes redesigning and rewriting code is the right thing to do. But be honest and clear. Don’t hide this under the name of refactoring.

About Me

I am an experienced software development manager, project manager and CTO focused on hard problems in software development and maintenance, software quality and security. For the last 15 years I have managed teams building and operating high-performance financial systems.
My special interest is how small teams can be most effective in building real software: high-quality, secure systems at the extreme limits of reliability, performance, and adaptability. Software that has to work, that is built right, and built to last.
I use this blog to explore ideas and problems in software development that are important to me. To reflect and to find new answers.