I originally wrote this for the AYE website in 2007. It’s no longer published there, so I’m posting it here. Despite itching to tweak some words and add a better conclusion, I resisted the temptation to edit it other than formatting it for this blog. It’s as I wrote it in 2007. (Even though this post is 4 years old, I think it’s still relevant…perhaps even more so today, with Agile having crossed the chasm.)

We were in the middle of my Agile Testing class, and the simulation had run for two rounds so far. Some of the participants created “software” on index cards. Others tested it. Still others deployed it. The participants were wholly engaged in their work for the fictitious “Word Count, Inc.” As the facilitator, I was running the simulation in 15-minute rounds followed by 15-minute reflect-and-adjust mini-retrospectives.

After the second round, during the mini-retrospective, I asked, “What do you see happening?”

“The deployment team looked like they were twiddling their thumbs for most of the round,” one participant observed.

Another participant added, “I think that’s because most of the cards are still on the QA table. QA is a bottleneck.”

“No, the problem is that development didn’t deliver anything until the very last minute,” objected one of the QA team members.

“Well that’s because it took us most of the last round to coordinate with the deployment team,” one of the Developers countered.

“Your cards were all mixed up when you delivered them. We sent them back so you could sort them out. That’s hardly a ‘coordination’ problem,” scowled a Deployment team member.

Mixed up source code, software stuck in QA, late deliverables. Sounded like a real world project to me.

I shifted the conversation: “What would you like to change to improve the outcome in the next iteration?”

The answers varied: “Hold more project meetings to coordinate efforts!” “Appoint a project manager to keep everything on track!” “More people in QA!” “Define a source code control process!” The suggestions may all have been different, but there was a general trend: the participants wanted to add control points, process steps, and personnel in an attempt to reduce the chaos.

For the next round, the team adopted new practices: adding a new role of project manager; adding more meetings; and adding a strict change control process. During the next round I observed the team use half their available time standing in a big group discussing how to proceed. It seemed to me that in their attempt to control the chaos, they created a process in which it was almost impossible to get anything done. Once again, they weren’t able to deploy an updated version. And at the end of the round, the project manager quit the role in disgust and went back to “coding” on cards.

The team meant well when they added the role of project manager, and added more meetings, but their strategy backfired.

Most groups that go through the Word Count, Inc. simulation encounter problems similar to the ones that this team encountered. Some react by attempting to introduce the same kinds of controls as this group, with similar results. But some respond differently.

One group responded to the mixed-up-source-code problem by creating a centralized code repository that was visible and shared by all. Instead of creating a change control process to manage the multiple copies of the source code floating around, they posted one copy to be shared by all in a central location: the paper equivalent of source control.

Another group responded to coordination and bottleneck problems by co-locating teams. Instead of holding meetings, they coordinated efforts by working together.

Yet another group established an “automated” regression test suite that the deployment team always ran prior to each deployment. They then posted the test results on a Big Visible Chart so everyone knew the current state of the deployed system.

These steps all had the effect of making the team more Agile by increasing visibility, increasing feedback, improving collaboration, and increasing communication. And the end result for each group was success.

When reflecting-and-adjusting, it’s easy to reach for command-and-control solutions, to add quality gates and checkpoints and formal processes. But the irony is that such process changes often increase the level of chaos rather than reducing it. They introduce delays and bloat the process without solving the core problem.

It happens in the real world too.

One organization struggling with buggy code decided to create a role of Code Czar. Before any code could be checked into the source control system, it had to go through the Code Czar who would walk through the proposed changes with the programmer. The Code Czar role required someone very senior. Someone with tremendous experience with the large, complex code base under development. Someone who was also very, very busy. The result: code checkins were delayed whenever the Code Czar was unavailable. Worse, despite having more experience than anyone else on the team, the Code Czar couldn’t always tell what effect a given set of changes might have. The delays in checkins weren’t worth it; they did not result in an overall improvement in code quality.

By contrast, many teams find that automated unit tests work far better as a code quality feedback mechanism than a designated human code reviewer. Instead of waiting for a very busy person to become available, programmers can find out for themselves in minutes if their latest changes will have undesired side effects.

Even Agile teams that regularly reflect-and-adjust in iteration retrospectives are not immune to the temptation to revert to command-and-control practices. For example, Agile teams struggling to test everything during an iteration sometimes create a formal testing phase outside the iteration. I even heard of one organization that, struggling to complete all the tasks in an iteration, attempted to solve the problem by having their Scrum Master do a Work Breakdown Structure (WBS) and delegate tasks to specific team members. Not surprisingly, both solutions caused more problems than they solved.

So how can you tell if a given process change will actually be an improvement and make a team more Agile? Before implementing a process change, consider how (or if) the proposed change supports Agile values like visibility, feedback, communication, collaboration, efficiency, and rapid and frequent deliveries. Also ask yourself these questions:

Does the process change rely on humans achieving perfection? To succeed in the role, the Code Czar would have had to have perfect knowledge of all the interdependencies in the code. Similarly, some processes rely on having perfect requirements up front. Successful practices don’t rely on perfect knowledge or perfect work products. Instead, they rely on fast feedback and visibility to enable the team to detect problems early, correct them while they’re small, and enable the team to improve iteratively.

Does it result in more time talking than working? Beware any process improvement that involves more meetings. More meetings rarely solve either communication or coordination problems. As the project manager in the simulation discovered, talking about work doesn’t increase the amount of work actually accomplished. As an alternative to meetings, consider collaborative working sessions where team members do the work rather than talking about it.

Does it introduce unnecessary delays or false dependencies? Whenever a process change increases the number of formal hand-offs, it slows things down but may not improve the overall outcome. The Code Czar learned this the hard way.

A long time ago, all the way back in 1999, I wrote an article on selecting GUI test automation tools. Someone recently found it and wrote me an email to ask about getting help with evaluating tools. I decided my response might be useful for other people trying to choose tools, so I turned it into a blog post.

By the way, so much has changed since my article on GUI testing tools was published back in 1999 that my approach is a little different these days. There are so many options available now that weren’t around 12 years ago, and new options appear nearly every day, it seems.

Back in 1999 I advocated a heavy-weight evaluation process. I helped companies evaluate commercial tools, and at the time it made sense to spend lots of time and money on the evaluation process. The cost of making a mistake in tool selection was too high.

After all, once we chose a tool we would have to pay for it, and that licensing fee became a sunk cost. Further, the cost of switching between tools was exorbitant. Tests were tool-specific and could not move from one tool to another. Thus we’d have to throw away anything we created in Tool A if we later decided to adopt Tool B. And any new tool would cost even more money in licensing fees. So spending a month evaluating tools before making a 6-figure investment made sense.

But now the market has changed. Open source tools are surpassing commercial tools, so the license fee is less of an issue. There are still commercial tools, but I always recommend looking at the open source tools first to see if there’s anything that fits before diving into commercial tool evaluations.

So here’s my quick and dirty guide to test tool selection.

If you want a tool to do functional test automation (as opposed to unit testing), you will probably need both a framework and a driver.

The framework is responsible for defining the format of the tests, making the connection between the tests and test automation code, executing the tests, and reporting results.

The driver is responsible for manipulating the interface.

So, for example, on my side project entaggle.com, I use Cucumber (framework) with Capybara (driver).
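To make the split concrete, here’s a small sample feature in Cucumber’s Gherkin format. The scenario and names are invented for illustration; this is not taken from entaggle.com’s actual tests. The framework owns this file, its mapping to automation code, execution, and reporting; the driver only shows up later, inside the step definitions:

```gherkin
# Hypothetical example, not entaggle.com's real features.
Feature: Tagging
  As a registered user
  I want to tag a colleague with an endorsement
  So that their expertise is visible to others

  Scenario: Tag a colleague
    Given I am signed in as "pat"
    When I tag "alex" with "Exploratory Testing"
    Then "alex" should appear in the list of people tagged "Exploratory Testing"
```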

To decide which combination of framework(s) and driver(s) is right for your context…

Step 1. Identify possible frameworks…

Consideration #1: Test Format

The first thing to consider is whether you need a framework that supports expressing tests in a natural language (e.g. English), or in code.

This is a question for the whole team, not just the testers or programmers. Everyone on the project must be able to at least read the functional tests. Done well, the tests can become executable requirements. So the functional testing framework needs to support test formats that work for collaboration across the whole team.

Instead of assuming what the various stakeholders want to see, ask them.

In particular, if you are contemplating expressing tests in code, make very sure to ask the business stakeholders how they feel about that. And I don’t mean ask them like, “Hey, you don’t mind the occasional semi-colon, right? It’s no big deal, right? I mean, you’re SMART ENOUGH to read CODE, right?” That kind of questioning backs the business stakeholders into a corner. They might say, “OK,” but it’s only because they’ve been bullied.

I mean mock up some samples and ask like this: “Hey, here’s an example of some tests for our system written in a framework we’re considering using. Can you read this? What do you think it’s testing?” If they are comfortable with the tests, the format is probably going to work. If not, consider other frameworks.

Note that the reason that it’s useful to express expectations in English isn’t to dumb down the tests. This isn’t about making it possible for non-technical people to do all the automation.

Even with frameworks that express tests in natural language, there is still programming involved. Test automation is still inherently about programming.

But by separating the essence of the tests from the test support code, we’re able to separate the concerns in a way that makes it easier to collaborate on the tests, and further the tests become more maintainable and reusable.

When I explain all that, people sometimes ask me, “OK, that’s fine, but what’s the EASIEST test automation tool to learn?” Usually they’re thinking that “easy” is synonymous with “record and playback.”

Such easy paths may look inviting, but they’re a trap that leads into a deep, dark swamp from which there may be no escape. None of the tools I’ve talked about do record and playback. Yes, there is a Selenium recorder. I do not recommend using it except as a way to learn.

So natural language tests facilitate collaboration. But I’ve seen organizations write acceptance tests in Java with JUnit using Selenium as the driver and still get a high degree of collaboration. The important thing is the collaboration, not the test format.

In fact, there are advantages to expressing tests in code.

Using the same unit testing framework for the functional tests and the code-facing tests removes one layer of abstraction. That can reduce the complexity of the tests and make it easier for the technical folks to create and update the tests.

But the times I have seen this work well are when the business people were all technology savvy, so they were able to read the tests just fine even when they were expressed in Java rather than English.

Consideration #2: Programming Language

The next consideration is the production code language.

| If your production code is written in… | …and you want to express expectations in natural language, consider… | …or you want to express expectations in code, consider… |
| --- | --- | --- |
| Java | Robot Framework, JBehave, FitNesse, Concordion | JUnit, TestNG |
| Ruby | Cucumber | Test::Unit, RSpec |
| .NET | SpecFlow | NUnit |

By the way, the tools I’ve mentioned so far are not even remotely close to a comprehensive list. There are lots more tools listed on the AA-FTT spreadsheet. (The AA-FTT is the Agile Alliance Functional Testing Tools group. It’s a program of the Agile Alliance. The spreadsheet came out of work that the AA-FTT community did. If you need help interpreting the spreadsheet, you can ask questions about it on the AA-FTT mail list.)

So, why consider the language that the production code is written in? I advocate choosing a tool that will allow you to write the test automation code in the same language (or at least one of the same languages if there are several) as the production code for a number of reasons:

The programmers will already know the language. This is a huge boon for getting the programmers to collaborate on functional test automation.

It’s probably a real programming language with a real IDE that supports automated refactoring and other kinds of good programming groovy-ness. It’s critical to treat test automation code with the same level of care as production code. Test automation code should be well factored to increase maintainability, remove duplication, and exhibit SOLID principles.

It increases the probability that you’ll be able to bypass the GUI for setting up conditions and data. You may even be able to leverage test helper code from the unit tests. For example, on entaggle.com, I have some data generation code that is shared between the unit tests and the acceptance tests. Such reuse drastically cuts down on the cost of creating and maintaining automated tests.

Consideration #3: The Ecosystem

Finally, as you are considering frameworks, consider also the ecosystem in which that framework will live. I personally dismiss any test framework that does not play nicely with both the source control system and the automated build process or continuous integration server. That means at a bare minimum:

All assets must be flat files, no binaries. So no assets stored in databases, and no XLS spreadsheets (though comma separated values or .CSV files can be OK). In short, if you can’t read all the assets in a plain old text editor like Notepad, you’re going to run into problems with versioning.

It can execute from a command line and return an exit code of 0 if everything passes or some other number if there’s a failure. (You may need more than this to kick off the tests from the automated build and report results, but the exit code criteria is absolutely critical.)

Step 2. Choose your driver(s)…

A driver is just a library that knows how to manipulate the interface you’re testing against. You may actually need more than one driver depending on the interfaces in the system you’re testing. You might need one driver to handle web stuff while another driver can manipulate Windows apps.

Note that the awesome thing about the way test tools work these days is that you can use multiple drivers with any given functional testing framework. In fact, you can use multiple drivers all in a single test. Or you can have a test that executes against multiple interfaces. Not a copy of the test, but actually the same test. By separating concerns, separating the framework from the driver, we make it possible for tests to be completely driver agnostic.

Choosing drivers is often a matter of just finding the most popular driver for your particular technical context. It’s hard for me to offer advice on which drivers are good because there are so many more drivers available than I know about. Most of the work I do these days is web-based. So I use Selenium / WebDriver.

To find a specific driver for a specific kind of interface, look at the tools spreadsheet or ask on the AA-FTT mail list.

Step 3. Experiment

Don’t worry about choosing The One Right tool. Choose something that fits your basic criteria and see how it works in practice. These days it’s far less costly to try a tool on real work and see how it goes than to conduct an extensive tool evaluation.

How can this possibly be? First, lots of organizations are figuring out that the licensing costs are no longer an issue. Open source tools rule. Better yet, if you go with a tool that lets you express tests in natural language it’s really not that hard to convert tests from one framework to another. I converted a small set of Robot Framework tests to Cucumber and it took me almost no time to convert the tests themselves. The formats were remarkably similar. The test automation code took a little longer, but there was less of it.

Given that the cost of making a mistake on tool choice is so low, I recommend experimenting freely. Try a tool for a couple weeks on real tests for your real project. If it works well for the team, awesome. If not, try a different one.

But whatever you do, don’t spend a month (or more) in meetings speculating about what tools will work. Just pick something to start with so you can try and see right away. (As you all know, empirical evidence trumps speculation. :-))

Eventually, if you are in a larger organization, you might find that a proliferation of testing frameworks becomes a problem. It may be necessary to reduce the number of technologies that have to be supported and make reporting consistent across teams.

But beware premature standardization. Back in 1999, choosing a single tool gave large organizations an economy of scale. They could negotiate better deals on licenses and run everyone through the same training classes. Such economies of scale are evaporating in the open source world where license deals are irrelevant and training is much more likely to be informal and community-based.

So even in a large organization I advocate experimenting extensively before standardizing.

Also, it’s worth noting that while I can see a need to standardize on a testing framework, I see much less need to standardize on drivers. So be careful about what aspects of the test automation ecosystem you standardize on.

Let’s start at the beginning. Somebody, somewhere, needs some software.

Maybe we’re serving an internal “customer” who needs a simple baling-wire-and-duct-tape app to connect system A with completely unrelated (except that they need to be able to share data) system B. Or maybe we’re in a startup that’s trying to Change the World with a grand vision, or perhaps a modest vision, to give people software that makes their lives better.

Either way, we build software because there are people who need it. Let’s call what users need the Actual Need. We want to serve that Actual Need by building a truly kick butt solution.

On Agile teams we use user stories in an attempt to capture actual needs. For example:

As a Banking Customer I want to use my ATM card to withdraw money from an automated banking machine while I’m in Kiev so that I can buy a cup of fabulous local coffee with the local currency, Hryvnia.

This is way better than “The system shall accept a rectangular piece of plastic made to conform with standard …” It humanizes the problem space and puts the user at the forefront.

But user stories aren’t typically written by real users. They’re written by surrogates: Business Analysts or Product Managers or the like. These internal people go out and find the Actual Need and then set Intentions for what we need to build to meet those needs.

And then the Software Development team brings the intentions to life with the Implementation.

That’s software development in a nutshell from gathering requirements through deployment. It’s all about finding the happy place where we are addressing real needs with awesome solutions.

So here’s the big question:

How do we know that our Intentions matched the Actual Need, that the Implementation matched our Intentions, and ultimately that the Implementation matched the Actual Need?

If software development projects exhibited mathematical properties then we could count on the relationship between each of these three things being symmetrical. That is, if A = B, and B = C, then mathematically speaking, A must also equal C.

But that doesn’t work with software.

We can set Intentions that, if implemented, would match the Actual Need. And we can communicate those Intentions effectively so that we end up with an Implementation that does what we intended it to do. But that does not mean that users won’t experience any problems with the delivered solution.

As an aside, this is fundamentally why waterfall does not work. Even if we could build the perfect requirements document that perfectly captured the actual needs of the business or our user base, there is no way to ensure that the resulting implementation will be a success. And by the time we release it’s way too late.

Back to my assertion that alignment is not symmetrical in software systems. Consider my story of trying to get Hryvnias in Kiev.

So there I was in Kiev with only a few Hryvnia in my pocket. I needed cash. So I took my trusty ATM card to an AutoBank machine. Actually, I walked by any number of AutoBank machines looking for one that had the right symbols on the front so I could be sure my card would work, and that was built into the wall of a bank so that it wasn’t a fake ATM machine designed to skim info. Yes, I can be paranoid sometimes. So anyway, I think I marched Sarah halfway across the city before I found one I would stick my card into.

Having finally found what I deemed to be a trustworthy ATM, I put in my card and entered my PIN. And then I got a scary-looking message: “You entered your PIN number incorrectly 3 times.” And the machine ate my card.

Here we see an example of how alignment is not symmetric. The Implementation of the ATM software no doubt matched the Intentions of those who built it. ATM machines are a mature and robust technology after all. And the Intentions addressed the Actual Need. Again, ATM machines are a known quantity at this point. But the Implementation spat in the face of my Actual Need. I was thwarted. Not only was my need for cash not met, but now my card was gone. And I had not entered my PIN incorrectly 3 times; I had entered it just once. (My guess is that the real issue had to do with whether or not the ATM machine supported my card and not with the actual PIN.)

But I digress. Back to the point.

We have Actual Need, Intentions, and Implementation. How do we know that all three of these things are in alignment?

The product owner can speculate that the Intentions accurately describe the Actual Need. If the product owner is hands off we often see development teams unilaterally asserting that the Implementation matches the Intentions. Worse, some product owners remain hands off because they want plausible deniability when things go wrong. That’s just…ew. Throughout all this, the business stakeholders can assume that by doing what we set out to do, the Implementation will meet the Actual Need and we’ll all be rich.

And we will be fooling ourselves. Such guesses and speculation allow us to become wrapped up in the illusion of progress.

If we want to be sure our Implementation matches our Intentions, we have to state those Intentions concretely, with examples and explicit expectations. As long as we’re doing that we might as well go whole hog and do ATDD. It’s the best way I know to drive out ambiguity, clarify assumptions, and provide an automated safety net to alert us any time the Implementation strays from the Intentions. But automated checks aren’t enough. We also have to explore to discover risks and vulnerabilities that would jeopardize the spirit of the Intentions even if not the letter.

All of these activities are aspects of testing. And while testers are still important, not everything that involves some aspect of testing should be done by people with QA or Test in their title.

Too often software teams take a narrow view of “testing.” They think (to paraphrase Stuart Taylor) that it’s about checking or breaking. They relegate it to the people with “QA” or “Test” in their title. Typically we only test whether the Implementation meets the Intentions. We speculate about the rest.

And then we’re surprised by failures in production, angry customers, and declining revenue.

The harsh fact is that empirical evidence trumps speculation. Every. Single. Time. And testing, real testing, is all about getting that empirical evidence. It’s not something testers can do alone. There are too many kinds of testing involved in ensuring that all three things are in alignment: Actual Need, Intentions, and Implementation.

I talk to a lot of people in organizations that use some flavor of Agile. Almost all of them, even the teams that are succeeding wildly with Agile, struggle with testing. It’s easy to say that we test throughout the cycle. It’s harder to do it.

Some teams are really struggling with testing, and it’s affecting their ability to get stories all the way done in a timely manner and with a high enough level of quality. I hear comments like these frequently:

“We’re implementing stories up to the last minute, so we can never finish testing within the sprint.”

“We are trying to automate the acceptance tests, but we’re about to give up. The tests get out of sync with the code too easily. Nine times out of ten, when the build is ‘red’ it’s because the tests are wrong.”

“I’m afraid we’re missing bugs because we never have time to explore the system. But we’re too busy running the regression tests to take time out to explore.”

Using a variation on the 5 Whys, I dig into the issue with these folks. What I’ve found is that there is one common root cause at the heart of all these challenges:

There is an (unfortunate) belief that testers test, programmers code, and the separation of the two disciplines is important.

In some cases, people within the organization hold this belief explicitly. They subscribe to the notion that the only valid testing is that which is done by an independent tester. Just in case you happen to be among that group, let me dispel the programmers-can’t-test myth right now.

Programmers most certainly can test. Anyone who can wrap their heads around closures and patterns and good design is perfectly capable of wrapping their heads around risks and corner cases and test heuristics. For that matter, some of the best programmers I’ve worked with also turned out to be some of the best testers.

Perhaps your objection is a little different: “Sure, programmers can test,” you say. “But they can’t be objective about their own code. They could test someone else’s but not their own.”

Well, yes. Blind spots tend to perpetuate themselves.

However, as both a tester and a programmer I can tell you that at least for me, time pressure is much more of an issue than inherent subjectivity.

When I feel time pressure, I rush. When I rush, I forget stuff. Later when I find bugs in production, it’s in the areas that I forgot about, in the places where I rushed. Just testing someone else’s code won’t address the problem that time pressure leads to rushing.

However, pairing can address both problems: subjectivity and rushing the job. Pairing with someone else while testing—say, for example, having a programmer pair with a tester—can both ensure we’re testing from multiple perspectives and also that we’re not unduly rushing through while failing to notice that the installer just erased the hard drive.

In other cases, however, the people I am talking to already buy into the idea that programmers can test.

“We don’t suffer from the belief that testers and programmers should be kept separate,” they object. “We believe programmers should test! And our programmers do test! But we still struggle with finishing the regression testing during a sprint.”

“If everyone on the team believes in programmers testing, why aren’t the programmers pitching in to run the manual regression tests?” I counter.

“Because they don’t have time…”

“…because they’re too busy writing new code that the testers won’t have time to test?”

“Um, yeah…”

“Right. You’re telling me testers test and programmers code.”

“Oh.”

So, back to our original problem: the team is struggling to complete testing within a sprint.

Throwing more testing bodies at the problem will not solve the issue. It will result in spending time to bring the new testers up to speed and to filter through large swaths of feedback that doesn’t actually help move the project forward.

Throwing a separate team of test automators at the problem might work as a temporary band-aid but it will end up being very inefficient and expensive in the long run. The separate team of test automators won’t be able to change the source code to improve testability so they will spend more time fighting the code than testing it.

The long term sustainable solution is both simple and brutally difficult: recognize that testing and quality are the responsibility of the whole team, not any given individual.

This is so much easier said than done. Sure, we can say “everyone is responsible for testing and quality.” But when it’s the end of the sprint and the product owner is pushing for more features, it takes an enormous amount of strength and courage to say, “We have undone testing tasks stacking up. Coding more features will not help. We need to focus on testing what we already have.”

For that matter, spending programmer time on making automated tests execute faster and more reliably might seem like pure indulgence in the face of project deadlines.

And internal process metrics that measure programmers and testers separately just exacerbate the problem. Any time programmers are measured on lines of code, checkins, or implemented story points, while testers are measured on defect counts and test cases executed, we’re going to have problems getting team members to see testing as a whole team responsibility.

But when we can get the team to see testing as part of developing a working solution, wonderful things can happen.

Our inventory of coded-but-not-tested stories dissipates as stories no longer languish in the “To Be Tested” column on the task board. We no longer have to deal with the carrying cost of stories that might or might not work as we intended.

Programmers executing manual regression tests are in a better position to see both opportunities to automate, and also opportunities to pare down duplication.

Testers and programmers can collaborate on creating test automation. The result will be significantly better automation than either testers or programmers would have written on their own, created much more efficiently.

As the level of regression test automation increases, testers have more time to do the higher value activity of exploratory testing.

Testing is an activity. Testers happen to be really good at it. We need testers on Agile teams. But if we want real agility, we need to see that completing testing as part of the sprint is the responsibility of the whole team, not just the testers.

And that means we have to do away with the barriers—whether beliefs or metrics or external pressure—that reinforce the “testers test, programmers code” divide.

I’m giving a session at Agile2011 in Salt Lake City at 9AM Wednesday on Exploratory Testing in an Agile Context. The session itself will be entirely hands-on: we will explore a hand-held electronic game that I brought while discussing how ET and Agile fit together hand-in-glove. However, I did produce materials for the session: a PDF that’s almost a booklet. Thought you all might like to see it.

A few days ago, I tweeted that I was looking for nominations for events for an Agile timeline and am extremely grateful for all the responses I received.

The request was for the keynote talk that I just presented at PNSQC. I’ve had several requests for the timeline that resulted, so I figure the easiest (and therefore fastest) way to share the resulting timeline would be to share my slides.