Author: Alexander Tarlinder

As developers take increasingly larger explicit responsibility for quality, many in the testing community have been wrestling with what the tester role should look like. With very few exceptions, it’s generally agreed upon that the tester who only executes scripted tests on a finished product is a threatened species. What many teams need is someone who is a spokesperson for quality from a broader perspective. Enter the tester 2.0—the Quality Champion.

Given the average agile team doing its devops, the Quality Champion must be a jack of many trades. I expect such a person to have knowledge in three major areas: testing, development (including devops), and product (ownership). And since every Quality Champion will be coming from a different background, we must forget about T-shapes and π-shapes. Now we’re talking amoeba shape!

The Quality Champion competency amoeba.

Please keep in mind that it’s really hard to be strong in all aspects of quality-related work, hence the amoeba shape. The point is rather that there’s a lot of work for a Quality Champion, and that you can shoulder the role coming from different backgrounds. Here’s a list of some competencies that I think are vital for a Quality Champion. I don’t expect this list to be complete, but it should provide a solid starting point.

Testing

Quality Champions must have testing skills. They should be the team’s go-to people when it comes to selecting testing techniques, deciding on a test strategy, and handling defects (ideally, all of them should be fixed immediately during the ongoing iteration, but what about the ones that won’t be, or that need more investigation or external decisions?).

Quality Champions should know what tools are out there, especially when it comes to more specific uses. Obviously, the developers will choose the unit testing and mocking frameworks (though nothing stops the Quality Champion from having knowledge and opinions here), but what about load testing tools? Which vendor do you pay if you want mobile testing in the cloud? Should the team explore testing based on image recognition, or model-based testing?

While test data management is a programmatic activity in teams with heavy automation, how data is selected, and what data is relevant, is something the Quality Champion should be concerned with.

Speaking of automation, it’s not bad if the Quality Champion has an understanding of what kind of automation strategy the team should aim for.

Development

The Quality Champion can have a deep expertise in the tools and frameworks the developers are using: What are their capabilities, limitations, and what will the quality of the developers’ automated checking be?

On the must side is the ability and willingness to coach developers in testing techniques: “How many unit tests does this state diagram translate into? Did you remember this boundary value and this heuristic? This business rule may require two bat wings, one snake tail, and a pinch of generative testing.”
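This kind of coaching translates directly into code. A minimal sketch in Python of boundary-value unit tests (the function `classify_age` and its boundaries are hypothetical examples, not taken from any real training material):

```python
import unittest

# Hypothetical function under test: classifies an age into fare
# categories. The boundaries 0, 18, and 65 are the interesting values.
def classify_age(age):
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 18:
        return "child"
    if age < 65:
        return "adult"
    return "senior"

class BoundaryValueTests(unittest.TestCase):
    def test_boundaries(self):
        # Each boundary is probed on both sides, plus the value itself.
        cases = [(0, "child"), (17, "child"), (18, "adult"),
                 (64, "adult"), (65, "senior")]
        for age, expected in cases:
            self.assertEqual(classify_age(age), expected)

    def test_below_lower_boundary(self):
        with self.assertRaises(ValueError):
            classify_age(-1)
```

The point is that every boundary is probed on both sides, which is exactly the kind of thinking a Quality Champion can coach developers in.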

Where we are right now as an industry—many are running containerized microservices—requires the Quality Champion to understand, and fight for, lightweight virtualization and infrastructure as code as means to achieve testability. Now that everybody can run the entire system on their machine, with whatever data they like, what expectations on testing and quality do we have? If the team isn’t there yet, what is the first step?

Product

It’s not uncommon for the Quality Champion to be the team member who has the most time to spend with the product owner and other stakeholders. This results in deep understanding of the domain and its business rules. Domain knowledge is therefore a reasonable expectation.

A team’s planning meeting, be it called Sprint planning or something else, is where the quality work begins. It’s imperative that the team thinks about acceptance criteria, test data, test infrastructure, and the overall approach to testing in the upcoming iteration in this very meeting. If the team lacks the habit to tackle these issues, it’s a must that the Quality Champion steps up here. On a related topic, it’s not unreasonable to expect that the Quality Champion is the one who pushes practices such as BDD or ATDD in the team.

Needless to say, the Quality Champion has the most experience in exploratory testing, but may also be quite familiar with usability testing, and general strategies for developing and validating a product.

* * *

This is a mouthful, but building potentially shippable product increments iteratively and consistently is a non-trivial task, and so is ensuring that quality remains a priority throughout the process.

In testing parlance, there’s a term called “bug advocacy.” The Quality Champion should strive for quality advocacy.

By the way, did you notice that I capitalized the Qs and Cs in “quality champion” to get QC, the name of a beloved tool? I couldn’t resist 🙂

In a developer testing training, I turn the participants’ first encounter with mock objects and stubs into an exercise in developing hand-written test doubles. Then, if time permits, we reiterate using a mocking framework. In this post, I’ll explain why.

Terminology plays a central role in developer testing. By being precise in our wording and labeling, we become accurate when it comes to selecting a technique. Think of it like:

Dragon – Bring a shield and sword

Vampire – Bring garlic, holy water, and a wooden stake

Werewolf – Bring silver projectiles

If we know exactly what something is, the chances are greater that we bring the right gear. Having browsed through numerous codebases, I can say that few concepts are as abused as the “mock”. Here are some of the interpretations that I can recall having seen throughout the years:

Stub

Organizer class/entity

System that isn’t quite ready for production

Database with test data

Stub or fake that replaces a system or component

To explain the purpose of the mock object, I get the participants to implement it roughly like this:
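The exercise could look roughly like the following Python sketch (the names and the MessageSender collaborator are hypothetical; trainings in other languages would look similar):

```python
# A hand-written mock for a hypothetical MessageSender collaborator.
# Unlike a stub, it can fail the test: it records interactions and
# verifies them with (admittedly clunky) assertions.
class MessageSenderMock:
    def __init__(self):
        self.sent = []

    def send(self, recipient, body):
        self.sent.append((recipient, body))

    def verify_sent_once_to(self, recipient):
        assert len(self.sent) == 1, "expected exactly one message"
        assert self.sent[0][0] == recipient, "wrong recipient"

# A stub for comparison: it only supplies indirect input and can
# never fail a test on its own.
class FixedRateStub:
    def current_rate(self):
        return 0.25
```

A test would exercise the production code with the mock injected and finish by calling `verify_sent_once_to`.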

Endless variations are possible, of course, but the central point remains quite clear: a mock object can fail a test, whereas a stub can’t. Yes, the assertions in the class are clunky, but they get the message across. Apart from explaining what a mock object is actually supposed to do, this approach demonstrates that there’s no magic involved (although the things some frameworks need to do to get mock objects working may seem like magic).

Still, why go through all this hassle to defend a word? There are two reasons. The first is general clarity and simplicity. You’ll be reading your tests differently knowing that something called “stub” will only provide indirect input, and something called “mock” will be performing verifications to check that a certain interaction has happened in a certain manner.
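As an illustration of how naming shapes reading, here is a sketch using Python’s unittest.mock: a double that only provides indirect input, a stub by any reasonable definition, must still be created through the Mock class.

```python
from unittest.mock import Mock

# This double only supplies indirect input, so it is a stub,
# yet the framework makes us instantiate it as a Mock.
rate_service = Mock()
rate_service.current_rate.return_value = 0.25

# Production code would consume the stubbed value:
price = 100 * (1 + rate_service.current_rate())
```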

Mislabeling like this creates confusion, which you don’t want in your code. (Note, by the way, that this is one of those cases where the framework makes you call a stub a mock.)

The second, and more important, reason is that we want to avoid the domain of interaction testing as much as possible. While we certainly can design code and tests well, verifications still require knowledge of how two program elements interact with each other, and they encode that knowledge in the test. This tends to make tests brittle.
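A sketch of why, again using Python’s unittest.mock (OrderService and its repository are hypothetical names):

```python
from unittest.mock import Mock

# Hypothetical production code.
class OrderService:
    def __init__(self, repository):
        self._repository = repository

    def place(self, order):
        self._repository.save(order)

def test_placing_an_order_saves_it():
    repository = Mock()
    service = OrderService(repository)
    service.place("order-1")
    # This verification encodes the exact interaction. Refactor place()
    # to, say, batch its saves, and the test breaks even though the
    # observable behavior is still correct.
    repository.save.assert_called_once_with("order-1")
```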

In summation, by having my trainees implement mocks and stubs “by hand” once, I hope to:

Emphasize the distinction between a stub and a mock (and a spy for advanced participants)

Show that mocks are not magic

Have people write clean tests where the purpose of the test double is clear

Reduce the number of tests that revolve around unnecessary interactions

A few days ago, I listened to Gojko Adzic’s talk “Humans vs Computers: five key challenges for software quality tomorrow” at Jfokus. It was a great talk and it really gave some food for thought. This summary will not do it justice, but basically the plot was that our software is now being used by other software, and there’s AI and voice recognition, and a mix of all this will (and already does) cause new kinds of trouble. Not only must we be prepared for the fact that the “user” is no longer human; we must also take into account new edge cases, such as twins for facial and voice recognition, and the fact that our software may stop working because someone else’s does. All in all, the risk is rightfully shifted towards integration, and to handle it, we need to turn to monitoring of unexpected behavior. This made Mr. Adzic propose that we do something about the test automation pyramid. Turn it upside down, maybe?

Personally, I vote for the test automation monolith :), or rectangle. I’ll tell you why. First, I have to admit that this talk made some pieces fall into place for me. My ambition with regard to developer testing is to raise the bar in the industry. I don’t want us to wonder about how many unit tests we need to write or how we should name them. Mocks and stubs should be used appropriately, and testability should be in every developer’s autonomic nervous system. But why? And here’s the eye-opener: Because we’ll need to be solving harder problems in a few years (if not today already). Instead of, or more likely in addition to, finding simple boundary values to avoid off-by-one errors, we’ll also need to handle the twins using voice authorization to log in to our software. Needless to say, we shouldn’t spend too much of our mental juice writing simple unit tests and the like.

That being said, we can’t abandon the bottom layer of the pyramid. Imagine handling strange AI-induced edge cases in a codebase that isn’t properly crafted for testability and tested to some degree. It would probably be the equivalent of adding unit tests to poorly designed code or even worse. Yes, monitoring will probably play a greater part in the software of tomorrow, but isn’t it just another facet of observability?

So, what will probably happen next is that the top of the testing pyramid will grow thicker, maybe like this (couldn’t resist the “AI”):

I’ve been talking about this topic for years, and have been asked questions about it while giving presentations on developer testing. Still, I’ve been unable to articulate fully what I mean by the “Developer Testing Maturity Curve”. Many wanted to know, because it’s tempting to rank your organization once there’s something to rank against.

After organizing my thoughts for a while I’ve finally come up with a model that matches my experience of how developer testing tends to get implemented across different organizations. Just a caveat: I may revise this model if I learn more or have an epiphany, but this is what it looks like today.

Graph Axes

The horizontal axis is technical maturity: the knowledge and understanding of various tools and techniques. I place merely employing a unit testing framework at the “immature” end, since you need relatively little technical skill to author some simple unit tests. Conversely, implementing an infrastructure that enables repeatable, automated end-to-end tests would be on the other side of the scale.

The vertical axis is more elusive. In the image, I call it “organizational maturity”, but in reality, it means several things:

Understanding that developer testing (any automated testing, in fact) must be allowed to take time

Acknowledging testability as a primary quality attribute and designing the systems accordingly

Willingness to refactor legacy code to make it testable

Time and motivation to clean up test data to make it work for you, not against you

Dedicating time and resources to cleaning up any other old sins that prevent you from harnessing the power of developer testing and automated checking, be they related to infrastructure, architecture, or the development process in general

You should get the idea… If you’re in an organization that has internalized the above, you won’t be throwing quality and (developer) testing out the window as soon as there’s the slightest risk of not meeting a deadline.

The Maturity Zones

The hard part of this model was to place the individual practices in the different zones. Fortunately, in my experience, technical and organizational maturities seem to go hand in hand. By that, I mean that I haven’t seen an organization with superior technical maturity that would totally neglect the organizational climate needed to sustain the technical practices, and vice versa. After having made this discovery, placing the individual practices became easier. Next, I’ll describe what they are.

Immature

Unit tests

Having only unit tests is immature in my opinion. True, certain systems can run solely on unit tests, but they are the exception. If you only do unit tests, you probably have no way of dealing with integration and realistic test data. From an organizational point of view, having only unit tests means that developers write them because they must, not because they see any value in them. Had they seen the value, they’d engage in other developer testing activities as well.

Mocking frameworks

This is the only artifact on the border. Let me explain. A mature way of employing a mocking framework assumes that you understand how indirect input and output affects your design and its testability, and then use the framework to produce the correct type of test double. The immature approach is to call all test doubles “mocks” and use the framework because everyone else seems to.

Mature

Specialized frameworks

These are frameworks that help you solve a specific testing problem. They may be employable at unit test level, but they’re most frequent (and arguably useful) for integration tests. Examples? QuickCheck, Selenium WebDriver, RestAssured, Code Contracts. Some of these frameworks may be entire topics and areas of competence in themselves. Therefore, I consider it mature to make use of them.

Integration tests without orchestration

What does “without orchestration” mean? It means that the framework you’re using does everything for you and that you don’t need to write any code to start components or set them to a certain state. I’m thinking about frameworks like WireMock, Dumbster, or Spring’s test facilities.

Using a BDD framework

You can use a BDD/ATDD framework to launch tests, hopefully the complex ones. This requires its own infrastructure and training, so I consider it mature (barely). In this sector, you’re not reaping the full benefits of BDD, just using the tools.

Internalized

Leveraging BDD

In contrast to the previous case, where the BDD framework is just a tool, in this sector the organization understands that BDD is about shared understanding and a common vocabulary. Furthermore, various stakeholders are involved in creating specifications together, and they use concrete examples to do so.

Testability as a primary quality attribute

Testability (controllability, observability, smallness) applies at all levels of system and code design. Organizations that have internalized this practice ensure that all new code and refactored old code take it into account. When it comes to designing code, it’s mostly about ensuring that it’s testable at the unit level and that replacing dependencies with test doubles is easy and natural. At the architectural level it’s about designing the systems so that they can be observed and controlled (and kept small), and that any COTS that enters the organization isn’t just a black box. The same goes for services operated by partners and SaaS solutions.

Integration tests with orchestration

There’s a fine line between such tests and end-to-end tests. The demarcation line, albeit a bit subjective, is the scope of the test. An integration test with orchestration targets two components/services/systems, but it can’t rely solely on a specialized framework to set them up. It needs to do some heavy lifting.
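A sketch of what such heavy lifting can look like, using only Python’s standard library (the customer service scenario is a made-up example): the test itself boots a throwaway HTTP server, points the component at it, and tears it down afterwards.

```python
import http.server
import json
import threading
import urllib.request

# Hypothetical scenario: our component fetches a customer from another
# service over HTTP. A stand-in for that service:
class CustomerHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"id": 42, "name": "Test Customer"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep test output clean

# The component under test.
def fetch_customer(base_url, customer_id):
    with urllib.request.urlopen(f"{base_url}/customers/{customer_id}") as resp:
        return json.loads(resp.read())

def test_fetching_a_customer():
    # The orchestration: start the collaborator, run the test, clean up.
    server = http.server.HTTPServer(("127.0.0.1", 0), CustomerHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        base_url = f"http://127.0.0.1:{server.server_port}"
        customer = fetch_customer(base_url, 42)
        assert customer["name"] == "Test Customer"
    finally:
        server.shutdown()
        server.server_close()
```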

End-to-end tests

Anything goes here! These tests will start up services and servers, populate databases with data and simulate a user’s interaction with this system. Doing this consistently and in a repeatable fashion is a clear indication of internalized developer testing practices.

All test data is controlled

This is true for a majority of systems: The more complex the tests, the more complex the test data. At some point your tests will most likely not be able to rely on specific entries in the database (“the standard customer”) and you’ll need to implement a layer that creates test data with specific properties on the fly before each test. If you can do this for all test data, I’d say you’ve internalized this practice.
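One common shape for such a layer is the test data builder. A Python sketch with a hypothetical Customer entity:

```python
import dataclasses

@dataclasses.dataclass
class Customer:
    name: str
    country: str
    credit_limit: int

# A test data builder: every test states only the properties it cares
# about and gets sensible defaults for the rest, instead of relying on
# "the standard customer" pre-loaded in a shared database.
class CustomerBuilder:
    def __init__(self):
        self._name = "Default Name"
        self._country = "SE"
        self._credit_limit = 1000

    def named(self, name):
        self._name = name
        return self

    def in_country(self, country):
        self._country = country
        return self

    def with_credit_limit(self, limit):
        self._credit_limit = limit
        return self

    def build(self):
        return Customer(self._name, self._country, self._credit_limit)

# Usage in a test that only cares about the credit limit:
blocked = CustomerBuilder().with_credit_limit(0).build()
```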

The entire system can be redeployed in a repeatable manner

If you have your end-to-end tests in place, you most likely have ticked off this practice. If not, there’s some work to do. A repeatable deployment usually requires a bit of provisioning, a pinch of database versioning, and a grain of container/server tinkering. Irrespective of the exact composition of your stack, you want to be able to deploy at will. Why is this important from a developer testing perspective? Because it implies controllability, as all moving parts of the system are understood.

And the point is?

Congratulations on making it this far. What actions can you take now? If you really want to gauge your maturity level, please do so. My advice is that you map the areas in the maturity curve to your organization’s/team’s architecture and practices, and start thinking about where to start digging and in what order.

Recently, I was asked by a friend to name a few reasons to automate tests. In this context unit or integration tests—the kind of tests that should be written as part of the actual implementation—don’t count. Rather, we’re talking about test automation done either after the code’s been written, or in parallel with the implementation work. After having pondered this for a while, I came up with the following reasons.

Please keep in mind that most test automation isn’t really about automating tests, but checks. Testing usually refers to a process more creative than the work of a computer running the same verifications over and over again. This said, automation can be used to test something to a degree. I’ll get to that later.

We automate tests…

…to prevent regressions and instill courage

The more tests we can run frequently and repeatedly, the safer we feel about making changes to the code. An extensive suite of unit tests can make us feel quite safe, but a battery of automated end-to-end tests, performance, and system integration tests lets us make major code changes in complex systems without the fear and anxiety that normally accompanies refactoring of non-trivial code.

…to make any kind of iterative development work

This is an extension of the previous argument. Suppose that our team builds the software in iterations (or sprints, if you will). In iteration one, it’s able to deliver three tested stories: A, B, and C. In iteration two, it manages to deliver two more stories, D, E, while ensuring that A, B, and C still work. Iteration three adds yet another three stories, F, G, and H. Now, had there been no automated tests up until this point, regression testing would take more and more of our team’s time and its velocity would drop iteration by iteration. In fact, a team’s velocity can become more or less negative in cases where a small change results in days or weeks of regression testing, or avoiding the change altogether.

In conclusion, it’s crucial that tests are automated in teams that aim to maintain steady or increasing velocity iteration after iteration. Or we can just write “agile teams”.

…to implement continuous delivery

Taking the argument one step further would be stating that continuous delivery, or continuous deployment, just cannot work without all tests being automated. Would you release your system to production several times a day if only its unit tests were automated?

…to mobilize more brainpower and to engage

Test automation is mostly developer work. It requires non-trivial infrastructure, provisioning, and system architecture. On the other hand, if test automation is left solely to developers, without input from testers, some interesting test cases may be left out because of creator’s bias. Therefore, it’s only natural that testers and developers collaborate around test automation, which results in more brains on the problem and more buy-in from everybody.

…to make the system testable

Just as writing unit tests makes basic program elements testable, creating higher-level tests does the same for larger components. Developers who know that they’ll end up writing automated tests (checks) for their system will obviously design it to accommodate that. A trivial example is putting proper ids in web pages, so that WebDriver testing becomes a breeze, but other examples are easy to find.

…to verify the acceptance criteria of a story

Yes, acceptance test-driven development (ATDD), behavior-driven development (BDD), and specification by example (pick your flavor) are mostly about collaboration and shared understanding. That being said, tests that emerge from the specifications based on concrete examples provide a phenomenal opportunity to formalize the act of meeting a story’s acceptance criteria. If the team decides that a certain story has three major acceptance criteria, one developer may start implementing the tests for them, while another starts implementing the actual functionality. For a while, the tests will be red, but once they turn green, there should be indisputable proof that the story has indeed been implemented according to plan.

…to find defects in permutation-intensive scenarios

Many tests—both unit tests and higher level tests—are just examples that confirm something we know about the code. However, there are types of automated tests that can actually find new defects. True, this only applies during their first execution (after which they become normal regression tests), but still.

What tests am I talking about? Generative tests and tests based on models (MBT), of course. Such tests operate given a set of constraints, rather than exact scenarios, and are able to produce states that our brilliant tester mindset hasn’t been able to foresee.
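A hand-rolled sketch of the idea in Python (a real project would more likely reach for a QuickCheck-style library): inputs are drawn randomly within constraints, and invariants are checked for every generated case.

```python
import random

# Hypothetical function under test.
def clamp(value, low, high):
    return max(low, min(value, high))

# Generative test: instead of exact scenarios, draw random inputs
# within constraints and check invariants that must always hold.
def test_clamp_invariants(runs=1000, seed=7):
    rng = random.Random(seed)  # fixed seed keeps the test repeatable
    for _ in range(runs):
        low = rng.randint(-100, 100)
        high = rng.randint(low, low + 200)
        value = rng.randint(-500, 500)
        result = clamp(value, low, high)
        # Invariant 1: the result is always within the bounds.
        assert low <= result <= high
        # Invariant 2: values already in range pass through unchanged.
        if low <= value <= high:
            assert result == value
```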

…to find defects 🙂

Finally, if you’re a tech-savvy tester, or a developer in test, why even bother with manual tests? To be usable in the long run, they need to go into the regression test suite anyway. Therefore, why not automate them from the start?

The test pyramid is a very frequently quoted model. I believe it originates from Mike Cohn’s book Succeeding with Agile: Software Development Using Scrum. Originally, the test pyramid is drawn with three tiers: UI, Service, and Unit, but google it, and you’ll find many adaptations and refinements. I really like this model, because it illustrates so much about how testing is done on an agile team. In this post I aim to present some ways of reading and interpreting it along some dimensions.

Who does the “testing”

Since unit tests are at the bottom of the pyramid, it should come as no surprise that developers will actually be the ones who create the greatest amount of test code. Unit tests are the best place to employ standard testing techniques like equivalence partitioning, boundary value analysis and various sorts of table-based techniques, which means that there’ll be quite a few of them (there are other reasons as well, of course, like TDD). This doesn’t say anything about the testing process as a whole, but the fact remains: developers will create the most test artifacts and do the most checking.

Ratio

Visually, the model implies that there’s a ratio between the layers, i.e., a relation like 1:x between service-level tests and unit tests, and a relation of 1:y between UI tests and service tests. Personally, I don’t think it’s meaningful to strive for a certain ratio as such. Different systems with different architectures and history will have different ratios. As long as there are more lower-level tests, we should be fine. However, for reasons listed in the following sections, we really want the majority of the tests at the bottom of the pyramid.

Level of abstraction and Language

The higher up in the pyramid, the more domain-related the language, or at least it should be. Good unit tests most likely use domain concepts in their code and read as specifications, but they can get away with compact names related to the solution domain at times. This doesn’t work for higher-level tests, since they often work by orchestrating bits and pieces of quite complex test infrastructure. A typical example is a UI-based test of a specific scenario. The underlying test code will interact with a layer typically called the “flow layer” or “scenario layer,” which in turn will orchestrate Page Objects or the like. So, basically, the test will talk to the test infrastructure using language like “login,” “open this customer,” “buy three drill presses.”
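The layering can be sketched like this in Python (page and flow names are hypothetical, and a plain list stands in for a real WebDriver instance):

```python
# Page objects encapsulate the mechanics of individual pages.
class LoginPage:
    def __init__(self, driver):
        self._driver = driver

    def login(self, user, password):
        self._driver.append(f"login as {user}")

class CustomerPage:
    def __init__(self, driver):
        self._driver = driver

    def open_customer(self, name):
        self._driver.append(f"open customer {name}")

# The flow/scenario layer orchestrates page objects and exposes
# domain language to the test.
class ShopFlow:
    def __init__(self, driver):
        self._driver = driver

    def login(self, user):
        LoginPage(self._driver).login(user, "secret")

    def buy_drill_presses(self, customer, quantity):
        CustomerPage(self._driver).open_customer(customer)
        self._driver.append(f"buy {quantity} drill presses")

# The test itself reads almost like the prose above:
driver = []  # stands in for a WebDriver instance
flow = ShopFlow(driver)
flow.login("alex")
flow.buy_drill_presses("ACME", 3)
```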

Cost

People often mention that the test cost increases as we move up the pyramid’s tiers. For models that put manual testing at the top, this is certainly true. And yes, the top of the pyramid is inhabited by more complex tests. However, it’s not a truism that the cost of such tests should be spiralling.

Tests at the top of the pyramid need more code and more moving parts, so they’ll be more expensive, but good teams will have ways of working and a test infrastructure that make the price of creating yet another higher-level test reasonable.

Tooling

Different layers/tiers require different tools. For example, unit tests will most likely rely on a unit testing framework and some mocking framework. In some specific cases, some kind of special-purpose testing library (used in unit tests) will be required.

Tests that run through the user interface will obviously use libraries for automating interaction with web pages, fat clients, or mobile apps. Tests in the middle tier will probably use the most diverse flora of tooling. They may include lightweight servers, things like Spring Boot, in-memory databases, libraries for managing transactions and test data setup; you get the picture.

Also, BDD frameworks, if used, will most likely be used in the middle or topmost tier (or both), as well as tools for model-based testing.

Execution Time

As we move up towards the top of the pyramid, the tests have a larger footprint: they may require entire servers to be up and running, databases to be repopulated, a series of API calls over a slow network, etc. This naturally will affect their execution time. A corollary of this is that we should strive to push tests as far down the pyramid as we can.

Feedback

Related to the previous point. Tests closer to the top will execute slower and consequently provide delayed feedback. Not only that, but the quality of the information they provide will most likely be lower. A UI-based end-to-end test is usually not the best tool for error localization, since there’s virtually no practical way for it to truly understand the system(s) it tests—not at the level of granularity needed to provide detailed information about what went wrong anyway.

Communication and Stakeholder Involvement

Tests at the top of the pyramid can have a very distinct advantage over unit tests: they may be authored so that they’re interesting to non-technical stakeholders. A good implementation of ATDD, BDD, or specification by example will produce a manageable quantity of tests at a high enough level of abstraction to be interesting to non-technical stakeholders, given that the documentation part is relevant and well written.

Environment Dependence

The test pyramid also tells us something about environment dependence. Unit tests are, by definition, environment independent. Service-level tests will often make some assumptions about the environment: a port must be open, a process can be launched, there’s a disk to write to, etc. Finally, UI tests probably depend on pretty much the entire system to be running. Depending on the architecture and method of deployment, the environment dependence may become absolute. Try a Cobol backend + licensed database with enterprise features…

These are some dimensions I find useful to discuss when putting the test pyramid to work and deciding on a testing strategy. You may use others, so please share.

I’ve always cared about the quality of my code. However, I didn’t always have the necessary tools to achieve it. For the first 6-7 years (of hobby programming in high school and during my university studies) I had to resort to what we call “manual checking” nowadays. Then my professional career began and I got exposed to different types of organizations and their ways of working.

Fast forward 15 years. By now I had worked in, or closely with, roughly 25 teams, and I could see some patterns emerge. Basically, I’ve encountered three ways of working with quality assurance, and I’ve seen traces of a fourth. This is what I’ve found.

Cowboy/chaos teams: I’ve encountered these teams in small organizations or as guerilla teams operating under the corporate radar. Such teams have neither testers nor anything that resembles quality assurance. Their testing amounts to manual checking performed by the developers in conjunction with a release or if one of their rather frequent bugs is fixed. Such teams start out as very fast, but the code they produce crumbles under its own weight after a few months. Fortunately, the teams I’ve observed have worked in areas where bugs weren’t that much of a problem: nobody would get injured and they wouldn’t make the newspapers either. Bugs would “only” make the customers unhappy.

Teams with testers on the team: My experience of such teams ranges from newly formed teams struggling to understand how to work in a cross-functional iterative and incremental manner, to rather well-oiled ones. Unfortunately, the first category has dominated. In organizations that have practiced a strict division of labor, i.e., separate development and QA, forming cross-functional teams isn’t an easy task. The mini waterfall iteration is a common anti-pattern, and it’s a result of old habits: Developers still throw untested code over the wall (although the wall isn’t there anymore), and testers keep compensating for an inferior development process by just checking that the developers haven’t made any major boo-boos. There isn’t any time to do more than that, since all testing is crammed into the last two days of the iteration.

Code quality may also be an issue in teams that are just starting out as agile/cross-functional. Back in the old days, they released once every few months and took the occasional pain of integration hell and manual regression testing. This way of working has left them unprepared for frequent deliveries, which in turn requires skills in unit testing, refactoring, and continuous integration and deployment.

The teams that I’ve seen that have mastered the basics of the above development techniques have been addressing the challenge of making agile testing truly work: make the testers proactive instead of reactive, plan all testing activities properly during iteration planning meetings, pair test/program, automate manual checks, and perfect exploratory testing.

Teams that only rely on developer testing: I’ve also been able to work in teams that relied solely on developer testing. In such teams pretty much all checks were automated; they had thousands of unit tests, and hundreds of integration and end-to-end tests. They were also proficient in refactoring and continuous delivery. My experience is that these teams only lacked good exploratory testing and sometimes specialized types of testing, like usability or security testing. Given that they had a good product owner (and they were able to foster one to some extent too), the worst mistake they would make was to produce something less aesthetically appealing or not 100% user-friendly.

I don’t have enough evidence, but I believe that the difference between chaotic cowboy teams and teams that do developer testing lies in the mission criticality of their product. In my experience, the teams that did developer testing all worked on software that had to be correct. In the absence of a set of QA activities and people to perform them, they self-organized into embracing developer testing.

Organizations with separate QA and development: Despite more than 15 years in the industry, I haven’t seen the true waterfall setup up close. However, I’ve seen traces and residues of it when working in banking and the travel industry. To be fair, we have to acknowledge that banks and flight booking usually work, so obviously separating development and testing can work. Then again, the code in these systems is really hard to change and few people dare to touch it, much less refactor or delete something. This is a result of hand-written test protocols, rather than test code, I believe.

Given this experience, I’m convinced that you should put your money in the developer testing basket. This is why:

Teams that do only developer testing can improve with good testers, but manage without

Teams that aim to become cross-functional and good at agile testing will be helped by developer testing practices, since they’ll make their code better and free up their testers’ time

Chaos teams that do cowboy coding have a fighting chance to improve their code and quality if they engage in developer testing

As always, your mileage may vary, but my unscientific study of 25 teams in banking, the travel industry, gaming, the public sector, and directory services has pointed me in the direction of developer testing. I’d love to hear your stories, examples, and counter examples.