Testers don’t manage the broader network. We provide information for use by other stakeholders in that network. Those stakeholders have the responsibility and the right to determine what information they will find most useful.

Executives are Entitled and Empowered to Choose their Metrics

Several years ago, I had a long talk about metrics with Hung Quoc Nguyen. Hung runs LogiGear, a successful test lab. He was describing to me some of the metrics that his clients expected. I didn’t like some of these metrics and I asked why he was willing to provide them. Hung explained that he’d discussed this with several executives. They understood that the metrics were imperfect. But they felt that they needed ways to summarize what the organization knew about projects. They felt they needed ways to compare progress, costs, priorities, and risks. They felt they needed ways to organize the information so that they could compare several projects or groups at the same time. And they felt they needed to compare what was happening now to what had happened in the past. Hung then made three points:

These are perfectly legitimate management goals.

Quantification (metrics) is probably necessary to achieve these goals.

The fact that there is no collection of metrics that will do this perfectly (or even terribly well) doesn’t eliminate the need. Without a better alternative, managers will do the best they can with what they’ve got.

Hung concluded that his clients were within their rights to ask for this type of information and that he should provide it to them.

If I remember correctly, Hung also gently chided me for being a bit of a perfectionist. It’s easy to refuse to provide something that isn’t perfect. But that’s not helpful when the perfect isn’t available. He also suggested that when it comes to testers or consultants offering a “better alternative”, every executive has both the right and the responsibility to decide which alternative is the better one for her or his situation.

By this point, I had joined Florida Tech and wasn’t consulting to clients who needed metrics, so I had the luxury of letting this discussion settle in my mind for a while before acting on it.

The biggest surprise for me was how poor a set of core business metrics investors have to work with. I’m thinking of the numbers in balance sheets, statements of cash flow, and income statements, and the added details in most quarterly and annual investment reports. These paint an incomplete, often inaccurate picture of the company. The numbers are so subject to manipulation, and present such an incomplete view, that it can be hard to tell whether a company was actually profitable last year or how much its assets are actually worth.

Investors often supplement these numbers with qualitative information about the company (information that may or may not present a more trustworthy picture than the numbers). However, despite the flaws of the metrics, most investors pay careful attention to financial reports.

I suppose I should have expected these problems. My only formal studies of financial metrics (courses on accounting for lawyers and commercial law) encouraged a strong sense of skepticism. And of course, I’ve seen plenty of problems with engineering metrics.

But it was still a surprise that people actually rely on these numbers. People invest enormous amounts of money on the basis of these metrics.

But in the absence of better data, when I make financial decisions (literally, every day), these numbers guide my decisions. It’s not that I like them. It’s that I don’t have better alternatives to them.

Teaching Metrics

I teach software metrics at Florida Tech. These days, I start the course with chapters from Tockey’s Return on Software: Maximizing the Return on Your Software Investment. We study financial statistics and estimate the future cost of a hypothetical project. The students see a fair bit of uncertainty. (They experience a fair bit of uncertainty; it can be a difficult experience.) I do this to help my students gain a broader view of their context.

When an executive asks them for software engineering metrics, they are being asked to provide imperfect metrics to managers who are swimming in a sea of imperfect metrics.

It is important (I think very important) to pay attention to the validity of our metrics. It is important to improve them, to find ways to mitigate the risks of using them, and to advise our clients about the characteristics and risks of the data/statistics we supply to them. I think it’s important to use metrics in ways that don’t abuse people. There are ethical issues here, but I think the blanket condemnation of metrics like pass/fail ratios does not begin to address the ethical issues.

The Principles

In the context-driven principles, we wrote (more precisely, I think, I wrote) “Metrics that are not valid are dangerous.” I still mostly (*) agree with these words but I think it is too easy to extend the statement into a position that is dogmatic and counterproductive. If I were writing the Principles today, I would reword this statement in a way that acknowledges the difficulty of the problem and the importance of the context.

(*) The phrase “Metrics that are not valid” is inaccurately absolute. It is not proper to describe a metric as simply valid or not valid (see Trochim and Shadish, Cook & Campbell, for example). Rather, we should talk about metrics as more valid or less valid (shades of gray). The wording “not valid” was a simplification at the time, and in retrospect, should be seen as an oversimplification.

Contexts differ.

Testers provide information to our clients (stakeholders) about the product, about how we tested it, and about what we found.

Our clients get to decide what information they want. We don’t get to decide that for them.

Testers provide services to software projects. We don’t run the projects. We don’t control those projects’ contexts.

In context-driven testing, we respect the fact that contexts differ.

What Does “contexts differ” Really Mean?

I think it means that in different contexts, the people who are our clients:

are going to want different types of information

are going to want us to prioritize our work differently

are going to want us to test differently, to mitigate different risks, and to report our results in different ways.

Contexts don’t just differ for the testers. They differ for the project managers too. The project managers have to report to other people who want whatever information they want.

We don’t manage the project managers. We don’t decide what information they have to give to the people they report to.

Sometimes, Our Clients Want Metrics

Sometimes, a client will ask how many test cases the testers have run:

I don’t think this is a very useful number. It can be misleading. And if I organize my testing with this number in mind, I might do worse testing.

So if a client asks me for this number, I might have a discussion with her or him about why s/he thinks s/he needs this statistic and whether s/he could be happy with something else.

But if my client says, “No, really, I need that number”, I say OK and give the number.

Sometimes a client will ask about defect removal efficiency:

I think this is a poor excuse for a metric. I have a nice rant about it when I teach my graduate course in software metrics. Bad metric. BAD!

If a client asks for it, I am likely to ask, “Are you sure?” If they’re willing to listen, I explain my concerns.

But defect removal efficiency (DRE) is a fairly popular metric. It’s in lots of textbooks. People talk about it at conferences. So no matter what I say about it, my client might still want that number. Maybe my client’s boss wants it. Maybe my client’s customer wants it. Maybe my client’s regulator wants it. This is my client’s management context. I don’t think I’m entitled to know all the details of my client’s working situation, so maybe my client will explain why s/he needs this number and maybe s/he won’t.

So if the client says, “No, really, I need the DRE”, I accept that statement as a description of my client’s situation and I say OK and give the number.

I used to associate shrill accusations of unethical behavior with conservatives who were losing control of the hearts and minds of the software development community and didn’t like it, or who were pushing a phony image of community consensus as part of their campaigns to get big contracts (especially big government contracts), or who were using the accusation “unethical!” as a way of shutting down discussion of whether an idea was any good.

Maybe you’ve met some of these people. They said things like:

It is unethical to write code if you don’t have formal, written requirements

It is unethical to test a program if you don’t have written specifications

It is unethical to do exploratory testing

It is unethical to manage software projects without formal measurement programs

It is unethical to count lines of code instead of using function points

It is unethical to count function points instead of lines of code

It is unethical to not adopt best practices

It is unethical to write or design code if you don’t have the right degree or certificate

It should be unethical to write code if you don’t have a license

It seemed to me that some of the people (but not all of the people) who said these things were trying to prop up a losing point of view with fear, uncertainty, and doubt. They were using demagoguery as their marketing technique. That I saw as unethical.

Much of my contribution to the social infrastructure of software testing was a conscious rebellion against a closed old boys’ network that defended itself with dogma and attacked non-conformers as unethical.

wrong versus Wrong

So what’s with this “Using a crummy metric is unethical”?

Over the past couple of years, I’ve seen a resurgence of ethics-rhetoric. A new set of people have a new set of bad things to condemn:

Now it seems to be unethical to have a certification in software testing that someone doesn’t like

Now it seems to be unethical to follow a heavyweight (heavily documented, scripted) style of testing

Now it seems to be unethical to give a client some data that the client asks for, like a ratio of passing tests to failing ones.

I don’t think these are usually good ideas. In fact, most of the time, I think they’re wrong.

But UNETHICAL?!?

I’m not a moral relativist. I think there is evil in the world and I sometimes protest loudly against it. But I think it is essential to differentiate between wrong (a bad idea, in my view) and Wrong (unethical).

To the extent that we lose track of the difference between wrong and Wrong, I think we damage our ability to treat people who disagree with us with respect. I think we damage our ability to communicate about our professional differences. I think we damage our ability to learn, because the people we most agree with probably have fewer new things to teach us than the people who see the world a little differently.

The difference between wrong and Wrong is especially important for testers who want to think of ourselves (or market ourselves) as context-driven.

Because we understand that what is wrong in some contexts is right in some others.

A “school” provides an organizing social structure for a body of attitudes and knowledge. Schools are often led by one or a few highly visible people.

Over the past few years, several people have gained visibility in the testing community who express ideas and values that sound context-driven to me. Some call themselves context-driven, some don’t. My impression is that some are being told they are not welcome. Others are uncomfortable with a perceived orthodoxy. They like the approach but not the school. They like the ideas, but not the politics.

The context-driven school appeared for years to operate with unified leadership. This appearance was a strength. But it was never quite true: Brian and Bret left early (but they left quietly). I’ve repeatedly raised concerns about the context-driven rhetoric, but relatively quietly. James and I haven’t collaborated successfully for years–this is old news–but for most of that time, our public disagreements were pretty quiet.

I think it is time to go beyond the past illusion of unity, to welcome a new generation of leadership. Not just a new generation of followers. A new generation of leaders. And to embrace their diversity.

There is not one school. There might be none. There might be several. I’m not sure what our real status is today. There will be an evolution and I look forward to seeing the result.

For now, I continue to be enthusiastic about the approach. I still endorse the principles. But what I understand to be the meanings and implications of the principles might not be exactly the same as what you understand. I think that’s OK.

In terms of the politics of The One School, my perception is of an exclusionary tone that has become more emphatic over time. I think this can make good marketing–entertaining presentations, lots of excitement. But does it serve its community? What is the impact on the people who are actually doing the testing: looking for work; looking for advancement in their own careers; striving to increase their skills and professionalism?

For many people, the impact is minimal–they follow their own way.

But for people who align themselves with the school, I think there are risks.

I wasn’t able to travel to CAST last year (health problem), so I watched sessions on video. Watching remotely let me look at things with a different perspective. One of the striking themes in what I saw was a mistrust of test automation. Hey, I agree that regression test automation is a poor basis for an effective comprehensive testing strategy, but the mistrust went beyond that. Manual (session-based, of course) exploratory testing had become a Best Practice.

In the field of software development, I think that people who don’t know much about how to develop software are on a path to lower pay and less job security. Testing-process consultants can be very successful without staying current in these areas of knowledge and skill. But the people they consult to? Not so much.

It was not the details that concerned me. It was the tone. I felt as though I was watching the closing of minds.

I have been concerned about this ever since people in our community (not just our critics–us!) started drawing an analogy between context-driven testing and religion.

An analogy to religion often carries baggage: Divine sources of knowledge; Knowledge of The Truth; Public disagreement with The Truth is Heresy; An attitude that alternative views are irrelevant; An attitude that alternative views are morally wrong.

This illustrates exactly what troubles me. In my view, there are legitimate differences in the testing community. I think that each of the major factions in the testing community has some very smart people, of high integrity, who are worth paying attention to. I’ve learned a lot from people who would never associate themselves with context-driven testing.

Let me illustrate that with some notes on my last week (Feb 27 to March 2):

My students and I reviewed Raza Abbas Syed’s M.Sc. thesis in Computer Science: Investigating Intermittent Software Failures. The supervisor of this work was Dr. Laurie Williams. If she identified herself with any school of software testing, it would probably be Agile, not Context-Driven. But, not surprisingly, the work presented some useful data and suggested interesting ideas. I learned things from it. Should I really stop paying attention to Laurie Williams?

Yesterday, Dr. Keith Gallagher gave a guest lecture in my programmer-testing course on program slicing (see Gallagher & Lyle, 1991 and Gallagher & Binkley, 1996). This is a cluster of testing/maintenance techniques that haven’t achieved widespread adoption. The tools needed to support that adoption don’t exist yet. Creating them will be very difficult. This is classic Analytical School stuff. But his lecture made me want to learn more about it because it presents a glimpse of an interesting future.

This evening, I’m reading Kasurinen, Taipale & Smolander’s paper, Software Test Automation in Practice: Empirical Observations. One of my students and I will work through it tomorrow. I’m not sure how these folks would classify themselves (or if they would). Probably if they had to self-classify, it would be Analytical School. Comparing myself to a modern Newtonian physicist and them to outdated Aristotelians strikes me as one part arrogant and five parts wrong.

I think it’s a Bad Idea to alienate, ignore, or marginalize people who do hard work on interesting problems.

I respect the right of any individual to seek his or her own level of ignorance.

But I see it as a disservice to the craft when thought-leaders encourage narrow-mindedness in the people who look to them for guidance.

When I was an undergraduate, I studied mainly math and philosophy. Of the philosophy, I studied mainly Indian philosophy, about 5 semesters’ worth. My step-grandmother was a Buddhist. Friends of mine had consistent views. I was motivated to take the ideas seriously.

One of the profound ideas in those courses was a rejection of the law of the excluded middle. According to that law, if A is a proposition, then A must be true or Not-A must be true (but not both). Some of the Indian texts rejected that. They demanded that the reader consider {A and Not-A} and {neither A nor Not-A}. In terms of the logic of mathematics, this makes no sense (and it is not a view I associate with Indian logicians). But in terms of human affairs, I think the rejection of the law of the excluded middle is a powerful cognitive tool.

I have thought that for about 40 years. I brought that with me in my part of the crafting of the context-driven principles. Something can be the right thing for your context and its opposite can be the right thing for my context.

I think we need to look more sympathetically at more contexts and more solutions. To ask more about what is right with alternative ideas and what we can learn from them. And to develop batteries of skills to work with them. For that, I think we need to get past the politics of The One School of context-driven testing.