When someone asks “I have a big suite of manual tests; which tests (or worse, which test cases) should I automate?”, I often worry about several things.

The first thing is that focusing on test cases is often a pretty lousy way to think about testing. That’s because test cases are often cast in terms of following an explicit procedure in order to observe a specific result. At best, this confirms that the product can work if someone follows that procedure, and it also assumes that any result unobserved in the course of that procedure and after it is unimportant.

The trouble is that there are potentially infinite variations on the procedure, and many factors might make a difference in a test or in its outcome. Will people use the product in only one way? Will this specific data expose a problem? Might other data expose a problem that this data does not? Will a bug appear every time we follow this procedure? The test case often actively suppresses discovery. Test cases are not testing , and bugs don’t follow the test cases.

Your test is a process of activity and reasoning: analyzing risk; designing an experiment; performing the experiment; observing what happens before, during, and after the experiment; interpreting the results; and preparing and a relevant report. This depends on your human intentions, your mindset, and your skill set. Tools can help every activity along the way, but the test is something you do, not something the machine does.

Third: lots of existing test cases are shallow, pointless, out of date, ponderous, inefficient, cryptic, and unmotivated by risk. Often they are focused on the user interface, a level of the product that is often quite unfriendly to tools. Because the test cases exist, they are often pointlessly repeated, long after they have lost any power to find a bug. Why execute pointless test cases more quickly?

It might be a much better idea to create new automated checks that are focused on specific factors of the product, especially at low levels, with the goal of providing quick feedback to the developer. It might be a good idea to prepare those checks as a collaboration between the developer and the tester (or between two developers, one of whom is in the builder’s role, with the other taking on a testing role). It might be a good idea to develop those checks as part of the process of developing some code. And it might be a really good idea to think about tools in a way that goes far beyond faster execution of a test script.

So I encourage people to reframe the question. Instead of thinking what (existing) test cases should I automate? try thinking:

What reason do we have for preserving these test cases at all? If we’re going to use tools effectively, why not design checks with tools in mind from the outset? What tool-assisted experiments could we design to help us learn about the product and discover problems in it?

What parts of a given experiment could tools accelerate, extend, enhance, enable, or intensify?

What do we want to cover? What product factors could we check? (A product factor is something that can be examined during a test, or that might influence the outcome of a test.)

To what product factors (like data, or sequences, or platforms,…) could we apply tools to help us to induce variation into our testing?

How could tools help us to generate representative, valid data and exceptional or pathological data?

In experiments that we might perform on the product, how might tools help us make observations and recognize facts that might otherwise escape our notice? How can tools make invisible things visible?

How can tools help us to generate lots of outcomes that we can analyze to find interesting patterns in the data? How can tools help us to visualize those outcomes?

How could the tool help us to stress out the product; overwhelm it; perturb it; deprive it of things that it needs?

How can developers make various parts of the system more amenable to being operated, observed, and evaluated with the help of tools?

How might tools induce blindness to problems that we might be able to see without them?

What experiments could we perform on the product that we could not perform at all without help from tools?

Remember: if your approach to testing is responsible, clever, efficient, and focused on discovering problems that matter, then tools will help you to find problems that matter in an efficient, clever, and responsible way. If you try to “automate” bad testing, you’ll find yourself doing bad testing faster and worse than you’ve ever done it before.

Sign up for the RST class, taught by both James Bach and Michael Bolton, in Seattle September 26-28, 2018. It’s not just for testers! Anyone who wants to get better at testing is welcome and encouraged to attend—developers, managers, designers, ops people, tech support folk…

People often ask me what constitutes a good amount of unit testing. I think the answer is usually a very high level of coverage but it’s fair to say that there are smart people that disagree. The truth is probably that there is not one answer to this question. Notwithstanding this, I have been able to give some pretty consistent guidance with regard to unit testing which I’d like to share. And not surprisingly it doesn’t prescribe a specific level of testing.

No coverage is never acceptable

While it’s true that some things are harder to test than others I think it’s fair to say that 0% coverage is universally wrong. So the minimum thing I can recommend is everything have at least some coverage. This helps in a number of ways not the least of which is that each subsequent test is much easier to add.

Corollary: “My code can’t be unit tested” is never acceptable

While you may chose to test more lightly (see below) it is always possible to unit test code; there are no exceptions. No matter how crazy the context, it can be mocked. No matter how entangled the global dependencies, they can be faked. The greater the mess, the more valuable it will be to disentangle the problems to create something testable. Refactoring code to enhance its testability is inherently valuable and getting the tests to “somewhere good” is a great way to quantify the value of refactoring that might otherwise go unnoticed.

Unit testing drives good developer behavior overall.

Unit Tests Cross Check Complex Intent

The very best you can do with a unit test, in fact the only thing you can do, is verify that the code does what you intended it to do. This is not the same as verifying correctness but it goes a long way.

you create suitable mocks to create any simulated situation you need

you observe the side-effects on your mocks to validate that your code is taking the correct actions

you observe the publicly visible state of your system to ensure that it is moving correctly from one valid state to the next valid state

you add test-only methods as needed (if needed) to expose properties that need testing but are otherwise not readily visible with the public contract (it’s better if you do this minimally, but for instance test constructors are a common use case)

Given that unit tests are good for the above you then have to consider “Where will those things help?”

When creating telemetry you can observe that you log the correct things the correct number of times.

shipping logging that is wrong is super common and it can cost days or weeks to find out, fix it, and get the right data…

it doesn’t take weeks to unit test logging thoroughly, so great candidate to move fast by testing

logged data that is “mostly correct” is likely to become the bane of your existence

When you have algorithms with complex internal state, the state can be readily verified.

the tests act as living documentation for the state transitions

they protect you against future mistakes

well meaning newcomers and others looking to remove dead code and/or refactor can do so with confidence

if you’re planning such a refactor, “tests first, refactor second” is a great strategy

When you have algorithms with important policy, that policy can be verified.

many systems require “housekeeping” like expiration, deletion, or other maintenance

unit tests can help you trigger those events even if they normally take days, weeks, years

When you have outlying but important success/failure cases, they can be verified.

similar to the above the most exotic cases can be tested

important failure cases where we need to purge the cache or so some other cleanup action that are not exactly policy but are essential for correctness can be tested

these cases are hard to force in end to end tests and easily overlooked in manual testing

in general, situations that are not on the main path but are essential are the most important to verify

Areas under heavy churn can have their chief modes of operation verified.

anywhere there is significant development there is the greatest chance of bugs

whatever mistakes developers tend to make, write tests that will find them and stop them

even if developer check-ins are 99% right that means in any given week something important is gonna bust, that’s the math of it…

In case of complex threading, interleaves can be verified.

By mocking mutex, critical section, or events, every possible interleave can be simulated

weakness: you can only test the interleaves you thought of (but that goes a long way)

that’s actually the universal weakness of unit tests

The above are just examples, the idea being that you use the strength areas of unit tests cross-checked against your code to maximize their value. Even very simple tests that do stuff like “make sure the null cases are handled correctly” are super helpful because it’s easy to get stuff wrong and some edge case might not run in manual testing. These are real problems that cause breakage in your continuous integration and customer issues in production. Super dumb unit tests can stop many of these problems in their tracks.

Once you have some coverage it’s easy to look at the coverage reports and then decide where you should invest more and when. This stuff is great for learning the code and getting people up to speed!

A few notes on some of the other types of testing might be valuable too.

When to Consider Integration Tests

If you can take an entire subsystem and run it largely standalone, or wired out for logging, or anything at all like that really, it creates a great opportunity for what I’ll loosely call “integration testing.” I’m not sure that term is super-well defined actually, but the idea is that more live components can be tested. For instance, maybe you can use real backend servers, or developer servers, and maybe drive your client libraries without actually launching the real UI — that would be a great kind of integration test. Maybe this is done with test users and a stub UI; maybe it’s some light automation; maybe a combination. These test configurations can be used to validate large swaths of code, including communication stacks and server reconfigurations.

It’s important to note that no amount of unit testing can ever tell you if your new server topology is going to work. You can certainly verify that your deployment is what you think it is with something kind of like a unit test (I can validate my deployment files at least) but that isn’t really the same thing.

Something less than your full stack, based on your real code, can go a long way to validating these things. Perhaps without having to implicate UI or other systems that add unnecessary complexity and/or failure modes to the test.

This level of testing can also be great for fuzzing, which is a great way to create valid inputs and/or stimulation that you didn’t think of in unit tests. When fuzzing finds failures, you may want to add new validations to your suite if it turns out you have a big hole.

When to Consider End to End Tests

Again, it would be a mistake to think that these tests have no place in a testing ecosystem. But like unit tests, it’s important to play to their strengths. Do not use them to validate internal algorithms; they’re horrible at that. They don’t give you instant API level failures near the point of failure, there’s complex logging and what not.

But consider this very normal tools example: “I am adopting these new linker flags for my main binary” — in that situation the only test that can be used is an end to end test. Probably none of the unit tests even build with those optimizations on, and if they did they would not be any good at validation anyway, the flags probably behave differently with mini-binaries. Whereas a basic end-to-end suite of “does it crash when I try these 10 essential things” is invaluable.

Likewise, performance, power, and efficiency tests are usually end to end — because the unit tests don’t tell us what we need to know.

Fuzzing may have to happen at this level, but it can be challenging to do all your fuzzing via the UI. Repro steps can get harder and harder as we go down this road as well. External failures that aren’t real problems are also more likely to crop up.

Conclusion

It’s no surprise that a blend of all of these is probably the healthiest thing. Getting to greater than 0% unit-test-coverage universally (i.e. for all classes) is a great goal, if only so that you are then ready to test whatever turns out to be needed.

Generally, people report that adding tests consistently finds useful bugs, and getting them out, while making code more testable, is good for the code base overall. As long as that continues to be the case on your team it’s probably prudent to keep investing in tests, but teams will have to consider all their options to decide how much testing to do and when.

One popular myth, that unit tests have diminishing returns, really should be banished. The thing about unit tests is that any given block of code is about as easy to test as any other. Once you have your mocks set up you can pretty much force any situation. The 100th block of code isn’t harder to reach than the 99th was and it’s just as likely to have bugs in it as any other. People who get high levels of coverage generally report things like “I was sure the last 2 blocks were gonna be a waste of time and then I looked and … the code was wrong.”

But even if returns aren’t diminishing that doesn’t mean they are free. And bugs are not created equal. So ultimately, the blend is up to you.

Today we released a prototype of a C# feature called “nullable reference types“, which is intended to help you find and fix most of your null-related bugs before they blow up at runtime.

We would love for you to install the prototype and try it out on your code! (Or maybe a copy of it! ) Your feedback is going to help us get the feature exactly right before we officially release it.

Read on for an in-depth discussion of the design and rationale, and scroll to the end for instructions on how to get started!

The billion-dollar mistake

Tony Hoare, one of the absolute giants of computer science and recipient of the Turing Award, invented the null reference! It’s crazy these days to think that something as foundational and ubiquitous was invented, but there it is. Many years later in a talk, Sir Tony actually apologized, calling it his “billion-dollar mistake”:

I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn’t resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.

There’s general agreement that Tony is actually low-balling the cost here. How many null reference exceptions have you gotten over the years? How many of them were in production code that was already tested and shipped? And how much extra effort did it take to verify your code and chase down potential problems to avoid even more of them?

The problem is that null references are so useful. In C#, they are the default value of every reference type. What else would the default value be? What other value would a variable have, until you can decide what else to assign to it? What other value could we pave a freshly allocated array of references over with, until you get around to filling it in?

Also, sometimes null is a sensible value in and of itself. Sometimes you want to represent the fact that, say, a field doesn’t have a value. That it’s ok to pass “nothing” for a parameter. The emphasis is on sometimes, though. And herein lies another part of the problem: Languages like C# don’t let you express whether a null right here is a good idea or not.

Yet!

What can be done?

There are some programming languages, such as F#, that don’t have null references or at least push them to the periphery of the programming experience. One popular approach instead uses option types to express that a value is either None or Some(T) for a given reference type T. Any access to the T value itself is then protected behind a pattern matching operation to see if it is there: The developer is forced, in essence, to “do a null check” before they can get at the value and start dereferencing it.

But that’s not how it works in C#. And here’s the problem: We’re not going to add another kind of nulls to C#. And we’re not going to add another way of checking for those nulls before you access a value. Imagine what a dog’s breakfast that would be! If we are to do something about the problem in C#, it has to be in the context of existing nulls and existing null checks. It has to be in a way that can help you find bugs in existing code without forcing you to rewrite everything.

Step one: expressing intent

The first major problem is that C# does not let you express your intent: is this variable, parameter, field, property, result etc. supposed to be null or not? In other words, is null part of the domain, or is it to be avoided.

We want to add such expressiveness. Either:

A reference is not supposed to be null. In that case it is alright to dereference it, but you should not assign null to it.

A reference is welcome to be null. In that case it is alright to assign null to it, but you should not dereference it without first checking that it isn’t currently null.

Naively, this suggests that we add two new kinds of reference types: “safely nonnullable” reference types (maybe written string!) and “safely nullable” reference types (maybe written string?) in addition to the current, unhappy reference types.

We’re not going to do that. If that’s how we went about it, you’d only get safe nullable behavior going forward, as you start adding these annotations. Any existing code would benefit not at all. I guess you could push your source code into the future by adding a Roslyn analyzer that would complain at you for every “legacy” reference type string in your code that you haven’t yet added ? or ! to. But that would lead to a sea of warnings until you’re done. And once you are, your code would look like it’s swearing at you, with punctuation? on! every? declaration!

In a certain weird way we want something that’s more intrusive in the beginning (complains about current code) and less intrusive in the long run (requires fewer changes to existing code).

This can be achieved if instead we add only one new “safe” kind of reference type, and then reinterpret existing reference types as being the other “safe” kind. More specifically, we think that the default meaning of unannotated reference types such as string should be non-nullable reference types, for a couple of reasons:

We believe that it is more common to want a reference not to be null. Nullable reference types would be the rarer kind (though we don’t have good data to tell us by how much), so they are the ones that should require a new annotation.

The language already has a notion of – and a syntax for – nullable value types. The analogy between the two would make the language addition conceptually easier, and linguistically simpler.

It seems right that you shouldn’t burden yourself or your consumer with cumbersome null values unless you’ve actively decided that you want them. Nulls, not the absence of them, should be the thing that you explicitly have to opt in to.

Here’s what it looks like:

class Person
{
public string FirstName; // Not null
public string? MiddleName; // May be null
public string LastName; // Not null
}

This class is now able to express the intent that everyone has a first and a last name, but only some people have a middle name.

Thus we get to the reason we call this language feature “nullable reference types”: Those are the ones that get added to the language. The nonnullable ones are already there, at least syntactically.

Step two: enforcing behavior

A consequence of this design choice is that any enforcement will add new warnings or errors to existing code!

That seems like a breaking change, and a really bad idea, until you realize that part of the purpose of this feature is to find bugs in existing code. If it can’t find new problems with old code, then it isn’t worth its salt!

So we want it to complain about your existing code. But not obnoxiously. Here’s how we are going to try to strike that balance:

All enforcement of null behavior will be in the form of warnings, not errors. As always, you can choose to run with warnings as errors, but that is up to you.

There’s a compiler switch to turn these new warnings on or off. You’ll only get them when you turn it on, so you can still compile your old code with no change.

The warnings will recognize existing ways of checking for null, and not force you to change your code where you are already diligently doing so.

There is no semantic impact of the nullability annotations, other than the warnings. They don’t affect overload resolution or runtime behavior, and generate the same IL output code. They only affect type inference insofar as it passes them through and keeps track of them in order for the right warnings to occur on the other end.

There is no guaranteed null safety, even if you react to and eliminate all the warnings. There are many holes in the analysis by necessity, and also some by choice.

To that last point: Sometimes a warning is the “correct” thing to do, but would fire all the time on existing code, even when it is actually written in a null safe way. In such cases we will err on the side of convenience, not correctness. We cannot be yielding a “sea of warnings” on existing code: too many people would just turn the warnings back off and never benefit from it.

Once the annotations are in the language, it is possible that folks who want more safety and less convenience can add their own analyzers to juice up the aggresiveness of the warnings. Or maybe we add an “Extreme” mode to the compiler itself for the hardliners.

In light of these design tenets, let’s look at the specific places we will start to yield warnings when the feature is turned on.

Avoiding dereferencing of nulls

First let’s look at how we would deal with the use of the new nullable reference types.

The design goal here is that if you mark some reference types as nullable, but you are already doing a good job of checking them for null before dereferencing, then you shouldn’t get any warnings. This means that the compiler needs to recognize you doing a good job. The way it can do that is through a flow analysis of the consuming code, similar to what it currently does for definite assignment.

More specifically, for certain “tracked variables” it will keep an eye on their “null state” throughout the source code (either “not null” or “may be null“). If an assignment happens, or if a check is made, that can affect the null state in subsequent code. If the variable is dereferenced at a place in the source code where its null state is “may be null“, then a warning is given.

In the example you can see how the null state of ns is affected by checks, assignments and control flow.

Which variables should be tracked? Parameters and locals for sure. There can be more of a discussion around fields and properties in “dotted chains” like x.y.z or this.x, or even a field x where the this. is implicit. We think such fields and properties should also be tracked, so that they can be “absolved” when they have been checked for null:

This is one of those places where we choose convenience over correctness: there are many ways that p.MiddleName could become null between the check and the dereference. We would be able to track only the most blatant ones:

Those are examples of false negatives: we just don’t realize you are doing something dangerous, changing the state that we are reasoning about.

Despite our best efforts, there will also be false positives: Situations where you know that something is not null, but the compiler cannot figure it out. You get an undeserved warning, and you just want to shut it up.

We’re thinking of adding an operator for that, to say that you know better:

The trailing ! on an expression tells the compiler that, despite what it thinks, it shouldn’t worry about that expression being null.

Avoiding nulls

So far, the warnings were about protecting nulls in nullable references from being dereferenced. The other side of the coin is to avoid having nulls at all in the nonnullable references.

There are a couple of ways null values can come into existence, and most of them are worth warning about, whereas a couple of them would cause another “sea of warnings” that is better to avoid:

Assigning or passing null to a non-nullable reference type. That is pretty egregious, right? As a general rule we should warn on that (though there are surprising counterarguments to some cases, still under debate).

Assigning or passing a nullable reference type to a nonnullable one. That’s almost the same as 1, except you don’t know that the value is null – you only suspect it. But that’s good enough for a warning.

A default expression of a nonnullable reference type. again, that is similar to 1, and should yield a warning.

Creating an array with a nonnullable element type, as in new string[10]. Clearly there are nulls being made here – lots of them! But a warning here would be very harsh. Lots of existing code would need to be changed – a large percentage of the worlds existing array creations! Also, there isn’t a really good work around. This seems like one we should just let go.

Using the default constructor of a struct that has a field of nonnullable reference type. This one is sneaky, since the default constructor (which zeroes out the struct) can even be implicitly used in many places. Probably better not to warn, or else many existing struct types would be rendered useless.

Leaving a nonnullable field of a newly constructed object null after construction. This we can do something about! Let’s check to see that every constructor assigns to every field whose type is nonnullable, or else yield a warning.

Once again, there will be cases where you know better than the compiler that either a) that thing being assigned isn’t actually null, or b) it is null but it doesn’t actually matter right here. And again you can use the ! operator to tell the compiler who’s boss:

A day in the life of a null hunter

When you turn the feature on for existing code, everything will be nonnullable by default. That’s probably not a bad default, as we’ve mentioned, but there will likely be places where you should add some ?s.

Luckily, the warnings are going to help you find those places. In the beginning, almost every warning is going to be of the “avoid nulls” kind. All these warnings represent a place where either:

you are putting a null where it doesn’t belong, and you should fix it – you just found a bug! – or

the nonnullable variable involved should actually be changed to be nullable, and you should fix that.

Of course as you start adding ? to declarations that should be allowed to be null, you will start seeing a different kind of warnings, where other parts of your existing code are not written to respect that nullable intent, and do not properly check for nulls before dereferencing. That nullable intent was probably always there but was inexpressible in the code before.

So this is a pretty nice story, as long as you are just working with your own source code. The warnings drive quality and confidence through your source base, and when you’re done, your code is in a much better state.

But of course you’ll be depending on libraries. Those libraries are unlikely to add nullable annotations at exactly the same time as you. If they do so before you turn the feature on, then great: once you turn it on you will start getting useful warnings from their annotations as well as from your own.

If they add anotations after you, however, then the situation is more annoying. Before they do, you will “wrongly” interpret some of their inputs and outputs as non-null. You’ll get warnings you didn’t “deserve”, and miss warnings you should have had. You may have to use ! in a few places, because you really do know better.

After the library owners get around to adding ?s to their signatures, updating to their new version may “break” you in the sense that you now get new and different warnings from before – though at least they’ll be the right warnings this time. It’ll be worth fixing them, and you may also remove some of those !s you temporarily added before.

We spent a large amount of time thinking about mechanisms that could lessen the “blow” of this situation. But at the end of the day we think it’s probably not worth it. We base this in part on the experience from TypeScript, which added a similar feature recently. It shows that in practice, those inconveniences are quite manageable, and in no way inhibitive to adoption. They are certainly not worth the weight of a lot of extra “mechanism” to bridge you over in the interim. The right thing to do if an API you use has not added ?s in the right places is to push its owners to get it done, or even contribute the ?s yourself.

Like all other C# language features, nullable reference types are being designed in the open here: github.com/dotnet/csharplang.
We look forward to walking the last nullable mile with you, and getting to a well-tuned, gentle and useful null-chasing feature with your help!

For a long time, I’ve believed that diagnostic skills are incredibly important for software engineers, and often poorly understood.

The main evidence I see of poor diagnostic skills is on Stack Overflow:

“I have a program that does 10 things, and the output isn’t right. Please fix.”

“I can’t post a short but complete program, because my code is long.”

“I can’t post a short but complete program, because it’s company code.”

I want to tackle this as best I can. If I can move the needle on the average diagnostic skill level, that will probably have more impact on the world than any code I could write. So, this is my new mission, basically.

Expect to see:

Blog posts

Podcast interviews

Screencasts

Conference and user group talks

In particular, I’ve now resolved that when I’m facing a diagnostic situation that at least looks like it’ll be interesting, I’m going to start writing a blog post about it immediately. The blog post will be an honest list of the steps I’ve been through, including those that turned out not to be helpful, to act as case studies. In the process, I hope to improve my own skills (by reflecting on whether the blind alleys were avoidable) and provide examples for other developers to follow as well.

But post Snowden, and particularly after the result of the last election here in the US, it's clear that everything on the web should be encrypted by default.

Why?

You have an unalienable right to privacy, both in the real world and online. And without HTTPS you have zero online privacy – from anyone else on your WiFi, from your network provider, from website operators, from large companies, from the government.

Using HTTPS means nobody can tamper with the content in your web browser. This was a bit of an abstract concern five years ago, but these days, there are more and more instances of upstream providers actively mucking with the data that passes through their pipes. For example, if Comcast detects you have a copyright strike, they'll insert banners into your web content … all your web content! And that's what the good guy scenario looks like – or at least a corporation trying to follow the rules. Imagine what it looks like when someone, or some large company, decides the rules don't apply to them?

So, how do you as an end user "use" encryption on the web? Mostly, you lobby for the websites you use regularly to adopt it. And it's working. In the last year, the use of HTTPS by default on websites has doubled.

Browsers can help, too. By January 2017, Google Chrome will show this alert in the UI when a login or credit card form is displayed on an unencrypted connection:

But there's another essential part required for encryption to work on any websites – the HTTPS certificate. Historically these certificates have been issued by certificate authorities, and they were at least $30 per year per website, sometimes hundreds of dollars per year. Without that required cash each year, without the SSL certificate that you must re-purchase every year in perpetuity – you can't encrypt anything.

Let's Encrypt is a 501.3(c)(3) non-profit organization supported by the Linux Foundation. They've been in beta for about a year now, and to my knowledge they are the only reliable, official free source of SSL certificates that has ever existed.

However, because Let's Encrypt is a non-profit organization, not owned by any company that must make a profit from each SSL certificate they issue, they need our support:

As a company, we've donated a Discourse hosted support community, and a cash amount that represents how much we would have paid in a year to one of the existing for-profit certificate authorities to set up HTTPS for all the Discourse websites we host.

I urge you to do the same:

Estimate how much you would have paid for any free SSL certificates you obtained from Let's Encrypt, and please donate that amount to Let's Encrypt.

If you work for a large company, urge them to sponsor Let's Encrypt as a fundamental cornerstone of a safe web.

If you believe in an unalienable right to privacy on the Internet for every citizen in every nation, please support Let's Encrypt.

[Note: I offered Maaret Pyhäjärvi the right to review this post and suggest edits to it before I published it. She declined.]

A few days ago I was keynoting at the New Testing Conference, in New York City, and I used a slide that has offended some people on Twitter. This blog post is intended to explore that and hopefully improve the chances that if you think I’m a bad guy, you are thinking that for the right reasons and not making a mistake. It’s never fun for me to be a part of something that brings pain to other people. I believe my actions were correct, yet still I am sorry that I caused Maaret hurt, and I will try to think of ways to confer better in the future.

Here’s the theme of this post: Getting up in front of the world to speak your mind is a dangerous process. You will be misunderstood, and that will feel icky. Whether or not you think of yourself as a leader, speaking at a conference IS an act of leadership, and leadership carries certain responsibilities.

I long ago learned to let go of the outcome when I speak in public. I throw the ideas out there, and I do that as an American Aging Overweight Left-Handed Atheist Married Father-And-Father-Figure Rough-Mannered Bearded Male Combative Aggressive Assertive High School Dropout Self-Confident Freedom-Loving Sometimes-Unpleasant-To-People-On-Twitter Intellectual. I know that my ideas will not be considered in a neutral context, but rather in the context of how people feel about all that. I accept that. But, I have been popular and successful as a speaker in the testing world, so maybe, despite all the difficulties, enough of my message and intent gets through, overall.

What I can’t let go of is my responsibility to my audience and the community at large to speak the truth and to do so in a compassionate and reasonable way. Regardless of what anyone else does with our words, I believe we speakers need to think about how our actions help or harm others. I think a lot about this.

Let me clarify. I’m not saying it’s wrong to upset people or to have disagreement. We have several different culture wars (my reviewers said “do you have to say wars?”) going on in the software development and testing worlds right now, and they must continue or be resolved organically in the marketplace of ideas. What I’m saying is that anyone who speaks out publicly must try to be cognizant of what words do and accept the right of others to react.

Although I’m surprised and certainly annoyed by the dark interpretations some people are making of what I did, the burden of such feelings is what I took on when I first put myself forward as a public scold about testing and software engineering, a quarter century ago. My annoyance about being darkly interpreted is not your problem. Your problem, assuming you are reading this and are interested in the state of the testing craft, is to feel what you feel and think what you think, then react as best fits your conscience. Then I listen and try to debug the situation, including helping you debug yourself while I debug myself. This process drives the evolution of our communities. Jay Philips, Ash Coleman, Mike Talks, Ilari Henrik Aegerter, Keith Klain, Anna Royzman, Anne-Marie Charrett, David Greenlees, Aaron Hodder, Michael Bolton, and my own wife all approached me with reactions that helped me write this post. Some others approached me with reactions that weren’t as helpful, and that’s okay, too.

Leadership and The Right of Responding to Leaders

In my code of conduct, I don’t get to say “I’m not a leader.” I can say no one works for me and no one has elected me, but there is more to leadership than that. People with strong voices and ideas gain a certain amount of influence simply by virtue of being interesting. I made myself interesting, and some people want to hear what I have to say. But that comes with an implied condition that I behave reasonably. The community, over time negotiates what “reasonable” means. I am both a participant and a subject of those negotiations. I recommend that we hold each other accountable for our public, professional words. I accept accountability for mine. I insist that this is true for everyone else. Please join me in that insistence.

People who speak at conferences are tacitly asserting that they are thought leaders– that they deserve to influence the community. If that influence comes with a rule that “you can’t talk about me without my permission” it would have a chilling effect on progress. You can keep to yourself, of course; but if you exercise your power of speech in a public forum you cannot cry foul when someone responds to you. Please join me in my affirmation that we all have the right of response when a speaker takes the microphone to keynote at a conference.

Some people have pointed out that it’s not okay to talk back to performers in a comedy show or Broadway play. Okay. So is that what a conference is to you? I guess I believe that conferences should not be for show. Conferences are places for conferring. However, I can accept that some parts of a conference might be run like infomercials or circus acts. There could be a place for that.

The Slide

Here is the slide I used the other day:

Before I explain this slide, try to think what it might mean. What might its purposes be? That’s going to be difficult, without more information about the conference and the talks that happened there. Here are some things I imagine may be going through your mind:

There is someone whose name is Maaret who James thinks he’s different from.

He doesn’t trust nice people. Nice people are false. Is Maaret nice and therefore he doesn’t trust her, or does Maaret trust nice people and therefore James worries that she’s putting herself at risk?

Is James saying that niceness is always false? That’s seems wrong. I have been nice to people whom I genuinely adore.

Is he saying that it is sometimes false? I have smiled and shook hands with people I don’t respect, so, yes, niceness can be false. But not necessarily. Why didn’t he put qualifying language there?

He likes debate and he thinks that Maaret doesn’t? Maybe she just doesn’t like bad debate. Did she actually say she doesn’t like debate?

What if I don’t like debate, does that mean I’m not part of this community?

He thinks excellence requires attention and energy and she doesn’t?

Why is James picking on Maaret?

Look, if all I saw was this slide, I might be upset, too. So, whatever your impression is, I will explain the slide.

Like I said I was speaking at a conference in NYC. Also keynoting was Maaret Pyhäjärvi. We were both speaking about the testing role. I have some strong disagreements with Maaret about the social situation of testers. But as I watched her talk, I was a little surprised at how I agreed with the text and basic concepts of most of Maaret’s actual slides, and a lot of what she said. (I was surprised because Maaret and I have a history. We have clashed in person and on Twitter.) I was a bit worried that some of what I was going to say would seem like a rehash of what she just did, and I didn’t want to seem like I was papering over the serious differences between us. That’s why I decided to add a contrast slide to make sure our differences weren’t lost in the noise. This means a slide that highlights differences, instead of points of connection. There were already too many points of connection.

The slide was designed specifically:

for people to see who were in a specific room at a specific time.

for people who had just seen a talk by Maaret which established the basis of the contrast I was making.

about differences between two people who are both in the spotlight of public discourse.

to express views related to technical culture, not general social culture.

to highlight the difference between two talks for people who were about to see the second talk that might seem similar to the first talk.

for a situation where both I and Maaret were present in the room during the only time that this slide would ever be seen (unless someone tweeted it to people who would certainly not understand the context).

as talking points to accompany my live explanation (which is on video and I assume will be public, someday).

for a situation where I had invited anyone in the audience, including Maaret, to ask me questions or make challenges.

These people had just seen Maaret’s talk and were about to see mine. In the room, I explained the slide and took questions about it. Maaret herself spoke up about it, for which I publicly thanked her for doing so. It wasn’t something I was posting with no explanation or context. Nor was it part of the normal slides of my keynote.

Now I will address some specific issues that came up on Twitter:

1. On Naming Maaret

Maaret has expressed the belief that no one should name another person in their talk without getting their permission first. I vigorously oppose that notion. It’s completely contrary to the workings of a healthy society. If that principle is acceptable, then you must agree that there should be no free press. Instead, I would sayif you stand up and speak in the guise of an expert, then you must be personally accountable for what you say. You are fair game to be named and critiqued. And the weird thing is that Maaret herself, regardless of what she claims to believe, behaves according to my principle of freedom to call people out. She, herself, tweeted my slide and talked about me on Twitter without my permission. Of course, I think that is perfectly acceptable behavior, so I’m not complaining. But it does seem to illustrate that community discourse is more complicated than “be nice” or “never cause someone else trouble with your speech” or “don’t talk about people publicly unless they gave you permission.”

2. On Being Nice

Maaret had a slide in her talk about how we can be kind to each other even though we disagree. I remember her saying the word “nice” but she may have said “kind” and I translated that into “nice” because I believed that’s what she meant. I react to that because, as a person who believes in the importance of integrity and debate over getting along for the sake of appearances, I observe that exhortations to “be nice” or even to “be kind” are often used when people want to quash disturbing ideas and quash the people who offer them. “Be nice” is often code for “stop arguing.” If I stop arguing, much of my voice goes away. I’m not okay with that. No one who believes there is trouble in the world should be okay with that. Each of us gets to have a voice.

I make protests about things that matter to me, you make protests about things that matter to you.

I think we need a way of working together that encourages debate while fostering compassion for each other. I use the word compassion because I want to get away from ritualized command phrases like “be nice.” Compassion is a feeling that you cultivate, rather than a behavior that you conform to or simulate. Compassion is an antithesis of “Rules of Order” and other lists of commandments about courtesy. Compassion is real. Throughout my entire body of work you will find that I promote real craftsmanship over just following instructions. My concern about “niceness” is the same kind of thing.

Look at what I wrote: I said “I don’t trust nice people.” That’s a statement about my feelings and it is generally true, all things being equal. I said “I’m not nice.” Yet, I often behave in pleasant ways, so what did I mean? I meant I seek to behave authentically and compassionately, which looks like “nice” or “kind”, rather than to imagine what behavior would trick people into thinking I am “nice” when indeed I don’t like them. I’m saying people over process, folks.

I was actually not claiming that Maaret is untrustworthy because she is nice, and my words don’t say that. Rather, I was complaining about the implications of following Maaret’s dictum. I was offering an alternative: be authentic and compassionate, then “niceness” and acts of kindness will follow organically. Yes, I do have a worry that Maaret might say something nice to me and I’ll have to wonder “what does that mean? is she serious or just pretending?” Since I don’t want people to worry about whether I am being real, I just tell them “I’m not nice.” If I behave nicely it’s either because I feel genuine good will toward you or because I’m falling down on my responsibility to be honest with you. That second thing happens, but it’s a lapse. (I do try to stay out of rooms with people I don’t respect so that I am not forced to give them opinions they aren’t willing or able to process.)

I now see that my sentence “I want to be authentic and compassionate” could be seen as an independent statement connected to “how I differ from Maaret,” implying that I, unlike her, am authentic and compassionate. That was an errant construction and does not express my intent. The orange text on that line indicated my proposed policy, in the hope that I could persuade her to see it my way. It was not an attack on her. I apologize for that confusion.

3. Debate vs. Dialogue

Maaret had earlier said she doesn’t want debate, but rather dialogue. I have heard this from other Agilists and I find it disturbing. I believe this is code for “I want the freedom to push my ideas on other people without the burden of explaining or defending those ideas.” That’s appropriate for a brainstorming session, but at some point, the brainstorming is done and the judging begins. I believe debate is absolutely required for a healthy professional community. I’m guided in this by dialectical philosophy, the history of scientific progress, the history of civil rights (in fact, all of politics), and the modern adversarial justice system. Look around you. The world is full of heartfelt disagreement. Let’s deal with it. I helped create the culture of small invitational peer conferences in our industry which foster debate. We need those more than ever.

But if you don’t want to deal with it, that’s okay. All that means is that you accept that there is a wall between your friends and those other people whom you refuse to debate with. I will accept the walls if necessary but I would rather resolve the walls. That’s why I open myself and my ideas for debate in public forums.

Debate is not a process of sticking figurative needles into other people. Debate is the exchange of views with the goal of resolving our differences while being accountable for our words and actions. Debate is a learning process. I have occasionally heard from people I think are doing harm to the craft that they believe I debate for the purposes of hurting people instead of trying to find resolution. This is deeply insulting to me, and to anyone who takes his vocation seriously. What’s more, considering that these same people express the view that it’s important to be “nice,” it’s not even nice. Thus, they reveal themselves to be unable to follow their own values. I worry that “Dialogue not debate” is a slogan for just another power group trying to suppress its rivals. Beware the Niceness Gang.

I understand that debating with colleagues may not be fun. But I’m not doing it for fun. I’m doing it because it is my responsibility to build a respectable craft. All testing professionals share this responsibility. Debate serves another purpose, too, managing the boundaries between rival value systems. Through debate we may discover that we occupy completely different paradigms; schools of thought. Debate can’t bridge gaps between entirely different world views, and yet I have a right to my world view just as you have a right to yours.

I admire Jay. I called her and we had a satisfying conversation. I filled her in on the context and she advised me to write this post.

One thing that came up is something very important about debate: the status of ideas is not the only thing that gets modified when you debate someone; what also happens is an evolution of feelings.

Yes I think “I’m right.” I acted according to principles I think are eternal and essential to intellectual progress in society. I’m happy with those principles. But I also have compassion for the feelings of others, and those feelings may hold sway even though I may be technically right. For instance, Maaret tweeted my slide without my permission. That is copyright violation. She’s objectively “wrong” to have done that. But that is irrelevant.

[Note: Maaret points out that this is legal under the fair use doctrine. Of course, that is correct. I forgot about fair use. Of course, that doesn’t change the fact that though I may feel annoyed by her selective publishing of my work, that is irrelevant, because I support her option to do that. I don’t think it was wise or helpful for her to do that, but I wouldn’t seek to bar her from doing so. I believe in freedom to communicate, and I would like her to believe in that freedom, too]

I accept that she felt strongly about doing that, so I [would] choose to waive my rights. I feel that people who tweet my slides, in general, are doing a service for the community. So while I appreciate copyright law, I usually feel okay about my stuff getting tweeted.

I hope that Jay got the sense that I care about her feelings. If Maaret were willing to engage with me she would find that I care about her feelings, too. This does not mean she gets whatever she wants, but it’s a factor that influences my behavior. I did offer her the chance to help me edit this post, but again, she refused.

4. Focus and Energy

Maaret said that eliminating the testing role is a good thing. I worry it will lead to the collapse of craftsmanship. She has a slide that says “from tester to team member” which is a sentiment she has expressed on Twitter that led me to say that I no longer consider her a tester. She confirmed to me that I hurt her feelings by saying that, and indeed I felt bad saying it, except that it is an extremely relevant point. What does it mean to be a tester? This is important to debate. Maaret has confirmed publicly (when I asked a question about this during her talk) that she didn’t mean to denigrate testing by dismissing the value of a testing role on projects. But I don’t agree that we can have it both ways. The testing role, I believe, is a necessary prerequisite for maintaining a healthy testing craft. My key concern is the dilution of focus and energy that would otherwise go to improving the testing craft. This is lost when the role is lost.

This is not an attack on Maaret’s morality. I am worried she is promoting too much generalism for the good of the craft, and she is worried I am promoting too much specialism. This is a matter of professional judgment and perspective. It cannot be settled, I think, but it must be aired.

The Slide Should Not Have Been Tweeted But It’s Okay That It Was

I don’t know what Maaret was trying to accomplish by tweeting my slide out of context. Suffice it to say what is right there on my slide: I believe in authenticity and compassion. If she was acting out of authenticity and compassion then more power to her. But the slide cannot be understood in isolation. People who don’t know me, or who have any axe to grind about what I do, are going to cry “what a cruel man!” My friends contacted me to find out more information.

I want you to know that the slide was one part of a bigger picture that depicts my principled objection to several matters involving another thought leader. That bigger picture is: two talks, one room, all people present for it, a lot of oratory by me explaining the slide, as well as back and forth discussion with the audience. Yes, there were people in the room who didn’t like hearing what I had to say, but “don’t offend anyone, ever” is not a rule I can live by, and neither can you. After all, I’m offended by most of the talks I attend.

Although the slide should not have been tweeted, I accept that it was, and that doing so was within the bounds of acceptable behavior. As I announced at the beginning of my talk, I don’t need anyone to make a safe space for me. Just follow your conscience.

What About My Conscience?

My conscience is clean. I acted out of true conviction to discuss important matters. I used a style familiar to anyone who has ever seen a public debate, or read an opinion piece in the New York Times. I didn’t set out to hurt Maaret’s feelings and I don’t want her feelings to be hurt. I want her to engage in the debate about the future of the craft and be accountable for her ideas. I don’t agree that I was presuming too much in doing so.

Maaret tells me that my slide was “stupid and hurtful.” I believe she and I do not share certain fundamental values about conferring. I will no longer be conferring with her, until and unless those differences are resolved.

Compassion is important to me. I will continue to examine whether I am feeling and showing the compassion for my fellow humans that they are due. These conversations and debates I have with colleagues help me do that.

I agree that making a safe space for students is important. But industry consultants and pundits should be able to cope with the full spectrum, authentic, principled reactions by their peers. Leaders are held to a higher standard, and must be ready and willing to defend their ideas in public forums.

The reaction on Twitter gave me good information about a possible trend toward fragility in the Twitter-facing part of the testing world. There seems to be a significant group of people who prize complete safety over the value that comes fromsafety over confrontation. In the next conference I help arrange, I will set more explicit ground rules, rather than assuming people share something close to my own sense of what is reasonable to do and expect.

I will also start thinking, for each slide in my presentation: “What if this gets tweeted out of context?”

(Oh, and to those who compared me to Donald Trump… Can you even imagine him writing a post like this in response to criticism? BELIEVE ME, he wouldn’t.)