We all know that when we do something for the first time, we make discoveries; and all software projects (and in fact change efforts of any variety) target something new.

(You can find out what that is by asking, “What will we be able to do when this is done that we can’t do right now? What will our customers, our staff or our systems be able to do?” This is the differentiating capability. There may be more than one, especially if the organization is used to delivering large buckets of work.)

Often, though, the discoveries that are made will slow things down, or make them impossible. Too many contexts to consider. Third parties that don’t want to co-operate, let alone collaborate. A scarcity of skills. Whatever we discover, sometimes it becomes apparent that the effort is never going to pay for itself.

Of course, if you’ve invested a lot of money or time into the effort, it’s tempting to keep throwing more after it: the sunk-cost fallacy. So here are three questions that orgs which are resilient and resistant to that temptation are able to answer:

How can you tell if it’s failing?

What’s the process for terminating or redirecting failing efforts?

What happens to people who do that?

If you can’t answer those questions proudly for your org, or your project, you’re probably over-investing… which means regularly throwing good money after bad, and wasting the time and effort of good people.

This has been an interesting year for me. At the end of March I came out of one of the largest Agile transformations ever attempted (still going, surprisingly well), and learned way more than I ever thought possible about how adoption works at scale (or doesn’t… making it safe-to-fail turns out to be important).

The learning keeps going. I’ve just done Sharon L. Bowman’s amazing “Training from the Back of the Room” course, and following the Enterprise Services Planning Executive Summit, I’ve signed up for the five-day course for that, too.

That last one’s exciting for me. I’ve been doing Agile for long enough now that I’m finding it hard to spot new learning opportunities within the Agile space. Sure, there’s still plenty for me to learn about psychology, we’re still getting that BDD message out and learning more all the time, and there are occasional gems like Paul Goddard’s “Improv-ing Agile Teams” that go to places I hadn’t thought of.

It’s been a fair few years since I experienced something of a paradigm shift in thinking, though. The ESP Summit gave that to me and more.

Starting from Where You Are Now

Getting 50+ managers of MD level and up in a room together, with relatively few coaches, changes the dynamic of the conversations. It becomes far less about how our particular toolboxes can help, and more about what problems are still outstanding that we haven’t solved yet.

Of course, they’re all human problems. The thing is that it isn’t necessarily the current culture that’s the problem; it’s often self-supporting structures and systems that have been in place for a long time. Removing one can often lead to a lack of support for another, which cascades. Someone once referred to an Agile transformation at a client as “the worst implementation of Agile I’ve ever seen”, and they were right; except it wasn’t an implementation, but an adoption. Of course it’s hard to do Agile when you can’t get a server, you’ve got regulatory requirements to consider, you’ve got five main stakeholders for every project, nobody understands the new roles they’ve been asked to play and you’re still running a yearly budgeting cycle – just some of the common problems that I’ve come across in a number of large clients.

Unless you’ve got a sense of urgency so powerful that you’re willing to risk throwing the baby out with the bathwater, incremental change is the way to go, but where do you start, and what do you change first?

The thing I like most about Kanban, and about ESP, is that “start from where you are now” mentality. Sure, it would be fantastic if we could start creating cross-functional teams immediately. But even if we do that, in a large organization it still takes weeks or months to put together any group that can execute on the proposed ideas and get them live, and it’s hard to see the benefits without doing that.

There’s been a bit of a shift in the Agile space away from the notion that cross-functional teams are necessarily where we start, which means we’re shifting away from some of the core concepts of Agile itself.

Dan North and Chris Matts, my long-time friends and mentors, have been busy creating a thing called Business Mapping, in which they help organizations match their investments and budgets to the capacity they actually have to deliver, while slowly growing “staff liquidity” that allows for more flexible delivery.

Enterprise Services Planning achieves much the same result, with a focus on disciplined, data-driven change that I found challenging but exciting: firstly because I realise I haven’t done enough data collection in the past, and secondly because it directs leaders to trust maths, rather than instincts. This is still Kanban, but on steroids: not just people working together in a team, but teams working together; not just leadership at every level, but people using the information at their disposal to drive change and experiment.
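
To give a flavour of what trusting the maths can look like – my own illustration here, not something taken from the ESP curriculum – Little’s Law is the bit of queueing theory that underpins a lot of Kanban forecasting: average lead time is simply average work-in-progress divided by average throughput.

def average_lead_time(avg_wip: float, avg_throughput: float) -> float:
    """Little's Law: lead time (days) = WIP (items) / throughput (items/day)."""
    return avg_wip / avg_throughput

# Invented numbers, for illustration only.
wip = 30.0          # work items currently in progress across the teams
throughput = 2.5    # items finished per day, measured from delivery data
print(f"Expected lead time: {average_lead_time(wip, throughput):.0f} days")  # 12 days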

The Advent of Adhocracy

Professor Julian Birkinshaw’s keynote was the biggest paradigm shift I’ve experienced since Dave Snowden introduced me to Cynefin, and those of you who know how much I love that little framework understand that I’m not using the phrase lightly.

Julian talks about three different ages:

The Industrial Age: Capital and labour are scarce resources. Creates a bureaucracy in which position is privileged, co-ordination achieved by rules, decisions made through hierarchy, and people motivated by extrinsic rewards.

The Information Age: Capital and labour are no longer scarce, but knowledge and information are. Creates a meritocracy in which knowledge is privileged, co-ordination achieved by mutual adjustment, decisions made through logical argument and people motivated by personal mastery.

The Post-Information Age: Knowledge and information are no longer scarce, but action and conviction are. Creates an adhocracy in which action is privileged, co-ordination achieved around opportunity, decisions made through experimentation, and people motivated by achievement.

As Julian talked about this, I found myself thinking about the difference between the start-ups I’ve worked with and the large, global organizations.

I wondered – could making the right kind of information more freely available, and helping people within those organizations achieve personal mastery, give an organization the ability to move into that “adhocracy”? There are still plenty of places which worry about cost per head, when the value is actually in the relationships between people – the value stream – and not the people as individuals. If we had better measurements of that value, would it help us improve those relationships? Would we, as coaches and consultants, develop more of an adhocracy ourselves, and be able to seize opportunities for change as and when they become available?

I keep hearing people within those large organizations make comments about “start-up mindset” and the ability to react to the market, but without Dan and Chris’s “staff liquidity”, knowledge still becomes the constraint; and without quick information about what’s working and what isn’t, small adjustments to long-term plans, rather than routine experimentation around opportunity, become the norm.

So I’m going off to get myself more tools, so that I can help organizations to get that information, make sense of it, and create that flexibility; not just in their products and services, but in their changes and adoptions and transformations too.

And I’ll be thinking about this new pattern all the time. It feels like it fits into a bunch of other stuff, but I don’t know how yet.

A few years back, I went to visit a company that had managed to achieve a high level of agility without high levels of coaching or training, shipping several times a day. I was curious as to how they had done it. It turned out to be a product of a highly experimental culture, and we spent a whole day swapping my BDD knowledge for their stories of how they managed to reach the place they were in.

While I was there, I saw a very interesting graph that looked a bit like this: a count that climbed steadily, dipped for a while, then started climbing again.

“That’s interesting,” I said. “Is that your bug count over time? What happened?”

“Well,” one of them said, “we realised our bug count was growing, so we hired a new developer. We thought we’d rotate our existing team through a bug-fixing role, and we hypothesized that it would bring the bug count down. And it worked, for a while – that’s the first dip. It worked so well, we thought we’d hire another developer, so that we could rotate another team member, and we thought that would get rid of the bugs… but they started going up again.”

“Ah,” I said wisely. “The developer was no good?” (Human beings like to spot patterns and think they understand root causes – and I’m human too.)

“Nope.” They were all smiling, waiting for me to guess.

“Two new people was just too many? They got complacent because someone was fixing the bugs? The existing team was fed up of the bug-fixing role?” I ran through all the causes I could think of.

“Nope.”

“All right. Who was writing the bugs?” I asked.

“Nobody.”

I was confused.

“The bugs were already there,” one of them explained. “The users had spotted that we were fixing them, and started reporting them. The bug count going up… that was a good thing.”

And I looked at the graph, and suddenly understood. I didn’t know Cynefin back then, and I didn’t understand complexity, but I did understand perverse incentives, and here was a positive version. In retrospect, the cause was obvious. It’s the same reason reported crime goes up when police patrol the streets: it’s easier to report it.

Conversely, a good way to have a low bug count is to make bugs hard to report. I spent a good few years working in Waterfall environments, and I can remember the arguments I had about whether something in my work was genuinely a bug or not… making it much harder for anyone testing my code, which meant I looked like a good developer (I really wasn’t).

Whenever we do anything in a complex system, we get unexpected side-effects. Another example of this is the Hawthorne effect, which goes something like this:

“Do you work better in this factory if we turn the lights up?”

“Yes.”

“Do you work better if we turn the lights down?” (Just checking our null hypothesis…)

“Yes.”

“What? Um, that’s confusing… do you work better with the lights up, or down?”

“We don’t care; just stop watching us.”

We’ve all come across examples of perverse incentives, which are another kind of unintended consequence. This is what happens when you turn measurements into targets.

When you’re creating a probe, it’s true that it’s important to have a way of knowing whether it’s succeeding or failing… but the signs of success or failure may only be clear in retrospect. A lot of people who create experiments get hung up on one hypothesis, and as a result they obsess over one perceived cause, or one measurement. In the process they might miss signs that the experiment is succeeding or failing, or even misinterpret one as the other.

Rather than having a hypothesis, in complexity, we want coherence – a realistic reason for thinking that the probe might have a good impact, with the understanding that we might not necessarily get the particular outcome we’re thinking of. This is why I get people creating probes to run through multiple scenarios of success or failure, so they think about what things they might want to be watching, or how they can put appropriate signals in place, to which they can apply some common sense in retrospect. (For a pair-programming probe, say – my example, invented for illustration – a sign of success might be knowledge visibly spreading across the team; a sign of failure might be cycle times creeping up; and the signals might be nothing more than the board data you already collect.)

As we’ve seen, watching is itself an intervention… so you probably want to make sure it’s safe-to-fail.

Back in Greek mythology, there was a dog called Cerberus. It guarded the gate to the underworld, and it had three heads.

There was a great guy called Heracles (Hercules in Latin) who was a demi-god, which means he would have been pretty awesome if the gods hadn’t intervened to make him go mad and kill his wife and family. Greek gods in general aren’t very nice, and you wouldn’t want them to visit at Christmas.

Heracles ended up atoning for this with twelve tasks, the last of which was to capture Cerberus himself. (It was meant to be ten tasks, but his managers decided that he had collaborated with someone from another team on one of them, and got paid by an external stakeholder for a second, so they didn’t count.)

Fortunately, it turns out that BDD’s a bit easier to tame than Cerberus, and it works best if you involve other people from the outset.

The first thing we do with BDD is have a bit of a conversation. If I know nothing about a project, I’ll ask someone to tell me about it, and whenever I think it’s useful, I ask the magic question: “Can you give me an example?”

If they aren’t specific enough – for instance, they say that there are these twelve labours they have to do – I’ll get them to be more specific. “Can you give me an example of a labour?” Labour to me means Jeremy Corbyn, not wrestling the Nemean Lion, so having these conversations helps me to find out about any misunderstandings early.

Eventually, we end up with something we can think about in concrete terms. There’s a template that we use in BDD – Given a context, when an event happens, then an outcome should occur – but even without the template, having a concrete example which starts from some context, has an event happen and ends with a desired outcome (or the best possible outcome that we can reasonably expect to happen, at least) is useful. It lets us talk about those scenarios, and ask questions about other scenarios that might exist.
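
Sticking with the myth, a first scenario might look something like this (my own invention, of course):

Given Cerberus is guarding the gate to the underworld
When Heracles wrestles him into submission
Then Heracles should be able to carry him up to Eurystheus.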

Exploration by Example (what it could do)

I like to ask two questions, the patterns for which I call Context Questioning, and Outcome Questioning. “Is there any other context, which, for the same event, gives us a different outcome?” And, “Is there any other outcome that’s important?”

The first is easy. We can usually think of ways that our outcome might be thwarted, and if we can’t, then our testers can, because testers have that “break all the things!” mindset that causes them to think of scenarios that might go a different way to the way we expect.

The second is especially useful, though, because it lets us talk about what other stakeholders might need to be involved, and what they need. This is particularly important if there’s a transaction happening – both stakeholders should get what they want from it (or all three, if you’re buying a product or service via a third party like Uber). Without that second question, you might end up delivering one stakeholder’s desired outcome but missing the other’s.
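
To make that concrete – an invented example, not anything Uber have actually specified – Outcome Questioning is what surfaces the second stakeholder’s “And”:

Given a rider and an available driver nearby
When the rider books a trip
Then the rider should see a fare and a pickup time
And the driver should see the pickup location and their share of the fare.

Context Questioning then gives us the unhappy sibling: given no drivers are available nearby, the same booking event should instead tell the rider to try again later.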

“Why did they leave that wooden horse out there anyway?”

“We should ask a tester. Do we have any left that haven’t been eaten by sea-serpents?”

With these two questioning patterns, we can start to explore the scope of our requirements. We now know what the system could do. What should it do? By deciding which scenarios are out of scope, we narrow down the requirements. If we don’t know enough to decide what our scenarios ought to look like, we can either do something to try it out in a way that’s safe-to-fail and get some understanding (useful if nobody in the organization has ever done it before) or we can find an expert to talk to (if they’re available).

If you find you never talk about scenarios which end up being declared out-of-scope or irrelevant – whether quickly or later on – you may not be exploring enough.

Specification by Example (what it should do)

Now we’ve narrowed it down to what it should do, and we’ve got some scenarios to illustrate those aspects of behaviour, we can start turning the ideas into reality. The scenarios give us a focus, and let us continue to ask questions. If we ever come across an area we’re not familiar with, we can fall back into exploration, until we have understanding and can specify once more what our system should do.

“So, if we frown really fiercely and heavily at Charon, he should let us past to get to Cerberus. What does a really fierce frown look like? Can you give me an example?”

“Well, imagine someone just used the words ‘target velocity’ in your hearing…”

“Oh, like this?”

And, once we’ve made our ideas into something real, we can use our scenarios to see whether it does what we thought it should do – a test.

Test by Example (what it does)

We can run through the scenario again, either manually or using automation, to see if it works.

And, ideally, we write the automated version of the test first, so that it gives us a very clear focus and some fast feedback; though if we’re in high uncertainty and keep having to fall back to exploration, we might want to lay off that for a bit.
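
As a sketch of what that automation might look like – using Python’s behave library, with step wording and domain stubs invented for this post rather than taken from any real project – the frowning-at-Charon scenario might wire up like this:

from dataclasses import dataclass
from behave import given, when, then

@dataclass
class Response:
    lets_past: bool

class Charon:
    def confront(self, frown_fierceness: int) -> Response:
        # Stubbed domain logic: a fierce enough frown gets you past.
        return Response(lets_past=frown_fierceness >= 10)

@given("Charon is guarding the crossing")
def step_charon_guards(context):
    context.charon = Charon()

@when("Heracles frowns really fiercely")
def step_heracles_frowns(context):
    context.response = context.charon.confront(frown_fierceness=11)

@then("Charon should let him past")
def step_charon_lets_him_past(context):
    assert context.response.lets_past

Paired with a matching .feature file, behave runs the scenario end-to-end, and fails fast if the outcome ever changes.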

“So, what did happen?”

“Um, I chopped off a head, and it grew back. Twice…”

“Oh, sugar. Hold on… I think Jason ran into the same behaviour last week. I can’t remember whether it was intended or not…”

And that’s it.

Some people really focus on the tools and testing. Others focus on specification. Really, though, it’s exploration that we start with; those thought-experiments that are cheap to change, and not nearly as dangerous as the real thing.

In our software, we’re not even dealing with three-headed dogs or mythical monsters; just people who want things. They aren’t even intent on making it hard for us. Even better, those people often find value in small things, and don’t need us to finish all twelve tasks to have a completed story. It’s pretty easy to explore what they want using examples, then use the results of that conversation as specifications, then as tests.

Even so, if you do manage to tame BDD, with all three of its heads, you’re still pretty awesome.

Nowadays, thanks to some help from Marian Willeke and her incredible understanding of how adults learn, I get to teach capabilities instead. It’s much more fun. This is how I do it.

First off, because I’m into BDD and hypnosis, I sit and imagine some scenarios in which people actually use the learning I’ve given them. Maybe they’re the Product Owner of a Scrum Team, or using BDD for the first time, or they have a good understanding of Agile, and now they’re learning how to coach. I watch them in my head and look at what they do, or I think about what I’ve done, in similar situations.

As with all scenarios, the event that’s happening requires capabilities: the ability to do something, and do it well.

So, for instance, I imagine a team sitting together in a huddle, talking through BDD’s scenarios. Well, you’ll need to be able to use the different strengths of the different roles. And you’ll need to be able to construct well-formed scenarios, and to differentiate between acceptance criteria and a specific example.
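
That differentiation trips people up, so here’s a made-up illustration: an acceptance criterion is a general rule, while a scenario is one concrete instance of it.

Acceptance criterion: an account holder can never withdraw more than their balance.

Given Fred has £50 in his account
When Fred tries to withdraw £60
Then he should be told he has insufficient funds
And his balance should still be £50.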

If I get stuck thinking about what capabilities I need to teach, I go look at Bloom’s Taxonomy, and the Revised Cognitive Domain – I really like Don Clark’s site. Marian gives some advice: when you’re teaching adults, aim higher than merely remembering; give them something they can actually do with it. The keywords help me to think about the level of expertise that the learners will need to get to (though I don’t always stick to them).

So for instance, I end up with capabilities like these:

Explain BDD and its practices

Apply shortcuts to well-understood requirements to reduce analysis and planning time

Identify core and incidental stakeholders for a project

If I’m training, I use these in conjunction with a bit of teaching, then games or exercises that help attendees really experience the things they’re able to do for the first time, and give me a chance to help them if I see they need it. The learning outcomes make a great advert for the course, too! And I use them as a backlog while I’m running the course, so I always know what’s done and what’s next.

More recently, I’ve been using this technique to put together documents which serve the same purpose for people I can’t train directly. I put the learning outcomes at the start: “When you’ve read this, you will be able to…” It’s fun to relate the titles of each section back to the outcomes at the beginning! And, of course, each capability is an embedded command to someone to actually try out the new skill.

Best of all, each capability comes with its own test. As the person writing the course or document, I can think to myself, “If my student goes on this course or reads this document, will they be able to do this thing?”

And, if they do actually take the course, I can ask them directly: “Do you feel confident now about doing this thing?” It gives me a chance to go over material if I have time, or to offer follow-up support afterwards (which I generally offer with all my courses, anyway).

You can read more about Bloom’s Taxonomy, and see the backlog for one of my BDD courses, on Marian’s site.

Now you should be able to create courses using capabilities, instead of topics. Hopefully you really want to, as well… but the Affective Domain, and what you can do with it, is a topic for another post.

We probe, then sense, then respond.

If you’re familiar with Cynefin, you know that we categorize the obvious, analyze the complicated, probe the complex and act in chaos.

You might also know that those approaches to the different domains come with a direction to sense and respond, as well. In the ordered domains – the obvious and complicated, in which cause and effect are correlated – we sense first, then we categorize or analyze, and then we respond.

In the complex and chaotic domains, we either probe or act first, then sense, then respond.

Most people find action in chaos to be intuitive. It’s a transient domain, after all; it resolves itself quickly, and it might not resolve itself in your favour… and is even less likely to do so if you don’t act (the shallow dive into chaos notwithstanding). We don’t sit around asking, “Hm, I wonder what’s causing this fire?” We focus on putting the fire out first, and that makes sense.

But why do we do this in the complex domain? Why isn’t it useful to make sense of what we’re seeing first, before we design our experiments?

As with many questions involving human cognition, the answer is: cognitive bias.

We see patterns which don’t exist.

The term “epiphany” can be loosely defined as that moment when you say, “Oh! I get it!” because you’ve got a sudden sense of understanding something.

The term “apophany” was originally coined as a German word for the same phenomenon in schizophrenic experiences; that moment when a sufferer says, “Oh! I get it!” when they really don’t. But it’s not just schizophrenics who suffer from this. We all have this tendency to some degree. Pareidolia, the tendency to see faces in objects, is probably the best-known type of apophenia, but we see patterns everywhere.

It’s an important part of our survival. If we learn that the berry from that tree with those type of leaves isn’t good for us, or to be careful of that rock because there are often snakes sunning themselves there, or to watch out for the slippery moss, or that the deer come down here to drink and you can catch them more easily, then you have a greater chance of survival. We’re always, always looking out for patterns. In fact, when we find them, it’s so enjoyable that this pattern-learning, and application of patterns in new contexts, forms the heart of video games and is one reason why they’re horribly addictive.

In fact, our brains reward us for almost seeing the pattern, which encourages us to keep trying… and that’s why gambling is also addictive, because a lot of the time, we almost win.

In the complex domain, cause and effect can only be understood in retrospect.

This is pretty much the definition of a complex domain; one in which we can’t understand cause and effect until after we’ve caused the effect. Additionally, if you do the same thing again and again in a complex domain, it will not always have the same effect each time, so we can’t be sure of which cause might give us the effect. Even the act of trying to make sense of the domain can itself have unexpected consequences!

The problem is, we keep thinking we understand the problem. We can see the root causes. “Oh! I get it!”… and off we blithely go to “fix” our systems.

Then we’re surprised when, for instance, complexity reasserts itself and making our entire organization adopt Scrum doesn’t actually enable us to deliver software like we thought it would (though it might cause chaos, which can give us other opportunities… if we survive it).

This is the danger of sensing the problem in the complex domain; our tendency to assume we can see the causes that we need to shift to get the desired effects. And we really can’t.

The best probes are hypothesis-free.

Or rather, the hypothesis is always, “I think this might have a good impact.” Having a realistic reason for thinking this is called coherence. It’s really hard, though, to avoid tacking on, “…because this will be the outcome.” In the complex domain, you don’t know what the outcome is going to be. It might not be a good outcome. That’s why we spend so much time making sure our probes are safe-to-fail.

In particular, if you can’t avoid having a hypothesis around outcomes (and you really can’t), one trick you can try is to have multiple outcomes. These can be conflicting, to help you check that you’re not hung up on any one outcome, or even failure outcomes that you can use to make sure your probe really is safe-to-fail.

Having multiple hypotheses means we’re more likely to find other things that we might need to measure, or other things that we need to make safe.

I really love SenseMaker.

Cognitive Edge, founded by Dave Snowden of Cynefin fame, has a really lovely bit of software called SenseMaker that collects narrative fragments – small stories – and allows the people who write those stories to say something about their desirability using Triads and Dyads and Stones.

Because we don’t know whether a story is desirable or not, the Triads and Dyads that SenseMaker uses are designed to allow for ambiguity. They usually consist of either two or three things that are all good, all bad or all neutral.

For instance, if I want to collect stories about pair programming, I might use a Dyad which has “I want to pair-program on absolutely everything!” at one end, and “I don’t want to pair-program on anything, ever,” at the other. Both of those are so extreme that it’s unlikely anyone wants to be right at either end, but they might be close. Or somewhere in the middle.

In CultureScan, Cognitive Edge use the triad, “Attitudes were about: Control, Vulnerability, or Indifference.” You can see more examples of triads, together with how they work, in the demo.

If lots and lots of people add stories, then we start seeing clusters of patterns, and we can start to think of places where experiments might be possible.

A fitness landscape from Cognitive Edge

In the fitness landscapes revealed by the stories, tightly-bound clusters indicate that the whole system is pretty rigidly set up to provide the stories being seen. We can only move them if there’s something to move them to; for instance, an adjacent cluster. Shifting these will require big changes to the system, which means a higher appetite for risk and failure, for which you need a real sense of urgency.

If you start seeing saddle-points, however, or looser clusters… well, that means there’s support there for something different, and we can make smaller changes that begin to shift the stories.

By looking to see what kind of things the stories there talk about, we can think of experiments we might like to perform. The stories, though, have to be given to the people who are actually going to run the experiments. Interpreting them or suggesting experiments is heading into analysis territory, which won’t help! Let the people on the ground try things out, and teach them how to design great experiments.

A good probe can be amplified or dampened, watched for success or failure, and is coherent.

Cognitive Edge have a practice called Ritual Dissent, a bit like the “Fly on the Wall” pattern but deliberately negative: the group to whom the experiment is being presented critiques it against the criteria above. I’ve found that testers, with their critical, “What about this scenario?” mindsets, can really help to make sure that probes really are good probes. Make sure the person presenting can take the criticism!

There’s a tendency in human beings, though, to analyze their way out of failure; to think of failure scenarios, then stop those happening. Failure feels bad. It tells us that our patterns were wrong! That we were suffering from apophany, not epiphany.

But we don’t need to be afraid of apophany. Instead of avoiding failure, we can make our probes safe-to-fail; perhaps by doing them at a scale where failure is survivable, or with safety nets that turn commitments into options instead (like having roll-back capability when releasing, for instance), or – my favourite – simply avoiding the trap of signalling intent when we didn’t mean to, and instead, communicating to people who might care that it’s an experiment we want to try.

One problem I hear repeatedly from people is that they can’t find a good place to start talking about scenarios.

An easy trick is to find the person who fought to get the budget for the project (the primary stakeholder) and ask them why they wanted the project. Often they’ll tell you some story about a particular group of people that they want to support, or some new context that’s coming along that the software needs to work in. All you need to do to get your first scenario in that situation is ask, “Can you give me an example?”

When the project or capability is about non-functionals such as security or performance, though, this can be a bit trickier.

I can remember when we were talking to the Guardian editors about performance on the R2 project. “When (other national newspaper) went live with their new front page,” one editor explained, “their site went down for three days under the weight of people coming to look at the page. We don’t want that to happen.”

Or, as another organization said, “We went live, and it crashed. It took three months to get the site up and running again. The code was so awful we couldn’t fix it.”

These kind of negative stories are often drivers, particularly when there are non-functionals involved. You can always handle the requirements through monitoring instead of testing, but the conversation can’t follow the usual, “Can you give me an example?” pattern, because all the examples are things that people don’t want.

Instead, keep that negativity, and ask questions like, “What performance would we need to have to avoid that happening to us? Do we have a good security strategy for avoiding the hacking attempt that ended up with (major corporation)’s passwords getting stolen? How do we make sure we don’t crash when we go live?” Keep the focus on the negatives, because that’s what we want to avoid.

When you come to write the scenarios down, whether it’s in terms of monitoring or a test, it’s often worth keeping that negative around. You can create positive scenarios to look at the monitoring boundaries, but the negative reminds people why they’re doing this.

Given we’ve gone live with the front page
When Tony Blair resigns on the same day
Then the site shouldn’t go down under the weight of people reading that news.
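
A positive companion scenario can then probe the monitoring boundary itself – the numbers here are mine, invented for illustration, not the Guardian’s real thresholds:

Given we’ve gone live with the front page
When 10,000 readers request it within one minute
Then every page should be served in under two seconds.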

Remember that if you’re the first people to solve the problem then you’ll need to try something out, but if it’s just an industry standard practice, make sure you’ve got someone on the team who’s implemented it before.

Part of the power of BDD’s scenarios is that they provide examples as to why the behaviour is valuable. You’ll need to convert this to positive behaviour to implement it, but if avoiding the negatives is valuable, include those too, even if it’s just text on a wiki or blurb at the top of a feature file… and don’t be afraid to start there. Negative scenarios are hugely powerful, especially since they often have the most interesting stories attached to them.