Reviewing the community wisdom on technical debt, I am struck by the variety of definitions of the term, which are not merely variations on a theme—they are entirely contradictory concepts. I’m starting to think that we should abandon the term technical debt altogether in favor of more accurate labels.

The debt metaphor is appealing because it offers a way to explain at the executive level that development may be building up software liabilities along with software assets. However, I worry that the metaphor may also be offering some undeserved normalcy to process problems: Businesses, even profitable ones, are used to taking out lines of credit to support operations. Even more insidiously, arguments in favor of good kinds of “technical debt” are easily misused to defend bad kinds, when they really have little in common.

My point here is not merely to offer a taxonomy of technical debt, as others have attempted already (Martin Fowler; Steve McConnell). I don’t think these should be seen as variations on a common theme, but rather as distinct situations with distinct considerations. One definition leads to something that I think is always good; another leads to something that I think is always bad; and yet another leads to something that I think is okay if you’re careful.

The good: releasing early

As originally coined, technical debt referred to being willing to release software early even with an imperfect understanding of the domain or even with an architecture insufficient for the eventual aims of the product (Ward Cunningham; Bob Martin). We know that we’ll have to refactor the domain model as our understanding matures, or else live with gradually slowing development as our domain model grows increasingly out of sync with reality. But we gain a tremendous advantage in the process: We get feedback earlier, which ultimately means we will better understand both the domain and the actual needs of users.

Ward described this situation using a debt metaphor: The benefit we gain by releasing the software early is like taking out a loan that will have to be repaid by refactoring the domain model later, and we will be paying interest in the form of gradually slowing development until that refactoring happens. Ward’s intention was primarily to explain the refactoring he was doing on a project, not so much to justify releasing early.

The debt metaphor was clever, but I don’t think it actually fits. If we don’t release early, does that mean we’re not going into debt? We still have an imperfect understanding of the domain; we still have only a rudimentary architecture. We still have to refactor as our understanding matures in order to avoid slowing development. We have all the negative aspects of debt—the obligation to repay, the interest until we do—with none of the positive aspect—the up-front benefit of early feedback.

Releasing early mitigates our existing liabilities, it doesn’t add to them, and therefore it should not be described as a kind of technical debt. It’s almost always a good idea, but it does take discipline to make it successful: We have to actually listen to the feedback and be willing to change course based on it, and we have to keep the code clean and well-tested at all times so that we can change course as needed.

The bad: making a mess

Most people seem to define technical debt in terms of messy or poorly-tested code. This seems to be pretty universally acknowledged as bad, but it is often regarded as a necessary evil: In order to release on time (nevermind early!), we have to cut corners, even though we know it will cost us later in slowed development and lots of bug-fixing time. In practice, of course, we all know how this turns out: The cut corners lay in wait as traps for future development and the bugs pile up to an unmanageable degree.

If this is debt, it most resembles credit-card debt. It’s easy to take on lots of debt, and the interest is steep and compounds quickly. A credit card can be safe if you actually pay it off in full every month—in technical terms, refactor your code and make sure your tests are solid after every task or (at most) after every story. Otherwise the balance grows quickly and gets out of hand sooner than many of us would like to admit.

But it’s actually worse than that. Most of us have stories of nasty bugs hidden in messy code that took weeks (or longer) to figure out and fix. Debt can pile up, but at least it’s well-behaved; you know what the interest rate is. The problem we’re talking about now is actually a kind of unbounded risk: Any day, a new requirement can come in or a critical bug can be hit that lands in the middle of a poorly-designed and poorly-tested part of the code, potentially requiring a wide-reaching rewrite. Steve Freeman has argued that this is more like selling an unhedged call option: It can be exercised at any moment, putting you at the mercy of the market.

I would further argue that code that is poorly factored or tested should be considered work in process, and subject to the team’s work-in-process limits. A story simply isn’t done until the code is clean and demonstrably correct. I define a defect as anything short of being done: If the code is messy, doesn’t have tests, or simply doesn’t do what the customer actually wanted, those are all examples of defects.

Ideally we would catch such defects before calling the story done in the first place; if so, we don’t take on additional work until that existing work is done. If we declare a story to be done and then discover later that it really wasn’t done, either by someone reporting a bug or by developers finding messy or untested code, we should effectively add the affected story back into our current work-in-process and fix the defect before taking on additional work. Looked at this way, prioritization is meaningless for defects: The story was already prioritized some time ago; the defect is merely a new task on an old story, which is by definition higher priority than any current or future story. (Just be careful to weed out feature requests masquerading as bug reports!)

When faced with time pressure, cutting corners amounts to taking on debt or risk for the company without accountability. Instead, we have many options for explicitly cutting away pieces of work by splitting stories and prioritizing the pieces independently (Bill Wake). Each piece that actually gets prioritized and implemented needs to have clean code and tests to be considered done; the beauty of splitting stories is that you can have piece #1 done and piece #2 not started, instead of having both #1 and #2 started but not done, with a similar amount of effort.

The okay: running an experiment

William Pietri has offered the only interpretation of technical debt that I think actually fits the debt metaphor, having an actual benefit now and an actual promise to pay the debt at a definite point in the future while avoiding unbounded risk. The idea is that it’s okay to have some messy or untested code running as an experiment as long as it is actively tracked and all stakeholders agree that the code will be deleted at the end of the experiment. If the experiment gave promising results, it will generate new ideas for stories to be prioritized and implemented properly.

The benefit of this approach is very rapid feedback during the experiment. It truly counts as a valid debt because the business agrees to it, and agrees to pay it back by allowing the code to be deleted when the experiment is completed. Since the developers know the experimental code is short-lived, it must naturally be isolated from other code so that it is easy to remove later; there is no chance that some unexpected new requirement or reported bug will incur any risk beyond simply ending the experiment and deleting the experimental code early. An active experiment counts as work in process as well, so the team should agree on a limit for the number of active experiments at any one time.

Conclusion

I’m sure there are other definitions of technical debt out there, but these seem to be the big ones. Releasing early was the original definition, and I don’t think it represents debt at all. It’s simply a good idea. Making a mess is the most common definition, and while it’s a little like a debt, it’s even more like the unbounded risk of an unhedged option. It’s always a bad idea. Running experiments is a new but important definition, and actually fits the meaning of debt. It’s okay as long as you track it properly.

Reviewing a draft of this post, William Pietri offered a refinement of the debt metaphor for messy, untested code: “Instead of thinking of it as an unhedged call, one could also talk of loan sharks. If one has substantial gambling debts to a capricious mobster, then one comes to expect erratic demands for extra payments and menacing threats.” I like it: Imagine your programmers making shady backroom deals to get their work done under schedule pressure, and then being hassled by thugs when it comes time to work with that code again.

What do you think? Are there other metaphors that might work better than debt? Do you agree or disagree with my good/bad/okay evaluations?

I’ve been pondering lately what was special about the teams I’ve been on that have felt the most effective to me. It wasn’t just a certain set of practices, though those were helpful. The essence, I think, was a certain openness to ourselves and to each other, a willingness to really look at what we’re doing and how what we’re doing affects the project, ourselves, and each other.

Retrospectives are a specific practice that provides regular opportunities for this kind of openness. But even more, when I’ve really felt productive there’s been a pervading attitude on the team that it’s always okay to take a moment to explore some interaction or event at a deeper, more personal level.

One day, on a team of four, we were programming in two pairs. At some point my partner and I noticed that the other pair had reduced to a solo. We asked, “Where’s your partner?” and he replied, “I think he got upset with me.” My partner replied, “Well, I hope we can talk about that later.” And the really amazing thing is: We did. The partner returned, and described what he had gotten frustrated about, and we were all able to talk constructively about how to deal with that kind of frustration.

Another day, on another team, also while pairing, I noticed my partner withdraw and get quiet. I wasn’t quite sure why, though I was aware that we had been debating some technical point. She started to go on with the task, but I stopped to ask what was up, and she said, “I’m just not getting much ‘Yes, and…’ from you right now.” We had both learned the “Yes, and…” principle from improv classes—which I highly recommend—and it was an easy way to remind ourselves to respect and build upon each others’ ideas rather than only pursuing our own. This quick check-in in the middle of a task short-cut what could have been a long period of frustration, leading instead to a productive, collaborative session.

As I said, this isn’t about specific practices as much as it is about a thorough willingness to talk to each other openly and to respect each others’ thoughts and feelings. Obviously, I think that Agile practices, especially retrospectives and pair programming (when done well), are a good way to foster this kind of openness. In fact, what first drew me into the Extreme Programming community was the openness to learning that it demonstrated.

In my short career up to that point, I felt that I wasn’t really learning much—I was already a pretty good programmer, but the really tricky problems on my projects were less about programming and more about communicating with each other and with the project’s stakeholders and about finding ways to keep everyone sane in the face of growing complexity and business uncertainty. These were the real challenges, but no one was really addressing them—until I found the XP community, who were meeting every month to talk about exactly these issues, and about what they were trying and what they were learning about them.

Now, of course, this kind of concern for how we work together is not the exclusive domain of Agile communities, though some of us often talk that way. The CMM community is likewise dedicated to trying new things and talking candidly about how they’re working, albeit with a more formal approach. The Space Shuttle software project case study described in the CMM book is actually quite an inspiring story of worker empowerment and process experimentation and improvement over time.

Why do many in the Agile community criticize the CMM, lumping it together with the straw man process known as “Waterfall”? I think that Waterfall is an example of formality in a process without authentic openness and introspection. When a manager looks at an undisciplined project and sees it spinning out of control, the most common reaction is to impose some control with some mandated formality. The CMM may be used as a framework for that formality, even though the authors of the CMM themselves strongly warn against using it to impose formality without honest assessment and openness to continuous improvement attached. (They even have nice things to say about XP.)

I’ve been exploring the practice of mindfulness lately, and I think it nicely captures the kind of openness and introspection I’m talking about here. Mindfulness can be practiced through activities such as meditation, improvisation, journaling, and reflective listening, and has documented benefits for compassion, empathy, concentration, and even happiness itself.

I can easily see the appropriate level of formality varying based on the nature of the project, but this quality of mindfulness seems essential to any endeavor. Agile methods tend to be moderate to low on the formality scale, and—when they work—high on the mindfulness scale.

Numbers are tricky, especially when you want to use them to understand business problems. A book I read a few years ago and several blog posts more recently have highlighted this for me, so I got the urge to write about the challenges inherent in understanding with numbers and a couple of helpful tips I’ve picked up from the Agile community.

In The Elegant Solution, Matthew E. May devotes a chapter to talking about the importance of picking the right things to measure for your business (Chapter 11, “Run the Numbers”). The Elegant Solution is one of my two favorite business books (together with The Seven-Day Weekend by Ricardo Semler), filled with Lean principles learned from May’s years teaching for Toyota. But while most of the book is quite practical, its practicality kind of unravels in this chapter.

The problem is, the first several examples he gives show just how darned hard it is to get numbers right. It’s hard enough to decide what you want to measure. But the examples go way further, showing how even math PhDs can get correlations and probabilities horribly wrong. The author doesn’t even seem to be trying to make the point of how hard it is to get the math right. He seems to be trying to merely make the point that it’s important to get the math right. It just turns out that every example is more about getting the math wrong.

The first examples he gives aren’t even from business, starting with the Monty Hall problem. This problem evokes heated debates, with very smart people giving different answers and standing by those different answers insistently. The vast majority of people get it wrong, including people with advanced study in mathematics. I got it wrong at first, and it took several tries to finally accept the right answer. (In my defense, I did figure it out before getting out of college!)

In another example, May arguably gets it wrong himself:

You’re playing a video game that gives you a choice: Fight the alien superwarrior or three human soldiers in a row. The game informs you that your probability of defeating the alien superwarrior is 1 in 7. The probability of defeating a human soldier is 1 in 2. What do you do? Most people would fight the human soldiers. It seems to make intuitive sense. The odds seem to be in your favor. But they’re not. Your probability of winning three battles in a row would be 1/2 X 1/2 X 1/2, or 1/8. You have a better shot at beating the alien superwarrior. It’s a simple problem of probability. [Matthew E. May, The Elegant Solution, page 158]

Here’s the thing: If the chance of beating each human soldier is purely random and independent of beating any other human soldier, then the probability of beating three in a row is indeed 1/8. But if there’s any skill involved in the game, then the chances are not independent. When I’m playing a video game, I typically will grow through levels of skill where certain kinds of opponents become easy to defeat. When I start playing the game, maybe I almost never defeat “human soldiers”, but after some time playing I get to a level of skill where I almost always defeat them. So if I can defeat the first soldier, I will likely defeat all three; if the probability of defeating the first soldier is 1/2, the probably of defeating all three is arguably nearly 1/2 as well, which is clearly better than the 1/7 chance of beating the “alien super-warrior”.

I was reminded of this chapter from The Elegant Solution when I saw a flurry of blog posts on a similar problem recently: See Jeff Atwood’s question, his own answer, Paul Buchheit’s response, and the discussion on Hacker News… The problem is just as simple as the Monty Hall problem, and the response it just as heated. Paul Buchheit points out that the simple English statement of the problem can be parsed two different ways, which result in two completely different answers (both of which I verified myself by Monte Carlo simulation!). In another realm, Semyon Dukach suggests that the current financial crisis is due precisely to the difficulty of numerical intuition.

The examples of success with numbers in The Elegant Solution all come down to identifying very simple metrics underlying the businesses in question. Jim Collins gives similar counsel in Good To Great. But the problems mentioned here show that even given a very simple mathematical statement, you can still get into lots of trouble. In The Goal, Eli Goldratt gives some fascinating advice about how to orient your business toward the true goal of business (I won’t spoil the plot by saying what “the goal” is; it really is worth reading the book). But here again, the whole story line of The Goal shows just how unintuitive those principles are, and how long and painful (though valuable) a process it can be to learn them through real-world experience.

So what do we do? A couple pieces advice I’ve picked up over the years might help:

1. Measure, don’t guess. Specify the problem precisely enough to implement a Monte Carlo simulation, and then run it several times. This is the only way I’ve been able to convince myself of the answers to tricky problems like the Monty Hall problem. The more I think about the problem, the less sure I get about the answer. This principle also applies to optimization of code: Start with the simplest thing that could possibly work, and then measure how it performs under realistic conditions. The thing that needs optimization is often not what I guessed it might be.

2. Measure “up”. Mary and Tom Poppendieck talk about this principle in detail in their Lean Software Development books (the phrase is mentioned on page 40 of Implementing Lean Software Development). It is an antidote to local optimization: The more you focus on localized metrics, the more confused you’ll get. As they write, “The solution is to … raise the measurement one level and decrease the number of measurements. Find a higher-level measurement that will drive the right results for the lower-level metrics and establish a basis for making trade-offs.”

When I’m struggling, I try to see if there’s any possible way to take smaller steps, or to work closer together with others on the project.

Take small steps to get to done early and often

In software development we work at many different scales, from high-level business goals to specific user actions all the way down to lines of code. Agile methods emphasize getting to done early and often at all of these levels:

We take one business goal at a time and make sure it’s met before tackling the next one (exemplified by the “sprint goals” of Scrum and “minimum marketable features” as described in the book Software by Numbers).

We implement one user-level feature in an end-to-end slice (from persistence to user interface) before going on to the next (the “user stories” of Extreme Programming).

We write and test a few lines of code at a time (the “test-driven development” of Extreme Programming).

At every level, the point is to get one thing “really done” before starting on the next thing.

How do we know howsmall our steps should be? Experience and intuition are a starting point. If things are flowing along nicely it’s tempting to take slightly larger steps. But as soon as troubles start cropping up, a good bet is to take smaller steps: Breaking work down into smaller pieces while maintaining, or even strengthening, our definition of “done” helps to get things done cleanly and consistently.

On a recent project, the original plan was over-ambitious, allowing only a few months for what was turning out to be a couple years of work with the available resources. So we got together with the project champion and identified some specific short-term business goals: First, we needed to make sure that the first big client, on the verge of signing the contract, would get what they were promised. Second, we wanted to make a marketing announcement about the general availability of the system.

It turned out that meeting both of these goals required much less than the total vision for the software. In that case, they were both driven more by the content that we were going to be publishing through the system than by the end-user software functionality. We went through all the user stories and put a big red dot on each story card that was essential for the first goal, and a red circle on each story card that was essential for the second goal. Taking our velocity into account, it turned out that the “red dots” fit comfortably into the time available before launching with the client, and it was quite satisfying to have such clarity about what to focus on and what not to focus on for those few iterations.

Work close together to enhance communication

On every software project, we need to communicate with the customer about what to do and what’s been done, and we need to communicate with our teammates about how to do what needs to be done. Agile methods suggest frequent and lightweight communication by finding ways to work closer together, rather than staying far apart and communicating through impersonal means.

Barriers to communication can be both physical and process-related. Physical barriers can be anything from cubical walls and long office hallways to separations of hundreds or thousands of miles. Being separated by thousands of miles leads to both time zone challenges and cultural differences.

Extreme Programming (XP) recommends that the customer and the entire team work together in the same physical location. When this seems impractical for a project, it is often tempting to abandon XP or Agile methods altogether. It is also possible to miss the point, by working nearby but avoiding contact with each other. However, keeping in mind the advice to work closer together, we can be creative: We can do video conferencing, we can pair program remotely using screen-sharing tools, and we can at least visit each other from time to time, especially at the start of the project. Meeting each other face-to-face is an extremely powerful way to cut through cultural differences and interpersonal misunderstandings.

One example of a process-related barrier is individual task assignments, since if I’m on the hook to finish a task myself I will be less inclined to take a break to help you. Agile methods usually recommend whole-team estimating and planning, so that the whole team is interested in working close together to get all the tasks done. If one person is stuck, the whole team feels stuck and is quick to rally around the issue.

Regarding another process-related barrier, Agile methods have a reputation for throwing away documentation. But documentation really isn’t the point.

I was once in a meeting with a product manager talking about the requirements documents he liked to write, where at one point he leaned in close, pointed toward the QA group (not represented in the meeting) and whispered, “Those guys don’t even read the spec!” He clearly thought that was the QA group’s problem, because his spec document was clear and complete — he had put a lot of effort into it.

Agile methods do not recommend, “Throw away the requirements documentation.” Agile methods do recommend, “If the requirements documentation is causing pain, find ways to work closer together with the intended audience.” If the QA group isn’t reading the spec, maybe that’s because it’s not helping them do their job. Maybe it’s just not communicating effectively.