I've held off on writing this particular post for a while, since it's somewhat controversial, but what the heck, you only live once :).

As Fred Brooks pointed out in his seminal "The Mythical Man-Month" (a title that EVERY engineer should have on their shelf), one of the unavoidable aspects of software engineering is the bug curve. As Brooks explained it, the bug count associated with every software project has a rather distinctive curve.

At the start of the project, the number of bugs is quite small, as developers write new code. Since the new code has significant numbers of bugs, the bug count associated with that new code increases in proportion to the number of lines of new code written.

Over the course of time, the number of bugs found in the code increases dramatically as test starts finding the bugs in the new code. Eventually, development finishes writing new code and starts to address the bugs that were introduced with it, so the rate of bug increase starts to diminish. At some point, development catches up with the bug backlog and starts fixing bugs faster than test discovers new ones.

And that's when the drive to ZBB (Zero Bug Bounce) finally begins, indicating that the project is finally on track to completion.

Over time, various managers have looked at this bug trend and realized that they can stop this trend of developers introducing new bugs, test finding them, and so on by simply mandating that developers can't have any active bugs in the database before writing new code. The theory is that if the developers have to address their bugs early, there won't be a bug backlog to manage, and thus there will be a significant reduction in the time to ZBB. This means that the overall development time for the project will be reduced, which means that the cost of development is lower, the time-to-market is sooner, and everyone is happy.

On paper, this looks REALLY good. Developers can't start working on new features until they have addressed all the bugs in the previous features, which means that they won't have to worry about a huge backlog of bugs when they start on the new feature. Forcing development to deal with their bugs earlier also means that they'll have a stronger incentive to not introduce as many bugs (since outstanding bugs keep the developers from the "fun stuff" - writing new code).

The problem is that Zero Defects as often promoted doesn't work in practice, especially on large projects. Fundamentally, the problem is that forcing developers to keep their bug slate clean means that developers can be perpetually prevented from writing new code.

This is especially true if you're dealing with a component with a significant history - some components have intractable bugs that would require significant rewrites of the code to resolve, but the rewrite that is necessary to fix the bug would potentially destabilize hundreds (or thousands) of applications. This means that the fix may be worse than the actual bug. Now those bugs are very real bugs, so they shouldn't be ignored (and thus the bugs shouldn't be resolved "won't fix"). On the other hand, it's not clear that these bugs should stop new development - after all, some of them may have been in the component for two or three releases already.

Now this isn't to say that they shouldn't be fixed eventually. They absolutely should be, but there are always trade-offs that have to be made. Chris Pratley (who needs to blog more :)), over on the Office team, has a wonderful blog post about some of the reasons that Microsoft decides not to take bug fixes. Before anyone criticizes my logic in the previous paragraphs ("But of course you need to fix all the bugs in the product, stupid!"), they should read his post.

But the thing is that these bugs prevent new development from proceeding. They're real bugs, but they're not likely to be fixed soon, and it may take months to determine the correct fix (which often turns them into new work items). The other problem that shows up in older code-bases is bug churn. For some very old code-bases, especially the ones written in the 1980s, there is a constant nonzero incoming bug rate. Bugs show up at a rate of one or two a month, which is enough to keep a developer from ever starting work on new features.

In practice, teams that attempt to use ZD as a consistent methodology to reduce development time on large scale projects have invariably found that it doesn't reduce the overall development time.

If, on the other hand, you apply some rational modifications of ZD, you can use the ZD concepts to maintain consistent code quality throughout your project. For instance, instead of mandating an absolute zero defects across developers, set criteria about the bugs that must be fixed. For example, bugs that are older than 2 years may be lower priority than new bugs, security bugs (or potential security bugs) are higher priority than other bugs, etc.

Also, instead of setting an absolute "zero" defects policy, set a limit on the number of bugs that each developer can have outstanding - this adds some flexibility in dealing with the legacy bugs that really must be fixed, but shouldn't preclude new development. Also, as this Gamasutra article indicates, it's often useful to have a "ZD push" where the entire team drives to get their bug backlog down.
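To make the relaxed policy concrete, here is a minimal sketch of what such a gate could look like. Everything in it is an assumption for illustration: the `Bug` record, the cap of 5, and the choice to let bugs older than two years sit outside the cap are hypothetical, not rules any team actually published.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Bug:
    opened: date
    is_security: bool = False


# Hypothetical thresholds; pick whatever fits your team.
MAX_OPEN_BUGS = 5
LEGACY_AGE = timedelta(days=2 * 365)


def may_start_new_work(bugs, today):
    """Return True if a developer with these open bugs may write new code."""
    # Security bugs (or potential security bugs) always block new work.
    if any(b.is_security for b in bugs):
        return False
    # Assumption: bugs older than the legacy cutoff are lower priority,
    # so they do not count against the per-developer cap.
    recent = [b for b in bugs if today - b.opened <= LEGACY_AGE]
    return len(recent) <= MAX_OPEN_BUGS
```

With these numbers, a developer carrying ten four-year-old bugs can still start a feature, while one with six recent bugs or a single security bug cannot.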

In general, I personally think that ZD has a lot in common with a large part of the XP methodology - it works fine in small test groups, but doesn't scale to large projects. On the other hand, I think that at least one aspect of XP - TDD - has the potential to completely revolutionize the way that mainstream software engineering is done. And I'll write more about that later.

The other way to manage this Zero Defect idea is to do continuous bug triage during development, but at a lower intensity than at milestones. The main stakeholders (producer + lead engineer in Sierra parlance; Lead SDE, Lead SDET/STE and possibly Lead PM in Microsoft parlance) get together every week at a minimum and go through the bug list. Bugs are then thrown back to all developers on a priority basis.

Combine this with a simple statement - that is, you tell your developers to "make sure you check in high quality code, and debug it now rather than later", and you've got a pretty workable system which has worked exceptionally well for my projects to date.

You've still got a double-encoding bug at the end there. Somewhere along the line, it's being encoded to UTF-8 but interpreted as Windows-1252, then the misinterpreted data is reencoded as UTF-8.
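For anyone who wants to see the mechanism, the following is a small sketch that reproduces this kind of double encoding in Python and then reverses it. The sample string is made up; `cp1252` is Python's name for the Windows-1252 codec.

```python
# "it's" written with a typographic apostrophe (U+2019).
original = "it\u2019s"

# Step 1: the text is correctly encoded as UTF-8.
utf8_bytes = original.encode("utf-8")          # b"it\xe2\x80\x99s"

# Step 2: something reads those bytes as if they were Windows-1252.
misread = utf8_bytes.decode("cp1252")          # three junk characters replace the apostrophe

# Step 3: the misread text is encoded as UTF-8 again.
double_encoded = misread.encode("utf-8")

# Reversing the two wrong steps recovers the original text.
repaired = double_encoded.decode("utf-8").encode("cp1252").decode("utf-8")
```

The repair only works when every byte involved is defined in Windows-1252; a few byte values (0x81, for example) have no mapping in Python's codec, and the round trip raises an error for those.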

MSDN have an ongoing problem with this which I think I've reported at least three times. Character encodings are a real pain, simply because they're often overlooked when a protocol is being designed - I don't think the Blogger API has any way to specify the encoding of a document.

I've got into the habit of replacing all characters not in the U+0000 to U+007F range with their HTML/XML entities, e.g. &#8217; for the right-single-quote character.
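As a rough illustration of that habit, here is one way to do it in Python using the standard `xmlcharrefreplace` error handler. The function name is made up for this example. Note that the handler emits decimal references to the Unicode code point, so U+2019 comes out as &#8217;.

```python
def to_ascii_with_entities(text):
    """Replace every character outside U+0000..U+007F with a numeric character reference."""
    # Encoding to ASCII fails for non-ASCII characters; the error handler
    # substitutes a decimal reference (&#NNNN;) instead of raising.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


escaped = to_ascii_with_entities("it\u2019s")
```

The result is pure ASCII, so it survives any later confusion between UTF-8 and Windows-1252.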

I'm currently experiencing a little of the pain involved in the test-after-integration mindset - a number of the fixes for the bugs in our three-man, four-month project have a fairly large impact. This approach is just about OK for one-man, one-month projects which is more of the norm for us - typically there isn't enough to test that the tester's been able to test any features before the developer's finished all of them. Lack of resource is another issue - no dedicated testers, so testing is done by either one of the other developers (not working on the project) or by our consultant, who's often out trying to get new business. The trials of working for a small ISV!

I'm beginning to think it's Word that's screwing up and not NewsGator/.Text, but I'm not sure what's going on. I've just changed the preferred encoding for Word from Windows-1252 to UTF-8 to see if it helps in the future.

It seems to me that encoding problems are becoming more prevalent; I've lost count of the number of sites I've been to in recent months that have spurious punctuation...

On the topic of zero defects, although I've never had to work that way, I don't have to try it to be fairly certain it's unlikely to help at all. Over the years I've inherited a number of things that have had known minor bugs, many of which were incurable without rewriting huge chunks of the code, and telling a customer that they'll have to wait for a new feature because of some obscure and insignificant hiccup isn't going to wash. (Is this what's happened to Longhorn, by any chance? :)

I'm reminded of this song:

99 known bugs in the code,
99 bugs in the code,
Take one out,
Test it again,
99 known bugs in the code...

On the other hand, the number of sites with Ö written by clueless Mac developers has been steadily decreasing over the years, as PCs become the editing platform of choice for most. Hopefully the recent rise in scary multi-character punctuation will sort itself out sooner than that did.

Maybe the problem with this idea is that every defect is called a bug. Call minor or long-term things issues, mid-level problems bugs, and security or functional problems criticals. Or just accept that Zero Defect is another way of saying provably perfect - it isn't happening without a massive investment in time and money, and every change will disrupt it anyway.

Jeff Parker

7 Oct 2004 7:19 AM

You know Larry, I think the simple point of your article, and correct me if I am wrong, is this.

It doesn't matter what methodology you use as long as you use one.

I guess that was my take on it, which I agree with. I have tried to do XP programming but generally fall back to a more Agile approach; it suits what I do much better. However, I do mix them up: some of the practices or features of XP programming I think are great, practical and useful, other features I do not. I think it depends more on the actual project itself, and part of planning the project is determining which approach to use.

Tim Smith

7 Oct 2004 7:20 AM

In more simplistic terms, I would equate a ZD policy with forcing a training situation in a multithreaded environment.

"Um boss, it will take 3 days to get the report back on the new build to help diagnose the problem. I'm going to be playing games until it comes back."

Jeff,
Actually that's not my point. My point was that there are a lot of software methodologies out there that seem really good on paper but don't scale to the size of real world software engineering projects. XP and ZD are two examples of this.

On the other hand, if you cherrypick the methodologies, you can have some positive results. This doesn't work with some methodologies (from what I've read, XP is a methodology that requires that all the tenets of the methodology be applied before it works).

The bottom line is that the @i(implementation) of the methodology is what's important - and the methodology has to be flexible enough to allow for modifications.

Ahh ok, gotcha, but now why did you think this post would be controversial? I just see the good common sense advice you normally give.

I think you should finally rename your blog here. Larry Osterman's WebLog is kind of boring for what you do. You know so many guys like Raymond and so on have a catchy name for their blogs like The Old New Thing.

Jeff,
The reason it's controversial is that there are a number of teams actively using strict ZD, and it continues to be a very popular methodology.

I've not yet come up with a better name, although I did add a subtitle this morning (from an email discussion on MSDN categorization - they suggested that, for Raymond and me, MSDN add "Old Fogey <stuff>" as a category). But I'm at a loss for a better name, to be honest.

The Wife

7 Oct 2004 11:54 AM

This is too good to resist: Larry needs help with naming his blog? And yes, this is completely off topic.

The stereotypical "Old Fart" and "Old Fogey" are boring, even if potentially true.

I like the "Larry's Logic" idea, but would be concerned that it could be derived into "Larry's Lousy Logic" for those of us alliteratively trained. Hmm, now that I'm on a roll, I could see it degrading into "Lousy Larry's Limited Logic" or upgrading to "Larry's Logical Life, Ltd" or "Larry's Lifetime Logic Library". Hmm, maybe "Larry's Logic Laboratory"? "Larry's Lagniappe" to honor the gifts each post brings?

Hmmm, now what would be a good memorable name for Larry's Blog? How about "Kitchen Table" since that's where these usually end up being discussed? Of course, with your wealth of knowledge, "Including the Kitchen Sink" or "Larry's Kitchen Sink" works but isn't very catchy. Wait, this is the group that thinks "Old New Thing" is catchy (worth reading but the title doesn't do much for me).

Combining these two ideas, there's "Larry's Leftovers" or "Larry's Lunchtime Logic" or even "Larry's Labor of Love"? Maybe "Gift of the Larri" but that's probably too subtle.

This is harder than I thought it would be. I am sure with the collective power of the blog readers we can come up with a good name. Question is, will you promise to use the name we vote for?

Yes, work is boring today, and I don't want to do my homework, so I thought I'd just stir up trouble.