Monday, February 09, 2009

Design Flaws, Hernias, and Anemic Quality

Runtime defects are relatively easy to find. They make undesirable things happen that you can see with your own eyes. Some defects are certainly harder to observe than others. They might show up only in certain uncommon circumstances, or you might need to do some search and discovery with some special tools and instrumentation.

Relative to finding design flaws, the effort required to observe runtime defects is often a walk in the park. And while we put a lot of effort into finding and resolving runtime defects, design flaws can drain more money and value from businesses than the runtime defects that consume the better part of quality assurance efforts and attention. This is not to downplay the risks that runtime defects pose to business, but a business that isn't also mitigating the considerable risks posed by design flaws is ensuring that the value of its software assets will erode exponentially, signaled by sharp, unexpected drops in productivity.

Design flaws go unseen because they manifest specifically as productivity problems, and we tend not to track productivity in software development work in meaningful ways. We also rarely reconcile the suboptimal productivity of businesses in general with their business software.

Runtime defects corrupt data, crash applications, cause outages, enable security breaches, process business transactions incorrectly, and lead business people to the wrong conclusions and decisions. Defects are “in your face”. So are design flaws, but if you don't know to be vigilant toward them, you likely won't pay them the decisive attention that they deserve.

Imagine a design flaw as a herniated disk in your spine. The hernia pushes out on the surrounding muscles, bones, and nerves in a way that is not natural to them. This kind of misalignment not only causes problems with its own structure, but also manifests as structural problems in adjacent systems.

For example, a back problem can cause changes in posture and movement to compensate for weakness and pain, changing the way that adjacent and supporting structures and systems align and operate. A spinal problem might manifest as a knee problem. The secondary problem might not only be indicative of the original problem, it might also deteriorate to the point of being as bad as or worse than the original problem.

The insidious thing about secondary effects is that solving the original problem may not resolve the secondary problem. The secondary problem might be an indirect result of the original problem, and may continue to deteriorate even once the original problem is resolved. This happens when new habits form while learning to compensate for the original flaw.

Imagine a graphical representation of the goodness of a good software design as a series of adjacent alignments. They don't compromise each other's structure or operation.

A design flaw in this system would appear as a hernia - a bulging misalignment in the midst of the surrounding order.

Design flaws often go unnoticed. As work progresses and a system grows, new features and code that are physically or functionally adjacent to a flaw are inevitably shaped to it.

Often this isn’t done purposefully. The shape of the code that we build upon determines the shape of the code that we’re building. Adjacent code will export either order or disorder, depending on whether and how much it is flawed. If it exports disorder, then it spreads productivity problems as well.

The most obvious flaws are known to us by their "workarounds". But flaws with workarounds are just those few very obvious flaws. Most flaws don't manifest to our awareness right away, if ever. We might have a spark of insight one day that illuminates the flaw, or we might have an opportunity to work with someone who wields some non-trivial design flaw mitigation skills who recognizes and points out the disorder.

When we finally become aware of a design flaw, and if we have the resources to fix it, we have to be careful of what happens to the design when we allow ourselves to become overly fixated on fixing just that part of the problem that we’re looking at.

The goal is to correct the design and to restore productivity, but often we leave behind the secondary effects, either because we haven't become aware of them yet or because of an excessively narrow focus on the original problem.

This is how we end up with those areas of our systems that aren't right and are difficult to work with, but when you ask someone why the design is the way it is, they respond with something like, "Dunno. It's always been that way. We don't touch that code. There must be a reason it's like that or else it wouldn't be like that. So we don't change it. It's just one of those things. It's the way software is. You know?"

I know that software often ends up like this, and I also know that it doesn't have to end up like this, and I know how to avoid turning software into these messes, and I know that the ways and means can be taught and learned. Perhaps most importantly, I know why we shouldn't let software degrade into these unknowable messes.

You Can't Afford It

Design flaws are expensive. They cause the same value loss to ripple through our systems that runtime defects do. However, with design flaws it can take you a lot longer to realize that the flaws are there and to locate the wound in the system that value and productivity continue to leak out through.

The longer that design flaws remain in your system, the more secondary flaws are introduced. Those secondary flaws are also the primary causes of their own secondary flaws. This inability to control the density of flaws in a system only gets worse, accelerating exponentially. It’s one of the principal causes of the need to completely rewrite systems that may be only two or three years old.
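To get a rough feel for how that compounding plays out, here is a toy sketch in Python. It is only an illustration under made-up assumptions: the function name flaw_count, the idea of a fixed spawn_rate, and the sample numbers are all hypothetical and are not measurements from any real system. It assumes that every flaw already in the system, primary or secondary, gives rise to the same fraction of new flaws in each period.

```python
def flaw_count(initial_flaws: int, spawn_rate: float, periods: int) -> float:
    """Toy model: number of flaws after `periods` have elapsed.

    Assumption (hypothetical): in each period, every existing flaw
    gives rise to `spawn_rate` new secondary flaws, and those new
    flaws spawn their own secondary flaws in later periods.
    """
    count = float(initial_flaws)
    for _ in range(periods):
        # Secondary flaws are added in proportion to the flaws already present.
        count += count * spawn_rate
    return count


if __name__ == "__main__":
    # Illustrative numbers only: one original flaw, 10% spawn rate per month.
    for months in (0, 6, 12, 24, 36):
        print(months, round(flaw_count(1, 0.10, months), 1))
```

Because each period's new flaws are proportional to the flaws already there, the count grows geometrically rather than in a straight line. The real rate will vary from system to system, but the shape of the curve is the point: the cost of waiting is not constant.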

Having tests for our code only makes it possible to continue to make changes to our systems, but it doesn't say that we can make these changes quickly enough or effectively enough to make these necessary changes feasible and practicable. Code that is awkward to work with has tests that are awkward to work with, doubling the inefficiency of solving the problem.

The simple issue is that you can't afford to have these kinds of problems in your systems. You can't afford design flaws just as you can't afford runtime defects, and frankly, the more design flaws you have, the more likely that you'll have runtime defects, and the more difficult they will be to find and to fix.

We have a rather naive understanding of quality in the software industry and we’re only now starting to understand the physics and economics of software development enough to understand the importance of quality and how to make it happen.

While we continue to have anemic quality efforts that are preoccupied with chasing after only a portion of the issues in quality, we’ll continue to suffer. Our productivity will be far less than it should be and value will erode much faster than it should. But there are some signs that pockets of the software industry are waking up from the software dark ages and putting quality into play in ways that are much more meaningful than the mere bug-hunting that has dominated software’s concept of “quality” for the longest time.

As we learn to apply the lessons from our own industry as well as from other industries, not only do our abilities to achieve quality improve, but our understanding of the purpose of quality deepens. We are showing some signs of the industrial maturity that suggest that we may yet reach up and reclaim the productivity that we continue to squander on our primordial struggles with software production.
