Oct 20, 2012

When an organization embarks on the development of a new software product, it has two questions to answer:

What base technologies should the new product use/depend on?

How will the new product grow over time?

The first question is religious, which makes it easy to answer – an organization is either a “Microsoft shop”, or an “OpenSource shop”, or “Some-other-religion shop”. Within that shop, developers may have to choose between managed and native code if that is not already part of the religion, and that’s it for question #1.

Question #2 is a business question. In all fairness, managers and architects typically spend the necessary time and effort to come up with a solid vision for future growth. Only after upper management is satisfied with that vision will it let development commence.

So far so good – that is the correct process. Um, there must be a caveat, right? Right. The caveat is that the base technologies the organization has religiously chosen have growth plans of their own. That means there is a chance that a base technology will eventually offer the same functionality as this product. That chance is low by default, but it can jump significantly if developers try to be... smart. This is religion, remember? Religion demands faith, not smartness. Every platform’s goal is to make common use cases simple. As long as application developers stick to such simple patterns, their product will benefit from the platform’s enhancements. Often, however, the platform lacks certain features, and developers plug those holes rather than wait for the platform to improve.

A very typical example of such a fix is implementing a data cache to save hard disk hits or network roundtrips. While such a fix works perfectly in the short term, it is a time bomb in the long run, because performance is a fundamental problem, and sooner or later the platform will address it. Furthermore, different technologies develop at different paces. For instance, 10-15 years ago the main problems were slow network communications and slow disk access, so it was tempting to bake a data cache right into the frontend box. Today, however, the main problem is scaling out the frontends to serve the necessary number of hits, which in turn entails keeping the cost of those frontends low. Now a data cache on each frontend box would consume unnecessary memory as well as CPU cycles for cache entry lookups, while a single dedicated cache box per rack or per farm would be cheaper and more effective.

That is how a big win can expire over time. It becomes a cancer – it is extra code that carries both a development/maintenance cost and a production cost. And it can only get worse, because it falls into an area where the platform is obviously making improvements.

The only treatment I can think of is to surgically remove the tumorous code, i.e. to refactor. Unfortunately, refactoring has been over-promoted by the Agile community to the point where saying the R-word in front of management is politically incorrect, which makes the disease really difficult to cure once it has developed.

That leads to the question: Is this disease preventable? Theoretically speaking – yes. Since time bombs are explicitly checked in by developers, if developers don’t do it, there won’t be a problem, right? Well, that’s easier said than done. Whenever a performance benchmark misses its target, a hero will jump in to fix it, for which he/she will be properly rewarded. What makes such a time bomb difficult to remove is that once it has been proclaimed successful, no one will be willing to admit that its value has expired (sometimes all too quickly). In general, we don’t understand that success in the software industry is something very temporary.

There is a way, however, to be a hero now and to remain a hero even when a heroic fix turns bad. What’s even better is that you can be a hero (twice!) without infecting your product. When you do your original heroic fix, start by asking yourself: Is this part of my product’s value, or am I compensating for a platform’s deficiency? If it is the latter, then code it in a way that makes it easily removable: avoid introducing public API at any cost; make the source file easily removable from the build; don’t let other code take dependencies on your fix that you haven’t envisioned; and, most importantly, build in a switch that bypasses the new functionality.
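As a rough illustration of that advice, here is a minimal Python sketch of the data cache example written as a removable workaround. Everything named in it is hypothetical: `ProfileReader` stands in for some product component, `platform_fetch` for whatever slow platform call the cache compensates for, and `use_cache_workaround` for the bypass switch. The cache lives in private state, so no other code can come to depend on it.

```python
from typing import Callable, Dict


class ProfileReader:
    """Hypothetical product component that reads data through the platform."""

    def __init__(self, platform_fetch: Callable[[str], str],
                 use_cache_workaround: bool = True) -> None:
        # The platform call the workaround compensates for (assumed slow).
        self._platform_fetch = platform_fetch
        # The bypass switch: False sends every read straight to the platform.
        self._use_cache_workaround = use_cache_workaround
        # Private state of the workaround; nothing outside this class sees it.
        self._cache: Dict[str, str] = {}

    def read(self, key: str) -> str:
        """The only entry point callers use; it never mentions the cache."""
        if not self._use_cache_workaround:
            return self._platform_fetch(key)
        if key not in self._cache:
            self._cache[key] = self._platform_fetch(key)
        return self._cache[key]
```

Because callers only ever see `read`, deleting the workaround later means deleting the switch, the dictionary, and three lines of `read`, with no change to any caller.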

If you stop here, there will still be a problem when your fix becomes obsolete, but then you’ll be able to either flip the switch or remove the whole thing. Either way you’ll be a hero for a second time on the same problem!

If you further make that switch a configuration setting so that flipping it doesn’t require a product change, a problem with your heroic fix may never occur. The only downside is you’ll miss being a hero for a second time on the same problem.
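One way to do that, sketched in Python under a few assumptions: the switch is read from an environment variable, the name `MYPRODUCT_CACHE_WORKAROUND` is made up for this example, and the workaround stays on unless the setting explicitly turns it off. Any other configuration source (a settings file, a registry key) would serve the same purpose.

```python
import os
from typing import Mapping

# Hypothetical setting name, invented for this illustration.
SWITCH_NAME = "MYPRODUCT_CACHE_WORKAROUND"

# Values that turn the workaround off; anything else leaves it on.
_OFF_VALUES = {"0", "false", "off", "no"}


def cache_workaround_enabled(settings: Mapping[str, str] = os.environ) -> bool:
    """Return True unless the setting explicitly disables the workaround."""
    value = settings.get(SWITCH_NAME, "on")
    return value.strip().lower() not in _OFF_VALUES
```

With this in place, operations can set `MYPRODUCT_CACHE_WORKAROUND=off` on the frontends once the platform (or a dedicated cache box) does the job better, and no new build of the product is needed.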

In conclusion, I continue to promote “timeless” development among developers as well as necessary refactoring among management. Hopefully this article has made my points clear.