Fixing What Ain't Broke (This Old Pony #78)

February 13, 2019

If it ain’t broke, don’t fix it. - old time-y adage

There’s a lot of wisdom in that cliché. Intervening in a working system [when no urgent or important problem presents itself] often results in, well, unexpected disaster. Sometimes it’s because of plain “user error” in making the change, sometimes it’s because it’s not clear how some ugly component actually contributes - right up until it’s removed.

On the other hand, there’s always room for improvement, and it’s by continually making small improvements on a system we can create significant, compound improvements over time.

So which is it?

Well, it depends! And in a brief break from our scheduled series that’s what we’re going to look at today.

First, let’s define our terms.

it ain’t broke: what we mean is both that (a) the system is working as expected and (b) there are no performance issues significant enough to cause complaints

fix it: making changes to the system, in this case changes which happen not to be motivated by any urgent issue or change in business requirements
Now with our terms defined, when is it appropriate to fix a system that ain’t broke?

Here’s what the decision will depend on:

the business importance of the system as is

expected frequency of development updates

risk in introducing changes (can back out easily and throw something away with no downstream consequences, without affecting any customers?)

nature of the changes

available support to validate safe changes

degree of intended performance improvements, including downstream changes
_ More “it depends”, _ I know, I know! But we’re getting there. I’ll use some examples for now to illustrate.

Let’s say you have a business critical system that no one touches but once every year or two. The answer here is definitely “don’t fix it”. The risk profile is all wrong.

Instead let’s say it’s an internal facing dashboard that loads really slowly, maybe 10s. You know the complicated queries could be improved to load the page in a fraction of the time. This is a likely “no”, unless you’re a summer intern. It’s not that this couldn’t be improved, but at the very least if no one is complaining and it’s a complicated system it should be at the bottom of the queue.

Now, similar idea but it’s a customer facing dashboard and it’s loading in 3 seconds. Again you know the complicated queries could be improved to load the page in a fraction of that time. Here I’d say it’s a likely yes, if you have tests. The impact is likely to be much greater, not just because it’s something paying customers are loading as the part of a product, but it’d be expected to be loaded with much greater frequency. The aggregate impact on goodwill through product perception and resource load on your infrastructure makes this look like a win.

Refactoring small parts of a codebase? In general no objection, provided it’s a section you’re actively working on or there’s an obvious end goal to the future sequence of changes.

A system with no or poor automated testing? It depends, but for a Python based app, Django or otherwise, I’d exercise caution even if all you’re doing is refactoring in the most canonical sense of the word[0]. If all you’re doing is “true” refactoring, ensure you’re using good tools at the least.

Lastly, what about replacing literals and repeated data structures with something more, structured, like a validating class? If it’s code that used frequently in the codebase and there’s decent development velocity, go for it. The immediate impact will not be obvious but the future value of extra data protection against errors and even developer typos is a big win. This is especially true the more developers there are working on a codebase.

This came out of a discussion with a client about adding a uniqueness constraint to a Django model class. The relationship is de facto unique and there is a validation at various levels on the front-end of the application to enforce this. There’s no obvious reason why this validation should fail. However failure to validate this would almost certainly result in customer facing server errors later on data retrieval. Regarding data errors, better to catch them up front adding your data (once) than later fetching it (possibly many times). I argued in favor of adding this constraint - it’ll have no noticeable impact on any current process, but it will introduce an important - and missing - layer of safety.

If this is something you have a question about, generally or specifically, or better yet an opinion, hit reply and let me know. I’d be interested to carry this topic on if you are.

Componentizedly yours,
Ben

[0] It’s probably in the book “Refactoring” by Martin Fowler that refactoring is defined as changing code in such a way that does not change it’s behavior. The ultimate goal of most refactoring is making code that interacts with the world in an unchanged way but is far easier to understand, to test, and to make further changes to