Kyle Sletten

On Code Reuse

04 Nov 2013

One of the first things we learn as developers is that if you're going to write
any amount of code, you're going to need a strategy for code reuse. Everyone is
familiar with the curse of verbose, repetitive code and its effects on
maintainability. We've all run into problems where we didn't know whether every
place X was done had been correctly updated, and that can lead to subtle and
hard-to-squash bugs. Unfortunately, while we've all learned that repetition in
code is a bad thing, I think we've failed to find an algorithm for removing
duplicate code.

I'm beginning to believe that "best practices" are a tool for people who can't
be bothered to do a basic cost/benefit analysis for themselves. I'm not saying
I'm smarter than the people who coin popular adages; I would just rather
understand the 'why' than the 'how' on issues where I'm supposed to follow a
convention.

One of the things that is almost never discussed is the idea that code reuse
may not be our ultimate goal. Everybody touts how their style of programming
reduces redundant code by X%, but is that even a good thing? When we're so
fired up to exterminate duplication from our code base, we need to realize
that, like everything else, duplicate code may have some benefits to go along
with the long list of costs.

Costs of Duplicate Code

Everybody knows the cost of duplicate code, right? More code means more space
for bugs to hide in. Duplicate code may become out-of-sync with its clones.
Everybody has to reinvent the wheel. From a maintainability perspective, it's
very clear that we ought to at least seriously consider removing as much code
duplication from our product as possible.

To be perfectly clear, I'm not advocating copypasta. I completely agree that
these are valid concerns when code gets duplicated, and I think they should be
considered and weighed against other concerns when deciding whether or not to
extract a method/function/macro.

Benefits of Duplicate Code

What could the benefits of duplicate code be? To start with, I'd like to call
on a word that Rich Hickey taught me (well, not me personally; more like
everyone who watches his talks on InfoQ): complecting. If you're too lazy to
look up the definition, the gist is this: complecting is weaving two things
together. Now every time you see an opportunity to re-factor, you have to ask
yourself: should these things be tied together?
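To make that question concrete, here is a minimal sketch (the functions and
the business rules are invented for illustration): two pieces of code that are
textually identical today but represent different concepts.

```python
# Hypothetical example: two discount rules that happen to be identical *today*.
# Prices are in cents, so integer arithmetic keeps the math exact.

def member_discount(price_cents):
    """Loyalty-program discount: currently 10% off."""
    return price_cents * 90 // 100

def bulk_discount(price_cents):
    """Large-order discount: currently also 10% off."""
    return price_cents * 90 // 100

# Deduplicating these into one apply_discount() would complect two business
# rules that merely coincide. The day one rate changes, the "shared" function
# must grow a flag or be split apart again.
```

The duplication here is superficial; the concepts are independent, so merging
them would tie together things that have no reason to change in lockstep.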

Complexity

One problem that I often see is people jamming multiple paths of execution into
a single procedure in order to reuse the existing code for a particular
resource such as the database or what have you. Over time, new parameters are
introduced to allow old calls to the procedure to continue as expected while
adding new features to invocations going forward.
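As a sketch of how this tends to play out (all names and flags here are
invented, not taken from any real code base), the procedure ends up looking
something like this:

```python
# Hypothetical example: a once-simple save routine that has accumulated flag
# parameters so old callers keep working while new features ride along.
# It returns the list of steps it performed, to make the paths visible.

def save_record(record, validate=True, legacy_format=False,
                skip_audit=False, defer_indexing=False):
    steps = []
    if validate and not legacy_format:
        steps.append("validate")      # new callers want validation...
    if legacy_format:
        steps.append("downgrade")     # ...but the old import path must not
    steps.append("write")
    if not skip_audit:
        steps.append("audit")
    if not defer_indexing:
        steps.append("index")
    return steps
```

Four boolean flags give sixteen possible invocations, but only a handful are
ever exercised in practice; the rest are untested code paths waiting to
misbehave.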

One reason this is a problem is that the complexity of the procedure starts to
skyrocket. If you look at the available code paths, as opposed to the few code
paths actually taken, you start to see plenty of dead ends and undefined
behavior lurking in the unused ones. How can you tell if those paths are ever
taken? Well, if you're lucky, your function takes mostly primitives and you
can look at all of the call sites. If you're less lucky, your code will contain
some well-worn objects as parameters, and it will be a pain to scan through
every path where that object is created to see if it is passed into your
functions. This will range from hard to impossible depending on the size of
your code base and how well 'reused' your code is.

Why is complexity a problem? Well, every time you add another code path to a
procedure, you need to carefully study the effects that the new path might have
on old paths. Hopefully you can satisfy yourself quickly that there are no
adverse interactions, but you may end up getting lost in all of the branches
and sub-routines. If you're trying to verify you didn't break anything, you had
better hope you have access to a pretty representative test suite or some
thorough testers. As C.A.R. Hoare put it: "There are two ways of constructing a
software design: One way is to make it so simple that there are obviously no
deficiencies, and the other way is to make it so complicated that there are no
obvious deficiencies." Unfortunately the added complexity tends to push us
toward the latter.

Scope Creep

One thing that I've come to realize is that it is a pain to trigger regression
testing. All of the worst things come out when you make changes to code that is
already working and would not otherwise be part of your current work.
Unfortunately, religiously re-factored code tends to tie everything together in
one big knot, causing a project-wide regression test whenever someone makes a
change. The best way to avoid this is to only re-factor code out of parts of
the project that have to be tested in tandem no matter what changes are made
to them.

The code that is by far the hardest to weigh is library code. In some
circumstances, someone has implemented a particular algorithm that is needed in
other places. My take on this is that a function should represent one
particular algorithm for performing a task, and while it's not off-limits to
ever touch it, one should be very mindful when working on it to ensure that the
changes being made are in the best interest of the whole project. Unless you
know that you should be updating every caller of the function in question, I
would greatly prefer to create a new function and migrate all of the callers
one by one as needed. Of course, this can't be a general rule because, as I
stated above, I'm decidedly against general rules.
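One way to picture that migration (the function names and the new requirement
here are invented for the sake of the sketch): leave the original function
alone, add a successor, and move call sites over as they are touched.

```python
# Hypothetical sketch: rather than bending normalize_name() to a new
# requirement and re-testing every caller, add a successor function and
# migrate call sites one at a time.

def normalize_name(name):
    """Original behavior: callers not yet migrated still depend on this."""
    return name.strip().lower()

def normalize_name_v2(name):
    """Successor: the new requirement also collapses interior whitespace."""
    return " ".join(name.split()).lower()

# An already-migrated caller:
def greeting(name):
    return "Hello, " + normalize_name_v2(name)

# A not-yet-migrated caller keeps its tested behavior untouched:
def legacy_label(name):
    return normalize_name(name)
```

The cost is a temporary pair of near-duplicates; the benefit is that each
migration is a small, independently testable change instead of a project-wide
regression.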

Overall, I hope that you will have more to think about when you are making the
decision to re-factor code and that you'll take care not to create a burden for
future maintainers (or even your future self).