Refactoring Without Starting Over

Imagine this: you have inherited a big pile of spaghetti code (also known as legacy code), and a feature request arrives that would radically extend what that pile does. How do you go about it?

Scope of the problem

There are at least three obvious approaches to the problem:

1. Deny the request.

2. Hack the legacy code to cope with the request.

3. Refactor the legacy code to meet new coding standards.

Cases 1 and 2 have an immediate short-term effect. Case 1: no money is earned and the customer might be lost for good. Case 2: the money is safe; however, further development and maintenance will become increasingly painful over time. Case 3 is the ideal solution; it would be the choice of most developers and the worst-case scenario for most CEOs, because it has a radical short-term economic impact and could drag the development process on for ages.

However, there is also a fourth option:

4. Turn the legacy "interface" into a combination of Facade and Adapter patterns.

This option leaves most of the legacy code base intact and introduces only a lightweight abstraction layer. This may sound fuzzy at the moment, but the remainder of this article shows how to refactor legacy code without rewriting it.

Going back to case 2, where we would hack the code base to suit the new requirements, changes are introduced at the location of the new feature code and also at every calling application. This is cumbersome, and is likely to introduce bugs in portions of the program that used to work flawlessly. The method described in this article, by contrast, implements new features without tampering with the legacy interface. All calling applications remain unmodified, yet gain access to the new feature code hidden behind a facade.

Design Patterns

First, what is the Facade pattern? Wikipedia gives the following definition:

In computer programming, a facade is an object that provides a simplified interface to a larger body of code, such as a class library. A facade can:

Make a software library easier to use and understand, since the facade has convenient methods for common tasks.

Make code that uses the library more readable, for the same reason.

Reduce dependencies of outside code on the inner workings of a library, since most code uses the facade, thus allowing more flexibility in developing the system.

Wrap a poorly designed collection of APIs with a single well-designed API.
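As a minimal sketch of the definition above, the following C++ fragment wraps three low-level routines behind one convenient call. The `legacy` functions and the `TextFacade` class are hypothetical names invented for illustration; they are not taken from any real library.

```cpp
#include <cctype>
#include <string>

// Hypothetical legacy routines with awkward, in-place signatures.
namespace legacy {
inline void trim_left_inplace(std::string& s) {
    std::size_t i = 0;
    while (i < s.size() && std::isspace(static_cast<unsigned char>(s[i]))) ++i;
    s.erase(0, i);
}
inline void trim_right_inplace(std::string& s) {
    while (!s.empty() && std::isspace(static_cast<unsigned char>(s.back()))) s.pop_back();
}
inline void to_upper_inplace(std::string& s) {
    for (char& c : s) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}
}  // namespace legacy

// Facade (hypothetical): one convenient method for a common task,
// hiding the order and details of the legacy calls from the client.
class TextFacade {
public:
    static std::string normalize(std::string text) {
        legacy::trim_left_inplace(text);
        legacy::trim_right_inplace(text);
        legacy::to_upper_inplace(text);
        return text;
    }
};
```

A client would simply call `TextFacade::normalize(input)` and never touch the `legacy` namespace directly, which is what leaves room to change the underlying routines later.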

Second, what is the Adapter pattern? Again, Wikipedia tells us:

In computer programming, the Adapter design pattern (sometimes referred to as the wrapper pattern, or simply a wrapper) 'adapts' one interface for a class into one that a client expects. An adapter allows classes to work together that normally could not because of incompatible interfaces, by wrapping its own interface around that of an already existing class.
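The definition can be sketched in a few lines of C++. Here a hypothetical C-style function (`legacy_count_chars`) is wrapped so that it satisfies an interface the client expects (`TextMeasurer`). All names and signatures are assumptions made for this illustration.

```cpp
#include <cstring>
#include <string>

// Hypothetical legacy C-style function: returns the length and reports
// emptiness through an out-parameter.
int legacy_count_chars(const char* text, int* is_empty) {
    const int n = static_cast<int>(std::strlen(text));
    *is_empty = (n == 0) ? 1 : 0;
    return n;
}

// The interface the client code expects (hypothetical).
class TextMeasurer {
public:
    virtual ~TextMeasurer() = default;
    virtual std::size_t length(const std::string& text) const = 0;
};

// Adapter: wraps the legacy signature in the expected interface.
class LegacyTextAdapter : public TextMeasurer {
public:
    std::size_t length(const std::string& text) const override {
        int is_empty = 0;
        const int n = legacy_count_chars(text.c_str(), &is_empty);
        return is_empty ? 0u : static_cast<std::size_t>(n);
    }
};
```

Client code written against `TextMeasurer` never sees the out-parameter or the `const char*` argument; only the adapter knows about them.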

These two structural patterns are by definition generic, and can be applied to any code. Such patterns are most suitably deployed while designing and implementing a component, not while adding features to an existing one. Many programmers, however, have to deal with source code written before design patterns were in common use, so no patterns have been intentionally applied. Introducing or identifying such patterns will often require a costly rewrite or a major refactoring of the code base. In the following section, we will discuss possible ways of refactoring at minimum cost.

Introducing the patterns

The first step in preparing the legacy code base for the new feature is to identify all feasible entry points. Looking at a legacy code base, there are two basic ways in which clients interact with the code:

1. Multiple clients, one entry point

Figure 1 (multiple clients, one entry point) demonstrates the simplest scenario: a code base with a (more or less) well-defined interface. The interface may consist of a range of free functions, or be centralized in a common class. In both cases, the code structure already holds a derivative of the Facade pattern and is ready for modification.

2. Multiple clients and multiple entry points

Figure 2 (multiple clients and multiple entry points) shows how a range of clients may interact with a shared component through many entry points. This is the difficult scenario, and the following tasks must be carried out:

1. Determine the entry points. This can be done with the help of the linker: remove the legacy object files from the linker options, and the unresolved symbols that are reported are the entry points in use.

2. Decide between the following solutions:

a. If the entry points in the legacy code are close enough, move them to a common location (perhaps even a common class).

b. If the gap between the entry points is too large, determine the possible side effects of modifying the underlying code for the entry points, and isolate the separate interfaces.

A primitive example of case 2.a could be a set of free functions for string operations whose implementation is spread across the code base. Moving the interface and implementation to a common location gives the entry points a common interface. It also makes it possible to modify the underlying code in one central place while keeping the interface intact.

An example of case 2.b could be a set of free functions for string operations and a set of free functions for database access. These are logically too far apart, and would ideally be split into two separate interfaces.
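The two cases might look roughly like this in C++. The scattered legacy functions (`util_reverse`, `report_to_lower`) and the `strops` and `dbaccess` namespaces are hypothetical names chosen for this sketch, and the database helper is deliberately trivial.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical legacy free functions, imagined as living in unrelated source files.
std::string util_reverse(const std::string& s) {
    return std::string(s.rbegin(), s.rend());
}
std::string report_to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Case 2.a: the string entry points are close enough to share one interface.
// Callers use strops::..., so the implementation can change in one place.
namespace strops {
inline std::string reverse(const std::string& s) { return util_reverse(s); }
inline std::string to_lower(const std::string& s) { return report_to_lower(s); }
}  // namespace strops

// Case 2.b: database-related entry points are logically too far from the
// string operations, so they are isolated in a separate interface.
namespace dbaccess {
inline bool is_valid_table_name(const std::string& name) {
    return !name.empty() &&
           std::all_of(name.begin(), name.end(), [](unsigned char c) {
               return std::isalnum(c) || c == '_';
           });
}
}  // namespace dbaccess
```

The forwarding functions add no behaviour of their own; their only job is to give every client one stable place to call.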

Having identified the entry points and interfaces to the legacy code, we should reconsider the interfaces and possibly update them. It makes perfect sense to introduce incremental "face lifts" in the source code, i.e., to refactor the interfaces once in a while to keep them in sync with their usage. If, in the example of case 2.b, it is not possible to separate the two chunks, an adapter might come in handy.

Adding the new feature

Having the legacy code and its interfaces prepared for the new feature, we will now have a look at how the feature could be introduced.

The above figure shows a UML diagram of the expected structure. The Common Interface is the interface entry point introduced in the previous section of this article. To abstract the code beneath this point, we introduce an adapter that interprets the common interface and forwards requests to the underlying implementation. The adapter holds references to, or instances of, the legacy code and the new feature code. The mechanism for alternating between the legacy code and the new feature code is placed in the adapter. It may be necessary to extend the existing data structures to record the origin of a value, i.e., whether it originates from legacy code or new feature code.
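A minimal sketch of such an alternating adapter is given here, under stated assumptions. Only the function name `str_analyze` and the idea of an origin field come from this article; the `StrInfo` structure, the signatures, the two implementations and the global switch are hypothetical and chosen purely for illustration.

```cpp
#include <cstring>

// Assumed marker recording which implementation produced a result.
enum Origin { ORIGIN_LEGACY = 0, ORIGIN_NEW_FEATURE = 1 };

// Hypothetical result structure, extended with an origin field.
struct StrInfo {
    int length;
    int word_count;
    Origin origin;  // newly added member
};

// Hypothetical legacy implementation: only computes the length.
void legacy_str_analyze(const char* text, StrInfo* out) {
    out->length = static_cast<int>(std::strlen(text));
    out->word_count = 0;
    out->origin = ORIGIN_LEGACY;
}

// Hypothetical new feature: additionally counts space-separated words.
void feature_str_analyze(const char* text, StrInfo* out) {
    out->length = static_cast<int>(std::strlen(text));
    int words = 0;
    bool in_word = false;
    for (const char* p = text; *p != '\0'; ++p) {
        if (*p == ' ') {
            in_word = false;
        } else if (!in_word) {
            in_word = true;
            ++words;
        }
    }
    out->word_count = words;
    out->origin = ORIGIN_NEW_FEATURE;
}

// Assumed switching mechanism; a real system might use configuration instead.
bool g_use_new_feature = false;
void set_use_new_feature(bool enabled) { g_use_new_feature = enabled; }

// The entry point keeps its signature, so callers remain unmodified.
void str_analyze(const char* text, StrInfo* out) {
    if (g_use_new_feature) {
        feature_str_analyze(text, out);
    } else {
        legacy_str_analyze(text, out);
    }
}
```

Existing callers keep calling `str_analyze(text, &info)`; code that cares can inspect `info.origin` to see which implementation answered.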

The previous code samples illustrate how the alternating adapter could be added to handle legacy code alongside new feature code. Note that the data structure has been extended with an origin variable, and that the function retains its original interface. Here, the function "str_analyze" acts as an adapter, since it translates incoming requests, and also as a facade, since it is in charge of delegating the work.

What if ...

What if not all of the legacy code needs to be updated for the newly added feature? Working with a large legacy code base, we are bound to have many generic functions, e.g., reading the contents of a file into a string, or similar common functions.

Figure 4: In the previous section, we introduced an adapter layer to handle incoming requests. Modifying this as shown in the above figure gives the code direct access to the legacy code while still keeping the code open for future implementations. This is illustrated in the code samples below.
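One way to picture this modification is sketched here: the adapter forwards generic requests straight to the legacy helper and only alternates where a new variant actually exists. Every name in the fragment (`legacy_read_all`, `legacy_summarize`, `feature_summarize`, `TextAdapter`) is hypothetical, and a stream stands in for the file mentioned above to keep the example self-contained.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical generic legacy helper: reads a whole stream into a string.
std::string legacy_read_all(std::istream& in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical operation that exists in a legacy and a new-feature variant.
std::string legacy_summarize(const std::string& text) {
    return text.substr(0, 10);
}
std::string feature_summarize(const std::string& text) {
    return text.substr(0, 10) + "...";
}

class TextAdapter {
public:
    explicit TextAdapter(bool use_new_feature) : use_new_feature_(use_new_feature) {}

    // Pass-through: no new variant exists, so the request goes directly
    // to the legacy code. A new implementation could be slotted in here later.
    std::string read_all(std::istream& in) const { return legacy_read_all(in); }

    // Alternating: chooses between legacy code and new feature code.
    std::string summarize(const std::string& text) const {
        return use_new_feature_ ? feature_summarize(text) : legacy_summarize(text);
    }

private:
    bool use_new_feature_;
};
```

The pass-through costs one extra function call, and in exchange every request still goes through the same layer.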

In closing

In short, this article provides a small example of how to extend existing code bases with new and shiny features. Working with commercial code, we are often met with the challenge of implementing a new feature in a very old and very messy legacy code base. The code was most likely written in an era without emphasis on design patterns and maintainability. Following the simple guidelines in this article, it should be possible to extend legacy code bases without tampering with existing functionality. The method described in this article addresses the following issues:

Maintainability: Updating or modifying either the legacy code base or the new feature code is possible without tampering with the other.

Testability: Introducing the Adapter and Facade patterns imposes a layer of abstraction, making it possible to test the underlying code with unit tests.

Flexibility: The Facade pattern allows the developer to change the underlying code, infrastructure, etc., without changing the interface.
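The testability point can be made concrete with a small sketch. Because the adapter talks to its implementations through an interface, a test can substitute simple fakes for both the legacy code and the new feature code. The `Analyzer`, `FakeAnalyzer` and `AnalyzerAdapter` names are hypothetical and exist only for this illustration.

```cpp
#include <string>

// Hypothetical interface shared by the legacy code and the new feature code.
class Analyzer {
public:
    virtual ~Analyzer() = default;
    virtual int analyze(const std::string& text) = 0;
};

// Test double: returns a fixed value and counts how often it was called.
class FakeAnalyzer : public Analyzer {
public:
    explicit FakeAnalyzer(int result) : result_(result) {}
    int analyze(const std::string&) override {
        ++calls;
        return result_;
    }
    int calls = 0;

private:
    int result_;
};

// Adapter under test: delegates to one of the two implementations.
class AnalyzerAdapter {
public:
    AnalyzerAdapter(Analyzer& legacy, Analyzer& feature, bool use_feature)
        : legacy_(legacy), feature_(feature), use_feature_(use_feature) {}

    int analyze(const std::string& text) {
        return (use_feature_ ? feature_ : legacy_).analyze(text);
    }

private:
    Analyzer& legacy_;
    Analyzer& feature_;
    bool use_feature_;
};
```

A unit test can then construct the adapter with two fakes, call `analyze`, and check both the returned value and the call counters to confirm that the request was routed to the intended side.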

Comments and Discussions

Thanks! You obviously speak from your own experience. Refactoring often is conducted step by step. The Adapter pattern helps to gradually transform existing code without too much disruption for the rest of the application. I'd like to see more articles on refactoring!

Agreed, for the short term. But in my experience, a "planned" rewrite (whatever that means) is in the long term usually more cost effective, for example when the application is intended to be maintained for several years and possibly become the flagship product of your company. Sure, for a quick patch this technique is good, but how many quick patches before you look back and go, wow, we should have rewritten the thing? Well, there's no good answer, just questions.

I'm very fond of the "rewrite" at flagship products (actually, my situation). Sometimes, I like to build "isolation layers" which at first look may seem useless, but are very useful to do a complete rewrite, part by part.
Refactoring is only good for small parts of the application.

Another thing worth keeping in mind is: is the code really spaghetti, or are the developers just being lazy about understanding it? If every new developer that comes along wants to rewrite the company's assets, then the company isn't going to get very far. And you'll probably end up with a solution that does less than the original, and hits all the same problems all over again. It doesn't really speak much to progress. If anything I would ask my developers to fully document the 'spaghetti' code before they attempt any work on it, let alone a rewrite, just to ensure they really do understand it and the problems it's trying to solve. If they struggle and cannot understand it, then they're not really qualified to rewrite it IMO. Just some food for thought.

I agree that one should spend time grasping existing (and functioning) code rather than just replacing it with a fresh and new variant. A point that I did not go into in the article (I found it out of context) is another fact of life: Code standards do change over time. What the initial developers found clever and smart 15 years ago on their blazing mainframe hardware may very well be awkward and inefficient on today's modern hardware. An example of this "time gap" is a project where legacy code is written in low level procedural C whereas the modern code may be written in object oriented C++.

In my article I try to point out a way of NOT meddling with existing functionality while still allowing today's youth to experiment with the trends of today (even though those trends may be outdated in a couple of years and thus be subject to decoupling later on).

Being a developer I love to get the chance of writing sparkling new code, but I also like to keep my boss and his spreadsheets happy. So as a compromise I get to write sparkling new code, detached from the old code that is perceived as spaghetti code but has proven its worth by serving the world for a mere 15 years. In addition, I get to implement the new feature following the latest code standards. Also, the decoupling introduces two new possibilities: 1) next time a customer buys a new feature I have already introduced an adapter that makes it possible to delegate between radically different features and 2) over time I might be able to persuade my boss to deprecate the legacy feature.

Other projects with smaller code bases and shorter life cycles may benefit from rewriting the code rather than decoupling it.

Phew,

What a long post. Perhaps I should merge the key points with my article

I disagree that this leads to more spaghetti. What this solution is targeting is the need to add functionality to the existing application. The new functionality can and should be designed from the ground up, and should treat the existing app as a service provider. However, working directly with the existing spaghetti leads to more coupling between the old and the new, which is generally undesirable. The Facade and Adapter patterns facilitate decoupling.
This works equally well when it is necessary to refactor a portion of the existing code base, as opposed to adding new code: wrap what needs to be refactored in a facade, and update the existing code to use the facade. The code requiring change can now be changed with minimum impact to the rest of the app.
Of course, changing existing code always has the potential for causing new bugs, but by wrapping the code in a facade, and writing unit tests for this facade, the introduction of new bugs can be minimized.

Without using this approach the very idea of adding a change to an existing app is far too daunting. Using this approach you can minimize future maintenance headaches.
If, as we go into the future, the only way that we can safely add new functionality to existing applications is to perform complete re-writes, then we are lost as we'll spend all our time reinventing the wheel. An evolutionary approach, as described here, is our best bet.