Legacy code and unit testing

We have all been in that situation. Unless you started working less than a year ago, the odds of dealing with legacy code are really high in every single industry. And most of the time you will end up like this guy.

But there is one question I always love to clarify.

What is legacy code?

Legacy code is a broad expression that may make you think of old AS/400 computers (which have not died yet) or ancient languages. The reality is very different: legacy code is any piece of code that has not been modified by you or any other member of the team in the last six months.

I propose an exercise: open a project you worked on a while ago and see whether the code still looks familiar to you. You might think it was written by somebody else, the underlying logic might not come back to you immediately, and even the code style may have changed (I hope so!). We get better, we find better ways to do things, our old habits don’t stick (luckily) and we tend to write better code as we learn new patterns and practices.

If the code was written by other people it can be worse. We don’t all solve software problems in the same way, so first we need to adapt to other people’s logic. It is not unusual for long-lasting projects to have been modified by several developers, so you need to swim across totally different concepts and ideas.

The biggest issue when you deal with legacy code is that nobody wants to touch it, because it is hard to guarantee you won’t break anything else.

That leads to the second definition I always like to highlight.

Legacy code is code without unit testing

I totally agree with this definition. If we don’t have unit tests we cannot guarantee any path of execution, any logic or any input/output validation. What we can guarantee is that we will finish our code changes and immediately throw the code over the fence to QA, who will spend 2 or 3 days working through a huge spreadsheet of test cases to prove we didn’t actually break anything.

That costs money, it is inefficient and it impacts our delivery. The agility of software delivery is heavily compromised. Not only that: let’s imagine any application in production. In peak season the client finds a bug; it is critical and it’s impacting sales, payments or something else. Phones are ringing, managers are being called to urgent meetings and you get the bug fix assigned, to be solved ASAP.

First, you panic: the code is huge, messy and complicated, there were 20 programmers before you and the logic is complex. After struggling for 2 hours you seem to find the bug. By this moment your boss has drunk a third coffee in a row and is anxious to get an answer.

You modify the code, cross your fingers, and it builds. You run some tests and it seems to be fixed. Now what?

QA….

QA doesn’t want to approve anything that has not been properly tested. As a developer you cannot guarantee you didn’t break anything, and there is nothing to prove it, so the boss goes to QA and asks how fast they can validate the new version. The answer is 2 days, or 1 day if they don’t test everything. By this time somebody has called an ambulance while your boss lies on the floor. The client will continue losing money for a couple of days, and that is if we are lucky enough to have fixed the bug.

That is the case for unit testing.

Unit testing

Let’s quickly review some principles of unit testing. Our code has only 4 reasons to change:

Adding a feature

Fixing a bug (most of the time)

Improving design (refactoring)

Optimizing resource usage

The first two cases fall under the category of Behavioral Changes: they change how the application behaves. The other two cases must not change behavior; they are Non-Behavioral Changes.
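
As a minimal sketch of a non-behavioral change, consider the following Python example. Python is used here only for illustration, and the function names and the pricing rule are hypothetical. A refactoring changes the shape of the code, so exactly the same expectations must hold before and after it:

```python
def total_price_original(prices, tax_rate):
    # Legacy-style implementation: manual loop and accumulator.
    total = 0
    for price in prices:
        total = total + price
    total = total + total * tax_rate
    return round(total, 2)


def total_price_refactored(prices, tax_rate):
    # Refactored implementation: simpler structure, same behavior.
    return round(sum(prices) * (1 + tax_rate), 2)


def check_behavior(total_price):
    # The same expectations apply to both versions.
    assert total_price([10.0, 20.0], 0.1) == 33.0
    assert total_price([], 0.1) == 0
    assert total_price([5.0], 0.0) == 5.0


check_behavior(total_price_original)
check_behavior(total_price_refactored)
```

If a feature were added or a bug fixed instead, at least one of these expectations would be expected to change, which is what makes it a behavioral change.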

Unit testing must ensure that existing behavior doesn’t change. Let’s look at a visual representation of what adding a feature and a bug fix to the current code looks like.

Despite my natural tendency to choose ugly colors, I can represent the current codebase as the largest bar, while the tiny portions on the right show the changes to the code. Without unit tests we risk the entire project every time a change is introduced. But what if we have some unit tests, not even a lot? We create something called a “test harness”: we wrap our code with proof that the logic is still valid, so at least we know some logic is not broken.

If we had some unit tests we could at least ensure some logic was not affected by our changes. The more logic we wrap with tests, the more secure we feel every time we make a change. But to reach that goal we will have to apply some good practices and principles.
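
One way to picture a small test harness is the sketch below, written in Python with the standard `unittest` module. The `legacy_shipping_cost` function and its numbers are invented for illustration. Each test simply pins a result observed from the current code, so a later change that alters it fails loudly:

```python
import unittest


def legacy_shipping_cost(weight_kg, express):
    # Hypothetical legacy function: nobody remembers why the numbers are what they are.
    cost = 5 + weight_kg * 2
    if express:
        cost = cost * 2
    if weight_kg > 20:
        cost = cost + 15
    return cost


class LegacyShippingCostHarness(unittest.TestCase):
    # These tests record what the code does today, not what it should do.

    def test_standard_light_parcel(self):
        self.assertEqual(legacy_shipping_cost(1, False), 7)

    def test_express_light_parcel(self):
        self.assertEqual(legacy_shipping_cost(1, True), 14)

    def test_heavy_parcel_surcharge(self):
        self.assertEqual(legacy_shipping_cost(25, False), 70)
```

Three tests do not cover every path, but they already protect the paths they touch, and that is the point of starting with "not even a lot".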

Another great side effect of adding unit tests is that we review the logic, acquiring domain knowledge and a deeper understanding of our application. The tests also reflect the intent of the logic: our test cases indicate what we are trying to prove in every single case.

Principles

To be effective, unit tests must follow the principles below:

Tests MUST run fast

Tests MUST run in isolation

Tests DO NOT depend on the environment

Tests ONLY validate behavior

A practical way to check this is the following. Go to the repository where your code is located, clone it for the first time in a clean environment, build the code and run the tests. They MUST run, even if you disconnect from the network, change computers or don’t have access to any external resource. And they MUST run fast. This is clear proof that our unit tests apply the correct principles.

The last item on the list is crucial to being totally effective. A test only validates behavior. It is very common to find unit tests that create an object and check its properties. That doesn’t prove anything (we are not validating that our language can create an object); we only validate behavior, which means the logic of the code.

Isolation refers to dependencies, mostly services or resources. Our tests cannot depend on a database, a folder, a specific file, a third-party service running somewhere, etc. Our code will almost always depend on something else, but tests must create fake implementations of any external dependency. Again, we test our logic, not whether a database can save a record properly, to give an example. We assume the dependency works as our logic expects. For example, if our logic depends on a value returned from the database, we use a fake object that mimics that behavior and returns a valid value. At the same time we can execute a negative test case: what happens if the database doesn’t return a valid value, or even if an exception is thrown downstream (the logic must handle scenarios with partial operations)?
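
The idea of faking a dependency can be sketched as follows, in Python with the standard `unittest` module. Everything here is hypothetical: `FakeRateRepository` stands in for a database-backed class, and `convert` is the logic under test. The fake covers the valid value, the invalid value and the downstream exception without touching a real database:

```python
import unittest


class FakeRateRepository:
    # Hypothetical stand-in for a database-backed repository.
    def __init__(self, rate=None, error=None):
        self._rate = rate
        self._error = error

    def get_rate(self, currency):
        if self._error is not None:
            raise self._error
        return self._rate


def convert(amount, currency, repository):
    # Logic under test: it only needs an object with a get_rate method.
    try:
        rate = repository.get_rate(currency)
    except ConnectionError as exc:
        raise RuntimeError("rate lookup failed") from exc
    if rate is None or rate <= 0:
        raise ValueError("no valid rate for " + currency)
    return amount * rate


class ConvertTests(unittest.TestCase):
    def test_convert_valid_rate_returns_converted_amount(self):
        self.assertEqual(convert(10, "EUR", FakeRateRepository(rate=2)), 20)

    def test_convert_missing_rate_raises_value_error(self):
        with self.assertRaises(ValueError):
            convert(10, "EUR", FakeRateRepository(rate=None))

    def test_convert_repository_failure_raises_runtime_error(self):
        failing = FakeRateRepository(error=ConnectionError("database down"))
        with self.assertRaises(RuntimeError):
            convert(10, "EUR", failing)
```

Because the repository is passed in as a parameter, the tests run with no network, no database and no configuration, which is what the clean-clone check asks for.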

Now… where to start

To be honest, this is the simplest question to answer. If we have never added unit tests to the project, it means we are not very familiar with it, so we have to take a very conservative approach and go for the low-hanging fruit:

Start with methods that calculate, compute or process an input and produce an output without any other dependency.

I’ve never seen a project, legacy or not, where we cannot find this kind of code. It is very simple. I will add an example, part of some code I wrote some years ago, which is a great illustration of this:

The code above represents an ideal case to start unit testing legacy code. Based on the input it validates certain values, throws controlled exceptions if those values are not in the expected range and returns an output. The rationale behind this code is closely tied to the API provided by Walmart, where certain values must fall within a range and there is some logic to create default values that are required by the end system.

In fact, this code also gives us the ability to write positive and negative tests: what happens if I pass correct values, and what happens if I pass incorrect values? In the specific code I wrote there are controlled exceptions, so tests must validate that certain parameters will throw exceptions.
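
To make the shape of such a method concrete, here is a minimal Python sketch. It is not the original code: the names `SearchParameters` and `build_search_parameters`, the defaults and the ranges are all invented for illustration. It takes an input, validates ranges, fills defaults, raises controlled exceptions and returns an output, with no other dependency:

```python
from dataclasses import dataclass


@dataclass
class SearchParameters:
    query: str
    start: int
    num_items: int


def build_search_parameters(query, start=1, num_items=10):
    # Hypothetical limits and defaults, chosen only for this sketch.
    if not query or not query.strip():
        raise ValueError("query must not be empty")
    if start < 1:
        raise ValueError("start must be 1 or greater")
    if not 1 <= num_items <= 25:
        raise ValueError("num_items must be between 1 and 25")
    return SearchParameters(query=query.strip(), start=start, num_items=num_items)


# Positive case: correct values produce an output with defaults filled in.
params = build_search_parameters("laptop")
assert params.start == 1 and params.num_items == 10

# Negative case: an incorrect value raises a controlled exception.
try:
    build_search_parameters("laptop", start=-1)
except ValueError as error:
    assert "start" in str(error)
else:
    raise AssertionError("expected ValueError")
```

A function like this needs no fakes at all, which is why it makes a comfortable first target.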

When we write unit tests there is a very good naming convention I recommend. It immediately shows any reader of the code what the test is intended to check, and it gives the test writer a good mechanism for writing good unit tests:

[System under test]_[Scenario]_[Expected output or behavior]

Using this convention we can immediately recognize what we are testing. Let’s see a simple example:

As you can see, I followed the naming convention: my system under test is the SearchParameters method inside a factory; the scenario indicates I only pass the search phrase; the expected output is an object with the Query property populated. This specific example might look confusing. I said before that we don’t test properties, only behavior, so why do I test just an object creation? Because there is logic within that method: passing only one parameter is accepted, and internally the method sets all the default values, passing certain validations. In fact, a second test will validate a default.
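
A rough Python equivalent of these two tests is sketched below. The factory is a hypothetical stand-in, not the original code. Python's `unittest` only discovers methods whose names start with `test`, so the convention becomes `test_[System under test]_[Scenario]_[Expected output or behavior]`:

```python
import unittest


class SearchParametersFactory:
    # Hypothetical stand-in for the factory described in the text.
    DEFAULT_START = 1

    @staticmethod
    def search_parameters(query, start=None):
        # Fill in the default when the caller passes only the search phrase.
        if start is None:
            start = SearchParametersFactory.DEFAULT_START
        return {"query": query, "start": start}


class SearchParametersTests(unittest.TestCase):
    def test_SearchParameters_OnlyQuery_QueryIsPopulated(self):
        result = SearchParametersFactory.search_parameters("laptop")
        self.assertEqual(result["query"], "laptop")

    def test_SearchParameters_OnlyQuery_StartUsesDefault(self):
        result = SearchParametersFactory.search_parameters("laptop")
        self.assertEqual(result["start"], 1)
```

When one of these fails, the name in the test report already says which method, which scenario and which expectation broke, before anyone opens the test body.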

We can continue validating those scenarios. A different example is the negative case: what happens if I enter an invalid input? If we look at the code, those cases will throw exceptions. Let’s see one example:

Passing an invalid start (-1 in this case) throws an exception. I didn’t want to dig into the syntax of unit tests. I am using the Visual Studio Unit Test framework, which is very simple. Other frameworks don’t use decorators like the ones in the example, but the principles are still the same: one way or another we will have the ability to test a property or determine that the code throws an exception.
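
As an illustration of the same principle in a framework without decorators, this is how the negative case might look in Python's standard `unittest` module. The `search_parameters` function is a hypothetical stand-in with an invented validation rule. The expected exception is declared with the `assertRaises` context manager:

```python
import unittest


def search_parameters(query, start=1):
    # Hypothetical validation: start must be a positive position.
    if start < 1:
        raise ValueError("start must be 1 or greater")
    return {"query": query, "start": start}


class SearchParametersNegativeTests(unittest.TestCase):
    def test_SearchParameters_NegativeStart_RaisesValueError(self):
        # The test passes only if the call inside the block raises ValueError.
        with self.assertRaises(ValueError):
            search_parameters("laptop", start=-1)
```

If the validation were removed, the call would return normally and `assertRaises` would fail the test, so the controlled exception is itself part of the behavior being protected.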

Summary

Legacy code is tough to modify. It might be that nobody has updated it in a long time, or that many people have worked on it using different styles and practices, or simply that it’s too complicated to understand. Adding unit tests serves two purposes: first, you can ensure you don’t break anything after modifying the code; second, writing the tests helps you understand the current functionality.

In future posts I will explain more complex techniques for unit testing legacy code. The cases mentioned here are simple; most of the time we will find dependencies that make the code hard to test. Luckily, there are several known patterns to deal with those issues.

Published by maxriosflores

Solution Architect for a decade.
I designed, built and implemented software solutions for more than 25 years and every single day more interested on technology. I learned to code in a Texas Instruments with 16kb at 8 years old. I shared this passion with friends coding CZ Spectrums, MSX's and C64's. I worked in computers since my early 17's with super old tools like plain C and Quick Basic. I love math and computers as much as outdoors and family life.