How to start fixing a legacy system

You join a new team and discover that what you have inherited is a bundle of bugs all holding themselves together, which presents itself to the end-user as a fully working system.

What do you do? Unfortunately I’ve been in this situation more times than I care to remember; fortunately, that means I know there is a way forward.

Normally when you join a team with a problematic codebase, they ask “What would you like to work on?”, meaning a project that is away from the central core. My answer to the question is “Give me bugs, I’ll learn the system that way”. It is unglamorous, but you will learn how people use the system, and also all of the problems involved in taking a small code change, checking it works and getting it into production. It is not just about understanding the source code, but understanding exactly what the current process is, with all its idiosyncrasies. Once you have more of this context in your head you can then start to make a plan for how to improve it.

You will find that the steps are pretty standard.

Set up continuous integration. The spectrum will be from having none, all the way through to having a working pipeline that is just not running everything. Start with every commit to master then go on to getting pull request builds working.

Continuous integration should build the artifact that is required to run the product. You might provide a deploy step as well.
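
To show the shape of such a pipeline, here is a minimal sketch in Python of a script that runs stages in order and stops at the first failure. The stage names and commands are placeholders I have made up for illustration; in practice your CI service's own configuration would normally do this job.

```python
import subprocess
import sys

# Hypothetical stages: each is a name plus the command to run.
# Replace the placeholder commands with whatever builds and tests your product.
STAGES = [
    ("build", [sys.executable, "-c", "print('building artifact')"]),
    ("test", [sys.executable, "-c", "print('running tests')"]),
]


def run_pipeline(stages):
    """Run each stage in order; return the name of the first failing stage, or None."""
    for name, command in stages:
        print(f"--- {name} ---")
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"stage {name!r} failed with exit code {result.returncode}")
            return name  # later stages are skipped
    return None
```

A CI job would call run_pipeline(STAGES) and exit non-zero if it returns a stage name, so that a broken build never reaches the deploy step.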

Get the current tests running or get a framework ready to run. You’ll probably find there are some tests in the system, but because they are not run automatically they no longer work. First get them working, then add them as a stage in continuous integration. If there are no tests, set up a test framework with a single dummy test and get that running, to put a marker down for how to write tests.
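
A dummy test can be as small as the sketch below, which assumes Python's built-in unittest module; the class and method names are my own placeholders. Its only purpose is to prove that the runner is wired into the build.

```python
import unittest


class TestFrameworkIsWired(unittest.TestCase):
    """A placeholder test whose only job is to prove the test runner works."""

    def test_placeholder(self):
        # Deliberately trivial: replace with real tests as you fix bugs.
        self.assertEqual(1 + 1, 2)
```

Saved as test_placeholder.py (a hypothetical file name), it can be run with "python -m unittest test_placeholder" from the same directory.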

Start running code quality checks. Add them to continuous integration on every run. Initially you will find you have to slacken off the checking and get it passing under less strict conditions, as things like code formatting are less important than actual errors. Over time you can then gradually tighten up the rules.
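
One possible way to tighten the rules gradually is a "ratchet": record how many violations each rule has today, fail the build only when a count goes up, and lower the allowance whenever it goes down. The sketch below is an assumption about how you might do that, not a feature of any particular linter, and the rule names in the example are made up.

```python
def check_ratchet(current, baseline):
    """Compare violation counts per rule against a recorded baseline.

    Returns a list of human-readable failures; an empty list means the build passes.
    """
    failures = []
    for rule, count in sorted(current.items()):
        allowed = baseline.get(rule, 0)  # rules missing from the baseline allow nothing
        if count > allowed:
            failures.append(f"{rule}: {count} violations, baseline allows {allowed}")
    return failures


def tightened_baseline(current, baseline):
    """Lower each allowance to the current count, never raising it."""
    return {
        rule: min(allowed, current.get(rule, 0))
        for rule, allowed in baseline.items()
    }


# Example with hypothetical rule names and counts.
baseline = {"line-too-long": 120, "unused-import": 5}
current = {"line-too-long": 118, "unused-import": 5}
print(check_ratchet(current, baseline))
print(tightened_baseline(current, baseline))
```

Formatting rules can start with a generous allowance while rules that catch real errors start at zero.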

The most important thing, I find, is that if you can’t run your codebase locally you really can’t make any changes safely, and you have no way of starting to break things down for testing. In order to achieve this you may end up with multiple docker containers, one for each data dependency, and a bunch of processes running. You’ll need to extract your database fixtures, and a myriad of other data, but ultimately you will end up knowing the actual extent of your system.
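
Once the dependencies are running in containers, a small readiness check saves a lot of confusing failures when something has not started. Here is a minimal sketch in Python that waits for each dependency's TCP port; the dependency names, hosts and ports are placeholders, not taken from any real setup.

```python
import socket
import time


def wait_for_port(host, port, timeout_seconds=30.0, interval_seconds=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_seconds):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_seconds)


# Hypothetical local dependencies; the names and ports are placeholders.
DEPENDENCIES = {
    "database": ("127.0.0.1", 5432),
    "cache": ("127.0.0.1", 6379),
}


def unavailable_dependencies(dependencies, timeout_seconds=30.0):
    """Return the names of dependencies that never became reachable."""
    return [
        name
        for name, (host, port) in dependencies.items()
        if not wait_for_port(host, port, timeout_seconds)
    ]
```

An open port only shows that something is listening, not that the fixtures are loaded, so treat this as a first sanity check rather than proof that the system is ready.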

Prune your code. Use the delete key liberally; less code is easier to manage. You’ll be amazed when you pull on some threads and the entire thing unravels and you end up with a huge percentage of the code gone.

To get to this stage you are probably looking at 1 to 3 months’ work, depending on how bad and how complex the codebase is. Each step makes you more confident in the process (and the order will depend on what is easiest to deliver).

Then you also have to actually make code changes; this is the whole point of this work. When adding a feature or fixing a bug, make sure you add tests to the test stage you set up earlier. This may not be easy, so you will need a few layers of testing to accomplish it.

Running locally. If you can run your system locally you can manually perform the steps to check it works. This is obviously slow and inefficient, but you are also in a position where you can attach debuggers and inspect how the code is actually working.

Integration testing. This is the next layer: automate the steps you did manually when running it locally. This means you still require all the extra data dependencies like databases, but you can replicate the behaviour a lot more consistently than by hand.
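
As a rough sketch of what this layer looks like, the test below exercises a function against a real database loaded with fixtures. Everything here is an assumption made for illustration: the mark_order_shipped function and the orders table are hypothetical, and an in-memory SQLite database stands in for whatever database your system really uses.

```python
import sqlite3
import unittest


def mark_order_shipped(conn, order_id):
    """Hypothetical legacy function: ships an order only if it has been paid."""
    cursor = conn.execute(
        "UPDATE orders SET status = 'shipped' WHERE id = ? AND status = 'paid'",
        (order_id,),
    )
    conn.commit()
    return cursor.rowcount == 1


class MarkOrderShippedIntegrationTest(unittest.TestCase):
    def setUp(self):
        # In-memory SQLite stands in for the real database.
        # Fixtures are loaded fresh for every test so runs are repeatable.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)"
        )
        self.conn.executemany(
            "INSERT INTO orders (id, status) VALUES (?, ?)",
            [(1, "paid"), (2, "pending")],
        )
        self.conn.commit()

    def tearDown(self):
        self.conn.close()

    def status_of(self, order_id):
        row = self.conn.execute(
            "SELECT status FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        return row[0]

    def test_paid_order_is_shipped(self):
        self.assertTrue(mark_order_shipped(self.conn, 1))
        self.assertEqual(self.status_of(1), "shipped")

    def test_unpaid_order_is_left_alone(self):
        self.assertFalse(mark_order_shipped(self.conn, 2))
        self.assertEqual(self.status_of(2), "pending")
```

The test checks the stored result rather than just the return value, which is the same check you would have made by hand when running locally.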

Unit tests. Most web frameworks come with a unit testing framework, and there are many others available. As you understand the integration tests better you can work out how to mock out the data dependencies and run basically the same tests without requiring the real ones. This makes them faster and more dependable, and also easier to automate.
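
Here is a minimal sketch of mocking out data dependencies, using unittest.mock from the Python standard library. The notify_if_overdue function and its repository and mailer collaborators are hypothetical; the point is only that the test needs neither a database nor a mail server.

```python
import unittest
from unittest import mock


def notify_if_overdue(invoice_id, repository, mailer):
    """Hypothetical function: emails the customer when an invoice is overdue."""
    invoice = repository.get_invoice(invoice_id)
    if invoice["days_overdue"] > 0:
        mailer.send(invoice["email"], f"Invoice {invoice_id} is overdue")
        return True
    return False


class NotifyIfOverdueTest(unittest.TestCase):
    def test_overdue_invoice_sends_email(self):
        # The mocks replace the database and the mail server.
        repository = mock.Mock()
        repository.get_invoice.return_value = {
            "email": "a@example.com",
            "days_overdue": 3,
        }
        mailer = mock.Mock()

        self.assertTrue(notify_if_overdue(42, repository, mailer))

        repository.get_invoice.assert_called_once_with(42)
        mailer.send.assert_called_once_with("a@example.com", "Invoice 42 is overdue")

    def test_current_invoice_sends_nothing(self):
        repository = mock.Mock()
        repository.get_invoice.return_value = {
            "email": "a@example.com",
            "days_overdue": 0,
        }
        mailer = mock.Mock()

        self.assertFalse(notify_if_overdue(42, repository, mailer))
        mailer.send.assert_not_called()
```

Legacy code rarely takes its dependencies as arguments like this, so getting to this shape usually means first extracting the database and network calls behind something you can pass in.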

Pure unit tests. When you have pushed through all those layers you can even take that legacy code and write reasonable pure unit tests where functionality at a low-level can be tested in the way they show you in books.
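
For contrast, this is the kind of test that becomes possible at the end of that road: a function with no dependencies at all, checked purely on inputs and outputs. The late_fee rule below is invented for illustration and does not come from any real system.

```python
import unittest


def late_fee(amount_cents, days_overdue):
    """Hypothetical pure rule: 1% of the amount per overdue day, capped at 10%."""
    if days_overdue <= 0:
        return 0
    percent = min(days_overdue, 10)
    return amount_cents * percent // 100


class LateFeeTest(unittest.TestCase):
    def test_no_fee_when_not_overdue(self):
        self.assertEqual(late_fee(10000, 0), 0)

    def test_fee_grows_per_day(self):
        self.assertEqual(late_fee(10000, 3), 300)

    def test_fee_is_capped(self):
        self.assertEqual(late_fee(10000, 30), 1000)
```

No setup, no fixtures and no mocks are needed, which is why tests like this are the fastest and most stable layer.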

Developers tend to have various beliefs about which of these layers is most effective, but having all these options available gives you a path for turning hard-to-manage code that nobody really understands into easily testable code that is easy to add to. Really, the best way to approach it is to be continually improving the quality of your tests and trying to move them to a more stable, reproducible state.

Remember, when the code is simpler and easier to read and understand, you can actually work out what the next evolutions for the codebase are and how to architect your way there.

This is a fairly short overview of some of the steps I typically take when starting on a legacy system that needs some TLC. It is not a quick fix but a long grind to get to a better, more stable place.