How to work with legacy code

What to avoid when working with legacy code, and which strategy to choose to succeed.

Introduction

Imagine that you have been handed a pile of legacy code to work with. It has been written by many people over a long time. You will find better and worse solutions there, in different styles. Larger or smaller parts are not covered by unit tests. You see code smells here and there. Perhaps there are even some serious issues that have a major impact on efficiency, flexibility, testability and so on. After a while, you have a bunch of ideas for how to improve that code! Well... Don't touch it!

Why should you leave it as it is?

Consider the following:

It works. Better or worse, but it works. It is clear that it could work better, but your time is limited and you have a list of new features to implement as well as already discovered bugs to fix. There is a lack of unit tests? Maybe, but the code has passed at least one test, and the most important one: the test of time. Leaving aside the tests and the efficiency issues, you could argue that after your refactoring the code will be more flexible and easier to learn, and that new features will be quicker to deliver. That is true. However, you can take it for granted that...

You will introduce new bugs. Tampering with legacy code is like playing Russian roulette. Sooner or later you will introduce a bug that catches you in the middle of a business demo, and nobody will pay attention to your great but invisible improvements. You can't be sure of anything. Here are a couple of examples. Let's say that you can't see any reference to the code you are looking at. Let's remove it and check. The application still works, until you find that some XML file outside your environment uses the removed code in some rare but crucial circumstances. Now you are getting a runtime exception whose message says nothing or, even worse, the exception is caught in an empty catch block and you think that everything works fine. So maybe it would be a good idea to rethrow exceptions instead of swallowing them? Yes, but now you are getting a lot of less significant runtime exceptions from the deepest pits of the application. Business people get them too and yes, it is you who broke their application, because there were no exceptions until you touched the code. OK, so at the very least you need to introduce some exception handling policy. No, you have no time for that. But even if you get the time...
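The "no visible reference" trap can be sketched in a few lines. This is only an illustration under assumptions: the names `EXTERNAL_CONFIG`, `LegacyHandlers`, `format_invoice`, `close_fiscal_year` and `dispatch` are hypothetical, and the XML is inlined here although in the scenario above it would live outside your environment.

```python
import xml.etree.ElementTree as ET

# Hypothetical external configuration. In real life this file sits
# outside your project, so a code search never finds these names.
EXTERNAL_CONFIG = """
<handlers>
    <handler event="invoice" function="format_invoice"/>
    <handler event="year_end" function="close_fiscal_year"/>
</handlers>
""".strip()


class LegacyHandlers:
    """Hypothetical legacy class whose methods are looked up by name."""

    @staticmethod
    def format_invoice(payload):
        return f"invoice:{payload}"

    # close_fiscal_year used to be defined here. It had no visible
    # callers, so it was removed as "dead code".


def dispatch(event, payload):
    """Find the handler named in the XML and call it."""
    root = ET.fromstring(EXTERNAL_CONFIG)
    for handler in root.findall("handler"):
        if handler.get("event") == event:
            try:
                func = getattr(LegacyHandlers, handler.get("function"))
                return func(payload)
            except Exception:
                # Empty catch block: the AttributeError raised for the
                # removed method is swallowed, the caller sees nothing.
                pass
    return None


print(dispatch("invoice", 42))     # common path still works: invoice:42
print(dispatch("year_end", 2023))  # rare path fails silently: None
```

Everything looks fine in day-to-day use, because the broken path only runs in the rare `year_end` case and the failure never surfaces.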

Your little change will trigger an avalanche of other changes. If it is a big code base and you are new to it, you will find that there are many more dependencies than you thought. Now, taking into account that you probably asked your manager for less time than you actually needed (in order not to scare him), that he gave you even less time than you asked for, and that the real problem requires more time than you ever thought, you are in trouble.

How to work with legacy code?

It is an open question, and the answer depends on the code base you are facing. Here are the guidelines that seem to work for me:

Add new code on top of the existing code. Try to minimize the number of places where new code is hooked in. If something goes wrong, you will be able to cut off the new code easily and track down the origin of the problem.
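One possible way to keep the hook small is sketched below. It is an assumption-laden example, not a prescription: `legacy_calculate_total`, `apply_loyalty_discount`, `calculate_total` and the `FLAGS` switch are all hypothetical names invented for illustration.

```python
# Hypothetical legacy function, left exactly as it was found.
def legacy_calculate_total(prices):
    total = 0
    for p in prices:
        total = total + p
    return total


# The new feature lives in its own function, outside the legacy code.
def apply_loyalty_discount(total, percent=10):
    return total * (100 - percent) / 100


# Hypothetical switch that cuts the new code off in one place.
FLAGS = {"loyalty_discount": True}


def calculate_total(prices):
    """The single place where new code is hooked into the old flow."""
    total = legacy_calculate_total(prices)
    if FLAGS["loyalty_discount"]:
        total = apply_loyalty_discount(total)
    return total


print(calculate_total([100, 50]))  # 135.0 with the new code enabled
FLAGS["loyalty_discount"] = False
print(calculate_total([100, 50]))  # 150, the pure legacy result
FLAGS["loyalty_discount"] = True
```

With only one hook, switching the flag off is enough to tell whether a problem comes from the new code or was already there.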

Keep new code separated from the old code. Let it be a new method, a new single-responsibility class or an adapter, instead of new loops or control flow logic inside an old spaghetti function. That makes it easy to tell apart what is new and can be refactored, modified and improved from what is old and should be left untouched.
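As a rough sketch of the adapter idea, assuming a hypothetical legacy function `legacy_get_customer` that returns a cryptic tuple, the new code could be isolated like this (`Customer` and `CustomerAdapter` are likewise made-up names):

```python
from dataclasses import dataclass


# Hypothetical old code, left untouched: returns a positional tuple
# with an encoded name and a string flag.
def legacy_get_customer(customer_id):
    return (customer_id, "DOE;JOHN", "1")


@dataclass(frozen=True)
class Customer:
    """Clean data type used only by the new code."""
    id: int
    first_name: str
    last_name: str
    active: bool


class CustomerAdapter:
    """New single-responsibility class: translates the legacy format."""

    def __init__(self, fetch=legacy_get_customer):
        # The legacy call is injected, so tests can pass a fake one.
        self._fetch = fetch

    def get(self, customer_id):
        raw_id, raw_name, raw_flag = self._fetch(customer_id)
        last, first = raw_name.split(";")
        return Customer(raw_id, first.title(), last.title(), raw_flag == "1")


customer = CustomerAdapter().get(7)
print(customer.first_name, customer.last_name, customer.active)
```

The legacy function is not modified at all; everything that can be tested, renamed or refactored freely sits in the adapter and the new data type.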

Following these guidelines, you will end up with an application core based on legacy code that has survived the test of time, plus new, flexible, legible, tested code built on top of it.

A different, "chameleon" approach is one where the programmer tries to blend his code into the old code so that the two look alike and the source stays homogeneous. The downside is that it forces the developer to further enlarge existing lasagna functions that are impossible to test. It resets the test of time and gives no assurance in return. It makes the code even more rigid and illegible. It introduces more control flow statements. It adds nothing that would help future rework.

The "refactoring" approach introduces new bugs. It can make the code better, but there is a risk that it will leave the code scattered and full of new bugs, with the projected end date of the refactoring moving further away each day.