Developing and Maintaining Secure and Reliable Software in the Real World

Wednesday, April 4, 2012

What Refactoring is, and what it isn’t

Sometimes a programmer will come to me and explain that they don’t like the design of something and that “we’re gonna need to do a whole bunch of refactoring” to make it right. Oh Oh. This doesn’t sound good. And it doesn’t sound like refactoring either….

Refactoring, as originally defined by Martin Fowler and Kent Beck, is

A change made to the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behavior… It is a disciplined way to clean up code that minimizes the chances of introducing bugs.

Refactoring is done to fill in short-cuts, eliminate duplication and dead code, and to make the design and logic clear. To make better and clearer use of the programming language. To take advantage of information that you have now but that the programmer didn’t have then – or that they didn’t take advantage of then. Always to simplify the code and to make it easier to understand. Always to make it easier and safer to change in the future.

Fixing any bugs that you find along the way is not refactoring. Optimization is not refactoring. Tightening up error handling and adding defensive code is not refactoring. Making the code more testable is not refactoring – although this may happen as the result of refactoring. All of these are good things to do. But they aren’t refactoring.

Programmers, especially programmers maintaining code, have always cleaned up code as part of their job. It’s natural and often necessary to get the job done. What Martin Fowler and others did was to formalize the practices of restructuring code, and to document a catalog of common and proven refactoring patterns – the goals and steps.

Refactoring is simple. Protect yourself from making mistakes by first writing tests where you can. Make structural changes to the code in small, independent and safe steps, and test the code after each of these steps to ensure that you haven’t changed the behavior – it still works the same, just looks different. Refactoring patterns and refactoring tools in modern IDEs make refactoring easy, safe and cheap.

Refactoring isn’t and end in itself

Refactoring is supposed to be a practice that supports making changes to code. You refactor code before making changes, so that you can confirm your understanding of the code and make it easier and safer to put your change in. Regression test your refactoring work. Then make your fix or changes. Test again. And afterwards maybe refactor some more of the code to make the intent of the changes clearer. And test everything again. Refactor, then change. Or change, then refactor.

You don’t decide to refactor, you refactor because you want to do something else, and refactoring helps you do that other thing.

The scope of your refactoring work should be driven by the change or fix that you need to make – what do you need to do to make the change safer and cleaner? In other words: Don’t refactor for the sake of refactoring. Don’t refactor code that you aren’t changing or preparing to change.

Scratch Refactoring to Understand

There’s also Scratch Refactoring from Michael Feather’s Working Effectively with Legacy Code book; what Martin Fowler calls “Refactoring to Understand”. This is where you take code that you don’t understand (or can’t stand) and clean it up so that you can get a better idea of what is going on before you start to actually work on changing it for real, or to help in debugging it. Rename variables and methods once you figure out what they really mean, delete code that you don’t want to look at (or don’t think works), break complex conditional statements down, break long routines into smaller ones that you can get your head around.

Don't bother reviewing and testing all of these changes. The point is to move fast – this is a quick and dirty prototype to give you a view into the code and how it works. Learn from it and throw it away. Scratch refactoring also lets you test out different refactoring approaches and learn more about refactoring techniques. Michael Feathers recommends that you keep notes during this on anything that wasn’t obvious or that was especially useful, so that you can come back and do a proper job later - in small, disciplined steps, with tests.

What about “Large Scale” Refactoring?

You can get a big return in understandability and maintainability from making simple and obvious refactoring changes: eliminating duplication, changing variable and method names to be more meaningful, extracting methods to make code easier to understand and more reusable, simplifying conditional logic, replacing a magic number with a named constant, moving common code together.

There is a big difference between minor, inline refactoring like this, and more fundamental design restructuring – what Martin Fowler refers to as “Big Refactoring”. Big, expensive changes that carry a lot of technical risk. This isn’t cleaning up code and improving the design while you are working: this is fundamental redesign.

Some people like to call redesign or rewriting or replatforming or reengineering a system “Large Scale Refactoring” because technically you aren’t changing behavior – the business logic and inputs and outputs stay the same, it’s “only” the design and implementation that’s changing. The difference seems to be that you can rewrite code or even an entire system, and as long as you do it in steps, you can still call it “refactoring”, whether you are slowly Strangling a legacy system with new code, or making large-scale changes to the architecture of a system.

“Large Scale Refactoring” changes can be ugly. They can take weeks or months (or years) to complete, requiring changes to many different parts of the code. They need to be broken down and released in multiple steps, requiring temporary scaffolding and detours, especially if you are working in short Agile sprints. This is where practices like Branch by Abstraction come in to play, to help you manage changes inside the code over a long period of time.

In the meantime you have to keep working with the old code and new code together, making the code harder to follow and harder to change, more brittle and buggy - the opposite of what refactoring is supposed to achieve. Sometimes this can go on forever – the transition work never gets completed because most of the benefits are realized early, or because the consultant who came up with the idea left to go on to something else, or the budget got cut, and you’re stuck maintaining a Frankensystem.

This is Refactoring - That Isn't

Mixing this kind of heavy project work up with the discipline of refactoring-as-you-go is wrong. They are fundamentally different kinds of work, with very different costs and risks. It muddies up what people think refactoring is, and how refactoring should be done.

Refactoring can and should be folded in to how you write and maintain code – a part of the everyday discipline of development, like writing tests and reviewing code. It should be done quietly, continuously and implicitly. It becomes part of the cost of doing work, folded in to estimates and risk assessments. Done properly, it doesn’t need to be explained or justified.

Refactoring that takes a few minutes or an hour or two as part of a change is just part of the job. Refactoring that can take several days or longer is not refactoring; it is rewriting or redesigning. If you have to set aside explicit blocks of time (or an entire sprint!) to refactor code, if you have to get permission or make a business case for code cleanup, then you aren’t refactoring – even if you are using refactoring techniques and tools, you’re doing something else.

Some programmers believe it is their right and responsibility to make fundamental and significant changes to code, to reimagine and rewrite it, in the name of refactoring and for the sake of the future and for their craft. Sometimes redesigning and rewriting code is the right thing to do. But be honest and clear. Don’t hide this under the name of refactoring.

Good clarification / re-framing of refactoring -- a set of small-scale approaches used along the way to programming goals.

However the impetus of this post seems to have nothing to do with philosophical differences about what constitutes refactoring.

It sounds like this article was inspired by an intelligent young developer, eager to make her mark on the source code. A topic that deserves an article of its own, IMHO.

To me it is particularly interesting that intelligent young developers in the corporate world spend years ripping out their predecessors' work and marking their own code turf, and then bragging in resumes and blogs about the drastic changes they've made to corporate source code; whereas in the open source world intelligent young developers tend to shy away from large-scale code changes and re-designs.

Why do so many young intelligent developers love re-designing software so much? It is not universal, though it certainly is common.

Subscribe to this blog

About Me

I am an experienced software development manager, project manager and CTO focused on hard problems in software development and maintenance, software quality and security. For the last 15 years I have managed teams building and operating high-performance financial systems.
My special interest is how small teams can be most effective in building real software: high-quality, secure systems at the extreme limits of reliability, performance, and adaptability. Software that has to work, that is built right, and built to last.
I use this blog to explore ideas and problems in software development that are important to me. To reflect and to find new answers.