So I'm sitting down to a nice bowl of c# spaghetti, and need to add something or remove something... but I have challenges everywhere from functions passing arguments that doesn't make sense, someone who doesn't understand data structures abusing strings, redundant variables, some comments are red-hearings, internationalization is on a per-every-output-level, SQL doesn't use any kind of DBAL, database connections are left open everywhere...

Are there any tools or techniques I can use to at least keep track of the "functional integrity" of the code (meaning my "improvements" don't break it), or a resource online with common "bad patterns" that explains a good way to transition code? I'm basically looking for a guidebook on how to spin straw into gold.

I've never had to refactor something like this before, and I want to know if there's something like a guidebook or knowledgebase on how to do this sort of thing, finding common bad patterns and offering the best solutions to repair them. I don't want to just nuke it from orbit,

One file/feature at a time. I'd start, by adding a Data Access Layer. It will be fairly simple, and provide the most bang for the buck. Create some data access classes, then go through your code moving the queries into the appropriate class. Don't yet worry about combining similar queries, do that second. First is to get all the data access into ONE location. Then look for commonalities, and similar queries and refactor them.
–
CaffGeekDec 15 '11 at 16:47

If this is a hobby project without any paying users then this is an excellent opportunity to feel on your own body what a REAL rewrite actually is about. Just do it. Write it again. You will learn a LOT.
–
user1249Dec 15 '11 at 17:26

Working effectively with legacy code by Michael Feathers. It can feel a bit dated sometime, but the general idea of how to deal with code that is hard to maintain is well explained. (Michael Feathers actually recognizes the need for an update to this book, called "Brutal Refactoring")

All of this books emphase on placing a "security nest" before engaging in refactoring when possible. This means having tests that ensure that you don't break the original intent of the code you are refactoring.

+1 for Working effectively with legacy code. Unit tests are great - but 500 lines isn't in any useful way of thinking a "unit" - so it's not really unit-testable. Too damn much goes on inside such a function to be able to write a unit test. So write characterization tests that exercise some paths through your code, and then whittle away at it.
–
Carl ManasterApr 8 '11 at 14:21

Think of a "Unit" as one logical function of the code. That can be error checking, "happy path" validation, etc. Even if there is only one path through the 500 lines (no loops, conditionals, etc.) there are logical groups of functionality. You should have multiple tests per method.
–
Berin LoritschApr 8 '11 at 16:58

This is exactly what Martin Fowler talks about in his book Refactoring.

Basic summary:

Unit tests! First you want to establish tests for the existing code. Figure out what the inputs and expected outputs are and test them. Don't test to the code. Test to the spec — you don't want to reproduce bugs.

Refactor bits at a time, keeping tests passing as you go.

Keep in mind that if you have no tests at all or for large portions of code, a rewrite might actually be the most efficient choice.

This interview with Martin Fowler about the topics in the book sheds a lot of light on the subject (on unit tests, payoffs, finding bugs through refactoring, refactoring vs. rewriting).

Keep in mind that bad patterns are bad for a reason — it is a poor solution to the problem. That means that even if you've seen the same kind of bad code, it doesn't mean the problem was the same and the best solution is likely different for each case.

The odds this code example doesn't have a ton of bugs are slim. Testing to the spec makes sense. It would be interesting to the quality of the specs.
–
JeffOApr 8 '11 at 15:31

7

@Jeff O: And I'd bet you a nickel that there are no current specs. There may be some original specs, which are probably incomplete and badly written, but code like this doesn't usually come with updated specs.
–
David ThornleyApr 8 '11 at 16:10

David couldn't be more right: somewhere in the world there may exist a project with code this bad, yet with a solid set of specs, but I'd have to see it before I'd be willing to believe in its existence.
–
Carson63000Apr 8 '11 at 22:18

You'd have to create the spec, which is one reason why a rewrite is probably faster anyway.
–
NickCApr 8 '11 at 22:25

In my past experience, iteratively defining areas of functionality and extracting the relevant code into methods/functions will be the fastest way for you to get started. You can then extract code, put that code into a function, and then test to confirm that you haven't broken anything. Rinse and repeat until you're done or until you have to add new functionality. Refactoring iteratively will provide more flexibility by allowing you to take occasional breaks from refactoring, especially if you still need/want to add new code. Psychologically, an iterative approach will give you the satisfaction of seeing your codebase become more ordered, which will make it easier for you to complete this refactoring task.

I would also like to add that you might want to look and see if any of the functionality that was implemented is available in third-party libraries. When looking at my old code I facepalm over the large amount of boilerplate code I would write that was readily available in libraries or frameworks that I was unaware of at the time.
–
maple_shaft♦Dec 15 '11 at 16:48

It sounds like what you want is to have unit tests. Add test cases that cover the existing behavior, then when you refactor you can rely on your tests to alert you to any failures. Do this on a new branch in your source control as well, so that any accidents don't become catastrophic.

Wrap as much of your code into classes as you can, grouping it by functionality, but leaving it otherwise untouched for now. This will allow you to triage your code.

Create a database layer using PDO.

Convert your code to the MVC pattern, taking special care to weed out any PHP code that is needlessly generating HTML and transferring that to the view. Your controllers should determine which model and which view to load, and your models should contain everything else.

Refactor your remaining classes, one by one. Tackle the most pervasive classes first. Look for classes that do multiple things and split them up. Get rid of repeated code. Make single responsibility methods.

Add unit tests as you go along. This will make your code more stable over the long-haul. If you need to refactor it again in the future, you will then have a contract that specifies how the code is supposed to perform. PHPUnit is one test framework that you can use, but there are others as well.

I would start by covering as much of your code with automated tests as you can. It might be boring and tedious, but it lets you capture the current behavior of your system. Then, as you iteratively clean things up and rewrite/refactor parts of your codebase, you can feel more confident about not introducing a ton of regressions. You will also hopefully be able to introduce finer grained tests then you could before. This has always helped me in the past.

Yous should definitely start with writing unit tests, as others have mentioned. Unfortunately though, this code clearly needs to be nuked from existence. Now if it is working fine the way it is, maybe you do not need to, but any maintinece will quickly become very costly. This code appears to pretty much be the defintion of unmaintanable. But hey, with better design you can probably blow through a rewrite much quicker than the person who stumbled through it (I know cowboys can be quick coders, but this programmer was not cowboy... I think he might have literally been a cow).

If it is a web app, you can test the "output" with a little creativity -- use something like selenium on the browser side to make sure the UI bits still work and then check the database on the back-side to confirm that your update went through. Your black box is big, but you can at least sanity check your changes automatically and repeatably.

As others have said your main weapon here is unit tests. Regardless of what ever else you do you will need these.

As for tools to help refactoring, they exist, but they won't do large scale fixing for you. Things like Resharper will help move methods around, rename variables, extract interfaces etc, but I suspect you are going to have to do a lot of manual refactoring too.

That's not what refactoring is. Refactoring is breaking a piece of code down into its component parts (factors) in a way that improves code reuse and readability.

To improve terrible code is rewriting. So the question is how do you go about rewriting a whole application of bad code. The answer is that you start from scratch. Rewriting piecemeal is far more difficult and in the long run it will actually take longer to do properly.

If I am to rewrite it, am I better off to just jump in and rewrite it or sit down and plan everything out on paper first?
–
TheEnigmaMachineDec 15 '11 at 16:43

Paper. All but the most trivial development tasks should begin with design (planning). And in this case you have the advantage of learning from the mistakes you made the first time around.
–
Igby LargemanDec 15 '11 at 16:47

While you are right in the original meaning of those words, code refactoring nowadays is "altering of the code without changing its "external" behavior" and "rewriting" means typically rewriting from scratch (parts or whole code).
–
MaRDec 15 '11 at 16:53

@MaR: Yes, I'm perhaps being overly literal. I often see people referring to rewriting as refactoring, and that bugs me. In this case, judging by AgentKC's own account of the state of the code, it sounds like a scratch-rewrite would be advisable.
–
Igby LargemanDec 15 '11 at 16:59