
If you read articles, browse websites, read tech books or attend conferences, you have undoubtedly come across the term refactoring. There is a good chance you’ve already been refactoring yourself, whether you knew it or not. This article introduces the practice of refactoring: its basic definition, the reasons for doing it, and how to actually refactor your code. It also explains the difference between refactoring your code and rewriting it.

What is refactoring?

Refactoring is the process of changing small parts of your application to improve the application or its code without changing its behaviour. Wikipedia defines refactoring as follows:

Code refactoring is the process of changing a computer program’s internal structure without modifying its external behavior or existing functionality. This is usually done to improve code readability, simplify code structure, change code to adhere to a given programming paradigm, improve maintainability, or improve extensibility.

So far, this is the clearest definition of refactoring I’ve found. It is at least clearer (and shorter) than the definition by Martin Fowler, who can probably be seen as the godfather of refactoring.

The What

Let’s briefly analyse the above definition to get a clear view of refactoring.

Code refactoring is the process of changing a computer program’s internal structure without modifying its external behavior or existing functionality.

So if you have a given codebase, this means that you will start changing the codebase without actually changing what it does. It is this part that leads some people to think refactoring is useless and means you do the work twice (or three or four or more times). And they would be right, if the above were the full definition of refactoring. However, there is more…

The Why

But why would you do this? This is where the second part of the definition comes in. Let’s have a look…

This is usually done to improve code readability, simplify code structure, change code to adhere to a given programming paradigm, improve maintainability, or improve extensibility.

Now, here we have the actual reasons for refactoring. And given these, I think we have more than enough reason to assume refactoring is not necessarily a bad thing. Let’s dig into some reasons for refactoring a bit more.

Improve readability and maintainability

Even though readability and maintainability are two different things, they go hand in hand. Many developers have horror stories of PHP3 and PHP4 spaghetti code that they either wrote or had to maintain. In most cases, these projects were developed without a coding standard. Properly designing a project was also not a concept most PHP developers were familiar with. This led to projects that were too complex to understand properly and code that could not be followed easily. This type of unreadable code makes everyone’s life more difficult. Beyond simply being hard to read, comprehend and maintain, it makes bringing new developers up to speed harder as the project grows. Developers cannot contribute to the project quickly, and overall productivity drops. Having proper coding guidelines will make your code more readable and thus save time and money when integrating new developers into the project.

Once an application has been deployed to production and the maintenance phase starts, the project budget shrinks. During this phase, fewer developers look at the code on a regular basis, so fewer developers are familiar with it. Making sure your code is readable dramatically improves its maintainability. Optimizing the code for readability and maintainability ensures that less time is spent re-learning the code, leaving more focus for the actual changes.

Performance improvement

Not every application will focus on the speed of its response, but most public-facing web applications will need to give their performance some attention. Whereas readability and maintainability problems will be found during your development phase, performance problems may not show up until you’ve deployed your application to your production environment. At that point, removing the features that cause a performance hit is not an option unless absolutely necessary. Since refactoring is focused on altering code without changing the actual logic, it is well suited to performance improvements.
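As a small illustration of a performance-oriented refactoring, the sketch below replaces a repeated list scan with a hash lookup while returning exactly the same result. The function names uniqueTagsBefore() and uniqueTagsAfter() and the tag list are hypothetical and only serve this example; the sketch also assumes every tag is a string.

```php
<?php

/**
 * Before: in_array() rescans the result list for every tag.
 *
 * @param string[] $tags
 * @return string[] tags in first-seen order, without duplicates
 */
function uniqueTagsBefore(array $tags)
{
    $result = array();
    foreach ($tags as $tag) {
        if (!in_array($tag, $result, true)) {
            $result[] = $tag;
        }
    }
    return $result;
}

/**
 * After: same signature and same return value, but each tag is
 * checked against an array keyed by the tag instead of a list scan.
 *
 * @param string[] $tags
 * @return string[] tags in first-seen order, without duplicates
 */
function uniqueTagsAfter(array $tags)
{
    $seen = array();
    $result = array();
    foreach ($tags as $tag) {
        if (!isset($seen[$tag])) {
            $seen[$tag] = true;
            $result[] = $tag;
        }
    }
    return $result;
}

$tags = array('php', 'refactoring', 'php', 'testing', 'refactoring');

// Both versions produce the same list, so calling code does not change.
var_dump(uniqueTagsBefore($tags) === uniqueTagsAfter($tags)); // bool(true)
```

Because the signature and the returned list are identical, the existing unit tests for the "before" version can be run unchanged against the "after" version.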

Implementation or change of technologies

Another place where refactoring can be applied is the implementation or change of technologies. At one point or another you will probably need to change the technologies you use. For instance, your application needs to switch authentication from a MySQL database to an LDAP server. Without changing the actual logic of the authentication mechanism, you alter the part of the code responsible for authenticating a user so that it queries the LDAP server instead of executing database queries. The code that calls the authentication (which could be multiple calls all over your application) will not need to be changed to accommodate your change of technologies. This saves you a lot of time and also reduces the possibility of introducing bugs, inconsistencies or even outright malfunctions in (parts of) your application. This may not matter much for your personal weblog, but when you are working on a big commercial web application where every failed login or minute of downtime means missed income, the advantage is obvious.
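A minimal sketch of that idea follows. Everything in it except the User::authenticate($username, $password) call is an assumption made for illustration: the CredentialBackend interface, the DatabaseBackend and DirectoryBackend classes and the useBackend() hook are hypothetical names, and plain arrays stand in for the real MySQL table and LDAP directory so the example runs without either server.

```php
<?php

// Hypothetical contract: anything that can verify a username/password pair.
interface CredentialBackend
{
    public function verify($username, $password);
}

// Stand-in for the MySQL-backed lookup; an array replaces the table.
class DatabaseBackend implements CredentialBackend
{
    private $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function verify($username, $password)
    {
        return isset($this->rows[$username]) && $this->rows[$username] === $password;
    }
}

// Stand-in for the LDAP-backed lookup; entries are keyed by a distinguished name.
class DirectoryBackend implements CredentialBackend
{
    private $entries;

    public function __construct(array $entries)
    {
        $this->entries = $entries;
    }

    public function verify($username, $password)
    {
        $dn = 'uid=' . $username . ',ou=people,dc=example,dc=org';
        return isset($this->entries[$dn]) && $this->entries[$dn] === $password;
    }
}

class User
{
    private static $backend;

    // Configuration hook, meant to be called once during bootstrap.
    public static function useBackend(CredentialBackend $backend)
    {
        self::$backend = $backend;
    }

    // Unchanged public API: callers never learn which backend is active.
    public static function authenticate($username, $password)
    {
        return self::$backend->verify($username, $password);
    }
}

User::useBackend(new DatabaseBackend(array('test' => 'test')));
var_dump(User::authenticate('test', 'test')); // bool(true)

// Switching technology touches the bootstrap line only, not the callers.
User::useBackend(new DirectoryBackend(array('uid=test,ou=people,dc=example,dc=org' => 'test')));
var_dump(User::authenticate('test', 'test')); // bool(true)
```

The design choice here is to keep the technology-specific code behind one small interface, so the switch is a change of implementation rather than a change of API.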

Refactoring: The holy grail of change?

The question of course is: is refactoring the holy grail of change? Is it the ultimate way of altering your application? I’d say it is, but only in the right situation. There are plenty of examples where refactoring alone simply won’t accomplish the required changes. When you get to that point, it is time to start rewriting (parts of) your application. Wikipedia summarizes “Rewrite” as:

A rewrite in computer programming is the act or result of re-implementing a large portion of existing functionality without re-use of its source code. When the rewrite is not using existing code at all, it is common to speak of a rewrite from scratch. When instead only parts are re-engineered, which have otherwise become complicated to handle or extend, then it is more precise to speak of code refactoring.

The difference between refactoring and rewriting is simply: With refactoring you do not break your existing API, but rather change the implementation of parts of your API or extend it without breaking existing functionality. With rewriting, it is allowed to break your API, for instance by removing or changing the order of parameters, or even removing or replacing complete classes.
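The distinction can be sketched with three versions of one function. The function names below are hypothetical and the hard-coded credentials are just placeholder logic; the point is only which changes leave the signature alone and which do not.

```php
<?php

// Original version: the public signature is (username, password).
function authenticateOriginal($username, $password)
{
    if ($username == 'test' && $password == 'test') {
        return true;
    } else {
        return false;
    }
}

// Refactoring: same parameters, same return type, same results.
// Only the body changed, so no caller needs to be touched.
function authenticateRefactored($username, $password)
{
    return $username == 'test' && $password == 'test';
}

// Rewrite: the two parameters were replaced by a single array,
// so every existing call has to be found and updated.
function authenticateRewritten(array $credentials)
{
    return $credentials['username'] == 'test' && $credentials['password'] == 'test';
}

// The refactored version is a drop-in replacement for the original.
var_dump(authenticateOriginal('test', 'test') === authenticateRefactored('test', 'test')); // bool(true)

// The rewritten version needs a different call.
var_dump(authenticateRewritten(array('username' => 'test', 'password' => 'test'))); // bool(true)
```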

So how do you determine if you want to refactor or rewrite? My answer to this would be very clear: refactor in any situation where it is possible to do so, rewrite otherwise. Sometimes it may take some time to research your application and API to figure out if you must break something to implement changes, but refactoring is a much faster practice than rewriting, so the research time is well spent once you find out refactoring is possible.

The reason for my statement that refactoring is faster than rewriting should be obvious: since you don’t break the existing API, your changes will be limited to the part of the application you are actually changing. Given good unit test coverage for that part, refactoring should be nothing more than implementing your changes and running unit tests (possibly followed by small fixes to ensure the unit tests pass). If you rewrite, you will break your API. Breaking your API means you need to change every place where the rewritten code is directly or indirectly called, to ensure your application stays stable. Additionally, you will need to update your unit tests to cover the new API. All this usually takes more time than refactoring, but it may just be required for the change you want to implement.

Refactoring is therefore not the holy grail. It is not the ultimate change tool. As soon as your research shows that you cannot avoid breaking your existing API, do not hesitate to start planning your rewrite.

Requirements for successful refactoring

As I’ve mentioned earlier, not all situations are good for refactoring. There are definitely situations where refactoring won’t be enough for your required changes. However, you can set up your codebase to allow as much refactoring as possible. In other words: you need to write your initial code to accommodate later refactoring. Let’s have a look at some requirements to make this work well.

Codebase knowledge

First of all, to be able to do your refactoring well, you need to have a firm knowledge of your own codebase. This will reduce the time you need to research if refactoring is possible, but more importantly it will prevent unexpected side effects of your refactoring. With refactoring you need to be aware of what you are changing and what possible side effects it could have. Remember that the main focus in refactoring is that you change small units of code without affecting the existing API of that unit. A firm knowledge of your own codebase helps in ensuring that you don’t break your project’s code by changing those units.

Structured API

A good, well-structured API for your project makes refactoring easier. So even during your initial project setup, think about your API and consider what implications your design decisions will have for future extensibility and changes. Try to keep your units as small and as decoupled as possible. The fewer dependencies inside your project, the fewer headaches you will have when you need to change something: the less code depends on the piece you are changing, the easier it is to change it without other parts of the application breaking. So even with refactoring to back you up when code turns out to be incorrect or business requirements change, you still need to think about how you design your application.

Unit testing

Possibly the most important requirement for refactoring is the presence of unit tests. Unit tests are your safeguard against changes that break existing code. That is, well-written unit tests are that safeguard. Ensure that your unit tests cover as many situations as possible for your code. One important rule here is that you do not just cover the expected use of your code, but also test for unexpected use. Write tests in which you pass incorrect values or incorrectly cast values into your API, and make sure that even in those edge cases your API does what you want it to do and does not throw unexpected exceptions. Run your tests and make sure they pass. Now start refactoring, implement the changes you want to make, and after that run the tests again. If even one of your tests fails, your refactoring wasn’t successful, because you broke existing code. Even if it is just an edge case, it’s something you need to take care of.
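To give an idea of what testing for unexpected input can look like, here is a small sketch in plain PHP (no test framework, so it runs on its own). The list of edge cases is hypothetical, and the authenticate() method uses strict comparison, which is an assumption of this sketch, so that non-string input can never match.

```php
<?php

class User
{
    /**
     * @param string $username
     * @param string $password
     * @return boolean
     */
    public static function authenticate($username, $password)
    {
        // Strict comparison: only the exact strings are accepted.
        return $username === 'test' && $password === 'test';
    }
}

// Hypothetical edge cases: label => array(username, password, expected result).
$cases = array(
    'valid credentials'  => array('test', 'test', true),
    'empty strings'      => array('', '', false),
    'null values'        => array(null, null, false),
    'integer zero'       => array(0, 0, false),
    'boolean true'       => array(true, true, false),
    'array as username'  => array(array('test'), 'test', false),
    'surrounding spaces' => array(' test ', 'test', false),
);

// Collect the label of every case whose result differs from the expectation.
$failures = array();
foreach ($cases as $label => $case) {
    list($username, $password, $expected) = $case;
    if (User::authenticate($username, $password) !== $expected) {
        $failures[] = $label;
    }
}

echo count($failures) === 0
    ? "all edge cases pass\n"
    : 'failed: ' . implode(', ', $failures) . "\n";
```

Running the same list before and after a refactoring shows immediately whether an edge case started behaving differently.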

How to refactor

So now that we’ve covered what refactoring is, why we should refactor (or not) and some requirements, let’s start looking at some code. How do we go about refactoring? For this article, I will take a fairly simple example of an authentication class. We start off with a very simple User class that has a method used for authentication:

<?php

class User
{
    /**
     * authenticate a user
     *
     * @param string $username
     * @param string $password
     * @return boolean
     * @todo actually implement this
     */
    public static function authenticate($username, $password)
    {
        if ($username == 'test' AND $password == 'test')
        {
            return true;
        }
        else
        {
            return false;
        }
    }
}

Of course, I also have a unit test for this code to ensure that it works as it should:

<?php

require_once('./user.class.php');
require_once 'PHPUnit/Framework.php';

class UserTest extends PHPUnit_Framework_TestCase
{
    public function testCorrectUserCredentials()
    {
        $this->assertTrue(User::authenticate('test', 'test'));
    }

    public function testIncorrectUsername()
    {
        $this->assertFalse(User::authenticate('wrong', 'test'));
    }

    public function testIncorrectPassword()
    {
        $this->assertFalse(User::authenticate('test', 'wrong'));
    }
}

Running these tests assures me that my code works as it should.

Obviously, this is not a good way to do authentication. What happens if I want to add a user? So we’re going to refactor the authentication to start using a database. This means we need to connect to a database, query it, and return the boolean based on the result of this. After doing this, we end up with the below authenticate() method:

public static function authenticate($username, $password)
{
    mysql_connect('localhost', 'root', 'secret');
    mysql_select_db('refactoring');

    $sql = "SELECT `id` FROM `user`
            WHERE `username`='" . mysql_real_escape_string($username) . "'
            AND `password`='" . md5($password) . "'";
    $res = mysql_query($sql);

    if (mysql_num_rows($res) == 1)
    {
        return true;
    }
    else
    {
        return false;
    }
}

Now that we’ve made the changes, let’s run our unit tests to ensure everything still works:

Having ensured our code still works we can rest safely. Our application won’t break because of this.

But we can’t rest long. In the meantime, we found out about Doctrine, an Object Relational Mapping framework that will make working with data from the database much easier by having objects represent the records and classes represent the tables. After sitting down for a bit to work on our User class, we end up with the following code:

Adding external libraries into your API is something refactoring allows, and it is what we did here. However, this last step was not refactoring, because we have possibly altered the behaviour of our code. Because User now inherits from Doctrine_Record and we use Doctrine_Query without a try/catch block, the code could throw an exception in certain situations, which is something the code calling this authenticate() method might not be prepared for.
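One way to keep such a step within the bounds of refactoring is to catch the exception inside authenticate(), so the method keeps returning a boolean and never throws. The sketch below only illustrates that idea: UserLookup and credentialsMatch() are hypothetical stand-ins for a library call that may throw, not Doctrine’s actual API.

```php
<?php

// Hypothetical stand-in for an ORM-backed lookup. It only mimics a
// library call that may throw; it is not Doctrine's API.
class UserLookup
{
    private $available;

    public function __construct($available)
    {
        $this->available = $available;
    }

    /**
     * @throws RuntimeException when the data source cannot be reached
     */
    public function credentialsMatch($username, $password)
    {
        if (!$this->available) {
            throw new RuntimeException('data source unavailable');
        }
        return $username === 'test' && $password === 'test';
    }
}

class User
{
    public static $lookup;

    /**
     * @param string $username
     * @param string $password
     * @return boolean
     */
    public static function authenticate($username, $password)
    {
        try {
            return self::$lookup->credentialsMatch($username, $password);
        } catch (Exception $e) {
            // Keep the documented contract: a failed lookup is
            // reported as "not authenticated" instead of an exception.
            return false;
        }
    }
}

User::$lookup = new UserLookup(true);
var_dump(User::authenticate('test', 'test')); // bool(true)

User::$lookup = new UserLookup(false);
var_dump(User::authenticate('test', 'test')); // bool(false)
```

Whether swallowing the exception is desirable is a design decision, and logging it before returning false would usually be wise. The alternative is to let it propagate and update the calling code, which is the point where the change stops being a refactoring.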

Tips and tricks

Refactoring code is hard, but there are many things you can do to make refactoring easier and to catch mistakes as they happen. Here are some tips and tricks from my experience of refactoring and rewriting that might be useful:

If you don’t have the code you plan to change covered by unit tests yet, write the unit tests before starting your refactoring. This will help you in preventing bugs and mistakes.

Documentation is your friend. It will help you determine whether you are refactoring or rewriting. If you use phpDoc to document all your methods, then there is a simple rule of thumb you can follow:

Once you need to start changing your method’s phpDoc for tags like @param, @return or @throws, you’re not refactoring anymore but rewriting.

In the above example where I introduced Doctrine, you’ll notice that I’d have to introduce an @throws occurrence. This is where refactoring makes way for rewriting. There is nothing wrong with rewriting your code, however once you start rewriting your code, you need to take into account that you may need to also change the calling code.

This article does not just apply to PHP. It can apply to other languages such as Java, Python and Javascript, and could even apply to HTML and CSS.

Don’t refactor just for the sake of refactoring. If you’re working on something and encounter a piece of code that may benefit from refactoring, go ahead and refactor it. Usually these changes don’t take a lot of time, but your codebase will benefit from them. Don’t forget to run the unit tests after your changes though!

Don’t fully trust the refactoring abilities of IDEs. Even though they may be useful at times, they are not perfect.

So, will you use refactoring?

Over the course of this article I’ve expanded on several reasons for refactoring. The core of my message is that refactoring will save you time and money. Given the small scope that refactoring has as opposed to rewriting (parts of) your application, you will spend less time on refactoring than when you’d rewrite. And time is money, so you save money. But there is more. A more performant website, for instance, will please more visitors. Especially in e-commerce this is something that will repay you with more return customers, because a website that is pleasant to visit will leave a positive feeling for a visitor. So even very small changes may make the difference between you and your competitors. Having a slow website does not necessarily mean you need to rewrite everything to get it back to normal. However, I’ve also pointed out that you need to be realistic about this. If refactoring isn’t enough, take a step back and see if rewriting will solve your problem.

I’ve also given you some important requirements for successful refactoring. I’m sure people are able to refactor without these requirements, but your refactoring experience and the resulting code will be much better with these requirements in mind. So do adhere to these if you have the opportunity to do so.

I’ve also shown you an example of refactoring that I hope clears up the approach to refactoring and the difference between refactoring and rewriting. And lastly, I’ve handed you some practical tips and tricks for refactoring. Now, go have a look at your own code, and see if you can put it all into practice.

In my opinion, changing phpDoc is not rewriting but still refactoring, since from an external point of view the system still behaves the same way, even though the single units under test don’t. Half of the refactorings in Fowler’s catalog involve changing phpDoc, parts of classes and method signatures. This means the tests should also be updated to reflect the new interface under test.
However, a nice article; I think software engineering practices should be evangelized as far as possible in the PHP community. Being a dynamic language makes it suffer more from hacks and design issues.

http://www.leftontheweb.com/ stefan

Hi Giorgio,

Thanks for the nice words. There may be some edge cases where a change can be deemed refactoring and not rewriting, but any time you change the signature in whatever way (even adding new optional parameters), you really need to be conscious of your changes and how they might affect other code. To me, consciously changing a signature is a grey area which you could deem both refactoring and rewriting.