Refactoring is a disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior. Its heart is a series of small behavior preserving transformations. Each transformation (called a 'refactoring') does little, but a sequence of transformations can produce a significant restructuring. Since each refactoring is small, it's less likely to go wrong. The system is also kept fully working after each small refactoring, reducing the chances that a system can get seriously broken during the restructuring.

What is "external behaviour" in this context? For example, if I apply the Move Method refactoring and move a method to another class, it looks like I am changing external behaviour, doesn't it?

So I'm interested in figuring out at what point a change stops being a refactoring and becomes something more. The term "refactoring" may be misused for larger changes: is there a different word for those?

Update. A lot of interesting answers about interface, but wouldn't move method refactoring change the interface?

If existing behavior sucks or is incomplete, then amend it, delete / rewrite it. You might not be re-factoring then, but who cares what the name is if the system will (right?) become better as the result of that.
– Job, Aug 15 '11 at 3:27

Your supervisor may care if you were given permission to refactor and you did a rewrite.
– JeffO, Aug 15 '11 at 12:29

The boundary of refactoring is the unit tests. If you have a specification expressed by them, is anything you change that doesn't break the tests a refactoring?
– George, Aug 18 '11 at 18:59

13 Answers

"External" in this context means "observable to users". Users may be humans in case of an application, or other programs in case of a public API.

So if you move method M from class A to class B, and both classes are deep inside an application, and no user can observe any change in the behaviour of the app due to the change, then you can rightly call it refactoring.
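A minimal sketch of that situation, in Python. All names here (`ABefore`, `BBefore`, `m`, `summary_before` and so on) are hypothetical and chosen only for illustration; the point is that the interface of the two internal classes changes while the output a user sees does not.

```python
# --- Before the move: m lives on A, although it only uses B's data ---
class BBefore:
    def __init__(self, price, quantity):
        self.price = price
        self.quantity = quantity


class ABefore:
    def __init__(self, b):
        self.b = b

    def m(self):
        return self.b.price * self.b.quantity


def summary_before(price, quantity):
    # The only thing a user of the app ever sees is this string.
    return f"Total: {ABefore(BBefore(price, quantity)).m()}"


# --- After the move: m lives on B, so A's interface has changed ---
class BAfter:
    def __init__(self, price, quantity):
        self.price = price
        self.quantity = quantity

    def m(self):
        return self.price * self.quantity


class AAfter:
    def __init__(self, b):
        self.b = b


def summary_after(price, quantity):
    # Internal call site updated; the user-visible output is unchanged.
    return f"Total: {AAfter(BAfter(price, quantity)).b.m()}"


print(summary_before(3, 4) == summary_after(3, 4))  # prints True
```

If `ABefore.m` had been called by code outside the application, the same move would break those callers, which is the case discussed next.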

If, OTOH, some other higher level subsystem/component changes its behaviour or breaks due to the change, that is indeed (usually) observable to users (or at least to sysadmins checking logs). Or if your classes were part of a public API, there may be 3rd party code out there which depends on M being part of class A, not B. So neither of these cases is refactoring in the strict sense.

there is a tendency to call any code rework as refactoring which is, I guess, incorrect.

Indeed, it is a sad but expected consequence of refactoring becoming fashionable. Developers have been doing code rework in an ad hoc manner for ages, and it is certainly easier to learn a new buzzword than to analyse and change ingrained habits.

So what is the right word for reworks which change external behaviour?

I would call it redesign.

Update

A lot of interesting answers about interface, but wouldn't move method refactoring change the interface?

Of what? The specific classes, yes. But are these classes directly visible to the outside world in any way? If not - because they are inside your program, and not part of the external interface (API / GUI) of the program - no change made there is observable by external parties (unless the change breaks something, of course).

I feel that there is a deeper question beyond this: does a specific class exist as an independent entity by itself? In most cases, the answer is no: the class only exists as part of a larger component, an ecosystem of classes and objects, without which it can't be instantiated and/or is unusable. This ecosystem does not only include its (direct/indirect) dependencies, but also other classes / objects which depend on it. This is because without these higher level classes, the responsibility associated with our class may be meaningless/useless to the users of the system.

E.g. in our project which deals with car rentals, there is a Charge class. This class has no use to the users of the system by itself, because rental station agents and customers can't do much with an individual charge: they deal with rental agreement contracts as a whole (which include a bunch of different kinds of charges). The users are mostly interested in the sum total of these charges, that they are to pay in the end; the agent is interested in the different contract options, the length of the rental, the vehicle group, insurance package, extra items etc. etc. selected, which (via sophisticated business rules) govern what charges are present and how the final payment is calculated out of these. And country representatives / business analysts care about the specific business rules, their synergy and effects (on the income of the company, etc.). A single charge by itself has no meaning without the bigger picture.

Recently I refactored this class, renaming most of its fields and methods (to follow the standard Java naming convention, which was totally neglected by our predecessors). I also plan further refactorings to replace String and char fields with more appropriate enum and boolean types. All this will certainly change the interface of the class, but (if I do my job correctly) none of it will become visible to the users of our app. None of them cares about how individual charges are represented, even though they surely know the concept of a charge. I could have selected as an example a hundred other classes that do not represent any domain concept, and so are even conceptually invisible to the end users, but I thought it would be more interesting to pick an example where there is at least some visibility at the concept level. This shows nicely that class interfaces are only representations of domain concepts (at best), not the real thing*. The representation can be changed without affecting the concept. And users only have and understand the concept; it is our task to do the mapping between concept and representation.

*And one can easily add that the domain model, which our class represents, is itself only an approximate representation of some "real thing"...
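A rough sketch of what such a representation change might look like, written in Python rather than the Java of the original project. The field names (`chrg_typ`, `tax_flg`, `amt`), the `ChargeType` members, and the invoice-line format are all assumptions made up for this illustration; the real Charge class is not shown in the answer.

```python
from dataclasses import dataclass
from enum import Enum


# --- Before: cryptic names, string codes and a "Y"/"N" flag (hypothetical) ---
@dataclass
class ChargeBefore:
    chrg_typ: str  # e.g. "INS" or "FUEL"
    tax_flg: str   # "Y" or "N"
    amt: float


def invoice_line_before(c):
    tax = " (taxable)" if c.tax_flg == "Y" else ""
    return f"{c.chrg_typ}: {c.amt:.2f}{tax}"


# --- After: conventional names, an enum and a boolean ---
class ChargeType(Enum):
    INSURANCE = "INS"
    FUEL = "FUEL"


@dataclass
class Charge:
    charge_type: ChargeType
    taxable: bool
    amount: float


def invoice_line(c):
    tax = " (taxable)" if c.taxable else ""
    return f"{c.charge_type.value}: {c.amount:.2f}{tax}"


old = invoice_line_before(ChargeBefore("INS", "Y", 12.5))
new = invoice_line(Charge(ChargeType.INSURANCE, True, 12.5))
print(old == new)  # prints True
```

Every caller of the class has to be updated, so its interface has clearly changed, yet the line that ends up on the customer's contract is identical.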

Interesting - in your example, class A is "an existing body of code", and if method M is public then A's external behavior is being changed. So you could probably say that class A is redesigned, whereas the overall system is being refactored.
– saus, Aug 15 '11 at 1:25

I like observable to users. That's why I wouldn't say unit tests breaking are a sign, but rather end to end or integration tests would be a sign.
– Andy Wiesendanger, Aug 22 '11 at 16:40

External simply means the interface, in the literal sense of the word. Consider a cow as an example. As long as you feed it some vegetables and get milk as the return value, you don't care how its internal organs work. Now if God changes the cow's internal organs so that its blood becomes blue, then as long as the entry point and exit point (mouth and milk) don't change, it can be considered refactoring.

To me, refactoring has been most productive / comfortable when the boundaries were set by tests and/or by formal specification.

These boundaries are sufficiently rigid to make me feel safe knowing that if I occasionally cross, it will be detected soon enough so that I won't have to roll-back a lot of changes to recover. On the other hand, these give sufficient leeway to improve code without worrying about changing irrelevant behavior.

The thing I especially like is that these kinds of boundaries are adaptive, so to speak. I mean, 1) I make the change and verify that it complies with the spec/tests. Then, 2) it is passed to QA or user testing - note that it may still fail here because something is missing in the spec / tests. OK, if 3a) testing passes, I'm done, fine. Otherwise, if 3b) testing fails, then I 4) roll back the change and 5) add tests or clarify the spec so that this mistake won't be repeated next time. Note that no matter whether testing passes or fails, I gain something - one of code / tests / spec gets improved - and my efforts don't turn into a total waste.

As for other kinds of boundaries - so far, I haven't had much luck.

"Observable to users" is a safe bet if one has a discipline to follow it - which to me somehow always involved much effort in analyzing existing / creating new tests - maybe too much effort. Another thing I dislike about this approach is that blindly following it may turn out to be too restrictive. - This change is prohibited because with it loading data will take 3 sec instead of 2. - Uhm well how about checking with users / UX expert whether this is relevant or not? - No way, any change in observable behavior is prohibited, period. Safe? you bet! productive? not really.

Another one I tried is to keep code logic (the way I understand it when reading). Except for the most elementary (and typically not very fruitful) changes, this one was always a can of worms... or should I say a can of bugs? I mean regression bugs. It is just too easy to break something important when working with spaghetti code.

The best way to define "external behaviour," in this context, may be "test cases."

If you refactor the code and it continues to pass the test cases (defined before the refactoring), then the refactoring has not changed the external behaviour. If one or more test cases fail, then you have changed the external behaviour.

At least, that is my understanding of the various books published on the topic (e.g., Fowler's).
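To make that criterion concrete, here is a small sketch in Python. The function names, the rental-total calculation, and the test cases are all hypothetical; the idea is only that a fixed set of cases, written before the change, decides whether a rewrite counts as a refactoring.

```python
def rental_total_old(days, daily_rate, extras):
    # Original, clumsy implementation.
    total = 0
    for _ in range(days):
        total = total + daily_rate
    for extra in extras:
        total = total + extra
    return total


def rental_total_refactored(days, daily_rate, extras):
    # Same behaviour, simpler structure.
    return days * daily_rate + sum(extras)


def rental_total_discounted(days, daily_rate, extras):
    # Not a refactoring: a 10% discount changes the result.
    return days * daily_rate * 0.9 + sum(extras)


# Test cases defined BEFORE the refactoring: (arguments, expected total).
CASES = [
    ((3, 40, [10, 5]), 135),
    ((0, 40, []), 0),
    ((1, 25, [2]), 27),
]


def passes_all(impl):
    return all(impl(*args) == expected for args, expected in CASES)


print(passes_all(rental_total_old))         # prints True
print(passes_all(rental_total_refactored))  # prints True
print(passes_all(rental_total_discounted))  # prints False
```

The verdict is only as good as the cases: a behaviour that no test exercises can change without this check noticing.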

The boundary is the line between those who develop, maintain, and support the project and those who are its users. To the external world, the behaviour looks the same, whereas the internal structures behind the behaviour have changed.

So it should be OK to move functions between classes as long as they are not the ones that the users see.

As long as the code rework does not change external behaviours, add new functions or remove existing functions, I guess it is OK to call the rework a refactoring.

Disagree. If you replace your entire data access code with NHibernate, it doesn't change external behaviour, but it does not follow Fowler's "disciplined technique" either. This would be reengineering, and calling it refactoring hides the risk involved.
– pdr, Aug 14 '11 at 23:50

With all due respect, we must remember that the users of a class are not the end-users of the applications that are built with the class, but rather the classes which are implemented by utilizing - either calling or inheriting from - the class being refactored.

When you say that "external behavior should not change" you mean that as far as the users are concerned, the class behaves exactly as it did before. It may be that the original (un-refactored) implementation was a single class, and the new (refactored) implementation has one or more super-classes upon which the class is built, but the users never see the inside (the implementation); they only see the interface.

So if a class has a method called "doSomethingAmazing", it doesn't matter to the user whether that is implemented by the class they are referring to, or by a superclass upon which that class is built. All that matters to the user is that the new (refactored) "doSomethingAmazing" has the same result as the old (unrefactored) "doSomethingAmazing".
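A short Python sketch of that idea. The class names are hypothetical, and `do_something_amazing` is simply the snake_case spelling of the method named above; callers keep calling the same method on the same class, even though the implementation has been pulled up into a superclass.

```python
# --- Before: the class implements the method itself ---
class WidgetBefore:
    def do_something_amazing(self):
        return "amazing"


# --- After: the implementation lives in a new superclass ---
class BaseWidget:
    def do_something_amazing(self):
        return "amazing"


class WidgetAfter(BaseWidget):
    pass


# A client of the class cannot tell the two apart.
print(WidgetBefore().do_something_amazing() == WidgetAfter().do_something_amazing())  # prints True
```

Only by inspecting the class itself (for example `WidgetAfter.__dict__`) would anyone notice where the method is defined now.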

However, what is called refactoring in many cases isn't true refactoring, but perhaps a reimplementation that is done to make the code easier to modify in order to add some new feature. So in this latter case of (pseudo-)refactoring, the new (refactored) code actually does something different, or perhaps something more, than the old.

What if a windows form used to pop up a dialog "Are you sure that you wanted to press the OK button?" and I decided to remove it because it accomplishes little good and annoys the users, then have I re-factored the code, re-designed it, amended it, de-bugged it, other?
– Job, Aug 15 '11 at 3:29

@job: you have changed the program to meet new specs.
– jmoreno, Aug 15 '11 at 4:12

IMHO you may be mixing up different criteria here. Rewriting code from scratch is indeed not refactoring, but this is so regardless of whether it changed external behaviour or not. Also, if changing a class interface were not refactoring, how come Move Method et al. exist in the catalog of refactorings?
– Péter Török, Aug 15 '11 at 7:43

@Péter Török - It depends entirely on what you mean by "changing a class interface", because in an OOP language that implements inheritance, the interface of a class includes not just what is implemented by the class itself but also what is implemented by all of its ancestors. Changing a class interface means removing/adding a method to the interface (or changing the signature of a method, i.e. the number and type of parameters that are passed). Refactoring changes only who responds to the method: the class or a superclass.
– Zeke Hansell, Aug 24 '11 at 17:26

IMHO - this entire question may be too esoteric to be of any useful value to programmers.
– Zeke Hansell, Aug 24 '11 at 17:27

By "external behaviour" he is primarily talking about the public interface, but this also encompasses outputs/artifacts of the system. (I.e., if you have a method that generates a file, changing the format of the file would be changing the external behaviour.)
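As a hedged illustration of the file-format point, here is a Python sketch with made-up function names. The exporters return the file content as a string so the comparison stays simple; the first rewrite keeps the output byte-for-byte identical, while the second changes the delimiter and therefore the external behaviour.

```python
def export_original(rows):
    # Original implementation: builds the CSV text line by line.
    out = ""
    for row in rows:
        out += ",".join(str(value) for value in row) + "\n"
    return out


def export_refactored(rows):
    # Restructured internals, identical output.
    return "".join(",".join(map(str, row)) + "\n" for row in rows)


def export_new_format(rows):
    # Different delimiter: consumers parsing the file would break.
    return "".join(";".join(map(str, row)) + "\n" for row in rows)


rows = [(1, "a"), (2, "b")]
print(export_original(rows) == export_refactored(rows))  # prints True
print(export_original(rows) == export_new_format(rows))  # prints False
```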

e: I'd consider the "move method" a change to external behaviour. Bear in mind here that Fowler is talking about existing code bases that have been released into the wild. Depending on your situation you may be able to verify that your change does not break any external clients and proceed on your merry way.

e2: "So what is the right word for reworks which change external behaviour?" -- API Refactoring, Breaking Change, etc... it's still refactoring, it's just not following the best practices for refactoring a public interface that is already in the wild with clients.

@kekela, but it is still unclear to me where this "refactoring" thing ends
– Idsa, Aug 15 '11 at 18:06

@idsa According to the definition you've posted, it ceases being a refactoring the minute you change a public interface. (moving a public method from one class to another would be an example of this)
– user29776, Aug 15 '11 at 18:35

"Move method" is a refactoring technique, not a refactoring by itself. A refactoring is the process of applying several refactoring techniques to classes. Therefore, when you say "I applied 'move method' to a class", you do not actually mean "I refactored (the class)"; you actually mean "I applied a refactoring technique to that class". Refactoring, in its purest meaning, is applied to design or, more specifically, to some part of the application design that can be viewed as a black box.

You could say that "refactoring" used in the context of classes means "refactoring technique", thus "move method" does not break the definition of refactoring-the-process. By the same token, "refactoring" in the context of design does not break existing features in the code; it only "breaks" the design (which is its purpose anyway).

In conclusion, "the boundary" mentioned in the question is crossed if you confuse (mix :D) refactoring-the-technique with refactoring-the-process.

Read it several times, but still don't get it
– Idsa, Aug 18 '11 at 18:17

What part did you not understand? (There is refactoring-the-process, to which your definition applies, and refactoring-the-technique (in your example, move method), to which the definition does not apply; therefore, move method does not break the definition of refactoring-the-process, or does not cross its boundaries, whatever they are.) I'm saying the concern you have should not exist for your example. The boundary of refactoring is not something fuzzy. You are just applying a definition of something to something else.
– Belun, Aug 19 '11 at 10:47

If you talk about factoring numbers, then you're describing the group of integers that, when multiplied together, equal the starting number. If we take this definition of factoring and apply it to the programming term refactor, then refactoring would be breaking a program down into the smallest logical units, such that when they are run as a program, they produce the same output (given the same input) as the original program.
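The arithmetic half of this analogy can be checked directly. The sketch below is in Python, and the helper name `prime_factors` is my own; it breaks a number into its smallest multiplicative units and confirms that multiplying them back together reproduces the starting number.

```python
from math import prod


def prime_factors(n):
    """Return the prime factors of n (n >= 2) in ascending order."""
    factors = []
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        # Whatever remains is itself prime.
        factors.append(n)
    return factors


factors = prime_factors(84)
print(factors)        # prints [2, 2, 3, 7]
print(prod(factors))  # prints 84
```

The "same product" check plays the role that "same output for the same input" plays for a refactored program.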

People usually regard algorithms as more abstract than the programs
that implement them. The natural way to formalize this idea is that
algorithms are equivalence classes of programs with respect to a
suitable equivalence relation. We argue that no such equivalence
relation exists.

So, don't go too far, or you won't have any confidence in the result. On the other hand, experience dictates that you can often replace one algorithm with another, and get the same answers, sometimes faster. That's the beauty of it, eh?

So any change to the system that does not affect non-pigs can be considered to be refactoring.

Changing a class interface is a non-issue if the class is only used by a single system that is built and maintained by your team. However, if the class is a public class in the .NET Framework that is used by every .NET programmer, it is a very different matter.