Michael Stal on Architecture Refactoring

Bio: Michael Stal is a Principal Engineer at Siemens Corporate Research and Technology, where he and his team are responsible for research and customer projects. His main research areas include Software Architecture and Distributed Systems. Michael is co-author of the Pattern-Oriented Software Architecture series. He is also a frequent speaker at various conferences and the author of several articles.

It is interesting to compare software architecture refactoring with other kinds of refactoring, because if you read the refactoring literature today you will see that most of it focuses on code refactoring. Code refactoring basically means you take some structures and change them without in any way touching the semantics or the behavior; the functionality should be left unchanged. But consider software architecture: many systems start with a very clean and concise architecture, but then it gets really messed up by different kinds of hacks and corrections, and you see all of these patches from the big vendors, for operating systems for instance. After some time all of these software architectures suffer from what I call "design erosion": the original architecture vision is hardly visible and there are a lot of different problems; if you touch one thing, you get problems in other things.

If you have one error in this place, errors will pop up in other places, so you'll have a lot of problems. You will have to deal with things like performance issues, because whenever you change something performance might be affected. So there are a lot of different consequences of unsystematic and uncontrolled changes to a software architecture. That brought me to the point that we should do something like refactoring not only for the implementation but also for other kinds of software artifacts. You could apply the same kind of refactoring approach, the same refactoring patterns, not only to implementations like Java or C# programs, but also to improve your UML artifacts, your design models, your DSLs, test plans, whatever.

So refactoring, from my viewpoint, is a much more general approach than something that is only applicable to source code. You can also make it available for architecture, and that is the reason why I came up with the architecture refactoring tutorial in the first place, because it is something you as a software architect experience all the time. Whenever you come to a project, and especially if it's an agile project, you will see that there are changes, and the changes do not only affect the implementation; they also affect the architecture. So this issue is very important in an agile environment. Basically you follow an incremental approach, and in each increment you do not only refine your architecture, you also have to do some gardening activities where you get rid of deficiencies, and that is basically where architecture refactoring finds its place.

It was quite obvious from my viewpoint, because if you think about it, refactoring is something people already do in their current software architecture activities; they just didn't call it that. It was basically the code refactoring part which was really visible in the literature. But refactoring, from my viewpoint, is basically a pattern: you have a problem, with your code for instance, you have a context in which that problem occurs, and you want a proven solution for that problem. That is exactly what a refactoring pattern is about. It's a transformation pattern which helps you to improve the architecture quality or the code quality or whatever quality you are interested in. So it was really obvious from my viewpoint that it should be a refactoring.

Sure. Maybe one very simple example which we experienced in a Siemens project: we had a warehouse management system, and the team came up with a specific abstraction which they called abstract storage. An abstract storage could be things like a dump, a bin, a door; anything which can contain things or entities would be an abstract storage. So we had this very nice abstraction hierarchy. Then one of the engineers had the idea: "Ok, we should also consider things like transport ways, because we have to transport goods from one place to another". What he basically did was come up with an additional hierarchy called "transport ways", consisting of different kinds of equipment and different kinds of transportation means.

It ended up in quite a mess, because all of a sudden you had to deal with two different hierarchies, one related to abstract storage and the other related to transport ways, and this made the architecture really complex. The question, of course, was how to get rid of this complexity, and the way to the solution was to ask: "Do I really need this additional abstraction? Do I really need to consider a transport way to be a transport way, or can I look at it a little bit differently?" That is exactly what we did. We figured out that a transport way is also a kind of abstract storage: it's a storage used for transporting goods from one place to another, but in the end it's a storage. So why not get rid of the transport way abstraction and just put the equipment abstractions directly under the abstract storage abstraction?

This is an example of a refactoring which could be described as getting rid of additional abstraction layers in order to strive for minimalism and simplicity: remove everything which is not required anymore. If you look at the software architecture qualities you are striving for, simplicity is one of them, and something is simple if you cannot remove anything more from it without affecting its behavior or semantics. That is exactly what simplicity is all about, and this kind of refactoring helps to make your application simpler, but not simplistic. So that is one example of an architecture refactoring: get rid of additional or superfluous abstraction layers.
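A minimal Python sketch of the refactored hierarchy may make the idea concrete. The interview names only the "abstract storage" and "transport way" abstractions; the class names `Bin` and `Conveyor`, the capacity model and the `deliver` method are assumptions made purely for illustration, not details of the Siemens system.

```python
class AbstractStorage:
    """Anything that can hold goods (capacity model is an assumption)."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.items = []

    def store(self, item):
        """Accept the item if there is room; report whether it was stored."""
        if len(self.items) >= self.capacity:
            return False
        self.items.append(item)
        return True


class Bin(AbstractStorage):
    """A static storage location (hypothetical example class)."""


class Conveyor(AbstractStorage):
    """Formerly part of a separate 'transport way' hierarchy.

    After the refactoring it is just a storage that holds goods
    while they move from one place to another.
    """

    def __init__(self, name, capacity, target):
        super().__init__(name, capacity)
        self.target = target  # the storage the goods are moved to

    def deliver(self):
        """Move as many items as fit into the target; return the count."""
        moved = 0
        for item in list(self.items):
            if self.target.store(item):
                self.items.remove(item)
                moved += 1
        return moved
```

Because `Conveyor` is now an `AbstractStorage`, code that stores or counts goods needs to know only one hierarchy instead of two.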

Sure. One of the most typical examples: you are a software architect in a project and, as I said before, normally you try to be very elaborate and sophisticated and to make changes only in a very systematic way. But of course that's not the way it works in practice, so after some time you get situations where your architecture gets messy. For instance, you get dependency cycles, and these dependency cycles are one of the most obvious problems: all of a sudden your subsystem A depends on a subsystem B, which depends on a subsystem C, which in turn depends on your original subsystem A. That makes the system less testable, because if you have to test A, you also need to touch B, C and so on.

If you want to change something in A and you have a dependency on B, you also have to change something in B, and all of these problems are really obvious. Another architecture refactoring, which is applicable to exactly this kind of situation, is how to break dependency cycles. One solution could be simply inverting one of the dependencies: instead of having a dependency from A to B, try to come up with a dependency from B to A in order to break the cycle. Or maybe the dependencies come from one component which you can split into two components, because you have mixed up different responsibilities in the same subsystem and got the dependency cycle for exactly that reason.

So in order to get rid of this kind of dependency cycle you could split up your subsystem into different subsystems, each of which is responsible for only one kind of functionality, instead of mixing up all the different responsibilities in one place, contrary to separation of concerns. That is another way to get rid of the dependencies. A third example could be dependency injection: instead of having a component be responsible for creating, initializing or locating other things, you could come up with a container or something similar at runtime that takes care of all of these dependencies on your behalf. That is exactly how dependency cycles can be resolved: by breaking one of the dependencies, you break the cycle.
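The first and third options, inverting a dependency and injecting it from outside, can be sketched in a few lines of Python. Everything named here (`Inventory`, `ReorderNotifier`, `StockListener`) is a hypothetical example, not something from the interview. The assumed starting point is a cycle in which the inventory calls the notifier directly and the notifier queries the inventory in return.

```python
from typing import Iterable, Protocol


class StockListener(Protocol):
    """Interface owned by the inventory side (hypothetical name)."""

    def stock_changed(self, item: str, quantity: int) -> None: ...


class Inventory:
    """Knows only the StockListener interface, never a concrete listener."""

    def __init__(self, listeners: Iterable[StockListener]) -> None:
        self._listeners = list(listeners)  # injected from outside
        self._stock = {}

    def quantity(self, item: str) -> int:
        return self._stock.get(item, 0)

    def add(self, item: str, amount: int) -> None:
        self._stock[item] = self.quantity(item) + amount
        for listener in self._listeners:
            listener.stock_changed(item, self._stock[item])


class ReorderNotifier:
    """Receives the data it needs in the callback, so it no longer
    has to call back into Inventory; the cycle is gone."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.messages = []

    def stock_changed(self, item: str, quantity: int) -> None:
        if quantity < self.threshold:
            self.messages.append(f"reorder {item} ({quantity} left)")


# Wiring happens in one place, playing the role of the "container".
notifier = ReorderNotifier(threshold=5)
inventory = Inventory([notifier])
inventory.add("bolt", 3)   # below threshold: a message is recorded
inventory.add("bolt", 4)   # total is now 7: no message
```

In this sketch the only remaining dependency points from the notifier to an interface defined next to the inventory, so each class can be tested with a stub on the other side.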

Normally what we suggest is to have a kind of agile process as soon as you start to develop your architecture. That means you come up with a baseline architecture which consists of the coarse-grained artifacts and their relationships, and this architecture should be very stable. It basically consists of domain-specific objects and infrastructure parts, and it also covers the most important operational qualities. From this point on you change that architecture incrementally. When you start to incrementally extend and refine your architecture, we always recommend that in each increment, before adding new things, you try to get rid of problems and deficiencies, and this is exactly where architecture refactoring takes place. So if you see something you would like to refine or improve in one of these increments, try to find out whether there are architecture refactorings you should or could apply to the problem, and improve the architecture using refactoring before you extend it with additional things.

So each increment does not only consist of top-down refinement activities, but also of bottom-up gardening activities, which we call refactoring activities. That is the reason why you need not only implementation refactoring but also architecture refactoring: implementation refactoring is only applicable as soon as you have code, but when you start a new project, at least in the beginning, you won't have too much code. You will have some parts of the architecture, the architecture vision, and some other, more refined parts with no code associated with them. And by the way, even if you have code associated with the architecture, you don't want to change your architecture by touching the code. You don't want to change a variable in Java in order to change the architecture. You want to work on a very high, abstract level of modeling, and that is the reason why architecture refactoring works best in this kind of incremental approach.

It's hard. The problem is that architecture refactoring and code refactoring are two sides of the same coin, and architecture refactoring is much more abstract and much weaker. If you look at the book by Martin Fowler and the other authors on code refactoring, it is very concise and very precise, with concrete implementation examples. In architecture refactoring you start from a very weak position: you get some guidelines which might be very abstract at the beginning and which you as an architect have to apply to your concrete situation, so it's not as concise, precise and specific as a code refactoring. That's one of the problems. The other problem, of course, is what happens if your architecture is already implemented in some way.

Take, for instance, one of the simple refactorings which is also applicable to architecture: renaming something. As soon as you start to rename a concept in your architecture model, it should or could have a direct impact on the code. You could really drive your developers completely crazy, because whenever you change something in the architecture they have to cope with this change in their implementation, and this is the real problem: how to synchronize changes in the architecture with changes in the implementation. Of course this is a more general issue, but as soon as you start with architecture refactoring it becomes very important and very relevant. On the other hand, how does it normally work in real-life architecture today?

You are doing architecture refactoring anyway, but in a very uncontrolled, unsystematic way, without calling it refactoring. You are just changing the architecture, so the same problem will appear in every project. Nonetheless you should keep in mind that whenever you touch the architecture, as soon as there is some code associated with it, this will also have an impact on the code. That is the reason why I personally think that an architecture refactoring should also state which code refactorings you should use in order to implement it on the code level. The relationship between the architecture refactoring and the code refactoring might be very simple: "rename entity" might mean "Ok, rename a method" on the code refactoring side, which is very simple because you have a one-to-one relationship between the two. But it might be much more complicated for things like "enforce strict layering instead of relaxed layering": "I have a relaxed layering, but because of flexibility issues I really would like to have a strict layering".

As soon as I start to get rid of the relaxed layering and transform it into a strict layering, this will have many different kinds of impact on my code, so it may lead to a huge number of code refactorings being necessary in order to cope with the situation. The other thing which makes it very complicated, of course, is that you don't have any good tools for now. If you are using code refactorings, you will see that Visual Studio, for instance, or Eclipse provide support for refactoring. But you won't see much support for refactoring, at least not in the current state of affairs, if you look at UML tools or other architectural tools. The only thing you can use is tools like the Software Tomograph, a tool which allows you to analyze your architecture and compute some metrics: for instance, are there any dependency cycles in your system, or what is the degree of cohesion and coupling within and between subsystems? These things are really helpful, but there is no tool that really supports you in transforming architecture models.
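The kind of analysis mentioned above can be illustrated with a small Python sketch that looks for a dependency cycle and reports layering violations in a dependency graph. It makes no claim about how any of the named tools work. The graph representation (a dict from subsystem to the subsystems it depends on), the layer numbering and both function names are assumptions, and same-layer dependencies are simply treated as violations here.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of nodes, or None.

    deps maps each subsystem to the subsystems it depends on.
    Uses a depth-first search that tracks the current path.
    """
    visiting, done, path = set(), set(), []

    def visit(node):
        visiting.add(node)
        path.append(node)
        for nxt in deps.get(node, ()):
            if nxt in visiting:  # back edge closes a cycle
                return path[path.index(nxt):] + [nxt]
            if nxt not in done:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        visiting.discard(node)
        done.add(node)
        return None

    for node in deps:
        if node not in done:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


def layering_violations(deps, layer_of, strict=True):
    """List (source, target) pairs that break the layering.

    Layers are numbered bottom-up. Relaxed layering allows a dependency
    on any lower layer; strict layering only on the layer directly below.
    """
    violations = []
    for source, targets in deps.items():
        for target in targets:
            distance = layer_of[source] - layer_of[target]
            allowed = distance == 1 if strict else distance >= 1
            if not allowed:
                violations.append((source, target))
    return violations


cyclic = {"A": ["B"], "B": ["C"], "C": ["A"]}
layered = {"ui": ["logic", "data"], "logic": ["data"], "data": []}
layers = {"ui": 3, "logic": 2, "data": 1}

print(find_cycle(cyclic))
print(layering_violations(layered, layers, strict=True))
print(layering_violations(layered, layers, strict=False))
```

For the `layered` graph the strict check flags the shortcut from `ui` to `data`, which gives a rough idea of how many places in the code a move from relaxed to strict layering would touch.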

When I started, one of my colleagues told me: "Ok, you will find a lot of stuff there", so I thought: "Sure, Ok, maybe it doesn't make sense to do a tutorial on this topic". Then I did an analysis on the web: I made a Google search and tried to find out whether there was some literature, and to my great surprise most of the references I found were referring to code refactoring. There was only one book, by Ducasse and Nierstrasz, which dealt with reengineering patterns. Reengineering, on the other hand, is a completely different aspect. It's not the same as refactoring; there are some relationships, but they are completely different things. So what I found was code refactoring material; I found a paper on how to transform models in order to do an architecture refactoring, and I found many references to code refactoring, but no literature at all on architecture refactoring, which is a big surprise because you as an architect have to deal with it daily anyway.

That is a good point. The problem is that sometimes it's really difficult to estimate what an architecture refactoring will cost you. But from my viewpoint it's the same as with code refactoring: how would you justify code refactoring? Basically it should be in your tool chest as a software architect, and as soon as you see deficiencies in very early stages you should directly try to apply an architecture refactoring. You shouldn't wait until everything is implemented and you have a lot of problems. That of course brings you to the question: what if the implementation is already there, you see that there are deficiencies and you have to get rid of them; how would you then justify refactoring?

That is the problem, because sometimes a refactoring can be very cost intensive; a refactoring could mean that you really have to change a lot of things in your application. Yesterday during my tutorial I met a gentleman from a big US printing company, and he told me that they had some issues with their architecture but never refactored it, and they think the reason was that management did not want to spend anything on it. So they just added and added, and after a time, namely now, their architecture has basically become a big ball of mud, as Brian Foote would say, and now each change really costs them thousands of person hours. So whenever they want to change anything, they need to spend a lot of effort, budget and time in order to get rid of the problem.

In this case I would say: OK, in this late phase of a project it doesn't make sense to start refactoring, because they would have to refactor endlessly. Here I would rather do a reengineering activity and try to come up with a completely new solution, either by reverse engineering what you already have and trying to improve it, or, which is sometimes even better, by really starting from scratch. Just keep all of the best practices and patterns you used for your application and do everything from scratch without trying to refactor it. Sometimes refactoring doesn't make any sense. But in the normal case, if you are starting a new project, a new architecture, then in each increment ask: "Are there any problems with the current state of affairs?" and if yes, is there any architecture refactoring which would help you to get rid of the problem? Then I would say it's really justified. But when everything is already in place, it might be a big problem to estimate the time and budget which would be necessary to get rid of the problem.

That's an issue which really happens in all kinds of architecture development, and of course it makes your life very complicated: sometimes you could do an architecture refactoring, but some parts of your architecture are not under your control, because they are controlled by other departments or maybe belong to external applications or whatever. In this case there are only two ways. The first way is to try to convince the other people, who are in control of the other affected parts of the architecture, to apply the refactoring in sync with your department, for instance; that's one possibility.

In the other case you are really out of luck: either you cannot afford to do the architecture refactoring, because you have no control over the system you need to touch in order to do it, or you try other kinds of refactorings in order to get out of the situation. But this is a really important issue which also emerges elsewhere, in code refactoring as well. As soon as you start a new software architecture project or have a software system under development, it's exactly the same issue, where people tell you: "Ok, you can do everything, but use SQL Server here and Linux there and Windows there, and use this kind of UI technology". That is really a problem, because everything gets fixed, which means that you no longer have any control over which technologies or architectures you can change and refactor. It makes your system not as pretty as we software architects would like, but that is something we have to live with anyway.

Yes. I've got a lot of good feedback from a lot of people here in the audience, and I also had blog postings on my website where I got some feedback. I think what I did is just the beginning, and some things can of course be refactored, basically improved. So what I will do from now on is try to gather all of the feedback and get best practices from other people. What I did so far was look into Siemens projects, trying to find out what problems they had and how they addressed them, and whether I could then find some best practices, because that's the way patterns work: you find a solution which was applied in different kinds of settings, and if it is a proven solution then it really qualifies as a refactoring. That is exactly what I am doing now. I am trying to get input; maybe one of my refactorings is a complete waste of time, so I could just get rid of it, or maybe I can get other ideas from other people. So what I really plan to do is something like a website, a catalogue, collecting other people's ideas and trying to complete it. But I think it's a never-ending story, which you also see if you only look at the code refactoring space, because you can come up with an infinite number of refactorings.

I must confess, I don't understand his distinction between architecture refactoring and code refactoring. Does he just mean you need to keep your diagrams up-to-date as you refactor the code? Otherwise, his examples of architecture refactoring would appear, to me at least, to be classic examples of code refactoring: Got dependency cycles? You would probably refactor to a pattern that loosens the coupling; Got two hierarchies and then realize that in your domain there's only really one? Sounds like refactoring toward deeper insight.

Have I missed something? I may well have as my XP leanings get me riled at any hint of separating architecture from code :-)

I use Structure101 (www.headwaysoftware.com) and know in 'realtime' (IntelliJ plugin) when I have caused an unwanted dependency in the application I am writing. It's a cool tool to help keep your architecture 'pure'.

Refactoring is changing code in order to improve design without changing functionality.

To me, architecture and design are higher level coding. Code (micro-design) has to be the same solution, just more detailed, than the actual design (Tactical level) which is at the same time a more detailed description of the solution from the architecture design.

Thus, a dependency is reflected in all three levels. There is no way a dependency may be in one level and not in another one. You may not have the best view to catch it, though. Maybe you have no detail in a higher level to see that dependency (a dependency is inside one class or between classes inside a functional unit), or looking at code you cannot detect a global dependency (usually three+ way dependency).

A refactoring in any part may affect any of the other levels, so all should be kept up-to-date. A refactoring made at the strategic level will impact coding 90% of the time. If code is actually well written, changing a communication mechanism from WS to REST, for instance, will probably require changing one class and all should work the same. If it is all tightly coupled, then changing that may require rewriting half the application.

There are several more issues about this. It is interesting, so I may post something more related to this on my blog.

For a few years now there have been tools based on the Dependency Structure Matrix methodology. I use Lattix quite extensively. This tool lets you compare the architecture as you have modelled it conceptually against the realised architecture. Before you refactor, you can manually insert scenarios and see how they will work out and what will be affected.

Lattix supports an ever increasing number of platforms, like Java (incl. Spring and Hibernate), .Net, SQL Server, Oracle, Delphi Pascal, C/C++. At build time it checks for dependency violations, so there is a great way to measure deviations from the intended architecture.

The method proves to be very helpful to architects and to other, less technical people. It opens discussions based on what you see in the matrix instead of gut feeling.