Archive for the ‘refactoring’ Category

The SC2009 was one of these conferences where you would leave with more questions than answers. And it feels good! All in all, I felt humbled by the quality of the sessions as much as that of the participants. Evidently, everyone there was very pleased to be present and you could feel it in the bustling about and around!

The conference was hosted and catered in the BBC Worldwide buildings and as an architecture nerd, I have to say I loved the building and the interior decoration; it must be a fantastic place to work in, but it is hardly the point of this post, I simply digress.

First, you might want to head to Twitter / #sc2009 to have a look at how participants and speakers lived the event in real time.

Also, a broader coverage of all the blogs and reports for the conference will certainly pop-up on Jason Gorman’s blog (some day, when he will have got round to doing it). And more specifically, you’ll find a vox populis video of participants, including that of a guy who’s just realised he took a terrible accent in front of the camera ;-)

The complete program of the conference is available on the conference’s website and, gathering from participants to other session, the content was pretty interesting too; I wonder what kind of sessions were rejected by the selection committee.

Programming in the small

Some time ago, I had the opportunity to have a conversation around a pint with Ivan Moore and Mike Hill, where we were debating and mostly agreeing on how code should be formatted. Before that conversation, I felt like an alien insisting on a 400 characters line length in eclipse in place of the 80 characters that is the general, wrong, consensus.

This session was drawing from their own experience on formatting, writing and refactoring code, and how to challenge accepted “best practices” of code writing. It was very fun to refactor the bad code examples they provided!

Specification Workshops

This session was subtitled “The missing Link for ATDD & Example-Driven Development” and showed how you would come round to actually specify acceptance test cases using examples brainstorming: gather a lot of people that have knowledge in the business rules you want to specify and have them give examples of the rules. Discuss.

Definitely something I would try, but I’ll probably read Gojko Adzic’s book before to gain more insight on how to do it and not confuse it with iteration planning…

Responsibility-driven Design with Mock Objects

This session, however fun, didn’t quite cut it for me. First the facilitators started by “pitying” people who don’t do TDD; that could be a whole separate post in itself, but that kind of statement kinda makes me boil inside… and I am not even talking about all the Java/.Net snickering!

Plus, we did only scratch the surface of the purpose of the session, that is, learning about responsibility-driven design; we mocked objects alright, but we spent most of the time refactoring the code we had written…

However, I learned a lot from this session: they used Ruby as the supporting language for the demonstration… and I have to say I was rather impressed with a few things they did with it! Should I really, really, learn Ruby though?

Empirical Experiences of Refactoring in Open Source

In this session, Steve Counsel explained how he has analysed a few open-source projects to see how refactorings were applied throughout a project lifecycle. Although very interesting it was somewhat predictable: the most complex refactorings (e.g. remove duplication, extract superclass…) are less implemented than the simpler ones (e.g. rename variable, rename method…) by a factor of 10!

Thing is, even though this research stroke a chord with me as I am very keen on refactorings (even automated refactoring), it seemed like few conclusions were actually drawn from it; Steve did not tell us whether he had controlled for feature additions or development team turnover… furthermore, because the case studied were open source projects, I am not sure how much it would apply to corporate systems or commercial products. I guess I will have to find out!

Test-Driven Development of Asynchronous Systems

This session was possibly the most underwhelming one of the day; partly because it was the last one, mostly because it didn’t stimulate as much thoughts as the other ones.

Granted, Nat Pryce has implemented a very clever way of testing asynchronous processes during the integration tests (basically, having process probes that can tell whether the system has reached a stable position before executing test assertions)! I think if I ever face this issue, I would be very likely to use the same approach; or purchase a product that would do the trick! Wink-wink… :)

The take-home lesson learned

As I said before, I met a lot of very interesting people who care about software as much as I do, and the overall experience made me buzzing with thoughts.

But the one thing that I got out of it and that I will definitely be even more stringent about, from now on, is: refactor, refactor, refactor!

Inversion of Control is a refactoring that rose to fame with the implementations of the likes of Spring or PicoContainer. This is a quite misunderstood feature of software design (notably its implications), but a rather often used one - mainly through the use of the above cited frameworks. This article is an attempt at debunking the myth and presenting you with the hard, crude reality.

Not a pattern, more a refactoring

Of course, I would love to be able to antagonise refactorings and patterns, but it is not that simple and the separation between the two is easily crossed.

The refactoring we are using with IoC comes along the following scenario: we have a working system, but somehow we would like to reduce dependencies between some of its components; so we decide to refactor it and feed the dependencies to the components instead of having them explicitly request them. Under this light, I have to admit that I prefer the term Dependency Injection to Inversion of Control!

To realise this refactoring, we use the Builder pattern: a component knows how to built another component so that it works correctly without having to know how to build itself.

So, now we are on the same level about IoC, here is what is really bugging me with it.

Replaces compile-time validation with run-time errors

The main issue for me, is that you cannot verify your dependencies when you are actually coding. You need to run your application and deal with a hundred configuration errors before you can even start testing your feature.

Because all the wiring of your classes happen at runtime, you will not know until you are actually using a feature if it is correctly configured or not (with all its dependencies, ad infinitum). But this shouldn’t really scare you because you have tests for everything in your system, don’t you?

Introduces more indirection and delegation

To create an efficient IoC refactoring, you need to break down your dependency into an interface (depend on things less likely to change) and it’s implementation. That’s one level of indirection.

Now when you actually configure your dependency in you XML file, you don’t use actual objects, but names… text… that’s another level of indirection!

In order to manage all this indirection, your IoC container will use intermediary objects, proxies, to actually wire your classes together.

And that’s were it hurts a second time: when debugging, you have to traverse layers upon layers of classes and methods to understand where things go wrong! And don’t even get me started on actually trying to follow the path of a piece of code at development time!!

Hides dependencies but doesn’t reduce them

After all this meddling with your class’ instances, said class remains dependent on other objects but you have now lost aggregation dependency in the midst of the framework; that is, you don’t really know any more which dependencies the class needs to do its job (e.g. a data access objects) and which are just here for delegating (e.g. listeners).

Worse, if you follow the “best practices” enunciated in the documentation of your IoC container, it is very likely that you have now introduced dependencies to the framework in many places of your system: annotations? interceptors? direct use of the IoC’s object factory?

Refactoring, not central piece of architecture (bis repetita)

As a conclusion, I would like to insist that IoC should really be a refactoring for specific parts of your system and that you shouldn’t try to have everything dependency-injected, that’s bad for your system, and that’s bad for your sanity!

These days, you can be dubbed an Architect (notice the capital A, as before) very easily: just move every single instanciation into a IoC container and you get this very enterprisey architecture that’s as easy to maintain as it was before, with the addition of all the indirection… it makes you look good when new developers just stare blankly at your 2Mb XML configuration file.

Nota bene: I really have to thank Jason for motivating me to write up this post that I drafted earlier in January!

Do you care about software? Do you care about it being specified correctly, designed accordingly, developed qualitatively, maintained easily and that it gives the intended value to the business? Then take the pledge: icareaboutsoftware.org

Analysts, developers, database administrators, stakeholders… you are all welcome to raise the profile of this campaign!

Yesterday, I was discussing the Refactorbot with a friend of mine. Although he has been a developer for longer than me, he couldn’t grasp why the Refactorbot could not possibly test every solution and had to fall back to evolutionary design.

A bit of maths

If you want to test every possible solution for a given design, you have to test every possible number of packages to hold classes and every possible combination of those classes in the packages.

If you have done some probabilities, statistics or enumerative combinatorics, you certainly will see where I am going to with that…

I have never been very good myself at mathematics, but my guess is that the number of possible solutions to our design problem can be represented with the following formula.

This is the sum of all combinations, without repetition, of n classes in p packages for p ranging from 1 (all classes in 1 packages) to n (each class in its own package):

with

(for example, for 30 classes and 5 packages, this gives us 17100720 possibilities for this case alone)

We then quickly understand that it would be impossible for a program to test each and every possible design for a sizeable system: the sheer number of possibilities would make it impossible to test in a lifetime!

Refactorbot v0.0.0.3

If you want to have fun staring at an empty console while your CPU is hitting 100%, I have created an implementation of such a refactoring method.

However, don’t be fooled by this stunning result! It would become increasingly tedious to try and find a solution for bigger and bigger systems.

The deal with evolutionary refactorings

I don’t think that the purpose of the Refactorbot is to come up with the best design - and I will write a piece soon about a definition of the BEST design - possible, but rather with a better design.

If we consider that the design of a software is its DNA, then you can probably try small improvements and keep them or discard them in an evolutionary fashion. Better design would emerge by trial and error; the Refactorbot should do just that.

XMI loading capability

In this version, you can feed it an XMI file (I create mines with ArgoUML) and it will attempt to create a better model for it.

The XMI loading, however, is very - very - slow for big models… I coded it using XPath (Xalan-j precisely) and that proved so sluggish that I implemented a cache system: after the first creation of the in-memory model of your XMI file (be patient), it will create a .ser version of your XML file in the same directory and reuse it for next runs.

Because of the nature of the algorithm (using random refactorings), you may want to execute the program many times for the same model, and I can guarantee you that this cache will come quite handy!

New refactoring algorithm

I have implemented a new algorithm that changes only one class at each iteration: it will randomly select a class, randomly decide to create a new package and/or randomly choose the package to put the class in. It will then run the metrics again and keep or discard the new model based on the new metrics.

Don’t worry, the original algorithm that attempted to fathom a complete new model is still here. It is just that I thought it would be interesting to have different algorithms to play with.

Furthermore, I think that this second algorithm is closer to Jason’s initial view that the Refactorbot would do 1 random refactoring and then run all tests to prove that the system has been improved…

Using it

For you lucky windows users with JSE 1.5+ already installed, there’s a batch file in the archive that let’s you just run the application; just run:

refactorbot.bat myxmifile.xmi 1000 p

The others will have to install a version of Java greater or equal to 1.5 and launch the refactorbot.improvemetrics.ImproveMetrics class. The required libraries are provided in the lib folder.

The output is still very crude because it will only tell you the list of packages it has generated and the the classes they contain. I should produce an XMI output very soon, but that’ll wait until I learn a bit more about XMI!

Your impressions

My own results are quite encouraging: I have tried the Refactorbot with a sizeable model (177 classes in 25 packages), and although the first loading of the CSV file is slow (it has 625 associations in a 20Mb file, and that’s what takes most of the time), the improvement of the model is quite fast! Granted, it is quite easy to improve on that model (that I reverse-engineered from a project’s codebase with ArgoUML), but the insight I got was still very invaluable!

However, this is probably the first usable version of the Refactorbot, so I would like to hear from your own experience with the automatic refactoring of your own models! Send me an email at contact@<my domain here>.com, that’ll help improving on the program…

Jason Gorman, again, inpired me for a new blog post. Some time ago, he offered an OO design challenge in which a design’s maintainability had to be improved upon without any knowledge of the underlying business case.
I quickly gathered that you could solely rely on metrics for determining quality level for a design and did a few trials myself (see below) to improve on the metrics[1].
Jason posted his own solution yesterday, and I suddenly remembered one of his earlier posts that suggested we should be able to automate this process. I detail such a solution further in this article and I give one possible (better?) solution to the design challenge.

Trying to find a methodology

My first attempt at making a better model was to try and see the patterns “through” the model. I moved a few classes around in ArgoUML, generated the Java classes and ran the Metrics plugin for Eclipse… alas, although the normalized distance from main sequence[1] was quite good (D=0.22), the relational cohesion (R=0.55) was worse than the reference model’s (R=0.83)!

In order to be able to improve models metrics with consistency, I had to devise a methodology.
I ordered the classes by their distance in the dependency graph: the more dependable, the better for maintainability. The dependency arcs are as follows:
B -> A -> C -> G
B -> A -> C
B -> A
B -> D
E -> D
E -> F
H -> D
H -> F

This prompted me to put the classes in four different packages like this:
Not very different from my model created without applied methodology, but it has a great flaw: it has a cycle between p2 and p3! (and awful metrics too: D=0.40 and R=0.66)
Moving class A back to package p2 does break the cycle and improve the normalized distance, though only slightly (D=0.375).

Automating the search for a solution

At that point, I went to bed and left it running as a background thought until Jason posted his own solution to the challenge… the way he was proceeding with the refactorings reminded me of one of his earlier posts, though I can’t seem to be able to find it any more, that suggested we might be able to build a robot to perform random refactorings that would be kept if they improved the overall design of a system… if I couldn’t devise a method for solving this problem, I had better leave it to chance completely!

So I built a simple version of such a lucky robot with a very simple design, that would just pick classes from the reference model and, for each of them, decide randomly if it should create a new package or choose, still randomly, a package to put it in…
Once the model is fully built, it runs a few metrics and compare them to the reference model and, if it shows an improvement, keeps the generated model as reference model (otherwise discards it) and does another cycle.
And here is what it produced, after a few thousand cycles:

It is definitely much more complex than any other model I could have come up with by hand, but it translates into nice metrics: D=0.1875 and R=1.0!

This leads me to believe that with a bit of work and time we could come up with a more advanced solution that would help designers get the best out of their design without risking to break everything…

You can download the rather crude source code if you wish to have a look at the application.