Wednesday, October 31, 2007

Ignore in some case consultants do this purposefully to keep their contract, in general no one intentionally create unmaintainable code. However, bad code silently creep in over the period of the product's life cycle.

Manager's mind

Lets write the code quick. We don't have enough time for the next rollout. If we cannot rollout, the project may die quickly. Lets put long-term maintainability as a secondary priority, which we'll improve after the rollout when we have time. By the way, who knows if I am still working in this project after a year.

Unfortunately, if this rollout is successful, the next release cycle has an even more tighter schedule

Code is "written once" and "read several hundred times" throughout its lifecycle. Subsequent maintenance cost is several hundreds times more than the gain we made in its initial rollout.

I cannot justify any resources spent on refactoring because it doesn't add any features to my next release. The code is working already. You mention about improved code maintainability and I don't know how to quantify that.

This is how bad code retains and accumulates. After its momentum has grown to a certain point, then everyone just accept the reality that the code is already bad enough that no motivation is there to improve it.

Lets have the developers focusing in writing the real code. We'll have the QA folks worry about the testing. This way, they can proceed at full speed in parallel.

Developer's mind

Similar logic is working elsewhere but it doesn't fit exactly what I want. It isn't provide enough feature for my need and it also provide some features that I don't need. It is not worthwhile to spend my time to read and understand how it works, lets re-implement my version from scratch.

This is how duplicated logic getting into the system. There are many ways to do one thing.

Similar logic is already working elsewhere. It is not worthwhile to spend my time reinventing the wheel. Lets copy it here and modify for my need.

This is another way of how duplicated code is getting into the system. When you change the code in one place, you need to make corresponding changes in many other places where you have made copies, otherwise some copies may not be working correctly.

Due to "laziness", variable names are kept the same in their copies, making the copied code even harder to understand.

This code is not broken, please don't make any change

This is how bad code retains and accumulates. After its momentum has grown to a certain point, then everyone just accept the reality that the code is already bad enough that no motivation is there to improve it.

Of course the code is hard to read because it is solving a complex problem. If the code is easy to read, it must not be doing anything significant. You just need to be smarter when reading the code.

I am not convinced that this is the right way of doing this kind of things. Therefore, I am not going to follow the convention even it has been done in many other places. I am going to do it my way.

I don't know why I have to do it this way. But since this is the way we do this kind of things in many other places, I just follow that convention. Is "consistency" a problem ?

One important criteria for judging the value of any serious software product is how maintainable it is over its life cycle. It is important to notice that code is "write once" but "read many times" when people subsequently maintains it (enhancing features, fixing bugs, optimizing performance). Any cost spent in understanding any non-trivial code will be multiplied many times by itself and added to the overall maintenance budget of the software product.

It is so important to spend all the effort to make the code as readable as possible. Once you do it, you will start to get your dividends over the product's life cycle.

Lets set some criteria/characteristics of code quality from a maintainability standpoint.

Good code

Just by looking at its signature, you can guess what a class/method/function is doing without looking at its implementation. A very descriptive name of the class/method/function and parameter/variable are carefully chosen such that its associated semantics is very clear. When you open up its implementation, it is doing exactly what you guess.

Since the naming is so descriptive, it doesn't need a lot of comments to make the code readable. Comments are only appears for code segments that are implementing a complex algorithm or when optimizing performance of hotspots (where code readability is intentionally sacrificed).

The responsibility of a class/method/function is very clear. There is only one way to do a particular thing and is well-encapsulated within the responsible class/method/function. There is no duplicated code.

Responsibility tends to be distributed evenly so you don't have a package with too many classes or a class with too many instance variables/methods. And you won't have a method that span more than 2 pages of code lines.

Bad code

Similar code shows up in many places, it is hard to know which of these are "similar" and which are "exactly same"

Extremely long methods (couple hundreds lines of code within a method) and usually accompanied by deeply nested case statements

Only have trivial comments and lack of useful comments

There are many different ways to do one particular thing although they are supposed to achieve the same result

To do a particular thing, the caller need to make multiple calls to different objects in a certain sequence.

The class/method/function seems to be implying certain responsibilities but when you open up its implementation, you figure out it is doing something completely different. You quickly lose confidence in your understanding and has to trace down into all implementations details to see what it is actually doing in every place.

There are some places in the code where you cannot figure out what it is trying to do

Thursday, October 25, 2007

The first step to support Legacy System is to understand the code. One approach is to read through all the lines of code. Quite often, this is quite ineffective, you may be wasting a lot of time reading dead code.

What I typically do is quickly scan through the package structure of the code, the class name and public methods. If the code structure is good, then I can get some rough idea about where the logic lies. However, I rely more on analyzing the run time behavior of the code to determine its logic flow.

A technique that I use frequently is to just run the system and determine the execution path. At the first step, I need to determine the entry points of the system.

Locate the entry points

Usually, the entry points are ...

main() method where the system boots

background maintenance tasks

event-driven callbacks

Finding out these entry points is non-trivial, especially if the documentation of the overall system is poor. I have built a tool that allows me to instrument those classes that I am interested so that when I start the system, it will print out all the entry points when they execute. Entry point is the lowest method of the stack trace that you are interested. The tool is based on AspectJ which allows me to inject some print statements when certain methods are executed.

After I get an idea about where the entry points are, the next step is to drill down into each execution point to study what it is doing.

Drill down into entry points

The first thing I usually do is to read through the code trying to understand what it is doing. Eclipse IDE is very effective to navigate through the code to see which method is calling which method. The most frequently used Eclipse feature is ...

"open declaration" which jumps to the method being called

"references" which list all the methods that is calling this methods

"Calling hierachy" which is a tree of callers which calls this method

"Type hierachy" which is a tree of Class hierachy

After that, I will add trace statements into the code to verify my understanding. Consider Trace statement is an enhanced logging mechanism which include the thread id, the calling stack trace and a message. Trace statements is temporary and its purposes is help me to understanding the code. After that, I will remove my trace statement. Since I don't want anyone to use the trace statements in the production environment, I wrote a "Trace" class so that I can easily remove it when I am done.

Document the understanding

Draw a UML diagram to describe my understanding. And also take notes about any observed issues and what can be improved.

As mentioned in my earlier blog, the software engineering skills we've learned assumes a green-field project where we starts on a clean sheet of paper. In this case, we can apply a number of practices to build a solid foundation for future evolution and maintenance.

Test Driven Development will force you to document the external behavior of your system in terms of Unit Test

Enforce a coding standard and use a tool to enforce that

Make sure the code is regularly reviewed to ensure readability and modularity

In reality, less than 2% project are under this situation. Mostly likely, we are dealing with a piece of software that is not built on this foundation. In many cases, even the projects starts from a good foundation, it will deteriorate over time because of schedule pressure, lack of engineering skills, people turnover ... etc.

How can one support such a legacy system ? By support, I mean the following ...

Fix bugs when it occurs

Tune performance when loading increases

Add features when new requirement arrives

In order to do the these things effectively, we need to first "understand" the legacy code and then "refactor" it. "Understanding" is about how to quickly establish an insight of the inner workings of the legacy code without making any modifications that may accidentally change its external behavior and causes bugs. "Refactoring" is about how to improve the understandability of legacy code by making modification to the code without changing its external behavior.

In later blogs, I will share my techniques of how to understand a piece of code written by someone else and also how to refactor it safely.

Monday, October 22, 2007

One of a frequently encountered question in enterprise software development is where does the hand off happen from the architect (who design the software) to the developer (who implements the software). Usually, the architect designs the software at a higher level of abstraction and then communicate her design to the developers, who break down into more concrete, detail-level design before turning into implementation-level code.

The hand off happens typically via an architecture spec written by the architect, composed of UML class diagrams, sequence diagrams or state transition diagrams ... etc. Based on the understanding from these diagrams, the developers go ahead to write the code.

However, the progression is not as smooth as we expect. There is a gap between the architecture diagrams and the code and a transformation is required. During such transformation, there is a possibility of mis-interpretation, wrong assumption. Quite often, the system ends up to be wrongly implemented due to miscommunication between the developer and the architect. Such miscommunication can either due to the architect hasn't described his design clear enough in the spec, or because the developer is not experienced enough to fill in some left-out detail that the architect consider obvious.

One way to mitigate this problem is to have more design review meetings, or code review session to make sure what is implemented is correctly reflecting the design. Unfortunately, I found such review sessions are usually not happening either because the architect is too busy in other tasks, or she is reluctant to read the developer's code. It ends up the implementation doesn't match the design. Quite often, this discrepancy is discovered at a very late stage and left no time to fix. While developers start patching the current implementation for bug fixing or adding new features, the architect lose the control on the architecture evolution.Is there a way for the architect to enforce her design at the early stage given the following common constraints ?

While making sure the implementation compliant with the design at a higher level, the architect doesn't want to dictate the low level implementation details

Test As A Specification

The solution is to have the architect writing the Unit Tests (e.g. JUnit Test Classes in Java), which acts as the "Spec" of her design.

In this model, the architect will focus in the "interface" aspect and how this system interact with external parties, such as the client (how this system will be used), as well as the collaborators (how this system uses other systems).

The system will expose a set of "Facade" classes which encapsulate the system's external behavior and act as the entry point to its client. Therefore, by writing Unit Tests against these "Facades", the architect can fully specify the external behavior of the system.

A set of "Collaborator" classes is also defined to explicitly capture the interaction of this system to other supporting systems. This "Collaborator" classes are specified in terms of Mock Objects so that the required behavior of supporting system is fully specified. On the other hand, the interaction sequence with the supporting systems are specified via "expectation" of Mock objects.

The architect will specify the behavior of "Facade" and "Collaborator" in a set of XUnit Test Cases, which acts as the design spec of the system. This way, the architect is defining the external behavior of the system while giving enough freedom for the developers to decide on the internal implementation structure. Typically, there are many "Impl Detail" classes which the Facade delegates to. These "Impl Detail" classes will invoke the "Collaborator interface" to get things done in some cases.

Note that the architect is not writing ALL the test cases. Architecture-level Unit Test are just a small subset of the overall test cases specifically focus in the architecture level abstraction. These tests are specifically written to ignore the implementation detail so that its stability will not be affected by change of implementation logic.

On the other hand, developers will also provide a different set of TestCases that covers the "Impl Detail" classes. Usually, this set of "Impl Level TestCase" which change when the developers change the internal implementations. By separating these 2 sets of test cases under different categories, they can evolve independently when different aspects of the system changes along its life cycle, and resulting in a more maintainable system as it evolves.

Example: User Authentication

To illustrate, lets go through an example using a User Authentication system. There maybe 40 classes that implements this whole UserAuthSubsystem. But the architecture-level test cases only focused in the Facade classes and specify only the "external behavior" of what this subsystem should provide. It doesn't touch any of the underlying implementation classes because those are the implementor's choices which the architect doesn't want to constrain.

The Open Source Phenomenon has changed drastically the skills set requirement for our software development professionals today.

Traditionally, software development involves not just writing the business logic of your application, but also a lot of plumbing logic (such as data persistence, fault resilience). These plumbing logic are typically independent of your business logic, and also quite complicated (although they are well-studied engineering problem). In most projects, large portion of the engineer effort had gone to re-inventing solution for these recurring plumbing problem.

The situation is quite different today in our software development world. Thanks to the Open Source Phenomenon, most of the plumbing logic are available in some forms of software libraries or frameworks with source code available. There are different kinds of open source licensing model with difference degrees of usage or modification restrictions. Nevertheless, the software development game plan was changed drastically since then. Instead of building all plumbing from scratch, we can just leverage what is out there as Open source libraries/frameworks. Open source phenomenon bring huge savings in the project cost and also reduce time to market.

However, surviving in the Open Source world requires a different set of software engineering skill sets that is not taught in our college. Most techniques we learn from colleges assumes a green-field development environment and we learn how to design a good system from scratch rather than how to understand or adapt existing code. We learn how to "write code" but not "read code".

Skills to survive in the Open Source world are quite different. Such as ...

"Knowing how to find" is more important than "knowing how to build"

"Google", "Wikipedia" are your friend. Of course, "Apache", "SourceForge" is good place to check. Sometimes I find it is not easy for a hard-core software engineer to look for other people's code to use. Especially before she has certain familiarity of the code base. The mindset need to be changed now. The confidence should be based on its popularity and in some case, you just use it and see how it goes.

"Quick learning" and "start fast"
Looking at the usage examples is usually a good start to get a quick understanding on how things works. Then start to write some Unit Test code to get a better feel on the area that you are interested.

Be prepared to "switch"
This is very important when you are using less proven Open Source products. Even with proven open source product, the level of supports may fluctuate in future, not something under your control. Therefore, your application architecture should try hard to minimize the coupling (and dependencies) to the open source products.

In most cases, you need just a few features out of the whole suite of the open source product. A common technique that I use often to confine my usage is to define an interface that captures all the API that I need. And then write an implementation which depends on the Open Source product. Now the dependency is localized in the implementation class. Whenever I switch, I just need to rewrite the implementation of the interface. I can even use IOC (inversion of control) mechanism to achieve even zero dependencies between my application to the Open source product.

Ability to drill down

Quite often, the open source provides most of what your need, but not 100%. In other scenarios, you want to do some enhancement, or fixing a bug. You need to be able to drill down into the code and feel comfortable to make changes.

This is a tricky area. I know many good engineers who can write good code, but only very few can pick up code written by other people, and be able to familiar with that. As I said earlier, we are trained to write code, but not read code. In dealing with open source product, code reading skills is very important.

In fact, the skills of reading code is generally very important as most projects are in maintenance mode rather than green-field development mode. I will share code reading technique more detail in later blog when I talk about doing surgery on legacy code. But here is a quick summary of an important subset that are relevant to dissecting open source products ...

Look at its package structure, class name, method name to get some rough ideas about the code structure and data structure.

Put some trace code that prints out its execution path, then run some examples to verify your understanding.

There are a couple of tools that analyze existing code base to extract its design or external behavior. For example, Agitar can analyze your Java bytecode and generate JUnit tests ... Altova can analyze your code to generate the UML diagram.

These reverse engineering tools provide value to legacy application where you have a piece of running production code but very little information about its design, and you have very little test where you can understand its expected behavior. These tools can quickly analyze the code and tell you its observed behavior. It "discover" what the code is current doing and give you a better starting point to do further analysis.Giving you a better starting point is all it provides. They are typically unable to distinguish between the intended behavior (sitting at a higher level of abstraction) from its actual implementation (sitting at a much lower level). In fact, I doubt if automatically extract the intended behavior is at all possible. The end result is that it swarms you with a lot of low-level, implementation-specific details (because the actual implementation is all it knows) and then you need to manually analyze what is important and what is unimportant to you.

In other words, it gives you a different starting point with a different format (e.g. a set of test methods and class invariants it detects instead of implementation Java code). But it doesn't raise your level of abstraction which I argue has to be done manually anyway. So does it give you a better start ? I think it is better but not too much better. By following a set of code surgery, refactoring technique, we can achieve a much better result. I will share such techniques in later blogs.

Now, is these tools useful for new development ? I don't think so. I am a stronger believer of TDD (Test-Driven-Development) where the test itself should be regarded as the "spec" and so by definition has to be written first and has to be written manually by the application designer. In TDD, you write your test first and write it manually (not generated). You put in all the expected behavior in your test methods, and you also put in Mock object to test its interaction. You diagram your UML model to capture your major classes and methods at a higher abstraction level.

However, this model doesn't fit if the call is asynchronous; ie: the calling thread is not blocked and the result will come later from a separated callback thread. For example, if you have the following AsyncService, where the "call()" method returns immediately and the result comes back from a callback interface.

In this test method, the calling thread is waiting for the callback thread. A latch is used for the synchronization. You cannot use Thread.wait() and Thread.notify() for this purpose because there is a possibility that notify can be called before the wait and then one thread will wait indefinitely. Latch is a once-off switch and if the "release" is called first, then all subsequent "attempt" call will return immediately.