31 December 2006

Duplicated Code: The same code structure in two or more places is a good sign that the code needs to be refactored: if you need to make a change in one place, you'll probably need to change the other one as well, but you might miss it

Long Method: Long methods should be decomposed for clarity and ease of maintenance

Large Class: Classes that are trying to do too much often have large numbers of instance variables. Sometimes groups of variables can be clumped together. Sometimes they are only used occasionally. Over-large classes can also suffer from code duplication.

Long Parameter List: Long parameter lists are hard to understand. You don't need to pass in everything a method needs, just enough so it can find all it needs.

Divergent Change: Software should be structured for ease of change. If one class is changed in different ways for different reasons, it may be worth splitting the class in two so each one relates to a particular kind of change.

Shotgun Surgery: If a type of program change requires lots of little code changes in various different classes, it may be hard to find all the right places that do need changing. Maybe the p[laces that are affected should all be brought together into one class.

Feature Envy: This is where a method on one class seems more interested in the attributes (usually data) of another class than in its own class. Maybe the method would be happier in the other class.

Data Clumps: Sometimes you see the same bunch of data items together in various places: fields in a couple of classes, parameters to methods, local data. Maybe they should be grouped together into a little class.

Primitive Obsession: Sometimes it's worth turning a primitive data type into a lightweight class to make it clear what it is for and what sort of operations are allowed on it (eg creating a date class rather than using a couple of integers)

Switch Statements: Switch statements tend to cause duplication. You often find similar switch statements scattered through the program in several places. If a new data value is added to the range, you have to check all the various switch statements. Maybe classes and polymorphism would be more appropriate.

Parallel Inheritance Hierarchies: In this case, whenever you make a subclass of one class, you have to make a subclass of another one to match.

Lazy Class: Classes that are not doing much useful work should be eliminated

Speculative Generality: Often methods or classes are designed to do things that in fact are not required. The dead-wood should probably be removed.

Temporary Field: It can be confusing when some of the member variables in a class are only used occasionally

Message Chains: A client asks one object for another object, which is then asked for another object, which is then asked for another, etc. This ties the code to a particular class structure.

Middle Man: Delegation is often useful, but sometimes it can go too far. If a class is acting as a delegate, but is performing no useful extra work, it may be possible to remove it from the hierarchy.

Inappropriate Intimacy: This is where classes seem to spend too much time delving into each other's private parts. Time to throw a bucket of cold water over them!

Alternative classes with different interfaces: Classes that do similar things, but have different names, should be modified to share a common protocol

Incomplete Library Class: It's bad form to modify the code in a library, but sometimes they don't do all they should do

Data Class: Classes that just have data fields, and access methods, but no real behaviour. If the data is public, make it private!

Refused Bequest: If a subclass doesn't want or need all of the behaviour of its base class, maybe the class hierarchy is wrong.

Comments: If the comments are present in the code because the code is bad, improve the code

A classic... minimise coupling (dependencies). E.g. having to remember to call a method before another; having to call many methods to achieve a usecase, gathering some state from earlier method calls to pass into later method calls. Where possible enscapsulate or hide (once called modularity) this complexity in a single class/method which reduces the passing of state from one class to another, i.e. keep the state as close to the behaviour that requires it.

Avoid cyclic dependencies between classes, packages and systems. All dependencies should form a directed acyclic graph. The most stable and generic code should drop to the bottom of the graph and the most volatile code should rise to the top of the graph - refactoring can help push code up or down the dependency graph. Another way of stating this is that if class A depends on class B, then class B should be more stable than class A. Splitting a large application up into multiple modules, that are compiled seperately, helps enforce dependency rules. Classes that don't change together or are not reused together don't belong in the same module.

Under no circumstances allow code duplication, copy and paste is a sin punishable by death. In fact, avoid duplication of all kinds, maximise reuse of code, data, systems...

Avoid static state, pass as many dependencies into a class through the constructor (constructor-based dependency injection). This improves encapsulation and reuse of classes. All dependencies are explicit and the caller can decide the implementation of dependencies to pass into the class (assumming dependencies are expressed as interfaces or abstract classes). Avoid singletons (or static factories that contain state) at the pain of death, when an object uses singletons it becomes difficult to manage the dependencies of that object.

Avoid mutable state where possible strive for immutability. Changing state often leads to more bugs. I often use the final keyword for variables and very occasionaly for parameters too. Also do not add setters just for the hell of it, only add them where they are absolutely essential. I always prefer passing state in through a constructor, rather than through setters, this helps avoid cyclic dependencies between objects as well as avoid mutation.

Be ruthless with refactoring, extract common abstractions whereever you seem them but avoid premature abstraction, only extract abstractions when you have multiple concrete instances that share patterns of structure or behaviour or where you are sure there will be multiple concrete instances at some point. Applies some rules to inheritance based abstraction: "the Liskov Substitution Principle" states derived classes must be usable through the base class interface without the need for the user to know the difference. Ensures details depend upon abstractions and that abstractions do not depend upon details.

Be ruthless with deleting redundant code (you can always retrieve the code from your version control system). The less code there is, the more maintainable the system and the less bugs you have. So remove unused methods, fields, classes, packages etc. Modern IDEs help highlight unused code as do code coverage tools.

Avoid premature optimisation, however dont just write cr*p code either. Balance optimisations against the complexity they add. So if an optimisation doesn't add any complexity it is a no-brainer, just do it in the first place.

Use external libraries where possible rather than reinvent the wheel. Less code means less bugs and easier maintenance. E.g. log4j, ant, hibernate, apache commons-*, spring.

Team

Ensure the team sits together, even better if all stakeholders (e.g. developers and business users) sit together. This allows for adhoc conversations during which knew insights are created and knowledge is shared. It allows for immediate feedback on interpretations of requirements, which ensures developers are always on track, delivering what the business wants. Even a corridor seperating team members will affect the ability of the team to work together.

Focus on recruiting a team of experts who can take the system through all stages of development, i.e. they have the skill to gather requirements, think about architecture, think about design and are responsible for coding, testing, release into production and initial (or second-line) support. Seperating these roles only causes communication problems especially because systems development is an iterative process, i.e. you gather a few requirements, think a little about architecture/design, write some code, realise your design doesn't work, rework it, then find out that some of your requirements are ambiguous and have to delve a little deeper into them. I think its ridiculous to hand over a design to coders and equally for coders to just knock something up and pass the result on to some sort of test team. Seperating roles also allows for "passing the buck" when the project inevitably fails. A small, multi-function team doesn't need to rely on documents (e.g UML-based specifications) they can just talk through important issues, possibly white-boarding them where necesary.

Invest enough time and effort in recruitment. The composition of the team is the most important factor determining the success of the project. Think about succession management, team members will get bored doing what they doing, if they are not given new challenges, or they may just leave for other external factors. So recruit even when you can manage with the current head-count. Its best to have a little spare capacity and have a constant handover (knowledge sharing) than be faced with trying to recruit for and handover a collegues role within a month.

Use non-ide specific build files. I've not tried using a project-model based tool such as Maven or a continuous integration tool such as CruiseControl but I think they are worth considering.

Run unit-tests regularly. I don't really practise test-driven development, but ensure that the tests you do write get executed on more than an adhoc basis.

Once in a while run your system through a profiler to ensure you've not added many more cpu hotspots, increased memory usage or introduced memory leaks. Of course, this doesnt help you find cases where you've increased the cpu usage on a uniform basis across many areas (as they will not be hotspots), so it may be worth writing some some of standard system test that you execute and record timings for. That way a few months down the line, you can reexecute the test and compare timings against previous ones.

I think I will be making more use of javolution.util for high performant code. I do believe that optimising code should be balanced against the code complexity it adds, however if you can make code more performant without adding any complexity or the added complexity is not to great then its a no-brainer.

Some things about the standard collections api bug me, e.g looping collections by creating iterators (especially in code that is called many times); the implementation of equals/hashCode in List implementations which uses an iterator rather than some form of array-based access.

For those who don't know, one of the aims of Bhavaya is to allow for users to specify the data they want as criteria (which is ultimately translated to sql queries, much like Hibernate), the library will then return a collection of beans matching this criteria and then keep this collection current as data in the database is changed, for example, beans may be inserted, updated or deleted from the collection. The collection fires events to listeners to inform them of changes to its elements. Bhavaya focuses on efficiency, by applying deltas to the collection without hitting the database again, this allows for efficient distributed, observable caches of data.