Tuesday, 5 June 2007

Building Cathedrals

Within companies large and small, it is very likely that there is an in-house development team working on a system in continuous development. In every company I've worked for, I have spent most of my time developing a highly strategic system, whether a website or a CMS/CRM (or one of the many other over-general three-letter acronyms out there). These systems are the Cathedrals of software: they are by far the largest projects in the company, they are highly important and grandiose, and they come with a great architectural vision.

These systems inevitably become the bane of senior managers', project managers' and developers' lives. A battle breaks out between the business, demanding bags of "quick wins" (that phrase which makes many a developer quake in their boots), and the developers, who want to "do things properly". Pour on way too much vision from all parties and you end up with the poor old project manager curled up in the foetal position, jabbering on about "delivery", as everyone stands around kicking them for whatever failure the business or development team wishes to cite for the last release.

In these circumstances I have found Agile to be a godsend: the business gets quick returns, the project managers get to wrap themselves up snugly in their Gantt charts and click away happily at MS Project, and the developers (although they still sit there pushing back deadlines and moaning that they haven't got enough resource) actually deliver something. All in all, it keeps the whole project from turning into the undeliverable white elephant that the business and the development team have managed to concoct between them.

After a couple of years of doing this successfully, I was shocked to find that a few of the problems of the old waterfall methodology had begun to rear their ugly heads, namely degradation, fracturing and velocity. After some careful thought and backtracking through my career, I started to notice some startling similarities between all the developments I had worked on.

Degradation

Degradation is a natural occurrence in any system. Software systems are fortunate in that, unlike many physical systems, they rarely degrade through use. Think of the rail network: the more the rails are used, the worse their condition becomes, until they need replacing. Although external factors can cause a software system to degrade through use (database size, hardware age etc.), the actual lines of code remain in exactly the same condition as when they were written.

The biggest cause of degradation in software is change: as new features are written or bugs are fixed, existing code can become messy, out of date or even completely redundant without anyone actually realising it. Parts of the system that work perfectly can also get used in ways that were never originally intended, causing bugs and performance issues.

There are a number of techniques in the Agile world to help minimise degradation, including Refactoring, Collective Ownership and Test Driven Development. However, heed this warning: despite the best unit tests in the world and pair programming every hour God sends, the absolute prevention of degradation relies on two things: time and human intervention. Time is required to fix degraded code, which increases the length of time each feature takes to implement and thus affects velocity. Human intervention requires someone to recognise the degradation and, furthermore, to be bothered to do something about it (collective ownership and pair programming do help here, but they are by no means a guarantee).

The danger of degradation is that a system can degrade so much that it has a severely negative impact on the progress of a project, sometimes even bringing it to a complete halt, with the development team left battling the business for a "complete rewrite". Degradation is bad for developers' morale, mainly because owning up to it sounds like an admission of writing bad code. This results in the all-too-common backlash that it's the business's fault for putting on too much pressure to deliver "quick wins" and not accepting the need to preserve the developers' architectural vision; and here we are again, in the same spot we were in with the waterfall method.

Fracturing

Fracturing can look very similar to degradation but is actually quite different. It happens all the time in any system that works to standards (which is every system, whether those standards are documented or not, and whether they belong to a team or an individual), because standards shift and move to account for change. One example is the naming convention on networks: many companies opt for some kind of convention for naming switches, printers, computers etc., which seems to suit until something comes along that no longer fits. Say a hypothetical company starts out with three offices in the UK, so the naming style becomes [OFFICE]-[EQUIPMENT]-[000]. Then the company expands to Mexico and someone decides that the naming convention needs to include the country as well: the convention is now fractured, as new machines are named [COUNTRY]-[OFFICE]-[EQUIPMENT]-[000]. Then an auditor comes along and says that you should obfuscate the names of your servers to make life harder for a hacker (UK-LON-ACCOUNTSSQL is a nice signpost to what to hack), so the convention has to change again, causing even more fracturing.
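To make the fracture concrete, here is a minimal Python sketch of a hypothetical inventory script that has to read device names after the Mexico expansion. The regular expressions, the `parse_device_name` function and the example names are illustrative assumptions, not taken from any real network; the point is simply that every consumer of the names now needs a branch for each convention.

```python
import re

# Hypothetical conventions from the example above.
# Original UK-only style: [OFFICE]-[EQUIPMENT]-[000], e.g. LON-PRN-001
OLD_STYLE = re.compile(r"^(?P<office>[A-Z]+)-(?P<equipment>[A-Z]+)-(?P<number>\d{3})$")
# Fractured style after expansion: [COUNTRY]-[OFFICE]-[EQUIPMENT]-[000], e.g. MX-MEX-SRV-002
NEW_STYLE = re.compile(
    r"^(?P<country>[A-Z]{2})-(?P<office>[A-Z]+)-(?P<equipment>[A-Z]+)-(?P<number>\d{3})$"
)


def parse_device_name(name: str) -> dict:
    """Parse a device name written under either convention.

    Every script that reads device names now needs a branch per convention;
    that extra branch is the cost of the fracture.
    """
    match = NEW_STYLE.match(name)
    if match:
        return match.groupdict()
    match = OLD_STYLE.match(name)
    if match:
        parts = match.groupdict()
        parts["country"] = None  # old-style names never recorded a country
        return parts
    raise ValueError(f"unrecognised device name: {name!r}")
```

When the auditor's obfuscation scheme arrives, a third pattern joins the other two, and every script that parses names has to grow again.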

This happens in code all the time as your standards shift like sand: yesterday you were using constructors for everything, then today you read Gang of Four and decide that the Abstract Factory pattern is the way to go. The next day you read Domain Driven Design and decide that you should separate the factory from the class it creates. Fracturing, fracturing, fracturing. Then you read about NHibernate and decide that ADO is old news, and a few years later LINQ comes out and you swap to that. Fracturing, fracturing, fracturing. Then you discover TDD and start writing your code test-first, but the rest of the team doesn't get it yet. Fracturing, fracturing, fracturing.
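A minimal Python sketch of what the first shift looks like in a codebase: two object-creation styles living side by side. The `Order` class, the `OrderFactory` hierarchy and the two `checkout_*` functions are hypothetical names invented for illustration. Both paths produce the same kind of object, which is exactly why nobody goes back to tidy up the older one.

```python
from abc import ABC, abstractmethod


class Order:
    """A hypothetical domain object used by both styles below."""

    def __init__(self, customer: str, total: float) -> None:
        self.customer = customer
        self.total = total


# Yesterday's standard: call the constructor directly.
def checkout_with_constructor(customer: str, total: float) -> Order:
    return Order(customer, total)


# Today's standard: creation goes through an abstract factory.
class OrderFactory(ABC):
    @abstractmethod
    def create_order(self, customer: str, total: float) -> Order:
        """Create an Order; concrete factories decide how."""


class StandardOrderFactory(OrderFactory):
    def create_order(self, customer: str, total: float) -> Order:
        return Order(customer, total)


def checkout_with_factory(factory: OrderFactory, customer: str, total: float) -> Order:
    return factory.create_order(customer, total)
```

Multiply this by every class in the system and the codebase ends up with two (and, after the DDD step, three) ways of doing the same thing.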

Of course, you could stand still and code to strict guidelines that never change, with the team leader walking around the floor carrying a big stick that comes down across your knuckles every time your code strays slightly in the wrong direction. But who wants to work in an environment like that? Fracturing has many causes, but mostly it is the result of a desire to do better and therefore of an improvement in quality. What coders see as out of sync is rarely their sparkling new code but their old "I'm so embarrassed I wrote that" code. As a result, their reaction is a desire to knock down the old and rewrite everything from the ground up using their new knowledge and techniques (though most recognise this is not "a good thing").

Fracturing can also result from negative factors: for example, someone leaves the company taking years of experience with them, and you get a new bod in who has to start coding quickly, so the standards get thrown out and there's more fracturing. Or there's a lot of pressure to get the code out, so you drop the advances you proved on the last project and go back to the old, safe and sound method.

Velocity

The obvious thing to say is that project velocity is negatively impacted by degradation and fracturing, but that is only half the picture. The reverse is also true: velocity has an effect on degradation and fracturing. Agile methodologies such as XP place a great deal of stress on maintaining a realistic velocity, and the advice is wise indeed. With too much velocity, more code is written than can be maintained, creating too much degradation. With too little velocity, so little code is written between the natural shifts in standards that the amount of fracturing per line of code is much higher than it would have been had the velocity been higher.

Another consideration is that if velocity is not carefully controlled, development can end up lurching between high and low periods. High periods become stressful, causing mistakes, leaving less time to fix degraded code, producing negative fracturing and ultimately leading to burnout. Low periods create boredom and frustration, which also cause mistakes, or they encourage over-engineering, again causing degradation and fracturing.

Getting the velocity right is a real art form and involves too many considerations to list. However, one thing it does require is waste. Waste is the bane of all businesses, and they often spend more money trying to eliminate waste than the waste originally cost. Developers are expensive, and having a developer potentially sitting around doing nothing is a no-no for many companies; they want every developer working at 100% capacity 100% of the time. Yet the impossibility of running to such tight budgets is a reality that virtually every other industry has come to terms with. Take a builder, for example: if he's going to build a house, he'll estimate how much sand, bricks, cement, wood and everything else he'll need, then add a load on top. Ninety-nine times out of a hundred the builder will end up with a huge load of excess materials, which he'll probably end up throwing away (or taking to the next job). Of course this isn't environmentally friendly, but the principle the builder is working from is that if he ends up just two bricks or half a bag of cement short, he's going to have to order more. That means a day's delay, which puts the whole project off track: basically, it's going to cost him a lot more than acquiring and disposing of the excess would have, had he ordered an extra pallet of bricks or bag of cement. Thus the builder decides that, in order to maintain velocity, there will be waste.
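The builder's reasoning is an expected-cost comparison, sketched below in Python. Every figure is a hypothetical assumption chosen only for illustration: the brick price, the cost of a day's delay, and the chances of running short with and without a buffer (the "with buffer" figure loosely echoes the ninety-nine-in-a-hundred above). The same shape of calculation applies to keeping slack in a development team's schedule.

```python
# Hypothetical figures, chosen only to illustrate the trade-off.
BRICK_PRICE = 0.50           # cost of one brick
BUFFER_BRICKS = 1_000        # extra bricks ordered on top of the estimate
DELAY_COST = 1_500.00        # assumed cost of one day's delay (idle crew, rescheduling)
P_SHORT_NO_BUFFER = 0.5      # assumed chance the bare estimate falls short
P_SHORT_WITH_BUFFER = 0.01   # assumed chance of falling short even with the buffer


def expected_cost(buffer_cost: float, p_short: float, delay_cost: float) -> float:
    """Up-front cost of any buffer plus the expected cost of a delay."""
    return buffer_cost + p_short * delay_cost


no_buffer = expected_cost(0.0, P_SHORT_NO_BUFFER, DELAY_COST)
with_buffer = expected_cost(BUFFER_BRICKS * BRICK_PRICE, P_SHORT_WITH_BUFFER, DELAY_COST)

# Under these assumptions the "wasted" bricks are the cheaper option on average.
print(f"no buffer: {no_buffer:.2f}, with buffer: {with_buffer:.2f}")
```

Change the assumptions (a cheaper delay, a more reliable estimate) and the answer can flip; the point is that waste is a priced decision, not an accident.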

What to do

There is an old joke:

"What's the difference between a psychotic and a neurotic? Well, a psychotic thinks 2+2=5, whereas a neurotic knows that 2+2=4, but it makes him mad."

Developers and businesses risk becoming psychotic about degradation and fracturing, believing instead that by coming up with some amazing architecture or definitive strategy all these problems will go away. Neurotics are only slightly less dangerous: as they become overanxious and fearful of degradation and fracturing, they introduce more and more processes, architectures and strategies until they become indistinguishable from the psychotics.

Eric Evans has many ideas in his book Domain Driven Design and is the very model of a pragmatist. Evans accepts that nasty things happen to systems and that bad code gets written or evolves. To Evans it is not important that all of your code is shiny platinum quality, only that the code which is most critical to the business is clean and pure. This is why Evans doesn't slate the Smart UI pattern: to him there are many situations where it is acceptable. It just isn't realistic to build a TVR when all you need is a 2CV.

Part IV (Strategic Design) of Domain Driven Design is dedicated to most of the concepts for protecting your system and is, unfortunately, the part that gets the least attention. Concepts such as Bounded Context, Core Domain and Large-Scale Structure are very effective in dealing with degradation and fracturing. Although the best advice I can give is to read the book, one of the ideas that interests me most is the use of Bounded Contexts with Modules (a.k.a. packages, Chapter 5). Evans explains that although packaging most commonly follows the horizontal layers (infrastructure, domain, application, UI), the best approach may be to have vertical layers based around responsibility (see Chapter 16: Large-Scale Structure). This way loosely coupled packages can be worked on in isolation and brought together within the application: there is almost a little hint of SOA going on. If an unimportant package starts to degrade or becomes fractured (for example, it uses NHibernate rather than ADO.NET), you don't have to rewrite an entire horizontal layer to keep it clean: the package can stay inconsistent within its own boundary without having any effect on the other parts of the system. This leaves the development team free to concentrate their efforts on maintaining the core domain and bringing business value.
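A minimal Python sketch of the idea, assuming two hypothetical contexts: a core billing context kept clean with its own repository, and a less important newsletter context that talks to `sqlite3` with raw SQL (standing in for the NHibernate versus ADO.NET split above). The class and function names are invented for illustration; what matters is that the application layer only touches each context's public methods, so the newsletter's inconsistency stays inside its own boundary.

```python
import sqlite3
from dataclasses import dataclass


# ---- Billing context (hypothetical core domain): kept clean ----
@dataclass(frozen=True)
class Invoice:
    invoice_id: int
    amount_pence: int


class InvoiceRepository:
    """Billing's persistence lives inside the billing boundary."""

    def __init__(self) -> None:
        self._invoices: dict[int, Invoice] = {}

    def add(self, invoice: Invoice) -> None:
        self._invoices[invoice.invoice_id] = invoice

    def total_outstanding(self) -> int:
        return sum(i.amount_pence for i in self._invoices.values())


# ---- Newsletter context (hypothetical supporting module): allowed to fracture ----
class NewsletterService:
    """Uses raw SQL directly, unlike billing; the difference stays behind these methods."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE subscribers (email TEXT PRIMARY KEY)")

    def subscribe(self, email: str) -> None:
        self._conn.execute("INSERT OR IGNORE INTO subscribers VALUES (?)", (email,))

    def count(self) -> int:
        return self._conn.execute("SELECT COUNT(*) FROM subscribers").fetchone()[0]


# ---- Application layer: composes the contexts through their public APIs only ----
def dashboard(billing: InvoiceRepository, newsletter: NewsletterService) -> dict[str, int]:
    return {
        "outstanding_pence": billing.total_outstanding(),
        "subscribers": newsletter.count(),
    }
```

If the newsletter module were later rewritten, or left to rot, nothing in billing or in `dashboard` would need to change as long as `subscribe` and `count` keep their signatures.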

This isn't the only solution, though: using DDD on big systems only brings benefits once a project reaches a certain point. If all your development team is able to do is continuously build fractured systems that degrade (whether due to skill or resource shortages), then it may be best just to bite the bullet and accept it. A monolithic system takes a lot to keep going and, although DDD can be applied to prevent many of the issues discussed, if you cannot get the velocity then your domain will not develop rapidly enough to deliver the benefits of a big enterprise system. If that is the case, it may be more realistic to abandon the vision and instead opt for a strategy that accurately reflects the business environment. This is where I believe Ruby on Rails is making headway; it isn't going to be apt for all enterprise systems, but it does claim to take most of the pain out of the database-driven, web-based applications that are proliferating through companies. The point is that even when a big-boy enterprise system sounds like the right thing to do, trying to develop one in a business that cannot or will not dedicate time, money and people to it is going to be a bigger disaster than ending up with a half-dozen Ruby on Rails projects delivering real business value. And you never know: once you've delivered a critical mass, you may find you can move things up to the next level naturally.

Conclusion

If you have ever visited an old building, especially a cathedral such as Canterbury, you'll know that they have often gone through hundreds of iterations: parts have been damaged by fire, weather or pillaging, or deliberately destroyed; they have been extended in ways sympathetic to the original and in ways originally considered abominations; some parts were built by masters while others are falling down because of their poor quality. The fact is that these buildings are incredible not because they are perfect but because of the way they have adapted to use through the ages. Old buildings face huge challenges in adapting through every period of history, as architecture must be compromised and the original people, tools and materials change or are no longer available. The challenge for these buildings across the centuries was not to maintain an unrealistic level of architectural perfection but to ensure that they remained in use: the moment that stops, the building will disappear.

About Me

I am a ThoughtWorker and general Memeologist living in the UK. I have worked in IT since 2000 on many projects, from public-facing websites in media and e-commerce to rich-client banking applications and corporate intranets. I am passionate about, and committed to, making IT a better world.