In the above posts (particularly post two), I identified and discussed what I believe are the key problems with custom enterprise software development. Today, I take a closer look at one of those problems: the monolithic architecture.

Monoliths 101

Custom enterprise software is traditionally developed in monolithic form. In this architecture, the system consists of a single large process. This process may be deployed in multiple instances behind a load balancer. The code base is usually in one repository, and is built and tested as a single complete body of functionality. The system is deployed infrequently, usually via a staged approach. A build server is often used to ensure that all recent changes integrate without breaking anything.

Variants of this architecture include systems composed of multiple monoliths, deployed separately. These are known as service-oriented architectures (SOAs). The monolith(s) may also be surrounded by small support services that are deployed and run separately. These are typically more infrastructural in nature.

The defining characteristic of the monolith is that most of the business logic that delivers the behavior of the system runs within a single process and lives in a single code base. As a result, the full features of the language and platform chosen for development can be used anywhere in the code for the system. This has some unfortunate consequences, chief of which is technical debt.

Technical debt

“The monolith invites technical debt”

The monolith invites technical debt because it enables unbounded complexity. There are no limits to the depth of the data structures, the entanglements of the class hierarchy, or the web of object references. Classic object-oriented (OO) design patterns are an attempt to control this inherent complexity. They are only partially successful, as they can always be circumvented, misapplied or over-used. The number of interactions between parts tends to grow.
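To put a rough number on how interactions between parts can grow, here is a minimal sketch. It assumes the worst case, where nothing stops any part from referencing any other part, and counts the possible pairwise links. The function name is illustrative only and does not come from any library.

```python
def max_pairwise_links(n_parts: int) -> int:
    """Upper bound on distinct pairs among n_parts parts.

    Assumes every part is allowed to reference every other part,
    which is the unbounded case described above.
    """
    if n_parts < 0:
        raise ValueError("n_parts must be non-negative")
    return n_parts * (n_parts - 1) // 2


for n in (10, 100, 1000):
    print(n, max_pairwise_links(n))
# prints:
# 10 45
# 100 4950
# 1000 499500
```

The number of parts grows linearly, but the number of possible interactions grows quadratically. Real systems sit below this bound; the point is only that nothing in the monolith enforces a lower one.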

There are no architectures that can prevent the accumulation of technical debt. It is simply the action of entropy over time; that is, disorder cannot but increase. At best, you can slow it down by finding mitigation techniques. Good software design is the use of well chosen abstract structures that absorb disorder to a greater extent than other, less suitable structures. Of course, you need to have a good understanding of your problem domain, and of your language platform, to make good choices. Even then, you have to pray for the good fortune that your requirements won’t change and your team will be strong.

Other problems of the monolith

One problem with a single body of code is that ultimately, any part of the code base can talk to any other part, no matter how much you try to prevent it. Another problem is that data structures can be extended with little cost, as everything is in memory anyway. Yet another problem is that modifications can have wide-ranging impacts, and it becomes easy to inadvertently break distant areas of code during bug fixing or other small changes.
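As a small illustration of how a change in one area can break a distant one, consider the sketch below. All names (the order record, the billing and promotions functions, the 20% tax rate) are hypothetical and chosen only for the example. Two unrelated areas of code share one in-memory object, and a "small change" in one silently alters the result of the other.

```python
# Hypothetical in-memory record shared by the whole code base.
order = {"id": 42, "total_cents": 10_000}


def billing_invoice_cents(order):
    """Billing area: adds an assumed 20% tax to the order total."""
    return order["total_cents"] * 120 // 100


def promotions_apply_discount(order):
    """Promotions area: a 'small change' that mutates the shared record in place."""
    order["total_cents"] = order["total_cents"] * 90 // 100


invoice_before = billing_invoice_cents(order)   # 12000
promotions_apply_discount(order)                # distant code touches shared state
invoice_after = billing_invoice_cents(order)    # 10800
```

Nothing in the language prevents the promotions code from reaching into the record that billing depends on. Whether the second invoice is correct is a business question, but billing was never changed, and its output moved anyway.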

Minimizing complexity

“Our job as software developers is to achieve the lowest implementation complexity possible”

It cannot be denied that software systems have inherent complexity. They must solve a given business problem within certain constraints. The algorithms and data structures can be simplified only so far. A set of business requirements, however implemented, and in whatever language, contains a minimum level of complexity, below which the requirements are not completely implemented.

The theoretical minimum is a code base that contains no more information than is contained in the requirements specification, because to contain any less would mean it could not possibly implement the requirements. In one sense, our job as software developers is to achieve the lowest implementation complexity possible under given environmental constraints (such as schedule, team size and ability, and so on).

You aren’t going to need it

The popular maxim YAGNI (you aren’t going to need it), promoted by the agile methodologies, is a well-established tactic to reduce monolithic complexity. In practice, YAGNI loses the battle quickly in a monolithic context, because monoliths have no natural limits on complexity. Slowing complexity growth is entirely on the shoulders of an all-too-frail technical leadership.

Even when executed well, YAGNI is not cheap. It brings continuous refactoring. The bet (and it is a bet) is that the effort expended on refactoring will be less than the cost of maintaining an overly general structure containing a large volume of code for exceptional cases.

Is a trial-and-error approach really the optimal way to reduce complexity? YAGNI gets a mention here only because it seems to be the most effective way to slow the growth of complexity in monoliths. Older approaches such as explicit modeling using visual diagrams have shown poor results in practice.

Risk

“The monolith also introduces considerable risk into a project.”

The monolith also introduces considerable risk into a project. Deployments are all-or-nothing affairs. There is no easy way to update just one part of the system. Even the smallest change requires a full update.

Any change, however small, brings with it the possibility that the entire system may be knocked out due to unintended consequences. Thus, deployments are regarded as big, risky activities that require large amounts of risk mitigation. Deployments become expensive, and it makes sense to do them only infrequently.

Adverse effects of low-frequency deployment

Low-frequency deployments have a nasty effect: they make deployment even more risky. Lots of changes, often including changes to persistent data, and the schemas describing that data, are rolled into a single update. This makes reversing out of the update almost impossible, as the previous version of the system is unlikely to work correctly with the updated database schema. It becomes essential to ensure that the new version is validated and fully tested before deployment. It must work.
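The rollback problem can be sketched with an in-memory SQLite database. This is an illustrative example under stated assumptions, not a description of any particular system: the `customer` table, its columns, and both functions are hypothetical. Version 1 of the code expects a `name` column; the version 2 update splits it in two.

```python
import sqlite3


def v1_customer_names(db):
    """Query as written for version 1, which expects a single `name` column."""
    return [row[0] for row in db.execute("SELECT name FROM customer ORDER BY id")]


def migrate_to_v2(db):
    """Version 2 update: rebuild the table with `name` split into two columns."""
    rows = db.execute("SELECT id, name FROM customer").fetchall()
    db.execute(
        "CREATE TABLE customer_v2 "
        "(id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
    )
    for customer_id, name in rows:
        first, _, last = name.partition(" ")
        db.execute(
            "INSERT INTO customer_v2 (id, first_name, last_name) VALUES (?, ?, ?)",
            (customer_id, first, last),
        )
    db.execute("DROP TABLE customer")
    db.execute("ALTER TABLE customer_v2 RENAME TO customer")
    db.commit()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO customer (name) VALUES ('Ada Lovelace')")
db.commit()

names_before = v1_customer_names(db)  # works against the old schema

migrate_to_v2(db)

# "Rolling back" the code alone: the version 1 query now meets the new schema.
try:
    v1_customer_names(db)
    old_code_still_works = True
except sqlite3.OperationalError:
    old_code_still_works = False
```

After the migration, `old_code_still_works` is `False`: redeploying the previous version of the code is not enough, because the data it expects no longer exists in that shape. Reversing the update would also require a reverse migration, and that gets harder with every additional change rolled into the same release.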

This dynamic is inherently fragile. Because it requires the world to be completely deterministic, the monolith is necessarily brittle and will break when faced with unexpected change.

Constant state of minor failure

“It is far better to be in a constant state of minor failure.”

It is far better to be in a constant state of minor failure. This is closer to the true nature of the world. If you can stay operational despite continuous failure, then you are by definition fault tolerant. This idea is the inspiration for techniques such as Netflix's Chaos Monkey, a software agent on the network that randomly shuts down servers. A 2012 entry on the Netflix technical blog explains this technique.
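The idea of staying operational while servers are shut down at random can be sketched in a few lines. This is a toy model, not Netflix's tool or its API: the `Replica` class, `kill_random_replica`, and `dispatch` are hypothetical names, and the "servers" are plain in-process objects.

```python
import random


class Replica:
    """A stand-in for one server instance."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


def kill_random_replica(replicas, rng):
    """Chaos step: shut down one live replica chosen at random.

    Returns the replica that was shut down, or None if none were alive.
    """
    live = [r for r in replicas if r.alive]
    if not live:
        return None
    victim = rng.choice(live)
    victim.alive = False
    return victim


def dispatch(replicas, request):
    """Try each replica in turn; succeed as long as any one is alive."""
    for replica in replicas:
        try:
            return replica.handle(request)
        except ConnectionError:
            continue
    raise RuntimeError("all replicas are down")


rng = random.Random(0)
replicas = [Replica(f"replica-{i}") for i in range(3)]

# Shut down two of the three replicas at random, then serve a request.
kill_random_replica(replicas, rng)
kill_random_replica(replicas, rng)
result = dispatch(replicas, "GET /orders")
```

With two of three replicas down, `dispatch` still returns a result from the survivor. The system as a whole is in a state of partial failure and keeps working, which is the property the random shutdowns are meant to exercise continuously.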

To deal with fragility, software teams introduce byzantine coping strategies. The almost religious fervor that developers have for their favorite techniques has a trite root cause: any port in a storm. The swirling complexity around them is impossible to control, but they can get the feeling of control by insisting that the rituals be followed.

Rituals and diminishing marginal returns

These rituals often take the form of extreme versions of common-sense techniques: unit tests are good, so we need 100% coverage; four eyes are better than two, so everybody has to pair-program; bugs are bad, so we need a zero bug count, or else; clean code helps communication, so a strict coding style must be enforced. And so on.

These rituals become ends in themselves. Well-meaning technical leads, architects and project managers impose them on their teams in an almost self-deceptive attempt to ensure success. The mechanism by which these rituals are supposed to achieve success is long forgotten, and they are often justified on the basis of prior experience, or by pointing at some celebrity in the programming world as worthy of emulation.

Anything taken to extremes suffers from diminishing marginal returns. Moving unit test coverage from 90% to 100% is far more expensive in terms of developer time than moving from 0% to 10%.

Such decisions need to be correlated with the business value generated, so that a realistic assessment of the gain versus the expense can be made. This is very rarely done.

Conclusion

Returning to the question of ethics in the practice of software development, it is clearly unethical to waste business resources for tiny gains.

In my next post, I will look at best practices and ask to what extent they really are ‘best’.

For more detail on any of the issues raised here, and a lot more as well, the first five chapters of Richard’s new book, The Tao of Microservices, are available now (the first one is free!).