Maintainability is "ility" #1

Occasionally, ok often, I'm gently mocked for the length of my posts. I start with good intentions of making short, pithy Jason Yip-style posts, then think of something else I want to say and 10 pages later I manage to hit the publish button. This is one of those that got away from me.

I'm writing this post for a reason. Specifically, I'm seeing a lot of harm being done on my project by focusing narrowly on performance optimizations and neglecting maintainability. This is an ongoing discussion and debate for me at work, and I suspect it's going to continue to be an issue as long as I work in the financial world. I think it's important to win this argument, or at least gain some concessions, so I would be very happy to hear everyone else's thoughts on this subject, especially folks that disagree with me. In the course of this post I'm going to criticize the decisions and code of some of my teammates (and myself) on my current project. It isn't that I think they're doing a bad job, just that there is a very real opportunity to do better.

When you're architecting a software system, you must understand what the needs of the system really are and act accordingly. It's important to understand the desired qualities of a system because software design often involves making compromises between opposing qualities. For example, performance and scalability are often very much at odds — you can generally only optimize for one or the other. The point I'm trying to make is that you absolutely cannot focus completely on one quality of your software without considering the consequences to the other "ility" qualities of your system.

In the end, you need to be optimizing for the qualities that genuinely create business value — and I believe that the single most important quality for delivering business value on most projects is maintainability. Any deviation from a more maintainable solution in favor of performance or security or scalability or whatever is dead wrong — at least until proven otherwise. Even if you do have stringent performance or scalability targets, I'm going to argue in this post that focusing on maintainability first will get you to those very same performance or security goals more efficiently in terms of development time.

Early Performance Optimization did not Work

To use a concrete example from my current project, the back end developers are consciously coding to minimize the number of IL instructions in an attempt to improve performance. They're very concerned by issues like auto-boxing and the number of objects being created. In fact, it seems to be their main criterion for judging code quality.
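For readers who haven't run into it, auto-boxing is the compiler silently wrapping a primitive value in an object. Here's a minimal Java sketch of the kind of micro-optimization in question — the class and numbers are mine, invented purely for illustration, not taken from our codebase:

```java
// Illustrative only: the sort of boxing micro-optimization in question.
// Sums a range twice: once with a boxed Integer accumulator (which
// allocates wrapper objects) and once with a plain primitive int.
public class BoxingDemo {

    // Boxed accumulator: each "sum += i" unboxes, adds, and re-boxes,
    // creating garbage on most iterations.
    static int sumBoxed(int n) {
        Integer sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i; // auto-unbox, add, auto-box
        }
        return sum;
    }

    // Primitive accumulator: no allocation at all.
    static int sumPrimitive(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Both produce the same answer; the difference is allocation
        // churn, which only matters if a profiler says this loop is hot.
        System.out.println(sumBoxed(1000));     // 499500
        System.out.println(sumPrimitive(1000)); // 499500
    }
}
```

The boxed version really is slower, but unless a profiler shows that loop on a hot path, the difference is noise next to a slow transport layer — which is exactly what bit us below.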

So our code is fast, right? Well, no, but we're working on it and making strides. When we started integrating the client and server we found that marshalling data from the server to the client was extremely sluggish. A little profiling by one of my team members showed a fairly severe bottleneck in our transport layer that totally dwarfed the run time of the rest of the system. The IL instruction optimization in the server side code didn't achieve anything in particular. What makes me angriest at myself is that we flirted with a more common approach to integration, one I think would be more maintainable in the long run, but went with our current strategy in no small part because we thought it would be faster (sic). Scratch that; what I'm really livid with myself about is not forcing the backend guys to benchmark the more maintainable approach before we committed to this path*.

Neglecting Maintainability is an Opportunity Cost

The server side developers spent time making optimizations that might, or might not, have made some minor improvements in performance. We collectively made some wide ranging architectural choices for performance that have not, in my opinion, added any value whatsoever.

I've got a major problem with that previous statement. You only have a finite amount of time and resources to throw at your project. Yeah, you can crank up your hours for short bursts, but there's always a cost for doing that. To be truly successful, you should strive to spend those finite resources on the things that add the most value. From Wikipedia, opportunity cost is "the cost of something in terms of opportunity forgone." The time spent on the server side architecture for performance bothers me a little bit, but the opportunity cost of not writing maintainable code or automated tests on the server side has been far more significant.

The end result? While gaining essentially nothing in performance, we cost ourselves the opportunity to work more efficiently with that code, in both developer and project time, by neglecting test automation and well factored code. What we have is code that is genuinely hard to follow and inspect for errors, because the methods are too long, with deeply nested if/then and looping constructs for good measure. The existing code is much harder to change than orthogonal code backed up by unit tests would have been. Surprise! That server side code had to be changed in the very next iteration to add new features, with additional changes looming for later iterations. If I hadn't spent a couple of days slicing that code up with IntelliJ's automated refactoring support, we could have very easily ended up with code duplication in areas that have a high potential to change in the future. Nothing makes extension harder than having to code the same rules in multiple places: (a) more work and (b) a greater chance of screwing it up.
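To show what that slicing up looks like, here's a hypothetical before-and-after of an Extract Method refactoring. The domain (fee calculation) and every name in it are invented for illustration; this is not our actual code:

```java
// Hypothetical before/after of an Extract Method refactoring.
public class FeeCalculator {

    // BEFORE: one method, nested conditionals, hard to inspect or test.
    static double feeTangled(double amount, boolean premium, boolean foreign) {
        double fee = 0;
        if (amount > 0) {
            if (premium) {
                fee = amount * 0.01;
            } else {
                fee = amount * 0.02;
                if (foreign) {
                    fee = fee + 15.0;
                }
            }
        }
        return fee;
    }

    // AFTER: each rule lives in one small, individually testable method.
    static double fee(double amount, boolean premium, boolean foreign) {
        if (amount <= 0) return 0;
        return premium ? premiumFee(amount) : standardFee(amount, foreign);
    }

    static double premiumFee(double amount) {
        return amount * 0.01;
    }

    static double standardFee(double amount, boolean foreign) {
        double fee = amount * 0.02;
        return foreign ? fee + 15.0 : fee;
    }

    public static void main(String[] args) {
        // The refactored version must give the same answers as the original.
        System.out.println(fee(1000, false, true)); // 35.0
    }
}
```

The payoff isn't aesthetics: each extracted method is a single place to change a rule, and a single thing to pin down with a unit test before the next iteration's features land.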

Even worse are the poor coupling and cohesion properties that have defeated our attempts at writing isolated unit tests. I'm all for integrated FIT style tests, but that shouldn't have to be the most granular testing you can do on a system. One of the lessons I've learned from dealing with so much legacy code over the last 2 1/2 years is that coding throughput is very much affected by the granularity and quickness of the feedback loop. Writing small, granular unit tests that execute quickly leads to better productivity than the much slower feedback cycle of coarse grained integration tests. Having to fire up the UI to test something by hand is slower still. I will very confidently claim that debugging time goes up geometrically with the coarseness of the testing, and that's significant because debugging is a major drain on developer productivity.
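What does "isolated" look like in practice? A hypothetical sketch of breaking a dependency so a test can run without the real server — the `PriceSource` and `PortfolioValuer` names are invented for illustration, not from our system:

```java
import java.util.List;

// Seam: production code depends on this interface, so a test can
// substitute a canned stub for the slow socket-based price feed.
interface PriceSource {
    double priceOf(String symbol);
}

class PortfolioValuer {
    private final PriceSource prices;

    PortfolioValuer(PriceSource prices) {
        this.prices = prices;
    }

    // Pure logic, testable in milliseconds with a stubbed PriceSource.
    double value(List<String> symbols) {
        double total = 0;
        for (String s : symbols) {
            total += prices.priceOf(s);
        }
        return total;
    }
}

public class IsolatedTestDemo {
    public static void main(String[] args) {
        // A stub stands in for the remote price feed; no server required.
        PriceSource stub = symbol -> 10.0;
        PortfolioValuer valuer = new PortfolioValuer(stub);
        System.out.println(valuer.value(List.of("MSFT", "IBM"))); // 20.0
    }
}
```

The feedback loop on a test like this is effectively instant, which is exactly the granularity our coupled server code denies us.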

Just to beat this horse into the ground, the system we're building doesn't even have any realistic need for high performance. The data sets are small, and the transaction complexity is fairly mild. What we *do* need is reliability, but the stateful socket connection integration scheme we adopted in the name of performance has added complexity to the way we deal with server connectivity. I think a stateless connection model, while arguably slower, would have delivered more business value. And while the proprietary binary formats we use for communication surely improve performance, they come with the opportunity cost of decreased interoperability, and hence a very real reduction in business opportunities.

Ok smart guy, now my code isn't fast enough!

Back to performance again. So you concentrated on producing the correct business functionality first, with maintainability in mind, and it turns out that your architecture isn't fast enough, or maybe can't handle the volume, or just that the user interface isn't responsive enough. I'm not really addressing performance optimization and profiling in depth here, but take a quick read through Jeff Atwood's post Why aren't my optimizations optimizing? and you'll see that performance tuning is a tricky business. There are too many conflicting variables to solve the problem through pure deduction alone. Dollars to donuts, I bet you that some of the performance optimizations made by my colleagues ended up hurting performance instead. The point being that you almost certainly need empirical measurements of a range of trial solutions to arrive at better performance.
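Taking an empirical measurement doesn't have to be fancy. Here's a crude Java timing harness of the sort we should have run before committing to an architecture — measure, don't deduce. The workload is invented, and a serious benchmark needs proper JIT warmup, multiple forks, and statistics; this sketch only shows the habit:

```java
// A crude timing harness: run the candidate code, time it, compare.
public class CrudeBenchmark {

    // Times a task over many iterations after a warmup pass, returning
    // the average nanoseconds per call. Deliberately simplistic.
    static long averageNanos(Runnable task, int iterations) {
        for (int i = 0; i < iterations; i++) task.run(); // warmup
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) task.run();
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        // Invented workload standing in for a candidate implementation.
        long perCall = averageNanos(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100; i++) sb.append(i);
        }, 10_000);
        System.out.println("avg nanos per call: " + perCall);
    }
}
```

Ten minutes of this against our two integration candidates would have settled the "which is faster" argument with data instead of instinct.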

To drive this point home, let's say that your performance bottleneck is in the communication between physically distributed subsystems. Forgetting for a minute about the cost of making changes to your code base, what can you do to make your system faster?

Minimize the number of network round trips, perhaps by gathering data into a more coarse grained Data Transfer Object, or shrink the payload by compressing the data sent over the wire

Maybe the fancy compression or transformation of the data is eating up resources, so back it out and try something else

Use more background threads

Eliminate thread swapping by cutting down the number of threads

Cache shared resources

Eliminate thread synchronization slowdowns caused by shared resources

a time to cast away stones, and a time to gather stones together; a time to embrace, and a time to refrain from embracing… (sorry, couldn't help myself)

Wait, some of these changes contradict each other. Which one is right? Are you sure? It could easily take you several attempts to find the right recipe for performance (or scalability or usability).
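To make the first option on that list concrete, here's a hypothetical sketch of replacing several fine grained remote calls with one coarse grained Data Transfer Object. The service and DTO are invented, and each method call stands in for a network round trip that a real remote proxy would make:

```java
import java.util.List;

// Coarse grained bundle: one trip carries everything the screen needs.
class CustomerDto {
    final String name;
    final String address;
    final List<String> orders;

    CustomerDto(String name, String address, List<String> orders) {
        this.name = name;
        this.address = address;
        this.orders = orders;
    }
}

class CustomerService {
    int roundTrips = 0; // visible here; invisible (but real) over a socket

    // Fine grained API: three trips to paint one screen.
    String fetchName(int id)         { roundTrips++; return "ACME"; }
    String fetchAddress(int id)      { roundTrips++; return "1 Main St"; }
    List<String> fetchOrders(int id) { roundTrips++; return List.of("A-1"); }

    // Coarse grained API: one trip for the same data.
    CustomerDto fetchCustomer(int id) {
        roundTrips++;
        return new CustomerDto("ACME", "1 Main St", List.of("A-1"));
    }
}

public class DtoDemo {
    public static void main(String[] args) {
        CustomerService fine = new CustomerService();
        fine.fetchName(42);
        fine.fetchAddress(42);
        fine.fetchOrders(42);
        System.out.println("fine grained trips: " + fine.roundTrips);     // 3

        CustomerService coarse = new CustomerService();
        coarse.fetchCustomer(42);
        System.out.println("coarse grained trips: " + coarse.roundTrips); // 1
    }
}
```

Whether the DTO actually wins still depends on measurement — it trades round trips for payload size, which is exactly the kind of conflict the list above is full of.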

You definitely need to make changes to improve performance, but those changes cannot break the functionality of the application. Fast, buggy code isn't an improvement on slow, functional code. If you've written maintainable code that exhibits orthogonality, you should be able to contain the changes to isolated modules without spilling into the rest of the code. If you've built a maintainable software ecosystem of full build automation and solid test automation, you can drastically reduce the overhead of staging new code to the performance testing environment, with less risk of breaking working code. In other words, the things you do for maintainability should have a direct impact on your ability to efficiently make empirical measurements as you tune performance.

Conclusion

There are two general themes I wanted to explore in this post. The first theme is just yet another cry for YAGNI. Try not to invest time or effort into something that isn't warranted. Make any piece of complexity earn its existence first. A lot of this thinking is based on the assumption that it's easier to add complexity to a simple solution when it's warranted than it is to work around unnecessary complexity. I'm also making a large assumption that you can make optimizations later if you've taken steps to flatten the change curve. The second theme is that I think a deliberate focus on maintainable code structure and solid project infrastructure is a more reliable path to quality optimization than early optimization. If your code and project infrastructure facilitate change, you can always adapt to improve your other "ilities" — assuming, of course, that you're paying attention as you work and adapt in a timely manner.

And, by the way, maintainability is still the most important code quality. Your system may not have to be blindingly fast, or scale like eBay, but it will change. By all means, go learn about Big O notation and delve into the inner workings of the CLR (I'm finally reading the Jeffrey Richter book this week myself). But the important point for anybody engaged in building software is that focusing on maintainability first is very often the most reliable means of getting to exactly the other qualities you need. And if you write code that can't be maintained or changed, you're probably on a path to failure.

Wait, there's going to be more. The next post in the "Maintainable" software series is going to be about the DRY principle and the Wormhole Antipattern. First I'm going to give StructureMap a serious DRY'ing out for awhile, then I'm going to come back and tell you how it went.

*I believe that the decision that ultimately led to our performance and tight coupling problems was based far too much on a "Sunk Cost." More on that someday.

About Jeremy Miller

Jeremy is the Chief Software Architect at Dovetail Software, the coolest ISV in Austin. Jeremy began his IT career writing "Shadow IT" applications to automate his engineering documentation, then wandered into software development because it looked like more fun. Jeremy is the author of the open source StructureMap tool for Dependency Injection with .Net, StoryTeller for supercharged acceptance testing in .Net, and one of the principal developers behind FubuMVC. Jeremy's thoughts on all things software can be found at The Shade Tree Developer at http://codebetter.com/jeremymiller.

Performance optimizations are cheaper if done earlier, yet maintaining the code is more expensive if it isn’t maintainable (duh!), but these aren’t necessarily mutually exclusive, are they?

The key, although difficult, is to try to optimize in the few critical areas.

http://fugato.net/ Gunnlaugur Thor Briem

“In other words, the things you do for maintainability should have a direct impact on your ability to efficiently make empirical”

… empirical what?

http://codebetter.com/blogs/jeremy.miller jmiller

Eric,

My experience is the same, and I’d go farther to say it’s usually just doing stupid things with the database. In defense of my server guys, they do run across financial applications that have very complex algorithms and real time UI that does go beyond db problems.

http://www.codebetter.com/blogs/eric.wise ewise

I’ve come across very, very few systems in my career where aggressive caching and reducing database round trips weren’t enough optimization in 99% of the cases. The other most frequent culprit is poor indexing and inefficient querying on the database tier, but that’s just yet another argument for having a solid DBA on your team. =)

http://blogs.microsoft.co.il/blogs/kim Kim

@Mike: I think we all agree that it’s important to write maintainable code. I just wanted to make the point that if you *can’t* meet your perf specs then you might need to redo large parts of your system with the possible result of throwing out large amounts of very maintainable code. IMHO it’s a big help to have the perf specs at an early stage so you can measure if your architecture is in the right ballpark or not.

Another point, in my experience significant perf improvements have not been the result of tweaking some low level method, but rather choosing a different approach to solve the problem.

http://progblog.wordpress.com Mike G

@Kim: On the contrary, if you have maintainable code then the goal of reaching the performance specs is made easier. One thing that can be guaranteed: the performance requirements are only going to go UP in the future, not down. So unmaintainable but performant code now becomes unmaintainable and unperformant code in 12 or 18 months; except now all the people who wrote the unmaintainable code have left, so you’ve got unmaintainable code, unperformant code and inefficient resources to solve the problem.

Charlie

What’s the old comment about building it first, then optimizing later? The old 80/20 rule says that 80% of the time in your code will be spent in 20% of the code. Optimizing anywhere else is really a waste of time. Build a GOOD design, RUN the application and see if it’s fast enough. If so, you’re done; if NOT, put a code profiler on the application, figure out where the bottlenecks are, fix them, and document WHY you did it.

http://blogs.microsoft.co.il/blogs/kim Kim

What are the perf specs? To optimize beyond those is a waste of time. For some systems performance is a go/no go criterion. If you can’t reach your perf goals it doesn’t really matter that you have maintainable code. Again, the required perf specs for each part of the system are a question that has to be answered explicitly.