Wednesday, January 28, 2009

Let me start out by saying I am a big fan of refactoring, the ongoing process of changing code so that it performs the same behaviors but has more elegant structure. It's an essential discipline of good software development, especially in startups. Nonetheless, I want to talk about a dysfunction I've seen in several startups: they are literally refactoring themselves to death.

Here's a company I met with recently. They have a product that has a moderate number of customers. It's well done, looks professional, and the customers that use it like it a lot. Still, they're not really hitting it out of the park, because their product isn't growing new users as fast as they'd like, and they aren't making any money from the current users. They asked for my advice, and we went through a number of recommendations that readers of this blog will already be able to guess: adding revenue opportunities, engagement loop optimization, and some immediate split-testing to figure out what's working and what's not. Most of all, I encouraged them to start talking to their most passionate customers and running some big experiments based on that feedback.

I thought we were having a successful conversation. Towards the end, I asked when they'd be able to make these changes, so that we could meet again and have data to look at together. I was told they weren't sure, because all of their engineers were currently busy refactoring. You see, the code is a giant mess, has bugs, isn't expandable, and is generally hard to modify without introducing collateral damage. In other words, it is dreaded legacy code. The engineering team has decided it's reached a breaking point, and is taking several weeks to bring it up to modern standards, including unit tests, getting started with continuous integration, and a new MVC architecture. Doesn't that sound good?

I asked, "how much money does the company have left?" And it was this answer that really floored me. They only have enough money to last another three months.

I have no doubt that the changes the team is currently working on are good, technically sound, and will deliver the benefits they've claimed. Still, I think it is a very bad idea to take a large chunk of time (weeks or months) to focus exclusively on refactoring. The fact that this time is probably a third of the remaining life of the company (these projects inevitably slip) only makes matters worse.

The problem with this approach is that it effectively suspends the company's learning feedback loop for the entire duration of the refactoring. Even if the refactoring is successful, it means time invested in features that may prove irrelevant once the company starts learning again. Add to that the risk that the refactoring never completes (because it becomes a dreaded rewrite).

Nobody likes working with legacy code, but even the best engineers constantly add to the world's store of legacy code. Why don't they just learn to do it right the first time? Because, unless you are working in an extremely static environment, your product development team is learning and getting better all the time. This is especially true in startups; even modest improvements in our understanding of the customer lead to massive improvements in developer productivity, because we have a lot less waste of overproduction. On top of that, we have the normal productivity gains we get from: trying new approaches on our chosen platform to learn what works and what doesn't; investments in tools and learning how to use them; and the ever-increasing library of code we are able to reuse. That means that, looking back at code we wrote a year or two ago, even if we wrote it using all of our best practices from that time, we are likely to cringe. Everybody writes legacy code.

We're always going to have to live with legacy code. And yet it's always dangerous to engage in large refactoring projects. In my experience, the way to resolve this tension is to follow these Rules for Refactoring:

Insist on incremental improvements. When sitting in the midst of a huge pile of legacy code, it's easy to despair of your ability to make it better. I think this is why we naturally assume we need giant clean-up projects, even though at some level we admit they rarely work. My most important lesson in refactoring is that small changes, if applied continuously and with discipline, actually add up to huge improvements. It's a version of the law of compounding interest. Compounding is not a process that most people find intuitive, and that's as true in engineering as it is in finance, so it requires a lot of encouragement in the early days to stay the course. Stick to some kind of absolute-cost rule, like "no one is allowed to spend more time on the refactoring for a given feature than the feature itself, but also no one is allowed to spend zero time refactoring." That means you'll often have to do a refactoring that's less thorough than you'd like. If you follow the suggestions below, you'll be back to that bit of code soon enough (if it's important).
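To make the compounding argument concrete, here's a back-of-the-envelope sketch. The 1%-per-week figure is purely illustrative, not a measured number:

```python
# Illustrative only: suppose each week's small, disciplined refactoring
# makes the team 1% more productive, and the gains compound.
weekly_gain = 1.01
weeks = 52

speedup = weekly_gain ** weeks
print(f"After a year: {speedup:.2f}x baseline productivity")
# roughly 1.68x - small changes, applied continuously, add up
```

The exact numbers don't matter; the point is that modest, continuous improvements dominate a one-time cleanup that freezes learning for weeks.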

Only pay for benefits to customers. Once you start making lots of little refactorings, it can be tempting to do as many as possible, trying to accelerate the compounding with as much refactoring as you can. Resist the temptation. There's an infinite amount of improvement you can make to any piece of code, no matter how well written. And every day, your company is adding new code that could also be refactored as soon as it's created. In order to make progress that is meaningful to your business, you need to focus on the most critical pieces of code. To figure out which parts those are, you should only ever do refactoring to a piece of code that you are trying to change anyway.

For example, let's say you have a subsystem that is buggy and hard to change. So you want to refactor it. Ask yourself how customers will benefit from having that refactoring done. If the answer is that they are complaining about bugs, then schedule time to fix the specific bugs that they are suffering from. Only allow yourself to do those refactorings that are in the areas of code that cause the bugs you're fixing. If the problem is that the code is hard to change, wait until the next new feature that trips over that clunky code. Refactor then, but resist the urge to do more. At first, these refactorings will have the effect of making everything you do a little slower. But don't just pad all your estimates with "extra refactoring time" and make them all longer. Pretty soon, all these little refactorings actually cause you to work faster, because you are cleaning up the most-touched areas of your code base. It's the 80/20 rule at work.

Only make durable changes (under test coverage). There's no point refactoring code if it's just going to go back to the way it was before, or if it's going to break something else while you're doing it. The only way to make sure refactorings are actually making progress (as opposed to just making work) is to ensure they are durable. What I mean is that they are somehow protected from inadvertent damage in the future.

The most common form of protection is good unit-test coverage with continuous integration, because that makes it almost impossible for someone to undo your good work without knowing about it right away. But there are other ways that are equally important. For example, if you're cleaning up an issue that only shows up in your production deployment, make sure you have sufficient alerting/monitoring so that it would trigger an immediate alarm if your fix became undone. Similarly, if you have members of your team who are not on the same page as you about the right way to structure a certain module, it's pointless to just unilaterally "fix" it if they are going to "fix" it right back. Perhaps you need to hash out your differences and get some team-wide guidelines in place first?
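One common way to make a refactoring durable is a characterization test: before touching the legacy code, pin down its current behavior with assertions, so CI catches any regression the moment someone reintroduces it. A minimal sketch (the `slugify` function and its behavior are hypothetical stand-ins for your own legacy code):

```python
# Hypothetical legacy function we want to refactor without changing behavior.
def slugify(title):
    return title.strip().lower().replace(" ", "-")

def test_slugify_existing_behavior():
    # These assertions capture what the code DOES today, not necessarily
    # what it "should" do. They protect the refactoring from inadvertent
    # damage: if a later change alters the behavior, CI flags it immediately.
    assert slugify("  Hello World ") == "hello-world"
    assert slugify("Refactor Me") == "refactor-me"

test_slugify_existing_behavior()
```

With the test in place and running on every commit, you're free to restructure the internals incrementally; the test, not anyone's memory, guards the contract.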

Share what you learn. As you refactor, you get smarter. If that's not a team-wide phenomenon, then it's still a form of waste, because everyone has to learn every lesson before it starts paying dividends. Instead of waiting for that to happen, make sure there is a mechanism for sharing refactoring lessons with the rest of the team. Often, a simple mailing list or wiki is good enough, but judge based on the results. If you see the same mistakes being made over again, intervene.

Practice five whys. As always with process advice, I think it's essential that you do root cause analysis whenever it's not working. I won't recap the five whys process here (you can read a previous post to find out more); the key idea is to refine all rules based on the actual problems you experience. Symptoms that deserve analysis include: refactorings that never complete, making incremental improvements but still feeling stuck in the mud, making the same mistakes over and over again, schedules slipping by increasing amounts, and, of course, month-long refactoring projects when you only have three months of cash left.

Back to my conversation with the company I met recently. Given their precarious situation, I really struggled with what advice to give. On the one hand, I think they have an urgent problem, and need to invest 100% of their energy into finding a business model (or another form of traction). On the other, they already have a team fully engaged on making their product architecture better. In my experience, it's very hard to be an effective advocate for "not refactoring" because you can come across as anti-quality or even anti-goodness. In any event, it's enormously disruptive to suddenly rearrange what people are working on, no matter how well-intentioned you are.

I did my best to help, offering some suggestions of ways they could incorporate a few of these ideas into their refactoring-in-progress. At a minimum, they could ensure their changes are durable, and they can always become a little more incremental. Most importantly, I encouraged the leaders of the company to bring awareness of the business challenges to the product development team. Necessity is the mother of invention, after all, and that gives me confidence that they will find answers in time.

11 comments:

Back when it first came out, Fowler's book Refactoring was what got me into the Agile world, so it holds a special place in my heart. But I agree completely: not a week should go by without visible and hopefully released progress toward satisfying real users.

The only way a team can judge the value of particular refactorings is in a context where the primary focus is on frequently delivering value. Otherwise, the opportunity for redesigning is infinite. In the same way that product managers must be ruthless about product prioritization, developers should be ruthless about choosing which bits of code are really worth polishing right now.

My one counterbalancing piece of advice is that the code base should get a bit better each week, rather than a bit worse. The pressure to release should always be present, but never so strong that technical debt increases over time.

Excellent post. The rigor of engineering teams needs to be balanced with the realities of the commercial environment. Code that Fowler would be in awe of is useless if it does not generate revenue.

Product and engineering managers would be better off adopting an iterative approach to refactoring. Refactoring at a large scale has risk characteristics similar to rewriting an application. And we all know what the success rates of rewrites are!

I would also suggest, and the engineers will almost certainly hate this suggestion, that the decision-maker asks for hard metrics to justify the refactoring effort in the first place.

I have often been told by programmers that the project needs to slip by x weeks or months because the code needs to be refactored. When asked how they know, the answer is invariably defensive in tone and vague in its explanation.

Writing this, I can recall four occasions when the hard metrics (gathered at my request by members of the QA team) proved the code was indeed performing sub-optimally - but only at the microscopic level, e.g. cpu utilisation running at 3% instead of 2%. In each of those four cases, it was cheaper and faster (by several orders of magnitude) to beef up the hardware.

On only one occasion was a substantial rewrite justified based on performance metrics. And in that one case (hard to pull a trend from one data point, I know, but this is indirectly substantiated by anecdotal evidence from other project managers and my own programming career) the rewrite over-ran by almost half, principally because the senior programmers recommended a course of action based on incomplete knowledge.

I'm in this exact position with my startup, industrialinterface.com. We had some work done by an external group that isn't up to snuff (from my POV), and now we need to fix it AND move forward.

Fortunately, my co-founders are also engineers and understand the need for some of the stuff I do. We've reached a middle ground -- they know that I have a tendency to over-engineer and "beautify" code. I'm honest when I'm doing the "unnecessary" re-factor, and the business guys decide if it's worth the time.

One caveat: it can indicate good things about a dev team if they want to refactor constantly. It means they're embarrassed about their past work, which means 1) they're constantly getting better, and 2) they take pride in the quality of their work. IMO, those two reasons are enough to justify some of the refactoring time (although 1/3 of the businesses' TTL seems like a bad plan...)

@mooders: Note that refactoring is improving the internal design, not the performance or other aspects of the functionality. The goal of refactoring is to make it easier to do future development. There are some metrics that may help with that, but it's something where you mainly need to trust the professional judgment of your senior developers.

I've seen this in practice too: a company that reaches its end with a better product, but not enough business. It's a problem with their priorities.

You have to solve the biggest problems first, and unfortunately for many techies, the business ones often trump the technical ones. Refactoring is a long-term strategy; it makes no sense if you can't survive long enough to reap the benefits of the work.

Once 'next year' becomes a possibility again, then it should go back onto the table (too many short-term choices are as bad as too many long-term ones). Balance is important.

There is definitely a business/marketing analog to exactly this situation that happens a lot too. The long refactor of business model and market research (as opposed to more organic iteration) is just as bad. Would love to read a post from you about that side of things.

In my opinion there are too many examples where people in the industry use the word "refactor" to justify their work, but what they really do is re-engineering, redesign, unstructured/informal code cleanup, or even complete rewrites.

There does come a time when code debt is literally killing a company's cash flow. Little refactoring sessions are certainly a good step and sometimes the business needs to realize they have leveraged their code base as far as it can take them.

This is usually exacerbated by zealous sales persons who oversell the features, or even more typical, promise completely new features.

@mooders - Performance is not the only metric worth analyzing in a refactoring effort. You should take note of the amount of effort your engineers estimate to implement new features and/or squash bugs. If those estimates are going up, you already have serious code debt. A healthy code base is one where you can add features easily over time.