
The benefits of Customer Development, agile development and a minimum feature set are continuous customer feedback, rapid iteration and little wasted code. But over time, if developers aren’t careful, code written to find early customers can become unwieldy, difficult to maintain and incapable of scaling. Ironically it becomes the antithesis of agile. And the magnitude of the problem increases exponentially with the success of the company. The logical solution? “Re-architect and re-write” the product.

For a company in a rapidly changing market, that’s usually the beginning of the end.

It Seems Logical
I just had lunch (at my favorite Greek restaurant in Palo Alto, forgetting it looked like a VC meetup) with a friend who was the technical founder of his company and is now its chairman. He hired an operating exec as the CEO a few years ago. We caught up on how the company was doing (“very well, thank you, after five years, the company is now at a $50M run rate,”) but he wanted to talk about a problem that was on his mind. “As we’ve grown we’ve become less and less responsive to changing market and customer needs. While our revenue is looking good, we can be out of business in two years if we can’t keep up with our customers’ rapid shifts in platforms. Our CEO doesn’t have a technology background, but he’s frustrated he can’t get the new features and platforms he wants (Facebook, iPhone and Android, etc.). At the last board meeting our VP of engineering explained that the root of our problems was ‘our code has accumulated a ton of “technical debt,” it’s really ugly code, and it’s not the way we would have done it today.’ He told the board that the only way to deliver these changes is to re-write our product.” My friend added, “It sounds logical to the CEO so he’s about to approve the project.”

Shooting Yourself in the Head
“Well didn’t the board read him the riot act when they heard this?” I asked. “No,” my friend replied, sadly shaking his head, “the rest of the board said it sounded like a good idea.”

With a few more questions I learned that the code base, which had now grown large, still had vestiges of the original exploratory code written back in the early days when the company was in the discovery phase of Customer Development. Engineering designs made back then with the aim of figuring out the product were not the right designs for the company’s current task of expanding to new platforms.

I reminded my friend that I’ve never been an engineering manager so any advice I could give him was just from someone who had seen the movie before.

The Siren Song to CEOs Who Aren’t Technical
CEOs face the “rewrite” problem at least once in their tenure. If they’re an operating exec brought in to replace a founding technical CEO, then it looks like an easy decision – just listen to your engineering VP compare the schedule for a rewrite (short) against the schedule of adapting the old code to the new purpose (long). In reality this is a fool’s choice. The engineering team may know the difficulty and problems of adapting the old code, but has no idea what difficulties and problems it will face writing a new code base.

A CEO who had lived through a debacle of a rewrite or understood the complexity of the code would know that with the original engineering team no longer there, the odds of making the old mistakes over again are high. Add the new mistakes that weren’t there the first time, and Murphy’s Law says that unbridled optimism will likely turn the 1-year rewrite into a multi-year project.

My observation was that the CEO and VP of Engineering were confusing cause and effect. The customers aren’t asking for new code. They are asking for new features and platforms – now. Customers couldn’t care less whether it was delivered via spaghetti code, alien spacecraft or a completely new product. While the code rewrite is going on, competitors who aren’t enamored with architectural purity will be adding features, platforms, customers and market share. The difference between being able to add them now versus a year or more in the future might be the difference between growing revenue and going out of business.

Who Wants to Work on The Old Product
Perhaps the most dangerous side-effect of embarking on a code rewrite is that the decision condemns the old code before a viable alternative exists. Who is going to want to work on the old code with all its problems when the VP Engineering and CEO have declared the new code to be the future of the company? The old code is as good as dead the moment management introduces the word “rewrite.” As a consequence, the CEO has no fallback. If the VP Engineering’s schedule ends up taking four years instead of one year, there is no way to make incremental progress on the new features during that time.

What we have is a failure of imagination
I suggested that this looked like a failure of imagination in the VP of Engineering – made worse by a CEO who’s never lived through a code rewrite – and compounded by a board that also doesn’t get it and hasn’t challenged either of them for a creative solution.

My suggestion to my friend? Given how dynamic and competitive the market is, this move is a company-killer. The heuristic should be “don’t rewrite the code base in businesses where time to market is critical and customer needs shift rapidly.” Rewrites may make sense in markets where the competitive cycle time is long.

I suggested that he lie down on the tracks in front of this train at the board meeting. Force the CEO to articulate what features and platforms he needs by when, and what measures he has in place to manage schedule risk. Figure out whether a completely different engineering approach is possible. (Refactor only the modules for the features that are needed now? Rewrite the new platforms on a different code base? Start a separate skunk works team for the new platforms? etc.)

Lessons Learned

Not all code rewrites are the same. When the market is stable and changes are infrequent, you may have time to rewrite.

When markets/customers/competitors are shifting rapidly, you don’t get to declare a “time-out” because your code is ugly.

This is when you need to understand 1) what problem you are solving (hint: it’s not the code) and 2) how to creatively fix what’s needed.

87 Responses

I think it’s a common mistake to assume that quick and dirty coding is actually quick. If a team has solid quality practices in place, it’s usually faster to let them do their best work than it is to ask them to change their practices to be sloppy. If sloppy coding comes easily to a development team, and is actually faster than doing the job right, that for me is a bad smell that could signal an inexperienced team that lacks the discipline to consistently create scalable, maintainable, and extensible products.

Not making a product because you are not already a team of veteran engineers is a sure way to never have a product at all.

If you think about the problem: plugging a product into twitter, iphone, facebook etc., it’s no surprise that the architecture from 5 years ago didn’t make allowance for it – most of those platforms were barely twinkles in their authors’ eyes.

That’s fair, but I think many are in a position to recruit a team or outsource development. I would only caution such people against any team that agrees to lower quality in order to lower costs. Lowering quality doesn’t come easily to a disciplined team. The willingness to do so is a sign of potential systemic issues.

Your commitment to best practices is admirable, but there are (perhaps more important) factors to consider. If I may use an analogy:
I once read about a barista refusing to serve a customer espresso over ice because it would ruin the coffee (story: http://bit.ly/ew4ai4). That may be so, but that’s what the customer wanted.

If delivering in weeks instead of months at a client’s (or the market’s) request will force you to cut corners, do you turn the client away? If you are able to do so, kudos to you – I’d like to be there someday. But I’m not there yet; I can’t afford to be that picky. Furthermore, I don’t presume I know more about the client’s market than they do. Frequently there are reasons to do X that I am simply not aware of.

My approach: Aim for best practices, but when clients’ requests run contrary to same, explain the impact of cutting corners. Then the key part: let them decide. I can’t always make the client come to me, but I can do my best to help them make an informed decision.

I think what caused me to respond was your (in my opinion, unfair) condemnation of engineers ‘willing to lower quality.’ Sometimes quick and dirty is what the client wants. :)

Excellent post. Engineers understand code – they don’t understand balance sheets. The message needs to be shared in both formats so that people can be held accountable and responsible for their actions. If you fix the time and the money for the project you’ll have control over the scope. Then it just depends upon how disciplined you are as CEO.

What Alexandre said. This is a typical “us vs them” kind of statement to make, but it isn’t really useful.

Personal anecdote: I once worked with a team of engineers, as a consultant, to mend some communication issues they had with management. My initial brief was to perform a “technical audit”, but upon discovering that the team was unable to give a clear answer to “how does the cost of operating your system scale with number of users”, I taught them some rudiments of budgeting.

Matters improved quite a bit: the project was canceled, but at least everybody was OK with that. I’m not being cynical: this really was acknowledged by all as the best outcome, better by far than the unacknowledged money drain it had been before.

Lesson learned: engineers need to understand the economics of their project. To assume otherwise is to run the risk of overspecialization, tunnel vision and ultimately failure.

That’s a highly condescending statement. Of course engineers understand balance sheets, unless you mean to imply that engineers are also incapable of balancing a checkbook. A more correct statement is that the priorities of your engineering team are not necessarily synced with those of your company, which is a management issue. The impulse to rewrite a codebase is natural—everyone, engineering or management alike, suffers a smidgen from “Not Invented Here” syndrome.

The real question is for the CEO and CTO: “What structures and processes do you have in place to allow yourself to refactor a codebase easily? Do you have a comprehensive test suite? Do you have a management philosophy that prioritizes paying down technical debt over new features?” If the answer to that last question is “No” then they shouldn’t be considering a rewrite in the first place. If it’s “Yes” then they should take the time to do it right.

We shouldn’t assume it’s always the engineers advocating a rewrite. We just successfully defended ourselves from management’s ambition to start over with a clean slate and ‘get it right this time’.

A grass-roots movement of wide-eyed engineers was able to avoid this and we’re just about to wrap up a few months of heavy refactoring which got us to the same place with releasable iterations every 2-3 weeks.

Lesson learned? It’s easier for semi-technical management to avoid thinking about the details of their goals by instead focussing on why the current architecture may be non-ideal. Force them to stay out of the architecture and concentrate only on the details of the vision and whether engineering is getting results.

This post really hit home because I’ve been there. It’s an important lesson to technical CEOs, engineering VPs and software teams: write your code as separate modular pieces.

When a company does this and finds itself in this situation, they can take 10% or 20% of their engineering capacity and refactor the code base one piece at a time.

The message then becomes: “What we have is valuable…and we’re going to change our framework so that it’s more valuable and easier to adapt to the market in the future.”

It doesn’t matter halfway through that half of the code base is modern and half isn’t – it all works together anyway with defined inputs and outputs between pieces.
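As a rough illustration of “defined inputs and outputs between pieces,” here is a minimal Python sketch. Every name in it (InvoiceFormatter, LegacyFormatter, ModernFormatter, send_invoice) is made up for the example and does not come from any real codebase. The point is only that the caller depends on the agreed boundary, so an old piece and its refactored replacement can be swapped without the rest of the system noticing.

```python
from typing import Protocol


class InvoiceFormatter(Protocol):
    """The agreed boundary: callers depend only on this signature."""

    def format(self, customer: str, amount_cents: int) -> str: ...


class LegacyFormatter:
    """Stands in for an old, clunky piece that still works (hypothetical)."""

    def format(self, customer: str, amount_cents: int) -> str:
        dollars = amount_cents // 100
        cents = amount_cents % 100
        # Old-style zero padding done by hand.
        return customer + ": $" + str(dollars) + "." + ("0" + str(cents))[-2:]


class ModernFormatter:
    """Refactored replacement with the same inputs and outputs (hypothetical)."""

    def format(self, customer: str, amount_cents: int) -> str:
        dollars, cents = divmod(amount_cents, 100)
        return f"{customer}: ${dollars}.{cents:02d}"


def send_invoice(formatter: InvoiceFormatter, customer: str, amount_cents: int) -> str:
    # The caller neither knows nor cares which generation of code it was handed.
    return "INVOICE " + formatter.format(customer, amount_cents)
```

Because both classes honor the same contract, a team could replace one such piece per iteration and ship after each swap.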

Another point: building software this way also prevents the age old “we can’t both work on that massively big feature because only one person can check out the code at once.” (That’s a clear signal you’ve built a monstrosity instead of a modular piece of software.)

This is so important. Even if you have built the mega pot of spaghetti code, you can start to pull out specific functions and refactor them.

To me, this is like remodeling vs rebuilding a house. If your goal is to continue to live in the house (in the case of a web based business the parallel is “stay in business”) then you need to fix up certain parts of the house. Sometimes that is just looks, it might be new functionality, it may be an addition, it may be finishing the basement … but all of these things can be done while still living in the house. You can literally rewire your entire house.

What you can’t do is you can’t change the foundation of the house. If you try to do that, you can’t live in your house.

Taking that back to the post, as Steve said, a full rewrite can be killer.

Two companies ago, we did a full rewrite, took 5 months longer than planned, took 5 months to work out the bugs, but the company thrived afterwards and was able to grow because of the rewrite.

Last company, refactored all of the code from top to bottom. Took longer than we “hoped,” not as many bugs, business grew throughout, much less pain, no 3 day stretches without any sleep.

Now, rewriting an entire large-scale application? The only reason you’d need to do that is if it wasn’t written in self-contained, loosely-coupled modules. If fixing a bottleneck in your Data Access layer requires you to rewrite 80% of your application, you may not have designed it correctly.

What the post is correct about however is that while developers CAN balance a budget, they have a bias towards improving their day-to-day lives by working in a more sane code base. This sometimes means that it’s harder for them to make a business case for a rewrite.

Of course, if you follow the history and success of developer-driven organizations, the benefits of allowing developers to work on what they see fit and enjoy working on often outweigh the cost. These organizations tend to deploy fixes faster, build features faster, and innovate quite a bit more. The catch-22 of the foible outlined above is that NOT refactoring the code in a significant way may also keep the company underwater.

I’ve worked at an organization that made the decision year-over-year to not upgrade core features and infrastructure of the application. It’s very easy to let 4 years pass in this fashion, chasing after “low-hanging fruit” enhancements and fixes, and find at the end of the four years that a few bad things have happened:
– Your competitors are all moving faster than you
– Your infrastructure problems have worsened and will take a year to fix
– You are stuck with two bad choices: (1) fix the infrastructure and let your competition get further ahead for a year, (2) don’t fix the infrastructure and let your competition get further ahead forever

So what’s a CEO to do? I agree with you, Aaron, that you upgrade it in chunks. More than that, you evaluate the business case for individual upgrades to modules of the code in terms of better operating efficiency, system costs, and growth potential. Updating your query builder so that ALL your developers can access data 30% faster (thanks to a turnkey system that promises fewer bugs and built-in sanitization) is more important than rebuilding Feature X so the team that curates it spends less time fixing bugs.

Indeed, lucky you. A bit more explanation of the circumstances behind the rewrite, rather than triumphalism, would be more useful. Was it a large job, were you under time pressure, did you have competitors breathing down your neck?

I totally would have “liked” this if I could have. It’s so dangerous to throw everything in the “don’t do this” bucket. So…

What if…
The current system can’t scale without a tremendous risky rework?

The system crashes like crazy and is a complete black box to the developers? Making a change starts to take “research stories” that run for sprints. A snail’s pace would be faster.

The current system needs to be maintained, but is written in a language that is not attractive to developers anymore?

The business is changing direction and the current system can’t be reworked to support the new direction?

The existing system was a licensed proprietary system that’s no longer supported, and hadn’t been kept up to date?

And don’t forget that bloat kills. And old legacy systems can get beyond full.

This list goes on and on. Rebuild is the last resort. But sometimes, burning the existing system is what’s needed. So the trick is to pull together the right transition plan so that the work will align with business objectives, rather than having no growth for a long period of time.

And for you devs out there, anyone want to come work on a dinosaur Perl CMS? You can argue not to rebuild, but I bet you won’t take the job working on that legacy system unless you have to.

Excellent post, Steve. Thank you for sharing your thoughts on the subject. Would you consider writing new code mixing existing features and desired new features (Facebook, iPhone, etc.) if the market you are in has a high churn rate among customers? I mean, writing new software for new customers in the case where the churn rate is high (especially if 80% of your customers use only 20% of your features). Thank you

I think it is a bit extreme to try to advise people not to do a re-write without full knowledge of the codebase/company, and its current architecture and business requirements. Just because you are in a fast moving industry does NOT mean you should keep awful code and never do a rewrite.

Many companies have had their initial code written by an outsourced company, who have planted a poison seed, and it has been nurtured into a disaster. I’ve even worked with companies that started with a prototype and just continued to build on that code. In these cases a rewrite is not only recommended, it lets them rapidly overtake their competitors by adding features faster post-rewrite.

Another great lesson learned. As one example of this phenomenon in action, I would point to Friendster vs. MySpace in 2003-06. Jonathan Abrams wrote an answer on Quora in which he detailed how the decision to rewrite the code at Friendster was disastrous in one of the fastest-moving markets ever (online social networking). See http://qr.ae/HOy3 .

By contrast, at MySpace, which was originally written in ColdFusion for its late-2003 launch, when the site exploded in popularity and it became clear that CF wouldn’t scale, we were able to transition the whole operation to ASP.NET and maintain site performance under the load of 1.5 billion page views per day (as of 2006 when I left). The engineering team used a tool called BlueDragon that enabled CF to run on top of .NET, and proceeded to rewrite one piece at a time after that while incorporating new features, etc.

Our founding CTO, Aber Whitcomb, deserves credit for a clever solution that avoided the pitfalls of a rewrite. As the fastest-growing site in history, adding 300,000 new registered users per day, MySpace was able to succeed where Friendster failed largely because of this critical strategic decision. (For those interested, you can find Aber’s presentation at MIX06 online, where Aber joined Bill Gates onstage. See http://bll.la/EW.)

Being a software engineer myself, I think the key issue in the real world is that you never, ever should deliver a “final” solution, a fixed code stored in a silo, but a continuously evolving prototype. That’s why Gmail is always in beta: you can’t know in advance what tomorrow’s requests will be.

Indeed this can be a very tricky situation. The main fact that any non-technical person misses is that code and the people who wrote it go together. Part of the design/architecture always lives in their heads and is rarely documented. You will be much better off if you start building something new around the old that will eventually outgrow the old one, and never say the word “re-write”!

The flipside of death by rewrite is death by unmanageable codebase. A house of glass may prove too fragile to extend, there’s risk in piling it on.

The best solution will come from the VP of engineering (either rewrite & hire to support that effort while adding new features to legacy code and making a switch). Not long ago Twitter swapped out some core Rails code with Scala and it has helped them continue to expand. Rewrites and heavy refactoring can work.

We faced this exact issue on an Open Source project back in 2007. The Project Lead decided on a rewrite from scratch and took half the team that way… while the other half of the team (me included) decided aggressive refactoring was the way to go and – since they kept the name – we forked to our own project.

As a result, you had a version that didn’t exist except on paper, a version we were fixing & improving, and a zombie version that still isn’t quite dead yet but everyone knows it will stop moving soon.

On the plus side, for the rewrite, in November 2007, they estimated a nine month schedule. After some digging around a few months ago, they estimated they were 9 months from release.

I’m the opposite of Paul: I’m wary of clean code. It shows a team that is more concerned with the code than the business. I’ve been at two very successful startups now, both with messy code and both resisting the urge to re-write. Ask Friendster and Digg how their re-writes went.

advice is always broad, consulting is specific
a re-write can never be total,
a young engineer sees green-field with thrill while an elder one with fear,
a re-design (re-writing architecture) should yield at least one order of magnitude of improvements, so features will benefit anyhow ;)

When I was young, I wanted to rewrite the code. Now that I’m older, I want to build a new product that supplants the legacy product… much like iOS is built on OS X but was a sort of “rewrite” of it for a new platform. If the iPhone had been a failure, it wouldn’t have had negative impact on the Mac.

Also- Steve- PLEASE make your books available in an eBook format. Preferably ePubs that I can read in iBooks.

I have switched completely over to electronic books because they are so much more convenient. My iPod has 150+ books in it, and I only buy electronic books now. I’d really like to read yours and it seems it is self published so probably pretty close to being ready for an electronic format already. I’ll settle for Kindle format if I have to… but I stopped buying paper books completely.

I have to join Prof. Bernstein here in respectful dissent, and cite Windows Vista and OS X as a pair of cautionary tales on the other side. Microsoft nearly committed hara-kiri by not doing a rewrite, while Apple saved itself from oblivion by essentially doing one. One should be wary of “proof by horror story.”

Apple has done “rewrites” a number of times on its core software, but they have made some telling choices to address the very problems Steve has written about.

First, they are a massive company and had the cash to build a completely separate team to do the rewrite.

Second, the team was sequestered off completely from the team running the older version. Sure, they had some bright minds transfer over, but the same team wasn’t asked to work on the old and new.

Third, they kept the new system a huge secret. I’m sure they did it internally to keep people focused in their respective areas. But they did it externally too. So until the new was just about to be released, nobody knew about it, and the old product was “the best thing since sliced bread.”

(They do this for hardware too. I can guarantee you that on the day iPhone 4 was released, the working iPhone 5 in the lab made it look like a child’s toy. But nobody knows about the iPhone 5 until right before it’s available.)

A rewrite is possible under those conditions. In a startup or even a small-but-profitable company? Steve’s advice is deadly accurate.

I like your example, it is pure entrepreneurial capitalism and is what should happen! But it conflates two forces – organization and coding. Coders that leave with domain experience can often do things that the same coders (even with total autonomy) could not do in the existing organization.

Legacy capital purchases, stakeholders, legacy customers, legacy data and migrations, and even the development environment and tools create taxes on developers that are extraordinary.

I have been in software for around 30 years. I have been a solo coder, an architect, a product manager, and a CEO of a company with an extremely complex product.

I feel absolutely certain that you should NEVER do a complete rearchitect/rewrite unless your goal is to create another company.

I think this just shows that you’re not a technical person, and have never steeped yourself in the issues of engineering. As much as one commenter said flippantly, “engineers don’t understand balance sheets”, non-technical leadership just doesn’t understand engineering.

The agile process encourages rewriting *as you go*. If followed, you should never be in a situation where it’s an all-or-nothing situation.

However, I’m sure many people find themselves dealing with “sins of the past”. I know I inherited a codebase two years ago that was written by a contractor, and it was his first project. It was a mess. The first two months were rewrite — but our capacity to adapt to changing requirements after that was tremendously improved. New features were included in the rewrite, so it wasn’t a zero-change outcome when the task was up.

What strikes me about the original post, though, is that the CEO complains about new features — but not performance or scalability. Crufty codebases usually suffer from performance issues long before there’s a problem with adding new features. Are all these new features really going to change the momentum of customer acquisition, or are these “me too” additions, without a clear understanding of the ROI on the engineering? How did the company get to a point where the codebase is suffering from inability to change before its inability to support the customer base?

As a startup engineer, I’ve worked for a number of companies that toss a basic lack of product vision onto the engineering team; rather than identifying the key features (or more importantly, new and innovative features) that would make their product more competitive, they ask the engineering team to toss everything into the system just to see what sticks. It never really seems to end all that well, as you end up with crufty kludges in the codebase for half-baked ideas that have absolutely no value to the bottom line.

Should a company take 3-6 months just to do a full rewrite, with no added benefits? Probably not. But should they ignore the issue of technical debt so they can make matters worse by tossing everything and the kitchen sink into the system as fast as possible? That’s probably the worse option in the long run.

My experience is that I make the most progress in a situation like this when I first embrace the contradiction: yes I have to rewrite the code if I want to make progress fast enough and no I can’t rewrite the code because I can’t afford to stand still. My reading of your post emphasizes the latter point, but both are valid.

One synthesis that resolves the apparent contradiction is to acknowledge that code is always in a state of becoming. You will always see echoes of the past and echoes of the future. When you discover a better design, figure out how to keep the old and new designs running at the same time. Implement new stuff the new way, accepting that this will slow you down a bit. When you touch old stuff, convert it to the new way, accepting that this will slow you down a bit. When the old way is nearly extinct, be prepared to push a little to convert the last of it so you can stop paying the cost of keeping both ways working.
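One way to picture keeping the old and new designs running at the same time is a small dispatcher that prefers the new implementation of an operation when one exists and falls back to the old one otherwise. This is only a sketch under assumptions: MigratingDispatcher, the handler functions and the operation names are all hypothetical, and a real system would have far messier seams than a dictionary lookup.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], dict]


class MigratingDispatcher:
    """Routes each operation to the new design if present, else the old one."""

    def __init__(self, legacy: Dict[str, Handler]) -> None:
        self._legacy = dict(legacy)
        self._modern: Dict[str, Handler] = {}

    def migrate(self, name: str, handler: Handler) -> None:
        # Used both for brand-new features and for old operations
        # that get converted when someone touches them.
        self._modern[name] = handler

    def handle(self, name: str, request: dict) -> dict:
        if name in self._modern:
            return self._modern[name](request)
        return self._legacy[name](request)

    def remaining_legacy(self) -> List[str]:
        # What is left before the old way can be retired.
        return sorted(set(self._legacy) - set(self._modern))


# Hypothetical handlers standing in for real code paths.
def old_signup(request: dict) -> dict:
    return {"status": "ok", "via": "old"}


def old_export(request: dict) -> dict:
    return {"status": "ok", "via": "old"}


def new_signup(request: dict) -> dict:
    return {"status": "ok", "via": "new"}


def new_share(request: dict) -> dict:
    return {"status": "ok", "via": "new"}


dispatcher = MigratingDispatcher({"signup": old_signup, "export": old_export})
dispatcher.migrate("signup", new_signup)  # touched old stuff, converted it
dispatcher.migrate("share", new_share)    # new stuff, built the new way
```

Watching remaining_legacy() shrink gives a crude signal for when the old way is nearly extinct and worth a final push.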

Unfortunately, the tools and techniques for this kind of conversion are still immature, so there’s more handwork and risk than there needs to be (in some more perfect world). However, the contradiction still exists and needs to be resolved somehow.

While some of the counter arguments presented are valid, you have to be careful about whether you’re doing a re-write or a rip & replace. They’re different. With a re-write you end up at the same place you started: the feature set is roughly equivalent, but now you have the hoped for ability to iterate faster, add new capabilities previously not possible with the old code base, etc. With a rip and replace you’re placing yourself on a completely different development path, as Apple did with OS X. They didn’t re-write OS 9, they completely replaced it.

We faced the re-write decision at my last company. The Engineering team, who was having trouble keeping up with feature demands due to technical debt, gave us an 18 month schedule for the re-write. At the end of this time we were to have the exact same feature set but a cleaner architecture. The project was killed for the reasons stated in Steve’s post.

Peter, what a series of ridiculous things to say! Engineers can understand balance sheets just fine. However, if you hoard the decision making and dictate the time and money spent on a project then you have no control over the resulting quality and are the antithesis of what I’ve heard Steve advocate for. And if you do this time and time again you will eventually get what you deserve.

The OP describes (in my mind) a classic example of not paying down your technical debt. It must be pretty bad for an experienced engineer to essentially declare bankruptcy due to the amount of debt and recommend a rewrite. FWIW, I agree with Steve that a rewrite is a bad idea 99% of the time. This is more of a symptom of the dysfunction of the organization than it is a technical problem to be solved. Communication and creativity and vision are the solutions needed here.

For more on understanding the technical debt metaphor, check out Ward’s video:

I really enjoy your column. However, I’m worried that people will use this article to justify not rewriting bad systems that will kill their businesses if they are not rewritten.

A rewrite is sort of like a heart transplant. It would be better if we could just wave away the clogged arteries, but sometimes this is not possible.

The world just isn’t this black and white. Not all rewrites are bad, and not all cases of preserving the original code are responsible code stewardship.

Frequently the code rewrite is not an all or nothing proposition. It is possible to rewrite components of a system that are particularly gnarly, and that have a clean interface to the rest of the system. These components can be extracted and replaced without putting the company at risk.

The anti-rewrite position taken to an extreme might dismiss smaller lower risk rewrites that can improve the quality of the code and make future changes a lot easier and cheaper.

A rewrite can be beneficial because: (a) It can add unit-testing to the code which makes the code easier and safer to change. (b) It can rationalize the mishmash of design decisions taken in the code in the heat of the battle. Looking back it is easier to see patterns and abstractions that were not obvious when the code was being hacked. These abstractions can make the code more pliant and cheaper to change in the future.
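Point (a) can be sketched as a characterization test: pin down what the old code actually does, then hold the rewritten component to the same answers. The shipping functions and the sample cases below are invented purely for illustration; the test uses only Python’s standard unittest module.

```python
import unittest


def legacy_shipping_cents(weight_kg, express):
    # Hypothetical "heat of battle" original: correct, but hard to follow.
    cost = 500
    if weight_kg > 2:
        cost = cost + (weight_kg - 2) * 150
    if express:
        cost = cost * 2
    return int(cost)


def rewritten_shipping_cents(weight_kg: float, express: bool) -> int:
    # Hypothetical rewrite with the pricing rule stated in one place.
    base = 500 + max(weight_kg - 2, 0) * 150
    return int(base * (2 if express else 1))


# Inputs chosen to cover both branches and their combination.
CASES = [(1, False), (2, False), (3.5, False), (3.5, True), (10, True)]


class ShippingCharacterization(unittest.TestCase):
    def test_rewrite_matches_legacy(self):
        for weight, express in CASES:
            with self.subTest(weight=weight, express=express):
                self.assertEqual(
                    rewritten_shipping_cents(weight, express),
                    legacy_shipping_cents(weight, express),
                )
```

Saved to a file, this can be run with python -m unittest. Once the rewrite passes, the same cases stay behind as ordinary unit tests for future changes.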

Every code change is a rewrite at some level. It does not have to be the two extremes of either rewriting the whole system or just sticking in a hack to handle a special case.

I really disagree with this article. It does point to a tension that should and will exist between the “balance sheet” and “technology”, but its flaw is that it misunderstands how a well written code base can position a company to move faster than anyone else; to use the buzzword, it makes a company more “agile”. Sure, you can keep adding feature after feature onto some existing codebase that gets messier and messier, with the justification that it advances the business, but each new feature adds more risk and more time to the next feature. Soon a development cycle for a simple feature is a month when it should be a day. In short, re-architecting systems should not be approached with flippancy, but should not be avoided at all costs.

“he can’t get the new features and platforms he wants (Facebook, iPhone and Android, etc.) ”

Actually there is your new software requirement.

Optimistically, one year for the rewrite, then one year of product. At the end of two years your software will have to be rewritten again, to support a whole new slew of features and platforms.

IMHO, this is a lost cause. You will always be one year behind adding features and platforms. In order to be ready to add to production you will be in rewrite mode.

The last point: what if the time frame changes? The rewrite may take longer, or a new platform may need to be added sooner than two years.

I would advise they rewrite, even though they aren’t likely going to gain much in terms of features, they gain a solid infrastructure that has really cleared out the code that isn’t used.

They become more implicit in their technology, a feature that is often unspoken but matters a lot when it comes to usability and process.

It will make future changes and future addons much more efficient and easy to implement.

Look at how often iPhone apps are updated; if there isn’t new content then I am not really likely to care unless I use the app often.

Sure, it’s a ton of work, but it will bring the team closer in terms of everyone having an understanding of how the system is built.

Make the codebase solid, it will give the engineers a better confidence in the product.

If they so much as knock a few seconds off the time it takes to use the product, it’s a huge win in itself.

The end result is you have a very tightly built and powerful application that people want to use because it’s absorbed the years of product knowledge into it and the result is a much stronger and faster application.

I think this title is misleading. Maybe it should be “Company Building Suicide – Rewriting The Code”. As you have taught us, the goal of a startup is to do whatever it takes to find product/market fit and a scalable business model/sales process. Our goal is to cross the line where the “Business Model is Found”. I assume this is what you mean because you mention rewrites in terms of years, not weeks or months. I don’t know many startups who have so much code it will take a year to rewrite.

I’ve been at a number of startups who have had to rewrite their codebase to support a pivot while they were still searching for a business model. When you pivot, the old codebase, architected on different principles, may not support the new business model. Sometimes you have to rewrite so that you can support a new shift and new features. It’s like the cliche: 1 step back, two steps forward.

I’ve been at a company that rewrote its codebase while in the company building phase and it almost sank the company. No good engineer wanted to work on the old codebase. Existing customers hated the fact they were on the “old” product and it caused pain throughout the organization. I’d argue that if you find your business model, a rule of thumb would be no more rewrites. It is too expensive.

However, while you are iterating towards a business model, you do whatever it takes to get there, even if it means you have to throw it all out and do it again.

During my time as a business guy (I occasionally code now) I was always struck and confused by the tendency to rewrite.

It seemed to me that every time a new lead engineer came on board one of my teams, or a project was shifted to a new team, the first reaction was always, “bad code, poor choice of language, we should rewrite it.”

Now having taken up some coding I have some sympathy towards this viewpoint. It’s always easier to understand your own code and it’s very difficult to learn to think how your predecessors did.

Now with some basic perspective from both sides, it seems clear that the solution is never a complete rewrite, but steady progression to more modular code which can be refactored piecemeal only when absolutely necessary. Regardless of what the CTO says, a total rewrite requires a doubling of the budget and timeline.

As a CEO I have faced this problem and managed to avoid committing suicide. The problem is that when you are a startup, your resources can be limited and you need to start generating revenues to pay the bills. At this stage of the company’s development, code isn’t seen as being as important as getting cash through the checkout.

In my case we were heavily reliant on one brilliant coder. As the site grew it became difficult to add new developers because of the time needed to teach them the code. He was also rightly nervous of letting others add to the code and risk losing control. There was no documentation and he was the only one who knew how it all fitted together.

As a CEO I was faced with a strategic decision. Wait longer and longer for new features and products to be added, together with the risk of only having one developer who knows the code intimately. Or take a deep breath and make a strategic decision to forgo new features/products and growth whilst the site is recoded in a framework that others understand and can add to. In essence it’s about managing risk. I chose the latter option with the co-operation of the original developer. In the short term we have lost market share, competitors have added new features and products and our old website looks, well… old.
In the medium-long term we now have a solid platform in a framework that others can add to. New features and products can be added more quickly and we have a more sustainable business.

The reality is there are risks associated with recoding and not, the CEO’s job is to weigh those risks and make the right decision. If he makes the wrong one then perhaps he was in the wrong job.

We were in this situation. Very old code base, and a complicated billing system where, for every bug that was fixed, one or two others appeared! It would never stand up to the new set of requirements coming through.

7 months later. Billing system rewritten, much easier to maintain, very few bugs compared to old code. Started implementing new requirements very quickly. Overall, a success.

Ditching bad code is much easier when there’s less of it. Throwing away the first iteration takes discipline and must be executed cold and ruthlessly. It is not enough to admit the code needs rewriting. You must be prepared to delete code from day 1.

3 months is a good life cycle for a functional prototype. Budget for 1 month of overhaul for the first 3 months of development and 1 month of overhaul for every 1 month of development beyond that.
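
One way to read that budgeting rule as arithmetic, sketched in Python. The function name overhaul_months is hypothetical, and the reading itself is an assumption: any project of up to 3 development months gets the full first month of overhaul, and each development month after the third adds one more.

```python
def overhaul_months(dev_months: int) -> int:
    """Overhaul budget implied by the rule of thumb above (one interpretation).

    Assumption: 1 month of overhaul covers the first 3 months of development,
    then 1 additional month of overhaul per development month beyond that.
    """
    if dev_months <= 0:
        return 0
    first_block = 1                     # covers development months 1 through 3
    beyond = max(0, dev_months - 3)     # development months after the prototype phase
    return first_block + beyond
```

Under this reading, a prototype kept for 12 months of development carries 10 months of overhaul, which is the point of the comment: the longer you wait to throw the first iteration away, the closer cleanup cost gets to development cost.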

If you re-write and re-architect AT THE SAME TIME, you maximize the chances for a second systems design failure. Clean and replace module by module isofunctionally. Application Interface boundaries, whether human to computer, computer to human, or computer to computer are useful encapsulating project boundaries as well. You are either completely within a capsule or only changing the rules between capsules. If your overall architecture comes with a data flow diagram and transform centered design this is typically possible to do correctly in reasonably sized chunks.
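
A rough sketch of checking that a replacement module is "isofunctional" with the one it replaces, in Python. Everything here is hypothetical illustration (find_mismatches, legacy_normalize and new_normalize are made-up names), and it assumes the module can be exercised as a pure function over recorded sample inputs:

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def find_mismatches(
    old: Callable[[T], R],
    new: Callable[[T], R],
    inputs: Iterable[T],
) -> List[Tuple[T, R, R]]:
    """Return (input, old_output, new_output) for every input where they differ."""
    mismatches = []
    for value in inputs:
        expected = old(value)   # the legacy module defines "correct" here
        actual = new(value)
        if expected != actual:
            mismatches.append((value, expected, actual))
    return mismatches


def legacy_normalize(name: str) -> str:
    """Stand-in for the old module: same job, clumsier code."""
    result = ""
    for ch in name.strip():
        result += ch.lower()
    return result


def new_normalize(name: str) -> str:
    """Stand-in for the cleaned-up module: intended to behave identically."""
    return name.strip().lower()
```

An empty list from find_mismatches over a representative set of inputs is evidence (not proof) that the new module can be dropped in without changing behavior. Any architectural change then becomes a separate, later step.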

This is a great example of why it’s important to think about the future of your code when doing early business planning. When your business decides that it’s time to scale up operations then it’s time to deal with translating the exploratory code into production code that’s flexible and well-written. If a company is ever going to have a solid codebase to build on then it has to be solid from the beginning. Starting your company’s future development on an exploratory codebase is like building a house on top of sand.

I side with those who believe that a rewrite is necessary when the cost of running the old codebase is about to overrun the cost of creating a new trunk. It’s like surgery – you only do it if all conservative methods fail.

And of course the old product should not be deprecated until the new product is sound and firm on its own feet, has been accepted by customers and has gone through at least 2 upgrades.

I’m sure the engineering team is being honest when they say that the current state of the codebase is preventing features from being written. Lots of commenters have seen this scenario, as have I.

But a “drop everything and rewrite” project plan (if that’s what this is) deserves some scrutiny. I don’t know the details, obviously, but I’d hope that the engineering team considered some options for how to phase in the rewrite. And I hope they’ve explained how they’ll manage the risk of scope increases.

I’m a developer. When I argue to refactor/rewrite, it’s my job to explain concretely what I’m trying to achieve (e.g., with a user story or other concise explanation of “why this matters”). It helps me stay focused on the right problems and avoid overengineering.

Where possible, it’s also my job to find a low risk way to phase in a rewrite. If the rewrite doubles in scope, will I prevent the product from shipping? A common way to solve this problem is to set up a parallel stack (if such a thing is possible with your product).
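
A minimal sketch of the parallel-stack idea in Python, assuming requests can safely be handled twice (no side effects in the new handler). The names serve_with_shadow, legacy_handler and new_handler are hypothetical:

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
R = TypeVar("R")

logger = logging.getLogger("parallel_stack")


def serve_with_shadow(
    request: T,
    legacy_handler: Callable[[T], R],
    new_handler: Callable[[T], R],
) -> R:
    """Answer from the legacy stack; run the rewritten stack in the shadow.

    The new stack can be late, wrong, or crash without affecting what ships.
    """
    response = legacy_handler(request)
    try:
        shadow = new_handler(request)
        if shadow != response:
            logger.warning(
                "shadow mismatch for %r: legacy=%r new=%r", request, response, shadow
            )
    except Exception:
        # A failure in the rewrite is logged, never surfaced to the user.
        logger.exception("new stack failed for %r", request)
    return response
```

The customer always gets the legacy answer, so if the rewrite doubles in scope the product still ships. Traffic moves to the new stack only once the mismatch log has gone quiet.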

I am going to be the consultant here and say that the correct answer really depends on the company’s specific situation. There are no hard and fast rules to determine whether to continue with the legacy system or write a new one. Instead you must define the criteria and then use them to decide on the best solution. If the decision is not based on thought or facts, you are doomed to failure regardless of the approach.

What I would ask the author is a question about the features. Are they improvements to the existing product or are they really the impetus for defining a new product? If they are improvements and still meet the core mission of the current product then you can make a product/business case for just modifying the existing code. If rather they are defining a new product then there is a great argument to start fresh.

As all consultants say, it depends, but first determine your goals and criteria. This will lead to an intelligent decision that will be defensible in board meetings a year later.

You have a vibrant community here and you’d drive even more engagement with your readers. I can guarantee you’d see 2X the comments.

Just one reason why: folks will receive e-mail alerts when someone responds to *their* comment, and that will prompt further discussion.

As of right now, if you don’t subscribe to ALL comments by e-mail (a recipe for a full inbox with a blog attracting as many comments as yours does), you don’t know if someone responds to your comment or not.

It is true that a code re-write is very risky… and I had this exact discussion with my CTO a few months ago… We basically stuck with the compromised code (from his POV) and just gradually began to adapt and modify various code elements as we moved along. – The important thing is that we had an intelligent discussion about the real issues of market forces, efficiency of the software and the need to remain focussed on the user’s experience. To that end, I can really relate to this article. It’s true that engineers are prone to ‘engineer’s solutions’, and marketing execs are naturally focussed on the user experience, and sometimes lack a clear understanding of the ‘back-end’ issues. In good teams, there is good communication and this can help to minimise any self-made catastrophe.

Very good post Steve, it made me think about such decisions and provoked an interesting discussion.
I would add that software development is, by its nature, a changing activity. If the schedule to rewrite the code is too long the changes may become meaningless, especially considering the fast obsolescence of new features (e.g. support for MySpace, then Twitter, then Facebook or whatever comes in the near future).

What if the code rewrite makes the solution more flexible to change?
Difficult to evaluate based just on ROI figures.
Henrique

This post brings up so many bad memories of my last startup. We had such potential; we had the killer app; we were the first movers in our market.

But we had “shipped our prototype”. Every engineer was working at about half normal productivity, because our code base was so fragile. We begged management to let us rewrite, to make a better product and be more productive.

Management made the decision Steve suggests, to avoid the rewrite. The suits didn’t understand our problems; they only understood, correctly, that our recommendation carried risk. They avoided that risk.

As a result, releases shipped more and more slowly, with more and more bugs. Customers often bought one license — but then failed to buy more. Engineers, tired of doing poor quality work, began leaving for greener pastures.

Our startup was surpassed by competitors, with better technology, moving more quickly. We had pioneered, and proved the viability of the market, so others were quick to follow. Our product was finally acquired, I suppose for its customer list, by a larger company who had the good sense to kill it.

[…] adoption. Despite a lack of clear answer the odds were clearly stacked against rewrite. Even some industry veterans have also vouched against it. We decided to go for a hybrid approach of encapsulating the core […]

Often times, the problem is more with how companies allocate time and resources. In my experience, what many crufty code bases need is one or two (but no more!) of the most senior engineers to be given the time and thinking space to get deep into the code and make surgical fixes. They need to be given the time to set up tests (which are often missing), think about things without being distracted, and stay in the flow to balance all the mental balls they have to have in the air to understand a legacy code base.

The problem is, engineering managers struggle to sell that allocation of resources to their managers or board members. It smacks too much of letting the engineers just geek out without adding new features. But, if positioned as a “re-architecture” or a “re-platforming” then it has the smell of a real project with real deliverables. Sometimes that’s the only way to get funded.

So, if you want to avoid re-writes, you have to leave space in your schedules for ongoing tweaking and the occasional thorough spring cleaning of the code.

Rewriting can be a mistake, but if Microsoft hadn’t written Windows NT to replace DOS, they’d be a footnote in history. The same is true with regard to OS X and the original Mac OS. WordPerfect didn’t rewrite when Windows came out (they made their DOS app more “Windows-like”), and where are they now? After a too-late native Windows port, they lost the market.

Rewrites are necessary when you acknowledge the technical debt has reached such a level that you have to. It’s a technical bankruptcy. That there are many risks to rewriting a product doesn’t mean it shouldn’t be done, and none of your fast following competitors have that disadvantage.

If you take the stance that you will carry an expensive, unadaptable code base, it’s that fast follower (See “Why Pioneers Have Arrows In Their Backs” on this site) who builds a better code base to compete with you who wins.

The key here is that someone wants the market you have, and they don’t care whether you rewrite or not. If you don’t make the right choice, you’re dead, and the right choice may in fact be the rewrite.

shouldn’t the decision be based on the problem at hand rather than an aversion to a particular means to an end?

i recently took a job at a non-startup working on a financial product that had been on-again-off-again for nearly 10 years, then finally launched just a few months prior.

it was garbage.

had a total user base of maybe 200, with a target market of around 1 million – even 10% penetration would have been acceptable there.

the code was so bad. so very bad. there were precisely 0 tests. it had no cohesive software design, it was littered with every antipattern you could find, and as a financial product, IT COULD NOT ADD UP.

standing back, and looking at where they wanted to take it, and weighing up what they had, any choice other than to babysit that deployment into the ground and rewrite it made no sense ( for a working system with a good team i had a finger-in-the-wind estimate of 6 months, then adding features regularly on a continuous delivery/deployment basis ).

the decision?

‘evolve it’*

6 months later, they are still slowly putting together a full suite of functional tests with very expensive consultants, with incremental changes to front end code to mask over the obviously broken parts.

in 6 months time that process might be done, might not.

6 months after that, they might have been able to refactor enough to start bringing in the new features.

but i doubt it.

what i’m trying to say is that there’s a time and a place for a rewrite, and when the starting position is a technical liability, both costly to maintain and preventing a platform from moving forward, rewrites must be considered as a sane approach.

*once the evolve it decision had been made ( or rather, once i realised they never intended anything else, despite being hired for technical leadership ), i left. ( 2 weeks in )

[…] I am not saying that every system written in an elderly language needs to be replaced now. Doing so has its own problems, and they are real. I’m just pointing out the consequences of neglecting this aspect for too […]

I am a technical VP and have lived through several rewrites and you have hit this exactly on the head. There is no reason, none at all, why a talented, senior group of engineers couldn’t incrementally evolve any piece of software, no matter how crufty, to the right state of being. It just requires a matter of will, time, and tolerance for risk.

Here are the real reasons why an engineering team might ask for a rewrite:
1) lack of talent/imagination/motivation to slog through the old code: this is less common than you think. Much more often it’s….
2) easier to get budget and time for a big-bang, board-visible, cure-cancer-and-save-puppies rewrite than to get consistent, incremental time to do a little refactoring with each new release. These things are always the first to go at crunch time, plus…
3) making the necessary changes to old code is usually risky because it has high coupling and low test coverage. So, as an engineer you can choose a risky but necessary refactoring or a “safe” but problem-compounding quick fix. If the quick fix goes badly, you can blame the crufty code. If the refactoring goes badly, the spotlight is shining right on you, buddy. If the rewrite goes badly, well, responsibility is pretty well diluted by the time real problems start to surface.

The true solution to the problem is to:
1) believe anything can be fixed
2) leave some refactoring time in the schedule for every release, and
3) increase the organization’s risk tolerance around old, “battle tested” code.