Sven Johann talks with Dave Thomas about innovating legacy systems. Dave first clarifies why legacy systems are both valuable and problematic. Next, they discuss bad systemic and good incremental approaches to innovating legacy systems; why you shouldn’t rewrite an old system but rather focus on tactical changes that reduce cost or increase productivity within one quarter; examples of how to measure the success of the innovation; how to get full support from management to use any technology you want; when cleaning up the codebase makes sense and when it doesn’t; good ideas for innovating the codebase; approaches to introducing tests; and how to deal with data.

Transcript brought to you by innoQ
Sven Johann (SJ): What’s a legacy system?
Dave Thomas (DT): A legacy system is something that’s important to your business because it’s generating revenue and keeping your customers happy. It’s called a “legacy” for various reasons by young people; they turn up their noses because it’s not in their favorite programming language or methodology, perhaps because it’s older technology. But it’s also a legacy because it’s established. And in many cases, it’s the core supporting system for the business. It is the legacy of past innovations.
SJ: Why do legacy systems have a bad reputation in software development?
DT: We teach people about greenfield projects. Most new software developers are keen to use the latest technologies and methodologies. We aren’t always great at documenting our systems or making them testable. Often, systems that have been around for a long time are very difficult to change. One of the problems, of course, is that any good software probably needs to be killed and rewritten after its third release.
When you find things that haven’t been refreshed and redesigned, and they’re approaching their seventh or eighth release, they’re very difficult to change. Often, they use technologies that people aren’t familiar with. They’re certainly less malleable and less agile than people would like them to be.
SJ: You mentioned education. Is it necessary to change the curriculum at universities?
DT: We teach a lot about programming in the small, which is natural because universities are limited. Having a background in systems engineering or software at the systems level is important because large systems are very different from single applications or websites. It’s important to give people an awareness of the complexity of reality and of the fact that we have to deal with new technologies. That’s one of the problems. You need to be able to deal with past technologies because any established business has core systems that were built in previous technologies.
I think it’s really more a healthy respect for and an understanding of the strengths and weaknesses of different generations of technologies and methods. It’s also important to understand the difference between what a single programmer can do and what large teams of programmers can do. Even the best practices of refactoring are really a joke in the context of a large legacy application. Refactoring tools really don’t help you with large legacies.
SJ: How can we deal with legacy systems? I’ve been in the “big rewrite” multiple times, which mostly failed.
DT: Big rewrites fail. Outsourcing tends to fail. Most of the classic approaches fail. That’s because systemic change is very difficult. Outsourcing tends to fail because shipping the system to someone else doesn’t really change the problem; it might temporarily improve the economics because the programmers are cheaper. But the cost of the programmers increases when you send the work offshore, because better people get paid better, and you don’t have the domain knowledge.
The difference between actually solving a problem that matters to the business by tactically approaching part of the value chain and cracking it is much more important than trying to rewrite an entire application.
SJ: We all think we can just rewrite the whole thing, only better. What are the problems with rewriting?
DT: The rewrite turns out to be a lot more complicated than expected. People typically don’t really understand the system before they start changing it. In many systems, you don’t have a specification, nor do you have tests. Unless you’re prepared to develop a substantive body of tests and have the appropriate documentation, it’s very difficult to accurately rewrite.
Refactoring is supposed to be equivalence preserving. But a rewrite is never equivalence preserving because there is always immense pressure during the rewrite to add functionality. In many cases, the rewrite gets you from a system that was in language Y on machine Z to a new system in language C on machine Q that does the same thing. No one wants the same system rewritten in another language. They want a better system.
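The “equivalence preserving” property Dave mentions can be made concrete with a small sketch (an illustrative example, not one from the episode): an extract-function refactoring changes the structure of the code while leaving its observable behavior identical, which is exactly what a big rewrite fails to guarantee.

```python
# An equivalence-preserving refactoring: "extract function".
# The hypothetical tax rate and function names are illustrative.

def total_price_before(items):
    # Original: tax logic inlined in the loop.
    total = 0.0
    for price in items:
        total += price + price * 0.19
    return total

def with_tax(price, rate=0.19):
    # Extracted helper: same computation, now named and reusable.
    return price + price * rate

def total_price_after(items):
    # Refactored: clearer structure, identical behavior.
    return sum(with_tax(p) for p in items)

items = [10.0, 20.0, 5.5]
assert total_price_before(items) == total_price_after(items)
```

A rewrite, by contrast, replaces both the structure and the behavior at once, so there is no such check to hold it accountable.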
SJ: You could argue, “It’s not the same thing because we reduced technical debt, or we need to have a modern system to retain our good developers.” Is that valid?
DT: In my view, neither of those is a valid argument, although certainly they’re used. The real issue is that you need a measure to demonstrate that this is true. Only then can you construct a business case. Say you’re doing the new code in C++. You might be able to get those developers, but are they really good developers? Are they as good as the ones you had before?
I don’t think rewriting an entire system is ever justified. I can see building critical pieces of a new system using new technology, gradually replacing the old system. Those things make sense because they’re driven by some clear business value and timeline. A rewrite can take at minimum a year, sometimes two or three years. I don’t think any business is interested in waiting for that amount of time. Businesses still like things to happen in a quarter, and you’re not going to rewrite much of a major system in a quarter.
SJ: During that time you’re also chasing the existing system, because most organizations need to also maintain and enhance the old system.
DT: This often creates a culture of the “tiger team”: the people who get bragging rights because they’re programming in the new language with a new technology. Inevitably, they say, “We’re gonna do this and this and this.” So, the systems and the expectations grow. Managing that is very difficult as well.
The big rewrite is a loser’s proposition. It’s as stupid as adding more people to a late project. It’s much easier to attack those parts of the system where you can deliver a business value or reduce cost, and then apply the innovation from those focus points to gradually change the way your business operates.
SJ: In a previous SE Radio show, we had the example of Twitter. They started writing their application as a Ruby on Rails app. At some point, it didn’t scale anymore. That was the reason for a partial rewrite.
DT: Sometimes people don’t want to program in a given language, so they decide that one thing is bad and another thing is good. The problem with Ruby is that the performance and maintainability of the code are more challenging because you can write it a lot faster. Anything you can code [quickly] is both good news and bad news because you have the advantage of getting functioning code quicker, [but you have to maintain more code]. It’s a tradeoff. When people use a company like Twitter as an example, they’re in the subset of companies that have a lot of money. They can hire the people they want to hire. That presents a different opportunity.
SJ: We discussed some of the bad ways of improving a legacy system. But what are the good ways? You said we have to deliver value. Does that mean we have to understand the business end to end before we can improve anything?
DT: There’s a software value chain that is impacted negatively in one way or other. Typically there are critical points in the value chain where making a difference would have a big impact. The approach we favor is that you find the part of the value chain where a bottleneck is or where accelerating or improving it in some way can make a difference. That’s where you can employ innovation. If you find a way to change the value chain at that point—in a period of three or four months—then you’ll probably get the support of management and you’ll be able to deliver.
The approach we use is to [create] a very quick prototype, in a few weeks, to demonstrate that the innovation will actually solve the problem. Then we validate that. We work at scale, because often you can have something that demonstrates an innovation but won’t scale. Then we implement it. It’s a pretty straightforward approach.
SJ: How do you figure out the value chain?
DT: If you’re working on something important, ask the senior executives, “What’s the most important thing you need changed with regard to your systems?” Usually they know. You can also talk to some of the key developers. Most of the time, you don’t have to do a lot of interviews. You should [take] some measurements to validate their assumptions of where the time is spent or where things are slow. People sometimes have intuitions about legacy systems, but it often turns out that the problem is not where they think it is.
You should be able to find out what the major problem is in two or three weeks. The major issues usually jump out at you, and that makes the value clear. The change has to be worth it. In a major organization, you’re looking for at least a $10 million problem. I don’t like working on things that don’t save 20 percent of the total cycle time; it’s just not worth it.
Then you have to come up with the right innovation; that’s the creative part. At this point, we understand the problem. We understand what the value of improving it would be. If we could improve it by this amount, quickly and fairly inexpensively, then it’s worth doing.
The other solution is, “We just rewrite it all (or a portion of it).” And that may be the answer, but it seldom is. That’s when you can say, “We see this problem differently.” You propose a different way of doing things at this specific point in the value chain that will reduce the cycle time, increase the volume of transactions, or increase the reliability—whatever it is that you’re trying to improve.
That’s the fun part because that’s when you get to innovate. Most legacy systems provide a lot of opportunities for innovation. Typically, and unfortunately, software developers approach [legacy systems] with their current hot technology. Their solution isn’t really innovative.
SJ: Goals such as making the code base nicer or easier to maintain are hard to measure. You need measurable goals, right?
DT: Improving the code base so that it’s easier to maintain is something I would not give anyone a dollar for. It’s so hard to measure and so hard to do. There’s always going to be code that people don’t like. That’s just the way it is, especially when you have a lot of developers. Instead, you’re looking for something that’s tangible to the business. It’s much easier to talk about something that will significantly increase revenue or significantly reduce operating costs. Those are things that you measure. To bring in all-new hammers and saws and new smart people is an innovation of sorts. But it’s not innovation that’s really unique to that problem. Ask the question, “How could we really change this and get it done very quickly?” You must come up with a clever way of doing it differently.

Interesting discussion. One question which came to mind during the discussion: you (Sven) mentioned that you would delete the domain models. However, what would you replace them with? I find it quite valuable to model entities as value objects instead of using just primitives (i.e., have a Temperature class instead of just a float representing the temperature).
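The value-object idea the commenter raises could look something like this minimal sketch (the `Temperature` class and its validation rule are hypothetical, not from the episode): wrapping the primitive gives you a place for invariants and conversions that a bare float cannot carry.

```python
# A hypothetical Temperature value object instead of a bare float.
from dataclasses import dataclass

@dataclass(frozen=True)  # immutable, as value objects should be
class Temperature:
    celsius: float

    def __post_init__(self):
        # An invariant a plain float can't enforce.
        if self.celsius < -273.15:
            raise ValueError("temperature below absolute zero")

    @property
    def fahrenheit(self) -> float:
        return self.celsius * 9 / 5 + 32

t = Temperature(21.0)
assert abs(t.fahrenheit - 69.8) < 1e-9
```

Because the dataclass is frozen, two temperatures with the same reading compare equal and neither can be mutated in place, which keeps the object’s value semantics honest.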

I’d like to know too! Although I thought it was Dave who said it and I was surprised Sven just let it go by. I got the impression they were both really talking about ditching ORM in favour of FP-style data transformations (which is still a domain model IMO), but this point could do with a lot of expansion
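The “FP-style data transformations instead of ORM” reading the commenter proposes might be sketched as follows (the order records and functions are made up for illustration): rows stay plain dictionaries, and the domain logic is a pipeline of pure functions that return new data rather than mutating ORM entities.

```python
# Hypothetical sketch: plain data plus pure transformation functions,
# in place of mutable ORM entities.

orders = [
    {"id": 1, "total": 120.0, "status": "paid"},
    {"id": 2, "total": 80.0, "status": "open"},
    {"id": 3, "total": 200.0, "status": "paid"},
]

def only_paid(rows):
    # Pure filter: the input list is untouched.
    return [r for r in rows if r["status"] == "paid"]

def apply_discount(rows, rate):
    # Returns new dicts; nothing is mutated in place.
    return [{**r, "total": r["total"] * (1 - rate)} for r in rows]

result = apply_discount(only_paid(orders), 0.10)
assert [round(r["total"], 2) for r in result] == [108.0, 180.0]
assert orders[0]["total"] == 120.0  # original data unchanged
```

As the commenter notes, this is still a domain model; what changes is that behavior lives in composable functions rather than in stateful objects tied to a persistence layer.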

My take on getting rid of ORM is that it makes sense in use cases like mobile sites (jQuery Mobile, AngularJS, PhoneGap, Sencha, HTML5 with a NoSQL back end like Node.js) or mobile apps, where one can use a two-tier model (client/server) instead of three tiers. The client makes direct calls to the backend in JSON format, which was not possible until recent advances in the technology. I can see more use cases adopting this two-tier vs. three-tier model in the future…

Hi, I really like SE-Radio but must say that the audio quality on this episode was not as good as other episodes. I found many times it was difficult to hear what Sven was saying due to inconsistent volume level.

This is the typical dilemma experienced by many legacy shops with 20+-year-old systems, no documentation, and builders who are long gone. As the speaker explained, replacing them is very hard to cost-justify unless one can come up with quantifiable benefits and convert them into dollars. That said, the suggestion of incremental innovation is easier said than done given the mindset of the folks maintaining these antique, stone-age applications. The fact that the new generation doesn’t want to touch them only makes this harder, forcing shops to call back retired staff to maintain systems built by their peers. In my opinion, as time goes on, there will be an extreme shortage of folks who can maintain these applications, making replacement inevitable. Till then, most shops will continue to be on life support and will survive only if they are not crushed by the competition (like Kodak, Barnes & Noble…).

[…] This is where I first learnt about REST, way back in 2008. More recently, the episodes on Redis, innovating with legacy systems, and marketing myself (which is why I’m making an effort to blog regularly!) really got me […]