Sunday, January 2, 2011

It Looked Good When We Started

"Well, in our country," said Alice, still panting a little, "you'd generally get to somewhere else — if you run very fast for a long time, as we've been doing."

"A slow sort of country!" said the Queen. "Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!"

— Lewis Carroll, Through the Looking-Glass, and What Alice Found There

The Red Queen's race is often quoted in discussions about the increasing rate of change in our lives, which demands running just to stay in the same place. This is particularly true in software development, where change is the only constant factor. Any successful software system will need to be changed, and, if it can't, will fall by the wayside. Change can be due to many factors, such as new legislation, new standards, new business opportunities, the need to support a larger volume of business, or the need to move to a new platform (for example, when an operating system is no longer supported by the vendor). The introduction of a software system into an environment that didn't have one before changes that environment; that change ripples to create new requirements. Perhaps initially the system was only meant to replace manual work, as when using computers to track customer orders instead of doing it on paper. But once the system is in place, users realize that it can do much more; for example, letting customers enter their own orders through the internet. Adding that functionality may trigger the idea that customers may now be able to track the order fulfillment process through the internet. Every change is the catalyst for another one.

There is a common misconception that software is easy to change. It is certainly true that distributing software updates is much easier (and cheaper) than changing hardware. However, making changes in software correctly while preserving previous functionality and without introducing new bugs is very difficult, as all experienced software developers know. (In spite of that, they are often content to get their software more-or-less working, and rely on the users to discover problems and on future releases to fix them; see Programmers as Children.)

Obviously, it makes sense to prepare for future changes, so as to make the inevitable changes as easy to make as possible. Some requirement changes, however, are beyond prediction, and many predictions that seem obvious at the time turn out to be wrong. These unfortunate facts have far-reaching implications for all aspects of software development. They mean that it is impossible to capture the precise system requirements, since these will change unpredictably. They also mean that it is impossible to prepare for all future changes, since every design decision renders some future modifications easier while making others more difficult. But they do not mean that we should just build for known requirements and ignore the possibility (or rather, inevitability) of future changes.

The Agile Development community takes an extreme view of our inability to predict future changes, and uses the rule "do the simplest thing that could possibly work" (abbreviated DTSTTCPW). This rule is based on the assumption that keeping the code as simple as possible will make future changes easier; it also assumes the meticulous use of refactoring to keep the code as clear as possible while making changes. (I have a lot to say about refactoring, but that will have to wait.) However, there are some decisions that are very difficult to back out of, and I believe that most agilists will agree that planning ahead on these decisions does not violate DTSTTCPW.

One of the earliest decisions that must be made, and one that has a profound influence on all aspects of the development process, is about tooling, including the programming languages to be used. Changing the language after a considerable amount of code has been written is very difficult. Changing languages in a system that has been in operation for over twenty years and contains several million lines of code is almost impossible; this almost amounts to a complete rewrite of the system. Much of the knowledge about many such systems has been lost, and the original developers are probably doing other things, possibly in other companies, and may even be retired. Yet there are many large-scale critical systems that are over twenty and even thirty years old.

This issue received worldwide attention in the 1990s, with the famous Year 2000 (or Y2K) Problem. Programmers in the 1960s were worried about the size of their data on disk, since storage sizes were much, much smaller in those days. It seems incredible that today you can get a terabyte disk for well under $100; in those days you would be paying by the kilobyte. So the decision to store just two digits for the year in every date was very natural at the time. In fact, it can be argued that it was the right decision for the time. It was inconceivable that these systems would last until the year 2000; it was obvious that they would have to be replaced well before then. Programmers kept to this design decision during the 1970s, 1980s, and, incredibly, even sometimes in the 1990s, although by then storage sizes had grown, prices had come way down, and 2000 was no longer the far future.

To the amazement of the 1960s programmers, some of the systems they wrote then are still in operation today. Naturally, these survivors contain some of the most critical infrastructure used by banks, insurance companies, and government bodies. All of these had to be revamped before 2000 (actually, some even before 1998; for example, credit cards issued in 1998 expired in 2000). This cost an incredible amount of money, and quite a few companies made their fortunes from Y2K remediation. The fact that this effort was largely successful, and none of the doomsday predictions made before 2000 came to pass, should not detract from the importance of the problem and the necessity of fixing it.
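To make the credit card example concrete, here is a minimal sketch in Python of how a comparison on two-digit years goes wrong, together with one commonly used repair ("windowing"). The function names and the pivot value of 50 are my own assumptions for illustration; real legacy systems were mostly written in other languages and chose their pivots case by case.

```python
def is_expired_two_digit(expiry_yy: int, current_yy: int) -> bool:
    """Naive check on two-digit years, as a legacy record might store them."""
    return expiry_yy < current_yy


def expand_year(yy: int, pivot: int = 50) -> int:
    """Windowing: read 00..pivot-1 as 20xx and pivot..99 as 19xx.

    The pivot of 50 is an arbitrary assumption for this sketch.
    """
    return 2000 + yy if yy < pivot else 1900 + yy


def is_expired_windowed(expiry_yy: int, current_yy: int) -> bool:
    """Same check, performed on the expanded four-digit years."""
    return expand_year(expiry_yy) < expand_year(current_yy)


# A card issued in 1998 that expires in 2000 is stored as expiry "00".
print(is_expired_two_digit(0, 98))  # the naive check rejects a valid card
print(is_expired_windowed(0, 98))   # the windowed check accepts it
```

Note that windowing only postpones the problem: with a pivot of 50, the same code misreads dates again once real years reach 2050. That is exactly the kind of decision that looks good when we start.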

Now switch points of view, and think twenty years into the future. The code we are writing today is the legacy of the future; what kind of problems are we preparing for ourselves or the people who will be saddled with these systems after us? I claim that we are being as inconsiderate as the programmers of the 1960s, and with less justification. After all, we have their lesson to learn from, whereas they were pioneers and had no past experience to draw on.

These days, we are building systems using a hodgepodge of languages and technologies, each of which is rapidly changing. A typical large-scale project will mix code in Java for the business logic, with JavaScript for the browser front end, SQL for the back-end database, and one or more scripting languages for small tasks. A legacy COBOL system may well be involved as well. The Java code will be built on top of a number of different frameworks for security, persistence, presentation, web services, and so on. Each of these will change at its own pace, necessitating constant updates to keep up with moving technologies. How will this be manageable in twenty or thirty years?

Think of a brand new application written by a small startup company. They are very concerned about time to market, but not really thinking twenty years into the future (after all, they want to make their exit in five years at the most). And their application is quite small, really, just perfect for a scripting language such as Perl, Python, or Ruby (with Rails).

If they fail, like most startups, no harm has been done. But suppose their idea catches on, and they really make it. They keep adding more and more features, and get more and more users. Suddenly they have many millions of users, and their system doesn't scale any more. Users can't connect to their site, transactions get dropped, and revenue is falling. Something must be done immediately! The dynamic scripting language they used was very convenient for getting something running quickly, because it didn't require the specification of static types, classes could be created on the fly, and so on. But now that the application consists of 20 million lines of code, it is very difficult to understand without the type information that was never written down.
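As a small illustration of what that missing information looks like, here is a sketch in Python of the same computation written twice: once in the quick untyped style, and once with the assumptions recorded in the signature. All names here (total_untyped, total_typed, LineItem) are hypothetical and invented for this example. Python's annotations are not enforced at run time, so this assumes an external type checker is used to get any real benefit from them.

```python
from dataclasses import dataclass


# Untyped style: quick to write, but the signature does not say what
# "order" contains or whether "discount" is a fraction or a percentage.
def total_untyped(order, discount):
    return sum(price * qty for price, qty in order) * (1 - discount)


@dataclass(frozen=True)
class LineItem:
    unit_price: float
    quantity: int


# Typed style: the same arithmetic, but a reader (or a type checker)
# can see what goes in and what comes out without tracing every caller.
def total_typed(items: list[LineItem], discount_fraction: float) -> float:
    subtotal = sum(item.unit_price * item.quantity for item in items)
    return subtotal * (1 - discount_fraction)


items = [LineItem(10.0, 2), LineItem(5.0, 1)]
print(total_typed(items, 0.5))
```

In a program of a few hundred lines the difference hardly matters. In a program of millions of lines, every untyped signature is a question that someone has to answer by reading the callers.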

This scenario is not fictional; many successful internet companies went through such a phase. In fact, even infrastructure tools went through such changes. For example, consider PHP, one of the most successful web scripting languages today. PHP 4 was kept backward compatible with PHP 3, even though PHP 3 had a number of serious problems, because it was deemed inappropriate to disrupt the work of tens of thousands of developers. PHP 4 was wildly successful, with the result that when incompatible changes had to be made in PHP 5, hundreds of thousands of developers were affected, and adoption of PHP 5 was delayed.

I think that many times developers use the wrong tool for the job (see The Right Tool for the Job) because it looks like the right tool when development starts. Changing circumstances then make it the wrong tool, but switching tools can be extremely costly. With hindsight, it was obviously a mistake to base the whole application on a scripting language in the scenario described above, even though it looked like a good idea at the time. Can we use foresight to avoid such mistakes in the future?