Another, more different, type of technical debt

It dawned on me this morning that I have lately been hitting a lot of technical debt, but of a different sort than I’m used to encountering.

Technical debt is a metaphor Ward Cunningham came up with that compares technical complexity to financial debt. The term comes up a lot in software development, and generally the ‘debt’ is incurred when you do something ‘quick and dirty’ to hit a deadline; you then pay ‘interest’ on it later when you run into problems: bugs, less robust code, code that is harder to work with, and so on. If you pay off the debt soon after the deadline by going back and writing ‘good code’, then you’ve only made a short-term trade-off. Most of the time, though, the debt is ignored and development time turns to the next deadline, and over time this can build to the point where much of a team’s work is servicing the debt. Engineers also probably invoke the term in more cases than it really applies to: if today’s new feature turns out not to meet a real need, increase demand, or bring in more business, then time spent doing it ‘the right way’ so it can be built upon and extended later is wasted. (Agile development hopes to account for this.)

Ordinarily when I’ve encountered technical debt, it has been much like what the Wikipedia article discusses: ‘friction’ in the system that slows down new development and maintenance.

As I’m currently involved with a product that is being radically de-staffed (not off-shored, but cut across the board), I’ve been encountering another sort of technical debt: the fewer people available, and the less complete their understanding of this complex physical-plus-software system, the harder it is to maintain that system, let alone continue any development. The technical debt that has built up comes from work and processes spread across a large number of machines, all too often old physical machines of dubious need; a lack of documentation about what state things are supposed to be in; and a drift of development over time that never moved important pieces onto whatever the modern system is. That leaves us with hundreds of machines which occasionally require direct user interaction when a critical bug or security breach is discovered. Development workflows that take just a tiny bit of hands-on interaction, even little things like adding a build number to the bug tracking database, quickly drag everything to a halt when you have only 20% of a team left.

These are bits of technical debt that affected the development team under ‘growth development’ as well, but less severely, since there were plenty of people (too many?) to spread the pain around. But they were known. And I wouldn’t suggest that the team should have spent time worrying about fixing problems that would purely impact ‘post-growth upkeep’ before it was clear that the product was being shelved. Still, a great many ‘best practices’ weren’t followed over the years, and they became glaringly evident when development shifted. I know I certainly had opportunities to recognize that random machines stacked in a server room were something that should be addressed, but it seemed like such an improvement over the original problem of random, critical machines sitting under various desks.

Going forward, I think my main takeaway from this situation is a broader understanding of the different ways technical debt can balloon, and of the risks each of them presents.