Tech Debt IRL

A recent conversation with some of my photo group turned into a Q&A regarding modern software development approaches. (If you’re wondering why we weren’t taking photos, this was over the post-walk lunch.) One interesting topic started as “What’s technical debt?” Rather than use the traditional banking metaphor (which is, after all, how the term originated) I explained it using Toronto’s public transit system (the TTC). The similarities to software development are clear: there is a constant need for maintenance and emergency fixes, but also demand for new features (in the TTC’s case, this includes new tracks, stations, and rolling stock). There are never enough resources (time, money) to do everything, so the challenge is to find a balance. Unfortunately there’s often pressure to focus on new features at the expense of maintaining the older ones. (Does this sound familiar?)

As most commuters in Toronto know, the TTC has fallen so far behind on maintenance tasks that it’s now having to shut down major sections of the subway for days or even weeks at a time. People are understandably irate at the disruption this is causing, but the alternative is to ignore the backlog of problems and let them build up. Of course the best solution would have been to stay on top of the maintenance as it arose, but that would have entailed delaying some new features which would have been unpopular and probably politically unacceptable.

(*I realise I’m conflating the original meaning of tech debt, i.e. the consequences of cutting corners in order to deliver sooner, with maintenance, but that is the common understanding and usage that I see these days.)

One complication which might be better explained by the financial analogy is the concept of compound interest: the cost of delaying maintenance doesn’t usually stay constant, or even grow linearly. When that maintenance task is finally undertaken, there’s not only the original issue to be addressed; there are also issues caused by the delays and inattention. For example, the parts may be discontinued or the knowledge of that old system may no longer be available. (The challenge of finding COBOL programmers to tackle Y2K issues springs to mind.)
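To put rough numbers on the compounding idea, here is a minimal Python sketch. The base cost of 100, the 10% yearly rate, and the function names are all made up for illustration; real maintenance costs won’t follow a tidy formula, so treat this as a picture of the shape of the curve rather than a model of any actual system.

```python
def linear_cost(base_cost: float, rate: float, years: int) -> float:
    """Cost if each year of delay adds a fixed fraction of the original cost."""
    return base_cost * (1 + rate * years)


def compound_cost(base_cost: float, rate: float, years: int) -> float:
    """Cost if each year of delay adds a fraction of the *current* cost,
    i.e. the problems caused by the delay themselves get more expensive."""
    return base_cost * (1 + rate) ** years


if __name__ == "__main__":
    base = 100.0   # hypothetical cost of doing the maintenance today
    rate = 0.10    # hypothetical 10% growth per year of delay
    for years in (0, 1, 5, 10, 20):
        lin = linear_cost(base, rate, years)
        comp = compound_cost(base, rate, years)
        print(f"{years:>2} years deferred: linear {lin:8.2f}, compound {comp:8.2f}")
```

The two curves start out close together, which is part of why deferring feels harmless early on; the gap only becomes obvious once the delay has run for a while.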

Like the product owners in our organisations, the TTC has to walk a fine line between supporting legacy systems and building new ones. We also need to do our best to prolong the life of new systems before they’re considered legacy too. It’s rarely a popular decision, but there needs to be constant attention to keeping those legacy systems running until it’s time to replace them or shut them down.

One afterthought was that most new features are justified using the cost of building and releasing them; rarely do they include running (support) costs and other expenses incurred through to end-of-life. I wonder if that longer-term view would help support the need for ongoing maintenance?