Technical Debt and Design Death: Part II

24 July 2006

Kane Mar
Scrumology Pty Ltd

My last article dealt with technical debt and design death. This article is going to expand upon those ideas. In a far-from-academic way, I’d like to discuss what happens once you already have legacy code. What are your options for dealing with it? Where exactly is the point of no return?

Legacy Systems

legacy system (n): A computer system or application program which continues to be used because of the cost of replacing or redesigning it and often despite its poor competitiveness and compatibility with modern equivalents. The implication is that the system is large, monolithic, and difficult to modify.

If legacy software only runs on antiquated hardware, the cost of maintaining this may eventually outweigh the cost of replacing both the software and hardware unless some form of emulation or backward compatibility allows the software to run on new hardware. - The Free On-line Dictionary of Computing

Any reasonably large bank or insurance company will have a multitude of legacy systems, many of them “green screens” (i.e., dumb terminals connected to an IBM mainframe running some COBOL database application). There are many hundreds of banks in the U.S. alone. Even comparatively new companies (less than twenty years old) can have legacy systems, although they often use the term “core” rather than “legacy” when describing them.

Anyone working on these legacy (or core) applications will recognize the characteristics of design death that Ken Schwaber identified and that I outlined in Part I. To recap, these are:

  1. The code is considered part of a core or legacy system.
  2. There is either no testing or only minimal testing surrounding the code.
  3. There is highly compartmentalized knowledge regarding the core/legacy system, and it may be supported by only one or two people in the company.

I would add one additional characteristic to this list:

  4. The legacy system is not in a known state.

By that I mean it can be difficult (if not impossible) to determine the state of the system at any given point in time. Installing the system and recovering after a failure are often considered to be some form of black art.
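
One inexpensive step toward a known state is an automated smoke test that runs at install time and after every recovery. Below is a minimal sketch in Java; the configuration path, property name, and expected schema version are all invented for illustration.

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.Properties;

    public class InstallSmokeTest {
        // Hypothetical expectations; real values depend on the system.
        private static final String CONFIG_PATH = "conf/app.properties";
        private static final String EXPECTED_SCHEMA_VERSION = "42";

        public static void main(String[] args) throws IOException {
            File configFile = new File(CONFIG_PATH);
            if (!configFile.exists()) {
                fail("Missing configuration file: " + CONFIG_PATH);
            }
            Properties config = new Properties();
            config.load(new FileInputStream(configFile));

            // In practice the schema version might live in a database table;
            // here it is read from the config file to keep the sketch self-contained.
            String actual = config.getProperty("schema.version");
            if (!EXPECTED_SCHEMA_VERSION.equals(actual)) {
                fail("Unexpected schema version: " + actual);
            }
            System.out.println("Smoke test passed: system is in the expected state.");
        }

        private static void fail(String message) {
            System.err.println("SMOKE TEST FAILED: " + message);
            System.exit(1);
        }
    }

Run on every install, a check like this turns “black art” recovery into a pass/fail question.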

Entropy is a term borrowed from thermodynamics; applied to software, it can be considered a measure of disorder. Entropy in software comes from changes to the code base: bug fixes, updates to existing functionality, and the addition of new functionality. Over time, these small changes add up to a system that is difficult to change, is overly coupled to external systems, and has no clear delineation of functionality.

Competition drives down the value of existing software. In order to remain competitive, companies must constantly add new functionality just to maintain that value. At this point I was going to outline a fictional scenario to demonstrate how this happens, but as it happens I came across an article describing an actual situation. It demonstrates the relentless nature of competition far better than I ever could.

In order to remain relevant, software (or the service provided by that software) needs to continually increase in value. This implies that software needs to be continually changed (i.e., updated). But the very act of change increases the entropy of the system, and thereby the cost of change.

In the last six months alone I've talked to at least three different companies that are planning to rewrite their systems from the ground up. These are three very different businesses with different markets and business models. When asked why they were rebuilding, the common response was that the cost of change was too high. (Interestingly, two of the three also mentioned that the current EJB framework is too heavyweight and that they are looking for a lighter-weight alternative; both are evaluating EJB 3 in addition to open source frameworks such as Spring.) Automated tests (unit tests, acceptance tests, FIT/FitNesse tests, etc.) help decrease the cost of change and help bring the system to a known state; without an adequate automated testing framework, the task of changing a legacy system becomes increasingly expensive.
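
A characterization test is often the cheapest first step: it simply records what the legacy code does today, right or wrong, so that any future change in behavior is caught. Here is a minimal sketch using JUnit; the LegacyInterestCalculator class and its behavior are hypothetical stand-ins for real legacy code.

    import junit.framework.TestCase;

    // Hypothetical stand-in for an untested legacy class; the real code
    // would be far messier, and it is deliberately left unchanged here.
    class LegacyInterestCalculator {
        public double monthlyInterest(double balance, double annualRate) {
            return balance * annualRate / 12.0;
        }
    }

    public class LegacyInterestCalculatorTest extends TestCase {
        // A characterization test pins down current behavior rather than
        // asserting what the behavior "should" be.
        public void testMonthlyInterestMatchesCurrentBehavior() {
            LegacyInterestCalculator calc = new LegacyInterestCalculator();
            assertEquals(10.0, calc.monthlyInterest(1200.0, 0.10), 0.0001);
        }
    }

A handful of such tests will not make the system well designed, but they start to bring it into a known state and lower the risk of the next change.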

 

Figure 1. NFI has had a consistent decline in revenue.

A Matter of Choice

The management at NFI knows that they need to add functionality to their system in order to remain competitive. They have three options:

  1. Add the functionality to the core system. This would be prohibitively expensive, as the ongoing addition of new functionality would continue to drive up the cost of change. It is often not a practical (cost-effective) solution.
  2. Introduce a temporary solution that would allow NFI to use the existing legacy system alongside the new functionality. This would not address the underlying problem, but it may buy the company more time. One common way of doing this is to build a web service layer on top of the existing legacy API; new functionality is then constructed alongside the existing system, and the two are integrated at the web services level (see the sketch after this list). But what happens when new data needs to be added to the legacy data model?
  3. Reconstruct the existing functionality using new platforms and technology. This solution addresses the underlying problem. It is, however, more expensive than option 2 (but less expensive than option 1). How does NFI decide which option is most suitable in their particular situation?
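
To make option 2 concrete, here is a minimal sketch of the adapter idea behind such a web service layer. The legacy class, its record format, and the facade's operations are all invented for illustration; a real system would expose the AccountService facade through a web service toolkit rather than a main method.

    // Hypothetical legacy API; in reality this might invoke a COBOL
    // transaction or screen-scrape a green-screen session.
    class LegacyAccountSystem {
        public String fetchRecord(String accountId) {
            return accountId + "|ACTIVE|1234.56"; // pipe-delimited legacy record
        }
    }

    // A clean domain object for new code to work with.
    class Account {
        final String id;
        final String status;
        final double balance;

        Account(String id, String status, double balance) {
            this.id = id;
            this.status = status;
            this.balance = balance;
        }
    }

    // The facade that both new functionality and the web service layer
    // would call, insulating them from the legacy record format.
    interface AccountService {
        Account getAccount(String accountId);
    }

    class LegacyAccountAdapter implements AccountService {
        private final LegacyAccountSystem legacy = new LegacyAccountSystem();

        public Account getAccount(String accountId) {
            // Translate the legacy record into the new domain model.
            String[] fields = legacy.fetchRecord(accountId).split("\\|");
            return new Account(fields[0], fields[1], Double.parseDouble(fields[2]));
        }
    }

    public class FacadeDemo {
        public static void main(String[] args) {
            AccountService service = new LegacyAccountAdapter();
            Account account = service.getAccount("42");
            System.out.println(account.id + " " + account.status + " " + account.balance);
        }
    }

The trade-off is visible in the adapter: the facade can translate whatever the legacy system already stores, but when genuinely new data must be stored, the legacy data model itself has to change, and the facade can no longer hide the problem.
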
Graphic Evidence

For the sake of argument, let us assume that NFI has decided to rewrite existing functionality. We can draw a graph of revenue (declining) and functionality (increasing) over time (see Figure 2).

Figure 2. Revenue declines as functionality increases.

I've used story points as a measure of functionality, but provided that the units of measurement are consistent, you can use anything. I've also made the assumption that any rewritten system is easier to maintain and has a lower cost of change. I feel these are reasonable assumptions if Extreme Programming (XP) practices such as Continuous Integration (CI), Test-Driven Development (TDD), and refactoring are used.

From the graph in Figure 2, you can see that the rewritten functionality will be completed in mid-2004. This is a viable option for NFI; if the company has the time, it is an approach that should be investigated. But what happens when rewriting even the most basic functionality of the legacy system is projected to take longer than the company has positive revenue? (See Figure 3.)

Figure 3. What happens when rewriting the functionality takes longer than the company has positive revenue?
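
The graphs boil down to simple arithmetic: estimate how long the rewrite will take (remaining story points divided by velocity) and compare that with how long revenue remains positive. The following back-of-the-envelope sketch uses figures invented purely for illustration.

    public class RewriteBreakEven {
        public static void main(String[] args) {
            // All figures are invented for illustration.
            double remainingStoryPoints = 600.0;   // size of the rewrite
            double velocityPerMonth = 20.0;        // story points delivered per month
            double monthlyRevenue = 500000.0;      // current monthly revenue
            double monthlyDecline = 25000.0;       // linear decline, a simplification

            double monthsToRewrite = remainingStoryPoints / velocityPerMonth;
            double monthsOfPositiveRevenue = monthlyRevenue / monthlyDecline;

            System.out.println("Rewrite completes in " + monthsToRewrite + " months.");
            System.out.println("Revenue stays positive for " + monthsOfPositiveRevenue + " months.");

            if (monthsToRewrite < monthsOfPositiveRevenue) {
                System.out.println("Figure 2 case: the rewrite is a viable option.");
            } else {
                System.out.println("Figure 3 case: the rewrite outlasts the revenue.");
            }
        }
    }

However crude, putting even rough numbers through this comparison tells you which of the two figures your company is actually living in.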

Deciding Factors

It's pretty clear that a company in this situation has some difficult decisions ahead. There may be a temporary solution that would allow NFI to use the existing system while building a new product; NFI may decide to borrow money to fund the rewrite; or NFI may want to consider returning any remaining value to its shareholders. Whatever the final decision, constructing some simple graphs makes it possible to present management with a number of options from which to choose a course of action.
