When we build software systems, we are often building for the current moment. Inevitably the business/market changes over time to the point where our software system is no longer a good fit. This is nobody's fault. All decisions that lead to this point were perfectly rational at the time, but in the end we have ourselves a legacy system.
In many ways, legacy systems are success stories. They are the blood & tears that got us to where we are today. But all stories come to an end.
Legacy systems are often a huge source of ongoing maintenance work. They are those strange systems everybody hopes they never have to deal with.
We should responsibly decommission legacy systems when the time comes. Here are steps on how to do so.
Before we can get rid of a legacy system, we had better know what it is doing.
Investigation into a legacy system may reveal it is no longer used: effectively everything has already been migrated off of it. This can be hard to tell, as even dead systems can receive and emit traffic. If the system is indeed unused, skip down to step 6.
Don't be a dead feature hoarder. If the legacy system implements a feature nobody uses, then by definition it doesn't have value; it only has cost. Have an affirmative argument for why this feature should exist. Shed the fear of "but what if we need this feature in the future?" It will always be easier to re-add the feature later, when we better understand the context in which it is needed.
Migrating to an existing system has non-zero cost. How does the dollar value of the legacy system change if we take into consideration the additional cost of migration? Maybe it is worth more to the business to simply decommission it.
But if we believe it is indeed worth migrating, often the value the legacy system provides can easily slot into one or more existing systems. If that is the case, migrate the code and go to step 5.
Watch out for incomplete migration plans. While it's fine to take chunks of value off the legacy system one piece at a time, never partially migrate the value to an existing system and fall back to the legacy system for all other cases. This will only infect the existing system with concepts from the legacy system, creating a zombie legacy system that has now grown in size.
But if there isn't an appropriate place to migrate the value, we'll have to build a new system.
People often jump straight to this step, but without doing their homework, they are doomed to simply rebuild the legacy system. This is why step 1 is so important. We want to deliver the value the legacy system still has, not replicate an existing feature set.
The issue with incomplete migration plans also applies to incomplete plans for a new system. Never allow the new system to call the legacy system, or else the new system becomes a zombie legacy system. If a partial migration is unavoidable, change the legacy system to call the new system instead. This forces the legacy system to conform to the new design. Just be conscious of the coupling: this can itself be a bad idea if there is no planned alternative for the new system to receive the signal the legacy system is providing.
Ideally, the migrated value or new system can run alongside the legacy system. If that is the case, then finding the appropriate pinch point can allow gradual migration of the users. For example, use an A/B split test or feature flag to allow a subset of users to use the new system. This allows operations staff to carefully monitor the migration and roll back traffic if necessary.
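As an illustration of such a pinch point, here is a minimal sketch of a percentage-based feature flag. It is not a description of any particular flagging product: the function names, the salt, and the handler callables are all hypothetical. The idea is that each user hashes to a stable bucket, so raising the percentage only ever adds users to the new system, and lowering it to zero rolls everyone back.

```python
import hashlib


def rollout_bucket(user_id: str, salt: str = "legacy-migration") -> int:
    """Map a user to a stable bucket in the range [0, 100)."""
    # Hashing (rather than random choice) keeps each user on the same side
    # of the split between requests. The salt is a hypothetical flag name.
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def use_new_system(user_id: str, rollout_percent: int) -> bool:
    """Return True if this user should be routed to the new system."""
    return rollout_bucket(user_id) < rollout_percent


def handle_request(user_id, rollout_percent, legacy_handler, new_handler):
    """The pinch point: one place that decides which system serves the user."""
    if use_new_system(user_id, rollout_percent):
        return new_handler(user_id)
    return legacy_handler(user_id)
```

Setting `rollout_percent` to 0 sends all traffic to the legacy system and 100 sends all of it to the new one, so a rollback is a configuration change rather than a deployment.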
Watch out for implicit users of the legacy system. Is the legacy system running a scheduled job which the rest of the system depends on? These are hard to detect, as their traffic will be sporadic.
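One way to surface sporadic callers is to group the legacy system's access records by caller and look at the longest gap between their requests. The sketch below assumes the records are already available as `(caller, timestamp)` pairs; the caller names and the safety factor are made up for illustration. A caller that only shows up once a month means "zero traffic this week" proves very little.

```python
from collections import defaultdict
from datetime import datetime


def summarize_callers(records):
    """Group (caller, timestamp) records and measure how sporadic each caller is."""
    seen = defaultdict(list)
    for caller, timestamp in records:
        seen[caller].append(timestamp)

    summary = {}
    for caller, stamps in seen.items():
        stamps.sort()
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        summary[caller] = {
            "count": len(stamps),
            "last_seen": stamps[-1],
            # None when the caller was seen only once in the sample.
            "longest_gap": max(gaps) if gaps else None,
        }
    return summary


def minimum_observation_window(summary, safety_factor=2):
    """Suggest how long to watch for traffic before trusting 'nobody calls this'."""
    gaps = [s["longest_gap"] for s in summary.values() if s["longest_gap"] is not None]
    return max(gaps) * safety_factor if gaps else None


# Hypothetical sample: a web frontend calling hourly and a monthly billing job.
records = [
    ("web", datetime(2024, 1, 1, 9)),
    ("web", datetime(2024, 1, 1, 10)),
    ("web", datetime(2024, 1, 1, 11)),
    ("billing-cron", datetime(2024, 1, 1)),
    ("billing-cron", datetime(2024, 2, 1)),
]
summary = summarize_callers(records)
```

This only catches callers that appear in the logs at all. Jobs that run inside the legacy system itself and write to shared storage will not show up as inbound traffic and need a separate review.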
Otherwise, if only one system can be up at any given time, a maintenance window is required. This is rarely the case, however. There is almost always a way to perform a zero-downtime deployment; it simply requires more planning and design.
This is the easiest step, right? Just turn off the legacy system, take it out into the parking lot and take turns at it with a sledgehammer. Wrong.
This is by far the riskiest step. Given that the legacy system has stuck around for so long, it is more than likely that there are some unintended dependencies upon it.
Even after carefully checking that there is zero traffic going to the legacy system, there can still be a dependency upon it. For example, AWS Security Groups that were provisioned by AWS Elastic Beanstalk can accidentally be reused inappropriately by other systems.
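For the security group example, one possible check is to ask EC2 which network interfaces are still attached to the group before touching it. The sketch below assumes `ec2_client` is a boto3 EC2 client (for example `boto3.client("ec2")`) and that `group_id` is the ID of the group Elastic Beanstalk created; the function name is hypothetical. It does not cover every kind of reference, such as other security groups whose rules point at this one.

```python
def network_interfaces_using_group(ec2_client, group_id):
    """Return the IDs of network interfaces still attached to a security group.

    Assumes ec2_client behaves like a boto3 EC2 client.
    """
    interface_ids = []
    # "group-id" filters interfaces by an associated security group ID.
    kwargs = {"Filters": [{"Name": "group-id", "Values": [group_id]}]}
    while True:
        response = ec2_client.describe_network_interfaces(**kwargs)
        for interface in response.get("NetworkInterfaces", []):
            interface_ids.append(interface["NetworkInterfaceId"])
        # Follow pagination until there are no more pages.
        token = response.get("NextToken")
        if not token:
            return interface_ids
        kwargs["NextToken"] = token
```

An empty result suggests nothing is currently attached to the group, which is a useful signal but not proof that nothing will try to use it later.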
Instead of deleting the legacy system in one go, recheck that each component isn't used, then deactivate that component and see what happens. Once everything has been successfully deactivated, the final permanent deletion is far safer.
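The recheck, deactivate, observe, then delete sequence can be sketched as a small routine. Everything here is hypothetical: the `Component` fields are placeholder callables for whatever a given platform offers (stopping an instance, disabling a job, detaching a queue), and `system_healthy` stands in for an observation period that in practice may last days.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Component:
    name: str
    is_in_use: Callable[[], bool]     # final recheck before touching it
    deactivate: Callable[[], None]    # reversible: stop, disable, detach
    reactivate: Callable[[], None]    # undo of deactivate
    delete: Callable[[], None]        # irreversible


def decommission(components: List[Component], system_healthy: Callable[[], bool]) -> bool:
    """Deactivate components one at a time; delete only if all of them went quietly."""
    deactivated = []
    for component in components:
        if component.is_in_use():
            print(f"{component.name} is still in use; stopping")
            return False
        component.deactivate()
        if not system_healthy():
            # Something depended on this component after all: undo and stop.
            print(f"deactivating {component.name} broke something; reactivating")
            component.reactivate()
            return False
        deactivated.append(component)

    # Nothing is deleted until every component has been safely deactivated.
    for component in deactivated:
        component.delete()
    return True
```

On failure the routine leaves earlier components deactivated and deletes nothing, so the surprising dependency can be investigated while every step taken so far is still reversible.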
Decommissioning is where the rubber meets the road. Did we do a thorough analysis of all the value the legacy system provided? Did we migrate all of that value? Did we migrate all the explicit and implicit users off of the legacy system?
Successfully decommissioning a system is a badge of honour. It is one of those rare opportunities to see the full system life-cycle. The last step would be to delete the code repository, but nobody is brave enough. Fortunately, the cost of unused code repositories is pretty low, especially if one archives them.
Do you want to kill legacy systems? You're in luck, Battlefy is hiring.