September 9, 2024

Leaving Your Legacy (System to Us)

By John Cox · 3 minute read


A while back, a customer asked me how much ramp-up and knowledge transfer we would need to take over responsibility for their legacy system. My answer: "We don't need either."

"No knowledge transfer?"

None. We have it covered.

"The system that's been running on one server for years? With our fingers crossed that the server doesn't implode? The one with the original developers long gone and us only having anecdotal knowledge of how it works?"

Yep - that's the one. We got this. We consider it a privilege to be entrusted with the responsibility.

"Wow! That makes me feel A LOT better."

This is not just a one-time thing - legacy system takeovers are some of our favorite projects. They offer ample opportunities to make real improvements in efficiency, security, stability, and performance for our customers, without forcing a major reinvestment in building new technology. And nothing beats the staccato dopamine highs of squashing bugs, console errors, and compiler warnings (for real).

So what makes a legacy system anyway? I think the term conjures up images of an arcane application written in COBOL running on a rusty PC tucked under a desk somewhere, but that is not usually the case. I believe a legacy system is one that fits one of the following descriptions:

A system for which the original designers and implementers are long gone from the organization and for which much of the institutional knowledge is gone. Much of the forward development is done via copy-pasted tack-ons, turning a luxurious palace into a mud hut.
One that has continued to rack up technical debt for years and years, evidenced by dramatically out-of-date platform and dependency versions (Python 2, anyone?), thousands of compiler warnings, console errors, dead or commented-out code, and a massive Jira backlog.
One that is deployed in a very risky way, like on a local machine or manually in the cloud via SSH, git pull, build script, and magic incantation. The hope is that the server will continue working (and hope is not a strategy).

So what's the magic secret that we bring to the table? There's nothing magic about it at all. We are engineers and we like to see things work. That being said, we do have a methodical approach to taking over legacy systems of any type.

Stabilize the Infrastructure

The first thing we do is stabilize the infrastructure. This usually means that we:

Design and implement the cloud infrastructure in code and establish a repeatable means for deploying the entire system infrastructure.
Replace critical high-risk and/or costly components with managed cloud services (e.g. replace a MongoDB instance running on EC2 with managed Amazon DocumentDB).
Implement an automated software build and CI/CD system with traceable build information.

This strategy is what you might call "lift-tinker-and-shift" or, as AWS refers to it, Replatforming. The very important part for us, however, is getting the application running in a clean AWS architecture that is defined in code (generally for us this is Terraform) and deployable from the cloud (for us this is using env0). We generally try to tinker as little as possible at this point, with the biggest code changes probably related to CI/CD or possibly the swap out of hosted components with managed services. The goal in that case is to take some of the risk of deployment and management off the table.
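To make "traceable build information" a bit more concrete, here is a minimal Python sketch of a build record that ties a deployed artifact back to the commit and pipeline run that produced it. It is an illustration only, not our actual tooling: the function names (`make_build_info`, `render_build_info`) and the field names are assumptions made up for this example, and a real CI system would supply the values.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def make_build_info(commit_sha: str, build_number: int, branch: str,
                    built_at: Optional[datetime] = None) -> dict:
    """Return a record that ties a built artifact back to its source.

    All field names here are hypothetical; pick whatever your pipeline needs.
    """
    # Default to "now" in UTC when the caller does not pass a timestamp.
    built_at = built_at or datetime.now(timezone.utc)
    return {
        "commit": commit_sha,
        "short_commit": commit_sha[:7],
        "build_number": build_number,
        "branch": branch,
        "built_at": built_at.isoformat(),
    }


def render_build_info(info: dict) -> str:
    """Serialize the record as JSON so it can ship inside the artifact."""
    return json.dumps(info, indent=2, sort_keys=True)
```

One way to use a record like this is to write it to a file during the build and expose it from a version or health endpoint, so anyone looking at a running system can tell exactly which commit is deployed.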

Make it Easier to Operate

Our next objective is to make it easier to operate. By stabilizing the infrastructure we've already made the system much more predictable and lowered the risk, and now we want to do some additional work to make things simpler and more understandable. Again, there are usually three parts to this:

Improve deployment processes for software and infrastructure.
Start to build in automated observability features to signal when things may be going wrong.
Shut down unnecessary components, environments, or accounts to keep things simple.

This work not only makes the system more stable and observable, it also lowers the burden on us to manage and operate it. Before, a deployment might have been a 50-step process requiring all hands on deck, done late at night. Now we can typically do it with one person, often in real time while the system keeps serving end users.
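As a rough illustration of what "signal when things may be going wrong" can mean, the Python sketch below tracks the outcome of the most recent requests and flags when the error rate crosses a threshold. The class name `ErrorRateMonitor`, the window size, and the threshold are all assumptions chosen for the example; in practice this kind of check would more likely live in a managed monitoring service than in application code.

```python
from collections import deque


class ErrorRateMonitor:
    """Track recent request outcomes and flag a high error rate.

    Hypothetical helper for illustration; not part of any real library.
    """

    def __init__(self, window_size: int = 100, threshold: float = 0.05):
        self.threshold = threshold
        # A deque with maxlen drops the oldest outcome automatically,
        # so only the last `window_size` requests are considered.
        self.outcomes = deque(maxlen=window_size)

    def record(self, ok: bool) -> None:
        """Record whether a single request succeeded."""
        self.outcomes.append(ok)

    def error_rate(self) -> float:
        """Fraction of failed requests in the current window."""
        if not self.outcomes:
            return 0.0
        failures = sum(1 for ok in self.outcomes if not ok)
        return failures / len(self.outcomes)

    def should_alert(self) -> bool:
        """True when the error rate is strictly above the threshold."""
        return self.error_rate() > self.threshold
```

The point of the sliding window is that a burst of failures shows up quickly, while a single old failure ages out instead of paging someone forever.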

Pay Down (some) Technical Debt

Now that we have things humming along we want to start to clear at least some of the technical debt that has accumulated. We can tackle the backlog of bugs and problems. We can upgrade software packages and dependencies. We can kill those annoying compiler warnings, clean up code standards, improve logging, and much more. All the while we are preparing for the best part.

Modernize!

Now that we have reduced our risk, made things easier to manage and operate, and brought the codebase into the 21st century, it's time to move the business forward! We can replace application features written into the code with managed cloud services, not only lowering the cost of maintenance, but also reducing failure risk and getting access to new features. We can wrap APIs with API Gateway and AppSync to give us access to more features, better security, and greater reliability. We start building new architectures, like event-driven systems, around our existing application to really start to transform the way the system allows the business to react.
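To show the shape of building an event-driven system around an existing application, here is a small in-process Python sketch: the legacy code path gains a single publish call, and new features subscribe to events without touching the old logic. Everything in it is hypothetical (the `EventBus` class, the `legacy_place_order` function, and the `order.placed` event name), and a production system would typically hand events to a managed event bus or queue rather than call handlers in the same process.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class EventBus:
    """Minimal in-process publish/subscribe bus (illustration only)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        """Register a handler to be called for every event of this type."""
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> int:
        """Call each subscribed handler; return how many were notified."""
        handlers = self._handlers.get(event_type, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


def legacy_place_order(bus: EventBus, order_id: str, total: float) -> None:
    """Stand-in for an existing legacy code path."""
    # ... the existing legacy logic would stay here unchanged ...
    # The only addition is announcing what just happened.
    bus.publish("order.placed", {"order_id": order_id, "total": total})
```

The design choice worth noticing is that the legacy function does not know who is listening, so new capabilities can be added or removed by changing subscriptions rather than by editing old code.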

This is our favorite part - where we get to play with new toys and breathe new life into the system. We love advising our customers on how to leverage cloud platforms and third-party applications to meet new market opportunities. It's a privilege we earn by the work we invest in understanding and caring for their legacy systems.

Got a software system that needs love and maybe some restoration? We'd love a chance to help.