Question

We're currently designing a new version of an existing application. This new version will not reuse any existing code, as the old code is over 10 years old and tightly coupled.

With this new application we'll be designing a persistence layer, and since we want to adhere to an agile development cycle, we'll need a way to have both the old and new applications running side by side. Personally, I can think of two solutions for having the applications work on the same data, and I'd like your input on which is most viable. Of course, if you know of a completely different solution, feel free to answer with that as well.

Current setup

Currently we're running two MariaDB servers in master-master replication. IDs are auto-incremented, with one server generating odd IDs and the other even IDs.

New setup

We have no clarity on this just yet, except that we'd like to move primary key generation out of the persistence and infrastructure layers. Another wish is scalability, preferably with read/write segregation (if that's relevant to the question).
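To sketch what "moving key generation out of the persistence layer" could look like (one common approach, not necessarily the one we'd pick): generate UUIDs in the domain layer, so an entity has its key before it is ever persisted and the key does not depend on which database node handles the insert. A minimal Python illustration with a hypothetical Customer entity:

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class Customer:
    """Hypothetical domain entity; the key is created here,
    not by an AUTO_INCREMENT column in the database."""
    name: str
    id: uuid.UUID = field(default_factory=uuid.uuid4)

# The entity already has a valid, globally unique key before any
# persistence call, regardless of which database node stores it later.
customer = Customer(name="Acme Ltd.")
print(customer.id)
```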

1. Sync

Write code to continuously sync all data between the old and new systems. This means it has to know about both the old and the new structure and be able to move data between them. The biggest issue seems to be the latency the sync introduces, which might lead to data anomalies. To reduce this, we might make one of the sync ends the "master".
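To make the trade-off concrete, here is a rough Python sketch of a single sync pass with the old system designated as master. Everything in it is an assumption for illustration: the `old_db`/`new_db` helpers, the customer table, and the column names.

```python
def map_old_to_new(old_row: dict) -> dict:
    """Translate a record from the legacy structure into the new one.
    Column names are purely illustrative."""
    return {
        "id": old_row["customer_id"],
        "name": f"{old_row['first_name']} {old_row['last_name']}",
        "updated_at": old_row["last_modified"],
    }

def sync_pass(old_db, new_db) -> None:
    """One pass of the continuous sync, with the old system as master:
    anything changed on the old side is pushed into the new schema.
    Changes made only on the new side for the same record get overwritten,
    which is exactly the data-anomaly risk described above."""
    for old_row in old_db.fetch_changed_customers():
        new_db.upsert_customer(map_old_to_new(old_row))
```

A full implementation would also need the reverse mapping if the new application is allowed to write, plus conflict detection; that is where most of the effort and the anomaly risk sit.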

2. Implement new persistence in old

Basically, the idea would be to have all persistence logic in its own codebase. That way we could pull it into our old application and write adapters so it already uses the new system. I like this approach as it enforces one source of truth. Issues with this approach might be that it could take far longer to implement the adapters, and that it might be difficult to get the legacy code to play along with the new structure; not to mention having to dabble in untested spaghetti code.
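As an illustration of that adapter idea (with entirely hypothetical interfaces and record shapes): the legacy application keeps calling the interface it already knows, while the adapter translates those calls into the new persistence codebase, so both applications share one source of truth.

```python
class CustomerRepository:
    """Stand-in for the new persistence codebase (simplified)."""
    def save(self, customer: dict) -> None: ...
    def get(self, customer_id: str) -> dict | None: ...

class LegacyCustomerGateway:
    """The interface the old application already codes against (hypothetical)."""
    def store_customer(self, record: dict) -> None:
        raise NotImplementedError
    def load_customer(self, customer_id: str) -> dict | None:
        raise NotImplementedError

class NewPersistenceAdapter(LegacyCustomerGateway):
    """Adapter: translates legacy calls and record layouts into the new
    persistence layer, so the legacy app writes through the new system."""
    def __init__(self, repository: CustomerRepository) -> None:
        self._repository = repository

    def store_customer(self, record: dict) -> None:
        # Translate the legacy record layout into the new one.
        self._repository.save({
            "id": record["customer_id"],
            "name": f"{record['first_name']} {record['last_name']}",
        })

    def load_customer(self, customer_id: str) -> dict | None:
        row = self._repository.get(customer_id)
        if row is None:
            return None
        # Translate back into the shape the legacy code expects.
        first, _, last = row["name"].partition(" ")
        return {"customer_id": row["id"], "first_name": first, "last_name": last}
```

The back-and-forth translation of record shapes is where the "spaghetti code" risk shows up: the adapter has to honour every quirk the legacy code relies on.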

What would be the best course? (Preferably with objective arguments for its advantages over the other method 😉)


Solution

Agile favors working software, but that doesn't necessarily mean complete software. From a software engineering perspective, I would want to limit the amount of time both systems are running side by side.

You could keep the old system until the new system is ready to take over, and regularly import/sync the data from the old system into the new one, both for testing and so it will be ready after the switchover. Then you can remove or phase out the old system.
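For what it's worth, a minimal sketch of such a recurring import, assuming a `last_modified`-style column on the old side and hypothetical `old_db`/`new_db` helpers; it is deliberately one-directional (old to new), since the old system stays the system of record until the switchover:

```python
from datetime import datetime, timezone

def transform(old_row: dict) -> dict:
    # Placeholder for the old-to-new schema mapping.
    return dict(old_row)

def import_changes_since(old_db, new_db, watermark: datetime) -> datetime:
    """Import everything changed in the old system since the last run
    and return the new watermark to persist for the next run."""
    run_started = datetime.now(timezone.utc)
    for old_row in old_db.fetch_customers_changed_since(watermark):
        new_db.upsert_customer(transform(old_row))
    return run_started
```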

An issue with this approach is feature parity, i.e. when can we switch over? This can be helped if the system is designed such that you are only changing out the back end, and the features built upon it still work. It may be worth refactoring the old system to allow this kind of migration, to reduce the amount of rework required.

OTHER TIPS

I think the answer is about trade-offs: deciding where you invest time now to save time later.

First of all, is there anything wrong with the current DB schema? If not, then you could easily create copies or synchronise the data periodically.

Is this internal only, or is it sold to customers? If it's sold and supported, then you may have to upgrade and migrate client systems. If that is the case, investing time in a repeatable update/migration process makes sense.

In my mind, the objective would be to get to the final result as quickly and efficiently as possible. Synchronisation and bidirectional updates sound like a huge amount of effort for questionable gain.

In my limited experience of agile, I would expect lots of incremental changes, which means the schema will be constantly evolving, and the synchronisation would have to constantly evolve along with it.

Licensed under: CC-BY-SA with attribution