How does a database implementor migrate their database engine to a new data structure?

https://softwareengineering.stackexchange.com/questions/420860

20-03-2021

Question

I am working on implementing a database of sorts, and I'm stuck trying to make it perfect from the get-go, because I realize I don't know how to migrate the database engine from one data structure to another as the data structure implementations evolve. I am afraid that if I pick a data structure for the database now, I won't be able to adjust it down the road.

Take a hash table as an example. Say I implemented a database using a hash table with hashing algorithm h1(). That hash function decides where every record lives, so it literally distributes my records all over the place. Now I want to use hashing algorithm h2(). I can't just boot up my database with the new hash algorithm, because it won't know how to find the existing records. So I need to somehow migrate from one hash to the other, and I don't see how to do that. Not only that, but I also need to do this for every client that upgrades to v2 of my database, so to speak, perhaps the first time they start the new version.
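A minimal Python sketch of the situation described above, assuming an in-memory table with a fixed 8 buckets and hypothetical h1()/h2() built from `hashlib` (MD5 and SHA-256). Records are placed by `h(key) % NUM_BUCKETS`, so a table built with h1 can only be searched with h1, and moving to h2 means reinserting every record.

```python
import hashlib

NUM_BUCKETS = 8  # hypothetical fixed table size


def h1(key: str) -> int:
    # Hypothetical "old" hash: MD5 of the key, read as an integer.
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")


def h2(key: str) -> int:
    # Hypothetical "new" hash: SHA-256 of the key, read as an integer.
    return int.from_bytes(hashlib.sha256(key.encode()).digest(), "big")


def build_table(records, h):
    """Place each (key, value) pair into the bucket chosen by hash function h."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for key, value in records:
        buckets[h(key) % NUM_BUCKETS].append((key, value))
    return buckets


def lookup(buckets, key, h):
    """Search only the bucket that h says the key belongs in."""
    for k, v in buckets[h(key) % NUM_BUCKETS]:
        if k == key:
            return v
    return None


def rehash(old_buckets, new_h):
    """Migration: walk every record stored under the old layout and reinsert it with new_h."""
    records = [pair for bucket in old_buckets for pair in bucket]
    return build_table(records, new_h)
```

Looking up keys in the h1-built table with h2 will generally probe the wrong bucket; only after `rehash(old, h2)` does every record become reachable through h2. Note that the rehash has to touch every record, which is why it is usually done once as a migration step rather than on every start.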

My question is, how do database implementors generally manage this problem? How do they effectively migrate their database implementation? Take for example migrating from a hash-table to a b+tree, or b+tree-1 to b+tree-2. How do they serialize the old data into the new form in practice? How do they get everyone off of the old data structures and onto the new ones?

Solution

What you are dealing with is a change in the structure of the files that are used to persist information.

There are two main ways to deal with this:

1. Perform a conversion from the old format to the new format during installation of the software version that relies on the new format. The conversion tool would be able to recognize and read files stored in the old format and write them back in the new format. Where a change in data structures is involved, that typically means reading the data into the old data structure, copying it to the new data structure, and writing that out again.

   This method only works if it is known, during installation, which files are used by the software.

2. Maintain (read-only) support for old formats and convert them when needed. The conversion is now built into the application itself.
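As an illustration of option 1, here is a sketch of an installer-time converter. The file layouts are hypothetical: an old `MYDB1` file with `key<TAB>value` lines in hash-table order, and a new `MYDB2` file with the same lines sorted by key (standing in for an ordered structure such as a B+ tree). The new file is written to a temporary file and swapped in with `os.replace`, so an interrupted conversion leaves the old file intact.

```python
import os
import tempfile

OLD_MAGIC = "MYDB1"  # hypothetical header of the old (hash-ordered) format
NEW_MAGIC = "MYDB2"  # hypothetical header of the new (key-sorted) format


def read_v1(path):
    """Read the old format: a MYDB1 header line, then key<TAB>value lines in any order."""
    with open(path, encoding="utf-8") as f:
        if f.readline().rstrip("\n") != OLD_MAGIC:
            raise ValueError(f"{path} is not in the old format")
        records = {}
        for line in f:
            key, value = line.rstrip("\n").split("\t", 1)
            records[key] = value
        return records


def write_v2(path, records):
    """Write the new format to a temp file in the same directory, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(NEW_MAGIC + "\n")
            for key in sorted(records):
                f.write(f"{key}\t{records[key]}\n")
        os.replace(tmp_path, path)  # on POSIX this rename is atomic
    except BaseException:
        os.unlink(tmp_path)  # clean up the partial temp file; the old file is untouched
        raise


def convert_file(path):
    """Installer step: upgrade one known data file in place; leave already-new files alone."""
    with open(path, encoding="utf-8") as f:
        magic = f.readline().rstrip("\n")
    if magic == NEW_MAGIC:
        return False  # already converted
    if magic != OLD_MAGIC:
        raise ValueError(f"unrecognized format in {path}: {magic!r}")
    write_v2(path, read_v1(path))
    return True
```

The installer would call `convert_file` on each data file it knows about, which is exactly why this approach depends on knowing, at install time, which files the software uses.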

The important part in both cases is that either the conversion tool or the application itself can recognize the different file formats and switch to the correct implementation for reading them in.
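For option 2, recognizing the format can be as simple as a header that names the version plus a table of readers. In this sketch (hypothetical headers `MYDB1`/`MYDB2` and layouts, operating on in-memory text for brevity), the old reader is kept read-only, and anything loaded in the old format is re-serialized in the current one so the caller can write it back.

```python
from typing import Callable, Dict, List, Optional, Tuple

Reader = Callable[[List[str]], Dict[str, str]]


def read_v1(lines: List[str]) -> Dict[str, str]:
    # Old layout, kept read-only: "key=value" lines.
    return dict(line.split("=", 1) for line in lines)


def read_v2(lines: List[str]) -> Dict[str, str]:
    # Current layout: "key<TAB>value" lines.
    return dict(line.split("\t", 1) for line in lines)


READERS: Dict[str, Reader] = {"MYDB1": read_v1, "MYDB2": read_v2}
CURRENT = "MYDB2"


def serialize_current(records: Dict[str, str]) -> str:
    lines = [CURRENT] + [f"{k}\t{v}" for k, v in sorted(records.items())]
    return "\n".join(lines) + "\n"


def load(text: str) -> Tuple[Dict[str, str], Optional[str]]:
    """Recognize the format from its header line, dispatch to the matching reader,
    and return (records, upgraded_text), where upgraded_text is None if already current."""
    header, *body = text.splitlines()
    reader = READERS.get(header)
    if reader is None:
        raise ValueError(f"unsupported format: {header!r}")
    records = reader(body)
    upgraded = None if header == CURRENT else serialize_current(records)
    return records, upgraded
```

Adding a future format then means adding one reader to `READERS` and bumping `CURRENT`; files in any older registered format keep loading.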

OTHER TIPS

Assuming you have a new database/data structure that works and supports everything the old one supports, here is the solution that comes to mind.

Make an interface/façade/abstraction layer for what the software uses from the database/data structure. Then make three implementations: one that uses the old one, one that uses the new one, and one that tries to use the other two implementations.

The third implementation uses the new implementation first. It tries to read from the new one and, if it doesn't find anything there, reads from the old one. It writes to the new one only.
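A minimal sketch of the façade and its three implementations, assuming a simple key/value interface (the `get`/`put` names are hypothetical) and using a dict as a stand-in for both engines:

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class Store(ABC):
    """The façade: the only storage operations the rest of the application may use."""

    @abstractmethod
    def get(self, key: str) -> Optional[str]: ...

    @abstractmethod
    def put(self, key: str, value: str) -> None: ...


class DictStore(Store):
    """Dict-backed stand-in; a real engine would go here instead."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


class OldStore(DictStore):
    """Implementation 1: would wrap the old data structure (e.g. the h1 hash table)."""


class NewStore(DictStore):
    """Implementation 2: would wrap the new data structure (e.g. a B+ tree)."""


class MigratingStore(Store):
    """Implementation 3: read the new store first, fall back to the old; write to the new only."""

    def __init__(self, old: Store, new: Store) -> None:
        self.old, self.new = old, new

    def get(self, key: str) -> Optional[str]:
        value = self.new.get(key)
        return value if value is not None else self.old.get(key)

    def put(self, key: str, value: str) -> None:
        self.new.put(key, value)
```

This sketch leaves out deletes; they would need extra care (for example a tombstone in the new store) so that a key deleted through the façade does not reappear from the old store.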

If that works, you can add code that writes to the new database whenever you read something from the old one. If that works, you can add code that removes from the old one whatever is already in the new one. If that works, you can write a task that accesses all the data, so that all of it gets moved to the new database.
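The incremental steps can be sketched by extending the third implementation, again with dicts standing in for the two engines and hypothetical method names. Copying into the new store happens before deleting from the old one, so a failure between the two steps leaves a duplicate rather than a lost record.

```python
from typing import Dict, Optional


class MigratingStore:
    """The third implementation extended with copy-on-read, removal from the old store,
    and a sweep task. Plain dicts stand in for the old and new engines (an assumption)."""

    def __init__(self, old: Dict[str, str], new: Dict[str, str]) -> None:
        self.old, self.new = old, new

    def get(self, key: str) -> Optional[str]:
        if key in self.new:
            self.old.pop(key, None)  # remove a stale copy that the new store supersedes
            return self.new[key]
        if key in self.old:
            value = self.old[key]
            self.new[key] = value    # write what was read from the old store to the new one...
            del self.old[key]        # ...and only then remove it from the old store
            return value
        return None

    def put(self, key: str, value: str) -> None:
        self.new[key] = value        # writes still go to the new store only

    def migrate_all(self) -> int:
        """The sweep task: read every key still in the old store, which moves it."""
        keys = list(self.old)        # snapshot, because get() mutates self.old
        for key in keys:
            self.get(key)
        return len(keys)
```

Once `migrate_all` has run, the old store is empty and every read is served by the new one.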

Finally, you could package this to do the migration either online or offline. For the online version, you would add code that stops using the old database (does not initialize the first implementation at all) once it is empty, and even deletes it if that is supported. For the offline version, you would run the migration as part of an update installer.
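For the online packaging, here is a sketch (hypothetical function names, dicts standing in for the engines, and concurrency control left out; a real engine would need locking) of a background sweep that moves records in small batches, plus a check that retires the old store once it is empty:

```python
import itertools
import os
from typing import Dict, Optional


def migrate_batch(old: Dict[str, str], new: Dict[str, str], batch_size: int = 100) -> int:
    """Move at most batch_size records per call, so the sweep can run in small
    steps between normal requests instead of blocking the database."""
    batch = list(itertools.islice(old, batch_size))  # snapshot keys before mutating old
    for key in batch:
        new.setdefault(key, old[key])  # a newer value already in the new store wins
        del old[key]
    return len(batch)


def finish_if_drained(old: Dict[str, str], old_path: Optional[str]) -> bool:
    """Once the old store is empty, report that it no longer needs to be initialized
    and delete its backing file, if it has one."""
    if old:
        return False
    if old_path is not None and os.path.exists(old_path):
        os.remove(old_path)
    return True
```

In the online variant, a background loop would call `migrate_batch` between requests until it returns 0 and then call `finish_if_drained`; the offline variant is the same loop run to completion by the update installer.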

Licensed under: CC-BY-SA with attribution
