Question

I was reading Why are websites (even this one) sometimes "Down for Maintenance"?, and it dawned on me that my industry accepts planned downtime as a given, despite the disruption to staff and the (mostly hidden) cost to the business.

What strategies can I use to achieve zero (planned) downtime in a database-backed web app? Specifically, upgrades that require DB schema changes.

PS I thought about using a 'database abstraction layer' (does that exist?), or using a two-stage upgrade process where the first stage upgrades the schema and starts using it in parallel to the old schema, and the second stage retires the old schema from use. (Does anyone do that? Is that crazy?)

PPS I'm assuming I need to have a couple of web application servers hitting the same database.

PPPS An example schema change would be changing the users table: going from a single name text field to a combination of forename and surname columns. (This is the simplest example.)


Solution

You've hit on the two main approaches. However, you'd use them as part of a multi-pronged strategy like the following:

  • Do not maintain session state on the server

    If you maintain in-memory state for a user session, then you have to retain the old deployment as long as there are any active sessions. This applies to any deployment, not just one with DB schema changes, but I didn't want it to be overlooked.
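One common way to keep session state off the server is to have the client carry it in a signed token, so any app server (old deployment or new) can validate a request. The sketch below uses only the standard library; the token format and `SECRET` are illustrative assumptions, not a production design (a real system would also want expiry and key rotation):

```python
import base64
import hashlib
import hmac
import json

SECRET = b"rotate-me"  # hypothetical shared key, identical on every app server


def issue_token(session_data):
    """Serialize session state into a signed token the client carries."""
    payload = base64.urlsafe_b64encode(json.dumps(session_data).encode())
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{sig}"


def read_token(token):
    """Verify and decode the token; returns None if it was tampered with."""
    payload, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))
```

Because no server holds the session, you can drain and retire an old deployment the moment its in-flight requests finish, rather than waiting for sessions to expire.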

  • Schema changes are additive; no restructuring or removal

    If all changes are field additions, then you can implement them using new tables with 1:1 relationships to the old tables. This avoids the problem of DDL interfering with DML, but means that your code will become more complex. It also means that you'll eventually have a lot of subsidiary tables, which will be a maintenance nightmare.

    On the positive side, this simplifies an A-B deploy: use database replication to keep the common tables up-to-date, deploy the new app into your B datacenter, transfer traffic, then decommission the A database (or flip replication). Depending on the app, you may or may not want to maintain client affinity to avoid replication lags.
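To make the additive approach concrete, here is a minimal sketch using SQLite (table and column names are assumptions matching the question's example). The original `users` table is never altered, so old code keeps working; new code reads the split fields from a 1:1 side table and falls back to the legacy column:

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Original schema, untouched by the upgrade: old code keeps working.
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (1, 'Ada Lovelace')")

# Additive change: a new 1:1 table instead of restructuring users.
db.execute("""CREATE TABLE user_names (
                  user_id  INTEGER PRIMARY KEY REFERENCES users(id),
                  forename TEXT,
                  surname  TEXT)""")
db.execute("INSERT INTO user_names VALUES (1, 'Ada', 'Lovelace')")

# New code reads the split fields, falling back to the legacy column
# for rows that haven't been populated in the side table yet.
row = db.execute("""SELECT COALESCE(n.forename, u.name), n.surname
                    FROM users u
                    LEFT JOIN user_names n ON n.user_id = u.id
                    WHERE u.id = 1""").fetchone()
```

The cost is visible even here: every reader of the name now needs a join and a fallback, which is exactly the accumulating complexity described above.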

  • Decouple API from implementation

    This is your "database abstraction layer," although the current buzzword is "micro-services". If you have a single service responsible for one area of your application, then you can evolve the data that supports the service while keeping the API constant.

To use your example of changing a user's name: version 1 has a single name field, version 2 has two fields. However, you could implement the version 1 API by combining the fields from the version 2 database, or you could implement the version 2 API by splitting the field from a version 1 database.

    More important, you can implement the service in such a way that it migrates data to a new table on access. This largely resolves the "exploding tables" problem of the former point. However, you'll need to migrate data that isn't currently being accessed. And you'll need to evolve the service implementation to eventually ignore the old table.

    Note also that "service" does not imply "web-service". It could be a library within your application.
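A rough in-memory sketch of the idea, using the name example (the class and method names are hypothetical, and the dict stands in for the database): both API versions stay stable while the storage layout migrates underneath them, record by record, on access.

```python
class UserService:
    """Single owner of user data: the API stays stable while storage evolves."""

    def __init__(self):
        # Mixed storage during the transition: user 1 is already on the
        # version 2 layout; user 2 still has a version 1 single-name record.
        self._users = {
            1: {"forename": "Ada", "surname": "Lovelace"},
            2: {"name": "Grace Hopper"},
        }

    def get_name_v1(self, user_id):
        # Version 1 API answered from version 2 data by joining the fields.
        rec = self.get_name_v2(user_id)
        return f"{rec['forename']} {rec['surname']}".strip()

    def get_name_v2(self, user_id):
        rec = self._users[user_id]
        if "name" in rec:
            # Migrate-on-access: split the legacy field the first time the
            # record is touched, so the old layout drains away over time.
            rec = self._split_legacy(rec["name"])
            self._users[user_id] = rec
        return dict(rec)

    @staticmethod
    def _split_legacy(name):
        # Naive split on the last space; real name handling is messier.
        forename, _, surname = name.rpartition(" ")
        return {"forename": forename, "surname": surname}
```

As noted above, migrate-on-access only handles data that gets touched; a background sweep still has to migrate the cold records before you can drop the old layout.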

  • Push retry logic into the client

If you implement a service-based architecture that's external to the application (i.e., web-services rather than a service library), you can start to think about making your application more robust during failure. Something as simple as a retry could smooth over an implementation change, although of course the API couldn't change without a client redeploy.
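A retry wrapper along these lines (a sketch, not a complete client: the decorator name and which exceptions count as transient are assumptions) can paper over the few seconds where a service instance is being swapped out:

```python
import functools
import time


def with_retry(attempts=3, delay=0.1):
    """Retry a flaky service call, e.g. during a brief redeploy window."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except ConnectionError:
                    if attempt == attempts - 1:
                        raise  # give up after the last attempt
                    time.sleep(delay * 2 ** attempt)  # exponential backoff
        return wrapper
    return decorate


calls = {"n": 0}

@with_retry(attempts=3, delay=0)
def flaky_service_call():
    # Simulates a service that is unreachable for the first two attempts.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("deploy in progress")
    return "ok"
```

Retries are only safe for idempotent calls; for anything that mutates state you'd want the messaging approach in the next point instead.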

  • Decouple update and retrieval

    In this approach, changes are implemented using asynchronous messaging -- the Command pattern. On the client side, you might cache a view of the world with those changes applied, or you might design your application to be eventually consistent: you make a change, and accept that it may not be immediately applied.

    This gives you a tremendous amount of flexibility to maintain parallel service implementations: each service maintains its own data and applies the commands in its own way. When you're ready to use the new service, the data will already be there.

    However, the idea of eventual consistency is a hard sell to business people, who are under the illusion that computer programs are deterministic.
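A toy version of the command approach, again using the name change (the `RenameUser` command, the deque standing in for a message broker, and both stores are illustrative assumptions): the client submits an intent and returns immediately; each parallel service implementation later applies the command to its own data layout.

```python
from collections import deque


class RenameUser:
    """A command: an intent recorded now, applied later."""
    def __init__(self, user_id, forename, surname):
        self.user_id = user_id
        self.forename = forename
        self.surname = surname


queue = deque()  # stand-in for a durable message broker
old_store = {1: {"name": "Ada L."}}                     # v1 service's data
new_store = {1: {"forename": "Ada", "surname": "L."}}   # v2 service's data


def submit(cmd):
    # The client returns immediately: the system is eventually consistent.
    queue.append(cmd)


def drain():
    # Each parallel implementation applies the command its own way.
    while queue:
        cmd = queue.popleft()
        old_store[cmd.user_id]["name"] = f"{cmd.forename} {cmd.surname}"
        new_store[cmd.user_id] = {"forename": cmd.forename,
                                  "surname": cmd.surname}


submit(RenameUser(1, "Ada", "Lovelace"))
# Until drain() runs, reads may still observe the old value.
drain()
```

Since both stores consume the same command stream, the new service's data is fully populated before any traffic is switched to it.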

One final comment: other than the first, all of these techniques are more useful when you have a mature application. If you're constantly making dramatic changes to the underlying DB, then you're probably better off shutting down the applications to do so. You can still pick a quiet time to do so, and prepopulate all of the data.

Licensed under: CC-BY-SA with attribution