Question

I have master-master (circular) replication configured between two MySQL servers; call them West and East. There is also a web app with its own West and East servers: the West web app talks to the West database, and the East web app talks to the East database. This all works fine during normal operations, but problems arise when it comes time to do quarterly deployments.

These deployments almost always include database content updates, and frequently database structure changes as well (e.g., adding a table or adding columns to a table). The web app is updated at the same time so that it stays in sync with the database structure.

So far, we have not been able to complete a deployment without breaking replication, which leads to a very messy situation and potential loss of data. We planned our deployments as follows:

  1. Shut down the West side web app and direct all traffic to the East side
  2. Stop the East side slave
  3. Deploy the West side web app + DB updates
  4. Turn the West side web app back on, shut down the East web app, and direct all traffic to the updated West side
  5. Start the East side slave
  6. Deploy the East side web app + DB updates
  7. Return to nominal

The issue seems to be around step 3. While users are working in the web app on the East side (thus modifying the East DB) and the deployment is modifying the West DB, replication often breaks, requiring manual intervention: copying data, changing the master log file and position pointers, etc. Is there a better way to update each master in isolation without risking replication getting out of sync? I fear our setup is not correct and/or not sustainable.

Important considerations:

  • There can be no downtime for the web app
  • The updates the users may be making while directed to one side must be preserved when both sides go back online

Also, please note that I am not a DBA, so I apologize if I have left out any important configuration details. I will update the question as soon as possible if needed.


Solution

At my company we run MySQL with circular replication. We do not stop replication during a deploy of either the app or schema changes.

We must restart MySQL Server for upgrades and some configuration changes; replication naturally stops while the server is down, but we bring it back up immediately.

We use pt-online-schema-change to run schema changes without the app losing access to the database. Apps can still run read and write queries while the schema change is running. It's not perfect: it requires a brief lock at the beginning and end of the task, and the table can't already have triggers of its own. It also adds significant query load to the database server while it restructures a table, which may be enough to slow down the app (depending on how sensitive the app is to load). But mostly it allows the app to keep working without any downtime.
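The core idea behind pt-online-schema-change can be sketched in a few dozen lines. The toy below is not the tool's code: it uses Python's built-in sqlite3 module instead of MySQL, and the function name `shadow_table_alter`, the `_users_new`/`_users_old` table names and the `on_chunk` hook are illustrative assumptions. It shows the general shadow-table technique: create an empty, already-altered copy of the table, add triggers so the app's writes to the original are mirrored into the copy, copy the existing rows in small chunks, then swap the table names. Because the technique depends on adding its own triggers, it also hints at why tables that already have triggers are a problem.

```python
import sqlite3


def shadow_table_alter(conn, table, new_columns_ddl, cols, chunk_size=2, on_chunk=None):
    """Toy shadow-table schema change (an illustration, NOT pt-osc's code).

    new_columns_ddl: column definitions for the altered copy.
    cols: columns shared by old and new table; cols[0] must be the integer primary key.
    on_chunk: optional callback(conn, i) that simulates app writes between chunks.
    """
    new, old, pk = f"_{table}_new", f"_{table}_old", cols[0]
    col_list = ", ".join(cols)
    new_vals = ", ".join(f"NEW.{c}" for c in cols)

    # 1. Create an empty copy of the table that already has the new structure.
    conn.execute(f"CREATE TABLE {new} ({new_columns_ddl})")

    # 2. Triggers mirror app writes on the original table into the copy.
    conn.executescript(f"""
        CREATE TRIGGER {table}_osc_ins AFTER INSERT ON {table} BEGIN
            INSERT OR REPLACE INTO {new} ({col_list}) VALUES ({new_vals});
        END;
        CREATE TRIGGER {table}_osc_upd AFTER UPDATE ON {table} BEGIN
            INSERT OR REPLACE INTO {new} ({col_list}) VALUES ({new_vals});
        END;
        CREATE TRIGGER {table}_osc_del AFTER DELETE ON {table} BEGIN
            DELETE FROM {new} WHERE {pk} = OLD.{pk};
        END;
    """)

    # 3. Copy existing rows in primary-key chunks. INSERT OR IGNORE skips rows the
    #    triggers already wrote, so newer data from the app is not overwritten.
    last_pk, i = None, 0
    while True:
        where, params = ("", ()) if last_pk is None else (f"WHERE {pk} > ?", (last_pk,))
        keys = conn.execute(
            f"SELECT {pk} FROM {table} {where} ORDER BY {pk} LIMIT {chunk_size}", params
        ).fetchall()
        if not keys:
            break
        lo, hi = keys[0][0], keys[-1][0]
        conn.execute(
            f"INSERT OR IGNORE INTO {new} ({col_list}) "
            f"SELECT {col_list} FROM {table} WHERE {pk} BETWEEN ? AND ?",
            (lo, hi),
        )
        conn.commit()
        if on_chunk:
            on_chunk(conn, i)
        last_pk, i = hi, i + 1

    # 4. Swap the tables in one transaction, then clean up.
    conn.executescript(f"""
        BEGIN;
        DROP TRIGGER {table}_osc_ins;
        DROP TRIGGER {table}_osc_upd;
        DROP TRIGGER {table}_osc_del;
        ALTER TABLE {table} RENAME TO {old};
        ALTER TABLE {new} RENAME TO {table};
        DROP TABLE {old};
        COMMIT;
    """)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(1, "ann"), (2, "bob"), (3, "cy")])
conn.commit()


def app_traffic(c, i):
    # Simulated app writes that arrive after the first chunk has been copied.
    if i == 0:
        c.execute("UPDATE users SET name = 'ANN' WHERE id = 1")
        c.execute("INSERT INTO users (id, name) VALUES (4, 'dee')")
        c.execute("DELETE FROM users WHERE id = 3")


shadow_table_alter(conn, "users", "id INTEGER PRIMARY KEY, name TEXT, email TEXT",
                   ["id", "name"], on_chunk=app_traffic)
print(conn.execute("SELECT id, name, email FROM users ORDER BY id").fetchall())
# [(1, 'ANN', None), (2, 'bob', None), (4, 'dee', None)]
```

In MySQL the swap would be a single atomic `RENAME TABLE` statement; the sketch approximates it with two renames inside one SQLite transaction. The brief locks at the start and end of the real tool's run correspond roughly to steps 2 and 4 here.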

For app deployments, we run multiple app servers in each datacenter. The deployment stops and restarts the app instances in a "rolling" manner, so a minimum number of them is always running during the deploy. There's no need to swing traffic to the opposite datacenter.
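The rolling rule can be expressed as a small batching function: never take down more than `len(servers) - min_available` instances at once. This is a hypothetical sketch (the function name and host names are made up); a real deploy tool would also drain each instance from the load balancer and wait for a health check before moving on to the next batch.

```python
def rolling_batches(servers, min_available):
    """Group app servers into upgrade batches so that at least
    `min_available` of them keep serving traffic at every moment."""
    if not 1 <= min_available < len(servers):
        raise ValueError("need 1 <= min_available < number of servers")
    size = len(servers) - min_available  # most instances allowed down at once
    return [servers[i:i + size] for i in range(0, len(servers), size)]


# Hypothetical West datacenter with five app instances.
west = ["west-app1", "west-app2", "west-app3", "west-app4", "west-app5"]
for batch in rolling_batches(west, min_available=3):
    print(f"upgrading {batch}; {len(west) - len(batch)} instances still serving")
```

Larger batches finish the deploy faster; a higher `min_available` keeps more capacity online at the cost of more rounds.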

Because app deployments and schema deployments are asynchronous and hard to run at exactly the same moment, the code and schema changes must be designed carefully so that both the old and new code versions work with both the old and new schema versions. For example:

  • No code can use SELECT * ... or INSERT INTO mytable VALUES(...), because if a schema change adds or drops a column, that code breaks.
  • Some schema changes must be deployed in stages; for example, add a new column and finish that deployment before any code references the new column.
  • Renaming columns or tables is forbidden, because it causes a chicken-and-egg problem: the old code needs the old name, while the new code needs the new one.
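The first rule is easy to demonstrate. This sketch uses Python's built-in sqlite3 module as a stand-in for MySQL (the table and helper names are made up); the behavior it shows, positional INSERTs and `SELECT *` breaking once a column is added while explicit column lists keep working, applies to MySQL as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT)")


# "Old" code written against the original two-column schema.
def insert_positional(c, row_id, name):
    c.execute("INSERT INTO mytable VALUES (?, ?)", (row_id, name))  # fragile


def insert_explicit(c, row_id, name):
    c.execute("INSERT INTO mytable (id, name) VALUES (?, ?)", (row_id, name))


def load_star(c, row_id):
    row_id, name = c.execute("SELECT * FROM mytable WHERE id = ?", (row_id,)).fetchone()
    return row_id, name  # fragile: assumes exactly two columns


def load_explicit(c, row_id):
    return c.execute("SELECT id, name FROM mytable WHERE id = ?", (row_id,)).fetchone()


# A schema deployment adds a column while the old code is still running.
conn.execute("ALTER TABLE mytable ADD COLUMN email TEXT")

insert_explicit(conn, 1, "ann")  # still works; email defaults to NULL
print(load_explicit(conn, 1))    # (1, 'ann')

for broken in (lambda: insert_positional(conn, 2, "bob"), lambda: load_star(conn, 1)):
    try:
        broken()
    except (sqlite3.OperationalError, ValueError) as exc:
        print("old-style query broke:", exc)
```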

With redundant app servers, online schema change tools, and a few modest restrictions on code style, almost all deployments can be done without downtime.

OTHER TIPS

MySQL 8.0 can perform many (not all) ALTERs "instantly" (ALGORITHM=INSTANT), changing only the table's metadata instead of rebuilding it.

I like to deploy things in 3 steps. (This is very close to Bill's.)

  1. Modify the app to handle both old and new schemas, dynamically checking which one is present (if necessary).

  2. Update the database. Possibly some downtime here.

  3. Clean up the code (remove the handling of the "old" interface).

Steps 1 and 3 need the "rolling" deployment, but that is not a problem.
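Step 1 might look like the following sketch, again using Python's sqlite3 as a stand-in for MySQL. The `users` table, the `email` column and both helper names are hypothetical. On MySQL the existence check would instead query `information_schema.COLUMNS`; in production you would probably cache the result rather than check on every write.

```python
import sqlite3


def has_column(conn, table, column):
    """True if `column` currently exists in `table` (SQLite PRAGMA version)."""
    return any(row[1] == column for row in conn.execute(f"PRAGMA table_info({table})"))


def save_user(conn, user_id, name, email=None):
    """Step-1 code: works both before and after the `email` column is added."""
    if has_column(conn, "users", "email"):
        conn.execute("INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
                     (user_id, name, email))
    else:
        # Old schema: there is nowhere to store the email yet, so it is not written.
        conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
save_user(conn, 1, "ann", "ann@example.com")            # before step 2
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")  # step 2: schema change
save_user(conn, 2, "bob", "bob@example.com")            # after step 2
print(conn.execute("SELECT id, name, email FROM users ORDER BY id").fetchall())
# [(1, 'ann', None), (2, 'bob', 'bob@example.com')]
```

Step 3 would then delete the `has_column` branch and keep only the new-schema INSERT.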

In addition to pt-osc (which Bill mentioned), there is gh-ost. (I don't know whether it works with circular replication.)

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange