How can coupled databases be kept in sync with selective uni-directional syncing and with connections that can be off for days?

https://dba.stackexchange.com/questions/223454

16-01-2021

Question

I am trying to create a database topology that looks like this:

                  .------------.        Bad Connection.
Bad Connection.   | Central DB |        May go offline 
May go offline    '------------'        for days.      
for days.               |  |             .
             \          |  |            /
              ' .--------  ------------'
                |                      |
                |                      |
         .-------------.        .-------------.
         | Remote DB1  |        | Remote DB2  |
         '-------------'        '-------------'

Some of the tables are exclusively pushed from Central down to the Remotes

        .-------------.
        | Central DB  |
        |-------------|
        | Master Data |
        | Table Foo   |
        '-------------'
              |  |
              V  V
      .--------  ------------.
      |                      |
      V                      V
.---------------.     .---------------.
| Remote DB1    |     | Remote DB2    |
|---------------|     |---------------|
| Unmodified    |     | Unmodified    |
| Slave Copy of |     | Slave Copy of |
| Table Foo     |     | Table Foo     |
'---------------'     '---------------'

Some of the tables are exclusively pushed from the Remotes up to Central, and the rows that each individual Remote mutates are mutated exclusively by that Remote node.

        .---------------.
        | Central DB    |
        |---------------|
        | Slave         | 
        | Table Bar     |
        |---------------|
        | Row a from DB2|
        | Row b from DB2|
        | Row c from DB1|
        | Row d from DB1|
        | Row e from DB1|
        | Row f from DB2|
        '---------------'
              |  |
              ^  ^
      .--------  ------------.
      |                      |
      ^                      ^
.------------.        .------------.
| Remote DB1 |        | Remote DB2 |
|------------|        |------------|
| Table Bar  |        | Table Bar  |
|------------|        |------------|
| Row c      |        | Row a      |
| Row d      |        | Row b      |
| Row e      |        | Row f      |
'------------'        '------------'

Remote DB1 should not get data generated on Remote DB2. They only sync up to Central and Central never pushes down other Remotes' Bar table rows.

Lastly, when connectivity is lost to Remote DB1 or Remote DB2 or Central or any combination, each database should continue to operate locally and when connection is restored, the updates to each table or sub-section of table that they own should be pushed in the appropriate direction.

So, given this topology, and given that one or more remote databases may occasionally lose their internet connection completely while each DB should merrily continue without syncing until the connection is restored, is there a cluster configuration in MariaDB, MySQL, or PostgreSQL that enables such a topology? I understand that the Remotes are doing something akin to "horizontal sharding", except that I don't want them to get other Remotes' data.

The "Master Data" tables that are only mutated by Central clearly look like a classic Master-Slave relationship with the Remotes, except that the Remotes should stay online when the master Central goes away.

With regard to the horizontally sharded Remotes table Bar, I can imagine assigning PK ID ranges that those Remotes exclusively mutate.

Is there a way to do this topology with clustering technology or do I need to manually roll something with incremental database dumps like this technology in MySQL?

Solution

The requirement that the setup has to cope with several days of broken network connections effectively rules out "virtually" synchronous multi-master solutions like Galera and MySQL Group Replication, and you can also forget about MySQL NDB Cluster.

However, I think an asynchronous MariaDB or MySQL solution could do what you need, assuming the binlogs are large enough to hold the data accumulated while the network connections are down.
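To get a feel for "large enough", here is a rough back-of-the-envelope sketch in Python. The write rate, outage length and safety factor are made-up example inputs, not measurements; plug in your own numbers. Retention itself is controlled by expire_logs_days or binlog_expire_logs_seconds, depending on server and version.

```python
import math


def binlog_budget(write_mb_per_hour, outage_days, safety_factor=2.0):
    """Estimate binlog retention and disk space needed to survive an outage.

    All inputs are assumptions you have to measure or guess yourself:
    - write_mb_per_hour: average binlog volume produced per hour
    - outage_days: longest outage you want to ride out
    - safety_factor: head room on top of the expected outage
    """
    # Keep binlogs for longer than the worst expected outage.
    retention_days = math.ceil(outage_days * safety_factor)
    # Disk space consumed by binlogs over the whole retention window.
    size_gib = write_mb_per_hour * 24 * retention_days / 1024
    return size_gib, retention_days


# Hypothetical example: 50 MB of binlog per hour, outages of up to 5 days.
size_gib, retention_days = binlog_budget(write_mb_per_hour=50, outage_days=5)
print(f"keep {retention_days} days of binlogs, about {size_gib:.1f} GiB")
```

If a slave stays offline longer than the retention window, the binlogs it still needs will already have been purged and it has to be re-seeded from a fresh dump, so err on the generous side.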

Asynchronous MariaDB / MySQL master-master replication looks like a fit. Master-master replication is basically two instances of master-slave replication, where each server is the master in one of the replication connections and the slave in the other. With two servers it's common to have auto_increment_increment = 2 on both, auto_increment_offset = 1 on the first server and auto_increment_offset = 2 on the other. This way, neither server causes collisions when INSERT statements rely on auto_increment primary keys. In your case three servers write, so use auto_increment_increment = 3 (or larger, to leave room for more Remotes) with a distinct auto_increment_offset on each server; otherwise the two Remotes could generate the same IDs for table bar. This will also eliminate any need for sharding in your use case.
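The effect of these two settings can be illustrated with a small Python simulation. It is only a sketch of the arithmetic (each server hands out offset, offset + increment, offset + 2 × increment, ...), not server code, and the server names are just labels. It assumes three writing servers, so the increment is 3 rather than the 2 used for a plain two-server pair.

```python
from itertools import islice


def auto_increment_ids(offset, increment):
    """Yield the IDs a server with these settings would assign on an empty table."""
    value = offset
    while True:
        yield value
        value += increment


# One distinct offset per writing server (labels are illustrative).
servers = {"central": 1, "remote_db1": 2, "remote_db2": 3}
increment = len(servers)  # must be at least the number of writing servers

ids = {
    name: list(islice(auto_increment_ids(offset, increment), 5))
    for name, offset in servers.items()
}

for name, values in ids.items():
    print(name, values)

# No ID is ever produced by two servers.
all_ids = [v for values in ids.values() for v in values]
assert len(all_ids) == len(set(all_ids))
```

If both Remotes were given the same offset (as could happen when each is configured as "the other server" of its own pair with Central), their rows for table bar would collide once they reach Central.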

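So you will need to set up master-master replication between your Central DB and Remote DB1, and separate master-master replication between your Central DB and Remote DB2. Because Central DB is then a slave of two masters, it needs multi-source replication, which is available in MySQL 5.7+ and MariaDB 10.0+.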

You will further need to set replicate-do-table=foo on Remote DB1 and Remote DB2 so that only that table is replicated to them from Central DB. Similarly, set replicate-do-table=bar on Central DB so that only that table is replicated to it from Remote DB1 and Remote DB2. (The option takes the table name qualified with its database, in the form db_name.tbl_name.)
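To make it clearer which filter goes on which server, here is a small Python sketch that prints an option-file fragment per node. The schema name appdb, the server IDs and the binlog base name are placeholders I made up; the option names (server-id, log-bin, auto_increment_increment, auto_increment_offset, replicate-do-table) are the standard ones. Treat the output as a starting point to check against the documentation for your version, not a complete configuration.

```python
DATABASE = "appdb"  # hypothetical schema name; replace with your own

# Which table each node should accept from its master(s).
# Server IDs are arbitrary but must be unique across the topology.
NODES = {
    "central": {"server_id": 1, "replicate_table": "bar"},
    "remote_db1": {"server_id": 2, "replicate_table": "foo"},
    "remote_db2": {"server_id": 3, "replicate_table": "foo"},
}


def render_cnf(name):
    """Return an option-file fragment for the given node as a string."""
    node = NODES[name]
    lines = [
        f"# {name}",
        "[mysqld]",
        f"server-id = {node['server_id']}",
        "log-bin = mysql-bin",
        # One ID slot per writing server, offset reused from the server ID here.
        f"auto_increment_increment = {len(NODES)}",
        f"auto_increment_offset = {node['server_id']}",
        # Filter applied on the receiving side: only this table is replicated in.
        f"replicate-do-table = {DATABASE}.{node['replicate_table']}",
    ]
    return "\n".join(lines)


for name in NODES:
    print(render_cnf(name), end="\n\n")
```

Note that the filter is applied by the receiving server, so the other table's changes still travel over the wire and are simply discarded on arrival.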

There are many more details, such as GTID (a feature that was not available initially, but is a good idea now). For those I suggest you look at the documentation available elsewhere, e.g.:

Simple Master-Master replication on MariaDB (tunnelix.com)

How to Setup MySQL Master-Master Replication (howtoforge.com)

Licensed under: CC-BY-SA with attribution
