Shipping big WAL archives to hot standby

https://dba.stackexchange.com/questions/134807

01-10-2020

Question

I'm looking to extend my PostgreSQL 9.4 master with a few read-only hot standby slaves for load balancing.

The idea would be to update the slaves only at defined times (once every 24/48 hours) to avoid migration issues with the application server code and also because the "freshness" of the slaves is not so important.

What would be the effect of suddenly introducing 1-2 GB of WAL archives to the WAL restore folder on the slave? Would there be a big performance effect on the incoming queries to the slave? Would the slave be available for queries while the WAL logs are restored into the DB?

Alternatively, I've been thinking about copying the whole data folder over from the master with rsync. The negatives I see here are that the required disk space on the slave is 2 x the data folder (extra non-running copy needed to do rsync on), and that the DB server on the slaves has to be restarted to switch to the new data folder.

Solution

First, in case there's an implicit misconception here: note that the restore is not atomic. Queries on the replica will see the intermediate states between the prior and new master state. Replay will go through them faster than the master originally did, assuming the replica replays reasonably fast, but clients will still see them and there's no way around that.

If you're trying to give clients an apparently atomic switch between old and new states, you can't do it just by copying a bunch of WAL over.

What would be the effect of suddenly introducing 1-2 GB of WAL archives to the WAL restore folder on the slave?

Minimal. The only significant effect will be the disk I/O of writing it as you copy it.

Would there be a big performance effect on the incoming queries to the slave?

Mild to moderate, probably mild. It will have more impact than streaming the WAL continuously would, because the replica will be running restore flat-out, so it will churn the disk, the kernel buffer cache and shared_buffers more.

It's hard to say in detail; it depends on workload, RAM, disk subsystem, etc.

Would the slave be available for queries while the WAL logs are restored into the DB?

Yes.
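For context, a 9.4 standby reads archived WAL from a restore folder through recovery.conf in its data directory, and stays queryable during replay when hot_standby = on is set in postgresql.conf. Here is a minimal sketch in Python that renders such a file. The helper name render_recovery_conf and the archive path are made up for illustration, and a plain cp is only one possible restore_command.

```python
def render_recovery_conf(archive_dir: str) -> str:
    """Return recovery.conf text for a 9.4 standby that restores WAL from archive_dir.

    Hypothetical helper for illustration; not part of PostgreSQL.
    """
    # %f is the WAL file name PostgreSQL asks for,
    # %p is the path it wants that file copied to.
    restore_command = f'cp {archive_dir}/%f "%p"'
    lines = [
        "standby_mode = 'on'",  # keep polling for new WAL instead of ending recovery
        f"restore_command = '{restore_command}'",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed location of the WAL restore folder on the slave.
    print(render_recovery_conf("/var/lib/postgresql/wal_restore"), end="")
```

With standby_mode on, the server keeps retrying restore_command, so WAL files dropped into the folder later are replayed as they appear.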

Another alternative you may not have considered. It still needs 2x the disk, but:

1. Run pgbouncer in front of the replica.
2. pg_basebackup the replica locally to a temporary datadir. Yes, you can take a pg_basebackup from a replica.
3. Copy the WAL to the copied replica.
4. Start it and wait for it to catch up.
5. When it's caught up, change pgbouncer's configuration to point to the new DB and send pgbouncer a SIGHUP so it reloads its configuration.
6. Shut down the old replica.

... and repeat, switching between the two datadirs.
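The steps above could be scripted roughly as follows. This is only a sketch under assumptions, not a tested procedure: every path, port and the pgbouncer PID are placeholders, and build_swap_plan, lsn_to_int and replay_lag_bytes are hypothetical helpers. It assumes the running replica accepts replication connections for pg_basebackup, and that the new datadir has a recovery.conf pointing at its WAL restore folder. The code only builds the command lines; it does not run them. "Caught up" is judged by comparing the replay position against a target LSN you pick yourself, for example the master's pg_current_xlog_location() at the time the WAL batch was cut.

```python
import shlex
from typing import List


def lsn_to_int(lsn: str) -> int:
    """Convert an LSN such as '0/3000000' into an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replay_lag_bytes(target_lsn: str, replayed_lsn: str) -> int:
    """Bytes of WAL still to replay before target_lsn is reached (0 if already past it)."""
    return max(0, lsn_to_int(target_lsn) - lsn_to_int(replayed_lsn))


def build_swap_plan(old_datadir: str, new_datadir: str, old_port: int, new_port: int,
                    wal_source: str, wal_restore_dir: str,
                    pgbouncer_pid: int) -> List[List[str]]:
    """Return the commands for one swap, in order. All arguments are placeholders."""
    return [
        # Base backup of the running replica into the temporary datadir.
        ["pg_basebackup", "-h", "localhost", "-p", str(old_port),
         "-D", new_datadir, "-X", "stream"],
        # Copy the new WAL batch to where the copy's restore_command looks.
        ["rsync", "-a", f"{wal_source}/", f"{wal_restore_dir}/"],
        # Start the copy on its own port.
        ["pg_ctl", "-D", new_datadir, "-o", f"-p {new_port}", "-w", "start"],
        # Poll this until replay_lag_bytes(target, result) is 0 (9.4 function name).
        ["psql", "-p", str(new_port), "-d", "postgres", "-At",
         "-c", "SELECT pg_last_xlog_replay_location()"],
        # After editing pgbouncer's configuration to point at new_port: reload it.
        ["kill", "-HUP", str(pgbouncer_pid)],
        # Retire the old replica.
        ["pg_ctl", "-D", old_datadir, "-m", "fast", "stop"],
    ]


if __name__ == "__main__":
    plan = build_swap_plan("/srv/pg/a", "/srv/pg/b", 5432, 5433,
                           "/srv/wal_incoming", "/srv/pg/b_wal_restore", 12345)
    for cmd in plan:
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

On the next cycle, swap the roles of the two datadirs and ports. Note that pg_last_xlog_replay_location() is the 9.4 name; PostgreSQL 10 and later call it pg_last_wal_replay_lsn().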

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange