The pg_wal directory on a standby server does not get auto-purged

https://dba.stackexchange.com/questions/222320

15-01-2021

Question

I have a classic master-slave PostgreSQL (version 10) architecture. Even though the parameter wal_keep_segments is set to 200, the pg_wal directory on the standby server is not purged and keeps filling up. Do you have any ideas? For information, I don't have this issue on the primary server.

Both primary and standby have the same configuration.

On the master:

select * from pg_stat_replication ;
-[ RECORD 1 ]----+----------------------------------------------
pid              | 5399
usesysid         | 16387
usename          | replication
application_name | walreceiver
client_addr      | XXXXXXXXXXXX
client_hostname  | XXXXXXXXXXXX
client_port      | 56780
backend_start    | 2018-11-05 10:18:50.280663+00
backend_xmin     |
state            | streaming
sent_lsn         | 71/E3000000
write_lsn        | 71/E3000000
flush_lsn        | 71/E3000000
replay_lsn       | 71/E3000000
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async
-[ RECORD 2 ]----+----------------------------------------------
pid              | 10175
usesysid         | 16389
usename          | barman_replication
application_name | barman_receive_wal
client_addr      | XXXXXXXXXXXX
client_hostname  | XXXXXXXXXXXX
client_port      | 42572
backend_start    | 2018-11-12 03:09:03.715933+00
backend_xmin     |
state            | streaming
sent_lsn         | 71/E3000000
write_lsn        | 71/E3000000
flush_lsn        | 71/E3000000
replay_lsn       |
write_lag        | 00:00:02.516016
flush_lag        | 00:00:02.516016
replay_lag       | 06:28:11.482478
sync_priority    | 0
sync_state       | async

On the standby:

select * from pg_replication_slots ;
-[ RECORD 1 ]-------+------------
slot_name           | barman
plugin              |
slot_type           | physical
datoid              |
database            |
temporary           | f
active              | f
active_pid          |
xmin                |
catalog_xmin        |
restart_lsn         | 16/9F000000
confirmed_flush_lsn |

Solution

The replication slot should exist only on the master, not on the standby, unless you are using cascading replication, which you don't seem to be.

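If a replication slot exists on the standby but nothing connects to it to read from it and advance it, the standby keeps every WAL segment from the slot's restart_lsn onward, regardless of wal_keep_segments. Your slot is inactive (active = f) with restart_lsn stuck at 16/9F000000, which explains the retention.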
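As a rough sketch of how to confirm and fix this on the standby, assuming PostgreSQL 10 function names and the slot name barman from the output above: the first query shows how much WAL each slot is holding back, the second reproduces the gap from the LSNs posted in the question, and the last drops the stale slot. Verify that nothing is supposed to stream from the standby before dropping anything.

```sql
-- Run on the standby.
-- How much WAL is each slot holding back, relative to the replay position?
SELECT slot_name,
       slot_type,
       active,
       restart_lsn,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_last_wal_replay_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots;

-- Same arithmetic with the values from the question:
-- replay_lsn 71/E3000000 minus restart_lsn 16/9F000000
-- is 391982874624 bytes, i.e. 23364 segments of 16 MB.
SELECT pg_wal_lsn_diff('71/E3000000', '16/9F000000') AS retained_bytes;

-- Drop the stale slot (this fails if the slot is currently active).
SELECT pg_drop_replication_slot('barman');
```

The old segments are not removed the moment the slot is dropped. The standby recycles them at its next restartpoint, so pg_wal should shrink shortly afterwards rather than instantly.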

See https://www.postgresql.org/docs/9.6/continuous-archiving.html#BACKUP-BASE-BACKUP:

It is often a good idea to also omit from the backup the files within the cluster's pg_replslot/ directory, so that replication slots that exist on the master do not become part of the backup. Otherwise, the subsequent use of the backup to create a standby may result in indefinite retention of WAL files on the standby

Licensed under: CC-BY-SA with attribution
