Too many WAL files in archive folder PostgreSQL Streaming Replication

https://dba.stackexchange.com/questions/186103

09-10-2020
|

Question

I have two Postgresql 9.5 servers configured in streaming replication. There are no errors in log files and the servers seem to be in sync.

MASTER postgresl.config

archive_command = 'test ! -f mnt/server/archivedir/%f && cp %p mnt/server/archivedir/%f'
wal_keep_segments = 32

Everything else is pretty much default. And yet I have 1600+ files in the archive directory! Because of space constraints I have dared to delete some of them. I would like to figure out what is causing this. I also see that there are only very old files on the slave for the same directory.

The only thing I can think of is that I recently did a restore on a database. I dropped the database, created it and did a backup restore. I did this several times. The slave seems to have caught up with the changes. But I have a ton of WAL files in the archive folder for each of the backup restores. Would appreciate some insight into what could be happening.

Solution

wal_keep_segments tells it how many files to keep in the pg_xlog directory, not the archive directory.

PostgreSQL doesn't manage the archive directory for you, that is up to you. It doesn't even know where (or what) the archive location is, that is why it asks you for an archive_command, rather than just a directory. As far as PostgreSQL is concerned, your archive command could pass the files through od, send the output to a line printer with acid-free paper, then put that in a 3 ring binder and store them in your attic. Although more common would be encrypt them and upload them to Amazon's S3, say.

It is kind of weird to use both wal_keep_segments and archive_command together, at least for relatively simple setups. If you just want a streaming replica and don't need the ability to do point-in-time recovery, you could turn off archiving altogether.

If you really want to keep using an archive but have it cleaned up when the files are no longer needed, you can use the archive_cleanup_command setting. But now that we have streaming replication and replication slots, that is almost obsolete.

OTHER TIPS

As I said in this post https://stackoverflow.com/a/63401901/10043142, PostgreSQL Wiki (https://wiki.postgresql.org/wiki/Streaming_Replication) says something important about this in the following post, step 5:

# To enable read-only queries on a standby server, wal_level must be set to
# "hot_standby". But you can choose "archive" if you never connect to the
# server in standby mode.
wal_level = hot_standby

# Set the maximum number of concurrent connections from the standby servers.
max_wal_senders = 5

# To prevent the primary server from removing the WAL segments required for
# the standby server before shipping them, set the minimum number of segments
# retained in the pg_xlog directory. At least wal_keep_segments should be
# larger than the number of segments generated between the beginning of
# online-backup and the startup of streaming replication. If you enable WAL
# archiving to an archive directory accessible from the standby, this may
# not be necessary.
wal_keep_segments = 32

# Enable WAL archiving on the primary to an archive directory accessible from
# the standby. If wal_keep_segments is a high enough number to retain the WAL
# segments required for the standby server, this is not necessary.
archive_mode    = on
archive_command = 'cp %p /path_to/archive/%f'

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange