Resuming server replication on PostgreSQL servers

https://dba.stackexchange.com/questions/239762

05-02-2021

Question

I have one main server and two other servers replicating from it.

The disk on the main server filled up (I have an archive_command that copies WAL archives into the /wal_archive/ directory).

Once the disk was full, the PostgreSQL log started showing an "archive command failed" error:

    2019-06-04 09:52:49.079 EEST [3365] LOG:  archive command failed with exit code 1
    2019-06-04 09:52:49.079 EEST [3365] DETAIL:  The failed archive command was: test ! -f /wal_archive/000000010000028C0000003B && cp pg_xlog/000000010000028C0000003B /wal_archive/000000010000028C0000003B

I cleaned up the /wal_archive/ directory with pg_archivecleanup, but PostgreSQL still throws the same error.

I tried reloading the PostgreSQL configuration, without changing the configuration itself and without restarting, using:

    /etc/init.d/postgresql reload

but nothing changed: PostgreSQL still logs the same error. How can I resume copying WAL archives to the /wal_archive/ directory?

Should I change archive_command to true, reload, and then change archive_command back to the original?

I'm trying to avoid restarting the server itself.

Solution

It seems like archiving managed to partly write the WAL archive file before the space ran out.

The test in your archive_command then notices that a file of that name already exists, so the command fails.

In this case the solution would be to manually remove that partially archived WAL segment so that the next attempt to archive it can succeed.
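As a rough sketch of that manual cleanup, the helper below deletes the archived copy only when it differs from the segment still sitting in the server's WAL directory. The function name `remove_partial_segment` is hypothetical, and it assumes the original segment has not yet been recycled.

```shell
# Hypothetical helper: remove_partial_segment <original-segment> <archived-copy>
# Deletes the archived copy only if it is not byte-identical to the original.
remove_partial_segment() {
    rps_src=$1
    rps_archived=$2

    # Without the original segment there is nothing to compare against,
    # so refuse to delete anything.
    if [ ! -f "$rps_src" ]; then
        echo "source segment $rps_src not found; leaving $rps_archived alone" >&2
        return 1
    fi

    if [ ! -f "$rps_archived" ]; then
        echo "nothing to remove: $rps_archived does not exist"
        return 0
    fi

    # cmp -s is silent and exits 0 only when both files are identical.
    if cmp -s "$rps_src" "$rps_archived"; then
        echo "keeping $rps_archived: identical to $rps_src"
        return 0
    fi

    rm -- "$rps_archived"
    echo "removed incomplete $rps_archived"
}
```

With the segment from the log above, this could be called from the data directory as `remove_partial_segment pg_xlog/000000010000028C0000003B /wal_archive/000000010000028C0000003B`. No reload or restart should be needed afterwards, since the failing command is simply attempted again.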

You might want to improve your archive_command by removing the file if cp fails (while still returning a non-zero return code).
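One way this could look is a small wrapper function that keeps the existing "do not overwrite" guard and cleans up after a failed copy. This is only a sketch: the name `archive_wal` and the idea of putting it in a wrapper script are assumptions, not part of the original setup.

```shell
# Hypothetical helper: archive_wal <path-to-segment> <destination-file>
archive_wal() {
    aw_src=$1
    aw_dest=$2

    # Same guard as the original command: never overwrite an existing archive file.
    if [ -f "$aw_dest" ]; then
        echo "archive_wal: $aw_dest already exists" >&2
        return 1
    fi

    # If the copy fails (for example because the disk is full), remove whatever
    # was written so the next attempt does not trip over a partial file,
    # and still report failure to the caller.
    if ! cp "$aw_src" "$aw_dest"; then
        rm -f -- "$aw_dest"
        return 1
    fi
    return 0
}
```

A wrapper script containing this function and ending with `archive_wal "$1" "$2"` could then be referenced from archive_command with `%p` and `/wal_archive/%f` as its two arguments. The non-zero return after the cleanup matters, because it is what tells PostgreSQL that the segment has not been archived yet.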

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange