Question

I had configured repmgr replication between node1 (primary) and node3 (standby), and the setup worked: new records and objects appeared on the standby as expected. After some weeks, however, I noticed that replication was no longer working, even though some repmgr commands still return results as if it were. I tried restarting the standby and registering it again, but that didn't work.

How can I continue to replicate?

Here's status of nodes:

-bash-4.2$ psql -V
psql (PostgreSQL) 10.3

NODE1 - PRIMARY

-bash-4.2$ repmgr node check
Node "node1":
    Server role: OK (node is primary)
    Replication lag: OK (N/A - node is primary)
    WAL archiving: OK (0 pending archive ready files)
    Downstream servers: OK (this node has no downstream nodes)
    Replication slots: OK (node has no replication slots)
-bash-4.2$

NODE3 - STANDBY

-bash-4.2$ repmgr -f /etc/repmgr/10/repmgr.conf node check 
Node "node3":
    Server role: OK (node is standby)
    Replication lag: OK (0 seconds)
    WAL archiving: OK (0 pending archive ready files)
    Downstream servers: CRITICAL (1 of 1 downstream nodes not attached; missing: node3 (ID: 3))
    Replication slots: OK (node has no replication slots)

-bash-4.2$ repmgr node status 
Node "node3":
    PostgreSQL version: 10.3
    Total data size: 2393 MB
    Conninfo: host=node3 user=repmgr dbname=repmgr connect_timeout=2
    Role: standby
    WAL archiving: disabled (on standbys "archive_mode" must be set to "always" to be effective)
    Archive command: /bin/true
    WALs pending archiving: 0 pending files
    Replication connections: 0 (of maximal 10)
    Replication slots: 0 (of maximal 10)
    Upstream node: node3 (ID: 3)
    Replication lag: 0 seconds
    Last received LSN: 4/AC000000
    Last replayed LSN: 4/AC000140

Solution

You should probably raise your WAL retention limits so that more WAL files are kept around on the primary. It is also a good idea to set them aside using archive_command, like this:

archive_command = 'test ! -f /postgres/archive/%f && cp -n %p /postgres/archive/%f'
wal_keep_segments = 256

Raise it high enough for your use case; 256 is just an example here, and the paths need adjusting to match your installation.
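To gauge how much disk space a given wal_keep_segments value can hold back on the primary, multiply it by the WAL segment size. A rough sketch, assuming the default 16 MB segment size of a stock PostgreSQL 10 build (the variable names are illustrative only):

```shell
# Assumption: default 16 MB WAL segments (stock PostgreSQL 10 build).
wal_segment_mb=16
wal_keep_segments=256   # the example value from above

# Approximate amount of WAL the primary keeps in pg_wal for standbys.
retained_mb=$((wal_keep_segments * wal_segment_mb))
echo "wal_keep_segments=${wal_keep_segments} retains about ${retained_mb} MB of WAL"
```

With these numbers the primary keeps roughly 4 GB of WAL, so check that the volume holding pg_wal has that much headroom before raising the value.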

Secondly, use cluster show to verify that the cluster is healthy; it is clearer than checking each node individually.
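A minimal sketch of that check, assuming the configuration file path used elsewhere in this thread (adjust it to your installation) and that the command is run as the postgres user on any registered node:

```shell
# Config path taken from the question; adjust to your installation.
REPMGR_CONF=/etc/repmgr/10/repmgr.conf

# Only attempt the call where repmgr is actually installed.
if command -v repmgr >/dev/null 2>&1; then
    repmgr -f "$REPMGR_CONF" cluster show || echo "repmgr cluster show failed"
else
    echo "repmgr not found on this host"
fi
```

The output is a single table of all registered nodes with their role, status and upstream, which makes a standby that is registered against the wrong upstream easier to spot than in the per-node output.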

Lastly: did you register the standby after cloning? You don't show this in your command list. After cloning, you need to start the standby and then register it:

repmgr standby register

If it already exists in the repmgr.nodes table, add --force.

OTHER TIPS

Some WAL files needed for replication were no longer available on the primary, so I reinstated the standby by cloning it again.

Commands submitted on standby server:

   pg_ctl stop
   repmgr -f /etc/repmgr/10/repmgr.conf --force --rsync-only  -h node1 -d repmgr -U repmgr --verbose standby clone
   pg_ctl start
   repmgr node status
   repmgr node check
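
To confirm that streaming has actually resumed after the re-clone, rather than relying on repmgr's summary alone, PostgreSQL's own statistics views can be queried. A minimal sketch, assuming the repmgr user and database from the conninfo shown in the question and the PostgreSQL 10 column names (later releases rename some of them); the shell variable names are illustrative:

```shell
# Run on the standby (node3): a row with status 'streaming' means the
# WAL receiver is connected to its upstream.
standby_check="SELECT status, received_lsn FROM pg_stat_wal_receiver;"

# Run on the primary (node1): one row per attached standby.
primary_check="SELECT client_addr, state, sent_lsn, replay_lsn FROM pg_stat_replication;"

# Here only the standby-side query is executed; use $primary_check on node1.
if command -v psql >/dev/null 2>&1; then
    psql -U repmgr -d repmgr -c "$standby_check" || echo "could not query the local server"
else
    echo "psql not found on this host"
fi
```

If the standby query returns no rows, or the primary shows no row for node3, the standby is not streaming, whatever lag figure repmgr reports.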
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange