Question

I'm trying to set up a replica using repmgr:

repmgr -D /var/lib/postgresql/9.3/main -p 5432 -U repmgr -R postgres \
   --verbose standby clone psql.master.example.com
repmgr --verbose standby register

I've managed to sync the databases, but the standby replica won't start:

postgres@psql01a:~$ /usr/lib/postgresql/9.3/bin/postgres --single -D /var/lib/postgresql/9.3/main -P -d 1
2016-04-18 14:02:05 UTC [30048]: [1-1] user=,db=,client= LOG:  database system was shut down in recovery at 2016-04-18 14:00:51 UTC
2016-04-18 14:02:05 UTC [30048]: [2-1] user=,db=,client= LOG:  entering standby mode
2016-04-18 14:02:05 UTC [30048]: [3-1] user=,db=,client= DEBUG:  checkpoint record is at 27B5/BA68B550
2016-04-18 14:02:05 UTC [30048]: [4-1] user=,db=,client= DEBUG:  redo record is at 27B5/B3626B20; shutdown FALSE
2016-04-18 14:02:05 UTC [30048]: [5-1] user=,db=,client= DEBUG:  next transaction ID: 0/2281005353; next OID: 230242292
2016-04-18 14:02:05 UTC [30048]: [6-1] user=,db=,client= DEBUG:  next MultiXactId: 879585; next MultiXactOffset: 1823275
2016-04-18 14:02:05 UTC [30048]: [7-1] user=,db=,client= DEBUG:  oldest unfrozen transaction ID: 2094018845, in database 134461654
2016-04-18 14:02:05 UTC [30048]: [8-1] user=,db=,client= DEBUG:  oldest MultiXactId: 1, in database 16546
2016-04-18 14:02:05 UTC [30048]: [9-1] user=,db=,client= DEBUG:  transaction ID wrap limit is 4241502492, limited by database with OID 134461654
2016-04-18 14:02:05 UTC [30048]: [10-1] user=,db=,client= DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 16546
2016-04-18 14:02:05 UTC [30048]: [11-1] user=,db=,client= DEBUG:  resetting unlogged relations: cleanup 1 init 0
2016-04-18 14:02:05 UTC [30048]: [12-1] user=,db=,client= DEBUG:  initializing for hot standby
2016-04-18 14:02:05 UTC [30048]: [13-1] user=,db=,client= LOG:  redo starts at 27B5/B3626B20
2016-04-18 14:02:05 UTC [30048]: [14-1] user=,db=,client= DEBUG:  recovery snapshots are now enabled
2016-04-18 14:02:05 UTC [30048]: [15-1] user=,db=,client= CONTEXT:  xlog redo running xacts: nextXid 2281009749 latestCompletedXid 2281009746 oldestRunningXid 2281009747; 2 xacts: 2281009748 2281009747
2016-04-18 14:02:05 UTC [30048]: [16-1] user=,db=,client= PANIC:  btree_xlog_delete_get_latestRemovedXid: cannot operate with inconsistent data
2016-04-18 14:02:05 UTC [30048]: [17-1] user=,db=,client= CONTEXT:  xlog redo delete: index 1663/16546/215742765; iblk 363218, heap 1663/16546/215740352;
Aborted
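The CONTEXT line identifies the index being replayed as a tablespace/database/relfilenode triple (1663 is the OID of the pg_default tablespace). Below is a small sketch that turns such a triple into catalog queries you could run on the primary, since the standby itself will not start, to find out which index the failing WAL record belongs to. The function name relpath_queries is hypothetical, and the queries are only a starting point.

```shell
# relpath_queries TRIPLE
# Hypothetical helper: split a "tablespace/database/relfilenode" triple, as
# printed in an "xlog redo" CONTEXT line, and print SQL to map it to names.
relpath_queries() {
    triple=$1
    # Break the triple on "/" into tablespace OID, database OID, relfilenode.
    spc=$(printf '%s\n' "$triple" | cut -d/ -f1)
    db=$(printf '%s\n' "$triple" | cut -d/ -f2)
    node=$(printf '%s\n' "$triple" | cut -d/ -f3)
    echo "-- tablespace OID $spc, database OID $db, relfilenode $node"
    # Run this one in any database to get the database name.
    echo "SELECT datname FROM pg_database WHERE oid = $db;"
    # Run this one while connected to the database found above.
    echo "SELECT relname, relkind FROM pg_class WHERE relfilenode = $node;"
}
```

For example, `relpath_queries 1663/16546/215742765` prints the queries for the index in the log above. Because a physical standby is a block-level copy of the primary, the relfilenode should be the same on both servers.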

Any idea how to start the replica?


Solution

The main problem was a differing configuration in postgresql.conf. I modified shared_buffers and several other settings (values based on pgtune for the given hardware):

maintenance_work_mem = 1GB
effective_cache_size = 22GB
work_mem = 15MB
wal_buffers = 8MB
shared_buffers = 7680MB
max_connections = 1024
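To spot this kind of mismatch before starting a standby, you could compare the effective values of selected settings in the primary's and the standby's postgresql.conf. For a hot standby, max_connections, max_prepared_transactions and max_locks_per_transaction must be at least as high on the standby as on the primary. The sketch below assumes a simplified file format: it takes the last uncommented "name = value" line, and ignores include directives, ALTER SYSTEM overrides, the optional-"=" syntax and "#" inside quoted values. The function names conf_value and diff_settings are hypothetical.

```shell
# conf_value FILE NAME: print the effective value of NAME in FILE, i.e. the
# last uncommented "name = value" line (later lines override earlier ones).
# Simplified parser: no includes, no ALTER SYSTEM, no "#" inside quotes.
conf_value() {
    awk -v key="$2" '
        { sub(/#.*/, "") }                         # drop comments
        $0 ~ ("^[ \t]*" key "[ \t]*=") {           # exact "key =" match
            val = $0
            sub(/^[^=]*=[ \t]*/, "", val)          # strip "key = "
            sub(/[ \t]+$/, "", val)                # strip trailing blanks
            last = val
        }
        END { if (last != "") print last }
    ' "$1"
}

# diff_settings PRIMARY_CONF STANDBY_CONF NAME...: report every NAME whose
# effective value differs between the two files ("<unset>" if absent).
diff_settings() {
    primary=$1; standby=$2; shift 2
    for name in "$@"; do
        p=$(conf_value "$primary" "$name")
        s=$(conf_value "$standby" "$name")
        if [ "$p" != "$s" ]; then
            printf '%s: primary=%s standby=%s\n' "$name" "${p:-<unset>}" "${s:-<unset>}"
        fi
    done
}
```

For example, `diff_settings primary.conf standby.conf shared_buffers max_connections max_prepared_transactions max_locks_per_transaction` lists only the settings that differ; it does not judge which side is too low.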

After that, I hit another error:

2016-04-18 15:36:39 UTC [5150-1] FATAL:  could not create semaphores: No space left on device
2016-04-18 15:36:39 UTC [5150-2] DETAIL:  Failed system call was semget(5432064, 17, 03600).
2016-04-18 15:36:39 UTC [5150-3] HINT:  This error does *not* mean that you have run out of disk space. It occurs when either the system limit for the maximum number of semaphore sets (SEMMNI), or the system wide maximum number of semaphores (SEMMNS), would be exceeded. You need to raise the respective kernel parameter. Alternatively, reduce PostgreSQL's consumption of semaphores by reducing its max_connections parameter. The PostgreSQL documentation contains more information about configuring your system for PostgreSQL.
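To estimate how many System V semaphores a given max_connections needs, here is a rough sketch. PostgreSQL allocates semaphores in sets of 17 (16 per set plus one extra), which matches the semget(..., 17, ...) call in the DETAIL line. The sketch assumes the sizing formula from the PostgreSQL 9.3 documentation, ceil((max_connections + autovacuum_max_workers + 4) / 16) sets; later versions add more terms, so check the docs for your version and pass a different EXTRA if needed. The function name pg_sem_needs is hypothetical.

```shell
# pg_sem_needs MAX_CONNECTIONS AUTOVACUUM_MAX_WORKERS [EXTRA]
# Prints "<sets> <semaphores>". EXTRA defaults to 4, the constant assumed
# from the 9.3 documentation formula; adjust it for other versions.
pg_sem_needs() {
    procs=$(( $1 + $2 + ${3:-4} ))
    sets=$(( (procs + 15) / 16 ))        # ceil(procs / 16) -> needed SEMMNI
    echo "$sets $(( sets * 17 ))"        # 17 semaphores per set -> SEMMNS
}
```

With max_connections = 1024 from the configuration above and the default autovacuum_max_workers = 3, `pg_sem_needs 1024 3` prints `65 1105`. These are needed on top of whatever other processes on the host, such as another PostgreSQL instance, already use.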

This can be fixed by raising the kernel semaphore limits as root (the four fields are SEMMSL, SEMMNS, SEMOPM and SEMMNI, in that order):

echo 250 32000 256 256 > /proc/sys/kernel/sem

(this shouldn't be needed for PostgreSQL 10 and later on Linux, which use POSIX semaphores instead of System V semaphores by default)
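Writing to /proc/sys/kernel/sem only lasts until the next reboot. To make the change persistent, a kernel.sem line can go into /etc/sysctl.conf or a file under /etc/sysctl.d/, loaded with `sysctl -p <file>` or `sysctl --system`. A minimal sketch that builds such a line from the current limits, raising SEMMNI and SEMMNS only when they are below the values you need (and SEMMSL to at least 17), without lowering anything. The function name sem_sysctl_line is hypothetical.

```shell
# sem_sysctl_line NEEDED_SETS NEEDED_SEMS [SEM_FILE]
# Reads SEMMSL SEMMNS SEMOPM SEMMNI from SEM_FILE (default /proc/sys/kernel/sem)
# and prints a kernel.sem line with SEMMNI/SEMMNS raised to at least the
# given values; existing larger values are kept.
sem_sysctl_line() {
    read -r msl mns opm mni < "${3:-/proc/sys/kernel/sem}"
    if [ "$msl" -lt 17 ]; then msl=17; fi       # one set holds 17 semaphores
    if [ "$mns" -lt "$2" ]; then mns=$2; fi     # system-wide semaphores
    if [ "$mni" -lt "$1" ]; then mni=$1; fi     # number of semaphore sets
    echo "kernel.sem = $msl $mns $opm $mni"
}
```

For example, on a host whose current limits are `250 32000 32 128`, `sem_sysctl_line 256 4352` prints `kernel.sem = 250 32000 32 256`, which you could place in the sysctl file and load.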

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange