Question

I have this configuration with pgpool: "Host-1" is the master and "Host-2" the slave. If "Host-1" goes down, pgpool correctly promotes "Host-2" to master; but if "Host-1" then comes back up, pgpool is not aware of it, and if "Host-2" goes down, pgpool does not promote "Host-1" to master even though "Host-1" is online. I enabled health_check, but it seems completely useless, because the status of "Host-1" (after it comes back up) is always 3 = "Node is down".

This is the output of the command "show pool_nodes" during these events:

-> Initial situation: "Host-1" UP (master), "Host-2" UP (slave)

 node_id | hostname | port | status | lb_weight |  role
---------+----------+------+--------+-----------+--------
 0       | Host-1    | 5432 | 2      | nan       | master
 1       | Host-2    | 5432 | 1      | nan       | slave

-> node 0 goes down: "Host-1" DOWN, "Host-2" UP

 node_id | hostname | port | status | lb_weight |  role
---------+----------+------+--------+-----------+--------
 0       | Host-1    | 5432 | 3      | nan       | slave
 1       | Host-2    | 5432 | 2      | nan       | master

-> node 0 returns up: "Host-1" UP, "Host-2" UP

 node_id | hostname | port | status | lb_weight |  role
---------+----------+------+--------+-----------+--------
 0       | Host-1    | 5432 | 3      | nan       | slave
 1       | Host-2    | 5432 | 2      | nan       | master

Note that the status of "Host-1" is still 3, which means "Node is down".

-> node 1 goes down: "Host-1" UP, "Host-2" DOWN: at this point I'm not able to connect to the database, even though node 0 is up and running!

What do I have to do to allow pgpool to promote node 0 to master again? In case it's useful, these are the "Backend Connection Settings" and "HEALTH CHECK" sections of my pgpool.conf:

# - Backend Connection Settings -

backend_hostname0 = 'Host-1'
                                   # Host name or IP address to connect to for backend 0
backend_port0 = 5432
                                   # Port number for backend 0
#backend_weight0 = 1
                                   # Weight for backend 0 (only in load balancing mode)
#backend_data_directory0 = '/data'
                                   # Data directory for backend 0
backend_flag0 = 'ALLOW_TO_FAILOVER'
                                   # Controls various backend behavior
                                   # ALLOW_TO_FAILOVER or DISALLOW_TO_FAILOVER

backend_hostname1 = 'Host-2'
                                   # Host name or IP address to connect to for backend 1
backend_port1 = 5432
                                   # Port number for backend 1
#backend_weight1 = 1
                                   # Weight for backend 1 (only in load balancing mode)
#backend_data_directory1 = '/data'
                                   # Data directory for backend 1
backend_flag1 = 'ALLOW_TO_FAILOVER'
                                   # Controls various backend behavior
                                   # ALLOW_TO_FAILOVER or DISALLOW_TO_FAILOVER

#------------------------------------------------------------------------------
# HEALTH CHECK
#------------------------------------------------------------------------------

health_check_period = 10
                                   # Health check period
                                   # Disabled (0) by default
health_check_timeout = 20
                                   # Health check timeout
                                   # 0 means no timeout
health_check_user = 'admin'
                                   # Health check user
health_check_password = '12345'
                                   # Password for health check user
health_check_max_retries = 10
                                   # Maximum number of times to retry a failed health check before giving up.
health_check_retry_delay = 1
                                   # Amount of time to wait (in seconds) between retries.
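
One thing that can be verified by hand is whether the health check connection itself succeeds, using the same user/password as above (the "postgres" database name here is only an example, not something from my setup):

$ psql -h Host-1 -p 5432 -U admin -d postgres -c "SELECT 1"
# This roughly mimics the connection pgpool's health check makes to backend 0;
# even when it succeeds, "show pool_nodes" keeps reporting status 3 for node 0.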

Solution 2

You have to bring up the slave node before it can be promoted. This means, in your case, using Slony to fully fail over and rebuild the former Master as a new Slave.

The basic problem is that writes made to the new master must be replicated over to the old one before you can fail back. This is first and foremost a Slony problem. After you verify that Slony is working and everything is replicated, then you can troubleshoot the pgpool side, but not until then (and at that point you might need to re-attach the node to pgpool). With pgpool in master/slave mode, pgpool is secondary to whatever other replication system you are using.
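
As a rough sketch of that order of operations (assuming a Slony cluster named _mycluster, a database named mydb, and the pcp credentials from the next answer; all of these are placeholders, not values taken from the question), you would first confirm on the new master that the rebuilt subscriber has caught up, and only then re-attach it to pgpool:

# 1. On the current master (Host-2), check Slony's lag toward the rebuilt subscriber.
#    Both lag columns should be (near) zero before failing back.
$ psql -h Host-2 -p 5432 -U admin -d mydb -c "SELECT st_origin, st_received, st_lag_num_events, st_lag_time FROM _mycluster.sl_status;"

# 2. Once replication is confirmed, tell pgpool that backend 0 is usable again
#    (old-style pcp syntax: timeout, pgpool host, pcp port, pcp user, pcp password, node id).
$ pcp_attach_node 10 pgpool_host 9898 admin _pcp_passwd_ 0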

Other tips

Once your slave node is up and replication is working, you need to re-attach the node to pgpool.

$ pcp_attach_node 10 pgpool_host 9898 admin _pcp_passwd_ 0

The last argument is the node ID; in your case it is 0.

See http://www.pgpool.net/docs/latest/pgpool-en.html#pcp_attach_node for more details.
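
As a quick sanity check after the attach (the pgpool host name, port 9999 and user below are assumptions, not values from the question), connect through pgpool and re-run the same command the question used:

$ psql -h pgpool_host -p 9999 -U admin -d postgres -c "show pool_nodes"
# Node 0 should now report status 2 ("Node is up") again; if it still shows 3,
# check the pgpool log and the pcp credentials used for pcp_attach_node.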
