How do we get around PostgreSQL autovacuum taking a table-level lock ACCESS EXCLUSIVE on replicas?

dba.stackexchange https://dba.stackexchange.com/questions/239777

  •  05-02-2021

Question

We are running:

user@primary/client-n:~$ psql -d database -c "SELECT version();"                                                                   
version
---------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 10.7 (Ubuntu 10.7-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, 64-bit
(1 row)

on:

user@primary/client-n:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.6 LTS
Release:        16.04
Codename:       xenial

and have a setup with one primary and two streaming replication clients configured with:

user@client-n:~$ psql -d postgres -c "SELECT name, setting FROM pg_settings WHERE name IN ( 'hot_standby', 'hot_standby_feedback', 'max_standby_streaming_delay' );"
           name             | setting 
----------------------------+---------
hot_standby                 | on
hot_standby_feedback        | on
max_standby_streaming_delay | 150000
(3 rows)

We have only one database (other than the standard ones) and a single table in it. Approximately 3 to 4 times per day we run into a special-case autovacuum situation, which is described in the documentation as:

[...] it (autovacuum) will not return the space to the operating system, except in the special case where one or more pages at the end of a table become entirely free and an exclusive table lock can be easily obtained

We are monitoring pg_locks and have been able to observe the autovacuum daemon taking the table-level ACCESS EXCLUSIVE lock, which in turn leads to a whole host of blocked processes on the clients; the log entries further below illustrate the effect.
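
A minimal monitoring query along these lines (the table name is a placeholder for ours) is enough to catch the lock on the primary, and the startup process holding the replayed lock on the replicas:

-- who holds or waits for locks on the table, and what each backend is doing
SELECT l.pid, l.mode, l.granted, a.backend_type, a.wait_event_type, a.wait_event
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.relation = 'schema.table'::regclass
ORDER BY l.granted DESC, l.pid;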

Primary:

...
2019-06-04 05:59:29.154 BST [8998-1] LOG:  automatic vacuum of table "database.schema.table": index scans: 1
...

Client 1:

...
2019-06-04 05:59:03.660 BST [21167-858] [PostgreSQL JDBC Driver@ip_address(port):role@database] | LOG:  process 21167 still waiting for AccessShareLock on relation 16390 of database 16388 after 1000.222 ms
2019-06-04 05:59:03.660 BST [21167-859] [PostgreSQL JDBC Driver@ip_address(port):role@database] | DETAIL:  Process holding the lock: 2741. Wait queue: 21167, 1215, 26415.
2019-06-04 05:59:03.660 BST [21167-860] [PostgreSQL JDBC Driver@ip_address(port):role@database] | STATEMENT:  SELECT ...
2019-06-04 05:59:03.730 BST [1215-51] [PostgreSQL JDBC Driver@ip_address(port):role@database] | LOG:  process 1215 still waiting for AccessShareLock on relation 16390 of database 16388 after 1000.188 ms at character 15
2019-06-04 05:59:03.730 BST [1215-52] [PostgreSQL JDBC Driver@ip_address(port):role@database] | DETAIL:  Process holding the lock: 2741. Wait queue: 21167, 1215, 26415.
2019-06-04 05:59:03.730 BST [1215-53] [PostgreSQL JDBC Driver@ip_address(port):role@database] | STATEMENT:  SELECT ...
...
2019-06-04 05:59:19.975 BST [22242-4569] [PostgreSQL JDBC Driver@ip_address(port):role@database] | LOG:  process 22242 still waiting for AccessShareLock on relation 16390 of database 16388 after 1000.281 ms at character 15
2019-06-04 05:59:19.975 BST [22242-4570] [PostgreSQL JDBC Driver@ip_address(port):role@database] | DETAIL:  Process holding the lock: 2741. Wait queue: 21167, 1215, 26415, 2423, 1289, 24009, 22441, 2640, 1843, 1056, 23336, 28060, 1860, 1134, 19419, 14649, 2721, 29540, 20138, 22242.
2019-06-04 05:59:19.975 BST [22242-4571] [PostgreSQL JDBC Driver@ip_address(port):role@database] | STATEMENT:  SELECT...
...

And the process holding the lock:

postgres=# SELECT pid, backend_type, wait_event_type, wait_event FROM pg_stat_activity WHERE pid = 2741;
 pid  | backend_type | wait_event_type |   wait_event
------+--------------+-----------------+----------------
 2741 | startup      | Activity        | RecoveryWalAll
(1 row)


Client 2:

...
2019-06-04 06:00:08.964 BST [16153-1] [PostgreSQL JDBC Driver@ip_address(port):role@database] | FATAL:  terminating connection due to conflict with recovery
2019-06-04 06:00:08.964 BST [16153-2] [PostgreSQL JDBC Driver@ip_address(port):role@database] | DETAIL:  User was holding a relation lock for too long.
2019-06-04 06:00:08.964 BST [16153-3] [PostgreSQL JDBC Driver@ip_address(port):role@database] | HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2019-06-04 06:00:09.964 BST [5747-537] [PostgreSQL JDBC Driver@ip_address(port):role@database] | LOG:  process 5747 still waiting for AccessShareLock on relation 16390 of database 16388 after 1000.248 ms
2019-06-04 06:00:09.964 BST [5747-538] [PostgreSQL JDBC Driver@ip_address(port):role@database] | DETAIL:  Process holding the lock: 12709. Wait queue: 5747, 19765, 16036, 14617, 12280, 14513, 14728, 15398, 27611, 14542, 15948, 23398, 5853, 5098, 4324, 10760, 23480, 30192, 15300, 16228.
2019-06-04 06:00:09.964 BST [5747-539] [PostgreSQL JDBC Driver@ip_address(port):role@database] | STATEMENT:  SELECT ...
2019-06-04 06:00:09.975 BST [19765-6847] [PostgreSQL JDBC Driver@ip_address(port):role@database] | LOG:  process 19765 still waiting for AccessShareLock on relation 16390 of database 16388 after 1000.180 ms
2019-06-04 06:00:09.975 BST [19765-6848] [PostgreSQL JDBC Driver@ip_address(port):role@database] | DETAIL:  Process holding the lock: 12709. Wait queue: 5747, 19765, 16036, 14617, 12280, 14513, 14728, 15398, 27611, 14542, 15948, 23398, 5853, 5098, 4324, 10760, 23480, 30192, 15300, 16228.
2019-06-04 06:00:09.975 BST [19765-6849] [PostgreSQL JDBC Driver@ip_address(port):role@database] | STATEMENT:  SELECT ...
...
2019-06-04 06:01:25.487 BST [15873-1] [PostgreSQL JDBC Driver@ip_address(port):role@database] | LOG:  process 15873 still waiting for AccessShareLock on relation 16390 of database 16388 after 1000.218 ms at character 15
2019-06-04 06:01:25.487 BST [15873-2] [PostgreSQL JDBC Driver@ip_address(port):role@database] | DETAIL:  Process holding the lock: 12709. Wait queue: 5747, 19765, 16036, 14617, 12280, 14513, 14728, 15398, 27611, 14542, 15948, 23398, 5853, 5098, 4324, 10760, 23480, 30192, 15300, 16228, 16127, 16285, 15873.
2019-06-04 06:01:25.487 BST [15873-3] [PostgreSQL JDBC Driver@ip_address(port):role@database] | STATEMENT:  SELECT ...
...
2019-06-04 06:01:29.160 BST [16127-6] [PostgreSQL JDBC Driver@ip_address(port):role@database] | LOG:  process 16127 acquired AccessShareLock on relation 16390 of database 16388 after 8560.748 ms at character 15
2019-06-04 06:01:29.160 BST [16127-7] [PostgreSQL JDBC Driver@ip_address(port):role@database] | STATEMENT:  SELECT ...
...

And the process holding the lock, again:

postgres=# SELECT pid, backend_type, wait_event_type, wait_event FROM pg_stat_activity WHERE pid = 12709;
 pid  | backend_type | wait_event_type |   wait_event
------+--------------+-----------------+----------------
12709 | startup      | Activity        | RecoveryWalAll
(1 row)


The blocked queries on the clients lead to API latencies of 10 to 20 seconds and, occasionally, a high number of 5xx responses. Our SRE team has been tasked with bringing down the API latency during these incidents, and we are looking for ways of resolving what we understand to be a very niche situation. We are currently experimenting with recovery_min_apply_delay = 120s on client 1 (hence the later log entries) so that both clients do not lock up at the same time. This has somewhat reduced the number of erroneous responses and lowered the latency spikes a bit. We are unsure how to resolve this issue fully, or indeed whether that is possible at all, and would appreciate your advice. We have found this related post, but sadly, it hasn't been resolved either.
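
For reference, on this release the delay has to go into recovery.conf on the standby (it moved to postgresql.conf in v12); a sketch with illustrative values and the connection string elided:

# recovery.conf on client 1 (PostgreSQL 10); values are illustrative
standby_mode = 'on'
primary_conninfo = '...'              # connection string to the primary (elided)
recovery_min_apply_delay = '120s'     # apply WAL two minutes behind the primary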

Solution

One undocumented way of avoiding VACUUM truncation and the exclusive lock it requires is to set old_snapshot_threshold to a value other than -1. This in turn will get rid of the locks and the resulting recovery conflicts.
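For example (a sketch; any value other than -1 works, and old_snapshot_threshold can only be changed with a server restart):

ALTER SYSTEM SET old_snapshot_threshold = '1h';
-- postmaster-context parameter: a reload is not enough, restart the server afterwards,
-- e.g. "sudo systemctl restart postgresql@10-main" (service name assumed for an Ubuntu/pgdg install)

Be aware that this also activates the "snapshot too old" feature, so long-running transactions can start failing with that error once their snapshots exceed the threshold.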

From PostgreSQL v12 on, you can use the better technique of disabling VACUUM truncation for individual tables:

ALTER TABLE mytab
   SET (vacuum_truncate = off,
        toast.vacuum_truncate = off);
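
To verify that the storage parameter is in place, you can check pg_class.reloptions (mytab as above; the toast.vacuum_truncate value ends up on the table's TOAST relation):

SELECT relname, reloptions
FROM pg_class
WHERE relname = 'mytab';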

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange