Why is autovacuum running during a VACUUM FREEZE on the whole database?

https://dba.stackexchange.com/questions/267931

02-03-2021
|

Frage

I have a 14.7 TB Postgres(9.1) database that has to endure heavy writes for prolonged periods of time. I know that I need to closely manage my transaction ids to prevent db lockup. Recently I noticed the queries slowing, and saw that there were multiple autovacuums running on our massive read-only tables, with the "(to prevent wraparound)" subscript. I stopped the running software and executed a vacuumdb -F -a command. However, when I run select current_query from pg_stat_activity I see that the autovacuum processes are still running, even during a manual vacuum. I tried killing the autovacuums with select pg_cancel_backend(pid) and they died but then immediately restart again. My questions:

Is autovacuum supposed to continue to run during a manual db vacuum?
How do I effectively get these autovacuum processes to stop?
Why would these autovacuums continue to run on a read-only table? What is there to vacuum?

Lösung

Autovacuum is triggered by the table statistics on the table, and as long as your manual VACUUM (FREEZE) is not done, these are not updated. That is why anti-wraparound autovacuum processes will still start.

But that's not a big problem: Only one VACUUM can run on a table at any given time. Now anti-wraparound autovacuum workers won't give up when they block another process, in that case your manual VACUUM. But if you kill the anti-wraparound autovacuum workers, your manual VACUUM gets the lock, and now it is the restarted anti-wraparound autovacuum worker that is blocked. Look at pg_locks to verify that they are waiting for a lock (granted = FALSE).

Now your vacuumdb processes one table after the other, so you must be ready to kill autovacuum workers as soon as it starts processing the next table and is blocked by a new autovacuum worker there.

It might be easier to manually launch VACUUM on those big read-only tables rather than using vacuumdb, because then you can control when which table gets vacuumed.

Make sure you set maintenance_work_mem high to speed up VACUUM. You should also set autovacuum_vacuum_cost_delay to 2ms or lower at least on these big tables, so that future autovacuum runs get done faster.

To reduce the pain for the future, lower autovacuum_freeze_max_age on the affected tables substantially. Then the next anti-wraparound vacuum will kick in sooner and will be done faster.

Best of all, upgrade to v13 as soon as you can, because from that version on insert-only tables will receive regular vacuum runs as well, which should do away with the problem.

Lizenziert unter: CC-BY-SA mit Zuschreibung

Nicht verbunden mit dba.stackexchange