Do the different “types” of autovacuum have different performance characteristics?

https://dba.stackexchange.com/questions/161635

05-10-2020
|

Question

In monitoring the autovacuum activity on my database, I've noticed that autovacuums tagged with (to prevent wraparound) seem to take longer than "regular" autovacuums. That got me started trying to understand the differences in performance/behavior of the various "kinds" or autovacuums.

From the docs I've identified a few different reasons why autovacuum might VACUUM a table:

The table has old transactions that need to be frozen (relfrozenxid > min(autovacuum_freeze_max_age, table.autovacuum_freeze_max_age). This is the "(to prevent wraparound)" autovacuum.
The number of tuples obsoleted in the table since the last VACUUM exceeds the "vacuum threshold"

Are there any others that I'm missing?

The "aggressiveness" knobs that seem to be exposed for a manual VACUUM are FREEZE and DISABLE_PAGE_SKIPPING. Do these different kinds of autovacuum ever use these options, and if so, when?

Solution

In monitoring the autovacuum activity on my database, I've noticed that autovacuums tagged with (to prevent wraparound) seem to take longer than "regular" autovacuums.

That's to be expected. They have to do more work. These are essentially a vacuum freeze run by autovacuum.

Freezing tuples means marking them as visible by all current and future transactions by replacing their transaction ID with a special fixed value FrozenTransactionId (xid 3). Since the xmin for a tuple is stored in the tuple on disk, this means each tuple that's not already frozen and is a candidate for freezing needs to be modified. This often triggers a full-page write since the tuple is generally the first modification on the page since the last checkpoint, resulting in an 8k write for the first frozen tuple on each page. (This also bloats WAL and therefore PITR backups, replication streams, etc).

Freeze vacuums in PostgreSQL 9.6 and below cannot skip parts of the table; they must scan the whole table each time, even if only a little of it is a candiate for freezing.

Unless you tune autovacuum to freeze more aggressively, it tends to leave forced freezes quite late and then do a LOT of work at once. This is efficient since we don't scan the table lots of times, but expensive since we do a lot of work each time we do process it.

When vacuum isn't freezing tuples it's still doing much of the same work. But it's usually touching much smaller numbers of pages on each pass since it only has to care about tuples that have a not-for-locking-only xmax set by a since-committed transaction. The pages with these tuples are also MUCH more likely to be in cache since they're recently deleted, and any multixact data for them is more likely to be in cache too.

You can make freeze vacuums faster by:

Lowering autovacuum_vacuum_cost_delay - this will make autovacuum use more resources and put more load on the system, but run faster.
Lowering autovacuum_freeze_max_age so they run more often and do less work each time. This is overall less efficient, since the whole table must be scanned each time, evicting pages from the OS buffer cache as we read the table. (Pg uses a ring-buffer for seq scans, but the OS generally won't). But it means we do fewer writes for each pass and each pass takes less time.
Doing COPY with the FREEZE option to bulk-load a table inside the same transaction that TRUNCATEd it or CREATEd it. Tuples loaded this way start off pre-frozen. (See the manual on COPY for details).
Manually freezing tables after bulk-loading them, so you pay the freeze cost at a time you control.

Personally I'd like normal VACUUMs (and autovacuums) to do more opportunistic freezing; in particular, if we've just touched the item pointer array on a page to mark some tuples as free space, we should probably scan for freeze-able tuples while the page is dirty. This would play very well with the new freeze map code in PostgreSQL 10.

There are some big improvements in freezing in PostgreSQL 9.6 that reduce the impact of repeated vacuums on tables and make freezing much cheaper. Due to the introduction of the freeze map functionality in the visibility map we can avoid scanning the whole table, skipping over frozen pages. See commits a892234 and fd31cd265.

commit fd31cd265138019dcccc9b5fe53043670898bc9f
Author: Robert Haas <rhaas@postgresql.org>
Date:   Thu Mar 10 16:12:10 2016 -0500

    Don't vacuum all-frozen pages.

    Commit a892234f830e832110f63fc0a2afce2fb21d1584 gave us enough
    infrastructure to avoid vacuuming pages where every tuple on the
    page is already frozen.  So, replace the notion of a scan_all or
    whole-table vacuum with the less onerous notion of an "aggressive"
    vacuum, which will pages that are all-visible, but still skip those
    that are all-frozen.

    This should greatly reduce the cost of anti-wraparound vacuuming
    on large clusters where the majority of data is never touched
    between one cycle and the next, because we'll no longer have to
    read all of those pages only to find out that we don't need to
    do anything with them.

    Patch by me, reviewed by Masahiko Sawada.

I think there is also work in progress for PostgreSQL 10 to entirely avoid the need to modify tuples on disk to mark them as frozen, but can't find the relevant mailing list thread atm.

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange