Question

We have a decently active PG database hosted on AWS. We recently started getting notifications like the following:

 Transaction ID age reached 750 million. Autovacuum parameter values for [autovacuum_vacuum_cost_limit, autovacuum_vacuum_cost_delay, autovacuum_naptime] are updated to make autovacuum more aggressive.

I also noticed the disk usage for this particular table was increasing fast. Here's the used space:

[
  {
    "oid": "16413",
    "table_schema": "public",
    "table_name": "connections",
    "row_estimate": 1.01476e+07,
    "total_bytes": 518641270784,
    "index_bytes": 478458511360,
    "toast_bytes": 30646272,
    "table_bytes": 40152113152,
    "total": "483 GB",
    "index": "446 GB",
    "toast": "29 MB",
    "table": "37 GB"
  }
]
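
(Output like that can be produced with the standard size functions, something along these lines -- a rough sketch, not necessarily the exact query we ran:)

-- Size breakdown for one table using the built-in size functions
SELECT pg_size_pretty(pg_total_relation_size('public.connections')) AS total_size,
       pg_size_pretty(pg_indexes_size('public.connections'))        AS index_size,
       pg_size_pretty(pg_table_size('public.connections'))          AS table_plus_toast_size;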

Then, while doing some analysis on something else, we noticed a long-running vacuum process (first spotted 5 days ago):

[
  {
    "pid": 14747,
    "duration": "14:11:41.259451",
    "query": "autovacuum: VACUUM ANALYZE public.connections (to prevent wraparound)",
    "state": "active"
  }
]
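
(That came from pg_stat_activity; a query roughly like this shows it, with the duration computed from query_start:)

-- Long-running autovacuum workers as seen in pg_stat_activity
SELECT pid,
       now() - query_start AS duration,
       query,
       state
FROM pg_stat_activity
WHERE query LIKE 'autovacuum:%'
ORDER BY query_start;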

(This was actually a new process, but it looked identical to the earlier one, which never finished.)

To confirm, I see that the connections table hasn't been autovacuumed since the 15th and has a lot to clean up:

[
  {
    "relid": "16413",
    "schemaname": "public",
    "relname": "connections",
    "seq_scan": 19951154,
    "seq_tup_read": 226032655046,
    "idx_scan": 41705151351,
    "idx_tup_fetch": 375484186787,
    "n_tup_ins": 8029742,
    "n_tup_upd": 13217694302,
    "n_tup_del": 542670,
    "n_tup_hot_upd": 96750657,
    "n_live_tup": 10237553,
    "n_dead_tup": 887751401,
    "n_mod_since_analyze": 350036721,
    "last_vacuum": null,
    "last_autovacuum": "2019-06-15 17:05:51.526792+00",
    "last_analyze": null,
    "last_autoanalyze": "2019-06-15 17:06:27.310486+00",
    "vacuum_count": 0,
    "autovacuum_count": 4190,
    "analyze_count": 0,
    "autoanalyze_count": 4165
  }
]
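
(Those numbers are straight out of pg_stat_user_tables, e.g.:)

-- Per-table vacuum/analyze statistics
SELECT *
FROM pg_stat_user_tables
WHERE schemaname = 'public' AND relname = 'connections';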

I've read a bunch about configuring autovacuum_vacuum_scale_factor and autovacuum_analyze_scale_factor differently for very active tables. And that's all good, but it doesn't look like the vacuum that's currently running is ever going to get through this table.

I've also read about optimizations to autovacuum_vacuum_cost_limit and autovacuum_vacuum_cost_delay to let it be more aggressive in the work it needs to do.

I've tried to change a few of those settings for the table, but the statement just sits there when I try to set any values on that particular table.

What's the best way to get the table vacuumed?

Also, what effect would a reboot of the database have on all this?

Solution

You don't show your settings for "autovacuum_work_mem" and "maintenance_work_mem". The default settings in 9.4 are very low (64MB, which only allows 11M tuples to be vacuumed per pass over the index), unless RDS (or you) changed them. You need to set these to the highest value you can given the amount of RAM you have.
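
As a rough sketch (the 1GB value is only an assumption -- size it to the RAM you actually have; on RDS, autovacuum_work_mem has to be changed through the DB parameter group):

-- Check what is currently in effect
SHOW maintenance_work_mem;
SHOW autovacuum_work_mem;   -- -1 means "fall back to maintenance_work_mem"

-- Raise the limit for a manual VACUUM in this session only
SET maintenance_work_mem = '1GB';
VACUUM (VERBOSE) public.connections;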

"index": "446 GB",
"toast": "29 MB",
"table": "37 GB"

Having the index be 12 times bigger than the table seems demented (Edit: or it would be if there were only one index--I didn't think about there being many indexes). Is this a normal btree index, or is it pg_trgm or something? Is the index corrupted somehow? Do you know how you got into this situation?

Space in indexes is harder to reuse than space in tables. A given leaf page can only be used for new tuples if the new tuples are in the same range of values as the tuples it already holds, or if the page is completely empty. So if the key space is always moving in one direction (like a sequence), and almost-but-not-quite-all of the old tuples eventually get deleted, then you can be left with a bunch of pages which hold only one tuple each and can't be reused. Tables don't have this problem, as mostly-empty pages can be used to hold any tuple that comes along.

Or, if your table was horribly bloated at some point but then got cleaned up, it is possible the table got shrunk down but the indexes did not. (Tables can shrink if they happen to have a big chunk of completely empty pages at the end of the table.) It can be hard to tell which of these things is the one that happened. You can use pg_freespacemap on the indexes to see how many pages are completely empty, but that probably won't be accurate until the vacuum finishes.
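
For example, something along these lines (the index name here is made up; pg_freespacemap's pg_freespace() reports per-page free space, and for a btree index a page is essentially either recyclable or not):

CREATE EXTENSION IF NOT EXISTS pg_freespacemap;

-- How many index pages are recorded as free in the free space map
SELECT count(*) AS total_pages,
       count(*) FILTER (WHERE avail > 0) AS pages_marked_free
FROM pg_freespace('connections_some_idx');   -- hypothetical index name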

The fastest way out of this, although with some downtime, might be to start a VACUUM FULL of the table--you don't want to spend a lot of time vacuuming a hopelessly bloated index when you could just throw it away and rebuild it in a non-bloated state. The VACUUM FULL will block; then, in another session, kill the autovacuum worker (which holds a lock blocking the VACUUM FULL). You want to have the other command already waiting when you kill the autovac worker, because the worker will be restarted quickly and take the lock again, unless something else is already there waiting to grab it.
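
A sketch of that two-session dance (the LIKE pattern just targets the wraparound autovacuum on this table):

-- Session 1: this will queue up waiting on the lock held by the autovacuum worker
VACUUM FULL public.connections;

-- Session 2: once session 1 is waiting, terminate the autovacuum worker
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE query LIKE 'autovacuum: VACUUM%public.connections%';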

I've tried to change a few of those for the table, but it just sits there when I try to write any values for that particular table.

Changing those table-specific settings requires a lock on the table, and the autovac worker holds a lock that conflicts with it. Normally an autovacuum will detect when it is blocking something else and surrender the lock, but a wraparound-prevention autovacuum does not do that. So you would need to kill the autovac worker to be able to make this change (which is a general theme of everything here).

I wouldn't recommend changing that table-specific setting anyway. If you want this to be a one-time thing, just run a manual VACUUM or VACUUM FULL to accomplish that. If you want it to be permanent, it is hard to justify why this one table needs a different "autovacuum_vacuum_cost_delay" than your other tables, at least based on the info given, so just change it at the system level. If you change it at the system level, it still won't take effect in the middle of the ongoing autovacuum; you would need to kill it so the next one picks up the change.
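
On a self-managed server that could look like the following (on RDS you would make the equivalent change in the DB parameter group instead; the values are only illustrative):

ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '2ms';
SELECT pg_reload_conf();
-- A running autovacuum keeps its old settings; kill it so the next worker picks these up.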

Also, what effect would a reboot of the database have on all this?

A reboot would cause the autovac to lose part of the work it has already done, and start over again. It wouldn't accomplish anything, unless you also made meaningful configuration changes which will take effect after the reboot.

Licensed under: CC-BY-SA with attribution