In Postgres, what should the autovacuum strategy be for big tables (hundreds of millions of rows) that get thousands of rows added and removed every day?

dba.stackexchange https://dba.stackexchange.com/questions/276245

Question

I have quite a big table with a lot of daily traffic for reads, inserts and deletions. Currently it has 392 million live tuples and 27 million dead ones. The vacuum settings (autovacuum_vacuum_threshold, autovacuum_vacuum_scale_factor, etc.) are set to the defaults.

Occasionally I get performance issues that make queries last more than 2 minutes when they usually take a couple of seconds.

My first thought was to lower the vacuum scale factor from the current 0.2 to 0.05 or even 0.01. But autovacuum already runs several times a day (and a run may take a while, more than an hour), so I'm not sure whether lowering the scale factor would make things worse: it would run even more often, even though each run would have fewer dead tuples to work on.

Answers

Your table seems fine at first glance. If autovacuum gets done and you have less than 30% dead tuples, I see no need to worry.

You might want to use the pgstattuple extension to check if the table has a lot of free space; if yes, that would be an indication to make autovacuum faster.
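A minimal check with pgstattuple might look like this (the table name big_table is a placeholder; pgstattuple_approx is the cheaper, estimate-based variant, which matters on a table this size):

CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- exact figures, but scans the whole table
SELECT * FROM pgstattuple('big_table');

-- approximate figures, much cheaper on a 392-million-row table
SELECT * FROM pgstattuple_approx('big_table');

A large free_space / free_percent in the output would be the sign of bloat that faster vacuuming could prevent.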

The keyword here is faster: you'd have to lower autovacuum_vacuum_cost_delay or increase autovacuum_vacuum_cost_limit for that. Making autovacuum run more often won't do any good.
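As a sketch, you can speed up autovacuum for just this table with per-table storage parameters (big_table is a placeholder, and the right values depend on how much I/O headroom you have):

ALTER TABLE big_table SET (
    autovacuum_vacuum_cost_delay = 2,     -- milliseconds; default 20 before v12, 2 from v12 on
    autovacuum_vacuum_cost_limit = 1000   -- default -1 means "use vacuum_cost_limit", i.e. 200
);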

But if autovacuum at its current speed is already causing you performance problems, I'd leave the settings alone.

Thousands of rows added/removed a day is not very many. If the table has 392 million live tuples and autovacuum fires several times a day at default settings, that must mean hundreds of millions of rows added/removed per day.
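To make the arithmetic explicit: with the default autovacuum_vacuum_threshold = 50 and autovacuum_vacuum_scale_factor = 0.2, a 392-million-row table is only vacuumed after roughly 50 + 0.2 × 392,000,000 ≈ 78 million dead tuples have accumulated, so several runs a day implies several times that much churn. You can check how close the table is to its trigger point with a query like this (table name is a placeholder):

SELECT relname,
       n_live_tup,
       n_dead_tup,
       50 + 0.2 * n_live_tup AS autovacuum_trigger   -- threshold + scale_factor * live tuples
FROM pg_stat_user_tables
WHERE relname = 'big_table';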

Are all these changes concentrated in one part of the table, or spread evenly throughout? Maybe you could partition to keep the hottest tuples grouped together?
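If the churn is concentrated in, say, the most recent data, declarative range partitioning by a date column is one way to keep the hot tuples together. A hypothetical sketch (column names and the monthly split are assumptions, not your actual schema):

-- hypothetical layout: the heavily-churned recent rows land in the newest partition
CREATE TABLE big_table (
    id         bigint      NOT NULL,
    created_at timestamptz NOT NULL,
    payload    text
) PARTITION BY RANGE (created_at);

CREATE TABLE big_table_2020_05 PARTITION OF big_table
    FOR VALUES FROM ('2020-05-01') TO ('2020-06-01');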

Do your queries benefit from index only scans? If so, making autovac more aggressive might make sense. But if not, there is little reason to think that making it more aggressive than the default would accomplish much, other than consuming more IO.
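One way to tell is to run EXPLAIN (ANALYZE, BUFFERS) on a representative query and look at the "Heap Fetches" line of an Index Only Scan node (the query and column below are placeholders):

EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM big_table WHERE id BETWEEN 100000 AND 200000;
-- "Index Only Scan ... Heap Fetches: <n>": a high count relative to the rows returned
-- means the visibility map is stale and more aggressive vacuuming could actually pay off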

You really need to capture the slow query in the act to see why it is slow. One useful method is auto_explain. Having auto_explain.log_analyze = on can really slow the entire system down, especially on older kernels/hardware with slow access to the clock. But auto_explain.log_timing = off recovers most of that slowdown while retaining much of the valuable information.

track_io_timing = on
shared_preload_libraries = 'pg_stat_statements,auto_explain'
auto_explain.log_min_duration = '20s'
auto_explain.log_analyze = on
auto_explain.log_timing = off    # only if pg_test_timing indicates a slow clock
auto_explain.log_buffers = on
auto_explain.log_nested_statements = on
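Note that changing shared_preload_libraries requires a server restart; the auto_explain.* parameters themselves can be adjusted afterwards with a plain configuration reload.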
Licensed under: CC-BY-SA with attribution