Question

After googling a bit, it is plainly clear that you cannot run VACUUM from inside a function:

Postgres mailing - Vacuum behaviour in plpgsql function

You can't run VACUUM from inside a function because of memory-management limitations. In current sources there is an error check to prevent you from trying.

Stackoverflow - VACUUM cannot be executed from a function or multi-command string

Once the data is loaded, I would like to "vacuum analyze" the affected tables, both to recover the space from the deleted records and to accurately reflect the new contents. [...] When I run this, I get:

ERROR: VACUUM cannot be executed from a function or multi-command string

But I really need to "automatically" run VACUUM whenever a certain function is run: this function updates a lot of records (if not all of them), so the number of dead rows increases quite fast and the performance of other queries decreases a lot. The problem is that the function is called from another program. Any best practice? Should I manually add the VACUUM statement into a separate piece of code after the function execution, something like:

sql = SELECT my_function()
program.execute(sql)
sql = VACUUM [FULL | ANALYZE] my_updated_table
program.execute(sql)


Solution

From the documentation page:

Tip: Plain VACUUM may not be satisfactory when a table contains large numbers of dead row versions as a result of massive update or delete activity. If you have such a table and you need to reclaim the excess disk space it occupies, you will need to use VACUUM FULL, or alternatively CLUSTER or one of the table-rewriting variants of ALTER TABLE. These commands rewrite an entire new copy of the table and build new indexes for it. All these options require exclusive lock. Note that they also temporarily use extra disk space approximately equal to the size of the table, since the old copies of the table and indexes can't be released until the new ones are complete.
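If the exclusive lock is acceptable, the space-reclaiming command from the question has to be issued by the client as its own top-level statement (the table name here is the question's placeholder):

```sql
-- Must be sent as a single top-level command, outside any
-- transaction block and outside any function -- which is why it
-- cannot go inside my_function() itself.
VACUUM (FULL, ANALYZE) my_updated_table;
```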

If you can't get away with an exclusive lock on the table, you might also be fine with a simple ANALYZE <tablename>;, as that only takes a light lock, updates the statistics for the table, and should reduce (if not entirely eliminate) the performance degradation for the other queries. Then you can let the regular autovacuum daemon do its thing (or whatever your current custom vacuum solution might be) normally.
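In that case the calling program would issue two statements back to back, a minimal sketch using the question's placeholder names:

```sql
-- Run the data-modifying function first, as its own statement.
SELECT my_function();

-- Then refresh planner statistics. Unlike VACUUM, ANALYZE does not
-- block reads or writes and may even run inside a transaction block.
ANALYZE my_updated_table;
```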

If your table doesn't have a clustering index yet, you will first need to run ALTER TABLE <tablename> CLUSTER ON <indexname>;. You should probably have some kind of clustering index anyway: a massive heap table and poor performance against it often go hand in hand.
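A sketch of that sequence, assuming a hypothetical existing index named my_updated_table_pkey:

```sql
-- One-time setup: mark which index CLUSTER should order the table by.
ALTER TABLE my_updated_table CLUSTER ON my_updated_table_pkey;

-- Rewrite the table in index order. This takes an ACCESS EXCLUSIVE
-- lock and reclaims dead-row space, much like VACUUM FULL.
CLUSTER my_updated_table;
```

Subsequent bare `CLUSTER my_updated_table;` calls reuse the recorded index, so the ALTER TABLE only needs to happen once.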

As far as best practices go, I would strongly recommend the ANALYZE approach: an automatic statistics update is far preferable to an automatic table rewrite, or to virtually any form of VACUUM FULL being run without an admin actively watching it.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange