Question

Is it possible to check how much re-usable space is available in a tablespace before running a VACUUM FULL on a large table?

I have a large Postgres table (about 20 GB) that gets an occasional VACUUM FULL. The available free space on that drive varies between 15 and 25 GB. Before each vacuum is attempted I log the table size (using Postgres queries) and the available disk space (using OS tools).
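
For what it's worth, the size query is nothing fancy; something along these lines (my_table is just a placeholder name):

-- log the table's on-disk size: total including indexes and TOAST, and the heap alone
SELECT pg_size_pretty(pg_total_relation_size('my_table')) AS total_size,
       pg_size_pretty(pg_relation_size('my_table'))       AS heap_only;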

I know that VACUUM FULL requires a full copy of the table to be made, so if the table is 20 GB then 20 GB of free space is required.

Sometimes the table will be 20 GB, there will only be 15 GB of OS space available, and the vacuum will still work. I guess the extra 5 GB required is recovered internally from the tablespace.

Other times the vacuum will fail due to lack of space; I guess on those occasions the extra 5 GB required wasn't found in the tablespace.

I'd like to be able to check beforehand that I've got enough space for a VACUUM FULL. How can I do this? I know how big the table is and how much space is available to the OS, but what I don't know is how much recyclable space is available in the tablespace.

Solution

Firstly, I would suggest using pgstattuple to obtain tuple-level statistics.

pgstattuple returns a relation's physical length, percentage of “dead” tuples, and other info. This may help users to determine whether vacuum is necessary or not.

For example:

create extension pgstattuple;
create table my_table (id int, name text);
insert into my_table select a, md5(a::text) from generate_series(1, 1e7) a;

-- size of my_table
\dt+ my_table
 Schema |   Name   | Type  |  Owner   |  Size  | Description
--------+----------+-------+----------+--------+-------------
 public | my_table | table | postgres | 651 MB |

-- dead_tuple_percent = 0, pgstattuple will not lock your table
SELECT tuple_percent, dead_tuple_count, dead_tuple_percent, free_space, free_percent FROM pgstattuple('my_table');

 tuple_percent | dead_tuple_count | dead_tuple_percent | free_space | free_percent
---------------+------------------+--------------------+------------+--------------
         89.35 |                0 |                  0 |     338776 |         0.05

-- let's update 50% of the rows
update my_table set name = name || id where id < 5000000;

-- re-running the pgstattuple query: dead_tuple_percent is now 28.63%
 tuple_percent | dead_tuple_count | dead_tuple_percent | free_space | free_percent
---------------+------------------+--------------------+------------+--------------
         60.43 |          4999999 |              28.63 |    1834236 |         0.17

-- size of my_table has increased (\dt+ my_table again)

 Schema |   Name   | Type  |  Owner   |  Size   | Description
--------+----------+-------+----------+---------+-------------
 public | my_table | table | postgres | 1016 MB |                

-- try to vacuum full
vacuum full my_table;

-- after that, dead_tuple_percent = 0 and the size of my_table has been reduced
 tuple_percent | dead_tuple_count | dead_tuple_percent | free_space | free_percent
---------------+------------------+--------------------+------------+--------------
         88.92 |                0 |                  0 |    1664780 |         0.23

 Schema |   Name   | Type  |  Owner   |    Size    | Description
--------+----------+-------+----------+------------+-------------
 public | my_table | table | postgres | 691 MB     |
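
To tie this back to the original question: pgstattuple also gives a rough estimate of how much of the table a VACUUM FULL could give back, namely the dead tuple space plus the free space, compared against the total table length. A minimal sketch using the columns pgstattuple documents (table_len, dead_tuple_len, free_space):

-- rough estimate of reclaimable space before attempting a VACUUM FULL
SELECT pg_size_pretty(table_len)                   AS total_size,
       pg_size_pretty(dead_tuple_len + free_space) AS approx_reclaimable,
       round(100.0 * (dead_tuple_len + free_space) / table_len, 2) AS reclaimable_pct
FROM pgstattuple('my_table');

Since VACUUM FULL keeps the old file until the new copy has been completely written, the free disk space you actually need is roughly the live-data size (total minus reclaimable, plus the rebuilt indexes), not the full current table size. That is why a 20 GB table can sometimes be rewritten with only 15 GB free.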

Secondly, if you are in a production environment, I would suggest using pg_repack to reclaim the disk space without holding an exclusive lock on your table.

pg_repack is a PostgreSQL extension which lets you remove bloat from tables and indexes, and optionally restore the physical order of clustered indexes. Unlike CLUSTER and VACUUM FULL it works online, without holding an exclusive lock on the processed tables during processing. pg_repack is efficient to boot, with performance comparable to using CLUSTER directly.

For instance:

/usr/pgsql-11/bin/pg_repack -d postgres -U postgres -n -t my_table  &
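
One caveat worth noting: according to the pg_repack documentation, a full-table repack still needs free disk space of roughly twice the size of the target table and its indexes, since it too builds a new copy before dropping the old one, so the reclaimable-space estimate above is still worth checking first.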