Does SELECT remove dead rows like VACUUM does?
-
05-02-2021 - |
Вопрос
I was fiddling with VACUUM
and noticed some unexpected behavior where SELECT
ing rows from a table seems to reduce the work VACUUM
has to do afterwards.
Test Data
Note: autovacuum is disabled
CREATE TABLE numbers (num bigint);
ALTER TABLE numbers SET (
autovacuum_enabled = 'f',
toast.autovacuum_enabled = 'f'
);
INSERT INTO numbers SELECT generate_series(1, 5000);
Trial 1
Now we run an update on all the rows,
UPDATE numbers SET num = 0;
And when we run VACUUM (VERBOSE) numbers;
we get,
INFO: vacuuming "public.numbers"
INFO: "numbers": removed 5000 row versions in 23 pages
INFO: "numbers": found 5000 removable, 5000 nonremovable row versions in 45 out of 45 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 6585
There were 0 unused item pointers.
Trial 2
Now we issue another UPDATE
, but this time we add a SELECT
afterward,
UPDATE numbers SET num = 1;
SELECT * FROM numbers;
And when we run VACUUM (VERBOSE) numbers;
we get,
INFO: vacuuming "public.numbers"
INFO: "numbers": removed 56 row versions in 22 pages
INFO: "numbers": found 56 removable, 5000 nonremovable row versions in 45 out of 45 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 6586
There were 56 unused item pointers.
What exactly is happening here? Why does the second version I run, after the SELECT
remove dead tuples from the pages it visits, quite like VACUUM
does?
I am running Postgres 11.3 on macOS 10.14.5.
Решение
From this post on /r/PostgreSQL to an answer by Laurenz Albe it seems that Heap Only Tuples (HOT) updates may be responsible. From the description of HOT updates in src/backend/access/heap/README.HOT
Effectively, space reclamation happens during tuple retrieval when the page is nearly full (<10% free) and a buffer cleanup lock can be acquired. This means that
UPDATE
,DELETE
, andSELECT
can trigger space reclamation, but often not duringINSERT ... VALUES
because it does not retrieve a row.
The quote is not in the original answer, but the rest is a quote,
To support or refute this theory, run the following query:
SELECT n_tup_upd, n_tup_hot_upd FROM pg_stat_user_tables WHERE schemaname = 'public' AND relname = 'TABLE_NAME';
If
n_tup_hot_upd
is greater than zero, we have got a case.
Другие советы
In the special case of an unindexed table, yes, SELECT can do the same work as VACUUM (as far as removing dead rows is concerned).