Question

We are updating a datawarehouse on a weekly basis using a series of

TRUNCATE source_table1 
COPY source_table1 FROM [...]

... for data import and:

DROP TABLE IF EXISTS my_table
CREATE TABLE my_table AS SELECT [...]

for table updates.

We end our update process with a VACUUM FULL [VERBOSE] ANALYZE, because, as the documentation suggests, VACUUM should be done when a significant share of tuples has been updated or removed. Here, as it is a share of 100% for all tables, we reasonably thought that VACUUM should be applied.

As we see the ouptput of the verbose option, it seems that Postgresql has not much to do, as every table VACUUMed gives:

INFO:  vacuuming "public.table345"
INFO:  "table345": found 0 removable, 9831703 nonremovable row versions in 62538 pages
DETAIL : 0 dead row versions cannot be removed yet.

I guess that on the contrary ANALYZE is more than helpfull to update the internal statistics. Most tables are 10-100m rows big.

But we were wondering if VACUUM FULL or just VACUUM were really necessary in that case?

(Or may be the whole update process (DROP / CREATE TABLE AS) is not the right way to do?)

Was it helpful?

Solution

Point about TRUNCATE/COPY

Unrelated note, depending on the WAL level, wrapping the TRUNCATE and COPY in the same transaction things may go faster, because WAL will be skipped. Further the CTAS will always skip most WAL.

In minimal level, WAL-logging of some bulk operations can be safely skipped, which can make those operations much faster (see Section 14.4.7). Operations in which this optimization can be applied include source

  • CREATE TABLE AS
  • CREATE INDEX
  • CLUSTER
  • COPY into tables that were created or truncated in the same transaction
    BEGIN;
      TRUNCATE source_table1 
      COPY source_table1 FROM [...]
    COMMIT;

Issue of VACUUM FULL

  1. There is no need to do a VACUUM FULL on a new table. The table is already new in those transactions so there is no need to rewrite it, as FULL does. There is also no need to VACUUM it because there are no movable rows that VACUUM will act on. When you run VACUUM FULL VERBOSE, you can see nothing was removable, and no dead rows were removed. VACUUM FULL can be useful if

    1. you're staging INSERTS though a temp table, and adding them in batches into a dirty table. Then it can often pay off at the end of the process.
    2. you're running UPDATE on a large batch because UPDATE generates new rows forcing VACUUM to later mark the old rows as dead.
  2. A simple ANALYZE will work fine. This will update the statistics on the table.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top