Question

I created a backup of my database in a PostgreSQL 11.6 (with TimescaleDB 1.60 extension) using pg_dump:

PGPASSWORD=mypassword pg_dump -h 127.22.0.4 -p 5432 -U postgres -Z0 -Fc database_development

and restored it to a new server running the same versions of PostgreSQL 11.6 (with TimescaleDB 1.60 extension) using pg_restore. For the restore, I executed the following commands in psql as user postgres:

CREATE DATABASE database_development;
\c database_development
CREATE EXTENSION timescaledb;
SELECT timescaledb_pre_restore();

\! time pg_restore -Fc -d database_development /var/lib/postgresql/backups/database_development_2020-02-29

SELECT timescaledb_post_restore();

The original database was 389 GB, but the restored database was only 229 GB. These sizes were obtained by running:

select pg_size_pretty(pg_database_size('database_development'))

Some differences:

The old database is stored on an ext4 partition, while the new database is stored on a ZFS filesystem with compression disabled. Both database instances are running inside a Docker container with an Ubuntu 18.04 host.

Question: How can we explain the difference in database sizes? No errors were encountered during either the pg_dump or the pg_restore.

Solution

pg_dump only reads live tuples. Dead tuples are not included in the dump, yet they still occupy space in the original database, which accounts for the size difference.

Because the dump is a logical one, it only contains what is needed to recreate your live data; dead rows are invisible to it anyway. If your database sees many updates and deletes (in other words, it is highly transactional), it accumulates more dead row versions and needs aggressive vacuuming to keep bloat in check. If you compare the dead and live row counts on the original database with those on the restored one, you will see the difference.
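
As a rough sketch of that comparison, the query below could be run on both the original and the restored database. It reads PostgreSQL's pg_stat_user_tables statistics view; the counts are estimates, and the LIMIT of 20 is an arbitrary choice for illustration. With TimescaleDB, the hypertable chunks would be expected to appear under the _timescaledb_internal schema, though that is an assumption about this particular setup. Space that VACUUM has already freed for reuse but not returned to the operating system does not show up in n_dead_tup, so the total on-disk size is listed alongside the counts.

```sql
-- Estimated live vs. dead row versions per table, largest tables first.
-- Run this on both servers and compare the results table by table.
SELECT schemaname,
       relname,
       n_live_tup,                                        -- estimated live rows
       n_dead_tup,                                        -- estimated dead rows
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 20;                                                 -- arbitrary cutoff
```

Tables that are much larger on the original server while holding a similar number of live rows are the likely source of the missing gigabytes.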

Also, to be on the safe side, run a manual VACUUM ANALYZE on the database after a dump and restore. I have seen cases where wrong estimates led the planner to pick a suboptimal plan for queries.
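
A minimal sketch of that step, assuming you are connected in psql to the restored database from the question (for example after \c database_development):

```sql
-- With no table name given, VACUUM processes every table in the current
-- database that the current user is allowed to vacuum.
-- ANALYZE refreshes the planner statistics; VERBOSE prints per-table progress.
VACUUM (ANALYZE, VERBOSE);
```

VACUUM cannot run inside a transaction block, so issue it as a standalone statement rather than between BEGIN and COMMIT.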

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange