PostgreSQL logical replication: subscriber size discrepancy

https://dba.stackexchange.com/questions/258915

23-02-2021

Question

I have a PostgreSQL 11 database in Amazon RDS with a little over 2 TB of data in it, whose tables have incrementing integer ids that are nearing exhaustion. I've been vetting logical replication as a way to have the existing db sync data to a new db whose schema has been modified to use bigint ids. After a lot of due diligence everything looks OK, and I am about ready to make the cutover.

One thing I have noticed is that the subscriber database is smaller overall, even though many of its columns and indexes have been switched from int to bigint ids. Does anyone have any idea why this might be the case? This is the one question where I have not been able to confirm my guesses (fragmentation, etc.).

Using the queries described here, I have the following usage:

total: 2.171 TB in publisher, 2.020 TB in subscriber
indexes: 0.737 TB in publisher, 0.581 TB in subscriber
toast: 0.212 TB in publisher, 0.215 TB in subscriber
tables: 1.223 TB in publisher, 1.223 TB in subscriber

Solution

This is to be expected and is not a problem.

When logical replication starts, it copies the existing data from the source tables. The data in the target tables is therefore densely packed and consumes less space.

As replication proceeds, the target tables get modified and “dead tuples” will accumulate and be removed by autovacuum. The tables will become slightly “bloated” over time, which is normal and healthy.

Since you don't plan to use logical replication for a long time, the replica will probably never grow as big as the original.
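
The figures reported in the question can be broken down per category to see where the saving comes from. The following is a minimal sketch in Python; the names sizes_tb, total_tb and saved are illustrative and not part of any PostgreSQL tooling, and the numbers are copied from the question as posted.

```python
# Sizes in TB as reported in the question: (publisher, subscriber).
sizes_tb = {
    "indexes": (0.737, 0.581),
    "toast": (0.212, 0.215),
    "tables": (1.223, 1.223),
}
total_tb = (2.171, 2.020)


def saved(pair):
    """Space saved on the subscriber in TB (negative means it grew)."""
    publisher, subscriber = pair
    return round(publisher - subscriber, 3)


# Print the per-category difference, then the overall difference.
for name, pair in sizes_tb.items():
    print(f"{name:8s} {saved(pair):+.3f} TB")
print(f"{'total':8s} {saved(total_tb):+.3f} TB")
```

On these figures, essentially all of the difference (about 0.156 TB) sits in the indexes, while tables and toast are nearly unchanged. The per-category differences sum to 0.153 TB against a reported total difference of 0.151 TB; the small gap is presumably rounding in the reported values. One plausible reading, not confirmed by the source, is that the wider bigint columns roughly offset the denser packing of the tables, whereas the freshly built indexes are simply more compact than the aged ones on the publisher.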

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange