Question

I have a 13 GB database. When I back it up with this command:

pg_dump -f out.sql --format=p -b -o -O -x -U postgres mydb

the file out.sql is 53 GB, about 4x the size of the original database. Why does this happen, and how can I solve it?


Solution

It really depends on the types of data you have stored and the data types used, especially if you are simply dumping it into an SQL file. Compression aside, if your database is designed well, it should take up less space than the dump.

The int data type, for example, always uses 4 bytes. Each letter, space, full stop or comma in this paragraph is 1 byte in size, yet each number saved as a 32-bit int is always 4 bytes. That, however, allows for a range of roughly plus 2 billion to minus 2 billion, i.e. numbers that can be up to 10 digits long. In other words, if you dump that information in written form as you see here, any number over 9999, or below -999, is effectively 'larger' than the format it is saved in, so if the db is mostly numerical that could explain the discrepancy. (It also means that numbers 999 and below, or -99 and above, are effectively 'smaller' in written form. But digressions aside, there is always smallint (int2), and if you are really fussy I believe there is a tinyint extension, if that byte means so much to you.)
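You can see the difference for yourself in psql (a quick illustration, not part of the original question; pg_column_size reports the number of bytes a value occupies, length the number of characters in its text form):

SELECT pg_column_size(1234567890) AS int_bytes, length('1234567890') AS text_chars;
-- int_bytes = 4, text_chars = 10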

Another thought is perhaps a DNA database: if you only have the letters A, C, G and T, i.e. only 4 possibilities, then you can store each 'letter' in just 2 bits of information, and as a byte has 8 bits you can effectively store everything at a quarter of the size.
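As a back-of-the-envelope illustration in SQL (the 1,000,000-base sequence is purely hypothetical):

SELECT 1000000 AS text_bytes, 1000000 * 2 / 8 AS packed_bytes;
-- 1 byte per base as dump text vs 250000 bytes packed at 2 bits per base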

Either way, whatever forms the db uses (numbers, binaries, dates, floats, enum types, whatever), if the database isn't plain text then the likelihood is that the dump will be larger than the original.

OTHER TIPS

If size matters, why don't you use the custom format? --format=c

It uses compression by default.
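For example, adapting the command from the question (the file name out.dump and the target database mydb_restored are just placeholders; note that custom-format archives are restored with pg_restore rather than psql):

pg_dump --format=custom -b -O -x -U postgres -f out.dump mydb
pg_restore -U postgres -d mydb_restored out.dump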

I have a 13 GB database

What size is that? The size occupied by the data directory? That is very different from a dump. A dump is a textual representation of the data, so it is to be expected that it occupies more space than a binary representation (e.g. a timestamp field occupies 8 bytes internally, while in a dump it is represented as a string that will surely take more than 8 characters). On the other hand, the on-disk data also contains additional information (most importantly index data) that does not go into the dump. So comparing the binary size with the dump size is pretty meaningless.
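If you want a more meaningful comparison, ask PostgreSQL itself (a quick sketch; 'mydb' is the database name from the question, and the second query is only a rough approximation of table data without indexes):

SELECT pg_size_pretty(pg_database_size('mydb'));
SELECT pg_size_pretty(sum(pg_table_size(oid))) FROM pg_class WHERE relkind = 'r';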

pg_dump mydb -oOxU postgres | bzip2 > out.sql.bz2
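The command above pipes a plain-text dump straight through bzip2, so the file stays small without switching to the custom format. To restore it later (an illustration; the target database needs to exist first):

bunzip2 -c out.sql.bz2 | psql -U postgres mydb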
Licensed under: CC-BY-SA with attribution