Question

I made two compressed copies of my folder. The first was created with the command tar czf dir.tar.gz dir, which gives me an archive of ~16 KB. Then I tried another method: I first gzipped every file inside dir and then tarred the results:

gzip ./dir/*
tar cf dir.tar dir/*.gz

But the second method gave me a dir.tar of ~30 KB (almost double). Why is there such a big difference in size?


Solution

Because compression is generally more efficient on one large input than on many small files. Suppose you have gzipped 100 files of 1 KB each. Each file gets its own compression, plus the overhead of the gzip format.

file1   -> file1.gz    (assume ~30 bytes of header/trailer)
file2   -> file2.gz    (assume ~30 bytes of header/trailer)
...
file100 -> file100.gz  (assume ~30 bytes of header/trailer)
------------------------------
30 * 100 = ~3 KB of overhead.

But if you compress one 100 KB tar file (which contains your 100 files), the gzip format overhead is added only once instead of 100 times, and the compression ratio can be better.
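To see the effect yourself, here is a minimal sketch that repeats both methods from the question on generated data. The file names, the file count and the use of seq to produce the content are assumptions made for illustration; the exact byte counts will vary by system, so none are quoted here.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory so nothing real is touched.
work=$(mktemp -d)
cd "$work"
mkdir dir

# Create 100 small text files (hypothetical sample data).
i=1
while [ "$i" -le 100 ]; do
    seq 1 200 > "dir/file$i.txt"
    i=$((i + 1))
done

# Method 1: tar everything, then gzip the single archive.
tar czf dir.tar.gz dir
whole=$(($(wc -c < dir.tar.gz)))

# Method 2: gzip every file on its own, then tar the .gz files uncompressed.
gzip dir/*
tar cf dir.tar dir/*.gz
per_file=$(($(wc -c < dir.tar)))

echo "tar then gzip: $whole bytes"
echo "gzip then tar: $per_file bytes"
```

On data like this the second number should come out clearly larger, for the reasons given above.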

Other tips

The difference comes from per-file metadata overhead and from suboptimal compression when gzip processes files individually: gzip never sees the data as a whole, and its dictionary is reset after each file, so redundancy shared between files cannot be exploited.
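The compression part of this can be isolated from tar entirely. The sketch below, using made-up file names and seq-generated content as stand-ins for real data, compares the summed size of individually gzipped files with the size of the same data gzipped as one stream.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# 50 small text files with similar but not identical content (sample data).
i=1
while [ "$i" -le 50 ]; do
    seq "$i" "$((i + 200))" > "$work/part$i.txt"
    i=$((i + 1))
done

# Sum of the compressed sizes when each file is gzipped on its own.
separate=0
for f in "$work"/part*.txt; do
    n=$(gzip -c < "$f" | wc -c)
    separate=$((separate + n))
done

# Compressed size when all files go through gzip as a single stream.
together=$(($(cat "$work"/part*.txt | gzip -c | wc -c)))

echo "gzipped separately: $separate bytes"
echo "gzipped together:   $together bytes"
```

Because the single stream lets gzip reuse what it has already seen in earlier files, the second figure should be noticeably smaller.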

tar cf creates an uncompressed archive, so the archive should be about the same size as your directory, or even somewhat larger, because tar stores a header block for every file and pads file data to 512-byte blocks.
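As a rough illustration of how an uncompressed archive can end up larger than its contents, the following sketch archives a single one-byte file. The file name tiny.txt is hypothetical, and the final archive size depends on the tar implementation's padding, so only the block alignment is checked.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# A one-byte file (hypothetical example input).
printf 'x' > "$work/tiny.txt"

# Archive it without compression.
tar cf "$work/tiny.tar" -C "$work" tiny.txt

file_size=$(($(wc -c < "$work/tiny.txt")))
tar_size=$(($(wc -c < "$work/tiny.tar")))

echo "file:    $file_size byte(s)"
echo "archive: $tar_size bytes"
# tar archives are made of 512-byte blocks, so the remainder should be 0.
echo "archive size modulo 512: $((tar_size % 512))"
```

With many small .gz files, this per-file block overhead adds up quickly, which is consistent with the larger dir.tar in the question.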

tar czf additionally runs the archive through gzip compression.

You can check this by running man tar at a shell prompt on Linux:

   -z, --gzip, --gunzip, --ungzip
          filter the archive through gzip
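In other words, tar czf behaves like piping an uncompressed tar stream through gzip. A small sketch of that equivalence, with a hypothetical demo directory, compares the listings of both archives:

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
mkdir "$work/demo"
printf 'hello\n' > "$work/demo/a.txt"
printf 'world\n' > "$work/demo/b.txt"

# One step: let tar filter the archive through gzip itself.
tar czf "$work/one-step.tar.gz" -C "$work" demo

# Two steps: write the archive to stdout and compress it with gzip.
tar cf - -C "$work" demo | gzip > "$work/two-step.tar.gz"

# Both archives should list the same members.
list_a=$(tar tzf "$work/one-step.tar.gz" | sort)
list_b=$(gzip -dc "$work/two-step.tar.gz" | tar tf - | sort)

if [ "$list_a" = "$list_b" ]; then
    echo "same contents"
else
    echo "contents differ"
fi
```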
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow