From the ZIP format ..
Assuming that there is only one central directory and no comments and no extra fields, the overhead should be similar to the following. (The overhead will only go up if any additional metadata is added.)
- Per file (Local file header) - 30+len(filename)
- Per file (Data descriptor) - 12 (or 16 with the optional signature)
- Per file (Central directory header) - 46+len(filename)
- Per archive (EOCD) - 22
So the overhead, where afn is the average length of all file names, and f is the number of files:
f * ((30 + afn) + 12 + (46 + afn)) + 22
= f * (88 + 2 * afn) + 22
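The formula can be written as a small helper to make the individual terms easier to check. This is only a sketch under the same assumptions as above (no comments, no extra fields, one central directory); the function name `zip_overhead` and the `descriptor_size` parameter are made up for illustration.

```python
def zip_overhead(num_files, avg_name_len, descriptor_size=12):
    """Estimate ZIP container overhead in bytes.

    Assumes no comments and no extra fields. descriptor_size is 12,
    or 16 if the writer emits the optional data descriptor signature.
    """
    local_header = 30 + avg_name_len     # per-file local file header
    central_header = 46 + avg_name_len   # per-file central directory header
    eocd = 22                            # one end-of-central-directory record
    per_file = local_header + descriptor_size + central_header
    return num_files * per_file + eocd


print(zip_overhead(1, 20))    # 150
print(zip_overhead(100, 14))  # 11622
```

Passing `descriptor_size=0` approximates writers that omit the data descriptor entirely.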
This of course makes ZIP a very poor choice for very tiny bits of compressed data where a (file) structure or metadata is not required - zlib, on the other hand, is a very thin Deflate wrapper.
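To get a feel for how thin the zlib wrapper is, the following sketch compares a zlib stream with a raw Deflate stream of the same input, using Python's standard `zlib` module. The sample input is arbitrary; the zlib format adds a 2-byte header and a 4-byte Adler-32 checksum around the Deflate data.

```python
import zlib

data = b"hello world"

# zlib stream: 2-byte header + Deflate data + 4-byte Adler-32 checksum
zlib_stream = zlib.compress(data, 9)

# Raw Deflate stream: a negative wbits value suppresses the zlib header and trailer
raw = zlib.compressobj(9, zlib.DEFLATED, -15)
raw_stream = raw.compress(data) + raw.flush()

zlib_wrapper_overhead = len(zlib_stream) - len(raw_stream)
print(zlib_wrapper_overhead)  # 6
```

Compare those 6 bytes with the 150 bytes of container overhead for a single ZIP entry with a 20-character name.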
For small payloads, a poor Deflate implementation may also result in a significantly larger "compressed" size, such as the notorious .NET implementation ..
Examples:
Storing 1 file, with name "hello world note.txt" (len = 20),
= 1 * (88 + 2 * 20) + 22 = 150 bytes overhead
Storing 100 files, with an average name of 14 letters,
= 100 * (88 + 2 * 14) + 22 = 11622 bytes overhead
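These figures can be sanity-checked empirically. The sketch below uses Python's standard `zipfile` module with `ZIP_STORED` (no compression), so the archive size minus the payload size is pure container overhead. The helper name `measure_zip_overhead` is hypothetical. Note that writers differ: as far as I can tell, `zipfile` skips the data descriptor when writing to a seekable target such as `BytesIO`, so expect a result somewhat below the 150 bytes computed above (roughly 12 bytes less per file).

```python
import io
import zipfile


def measure_zip_overhead(names, payload=b"x"):
    """Return archive size minus payload bytes for a stored (uncompressed) ZIP."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name in names:
            zf.writestr(name, payload)
    # Everything that is not payload is header/directory overhead
    return len(buf.getvalue()) - len(payload) * len(names)


overhead = measure_zip_overhead(["hello world note.txt"])
print(overhead)
```

Other tools may add extra fields (timestamps, Unix attributes), which pushes the measured overhead above the formula.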