Question

I have the first two-thirds (contiguous) of a file that was compressed with zlib's deflate() function; the last third was lost in transmission. The original uncompressed file was 600KB.

The transmitter called deflate() repeatedly, feeding the original file in 2KB chunks with Z_NO_FLUSH and passing Z_FINISH at the end of the file. The resulting complete compressed file was transmitted, but partially lost as described.
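To experiment with recovery it can help to reproduce the situation first. The sketch below assumes Python 3; the function name `transmit` and the sample data are made up for illustration. `zlib.compressobj().compress()` calls deflate() with Z_NO_FLUSH for each 2KB chunk, `flush(zlib.Z_FINISH)` ends the stream, and then only the first two-thirds of the compressed bytes are kept.

```python
import zlib

CHUNK = 2048  # the transmitter's 2KB input chunks

def transmit(data, chunk=CHUNK):
    """Hypothetical reproduction of the transmitter: repeated deflate()
    calls with Z_NO_FLUSH, then Z_FINISH at the end of the file."""
    comp = zlib.compressobj()
    parts = []
    for start in range(0, len(data), chunk):
        # Compress.compress() feeds deflate() with Z_NO_FLUSH
        parts.append(comp.compress(data[start:start + chunk]))
    # flush(Z_FINISH) emits the rest of the stream plus its Adler-32 trailer
    parts.append(comp.flush(zlib.Z_FINISH))
    return b"".join(parts)

# Made-up stand-in for the ~600KB original file
original = b"".join(("line %d of the original file\n" % i).encode("ascii")
                    for i in range(20000))
stream = transmit(original)
# Keep only the first two-thirds, as in the question
received = stream[:2 * len(stream) // 3]
```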

Is it possible to recover part of the original file? If so, any suggestions on how?

I'm using the plain C implementation of zlib and/or the Python 2.7 zlib module.


Solution

Though I don't know Python, I managed to get this to work:

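#!/usr/bin/python
# Usage: python <script> truncated_input_file output_file
import sys
import zlib

z = zlib.decompressobj()
with open(sys.argv[1], "rb") as f, open(sys.argv[2], "wb") as g:
    while True:
        # unconsumed_tail is only non-empty when decompress() is given a
        # max_length; here it stays empty, so each pass reads new input.
        buf = z.unconsumed_tail
        if buf == b"":
            buf = f.read(8192)
            if buf == b"":
                break  # end of the truncated input file
        got = z.decompress(buf)
        if got == b"":
            break  # nothing more was decoded from this input
        # Write each piece as soon as it is decoded, so a truncated
        # stream still leaves everything before the cut in the output.
        g.write(got)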

That should extract all that's available from your partial zlib file.
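If the compressed data is already in memory, the same idea can be written as a function. This is a sketch assuming Python 3; `recover_prefix` and its `chunk_size` default are illustrative names, not part of zlib.

```python
import zlib

def recover_prefix(compressed, chunk_size=8192):
    """Hypothetical in-memory variant of the loop above: feed a possibly
    truncated zlib stream to one decompressobj in chunks and return
    everything it managed to decode."""
    z = zlib.decompressobj()
    parts = []
    for start in range(0, len(compressed), chunk_size):
        # The object keeps its state between calls, so output from earlier
        # chunks stays valid even if the stream stops abruptly later on.
        parts.append(z.decompress(compressed[start:start + chunk_size]))
    return b"".join(parts)
```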

OTHER TIPS

Update: As @Mark Adler pointed out, partial content can be decompressed using zlib.decompressobj:

>>> decompressor = zlib.decompressobj()
>>> decompressor.decompress(part)
"let's compress some t"

where part is defined below.
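On Python 3.3 and later, the decompression object's `eof` attribute also tells you whether the end of the stream was actually reached. A sketch of the same session in Python 3 syntax (bytes literals and `//` division), assuming nothing beyond the standard zlib module:

```python
import zlib

text = b"let's compress some text"
compressed = zlib.compress(text)
part = compressed[:3 * len(compressed) // 4]  # // keeps this an int on Python 3

d = zlib.decompressobj()
recovered = d.decompress(part)  # whatever could be decoded from the prefix
# eof only becomes True once the end-of-stream marker has been read,
# so it distinguishes "complete" from "cut off".
was_truncated = not d.eof
```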

--- Old comment follows:

By default, Python's one-shot zlib decompression doesn't handle partial content.

This works:

>>> compressed = "let's compress some text".encode('zip')
>>> compressed
'x\x9c\xcbI-Q/VH\xce\xcf-(J-.V(\xce\xcfMU(I\xad(\x01\x00pX\t%'
>>> compressed.decode('zip')
"let's compress some text"

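The `.encode('zip')` spelling above is Python 2 only. On Python 3 (3.4 or later, as far as I know) the same bytes-to-bytes codec is reached through the `codecs` module; a minimal sketch:

```python
import codecs

text = b"let's compress some text"
# Python 3 has no str.encode('zip'); the zlib codec is bytes-to-bytes and is
# reached through codecs.encode/decode ("zip" is an alias of "zlib_codec").
compressed = codecs.encode(text, "zip")
restored = codecs.decode(compressed, "zip")
```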
It doesn't work if we truncate it:

>>> part = compressed[:3*len(compressed)/4]
>>> part.decode('zip')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File ".../lib/python2.7/encodings/zlib_codec.py", line 43, in zlib_decode
    output = zlib.decompress(input)
error: Error -5 while decompressing data: incomplete or truncated stream

The same if we use zlib explicitly:

>>> import zlib
>>> zlib.decompress(compressed)
"let's compress some text"
>>> zlib.decompress(part)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
error: Error -5 while decompressing data: incomplete or truncated stream
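Since zlib.decompress() is all-or-nothing, one pattern, sketched here for Python 3 with a hypothetical helper name `decompress_or_salvage`, is to try the strict call first and fall back to a decompression object when it raises. Genuinely corrupt data, as opposed to merely truncated data, will usually still raise in the fallback.

```python
import zlib

def decompress_or_salvage(data):
    """Hypothetical helper: return (output, complete).

    complete is False when only a prefix of the stream could be recovered.
    """
    try:
        return zlib.decompress(data), True
    except zlib.error:
        # zlib.decompress() discards everything on error; a decompressobj
        # returns whatever it decoded before the input ran out.
        return zlib.decompressobj().decompress(data), False
```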

The following seems doable in theory, but it needs some tinkering with the low-level zlib routines. At http://www.zlib.net/zlib_how.html there is an example program, zpipe.c, and its line-by-line description says:

CHUNK is simply the buffer size for feeding data to and pulling data from the zlib routines. Larger buffer sizes would be more efficient, especially for inflate(). If the memory is available, buffers sizes on the order of 128K or 256K bytes should be used.

#define CHUNK 16384
...

Here is my suggestion: set the buffer very small, perhaps even to a single byte if that is supported. That way you will decompress as much as possible right up to the inevitable Z_BUF_ERROR. At that point the gathered data is usually discarded (look for premature inflateEnd() calls that "clean up" behind your back), but in your case you could simply stream the output to a file and close it when you find you can't go on.
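Python's `decompressobj` offers a comparable knob: the second argument of `decompress()` (`max_length`) caps how much output one call returns, and input held back because of the cap is kept in `unconsumed_tail`. The sketch below assumes Python 3.3+ (for `eof`); `inflate_small_buffer`, `in_chunk` and `out_cap` are made-up names. It writes every capped piece straight to the output file and calls `flush()` at the end to drain anything zlib decoded but had no room to return.

```python
import io
import zlib

def inflate_small_buffer(src, dst, in_chunk=2048, out_cap=64):
    """Hypothetical helper: decompress a possibly truncated zlib stream from
    file object src into file object dst, never taking more than out_cap
    bytes of output per call. Returns True if the stream end was reached."""
    z = zlib.decompressobj()
    while True:
        buf = src.read(in_chunk)
        if not buf:
            break  # input exhausted
        while buf:
            # max_length = out_cap acts like a tiny output CHUNK
            dst.write(z.decompress(buf, out_cap))
            # Input zlib could not process yet because the cap was reached
            buf = z.unconsumed_tail
    # Drain output zlib has decoded internally but not yet returned
    dst.write(z.flush())
    return z.eof

# Usage sketch with in-memory files standing in for the real ones
original = b"".join(("line %d\n" % i).encode("ascii") for i in range(5000))
stream = zlib.compress(original)
out = io.BytesIO()
complete = inflate_small_buffer(io.BytesIO(stream[:2 * len(stream) // 3]), out)
recovered = out.getvalue()
```

A 1-byte cap, as suggested above, also works but makes one decompress() call per output byte.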

The last few bytes of output may contain garbage if the wrong "final" symbol got decoded, or zlib may stop early rather than output a partial symbol. But you know your data is going to be incomplete anyway, so that should not be a problem.

Licensed under: CC-BY-SA with attribution