Question

I'm trying to uncompress data that was compressed using the zlib library written by Jean-loup Gailly back in the 1990s. I think it is a popular library (I see a lot of programs that ship the zlib32.dll file it uses), so I hope someone will be familiar enough with it to help me. I am calling the compress() function directly, which, from what I have read, uses the RFC 1951 DEFLATE format.

Here is a segment of the code I am using to read some compressed data from a stream and uncompress it:

InputStream is = new ByteArrayInputStream(buf);
//GZIPInputStream gzis = new GZIPInputStream(is);
InflaterInputStream iis = new InflaterInputStream(is);
byte[] buf2 = new byte[uncompressedDataLength];
// Note: a single read() may return fewer bytes than buf2.length.
iis.read(buf2);

The iis.read(buf2) call throws an internal exception of "Data Format Error". I also tried GZIPInputStream, but that throws the same exception.

The "buf" variable is type byte[] and I have confirmed by debugging that it is the same as what my C program gets back from the ZLIB compress() function (the actual data comes from a server over TCP). "uncompressedDataLength" is the known size of the uncompressed data that was also provided by the C program (server).

Has anyone tried reading/writing data using this library and then reading/writing the same data on Android using Java?

I did find a "pure Java port of ZLIB" referenced in a few places, and if I need to I can try that, but I would rather use the built-in/OS functions if possible.


Solution

The data formats deflate, zlib and gzip in play here are all related.

  • The base is the deflate compressed data format, defined in RFC 1951. A raw deflate stream carries no header and no integrity check, so it is rarely used on its own; we usually put a wrapping format around it.

  • The gzip compressed data format (RFC 1952) is intended for compression of files. It consists of a header which has space for a file name and some attributes, a deflate data stream, and a trailer holding a CRC-32 checksum (4 bytes) followed by the uncompressed size modulo 2^32 (4 bytes). (The specification also allows several such members to be concatenated in one stream, but I think this isn't used as often.)

  • The zlib compressed data format (RFC 1950) consists of a smaller header (2 bytes, or 6 bytes when a preset dictionary identifier is present), a deflate data stream, and an Adler-32 checksum (4 bytes) at the end. (The Adler-32 checksum is intended to be faster to calculate than the CRC-32 checksum used in gzip.) It is intended for compressed transmission of data inside other protocols, or compressed storage inside other file formats. For example, it is used inside the PNG file format.
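Since the three formats differ mainly in their first bytes, a quick look at the start of the buffer can help when debugging a "Data Format Error". The following is only a minimal sketch: the class and method names are made up for illustration, and the check is a heuristic (a raw deflate stream can, by coincidence, begin with bytes that look like a zlib header).

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

class CompressedFormatSniffer {
    /** Guesses the wrapper format from the first two bytes of a buffer (heuristic only). */
    static String guessFormat(byte[] data) {
        if (data == null || data.length < 2) {
            return "unknown";
        }
        int b0 = data[0] & 0xFF;
        int b1 = data[1] & 0xFF;
        // gzip (RFC 1952): every member starts with the magic bytes 0x1f 0x8b.
        if (b0 == 0x1F && b1 == 0x8B) {
            return "gzip";
        }
        // zlib (RFC 1950): compression method 8 in the low nibble, window size
        // code at most 7 in the high nibble, and the two header bytes read as a
        // big-endian 16-bit number must be a multiple of 31.
        if ((b0 & 0x0F) == 8 && (b0 >> 4) <= 7 && ((b0 << 8) | b1) % 31 == 0) {
            return "zlib";
        }
        return "raw deflate or other";
    }

    public static void main(String[] args) {
        // Compress a few bytes with the default Deflater and inspect the header.
        Deflater deflater = new Deflater();
        deflater.setInput("hello hello hello".getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        byte[] out = new byte[64];
        deflater.deflate(out);
        deflater.end();
        System.out.printf("first bytes: %02x %02x -> %s%n",
                out[0] & 0xFF, out[1] & 0xFF, guessFormat(out));
    }
}
```

If the buffer received from the server does not start with something this check recognises as zlib, the bytes probably include extra framing from the transport, or were produced by a different call than compress().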

The zlib library supports all these formats. Java's java.util.zip is built on zlib (as part of the VM's implementation, through native calls), and exposes access to these formats through several classes:

  • The Deflater and Inflater classes implement, depending on the nowrap argument to the constructor, either the zlib format (nowrap = false, the default) or the raw deflate format (nowrap = true).

  • DeflaterOutputStream/DeflaterInputStream/InflaterInputStream/InflaterOutputStream build on a Deflater/Inflater. The documentation doesn't say clearly whether the default Inflater/Deflater implements zlib or deflate, but the source shows that it uses the default Deflater or Inflater constructor, which implements zlib.

  • GZIPOutputStream/GZIPInputStream implement, as the name says, the gzip format.
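The mapping between these classes and the three formats can be sketched as a set of round trips. This is an illustrative example rather than a reference implementation: the class and helper names are invented, and the extra padding byte in rawDecompress follows the note in the Inflater Javadoc about nowrap mode.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

class WrapperFormatsDemo {
    // Pushes data through a compressing stream and returns what reached the sink.
    private static byte[] drainTo(OutputStream compressor, ByteArrayOutputStream sink, byte[] data)
            throws IOException {
        try (OutputStream out = compressor) {
            out.write(data);
        }
        return sink.toByteArray();
    }

    // Reads a decompressing stream until end of stream.
    private static byte[] readAll(InputStream decompressor) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        byte[] chunk = new byte[512];
        try (InputStream in = decompressor) {
            int n;
            while ((n = in.read(chunk)) != -1) {
                result.write(chunk, 0, n);
            }
        }
        return result.toByteArray();
    }

    static byte[] zlibCompress(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        return drainTo(new DeflaterOutputStream(sink), sink, data); // default Deflater: zlib wrapper
    }

    static byte[] rawCompress(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Deflater raw = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // nowrap = true
        try {
            return drainTo(new DeflaterOutputStream(sink, raw), sink, data);
        } finally {
            raw.end();
        }
    }

    static byte[] gzipCompress(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        return drainTo(new GZIPOutputStream(sink), sink, data);
    }

    static byte[] zlibDecompress(byte[] z) throws IOException {
        return readAll(new InflaterInputStream(new ByteArrayInputStream(z)));
    }

    static byte[] rawDecompress(byte[] raw) throws IOException {
        // The Inflater Javadoc asks for one extra dummy input byte in nowrap mode.
        byte[] padded = Arrays.copyOf(raw, raw.length + 1);
        Inflater inflater = new Inflater(true);
        try {
            return readAll(new InflaterInputStream(new ByteArrayInputStream(padded), inflater));
        } finally {
            inflater.end();
        }
    }

    static byte[] gzipDecompress(byte[] gz) throws IOException {
        return readAll(new GZIPInputStream(new ByteArrayInputStream(gz)));
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        System.out.println("raw deflate: " + rawCompress(data).length + " bytes");
        System.out.println("zlib:        " + zlibCompress(data).length + " bytes");
        System.out.println("gzip:        " + gzipCompress(data).length + " bytes");
    }
}
```

Each decompressor only accepts its own format, so feeding zlib output to GZIPInputStream (or the reverse) is expected to fail.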

I had a look at the source code of zlib's compress function, and it seems to use the zlib format. So your code should do the right thing. Make sure there is no missing data, or additional data which is not part of the compressed data block before or after it.
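One way to check for missing or surplus bytes is to drive an Inflater directly and inspect its state afterwards. The sketch below assumes the buffer should hold exactly one complete zlib stream and that the uncompressed length is known, as in the question; ZlibBufferCheck and inflateExactly are hypothetical names, not part of any library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class ZlibBufferCheck {
    /**
     * Inflates a zlib-wrapped buffer and verifies that it is exactly one
     * complete stream that expands to expectedLength bytes.
     */
    static byte[] inflateExactly(byte[] compressed, int expectedLength) throws DataFormatException {
        Inflater inflater = new Inflater(); // default constructor: expects the zlib wrapper
        try {
            inflater.setInput(compressed);
            // One spare byte so that output beyond the expected size is noticed.
            byte[] out = new byte[expectedLength + 1];
            int total = 0;
            while (!inflater.finished() && total < out.length) {
                int n = inflater.inflate(out, total, out.length - total);
                if (n == 0) {
                    if (inflater.needsDictionary()) {
                        throw new DataFormatException("stream requires a preset dictionary");
                    }
                    if (inflater.needsInput()) {
                        throw new DataFormatException("compressed data is truncated");
                    }
                }
                total += n;
            }
            if (!inflater.finished() || total != expectedLength) {
                throw new DataFormatException("uncompressed size differs from " + expectedLength);
            }
            if (inflater.getRemaining() != 0) {
                throw new DataFormatException(inflater.getRemaining() + " trailing byte(s) after the stream");
            }
            return Arrays.copyOf(out, expectedLength);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] data = "some payload, some payload".getBytes(StandardCharsets.UTF_8);
        // Stand-in for the server side: the default Deflater writes the zlib format.
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        byte[] buf = new byte[128];
        int n = deflater.deflate(buf);
        deflater.end();
        byte[] restored = inflateExactly(Arrays.copyOf(buf, n), data.length);
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```

The exception message then indicates whether the buffer was cut short, padded, or not zlib data at all, which narrows down where the transport code goes wrong.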

Disclaimer: This is the state for Java SE. I suppose it is similar for Android, but I can't guarantee this.

The jzlib library you found (I suppose), a Java reimplementation of zlib, also implements all these data formats (gzip was added in the latest update). For interactive use (on the compressing side) it is preferable, since it allows some flushing actions which, at the time of writing, were not possible with the java.util.zip classes (other than through a workaround such as changing the compression level). It might also be faster, since it avoids native calls (which always have some overhead).
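For readers on newer runtimes: since Java 7 the standard classes offer a sync-flush mode (Deflater.SYNC_FLUSH, and a DeflaterOutputStream constructor taking a syncFlush flag), so a flush can be requested without extra libraries. The sketch below illustrates that mode only; the class and method names are invented for the example, and availability on a given Android version is something to verify separately.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;

class SyncFlushDemo {
    /** Writes one message, flushes, and returns the bytes emitted before the stream is closed. */
    static byte[] compressAndFlush(byte[] message) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // syncFlush = true makes flush() push all pending compressed data to the sink.
        DeflaterOutputStream out = new DeflaterOutputStream(sink, true);
        out.write(message);
        out.flush();
        byte[] flushed = sink.toByteArray(); // snapshot taken while the stream is still open
        out.close();
        return flushed;
    }

    /** Inflates whatever can be decoded from a partial (unfinished) zlib stream. */
    static byte[] inflateAvailable(byte[] partialStream, int maxLength) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(partialStream);
            byte[] out = new byte[maxLength];
            int total = 0;
            int n;
            while (total < maxLength
                    && (n = inflater.inflate(out, total, maxLength - total)) > 0) {
                total += n;
            }
            return Arrays.copyOf(out, total);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws IOException, DataFormatException {
        byte[] message = "first interactive message".getBytes(StandardCharsets.UTF_8);
        byte[] flushed = compressAndFlush(message);
        byte[] decoded = inflateAvailable(flushed, 256);
        System.out.println(new String(decoded, StandardCharsets.UTF_8));
    }
}
```

The receiver can decode the whole message from the flushed bytes even though the zlib stream has not been finished, which is the property interactive protocols need.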

PS: The zip (or pkzip) file format is also related: It uses deflate internally for each file inside the archive.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow