Question

My problem is zip compression. I have to split a file into parts, compress them in parallel, then join the parts in the correct order and save the result as a zip archive containing one file. Splitting the file and sending the parts to hosts isn't a problem; I'm using jpvm. My question is: how do I split the compression? I've tried using java.util.zip.Deflater to compress every part (the result is a byte array) and then writing the arrays into one ZipOutputStream, but this doesn't seem to work: when saving to the file, it compresses the already compressed bytes once more. Do I have to compress every part with Deflater and then manually add a zip header, some checksum, or something like that? Does Deflater add any headers? I appreciate any help, thank you!


Solution

You need to use the nowrap option of Deflater to produce a raw deflate stream with no headers or trailers. Then you will need to wrap that raw deflate stream with the appropriate zip headers and trailers yourself.
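As a minimal sketch of that first step: passing `true` as the second argument of the `Deflater` constructor turns on nowrap, and the matching `Inflater(true)` reads the result back. The class and method names here (`RawDeflateSketch`, `rawDeflate`, `rawInflate`) are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class, not part of any library.
class RawDeflateSketch {

    /** Compresses input into a raw deflate stream: no zlib header, no Adler-32 trailer. */
    static byte[] rawDeflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // true = nowrap
        deflater.setInput(input);
        deflater.finish(); // no more input follows
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end(); // release native zlib memory
        return out.toByteArray();
    }

    /** Inflates a raw deflate stream; used here only to check the round trip. */
    static byte[] rawInflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater(true); // the reader must use nowrap as well
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!inflater.finished()) {
            int n = inflater.inflate(buf);
            if (n == 0 && !inflater.finished() && inflater.needsInput()) {
                throw new DataFormatException("truncated deflate stream");
            }
            out.write(buf, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }
}
```

Writing such bytes into a `ZipOutputStream` entry would compress them a second time, which matches the behavior described in the question.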

To create a single deflate stream on multiple processors, you need to be able to flush the compressed output to a byte boundary (for the pieces that are not the last piece) using the Z_SYNC_FLUSH operation in zlib. (The last piece would be finished normally.) Then the pieces can be simply concatenated.

The Java 7 (but not Java 6) API supports this through the four-argument deflate(byte[] b, int off, int len, int flush) method: set its flush parameter to Deflater.SYNC_FLUSH.
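A sketch of the flush-and-concatenate idea, assuming Java 7 or later. Each call to the hypothetical `deflatePiece` could run on a different host; only the last piece calls `finish()`, and the others end with a sync flush so that their output stops on a byte boundary. All names in this block are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class, not part of any library.
class SyncFlushPiecesSketch {

    /** Raw-deflates one piece. Non-final pieces end on a byte boundary via SYNC_FLUSH. */
    static byte[] deflatePiece(byte[] piece, boolean isLast) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // nowrap
        deflater.setInput(piece);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        if (isLast) {
            deflater.finish(); // only the last piece terminates the deflate stream
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
        } else {
            int n;
            do {
                // A completely filled buffer means the flush may be incomplete: call again.
                n = deflater.deflate(buf, 0, buf.length, Deflater.SYNC_FLUSH);
                out.write(buf, 0, n);
            } while (n == buf.length);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Concatenates the compressed pieces in their original order. */
    static byte[] concat(byte[][] pieces) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] p : pieces) {
            out.write(p, 0, p.length);
        }
        return out.toByteArray();
    }

    /** Inflates the concatenated stream as one raw deflate stream, to check the result. */
    static byte[] rawInflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater(true);
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!inflater.finished()) {
            int n = inflater.inflate(buf);
            if (n == 0 && !inflater.finished() && inflater.needsInput()) {
                throw new DataFormatException("truncated deflate stream");
            }
            out.write(buf, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }
}
```

The concatenated result is a single raw deflate stream, so it still needs the zip headers and trailers around it before it is a zip file.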

Breaking up the data in this way will degrade compression, since each piece cannot benefit from the history of the preceding piece. This can be solved using the setDictionary() method. Provide each thread with both the data to compress and the 32K bytes of uncompressed data that precede it. Then call setDictionary() with those 32K bytes before calling deflate().
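A sketch of the dictionary priming, assuming for simplicity that the worker can see the whole input array; in a distributed setup the 32K of preceding data would be shipped along with each piece instead. `DictionaryPieceSketch` and `deflatePiece` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Hypothetical helper class, not part of any library.
class DictionaryPieceSketch {
    static final int WINDOW = 32 * 1024; // deflate's maximum back-reference distance

    /** Raw-deflates data[start, end), primed with up to 32K of the data preceding start. */
    static byte[] deflatePiece(byte[] data, int start, int end, boolean isLast) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // nowrap
        int dictLen = Math.min(WINDOW, start);
        if (dictLen > 0) {
            // Must happen before the first deflate() call on this Deflater.
            deflater.setDictionary(data, start - dictLen, dictLen);
        }
        deflater.setInput(data, start, end - start);
        if (isLast) {
            deflater.finish(); // only the last piece terminates the stream
        }
        int flush = isLast ? Deflater.NO_FLUSH : Deflater.SYNC_FLUSH;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        do {
            n = deflater.deflate(buf, 0, buf.length, flush);
            out.write(buf, 0, n);
            // Last piece: loop until finished. Other pieces: loop while the buffer fills up.
        } while (isLast ? !deflater.finished() : n == buf.length);
        deflater.end();
        return out.toByteArray();
    }
}
```

The reader needs no dictionary call: once the pieces are concatenated, the bytes a back-reference points to are already in the inflater's window from the preceding piece.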

You can see pigz for an example of parallel compression in C using zlib directly.

Once you have your deflate stream, you wrap it appropriately to make it a zip file. See the appnote for the zip file format. You will also need to compute the CRC-32 of the uncompressed data to be able to fill in those fields.
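A sketch of the wrapping step for the simplest case: one entry, no ZIP64, no data descriptor, and everything small enough to hold in memory. `SingleEntryZipSketch` and `wrap` are hypothetical names, and the timestamp is a fixed placeholder. Here the CRC-32 is computed over the whole uncompressed input in one place rather than per piece.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper class; the field layout follows the zip appnote.
class SingleEntryZipSketch {

    /** Wraps a finished raw deflate stream as a zip archive with one entry. */
    static byte[] wrap(String name, byte[] rawDeflate, byte[] original) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        CRC32 crc32 = new CRC32();
        crc32.update(original);          // CRC-32 of the UNCOMPRESSED data
        int crc = (int) crc32.getValue();
        short version = 20;              // version 2.0 is enough for deflate
        short flags = 0x0800;            // bit 11: file name is UTF-8
        short method = 8;                // 8 = deflate
        short dosTime = 0;               // 00:00:00 (placeholder)
        short dosDate = 0x21;            // 1980-01-01 (placeholder)
        short nameLen = (short) nameBytes.length;

        int total = 30 + nameBytes.length + rawDeflate.length
                  + 46 + nameBytes.length + 22;
        ByteBuffer zip = ByteBuffer.allocate(total).order(ByteOrder.LITTLE_ENDIAN);

        // Local file header, followed by the compressed data.
        zip.putInt(0x04034b50).putShort(version).putShort(flags).putShort(method)
           .putShort(dosTime).putShort(dosDate)
           .putInt(crc).putInt(rawDeflate.length).putInt(original.length)
           .putShort(nameLen).putShort((short) 0)   // name length, extra field length
           .put(nameBytes)
           .put(rawDeflate);

        // Central directory header for the same entry.
        int centralOffset = zip.position();
        zip.putInt(0x02014b50).putShort(version).putShort(version)
           .putShort(flags).putShort(method).putShort(dosTime).putShort(dosDate)
           .putInt(crc).putInt(rawDeflate.length).putInt(original.length)
           .putShort(nameLen)
           .putShort((short) 0)   // extra field length
           .putShort((short) 0)   // comment length
           .putShort((short) 0)   // disk number start
           .putShort((short) 0)   // internal attributes
           .putInt(0)             // external attributes
           .putInt(0)             // offset of the local header
           .put(nameBytes);
        int centralSize = zip.position() - centralOffset;

        // End of central directory record.
        zip.putInt(0x06054b50)
           .putShort((short) 0).putShort((short) 0)   // disk numbers
           .putShort((short) 1).putShort((short) 1)   // entries on this disk, in total
           .putInt(centralSize).putInt(centralOffset)
           .putShort((short) 0);                      // archive comment length
        return zip.array();
    }
}
```

For files too large to hold in memory, the same fields would be written to an output stream instead, and inputs of 4 GiB or more need the ZIP64 extensions described in the appnote.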

Other tips

Unfortunately you did not show your code, so I cannot be sure I understood your problem exactly. As far as I understand it, though, I can recommend the following.

  1. Check the original file size and decide what will be the size of your chunk.
  2. Read the file until you reach the chunk size, writing the content into a zip using ZipOutputStream as you read. Give the resulting files suffixes that will let you join the content later; the suffix should be a running index. Since you want to store one file across several zip files, use one entry per zip.
  3. When reading the zip files, sort them by suffix (see step 2), read the single entry from each, and copy the bytes from the ZipInputStream to your FileOutputStream.
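The steps above can be sketched as follows, assuming Java 8 or later and a local file system. Note that this produces several zip files rather than one archive. The names `ChunkedZipSketch`, `split`, `join`, the `.partNNNNN.zip` naming scheme, and the entry name `chunk` are all made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of any library.
class ChunkedZipSketch {

    /** Writes each chunk of source into its own single-entry zip with an index suffix. */
    static List<Path> split(Path source, Path outDir, int chunkSize) throws IOException {
        List<Path> parts = new ArrayList<>();
        byte[] chunk = new byte[chunkSize];
        try (InputStream in = Files.newInputStream(source)) {
            int index = 0;
            int n;
            while ((n = readChunk(in, chunk)) > 0) {
                // Zero-padded index, so that sorting by name restores the order.
                String name = String.format("%s.part%05d.zip", source.getFileName(), index++);
                Path part = outDir.resolve(name);
                try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(part))) {
                    zip.putNextEntry(new ZipEntry("chunk"));
                    zip.write(chunk, 0, n);
                    zip.closeEntry();
                }
                parts.add(part);
            }
        }
        return parts;
    }

    /** Fills buf as far as possible; returns the number of bytes read (0 at end of file). */
    private static int readChunk(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) break;
            total += n;
        }
        return total;
    }

    /** Sorts the parts by file name and appends each part's only entry to target. */
    static void join(List<Path> parts, Path target) throws IOException {
        List<Path> sorted = new ArrayList<>(parts);
        sorted.sort(Comparator.comparing((Path p) -> p.getFileName().toString()));
        byte[] buf = new byte[8192];
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : sorted) {
                try (ZipInputStream zip = new ZipInputStream(Files.newInputStream(part))) {
                    if (zip.getNextEntry() == null) {
                        throw new IOException("no entry in " + part);
                    }
                    int n;
                    while ((n = zip.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                }
            }
        }
    }
}
```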

Unfortunately I did not understand exactly what you mean by multiple hosts. Do you mean that your file is so big that you create each zip on a separate machine simultaneously? If so, modify step 2 as follows: while reading a file fragment, send its content to the remote host and use ZipOutputStream there. To read the file from a specific point, use InputStream.skip().
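One caveat when seeking this way: `InputStream.skip()` may skip fewer bytes than requested, so a single call is not guaranteed to reach the fragment's start. A small sketch of a loop that does, with the hypothetical names `SkipSketch` and `skipFully`:

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class, not part of any library.
class SkipSketch {

    /** Skips exactly count bytes, or throws EOFException if the stream ends first. */
    static void skipFully(InputStream in, long count) throws IOException {
        long remaining = count;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
                continue;
            }
            // skip() made no progress: read one byte to tell end of stream from a stall.
            if (in.read() == -1) {
                throw new EOFException("stream ended with " + remaining + " bytes left to skip");
            }
            remaining--;
        }
    }
}
```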

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow