Question

My goal is to Base64-encode files and zip them into an archive in Java. I have to use Apache's Commons Codec library. I am able to encode and zip the files and it works fine, but when I decode a file back to its original form, it looks like it has not been completely encoded; a few parts seem to be missing. Can anybody tell me why this happens?

I am also attaching the relevant part of my code for reference so that you can guide me accordingly.

private void zip() {
    int BUFFER_SIZE = 4096;
    byte[] buffer = new byte[BUFFER_SIZE];

    try {
        // Create the ZIP file
        String outFilename = "H:\\OUTPUT.zip";
        ZipOutputStream out = new ZipOutputStream(new FileOutputStream(
                outFilename));

        // Compress the files
        for (int i : list.getSelectedIndices()) {
            System.out.println(vector.elementAt(i));
            FileInputStream in = new FileInputStream(vector.elementAt(i));
            File f = vector.elementAt(i);

            // Add ZIP entry to output stream.
            out.putNextEntry(new ZipEntry(f.getName()));

            // Transfer bytes from the file to the ZIP file
            int len;

            while ((len = in.read(buffer)) > 0) {
                buffer = org.apache.commons.codec.binary.Base64
                        .encodeBase64(buffer);
                out.write(buffer, 0, len);

            }

            // Complete the entry
            out.closeEntry();
            in.close();

        }

        // Complete the ZIP file
        out.close();
    } catch (IOException e) {
        System.out.println("caught exception");
        e.printStackTrace();
    }
}

Solution

Base64-encoded data is usually longer than the source, yet you are using the length of the source data when writing the encoded data to the output stream.

You have to use the size of the generated array instead of your variable len.

Second note: do not reassign buffer each time you encode a chunk. Just write the result to the output.

 while ((len = in.read(buffer)) > 0) {
     byte[] enc = Base64.encodeBase64(Arrays.copyOf(buffer, len));
     out.write(enc, 0, enc.length);
 }

UPDATE: Use Arrays.copyOf(...) to trim the input buffer to the number of bytes actually read before encoding.
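
As a quick illustration of the length mismatch, here is a minimal, self-contained sketch using commons-codec's Base64 (the class name LengthDemo and the sample string are just placeholders):

import java.nio.charset.StandardCharsets;
import org.apache.commons.codec.binary.Base64;

public class LengthDemo {
    public static void main(String[] args) {
        byte[] src = "hello world".getBytes(StandardCharsets.US_ASCII); // 11 source bytes
        byte[] enc = Base64.encodeBase64(src);                          // 16 encoded bytes
        System.out.println(src.length + " -> " + enc.length);           // prints "11 -> 16"
        // Writing only src.length bytes of enc would truncate the encoded data.
    }
}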

OTHER TIPS

Your main problem is that Base64 encoding cannot simply be applied block-wise (especially not with the Apache Commons implementation). The problem gets worse because you don't even know how large your blocks are, since that depends on how many bytes in.read(..) returns.

Therefore you have two alternatives:

  1. Load the complete file into memory and then apply the Base64 encoding (see the sketch after this list).
  2. Use an alternative Base64 encoder implementation that works stream-based (the Apache Batik project seems to contain such an implementation: org.apache.batik.util.Base64EncoderStream).
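
A minimal sketch of the first alternative, assuming commons-codec on the classpath and Java 7+ for Files.readAllBytes (the file names OUTPUT.zip and input.dat are placeholders):

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import org.apache.commons.codec.binary.Base64;

public class WholeFileZip {
    public static void main(String[] args) throws IOException {
        ZipOutputStream out = new ZipOutputStream(new FileOutputStream("OUTPUT.zip"));
        try {
            // Read the complete file into memory, then encode it in one call.
            byte[] content = Files.readAllBytes(Paths.get("input.dat"));
            byte[] enc = Base64.encodeBase64(content);

            out.putNextEntry(new ZipEntry("input.dat"));
            out.write(enc);
            out.closeEntry();
        } finally {
            out.close();
        }
    }
}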

When you read the file content into buffer you get len bytes. When you Base64-encode this you get more than len bytes, but you still write only len bytes to the file. This means that the last part of each encoded chunk will be truncated.

Also, if your read does not fill the entire buffer, you should not Base64-encode more than len bytes; otherwise the leftover bytes at the end of the buffer (zeros, or stale data from a previous read) get encoded along with the real data.

Combining the information above, this means that you must Base64-encode the whole file (read it all into a byte[]) unless you can guarantee that each chunk you encode fits exactly into a Base64-encoded message, i.e. that every chunk except the last is a multiple of three bytes. If your files are not very large, I would recommend reading the whole file; otherwise, a sketch of the chunking approach follows below.
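
If the files are too large to hold in memory, here is a sketch of the guarantee mentioned above: a buffer whose size is a multiple of three encodes to Base64 without intermediate '=' padding, so the encoded chunks can simply be concatenated. This is written as a drop-in replacement for the read loop in the question's zip() method (it additionally needs java.util.Arrays, java.io.InputStream and java.io.IOException imports); readFully is a hypothetical helper, not part of commons-codec:

// BUFFER_SIZE is a multiple of 3, so every full chunk encodes without padding
// and the concatenation of the encoded chunks equals the encoding of the whole file.
final int BUFFER_SIZE = 3 * 1024;
byte[] buffer = new byte[BUFFER_SIZE];
int len;
while ((len = readFully(in, buffer)) > 0) {
    byte[] chunk = (len == buffer.length) ? buffer : Arrays.copyOf(buffer, len);
    byte[] enc = Base64.encodeBase64(chunk);
    out.write(enc, 0, enc.length);
}

// Reads until the buffer is full or end of stream; returns the number of bytes read.
private static int readFully(InputStream in, byte[] buffer) throws IOException {
    int total = 0;
    while (total < buffer.length) {
        int n = in.read(buffer, total, buffer.length - total);
        if (n == -1) {
            break;
        }
        total += n;
    }
    return total;
}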

A smaller problem is that the read loop should probably check for "> -1", not "> 0", but in this case it does not make a difference.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow