Question

I have several .gz files in one directory. I want to combine them into one big .gz file, unzip it, and load it into HDFS.

For example, the repo contains the files a.gz, b.gz, and c.gz. I want to combine them into one file called d.gz, then unzip it and load it into HDFS. Each of these .gz files contains a CSV file.

To unzip it, I know I can use GZIPInputStream/GZIPOutputStream, but how do I combine the files into one big file in Java?

Please guide. Thanks in advance.


Solution

A gz file contains exactly one file. gzip compresses a single stream of data; it is not an archive format, so it's not meant to hold multiple separate files.
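If the goal is only to end up with one combined CSV in a single .gz (rather than keeping a.gz, b.gz, and c.gz as separate entries), one possible approach is to decompress each input and write everything into one gzip stream. The following is a minimal sketch using only java.util.zip; the class name GzipMerge and the method merge are hypothetical names chosen for illustration, and it assumes each CSV ends with a newline and that repeated header rows, if any, are acceptable or handled elsewhere.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of any library.
public class GzipMerge {

    /**
     * Decompresses each input .gz in the given order and recompresses
     * the concatenated bytes into a single gzip stream at `output`.
     */
    public static void merge(List<Path> inputs, Path output) throws IOException {
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(output))) {
            byte[] buffer = new byte[8192];
            for (Path input : inputs) {
                try (InputStream in = new GZIPInputStream(Files.newInputStream(input))) {
                    int n;
                    // Copy the decompressed bytes of this input into the shared output stream.
                    while ((n = in.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                }
            }
        }
    }
}
```

A call such as GzipMerge.merge(List.of(Path.of("a.gz"), Path.of("b.gz"), Path.of("c.gz")), Path.of("d.gz")) would produce a d.gz whose decompressed content is the three CSVs back to back. The original file names and boundaries are lost, which is why an archive format is needed when the files must stay separate.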

To bundle several files into one compressed file, the best way is to tar the files together and then gzip the resulting tar archive. The tar command-line tool can do both in a single operation (for example, tar -czf d.tar.gz a.gz b.gz c.gz). For Java, use jtar: https://code.google.com/p/jtar/

Alternatively, a ZIP file may be what you're looking for.
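As a rough illustration of the ZIP route, the sketch below bundles several files into one archive with java.util.zip.ZipOutputStream from the standard library. The class name ZipBundle and the method bundle are hypothetical, and the sketch assumes the input files have distinct file names, since each entry is named after its file and a duplicate entry name makes putNextEntry throw a ZipException.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of any library.
public class ZipBundle {

    /**
     * Writes every input file into `output` as its own ZIP entry,
     * using the file name (without directories) as the entry name.
     */
    public static void bundle(List<Path> inputs, Path output) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(output))) {
            for (Path input : inputs) {
                zip.putNextEntry(new ZipEntry(input.getFileName().toString()));
                // Stream the file's bytes into the current entry.
                Files.copy(input, zip);
                zip.closeEntry();
            }
        }
    }
}
```

Unlike a single gzip stream, the resulting archive keeps each file as a separate named entry, so the entries can be extracted individually later. Note that ZIP applies its own compression, so storing files that are already gzipped is unlikely to shrink them further.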

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow