Question

I have to archive HDFS files frequently. The files have to be compressed in bzip2 format from Java code. This is what I currently do:

  1. Move the input files to a local location with hdfs.moveToLocalFile.
  2. Compress them with the bzip2 command.
  3. Move the .bz2 files to another HDFS location with hdfs.moveFromLocalFile.

I'm using Hadoop 1.1.2. Is there an API that can bzip2 the files directly on HDFS, without the local copy and the external bzip2 step?

Also, I currently run the Linux shell command to compress the files. Can somebody show me how to do the bzip2 compression from Java code?


Solution

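Wrap the HDFS output stream in a BZip2Codec compression stream and copy the source file into it, so the data is compressed in flight and never touches the local disk:

import java.io.IOException;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.BZip2Codec;
import org.apache.hadoop.io.compress.CompressionOutputStream;

public void addFile(String source, String destination, Configuration conf) throws IOException, URISyntaxException {
    // Returns the file system named in the configuration (HDFS here), not the local one.
    FileSystem fs = FileSystem.get(conf);
    // Keep the source file name and append it to the destination directory.
    String fileName = source.substring(source.lastIndexOf('/') + 1);
    if (destination.charAt(destination.length() - 1) != '/') {
        destination = destination + "/" + fileName;
    } else {
        destination = destination + fileName;
    }
    BZip2Codec codec = new BZip2Codec();
    // getDefaultExtension() returns ".bz2".
    Path target = new Path(destination + codec.getDefaultExtension());

    CompressionOutputStream out = codec.createOutputStream(fs.create(target));

    // Stream the source through the codec; 'true' closes both streams when the copy finishes.
    IOUtils.copyBytes(fs.open(new Path(source)), out, 4096, true);
}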
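
To see where the compressed file ends up, the path handling in addFile can be tried on its own. The sketch below is only an illustration: the class Bz2TargetPath and the method targetPath are hypothetical names, the example paths are made up, and the ".bz2" suffix is hard-coded on the assumption that it matches what getDefaultExtension() returns.

```java
final class Bz2TargetPath {

    // Hypothetical helper mirroring the path logic of addFile:
    // destination directory + source file name + ".bz2".
    static String targetPath(String source, String destinationDir) {
        // Everything after the last '/' is the file name.
        String fileName = source.substring(source.lastIndexOf('/') + 1);
        // Add a separator only when the directory does not already end with one.
        String dir = destinationDir.endsWith("/") ? destinationDir : destinationDir + "/";
        return dir + fileName + ".bz2";
    }

    public static void main(String[] args) {
        // Both calls print /archive/events.log.bz2
        System.out.println(targetPath("/data/in/events.log", "/archive"));
        System.out.println(targetPath("/data/in/events.log", "/archive/"));
    }
}
```

So the destination argument is treated as a directory: the original file name is kept and the codec extension is appended to it.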
Licensed under: CC-BY-SA with attribution