How to configure Hadoop MapReduce mapper output compression if I use org.apache.hadoop.mapreduce (new) API?

StackOverflow https://stackoverflow.com/questions/17363060

Question

Is it possible to turn on mapper output compression with the new mapreduce API, and if so, how? I see a lot of examples that do this with the hadoop.mapred.JobConf API, but none for the mapreduce API.

If it is not configurable through the new API, is there anything I can do to get it to work?

Solution

You can use the following code to enable map output compression:

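import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;

// Turn on compression of intermediate map output and select the codec.
public static void enableMapOutputCompress(Job job) {
    job.getConfiguration().setBoolean("mapred.compress.map.output", true);
    job.getConfiguration().setClass("mapred.map.output.compression.codec",
            SnappyCodec.class, CompressionCodec.class);
}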

You can replace org.apache.hadoop.io.compress.SnappyCodec with another compression codec class, for example org.apache.hadoop.io.compress.GzipCodec or org.apache.hadoop.io.compress.LzoCodec.

I suggest using SnappyCodec.
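
As a rough sketch of what those two settings amount to: setBoolean and setClass store plain strings in the job configuration, so the same effect can be had by setting the keys directly. Hadoop 2 and later renamed the keys to mapreduce.map.output.compress and mapreduce.map.output.compress.codec, while the older mapred.* names shown above are kept as deprecated aliases. The class MapOutputCompressionProps and its build method below are hypothetical names made up for illustration, not part of Hadoop, and the example deliberately uses only the Java standard library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: builds the two map output
// compression settings as plain key/value strings.
class MapOutputCompressionProps {

    // hadoop2Names selects the Hadoop 2+ key names; otherwise the older
    // (now deprecated) mapred.* names are used.
    static Map<String, String> build(String codecClassName, boolean hadoop2Names) {
        Map<String, String> props = new LinkedHashMap<>();
        if (hadoop2Names) {
            props.put("mapreduce.map.output.compress", "true");
            props.put("mapreduce.map.output.compress.codec", codecClassName);
        } else {
            props.put("mapred.compress.map.output", "true");
            props.put("mapred.map.output.compression.codec", codecClassName);
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> props =
                build("org.apache.hadoop.io.compress.SnappyCodec", true);
        // In a real driver, copy each entry into the job instead of printing:
        //     props.forEach(job.getConfiguration()::set);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The same key/value pairs can also be passed on the command line as -D options when the driver is run through ToolRunner, which avoids hard-coding the codec choice.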

Licensed under: CC-BY-SA with attribution