Question

I am using Hadoop CDH 4.1.2, and my mapper program is essentially an echo of the input data. But on my job status page I see

FILE: Number of bytes written  3,040,552,298,327

is almost equal to

FILE: Number of bytes read 3,363,917,397,416

for the mappers, even though I have already set

conf.set("mapred.compress.map.output", "true");

It seems the compression is not being applied to my job's map output. Why is this?

Solution

Does your job have a reducer?

If so, check 'Reduce shuffle bytes'. If that is considerably less than 'Map output bytes' (1/5th or so), you can assume the map output is compressed. Compression happens after the map is done, so the counter may be showing the actual size of the data the map emitted, not the compressed size.
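If you prefer to check this programmatically rather than in the job status page, a minimal sketch along the following lines compares the two counters. The class and method names here are illustrative assumptions; `TaskCounter.MAP_OUTPUT_BYTES` and `TaskCounter.REDUCE_SHUFFLE_BYTES` are the standard MapReduce counter names.

```java
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCounter;

public class CompressionCheck {
    // Assumes "job" is a handle to a completed MapReduce job.
    public static void printShuffleRatio(Job job) throws Exception {
        Counters counters = job.getCounters();

        // Uncompressed size of everything the mappers emitted.
        long mapOutputBytes =
            counters.findCounter(TaskCounter.MAP_OUTPUT_BYTES).getValue();

        // Bytes the reducers actually pulled during the shuffle.
        long reduceShuffleBytes =
            counters.findCounter(TaskCounter.REDUCE_SHUFFLE_BYTES).getValue();

        System.out.printf("Map output bytes:     %,d%n", mapOutputBytes);
        System.out.printf("Reduce shuffle bytes: %,d%n", reduceShuffleBytes);
        System.out.printf("Shuffle/output ratio: %.2f%n",
            (double) reduceShuffleBytes / mapOutputBytes);
    }
}
```

A ratio well below 1 (roughly 0.2 or less, as suggested above) indicates the intermediate data is being compressed.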

If you are still in doubt about whether it is working, submit the job with and without compression and compare 'Reduce shuffle bytes'. As far as map output compression is concerned, 'Reduce shuffle bytes' is all that matters.
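For the "with compression" run, it also helps to set an explicit codec alongside the boolean flag. A minimal sketch, assuming the old-style `mapred.*` property names used on CDH4 and that the Snappy codec is installed on your nodes (swap in another codec class if it is not):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;

public class MapOutputCompressionConfig {
    public static Configuration withMapOutputCompression() {
        Configuration conf = new Configuration();

        // Turn on compression of the intermediate map output.
        conf.setBoolean("mapred.compress.map.output", true);

        // Pick an explicit codec; Snappy is a common choice for
        // intermediate data because it compresses and decompresses quickly.
        conf.setClass("mapred.map.output.compression.codec",
                      SnappyCodec.class, CompressionCodec.class);

        return conf;
    }
}
```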
