Question

I am using Hadoop CDH 4.1.2, and my mapper program is essentially an echo of the input data. But on my job status page I see

FILE: Number of bytes written  3,040,552,298,327

is almost equal to

FILE: Number of bytes read 3,363,917,397,416

for the mappers, even though I have already set

conf.set("mapred.compress.map.output", "true");

It seems the compression is not being applied to my job's map output. Why is this?

Solution

Does your job have a reducer?

If so, check 'Reduce shuffle bytes'. If that is considerably less than 'Map output bytes' (1/5th or so), you can assume the map output is compressed. Compression happens after the map is done, so the counter may be showing the actual size of the data the map emitted, not the compressed size.
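If you prefer to check this programmatically rather than in the job status page, a minimal sketch along the following lines compares the two counters. The class and method names here are illustrative assumptions; `TaskCounter.MAP_OUTPUT_BYTES` and `TaskCounter.REDUCE_SHUFFLE_BYTES` are the standard MapReduce counter names.

```java
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCounter;

public class CompressionCheck {
    // Assumes "job" is a handle to a completed MapReduce job.
    public static void printShuffleRatio(Job job) throws Exception {
        Counters counters = job.getCounters();

        // Uncompressed size of everything the mappers emitted.
        long mapOutputBytes =
            counters.findCounter(TaskCounter.MAP_OUTPUT_BYTES).getValue();

        // Bytes the reducers actually pulled during the shuffle.
        long reduceShuffleBytes =
            counters.findCounter(TaskCounter.REDUCE_SHUFFLE_BYTES).getValue();

        System.out.printf("Map output bytes:     %,d%n", mapOutputBytes);
        System.out.printf("Reduce shuffle bytes: %,d%n", reduceShuffleBytes);
        System.out.printf("Shuffle/output ratio: %.2f%n",
            (double) reduceShuffleBytes / mapOutputBytes);
    }
}
```

A ratio well below 1 (roughly 0.2 or less, as suggested above) indicates the intermediate data is being compressed.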

If you are still in doubt about whether it is working, submit the job with and without compression and compare 'Reduce shuffle bytes'. As far as map output compression is concerned, 'Reduce shuffle bytes' is all that matters.
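For the "with compression" run, it also helps to set an explicit codec alongside the boolean flag. A minimal sketch, assuming the old-style `mapred.*` property names used on CDH4 and that the Snappy codec is installed on your nodes (swap in another codec class if it is not):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;

public class MapOutputCompressionConfig {
    public static Configuration withMapOutputCompression() {
        Configuration conf = new Configuration();

        // Turn on compression of the intermediate map output.
        conf.setBoolean("mapred.compress.map.output", true);

        // Pick an explicit codec; Snappy is a common choice for
        // intermediate data because it compresses and decompresses quickly.
        conf.setClass("mapred.map.output.compression.codec",
                      SnappyCodec.class, CompressionCodec.class);

        return conf;
    }
}
```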
