Question

I'm running a MapReduce task on gzipped .arc files. As in this question, I'm having difficulty because Gzip decompression runs automatically (since the files have a .gz extension), and carriage-return/newline pairs come out as bare newlines, as in Unix line endings. This makes the input completely unreadable, because parsing it depends on specific character counts embedded in the file. I am trying to disable the automatic Gzip decompression so that I can do it correctly in my mapper instead. I have tried:

 -jobconf stream.recordreader.compression=none

But that doesn't seem to affect the decompression. Is there any way I can prevent Gzip decompression of my input?

Thanks, -Geoff


Solution

I've identified the potential problem, and a workaround, on the question you referenced:

Basically, it's a problem in PipeMapper.java, which you can easily amend.
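Separately from the PipeMapper.java change, the byte-count issue described in the question can be illustrated with a small Python sketch. It is not the fix referred to above and does not touch Hadoop at all; it only shows that decompressing gzip data in binary mode keeps every carriage return, whereas collapsing CR/LF to LF shortens the data and so invalidates any length embedded in the file. The sample record and both function names are made up for illustration.

```python
import gzip
import io


def decompress_preserving_bytes(raw: bytes) -> bytes:
    """Decompress gzip data in binary mode so CR/LF pairs survive untouched."""
    with gzip.GzipFile(fileobj=io.BytesIO(raw), mode="rb") as gz:
        return gz.read()


def collapse_crlf(data: bytes) -> bytes:
    """Mimic the unwanted conversion: every CR/LF pair becomes a bare LF."""
    return data.replace(b"\r\n", b"\n")


# Hypothetical record with CR/LF line endings (not a real .arc entry).
record = b"header 12\r\nhello\r\nworld\r\n"

compressed = gzip.compress(record)
restored = decompress_preserving_bytes(compressed)
lossy = collapse_crlf(restored)

# Binary-mode decompression round-trips exactly; the collapsed copy loses
# one byte per line, so offsets computed from embedded lengths drift.
print(len(record), len(restored), len(lossy))
```

A mapper that does its own decompression would need to receive the compressed bytes unmodified on its input for this approach to apply, which is what the question is trying to arrange.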

Licensed under: CC-BY-SA with attribution