Why does part of my data go into the reducer directly without going into the combiner

StackOverflow https://stackoverflow.com/questions/15845029

  •  01-04-2022

Question

I am using Hadoop version 0.20.0.

I have set the combiner class, and my program runs successfully.

However, I found that about 5% of my data did not go through the combiner after leaving the mapper; it went directly to the reducer. Why does this happen?


Solution

A note on the implementation of combiners in Hadoop: by default, the execution framework reserves the right to use combiners at its discretion. In reality, this means that a combiner may be invoked zero, one, or multiple times. In addition, combiners in Hadoop may actually be invoked in the reduce phase, i.e., after key-value pairs have been copied over to the reducer, but before the user reducer code runs. As a result, combiners must be carefully written so that they can be executed in these different environments.
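To illustrate why this is harmless for a well-written combiner, here is a minimal sketch in plain Python. It is only a simulation, not Hadoop code: `combine`, `reduce_all`, and `mapper_output` are hypothetical names invented for this example, and the job is assumed to be a simple count/sum. It runs the same mapper output through the combiner zero times, once, on only part of the data (like the 5% in the question), and twice.

```python
from collections import defaultdict

def combine(pairs):
    """Hypothetical sum-style combiner: merge (key, count) pairs sharing a key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return sorted(totals.items())

def reduce_all(pairs):
    """Hypothetical reducer: sum all counts per key, combined or not."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

# Made-up mapper output for a word-count style job.
mapper_output = [("a", 1), ("b", 1), ("a", 1), ("c", 1), ("a", 1), ("b", 1)]

# Combiner invoked zero times: everything goes straight to the reducer.
no_combiner = reduce_all(mapper_output)

# Combiner invoked once over the whole mapper output.
once = reduce_all(combine(mapper_output))

# Combiner applied to only part of the output; the rest bypasses it.
partial = reduce_all(combine(mapper_output[:4]) + mapper_output[4:])

# Combiner invoked twice: per chunk first, then again on the merged result.
twice = reduce_all(
    combine(combine(mapper_output[:3]) + combine(mapper_output[3:]))
)

print(no_combiner, once, partial, twice)
```

All four runs produce the same per-key totals because summing is associative and commutative and the combiner's output has the same form as its input. An operation without those properties, such as a combiner that emits a mean directly, would not be safe, since the result would depend on how many times and on which records the combiner happened to run.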

You can find this in section 2.4 of the PDF below

Data-Intensive Text Processing with MapReduce

Licensed under: CC-BY-SA with attribution