Why does part of my data go directly to the reducer without passing through the combiner?

StackOverflow https://stackoverflow.com/questions/15845029

  •  01-04-2022

Question

I am using Hadoop version 0.20.0.

I have set the combiner class, and my program runs without errors.

However, I found that about 5% of my data did not go through the combiner after leaving the mapper; it went directly to the reducer instead. Why does this happen?


Solution

A note on the implementation of combiners in Hadoop: by default, the execution framework reserves the right to use combiners at its discretion. In reality, this means that a combiner may be invoked zero, one, or multiple times. In addition, combiners in Hadoop may actually be invoked in the reduce phase, i.e., after key-value pairs have been copied over to the reducer, but before the user reducer code runs. As a result, combiners must be carefully written so that they can be executed in these different environments.
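Because the framework decides how many times the combiner runs and on which subsets of a key's values, the combiner's logic has to give the same final result however its work is grouped. The sketch below is a plain-Java simulation rather than Hadoop code, so it runs without Hadoop on the classpath. The class `CombinerDemo` and its methods are hypothetical names, and `applyCombiner` only imitates the framework by applying the combiner to random chunks of one key's values zero or more times. A sum (associative, commutative, same input and output type) survives any number of passes, a naive average does not, and carrying `{sum, count}` pairs repairs the average.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Function;

// Hypothetical, Hadoop-free simulation of the "zero, one, or multiple times" rule.
public class CombinerDemo {

    // Imitates the framework: in each round, split one key's values into random
    // contiguous chunks and replace each chunk with combine(chunk).
    // rounds == 0 means the combiner never runs for this key.
    static <T> List<T> applyCombiner(List<T> values, Function<List<T>, T> combine,
                                     int rounds, Random rng) {
        List<T> current = new ArrayList<>(values);
        for (int r = 0; r < rounds; r++) {
            List<T> next = new ArrayList<>();
            int i = 0;
            while (i < current.size()) {
                int size = 1 + rng.nextInt(current.size() - i); // 1..remaining
                next.add(combine.apply(current.subList(i, i + size)));
                i += size;
            }
            current = next;
        }
        return current;
    }

    // Safe: addition is associative and commutative, and the combiner emits the
    // same type it consumes, so the reducer can reuse the same logic.
    static long combineSum(List<Long> counts) {
        long total = 0;
        for (long c : counts) total += c;
        return total;
    }

    // Unsafe as a combiner: an average of averages is wrong for uneven groups.
    static double mean(List<Double> values) {
        double total = 0;
        for (double v : values) total += v;
        return total / values.size();
    }

    // Safe alternative: carry {sum, count} pairs and divide only in the reducer.
    static double[] combineSumCount(List<double[]> partials) {
        double sum = 0, count = 0;
        for (double[] p : partials) {
            sum += p[0];
            count += p[1];
        }
        return new double[] {sum, count};
    }

    public static void main(String[] args) {
        Random rng = new Random(42);

        // Seven counts of 1 for one key: the total is 7 however often the combiner runs.
        List<Long> counts = List.of(1L, 1L, 1L, 1L, 1L, 1L, 1L);
        for (int rounds = 0; rounds <= 3; rounds++) {
            long total = combineSum(
                    applyCombiner(counts, CombinerDemo::combineSum, rounds, rng));
            System.out.println("sum, " + rounds + " combiner round(s): " + total); // 7
        }

        // Averaging: the true mean is 25.0, but combining [10] and [20, 30, 40]
        // separately and then averaging the two results gives 20.0.
        List<Double> readings = List.of(10.0, 20.0, 30.0, 40.0);
        System.out.println("true mean: " + mean(readings));
        System.out.println("mean of means: "
                + mean(List.of(mean(List.of(10.0)), mean(List.of(20.0, 30.0, 40.0)))));

        // The {sum, count} version gives 25.0 regardless of combiner rounds.
        List<double[]> pairs = new ArrayList<>();
        for (double x : readings) pairs.add(new double[] {x, 1});
        for (int rounds = 0; rounds <= 3; rounds++) {
            double[] p = combineSumCount(
                    applyCombiner(pairs, CombinerDemo::combineSumCount, rounds, rng));
            System.out.println("sum/count, " + rounds + " combiner round(s): " + p[0] / p[1]);
        }
    }
}
```

In a real job the same reasoning applies to whatever class is passed to `setCombinerClass`: if the results are only correct when the combiner runs exactly once, records that the framework never sends through the combiner, like the roughly 5% observed in the question, will produce wrong output.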

This passage is from section 2.4 of the following PDF:

Data-Intensive Text Processing with MapReduce

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow