I understand that to use a combiner in Hadoop MapReduce, I need to include the following line (which I have already done):

   conf.setCombinerClass(MyReducer.class);

What I don't understand is where I actually implement the combiner's functionality. Do I create a combine() method in MyReducer, like the reduce method below?

   public void reduce(Text key, Iterator<IntWritable> values,
           OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException { }

Many thanks in advance!


Solution

A Combiner should simply be a Reducer, and thus implement the Reducer interface (there is no Combiner interface). Think of the combining step as an intermediate reducing step between the Mapper and the Reducer.

Take the Word Count example. From Yahoo's tutorial:

Word count is a prime example for where a Combiner is useful. The Word Count program in listings 1--3 emits a (word, 1) pair for every instance of every word it sees. So if the same document contains the word "cat" 3 times, the pair ("cat", 1) is emitted three times; all of these are then sent to the Reducer. By using a Combiner, these can be condensed into a single ("cat", 3) pair to be sent to the Reducer. Now each node only sends a single value to the reducer for each word -- drastically reducing the total bandwidth required for the shuffle process, and speeding up the job. The best part of all is that we do not need to write any additional code to take advantage of this! If a reduce function is both commutative and associative, then it can be used as a Combiner as well.
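To make the quoted idea concrete, here is a minimal plain-Java sketch. It is a simulation, not Hadoop code: the class CombinerSketch and its methods map, sum, and groupAndSum are hypothetical names chosen for illustration. It applies the same summing logic once per simulated map task (the combiner role) and once over all shuffled pairs (the reducer role), and counts how many pairs would be shuffled with and without that local step. In a real job the summing logic would live in your Reducer class, registered with both conf.setCombinerClass(...) and conf.setReducerClass(...).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of a word-count combiner; NOT Hadoop API code.
// All class and method names here are hypothetical.
class CombinerSketch {

    // Shared reduce logic: sum the counts for one word.
    static int sum(List<Integer> counts) {
        int total = 0;
        for (int c : counts) total += c;
        return total;
    }

    // Map step: emit ("word", 1) for every word in one input split.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : split.trim().split("\\s+")) {
            out.add(Map.entry(w, 1));
        }
        return out;
    }

    // Group (word, count) pairs by word and apply sum() to each group.
    // On one mapper's output this plays the combiner; on all shuffled
    // pairs it plays the reducer.
    static List<Map.Entry<String, Integer>> groupAndSum(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
            out.add(Map.entry(g.getKey(), sum(g.getValue())));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] splits = {"the cat saw the cat chase the cat", "the cat ran"};

        // Without a combiner, every (word, 1) pair is shuffled.
        List<Map.Entry<String, Integer>> rawShuffle = new ArrayList<>();
        // With a combiner, each mapper's output is pre-summed first.
        List<Map.Entry<String, Integer>> combinedShuffle = new ArrayList<>();
        for (String s : splits) {
            List<Map.Entry<String, Integer>> mapped = map(s);
            rawShuffle.addAll(mapped);
            combinedShuffle.addAll(groupAndSum(mapped));
        }

        System.out.println("shuffled without combiner: " + rawShuffle.size());
        System.out.println("shuffled with combiner:    " + combinedShuffle.size());
        System.out.println("final counts: " + groupAndSum(combinedShuffle));
    }
}
```

Because summing is commutative and associative, the final counts are the same whether or not the per-split step runs; only the number of shuffled pairs shrinks (11 versus 7 for these two splits).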

Hope that helps.

Other tips

Given your snippet, you just need to implement your reduce() method as usual; there is nothing special to do here. However, keep in mind that the combiner function is an optimization. This means that Hadoop does not guarantee how many times it will call it for a particular map output: it may call it once, several times, or not at all. Your job must therefore produce correct results whether or not the combiner runs.
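The point that the combiner may or may not run can be illustrated with another plain-Java sketch (again not Hadoop code; CombinerSafetySketch, withCombiner, and withoutCombiner are hypothetical names). It compares applying a reduce function with no combiner pass against applying it once per simulated mapper and then again over the combined outputs, for a sum, which is commutative and associative, and for a mean, which is not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Plain-Java sketch, NOT Hadoop code: the reducer must give the same answer
// whether the combiner runs or not. All names here are hypothetical.
class CombinerSafetySketch {

    // Sum: commutative and associative, so it is safe to reuse as a combiner.
    static double sum(List<Double> values) {
        double total = 0;
        for (double v : values) total += v;
        return total;
    }

    // Mean: not associative, so reusing it as a combiner changes the result.
    static double mean(List<Double> values) {
        return sum(values) / values.size();
    }

    // Combiner runs once per mapper, then the reducer runs over those outputs.
    static double withCombiner(List<List<Double>> perMapper, Function<List<Double>, Double> reduceFn) {
        List<Double> combined = new ArrayList<>();
        for (List<Double> values : perMapper) {
            combined.add(reduceFn.apply(values));
        }
        return reduceFn.apply(combined);
    }

    // Combiner never runs: the reducer sees every raw value.
    static double withoutCombiner(List<List<Double>> perMapper, Function<List<Double>, Double> reduceFn) {
        List<Double> all = new ArrayList<>();
        for (List<Double> values : perMapper) {
            all.addAll(values);
        }
        return reduceFn.apply(all);
    }

    public static void main(String[] args) {
        List<List<Double>> perMapper = List.of(List.of(1.0, 2.0, 3.0), List.of(10.0));
        System.out.println("sum:  " + withoutCombiner(perMapper, CombinerSafetySketch::sum)
                + " vs " + withCombiner(perMapper, CombinerSafetySketch::sum));
        System.out.println("mean: " + withoutCombiner(perMapper, CombinerSafetySketch::mean)
                + " vs " + withCombiner(perMapper, CombinerSafetySketch::mean));
    }
}
```

With these inputs the sum is 10.0 either way, while the mean comes out 4.0 without the combiner and 6.0 with it, so a plain averaging reducer cannot safely be reused as its own combiner.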

If you check the API of the Hadoop Reducer class, you will find the reduce() method, but no combine() or similar method to override.

Licensed under: CC-BY-SA with attribution