Question

A Combiner runs after the Mapper and before the Reducer,it will receive as input all data emitted by the Mapper instances on a given node. then emits output to the Reducers.

And also,If a reduce function is both commutative and associative, then it can be used as a Combiner.

My Question is what does the phrase "commutative and associative" mean in this situation?

Was it helpful?

Solution

Assume you have a list of numbers, 1 2 3 4 5 6.

Associative here means you can take your operation and apply it to any subgroup, then apply it to the result of those and get the same answer:

(1) + (2 + 3) + (4 + 5 + 6)
  ==
(1 + 2) + (3 + 4) + (5) + (6)
  ==
...

Think of the parenthesis here as the execution of a combiner.

Commutative means that the order doesn't matter, so:

1 + 2 + 3 + 4 + 5 + 6
  ==
2 + 4 + 6 + 1 + 2 + 3
  ==
...

For example, addition, fits this property, as seen before. "Maximum" fits this property above as well, because the max of maxs is the max. max(a,b) == max(b,a).

Median is an example that doesn't work: the median of medians is not the true median.


Don't forget another important property of a combiner: the input types for the key/value and the output types of the key/value need to be the same. For example, you can't take in a string:int and return a string:float.

Often times, the reducer might output some sort of string instead of numerical value, which may prevent you from just plugging in your reducer as the combiner.

OTHER TIPS

For commutativity, let's say your reducer can be represented by a function (in the mathematical term) called f(). Then your reducer is commutative if f(a, b) = f(b, a) For instance:

  • sum(A, B) is the same as sum(B, A)
  • xor(A, B) is the same as xor(B, A)
  • concat(A, B) is not the same as concat(B, A)

For associativity, the property is that f(f(a, b), c) = f(a, f(b, c)). For example:

  • (A + B) + C is the same as A + (B + C)
  • (A - B) - C is not the same as A - (B - C)

So in the context of Map/Reduce, your reducer has to respect these 2 properties. For example, if your reducer is doing just a sum(), or a max(), it respects both properties, but something like mean() or median() does not, and thus you can not use it as a combiner.

I personally see combiners as mini-reducers that run in memory after the map phase as an optimization to reduce network traffic, and the commutativity/associativity actually makes sense if you see a Map/Reduce this way:

enter image description here

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top