Question

Using Hadoop MapReduce

I have a list as input:

  1. A
  2. B
  3. C

And I want to get the Cartesian product of the list with itself:

  • A => A,f(A,A)
  • A => B,f(A,B)
  • A => C,f(A,C)
  • B => A,f(B,A)
  • B => B,f(B,B)
  • B => C,f(B,C)
  • C => A,f(C,A)
  • C => B,f(C,B)
  • C => C,f(C,C)

f() is a function that gives a value for a pair of keys.

How do I do that in a simple manner using Hadoop MapReduce in Java?

Of course I can't hold the entire input list in memory.

Thanks!!


Solution

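You can implement this in Java MapReduce. Suppose you want the cross product of two files, A and B, which have 3 and 4 input splits respectively. You then have to write a custom input format that splits up both datasets and creates one composite split for every pairing of a split from A with a split from B. Each map task then only has to cross one pair of splits, so no task ever needs the whole input. For the list-with-itself case in the question, use the same input as both A and B.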

So your splits would look like:

 A1 X B1
 A1 X B2
 A1 X B3
 A1 X B4
 A2 X B1
 A2 X B2
 A2 X B3
 A2 X B4
 A3 X B1
 A3 X B2
 A3 X B3
 A3 X B4
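
To make the pairing idea concrete, here is a minimal plain-Java sketch that simulates it outside Hadoop. It is not the custom input format itself and uses no Hadoop classes. The class name `SplitPairSketch`, the helper names, the chunk size, and the placeholder body of `f()` are all assumptions made for illustration. Each call to `crossOnePair` stands in for the work a single map task would do on one composite split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; no Hadoop classes are used here.
final class SplitPairSketch {

    // Hypothetical stand-in for the asker's f(): any function of a pair of keys.
    static String f(String left, String right) {
        return "f(" + left + "," + right + ")";
    }

    // Cut a list into chunks of at most chunkSize records, mimicking input splits.
    static List<List<String>> chunk(List<String> records, int chunkSize) {
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < records.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, records.size());
            chunks.add(new ArrayList<>(records.subList(i, end)));
        }
        return chunks;
    }

    // The work of one simulated map task: cross one left chunk with one right chunk.
    static List<String> crossOnePair(List<String> leftChunk, List<String> rightChunk) {
        List<String> out = new ArrayList<>();
        for (String left : leftChunk) {
            for (String right : rightChunk) {
                out.add(left + " => " + right + "," + f(left, right));
            }
        }
        return out;
    }

    // Enumerate every (left chunk, right chunk) pairing and collect the results.
    static List<String> cartesian(List<String> a, List<String> b, int chunkSize) {
        List<String> out = new ArrayList<>();
        for (List<String> leftChunk : chunk(a, chunkSize)) {
            for (List<String> rightChunk : chunk(b, chunkSize)) {
                out.addAll(crossOnePair(leftChunk, rightChunk));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("A", "B", "C");
        // Self-join: the same list plays the role of both A and B.
        for (String line : cartesian(keys, keys, 2)) {
            System.out.println(line);
        }
    }
}
```

With the three keys from the question this prints nine lines, one per pair, grouped by chunk pairing rather than in the order listed in the question. In a real job the pairs would be processed by separate map tasks, and only the two chunks of one pairing would need to be read at a time.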

See https://github.com/adamjshook/mapreducepatterns/blob/master/MRDP/src/main/java/mrdp/ch5/CartesianProduct.java for a reference implementation.

Licensed under: CC-BY-SA with attribution