Question

Using Hadoop MapReduce

I have a list as input:

  1. A
  2. B
  3. C

And I want to get the Cartesian product of the list with itself:

  • A => A,f(A,A)
  • A => B,f(A,B)
  • A => C,f(A,C)
  • B => A,f(B,A)
  • B => B,f(B,B)
  • B => C,f(B,C)
  • C => A,f(C,A)
  • C => B,f(C,B)
  • C => C,f(C,C)

f() is a function that gives a value for a pair of keys.
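To pin down the expected output, here is a small in-memory reference in Java. It is only a specification for tiny inputs, since it holds the whole list in memory (which the question rules out for real data). The `f` used in `main` is a hypothetical placeholder that simply concatenates the two keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

public class CartesianSpec {
    // Emits "x => y,f(x,y)" for every ordered pair (x, y) of the input list,
    // including pairs of an element with itself.
    static List<String> selfProduct(List<String> items, BinaryOperator<String> f) {
        List<String> out = new ArrayList<>();
        for (String x : items) {
            for (String y : items) {
                out.add(x + " => " + y + "," + f.apply(x, y));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical f: concatenates the two keys.
        BinaryOperator<String> f = (a, b) -> a + b;
        for (String line : selfProduct(List.of("A", "B", "C"), f)) {
            System.out.println(line);
        }
    }
}
```

This prints nine lines in the same order as the list above, e.g. `A => A,AA` first and `C => C,CC` last.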

How do I do that in a simple manner using Hadoop MapReduce in Java?

Of course I can't hold the entire input list in memory.

Thanks!!


Solution

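You can implement this in Java MapReduce. Suppose you want the cross product of two files, A and B, which are divided into 3 and 4 input splits respectively (for your self-product, A and B are simply the same input). You then write a custom input format that splits up both datasets and creates one composite split for every pairing of an A split with a B split. Each composite split is then processed by its own map task.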

So your splits would look like:

 A1 X B1
 A1 X B2
 A1 X B3
 A1 X B4
 A2 X B1
 A2 X B2
 A2 X B3
 A2 X B4
 A3 X B1
 A3 X B2
 A3 X B3
 A3 X B4
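The pairing step can be modelled in plain Java as below. This is only a sketch of the idea behind a custom input format's split computation: it does not use Hadoop classes, and split "names" such as `A1` stand in for real input splits. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class CompositeSplits {
    // Pairs every left split with every right split. In a real job each pair
    // would become one composite input split, i.e. one map task.
    static List<String> pairSplits(List<String> leftSplits, List<String> rightSplits) {
        List<String> pairs = new ArrayList<>();
        for (String l : leftSplits) {
            for (String r : rightSplits) {
                pairs.add(l + " X " + r);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> pairs = pairSplits(List.of("A1", "A2", "A3"),
                                        List.of("B1", "B2", "B3", "B4"));
        pairs.forEach(System.out::println);
        System.out.println(pairs.size() + " map tasks");
    }
}
```

With 3 and 4 splits this yields 3 x 4 = 12 composite splits, in the same order as the listing above.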

See https://github.com/adamjshook/mapreducepatterns/blob/master/MRDP/src/main/java/mrdp/ch5/CartesianProduct.java for a reference implementation.
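What happens inside each map task can be sketched without Hadoop as follows. This is a simplified model, not the linked implementation: blocks are in-memory sublists standing in for input splits, `blockSize` and `f` are hypothetical parameters, and a real record reader would stream both blocks from HDFS rather than hold them. For the self-product, the left and right blocks come from the same list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

public class BlockCartesian {
    // Processes one composite split (leftBlock X rightBlock), as one map task would.
    static void processSplit(List<String> leftBlock, List<String> rightBlock,
                             BinaryOperator<String> f, List<String> out) {
        for (String x : leftBlock) {
            // The right block is re-scanned from the beginning for each left record.
            for (String y : rightBlock) {
                // One output record per pair: key x, value "y,f(x,y)".
                out.add(x + " => " + y + "," + f.apply(x, y));
            }
        }
    }

    // Cuts the input into blocks, then runs processSplit once per
    // (left block, right block) pair, i.e. once per composite split.
    static List<String> selfProduct(List<String> input, int blockSize, BinaryOperator<String> f) {
        List<List<String>> blocks = new ArrayList<>();
        for (int i = 0; i < input.size(); i += blockSize) {
            blocks.add(input.subList(i, Math.min(i + blockSize, input.size())));
        }
        List<String> out = new ArrayList<>();
        for (List<String> left : blocks) {
            for (List<String> right : blocks) {
                processSplit(left, right, f, out);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical f: concatenates the two keys.
        List<String> out = selfProduct(List.of("A", "B", "C"), 2, (a, b) -> a + b);
        out.forEach(System.out::println);
    }
}
```

With `blockSize` 2 the list is cut into [A, B] and [C], giving four composite splits. Their combined output is the same nine lines as the full product, only in a different order. Since every pair is produced directly in the map phase, this step would not need a reduce phase.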

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow