Question

I have two Lists, each of which contains a sequence of tuples. I'm trying to run a function against corresponding elements of the Lists.

  val l1 = List(("a" , Seq( ("link1", 2) , ("link2" , 4) )))
                                                  //> l1  : List[(String, Seq[(String, Int)])] = List((a,List((link1,2), (link2,4)
                                                  //| )))
  val l2 = List(("b" , Seq( ("link1", 2) , ("link3" , 3) , ("link2" , 3) )))
                                                  //> l2  : List[(String, Seq[(String, Int)])] = List((b,List((link1,2), (link3,3)
                                                  //| , (link2,3))))

So I'm trying to group the two lists into the format below and then apply a function to the corresponding elements:

  l1Grouped = Seq( ("link1", 2) , ("link2" , 4) )
  l2Grouped = Seq( ("link1", 2) , ("link2" , 3) )

Once the elements are in the above format I can use zip to apply the function.

"link3" is omitted from both grouped lists because it appears only in l2.

To achieve this I'm trying the intersect function below to group the items:

 l1(0)._2.intersect(l2(0)._2)                    //> res0: Seq[(String, Int)] = List((link1,2))

But this returns each matching item only once, rather than keeping the values from both lists.

How can a function be run against corresponding elements of the vals l1 and l2 defined above?

In practice l1 and l2 are Spark RDDs; I'm using a List in this example for testing, but the same solution should be compatible with an RDD.

Solution

Neither intersect nor zip is needed:

val map = l1(0)._2.toMap
for {
  (k, v1) <- l2(0)._2
  v2 <- map.get(k)
} yield ... // Return a value based on v1 and v2

We store the elements from list one in a map, then iterate over list two, returning values only for keys that also exist in the map.
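
For illustration, here is a minimal worksheet-style sketch of the same idea filled in with a concrete function. The `combine` function (a plain sum) and the names `byKey` and `combined` are assumptions for the example, not part of the original question; substitute whatever function you actually need.

```scala
// Sample data from the question.
val l1 = List(("a", Seq(("link1", 2), ("link2", 4))))
val l2 = List(("b", Seq(("link1", 2), ("link3", 3), ("link2", 3))))

// Hypothetical combining function (an assumption, not from the original post).
def combine(x: Int, y: Int): Int = x + y

// Index l1's pairs by key so each lookup is a map access.
val byKey: Map[String, Int] = l1(0)._2.toMap

// Walk l2's pairs; keys missing from l1 (here "link3") yield None and are skipped.
val combined: Seq[(String, Int)] = for {
  (k, v2) <- l2(0)._2
  v1 <- byKey.get(k)
} yield (k, combine(v1, v2))
```

With the sample data this keeps only "link1" and "link2", in the order they appear in l2. For Spark, a key-based `join` on pair RDDs plays a similar role, since it also keeps only keys present on both sides.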

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow