Do records with the same key in two RDDs repartitioned by key reside on the same node in Spark?
16-10-2019
Solution
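The tuples of one partition are always on the same node, because a partition itself is indivisible. So if you do a groupBy
or write your own partitioner which partitions by key, all records with the same key get the same partition number and are shuffled to the same node.
Otherwise, transformations like mapPartitions, which pass an iterator over a whole partition to a user-defined function, wouldn't work.
Note that for two separate RDDs, using the same partitioner guarantees that a given key lands in the same partition number in both; which node each of those partitions is placed on is decided by the scheduler.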
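To make the reasoning concrete, here is a minimal sketch in plain Python that imitates what a key-based partitioner does. It does not use Spark at all: `partition_index`, `partition_by_key`, `map_partitions` and the sample data are hypothetical names invented for this illustration, and the CRC32 hash is just a stand-in for whatever hash function the real partitioner uses.

```python
import zlib
from collections import defaultdict


def partition_index(key, num_partitions):
    """Hypothetical stand-in for a hash partitioner: key -> partition number."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def partition_by_key(records, num_partitions):
    """Group (key, value) records into partitions, roughly as a shuffle would."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_index(key, num_partitions)].append((key, value))
    return dict(partitions)


def map_partitions(partitions, func):
    """Apply func to an iterator over each whole partition."""
    return {idx: list(func(iter(rows))) for idx, rows in partitions.items()}


def count_per_key(rows):
    """Count records per key; only correct if a key never spans partitions."""
    counts = defaultdict(int)
    for key, _ in rows:
        counts[key] += 1
    return counts.items()


# Two independent datasets (assumed sample data), partitioned the same way.
orders = [("alice", 10), ("bob", 7), ("alice", 3), ("carol", 5)]
refunds = [("alice", 1), ("carol", 2)]

orders_parts = partition_by_key(orders, 4)
refunds_parts = partition_by_key(refunds, 4)

# Every "alice" record sits in one partition, so a per-partition count is complete.
counts = map_partitions(orders_parts, count_per_key)
```

Because the partition number depends only on the key and the number of partitions, "alice" ends up at the same index in both `orders_parts` and `refunds_parts`. In Spark itself the corresponding operations on a pair RDD would be `partitionBy` and `mapPartitions`; the sketch only mirrors their logic and says nothing about node placement.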
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange