Question

I am learning the partitioner concept now. Can anyone explain the piece of code below? It is hard for me to understand.

public class TaggedJoiningPartitioner extends Partitioner<TaggedKey,Text> {

    @Override
    public int getPartition(TaggedKey taggedKey, Text text, int numPartitions) {
        return taggedKey.getJoinKey().hashCode() % numPartitions;
    }
}

How does taggedKey.getJoinKey().hashCode() % numPartitions determine which reducer is executed for a key?

Can anyone explain this to me?


Solution

It's not as complex as you think once you break things down a little bit.

taggedKey.getJoinKey().hashCode() simply returns an integer. Every Java object has a hashCode() method. Equal objects always return the same hash code, and different objects usually (but not always) return different ones, so it is not a guaranteed unique ID. You could look into the source code of whatever class getJoinKey() returns to see how it is computed if you'd like, but all you need to know is that it returns an integer based on the contents of the object.
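To see what "based on the contents" means, here is a small sketch using plain strings. HashCodeDemo and the sample values are made up for illustration; the real join key type in your code may be something else.

```java
// Illustration only: HashCodeDemo is a hypothetical class, not part of the question's code.
class HashCodeDemo {

    // Returns true when both strings produce the same hash code.
    static boolean sameHash(String a, String b) {
        return a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // Two distinct objects with equal contents always share a hash code.
        System.out.println(sameHash(new String("user42"), new String("user42")));

        // Different contents can still collide, so a hash code is not a unique ID.
        System.out.println(sameHash("Aa", "BB"));
    }
}
```

Both lines print true: the first because the contents are equal, the second because "Aa" and "BB" happen to collide.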

The % operator is the remainder (modulus) operator: it returns what is left over after division (8 % 3 = 2, 15 % 7 = 1, etc.).

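So let's say you have 3 partitions, one per reducer (numPartitions = 3). Every time you take a non-negative number modulo 3, you'll get either 0, 1, or 2. That result is the index of the partition, and therefore the reducer, that will get the data. One caveat: in Java a negative hashCode() produces a negative remainder, which is not a valid partition number. Hadoop's own HashPartitioner avoids this by computing (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.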
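Here is a rough, Hadoop-free sketch of that arithmetic. PartitionDemo and both method names are hypothetical; naivePartition mirrors the formula in the question, while safePartition adds a sign-bit mask that is not in the original code.

```java
// Illustration only: PartitionDemo is a hypothetical class that runs without Hadoop.
class PartitionDemo {

    // Same formula as the partitioner in the question.
    static int naivePartition(Object joinKey, int numPartitions) {
        return joinKey.hashCode() % numPartitions;
    }

    // Clears the sign bit first, so the result is always in [0, numPartitions).
    static int safePartition(Object joinKey, int numPartitions) {
        return (joinKey.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        Integer negativeKey = -7; // Integer.hashCode() returns the value itself

        System.out.println(naivePartition(negativeKey, 3)); // -1, not a valid partition
        System.out.println(safePartition(negativeKey, 3));  // 1

        // The same key always maps to the same partition.
        System.out.println(safePartition("user42", 3) == safePartition("user42", 3)); // true
    }
}
```

If your join keys can ever hash to a negative number, the masked version is the safer one to copy into getPartition.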

The whole idea of partitioners is that you control which reducer each key goes to. If you wanted to group by month, you could send every piece of data with the string "January" to the first partition, "December" to the 12th partition, etc. Your case looks a bit confusing from the outside, but really they just want to spread the data out (hopefully) evenly, so they're using a simple hash/modulus function to choose the partition. The choice is not actually random: the same join key always produces the same number, so all records that share a join key end up at the same reducer, which is exactly what a join needs.
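The month example could look something like the sketch below. MonthPartitionDemo and partitionForMonth are hypothetical names, and it assumes the key is an English month name; in a real job this logic would sit inside your own getPartition.

```java
import java.time.Month;
import java.util.Locale;

// Illustration only: a hypothetical month-based partition function, runnable without Hadoop.
class MonthPartitionDemo {

    // Maps "January" to 0 through "December" to 11, wrapped to the number of partitions.
    static int partitionForMonth(String monthName, int numPartitions) {
        int monthIndex = Month.valueOf(monthName.toUpperCase(Locale.ROOT)).ordinal();
        return monthIndex % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(partitionForMonth("January", 12));  // 0, the first partition
        System.out.println(partitionForMonth("December", 12)); // 11, the 12th partition
    }
}
```

Compared with the hash version, this gives you a predictable layout (one month per reducer) at the cost of uneven load if some months have far more data than others.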

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow