Question

In the map phase of my program, I need to know the total number of mappers that are created. This will help me create the keys in the map function (for each object, I want to emit as many key-value pairs as there are mappers).

I know that setting the number of mappers is just a hint, but how can I get the actual number of mappers? I tried the following in the configure method of my Mapper:

public void configure(JobConf conf) {
    System.out.println("map tasks: "+conf.get("mapred.map.tasks"));
    System.out.println("tipid: "+conf.get("mapred.tip.id"));
    System.out.println("taskpartition: "+conf.get("mapred.task.partition"));
}

But I get the results:

map tasks: 1
tipid: task_local1204340194_0001_m_000000
taskpartition: 0
map tasks: 1
tipid: task_local1204340194_0001_m_000001
taskpartition: 1

which means (?) that there are two map tasks, not just one as printed (which is quite natural, since I have two small input files). Shouldn't the number after "map tasks" be 2?

For now, I just count the number of files in the input folder, but this is not a good solution, since a file could be larger than the block size and result in more than one input split, and hence more than one mapper. Any suggestions?

Solution

In the end, it seems that conf.get("mapred.map.tasks") DOES work after all when I build an executable jar file and run my program on the cluster or locally. The output of "map tasks" is now correct.

It failed only when I ran my MapReduce program locally on Hadoop from the Eclipse plugin. Maybe it is an issue with the Eclipse plugin.

I hope this will help someone else having the same issue. Thank you for your answers!

OTHER TIPS

I don't think there is an easy way to do this. I implemented my own InputFormat class; if you do that, you can add a method that counts the InputSplits and call it from the process that starts the job. If you then store that number in a Configuration setting, you can read it in your mapper.

By the way, the number of input files is not always the number of mappers, as large files can be split.
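
To illustrate why counting files can undercount mappers, here is a rough sketch in plain Java (no Hadoop dependency) that estimates the number of splits from file lengths. The class and method names are hypothetical, and the 64 MB split size is only an example value. The 1.1 slop factor and the one-split-per-empty-file rule reflect my understanding of how FileInputFormat behaves for splittable files; treat both as assumptions and check them against your Hadoop version.

```java
import java.util.List;

public class SplitEstimate {
    // Assumption: the last split may be up to 10% larger than splitSize
    // before an extra split is cut (my reading of FileInputFormat).
    static final double SPLIT_SLOP = 1.1;

    /** Estimated number of splits for one splittable file. */
    static long splitsForFile(long fileLength, long splitSize) {
        if (fileLength == 0) {
            return 1; // assumption: an empty file still yields one empty split
        }
        long splits = 0;
        long remaining = fileLength;
        // Cut full-size splits while the remainder is clearly larger than one split.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // whatever is left becomes the final split
        }
        return splits;
    }

    /** Sum of the per-file estimates, i.e. the expected number of map tasks. */
    static long estimateMapTasks(List<Long> fileLengths, long splitSize) {
        long total = 0;
        for (long length : fileLengths) {
            total += splitsForFile(length, splitSize);
        }
        return total;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long splitSize = 64 * mb; // example value only
        // Two input files: 10 MB (1 split) and 200 MB (4 splits).
        System.out.println(estimateMapTasks(List.of(10 * mb, 200 * mb), splitSize)); // prints 5
    }
}
```

So two input files can already mean five mappers. This is only an estimate: non-splittable files (for example, some compressed formats) and custom InputFormat classes will behave differently, which is why asking the InputFormat itself for its splits is more reliable.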

Licensed under: CC-BY-SA with attribution