Question

What is the difference between a mapper and a map task? Similarly, between a reducer and a reduce task? Also, how are the numbers of mappers, map tasks, reducers, and reduce tasks determined during the execution of a MapReduce job? Please describe the relationships between them, if there are any.


Solution

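Put simply, a map task is a running instance of a Mapper, and a reduce task is a running instance of a Reducer. Mapper and Reducer are the classes that hold a job's logic in their map() and reduce() methods; a task is the unit of execution the framework launches to run that logic over one portion of the data. Each map task creates one Mapper object and feeds it the records of one input split, and each reduce task creates one Reducer object and feeds it the keys of one partition. At run time, therefore, the number of mappers equals the number of map tasks, and the number of reducers equals the number of reduce tasks.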

When a MapReduce job runs, the number of map tasks spawned equals the number of input splits computed by the job's InputFormat. For file-based input the default is roughly one split per HDFS block, so the number of map tasks usually tracks the number of blocks in the input. The number of reduce tasks, by contrast, is specified in the driver code: either set the property mapred.reduce.tasks (named mapreduce.job.reduces in newer releases) on the job configuration object, or call org.apache.hadoop.mapreduce.Job#setNumReduceTasks(int).

The old JobConf API had a setNumMapTasks() method, which served only as a hint to the framework. That method was removed in the new API, org.apache.hadoop.mapreduce.Job, with the intention that the number of mappers be calculated from the input splits.
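To make the split-based calculation concrete, here is a minimal plain-Java sketch of how the split count (and so the map task count) for a single file could be estimated. It follows the split-size rule and the 10% slop factor used by FileInputFormat as I understand them; the class name SplitEstimate and its methods are hypothetical, not part of Hadoop, and the sketch assumes one non-empty, splittable file with the default minimum and maximum split sizes.

```java
// Hypothetical helper, not part of Hadoop: estimates how many input splits
// (and therefore map tasks) a single non-empty, splittable file would produce.
public class SplitEstimate {

    // Assumption: same tolerance as FileInputFormat, which lets the last
    // split be up to 10% larger than the nominal split size.
    private static final double SPLIT_SLOP = 1.1;

    // Split size is the block size clamped between the configured
    // minimum and maximum split sizes.
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Carve off full-size splits while the remainder exceeds the slop
    // threshold, then count whatever is left as one final split.
    public static int countSplits(long fileLength, long splitSize) {
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumed defaults: 128 MB blocks, min split 1 byte, max split Long.MAX_VALUE.
        long splitSize = computeSplitSize(128 * mb, 1L, Long.MAX_VALUE);
        System.out.println(countSplits(300 * mb, splitSize)); // prints 3
        System.out.println(countSplits(130 * mb, splitSize)); // prints 1
    }
}
```

Under these assumptions a 300 MB file yields three map tasks, while a 130 MB file yields only one, because its 2 MB remainder falls within the slop and is absorbed into a single split. Raising the minimum split size or lowering the maximum is how the mapper count is influenced indirectly now that setNumMapTasks() is gone.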

Licensed under: CC-BY-SA with attribution