While configuring a MapReduce job, I know that one can set the number of reduce tasks using the method job.setNumReduceTasks(2);.

Can we set the number of map tasks?

I don't see any methods to do this.

If there is no such functionality, does anyone know why the framework lets you control the number of reduce tasks but not the number of map tasks?


Solution

There used to be a way to set the number of map tasks, JobConf.setNumMapTasks, but it was merely a hint to the framework and could not guarantee that you would get exactly the specified number of maps. Map creation is actually governed by the InputFormat you use in your job, and this is why the setting is no longer supported.

If you are not happy with the number of mappers the framework creates, you can try tweaking the following two properties to suit your requirements (see the sketch after the list):

- mapred.min.split.size
- mapred.max.split.size
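
For illustration, here is a minimal sketch of a job driver that sets these split-size hints; the 128 MB/256 MB values and the job name are assumptions for the example, and the exact key names depend on your Hadoop version (newer releases use mapreduce.input.fileinputformat.split.minsize / .maxsize):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SplitSizeExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Hint to the framework: make each split at least 128 MB and at most 256 MB.
        // Fewer, larger splits -> fewer map tasks; smaller splits -> more map tasks.
        conf.setLong("mapred.min.split.size", 128L * 1024 * 1024);
        conf.setLong("mapred.max.split.size", 256L * 1024 * 1024);

        Job job = Job.getInstance(conf, "split-size-example");
        job.setJarByClass(SplitSizeExample.class);
        job.setInputFormatClass(TextInputFormat.class);

        // The reducer count, by contrast, can be set directly, as in the question.
        job.setNumReduceTasks(2);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```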

Other tips

The number of map tasks is not something the programmer sets directly; rather, the Hadoop framework, via the InputFormat, creates as many mappers as there are input splits (generally 64 MB each, though this is configurable) for the input file in HDFS.
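
To make that concrete, the split size used by the stock FileInputFormat is roughly the HDFS block size clamped between the min/max split settings, and the mapper count follows from dividing the file size by that split size. A simplified sketch (the real logic also handles unsplittable compressed files and a small slack factor):

```java
// Approximate model of FileInputFormat-style split sizing.
public class SplitMath {
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        // Clamp the HDFS block size between the configured min and max split sizes.
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;   // typical HDFS block size on older clusters
        long minSize   = 1L;                  // default mapred.min.split.size
        long maxSize   = Long.MAX_VALUE;      // default mapred.max.split.size
        long fileSize  = 1024L * 1024 * 1024; // assume a 1 GB input file

        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long numSplits = (fileSize + splitSize - 1) / splitSize; // ceiling division

        // With these defaults, a 1 GB file yields 16 splits -> roughly 16 map tasks.
        System.out.println("split size = " + splitSize + ", splits = " + numSplits);
    }
}
```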
