What value should be set for mappers and reducers while executing jobs in Hadoop, and how do I decide it?

StackOverflow https://stackoverflow.com/questions/10493034

  •  06-06-2021

I am running Hive jobs on a Hadoop cluster. I just came to know that performance can improve/change if you pay attention to the behavior of the mappers and reducers. But I haven't played with this yet; until now I have only used Hive and executed queries with the default mapper and reducer settings.

Now that I know about mappers and reducers, I am wondering what values to set for them so that performance changes. I am also wondering whether this needs to be set on the master node only, or on all nodes?

If anyone has an idea about this, please explain the scenario to me.

Also, what other parameters do we need to set while executing jobs?

Was it helpful?

Solution

To the best of my understanding, the number of mappers is not something you set per job. It is calculated by the JobTracker, taking into account the number of slots per node (something you set cluster-wide in mapred-site.xml), the number of input splits you have, and other jobs (if you use the Fair or Capacity Scheduler, your queue parameters are also taken into account). You can only influence the mapper count indirectly, by changing the split size, as in the sketch below.
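As a minimal sketch of that indirect control from the Hive CLI (using the old mapred.* property names from this answer; the 256 MB figure and the logs table are assumptions for illustration, not recommendations):

    -- larger split size -> fewer, larger splits -> fewer map tasks
    set mapred.min.split.size=268435456;   -- 256 MB in bytes
    set mapred.max.split.size=268435456;
    select page, count(*) from logs group by page;

Since these are ordinary session-level set commands, they are issued from the Hive client (or put in hive-site.xml for a cluster-wide default); you do not set them on each worker node.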
The number of reducers affects the results, and therefore you can set it per job with the following command:
set mapred.reduce.tasks=128
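For example, a minimal sketch of setting this per query from the Hive CLI (the sales table and the value 128 are assumptions for illustration):

    set mapred.reduce.tasks=128;           -- this query will run with 128 reduce tasks
    select region, sum(amount) from sales group by region;

    -- or let Hive pick the count from the input size instead of hard-coding it
    set mapred.reduce.tasks=-1;                           -- -1 = let Hive decide
    set hive.exec.reducers.bytes.per.reducer=1000000000;  -- ~1 GB of input per reducer
    set hive.exec.reducers.max=999;                       -- upper bound on reducer count

These are per-session settings issued from the client, so they do not need to be configured on every node.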

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow