Question

What is the priority order of the following three options for setting the number of reduce tasks? In other words, if all three are set, which one takes effect?

Option1:

setNumReduceTasks(2) within the application code

Option2:

-D mapreduce.job.reduces=2 as command line argument

Option3:

through $HADOOP_CONF_DIR/mapred-site.xml file

 <property>
  <name>mapreduce.job.reduces</name>
  <value>2</value>
 </property>

Solution 2

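You have them ranked in priority order: option 1 overrides option 2, and option 2 overrides option 3. In other words, option 1 is the one your job will use in this scenario. The setNumReduceTasks call in your driver runs after the -D options have been parsed into the job configuration, so it overwrites whatever value was set there.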
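To make the ordering concrete, here is a minimal sketch in plain Java that models the three layers with a simple map. It is not Hadoop code: the class name ReducePrecedenceSketch, its helper methods, and the distinct values 3, 2 and 1 (chosen so the winner is visible, whereas the question uses 2 everywhere) are all assumptions for illustration. In a real job, the -D parsing is done by Hadoop's generic options handling (for example, when the driver is run through ToolRunner), and the last step corresponds to job.setNumReduceTasks(n).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of configuration layering; not part of Hadoop.
public class ReducePrecedenceSketch {
    static final String KEY = "mapreduce.job.reduces";

    // Lowest priority: stands in for the value read from mapred-site.xml.
    static Map<String, String> siteDefaults() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(KEY, "3");
        return conf;
    }

    // Middle priority: overlay "-D key=value" pairs on top of the file defaults.
    static void applyCommandLine(Map<String, String> conf, String[] args) {
        for (int i = 0; i + 1 < args.length; i++) {
            if (args[i].equals("-D")) {
                String[] kv = args[i + 1].split("=", 2);
                if (kv.length == 2) {
                    conf.put(kv[0], kv[1]);
                }
                i++; // skip the key=value token we just consumed
            }
        }
    }

    // Highest priority: stands in for job.setNumReduceTasks(n) in the driver,
    // which runs after the command line has already been applied.
    static void setNumReduceTasks(Map<String, String> conf, int n) {
        conf.put(KEY, Integer.toString(n));
    }

    // Apply the layers in order; codeValue == null means the driver sets nothing.
    static int resolve(String[] args, Integer codeValue) {
        Map<String, String> conf = siteDefaults();
        applyCommandLine(conf, args);
        if (codeValue != null) {
            setNumReduceTasks(conf, codeValue);
        }
        return Integer.parseInt(conf.get(KEY));
    }

    public static void main(String[] args) {
        String[] cli = {"-D", KEY + "=2"};
        System.out.println(resolve(new String[0], null)); // file only: 3
        System.out.println(resolve(cli, null));           // file + -D: 2
        System.out.println(resolve(cli, 1));              // file + -D + code: 1
    }
}
```

Each later layer simply overwrites the same key, which is why the value set last (in application code) is the one the job ends up with.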

OTHER TIPS

According to Hadoop: The Definitive Guide:

The -D option is used to set the configuration property with key color to the value yellow. Options specified with -D take priority over properties from the configuration files. This is very useful because you can put defaults into configuration files and then override them with the -D option as needed. A common example of this is setting the number of reducers for a MapReduce job via -D mapred.reduce.tasks=n. This will override the number of reducers set on the cluster or set in any client-side configuration files.

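First priority: configuration parameters set in application code (for example, setNumReduceTasks), because that code runs after the command-line options have been parsed

Second priority: configuration parameters passed on the command line with -D (while submitting the MR application)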

Third priority: default parameters read from the XML configuration files, such as core-site.xml, hdfs-site.xml and mapred-site.xml. (hadoop-env.sh and log4j.properties live in the same configuration directory, but they are not XML files.)

Licensed under: CC-BY-SA with attribution