My question is how to set this "hive.exec.parallel.thread.number" option in an Amazon EMR cluster on startup?
Add the configuration to hive-site.xml (in my case, the file path is ./.versions/hive-0.11.0/conf/hive-site.xml):
<property>
<name>hive.exec.parallel</name>
<value>true</value>
<description>Whether to execute jobs in parallel</description>
</property>
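Note that the property above only switches parallel execution on. The option named in the question, hive.exec.parallel.thread.number, limits how many jobs may run at the same time. A minimal sketch of the extra entry, assuming it goes in the same hive-site.xml; the value 8 is Hive's documented default and is used here only as an illustration, not as a tuned recommendation:

<property>
<name>hive.exec.parallel.thread.number</name>
<value>8</value>
<description>How many jobs at most can be executed in parallel</description>
</property>

This setting should only have an effect when hive.exec.parallel is set to true.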
If they are not equivalent in my case, will one of these strategies yield better performance?
It's different. This property controls whether the different stages of a single Hive job run in parallel, so the performance gain depends on the specific Hive query.