How to tell if I am about to run Hadoop streaming job on a cluster or in “local” mode?

https://stackoverflow.com/questions/8686284

10-04-2021
|

Domanda

Hadoop streaming will run the process in "local" mode when there is no hadoop instance running on the box. I have a shell script that is controlling a set of hadoop streaming jobs in sequence and I need to condition copying files from HDFS to local depending on whether the jobs have been running locally or not. Is there a standard way to accomplish this test? I could do a "ps aux | grep something" but that seems ad-hoc.

Soluzione 2

Rather than trying to detect at run time which mode the process is operating, it is probably better to wrap the tool you are developing in a bash script that explicitly selects local vs cluster operatide. The O'Reilly Hadoop describes how to explicitly choose local using a configuration file override:

hadoop v2.MaxTemperatureDriver -conf conf/hadoop-local.xml input/ncdc/micro max-temp

where conf-local.xml is an XML file configured for local operation.

Altri suggerimenti

Hadoop streaming will run the process in "local" mode when there is no hadoop instance running on the box.

Can you pl point to the reference for this?

A regular or a streaming job will run the way it is configured, so we know ahead of time in which mode a Job is run. Check the documentation for configuring Hadoop on a Single Node and Cluster in different modes.

I haven't tried this yet, but I think you can just read out the mapred.job.tracker configuration setting.

Autorizzato sotto: CC-BY-SA insieme a attribuzione

Non affiliato a StackOverflow