Question

I am setting environment variables in my bootstrap code -

export HADOOP_HOME=/home/hadoop
export HADOOP_CMD=/home/hadoop/bin/hadoop
export HADOOP_STREAMING=/home/hadoop/contrib/streaming/hadoop_streaming.jar
export JAVA_HOME=/usr/lib64/jvm/java-7-oracle/

This is followed by commands that use one of the variables defined above -

$HADOOP_CMD fs -mkdir /home/hadoop/contents
$HADOOP_CMD fs -put /home/hadoop/contents/* /home/hadoop/contents/

The execution fails with the error message -

/mnt/var/lib/bootstrap-actions/2/cycle0_unix.sh: line 3: fs: command not found
/mnt/var/lib/bootstrap-actions/2/cycle0_unix.sh: line 4: fs: command not found

cycle0.sh is the name of my bootstrap script.

Any comments as to what is happening here?

Was it helpful?

Solution

I found a proper solution to my problem. My attempt to copy data files from S3 to EMR using hadoop fs commands was futile. I have just learned about the S3DistCp command available in EMR for file transfer, so I am skipping the $HADOOP_CMD method. For those who care how S3DistCp works: Link to AWS EMR Docs. I still do not understand why the bootstrap script will not accept an environment variable in subsequent statements.
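For reference, here is a minimal sketch of running S3DistCp as a step via the AWS CLI on a recent EMR release; the cluster ID, bucket name, and paths are placeholders, not values from the question:

aws emr add-steps --cluster-id j-XXXXXXXXXXXXX \
  --steps 'Type=CUSTOM_JAR,Name=S3DistCpStep,Jar=command-runner.jar,Args=[s3-dist-cp,--src,s3://my-bucket/contents/,--dest,hdfs:///home/hadoop/contents/]'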

Other tips

To get back to the topic of the question: it seems that environment variables can't be set from arbitrary bootstrap code; they can only be set or updated from a script that must be named

hadoop-user-env.sh

More details here: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hadoop-config_hadoop-user-env.sh.html
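As a minimal sketch, a bootstrap action could create that script itself. This assumes the older EMR AMIs that keep the Hadoop configuration under /home/hadoop/conf, as the linked page describes:

#!/bin/bash
# Bootstrap action: append exports to hadoop-user-env.sh so the Hadoop
# daemons pick them up when they start (path assumed from the linked docs).
cat >> /home/hadoop/conf/hadoop-user-env.sh <<'EOF'
export JAVA_HOME=/usr/lib64/jvm/java-7-oracle/
EOF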

I think you don't need the environment variable. Just change

fs

to

hadoop fs
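With that change, the commands from the question become the following sketch; it assumes the hadoop binary is already on the PATH when the script runs:

hadoop fs -mkdir /home/hadoop/contents
hadoop fs -put /home/hadoop/contents/* /home/hadoop/contents/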

You can configure such Spark-specific (and other) environment variables with configuration classifications; see https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html
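For example, a minimal sketch passing a spark-env classification at cluster creation with the AWS CLI; the release label, instance settings, and FOO=bar are placeholders:

aws emr create-cluster \
  --release-label emr-5.36.0 \
  --applications Name=Hadoop Name=Spark \
  --instance-type m5.xlarge --instance-count 3 \
  --use-default-roles \
  --configurations '[{"Classification":"spark-env","Configurations":[{"Classification":"export","Properties":{"FOO":"bar"}}]}]'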

Another (rather dirty) option is to enrich .bashrc with some export FOO=bar lines in the bootstrap action.
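A one-line sketch of that approach, to be run from the bootstrap script (FOO=bar is a placeholder; note this only affects later login shells, not the bootstrap script itself):

echo 'export FOO=bar' >> /home/hadoop/.bashrc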
