Environment variables set in bootstrap do not take effect in AWS EMR
04-07-2023
Question
I am setting environment variables in my bootstrap script:
export HADOOP_HOME=/home/hadoop
export HADOOP_CMD=/home/hadoop/bin/hadoop
export HADOOP_STREAMING=/home/hadoop/contrib/streaming/hadoop_streaming.jar
export JAVA_HOME=/usr/lib64/jvm/java-7-oracle/
This is followed by use of one of the variables defined above:
$HADOOP_CMD fs -mkdir /home/hadoop/contents
$HADOOP_CMD fs -put /home/hadoop/contents/* /home/hadoop/contents/
The execution fails with this error message:
/mnt/var/lib/bootstrap-actions/2/cycle0_unix.sh: line 3: fs: command not found
/mnt/var/lib/bootstrap-actions/2/cycle0_unix.sh: line 4: fs: command not found
cycle0_unix.sh is the name of my bootstrap script.
Any comments as to what is happening here?
Solution
I found a proper solution to my problem. My attempts to copy data files from S3 to EMR using hadoop fs commands were futile. I have just learned about the S3DistCp command available in EMR for file transfer, so I am skipping the $HADOOP_CMD approach. For those who care how S3DistCp works, see the AWS EMR docs. I still do not understand why the bootstrap script will not accept an environment variable in subsequent statements.
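For reference, a minimal sketch of copying files with S3DistCp as an EMR step; the cluster ID, bucket name, and paths are hypothetical, and this assumes an EMR release that ships command-runner.jar:
aws emr add-steps --cluster-id j-XXXXXXXXXXXXX \
  --steps 'Type=CUSTOM_JAR,Name=S3DistCpStep,Jar=command-runner.jar,Args=[s3-dist-cp,--src=s3://my-bucket/contents/,--dest=hdfs:///home/hadoop/contents/]'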
OTHER TIPS
To get back to the topic of the question: the error indicates that $HADOOP_CMD is empty when those lines run, so the shell tries to execute fs itself as a command. It seems that environment variables can't be set from arbitrary bootstrap code; they can only be set or updated from a script that must be named
hadoop-user-env.sh
More details here: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hadoop-config_hadoop-user-env.sh.html
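A minimal sketch of such a script, reusing the exports from the question (the linked page describes the exact mechanism; the file name is the essential part):
#!/bin/bash
# hadoop-user-env.sh - per the AWS doc above, EMR picks up Hadoop user
# environment variables from a bootstrap script with this exact name
export HADOOP_HOME=/home/hadoop
export HADOOP_CMD=/home/hadoop/bin/hadoop
export HADOOP_STREAMING=/home/hadoop/contrib/streaming/hadoop_streaming.jar
export JAVA_HOME=/usr/lib64/jvm/java-7-oracle/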
I think you don't need the environment variable. Just change
$HADOOP_CMD fs
to
hadoop fs
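Applied to the question's script, that would look something like the following, assuming the hadoop binary is on PATH (otherwise use its full path, e.g. /home/hadoop/bin/hadoop):
hadoop fs -mkdir /home/hadoop/contents
hadoop fs -put /home/hadoop/contents/* /home/hadoop/contents/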
You can configure such Spark-specific (and other) environment variables with configuration classifications; see https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html
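A sketch of what that could look like for the variables in the question; the classification names should be checked against the page above:
[
  {
    "Classification": "hadoop-env",
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "HADOOP_HOME": "/home/hadoop",
          "JAVA_HOME": "/usr/lib64/jvm/java-7-oracle"
        }
      }
    ]
  }
]
Saved as, say, env.json, this could be supplied at cluster creation with aws emr create-cluster ... --configurations file://env.json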
Another (rather dirty) option is to enrich .bashrc with some export FOO=bar lines in the bootstrap action.
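A sketch of that approach as a bootstrap action; the variable is taken from the question, and the file path assumes the default hadoop user:
#!/bin/bash
# Append the export so later login shells pick it up; note this only helps
# processes that actually source /home/hadoop/.bashrc
echo 'export HADOOP_CMD=/home/hadoop/bin/hadoop' >> /home/hadoop/.bashrc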