I found a proper solution to my problem. My attempts to copy data files from S3 to EMR using hadoop fs commands were futile. I have since learned about the S3DistCp command available in EMR for file transfer, so I am abandoning the $HADOOP_CMD method. For those who care how S3DistCp works: Link to AWS EMR Docs. I still do not understand why the bootstrap script will not accept an environment variable in subsequent statements.
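For reference, a minimal S3DistCp invocation might look like the sketch below. The bucket name and paths are placeholders, not from my actual job, and the command only exists on EMR cluster nodes:

```shell
# Copy files from S3 into HDFS using S3DistCp (preinstalled on EMR nodes).
# s3://my-bucket/input/ and the HDFS destination are illustrative placeholders.
s3-dist-cp \
  --src s3://my-bucket/input/ \
  --dest hdfs:///home/hadoop/contents/
```

It can also be submitted as an EMR step rather than run from a shell on the master node.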
Environment variables set in bootstrap do not take effect in AWS EMR
04-07-2023
Problem
I am setting environment variables in my bootstrap code:
export HADOOP_HOME=/home/hadoop
export HADOOP_CMD=/home/hadoop/bin/hadoop
export HADOOP_STREAMING=/home/hadoop/contrib/streaming/hadoop_streaming.jar
export JAVA_HOME=/usr/lib64/jvm/java-7-oracle/
Later in the script, I use some of the variables defined above:
$HADOOP_CMD fs -mkdir /home/hadoop/contents
$HADOOP_CMD fs -put /home/hadoop/contents/* /home/hadoop/contents/
The execution fails with the error message -
/mnt/var/lib/bootstrap-actions/2/cycle0_unix.sh: line 3: fs: command not found
/mnt/var/lib/bootstrap-actions/2/cycle0_unix.sh: line 4: fs: command not found
cycle0.sh is the name of my bootstrap script.
Any comments as to what is happening here?
Solution
Other tips
To get back to the topic of the question: it seems that environment variables cannot be set from arbitrary bootstrap code; they can only be set or updated from a script that must be named
hadoop-user-env.sh
More details here: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hadoop-config_hadoop-user-env.sh.html
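A minimal sketch of such a bootstrap action, reusing the variable names and paths from the question (the conf directory follows the old AMI-era layout described in the linked docs, so this only applies on an EMR node):

```shell
# Append exports to hadoop-user-env.sh so that Hadoop picks them up;
# /home/hadoop/conf is the AMI-era config directory from the linked docs.
cat >> /home/hadoop/conf/hadoop-user-env.sh <<'EOF'
export HADOOP_CMD=/home/hadoop/bin/hadoop
export HADOOP_STREAMING=/home/hadoop/contrib/streaming/hadoop_streaming.jar
EOF
```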
I think you don't need the environment variable at all. Just change
fs
to
hadoop fs
since the hadoop binary is already on the PATH on EMR nodes, e.g. hadoop fs -mkdir /home/hadoop/contents.
You configure such Spark-specific (and other) environment variables with configuration classifications; see https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html
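As a sketch, a classification like the following could be passed at cluster creation; the cluster name, release label, instance settings, and the FOO=bar variable are all placeholders, not values from the question:

```shell
# Set a Spark environment variable via the spark-env classification
# (nested "export" classification) when creating the cluster.
# All names and values below are illustrative placeholders.
aws emr create-cluster \
  --name "my-cluster" \
  --release-label emr-6.15.0 \
  --applications Name=Spark \
  --instance-type m5.xlarge --instance-count 3 \
  --use-default-roles \
  --configurations '[
    {
      "Classification": "spark-env",
      "Configurations": [
        {
          "Classification": "export",
          "Properties": { "FOO": "bar" }
        }
      ]
    }
  ]'
```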
Another (rather dirty) option is to enrich .bashrc with some export FOO=bar in the bootstrap action.
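A one-line sketch of that approach, using the placeholder FOO=bar from above; the export is appended during the bootstrap action and takes effect in later login shells on the node:

```shell
# Dirty-but-simple: append the export to .bashrc in the bootstrap action
# so later login shells inherit it. FOO=bar is a placeholder.
echo 'export FOO=bar' >> "$HOME/.bashrc"
```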