Cloudera CDH on EC2
-
12-12-2019 - |
Question
I am an aws newbie, and I'm trying to run Hadoop on EC2 via Cloudera's AMI. I installed the AMI, downloaded the cloudera-haddop-for-ec2-tools, and now I'm trying to configure
haddop-ec2-env.sh
It is asking for the following:
AWS_ACCOUNT_ID
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
EC2_KEYDIR
PRIVATE_KEY_PATH
when running:
./hadoop-ec2 launch-cluster my-cluster 10
i'm getting
AWS was not able to validate the provided access credentials
Firstly, I have the first 3 attributes for my own account. This is a corporate account, and I received an email with the access key id and secret access key for my email. Is it possible that my account doesn't have the proper permissions to do what is needed here. Exactly why does this script need my credentials? What does it need to do?
Secondly, where is the EC2 key dir? I've uploaded my key.pem file that amazon created for me, and hard coded that into the PRIVATE_KEY_PATH and chmod 400 on the .pem file. Is that the correct key that this script needs?
Any help is appreciated?
Sam
Solution
The cloudera ec2 tools heavily rely on the amazon ec2 api tools. Therefore, you must do the following:
1) Download amazon ec2 api tools from http://aws.amazon.com/developertools/351
2) Download cloudera ec2 tools from http://cloudera-packages.s3.amazonaws.com/cloudera-for-hadoop-on-ec2-0.3.0.tar.gz
3) Set the following env variables I am only giving Unix based examples
export EC2_HOME=<path-to-tools-from-step-1>
export $PATH=$PATH:$EC2_HOME/bin
export $PATH=$PATH:<path-to-cloudera-ec2-tools>/bin
export EC2_PRIVATE_KEY=<path-to-private-key.pem>
export EC2_CERT=<path-to-cert.pem>
4) In cloudera-ec2-tools/bin set the following variables
AWS_ACCOUNT_ID=<amazon-acct-id>
AWS_ACCESS_KEY_ID=<amazon-access-key>
AWS_SECRET_ACCESS_KEY=<amazon-secret-key>
EC2_KEYDIR=<dir-where-the-ec2-private-key-and-ec2-cert-are>
KEY_NAME=<name-of-ec2-private-key>
And then run
$ hadoop-ec2 launch-cluster my-hadoop-cluster 10
Which will create a hadoop cluster called "my-hadoop" with 10 nodes on multiple ec2 machines