Question

I am an AWS newbie, and I'm trying to run Hadoop on EC2 via Cloudera's AMI. I installed the AMI, downloaded the cloudera-hadoop-for-ec2-tools, and now I'm trying to configure

hadoop-ec2-env.sh

It is asking for the following:

AWS_ACCOUNT_ID
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
EC2_KEYDIR
PRIVATE_KEY_PATH

When running:

./hadoop-ec2 launch-cluster my-cluster 10

I'm getting:

AWS was not able to validate the provided access credentials

First, I have the first three values for my own account. This is a corporate account, and I received an email with the access key ID and secret access key tied to my email address. Is it possible that my account doesn't have the permissions needed here? Why exactly does this script need my credentials, and what does it do with them?

Second, where is the EC2 key dir? I've uploaded the key.pem file that Amazon created for me, hard-coded its path into PRIVATE_KEY_PATH, and ran chmod 400 on the .pem file. Is that the correct key for this script?

Any help is appreciated.

Sam


Solution

The Cloudera EC2 tools rely heavily on the Amazon EC2 API tools. Therefore, you must do the following:

1) Download the Amazon EC2 API tools from http://aws.amazon.com/developertools/351

2) Download the Cloudera EC2 tools from http://cloudera-packages.s3.amazonaws.com/cloudera-for-hadoop-on-ec2-0.3.0.tar.gz

3) Set the following environment variables (I am only giving Unix-based examples):

export EC2_HOME=<path-to-tools-from-step-1>
export PATH=$PATH:$EC2_HOME/bin
export PATH=$PATH:<path-to-cloudera-ec2-tools>/bin
export EC2_PRIVATE_KEY=<path-to-private-key.pem>
export EC2_CERT=<path-to-cert.pem>
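As a quick sanity check on step 3, something like the following could be run before going further. This is only a sketch: `check_ec2_env` is a made-up helper name, not part of the Amazon or Cloudera tools, and it only confirms that the three variables point at things that exist on disk. It says nothing about whether AWS will accept the credentials.

```shell
# Hypothetical helper (not shipped with either tool set).
# Verifies that the variables from step 3 point at real paths.
check_ec2_env() {
    status=0
    # EC2_HOME should be the directory the API tools were unpacked into.
    if [ ! -d "${EC2_HOME:-}" ]; then
        echo "EC2_HOME is unset or not a directory" >&2
        status=1
    fi
    # The private key and certificate should be readable regular files.
    for name in EC2_PRIVATE_KEY EC2_CERT; do
        eval "value=\${$name:-}"
        if [ ! -f "$value" ] || [ ! -r "$value" ]; then
            echo "$name is unset or not a readable file" >&2
            status=1
        fi
    done
    return $status
}
```

Calling `check_ec2_env && echo "environment looks OK"` in the same shell where the exports were made prints a message per problem on stderr and returns non-zero if anything is missing.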

4) In cloudera-ec2-tools/bin, set the following variables in hadoop-ec2-env.sh:

AWS_ACCOUNT_ID=<amazon-acct-id>
AWS_ACCESS_KEY_ID=<amazon-access-key>
AWS_SECRET_ACCESS_KEY=<amazon-secret-key>
EC2_KEYDIR=<dir-where-the-ec2-private-key-and-ec2-cert-are>
KEY_NAME=<name-of-ec2-private-key>
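Before launching, it may help to confirm that none of the values from step 4 were left blank. The sketch below assumes the variables have already been loaded into the current shell (for example by sourcing the env file); `check_cluster_config` is a hypothetical name, and the check is purely local, so it cannot tell whether the access key and secret key are actually valid for the account.

```shell
# Hypothetical helper: run after the step 4 variables are loaded into the shell.
check_cluster_config() {
    missing=""
    # Collect the names of any variables that are unset or empty.
    for name in AWS_ACCOUNT_ID AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY EC2_KEYDIR KEY_NAME; do
        eval "value=\${$name:-}"
        [ -n "$value" ] || missing="$missing $name"
    done
    if [ -n "$missing" ]; then
        echo "Unset or empty:$missing" >&2
        return 1
    fi
    # EC2_KEYDIR is expected to be the directory holding the key and cert.
    if [ ! -d "$EC2_KEYDIR" ]; then
        echo "EC2_KEYDIR is not a directory: $EC2_KEYDIR" >&2
        return 1
    fi
    return 0
}
```

A non-zero return here points at a configuration gap on the local machine. If the check passes and the launch still reports that AWS could not validate the credentials, the key values themselves are the next thing to look at.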

Then run:

$ hadoop-ec2 launch-cluster my-hadoop-cluster 10

This will create a Hadoop cluster called "my-hadoop-cluster" with 10 nodes spread across multiple EC2 machines.

Licensed under: CC-BY-SA with attribution