Question

When I create a streaming job with Amazon Elastic MapReduce (Amazon EMR) using the Ruby command line interface, how can I specify that only EC2 spot instances should be used (except for the master)? The command below works, but it "forces" me to use at least 1 core instance...

./elastic-mapreduce --create --stream          \
--name    n2_3                             \
--input   s3://mr/neuron/2              \
--output  s3://mr-out/neuron/2          \
--mapper  s3://mr/map.rb         \
--reducer s3://mr/noop_reduce.rb \
--instance-group master --instance-type m1.small --instance-count 1 \
--instance-group core   --instance-type m1.small --instance-count 1 \
--instance-group task   --instance-type m1.small --instance-count 18 --bid-price 0.028

Thanks

Solution

Both CORE and TASK nodes run TaskTrackers, but only CORE nodes run DataNodes, so yes, you need at least one CORE node.

So you could run the core nodes as spot instances instead:

./elastic-mapreduce --create --stream \
...
--instance-group master --instance-type m1.small --instance-count 1 \
--instance-group core   --instance-type m1.small --instance-count 19 --bid-price 0.028
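If you want to tweak the bid price or node count between runs, the instance-group flags above can be assembled from variables. This is only a sketch: `BID_PRICE`, `NODE_COUNT` and `build_instance_groups` are names made up for illustration, and the script prints the command rather than running it. The `...` stands for the same omitted job options as above.

```shell
#!/bin/sh
# Dry-run sketch: builds the instance-group flags and prints the command.
# BID_PRICE, NODE_COUNT and build_instance_groups are hypothetical names.
BID_PRICE="0.028"
NODE_COUNT=19

build_instance_groups() {
  # $1 = number of spot CORE nodes, $2 = bid price per instance-hour
  printf '%s' "--instance-group master --instance-type m1.small --instance-count 1 "
  printf '%s' "--instance-group core --instance-type m1.small --instance-count $1 --bid-price $2"
}

INSTANCE_GROUPS="$(build_instance_groups "$NODE_COUNT" "$BID_PRICE")"

# Print instead of executing, so the flags can be checked first.
echo "./elastic-mapreduce --create --stream ... $INSTANCE_GROUPS"
```

The master group has no --bid-price here, so it stays on-demand, as in the question.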

P.S. You could also run one CORE node and many TASK nodes, but depending on how much reading and writing the job does, it will hurt: all 18 TASK nodes will be reading from and writing to that single CORE node, since it runs the only DataNode.

# expect problems....
./elastic-mapreduce --create --stream \
...
--instance-group master --instance-type m1.small --instance-count 1 \
--instance-group core   --instance-type m1.small --instance-count 1  --bid-price 0.028 \
--instance-group task   --instance-type m1.small --instance-count 18 --bid-price 0.028
Licensed under: CC-BY-SA with attribution