Question

I am interested in knowing how ELB works. I have a client that sends CPU-intensive requests to a server. All requests forwarded to the server pass through an AWS ELB. I have created an Auto Scaling group for the server with a scale-up policy using an average CPU utilization threshold of 80%.

For example, say the client sends just 5 requests via the ELB. The 1st request is forwarded to the server and its CPU utilization goes to 100%. Now, for the second request a new instance is created, if I am not wrong. This new instance needs some time to initialize and start processing before its CPU utilization reaches 100%. Does that mean I have to add an artificial delay, greater than the instance's initialization time, between two consecutive requests sent to the ELB in order to have one server instance processing each request? Correct me if I am wrong. What would be the optimal delay in that case?

Also, is the CPU utilization threshold an average over the whole Auto Scaling group, or per instance? Will my scale-up policy work?

The desired result is that each server instance processes one request and all servers run in parallel. I am doing this to speed up the overall processing time using elasticity.

Thanks in anticipation


Solution

Using a load balancer, you would have to account for the time to launch an instance. That would probably make your solution more complicated than it needs to be. Instead, you may want to consider the following.

You have an instance that receives requests. If it's not a lot of requests, it could even be a micro instance. This instance would then create jobs in SQS. (You could use another message queue system, but SQS metrics can be used in Auto Scaling policies.)
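As a rough sketch of the front-end side, assuming Python with boto3 (the queue name, region, and job fields below are placeholders, not anything from the question):

```python
import json

def enqueue_job(sqs_client, queue_url, payload):
    """Serialize a job description and place it on the SQS queue."""
    body = json.dumps(payload)
    return sqs_client.send_message(QueueUrl=queue_url, MessageBody=body)

if __name__ == "__main__":
    import boto3  # AWS SDK for Python; assumed installed and configured

    sqs = boto3.client("sqs", region_name="us-east-1")
    queue_url = sqs.get_queue_url(QueueName="cpu-jobs")["QueueUrl"]
    # Hypothetical job: point the worker at an input object rather than
    # shipping the CPU-intensive payload through the queue itself.
    enqueue_job(sqs, queue_url, {"job_id": "123", "input_key": "uploads/123.dat"})
```

Each request becomes one small message, so the receiving instance stays lightly loaded regardless of how heavy the actual processing is.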

Your worker instances would each take a message from SQS and process it. You would save the results to S3 or a database, depending on what is best for your scenario. (You may be able to deliver the results to a callback URL provided by the client.)
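A minimal worker loop along these lines might look as follows (again Python with boto3-style clients; `process_job`, the bucket name, and the key layout are illustrative assumptions). Note the message is deleted only after the result is stored, so a worker that dies mid-job lets the message reappear for another worker:

```python
import json

def process_job(payload):
    # Placeholder for the actual CPU-intensive work.
    return "result-for-" + str(payload.get("job_id"))

def worker_loop(sqs_client, s3_client, queue_url, bucket, run_once=False):
    """Poll SQS, process one message at a time, and store results in S3."""
    while True:
        # Long polling (WaitTimeSeconds) avoids hammering the API when idle.
        resp = sqs_client.receive_message(
            QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=20
        )
        for msg in resp.get("Messages", []):
            payload = json.loads(msg["Body"])
            result = process_job(payload)
            # Persist the result; an alternative is POSTing to a
            # client-provided callback URL instead of writing to S3.
            s3_client.put_object(
                Bucket=bucket,
                Key="results/{}.txt".format(payload["job_id"]),
                Body=result.encode(),
            )
            # Delete only after the result is safely stored.
            sqs_client.delete_message(
                QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"]
            )
        if run_once:
            break
```

With this shape, the Auto Scaling group for the workers can scale on the queue's `ApproximateNumberOfMessagesVisible` metric instead of CPU, which sidesteps the instance-launch-delay problem entirely: requests wait in the queue rather than being dropped while capacity catches up.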

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow