No traffic from ELB to one of the Auto Scaling Instances

https://stackoverflow.com/questions/9287133

29-04-2021

Question

We use Auto Scaling and it works pretty well for us, but this morning something went wrong. CPU utilization of one of the instances dropped to about 0% for some reason, which pushed the rest of the instances in the same Availability Zone to 100% CPU utilization. The group didn't scale up, because the average CPU utilization across all instances was about 70%, while the trigger only starts a new instance when 80% is hit. The ELB instance health check is used as well, but this 0% instance was healthy.

Is it possible to configure Auto Scaling to remove such Instances? We don't want to setup any custom cronjobs for check ups.


Solution

Update 2

Is it possible to configure Auto Scaling to remove such Instances?

Yes, see below - according to your comments you have done this correctly already.

We don't want to setup any custom cronjobs for check ups.

Given that your configuration is apparently correct (which implies an issue with Auto Scaling and/or ELB itself), I'm afraid you cannot avoid a custom solution: either actively shut down unused instances or use as-set-instance-health, as already suggested in my initial answer below. The former is also suggested by tribalcrossing's answer to ELB-Unhealthy instances taken OOS then removed from ELB automatically, which seems to address your situation:

We run a cronjob that's fired every 5 minutes to scan all of the servers in an ELB to check to see if it's been up for more than 5 minutes AND is unhealthy. When we find one, we shut it down. We've had issues of "dead" instances stuck in ELB and throwing off monitoring metrics that trigger autoscaling actions, and that cronjob has solved the problem for us.
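The selection rule in that cron job ("up for more than 5 minutes AND unhealthy") can be sketched as follows. This is only an illustration under assumptions: the `InstanceStatus` record, the instance IDs, and the timestamps are hypothetical, and a real job would have to fill in the launch time from EC2 and the `InService`/`OutOfService` state from the ELB before shutting anything down.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class InstanceStatus:
    """Hypothetical record: EC2 launch time plus the state reported by the ELB."""
    instance_id: str
    launch_time: datetime
    elb_state: str  # "InService" or "OutOfService"


def instances_to_shut_down(statuses, now, min_uptime=timedelta(minutes=5)):
    """Return IDs of instances that are older than min_uptime AND out of service."""
    return [
        s.instance_id
        for s in statuses
        if now - s.launch_time > min_uptime and s.elb_state == "OutOfService"
    ]


# Made-up sample data for illustration only.
now = datetime(2012, 2, 15, 9, 0, tzinfo=timezone.utc)
statuses = [
    InstanceStatus("i-aaaa1111", now - timedelta(hours=3), "InService"),
    InstanceStatus("i-bbbb2222", now - timedelta(hours=3), "OutOfService"),
    # Launched 2 minutes ago: probably still booting, so it is left alone.
    InstanceStatus("i-cccc3333", now - timedelta(minutes=2), "OutOfService"),
]
doomed = instances_to_shut_down(statuses, now)
print(doomed)  # ['i-bbbb2222']
```

The uptime condition matters: without it, the job would shut down freshly launched instances that have not yet passed their first ELB health checks.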

Update 1

ELB Instance health check is used as well, but this 0% Instance was healthy.

Which health indicator are you referring to, and how did you conclude that the instance was healthy?

It is important to realize that Auto Scaling and ELB measure instance health differently; see alighafour's response to Autoscaling not reacting to unhealthy instances:

ELB checks at the application layer while autoscaling checks at the machine layer.

This difference is further detailed in the AWS team's response to the linked question ELB-Unhealthy instances taken OOS then removed from ELB automatically (which actually addresses the inverse issue):

Autoscaling is looking at instance health - they'll take an instance down if the data shows that the instance is not healthy. They'll take it out of the ELB at that time and then shut down the instance.

ELB, on the other hand, is doing an application health check by reading in a file or doing a connection to a port. If the application fails a certain number of these checks, the instance continues to run, but the ELB won't send it any new traffic. The ELB continues to perform the health check - if the application instance becomes healthy again, it'll start routing traffic to it. ELB doesn't remove the instances from the ELB registration - it simply stops sending it traffic until it's healthy again. [emphasis mine]
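The behaviour described in that quote can be modelled with a small state machine. This is a toy sketch, not ELB's actual implementation: the class name and the threshold values (2 failures to go out of service, 3 successes to come back) are assumptions chosen for illustration.

```python
class ElbHealthTracker:
    """Toy model of ELB health checking for one registered instance."""

    def __init__(self, unhealthy_threshold=2, healthy_threshold=3):
        # Threshold values are illustrative assumptions.
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.in_service = True  # whether the ELB routes new traffic to it
        self._streak = 0  # consecutive checks contradicting the current state

    def record(self, check_passed):
        """Feed one health check result; return whether traffic is routed."""
        if check_passed == self.in_service:
            # Result agrees with the current state, so any streak is broken.
            self._streak = 0
        else:
            self._streak += 1
            needed = (
                self.healthy_threshold if check_passed else self.unhealthy_threshold
            )
            if self._streak >= needed:
                self.in_service = check_passed
                self._streak = 0
        return self.in_service


tracker = ElbHealthTracker()
history = [tracker.record(ok) for ok in [True, False, False, True, True, True]]
print(history)  # [True, True, False, False, False, True]
```

Note that the model never removes the instance: it stays registered the whole time and only the routing decision flips, which is why such an instance can sit idle at 0% CPU while Auto Scaling still counts it.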

Conclusion

It looks like this scenario might indeed apply to your case: ELB stopped sending traffic to your instance because the ELB health check failed, while the Auto Scaling health check saw no problem with the instance as such. This can happen, for example, if the ELB health check probes an Apache-served web page that fails to respond for whatever reason (e.g. an Apache crash).

Solution

You need to configure the Auto Scaling group to base its health decision on both the EC2 health status and the ELB health status, as outlined in the section Creating a Health Check for Elastic Load Balancing within Maintaining Current Scaling Level:

By default, Auto Scaling uses the Amazon EC2 health status for all Auto-Scaling-managed instances. To also use the Elastic Load Balancer's health check, set the HealthCheckType property of the group to ELB:

% as-update-autoscaling-group myGroup --health-check-type ELB

With this configuration in place, an instance is also considered unhealthy as soon as its ELB health check fails, and it is replaced accordingly.
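For readers scripting this rather than using the legacy command line tools, the same setting can be sketched with boto3's `update_auto_scaling_group` call. The helper name, the `RecordingClient` stand-in, and the 300-second grace period are assumptions for illustration; in real use you would pass `boto3.client("autoscaling")` instead of the stand-in.

```python
def use_elb_health_check(autoscaling_client, group_name, grace_period_seconds=300):
    """Switch an Auto Scaling group to ELB-based health checks.

    The grace period (an assumed value here) gives a new instance time to
    boot before a failing ELB check counts against it.
    """
    request = {
        "AutoScalingGroupName": group_name,
        "HealthCheckType": "ELB",
        "HealthCheckGracePeriod": grace_period_seconds,
    }
    autoscaling_client.update_auto_scaling_group(**request)
    return request


class RecordingClient:
    """Offline stand-in for boto3.client("autoscaling"); it only records calls."""

    def __init__(self):
        self.calls = []

    def update_auto_scaling_group(self, **kwargs):
        self.calls.append(kwargs)


client = RecordingClient()
sent = use_elb_health_check(client, "myGroup")
print(sent)
```

Passing the client in as a parameter keeps the sketch runnable without AWS credentials and makes the request easy to inspect before it is sent for real.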

Initial Answer

Is it possible to have multiple triggers for one Auto Scaling Group?

Unfortunately not, see e.g. the AWS team response to How to set Multiple Triggers in Template:

Unfortunately, the Auto Scaling service only allows 1 trigger per Auto Scaling group and so we do not support having multiple triggers for the same group within a template at this time.

An alternative approach could be to implement a custom solution via as-set-instance-health, as mentioned in the section Custom Health Check within Maintaining Current Scaling Level:

If you have your own health check system, you can integrate it with Auto Scaling. Use SetInstanceHealth to send the instance's health information directly from your system to Auto Scaling.
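A custom health check system could report its findings through boto3's `set_instance_health` call, which corresponds to the SetInstanceHealth action. The sketch below is a minimal illustration under assumptions: the helper name, the `RecordingClient` stand-in, and the instance ID are hypothetical, and a real script would pass `boto3.client("autoscaling")`.

```python
def report_unhealthy(autoscaling_client, instance_ids, respect_grace_period=True):
    """Mark each given instance as Unhealthy so Auto Scaling replaces it."""
    reported = []
    for instance_id in instance_ids:
        autoscaling_client.set_instance_health(
            InstanceId=instance_id,
            HealthStatus="Unhealthy",
            ShouldRespectGracePeriod=respect_grace_period,
        )
        reported.append(instance_id)
    return reported


class RecordingClient:
    """Offline stand-in for boto3.client("autoscaling"); it only records calls."""

    def __init__(self):
        self.calls = []

    def set_instance_health(self, **kwargs):
        self.calls.append(kwargs)


client = RecordingClient()
reported = report_unhealthy(client, ["i-bbbb2222"])  # hypothetical instance ID
print(client.calls)
```

Compared with shutting instances down directly, this leaves termination and replacement to Auto Scaling, so the group's desired capacity is maintained without extra bookkeeping.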

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow