Question

I've been having a problem with my AWS EC2 Ubuntu instances: they always hit 100% CPU utilization after a certain amount of time (around 8 hours) and stay there until I restart them.

The instance is Ubuntu Server 13.04 with a basic LAMP stack, that's all. I have a cron job that pings every couple of minutes to keep a VPN tunnel up, but that shouldn't cause this.

When it's at 100% CPU utilization I can't ping it, SSH into it, or browse it, but it doesn't reject the connection; it just keeps "trying".

Any idea what the reason behind this is? I'm guessing it has something to do with Amazon throttling the instance, but it's weird that it sits at 100% CPU for 8 hours.

This is the CPU log of the instance; every other indicator seems normal.

I can't attach images here, so I'm posting a link:

100% cpu utilization

EDIT

This happened to me before with other instances, and right now I have an Amazon Linux AMI that has been running at 100% for 4 days straight; that one only has Tomcat installed, with no apps deployed. I just realized it's unresponsive, so I'm terminating it.


Solution

Author's note, 2019: this post was originally written in 2013, and is about the t1.micro instance type. The current EC2 free tier now allows you to choose either the t1.micro or t2.micro instance class. Unlike the t1.micro's intermittent hard-clamping behavior, the t2.micro runs continuously at full capacity until your CPU credit balance nears depletion, and degrades much more gracefully.


This is the expected behavior. See t1.micro Instances in the EC2 User Guide for Linux Instances.

Note the graphs that say "CPU level limited." I have measured this: if you consume 100% CPU on a micro instance for more than about 15 seconds, the throttling kicks in and your available cycles drop from 2 ECU to approximately 0.2 ECU (roughly 200 MHz) for the next 2-3 minutes, at which point the cycle repeats and you'll be throttled again within a few seconds if you are still pulling hard on the processor.
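
If you want to watch the clamp-and-release cycle yourself, here is a minimal sketch, assuming Python is available on the instance. The loop-iteration count is just a crude proxy for available cycles, and the ~15-second/2-3-minute figures are the measurements above, not guarantees:

```python
# Busy-loop and log how much work actually completes each wall-clock second.
# On a throttled t1.micro you would expect the per-second count to collapse
# to roughly a tenth of its initial value after the burst window expires,
# then recover a couple of minutes later.
import time

def burn_for_one_second():
    """Spin until one wall-clock second has passed; return loop iterations."""
    iterations = 0
    deadline = time.time() + 1.0
    while time.time() < deadline:
        iterations += 1
    return iterations

if __name__ == "__main__":
    baseline = burn_for_one_second()
    while True:
        count = burn_for_one_second()
        pct = 100.0 * count / baseline
        print(f"{time.strftime('%H:%M:%S')}  {count:>12,} iters/s  (~{pct:.0f}% of baseline)")
```

On an unthrottled machine the percentage stays near 100; on a clamped micro you should see it drop off a cliff and then come back in cycles.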

During throttle time, you only get ~1/10th of the cycles you get at peak performance, because the hypervisor "steals" the rest¹... so you are still going to see that you were using a solid 100%, because you were using all the cycles that were available. It doesn't take much to pin a micro to the ceiling... or the floor. So either you are asking too much of the instance class, or you have something unexpectedly maxing out your CPU.

Establish an SSH connection while the machine is responsive, start top running, and then stay connected, so that when the instance starts to slow down you already have the tool going that you need to find out what the CPU hog is.
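
If you'd rather not babysit an interactive top session, one alternative is to log the top CPU consumers to a file so the evidence survives a dropped connection. This is only a sketch: it assumes the third-party psutil package is installed (pip install psutil), and the log path is a hypothetical placeholder:

```python
# Periodically record the heaviest CPU consumers to a log file, so you can
# inspect it after the instance recovers even if your SSH session died.
import time
import psutil

LOG_PATH = "/tmp/cpu-hogs.log"  # hypothetical location, change as needed

def snapshot(interval=5.0, top_n=5):
    procs = list(psutil.process_iter(["pid", "name"]))
    for p in procs:
        try:
            p.cpu_percent(None)  # prime the per-process counters
        except psutil.NoSuchProcess:
            pass
    time.sleep(interval)  # measure usage over this window
    usage = []
    for p in procs:
        try:
            usage.append((p.cpu_percent(None), p.pid, p.info["name"]))
        except psutil.NoSuchProcess:
            pass
    return sorted(usage, reverse=True)[:top_n]

if __name__ == "__main__":
    while True:
        with open(LOG_PATH, "a") as log:
            log.write(time.strftime("%H:%M:%S") + "\n")
            for cpu, pid, name in snapshot():
                log.write(f"  {cpu:5.1f}%  pid={pid}  {name}\n")
```

If the log shows no single process hogging the CPU while utilization reads 100%, that points back at hypervisor steal rather than a runaway process.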


¹ hypervisor steals the rest: A common misconception at one time was that the time stolen from EC2 instances by the hypervisor (visible in top and similar utilities) was caused by "noisy neighbors" -- other instances on the same hardware competing for CPU cycles. This is not the cause of stolen cycles. For some older instance families, like the m1, stolen cycles would be seen if AWS had provisioned your instance on a host machine with faster processors than those specified for the instance class; the cycles were stolen so that the instance's performance matched what you were paying for, rather than the performance of the actual underlying hardware. EC2 instances don't share the physical resources underlying your virtualized CPU resources.

OTHER TIPS

Run top and see how high st (or steal) is. If st is at 97%, then you are being throttled and only have 3% of your CPU to work with. You don't need to be doing anything CPU-intensive for that to be slow!
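
If you want the same figure without eyeballing top, steal time is exposed directly in /proc/stat on Linux. A small sketch that samples it over five seconds (the field layout shown is standard on any kernel recent enough to report steal, including the one shipped with Ubuntu 13.04):

```python
# Sample the aggregate "cpu" line of /proc/stat twice and report what
# fraction of the elapsed jiffies the hypervisor stole.
import time

def read_cpu_times():
    with open("/proc/stat") as f:
        fields = f.readline().split()[1:]  # drop the "cpu" label
    return [int(x) for x in fields]

a = read_cpu_times()
time.sleep(5)
b = read_cpu_times()
deltas = [y - x for x, y in zip(a, b)]
steal_pct = 100.0 * deltas[7] / sum(deltas)  # 8th field of the cpu line is steal
print(f"steal over the last 5s: {steal_pct:.1f}%")
```

A reading near 90% steal while your own processes show almost no usage is exactly the throttling signature described above.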

If that is the case and you cannot change how much CPU you require, the only fix is to upgrade to a small instance. Small instances are not throttled as heavily.

http://theon.github.io/you-may-want-to-drop-that-ec2-micro-instance.html
