Question

I've been having a problem with my AWS EC2 Ubuntu instances: they always hit 100% CPU utilization after a certain amount of time (around 8 hours) and stay there until I restart them.

The instance is Ubuntu Server 13.04 with a basic LAMP stack, that's all. I have a cron job that pings every couple of minutes to keep a VPN tunnel up, but that shouldn't cause this.

When it's at 100% CPU utilization I can't ping it, SSH into it, or browse it, but it doesn't reject the connection; it just keeps "trying".

Any idea what the reason behind this is? I'm guessing it has something to do with Amazon throttling the instance, but it's weird that it sits at 100% CPU for 8 hours.

This is the CPU log of the instance; every other indicator seems normal.

I can't attach images here, so I'm posting a link:

100% cpu utilization

EDIT

This happened to me before with other instances, and right now I have an Amazon Linux AMI that has been running at 100% for 4 days straight; that one only has Tomcat installed, with no apps deployed. I just realized it's unresponsive, so I'm terminating it.


Solution

Author's note, 2019: this post was originally written in 2013, and is about the t1.micro instance type. The current EC2 free tier now allows you to choose either the t1.micro or t2.micro instance class. Unlike the t1.micro's intermittent hard-clamping behavior, the t2.micro runs continuously at full capacity until your CPU credit balance nears depletion, and degrades much more gracefully.


This is the expected behavior. See t1.micro Instances in the EC2 User Guide for Linux Instances.

Note the graphs that say "CPU level limited." I have measured this: if you consume 100% CPU on a micro instance for more than about 15 seconds, the throttling kicks in and your available cycles drop from 2 ECU to approximately 0.2 ECU (roughly 200 MHz) for the next 2-3 minutes, at which point the cycle repeats and you'll be throttled again within a few seconds if you are still pulling hard on the processor.
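
If you want to watch the clamp-and-release cycle yourself, here is a minimal sketch, assuming Python is available on the instance. The loop-iteration count is just a crude proxy for available cycles, and the ~15-second/2-3-minute figures are the measurements above, not guarantees:

```python
# Busy-loop and log how much work actually completes each wall-clock second.
# On a throttled t1.micro you would expect the per-second count to collapse
# to roughly a tenth of its initial value after the burst window expires,
# then recover a couple of minutes later.
import time

def burn_for_one_second():
    """Spin until one wall-clock second has passed; return loop iterations."""
    iterations = 0
    deadline = time.time() + 1.0
    while time.time() < deadline:
        iterations += 1
    return iterations

if __name__ == "__main__":
    baseline = burn_for_one_second()
    while True:
        count = burn_for_one_second()
        pct = 100.0 * count / baseline
        print(f"{time.strftime('%H:%M:%S')}  {count:>12,} iters/s  (~{pct:.0f}% of baseline)")
```

On an unthrottled machine the percentage stays near 100; on a clamped micro you should see it drop off a cliff and then come back in cycles.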

During throttle time, you only get ~1/10th of the cycles you get at peak performance, because the hypervisor "steals" the rest¹... so you are still going to see that you were using a solid 100%, because you were using all the cycles that were available. It doesn't take much to pin a micro to the ceiling... or the floor. So either you are asking too much of the instance class, or you have something unexpectedly maxing out your CPU.

Establish an SSH connection while the machine is responsive, start top running, and then stay connected, so that when the instance starts to slow down you already have the tool going that you need to find out what the CPU hog is.
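
If you'd rather not babysit an interactive top session, one alternative is to log the top CPU consumers to a file so the evidence survives a dropped connection. This is only a sketch: it assumes the third-party psutil package is installed (pip install psutil), and the log path is a hypothetical placeholder:

```python
# Periodically record the heaviest CPU consumers to a log file, so you can
# inspect it after the instance recovers even if your SSH session died.
import time
import psutil

LOG_PATH = "/tmp/cpu-hogs.log"  # hypothetical location, change as needed

def snapshot(interval=5.0, top_n=5):
    procs = list(psutil.process_iter(["pid", "name"]))
    for p in procs:
        try:
            p.cpu_percent(None)  # prime the per-process counters
        except psutil.NoSuchProcess:
            pass
    time.sleep(interval)  # measure usage over this window
    usage = []
    for p in procs:
        try:
            usage.append((p.cpu_percent(None), p.pid, p.info["name"]))
        except psutil.NoSuchProcess:
            pass
    return sorted(usage, reverse=True)[:top_n]

if __name__ == "__main__":
    while True:
        with open(LOG_PATH, "a") as log:
            log.write(time.strftime("%H:%M:%S") + "\n")
            for cpu, pid, name in snapshot():
                log.write(f"  {cpu:5.1f}%  pid={pid}  {name}\n")
```

If the log shows no single process hogging the CPU while utilization reads 100%, that points back at hypervisor steal rather than a runaway process.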


¹ hypervisor steals the rest: A common misconception at one time was that the time stolen from EC2 instances by the hypervisor (visible in top and similar utilities) was caused by "noisy neighbors" -- other instances on the same hardware competing for CPU cycles. This is not the cause of stolen cycles. For some older instance families, like the m1, stolen cycles would be seen if AWS had provisioned your instance on a host machine with faster processors than those specified for the instance class; the cycles were stolen so that the instance's performance matched what you were paying for, rather than the performance of the actual underlying hardware. EC2 instances don't share the physical resources underlying your virtualized CPU resources.

OTHER TIPS

Run top and see how high st (or steal) is. If st is at 97%, then you are being throttled and only have 3% of your CPU to work with. You don't need to be doing anything CPU-intensive for that to be slow!
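
If you want the same figure without eyeballing top, steal time is exposed directly in /proc/stat on Linux. A small sketch that samples it over five seconds (the field layout shown is standard on any kernel recent enough to report steal, including the one shipped with Ubuntu 13.04):

```python
# Sample the aggregate "cpu" line of /proc/stat twice and report what
# fraction of the elapsed jiffies the hypervisor stole.
import time

def read_cpu_times():
    with open("/proc/stat") as f:
        fields = f.readline().split()[1:]  # drop the "cpu" label
    return [int(x) for x in fields]

a = read_cpu_times()
time.sleep(5)
b = read_cpu_times()
deltas = [y - x for x, y in zip(a, b)]
steal_pct = 100.0 * deltas[7] / sum(deltas)  # 8th field of the cpu line is steal
print(f"steal over the last 5s: {steal_pct:.1f}%")
```

A reading near 90% steal while your own processes show almost no usage is exactly the throttling signature described above.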

If that is the case and you cannot change how much CPU you require, the only fix is to upgrade to a small instance. Small instances are not throttled as heavily.

http://theon.github.io/you-may-want-to-drop-that-ec2-micro-instance.html
