Question

The setup: EC2 servers autoscaling behind an ELB, connecting to an RDS MySQL database, with all static files served from CloudFront.

I'm running nginx as the web server on the EC2 servers, with keepalive set to 20 and 4 worker processes. CodeIgniter is the backend, and I'm using CodeIgniter sessions.

I've been running lots of benchmarks to test performance, using siege, ApacheBench and blitz.io.

I'm testing two particular pages. The first performs extremely well; it uses CodeIgniter sessions, so it hits the database to read and update the ci_sessions table. The second page is the one I'm having trouble with. It runs a query with several joins, which completes in roughly 0.4 seconds with a single user. This query is optimised, and I'm using InnoDB tables. Under ApacheBench with -c 10 and -n 1000, 100% of requests come back within 634 ms.

When I run more than 200 concurrent users I start running into problems. Adding more EC2 servers doesn't help, and their CPUs are only around 50% utilised. The RDS monitoring also shows CPU and memory usage below 70%, and the average number of DB connections is under 35.

Performance has improved after moving to a large RDS instance and large EC2 instances, which makes me wonder whether I/O is coming into play here.

If I boot up a server outside of the ELB during load tests and hit this page it comes back in less than a second, but if I fire up another server within the ELB it remains up to 4 or 5 seconds. This suggests that I'm not overloading the RDS.

I tried ramping up the load on the ELB slowly, in 5-minute bursts, and this didn't seem to help.

I'm wondering where to look next: whether it's some kind of I/O issue or something else, because neither the RDS instance nor the EC2 servers seem pushed to their limits. Any suggestions or ideas on where to look next would be much appreciated.


Solution

This is a very broad subject, as you know, but I will try to help.

  1. The ELB is generally not very good at burst scaling. After speaking with Amazon engineers about this, I learned that the ELB will not scale up in response to bursts; they told me it is not possible. You need consistent load over time to get the ELB scaled up. Because of this, I switched to haproxy. In addition, the ELB is addressed through a CNAME, and the extra DNS lookup is going to affect your performance as well. So if you expect frequent burst load, or your DNS lookups need to be fast, it is probably best to get off the ELB.

  2. RDS is a black box. That is not totally true, but in general I avoid RDS unless I know the backend is a simple setup that is easy to scale. Having said that, RDS does help with scaling, but I would simplify the backend and make sure your query runs quickly. Run it on a regular MySQL instance and see whether it is subsecond. In my experience, saying a query is "optimized" does not mean there is no way left to make it faster.
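
To compare the two paths above (through the ELB versus directly against one server, or the query on RDS versus a regular MySQL instance), it helps to measure the same call at several concurrency levels and see where latency starts to climb. Here is a minimal sketch in Python, using only the standard library. The `measure` helper and the `fake_page` stand-in are hypothetical names for illustration; you would replace `fake_page` with whatever actually fetches your page or runs your query.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def measure(fn, concurrency, requests):
    """Call fn() `requests` times across `concurrency` threads.

    Returns latency statistics in seconds.
    """

    def timed_call(_):
        start = time.perf_counter()
        fn()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(requests)))

    return {
        "count": len(latencies),
        "median": statistics.median(latencies),
        # Simple index-based 95th percentile over the sorted latencies.
        "p95": latencies[int(0.95 * (len(latencies) - 1))],
        "max": latencies[-1],
    }


def fake_page():
    # Hypothetical stand-in: replace with a real HTTP request or DB query.
    time.sleep(0.01)


if __name__ == "__main__":
    for c in (1, 10, 50):
        print(c, measure(fake_page, concurrency=c, requests=100))
```

If the median stays flat against a single server but grows with concurrency only when going through the ELB, that points at the load balancer rather than the database. If it grows in both cases, the query or the database is the more likely suspect.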

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow