Question

Backgournd:

Asp.net 4.0 based ecommerce website. Hosted on cloud setup with a dedicated SQL server 2008 standard (16 core) with 32gb RAM.

User interaction:

  1. Users visit the website.
  2. browse through different categories (no static content yet)
  3. put product in cart
  4. A timer ticks for up-to 15 minutes.
  5. Checkout
  6. Login/Create account
  7. Payment is processed using Authorize.Net gateway (the user stays on our website)
  8. Emails are shot upon signup/forgot password/order completion using a third party SMTP provider.

Note: Availability of tickets is checked when the product page is loaded and when they are putting in cart. Once they have a product in cart, surely a timer is ticking for 15 minutes. Querying database every 25 seconds to update the timer.

Incident:

Okay guys, we had a huge sale last week probably put about 10000 tickets for sale for the fans across USA. We saw the traffic when beyond control and for 2-4 hours we saw there were about 1000 concurrent users on our website.

The Problem:

The problem was we had about 6 2gb cloud servers which quickly filled and then crashed due to enormous traffic. Then we had to spin up 4gb, 8gb and 16gb servers (2 each) to deal with the traffic. during the crash period which was about 15-20 minutes the website became unresponsive and also we saw database (dedicated one) was touching 100% CPU usage.

  • gb is the RAM capacity of the servers.

The framework:

The .net code is written very efficiently, it only executes two SQL statements to fetch and build all necessary data that needs to be rendered on the browser. All business logic that deals with the database is written in stored procedures. No cursors, no dynamic sql in the stored procedure.

Required:

  1. I am unable to understand why the website is crashing... I have lots of code analysis tools implemented that keep telling us which code part is taking too long or which query is taking so much time. When we have bigger servers (8gb or more) then the website is working smoothly.

  2. Should I eliminate the need of hammering database every page load? Like what about having static pages? (though it will need us to export the products info into html which is fine).

  3. What about if I store the pages in Lucene.Net index? and then render from it? Will the I/O cost a lot in this scenario?

I really want some expert opinion about how to tackle this? We initially had plan to deal with 25k concurrent users, but we see we would require lots of big cloud servers to handle that.

Thanks

Was it helpful?

Solution

Should I eliminate the need of hammering database every page load? Like what about having static pages? (though it will need us to export the products info into html which is fine).

You don't need to convert products to html, or any third part code to do this. Asp.NET have built-in support for output cache.

Webforms:

<%@ OutputCache Duration="60" VaryByParam="none" %>

MVC:

[OutputCache(Duration=60, VaryByParam="none")]
public ActionResult Index()
{
    return View();
}

Where Duration is the duration for which the page will be cached in seconds, and VaryByParam is the url parameter that acts as key for that page . It will be cache the page for each different parameter provided, so you'd normally leave none for the index and ProductId for specific product page)

But you'll have to investigate further, as this may not be the only reason of your site's slowdown.

OTHER TIPS

What do your queries look like? You say business logic is in stored procs but are you using dynamic sql, cursors or full-text indexing in those procs? All are possible causes of high CPU.

Lucene.NET could only help if you're using sql full-text indexes in which case it has been shown to be more efficient. But only in the case where searching is your bottleneck.

Caching can help popular pages and reduce the load on your database as mentioned by @Andre but watch your cache hit/miss ratio and which pages you're caching. For example, you'll get a lot of bang for your buck on the Categories and Products pages, but you'll end up using more memory for less benefit (if any) on your user-specific shopping cart pages.

If you're displaying real-time ticket availability on those popular pages that could be really hurting you a lot if you're hitting the database to get that number. Try increasing the latency on those updates and do your validation later in the process if that's the case.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top