Question

Last night, one of the websites (.NET 4.0 Web Forms) hosted on my Windows 2008 R2 (IIS 7.5) server started to time out, throwing the following error for all connected users.

TYPE     System.Web.HttpException
MESSAGE  Request timed out.
DETAIL   System.Web.HttpException (0x80004005): Request timed out.

The outage was confined to just one website within IIS, the others continued to work fine.

Unfortunately I was unable to identify why the website was timing out. Here are the steps I took:

  • The first thing I did was look at Task Manager, which showed normal CPU and memory usage. Network activity was also moderate.

  • I then opened IIS to look at the live connections under 'Worker Processes'. There were about 60 live connections, so it didn't look like anything DDoS related.

  • Checked database connectivity (hosted on a separate server), all fine!

  • I then restarted the website in IIS. That didn't work.

  • I then tried a complete iisreset... still no luck :(

  • In the end (and under some duress) the only thing I could think to do to resolve this was to restart the server.

Restarting the server worked, but I am nervous not knowing why this happened in the first place. Can anyone recommend any checks that I failed to carry out? Is there an official checklist for working through these sorts of IIS problems? I have reviewed the IIS logs but don't see anything unusual in the run-up to the outage.

Any pointers or links to useful resources to help me understand and mitigate against this in future will be much appreciated.

EDIT

The only time I logged into the server that day was to add an additional web handler component (for remote deploy) to IIS Web Deploy. I'm doubtful this caused the outage, as the server worked fine for six hours afterwards.


Solution

Because iisreset didn't help and you had to restart the whole machine, I would suspect a global resource shortage, with the most heavily used (or most resource-hungry) website being the one affected. It could be a lack of available RAM, or network connection congestion caused by misbehaving calls (for example, a lot of CLOSE_WAIT sockets exhausting the connection pool; we've seen that in production when an external service malfunctioned). It could also have been a problem with one specific client, which got disconnected by the machine restart, so the problem eventually disappeared.

I would start with:

Historical analysis

  • review the Event Viewer for any errors/warnings from that period of time (see the PowerShell sketch after this list),
  • although you have already looked at the IIS logs, I would do it once again with the help of Log Parser Lizard to build some statistics such as the number of requests per client, network bandwidth per client, average response time per client and so on (an example query follows this list).
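
A minimal sketch for the Event Viewer check, assuming a six-hour window before the outage and the standard System/Application logs (adjust both to match your incident):

    # Pull Critical, Error and Warning events from the System and Application logs
    # for the assumed window around the outage.
    $filter = @{
        LogName   = 'System', 'Application'
        Level     = 1, 2, 3                     # Critical, Error, Warning
        StartTime = (Get-Date).AddHours(-6)     # assumed outage window
    }

    Get-WinEvent -FilterHashtable $filter -ErrorAction SilentlyContinue |
        Sort-Object TimeCreated |
        Select-Object TimeCreated, ProviderName, Id, LevelDisplayName, Message |
        Format-Table -AutoSize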
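
Log Parser Lizard is a GUI over the Log Parser 2.2 engine, so the same statistics can also be produced from the command line. This is only a sketch: the install path, log folder and site ID (W3SVC1) are assumptions you will need to adjust:

    # Requests, bytes sent and average response time (time-taken, in ms) per client IP,
    # taken from the site's W3C logs.
    $query = "SELECT c-ip, COUNT(*) AS Requests, SUM(sc-bytes) AS BytesSent, " +
             "AVG(time-taken) AS AvgTimeMs " +
             "FROM C:\inetpub\logs\LogFiles\W3SVC1\u_ex*.log " +
             "GROUP BY c-ip ORDER BY Requests DESC"

    & 'C:\Program Files (x86)\Log Parser 2.2\LogParser.exe' $query -i:W3C -o:CSV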

Monitoring

  • continuously monitor Performance Counters (see the Get-Counter sketch after this list):
    • \Processor(_Total)\% Processor Time,
    • \.NET CLR Exceptions(_Global_)\# of Exceps Thrown / sec,
    • \Memory\Available MBytes,
    • \Web Service(Default Web Site)\Current Connections (one instance per site name),
    • \ASP.NET v4.0.30319\Request Wait Time,
    • \ASP.NET v4.0.30319\Requests Current,
    • \ASP.NET v4.0.30319\Requests Queued,
    • \Process(XXX)\Working Set,
    • \Process(XXX)\% Processor Time (XXX = each w3wp process),
    • \Network Interface(XXX)\Bytes Total/sec
  • run the Performance Analysis of Logs (PAL) tool against counter logs covering the time of failure to get a very detailed analysis of the performance counter data,
  • run netstat -ano to analyze the open network connections (or, even better, the TCPView tool); a sketch that summarises the netstat output, including CLOSE_WAIT counts, follows this list.
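
A sketch for the continuous counter collection with Get-Counter; the instance names and output path are assumptions, and the resulting .blg file can be fed straight into PAL:

    # Sample the key counters every 15 seconds and write them to a .blg log
    # that PAL (or Performance Monitor) can analyse later. Stop with Ctrl+C.
    $counters = @(
        '\Processor(_Total)\% Processor Time',
        '\Memory\Available MBytes',
        '\.NET CLR Exceptions(_Global_)\# of Exceps Thrown / sec',
        '\ASP.NET v4.0.30319\Request Wait Time',
        '\ASP.NET v4.0.30319\Requests Current',
        '\ASP.NET v4.0.30319\Requests Queued',
        '\Web Service(Default Web Site)\Current Connections',   # one entry per site
        '\Process(w3wp*)\% Processor Time',                     # all IIS worker processes
        '\Process(w3wp*)\Working Set',
        '\Network Interface(*)\Bytes Total/sec'
    )

    Get-Counter -Counter $counters -SampleInterval 15 -Continuous |
        Export-Counter -Path 'C:\PerfLogs\website-counters.blg' -FileFormat BLG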
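
And a sketch for the connection analysis that parses netstat -ano (Get-NetTCPConnection is not available on Server 2008 R2, hence the text parsing); a large, growing CLOSE_WAIT count toward one remote endpoint is the kind of pattern described above:

    # Turn each TCP line of the netstat output into an object we can group and sort.
    $connections = netstat -ano | Select-String '^\s*TCP' | ForEach-Object {
        $fields = -split $_.Line
        New-Object PSObject -Property @{
            LocalAddress  = $fields[1]
            RemoteAddress = $fields[2]
            State         = $fields[3]
            OwningPid     = $fields[4]
        }
    }

    # Overall picture: how many connections are in each state?
    $connections | Group-Object State | Sort-Object Count -Descending |
        Format-Table Count, Name -AutoSize

    # Which remote endpoints have the most CLOSE_WAIT sockets?
    $connections | Where-Object { $_.State -eq 'CLOSE_WAIT' } |
        Group-Object RemoteAddress | Sort-Object Count -Descending |
        Select-Object -First 10 Count, Name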

If none of this leads you to a conclusion, create a Debug Diagnostic rule that captures a memory dump of the worker process for long-running requests, and analyze the dump with WinDbg and the PSSCor extension for .NET debugging.
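
DebugDiag rules are configured through its UI, so there is nothing to script there, but as a quicker alternative while the hang is actually happening you can grab dumps by hand with Sysinternals ProcDump. This is only a sketch; the tool and dump paths are assumptions:

    # Capture a full memory dump of every IIS worker process while the site is hanging.
    $procdump = 'C:\Tools\Sysinternals\procdump.exe'   # assumed ProcDump location
    $dumpDir  = 'C:\Dumps'
    New-Item -ItemType Directory -Path $dumpDir -Force | Out-Null

    Get-Process -Name w3wp -ErrorAction SilentlyContinue | ForEach-Object {
        $dumpFile = Join-Path $dumpDir ("w3wp_{0}_{1:yyyyMMdd_HHmmss}.dmp" -f $_.Id, (Get-Date))
        & $procdump -accepteula -ma $_.Id $dumpFile
    }

In WinDbg, loading SOS (.loadby sos clr) or PSSCor4 and running commands such as !threadpool, !runaway and !clrstack on the busiest threads can show what the long-running requests are blocked on.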

Licensed under: CC-BY-SA with attribution