Question

I have a number of 24-core Windows Server 2008 R2 machines that run the same process, but each instance of the process has a different data set. Usually 2-4 instances of the process run on each server. The processes are compiled for x64, have a GUI, and use Workstation GC.

Every second, the process writes its GC counts to a log file on local disk. The log is used for many other things as well. Once in a while, one of these processes pauses execution for 5 or more seconds, and nothing is written to the log during that time. Every time this happens, it coincides with the number of Gen2 GCs increasing by 1.

This is a rare event: it happens maybe once every 10,000 Gen2 GCs across all processes.
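
Because the event is so rare, it can help to scan the logs automatically for gaps instead of spotting them by eye. The following is a minimal sketch, not the poster's tooling: the log line format, the `find_pauses` helper and the sample data are all assumptions made up for illustration. It reports every gap of at least 5 seconds between consecutive GC-count lines, together with how much the Gen2 count changed across the gap.

```python
from datetime import datetime
import re

# Hypothetical log line format (an assumption, not the real log layout):
#   2012-03-05 09:15:01 gen0=5120 gen1=812 gen2=97
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) gen0=(\d+) gen1=(\d+) gen2=(\d+)"
)

def find_pauses(lines, threshold_seconds=5.0):
    """Return (start, end, gap_seconds, gen2_delta) for each gap >= threshold."""
    pauses = []
    prev = None  # (timestamp, gen2 count) of the previous GC-count line
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # the log holds other output too; skip unrelated lines
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        gen2 = int(m.group(4))
        if prev is not None:
            gap = (ts - prev[0]).total_seconds()
            if gap >= threshold_seconds:
                pauses.append((prev[0], ts, gap, gen2 - prev[1]))
        prev = (ts, gen2)
    return pauses

# Made-up sample data: a 7 second gap during which the Gen2 count goes up by 1.
sample = [
    "2012-03-05 09:15:01 gen0=5120 gen1=812 gen2=97",
    "2012-03-05 09:15:02 gen0=5121 gen1=812 gen2=97",
    "some unrelated log line",
    "2012-03-05 09:15:09 gen0=5121 gen1=812 gen2=98",
    "2012-03-05 09:15:10 gen0=5122 gen1=812 gen2=98",
]
for start, end, gap, gen2_delta in find_pauses(sample):
    print(f"{start} -> {end}: {gap:.0f}s gap, Gen2 count +{gen2_delta}")
```

A gap with a Gen2 delta of exactly 1 matches the pattern described above; a gap with a delta of 0 would point to a cause other than a Gen2 GC.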

Each machine has more than enough RAM to keep all processes in RAM.

This morning I had a 9 second pause in one of the processes and this time I captured Performance counters for the affected process and the entire machine. None of the other processes running at the time were affected. Analysis of the Performance Counters shows the following:

Comparing after the pause with before the pause:

  • Virtual Bytes, Page File Bytes, Working Set and Working Set - Private for the Process all dropped by approximately the same amount, 1 GB. To give you an idea of the size of the process, Private Bytes dropped from 3.1 GB to 2.1 GB.
  • Handle Count for the Process decreased from 8835 to 8705
  • Available Bytes for the entire machine increased by approximately 1 GB
  • Page Faults/sec did not spike
  • CPU usage was stable during the pause

Can anyone confirm that this activity can be attributed to swapping? Given that the machines have more than enough RAM, are there any suggestions for fixing these pauses?

Update #1 (3/5/2012):

Experienced a 6.5 second pause in one of the processes today. The .NET CLR Memory performance counters show that the size of the LOH did not change, but the Gen 2 heap size, the size of all heaps, and Total committed bytes each dropped by 700 MB. Total reserved bytes dropped by 250 MB. So it seems that a lot of garbage in Gen2 was reclaimed on this particular GC.

Update #2 (3/6/2012):

Experienced a 7 second pause in one of the processes today. The following dropped:

  • Gen 2 Heap Size (.NET CLR Memory) by 900 MB
  • Num Bytes in all Heaps (.NET CLR Memory) by 900 MB
  • Num Total Committed Bytes (.NET CLR Memory) by 800 MB
  • Num Total Reserved Bytes (.NET CLR Memory) by 540 MB
  • Virtual Bytes (Process) by 550 MB
  • Working Set (Process) by 800 MB
  • Working Set - Private (Process)
  • Page File Bytes (Process) by 800 MB
  • Private Bytes (Process) by 800 MB

The LOH size stayed the same.

Solution 2

It appears that a bona fide Gen2 GC takes a couple of seconds on a process that is several gigabytes in size.

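So why do some Gen2 GCs take 5 seconds and others take almost no time? Because I have Concurrent/Background GC enabled, and it appears that the Gen2 GC counter is also incremented whenever a concurrent GC completes. Most of the counted Gen2 GCs are therefore concurrent ones that barely pause the process, and only the occasional blocking one takes seconds. I think this is misleading.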

With Concurrent GC disabled, the Gen2 GC count drops substantially and every Gen2 GC takes a few seconds.
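
For reference, concurrent GC is normally switched off in the application configuration file through the `gcConcurrent` runtime setting. A minimal sketch, assuming the executable is called `MyApp.exe` (a hypothetical name), so the file would be `MyApp.exe.config` next to it:

```xml
<configuration>
  <runtime>
    <!-- Turns off concurrent (background) garbage collection for this process -->
    <gcConcurrent enabled="false"/>
  </runtime>
</configuration>
```

Going by the observation above, this does not remove the multi-second pauses. It only makes every counted Gen2 GC a blocking one, so the counter becomes a more honest indicator of pause events.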

OTHER TIPS

It looks like your application behaves in such a way that many segments in the Large Object Heap can become "dead" within the same Gen2 GC (see MSDN). When a segment in the LOH is dead after a Gen2 GC, it is returned to the OS, which can be expensive when many segments are returned at once.

Your application might fall outside the envelope for which the CLR GC modes are tuned. If your application allocates large objects such as big arrays repeatedly, you might see if you get more predictable GC behavior by pooling and re-using them yourself, rather than relying on the GC.
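
The pooling idea can be sketched as follows. This is a language-neutral illustration written in Python, not .NET code, and `BufferPool`, `rent` and `give_back` are hypothetical names. The point is only the pattern: large buffers are handed back to a pool and reused, so the allocator (in .NET, the LOH) sees far fewer large allocations and releases.

```python
class BufferPool:
    """Keeps released buffers around and reuses them instead of allocating anew."""

    def __init__(self, max_per_size=8):
        self._free = {}                    # buffer size -> list of idle buffers
        self._max_per_size = max_per_size  # cap so the pool itself cannot grow unbounded

    def rent(self, size):
        idle = self._free.get(size)
        if idle:
            return idle.pop()              # reuse an idle buffer: no new large allocation
        return bytearray(size)             # nothing idle of this size: allocate once

    def give_back(self, buf):
        idle = self._free.setdefault(len(buf), [])
        if len(idle) < self._max_per_size:
            idle.append(buf)               # contents are stale; the next renter must overwrite


pool = BufferPool()
first = pool.rent(1_000_000)
pool.give_back(first)
second = pool.rent(1_000_000)
print(second is first)  # the same buffer object was handed out again
```

The trade-off is that pooled buffers stay alive permanently, so the process holds on to more memory in exchange for fewer large collections, and callers must never use a buffer after giving it back.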

Licensed under: CC-BY-SA with attribution