Question

It seems that using Critical Sections quite a bit in Vista/Windows Server 2008 leads to the OS not fully regaining the memory. We found this problem with a Delphi application and it is clearly because of using the CS API. (see this SO question)

Has anyone else seen it with applications developed with other languages (C++, ...)?

The sample code was just initialzing 10000000 CS, then deleting them. This works fine in XP/Win2003 but does not release all the peak memory in Vista/Win2008 until the application has ended.
The more you use CS, the more your application retains memory for nothing.

Was it helpful?

Solution

Microsoft have indeed changed the way InitializeCriticalSection works on Vista, Windows Server 2008, and probably also Windows 7.
They added a "feature" to retain some memory used for Debug information when you allocate a bunch of CS. The more you allocate, the more memory is retained. It might be asymptotic and eventually flatten out (not fully bought to this one).
To avoid this "feature", you have to use the new API InitalizeCriticalSectionEx and pass the flag CRITICAL_SECTION_NO_DEBUG_INFO.
The advantage of this is that it might be faster as, very often, only the spincount will be used without having to actually wait.
The disadvantages are that your old applications can be incompatible, you need to change your code and it is now platform dependent (you have to check for the version to determine which one to use). And also you lose the ability to debug if you need.

Test kit to freeze a Windows Server 2008:
- build this C++ example as CSTest.exe

#include "stdafx.h" 
#include "windows.h" 
#include <iostream> 

using namespace std; 

void TestCriticalSections() 
{ 
  const unsigned int CS_MAX = 5000000; 
  CRITICAL_SECTION* csArray = new CRITICAL_SECTION[CS_MAX];  

  for (unsigned int i = 0; i < CS_MAX; ++i)  
    InitializeCriticalSection(&csArray[i]);  

  for (unsigned int i = 0; i < CS_MAX; ++i)  
    EnterCriticalSection(&csArray[i]);  

  for (unsigned int i = 0; i < CS_MAX; ++i)  
    LeaveCriticalSection(&csArray[i]);  

  for (unsigned int i = 0; i < CS_MAX; ++i)  
    DeleteCriticalSection(&csArray[i]); 

  delete [] csArray; 
} 

int _tmain(int argc, _TCHAR* argv[]) 
{ 
  TestCriticalSections(); 

  cout << "just hanging around..."; 
  cin.get(); 

  return 0; 
}

-...Run this batch file (needs the sleep.exe from server SDK)

@rem you may adapt the sleep delay depending on speed and # of CPUs 
@rem sleep 2 on a duo-core 4GB. sleep 1 on a 4CPU 8GB. 

@for /L %%i in (1,1,300) do @echo %%i & @start /min CSTest.exe & @sleep 1 
@echo still alive? 
@pause 
@taskkill /im cstest.* /f

-...and see a Win2008 server with 8GB and quad CPU core freezing before reaching the 300 instances launched.
-...repeat on a Windows 2003 server and see it handle it like a charm.

OTHER TIPS

Your test is most probably not representative of the problem. Critical sections are considered "lightweight mutexes" because a real kernel mutex is not created when you initialize the critical section. This means your 10M critical sections are just structs with a few simple members. However, when two threads access a CS at the same time, in order to synchronize them a mutex is indeed created - and that's a different story.

I assume in your real app threads do collide, as opposed to your test app. Now, if you're really treating critical sections as lightweight mutexes and create a lot of them, your app might be allocating a large number of real kernel mutexes, which are way heavier than the light critical section object. And since mutexes are kernel object, creating a excessive number of them can really hurt the OS.

If this is indeed the case, you should reduce the usage of critical sections where you expect a lot of collisions. This has nothing to do with the Windows version, so my guess might be wrong, but it's still something to consider. Try monitoring the OS handles count, and see how your app is doing.

You're seeing something else.

I just built & ran this test code. Every memory usage stat is constant - private bytes, working set, commit, and so on.

int _tmain(int argc, _TCHAR* argv[])
{
    while (true)
    {
        CRITICAL_SECTION* cs = new CRITICAL_SECTION[1000000];
        for (int i = 0; i < 1000000; i++) InitializeCriticalSection(&cs[i]);
        for (int i = 0; i < 1000000; i++) DeleteCriticalSection(&cs[i]);
        delete [] cs;
    }

    return 0;
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top