Critical Section change in behavior in Windows 2003 SP1

https://stackoverflow.com/questions/5802208

22-10-2019
|

Question

Today I came across this in MSDN:

"Starting with Windows Server 2003 with Service Pack 1 (SP1), threads waiting on a critical section do not acquire the critical section on a first-come, first-serve basis. This change increases performance significantly for most code. However, some applications depend on first-in, first-out (FIFO) ordering and may perform poorly or not at all on current versions of Windows (for example, applications that have been using critical sections as a rate-limiter). To ensure that your code continues to work correctly, you may need to add an additional level of synchronization. For example, suppose you have a producer thread and a consumer thread that are using a critical section object to synchronize their work. Create two event objects, one for each thread to use to signal that it is ready for the other thread to proceed. The consumer thread will wait for the producer to signal its event before entering the critical section, and the producer thread will wait for the consumer thread to signal its event before entering the critical section. After each thread leaves the critical section, it signals its event to release the other thread."

At first, I thought WTF?! - I had always assumed that threads would acquire a critical section in the order they attempted to acquire it. Although this seems like an oddly large change in behavior for a Service Pack, the service pack was for a Server Edition of Windows and Vista was under development at the time I believe.

Anyway, so it makes a little sense -- this way the next waiting thread the scheduler rotates to will be the one that gets the critical section next, at least I assume. That's the only thing that therefore makes sense, unless they decided to do a random selection for fun ;).

Still, this is an assumption I had made and am now evaluating my code to make sure no case of FIFO reliance is a problem.

Has anyone had any real-world problems with this? Although the ordering of threads acquiring the critical section is not GUARANTEED to be FIFO, is it not usually FIFO? If not usually FIFO (or close to FIFO), does anyone know how long a thread could wait for a heavily contested critical section? If it is a low priority thread, does this mean it could keep waiting nearly indefinitely if there is always a higher priority thread trying to get the critical section (even if the low priority thread was long ago the next in line if FIFO was adhered to)? Is there a safety catch to prevent this scenario, or is reliance on a secondary synchronization object mandated?

Of course, this really only matters on a really heavily contested critical section.

I dunno, maybe I am making too much of it... but something bothers me about this. Any insight is appreciated. Thanks ;)

Solution

In my experience critical sections have never been FIFO (maybe the doc team got their wires crossed saying it's new in 2003). And yes, it can lead to thread starvation, which we have seen a lot. If you need FIFO, you need a mutex.

Mutexes are kernel objects, and so to acquire them is more expensive than the ring 3 optimistic critical section. But FIFO is not one of those things you can (or should) necessarily dismiss out of hand as being not necessary, and it needn't be anything to do with "hierarchy" of threads (whatever that is - does that mean priority?). 1000 threads of equal priority hitting a single lock will easily cause starvation.

OTHER TIPS

This is the 1st time I hear this and when I think about it, it does not seem to be a problem.

If I understood it correctly ::

OldWay :

Thread A acquired the CritSec
Thread B waiting for the CritSec , tried to acquire it at time t
Thread C waiting for the CritSec , tried to acquire it at time t + dt

When Thread A releases the CritSec, OS ensures that Thread B acquires it.

NewWay :

Thread A acquired the CritSec
Thread B waiting for the CritSec , tried to acquire it at time t
Thread C waiting for the CritSec , tried to acquire it at time t + dt

When Thread A releases the CritSec, OS may choose any Thread to acquire it. So, it may be Thread B or C that will acquire it after A releases it.

I never assumed (and do not think anybody else assumed) the Threads waiting for a CritSec will acquire it in the order they wanted to acquire it.

Maybe this arises as a problem for some Profilers/Debuggers or some Performance Monitoring Mechanism that makes this assumption...

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow