Win32 Read/Write Lock Using Only Critical Sections

https://stackoverflow.com/questions/1008726

06-07-2019
|

Question

I have to implement a read/write lock in C++ using the Win32 api as part of a project at work. All of the existing solutions use kernel objects (semaphores and mutexes) that require a context switch during execution. This is far too slow for my application.

I would like implement one using only critical sections, if possible. The lock does not have to be process safe, only threadsafe. Any ideas on how to go about this?

Solution

I don't think this can be done without using at least one kernel-level object (Mutex or Semaphore), because you need the help of the kernel to make the calling process block until the lock is available.

Critical sections do provide blocking, but the API is too limited. e.g. you cannot grab a CS, discover that a read lock is available but not a write lock, and wait for the other process to finish reading (because if the other process has the critical section it will block other readers which is wrong, and if it doesn't then your process will not block but spin, burning CPU cycles.)

However what you can do is use a spin lock and fall back to a mutex whenever there is contention. The critical section is itself implemented this way. I would take an existing critical section implementation and replace the PID field with separate reader & writer counts.

OTHER TIPS

If you can target Vista or greater, you should use the built-in SRWLock's. They are lightweight like critical sections, entirely user-mode when there is no contention.

Joe Duffy's blog has some recent entries on implementing different types of non-blocking reader/writer locks. These locks do spin, so they would not be appropriate if you intend to do a lot of work while holding the lock. The code is C#, but should be straightforward to port to native.

You can implement a reader/writer lock using critical sections and events - you just need to keep enough state to only signal the event when necessary to avoid an unnecessary kernel mode call.

Old question, but this is something that should work. It doesn't spin on contention. Readers incur limited extra cost if they have little or no contention, because SetEvent is called lazily (look at the edit history for a more heavyweight version that doesn't have this optimization).

#include <windows.h>

typedef struct _RW_LOCK {
    CRITICAL_SECTION countsLock;
    CRITICAL_SECTION writerLock;
    HANDLE noReaders;
    int readerCount;
    BOOL waitingWriter;
} RW_LOCK, *PRW_LOCK;

void rwlock_init(PRW_LOCK rwlock)
{
    InitializeCriticalSection(&rwlock->writerLock);
    InitializeCriticalSection(&rwlock->countsLock);

    /*
     * Could use a semaphore as well.  There can only be one waiter ever,
     * so I'm showing an auto-reset event here.
     */
    rwlock->noReaders = CreateEvent (NULL, FALSE, FALSE, NULL);
}

void rwlock_rdlock(PRW_LOCK rwlock)
{
    /*
     * We need to lock the writerLock too, otherwise a writer could
     * do the whole of rwlock_wrlock after the readerCount changed
     * from 0 to 1, but before the event was reset.
     */
    EnterCriticalSection(&rwlock->writerLock);
    EnterCriticalSection(&rwlock->countsLock);
    ++rwlock->readerCount;
    LeaveCriticalSection(&rwlock->countsLock);
    LeaveCriticalSection(&rwlock->writerLock);
}

int rwlock_wrlock(PRW_LOCK rwlock)
{
    EnterCriticalSection(&rwlock->writerLock);
    /*
     * readerCount cannot become non-zero within the writerLock CS,
     * but it can become zero...
     */
    if (rwlock->readerCount > 0) {
        EnterCriticalSection(&rwlock->countsLock);

        /* ... so test it again.  */
        if (rwlock->readerCount > 0) {
            rwlock->waitingWriter = TRUE;
            LeaveCriticalSection(&rwlock->countsLock);
            WaitForSingleObject(rwlock->noReaders, INFINITE);
        } else {
            /* How lucky, no need to wait.  */
            LeaveCriticalSection(&rwlock->countsLock);
        }
    }

    /* writerLock remains locked.  */
}

void rwlock_rdunlock(PRW_LOCK rwlock)
{
    EnterCriticalSection(&rwlock->countsLock);
    assert (rwlock->readerCount > 0);
    if (--rwlock->readerCount == 0) {
        if (rwlock->waitingWriter) {
            /*
             * Clear waitingWriter here to avoid taking countsLock
             * again in wrlock.
             */
            rwlock->waitingWriter = FALSE;
            SetEvent(rwlock->noReaders);
        }
    }
    LeaveCriticalSection(&rwlock->countsLock);
}

void rwlock_wrunlock(PRW_LOCK rwlock)
{
    LeaveCriticalSection(&rwlock->writerLock);
}

You could decrease the cost for readers by using a single CRITICAL_SECTION:

countsLock is replaced with writerLock in rdlock and rdunlock
rwlock->waitingWriter = FALSE is removed in wrunlock

wrlock's body is changed to

EnterCriticalSection(&rwlock->writerLock);
rwlock->waitingWriter = TRUE;
while (rwlock->readerCount > 0) {
    LeaveCriticalSection(&rwlock->writerLock);
    WaitForSingleObject(rwlock->noReaders, INFINITE);
    EnterCriticalSection(&rwlock->writerLock);
}
rwlock->waitingWriter = FALSE;

/* writerLock remains locked.  */

However this loses in fairness, so I prefer the above solution.

Take a look at the book "Concurrent Programming on Windows" which has lots of different reference examples for reader/writer locks.

Check out the spin_rw_mutex from Intel's Thread Building Blocks ...

spin_rw_mutex is strictly in user-land and employs spin-wait for blocking

This is an old question but perhaps someone will find this useful. We developed a high-performance, open-source RWLock for Windows that automatically uses Vista+ SRWLock Michael mentioned if available, or otherwise falls back to a userspace implementation.

As an added bonus, there are four different "flavors" of it (though you can stick to the basic, which is also the fastest), each providing more synchronization options. It starts with the basic RWLock() which is non-reentrant, limited to single-process synchronization, and no swapping of read/write locks to a full-fledged cross-process IPC RWLock with re-entrance support and read/write de-elevation.

As mentioned, they dynamically swap out to the Vista+ slim read-write locks for best performance when possible, but you don't have to worry about that at all as it'll fall back to a fully-compatible implementation on Windows XP and its ilk.

If you already know of a solution that only uses mutexes, you should be able to modify it to use critical sections instead.

We rolled our own using two critical sections and some counters. It suits our needs - we have a very low writer count, writers get precedence over readers, etc. I'm not at liberty to publish ours but can say that it is possible without mutexes and semaphores.

Here is the smallest solution that I could come up with:

http://www.baboonz.org/rwlock.php

And pasted verbatim:

/** A simple Reader/Writer Lock.

This RWL has no events - we rely solely on spinlocks and sleep() to yield control to other threads.
I don't know what the exact penalty is for using sleep vs events, but at least when there is no contention, we are basically
as fast as a critical section. This code is written for Windows, but it should be trivial to find the appropriate
equivalents on another OS.

**/
class TinyReaderWriterLock
{
public:
    volatile uint32 Main;
    static const uint32 WriteDesireBit = 0x80000000;

    void Noop( uint32 tick )
    {
        if ( ((tick + 1) & 0xfff) == 0 )     // Sleep after 4k cycles. Crude, but usually better than spinning indefinitely.
            Sleep(0);
    }

    TinyReaderWriterLock()                 { Main = 0; }
    ~TinyReaderWriterLock()                { ASSERT( Main == 0 ); }

    void EnterRead()
    {
        for ( uint32 tick = 0 ;; tick++ )
        {
            uint32 oldVal = Main;
            if ( (oldVal & WriteDesireBit) == 0 )
            {
                if ( InterlockedCompareExchange( (LONG*) &Main, oldVal + 1, oldVal ) == oldVal )
                    break;
            }
            Noop(tick);
        }
    }

    void EnterWrite()
    {
        for ( uint32 tick = 0 ;; tick++ )
        {
            if ( (tick & 0xfff) == 0 )                                     // Set the write-desire bit every 4k cycles (including cycle 0).
                _InterlockedOr( (LONG*) &Main, WriteDesireBit );

            uint32 oldVal = Main;
            if ( oldVal == WriteDesireBit )
            {
                if ( InterlockedCompareExchange( (LONG*) &Main, -1, WriteDesireBit ) == WriteDesireBit )
                    break;
            }
            Noop(tick);
        }
    }

    void LeaveRead()
    {
        ASSERT( Main != -1 );
        InterlockedDecrement( (LONG*) &Main );
    }
    void LeaveWrite()
    {
        ASSERT( Main == -1 );
        InterlockedIncrement( (LONG*) &Main );
    }
};

Look my implementation here:

https://github.com/coolsoftware/LockLib

VRWLock is a C++ class that implements single writer - multiple readers logic.

Look also test project TestLock.sln.

UPD. Below is the simple code for reader and writer:

LONG gCounter = 0;

// reader

for (;;) //loop
{
  LONG n = InterlockedIncrement(&gCounter); 
  // n = value of gCounter after increment
  if (n <= MAX_READERS) break; // writer does not write anything - we can read
  InterlockedDecrement(&gCounter);
}
// read data here
InterlockedDecrement(&gCounter); // release reader

// writer

for (;;) //loop
{
  LONG n = InterlockedCompareExchange(&gCounter, (MAX_READERS+1), 0); 
  // n = value of gCounter before attempt to replace it by MAX_READERS+1 in InterlockedCompareExchange
  // if gCounter was 0 - no readers/writers and in gCounter will be MAX_READERS+1
  // if gCounter was not 0 - gCounter stays unchanged
  if (n == 0) break;
}
// write data here
InterlockedExchangeAdd(&gCounter, -(MAX_READERS+1)); // release writer

VRWLock class supports spin count and thread-specific reference count that allows to release locks of terminated threads.

I wrote the following code using only critical sections.

class ReadWriteLock {
    volatile LONG writelockcount;
    volatile LONG readlockcount;
    CRITICAL_SECTION cs;
public:
    ReadWriteLock() {
        InitializeCriticalSection(&cs);
        writelockcount = 0;
        readlockcount = 0;
    }
    ~ReadWriteLock() {
        DeleteCriticalSection(&cs);
    }
    void AcquireReaderLock() {        
    retry:
        while (writelockcount) {
            Sleep(0);
        }
        EnterCriticalSection(&cs);
        if (!writelockcount) {
            readlockcount++;
        }
        else {
            LeaveCriticalSection(&cs);
            goto retry;
        }
        LeaveCriticalSection(&cs);
    }
    void ReleaseReaderLock() {
        EnterCriticalSection(&cs);
        readlockcount--;
        LeaveCriticalSection(&cs);
    }
    void AcquireWriterLock() {
        retry:
        while (writelockcount||readlockcount) {
            Sleep(0);
        }
        EnterCriticalSection(&cs);
        if (!writelockcount&&!readlockcount) {
            writelockcount++;
        }
        else {
            LeaveCriticalSection(&cs);
            goto retry;
        }
        LeaveCriticalSection(&cs);
    }
    void ReleaseWriterLock() {
        EnterCriticalSection(&cs);
        writelockcount--;
        LeaveCriticalSection(&cs);
    }
};

To perform a spin-wait, comment the lines with Sleep(0).

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow