Question

// SubFetch(x,y) = atomically x-=y and return x (__sync_sub_and_fetch)
// AddFetch(x,y) = atomically x+=y and return x (__sync_add_and_fetch)
// CompareWait(x, y) = futex(&x, FUTEX_WAIT, y) wait on x if x == y
// Wake(x, y) = futex(&x, FUTEX_WAKE, y) wake up y waiters

struct Lock
{
Lock() : x(1) {}

void lock()
{
    while (true)
    {
        if (SubFetch(x, 1) == 0)
            return;

        x = -1;

        CompareWait(x, -1);
    }
}

void unlock()
{
    if (AddFetch(x, 1) == 1)
        return;

    x = 1;

    Wake(x, 1);
}

private:
    int x;
};

Linux 3.0 provides a system call called futex, upon which many concurrency utilities are based including recent pthread_mutex implementations. Whenever you write code you should always consider whether using an existing implementation or writing it yourself is the better choice for your project.

Above is an implementation of a Lock (mutex, 1 permit counting semaphore) based upon futex and the semantics description in man futex(7)

It appears to contain a deadlock bug whereby after multiple threads are trying to lock and unlock it a few thousand times, the threads can get into a state where x == -1 and all the threads are stuck in CompareWait, however noone is holding the lock.

Can anyone see where the bug is?

Update: I'm a little surprised that futex(7)/semantics is so broken. I completely rewrote Lock as follows... is this correct now?

// CompareAssign(x,y,z) atomically: if (x == y) {x = z; ret true; } else ret false;

struct Lock
{
Lock() : x(0) {}

void lock()
{
    while (!CompareAssign(x, 0, 1))
        if (x == 2 || CompareAssign(x, 1, 2))
            CompareWait(x, 2);
}

void unlock()
{
    if (SubFetch(x, 1) == 0)
        return;

    x = 0;

    Wake(x, 1);
}

private:
int x;
};

The idea here is that x has the following three states:

0: unlocked
1: locked & no waiters
2: locked & waiters
Was it helpful?

Solution

The problem is that you explicitly -1 assign to x if the SubFetch fails to acquire the lock. This races with the unlock.

  1. Thread 1 acquires the lock. x==0.
  2. Thread 2 tries to acquire the lock. The SubFetch sets x to -1, and then thread 2 is suspended.
  3. Thread 1 releases the lock. The AddFetch sets x to 0, so the code then explicitly sets x to 1 and calls Wake.
  4. Thread 2 wakes up and sets x to -1, and then calls CompareWait.

Thread 2 is now stuck waiting, with x set to -1, but there is no one around to wake it, as thread 1 has already released the lock.

OTHER TIPS

The proper implementation of a futex-based Mutex is described in Ulrich Drepper's paper "Futexes are tricky"

http://people.redhat.com/drepper/futex.pdf

It includes not only the code but also a very detailed explanation of why it is correct. The code from the paper:

class mutex
{
 public:
 mutex () : val (0) { }
 void lock () {
   int c;
   if ((c = cmpxchg (val, 0, 1)) != 0)
     do {
       if (c == 2 || cmpxchg (val, 1, 2) != 0)
         futex_wait (&val, 2);
     } while ((c = cmpxchg (val, 0, 2)) != 0);
 }
 void unlock () {
//NOTE: atomic_dec returns the value BEFORE the operation, unlike your SubFetch !
   if (atomic_dec (val) != 1) {
     val = 0;
     futex_wake (&val, 1);
   }
 }
 private:
   int val;
};

Comparing the code in the paper with your code, I spot a difference

You have

if (x == 2 || CompareAssign(x, 1, 2))

using the futex's value directly whereas Drepper uses the return value from the previous CompareAssign(). That difference will probably affect performance only.

Your unlock code is different, too, but seems to be semantically equivalent.

In any case I would strongly advise you to follow Drepper's code to the letter. That paper has stood the test of time and received a lot of peer review. You gain nothing from rolling your own.

How about this scenario with three threads, A, B , and C.

The initial state of this scenario has:

  • thread A holding the lock
  • thread B not contending for the lock just yet
  • thread C in CompareWait()
  • x == -1 from when C failed to acquire the lock
        A                  B                C
    ==============   ================  ===============
     AddFetch()
     (so x == 0)
                       SubFetch()
                       (so x == -1)

     x = 1

                       x = -1

     Wake()

At this point whether B or C are unblocked, they will not get a result of 0 when they SubFetch().

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top