Question

Below is a simplified C program that demonstrates a problem I am having with a concurrent stack implemented using the GNU built-in compare and swap on an Intel CPU. It took me a while to understand what was happening, but now that I do, I see that the behavior is well within the guarantees provided by atomic compare and swap.

When a node is popped from the stack, modified, then placed back on the stack, the modified value may become the new head of the stack, corrupting it. The comments in test_get describe the order of events that cause this.

Is there any way to reliably use the same node with the same stack more than once? This is an exaggerated example but even returning an unmodified node to gHead could leak other nodes. The original point of this data structure was to repeatedly use the same nodes.

#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct test_node {
    struct test_node *next;
    void *data;
} *test_node_p;

test_node_p gHead = NULL;
unsigned gThreadsDone = 0;

void test_put( test_node_p inMemory ) {
    test_node_p scratch;

    do {
        scratch = gHead;
        inMemory->next = scratch;
    } while ( !__sync_bool_compare_and_swap( &gHead , scratch , inMemory ) );
}

test_node_p test_get( void ) {
    test_node_p result;
    test_node_p scratch;

    do {
        result = gHead;
        if ( NULL == result ) break;
        //  other thread acquires result, modifies next
        scratch = result->next;     //  scratch is 0xDEFACED...
        //  other thread returns result to gHead
    } while ( !__sync_bool_compare_and_swap( &gHead , result , scratch ) );
    //  this thread corrupts gHead with 0xDEFACED... value

    if ( NULL == result ) {
        result = (test_node_p)malloc( sizeof(struct test_node) );
    }

    return result;
}

void *memory_entry( void *in ) {
    test_node_p node;
    int index , count = 1000;
    for ( index = 0 ; index < count ; ++index ) {
        node = test_get();
        *(uint64_t *)(node) |= 0xDEFACED000000000ULL;
        test_put( node );
    }

    __sync_add_and_fetch(&gThreadsDone,1);

    return NULL;
}

int main( void ) {
    unsigned    index , count = 8;
    pthread_t   thread;

    gThreadsDone = 0;

    for ( index = 0 ; index < count ; ++index ) {
        pthread_create( &thread , NULL , memory_entry , NULL );
        pthread_detach( thread );
    }

    while ( __sync_add_and_fetch(&gThreadsDone,0) < count ) {}
}

I am running this test with 8 logical cores. When I use only 4 threads the problem rarely occurs but with 8 it is easily reproducible. I have a MacBook with an Intel Core i7.

I am not interested in using some library or framework that has solved this; I want to know how it was solved. Thank you.

Edit:

Here are two solutions that do not work in my case.

Some architectures provide LL/SC (load-linked/store-conditional) instruction pairs that perform the atomic test on the address instead of the value: any intervening write to the address, even of the same value, causes the store to fail.

Some architectures provide a compare and swap wider than a pointer. With this, a monotonic counter is paired with the pointer and atomically incremented every time the pointer is used, so the value always changes. Some Intel chips support this (e.g. cmpxchg16b), but there is no GNU wrapper.
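To illustrate the counter idea, here is a minimal sketch of the push side, assuming a toolchain where GCC's generic __atomic built-ins are available (they accept a 16-byte struct on x86-64, possibly by calling into libatomic or requiring -mcx16). The tagged_head_t type, gTaggedHead variable, and tagged_put function are illustrative names only, not part of the code above:

/* Illustrative only: pair the head pointer with a counter that is
   bumped on every push. A stale (pointer, counter) snapshot can then
   never match again, even if the same node address reappears. */
typedef struct tagged_head {
    struct test_node *node;
    uintptr_t         count;
} tagged_head_t __attribute__(( aligned( 16 ) ));

tagged_head_t gTaggedHead = { NULL, 0 };

void tagged_put( struct test_node *inMemory ) {
    tagged_head_t expected, desired;
    do {
        /* atomically snapshot the whole (pointer, counter) pair */
        __atomic_load(&gTaggedHead, &expected, __ATOMIC_SEQ_CST);
        inMemory->next = expected.node;
        desired.node   = inMemory;
        desired.count  = expected.count + 1;   /* the value always changes */
    } while (!__atomic_compare_exchange(&gTaggedHead, &expected, &desired,
                                        0, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST));
}

The pop side would be symmetric: snapshot the pair, then compare and swap in { expected.node->next, expected.count + 1 }.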

Here is a play-by-play of the problem. Two threads, A and B, reach the point in test_get where they have the same value for result and it is not NULL. Then the following sequence occurs:

  1. Thread A passes the compare and swap and returns result from test_get
  2. Thread A modifies the content of result
  3. Thread B dereferences result, getting whatever thread A put there
  4. Thread A finishes with result and calls test_put
  5. Thread A passes the compare and swap in test_put putting result back on gHead
  6. Thread B reaches the compare and swap in test_get and passes
  7. Thread B has now corrupted gHead with the value from Thread A

Here is a similar scenario with three threads that does not require Thread A to modify result.

  1. Thread A passes the compare and swap and returns result from test_get
  2. Thread A does not modify the content of result
  3. Thread B dereferences result, getting a valid value in scratch
  4. Thread C calls test_put with an unrelated value and succeeds
  5. Thread A calls test_put and succeeds in putting result back on gHead
  6. Thread B reaches the compare and swap in test_get and passes
  7. Thread B has now corrupted gHead by leaking whatever thread C added

In either scenario, the problem is that thread A passes the compare and swap twice, once for get then again for put, before thread B reaches the compare and swap for get. Any value thread B has for scratch should be discarded, but is not because the value in gHead appears correct.

Any solution that makes this less likely without actually preventing it just makes the bug harder to track down.

Note that the scratch variable is just a name for wherever the dereferenced value of result is placed before the atomic instruction begins. Removing the name does not remove the window between the dereference and the compare that may be interrupted.

Note also that atomic means it succeeds or fails as a whole. Any aligned read of a pointer is implicitly atomic on the hardware in question. As far as other threads are concerned, there is no interruptible point where only half the pointer has been read.

Solution 2

(I am discarding my previous answer.)

The issue is that you do not have a mechanism to atomically read gHead and gHead->next, but one is needed to achieve your lock-free stack. Since you intend to busy loop anyway to deal with compare and swap collisions, you can use the equivalent of a spin lock:

#include <assert.h>

static volatile int gGetLock = 0;

void lock_get () {
    /* spin until we take the lock by atomically changing 0 to 1 */
    while (!__sync_bool_compare_and_swap(&gGetLock, 0, 1)) {}
}

void unlock_get () {
    /* release the lock; the swap from 1 to 0 must always succeed */
    int success = __sync_bool_compare_and_swap(&gGetLock, 1, 0);
    assert(success);
}

Now, the loop in test_get() can be surrounded by lock_get() and unlock_get(). Only one thread at a time can then pop, so the CAS loop of test_get() is just one thread contending with test_put(). Jens' formulation of the CAS loop (repeated from OTHER TIPS below) seems cleaner, though:

lock_get();
/* read the head atomically */
result = __sync_val_compare_and_swap(&gHead, 0, 0);
/* compare and swap until we succeed in swinging gHead to gHead->next */
while ((oldval = result)) {
    result = __sync_val_compare_and_swap(&gHead, result, result->next);
    if (oldval == result) break;
}
unlock_get();

This implements the intention, which is that only one thread should be popping off the head.
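Putting the pieces together, test_get() might look roughly like this. This is a sketch only; it keeps the malloc fallback from the question and assumes the lock_get()/unlock_get() helpers above:

test_node_p test_get( void ) {
    test_node_p result, oldval;

    lock_get();
    /* read the head atomically */
    result = __sync_val_compare_and_swap(&gHead, 0, 0);
    /* only the lock holder can pop, so result->next cannot go stale;
       the loop contends only with concurrent test_put() calls */
    while ((oldval = result)) {
        result = __sync_val_compare_and_swap(&gHead, result, result->next);
        if (oldval == result) break;
    }
    unlock_get();

    if (NULL == result) {
        result = (test_node_p)malloc(sizeof(struct test_node));
    }

    return result;
}

The key property is that a node can no longer be popped, modified, and pushed back while another popper is between its read of result->next and its compare and swap, because that other popper would be holding the lock.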

OTHER TIPS

Never access an atomic variable through a simple evaluation. Also, for a compare and swap loop like yours, __sync_val_compare_and_swap is more convenient, I think.

/* read the head atomically */
result = __sync_val_compare_and_swap(&gHead, 0, 0);
/* compare and swap until we succeed placing the next field */
while ((oldval = result)) {
    result = __sync_val_compare_and_swap(&gHead, result, result->next);
    if (oldval == result) break;
}

If you have a CAS variable (in your case gHead), you have to always use CAS to access it, or protect it with a lock, for both reading and writing. Stuff like "result = gHead;" is a big no-no.

Re-reading your question: a LIFO is a stack. Implementing any CAS-based data structure relies on having only ONE thing to change; in a stack, that is the top-of-stack pointer. You seem to be doing a linked list. I am sure there are cool ways to do an atomic linked list.

But for a stack, do one stack pointer, like everybody else :)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow