Вопрос

I wrote a Java lock free queue implementation. It has a concurrency bug. I cannot find it. This code is not important. I just worry about I cannot explain observed behavior related with volatile variables.

The bug is visible by the exception ("null head"). This is impossible state because there is atomic integer holding current queue size. The queue has a stub element. It provides that a reader thread doesn't change tail pointer and a writer thread doesn't change head pointer.

Queue length variable guarantee that the linked list will never be empty. It's as a semaphore.

The take method behaves like it gets stolen length value.

class Node<T> {
    final AtomicReference<Node<T>> next = new AtomicReference<Node<T>>();
    final T ref;
    Node(T ref) {
        this.ref = ref;
    }
}
public class LockFreeQueue<T> {
    private final AtomicInteger length = new AtomicInteger(1);
    private final Node stub = new Node(null);
    private final AtomicReference<Node<T>> head = new AtomicReference<Node<T>>(stub);
    private final AtomicReference<Node<T>> tail = new AtomicReference<Node<T>>(stub);

    public void add(T x) {
        addNode(new Node<T>(x));
        length.incrementAndGet();
    }

    public T takeOrNull() {
        while (true) {
            int l = length.get();
            if (l == 1) {
                return null;
            }
            if (length.compareAndSet(l, l - 1)) {
                break;
            }
        }
        while (true) {
            Node<T> r = head.get();
            if (r == null) {
                throw new IllegalStateException("null head");
            }
            if (head.compareAndSet(r, r.next.get())) {
                if (r == stub) {
                    stub.next.set(null);
                    addNode(stub);
                } else {
                    return r.ref;
                }
            }
        }
    }

    private void addNode(Node<T> n) {
        Node<T> t;
        while (true) {
            t = tail.get();
            if (tail.compareAndSet(t, n)) {
                break;    
            }
        }
        if (t.next.compareAndSet(null, n)) {
            return;
        }
        throw new IllegalStateException("bad tail next");
    }
}
Это было полезно?

Решение

I think there is an error in how you use the counter in takeOrNull(), when you remove the stub, you decrease Length by 1, but don't re-increase it when adding the stub back at the end, since you use addNode() instead of add(). Let's say you added an element successfully, so your queue looks like this:

Length is 2
STUB -> FIRST_NODE -> NULL
 ^          ^
 |          |
Head       Tail

So now one thread starts doing takeOrNull(), length decreases to 1, Head moves to FIRST_NODE, and since this is the STUB node, it gets re-added to the end, so now you have:

Length is 1
FIRST_NODE -> STUB -> NULL
 ^             ^
 |             |
Head          Tail

Do you see? Length is 1 now! On the next takeOrNull(), you will get NULL, even if FIRST_NODE is still in the queue and has never been returned... You just (temporarily) lost a piece of data. Further, you can now repeat this ad infinitum and start accumulating nodes. Like if you add three nodes, Length is 4 and you have FIRST, STUB, NEW1, NEW2, NEW3. If you then do three takeOrNull(), you end up with NEW2, NEW3, STUB and Length 1. So this way you end up loosing elements, but I admit to not being completely sure about how this would ever trigger the exception. Let me eat and think about it some more. ;-)

EDIT: Ok food did me good, I came up with a sequence that triggers the head null exception. Let's start with a valid queue with one element like before:

Length is 2
STUB -> FIRST_NODE -> NULL
 ^          ^
 |          |
Head       Tail

Now we have four threads, two that try to takeOrNull() and two that add() concurrently. Both add threads have moved the tail pointer correctly, the first one moved tail from FIRST to SECOND, and then was paused. The second add thread moved then tail from SECOND to THIRD, then updated the old tail's (SECOND's) next pointer, and then incremented the counter and exited. We're left with:

Length is 3
STUB -> FIRST_NODE -> NULL          SECOND_NODE ->  THIRD_NODE -> NULL
 ^                                                     ^
 |                                                     |
Head                                                  Tail

Now the two takeOrNull threads wake up and execute, since Length is 3, both will be able to get an element! The first one moves Head from STUB to FIRST, the second moves Head from FIRST to NULL. And now HEAD is null, and whenever takeOrNull() gets called next, EXCEPTION!

Лицензировано под: CC-BY-SA с атрибуция
Не связан с StackOverflow
scroll top