Question

In theory, it should be possible to at least brute force a verification of a lock-free algorithm (there are only so many combinations of function calls intersecting). Are there any tools or formal reasoning processes available to actually prove that a lock-free algorithm is correct (ideally it should also be able to check for race conditions and the ABA problem as well)?

Note: If you know a way to just prove one point (e.g. only prove that it is safe from the ABA problem) or a problem I haven't mentioned then post the solution anyway. In the worst case scenario, each method can be done in turn to fully verify it.

Was it helpful?

Solution

You should definitely try the Spin model checker.

You write a program-like model in a simple C-like language called Promela, which Spin internally translates into a state machine. A model can contain multiple parallel processes.

What Spin then does is check every possible interleaving of instructions from each process for whatever conditions you want to test -- typically, absence of race conditions, freedom from deadlocks etc. Most of these tests can be easily written using assert() statements. If there is any possible execution sequence that violates an assertion, the sequence is printed out, otherwise you are given the "all-clear".

(Well, in actual fact it uses a much fancier and faster algorithm to accomplish this, but that is the effect. By default, all reachable program states are checked.)

This is an incredible program, it won the 2001 ACM System Software Award (other winners include Unix, Postscript, Apache, TeX). I got started using it very quickly, and in a couple of days was able to implement models of the MPI functions MPI_Isend() and MPI_Irecv() in Promela. Spin found a couple of tricky race conditions in one segment of parallel code I converted across to Promela for testing.

OTHER TIPS

Spin is indeed excellent, but also consider Relacy Race Detector by Dmitriy V'jukov. It is purpose-built for verifying concurrent algorithms including non-blocking (wait-/lock-free) algorithms. It's open source and liberally licensed.

Relacy provides POSIX and Windows synchronization primitives (mutexes, condition variables, semaphores, CriticalSections, win32 events, Interlocked*, etc), so your actual C++ implementation can be fed to Relacy for verification. No need to develop a separate model of your algorithm as with Promela and Spin.

Relacy provides C++0x std::atomic (explicit memory ordering for the win!) so you can use pre-processor #defines to select between Relacy's implementation and your own platform specific atomic implementation (tbb::atomic, boost::atomic, etc).

Scheduling is controllable: random, context-bound, and full search (all possible interleavings) available.

Here's an example Relacy program. A few things to note:

  • The $ is a Relacy macro that records execution information.
  • rl::var<T> flags "normal" (non-atomic) variables that also need to be considered as part of the verification.

The code:

#include <relacy/relacy_std.hpp>

// template parameter '2' is number of threads
struct race_test : rl::test_suite<race_test, 2>
{
    std::atomic<int> a;
    rl::var<int> x;

    // executed in single thread before main thread function
    void before()
    {
        a($) = 0;
        x($) = 0;
    }

    // main thread function
    void thread(unsigned thread_index)
    {
        if (0 == thread_index)
        {
            x($) = 1;
            a($).store(1, rl::memory_order_relaxed);
        }
        else
        {
            if (1 == a($).load(rl::memory_order_relaxed))
                x($) = 2;
        }
    }

    // executed in single thread after main thread function
    void after()
    {
    }

    // executed in single thread after every 'visible' action in main threads
    // disallowed to modify any state
    void invariant()
    {
    }
};

int main()
{
    rl::simulate<race_test>();
}

Compile with your normal compiler (Relacy is header-only) and run the executable:

struct race_test
DATA RACE
iteration: 8

execution history:
[0] 0:  atomic store, value=0, (prev value=0), order=seq_cst, in race_test::before, test.cpp(14)
[1] 0:  store, value=0, in race_test::before, test.cpp(15)
[2] 0:  store, value=1, in race_test::thread, test.cpp(23)
[3] 0:  atomic store, value=1, (prev value=0), order=relaxed, in race_test::thread, test.cpp(24)
[4] 1:  atomic load, value=1, order=relaxed, in race_test::thread, test.cpp(28)
[5] 1:  store, value=0, in race_test::thread, test.cpp(29)
[6] 1: data race detected, in race_test::thread, test.cpp(29)

thread 0:
[0] 0:  atomic store, value=0, (prev value=0), order=seq_cst, in race_test::before, test.cpp(14)
[1] 0:  store, value=0, in race_test::before, test.cpp(15)
[2] 0:  store, value=1, in race_test::thread, test.cpp(23)
[3] 0:  atomic store, value=1, (prev value=0), order=relaxed, in race_test::thread, test.cpp(24)

thread 1:
[4] 1:  atomic load, value=1, order=relaxed, in race_test::thread, test.cpp(28)
[5] 1:  store, value=0, in race_test::thread, test.cpp(29)
[6] 1: data race detected, in race_test::thread, test.cpp(29)

Recent versions of Relacy also provide Java and CLI memory models if you're into that sort of thing.

Data race detection is an NP hard problem [Netzer&Miller 1990]

I heard about the tools Lockset, and DJit+ (they teach it in the CDP course). Try reading the slides, and googling what they reference to. It contains some interesting information.

I don't know what platform or language you're using - but on the .Net platform there is a Microsoft Research project called Chess which is looking very promising at helping those of us doing multithreaded components - including lock-free.

I've not used it a huge amount, mind.

It works (crude explanation) by explicitly interleaving threads in the tightest possible ways to actually force your bugs out into the wild. It also analyses code to find common mistakes and bad patterns - similar to code analysis.

In the past, I've also built special versions of the code in question (through #if blocks etc) that add extra state-tracking information; counts, versions etc that I can then dip into within a unit test.

The problem with that is that it takes a lot more time to write your code, and you can't always add this kind of stuff without changing the underlying structure of the code that's already there.

If you want to really verify lock-free code (as opposed to exhaustively testing a small instance), you can use VCC (http://vcc.codeplex.com), a deductive verifier for concurrent C code which has been used to verify some interesting lock-free algorithms (e.g. lock-free lists and resizable hashtables using hazard pointers, optimistic multiversion transaction processing, MMU virtualization, various synchronization primitives, etc.). It does modular verification, and has been used to verify nontrivial chunks of industrial code (up to about 20KLOC).

Note, however, that VCC is a verifier, not a bug hunting tool; you will have to do substantial annotation on your code to get it to verify, and the learning curve can be a bit steep. Note also that it assumes sequential consistency (as do most tools).

BTW, peer review is not a good way to verify a concurrent algorithm (or even a sequential one). There's a long history of famous people publishing concurrent algorithms in important journals, only to have bugs discovered years later.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top