Question

Setup: I am using VS 2010 / VS 2013 and the clang 3.4 prebuilt binary.

I found a bug in our production code and reduced it to the following minimal reproduction:

#include <windows.h>
#include <process.h>
#include <stdio.h>
using namespace std;

bool s_begin_init =  false;
bool s_init_done =  false;

void thread_proc(void * arg)
{
    DWORD tid = GetCurrentThreadId();
    printf("Begin Thread %2d, TID=%u\n", reinterpret_cast<int>(arg), tid);
    if (!s_begin_init)
    {
        s_begin_init = true;
        Sleep(20);
        s_init_done = true;
    }
    else
    {
        while(!s_init_done) { ; }
    }
    printf("End   Thread %2d, TID=%u\n", reinterpret_cast<int>(arg), tid);
}

int main(int argc, char *argv[])
{
    argc = argc; argv = argv; // silence unused-parameter warnings
    for(int i = 0; i < 30; ++i)
    {
        _beginthread(thread_proc, 0, reinterpret_cast<void*>(i));
    }
    getchar();
    return 0;
}

To compile and run the code: cl /O2 /Zi /Favc.asm vc_O2_bug.cpp && vc_O2_bug.exe

Some of the threads spin forever in the while loop. Checking the generated assembly, I found that the code for

while(!s_init_done) {; }

is:

; Line 19
    mov al, BYTE PTR ?s_init_done@@3_NA     ; s_init_done
$LL2@thread_pro:
; Line 21
    test    al, al
    je  SHORT $LL2@thread_pro
; Line 23

With the -O2 optimization flag, VC evidently copies s_init_done into the al register once and then repeatedly tests the register instead of re-reading memory.

I then tested the code with the clang-cl.exe compiler driver. The result is the same, and the generated assembly is equivalent.

It looks as if the compiler assumes that s_init_done can never change inside the loop, because the only statement that changes its value is in the "if" block, which is mutually exclusive with the current "else" branch.

I tried the same code with VS2013; the result is the same.

My doubt is this: the C++98/C++03 standards have no concept of threads, so the compiler may perform such an optimization as if for a single-threaded machine. But C++11 does have threads, and both clang 3.4 and VC2013 support C++11 well, so my question is:

Is this kind of optimization a compiler bug under C++98/C++03, and separately under C++11?

BTW: when I use -O1 instead, or add the volatile qualifier to s_init_done, the bug disappears.

Solution

Your program contains data races on s_begin_init and s_init_done, and therefore has undefined behavior. Per C++11 §1.10/21:

The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.

The fix is to declare both boolean variables to be atomic:

#include <atomic>

std::atomic<bool> s_begin_init{false};
std::atomic<bool> s_init_done{false};
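
Note that atomics remove the data race, but a separate check followed by a store still lets two threads both read s_begin_init as false and both run the initialization. Here is a minimal sketch of one way to close that gap. It assumes std::thread in place of _beginthread so that it is portable, and the counter s_init_count and the helper run_threads are hypothetical names added only for illustration. It uses exchange so that exactly one thread claims the initialization:

```cpp
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

std::atomic<bool> s_begin_init{false};
std::atomic<bool> s_init_done{false};
std::atomic<int>  s_init_count{0};   // illustration only: counts initializers

void thread_proc()
{
    // exchange() atomically stores true and returns the previous value,
    // so only the first caller sees false.
    if (!s_begin_init.exchange(true))
    {
        // Stands in for Sleep(20) in the original code.
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
        ++s_init_count;
        s_init_done = true;          // sequentially consistent store
    }
    else
    {
        // Atomic load on each iteration, so the store above is observed.
        while (!s_init_done) { std::this_thread::yield(); }
    }
}

// Hypothetical helper: starts n threads, joins them, and returns how many
// of them ran the initialization branch.
int run_threads(int n)
{
    s_begin_init = false;
    s_init_done = false;
    s_init_count = 0;
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) { threads.emplace_back(thread_proc); }
    for (auto& t : threads) { t.join(); }
    return s_init_count.load();
}
```

Calling run_threads(30) mirrors the 30 threads started in the question; the waiting threads still spin, which is why the condition-variable version may be preferable.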

or to synchronize accesses to them with a mutex (I'll throw in a condition variable to avoid busy-waiting):

#include <mutex>
#include <condition_variable>

std::mutex mtx;
std::condition_variable cvar;
bool s_begin_init = false;
bool s_init_done = false;

void thread_proc(void * arg)
{
    DWORD tid = GetCurrentThreadId();
    printf("Begin Thread %2d, TID=%u\n", reinterpret_cast<int>(arg), tid);
    std::unique_lock<std::mutex> lock(mtx);
    if (!s_begin_init)
    {
        s_begin_init = true;
        lock.unlock();
        Sleep(20);
        lock.lock();
        s_init_done = true;
        cvar.notify_all();
    }
    else
    {
        while(!s_init_done) { cvar.wait(lock); }
    }
    printf("End   Thread %2d, TID=%u\n", reinterpret_cast<int>(arg), tid);
}
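
For readers who want to try the mutex approach outside Windows, here is a rough portable sketch of the same logic. It assumes std::thread instead of _beginthread; the counter s_init_count and the helper run_threads are hypothetical additions for illustration. It also uses the predicate overload of wait, which loops internally in the same way as the explicit while loop above:

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

std::mutex mtx;
std::condition_variable cvar;
bool s_begin_init = false;   // guarded by mtx
bool s_init_done = false;    // guarded by mtx
int  s_init_count = 0;       // illustration only, guarded by mtx

void thread_proc()
{
    std::unique_lock<std::mutex> lock(mtx);
    if (!s_begin_init)
    {
        s_begin_init = true;
        lock.unlock();               // do the slow work without the lock
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
        lock.lock();
        ++s_init_count;
        s_init_done = true;
        cvar.notify_all();
    }
    else
    {
        // Re-checks the predicate after every wakeup, spurious or not.
        cvar.wait(lock, [] { return s_init_done; });
    }
}

// Hypothetical helper: starts n threads, joins them, and returns how many
// of them ran the initialization branch.
int run_threads(int n)
{
    s_begin_init = false;
    s_init_done = false;
    s_init_count = 0;
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) { threads.emplace_back(thread_proc); }
    for (auto& t : threads) { t.join(); }
    return s_init_count;             // safe to read: all threads are joined
}
```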

EDIT: I just noticed the mention of VS2010 in the OP. VS2010 does not support C++11 atomics, so you will have to use the mutex solution or take advantage of MSVC's non-standard extension that gives volatile variables acquire-release semantics:

volatile bool s_begin_init = false;
volatile bool s_init_done = false;
Licensed under: CC-BY-SA with attribution