Question

I was testing the example in the memory model of the Anthony Williams's book "C++ Concurrency"

#include<atomic>
#include<thread>
#include<cassert>

std::atomic_bool x,y;
std::atomic_int z;

void write_x_then_y() {
  x.store(true, std::memory_order_relaxed);
  y.store(true, std::memory_order_relaxed);
}

void read_y_then_x() {
  while(!y.load(std::memory_order_relaxed));
  if(x.load(std::memory_order_relaxed)) {
    ++z;
  }
}

int main() {
  x = false;
  y = false;
  z = 0;
  std::thread a(write_x_then_y);
  std::thread b(read_y_then_x);
  a.join();
  b.join();
  assert(z.load()!=0);
}

According to the explanation, relaxed operations on different variables (here x and y) can be freely reordered. However, I have run the program repeatedly for more than several days, and the assertion (assert(z.load()!=0);) has never fired. I use the default optimization level and compile with g++ -std=c++11 -lpthread dataRaceAtomic.cpp. Has anyone actually tried this and hit the assertion? Could anyone explain my test results? BTW, I also tried a version without the atomic types and got the same result. Both programs are still running happily. Thanks.


Solution

This can depend on the type of processor you are running on.

x86 does not have a memory model as relaxed as some other architectures. In particular, stores are never reordered with respect to other stores.

http://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/ has more info on x86's memory model.

OTHER TIPS

A couple of things about atomics, memory ordering, and testing.

First, that example is an illustration; you're supposed to read it and think about it. In the real world, the overhead of starting a new thread means that write_x_then_y will have finished running long before read_y_then_x gets started, so a test program that repeatedly launches those two threads won't, in fact, ever see a reordering. Welcome to the wonderful world of testing multi-threaded code!

Second, there are two reordering issues to consider.

First, the compiler can generate code that stores or reads things in a different order than the source code uses; that's a valid optimization in the absence of multi-threading, and it's an important one. On the other hand, once you introduce multiple threads, store order and read order can matter. As a result, the new C++ memory model specifies when stores and loads can't be moved; in particular, they can't be moved across atomic accesses with sufficiently strong ordering (a relaxed access, as in this example, constrains the compiler far less than a release or acquire one does). That gives you a fixed point that you can reason about: I did this non-atomic store before this atomic release store, so I know the compiler will do the first one before the second.

Second, the hardware can reorder stores and loads; this typically is the result of the processor's caching strategy, and is referred to as "visibility": changes made to a variable in one thread aren't necessarily visible to another thread that reads that variable after the first thread has written to it. That's because the two threads can be running on two separate processors, each with its own cache; if the new value wasn't written out to main memory, or the other processor has an old value in its cache, the second thread won't see the change.

Atomics provide rules about when values become visible (which translates into when writes have to be flushed out of the cache into main memory and when reads have to go to main memory instead of the cache [oversimplified, but you get the idea]); that's what this example is about. And, as @Michael said, just because the value doesn't have to be made visible doesn't mean it can't be. Some processors have weak memory models that allow this sort of thing, with possible speed improvements and definite complications when analyzing what they do, and some processors don't. x86 is in the latter category: pretty much everything you do will be sequentially consistent, even if you allow weaker visibility constraints.

Just because the compiler is allowed to reorder the stores to x and y does not mean it will. Consider:

x.store(false, memory_order_relaxed); // redundant store
y.store(true, memory_order_relaxed);
x.store(true, memory_order_relaxed);

It appears that if we see x as true, then y must be true. However, the compiler may choose to reorder those stores:

x.store(false, memory_order_relaxed); // redundant store
x.store(true, memory_order_relaxed);
y.store(true, memory_order_relaxed);

Will it choose to? In this case, probably not; it's easy enough for the compiler to generate optimal code in the x-y-x pattern. However, in a more complicated case, with more things going on, reordering may allow the compiler to fit more values in registers.

The memory model just describes what is possible; there is no guarantee that the reordering will actually happen. It depends on both the compiler and the hardware. If you want to exhaustively explore the allowed behaviors, use a tool like CDSChecker.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow