Question

I have read this web page:

http://bartoszmilewski.com/2008/12/01/c-atomics-and-memory-ordering/

and then wrote a test program, compiled with g++ 4.8.1; the CPU is Intel ...

global var : r1=0;r2=0;x=0;y=0;

Thread1 : 
         x  = 1 ;  //line 1 
         r1 = y ;  //line 2
Thread2 :
         y  = 1 ;  //line 3 
         r2 = x ;  //line 4

When thread1 and thread2 run concurrently, I sometimes get r1==0 && r2==0. I know this happens because the load of y (line 2) and the load of x (line 4) execute before the store of x (line 1) and the store of y (line 3). Even with a strong memory model like the Intel CPU's, a load can still be reordered before an earlier store, which is why r1==0 && r2==0 still shows up in this test.

Referring to the C++11 memory model, I changed the source as follows:

global vars :
             int r1=0,r2=0 ;
             atomic<int> x{0} ;
             atomic<int> y{0} ; 
Thread1 :
             x.store(1,memory_order_acq_rel) ;
             r1=y.load(memory_order_relaxed) ;
Thread2 :
             y.store(1,memory_order_acq_rel) ;
             r2=x.load(memory_order_relaxed) ;

This time r1==0 && r2==0 never occurs. I chose the memory_order values according to the web page I mentioned at the start; see these statements:

memory_order_acquire: guarantees that subsequent loads are not moved before the current load or any preceding loads.

memory_order_release: preceding stores are not moved past the current store or any subsequent stores.

memory_order_acq_rel: combines the two previous guarantees

memory_order_relaxed: all reorderings are okay.

That looks like it works. Still, I ran another test, changing the code to:

global vars :
             int r1=0,r2=0 ;
             atomic<int> x{0} ;
             atomic<int> y{0} ; 
Thread1 :
             x.store(1,memory_order_relaxed) ;
             r1=y.load(memory_order_relaxed) ;
Thread2 :
             y.store(1,memory_order_relaxed) ;
             r2=x.load(memory_order_relaxed) ;

What confuses me is that this test still never produces r1==0 && r2==0! If this version works, why bother with memory_order_acq_rel? Or does this only work on Intel CPUs, while other kinds of CPU still need memory_order_acq_rel on the stores to x and y?


Solution

The result from your first experiment (r1==0 && r2==0 showing up even on an Intel CPU with its strong memory model) is interesting, but not only for the reasons you think. Atomics prevent not only the processor and cache subsystem from reordering memory accesses, but the compiler as well. GCC 4.8 at Coliru optimizes this code to assembly with the load instructions before the stores:

_Z7thread1v:
.LFB326:
    .cfi_startproc
    movl    y(%rip), %eax
    movl    $1, x(%rip)
    movl    %eax, r1(%rip)
    ret

Even if the processor guaranteed memory ordering here, you need some kind of fencing to keep the compiler from screwing things up.

Your second program has undefined behavior because it uses memory_order_acq_rel as the memory ordering for a store. Acquire only makes sense for loads, and release only for stores, so memory_order_acq_rel is only valid as an ordering for atomic read-modify-write operations like exchange or fetch_add. Replacing memory_order_acq_rel with memory_order_release gives the stores the release semantics you were after, and the assembly produced is again interesting:

_Z7thread1v:
.LFB332:
    .cfi_startproc
    movl    $1, x(%rip)
    movl    y(%rip), %eax
    movl    %eax, r1(%rip)
    ret

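The instructions are exactly what we would expect to be generated, with no special fence instructions. The processor memory model is strong enough to provide release ordering with plain-old mov instructions. In this instance, atomics are only necessary to tell the compiler to keep its fingers out of the code. Keep in mind that release ordering alone does not rule out r1==0 && r2==0, since x86 can still let a load complete before an earlier store to a different location becomes visible to other cores.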
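To make the point about which orderings fit which operations concrete, here is a minimal sketch. The helper names publish, swap_in and bump are made up for illustration; only the std::atomic member functions come from the standard library.

```cpp
#include <atomic>

std::atomic<int> x{0};

// A pure store only writes, so release (or relaxed / seq_cst) is a valid ordering.
// Passing memory_order_acquire or memory_order_acq_rel here would not be.
void publish(int value) {
    x.store(value, std::memory_order_release);
}

// exchange is a read-modify-write: it both loads and stores,
// so memory_order_acq_rel is meaningful. It returns the previous value.
int swap_in(int value) {
    return x.exchange(value, std::memory_order_acq_rel);
}

// fetch_add is also a read-modify-write and returns the previous value.
int bump() {
    return x.fetch_add(1, std::memory_order_acq_rel);
}
```

Calling publish(1), then swap_in(5), then bump() from a single thread leaves x at 6, with the two read-modify-write calls returning the values that were stored just before them.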

Your third program is (technically) unpredictable despite generating the same assembly as the second:

_Z7thread1v:
.LFB332:
    .cfi_startproc
    movl    $1, x(%rip)
    movl    y(%rip), %eax
    movl    %eax, r1(%rip)
    ret

Although the results are the same this time, there's no guarantee that the compiler won't choose to reorder the instructions as it did for the first program. The result may change when you upgrade your compiler, or introduce other instructions, or for any other reason. If you start compiling on ARM, all bets are off ;) It's also interesting that despite relaxing the requirements in the source program, the generated assembly is the same. You cannot relax the memory ordering beyond the restrictions that the processor architecture already puts in place.

OTHER TIPS

There are a bunch of issues here:

(1) Releases and acquires must be in pairs. Otherwise, they don't establish synchronization and don't guarantee anything.

(2) Even if you make the stores release and the loads acquire in your example, the memory model still allows r1=r2=0. You need to make everything seq_cst to forbid that execution.

(3) We've built a tool at http://demsky.eecs.uci.edu/c11modelchecker.html for testing C11 atomic code. It will give you all executions allowed under reasonable interpretations of the C/C++11 memory model.
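As a rough illustration of point (1), the sketch below pairs a release store with an acquire load on the same atomic flag. The names payload, ready, producer, consumer and run_once are invented for this example and do not come from the question.

```cpp
#include <atomic>
#include <thread>

int payload = 0;                    // plain, non-atomic data
std::atomic<bool> ready{false};     // flag used to publish payload

void producer() {
    payload = 42;                                    // written before the release store
    ready.store(true, std::memory_order_release);    // release: publishes payload
}

int consumer() {
    // acquire: pairs with the release store above once it observes true
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;   // the write to payload happens-before this read
}

// Runs one producer and one consumer and returns what the consumer read.
int run_once() {
    payload = 0;
    ready.store(false);
    int seen = -1;
    std::thread t1(producer);
    std::thread t2([&seen] { seen = consumer(); });
    t1.join();
    t2.join();
    return seen;
}
```

Because the acquire load reads the value written by the release store, the consumer is guaranteed to see payload == 42. Note that this pattern is different from the x/y test in the question, where each thread stores to one variable and loads the other, so no such pairing is established there.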

You may not see these interesting behaviors on current GCC versions yet, as at least the earlier versions ignored the memory ordering parameter and always used seq_cst. If GCC changes that, you could see r1=r2=0.
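Whatever a particular GCC version does with the ordering argument, asking for seq_cst on all four operations is what the standard requires to forbid r1=r2=0. The harness below is only a sketch: the function name count_both_zero and the trial loop are assumptions for illustration, and spawning fresh threads per trial is a crude way to provoke the race.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0};
std::atomic<int> y{0};
int r1 = 0, r2 = 0;

// Runs the two-thread test `trials` times and returns how often
// both loads observed 0.
int count_both_zero(int trials) {
    assert(trials > 0);
    int both_zero = 0;
    for (int i = 0; i < trials; ++i) {
        x.store(0);
        y.store(0);
        std::thread t1([] {
            x.store(1, std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_seq_cst);
        });
        std::thread t2([] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });
        t1.join();
        t2.join();
        // r1 and r2 are only read after both threads have been joined
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

With seq_cst everywhere, all four operations fall into one total order, so at least one of the loads must see the other thread's store and the count stays at 0. Swapping the orderings for memory_order_relaxed turns this into the third program from the question, where a count of 0 is an observation about one machine and compiler rather than a guarantee.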

Licensed under: CC-BY-SA with attribution