To force an order following the bucket-read, I guess I would need an explicit atomic_thread_fence() between the bucket read and the following atomic_store.
I do not believe the atomic_thread_fence() call is necessary: the flag update has release semantics, which prevents any preceding load or store operations from being reordered across it. See the formal definition by Herb Sutter:
A write-release executes after all reads and writes by the same thread that precede it in program order.
This should prevent the read of bucket from being reordered to occur after the flag update, regardless of where the compiler chooses to store data.
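Here is a minimal sketch of that pattern. It assumes a single-slot hand-off between one producer and one consumer. `bucket`, `flag` and `data` mirror the names in the question, while `produce()`, `consume()`, `run_handoff()` and `N_ITEMS` are hypothetical. The consumer copies `bucket` into a plain local and then clears `flag` with `memory_order_release`. That release store alone keeps the read ahead of it, with no `atomic_thread_fence()` and no `volatile`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical single-slot hand-off mirroring the question: a plain,
   non-volatile bucket guarded by an atomic flag. */
static int bucket;
static atomic_bool flag = false;   /* true: bucket holds an unread value */

enum { N_ITEMS = 1000 };           /* hypothetical workload size */

/* Consumer: wait for a value, copy it out, then hand the slot back. */
static int consume(void)
{
    while (!atomic_load_explicit(&flag, memory_order_acquire))
        ;                          /* spin until the producer publishes */
    int data = bucket;             /* ordinary local, no volatile needed */
    /* Release store: the bucket read above may not be reordered after it,
       so no separate atomic_thread_fence() is required. */
    atomic_store_explicit(&flag, false, memory_order_release);
    return data;
}

/* Producer: wait for the slot to be free, fill it, then publish it. */
static void produce(int value)
{
    while (atomic_load_explicit(&flag, memory_order_acquire))
        ;                          /* spin until the consumer has read */
    /* The acquire load above synchronized with the consumer's release
       store, so the consumer's read of bucket has already completed. */
    bucket = value;
    atomic_store_explicit(&flag, true, memory_order_release);
}

static void *producer_thread(void *arg)
{
    (void)arg;
    for (int i = 1; i <= N_ITEMS; i++)
        produce(i);
    return NULL;
}

/* Runs the producer on a second thread, consumes every value on the
   calling thread, and returns the sum of what was consumed. */
static long run_handoff(void)
{
    pthread_t t;
    long sum = 0;
    int rc = pthread_create(&t, NULL, producer_thread, NULL);
    assert(rc == 0);
    (void)rc;
    for (int i = 0; i < N_ITEMS; i++)
        sum += consume();
    pthread_join(t, NULL);
    return sum;
}
```

Calling `run_handoff()` should return 500500, the sum of 1 through 1000, with no value lost or read twice. The questioner's alternative, an `atomic_thread_fence(memory_order_release)` followed by a relaxed store to `flag`, would give the same ordering. The release store simply folds that fence into the store itself.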
That brings me to your comment about another answer:
The volatile ensures that there are ld/st operations generated, which can subsequently be ordered with fences. However, data is a local variable, not volatile. The compiler will probably put it in register, avoiding a store operation. That leaves the load from bucket to be ordered with the subsequent reset of flag.
That would not seem to be an issue if the bucket read cannot be reordered past the flag write-release, so volatile should not be necessary (though it probably doesn't hurt to have it, either). It is also unnecessary because most function calls (in this case, atomic_store_explicit(&flag)) act as compile-time memory barriers: the compiler will not move the read of a global variable past a non-inlined function call, because that function could modify the same variable. Even when the call is expanded inline as a compiler builtin, its release ordering still forbids that move.
I would also agree with @MaximYegorushkin that you could improve your busy-waiting with pause instructions when targeting compatible architectures (x86). GCC and ICC both appear to provide a _mm_pause(void) intrinsic (probably equivalent to __asm__ ("pause;")).
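Here is a sketch of a pause-hinted spin loop. It assumes GCC, Clang or ICC targeting x86, where `<immintrin.h>` declares `_mm_pause()`. On other targets, the hypothetical `CPU_RELAX()` macro falls back to a no-op. `spin_until_set()`, `ready` and `wait_for_ready()` are illustrative names, not taken from the question.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>             /* declares _mm_pause() */
#define CPU_RELAX() _mm_pause()
#else
#define CPU_RELAX() ((void)0)      /* assumption: no pause hint on other targets */
#endif

/* Busy-wait until *f becomes true, issuing a pause hint on each
   iteration. Returns how many times the loop spun. */
static unsigned long spin_until_set(atomic_bool *f)
{
    unsigned long spins = 0;
    while (!atomic_load_explicit(f, memory_order_acquire)) {
        CPU_RELAX();
        spins++;
    }
    return spins;
}

static atomic_bool ready = false;  /* hypothetical flag set by another thread */

static void *set_ready(void *arg)
{
    (void)arg;
    atomic_store_explicit(&ready, true, memory_order_release);
    return NULL;
}

/* Starts a thread that sets `ready`, spins until the store is observed,
   and reports whether the flag was seen as set. */
static bool wait_for_ready(void)
{
    pthread_t t;
    int rc = pthread_create(&t, NULL, set_ready, NULL);
    assert(rc == 0);
    (void)rc;
    (void)spin_until_set(&ready);
    pthread_join(t, NULL);
    return atomic_load_explicit(&ready, memory_order_relaxed);
}
```

The pause hint does not change memory ordering. It only tells the processor that it is in a spin-wait loop, which reduces power use and avoids a costly pipeline flush when the loop finally exits.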