How do memory fences work?

https://stackoverflow.com/questions/7280119

19-01-2021

Question

I need to understand memory fences in multicore machines. Say I have this code

Core 1

mov [_x], 1; mov r1, [_y]

Core 2

mov [_y], 1; mov r2, [_x]

Without memory fences, the unexpected result is that both r1 and r2 can be 0 after execution. To counter that, I think we should put a memory fence in both code sequences, since putting one in only one of them would still not solve the problem. Something like the following:

Core 1

mov [_x], 1; memory_fence; mov r1, [_y]

Core 2

mov [_y], 1; memory_fence; mov r2, [_x]

Is my understanding correct, or am I still missing something? Assume the architecture is x86. Also, can someone tell me how to put memory fences in C++ code?

Solution

Fences serialize the operations that they fence (loads and stores): no later operation may start until the fence has executed, and the fence will not execute until all preceding operations have completed. Quoting Intel makes the meaning a little more precise (taken from the MFENCE instruction, page 3-628, Vol. 2A, Intel Instruction Reference):

This serializing operation guarantees that every load and store instruction that precedes the MFENCE instruction in program order becomes globally visible before any load or store instruction that follows the MFENCE instruction. [1]

[1] A load instruction is considered to become globally visible when the value to be loaded into its destination register is determined.

Using fences in C++ is tricky, as the non-standard options are platform and compiler dependent. C++11 provides a portable fence, std::atomic_thread_fence, in the <atomic> header. For x86 using MSVC or ICC (GCC and Clang provide the same intrinsics), you can use _mm_lfence, _mm_sfence and _mm_mfence for load, store, and load + store fencing respectively (note that _mm_sfence is an SSE instruction, while _mm_lfence and _mm_mfence are SSE2 instructions).
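As a rough sketch of the question's fenced sequence using the _mm_mfence intrinsic: x86 allows a store to be reordered after a later load from a different address, which is how both r1 and r2 can end up 0, so each core needs its own fence between the store and the load. The names full_fence and run_fenced_once are made up for illustration. The variables are std::atomic with relaxed ordering only to keep the C++ free of data races, and the sketch assumes the compiler does not move those accesses across the intrinsic (which holds in practice for the compilers named above, but is not something the C++ standard promises). The non-x86 branch is an added fallback so the code still builds elsewhere.

```cpp
#include <atomic>
#include <thread>
#include <utility>

#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>  // declares _mm_mfence (SSE2, always present on x86-64)
inline void full_fence() { _mm_mfence(); }
#else
// Assumption for illustration: off x86-64, fall back to the standard fence
// so the sketch still compiles. This is not the MFENCE instruction.
inline void full_fence() { std::atomic_thread_fence(std::memory_order_seq_cst); }
#endif

// Hypothetical helper: runs the question's fenced two-core sequence once
// and returns {r1, r2}.
inline std::pair<int, int> run_fenced_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread core1([&] {
        x.store(1, std::memory_order_relaxed);   // mov [_x], 1
        full_fence();                            // memory_fence
        r1 = y.load(std::memory_order_relaxed);  // mov r1, [_y]
    });
    std::thread core2([&] {
        y.store(1, std::memory_order_relaxed);   // mov [_y], 1
        full_fence();                            // memory_fence
        r2 = x.load(std::memory_order_relaxed);  // mov r2, [_x]
    });

    core1.join();
    core2.join();
    return {r1, r2};
}
```

Calling run_fenced_once() repeatedly should never return {0, 0}: with a fence on both sides, at least one core observes the other core's store.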

Note: this assumes an Intel perspective, that is, one using an x86 (32- or 64-bit) or IA64 processor.

OTHER TIPS

C++11 (ISO/IEC 14882:2011) defines a multi-threading-aware memory model. When this answer was written, I did not know of any compiler that implemented the new memory model, but C++ Concurrency in Action by Anthony Williams documents it very well. See Chapter 5, The C++ Memory Model and Operations on Atomic Types, where he explains relaxed operations and memory fences. He is also the author of the just::thread library, an implementation of the C++11 thread library that could be used until compiler vendors supported the new standard, and the maintainer of the boost::thread library.
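For comparison, here is a minimal sketch of the same test written only with the C++11 memory model: relaxed atomic operations plus std::atomic_thread_fence. The function name count_both_zero and its parameters are hypothetical, chosen for this illustration. With use_fence set to false it corresponds to the question's first listing, where the both-zero outcome is allowed but may be rare in practice; with use_fence set to true it corresponds to the fenced listing, where the standard rules it out.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: repeats the question's two-thread test and returns
// how many runs ended with r1 == 0 and r2 == 0.
inline int count_both_zero(bool use_fence, int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);
            if (use_fence) {
                // Sequentially consistent fence between the store and the load.
                std::atomic_thread_fence(std::memory_order_seq_cst);
            }
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);
            if (use_fence) {
                std::atomic_thread_fence(std::memory_order_seq_cst);
            }
            r2 = x.load(std::memory_order_relaxed);
        });

        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

count_both_zero(true, n) should always return 0. count_both_zero(false, n) may return a nonzero count, although starting fresh threads on every iteration makes the reordering hard to observe; a tighter loop with long-lived threads would be needed to see it reliably.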

Licensed under: CC-BY-SA with attribution
