Memory Model: preventing store-release and load-acquire reordering

Question 1

I forgot to post the soon found answer back to SO. Better late than never..

Turns out it is impossible thanks to how processors (at least x86-x64 kind of them) optimize memory accesses. I found the answer when was reading Intel manuals on its procs. Example 8-5:" Intra-Processor Forwarding is Allowed" was looking suspicious. Googling for "store buffer forwarding" lead to Joe Duffy's blog posts (first and second - read them pls).

To optimize writes processor uses store buffers (per processor queues of write ops). Buffering writes locally allows it to do next optimization: satisfying reads from the previously buffered writes to the same memory location and which haven't left the processor yet. The technique is called store-buffer forwarding (or store-to-load forwarding).

The end result in our case is that as reading at B is satisfied from a local storage (store buffer) it is not considered a volatile read and can be reordered with further volatile reads from another memory location (C).

It seems like a violation of the rule "Volatile reads don't reorder with each other". Yes, it is a violation, but very rare and exotic one. Why did it happen? Probably because Intel's released its first formal document on memory model of its processors years after .NET (and its JIT compiler) saw the sunlight.

So the answer is: no, the dummy reading (B) doesn't prevent reordering between A and C and cannot be used as a lightweight memory barrier.

Question 2

Here I will use an arrow notation to conceptualize the memory barriers. I use an up arrow ↑ and a down arrow ↓ to represent volatile writes and reads respectively. Think of the arrow head as pushing away any other reads or writes. So no other memory access can move past the arrow head, but they can move past the tail.

Consider your first example. This is how it would be conceptualized.

↑          
volatile1 write  // A
volatile2 read   // B
↓

So clearly we can see that the read and the write are allowed to switch positions. You are correct.

Now consider your second example. You claimed that introducing a dummy read would prevent the write of A and the read of B from getting swapped.

↑          
volatile1 write  // A
volatile1 read   // A
↓
volatile2 read   // B
↓

We can see that B is prevented from floating up by the dummy read of A. We can also see that the read of A cannot float down because, by inference, that would be the same as B moving up before A. But, notice that we have no ↑ arrow that would prevent the write to A from floating down (remember it can still move past the tail of an arrow). So no, at least theoretically, injecting a dummy read of A will not prevent the original write of A and the read of B from getting swapped because the write to A is still allowed to move downward.

I had to really think about this scenario. One thing I pondered for a quite some time is whether the read and write to A are locked together in tandem. If so then that would prevent the write to A from moving down because it would have to take the read with it which we already said was prevented. So if you go with that school of thought then your solution might just work. But, I read the specification again and I see nothing special mentioned about volatile accesses to the same variable. Obviously, the thread has to execute in a manner that is logically consistent with the original program sequence (that is mentioned in the specification). But, I can visualize ways the compiler or hardware could optimize (or otherwise reorder) that tandem access of A and still get the same result. So, I simply have to side with caution here and assume that the write to A can move down. Remember, a volatile read does not mean "fresh read from main memory". The write to A could be cached in a register and then the read comes from that register delaying the actual write to a later time. Volatile semantics do not prevent that scenario as far as I know.

The correct solution would be to put a call to Thread.MemoryBarrier in between the accesses. You can see how this is conceptualized with the arrow notation.

↑          
volatile1 write       // A
↑
Thread.MemoryBarrier
↓
volatile2 read        // B
↓

Now you can see that the read is not allowed to float up and the write is not allowed to float down preventing the swap.

You can see some of my other memory barrier answers using this arrow notation here, here, and here just to name a few.

Question 3

EDIT The conclusions I drew from the C# specs are wrong, see below. END EDIT

I surely am not someone 'authorized', but I think you haven't understood the memory model correctly.

Quoting the C# specification, section §10.10 Execution order, third bullet point on page 105:

The ordering of side effects is preserved with respect to volatile reads and writes.

Volatile reads and writes are defined as "side-effects" and this paragraph states that the ordering of side-effects is preserved.

So it is my understanding that your whole question is based on an incorrect assumption: volatile reads and writes cannot be reordered.

I think you got confused with that fact that with respect to non-volatile memory operations, volatile reads and writes are only half-fences.

EDIT this article: The C# Memory Model in Theory and Practice, Part 2 states exactly the opposite and supports your assertion that volatile reads can move up past an unrelated volatile write. The suggested solution is to introduce a MemoryBarrier where it matters.

Comment by Daniel below also says that the CLI spec is more specific about this than the C# spec and allows this reordering.

Now I find that the C# spec I quoted above is confusing! But given that on x86 the same instructions are used for a volatile memory access and a regular memory access, it makes perfect sense that they are subject to the same half-fence reordering issues. END EDIT

Question 4

Let me disagree with the accepted answer from Brian Gideon.

OmariO your solution to the problem (dummy read) looks perfectly correct to me. As you mentioned correctly, writes to a variable cannot be reordered with a following read from the same variable. If that reordering was possible then it would make the code incorrect in a single-threaded case (the read operation could return not the same value that has been written by the previous write operation). That is it would violate the fundamental rule of any memory model: single-threaded execution of a program must not be logically changed.

Also guys, Brian and OmariO, please don't mix up memory operations with acquire/release semantics and acquire/release memory fences. E.g. a read-acquire operation is not the same as an acquire fence. They have similar semantics but the distinction between them is very important. The best explanation of that terms that I know is in the great blog of Jeff Preshing:
Acquire and Release Semantics
Acquire and Release Fences