Question

When writing C++11 code that uses the newly introduced thread-synchronization primitives and takes advantage of relaxed memory ordering, you usually see either

std::atomic<int> vv;
int i = vv.load(std::memory_order_acquire);

or

vv.store(42, std::memory_order_release);

It is clear to me why this makes sense.

My questions are: Do the combinations vv.store(42, std::memory_order_acquire) and vv.load(std::memory_order_release) also make sense? In which situation could one use them? What are the semantics of these combinations?


Solution

That's simply not allowed. The C++11 standard places requirements on which memory-order constraints you may pass to load/store operations.

For load (§29.6.5):

Requires: The order argument shall not be memory_order_release nor memory_order_acq_rel.

For store:

Requires: The order argument shall not be memory_order_consume, memory_order_acquire, nor memory_order_acq_rel.

OTHER TIPS

These combinations do not make any sense, and they are not allowed either.

An acquire operation synchronizes with a release operation: once the acquire (load) observes the value written by the release (store), all other stores and side effects that happened before the release are also visible, for any thread that acquires the same atomic that was released.

Now, if you could do (and did) an acquire store or a release load, what should it mean? Which store would the acquire operation synchronize with? Itself?

The C/C++/LLVM memory model is sufficient for synchronization strategies that ensure data is ready to be accessed before accessing it. While that covers most common synchronization primitives, useful properties can be obtained by building consistent models on weaker guarantees.

The biggest example is the seqlock. It relies on "speculatively" reading data that may not be in a consistent state. Because reads are allowed to race with writes, readers don't block writers, a property which is used in the Linux kernel to allow the system clock to be updated even if a user process is repeatedly reading it. Another strength of the seqlock is that on modern SMP architectures it scales perfectly with the number of readers: because the readers don't need to take any locks, they only need shared access to the cache lines.

The ideal implementation of a seqlock would use something like a "release load" in the reader, which is not available in any major programming language. The kernel works around this with a full read fence, which scales well across architectures, but doesn't achieve optimal performance.
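Here is a minimal single-writer seqlock sketch in C++ (names are mine). Note the hedge: in a truly concurrent reader, the plain reads of `data` formally race with the writer under the C++ model; a strictly conforming version would make the payload relaxed atomics, and the kernel uses its own barriers instead.

```cpp
#include <atomic>

// Illustrative single-writer seqlock: an odd sequence number means a
// write is in progress; the reader retries until it sees the same even
// sequence number before and after reading the payload.
struct SeqLock {
    std::atomic<unsigned> seq{0};
    int data[2] = {0, 0};

    void write(int a, int b) {                          // single writer assumed
        unsigned s = seq.load(std::memory_order_relaxed);
        seq.store(s + 1, std::memory_order_relaxed);    // odd: write in progress
        std::atomic_thread_fence(std::memory_order_release);
        data[0] = a;
        data[1] = b;
        std::atomic_thread_fence(std::memory_order_release);
        seq.store(s + 2, std::memory_order_relaxed);    // even again: done
    }

    void read(int& a, int& b) {
        unsigned s1, s2;
        do {
            s1 = seq.load(std::memory_order_acquire);
            a = data[0];                                // speculative read
            b = data[1];                                // (may be torn; retried)
            std::atomic_thread_fence(std::memory_order_acquire); // the read fence
            s2 = seq.load(std::memory_order_relaxed);
        } while ((s1 & 1u) || s1 != s2);                // retry if racing a write
    }
};
```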

Do the combinations vv.store(42, std::memory_order_acquire) and vv.load(std::memory_order_release) also make sense?

Technically, they are formally disallowed, but knowing that particular rule only matters when you actually write C++ code.

What matters more is that these combinations simply cannot be defined in the memory model, and that is important to know and understand even if you never write such code.

Note that disallowing these values is an important design choice: if you write your own your_own::atomic<> class, you can choose to allow these values and define them as equivalent to relaxed operations.
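As a sketch of that alternative design choice (your_own is the hypothetical namespace from the text), such a wrapper could silently weaken the forbidden combinations down to relaxed:

```cpp
#include <atomic>

namespace your_own {

// Unlike std::atomic, this wrapper accepts any memory order and maps
// the combinations std::atomic forbids onto relaxed operations.
template <class T>
struct atomic {
    std::atomic<T> impl{};

    T load(std::memory_order o = std::memory_order_seq_cst) const {
        if (o == std::memory_order_release || o == std::memory_order_acq_rel)
            o = std::memory_order_relaxed;   // "release load" -> relaxed
        return impl.load(o);
    }

    void store(T v, std::memory_order o = std::memory_order_seq_cst) {
        if (o == std::memory_order_acquire || o == std::memory_order_consume ||
            o == std::memory_order_acq_rel)
            o = std::memory_order_relaxed;   // "acquire store" -> relaxed
        impl.store(v, o);
    }
};

} // namespace your_own
```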

It's important to understand the design space; you need not have too much reverence for all of the C++ thread-primitive design choices, some of which are purely arbitrary.

In which situation could one use them? What are the semantics of these combinations?

None. You must first grasp the fundamental notion that a read is not a write (it took me a while to get it); you can only claim to understand non-linear execution once you internalize that idea.

In a single-threaded program without async signals, all steps are sequential and it doesn't matter that reads aren't writes: every read of an object could just rewrite the value it read, provided sequence points are respected and writing a constant's own value back to it is allowed (which is fine in practice as long as the memory is readable and writable).

So at that level, the distinction between reads and writes isn't very important. You could even define a semantics based only on operations that are both reads and writes of a memory location, such that writing to a constant is allowed and reading an invalid value that is never used is OK.

I don't recommend it, of course, as blurring the distinction between reads and writes is pretty ugly.

But for multithreading you really don't want writes to data that you only read: not only would it create data races (which you could arbitrarily declare unimportant when the old value is written back), it also wouldn't match the CPU's view of the world, since a write changes the state of the cache line of a shared object. The fact that a read isn't a write is essential for efficiency in multithreaded programs, much more so than in single-threaded ones.

At the abstract level, a store operation on an atomic is a modification, so it's part of the atomic's modification order; a load is not: a load only points to a position in the modification order (a load can observe an atomically stored value, or the initial value established at construction, before all atomic modifications).

Modifications are ordered with respect to each other; loads are not ordered among themselves, only with respect to modifications. (You can view all loads as happening at exactly the same time.)

Acquire and release operations are about building a history (a past) and communicating it: a release operation on an object makes your past part of the atomic object's past, and an acquire operation makes that past your own.

A modification that isn't an atomic RMW cannot see the previous value. On the other hand, an algorithm that performs a load and then a store (on one or two atomics) does see some previous value, but in general it isn't guaranteed to see the value left by the modification immediately before it in the modification order. So an acquire load X followed by a release store Y transitively releases the history: the past that some other thread released earlier and that X observed becomes, through Y, part of the past associated with the atomic variable (in addition to the rest of our own past).

An RMW is semantically different from an acquire followed by a release because there is never any "space" in the history between the release and the acquire. This means that programs using only acq+rel RMW operations are always sequentially consistent, as they obtain the full past of every thread they interact with.

So if you want an acq+rel load or store, just use an RMW read or RMW write operation:

  • an acq+rel load is an RMW that writes back the same value
  • an acq+rel store is an RMW that discards the original value

You can write your own (strong) atomic class that does exactly that for its (strong) loads and (strong) stores: it would be logically well defined, since your class makes all operations, even loads, part of the operation history of the (strong) atomic object. A (strong) load could thus be observed by a (strong) store, as both are (atomic) modifications as well as reads of the underlying ordinary atomic object.
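A sketch of such a class (the name strong_atomic is mine): even a load performs a CAS that writes back the value it read, so every operation lands in the modification order.

```cpp
#include <atomic>

// Sketch of the "strong atomic" idea: every operation, including a
// load, is an RMW and hence part of the modification order.
template <class T>
struct strong_atomic {
    std::atomic<T> impl{};

    T load() {
        T v = impl.load(std::memory_order_relaxed);
        // Write back the same value; on failure v is refreshed and retried.
        while (!impl.compare_exchange_weak(v, v, std::memory_order_acq_rel))
            ;
        return v;
    }

    void store(T v) {
        impl.exchange(v, std::memory_order_acq_rel);  // discard the old value
    }
};
```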

Note that the set of acq_rel operations on such "strong atomic" objects would have strictly stronger guarantees than the intended guarantees of the set of seq_cst operations on ordinary atomics, for programs that also use relaxed atomic operations: the designers of seq_cst intended that using seq_cst does not, in general, make programs with mixed atomic operations sequentially consistent.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow