When will a memory write become globally visible?

https://stackoverflow.com/questions/17337418

01-06-2022

Question

I'm writing some low-level synchronization code in C, and I ran into a problem:

Assume there are two threads, Thread A and Thread B, running on an x86_64 machine. Thread A writes to a memory location at time t1, and there are no more writes to this location afterward. Thread B reads the same memory location at time t2.

Thread A:
    foo = magic_value;  /* happens at t1 */

Thread B:
    bar = foo;  /* happens at t2 */
    assert(bar == magic_value);

My question is whether there exists a delta such that, for any t1 and t2 with t2 - t1 > delta, Thread B is guaranteed to read the value that Thread A wrote at t1.

I've read the documentation from Intel and AMD, and it does not mention whether such a guarantee exists. I know that this value may depend on the processor model or even the motherboard design (for multi-socket machines). I guess there must be some limit on this latency on any sane, currently available x86_64 machine.

I know how to use synchronization primitives such as locks or memory barriers to guarantee such behaviour. I just need to know whether there is a guaranteed latency for a memory write to become globally visible.

Thanks a lot!!

Solution

I am pretty sure that there are no guarantees about the maximum time between one CPU writing to a memory location and another CPU seeing it. In a NUMA system the coherency protocol can take a comparatively long time. In practice the write will propagate as fast as the hardware allows, but I doubt there are any guarantees.

Why do you need to know this, though? When you're writing synchronization primitives you only need to think about ordering. x86_64 enforces a strong memory ordering model, which means that stores from one CPU will be seen by other CPUs in the order they were issued, and that's really the only thing you need to worry about.
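To make the "think about ordering, not latency" point concrete, here is a minimal sketch using C11 atomics and POSIX threads. It is not from the original answer: the `ready` flag, `MAGIC_VALUE`, and the helper `run_handoff` are names assumed for illustration. Thread B does not rely on any time bound; it waits until it observes the flag, and the release/acquire pair then guarantees that it also sees the write to `foo`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define MAGIC_VALUE 42          /* assumed stand-in for magic_value */

static int foo;                 /* plain payload written by Thread A */
static atomic_int ready;        /* hypothetical flag that publishes foo */

static void *thread_a(void *arg)
{
    (void)arg;
    foo = MAGIC_VALUE;
    /* Release store: the write to foo is ordered before the flag write. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *thread_b(void *arg)
{
    int *bar = arg;
    /* Acquire load: spin until the flag is seen. There is no assumption
     * about how long this takes, only about what is visible afterwards. */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;
    *bar = foo;
    assert(*bar == MAGIC_VALUE);
    return NULL;
}

/* Hypothetical helper: runs both threads once and returns what B read. */
int run_handoff(void)
{
    pthread_t a, b;
    int bar = 0;

    foo = 0;
    atomic_store(&ready, 0);
    if (pthread_create(&b, NULL, thread_b, &bar) != 0)
        return -1;
    if (pthread_create(&a, NULL, thread_a, NULL) != 0) {
        /* Unblock B so it can be joined, then report the failure. */
        atomic_store_explicit(&ready, 1, memory_order_release);
        pthread_join(b, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return bar;
}
```

Calling `run_handoff()` from `main` (built with `-pthread`) should return `MAGIC_VALUE` regardless of how long the store takes to propagate, because correctness depends only on the ordering between the two writes.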

Other tips

Yes, such a delta exists: x86 memory is cache-coherent, so the write will eventually become visible, but I don't think there is any guarantee on the actual maximum delta. The paper "Comparing Cache Architectures and Coherency Protocols on x86-64 Multicore SMP Systems" may interest you (although it's a benchmark, not formal documentation).

AFAIK, foo should be declared volatile to force the compiler to actually emit the write instruction instead of optimizing it away.
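As a rough sketch of the compiler-side point, the snippet below declares the shared variable in two ways. The names `foo_volatile`, `foo_atomic`, and the helper functions are assumptions made for this example, not part of the question. `volatile` only stops the compiler from eliding or caching the access; in standard C it does not by itself provide atomicity or inter-thread ordering, which is why a C11 atomic with relaxed ordering is shown next to it as the portable alternative.

```c
#include <stdatomic.h>

#define MAGIC_VALUE 42          /* assumed stand-in for magic_value */

/* volatile: the compiler must emit every load and store to this object. */
volatile int foo_volatile;

/* C11 atomic: accesses are also always performed, and concurrent use
 * from several threads is well defined. */
atomic_int foo_atomic;

/* Hypothetical writer side (Thread A in the question). */
void write_both(void)
{
    foo_volatile = MAGIC_VALUE;
    atomic_store_explicit(&foo_atomic, MAGIC_VALUE, memory_order_relaxed);
}

/* Hypothetical reader side (Thread B in the question). */
int read_volatile(void)
{
    return foo_volatile;
}

int read_atomic(void)
{
    return atomic_load_explicit(&foo_atomic, memory_order_relaxed);
}
```

Neither form puts a bound on when another CPU sees the store; they only ensure that the store instruction is actually generated.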

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow