Word Tearing on x86

https://stackoverflow.com/questions/1607378

05-07-2019

Question

Under what circumstances is it unsafe to have two different threads simultaneously writing to adjacent elements of the same array on x86? I understand that on some DS9K-like architectures with insane memory models this can cause word tearing, but on x86 single bytes are addressable. For example, in the D programming language, real is an 80-bit floating-point type on x86. Would it be safe to do something like:

real[] nums = new real[4];  // Assume new returns a 16-byte aligned block.
foreach(i; 0..4) {
    // Create a new thread and have it do stuff and 
    // write results to index i of nums.
}

Note: I know that, even if this is safe, it can sometimes cause false sharing problems with the cache, leading to slow performance. However, for the use cases I have in mind writes will be infrequent enough for this not to matter in practice.

Edit: Don't worry about reading back the values that are written. The assumption is that there will be synchronization before any values are read. I only care about the safety of writing in this way.

Solution

The x86 has coherent caches. A processor that writes to a cache line first acquires ownership of the whole line and then performs the write in its cache. This ensures that single-byte and 4-byte values written to their corresponding locations are updated atomically.

That's different from "it's safe". If each processor writes only to the bytes/DWORDs that it "owns" by design, then the updates will be correct. In practice, you want one processor to read values written by others, and that requires synchronization.
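The "each writer owns its own element" pattern can be sketched as follows. This is a minimal C++ analogue of the D snippet in the question, not the asker's actual code: long double stands in for D's real (it is the 80-bit x87 type with GCC and Clang on x86, but not with every compiler), and fill_in_parallel and the placeholder computation are hypothetical names chosen for illustration. The join() calls play the role of the synchronization that must happen before any value is read.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: one thread per element, each writing only its own slot.
std::vector<long double> fill_in_parallel(std::size_t count) {
    assert(count > 0);
    // The vector is fully sized before any thread starts and is never
    // resized afterwards, so element addresses stay stable.
    std::vector<long double> nums(count, 0.0L);
    std::vector<std::thread> workers;
    workers.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        // Capture i by value so each thread keeps its own index.
        workers.emplace_back([&nums, i] {
            nums[i] = static_cast<long double>(i) * 1.5L;  // placeholder for real work
        });
    }
    // Synchronize before reading: after join() every write is visible here.
    for (std::thread& t : workers) {
        t.join();
    }
    return nums;
}
```

Calling fill_in_parallel(4) mirrors the four-element array from the question. In C++ specifically, distinct array elements are separate memory locations, so these unsynchronized writes are not a data race; whether D makes the same promise is a question for the D specification.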

It is also different from "it's efficient". If several processors each write to different places in the same cache line, the line can ping-pong between CPUs, and that is a lot more expensive than if the cache line goes to a single CPU and stays there. The usual rule is to put processor-specific data in its own cache line. Of course, if you are only going to write to just that one word, just once, and the amount of work is significant compared to a cache-line move, then your performance will be acceptable.
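The "own cache line per processor" rule above might look like this in C++. It is a sketch under stated assumptions: the 64-byte line size is typical for current x86 parts but is not guaranteed, and kCacheLine, PaddedSlot and accumulate_padded are hypothetical names. No timing is measured here; the code only shows the layout.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

constexpr std::size_t kCacheLine = 64;  // assumption: typical x86 cache-line size
constexpr std::size_t kWorkers = 4;

// alignas pads and aligns each slot so that no two slots share a cache line.
struct alignas(kCacheLine) PaddedSlot {
    long double value = 0.0L;
};
static_assert(sizeof(PaddedSlot) % kCacheLine == 0, "slot must fill whole cache lines");

std::array<PaddedSlot, kWorkers> accumulate_padded(unsigned iterations) {
    std::array<PaddedSlot, kWorkers> slots{};
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < kWorkers; ++i) {
        // Each slot starts on a cache-line boundary.
        assert(reinterpret_cast<std::uintptr_t>(&slots[i]) % kCacheLine == 0);
        workers.emplace_back([&slots, i, iterations] {
            // Frequent writes, but only ever to this thread's own line.
            for (unsigned n = 0; n < iterations; ++n) {
                slots[i].value += 1.0L;
            }
        });
    }
    for (std::thread& t : workers) {
        t.join();  // synchronize before the results are read
    }
    return slots;
}
```

The cost is memory: four values now occupy four full cache lines. For writes as infrequent as the ones described in the question, the unpadded array is likely the simpler choice.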

OTHER TIPS

I might be missing something, but I don't foresee any issues. The x86 architecture writes only the bytes it needs to; it does not write anything outside the specified locations. Cache snooping handles the cache-coherency issues.
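The byte-granular case is the one where word tearing would show up on an architecture that can only store whole words. A rough C++ sketch, with write_adjacent_bytes and kBytes as hypothetical names: eight threads each store repeatedly to their own byte of one eight-byte array, and every byte should end up holding its own thread's value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t kBytes = 8;  // adjacent bytes that fit in one machine word

std::array<unsigned char, kBytes> write_adjacent_bytes(unsigned rounds) {
    assert(rounds > 0);
    std::array<unsigned char, kBytes> bytes{};
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < kBytes; ++i) {
        workers.emplace_back([&bytes, i, rounds] {
            for (unsigned n = 0; n < rounds; ++n) {
                // A byte store; neighbouring bytes are never read or rewritten.
                bytes[i] = static_cast<unsigned char>(i + 1);
            }
        });
    }
    for (std::thread& t : workers) {
        t.join();
    }
    return bytes;
}
```

A passing run does not prove the absence of tearing; the guarantee comes from the hardware and the language rules, and the sketch only illustrates the access pattern.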

You are asking about x86 specifics, yet your example is in a high-level language. Your specific question about D can only be answered by the people who wrote the compiler you are using, or perhaps by the D language specification. Java, for example, requires that array element accesses not cause tearing.

Regarding x86, the atomicity of operations is specified in Section 8.1 of Intel's Software Developer's Manual, Volume 3A. According to it, the atomic store operations on all x86 CPUs include storing a byte, storing a word-aligned word, and storing a dword-aligned dword. It also specifies that on P6 and later CPUs, unaligned 16-, 32- and 64-bit accesses to cached memory that fit within a cache line are atomic.
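The rules summarized above can be written down as a small classifier. This is only a sketch of that summary, not a substitute for the manual: the 64-byte cache line is an assumption, and Cpu, within_one_cache_line, store_guaranteed_atomic and demo_checks are hypothetical names. Note that an 80-bit (10-byte) value does not appear in the list quoted above, so the sketch reports it as not guaranteed; that concerns whether one store can be observed half-written, which is a separate matter from whether it disturbs neighbouring elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uintptr_t kCacheLine = 64;  // assumption: typical x86 cache-line size

enum class Cpu { AnyX86, P6OrLater };

// True if the byte range [addr, addr + size) lies inside a single cache line.
constexpr bool within_one_cache_line(std::uintptr_t addr, std::size_t size) {
    return size != 0 && (addr / kCacheLine) == ((addr + size - 1) / kCacheLine);
}

// Encodes the summary above: bytes, aligned words and aligned dwords
// everywhere; 2-, 4- and 8-byte accesses within one line on P6 and later.
constexpr bool store_guaranteed_atomic(std::uintptr_t addr, std::size_t size, Cpu cpu) {
    return size == 1
        || ((size == 2 || size == 4) && addr % size == 0)
        || (cpu == Cpu::P6OrLater
            && (size == 2 || size == 4 || size == 8)
            && within_one_cache_line(addr, size));
}

// Worked examples using made-up addresses.
inline void demo_checks() {
    assert(store_guaranteed_atomic(0x1000, 4, Cpu::AnyX86));      // aligned dword
    assert(!store_guaranteed_atomic(0x1001, 4, Cpu::AnyX86));     // unaligned, older CPU
    assert(store_guaranteed_atomic(0x1001, 4, Cpu::P6OrLater));   // unaligned, same line
    assert(!store_guaranteed_atomic(0x103E, 4, Cpu::P6OrLater));  // crosses into 0x1040
    assert(!store_guaranteed_atomic(0x1000, 10, Cpu::P6OrLater)); // 10-byte store not listed
}
```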

Licensed under: CC-BY-SA with attribution
