Does one assembler instruction always execute atomically?

https://stackoverflow.com/questions/1091099

11-09-2019
|

Question

Today I came across this question:

you have a code

static int counter = 0;
void worker() {
    for (int i = 1; i <= 10; i++)
        counter++;
}

If worker would be called from two different threads, what value will counter have after both of them are finished?

I know that actually it could be anything. But my internal guts tells me, that counter++ will most likely be translated into single assembler instruction, and if both threads are execute on the same core, counter will be 20.

But what if those threads are run on different cores or processors, could there be a race condition in their microcode? Is one assembler instruction could always be viewed as atomic operation?

Solution

Specifically for x86, and regarding your example: counter++, there are a number of ways it could be compiled. The most trivial example is:

inc counter

This translates into the following micro operations:

load counter to a hidden register on the CPU
increment the register
store the updated register in counter

This is essentially the same as:

mov eax, counter
inc eax
mov counter, eax

Note that if some other agent updates counter between the load and the store, it won't be reflected in counter after the store. This agent could be another thread in the same core, another core in the same CPU, another CPU in the same system, or even some external agent that uses DMA (Direct Memory Access).

If you want to guarantee that this inc is atomic, use the lock prefix:

lock inc counter

lock guarantees that nobody can update counter between the load and the store.

Regarding more complicated instructions, you usually can't assume that they'll execute atomically, unless they support the lock prefix.

OTHER TIPS

The answer is: it depends!

Here is some confusion around, what an assembler instruction is. Normally, one assembler instruction is translated into exactly one machine instruction. The excemption is when you use macros -- but you should be aware of that.

That said, the question boils down is one machine instruction atomic?

In the good old days, it was. But today, with complex CPUs, long running instructions, hyperthreading, ... it is not. Some CPUs guarantee that some increment/decrement instructions are atomic. The reason is, that they are neat for very simple syncronizing.

Also some CPU commands are not so problematic. When you have a simple fetch (of one piece of data that the processor can fetch in one piece) -- the fetch itself is of course atomic, because there is nothing to be divided at all. But when you have unaligned data, it becomes complicated again.

The answer is: It depends. Carefully read the machine instruction manual of the vendor. In doubt, it is not!

Edit: Oh, I saw it now, you also ask for ++counter. The statement "most likely to be translated" can not be trusted at all. This largely depends also on the compiler of course! It gets more difficult when the compiler is making different optimizations.

Not always - on some architectures one assembly instruction is translated into one machine code instruction, while on others it does not.

In addition - you can never assume that the program language you are using is compiling a seemingly simple line of code into one assembly instruction. Moreover, on some architectures, you cannot assume one machine code will execute atomically.

Use proper synchronization techniques instead, dependent on the language you are coding in.

Increment/decrement operations on 32-bit or less integer variables on a single 32-bit processor with no Hyper-Threading Technology are atomic.
On a processor with Hyper-Threading Technology or on a multi-processor system, the increment/decrement operations are NOT guaranteed to be executed atomicaly.

Invalidated by Nathan's comment: ~~If I remember my Intel x86 assembler correctly, the INC instruction only works for registers and does not directly work for memory locations.~~

So a counter++ would not be a single instruction in assembler (just ignoring the post-increment part). It would be at least three instructions: load counter variable to register, increment register, load register back to counter. And that is just for x86 architecture.

In short, don't rely on it being atomic unless it is specified by the language specification and that the compiler that you are using supports the specifications.

No you cannot assume this. Unless it clearly stated in compiler specification. And moreover no one can guarantee that one single assembler instruction indeed atomic. In practice each assembler instruction is translated to number of microcode operation - uops.
Also the issue of race condition is tightly coupled with memory model(coherence, sequential, release coherence and etc.), for each one the answer and result could be different.

Another issue is that if you don't declare the variable as volatile, the code generated would probably not update the memory at every loop iteration, only at the end of the loop the memory would be updated.

Might not be an actual answer to your question, but (assuming this is C#, or another .NET language) if you want counter++ to really be multi-threaded atomic, you could use System.Threading.Interlocked.Increment(counter).

See other answers for actual information on the many different ways why/how counter++ could not be atomic. ;-)

In most cases, no. In fact, on x86, you can perform the instruction

push [address]

which, in C, would be something like:

*stack-- = *address;

This performs two memory transfers in one instruction.

That's basically impossible to do in 1 clock cycle, not the least because one memory transfer is also not possible in one cycle!

On many other processors, the seperation between memory system and processor is bigger. (often these processor can be little or big-endian depending on memory system, like ARM and PowerPC), this also has consequences for atomic behaviour if the memory system can reorder reads and writes.

For this purpose, there are memory barriers (http://en.wikipedia.org/wiki/Memory_barrier)

So in short, while atomic instructions are enough on intel (with the relevant lock prefixes), more must be done on non-intel, since the memory I/O might not be in the same order.

This is a known problem when porting "lock-free" solutions from Intel to other architectures.

(Note that multiprocessor (not multicore) systems on x86 also seem to need memory barriers, at least in 64-bit mode.

I think that you'll get a race condition on access.

If you wanted to ensure an atomic operation in incrementing counter then you'd need to use ++counter.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow