Only write a CAS retry loop in your source if there's no language built-in that implements the atomic operation you want. Hardware (especially x86) can often do better.
Java's AtomicInteger has getAndIncrement() and incrementAndGet() methods (since Java 7 at least), which make it easy for the JVM to JIT them into asm that's more efficient than an actual CAS retry loop. This is like C++11's std::atomic::fetch_add(). See also Practical uses for AtomicInteger.
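To make the difference concrete at the source level, here is a minimal sketch of the two ways to write the increment. The class and method names (IncrementDemo, incrementDirect, incrementWithCasLoop) are made up for illustration; only the AtomicInteger calls are real API. Whether a given JVM compiles the two differently depends on the JVM and the target CPU.

```java
import java.util.concurrent.atomic.AtomicInteger;

class IncrementDemo {
    // Preferred: names the atomic operation directly, so the JIT can map it
    // straight onto whatever the hardware offers.
    static int incrementDirect(AtomicInteger counter) {
        return counter.incrementAndGet();
    }

    // Hand-written CAS retry loop: same result, but the source describes a loop,
    // and the optimizer would have to recognize the pattern to remove it.
    static int incrementWithCasLoop(AtomicInteger counter) {
        int old;
        do {
            old = counter.get();
        } while (!counter.compareAndSet(old, old + 1)); // retry if another thread got in first
        return old + 1;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger direct = new AtomicInteger();
        AtomicInteger viaCas = new AtomicInteger();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 100_000; i++) {
                    incrementDirect(direct);
                    incrementWithCasLoop(viaCas);
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            th.join();
        }
        // Both versions are correct: no increments are lost in either counter.
        System.out.println(direct.get() + " " + viaCas.get());
    }
}
```

Both versions give the same answer; the point is only that the first one states the operation you want instead of one particular way of implementing it.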
On x86, you want your JVM to take advantage of x86's hardware support for this operation. This is much more likely to happen if you use a function that maps directly to it, instead of a CAS-retry loop that the optimizer would have to work hard to optimize into a non-looping implementation.
(There's hardware bus/cache arbitration for locked operations when multiple CPU cores contend for the same cache line; only one core at a time can actually own the cache line and be doing an increment. You could argue that it's wait-free, even if "steps" are clock cycles instead of CPU instructions: there's probably a low upper bound on how long a locked operation can take on any given system, even with all other cores hammering on the same cache line.)
; possible x86 implementation of incrementAndGet() for a 32-bit integer
; which you'd hopefully get (after inlining and so on)
mov eax,1
lock xadd [mem], eax ; atomically do [mem]+=eax, and put the old value in eax
inc eax ; old_value += 1 to get the new value
; result in EAX
No loop required.
On an LL/SC machine (most non-x86, like ARM, PowerPC, MIPS), there will be a retry loop, but it's not exactly CAS. And a CAS retry loop on an LL/SC machine has extra overhead. It's very minor, but it's definitely better to let the JVM see the atomic operation you want directly. See Atomically clearing lowest non-zero bit of an unsigned integer for more discussion of CAS vs. LL/SC. A CAS loop could in theory optimize into a pure LL/SC loop.
That question is also an example of a case where your best bet (in C++ or Java source) is a CAS retry loop, because the language doesn't have an atomic primitive that does what you want. (Neither does any common hardware.)
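For that kind of case, a CAS retry loop in Java could look roughly like this. It's a sketch for a 32-bit value; the class and method names (ClearLowestBit, clearLowestSetBit) are hypothetical, and returning the previous value is just one reasonable choice of interface.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ClearLowestBit {
    // Atomically clears the lowest set bit of the value and returns the
    // previous value (0 means there was no bit to clear).
    static int clearLowestSetBit(AtomicInteger word) {
        while (true) {
            int old = word.get();
            if (old == 0) {
                return 0;                   // nothing set, nothing to do
            }
            int updated = old & (old - 1);  // classic trick: drops the lowest set bit
            if (word.compareAndSet(old, updated)) {
                return old;                 // our update won
            }
            // Another thread changed the value in between: reload and retry.
        }
    }

    public static void main(String[] args) {
        AtomicInteger word = new AtomicInteger(0b10110);
        int before = clearLowestSetBit(word);
        System.out.println(Integer.toBinaryString(before) + " -> "
                + Integer.toBinaryString(word.get()));
    }
}
```

Since Java 8, AtomicInteger.getAndUpdate() and updateAndGet() take a function and wrap the same kind of retry loop, so they save typing but don't remove the loop.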