Advantages and disadvantages of CAS programming

Question 1

Advantage: no locks, hence no deadlock and generally better scalability

Disadvantage: risk of starvation (unless the algorithm is also wait-free, but this is generally not the case)

Edit: a wait-free algorithm does some other useful work when it loses a CAS race, instead of busy-retrying and possibly starving.
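To make the starvation risk concrete, here is a minimal sketch of a typical lock-free (but not wait-free) CAS retry loop in Java. The class `CasMax` and the method `updateMax` are hypothetical names made up for illustration; only `AtomicInteger.get()` and `compareAndSet()` are real library calls. Every failed CAS means some other thread succeeded, so the system as a whole makes progress, but nothing bounds how many times one unlucky thread has to retry.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CasMax {
    // Hypothetical helper: atomically raise `target` to at least `candidate`.
    // Returns the number of CAS attempts made, so lost races are visible.
    static int updateMax(AtomicInteger target, int candidate) {
        int attempts = 0;
        while (true) {
            int current = target.get();
            if (current >= candidate) {
                return attempts;      // stored value is already large enough
            }
            attempts++;
            if (target.compareAndSet(current, candidate)) {
                return attempts;      // our CAS won the race
            }
            // CAS lost: another thread changed `target` in between.
            // That thread made progress; this one has to retry.
        }
    }
}
```

Under heavy contention the returned attempt count can grow without any fixed bound, which is exactly the starvation case mentioned above.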

Question 2

Only write a CAS retry loop in your source if there's no language built-in that implements the atomic operation you want. Hardware (especially x86) can often do better.

Java's AtomicInteger has getAndIncrement() and incrementAndGet() methods (since Java 7 at least), which make it easy for the JVM to JIT-compile the operation into asm that's more efficient than an actual CAS retry loop. This is like C++11's std::atomic::fetch_add(). See also Practical uses for AtomicInteger.

On x86, you want your JVM to take advantage of x86's hardware support for this operation. This is much more likely to happen if you use a function that maps directly to it, instead of a CAS-retry loop that the optimizer would have to work hard to optimize into a non-looping implementation.
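As a rough illustration of the difference in source code, the sketch below shows both styles side by side. The class and method names (`IncrementStyles`, `builtIn`, `casLoop`) are made up for this example; `incrementAndGet()`, `get()` and `compareAndSet()` are the real `AtomicInteger` methods. Both return the same value, but only the first one tells the JVM directly which atomic operation you want.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class IncrementStyles {
    // Preferred: states the intent directly, so the JIT can map it to a
    // single atomic read-modify-write instruction where the hardware has one.
    static int builtIn(AtomicInteger counter) {
        return counter.incrementAndGet();
    }

    // Same result, written as an explicit CAS retry loop.
    // The optimizer would have to recognize this pattern to do as well.
    static int casLoop(AtomicInteger counter) {
        int old;
        do {
            old = counter.get();
        } while (!counter.compareAndSet(old, old + 1));
        return old + 1;
    }
}
```

Whether a particular JVM actually compiles the first form to a single instruction depends on the JVM and the target CPU, so treat this as a hint about intent rather than a guarantee.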

(There's hardware bus/cache arbitration for locked operations when multiple CPU cores contend for the same cache line; only one thread at a time can actually own the cache line and be doing an increment. You could argue that it's wait-free, even if "steps" are clock cycles instead of CPU instructions: there's probably a low upper bound on how long a locked operation can take on any given system, even with all other cores hammering on the same cache line.)

; possible x86 implementation of incrementAndGet() for a 32-bit integer
; which you'd hopefully get (after inlining and so on)

mov    eax,1
lock   xadd [mem], eax       ; atomically do [mem]+=eax, and put the old value in eax
inc    eax                   ; old_value += 1 to get the new value
; result in EAX

No loop required.

On an LL/SC machine (most non-x86, like ARM, PowerPC, MIPS), there will be a retry loop, but it's not exactly CAS. And a CAS retry loop on an LL/SC machine has extra overhead. It's very minor, but it's definitely better to let the JVM see the atomic operation you want directly. See Atomically clearing lowest non-zero bit of an unsigned integer for more discussion of CAS vs. LL/SC. A CAS loop could in theory optimize into a pure LL/SC loop.

That question is also an example of a case where your best bet (in C++ or Java source) is a CAS retry loop, because the language doesn't have an atomic primitive that does what you want. (Neither does any common hardware.)
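For that kind of operation, a CAS retry loop in Java might look like the sketch below. This is not taken from the linked question; `ClearLowestBit` and `clearLowestSetBit` are hypothetical names, and it uses a signed `int` because Java has no unsigned integer type (the bit manipulation works the same way).

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ClearLowestBit {
    // Atomically clears the lowest set bit of `bits`.
    // Returns the bit that was cleared, or 0 if no bit was set.
    static int clearLowestSetBit(AtomicInteger bits) {
        while (true) {
            int old = bits.get();
            if (old == 0) {
                return 0;                  // nothing to clear
            }
            int lowest = old & -old;       // isolate the lowest set bit
            int updated = old & (old - 1); // same value with that bit cleared
            if (bits.compareAndSet(old, updated)) {
                return lowest;             // our update was applied atomically
            }
            // Another thread modified `bits` first; reload and retry.
        }
    }
}
```

Here the loop is justified: there is no single `AtomicInteger` method for "clear the lowest set bit", so the new value has to be computed from the old one and published with CAS.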