The issue Sutter is talking about is that the reference count increment doesn't require any follow-up action for correctness: it takes a non-zero reference count to another non-zero count, so nothing further needs to happen. The decrement, however, does require follow-up action for correctness. It takes a non-zero reference count to either a non-zero or a zero count, and if your decrement brings the count to zero, you must act on that result --- specifically, deallocate the referenced object. This decrement-and-act dynamic demands stronger ordering guarantees, both at the fence level (so the deallocation can't be reordered relative to reads/writes on another core by the CPU's memory/cache-management logic) and at the compiler level (so the compiler can't move reads/writes of the object across the decrement into the region where the object may already have been deallocated).
So, for the scenario Sutter describes, the difference in cost between increment and decrement isn't in the fundamental operations themselves; it's in the consistency constraints imposed on the use of the decrement (specifically, on acting on its result) that don't apply to the increment.
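A minimal sketch of the asymmetry, using the classic intrusive refcount pattern (the struct and member names here are illustrative, not Sutter's code): the increment can be a relaxed read-modify-write because no decision hangs on its result, while the decrement uses release ordering plus an acquire fence on the zero path so that all prior accesses to the object are complete and visible before `delete` runs.

```cpp
#include <atomic>

// For demonstration only: lets callers observe when destruction happened.
// Not part of the reference-counting pattern itself.
static std::atomic<bool> g_destroyed{false};

struct RefCounted {
    std::atomic<int> refs{1};  // the creating reference

    void add_ref() {
        // Increment: non-zero -> non-zero, nothing is decided by the new
        // value, so a relaxed RMW suffices. No fences needed.
        refs.fetch_add(1, std::memory_order_relaxed);
    }

    void release() {
        // Decrement: release ordering makes this thread's prior writes to
        // the object visible before the count drops. If we took the count
        // to zero, the acquire fence ensures we observe every other
        // thread's released writes before deallocating.
        if (refs.fetch_sub(1, std::memory_order_release) == 1) {
            std::atomic_thread_fence(std::memory_order_acquire);
            delete this;
        }
    }

    virtual ~RefCounted() {
        g_destroyed.store(true, std::memory_order_relaxed);
    }
};
```

Note that the extra cost lives entirely in `release()`: on weakly ordered hardware the release/acquire pair emits real barrier instructions, and it also forbids the compiler from hoisting or sinking accesses to the object across the decrement, while `add_ref()` compiles to a bare atomic add.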