The issue Sutter is talking about is that the reference count increment doesn't require any follow-up action for correctness: it takes a non-zero reference count to another non-zero count, so nothing further needs to happen. The decrement, however, does require follow-up action for correctness. It takes a non-zero reference count to either a non-zero or a zero count, and if your decrement brings the count to zero, you must act on that result --- specifically, deallocate the referenced object. This decrement-and-act dynamic demands stronger ordering guarantees, both at the fence level (so the deallocation can't be reordered relative to reads/writes on another core by the CPU's memory/cache-management logic) and at the compiler level (so the compiler can't move reads/writes of the object across the decrement into the region where the object may already have been deallocated).
So, for the scenario Sutter describes, the difference in cost between increment and decrement isn't in the fundamental operations themselves; it's in the consistency constraints imposed on the use of the decrement (specifically, on acting on its result) that don't apply to the increment.
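A minimal sketch of the asymmetry, using the classic intrusive refcount pattern (the struct and member names here are illustrative, not Sutter's code): the increment can be a relaxed read-modify-write because no decision hangs on its result, while the decrement uses release ordering plus an acquire fence on the zero path so that all prior accesses to the object are complete and visible before `delete` runs.

```cpp
#include <atomic>

// For demonstration only: lets callers observe when destruction happened.
// Not part of the reference-counting pattern itself.
static std::atomic<bool> g_destroyed{false};

struct RefCounted {
    std::atomic<int> refs{1};  // the creating reference

    void add_ref() {
        // Increment: non-zero -> non-zero, nothing is decided by the new
        // value, so a relaxed RMW suffices. No fences needed.
        refs.fetch_add(1, std::memory_order_relaxed);
    }

    void release() {
        // Decrement: release ordering makes this thread's prior writes to
        // the object visible before the count drops. If we took the count
        // to zero, the acquire fence ensures we observe every other
        // thread's released writes before deallocating.
        if (refs.fetch_sub(1, std::memory_order_release) == 1) {
            std::atomic_thread_fence(std::memory_order_acquire);
            delete this;
        }
    }

    virtual ~RefCounted() {
        g_destroyed.store(true, std::memory_order_relaxed);
    }
};
```

Note that the extra cost lives entirely in `release()`: on weakly ordered hardware the release/acquire pair emits real barrier instructions, and it also forbids the compiler from hoisting or sinking accesses to the object across the decrement, while `add_ref()` compiles to a bare atomic add.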