preemption, pthread_spin_lock and atomic built-in

Question

pthread_mutex_lock typically has a fast path which uses an atomic operation to try to acquire the lock. In the event that the lock is not owned, this can be very fast. Only if the lock is already held, does the thread enter the kernel via a system call. The kernel acquires a spin-lock, and then reattempts to acquire the mutex in case it was released since the first attempt. If this attempt fails, the calling thread is added to a wait queue associated with the mutex, and a context switch is performed. The kernel also sets a bit in the mutex to indicate that there is a waiting thread.

pthread_mutex_unlock also has a fast path. If the waiting thread flag is clear, it can simply release the lock. If the flag is set, the thread must enter the kernel via a system call so the waiting thread can be woken. Again, the kernel must acquire a spin lock so that it can manipulate its thread control data structures. In the event that there is no thread waiting after all, the lock can be released by the kernel. If there is a thread waiting, it is made runnable, and ownership of the mutex is transferred without it being released.

There are many subtle race conditions in this little dance, and hopefully it all works properly.

Since a thread that attempts to acquire a locked mutex is context switched out, it does not prevent the thread the owns the mutex from running, which gives the owner an opportunity to exit its critical section and release the mutex.

In contrast, a thread that attempts to acquire a locked spin-lock simply spins, consuming CPU cycles. This has the potential of preventing the thread that owns the spin-lock from exiting its critical section and releasing the lock. The spinning thread can be preempted when its timeslice has been consumed, allowing the thread that owns the lock to eventually regain control. Of course, this is not great for performance.

In practice, spin-locks are used where there is no chance that the thread can be preempted while it owns the lock. A kernel may set a per-cpu flag to prevent it from performing a context switch from an interrupt service routine (or it may raise the interrupt priority level to prevent interrupts that can cause context switches, or it may disable interrupts altogether). A user thread can prevent itself from being preempted (by other threads in the same process) by raising its priority. Note that, in a uniprocessor system, preventing the current thread from being preempted eliminates the need for the spin lock. Alternatively, in a multiprocessor system, you can bind threads to cpus (cpu affinity) so that they cannot preempt one another.

All locks ultimately require an atomic primitive (well, efficient locks; see here for a counter example). Mutexes can be inefficient if they are highly contended, causing threads to constantly enter the kernel and be context switched; especially if the critical section is smaller than the kernel overhead. Spin locks can be more efficient, but only if the owner cannot be preempted and the critical section is short. Note that the kernel must still acquire a spin lock when a thread attempts to acquire a locked mutex.

Personally, I would use atomic operations for things like shared counter updates, and mutexes for more complex operations. Only after profiling would I consider replacing mutexes with spin locks (and figure out how to deal with preemption). Note that if you intend to use condvars, you have no choice but to use mutexes.