Deferred bcast wakeup for condition variables - is it valid?

https://stackoverflow.com/questions/7536401

27-01-2021
|

Question

I'm implementing pthread condition variables (based on Linux futexes) and I have an idea for avoiding the "stampede effect" on pthread_cond_broadcast with process-shared condition variables. For non-process-shared cond vars, futex requeue operations are traditionally (i.e. by NPTL) used to requeue waiters from the cond var's futex to the mutex's futex without waking them up, but this is in general impossible for process-shared cond vars, because pthread_cond_broadcast might not have a valid pointer to the associated mutex. In a worst case scenario, the mutex might not even be mapped in its memory space.

My idea for overcoming this issue is to have pthread_cond_broadcast only directly wake one waiter, and have that waiter perform the requeue operation when it wakes up, since it does have the needed pointer to the mutex.

Naturally there are a lot of ugly race conditions to consider if I pursue this approach, but if they can be overcome, are there any other reasons such an implementation would be invalid or undesirable? One potential issue I can think of that might not be able to be overcome is the race where the waiter (a separate process) responsible for the requeue gets killed before it can act, but it might be possible to overcome even this by putting the condvar futex in the robust mutex list so that the kernel performs a wake on it when the process dies.

Solution

There may be waiters belonging to multiple address spaces, each of which has mapped the mutex associated with the futex at a different address in memory. I'm not sure if FUTEX_REQUEUE is safe to use when the requeue point may not be mapped at the same address in all waiters; if it does then this isn't a problem.

~~There are other problems that won't be detected by robust futexes; for example, if your chosen waiter is busy in a signal handler, you could be kept waiting an arbitrarily long time.~~ [As discussed in the comments, these are not an issue]

Note that with robust futexes, you must set the value of the futex & 0x3FFFFFFF to be the TID of the thread to be woken up; you must also set bit FUTEX_WAITERS on if you want a wakeup. This means that you must choose which thread to awaken from the broadcasting thread, or you will be unable to deal with thread death immediately after the FUTEX_WAKE. You'll also need to deal with the possibility of the thread dying immediately before the waker thread writes its TID into the state variable - perhaps having a 'pending master' field that is also registered in the robust mutex system would be a good idea.

I see no reason why this can't work, then, as long as you make sure to deal with the thread exit issues carefully. That said, it may be best to simply define in the kernel an extension to FUTEX_WAIT that takes a requeue point and comparison value as an argument, and let the kernel handle this in a simple, race-free manner.

OTHER TIPS

I just don't see why you assume that the corresponding mutex might not be known. It is clearly stated

The effect of using more than one mutex for concurrent pthread_cond_timedwait() or pthread_cond_wait() operations on the same condition variable is undefined; that is, a condition variable becomes bound to a unique mutex when a thread waits on the condition variable, and this (dynamic) binding shall end when the wait returns.

So even for process shared mutexes and conditions this must hold, and any user space process must always have mapped the same and unique mutex that is associated to the condition.

Allowing users to associate different mutexes to a condition at the same time is nothing that I would support.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow