Question

Is there any downside to calling pthread_cond_timedwait without taking a lock on the associated mutex first, and also not taking a mutex lock when calling pthread_cond_signal ?

In my case there is really no condition to check, I want a behavior very similar to Java wait(long) and notify().

According to the documentation, there can be "unpredictable scheduling behavior". I am not sure what that means.

An example program seems to work fine without locking the mutexes first.


Solution

The first is not OK:

The pthread_cond_timedwait() and pthread_cond_wait() functions shall block on a condition variable. They shall be called with mutex locked by the calling thread or undefined behavior results.

http://opengroup.org/onlinepubs/009695399/functions/pthread_cond_timedwait.html

The reason is that the implementation may want to rely on the mutex being locked in order to safely add you to a waiter list. And it may want to release the mutex without first checking it is held.
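As a minimal sketch of what the standard requires (the mutex, condvar and one-second deadline below are purely illustrative, not from the original question): the waiter must hold the mutex when it calls pthread_cond_timedwait(), which atomically releases the mutex while blocked and re-acquires it before returning.

#include <pthread.h>
#include <time.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

// waiter: pthread_cond_timedwait() must be called with the mutex held;
// it atomically releases the mutex while blocked and re-locks it on return
int wait_with_timeout(void)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline); // the deadline is an absolute time
    deadline.tv_sec += 1;                     // e.g. wait at most one second

    pthread_mutex_lock(&lock);
    // a real program would normally loop on a predicate here - see below
    int rc = pthread_cond_timedwait(&cond, &lock, &deadline);
    pthread_mutex_unlock(&lock);
    return rc; // 0 if signalled, ETIMEDOUT if the deadline passed
}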

The second is disturbing:

if predictable scheduling behaviour is required, then that mutex is locked by the thread calling pthread_cond_signal() or pthread_cond_broadcast().

http://www.opengroup.org/onlinepubs/007908775/xsh/pthread_cond_signal.html

Off the top of my head, I'm not sure what the specific race condition is that messes up scheduler behaviour if you signal without taking the lock. So I don't know how bad the undefined scheduler behaviour can get: for instance maybe with broadcast the waiters just don't get the lock in priority order (or however your particular scheduler normally behaves). Or maybe waiters can get "lost".

Generally, though, with a condition variable you want to set the condition (at least a flag) and signal, rather than just signal, and for that you need to hold the mutex. Otherwise, if you're concurrent with another thread calling wait(), the behaviour depends entirely on whether wait() or signal() wins the race: if the signal() sneaks in first, you'll wait for the full timeout even though the signal you care about has already happened. That's rarely what users of condition variables want, but it may be fine for you. Perhaps this is what the docs mean by "unpredictable scheduling behaviour" - suddenly the timeslice becomes critical to the behaviour of your program.
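A minimal sketch of that flag-plus-signal pattern, reusing the illustrative lock and cond from the sketch above (and assuming <errno.h> for ETIMEDOUT): the signaller sets the flag and signals under the mutex, and the waiter loops on the flag so that a signal which "sneaks in first", or a spurious wakeup, is not lost.

// shared state guarded by the mutex (illustrative)
int happened = 0;

// signaller: set the flag and signal while holding the mutex
void notify_it(void)
{
    pthread_mutex_lock(&lock);
    happened = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}

// waiter: loop on the predicate until it is set or the deadline passes
int wait_for_it(const struct timespec *deadline)
{
    int rc = 0;
    pthread_mutex_lock(&lock);
    while (!happened && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&cond, &lock, deadline);
    int ok = happened;        // read the flag while still holding the lock
    pthread_mutex_unlock(&lock);
    return ok;                // true if notified before the deadline
}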

Btw, in Java you have to have the lock in order to notify() or notifyAll():

This method should only be called by a thread that is the owner of this object's monitor.

http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Object.html#notify()

The Java synchronized{}/wait()/notify()/notifyAll() behaviour is analogous to pthread_mutex_lock/pthread_mutex_unlock/pthread_cond_wait/pthread_cond_signal/pthread_cond_broadcast, and not by coincidence.

OTHER TIPS

Butenhof's excellent "Programming with POSIX Threads" discusses this right at the end of section 3.3.3.

Basically, signalling the condvar without locking the mutex is a potential performance optimisation: if the signalling thread has the mutex locked, then the thread waking on the condvar has to immediately block on the mutex that the signalling thread has locked even if the signalling thread is not modifying any of the data the waiting thread will use.
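A sketch of that optimisation, using the illustrative lock, cond and happened from the earlier sketches: the shared state is still modified under the mutex, but the mutex is released before signalling, so the woken thread does not immediately block on a mutex the signaller still holds.

void notify_it_unlocked(void)
{
    pthread_mutex_lock(&lock);
    happened = 1;               // the shared state is still changed under the mutex
    pthread_mutex_unlock(&lock);
    pthread_cond_signal(&cond); // signal after releasing the lock
}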

The reason that "unpredictable scheduling behavior" is mentioned is this: if a high-priority thread is waiting on the condvar (which another thread is going to signal in order to wake it up), any other lower-priority thread can come along and lock the mutex, so that when the condvar is signalled and the high-priority thread is awakened, it has to wait for the lower-priority thread to release the mutex. If the mutex is held while signalling, then the higher-priority thread will be scheduled on the mutex ahead of the lower-priority thread: basically you know that when you "awaken" the high-priority thread it will run as soon as the scheduler allows it (of course, you might have to wait on the mutex before signalling the high-priority thread, but that's a different issue).

The point of waiting on a condition variable paired with a mutex is to atomically enter the wait and release the lock, i.e. allow other threads to modify the protected state, and then atomically receive notification of the state change and re-acquire the lock. What you describe can be done with many other mechanisms such as pipes, sockets, signals, or - probably the most appropriate here - semaphores.

I think this should work (note untested code):

// required headers (not shown in the original snippet):
// <semaphore.h>, <time.h>, <sys/timeb.h>, <errno.h>

// initialize a semaphore
sem_t sem;
sem_init(&sem,
    0, // pshared = 0: not shared between processes
    0  // initial value of 0
    );


// thread A: wait up to `msecs` milliseconds (msecs is supplied by the caller)
struct timespec tm;
struct timeb    tp;

const long sec      = msecs / 1000;
const long millisec = msecs % 1000;

// build the absolute CLOCK_REALTIME deadline that sem_timedwait() expects
// (ftime() is obsolescent but still widely available)
ftime(&tp);
tp.time    += sec;
tp.millitm += millisec;
if(tp.millitm > 999) {
    tp.millitm -= 1000;
    tp.time++;
}
tm.tv_sec  = tp.time;
tm.tv_nsec = tp.millitm * 1000000L;

// wait until the timeout expires or the semaphore is posted,
// restarting the wait if it is interrupted by a signal
errno = 0;
while(sem_timedwait(&sem, &tm) == -1 && errno == EINTR) {
    continue;
}

return errno == ETIMEDOUT; // returns true if a timeout occurred


// thread B
sem_post(&sem); // wake up thread A early

Conditions should be signaled outside of the mutex whenever possible. Mutexes are a necessary evil in concurrent programming. Their use leads to contention which robs the system of the maximum performance that it can gain from the use of multiple processors.

The purpose of a mutex is to guard access to some shared variables in the program so that they behave atomically. When a signalling operation is done inside the mutex, it adds hundreds of machine cycles to the time the mutex is held, cycles that have nothing to do with guarding the shared data. Potentially, it calls from user space all the way into the kernel.

The notes about "predictable scheduler behavior" in the standard are completely bogus.

When we want the machine to execute statements in a predictable, well-defined order, the tool for that is the sequencing of statements within a single thread of execution: S1 ; S2. Statement S1 is "scheduled" before S2.

We use threads when we realize that some actions are independent and their scheduling order is not important, and there are performance benefits to be realized, like more timely response to real time events or computing on multiple processors.

When scheduling order does become important among multiple threads, that falls under the concept of priority. Priority resolves what happens first when any one of N statements could potentially be scheduled to execute. Another tool for ordering under multithreading is queuing: events are placed into a queue by one or more threads, and a single service thread processes the events in queue order.
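For illustration only, a rough sketch of that queuing approach (the fixed-size queue, the names and handle_event() are made up for the example): producer threads append events under a mutex and signal a condvar, and a single service thread drains the events in FIFO order.

#include <pthread.h>

#define QSIZE 64

static int             queue[QSIZE];
static int             head, tail, count;
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  qcond = PTHREAD_COND_INITIALIZER;

// producer: any thread may enqueue an event
void put_event(int ev)
{
    pthread_mutex_lock(&qlock);
    if (count < QSIZE) {                 // a real queue would handle overflow
        queue[tail] = ev;
        tail = (tail + 1) % QSIZE;
        count++;
        pthread_cond_signal(&qcond);
    }
    pthread_mutex_unlock(&qlock);
}

// single service thread: processes events strictly in queue order
void *service_thread(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&qlock);
        while (count == 0)
            pthread_cond_wait(&qcond, &qlock);
        int ev = queue[head];
        head = (head + 1) % QSIZE;
        count--;
        pthread_mutex_unlock(&qlock);
        // handle_event(ev);             // hypothetical handler, run outside the lock
        (void)ev;
    }
    return NULL;
}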

The bottom line is, the placement of pthread_cond_broadcast is not an appropriate tool for controlling execution order. It will not make execution order predictable in the sense that the program suddenly has exactly the same, reproducible behavior on every platform.

"unpredictable scheduling behavior" means just that. You don't know what's going to happen. Nor do the implementation. It could work as expected. It could crash your app. It could work fine for years, then a race condition makes your app go monkey. It could deadlock.

Basically, if the docs say anything undefined/unpredictable can happen unless you do what they tell you to do, you'd better do it, or else stuff might blow up in your face. (And it won't blow up until you put the code into production, just to annoy you even more. At least that's my experience.)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow