Question

I have 2 processes (A, B) sharing the same mutex (using WaitForSingleObject / ReleaseMutex calls). Everything works fine, but when process A crashes, process B is humming along happily. When I restart process A, there's a deadlock.

Deeper investigation reveals that process B can successfully call ReleaseMutex() twice after process A crashes.

My interpretation: After process A crashes, the mutex is still locked, but ownership of the mutex transfers readily to process B (which is a bug). That's why it's humming along happily, calling WaitForSingleObject (getting WAIT_OBJECT_0 in return) and ReleaseMutex (getting TRUE in return).

Is it possible to use a named synchronization primitive similar to Mutex in such a way that a crash in process A will release the mutex?

One solution is to use SEH and catch the crash and release mutex, but I really hope Windows has a robust primitive that doesn't deadlock like that on process crash.

Was it helpful?

Solution

Some basic assumptions you have to make here about how a mutex works on Windows:

  • a mutex is an operating system object that's reference-counted. It will not disappear until the last handle on the mutex is closed
  • any handle that's left unclosed when a process terminates is closed by the operating system, decrementing the reference count
  • a mutex is re-entrant, calling WaitForSingleObject on a mutex on the same thread succeeds and needs to be balanced with an equal number of ReleaseMutex calls
  • an owned mutex becomes abandoned when the thread that owns it terminates without calling ReleaseMutex. Calling WaitForSingleObject on a mutex in this state generates the WAIT_ABANDONED error return code
  • it is never a bug in the operating system.

So you can draw conclusions from this by what you observed. Nothing happens to the mutex when A crashes, B still has an handle on it. The only possible way B can notice that A crashed is when A crashed while it owned the mutex. Very low odds for that and easily observed since B will deadlock. Far more likely is that B will happily motor on since it is now completely unobstructed, nobody else is going to acquire the mutex anymore.

Furthermore, a deadlock when A starts back proves something you already knew: B owns the mutex permanently for some reason. Possibly because it acquired the mutex recursively. You know this because you noticed you had to call ReleaseMutex twice. This is a bug you need to fix.

You'll need to protect yourself against a crashing sibling process and you need to write explicit code for that. Call OpenProcess on the sibling to obtain a handle on the process object. A WaitForSingleObject call on the handle will complete when the process terminates.

OTHER TIPS

If the process holding the mutex crashes, then it becomes abandoned. It's up to the other application how it deals with this state returned from the wait functions.

If it gets WAIT_ABANDONED back then it can either carry on as if all was ok (presumably what it does now) or "potentially unstable data, proceed with caution". The ownership is not passed to another process automatically.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top