I have a fairly complicated Python program. Internally it has a logging system that uses an exclusive (LOCK_EX) fcntl.flock to manage global locking. Effectively, whenever a log message is emitted, the global file lock is acquired, the message is written to the log file (a different file from the lock file), and the global file lock is released.
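In outline, the logging path looks roughly like this (a minimal sketch; the lock and log file paths are hypothetical):

```python
import fcntl

LOCK_PATH = "/tmp/app.lock"  # hypothetical lock file
LOG_PATH = "/tmp/app.log"    # hypothetical log file

def emit(message):
    with open(LOCK_PATH, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)      # acquire the global lock
        try:
            with open(LOG_PATH, "a") as log:
                log.write(message + "\n")          # write to the separate log file
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)  # release the lock
```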

The program also forks itself several times (after log management is set up). Generally everything works.

If the parent process is killed (and the children stay alive), I occasionally get a deadlock: all processes block on fcntl.flock() forever. Trying to acquire the lock externally also blocks forever. I have to kill the child processes to fix the problem.

What is baffling, though, is that `lsof lock_file` shows no process holding the lock! So I cannot figure out why the kernel considers the file locked while no process is reported as holding it.

Does flock have issues with forking? Is the dead parent somehow holding the lock even though it is no longer in the process table? How do I go about resolving this issue?


Solution

lsof is almost certainly simply not showing flock() locks, so not seeing one tells you nothing about whether there is one.
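On Linux you can inspect flock() locks via /proc/locks instead, keyed by the file's device and inode. A small sketch, with a hypothetical lock file path:

```python
import os

def flock_entries(path):
    """Return the /proc/locks lines for this file (Linux only).

    flock() locks show up here as FLOCK entries even when lsof is silent.
    /proc/locks identifies the file as MAJOR:MINOR:INODE, major/minor in hex.
    """
    st = os.stat(path)
    key = "%02x:%02x:%d" % (os.major(st.st_dev), os.minor(st.st_dev), st.st_ino)
    with open("/proc/locks") as locks:
        return [line.rstrip() for line in locks if key in line]

print(flock_entries("/tmp/app.lock"))  # hypothetical lock file path
```

Note that the PID recorded in those entries is the process that originally took the lock, and it can refer to a process that has since exited, which is consistent with what you are seeing.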

flock() locks belong to the open file description, so they are inherited via fd sharing (a dup() call, or a fork or fork-and-exec that leaves the file open). Anyone holding the shared descriptor can release the lock, but while it is held, any attempt to take it again through a different descriptor blocks. So, yes, it's likely that the parent locked the descriptor and then died, leaving the descriptor locked: the children's inherited copies keep the open file description, and therefore the lock, alive. A child that then tries to lock through its own open of the file blocks because the file is already locked. (The same would happen if a child process locked the file, then died.)
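A minimal sketch that reproduces this (the lock file path is hypothetical): the child's inherited descriptor keeps the parent's lock alive after the parent exits, and a second open of the same file then cannot take the lock.

```python
import fcntl, os, time

LOCK_PATH = "/tmp/demo.lock"  # hypothetical path

inherited = open(LOCK_PATH, "w")         # open file description A
fcntl.flock(inherited, fcntl.LOCK_EX)    # parent takes the lock on A

if os.fork() == 0:
    # Child: the inherited fd keeps description A (and its lock) alive
    # even after the parent exits below.
    time.sleep(1)                        # give the parent time to die
    fresh = open(LOCK_PATH, "w")         # a second open file description, B
    try:
        fcntl.flock(fresh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        print("lock still held, although its owner is gone")
else:
    os._exit(0)                          # parent dies while holding the lock
```

Locking through the inherited descriptor itself would succeed (flock() converts an existing lock on the same open file description), so the hang only appears when the file is opened again, e.g. by a logging system that reopens its lock file or by an external process.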

`fcntl()` (POSIX record) locks, by contrast, are per-process: when the locking process dies, the kernel releases all of its locks, so the survivors can proceed, which is what you want here.
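In Python that means replacing fcntl.flock with fcntl.lockf, which is a wrapper around the fcntl() locking calls. A sketch of the same logging pattern, with a hypothetical path:

```python
import fcntl

# Same acquire/write/release pattern, but with POSIX record locks: if the
# locking process dies, the kernel releases the lock instead of leaking it.
with open("/tmp/app.lock", "w") as lock_file:  # hypothetical lock file
    fcntl.lockf(lock_file, fcntl.LOCK_EX)
    try:
        pass  # write the log message to the separate log file here
    finally:
        fcntl.lockf(lock_file, fcntl.LOCK_UN)
```

One caveat to keep in mind: POSIX locks are released when the process closes any descriptor for the locked file, and they are not inherited across fork(), so each child must take the lock itself.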
