After an NFS server reboots, every client holding active file locks starts a lock reclamation procedure, which must finish within the so-called "grace period" (a fixed timeout set on the server). If a client fails to reclaim a lock within the grace period, the NFS client (usually a kernel-space beast) sends SIGUSR1 to the process that couldn't recover its locks. That's the root of your problem.
When the lock succeeds on the server side, rpc.lockd on the client system requests another daemon, rpc.statd, to monitor the NFS server that implements the lock. If the server fails and then recovers, rpc.statd will be informed. It then tries to reestablish all active locks. If the NFS server fails and recovers, and rpc.lockd is unable to reestablish a lock, it sends a signal (SIGUSR1) to the process that requested the lock.
http://menehune.opt.wfu.edu/Kokua/More_SGI/007-2478-010/sgi_html/ch07.html
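Why is the signal a problem? The default action for SIGUSR1 is to terminate the process, so a program that never installed a handler presumably just dies when reclaim fails. Below is a small local C sketch of that effect. No NFS is involved: a forked child stands in for the lock holder, and the parent stands in for lockd. The function name `signal_that_killed_child` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that keeps the default SIGUSR1 disposition (like a
 * program with no handler), then sends it SIGUSR1 the way lockd would
 * after a failed reclaim. Returns the signal that terminated the child,
 * 0 if it exited normally, or -1 on error. */
int signal_that_killed_child(void)
{
    /* Ensure the child inherits the default action (terminate), even if
       this process was started with SIGUSR1 ignored. */
    if (signal(SIGUSR1, SIG_DFL) == SIG_ERR)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: a long-running "lock holder" that just waits. Because the
           disposition is already SIG_DFL, the signal kills it whenever it
           arrives, even before pause() is reached. */
        for (;;)
            pause();
    }

    /* Parent: plays the NFS client delivering the failure notification. */
    if (kill(pid, SIGUSR1) != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling `signal_that_killed_child()` should return `SIGUSR1`. This is the same "User defined signal 1" death a real process would suffer if its locks could not be reclaimed.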
You're probably wondering how to avoid this. Well, there are a couple of ways, but none of them is ideal:
- Increase the grace period. AFAIR, on Linux it can be changed on the server via /proc/fs/nfsd/nfsv4leasetime.
- Install a SIGUSR1 handler in your code and do something smart there. For instance, the handler could set a flag indicating that lock recovery has failed. When your program sees the flag, it can wait until the NFS server is available again (for as long as it takes) and then try to reacquire its locks itself. Not very fruitful, though: by then another client may already have taken the lock and modified the file.
- Stop using NFS locking altogether. If possible, switch to ZooKeeper, as was suggested earlier.
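Here is a minimal C sketch of the SIGUSR1-handler option. It assumes your locks are POSIX `fcntl()` whole-file write locks, which is what lockd manages on NFS. All names (`lock_lost`, `on_sigusr1`, `install_lock_loss_handler`, `relock_file`, `recover_if_lock_lost`) are hypothetical, and nothing here detects server readiness. It simply blocks in `F_SETLKW` until the lock is granted again.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set by the handler. Writing a volatile sig_atomic_t is async-signal-safe,
 * so the handler does nothing else. */
static volatile sig_atomic_t lock_lost = 0;

static void on_sigusr1(int signo)
{
    (void)signo;
    lock_lost = 1;
}

/* Installs the SIGUSR1 handler. Returns 0 on success, -1 on error. */
int install_lock_loss_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Re-takes a whole-file write lock, blocking until granted (F_SETLKW).
 * While the server is down or in its grace period this may block for a
 * long time. Returns 0 on success, -1 on a non-EINTR error. */
int relock_file(int fd)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0; /* 0 means "through end of file" */
    for (;;) {
        if (fcntl(fd, F_SETLKW, &fl) == 0)
            return 0;
        if (errno != EINTR)
            return -1;
    }
}

/* Call from the main loop, never from the handler. Returns 0 if no lock
 * was lost, 1 if the lock was re-taken, -1 if re-locking failed. */
int recover_if_lock_lost(int fd)
{
    if (!lock_lost)
        return 0;
    /* Clear first, so a SIGUSR1 arriving during relock is not missed. */
    lock_lost = 0;
    if (relock_file(fd) != 0) {
        lock_lost = 1; /* try again on the next pass */
        return -1;
    }
    /* The file may have changed while we held no lock: re-read it
       before trusting any cached state. */
    return 1;
}
```

The handler only flips a flag because very little is async-signal-safe. The blocking `fcntl()` call happens in normal program flow, where waiting is acceptable. If "as long as it needs" is too long for you, a non-blocking `F_SETLK` in a retry loop with your own timeout is the obvious variation.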