Question

I have a character device driver that causes a system deadlock on a multicore system. The write call has a critical section protected by a spin lock (spin_lock_irqsave). The ISR must take the same lock to finish its work. If the ISR runs on one core while the write is executing the critical section on another, the kernel panics because a watchdog timer detects a hard lockup on the core running the ISR. The write process never returns to finish executing. Shouldn't the write process continue to execute on its own core and release the lock, which would then allow the ISR on the other core to proceed?

The critical section takes about 5 µs to complete. The hard lockup is reported after 5 seconds.

I assume I'm doing something wrong but do not know what.

Appreciate any help!


Solution

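It turns out the critical section was calling wait_for_completion_timeout. Even though the timeout was zero, the call still slept, and the writer never woke up to release the spin lock if the interrupt occurred while it was blocked. Sleeping while holding a spin lock is not allowed in the kernel, and here it left the ISR on the other core spinning on the lock until the watchdog reported the hard lockup. Replacing the call with try_wait_for_completion, which only checks the completion and never sleeps, resolved the issue.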
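To illustrate the idea behind the fix, here is a minimal userspace sketch, not the actual driver code. It assumes a simplified model: `completion_model`, `try_wait_model`, `write_critical_section`, `isr_model` and `model_init` are hypothetical names, the spin lock is built from a C11 `atomic_flag` rather than the kernel's `spin_lock_irqsave`, and the "ISR" is an ordinary function. The point it shows is that the lock holder only polls the completion and always reaches the unlock, which is the behaviour `try_wait_for_completion` gives you in the kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's struct completion. */
struct completion_model {
    unsigned int done;   /* number of pending "complete" signals */
};

/* Simple spin lock built on a C11 atomic flag (model of the driver lock). */
static atomic_flag dev_lock = ATOMIC_FLAG_INIT;

static struct completion_model xfer_done;
static int handled;      /* how many completions the write path consumed */

static void lock_model(void)
{
    /* Busy-wait until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&dev_lock, memory_order_acquire))
        ;
}

static void unlock_model(void)
{
    atomic_flag_clear_explicit(&dev_lock, memory_order_release);
}

/* Models the non-blocking check: returns immediately in both cases. */
static bool try_wait_model(struct completion_model *c)
{
    if (c->done == 0)
        return false;    /* nothing signalled yet; do NOT sleep */
    c->done--;           /* consume one signal */
    return true;
}

/* Reset the model state (assumed helper for this sketch). */
static void model_init(void)
{
    atomic_flag_clear(&dev_lock);
    xfer_done.done = 0;
    handled = 0;
}

/* Write-path critical section: polls while holding the lock, never blocks. */
static bool write_critical_section(void)
{
    bool ready;

    lock_model();
    ready = try_wait_model(&xfer_done);
    if (ready)
        handled++;
    unlock_model();      /* always reached, so the other side can proceed */
    return ready;
}

/* Stand-in for the ISR: takes the same lock and signals the completion. */
static void isr_model(void)
{
    lock_model();
    xfer_done.done++;
    unlock_model();
}
```

Calling `write_critical_section()` before `isr_model()` simply returns `false` and releases the lock. In the real driver the caller would then retry or wait outside the spin lock, where sleeping is permitted.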

I would have posted the source, but it spans many modules and has architecture abstractions for portability between operating systems. It would have been a mess.

Licensed under: CC-BY-SA with attribution