Question

What could be the reasons for the following message?

BUG: spinlock lockup suspected on CPU#0, sh/11786

lock: kmap_lock+0x0/0x40, .magic: dead4ead, .owner: sh/11787, .owner_cpu: 1


Solution

BUG: spinlock lockup suspected on CPU#0, sh/11786

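This indicates that CPU0 is stuck spinning on a lock, and that the task doing the spinning is sh (or something started by sh, I am not sure), PID 11786. The "lock:" line of the message describes the lock itself: .owner and .owner_cpu say it is currently held by sh/11787 on CPU 1. You should have a look at the stack trace dumped by the kernel. For example: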

127|uid=0 gid=1007@nutshell:/var # [  172.285647] BUG: spinlock lockup on CPU#0, swapper/0, 983482f0
[  172.291523] [<8003cb44>] (unwind_backtrace+0x0/0xf8) from [<801853e4>] (do_raw_spin_lock+0x100/0x164)
[  172.300768] [<801853e4>] (do_raw_spin_lock+0x100/0x164) from [<80350508>] (_raw_spin_lock_irqsave+0x54/0x60)
[  172.310618] [<80350508>] (_raw_spin_lock_irqsave+0x54/0x60) from [<7f3cf4a0>] (mlb_os81092_interrupt+0x18/0x68 [os81092])
[  172.321636] [<7f3cf4a0>] (mlb_os81092_interrupt+0x18/0x68 [os81092]) from [<800abee0>] (handle_irq_event_percpu+0x50/0x184)
[  172.332781] [<800abee0>] (handle_irq_event_percpu+0x50/0x184) from [<800ac050>] (handle_irq_event+0x3c/0x5c)
[  172.342622] [<800ac050>] (handle_irq_event+0x3c/0x5c) from [<800ae00c>] (handle_level_irq+0xac/0xfc)
[  172.351767] [<800ae00c>] (handle_level_irq+0xac/0xfc) from [<800ab82c>] (generic_handle_irq+0x2c/0x40)
[  172.361090] [<800ab82c>] (generic_handle_irq+0x2c/0x40) from [<800552e8>] (mx3_gpio_irq_handler+0x78/0x140)
[  172.370843] [<800552e8>] (mx3_gpio_irq_handler+0x78/0x140) from [<800ab82c>] (generic_handle_irq+0x2c/0x40)
[  172.380595] [<800ab82c>] (generic_handle_irq+0x2c/0x40) from [<80036904>] (handle_IRQ+0x4c/0xac)
[  172.389402] [<80036904>] (handle_IRQ+0x4c/0xac) from [<80035ad0>] (__irq_svc+0x50/0xd0)
[  172.397416] [<80035ad0>] (__irq_svc+0x50/0xd0) from [<80036bb4>] (default_idle+0x28/0x2c)
[  172.405603] [<80036bb4>] (default_idle+0x28/0x2c) from [<80036e9c>] (cpu_idle+0x9c/0x108)
[  172.413793] [<80036e9c>] (cpu_idle+0x9c/0x108) from [<800088b4>] (start_kernel+0x294/0x2e4)
[  172.422181] [<800088b4>] (start_kernel+0x294/0x2e4) from [<10008040>] (0x10008040)

[1] The trace shows the call chain. Notice this entry:

[ 172.310618] [<80350508>] (_raw_spin_lock_irqsave+0x54/0x60) from [<7f3cf4a0>] (mlb_os81092_interrupt+0x18/0x68 [os81092])

It says that the function mlb_os81092_interrupt (in the os81092 module) is trying to take a spinlock with spin_lock_irqsave. So find out what that spinlock protects, then analyse the code or add logging to determine who is holding the lock, and then work out how to avoid the lockup.
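Picking that entry out of a long trace by eye works, but it can also be scripted. The following is a minimal sketch in plain userspace C, assuming the ARM-style "(callee) from (caller)" line format seen in the trace above. The function name lock_caller is made up for illustration and is not a kernel API.

```c
#include <assert.h>
#include <string.h>

/*
 * Parse one backtrace line of the (assumed) form
 *   [<addr>] (callee+0xOFF/0xSIZE) from [<addr>] (caller+0xOFF/0xSIZE [module])
 * If the callee is one of the _raw_spin_lock* helpers, copy the caller's
 * symbol name into `caller` and return 1. Return 0 for any other line,
 * or if the name does not fit into `len` bytes.
 */
static int lock_caller(const char *line, char *caller, size_t len)
{
    static const char prefix[] = "_raw_spin_lock";
    const char *callee, *from, *sym, *plus;
    size_t n;

    assert(line != NULL && caller != NULL);

    /* The first '(' on the line opens the callee symbol. */
    callee = strchr(line, '(');
    if (!callee || strncmp(callee + 1, prefix, sizeof(prefix) - 1) != 0)
        return 0;

    /* The caller symbol is the first '(' after ") from ". */
    from = strstr(callee, ") from ");
    if (!from)
        return 0;
    sym = strchr(from + 1, '(');
    if (!sym)
        return 0;
    sym++;

    /* The symbol name ends at the '+' that starts the offset. */
    plus = strchr(sym, '+');
    if (!plus)
        return 0;

    n = (size_t)(plus - sym);
    if (n >= len)
        return 0;
    memcpy(caller, sym, n);
    caller[n] = '\0';
    return 1;
}
```

Calling lock_caller() on each line of the trace and printing the name whenever it returns 1 should single out mlb_os81092_interrupt, since that is the only frame whose callee starts with _raw_spin_lock. The do_raw_spin_lock frame is skipped on purpose: its caller is the lock helper, not the driver code.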

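[2] Also, since CPU0 is the one that is locked up and this may be a multiprocessor system, check whether there is an IRQ whose handler uses the same critical resource. If that IRQ is handled on another CPU (say CPU1), it's OK: the handler just spins until the holder releases the lock. But if the IRQ is handled on CPU0 while code on CPU0 already holds the lock, the handler spins forever, because the interrupted holder can never run again to release it. This deadlock happens when the lock is taken with spin_lock rather than spin_lock_irqsave in code that the interrupt can preempt, so check for that.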
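The reasoning in [2] can be checked with a toy model. This is a hedged sketch in userspace C, not kernel code: the toy_* types and functions and the raise_irq() helper are all invented here, and they only track which CPU owns the lock and whether that CPU has local interrupts masked.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_OWNER (-1)

/* Toy state: just enough to reason about one lock and its holder. */
struct toy_lock { int owner_cpu; };
struct toy_cpu  { int id; bool irqs_enabled; };

enum irq_outcome {
    IRQ_DEFERRED,  /* interrupts masked: handler runs later, after unlock */
    IRQ_RUNS,      /* lock is free: handler takes it immediately */
    IRQ_WAITS,     /* holder is on another CPU and can still release */
    IRQ_DEADLOCK   /* handler interrupted the holder on the same CPU */
};

/* Models spin_lock(): takes the lock, leaves local interrupts untouched. */
static void toy_spin_lock(struct toy_lock *l, struct toy_cpu *c)
{
    assert(l->owner_cpu == NO_OWNER);
    l->owner_cpu = c->id;
}

/* Models spin_lock_irqsave(): masks local interrupts, then takes the lock. */
static bool toy_spin_lock_irqsave(struct toy_lock *l, struct toy_cpu *c)
{
    bool saved = c->irqs_enabled;

    c->irqs_enabled = false;
    toy_spin_lock(l, c);
    return saved;
}

/* Models spin_unlock_irqrestore(): drops the lock, restores the IRQ state. */
static void toy_spin_unlock_irqrestore(struct toy_lock *l, struct toy_cpu *c,
                                       bool saved)
{
    l->owner_cpu = NO_OWNER;
    c->irqs_enabled = saved;
}

/* What happens if an IRQ whose handler needs `l` is delivered to `c`? */
static enum irq_outcome raise_irq(const struct toy_lock *l,
                                  const struct toy_cpu *c)
{
    if (!c->irqs_enabled)
        return IRQ_DEFERRED;
    if (l->owner_cpu == NO_OWNER)
        return IRQ_RUNS;
    if (l->owner_cpu == c->id)
        return IRQ_DEADLOCK;
    return IRQ_WAITS;
}
```

In this model, taking the lock on CPU0 with toy_spin_lock() and then delivering the IRQ to CPU0 yields IRQ_DEADLOCK, while delivering it to CPU1 yields IRQ_WAITS. Taking the lock with toy_spin_lock_irqsave() instead turns the CPU0 case into IRQ_DEFERRED, which is the reason to prefer the irqsave variant for any lock that is shared with an interrupt handler.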

Licensed under: CC-BY-SA with attribution