Question

Has anybody seen a "scheduling while atomic" error with the JFFS2 filesystem? It appears to come from the garbage-collector (jffs2_gcd) kernel thread. What is the exact fix? This is on a 2.6.39 kernel. If preemption is not configured, the problem does not occur. Thanks in advance.

[72940.050000] BUG: scheduling while atomic: sync_supers/74/0x00000002
[72940.060000] 4 locks held by sync_supers/74:
[72940.060000]  #0:  (&type->s_umount_key#14){......}, at: [<c00ae9b0>] sync_supers+0x68/0xf0
[72940.070000]  #1:  (&type->s_lock_key){......}, at: [<c010e9a8>] jffs2_write_super+0x14/0x40
[72940.080000]  #2:  (&(&c->erase_completion_lock)->rlock){......}, at: [<c010baa0>] jffs2_garbage_collect_pass+0x1fc/0x790
[72940.090000]  #3:  (&c->alloc_sem){......}, at: [<c010baac>] jffs2_garbage_collect_pass+0x208/0x790
[72940.100000] Modules linked in:
[72940.110000] [<c0038e78>] (unwind_backtrace+0x0/0xe4) from [<c0214940>] (schedule+0x54/0x3c0)
[72940.210000] BUG: spinlock cpu recursion on CPU#0, rm/5927
[72940.210000]  lock: c306e560, .magic: dead4ead, .owner: sync_supers/74, .owner_cpu: 0
[72940.220000] [<c0038e78>] (unwind_backtrace+0x0/0xe4) from [<c0152d48>] (do_raw_spin_lock+0x60/0x120)
[72940.230000] [<c0152d48>] (do_raw_spin_lock+0x60/0x120) from [<c011016c>] (__jffs2_flush_wbuf+0xae8/0xc90)
[72940.240000] [<c011016c>] (__jffs2_flush_wbuf+0xae8/0xc90) from [<c0110514>] (jffs2_flash_writev+0x200/0x400)
[72940.250000] [<c0110514>] (jffs2_flash_writev+0x200/0x400) from [<c0107a14>] (jffs2_write_dirent+0x274/0x484)
[72940.260000] [<c0107a14>] (jffs2_write_dirent+0x274/0x484) from [<c0108c78>] (jffs2_do_unlink+0x1ec/0x3ac)
[72940.270000] [<c0108c78>] (jffs2_do_unlink+0x1ec/0x3ac) from [<c0101448>] (jffs2_unlink+0x44/0x84)
[72940.280000] [<c0101448>] (jffs2_unlink+0x44/0x84) from [<c00b6364>] (vfs_unlink+0x68/0xec)
[72940.290000] [<c00b6364>] (vfs_unlink+0x68/0xec) from [<c011a010>] (call_unlink+0x90/0xe0)
[72940.290000] [<c011a010>] (call_unlink+0x90/0xe0) from [<c011a090>] (vfsub_unlink+0x30/0x50)
[72940.300000] [<c011a090>] (vfsub_unlink+0x30/0x50) from [<c0129dac>] (aufs_unlink+0x154/0x22c)
[72940.310000] [<c0129dac>] (aufs_unlink+0x154/0x22c) from [<c00b6364>] (vfs_unlink+0x68/0xec)
[72940.320000] [<c00b6364>] (vfs_unlink+0x68/0xec) from [<c00b8c84>] (do_unlinkat+0xac/0x158)
[72940.330000] [<c00b8c84>] (do_unlinkat+0xac/0x158) from [<c00334e0>] (ret_fast_syscall+0x0/0x2c)

[72945.070000] BUG: spinlock lockup on CPU#0, rm/5927, c306e560
[72945.080000] [<c0038e78>] (unwind_backtrace+0x0/0xe4) from [<c0152dc8>] (do_raw_spin_lock+0xe0/0x120)
[72945.090000] [<c0152dc8>] (do_raw_spin_lock+0xe0/0x120) from [<c011016c>] (__jffs2_flush_wbuf+0xae8/0xc90)
[72945.100000] [<c011016c>] (__jffs2_flush_wbuf+0xae8/0xc90) from [<c0110514>] (jffs2_flash_writev+0x200/0x400)
[72945.110000] [<c0110514>] (jffs2_flash_writev+0x200/0x400) from [<c0107a14>] (jffs2_write_dirent+0x274/0x484)
[72945.120000] [<c0107a14>] (jffs2_write_dirent+0x274/0x484) from [<c0108c78>] (jffs2_do_unlink+0x1ec/0x3ac)
[72945.130000] [<c0108c78>] (jffs2_do_unlink+0x1ec/0x3ac) from [<c0101448>] (jffs2_unlink+0x44/0x84)
[72945.140000] [<c0101448>] (jffs2_unlink+0x44/0x84) from [<c00b6364>] (vfs_unlink+0x68/0xec)
[72945.150000] [<c00b6364>] (vfs_unlink+0x68/0xec) from [<c011a010>] (call_unlink+0x90/0xe0)
[72945.150000] [<c011a010>] (call_unlink+0x90/0xe0) from [<c011a090>] (vfsub_unlink+0x30/0x50)
[72945.160000] [<c011a090>] (vfsub_unlink+0x30/0x50) from [<c0129dac>] (aufs_unlink+0x154/0x22c)
[72945.170000] [<c0129dac>] (aufs_unlink+0x154/0x22c) from [<c00b6364>] (vfs_unlink+0x68/0xec)
[72945.180000] [<c00b6364>] (vfs_unlink+0x68/0xec) from [<c00b8c84>] (do_unlinkat+0xac/0x158)
[72945.190000] [<c00b8c84>] (do_unlinkat+0xac/0x158) from [<c00334e0>] (ret_fast_syscall+0x0/0x2c)

[73016.460000] BUG: soft lockup - CPU#0 stuck for 63s! [rm:5927]
[73016.460000] Modules linked in:
[73016.460000] 
[73016.460000] Pid: 5927, comm:                   rm
[73016.460000] CPU: 0    Tainted: G        W    (2.6.39.2 #506)
[73016.460000] PC is at do_raw_spin_lock+0x90/0x120
[73016.460000] LR is at do_raw_spin_lock+0xac/0x120
[73016.460000] pc : [<c0152d78>]    lr : [<c0152d94>]    psr: 80000013
[73016.460000] sp : c36f9cb0  ip : c00334e0  fp : 00da47f0
[73016.460000] r10: 00000028  r9 : c3701384  r8 : 03144800
[73016.460000] r7 : 00000000  r6 : c36f8000  r5 : 001b16c7  r4 : c306e560
[73016.460000] r3 : 00000000  r2 : 00000000  r1 : 00000001  r0 : 00000001
[73016.460000] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[73016.460000] Control: 0005317f  Table: 82b90000  DAC: 00000015
[73016.460000] Kernel panic - not syncing: softlockup: hung tasks
[73016.460000] [<c0038e78>] (unwind_backtrace+0x0/0xe4) from [<c0048740>] (panic+0x5c/0x19c)
[73016.460000] [<c0048740>] (panic+0x5c/0x19c) from [<c0078420>] (watchdog_timer_fn+0x11c/0x15c)
[73016.460000] [<c0078420>] (watchdog_timer_fn+0x11c/0x15c) from [<c006467c>] (hrtimer_run_queues+0x11c/0x1e8)
[73016.460000] [<c006467c>] (hrtimer_run_queues+0x11c/0x1e8) from [<c00538dc>] (run_local_timers+0xc/0x1c)
[73016.460000] [<c00538dc>] (run_local_timers+0xc/0x1c) from [<c0053b30>] (update_process_times+0x2c/0x4c)
[73016.460000] [<c0053b30>] (update_process_times+0x2c/0x4c) from [<c006bd70>] (tick_handle_periodic+0x18/0x84)
[73016.460000] [<c006bd70>] (tick_handle_periodic+0x18/0x84) from [<c003d890>] (lpc32xx_timer_interrupt+0x7c/0xa0)
[73016.460000] [<c003d890>] (lpc32xx_timer_interrupt+0x7c/0xa0) from [<c0078780>] (handle_irq_event_percpu+0x2c/0x19c)
[73016.460000] [<c0078780>] (handle_irq_event_percpu+0x2c/0x19c) from [<c007892c>] (handle_irq_event+0x3c/0x5c)
[73016.460000] [<c007892c>] (handle_irq_event+0x3c/0x5c) from [<c007aadc>] (handle_level_irq+0xbc/0xf0)
[73016.460000] [<c007aadc>] (handle_level_irq+0xbc/0xf0) from [<c0028078>] (asm_do_IRQ+0x78/0xa0)
[73016.460000] [<c0028078>] (asm_do_IRQ+0x78/0xa0) from [<c00330e8>] (__irq_svc+0x48/0x90)
[73016.460000] Exception stack(0xc36f9c68 to 0xc36f9cb0)
[73016.460000] 9c60:                   00000001 00000001 00000000 00000000 c306e560 001b16c7
[73016.460000] 9c80: c36f8000 00000000 03144800 c3701384 00000028 00da47f0 c00334e0 c36f9cb0
[73016.460000] 9ca0: c0152d94 c0152d78 80000013 ffffffff
[73016.460000] [<c00330e8>] (__irq_svc+0x48/0x90) from [<c0152d78>] (do_raw_spin_lock+0x90/0x120)
[73016.460000] [<c0152d78>] (do_raw_spin_lock+0x90/0x120) from [<c011016c>] (__jffs2_flush_wbuf+0xae8/0xc90)
[73016.460000] [<c011016c>] (__jffs2_flush_wbuf+0xae8/0xc90) from [<c0110514>] (jffs2_flash_writev+0x200/0x400)
[73016.460000] [<c0110514>] (jffs2_flash_writev+0x200/0x400) from [<c0107a14>] (jffs2_write_dirent+0x274/0x484)
[73016.460000] [<c0107a14>] (jffs2_write_dirent+0x274/0x484) from [<c0108c78>] (jffs2_do_unlink+0x1ec/0x3ac)
[73016.460000] [<c0108c78>] (jffs2_do_unlink+0x1ec/0x3ac) from [<c0101448>] (jffs2_unlink+0x44/0x84)
[73016.460000] [<c0101448>] (jffs2_unlink+0x44/0x84) from [<c00b6364>] (vfs_unlink+0x68/0xec)
[73016.460000] [<c00b6364>] (vfs_unlink+0x68/0xec) from [<c011a010>] (call_unlink+0x90/0xe0)
[73016.460000] [<c011a010>] (call_unlink+0x90/0xe0) from [<c011a090>] (vfsub_unlink+0x30/0x50)
[73016.460000] [<c011a090>] (vfsub_unlink+0x30/0x50) from [<c0129dac>] (aufs_unlink+0x154/0x22c)
[73016.460000] [<c0129dac>] (aufs_unlink+0x154/0x22c) from [<c00b6364>] (vfs_unlink+0x68/0xec)
[73016.460000] [<c00b6364>] (vfs_unlink+0x68/0xec) from [<c00b8c84>] (do_unlinkat+0xac/0x158)
[73016.460000] [<c00b8c84>] (do_unlinkat+0xac/0x158) from [<c00334e0>] (ret_fast_syscall+0x0/0x2c)

Note: another trace, captured without kernel debug options enabled, shows:

BUG: scheduling while atomic: jffs2_gcd_mtd7/288/0x00000002
[<c002e4e4>] (unwind_backtrace+0x0/0xe4) from [<c0200a64>] (schedule+0x54/0x3ec)
...
BUG: scheduling while atomic: sync_supers/72/0x00000002
[<c002e4e4>] (unwind_backtrace+0x0/0xe4) from [<c0200a64>] (schedule+0x54/0x3ec)

Solution

The mutex (c->alloc_sem) is acquired while the spinlock (c->erase_completion_lock) is still held, as locks #2 and #3 in the trace above show. Holding a spinlock puts the task in atomic context, and mutex_lock() may sleep, so the attempt to schedule triggers the "scheduling while atomic" BUG. Once the acquisition order is swapped, the problem goes away. The problem has been fixed in later kernels.

#2: (&(&c->erase_completion_lock)->rlock), jffs2_garbage_collect_pass+0x1fc/0x790
#3: (&c->alloc_sem), jffs2_garbage_collect_pass+0x208/0x790

The code is in fs/jffs2/gc.c, in jffs2_garbage_collect_pass(). Swap lines 228 and 229 to solve the problem:

227         D1(printk(KERN_DEBUG "No progress from erasing blocks; doing GC anyway\n"));
228         spin_lock(&c->erase_completion_lock);
229         mutex_lock(&c->alloc_sem);
230         }
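
For reference, a minimal sketch of how those lines read after the swap (same context as quoted above; the sleeping mutex is now taken before the spinlock, so mutex_lock() can no longer sleep inside the spinlock's atomic section):

227         D1(printk(KERN_DEBUG "No progress from erasing blocks; doing GC anyway\n"));
228         mutex_lock(&c->alloc_sem);              /* sleeping lock taken first */
229         spin_lock(&c->erase_completion_lock);   /* spinlock taken last, released first */
230         }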
Licensed under: CC-BY-SA with attribution