Question

We are facing an issue and need some help with it.

Brief write-up :

We have enabled SMP in the Linux 2.6.39.4 kernel and cross-compiled it for a PPC-476. After booting, the kernel brings up both processors (the hardware has 2 cores). The problem is that when the modprobe command is run repeatedly, one of the CPUs goes into a stall state. While one CPU was stalled, we dumped the stacks of all active CPUs using SysRq. The dump showed both processors executing the same process (same PID), i.e. modprobe.

Questions:
1. Is it possible for both processors to be executing the same process, with the same PID, at the same moment?
2. Could simultaneous execution of that process on both CPUs give rise to a race condition that causes one of the CPUs to stall?

LOGS===============================================

SysRq : Show backtrace of all active CPUs
CPU0:
NIP: 701786c4 LR: 701752f0 CTR: 00000004
REGS: 9fb4fdc0 TRAP: 0501   Not tainted  (2.6.39.4)
MSR: 00029000 <EE,ME,CE>  CR: 44002048  XER: 00000000
TASK = 8f868ae0[827] 'modprobe' THREAD: 9fb48000 CPU: 0
GPR00: 08101820 9fb4fe70 8f868ae0 22222222 0002374d 00000002 00849ffc 00000000
GPR08: a2bb3a8c 00000810 a2c2aac4 00000148 44002088
NIP [701786c4] __sw_hweight32+0x50/0x58
LR [701752f0] __bitmap_weight+0x54/0xc0
Call Trace:
[9fb4fe70] [20000000] 0x20000000 (unreliable)
[9fb4fe90] [7006b1cc] sys_init_module+0x11a8/0x1ca4
[9fb4ff40] [7000f1b8] ret_from_syscall+0x0/0x3c
--- Exception: c01 at 0x10050e38
    LR = 0x100a6708
Instruction dump:
7c634838 7c004838 7c001a14 5409e13e 7c090214 3d200f0f 61290f0f 7c004838
5409c23e 7c090214 5409843e 7c090214 <5403063e> 4e800020 5460f87e 70005555
CPU1:
NIP: 700a9678 LR: 700a964c CTR: 7012b0f8
REGS: 9fb4fe30 TRAP: 0501   Not tainted  (2.6.39.4)
MSR: 00029000 <EE,ME,CE>  CR: 80008022  XER: 20000000
TASK = 8f868ae0[827] 'modprobe' THREAD: 9fb48000 CPU: 1
GPR00: 00000000 9fb4fee0 8f868ae0 9efb9f20 9efb9f20 9fb4fee8 1004d6d0 0002d000
GPR08: 9efb9d68 9f8eee00 9efb9f20 00000000 20002022
NIP [700a9678] do_munmap+0x114/0x314
LR [700a964c] do_munmap+0xe8/0x314
Call Trace:
[9fb4fee0] [00000014] 0x14 (unreliable)
[9fb4ff20] [700aa7c8] sys_munmap+0x44/0x74
[9fb4ff40] [7000f1b8] ret_from_syscall+0x0/0x3c
--- Exception: c01 at 0x100510a8
    LR = 0x10022088
Instruction dump:
4bffe461 7c641b79 41820010 80040004 7f9d0040 419d01d4 83210008 2e190000
41920200 83d9000c 801f0074 2f800000 <419e00f8> 2f9e0000 419e00f0 801e0004
Call Trace:
[9ffaff00] [70008654] show_stack+0x6c/0x1a4 (unreliable)
[9ffaff40] [70192d30] showacpu+0x84/0xcc
[9ffaff60] [700679bc] generic_smp_call_function_single_interrupt+0x100/0x18c
[9ffaff90] [7000ff6c] call_function_single_action+0x10/0x24
[9ffaffa0] [7006f7f4] handle_irq_event_percpu+0xa0/0x21c
[9ffaffe0] [700728d8] handle_percpu_irq+0x88/0xb8
[9ffafff0] [7000e038] call_handle_irq+0x18/0x28
[9fb4fdf0] [700044b0] do_IRQ+0xe8/0x1a0
[9fb4fe20] [7000f81c] ret_from_except+0x0/0x18
--- Exception: 501 at do_munmap+0x114/0x314
    LR = do_munmap+0xe8/0x314
[9fb4fee0] [00000014] 0x14 (unreliable)
[9fb4ff20] [700aa7c8] sys_munmap+0x44/0x74
[9fb4ff40] [7000f1b8] ret_from_syscall+0x0/0x3c
--- Exception: c01 at 0x100510a8

=============================================================


Solution

The answer to both of your questions is no. Two cores cannot run the same task at the same moment.

From the oops/panic trace, we can tell that the kernel panicked on CPU 0 first, which then triggered the SysRq "Show backtrace of all active CPUs". That is to say, CPU 1 was working fine at the time CPU 0 panicked; it was the exception on CPU 0 that caused the backtrace of CPU 1 to be dumped.

Now let us analyze why CPU 0 panicked: the backtrace shows it happened during "modprobe" (inside sys_init_module), so please examine the kernel modules on your system to locate the one that triggers the panic.
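
When comparing several dumps like the one in the question, it can help to pull out mechanically which task each CPU's register dump names. The following is only a minimal Python sketch under stated assumptions: it assumes the `TASK = ...[pid] 'comm' THREAD: ... CPU: n` line format seen in the log above, and the names `TASK_RE`, `tasks_by_pid` and `shared_pids` are hypothetical helpers, not part of any kernel tooling. It only reports what the log says; it does not decide whether the report is trustworthy.

```python
import re
from collections import defaultdict

# Matches register-dump lines of the form seen in the question, e.g.:
#   TASK = 8f868ae0[827] 'modprobe' THREAD: 9fb48000 CPU: 0
# (Assumption: this exact layout; other kernels may print it differently.)
TASK_RE = re.compile(
    r"TASK = (?P<task>[0-9a-f]+)\[(?P<pid>\d+)\] '(?P<comm>[^']*)' "
    r"THREAD: (?P<thread>[0-9a-f]+) CPU: (?P<cpu>\d+)"
)


def tasks_by_pid(log_text):
    """Map each PID to the sorted list of CPUs whose dump names that PID."""
    cpus = defaultdict(set)
    for match in TASK_RE.finditer(log_text):
        cpus[int(match.group("pid"))].add(int(match.group("cpu")))
    return {pid: sorted(cpu_set) for pid, cpu_set in cpus.items()}


def shared_pids(log_text):
    """Return only the PIDs that the dump attributes to more than one CPU."""
    return {pid: cpus for pid, cpus in tasks_by_pid(log_text).items() if len(cpus) > 1}


# The two TASK lines copied from the log in the question.
SAMPLE = """\
TASK = 8f868ae0[827] 'modprobe' THREAD: 9fb48000 CPU: 0
TASK = 8f868ae0[827] 'modprobe' THREAD: 9fb48000 CPU: 1
"""

if __name__ == "__main__":
    print(shared_pids(SAMPLE))  # prints {827: [0, 1]}
```

Run over a saved console log, `shared_pids` returns an empty dict when every CPU reports a different task, so repeated captures can be compared quickly.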

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow