Linux system call from kernel crashing (weird offset)

https://stackoverflow.com/questions/9758869

24-05-2021

Question

I'm trying to invoke a system call from a kernel module. I have this code:

    set_fs( get_ds() );    // lets our module do the system-calls 


    // Save everything before systemcalling

    asm ("     push    %rax     "); 
    asm  ("     push    %rdi     "); 
    asm  ("     push    %rcx     "); 
    asm  ("     push    %rsi     "); 
    asm  ("     push    %rdx     "); 
    asm  ("     push    %r10     "); 
    asm  ("     push    %r8      "); 
    asm  ("     push    %r9      "); 
    asm  ("     push    %r11     "); 
    asm  ("     push    %r12     "); 
    asm  ("     push    %r15     "); 
    asm  ("     push    %rbp     "); 
    asm  ("     push    %rbx     "); 


    // Invoke the long sys_mknod(const char __user *filename, int mode, unsigned dev);

    asm volatile ("     movq    $133, %rax     "); // system call number

    asm volatile ("    lea    path(%rip), %rdi     "); // path is char path[] = ".."

    asm volatile ("     movq    mode, %rsi     "); // mode is S_IFCHR | ...

    asm volatile ("     movq    dev, %rdx     ");  // dev is 70 >> 8

    asm volatile ("     syscall     "); 


      // POP EVERYTHING 

    asm ("     pop     %rbx     "); 
    asm ("     pop        %rbp     "); 
    asm ("     pop     %r15     "); 
    asm ("     pop        %r12     "); 
    asm ("     pop        %r11     "); 
    asm ("     pop        %r9      "); 
    asm ("     pop        %r8      "); 
    asm ("     pop        %r10     "); 
    asm ("     pop        %rdx     "); 
    asm ("     pop        %rsi     "); 
    asm ("     pop        %rcx     "); 
    asm ("     pop        %rdi     "); 
    asm ("     pop     %rax     "); 



    set_fs( savedFS );    // restore the former address-limit value

This code doesn't work and crashes the system (it runs in a kernel module).

The disassembly of that piece of code, with relocation info, is:

  2c:    50                      push  %rax 
  2d:    57                      push  %rdi 
  2e:    51                      push  %rcx 
  2f:    56                      push  %rsi 
  30:    52                      push  %rdx 
  31:    41 52                    push  %r10 
  33:    41 50                    push  %r8 
  35:    41 51                    push  %r9 
  37:    41 53                    push  %r11 
  39:    41 54                    push  %r12 
  3b:    41 57                    push  %r15 
  3d:    55                      push  %rbp 
  3e:    53                      push  %rbx 
  3f:    48 c7 c0 85 00 00 00     mov    $0x85,%rax 
  46:    48 8d 3d 00 00 00 00     lea    0x0(%rip),%rdi        # 4d <init_module+0x4d> 
            49: R_X86_64_PC32    path-0x4 
  4d:    48 83 c7 04              add    $0x4,%rdi 
  51:    48 8b 34 25 00 00 00     mov    0x0,%rsi 
  58:    00 
            55: R_X86_64_32S    mode 
  59:    48 8b 14 25 00 00 00     mov    0x0,%rdx 
  60:    00 
            5d: R_X86_64_32S    dev 
  61:    0f 05                    syscall 
  63:    5b                      pop    %rbx 
  64:    5d                      pop    %rbp 
  65:    41 5f                    pop    %r15 
  67:    41 5c                    pop    %r12 
  69:    41 5b                    pop    %r11 
  6b:    41 59                    pop    %r9 
  6d:    41 58                    pop    %r8 
  6f:    41 5a                    pop    %r10 
  71:    5a                      pop    %rdx 
  72:    5e                      pop    %rsi 
  73:    59                      pop    %rcx 
  74:    5f                      pop    %rdi 
  75:    58                      pop    %rax

I'm wondering: why is there a -0x4 offset in the relocation 49: R_X86_64_PC32 path-0x4?

I mean, mode and dev should be resolved automatically without problems, but what about path? Why the -0x4 offset?

I tried to "compensate" for it with

    lea 0x0(%rip),%rdi    // this somehow adds a -0x4 offset
    add $0x4, %rdi
    ....

but the code still crashed.

Where am I going wrong?

Solution

My guess as to what's going on here is a stack problem. Unlike int $0x80, the syscall instruction will not set up a stack for the kernel. If you look at the actual code from system_call:, you'll see something like SWAPGS_UNSAFE_STACK. The meat of this macro is the SWAPGS instruction - see page 152 here. When kernel mode is entered, the kernel needs a way to pull a pointer to its data structures, and this instruction lets it do precisely that. It does so by swapping the user %gs base with a value saved in a model-specific register, from which the kernel can then pull the kernel-mode stack.

You could imagine that once the syscall entry point is invoked, this swap produces the wrong value because you were already in kernel mode, and the kernel then starts trying to use a bogus stack. You might try invoking SWAPGS manually, so that the kernel's SWAPGS results in what it expects, and see if that works.

Other tips

It seems that you can't do it that way. See the comment before system_call:

 /*
  * Register setup:
  * rax  system call number
  * rdi  arg0
  * rcx  return address for syscall/sysret, C arg3
  * rsi  arg1
  * rdx  arg2
  * r10  arg3    (--> moved to rcx for C)
  * r8   arg4
  * r9   arg5
  * r11  eflags for syscall/sysret, temporary for C
  * r12-r15,rbp,rbx saved by C code, not touched.
  *
  * Interrupts are off on entry.
  * Only called from user space.
  *
  * XXX  if we had a free scratch register we could save the RSP into the stack frame
  *      and report it properly in ps. Unfortunately we haven't.
  *
  * When user can change the frames always force IRET. That is because
  * it deals with uncanonical addresses better. SYSRET has trouble
  * with them due to bugs in both AMD and Intel CPUs.
  */

So you can't invoke syscall from the kernel. You can, however, try to use int $0x80 for that purpose. As far as I can see, the kernel_execve stub uses that trick.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow