Linux system call from kernel crashing (weird offset)
24-05-2021
Question
I'm trying to invoke a system call from a kernel module. I have this code:
set_fs( get_ds() ); // lets our module do the system-calls
// Save everything before systemcalling
asm (" push %rax ");
asm (" push %rdi ");
asm (" push %rcx ");
asm (" push %rsi ");
asm (" push %rdx ");
asm (" push %r10 ");
asm (" push %r8 ");
asm (" push %r9 ");
asm (" push %r11 ");
asm (" push %r12 ");
asm (" push %r15 ");
asm (" push %rbp ");
asm (" push %rbx ");
// Invoke the long sys_mknod(const char __user *filename, int mode, unsigned dev);
asm volatile (" movq $133, %rax "); // system call number
asm volatile (" lea path(%rip), %rdi "); // path is char path[] = ".."
asm volatile (" movq mode, %rsi "); // mode is S_IFCHR | ...
asm volatile (" movq dev, %rdx "); // dev is 70 >> 8
asm volatile (" syscall ");
// POP EVERYTHING
asm (" pop %rbx ");
asm (" pop %rbp ");
asm (" pop %r15 ");
asm (" pop %r12 ");
asm (" pop %r11 ");
asm (" pop %r9 ");
asm (" pop %r8 ");
asm (" pop %r10 ");
asm (" pop %rdx ");
asm (" pop %rsi ");
asm (" pop %rcx ");
asm (" pop %rdi ");
asm (" pop %rax ");
set_fs( savedFS ); // restore the former address-limit value
This code doesn't work and crashes the system (it's a kernel module).
The disassembly of that piece of code, with relocation information, is:
2c: 50 push %rax
2d: 57 push %rdi
2e: 51 push %rcx
2f: 56 push %rsi
30: 52 push %rdx
31: 41 52 push %r10
33: 41 50 push %r8
35: 41 51 push %r9
37: 41 53 push %r11
39: 41 54 push %r12
3b: 41 57 push %r15
3d: 55 push %rbp
3e: 53 push %rbx
3f: 48 c7 c0 85 00 00 00 mov $0x85,%rax
46: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # 4d <init_module+0x4d>
49: R_X86_64_PC32 path-0x4
4d: 48 83 c7 04 add $0x4,%rdi
51: 48 8b 34 25 00 00 00 mov 0x0,%rsi
58: 00
55: R_X86_64_32S mode
59: 48 8b 14 25 00 00 00 mov 0x0,%rdx
60: 00
5d: R_X86_64_32S dev
61: 0f 05 syscall
63: 5b pop %rbx
64: 5d pop %rbp
65: 41 5f pop %r15
67: 41 5c pop %r12
69: 41 5b pop %r11
6b: 41 59 pop %r9
6d: 41 58 pop %r8
6f: 41 5a pop %r10
71: 5a pop %rdx
72: 5e pop %rsi
73: 59 pop %rcx
74: 5f pop %rdi
75: 58 pop %rax
I'm wondering: why is there a -0x4 offset in the relocation 49: R_X86_64_PC32 path-0x4?
I mean, mode and dev should be resolved automatically without problems, but what about path? Why the -0x4 offset?
I tried to "compensate" for it with
lea 0x0(%rip),%rdi // this somehow adds a -0x4 offset
add $0x4, %rdi
....
but the code still crashed.
Where am I going wrong?
Solution
My guess is that this is a stack problem. Unlike int $0x80, the syscall instruction does not set up a stack for the kernel. If you look at the actual code at system_call, you'll see something like SWAPGS_UNSAFE_STACK. The meat of this macro is the SwapGS instruction (see page 152 here). When kernel mode is entered, the kernel needs a way to get a pointer to its data structures, and this instruction lets it do precisely that. It does so by swapping the user %gs base with a value saved in a model-specific register, from which the kernel can then find its kernel-mode stack.
Once the syscall entry point is invoked from code that is already in kernel mode, this swap presumably produces the wrong value, and the kernel starts trying to use a bogus stack. You might try invoking SwapGS manually beforehand, so that the kernel's own SwapGS results in what it expects, and see if that works.
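On the -0x4 offset asked about in the question: a RIP-relative operand is resolved relative to the address of the next instruction, while an R_X86_64_PC32 relocation is computed relative to the 4-byte displacement field it patches. In the lea from the dump that field starts at 0x49 and the next instruction starts at 0x4d, so the -4 addend simply cancels the difference. The extra add $0x4 is therefore not needed and leaves %rdi pointing 4 bytes past path. Below is a minimal sketch of that arithmetic in C; the function names are made up for illustration, and the sketch only models the address computation, not a real linker.

```c
#include <stdint.h>

/* Value the linker stores in the 4-byte field for an R_X86_64_PC32
 * relocation: S + A - P, where S is the symbol address, A the addend
 * and P the address of the field being patched.
 * (Hypothetical helper name, for illustration only.) */
static int32_t pc32_field(uint64_t sym, int64_t addend, uint64_t field_addr)
{
    return (int32_t)(sym + (uint64_t)addend - field_addr);
}

/* Address the CPU computes for a RIP-relative operand: the address of
 * the NEXT instruction plus the sign-extended 32-bit displacement.
 * (Hypothetical helper name, for illustration only.) */
static uint64_t rip_relative_target(uint64_t next_insn, int32_t disp)
{
    return next_insn + (uint64_t)(int64_t)disp;
}
```

For example, assuming a made-up load address of 0x1000 and path located at 0x3000, rip_relative_target(0x104d, pc32_field(0x3000, -4, 0x1049)) yields 0x3000, whereas an addend of 0 would yield 0x3004.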
Other tips
It seems that you can't do it that way. See the comment before system_call:
/*
* Register setup:
* rax system call number
* rdi arg0
* rcx return address for syscall/sysret, C arg3
* rsi arg1
* rdx arg2
* r10 arg3 (--> moved to rcx for C)
* r8 arg4
* r9 arg5
* r11 eflags for syscall/sysret, temporary for C
* r12-r15,rbp,rbx saved by C code, not touched.
*
* Interrupts are off on entry.
* Only called from user space.
*
* XXX if we had a free scratch register we could save the RSP into the stack frame
* and report it properly in ps. Unfortunately we haven't.
*
* When user can change the frames always force IRET. That is because
* it deals with uncanonical addresses better. SYSRET has trouble
* with them due to bugs in both AMD and Intel CPUs.
*/
So you can't use the syscall instruction from the kernel. But you can try using int $0x80 for that purpose. As far as I can see, the kernel_execve stub uses that trick.
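For comparison, the register setup described in that comment can be exercised from user space, which is where the syscall instruction is meant to be used. The sketch below is an illustration only and is not a fix for the kernel module. It assumes x86-64 Linux with GCC or Clang, and the helper name raw_syscall3 is hypothetical. It loads the registers through constraints in a single asm statement and declares rcx and r11 as clobbered, since the comment says syscall uses them for the return address and eflags. On other architectures it falls back to the libc syscall() wrapper, which reports errors through errno instead of a negative return value.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Issue a 3-argument system call from USER SPACE.
 * rax = system call number, rdi/rsi/rdx = arg0..arg2.
 * rcx and r11 are overwritten by the syscall instruction itself,
 * so they are listed as clobbers.
 * (raw_syscall3 is a hypothetical helper name.) */
static long raw_syscall3(long nr, long a0, long a1, long a2)
{
#if defined(__x86_64__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a0), "S"(a1), "d"(a2)
                      : "rcx", "r11", "memory");
    return ret;
#else
    /* Fallback for other architectures: let libc pick the mechanism. */
    return syscall(nr, a0, a1, a2);
#endif
}
```

Calling raw_syscall3(SYS_getpid, 0, 0, 0) should return the same value as getpid(). Keeping the register setup and the instruction in one asm statement also avoids relying on the compiler to leave registers untouched between separate asm statements.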