Far call into __USER32_CS from 64-bit code on Linux

https://stackoverflow.com/questions/18272384

24-06-2022

Question

Recently I realized that you can do this in 64-bit code:

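  /* Needs <stddef.h>, <stdint.h> and <sys/mman.h>. */
  const size_t kLowStackSize = 1024UL * 1024UL * 4UL;
  /* MAP_32BIT keeps the stack where a 32-bit stack pointer can address it. */
  void *low_stack = mmap(NULL, kLowStackSize, PROT_READ | PROT_WRITE,
      MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
  /* Far pointer operand: 32-bit offset followed by a 16-bit selector. */
  struct __attribute__((packed, aligned(16))) {
    uint32_t address;
    uint16_t segment;
  } target = {(uint32_t) (uint64_t) code, 0x23};
  asm volatile(
      "mov %%rsp, %%r8\n"       /* remember the 64-bit stack pointer */
      "mov %[stack], %%rsp\n"   /* switch to the low stack */
      "push %%r8\n"
      "lcall *(%[target])\n"    /* far call through the selector in target */
      "pop %%rsp"               /* restore the original stack pointer */
      :
      : [stack] "r" ((char *) low_stack + kLowStackSize), [target] "r" (&target)
      : "r8", "memory", "cc");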

where code points to a piece of 32-bit code located on an executable page in the lower 4 GiB of the address space, and 0x23 is the value of the __USER32_CS segment selector in Linux's x86 headers. I don't know whether the attributes are necessary for the far pointer, but I added them for good measure. Of course, to make the far return possible, the calling code itself must also be located somewhere in the lower 4 GiB of the virtual address space. I found that placing it into main is sufficient.
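On the question of the attributes, here is a minimal sketch of the layout that lcall expects to find in memory. The names far_ptr32, selector_index, selector_is_ldt and selector_rpl are made up for illustration. As far as I can tell, natural alignment already puts the selector directly after the 32-bit offset, so the attributes look optional rather than required.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: the far pointer operand read by `lcall *(%reg)`,
 * a 32-bit offset followed by a 16-bit selector (6 bytes are read). */
struct far_ptr32 {
    uint32_t address;   /* offset of the 32-bit code, must be below 4 GiB */
    uint16_t segment;   /* code segment selector, 0x23 in the question */
};

/* Without `packed`, the selector still sits at byte 4, right after the offset. */
_Static_assert(offsetof(struct far_ptr32, address) == 0, "offset comes first");
_Static_assert(offsetof(struct far_ptr32, segment) == 4, "selector follows it");

/* Selector fields: bits 15..3 table index, bit 2 table indicator, bits 1..0 RPL. */
static inline unsigned selector_index(uint16_t sel)  { return sel >> 3; }
static inline unsigned selector_is_ldt(uint16_t sel) { return (sel >> 2) & 1u; }
static inline unsigned selector_rpl(uint16_t sel)    { return sel & 3u; }
```

Applied to 0x23, the helpers give descriptor index 4 in the GDT with requested privilege level 3, which is how the magic number breaks down.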

I understand this is mostly useless (there are no 32-bit libraries loaded, the calling conventions are different, etc.) and prone to breakage (the value of __USER32_CS is not part of Linux's userspace-facing API).

My question: Is there a simple way to demonstrate that the target of the call is indeed executed in 32-bit mode? Are there any practical uses (existing software or libraries leveraging it, or at least not-so-impractical possibilities) for this kind of call?

Solution

In x86, the 32bit and 64bit instruction encodings are mostly identical.

The big exception is the 16 single-byte INC and DEC opcodes (0x40 through 0x4F). In 64bit mode these 16 bytes have been repurposed as the REX prefix family, which allows specifying a 64bit operand size as well as the use of the new registers available in 64bit mode.
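To make the double meaning of these bytes concrete, here is a small sketch that reads one byte both ways. The helper names are hypothetical and this is nowhere near a full decoder; it only covers the 0x40 to 0x4F range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bytes 0x40..0x4F: REX prefixes in 64-bit mode, INC/DEC r32 in 32-bit mode. */
static inline bool is_rex_or_incdec(uint8_t b) { return (b & 0xF0) == 0x40; }

/* 64-bit view: the low nibble carries the W, R, X and B bits. */
static inline bool rex_w(uint8_t b) { return (b >> 3) & 1; }  /* 64-bit operand size */
static inline bool rex_r(uint8_t b) { return (b >> 2) & 1; }  /* extends ModRM.reg */
static inline bool rex_x(uint8_t b) { return (b >> 1) & 1; }  /* extends SIB.index */
static inline bool rex_b(uint8_t b) { return b & 1; }         /* extends ModRM.rm / base */

/* 32-bit view: 0x40..0x47 is INC, 0x48..0x4F is DEC; the low 3 bits pick the register. */
static inline bool is_dec32(uint8_t b) { return (b & 0x08) != 0; }
static inline const char *reg32_name(uint8_t b) {
    static const char *const names[8] = {
        "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
    };
    return names[b & 7];
}
```

For the byte 0x48, the 64-bit reading is a REX prefix with only W set, while the 32-bit reading is a DEC of eax. The same bit 3 selects REX.W in one mode and DEC rather than INC in the other.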

This means 64bit code like:

    xorl %eax, %eax
    .byte 0x48, 0xff, 0xc8
    # the .byte line above is the same as:
    #   decq %rax         # opcode: 0x48 0xff 0xc8
    lret $0

is also valid 32bit code, but there it will be executed as:

    xorl %eax, %eax
    decl %eax         # opcode: 0x48
    decl %eax         # opcode: 0xff 0xc8
    lret $0

So you can lcall to this piece of code (it returns with lret, so it needs a far return address on the stack) and test the 32bit value left in %eax; it'll be -1 if executed in 64bit mode but -2 if executed in 32bit mode.
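The arithmetic behind the -1 versus -2 result can be checked without switching modes at all. The following is a toy model, not a real decoder and not a substitute for the actual far call: run_probe and probe are hypothetical names, and the function understands only the few byte patterns used in the snippet above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte encoding of: xorl %eax,%eax ; 0x48 0xff 0xc8 ; lret $0 */
static const uint8_t probe[] = {0x31, 0xc0, 0x48, 0xff, 0xc8, 0xca, 0x00, 0x00};

/* Toy model (assumption: only the patterns below ever occur).
 * long_mode == true reads 0x48 as a REX.W prefix, false reads it as dec %eax.
 * Returns the low 32 bits of the accumulator once lret is reached. */
static inline int32_t run_probe(const uint8_t *code, bool long_mode) {
    uint64_t rax = 0xdeadbeefcafef00dULL;  /* arbitrary garbage on entry */
    bool rex_w = false;
    for (;;) {
        uint8_t op = *code++;
        if (op == 0x48) {
            if (long_mode) { rex_w = true; continue; }  /* prefix for the next opcode */
            rax = (uint32_t)((uint32_t)rax - 1);        /* 32-bit mode: dec %eax */
        } else if (op == 0x31 && *code == 0xc0) {
            code++;
            rax = 0;                                    /* xor %eax,%eax */
        } else if (op == 0xff && *code == 0xc8) {
            code++;                                     /* dec %rax or dec %eax */
            rax = rex_w ? rax - 1 : (uint32_t)((uint32_t)rax - 1);
        } else {
            return (int32_t)(uint32_t)rax;              /* lret (or unknown byte): stop */
        }
        rex_w = false;
    }
}
```

Calling run_probe(probe, true) models the 64bit reading and run_probe(probe, false) the 32bit one, so comparing the two results mirrors the check you would do on %eax after the real far call.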

I do not know what the preconditions for far returns from 32bit to 64bit mode are. I suspect you might have to set up both a "low mem" 64bit stack pointer to start with as well as a low-mem 64bit code address "trampoline" (so that both the return EIP and return ESP in the far call frame are 32bit values).

Licensed under: CC-BY-SA with attribution
