EDIT: found a solution; skip to SOLUTION below.
First, ensure that the stack pointer (%rsp) is 16-byte aligned:
pushq %rbp
movq %rsp, %rbp
andq $-0x10, %rsp      # rsp = rsp & 0xfffffffffffffff0
This is problematic: it's normally the caller's responsibility to ensure %rsp is 16-byte aligned, and because %rbp is copied from %rsp before the andq, %rbp + 16·n might not be on a 16-byte boundary. So perhaps movq %rsp, %rbp should appear after the alignment of %rsp.
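To make the ordering problem concrete, here is a small C model of both prologue orderings that only tracks the values of %rsp and %rbp. It is a sketch: the struct and function names are hypothetical, and andq $-0x10 is modelled as clearing the low four bits, since -0x10 is 0xfffffffffffffff0 in two's complement.

```c
#include <assert.h>
#include <stdint.h>

/* Values of %rsp and %rbp after a simulated prologue (hypothetical model). */
struct frame {
    uint64_t rsp;
    uint64_t rbp;
};

/* andq $-0x10, reg: (uint64_t)-0x10 == 0xfffffffffffffff0 clears bits 0-3. */
static uint64_t and_minus_16(uint64_t reg)
{
    return reg & (uint64_t)-0x10;
}

int is_aligned16(uint64_t addr)
{
    return (addr & 0xF) == 0;
}

/* pushq %rbp; movq %rsp, %rbp; andq $-0x10, %rsp; subq $size, %rsp */
struct frame prologue_mov_then_and(uint64_t entry_rsp, uint64_t size)
{
    struct frame f;
    f.rsp = entry_rsp - 8;        /* pushq %rbp */
    f.rbp = f.rsp;                /* movq: %rbp keeps the unaligned value */
    f.rsp = and_minus_16(f.rsp);  /* andq: only %rsp gets aligned */
    f.rsp -= size;                /* subq */
    return f;
}

/* pushq %rbp; andq $-0x10, %rsp; movq %rsp, %rbp; subq $size, %rsp */
struct frame prologue_and_then_mov(uint64_t entry_rsp, uint64_t size)
{
    struct frame f;
    f.rsp = entry_rsp - 8;        /* pushq %rbp */
    f.rsp = and_minus_16(f.rsp);  /* andq first ... */
    f.rbp = f.rsp;                /* ... so %rbp is aligned as well */
    f.rsp -= size;                /* subq */
    return f;
}

/* Usage: entry %rsp = 0x1000 (16-byte aligned, no return address pushed). */
void example(void)
{
    struct frame a = prologue_mov_then_and(0x1000, 0xf0);
    assert(is_aligned16(a.rsp));          /* %rsp is aligned ...            */
    assert(!is_aligned16(a.rbp - 0xd0));  /* ... but -0xd0(%rbp) is not     */

    struct frame b = prologue_and_then_mov(0x1000, 0xf0);
    assert(is_aligned16(b.rbp - 0xd0));   /* reordered: %rbp slots aligned  */
}
```

One caveat with the reordered version: %rbp no longer points at the saved %rbp, so a standard movq %rbp, %rsp; popq %rbp (or leave) epilogue would restore the wrong value unless the original %rsp is kept somewhere else.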
sub $0xf0, %rsp allocates 0xf0 bytes of stack space; 0xf0 (240) is a multiple of 16, so it preserves whatever alignment %rsp already had. With %rsp == %rbp - 0xf0, movaps %xmm7, -0xd0(%rbp) is the same as movaps %xmm7, 0x20(%rsp). In other words, the SSE register is stored at %rsp + 32. movaps requires a 16-byte-aligned memory operand, so if %rsp is not 16-byte aligned, this raises a 'general protection exception', i.e., a segfault.
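The offset arithmetic can be checked with a small C model. The helper names are hypothetical, and it assumes the prologue is pushq %rbp; movq %rsp, %rbp; sub $0xf0, %rsp with no realignment, so that %rsp == %rbp - 0xf0.

```c
#include <assert.h>
#include <stdint.h>

/* Frame size from "sub $0xf0, %rsp"; 0xf0 == 240 == 15 * 16. */
#define FRAME_SIZE 0xf0u

/* Address written by "movaps %xmm7, -0xd0(%rbp)", given %rbp. */
uint64_t movaps_xmm7_addr(uint64_t rbp)
{
    uint64_t rsp = rbp - FRAME_SIZE;  /* after sub $0xf0, %rsp */
    uint64_t addr = rbp - 0xd0;       /* -0xd0(%rbp) */
    assert(addr == rsp + 0x20);       /* same slot as 0x20(%rsp) */
    return addr;
}

/* movaps needs a 16-byte-aligned memory operand; otherwise the CPU raises
   a general-protection exception (delivered as SIGSEGV on Linux). */
int movaps_would_fault(uint64_t addr)
{
    return (addr & 0xF) != 0;
}
```

Because 0x20 and 0xf0 are both multiples of 16, the store target is aligned exactly when %rbp (and hence %rsp) is: movaps_xmm7_addr(0x7ff0) gives 0x7f20, which is fine, while movaps_xmm7_addr(0x7ff8) gives 0x7f28, which would fault.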
Another issue you might encounter is reads/writes to -0x170(%rbp), which is -0x80(%rsp) (0x170 - 0xf0 = 0x80). That is the lowest byte of the 128-byte red zone, which spans -0x80(%rsp) through -0x1(%rsp), so those accesses are just inside it. As this is a leaf function, you are free to use the red zone, but not to write below it.
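The red-zone arithmetic as a C sketch. The names are hypothetical; it assumes the frame from the sub $0xf0, %rsp discussion (%rsp == %rbp - 0xf0) and the x86-64 SysV red zone of 128 bytes below %rsp.

```c
#include <assert.h>
#include <stdint.h>

#define FRAME_SIZE    0xf0  /* sub $0xf0, %rsp */
#define RED_ZONE_SIZE 128   /* x86-64 SysV: 128 bytes below %rsp */

/* Convert an %rbp-relative displacement to an %rsp-relative one,
   assuming %rsp == %rbp - FRAME_SIZE. */
int64_t rbp_to_rsp_disp(int64_t rbp_disp)
{
    return rbp_disp + FRAME_SIZE;
}

/* 1 if the access [rsp + disp, rsp + disp + len) lies entirely inside
   the red zone, i.e. inside [rsp - 128, rsp). */
int in_red_zone(int64_t disp, int64_t len)
{
    assert(len > 0);
    return disp >= -RED_ZONE_SIZE && disp + len <= 0;
}
```

With these helpers, -0x170(%rbp) maps to -0x80(%rsp), and an 8-byte access there is inside the red zone, whereas one at -0x88(%rsp) is not.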
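Note: a call pushes an 8-byte return address, so compiled code assumes %rsp + 8 is 16-byte aligned on entry. If your function is entered without a call (e.g., as the process entry point, where %rsp is already 16-byte aligned), you should subtract another 8 bytes from %rsp to restore 16-byte alignment. This will in turn affect the offsets relative to %rbp.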
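Where the extra 8 bytes come from, as a C sketch that tracks only %rsp modulo 16 through pushq %rbp and sub $frame, %rsp (the helper names are hypothetical). A call pushes an 8-byte return address, so code reached by a call starts with %rsp + 8 aligned, while code entered directly with an aligned %rsp starts 8 bytes off.

```c
#include <assert.h>
#include <stdint.h>

/* %rsp mod 16 after "pushq %rbp; sub $frame, %rsp", given %rsp at entry. */
unsigned rsp_mod16_after_prologue(uint64_t entry_rsp, uint64_t frame)
{
    uint64_t rsp = entry_rsp - 8;  /* pushq %rbp */
    rsp -= frame;                  /* sub $frame, %rsp */
    return (unsigned)(rsp & 0xF);
}

/* A call pushes an 8-byte return address before the callee runs. */
uint64_t rsp_after_call(uint64_t rsp_at_call_site)
{
    assert((rsp_at_call_site & 0xF) == 0);  /* ABI: aligned at the call */
    return rsp_at_call_site - 8;
}
```

With a 0xf0 frame, the called case ends up aligned (remainder 0) and the directly-entered case ends up with remainder 8; using 0xf8 instead fixes the direct-entry case but would misalign the called case, so the adjustment depends on how the function is entered.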
I haven't fact-checked this against the ABI standard and may have made some mistakes, so it's best to check the x86-64 SysV ABI (section 3.2).
SOLUTION: compiling the function above with the -mstackrealign flag explicitly aligns %rsp to a 16-byte boundary. I'm using clang on OS X, which is basically the same as x86-64 SysV (x86-64 ELF / Linux) with respect to calling conventions and alignment requirements:
clang -nostdlib -mstackrealign -c crt1.c
0000000000000000 pushq %rbp
0000000000000001 movq %rsp, %rbp
0000000000000004 andq $0xfffffffffffffff0, %rsp
000000000000000b subq $0x170, %rsp
0000000000000012 testb %al, %al
...
BTW, this avoids the %rbp issue entirely by making all loads/stores relative to %rsp. Consequently, the red zone isn't used, at least with Apple's LLVM 3.3-based clang.
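-mstackrealign applies to every function in the translation unit. If only one function needs realigning, GCC and clang also accept the x86 function attribute force_align_arg_pointer, which generates a prologue/epilogue that realigns the stack for that function alone. The sketch below uses a hypothetical variadic function, since variadic prologues are a typical source of movaps spills (a testb %al, %al like the one in the disassembly is the kind of check such a prologue performs before saving %xmm0-%xmm7). It has not been checked against the crt1.c function itself.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical example. On x86 targets, force_align_arg_pointer makes
   GCC/clang realign %rsp in this function's prologue, similar to what
   -mstackrealign does for every function. Other targets ignore the
   attribute with a warning. */
__attribute__((force_align_arg_pointer))
double sum_doubles(int n, ...)
{
    va_list ap;
    double total = 0.0;

    assert(n >= 0);
    va_start(ap, n);
    for (int i = 0; i < n; i++)
        total += va_arg(ap, double);  /* x86-64 SysV: first 8 come in %xmm0-%xmm7 */
    va_end(ap);
    return total;
}
```

For example, sum_doubles(3, 1.0, 2.0, 3.5) returns 6.5; the attribute changes only the prologue/epilogue, not the function's result.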