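Yes. When you call a variadic function, RAX (actually just AL) should hold the number of XMM registers used to pass arguments. The ABI only requires an upper bound, so a value larger than the exact count is also acceptable.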
Your stack alignment code is overcomplicated; normally you just do AND rsp, -16. Also, stack alignment is typically done only once (usually at the start of main), and after that it is maintained by always adjusting rsp by a suitable amount.
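To see why a single AND is enough, here is a small C model of the arithmetic. It is only an illustration: the helper name align_down_16 is made up, and a plain integer stands in for the value in rsp. In two's complement, -16 is a mask with the low four bits clear, so the AND rounds the value down to the previous multiple of 16, which is the safe direction on a stack that grows downward.

```c
#include <stdint.h>

/* Hypothetical helper: models what `and rsp, -16` does to the value in rsp. */
static uint64_t align_down_16(uint64_t rsp)
{
    /* (uint64_t)-16 is 0xFFFFFFFFFFFFFFF0, so the low four bits are cleared
       and the result is the nearest multiple of 16 that is <= rsp. */
    return rsp & (uint64_t)-16;
}
```

An already aligned value passes through unchanged, so applying the mask a second time is harmless.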
The SysV ABI doesn't use shadow space (that's the Microsoft convention). Instead it has a "red zone", but that doesn't affect the calling sequence.
Update about stack alignment:
In functions that are entered with an already aligned RSP (generally everything except main), you just make sure that any functions you call in turn get an RSP that has changed by a multiple of 16.
If you are using a standard frame pointer, your functions start with a PUSH RBP. The 8 bytes of the saved RBP and the 8-byte return address together make 16, so you only have to make sure you allocate space in multiples of 16 (if needed), like so:
push rbp
mov rbp, rsp
sub rsp, n*16
...
mov rsp, rbp
pop rbp
ret
Otherwise, you'll have to compensate for the 8 bytes of the return address (the saved RIP) that CALL put on the stack, as you correctly pointed out in your comment:
sub rsp, n*16+8
...
add rsp, n*16+8
ret
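The arithmetic behind the two prologues can be checked with a small C model. This is a sketch under stated assumptions: the function names are hypothetical, plain integers stand in for rsp, and the caller is assumed to have had rsp 16-byte aligned at the CALL, so the value at function entry is 8 below a multiple of 16.

```c
#include <stdint.h>

/* Hypothetical model: rsp after `push rbp` (8 bytes) and `sub rsp, n*16`. */
static uint64_t rsp_with_frame_pointer(uint64_t rsp_at_entry, uint64_t n)
{
    return rsp_at_entry - 8 - n * 16;
}

/* Hypothetical model: rsp after `sub rsp, n*16+8`, with no frame pointer. */
static uint64_t rsp_without_frame_pointer(uint64_t rsp_at_entry, uint64_t n)
{
    return rsp_at_entry - (n * 16 + 8);
}

/* Nonzero when the value is a multiple of 16. */
static int is_aligned_16(uint64_t rsp)
{
    return (rsp & 15) == 0;
}
```

In this model both prologues end at the same value for a given n, and that value is a multiple of 16 whenever the entry value is 8 below one. Subtracting n*16 alone, without the extra 8, leaves rsp misaligned.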
Both patterns are needed only if you call other functions; in leaf functions you can do whatever you want. In addition, the red zone I mentioned earlier is useful in leaf functions: you can use the 128 bytes below the stack pointer without explicit allocation, so you don't have to adjust RSP at all:
; in leaf functions you can use memory under the stack pointer
; (128 byte red zone)
mov [rsp-8], rax