Question

I have a function written in 64-bit x86 assembly (AT&T syntax for gcc and GAS) which performs some SSE2 operations. I've checked the result by using gdb with disassembly and looking at the register values, so I know it's producing the correct result. After the retq instruction, I get a segmentation fault. Since I'm new to assembly (and never took any classes on it), I'm guessing I'm not handling the function/main program interface correctly. The function takes in 2 pointers and an int and is expected to return a float. This is how I handle the inputs/output in my assembly function:

float foo(float *x,float *y,unsigned int s)
{
    __asm__ __volatile__(
    "movl   -0x14(%%rbp),%%ecx \n\t"   //ecx = s
    "movq   -0x8(%%rbp),%%rax \n\t"    //rax -> x
    "movq   -0x10(%%rbp),%%rdx \n\t"   //rdx -> y
    "subq   $4,%%rsp \n\t"             //function result
    #sse2 operations that end up with the answer in xmm4...
    "movss  %%xmm4,(%%rsp) \n\t"       //store result
    "flds   (%%rsp) \n\t"              //load function result
    "addq   $4,%%rsp \n\t"             //adjust stack
    "ret \n\t"
    :
    :"g"(s)
    :"%ecx","%rax","%rdx"
    );
}

And here is the line that seems to cause the segfault (which is the instruction right after ret in the disassembly):

0x00007fffffffe0d0 in ?? ()
=> 0x00007fffffffe0d0:  00 00   add    %al,(%rax)

I have no idea why it's adding the value in rax's low byte to the memory rax points to after executing my function, but it seems to be crashing things. Am I not allowed to use rax in my assembly function even though it's general purpose and I'm declaring it clobbered?

I'm not sure if you need to see this part, but this is how gcc expects to handle the function; I've included the disassembly of the line that calls my function :

    #asm dealing with function inputs
    callq  0x400520 <foo>
    movss  %xmm0,-0x48(%rbp)
    mov    -0x48(%rbp),%eax
    mov    %eax,-0x34(%rbp)

Which brings me to my second question, why is it arbitrarily moving the value in xmm0 to two places? Should I have had my function end up with a result in xmm0 or does this instead mean I should avoid using xmm0? I'm very confused and would appreciate any help. Thanks in advance for anyone who took time to read my noob post :)

Solution

Your problem is that inline assembly does not replace the function. Your function compiles to this:

_foo:
 push   %rbp              ; function prologue
 mov    %rsp,%rbp
 mov    %rdi,-0x8(%rbp)
 mov    %rsi,-0x10(%rbp)
 mov    %edx,-0x14(%rbp)
 mov    -0x14(%rbp),%eax
 mov    %eax,-0x1c(%rbp)

 mov    -0x14(%rbp),%ecx  ; your code
 mov    -0x8(%rbp),%rax
 mov    -0x10(%rbp),%rdx
 sub    $0x4,%rsp
 movss  %xmm4,(%rsp)
 flds   (%rsp)
 add    $0x4,%rsp
 retq                     ; your return

 movss  -0x18(%rbp),%xmm0 ; function epilogue
 pop    %rbp
 retq                     ; gcc's return

retq pops a value off the stack and jumps to it. If everything goes right, that value is the return address pushed by callq. But gcc generated a function prologue (the first two instructions above) that includes push %rbp, and your sub/add of 4 on %rsp cancel out, so when your retq runs, the top of the stack holds the saved %rbp (a pointer into the stack), and retq jumps there. This most likely causes the segmentation fault because the stack is not executable (if for some reason your stack is executable, it could instead crash because %rax is an invalid pointer). The bytes it landed on are 00 00 (which show up a lot in memory, unsurprisingly), and they coincidentally disassemble to add %al,(%rax). That instruction is not something gcc emitted after your function; it is just stack data being decoded as code.
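To make that concrete, here is a toy C model of the stack at the moment your retq runs. It is only an illustration, not real stack manipulation: ToyStack, toy_push, toy_pop, and early_ret_target are hypothetical names, and 0x400600 in the usage note is a made-up return address, while 0x7fffffffe0d0 is the address from your gdb output.

```c
#include <stdint.h>

/* Toy model of the stack seen by foo; it grows toward lower indices,
   like the real x86 stack grows toward lower addresses. */
enum { STACK_SLOTS = 8 };

typedef struct {
    uint64_t slot[STACK_SLOTS];
    int top;                      /* index of the most recently pushed slot */
} ToyStack;

static void toy_push(ToyStack *s, uint64_t v) { s->slot[--s->top] = v; }
static uint64_t toy_pop(ToyStack *s)          { return s->slot[s->top++]; }

/* Returns the address that a ret placed before the epilogue would jump to. */
uint64_t early_ret_target(uint64_t return_address, uint64_t saved_rbp)
{
    ToyStack s = { {0}, STACK_SLOTS };
    toy_push(&s, return_address); /* callq pushes the return address */
    toy_push(&s, saved_rbp);      /* prologue: push %rbp */
    return toy_pop(&s);           /* ret pops the top slot: the saved rbp */
}
```

Calling early_ret_target(0x400600ULL, 0x7fffffffe0d0ULL) gives back the second argument, the saved frame pointer, rather than the return address. gcc's own retq works because the pop %rbp in the epilogue removes that slot first.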

Now, I'm new to SSE, and I've only used GCC inline assembly a handful of times, so I'm not sure if this is a working solution. You really shouldn't be reading arguments off the stack, or returning, from inside inline assembly, because different compilers (and different optimization levels) generate different function prologues, which changes where the arguments sit relative to %rbp by the time your code runs.

Try something like:

#include <stdio.h>

float foo(float *x,float *y,unsigned int s)
{
    float result;

    __asm__ __volatile__(
    "movss  (%%rax),%%xmm4 \n\t"       // xmm4 = *x
    "movss  (%%rdx),%%xmm5 \n\t"       // xmm5 = *y
    "addss  %%xmm5,%%xmm4  \n\t"       // xmm4 += xmm5

    "movss  %%xmm4,(%%rbx) \n\t"       // result = xmm4
    :
    :"c"(s), "a"(x), "d"(y), "b"(&result)  // ecx = s, rax = x, rdx = y, rbx = &result
    :"memory", "cc", "xmm4", "xmm5"        // result is written through rbx; xmm4 and xmm5 are overwritten
    );

    return result;
}

int main() {
    float x = 1.0, y = 2.0;
    printf("%f", foo(&x, &y, 99));
    return 0;
}

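Here all stack allocation, argument handling, and returning are done in C; the asm only receives a pointer through which it stores the float result. As for your second question: on x86-64 a float return value is passed back in %xmm0, which is why the caller reads %xmm0 right after the call (the two stores are just unoptimized code copying it through a temporary). So the result does belong in %xmm0, and the flds in your version, which loads the x87 stack instead, is not needed.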

This generates the following assembly, which is roughly what you were looking for:

_foo:
 push   %rbp              ; prologue
 mov    %rsp,%rbp
 push   %rbx

 lea    -0xc(%rbp),%rbx   ; set up registers
 mov    %edx,%ecx
 mov    %rdi,%rax
 mov    %rsi,%rdx

 movss  (%rax),%xmm4      ; your code
 movss  (%rdx),%xmm5
 addss  %xmm5,%xmm4
 movss  %xmm4,(%rbx)

 movss  -0xc(%rbp),%xmm0  ; retrieve result to xmm0 (the return register)

 pop    %rbx              ; epilogue
 pop    %rbp
 retq   
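If you don't need specific registers, a variant worth trying is to let gcc pick them through operand constraints. The sketch below is not taken from the code above: foo_constraints is a hypothetical name, it assumes GCC or Clang extended asm, and it falls back to plain C on targets other than x86-64. The "x" constraint asks for an SSE register, and "+x" marks result as both read and written.

```c
/* Hypothetical variant of foo: the compiler chooses the registers. */
float foo_constraints(const float *x, const float *y, unsigned int s)
{
    float result = *x;            /* load *x in C */
    (void)s;                      /* unused in this small example */
#if defined(__x86_64__)
    __asm__("addss %1, %0"        /* result += *y (AT&T order: src, dst) */
            : "+x"(result)        /* read-write operand in an SSE register */
            : "xm"(*y));          /* input in an SSE register or in memory */
#else
    result += *y;                 /* fallback so the file builds elsewhere */
#endif
    return result;
}
```

Because the registers are operands rather than hard-coded names, no clobber list is needed here, and the compiler is free to keep result in %xmm0 for the return.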

The other option is always to write it in an assembly file, and link that with your C code later.
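As a rough sketch of that approach without leaving the C file: GCC and Clang accept a file-scope asm block, which behaves much like a separate assembly file. This assumes x86-64 ELF (e.g. Linux), where x arrives in %rdi, y in %rsi, and a float is returned in %xmm0; foo_asm is a hypothetical name, and the plain C fallback is only there so the file builds on other targets.

```c
/* Prototype so C callers know the signature; on x86-64 ELF the body
   is supplied by the assembly below. */
float foo_asm(const float *x, const float *y, unsigned int s);

#if defined(__x86_64__) && defined(__ELF__)
__asm__(
    ".pushsection .text\n\t"
    ".globl foo_asm\n\t"
    ".type foo_asm, @function\n"
    "foo_asm:\n\t"
    "movss (%rdi), %xmm0\n\t"     /* xmm0 = *x */
    "addss (%rsi), %xmm0\n\t"     /* xmm0 += *y */
    "ret\n\t"                     /* result is already in xmm0 */
    ".size foo_asm, .-foo_asm\n\t"
    ".popsection\n"
);
#else
float foo_asm(const float *x, const float *y, unsigned int s)
{
    (void)s;
    return *x + *y;               /* fallback for other targets */
}
#endif
```

Since there is no compiler-generated prologue in this version, ret pops exactly the address that callq pushed.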

I hope that was somewhat helpful, but I'm sorry if it didn't fully answer your question.

Edit: updated code to something that actually runs for me.

Licensed under: CC-BY-SA with attribution