The mistake here is that one has to be careful when using movupd. This instruction always moves 128 bits (two doubles) of memory, both on load and on store.
By chance the first function has room to copy these values out, but the second one has only 64 bits of space in the ret variable, so the store writes 8 bytes past it. Presumably this corrupts the stack and leads to undefined behaviour.
After substituting movupd with movlpd (or movhpd), which move only 64 bits, things work like a charm.
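To make the size difference concrete, here is a minimal sketch using the SSE2 intrinsics that correspond to these instructions. The intrinsics are not what the original code uses, and the function name store_demo is made up for illustration; the point is only to show how many bytes each store writes.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: performs a full 128-bit store into `full`
// and a 64-bit low-half store into `low`.
void store_demo(double full[2], double low[2]) {
    __m128d v = _mm_set_pd(2.0, 1.0);  // high half = 2.0, low half = 1.0

    _mm_storeu_pd(full, v);  // like movupd: writes 16 bytes, full[0] and full[1]
    _mm_storel_pd(low, v);   // like movlpd: writes 8 bytes, only low[0]
}
```

If both arrays start out zeroed, full ends up as {1.0, 2.0} while low ends up as {1.0, 0.0}. A movupd store into a single double, as in the second function, would write the second element past the end of the variable.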
Am I still clobbering the right registers?
The following code works just fine when compiled with g++ -O3 -o asm_test asm_test.cpp:
void my_func(const double *in, double *out) {
    asm ("mov %0, %%r8" : : "r"(in));
    asm ("movhpd (%%r8), %%xmm0" :);
    asm ("movhpd (%%r8), %%xmm1" :);
    asm ("addpd %%xmm1, %%xmm0" :);
    asm ("movhpd %%xmm0, (%0)" : : "r"(out) : "memory", "%r8", "%xmm0", "%xmm1");
}

double my_func2(const double *in) {
    double ret;
    asm("mov %0, %%r8" : : "r"(in));
    asm("movlpd (%%r8), %%xmm0" :);
    asm("movlpd (%%r8), %%xmm1" :);
    asm("addpd %%xmm1, %%xmm0" :);
    asm("movlpd %%xmm0, %0" : "=m"(ret) : : "memory", "%r8", "%xmm0", "%xmm1");
    return ret;
}
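On the clobber question: in the code above the clobber list is attached only to the last asm statement, while r8, xmm0 and xmm1 are already modified by the earlier ones. The GCC documentation also warns not to expect a sequence of asm statements to stay consecutive after compilation, so the compiler is free to use those registers in between. A possible way around this, sketched here under the assumption of x86-64 and GCC/Clang extended asm, is to put each function into a single asm statement and let the compiler supply the memory operands. The names my_func_single and my_func2_single are hypothetical.

```cpp
// Sketch: same arithmetic as my_func, but as one asm statement.
// The "m" constraints tell the compiler exactly which memory is read
// and written, so neither r8 nor a "memory" clobber is needed.
void my_func_single(const double *in, double *out) {
    asm ("movhpd %1, %%xmm0\n\t"        // high half of xmm0 = *in
         "movhpd %1, %%xmm1\n\t"        // high half of xmm1 = *in
         "addpd  %%xmm1, %%xmm0\n\t"    // packed add; only the high half is used
         "movhpd %%xmm0, %0"            // store 64 bits to *out
         : "=m"(*out)
         : "m"(*in)
         : "xmm0", "xmm1");
}

// Sketch: same arithmetic as my_func2, using the low halves.
double my_func2_single(const double *in) {
    double ret;
    asm ("movlpd %1, %%xmm0\n\t"        // low half of xmm0 = *in
         "movlpd %1, %%xmm1\n\t"        // low half of xmm1 = *in
         "addpd  %%xmm1, %%xmm0\n\t"    // packed add; only the low half is used
         "movlpd %%xmm0, %0"            // store 64 bits to ret
         : "=m"(ret)
         : "m"(*in)
         : "xmm0", "xmm1");
    return ret;
}
```

Both variants compute in[0] + in[0], as the original functions do. The unused half of each register holds whatever was there before, which addpd also adds, but that half is never stored.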