Question

I'm trying to optimize the following C macro:

#define rotate(v0, v1) a0 = v0, b0 = v1, v0 = a0*c - b0*s, v1 = a0*s + b0*c

where all variables are doubles. The target is a Cortex-A8 processor.

My inline assembly looks like this:

            __asm__ __volatile__("vmul.f64 %[v0], %[a0], %[c];\n\t"
                                 "vmul.f64 %[v1], %[a0], %[s];\n\t"
                                 "vmls.f64 %[v0], %[b0], %[s];\n\t"
                                 "vmla.f64 %[v1], %[b0], %[c];\n\t"
                                 :[v0]"=w"(v0), [v1]"=w"(v1)
                                 :[s]"w"(s), [c]"w"(c),
                                  [a0]"w"(v0), [b0]"w"(v1)
                                 :);

The generated assembly looks like this:

@ InlineAsm Start
vmul.f64 d13, d13, d9;
vmul.f64 d12, d13, d8;
vmls.f64 d13, d12, d8;
vmla.f64 d12, d12, d9;
@ InlineAsm End

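As you can see, the compiler uses only 4 registers instead of the 6 that are needed to get the correct result: the outputs v0 and v1 are assigned the same registers as the inputs a0 and b0, so the first two multiplies overwrite values that later instructions still read.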
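To make the effect of the register sharing concrete, here is a small plain-C model of the four generated instructions. It is only an illustration: the function names `rotate_ref` and `rotate_aliased` are made up for this sketch, and the local variables are named after the registers in the listing above (d13 = v0/a0, d12 = v1/b0, d9 = c, d8 = s).

```c
/* What the rotate macro is meant to compute. */
void rotate_ref(double *v0, double *v1, double c, double s)
{
    double a0 = *v0, b0 = *v1;
    *v0 = a0 * c - b0 * s;
    *v1 = a0 * s + b0 * c;
}

/* Plain-C model of the generated code, with the same register sharing. */
void rotate_aliased(double *v0, double *v1, double c, double s)
{
    double d13 = *v0, d12 = *v1, d9 = c, d8 = s;

    d13 = d13 * d9;        /* vmul.f64 d13, d13, d9 : a0 is overwritten with a0*c */
    d12 = d13 * d8;        /* vmul.f64 d12, d13, d8 : reads a0*c, and b0 is lost  */
    d13 = d13 - d12 * d8;  /* vmls.f64 d13, d12, d8 : d12 no longer holds b0      */
    d12 = d12 + d12 * d9;  /* vmla.f64 d12, d12, d9 : same problem                */

    *v0 = d13;
    *v1 = d12;
}
```

Calling both functions with the same inputs (for example v0 = 1, v1 = 2, c = 0.6, s = 0.8) gives different results, because neither a0 nor b0 survives past the second instruction.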

How can I tell the compiler that I need 6 registers?
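For reference, one mechanism that GCC-style extended asm provides for this kind of aliasing is the early-clobber modifier `&` on output operands. It tells the compiler that an output is written before all inputs have been consumed, so it must not share a register with any input. The following is a minimal sketch under that assumption, not a verified fix: `rotate_pair` is a hypothetical helper name, c and s are passed as parameters, the plain-C branch exists only so the sketch builds on non-ARM hosts, and the ARM branch has not been run on a Cortex-A8.

```c
/* Hypothetical helper; rotates (*v0, *v1) using precomputed c and s. */
void rotate_pair(double *v0, double *v1, double c, double s)
{
    double a0 = *v0, b0 = *v1;
    double r0, r1;

#if defined(__arm__) && defined(__VFP_FP__) && !defined(__SOFTFP__)
    /* "=&w": early-clobber outputs, so r0 and r1 get registers that are
       distinct from those holding a0, b0, c and s (6 registers in total). */
    __asm__ __volatile__("vmul.f64 %[r0], %[a0], %[c]\n\t"
                         "vmul.f64 %[r1], %[a0], %[s]\n\t"
                         "vmls.f64 %[r0], %[b0], %[s]\n\t"
                         "vmla.f64 %[r1], %[b0], %[c]\n\t"
                         : [r0] "=&w"(r0), [r1] "=&w"(r1)
                         : [s] "w"(s), [c] "w"(c),
                           [a0] "w"(a0), [b0] "w"(b0));
#else
    /* Fallback for hosts without 32-bit ARM VFP: same arithmetic in C. */
    r0 = a0 * c - b0 * s;
    r1 = a0 * s + b0 * c;
#endif

    *v0 = r0;
    *v1 = r1;
}
```

Using separate locals for the inputs (a0, b0) and the outputs (r0, r1) also keeps the operand lists easy to read, since no C variable appears as both an input and an output.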

No correct solution

Licensed under: CC-BY-SA with attribution