Question

I've been looking for the answer for how to use BSWAP for lower 32-bit sub-register of 64-bit register. For example, 0x0123456789abcdef is inside RAX register, and I want to change it to 0x01234567efcdab89 with a single instruction (because of performance).

So I tried following inline function:

#define BSWAP(T) {  \
    __asm__ __volatile__ (  \
            "bswap %k0" \
            : "=q" (T)  \
            : "q" (T)); \
}

And the result was 0x00000000efcdab89. I don't understand why the compiler acts like this. Does anybody know the efficient solution?

Was it helpful?

Solution

Ah, yes, I understand the problem now:

the x86-64 processors implicitly zero-extend the 32-bit registers to 64-bit when doing 32-bit operations (on %eax, %ebx, etc). This is to maintain compatibility with legacy code that expects 32-bit semantics for these registers, as I understand it.

So I'm afraid that there is no way to do ror on just the lower 32 bits of a 64-bit register. You'll have to do use a series of several instructions...

OTHER TIPS

Check the assembly output generated by gcc! Use the gcc -s flag to compile the code and generate asm output.

IIRC, x86-64 uses 32-bit integers by default when not explicitly directed to do otherwise, so this may be (part of) the problem.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top