Question

I have some SIMD code in AltiVec that processes 32-bit integer values in parallel. In some cases I want to load the integers as little-endian, in other cases as big-endian (note: this choice is independent of the native CPU endianness; it depends on which algorithm is running). Doing the actual byte swap is very easy using AltiVec's permute operations, as documented by Apple.
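For reference, the byte swap is a single permute with a constant control vector. Real AltiVec code needs a PowerPC compiler with `-maltivec`, so the sketch below is a plain-C model of what `vec_perm` does, not Apple's sample code. `perm16`, `bswap32_control` and `bswap32x4` are hypothetical names, and byte indices are taken in memory order, as they appear after a big-endian `vec_ld` on a G4/G5. With actual AltiVec you would write `vec_perm(v, v, ctl)` with `ctl` holding the same 16 indices.

```c
#include <stdint.h>

/* Scalar model of AltiVec vec_perm(a, b, control): result byte i is
 * byte (control[i] & 0x1F) of the 32 bytes formed by a followed by b. */
static void perm16(const uint8_t a[16], const uint8_t b[16],
                   const uint8_t control[16], uint8_t out[16])
{
    for (int i = 0; i < 16; i++) {
        unsigned idx = control[i] & 0x1Fu;   /* only the low 5 bits count */
        out[i] = (idx < 16) ? a[idx] : b[idx - 16];
    }
}

/* Control vector that reverses the 4 bytes inside each 32-bit lane. */
static const uint8_t bswap32_control[16] = {
    3, 2, 1, 0,  7, 6, 5, 4,  11, 10, 9, 8,  15, 14, 13, 12
};

/* Byte-swap all four 32-bit words held in a 16-byte buffer. */
static void bswap32x4(const uint8_t in[16], uint8_t out[16])
{
    perm16(in, in, bswap32_control, out);
}
```

Because this control vector is its own inverse, applying `bswap32x4` twice returns the original bytes, so the same constant serves for both loads and stores.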

The part I'm worried about is that PowerPC can run in either big- or little-endian mode, so I don't know whether the swap belongs on the little-endian loads/stores or on the big-endian ones. (Currently my code always swaps for little-endian memory operations and never for big-endian ones, which works fine on the 970 I'm using, since of course it runs big-endian.)
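The decision being asked about reduces to one comparison: swap only when the data's byte order differs from the byte order the CPU presents to software at run time. A minimal scalar sketch under that assumption follows; `host_is_little_endian`, `needs_swap`, `bswap32` and `load_u32` are hypothetical helpers, and in the AltiVec version `needs_swap` would choose between a plain load and a load followed by the permute. With GCC or Clang the same test can be made at compile time by comparing `__BYTE_ORDER__` with `__ORDER_LITTLE_ENDIAN__`. This does not cover any AltiVec-specific quirks of little-endian mode such as the one the Wikipedia quote hints at.

```c
#include <stdint.h>
#include <string.h>

/* Runtime check of the byte order the CPU is currently using. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);   /* lowest-addressed byte of probe */
    return first == 1;
}

/* A swap is needed only when the data's order differs from the host's. */
static int needs_swap(int data_is_little_endian)
{
    return host_is_little_endian() != (data_is_little_endian != 0);
}

/* Reverse the 4 bytes of a 32-bit value. */
static uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Load a 32-bit value stored in the given byte order, on any host. */
static uint32_t load_u32(const void *p, int data_is_little_endian)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);     /* native-order load */
    return needs_swap(data_is_little_endian) ? bswap32(v) : v;
}
```

For the bytes `11 22 33 44`, `load_u32(p, 1)` yields `0x44332211` and `load_u32(p, 0)` yields `0x11223344` on both big- and little-endian hosts.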

From what I can find, PPCs in little-endian mode are relatively rare, but they do exist, and ideally I'd like to have my code work correctly and quickly regardless of mode.

Is there a way to handle big- and little-endian loads into AltiVec registers regardless of CPU endianness? Are there other issues related to this I should know about? Wikipedia has the (uncited, naturally) statement:

"AltiVec operations, despite being 128-bit, are treated as if they were 64-bit. This allows for compatibility with little-endian motherboards that were designed prior to AltiVec."

which makes me think there may be other nastiness specific to AltiVec in little-endian mode.


Solution

Pretty much all PowerPC code out there assumes big-endian, and all ARM code out there assumes little-endian.

There are a few specialized cases where little-endian mode is used (apparently VirtualPC relied on it and so initially didn't work on the G5, which doesn't implement it), but I wouldn't worry about these.

ARM has a similar problem in big-endian mode: doubles are mixed-endian. The "pseudo-endianness" is achieved by XORing the low-order address bits with 0x3 for byte accesses and 0x2 for halfword accesses, which reverses the effective byte order within each 32-bit word. That breaks for 64-bit accesses, because the two 32-bit words of a 64-bit value are left in their original order. I suspect PowerPC uses the same trick, applied across 64 bits instead of 32.
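The address-munging idea can be checked with a toy model. In the hypothetical `roundtrip` function below, software with a big-endian view stores a value byte by byte, most significant byte first; each byte address is XORed with `mask` before it reaches a little-endian "physical" memory, and the hardware then reads the bytes back with one native access of the same width. Only byte accesses are modelled (halfword munging with 0x2 is left out). `mask = 3` is the 32-bit scheme described above; `mask = 7` stands in for the "64 bits at a time" variant suspected for PowerPC.

```c
#include <stdint.h>

/* Hypothetical toy model of address munging (byte accesses only).
 * Big-endian-view software stores `value` one byte at a time, MSB at
 * the lowest address; each byte address is XORed with `mask` before it
 * reaches little-endian physical memory. The hardware then reads the
 * same nbytes back with one native little-endian load.
 * Intended for (nbytes, mask) = (4, 3), (8, 3) or (8, 7). */
static uint64_t roundtrip(uint64_t value, unsigned nbytes, unsigned mask)
{
    uint8_t phys[8] = {0};

    for (unsigned k = 0; k < nbytes; k++) {
        unsigned shift = 8u * (nbytes - 1u - k);      /* MSB first */
        phys[(k ^ mask) & 7u] = (uint8_t)(value >> shift);
    }

    uint64_t v = 0;
    for (unsigned i = nbytes; i-- > 0; )              /* native LE load */
        v = (v << 8) | phys[i];
    return v;
}
```

With `mask = 3`, a 4-byte value survives the round trip unchanged, but `roundtrip(0x0102030405060708, 8, 3)` returns `0x0506070801020304`: each 32-bit word is intact while the two words are swapped, which is exactly the mixed-endian result. With `mask = 7` the 8-byte value comes back unchanged.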

Licensed under: CC-BY-SA with attribution