Question

I'm trying to reverse the order of the elements in a 128-bit vector (uint16x8_t).

For example if I have

a b c d e f g h

I would like to obtain

h g f e d c b a

Is there a simple way to do that with NEON intrinsics? I tried VREV, but it doesn't work.


Solution

You want the vrev64.16 instruction. However, it only reverses the 16-bit elements within each 64-bit double register; it does not swap the two double registers that make up a quad register. For that you need an additional vswp.

For intrinsics

q = vrev64q_u16(q)

should do the trick for reversing the elements inside each doubleword; you then need to swap the two doublewords in the quad register. That part is cumbersome because there is no direct vswp intrinsic, which forces you to use something like

q = vcombine_u16(vget_high_u16(q), vget_low_u16(q))

which actually ends up as a vswp instruction.
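To make the two steps concrete, here is a plain scalar sketch of what each instruction does to the eight lanes. It is only a model for reasoning about lane order, not NEON code; the function names model_vrev64_16 and model_vswp are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of vrev64.16 on a quad register: reverse the four 16-bit
   lanes inside each 64-bit half independently.
   a b c d | e f g h  ->  d c b a | h g f e */
static void model_vrev64_16(uint16_t v[8]) {
    for (int half = 0; half < 2; half++) {
        uint16_t *d = v + 4 * half;
        uint16_t t0 = d[0], t1 = d[1];
        d[0] = d[3];
        d[1] = d[2];
        d[2] = t1;
        d[3] = t0;
    }
}

/* Scalar model of vswp on the two double registers of a quad register.
   d c b a | h g f e  ->  h g f e | d c b a */
static void model_vswp(uint16_t v[8]) {
    uint16_t tmp[4];
    memcpy(tmp, v, sizeof tmp);
    memcpy(v, v + 4, sizeof tmp);
    memcpy(v + 4, tmp, sizeof tmp);
}
```

Applying model_vrev64_16 alone to a b c d e f g h leaves d c b a h g f e, which is why VREV by itself looks like it "doesn't work"; following it with model_vswp gives the full reversal.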

See below for an example.

#include <stdio.h>
#include <stdlib.h>
#include <arm_neon.h>

int main() {
    uint16_t s[] = {0x101, 0x102, 0x103, 0x104, 0x105, 0x106, 0x107, 0x108};
    uint16_t *t = malloc(sizeof(uint16_t) * 8);
    for (int i = 0; i < 8; i++) {
        t[i] = 0;
    }
    uint16x8_t a = vld1q_u16(s);
    a = vrev64q_u16(a);
    a = vcombine_u16(vget_high_u16(a), vget_low_u16(a));
    vst1q_u16(t, a);
    for (int i = 0; i < 8; i++) {
        printf("0x%3x ", t[i]);
    }
    printf("\n");
    free(t);
    return 0;
}

which generates assembly like the following

vld1.16 {d16-d17}, [sp:64]
movs    r4, #0
vrev64.16   q8, q8
vswp    d16, d17
vst1.16 {d16-d17}, [r5]

and outputs

$ rev
0x108 0x107 0x106 0x105 0x104 0x103 0x102 0x101
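If you need this in more than one place, it may be convenient to wrap the two steps in a small helper. The sketch below is one possible shape, not the only one: the name reverse_u16x8 is hypothetical, and it swaps the halves with vextq_u16(v, v, 4) instead of vcombine_u16, which should produce the same lane order. The scalar fallback is an addition so that the function also builds on targets without NEON.

```c
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: write the eight values of src into dst in reverse
   order. dst must not overlap src (the scalar fallback relies on that). */
static void reverse_u16x8(const uint16_t src[8], uint16_t dst[8]) {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint16x8_t v = vld1q_u16(src);
    v = vrev64q_u16(v);      /* a b c d | e f g h -> d c b a | h g f e */
    v = vextq_u16(v, v, 4);  /* rotate by four lanes: h g f e | d c b a */
    vst1q_u16(dst, v);
#else
    /* Portable fallback for builds without NEON. */
    for (int i = 0; i < 8; i++) {
        dst[i] = src[7 - i];
    }
#endif
}
```

Calling it with the array s from the example above and an eight-element destination should give the same reversed sequence as the program output. Which instructions the compiler actually emits for the half swap depends on the compiler and target, so check the generated assembly if that matters.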
Licensed under: CC-BY-SA with attribution