Question

I don't understand how to differentiate between vbit, vbsl and vbif with NEON intrinsics. I need to do the vbit operation, but if I use the vbslq intrinsic I don't get what I want.

For example I have a source vector like this:

uint8x16_t source = 39 62 9b 52 34 5b 47 48 47 35 0 0 0 0 0 0

The destination vector is:

uint8x16_t destination = 0 0 0 0 0 0 0 0 0 0 0 0 c3 c8 c5 d5

I would like to have as an output this:

39 62 9b 52 34 5b 47 48 47 35 0 0 c3 c8 c5 d5

meaning that I want to copy the first ten bytes from the source and leave the other 6 unchanged. I'm using this mask:

{0,0,0,0,0,0,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF};

What is the correct way to use vbslq_u8?

Solution

The ARM documentation is not very clear, but it looks like you would need to use the intrinsic like this:

uint8x16_t src =  {0x39,0x62,0x9b,0x52,0x34,0x5b,0x47,0x48,
                   0x47,0x35,0x00,0x00,0x00,0x00,0x00,0x00};
uint8x16_t dest = {0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
                   0x00,0x00,0x00,0x00,0xc3,0xc8,0xc5,0xd5};
uint8x16_t mask = {0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,
                   0xff,0xff,0x00,0x00,0x00,0x00,0x00,0x00};

dest = vbslq_u8(mask, src, dest);

Note that the order of bytes in the mask needs to correspond to the order of bytes in the source/dest registers. The mask in your question seems to be reversed: it selects the last ten bytes rather than the first ten.

Also note that the first parameter to the intrinsic appears to be the selection mask: a 1 bit selects the corresponding bit from the second parameter, and a 0 bit selects the corresponding bit from the third parameter.
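
To make the selection rule concrete, here is a minimal sketch of it as a small C helper. The name select_bytes16 is hypothetical (it is not part of any ARM API), and the scalar branch is only an assumed model of the bitwise select, out = (mask & src) | (~mask & dest), so the result can be checked on a non-ARM host. As far as I can tell, VBSL, VBIT and VBIF all compute this same select and differ only in which operand register gets overwritten, so the compiler is free to pick any of them for vbslq_u8.

```c
#include <stdint.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper, not part of any ARM API: bitwise select over 16 bytes.
 * For every bit: take it from src where mask is 1, from dest where mask is 0. */
static void select_bytes16(const uint8_t mask[16], const uint8_t src[16],
                           const uint8_t dest[16], uint8_t out[16])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* NEON path: the mask is the FIRST argument of vbslq_u8. */
    uint8x16_t m = vld1q_u8(mask);
    uint8x16_t s = vld1q_u8(src);
    uint8x16_t d = vld1q_u8(dest);
    vst1q_u8(out, vbslq_u8(m, s, d));
#else
    /* Portable scalar model of the same select, for non-ARM hosts. */
    for (size_t i = 0; i < 16; ++i)
        out[i] = (uint8_t)((mask[i] & src[i]) | (~mask[i] & dest[i]));
#endif
}
```

Called with the src, dest and mask values shown above (as plain uint8_t arrays), this should produce 39 62 9b 52 34 5b 47 48 47 35 00 00 c3 c8 c5 d5, which is the output asked for in the question. Because the select works per bit, a mask byte such as 0x0f would mix the low nibble of src with the high nibble of dest.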

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow