Question

I'm trying to figure out how to generate a conditional store in ARM NEON. What I would like to do is the equivalent of this SSE intrinsic:

void _mm_maskmoveu_si128(__m128i d, __m128i n, char *p);

which conditionally stores byte elements of d to address p. The high bit of each byte in the selector n determines whether the corresponding byte in d will be stored.

Any suggestions on how to do this with NEON intrinsics? Thank you.

This is what I did:

int8x16_t store_mask = {0,0,0,0,0,0,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff};

int8x16_t tmp_dest = vld1q_u8((int8_t*)p_dest);
vbslq_u8(source,tmp_dest,store_mask);
vst1q_u8((int8_t*)p_dest,tmp_dest);

Solution

Assuming vectors of 16 x 1-byte elements, you would set up a mask vector where each element is either all 0s (0x00) or all 1s (0xff) to determine whether the element should be stored or not. Then you need to do the following (pseudocode):

 init mask vector = 0x00/0xff in each element
 init source vector = data to be selectively stored
 load dest vector from dest location
 dest vector = `vbslq_u8` (bitwise select, e.g. the `vbit` instruction) applied to mask vector, source vector and dest vector, in that argument order
 store dest vector back to dest location
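
The steps above could be sketched in C roughly as follows. This is only a minimal sketch, not a drop-in replacement for `_mm_maskmoveu_si128`: the helper name `masked_store_u8x16` is hypothetical, the mask bytes are assumed to be exactly 0x00 or 0xff, and a scalar fallback is included only so that the snippet also builds on non-ARM targets.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (the name is ours, not a NEON API).
 * For each of the 16 bytes: if mask[i] == 0xff, dest[i] = src[i];
 * if mask[i] == 0x00, dest[i] is left as it was.
 * Assumption: every mask byte is either 0x00 or 0xff. */
static void masked_store_u8x16(uint8_t *dest, const uint8_t *src,
                               const uint8_t *mask)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint8x16_t m = vld1q_u8(mask);  /* init mask vector            */
    uint8x16_t s = vld1q_u8(src);   /* init source vector          */
    uint8x16_t d = vld1q_u8(dest);  /* load dest vector            */

    /* vbslq_u8(mask, a, b) takes bits from a where the mask bit is 1
     * and from b where it is 0. The result must be kept: the
     * intrinsic does not modify its arguments. */
    d = vbslq_u8(m, s, d);

    vst1q_u8(dest, d);              /* store dest vector back      */
#else
    /* Scalar fallback with the same per-byte behaviour. */
    for (int i = 0; i < 16; ++i) {
        dest[i] = (uint8_t)((src[i] & mask[i]) |
                            (dest[i] & (uint8_t)~mask[i]));
    }
#endif
}
```

Note that this is a read-modify-write of all 16 bytes, so the whole 16-byte destination must be valid memory, and bytes that are masked off are still rewritten with their old values. If the selector only has the high bit of each byte set, as with the SSE intrinsic, it would first need to be expanded to 0x00/0xff per byte, for example with a signed compare against zero such as `vcltq_s8`.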
License: CC-BY-SA with attribution
Not affiliated with StackOverflow