Question

I previously asked a question about the vclt_s8 comparison: does anybody know how to use the Neon intrinsic uint8x8_t vclt_s8 (int8x8_t, int8x8_t)?

However, suppose we have code like this:

if(a > b + c) {
    a = b + c;
} else if(a < b - c) {
    a = b - c;
}

How can I transform it into Neon intrinsics? It seems that we cannot process 8 elements in parallel in this case, can we?


Solution

You can't branch per element with SIMD, so you have to implement this kind of logic in a branchless way, using masks. I'll just give pseudocode so you get the general idea; coding this should be fairly straightforward:

bc = b + c       ; get `(b + c)` in a vector register
mask = a > bc    ; use compare instruction to generate mask (-1 = true, 0 = false)
bc = bc & mask   ; use bitwise AND to zero out elements of `(b + c)` which we do not want
a = a & ~mask    ; use bitwise ANDC to zero out elements of `a` which we do not want
a = a | bc       ; combine required elements into `a` using bitwise OR

bc = b - c       ; get `(b - c)` in a vector register
mask = a < bc    ; use compare instruction to generate mask (-1 = true, 0 = false)
bc = bc & mask   ; use bitwise AND to zero out elements of `(b - c)` which we do not want
a = a & ~mask    ; use bitwise ANDC to zero out elements of `a` which we do not want
a = a | bc       ; combine required elements into `a` using bitwise OR
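
As a hedged illustration, the pseudocode above might map onto Neon intrinsics roughly as follows. This is a minimal sketch, not a tuned implementation: the function name clamp8_two_ifs is hypothetical, it handles exactly 8 signed bytes, and it assumes b + c and b - c fit in int8_t (vadd_s8 and vsub_s8 wrap on overflow, unlike promoted scalar int arithmetic). vbsl_s8 does the AND / AND-NOT / OR sequence as a single bitwise select. A scalar fallback using the same mask logic is included so the sketch also compiles on non-ARM hosts.

```c
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: updates 8 lanes of a[] in place (two-"if" form). */
void clamp8_two_ifs(int8_t a[8], const int8_t b[8], const int8_t c[8])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    int8x8_t va = vld1_s8(a);
    int8x8_t vb = vld1_s8(b);
    int8x8_t vc = vld1_s8(c);

    int8x8_t hi = vadd_s8(vb, vc);       /* bc = b + c                    */
    uint8x8_t mask = vcgt_s8(va, hi);    /* 0xFF where a > bc, else 0x00  */
    va = vbsl_s8(mask, hi, va);          /* take hi where mask set, else a */

    int8x8_t lo = vsub_s8(vb, vc);       /* bc = b - c                    */
    mask = vclt_s8(va, lo);              /* 0xFF where a < bc, else 0x00  */
    va = vbsl_s8(mask, lo, va);          /* take lo where mask set, else a */

    vst1_s8(a, va);
#else
    /* Portable fallback: same AND / AND-NOT / OR masking, one lane at a time. */
    for (int i = 0; i < 8; ++i) {
        int8_t hi = (int8_t)(b[i] + c[i]);
        int8_t m = (int8_t)-(a[i] > hi);             /* -1 = true, 0 = false */
        a[i] = (int8_t)((hi & m) | (a[i] & ~m));
        int8_t lo = (int8_t)(b[i] - c[i]);
        m = (int8_t)-(a[i] < lo);
        a[i] = (int8_t)((lo & m) | (a[i] & ~m));
    }
#endif
}
```

Since the two independent tests amount to "take the minimum with b + c, then the maximum with b - c", the same result could probably also be written as vmax_s8(vmin_s8(va, hi), lo), which avoids the explicit masks altogether.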

Note that I've cheated a little here and omitted the else from your scalar code (assuming that the two branches are mutually exclusive) so what I've implemented is actually equivalent to:

if (a > b + c) {
    a = b + c;
}
if (a < b - c) {
    a = b - c;
}

If this is a bad assumption then you'll need to do some additional bitwise operations to implement the logical else.
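
If the else does matter (for example when c can be negative, so that b + c < b - c), one possible sketch is to compute both masks from the original a and nest the selects so that the first condition takes priority. The function name clamp8_else_if is hypothetical, and the same no-overflow assumption on b + c and b - c applies; a scalar fallback mirrors the mask logic for non-ARM hosts.

```c
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: 8-lane version of the original if / else-if. */
void clamp8_else_if(int8_t a[8], const int8_t b[8], const int8_t c[8])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    int8x8_t va = vld1_s8(a);
    int8x8_t vb = vld1_s8(b);
    int8x8_t vc = vld1_s8(c);

    int8x8_t hi = vadd_s8(vb, vc);       /* b + c */
    int8x8_t lo = vsub_s8(vb, vc);       /* b - c */

    /* Both comparisons use the ORIGINAL a. */
    uint8x8_t gt = vcgt_s8(va, hi);      /* a > b + c */
    uint8x8_t lt = vclt_s8(va, lo);      /* a < b - c */

    /* Inner select applies the else-if; the outer select overrides it
       wherever the first condition held, which is what "else" means. */
    int8x8_t r = vbsl_s8(lt, lo, va);
    r = vbsl_s8(gt, hi, r);

    vst1_s8(a, r);
#else
    /* Portable fallback with explicit masks, one lane at a time. */
    for (int i = 0; i < 8; ++i) {
        int8_t hi = (int8_t)(b[i] + c[i]);
        int8_t lo = (int8_t)(b[i] - c[i]);
        int8_t gt = (int8_t)-(a[i] > hi);            /* -1 = true, 0 = false */
        int8_t lt = (int8_t)(-(a[i] < lo) & ~gt);    /* only if gt is false  */
        a[i] = (int8_t)((hi & gt) | (lo & lt) | (a[i] & ~(gt | lt)));
    }
#endif
}
```

With b = 0, c = -5 and a = 0, this version leaves a at -5 (the first branch wins), whereas the two-"if" form would go on to set it to 5.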

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow