Question

I previously asked a question about the vclt_s8 comparison: does anybody know how to use the Neon intrinsic uint8x8_t vclt_s8(int8x8_t, int8x8_t)?

However, suppose we have code like this:

if(a > b + c) {
    a = b + c;
} else if(a < b - c) {
    a = b - c;
}

How can I transform it into Neon intrinsics? It seems that we cannot process 8 elements in parallel in such a case. Is that right?

Solution

You can't branch per element with SIMD, so you have to implement this kind of logic in a branchless way, using masks. I'll just give pseudocode so you get the general idea; coding this should be fairly straightforward:

bc = b + c       ; get `(b + c)` in a vector register
mask = a > bc    ; use compare instruction to generate mask (-1 = true, 0 = false)
bc = bc & mask   ; use bitwise AND to zero out elements of `(b + c)` which we do not want
a = a & ~mask    ; use bitwise ANDC to zero out elements of `a` which we do not want
a = a | bc       ; combine required elements into `a` using bitwise OR

bc = b - c       ; get `(b - c)` in a vector register
mask = a < bc    ; use compare instruction to generate mask (-1 = true, 0 = false)
bc = bc & mask   ; use bitwise AND to zero out elements of `(b - c)` which we do not want
a = a & ~mask    ; use bitwise ANDC to zero out elements of `a` which we do not want
a = a | bc       ; combine required elements into `a` using bitwise OR
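
As a rough sketch of how the pseudocode above might map to Neon intrinsics, the following C function processes 8 signed bytes per call. The function name clamp8_s8 and the pointer-based interface are assumptions made for illustration, not part of the original answer. It uses vbsl_s8 (bitwise select) in place of the separate AND / ANDC / OR steps, which gives the same result in one instruction. A scalar fallback that applies the same mask logic is included so the sketch also compiles on non-ARM targets. Overflow of b + c or b - c is not handled specially; the arithmetic simply wraps, as vadd_s8 and vsub_s8 do.

```c
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: applies
 *     if (a > b + c) a = b + c;
 *     if (a < b - c) a = b - c;
 * to 8 signed bytes at once, updating a[] in place. */
static void clamp8_s8(int8_t *a, const int8_t *b, const int8_t *c)
{
#if defined(__ARM_NEON)
    int8x8_t va = vld1_s8(a);
    int8x8_t vb = vld1_s8(b);
    int8x8_t vc = vld1_s8(c);

    int8x8_t bc = vadd_s8(vb, vc);      /* bc = b + c                      */
    uint8x8_t mask = vcgt_s8(va, bc);   /* 0xFF where a > bc, else 0x00    */
    va = vbsl_s8(mask, bc, va);         /* pick bc where mask set, else a  */

    bc = vsub_s8(vb, vc);               /* bc = b - c                      */
    mask = vclt_s8(va, bc);             /* 0xFF where a < bc, else 0x00    */
    va = vbsl_s8(mask, bc, va);         /* pick bc where mask set, else a  */

    vst1_s8(a, va);
#else
    /* Portable fallback: same mask logic, one lane at a time.
     * Narrowing casts are assumed to wrap (two's complement). */
    for (int i = 0; i < 8; ++i) {
        int8_t bc = (int8_t)(b[i] + c[i]);
        uint8_t mask = (a[i] > bc) ? 0xFF : 0x00;
        a[i] = (int8_t)(((uint8_t)bc & mask) | ((uint8_t)a[i] & (uint8_t)~mask));

        bc = (int8_t)(b[i] - c[i]);
        mask = (a[i] < bc) ? 0xFF : 0x00;
        a[i] = (int8_t)(((uint8_t)bc & mask) | ((uint8_t)a[i] & (uint8_t)~mask));
    }
#endif
}
```

For longer arrays you would call this in a loop over 8-element chunks, or switch to the 128-bit variants (vld1q_s8, vcgtq_s8, vbslq_s8 and so on) to handle 16 elements per iteration.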

Note that I've cheated a little here and omitted the else from your scalar code (assuming that the two branches are mutually exclusive) so what I've implemented is actually equivalent to:

if (a > b + c) {
    a = b + c;
}
if (a < b - c) {
    a = b - c;
}

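If this is a bad assumption (for example, if c can be negative, in which case b + c < b - c and both conditions can fire in turn), then you'll need to do some additional bitwise operations to implement the logical else.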
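
One possible way to get the else-if behaviour, sketched here under the same assumptions as before (8 signed bytes, wrapping arithmetic), is to compute both masks from the original value of a and then clear the second mask wherever the first is set. The function name clamp8_else_s8 is made up for illustration. On Neon, vbic_u8(x, y) computes x & ~y, which is the extra bitwise step.

```c
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: applies
 *     if (a > b + c)      a = b + c;
 *     else if (a < b - c) a = b - c;
 * to 8 signed bytes at once, updating a[] in place. */
static void clamp8_else_s8(int8_t *a, const int8_t *b, const int8_t *c)
{
#if defined(__ARM_NEON)
    int8x8_t va = vld1_s8(a);
    int8x8_t vb = vld1_s8(b);
    int8x8_t vc = vld1_s8(c);

    int8x8_t hi = vadd_s8(vb, vc);                   /* b + c */
    int8x8_t lo = vsub_s8(vb, vc);                   /* b - c */

    /* Both comparisons use the ORIGINAL a. */
    uint8x8_t mhi = vcgt_s8(va, hi);                 /* a > b + c */
    uint8x8_t mlo = vbic_u8(vclt_s8(va, lo), mhi);   /* (a < b - c) & ~mhi */

    va = vbsl_s8(mlo, lo, va);                       /* else-if branch */
    va = vbsl_s8(mhi, hi, va);                       /* if branch      */

    vst1_s8(a, va);
#else
    /* Portable fallback: same mask logic, one lane at a time.
     * Narrowing casts are assumed to wrap (two's complement). */
    for (int i = 0; i < 8; ++i) {
        int8_t hi = (int8_t)(b[i] + c[i]);
        int8_t lo = (int8_t)(b[i] - c[i]);
        uint8_t mhi = (a[i] > hi) ? 0xFF : 0x00;
        uint8_t mlo = (uint8_t)(((a[i] < lo) ? 0xFF : 0x00) & ~mhi);
        uint8_t r = (uint8_t)a[i];
        r = (uint8_t)(((uint8_t)lo & mlo) | (r & (uint8_t)~mlo));
        r = (uint8_t)(((uint8_t)hi & mhi) | (r & (uint8_t)~mhi));
        a[i] = (int8_t)r;
    }
#endif
}
```

With a = 10, b = 0, c = -4, this version leaves a at b + c = -4, whereas two independent ifs would go on to overwrite it with b - c = 4.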

Licensed under: CC-BY-SA with attribution