Question

I think this is a silly problem, but I have tried for a day to solve it with no luck, so here it is.

I have a register holding four floats (float32x4_t), and I want to process some of the lanes and set the others to 0.

For example, this problem in C:

for (int i=1; i<=4; i++)
{
    float b = 4.0f / i;
    if(b<=3)
        result += process(b);
}

So the first value will not be processed but the others will, and I need a register where the first lane holds 0 and the other lanes hold the result.

But I don't know how to do this with NEON intrinsics.

I know that there is a vcltq_f32, and I tried it, but with no result.


Solution

Like this:

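const float32x4_t vector_3 = vdupq_n_f32(3.0f);
// Each lane of mask is all ones where vector_b <= 3, all zeros elsewhere.
uint32x4_t mask = vcleq_f32(vector_b, vector_3);
// AND the float bits with the mask; lanes that fail the test become 0.0f.
vector_b = vreinterpretq_f32_u32(vandq_u32(vreinterpretq_u32_f32(vector_b), mask));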
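A fuller sketch of the same idea applied to the loop in the question may help. It is only an illustration under some assumptions: masked_sum_of_squares is a hypothetical name, squaring stands in for the question's process() (which is not shown), and the scalar branch exists only so the code also builds on non-ARM hosts. Here the mask is applied to the processed result rather than to the input, which matters whenever process(0) is not 0.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Sums b[i] * b[i] over the lanes where b[i] <= limit.
 * Squaring is a hypothetical stand-in for the question's process(). */
float masked_sum_of_squares(const float b[4], float limit)
{
    assert(b != NULL);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t vb = vld1q_f32(b);
    /* All ones in lanes where b <= limit, all zeros elsewhere. */
    uint32x4_t mask = vcleq_f32(vb, vdupq_n_f32(limit));
    /* Process every lane, including the ones that will be discarded. */
    float32x4_t processed = vmulq_f32(vb, vb);
    /* Zero the rejected lanes of the processed result. */
    float32x4_t kept = vreinterpretq_f32_u32(
        vandq_u32(vreinterpretq_u32_f32(processed), mask));
    /* Horizontal add of the four lanes. */
    float32x2_t pair = vadd_f32(vget_low_f32(kept), vget_high_f32(kept));
    pair = vpadd_f32(pair, pair);
    return vget_lane_f32(pair, 0);
#else
    /* Scalar fallback so the sketch also builds on non-ARM hosts. */
    float sum = 0.0f;
    for (int i = 0; i < 4; i++) {
        if (b[i] <= limit)
            sum += b[i] * b[i];
    }
    return sum;
#endif
}
```

For the loop in the question, b would hold 4.0f/1, 4.0f/2, 4.0f/3 and 4.0f/4, and limit would be 3.0f, so only the first lane is discarded.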

OTHER TIPS

I don't know much about NEON, but on most SIMD architectures you would do this by comparing and masking (bitwise AND). A compare instruction generates a per-lane mask (all ones where the condition holds, all zeros where it does not), and ANDing that mask with the data zeroes the lanes you want to discard.
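To make the mechanics concrete, here is a scalar model of what compare-and-mask does to each lane. It is a sketch in plain C rather than real SIMD code, and mask_lanes_le is a hypothetical name chosen for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of a per-lane "less than or equal" compare followed by a
 * bitwise AND. Lanes of in[] greater than limit come out as 0.0f. */
void mask_lanes_le(const float in[4], float limit, float out[4])
{
    /* The bit trick below assumes a 32-bit float. */
    assert(sizeof(float) == sizeof(uint32_t));

    for (int lane = 0; lane < 4; lane++) {
        /* The compare yields all ones (true) or all zeros (false). */
        uint32_t mask = (in[lane] <= limit) ? 0xFFFFFFFFu : 0u;

        /* Reinterpret the float's bits as an integer without converting. */
        uint32_t bits;
        memcpy(&bits, &in[lane], sizeof bits);

        /* AND keeps the bits where the mask is set and clears the rest;
         * an all-zero bit pattern is +0.0f. */
        bits &= mask;
        memcpy(&out[lane], &bits, sizeof bits);
    }
}
```

A SIMD unit performs the same two steps on all four lanes at once, which is why no branch is needed.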

Licensed under: CC-BY-SA with attribution