Question

val = ( val < 0 ) ? 0 : val;

I want an instruction for the above, i.e. if val is less than 0 the result is 0, and otherwise the result is val. Is there a set of NEON instructions that will do this?


Solution 2

Assuming you are dealing with 16-bit signed data, with d0 containing the values:

vshr.s16 d1, d0, #15
vbic.16  d0, d0, d1

that will do the trick.

Alternately, you could resort to :

vshll.s16 q0, d0, #16
vqshrun.s32 d0, q0, #16

or even :

vmovl.s16 q0, d0
vqmovun.s32 d0, q0
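
To see why the widen-then-narrow pairs above clamp negatives, here is a rough scalar model in C of what happens to a single 16-bit lane. The helper names (saturate_s32_to_u16, clamp_lane_s16) are made up for illustration, and this is only a sketch of the saturating behaviour, not NEON code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: scalar model of the saturating unsigned narrow
   (vqmovun.s32) applied to one lane. */
static uint16_t saturate_s32_to_u16(int32_t wide)
{
    if (wide < 0)      return 0;        /* negative values saturate to 0 */
    if (wide > 0xFFFF) return 0xFFFF;   /* too-large values saturate to the max */
    return (uint16_t)wide;
}

/* Hypothetical helper: vmovl.s16 followed by vqmovun.s32 on one lane. */
static uint16_t clamp_lane_s16(int16_t lane)
{
    int32_t wide = (int32_t)lane;       /* vmovl.s16: sign-extend to 32 bits */
    assert(wide >= INT16_MIN && wide <= INT16_MAX); /* widening keeps the value */
    return saturate_s32_to_u16(wide);   /* vqmovun.s32: saturate and narrow */
}
```

Since the input came from a signed 16-bit lane, the upper saturation bound is never reached. Only the lower bound matters, and that is exactly the clamp to zero. The result lanes are unsigned, but for values from 0 to 32767 the bit pattern is the same as for s16.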

Even if you are dealing with float data, you can handle them just like s32 ones for this purpose :

vshr.s32 d1, d0, #31
vbic.32 d0, d0, d1

You know, the MSB is the sign bit on float as well as on int, and 0.0f is nothing other than 0x00000000.
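
The float trick can be checked on a single value in plain C. The following is a minimal sketch, assuming float is a 32-bit IEEE 754 type. The function name clamp_float_bits is hypothetical, and memcpy stands in for simply treating the register contents as integers:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: scalar sketch of the vshr.s32 / vbic pair on one float. */
static float clamp_float_bits(float value)
{
    uint32_t bits;
    uint32_t mask;

    assert(sizeof(float) == sizeof(uint32_t)); /* assumption: 32-bit float */

    memcpy(&bits, &value, sizeof bits);     /* view the float as raw bits */
    mask = (uint32_t)0 - (bits >> 31);      /* all ones if sign bit set, else 0 */
    bits &= ~mask;                          /* like vbic: clear all bits if negative */
    memcpy(&value, &bits, sizeof value);    /* back to float */
    return value;
}
```

Note that anything with the sign bit set is turned into +0.0f, which includes -0.0f and NaNs whose sign bit happens to be set.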

plain and simple.

Edit :

People seem to be confused by the bit operations in my code above. Here is the explanation :

int MinusIsZero(int n)
{
  if (n < 0) n = 0;
  return n;
}

As you can see, it's quite a simple function doing what the OP wanted.

However, such a simple 'if' statement is a real pain for SIMD due to its vector nature.

Fortunately, it's very well doable with ALU instructions without an 'if'.

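int MinusIsZero(int n)
{
  int mask;
  mask = (n >> 31);  /* assumes 32-bit int and an arithmetic shift: all ones if n < 0, else 0 */
  n &= ~mask;        /* clears every bit of n when n is negative */
  return n;
}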

First things first: if you arithmetically right-shift a signed int32 by 31 bits, the result can only be 0x00000000 (if non-negative) or 0xffffffff (if negative).

If n was non-negative, n & ~0x00000000 would result in n.

If n was negative, n & ~0xffffffff would result in 0.

Exactly what the OP wanted.
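
One caveat about the C version: right-shifting a negative signed int is implementation-defined in C, even though compilers targeting ARM normally give the arithmetic shift used here. The sketch below is one possible way to build the same mask from unsigned arithmetic and apply it across an array, one element per "lane". The function name clamp_negatives_to_zero is made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: branch-free clamp of every element of `data`. */
static void clamp_negatives_to_zero(int32_t *data, size_t count)
{
    size_t i;

    assert(data != NULL || count == 0);

    for (i = 0; i < count; ++i) {
        uint32_t bits = (uint32_t)data[i];
        uint32_t mask = (uint32_t)0 - (bits >> 31); /* 0xFFFFFFFF if negative, else 0 */
        data[i] = (int32_t)(bits & ~mask);          /* sign bit is always clear here */
    }
}
```

The loop body has no branch, which is the property that lets the same idea map directly onto vector lanes.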

Besides being by far the most efficient method on SIMD units like NEON (ALU instructions are the fastest), it is also a very desirable one even on integer cores, because it doesn't modify the CPSR flags.

Modifying the CPSR unnecessarily could seriously hamper pipelining and out-of-order execution, depending on the code around the routine.

OTHER TIPS

It is possible. With NEON it is even quite simple, because it has min and max instructions.

Here is an example using the float data-type.

float32x2_t clampToZero (float32x2_t value)
{
  // generate a vector containing all zeros:
  float32x2_t allZero = vdup_n_f32 (0.0f);

  // take the parallel maximum between your value and zero.
  return vmax_f32 (allZero, value); 
}
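
For a whole buffer, the same max-with-zero idea could be applied four floats at a time with the q-register intrinsics. This is only a sketch: the function name clamp_array_to_zero is hypothetical, and the scalar path is there so the code still builds where NEON is not available and to handle a leftover tail:

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: clamp every negative element of `data` to zero. */
static void clamp_array_to_zero(float *data, size_t count)
{
    size_t i = 0;

    assert(data != NULL || count == 0);

#if defined(__ARM_NEON)
    {
        /* vector of four zeros, reused for every block */
        const float32x4_t zero = vdupq_n_f32(0.0f);
        for (; i + 4 <= count; i += 4) {
            float32x4_t v = vld1q_f32(data + i);      /* load 4 floats */
            vst1q_f32(data + i, vmaxq_f32(zero, v));  /* lane-wise max with 0 */
        }
    }
#endif

    /* scalar path: remaining elements, or everything without NEON */
    for (; i < count; ++i) {
        if (data[i] < 0.0f) {
            data[i] = 0.0f;
        }
    }
}
```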
Licensed under: CC-BY-SA with attribution