How to truncate values using SIMD instructions

Question 1

Assuming you are dealing with 16-bit signed data, with d0 containing the values:

vshr.s16 d1, d0, #15
vbic.16  d0, d0, d1

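That will do the trick: vshr.s16 fills d1 with an all-ones mask in every negative lane (and zero elsewhere), and vbic then clears exactly those lanes in d0.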
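For readers less used to NEON assembly, here is a scalar C model of what the two instructions do to each of the four 16-bit lanes of a D register. It is only a sketch for illustration: the function name is made up, and the mask is computed as -(x < 0) rather than with a shift, so the model does not depend on how a given compiler shifts negative values.

```c
#include <stdint.h>

/* Scalar model of one D register holding four signed 16-bit lanes.
   Hypothetical helper, for illustration only. */
void clamp_negative_s16x4(int16_t lanes[4])
{
    for (int i = 0; i < 4; ++i) {
        /* vshr.s16 #15 replicates the sign bit across the lane:
           0x0000 for non-negative lanes, 0xFFFF for negative ones.
           -(x < 0) yields the same two values. */
        int16_t mask = (int16_t)-(lanes[i] < 0);

        /* vbic: lane AND (NOT mask), so negative lanes become 0. */
        lanes[i] = (int16_t)(lanes[i] & ~mask);
    }
}
```

Calling it on {-32768, -1, 0, 32767} leaves {0, 0, 0, 32767}.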

Alternatively, you could resort to a widen-then-saturate pair:

vshll.s16 q0, d0, #16
vqshrun.s32 d0, q0, #16

or even :

vmovl.s16 q0, d0
vqmovun.s32 d0, q0
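Both pairs rely on the same idea: widen each lane to 32 bits, then narrow it back with signed-to-unsigned saturation, which turns every negative value into 0. A rough scalar sketch of one lane of the vmovl/vqmovun pair follows (the function name is hypothetical). Note that the result lanes are unsigned 16-bit values.

```c
#include <stdint.h>

/* Scalar model of one lane of vmovl.s16 followed by vqmovun.s32.
   Hypothetical helper, for illustration only. */
uint16_t widen_then_saturate_u16(int16_t x)
{
    int32_t wide = x;          /* vmovl.s16: sign-extend to 32 bits */

    /* vqmovun.s32: saturate the signed value to the range 0..65535 */
    if (wide < 0)
        return 0;
    if (wide > 65535)          /* cannot happen for a 16-bit input */
        return 65535;
    return (uint16_t)wide;
}
```

The vshll/vqshrun pair gives the same per-lane result, since shifting left by 16 and then right by 16 with saturation restores the original value before the clamp is applied.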

Even if you are dealing with float data, you can treat it just like s32 data for this purpose:

vshr.s32 d1, d0, #31
vbic.32 d0, d0, d1

You know, the MSB is the sign bit on float as well as on int, and 0.0f is nothing other than 0x00000000.

Plain and simple.
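The float case can be modelled in scalar C as well. This sketch assumes float is an IEEE 754 binary32 value (true on ARM) and uses memcpy to get at the bit pattern; the function name is invented for the example. One consequence worth knowing: every value with the sign bit set is cleared, so -0.0f and negatively signed NaNs also come out as +0.0f.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of vshr.s32 #31 + vbic applied to a float lane.
   Assumes IEEE 754 binary32. Hypothetical helper, for illustration only. */
float clamp_negative_f32_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);             /* reinterpret the float as 32 raw bits */

    /* 0x00000000 if the sign bit is clear, 0xFFFFFFFF if it is set */
    uint32_t mask = (uint32_t)0 - (bits >> 31);

    bits &= ~mask;                              /* vbic: clear the whole lane if negative */

    memcpy(&f, &bits, sizeof f);
    return f;
}
```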

Edit :

People seem to be confused by the bit operations in my code above. Here is the explanation :

int MinusIsZero(int n)
{
  if (n < 0) n = 0;
  return n;
}

As you can see, it's quite a simple function doing what the OP wanted.

However, such a simple 'if' statement is a real pain for SIMD due to its vector nature.

Fortunately, it can be done with plain ALU instructions, without an 'if'.

int MinusIsZero(int n)
{
  int mask;
  mask = (n>>31);  // all ones if n is negative, all zeros otherwise
  n &= ~mask;      // clears n when the mask is all ones
  return n;
}

First things first: if you right shift a signed int32 by 31 bits with an arithmetic shift (which is what vshr.s32 does, and what ARM compilers emit for signed types), the result can only be 0x00000000 (if non-negative) or 0xffffffff (if negative).

If n was non-negative, n & ~0x00000000 would result in n.

If n was negative, n & ~0xffffffff would result in 0.

Exactly what the OP wanted.

Besides being by far the most efficient method on SIMD units like NEON (ALU instructions are the fastest), it's also a very desirable one even on integer cores, because it doesn't modify the CPSR flags.

Modifying the CPSR unnecessarily could seriously cripple the pipeline and out-of-order execution, depending on the code surrounding the routine.
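If you would rather stay in C than write assembly, the same shift-and-clear pair can be expressed with NEON intrinsics. The following is a minimal sketch, not a tuned implementation: the function name and the in-place array interface are assumptions for the example, and a plain scalar loop handles the tail (and the whole array when the code is built without NEON).

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: sets every negative element of buf to zero, in place. */
void clamp_negative_s16(int16_t *buf, size_t count)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (; i + 4 <= count; i += 4) {
        int16x4_t v    = vld1_s16(buf + i);     /* load four lanes */
        int16x4_t mask = vshr_n_s16(v, 15);     /* vshr.s16 #15 */
        vst1_s16(buf + i, vbic_s16(v, mask));   /* vbic: v & ~mask */
    }
#endif
    /* Scalar tail, or the whole array on non-NEON builds. */
    for (; i < count; ++i)
        buf[i] = (int16_t)(buf[i] < 0 ? 0 : buf[i]);
}
```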

Question 2

It is possible. With NEON it is even quite simple, because it has min and max instructions.

Here is an example using the float data-type.

float32x2_t clampToZero (float32x2_t value)
{
  // generate a vector containing all zeros:
  float32x2_t allZero = vdup_n_f32 (0.0f);

  // take the parallel maximum between your value and zero.
  return vmax_f32 (allZero, value); 
}
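To apply the same max-with-zero idea to a whole buffer, one possible sketch uses the 128-bit variants of these intrinsics, processing four floats per iteration. The function name and interface are made up for this example; the scalar loop covers the leftover elements and also serves as a fallback when NEON is not available. NaN inputs are not considered here.

```c
#include <stddef.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: replaces every negative element of buf with 0.0f, in place. */
void clampArrayToZero(float *buf, size_t count)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    const float32x4_t allZero = vdupq_n_f32(0.0f);
    for (; i + 4 <= count; i += 4) {
        float32x4_t v = vld1q_f32(buf + i);          /* load four floats */
        vst1q_f32(buf + i, vmaxq_f32(allZero, v));   /* lane-wise max with zero */
    }
#endif
    /* Scalar tail, or the whole array on non-NEON builds. */
    for (; i < count; ++i)
        buf[i] = buf[i] > 0.0f ? buf[i] : 0.0f;
}
```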