Do we need to consider overflow when using Neon intrinsics such as vadd_s8?

https://stackoverflow.com/questions/20426746

arm
neon

29-08-2022

Question

Suppose we have the following C code:

spatial_pred= (cur[mrefs] + cur[prefs])>>1;

and we translate it to Neon intrinsics:

int8x8_t cur_mrefs = vld1_s8(cur+mrefs);
int8x8_t cur_prefs = vld1_s8(cur+prefs);
int8x8_t spatial_pred = vshr_n_s8(vadd_s8(cur_mrefs, cur_prefs), 1);

Do we need to account for overflow in vadd_s8(cur_mrefs, cur_prefs)? Should we use vadd_s16 instead?

Answer

If you don't want to lose the overflow information, first widen each int8x8_t to int16x8_t (for example with vmovl_s8) and then do the sum, or use the widening add vaddl_s8, which does both in one step.
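To make the difference concrete, here is a minimal scalar sketch of what a single lane does under a plain 8-bit add versus a widening add. The function names are hypothetical and model only one lane in portable C. This is an illustration of the arithmetic, not Neon code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one lane of vadd_s8: the result wraps modulo 2^8. */
int8_t model_vadd_s8_lane(int8_t a, int8_t b) {
    int sum = a + b;                    /* exact sum, computed in int */
    if (sum > INT8_MAX) sum -= 256;     /* wrap around on positive overflow */
    if (sum < INT8_MIN) sum += 256;     /* wrap around on negative overflow */
    return (int8_t)sum;
}

/* Hypothetical scalar model of one lane of vaddl_s8: the sum is kept in 16 bits. */
int16_t model_vaddl_s8_lane(int8_t a, int8_t b) {
    return (int16_t)(a + b);            /* range [-256, 254] always fits */
}

/* Small self-check showing where the two models disagree. */
void check_widening_models(void) {
    assert(model_vadd_s8_lane(100, 100) == -56);   /* 200 wrapped to 8 bits */
    assert(model_vaddl_s8_lane(100, 100) == 200);  /* overflow information kept */
}
```

With inputs that cannot exceed the 8-bit range when summed, both models agree, so widening only matters when the operands can be large.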

If you want the result to saturate, use vqadd.

Vector saturating add: vqadd -> Vr[i]:=sat<size>(Va[i]+Vb[i])
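As a rough illustration of the sat<size> notation above, the following scalar sketch models one signed 8-bit lane of a saturating add. The function name is hypothetical, and the code is a portable model of the arithmetic rather than the intrinsic itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one lane of vqadd_s8: clamp instead of wrapping. */
int8_t model_vqadd_s8_lane(int8_t a, int8_t b) {
    int sum = a + b;                          /* exact sum, computed in int */
    if (sum > INT8_MAX) return INT8_MAX;      /* clamp to +127 */
    if (sum < INT8_MIN) return INT8_MIN;      /* clamp to -128 */
    return (int8_t)sum;
}

/* Small self-check of the clamping behavior. */
void check_saturating_model(void) {
    assert(model_vqadd_s8_lane(100, 100) == 127);
    assert(model_vqadd_s8_lane(-100, -100) == -128);
    assert(model_vqadd_s8_lane(3, 4) == 7);
}
```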

If you just want to convert the C version, use vhadd, or vrhadd if you want rounding. These compute the halved sum directly, so the intermediate sum cannot overflow and you don't need a separate shift as a second step.

Vector halving add: vhadd -> Vr[i]:=(Va[i]+Vb[i])>>1
Vector rounding halving add: vrhadd -> Vr[i]:=(Va[i]+Vb[i]+1)>>1
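The two formulas above can be sketched per lane in portable C as follows, together with one possible Neon version of the question's snippet. All function names here are hypothetical, the scalar helpers assume signed 8-bit lanes, and the Neon part is only compiled when the compiler defines __ARM_NEON.

```c
#include <assert.h>
#include <stdint.h>

/* Floor of x / 2, written without right-shifting a negative value. */
static int floor_half(int x) {
    return (x >= 0) ? x / 2 : (x - 1) / 2;
}

/* Hypothetical scalar model of one lane of vhadd_s8: (a + b) >> 1 without overflow. */
int8_t model_vhadd_s8_lane(int8_t a, int8_t b) {
    return (int8_t)floor_half(a + b);
}

/* Hypothetical scalar model of one lane of vrhadd_s8: (a + b + 1) >> 1 without overflow. */
int8_t model_vrhadd_s8_lane(int8_t a, int8_t b) {
    return (int8_t)floor_half(a + b + 1);
}

/* Small self-check: large operands no longer wrap. */
void check_halving_models(void) {
    assert(model_vhadd_s8_lane(100, 100) == 100);
    assert(model_vhadd_s8_lane(1, 2) == 1);
    assert(model_vrhadd_s8_lane(1, 2) == 2);
}

#if defined(__ARM_NEON)
#include <arm_neon.h>

/* Hypothetical helper: the question's spatial_pred computed with a halving add. */
int8x8_t spatial_pred_s8(const int8_t *cur, int mrefs, int prefs) {
    int8x8_t cur_mrefs = vld1_s8(cur + mrefs);
    int8x8_t cur_prefs = vld1_s8(cur + prefs);
    return vhadd_s8(cur_mrefs, cur_prefs);
}
#endif
```

If the original C code operates on unsigned pixel data, the unsigned variants (vld1_u8 and vhadd_u8) would be the closer match.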

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow