"Long"
Multiplying two fixed-point numbers gives a "long" result of double the width, so storing the full result requires a register twice as wide as the inputs.
So an s8 x s8 multiply produces an s16 result.
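To make the s8 x s8 => s16 rule concrete, here is a minimal sketch that models a single lane in plain Python instead of NEON intrinsics, so it runs on any machine. The helper names (`wrap`, `vmull_s8_lane`, `vmul_i8_lane`) are hypothetical. They only mimic the per-lane arithmetic of a widening multiply (VMULL.S8) and a same-width multiply (VMUL.I8) and are not part of any NEON API.

```python
# Hypothetical scalar model of one lane of a widening multiply (VMULL.S8)
# versus a same-width multiply (VMUL.I8). Not a NEON API, just the arithmetic.

def wrap(value: int, bits: int) -> int:
    """Reduce value to a signed two's-complement integer of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= (1 << (bits - 1)) else value

def vmull_s8_lane(a: int, b: int) -> int:
    """Widening multiply: s8 x s8 -> s16, the full product is kept."""
    return wrap(a * b, 16)

def vmul_i8_lane(a: int, b: int) -> int:
    """Same-width multiply: only the low 8 bits of the product survive."""
    return wrap(a * b, 8)

S8 = range(-128, 128)

# Every s8 x s8 product fits in s16, so the widening result is always exact.
assert all(vmull_s8_lane(a, b) == a * b for a in S8 for b in S8)

# 15 signed bits (-16384..16383) cover every product except -128 * -128.
overflowing = [(a, b) for a in S8 for b in S8 if not -16384 <= a * b <= 16383]
print(overflowing)            # [(-128, -128)]

print(vmull_s8_lane(100, 3))  # 300
print(vmul_i8_lane(100, 3))   # 44, because 300 = 256 + 44 and the upper bits are lost
print(vmul_i8_lane(5, 7))     # 35, small operands so the low bits are the whole answer
```

The exhaustive loop is cheap (65,536 pairs) and shows why the widening form is the only one that is exact for every input, while the same-width form is only safe when the operands are known to be small.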
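Strictly speaking, this widening is required for any fixed-point multiply whose result must be stored with full precision: multiplying two n-bit numbers needs (2 * n) - 1 bits for signed numbers (2 * n bits only in the single case of the most negative value multiplied by itself, e.g. -128 x -128 = 16384 for s8) and 2 * n bits for unsigned numbers. NEON provides this widening multiply as VMULL.
But sometimes you only operate on small integers whose products fit in the original width, or you don't care about the upper bits, and then you can use VMUL, which keeps only the low half of each product. Because its result is no wider than its inputs, it can process more elements per instruction and needs fewer registers.
In signal processing, if you are representing a fractional fixed-point format (often called "Q" numbers), for example Q15 stored in 16-bit signed numbers (s16), then you don't need the lower bits of the product, only the upper half, and NEON provides this too with VQDMULH (saturating doubling multiply, returning the high half).
Whichever part of the product you need, NEON has an instruction for it.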
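The Q15 case can be checked with a minimal scalar Python model of one s16 lane of VQDMULH (the saturating doubling multiply that returns the high half). The product of two Q15 values is a Q30 value; doubling it gives Q31, and keeping the upper 16 bits brings it back to Q15. The function names and the float conversion helpers below are hypothetical illustrations, not NEON intrinsics, and `to_q15` truncates rather than rounds.

```python
# Hypothetical scalar model of one s16 lane of VQDMULH applied to Q15 values.

S16_MIN, S16_MAX = -32768, 32767

def saturate_s16(value: int) -> int:
    """Clamp to the signed 16-bit range, as the saturating (Q) instructions do."""
    return max(S16_MIN, min(S16_MAX, value))

def vqdmulh_s16_lane(a: int, b: int) -> int:
    """High half of the doubled product: (2 * a * b) >> 16, saturated to s16."""
    return saturate_s16((2 * a * b) >> 16)

def to_q15(x: float) -> int:
    """Hypothetical helper: convert a value in [-1, 1) to Q15 by truncation."""
    return saturate_s16(int(x * 32768))

def from_q15(q: int) -> float:
    """Hypothetical helper: interpret a Q15 integer as a fraction."""
    return q / 32768

half = to_q15(0.5)                  # 16384
quarter = vqdmulh_s16_lane(half, half)
print(quarter, from_q15(quarter))   # 8192 0.25

# -1.0 * -1.0 = +1.0 is not representable in Q15, so the result saturates.
print(vqdmulh_s16_lane(S16_MIN, S16_MIN))  # 32767
```

The doubling is what lines the binary point back up: dropping the low 16 bits of a plain Q30 product would leave a Q14 result, whereas the doubled Q31 product yields Q15 directly. The only input pair that overflows is -1.0 x -1.0, which is why the instruction saturates.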