Question

My question is about ARM NEON.

The first question is about register size. I'd like to know the actual SIMD register size of Apple's A6 and the Cortex-A15.

The second question is about SIMD instruction cycle counts. I assume that on a lot of ARM processors the NEON registers are 64 bits wide.

The manual states: "As dual view, it's 128 bit wide". Does this mean that even if I use 4 x 32-bit values spread across two 64-bit NEON registers, they will be processed in one cycle?

I'd also like to know the cycle count differences between 128-bit and 64-bit NEON operations.

Solution

It depends on the instruction executed.

As a general rule of thumb, simple ALU instructions take no more cycles on Q registers (128-bit) than on D registers (64-bit), but multiply and/or permute instructions need twice the cycles when operating on Q registers. You should also be aware that very often the result in the lower 64 bits of the destination Q register (Qd) is available earlier than the result in the upper half.
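To make the rule of thumb concrete for the 4 x 32-bit case from the question, here is a rough sketch in C. It only encodes the relative factors stated above (Q costs the same as D for simple ALU ops, and twice as much for multiply and permute); it is not a table of real cycle counts, and it ignores latency, pipelining and dual issue. The names `neon_op_class` and `relative_cycles_4x32` are made up for this illustration.

```c
#include <assert.h>

/* Hypothetical instruction classes, following the rule of thumb above. */
typedef enum { OP_SIMPLE_ALU, OP_MULTIPLY, OP_PERMUTE } neon_op_class;

/* Relative issue cost of ONE Q-register instruction, measured in units of
   the same instruction on a D register. These are ratios, not cycles. */
static int q_cost_in_d_units(neon_op_class op) {
    switch (op) {
    case OP_SIMPLE_ALU: return 1;  /* Q is as cheap as D */
    case OP_MULTIPLY:              /* Q takes twice as long as D */
    case OP_PERMUTE:    return 2;
    }
    assert(0 && "unknown op class");
    return 0;
}

/* Relative cost of processing 4 x 32-bit values either as one Q-register
   instruction (use_q != 0) or as two D-register instructions. */
static int relative_cycles_4x32(neon_op_class op, int use_q) {
    return use_q ? q_cost_in_d_units(op) : 2;
}
```

Under these assumptions, a simple add over four 32-bit lanes comes out at 1 unit with a Q register versus 2 units with two D registers, while a multiply comes out at 2 units either way. So using Q registers is not expected to be slower, and for simple ALU work it should be faster.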

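I don't think Apple's A6 behaves much differently than the "original" Cortex-A15 when it comes to cycles. And since they all share the very same ISA, you can be assured that the registers are the same within the ARMv7 architecture: a single bank that can be viewed either as thirty-two 64-bit D registers or as sixteen 128-bit Q registers.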
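The "dual view" quoted in the question can be pictured with a small model in plain C. This is only an illustration of the ARMv7 NEON register aliasing, where Qn overlays D(2n) as its lower half and D(2n+1) as its upper half. It does not use real NEON intrinsics, and the names `neon_regfile`, `q_low`, `q_high` and `write_q_u32x4` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model: one register bank, stored as 32 x 64-bit D registers. */
typedef struct {
    uint64_t d[32];
} neon_regfile;

/* Qn is not separate storage: its lower 64 bits are D(2n) ... */
static uint64_t *q_low(neon_regfile *rf, unsigned n) {
    assert(n < 16);
    return &rf->d[2 * n];
}

/* ... and its upper 64 bits are D(2n+1). */
static uint64_t *q_high(neon_regfile *rf, unsigned n) {
    assert(n < 16);
    return &rf->d[2 * n + 1];
}

/* Write four 32-bit lanes into Qn. Lane 0 is the least significant, so
   lanes 0-1 land in D(2n) and lanes 2-3 land in D(2n+1). */
static void write_q_u32x4(neon_regfile *rf, unsigned n, const uint32_t lanes[4]) {
    *q_low(rf, n)  = (uint64_t)lanes[0] | ((uint64_t)lanes[1] << 32);
    *q_high(rf, n) = (uint64_t)lanes[2] | ((uint64_t)lanes[3] << 32);
}
```

In this model, writing four 32-bit values to Q1 changes D2 and D3 and nothing else, which is the same storage you would touch by using those two D registers separately.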

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow