Question

I am new to assembler and NEON programming. My task is to convert part of an algorithm from C to ARM assembler using NEON instructions. The algorithm takes an int32 array, loads different values from this array, does some bit shifting and XOR, and writes the result to another array. Later I will use an array with 64-bit values, but for now I am just trying to rewrite the code.

C Pseudo code:

out_array[index] = shiftSome( in_array[index] ) ^ shiftSome( in_array[index] );

So here are my questions regarding NEON Instructions:

1.) If I load a register like this:

vld1.32 d0, [r1]

will it load only 32Bit from the memory or 2x32Bit to fill the 64Bit Neon D-Register?

2.) How can I access the 2/4/8 (i32, i16, i8) parts of the D-Register?

3.) I am trying to load different values from the array with an offset, but it doesn't seem to work. What am I doing wrong? Here is my code (it is an integer array, so I'm trying to load, for example, the third element, which should have an offset of 64 bits = 8 bytes):

asm volatile(
"vld1.32 d0, [%0], #8 \n"     
"vst1.32 d0, [%1]" : : "r" (a), "r" (out): "d0", "r5");

where "a" is the array and "out" is a pointer to an integer (for debugging).

4.) After I load a value from the array I need to shift it to the right but it doesn't seem to work:

vshr.u32 d0, d0, #24     // C code:   x >> 24;

5.) Is it possible to load only 1 byte into a NEON register, so that I don't have to shift/mask to get just the one byte I need?

6.) I need to use inline assembler, but I am not sure what the last line is for:

input list : output list : what is this for?

7.) Do you know any good NEON References with code examples?

The program should run on a Samsung Galaxy S2 (Cortex-A9 processor), if that makes any difference. Thanks for the help.

----------------edit-------------------

This is what I found out:

  1. It will always load the full Register (64Bit)
  2. You can use the "vmov" instruction to transfer part of a neon register to an arm register.
  3. The offset should be in an arm register and will be added to the base address after the memory access.
  4. It is the "clobbered reg list". Every register that is used and is in neither the input nor the output list should be written here.
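A minimal sketch of findings 1 to 3, written with NEON intrinsics rather than inline assembly. The helper name `load_pair` is made up for illustration, and the non-NEON branch is only a portable stand-in so the snippet also builds on other targets. It assumes the offset is applied to the pointer before the load, which is what loading "the third element" needs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: loads in[index] and in[index + 1] into one 64-bit
 * D register and hands back both 32-bit lanes. */
void load_pair(const uint32_t *in, size_t index,
               uint32_t *lane0, uint32_t *lane1)
{
#if defined(__ARM_NEON)
    /* Like "vld1.32 {d0}, [rX]": always fills the whole D register. */
    uint32x2_t v = vld1_u32(in + index);
    /* Like "vmov.32 rY, d0[n]": moves one lane to an ARM register. */
    *lane0 = vget_lane_u32(v, 0);
    *lane1 = vget_lane_u32(v, 1);
#else
    /* Portable fallback with the same result. */
    uint32_t pair[2];
    memcpy(pair, in + index, sizeof pair);
    *lane0 = pair[0];
    *lane1 = pair[1];
#endif
}
```

Lane 0 holds the element at the lower address, so `load_pair(a, 2, &x, &y)` yields `a[2]` in `x` and `a[3]` in `y`.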

Solution

I can answer most of your questions: (update: clarified "lane" issue)

1) A plain vld1.32 d0, [r1] loads the entire 64-bit D register, i.e. two 32-bit values; with a Q register it loads 128 bits. There is a VMOV instruction variant that allows single "lanes" to be moved to or from ARM registers.

2) You can use the VMOV instruction to read or write single lanes, for example vmov.32 r0, d0[1]. Performance will suffer if you do too many single-element operations. NEON improves application performance by doing parallel operations on vectors (groups of floats/ints).

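3) Offsets in ARM assembly language are in bytes, not elements/registers. NEON load/store instructions allow post-increment by the transfer size ([r1]!) or by a register ([r1], r2), not by an immediate value. For normal ARM instructions, your post-increment of 8 will add 8 (bytes) to the source pointer. Note that a post-increment is applied after the memory access, so the load still reads from the original address; to read the third element, add 8 to the pointer before the load.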

4) Shifts in NEON affect all elements of a vector. A shift right of 24 bits using vshr.u32 will shift both 32-bit unsigned lanes of the D register by 24 bits and throw away the bits that get shifted out.

5) NEON instructions allow moving single elements in and out of normal ARM registers. VLD1 and VST1 also have a single-lane form: for example, vld1.8 {d0[0]}, [r1] loads one byte from memory into lane 0 and leaves the other lanes unchanged.

6) ?

7) Start here: http://blogs.arm.com/software-enablement/161-coding-for-neon-part-1-load-and-stores/ The ARM site has a good tutorial on NEON.
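Points 4 and 5 can be sketched with NEON intrinsics as follows. This is only an illustration: the function names and the shift amounts (24 and 8) are assumptions, not the asker's actual algorithm, and the non-NEON branches are portable stand-ins so the snippet also builds on other targets.

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* out[i] = (in[i] >> 24) ^ (in[i] << 8) for two 32-bit values at once.
 * The shift amounts are made up for illustration. */
void shift_xor_pair(const uint32_t in[2], uint32_t out[2])
{
#if defined(__ARM_NEON)
    uint32x2_t v  = vld1_u32(in);        /* vld1.32 {d0}, [r1]      */
    uint32x2_t hi = vshr_n_u32(v, 24);   /* vshr.u32 d1, d0, #24    */
    uint32x2_t lo = vshl_n_u32(v, 8);    /* vshl.i32 d2, d0, #8     */
    vst1_u32(out, veor_u32(hi, lo));     /* veor + vst1.32          */
#else
    for (int i = 0; i < 2; ++i)
        out[i] = (in[i] >> 24) ^ (in[i] << 8);
#endif
}

/* Loads a single byte from memory into lane 0 of a zeroed D register
 * and reads that lane back. */
uint8_t load_single_byte(const uint8_t *p)
{
#if defined(__ARM_NEON)
    uint8x8_t v = vld1_lane_u8(p, vdup_n_u8(0), 0);  /* vld1.8 {d0[0]}, [r1] */
    return vget_lane_u8(v, 0);
#else
    return *p;
#endif
}
```

Both lanes are shifted by the same instruction, which is why doing the work on whole vectors tends to pay off more than extracting and shifting one element at a time.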

Other tips

6) Clobbered registers.

asm(code : output operand list : input operand list : clobber list);

If you are using registers that have not been passed as operands, you need to inform the compiler about this. The following code adjusts a value to a multiple of four (as written, by clearing its two low bits). It uses r3 as a scratch register and lets the compiler know about this by specifying r3 in the clobber list. Furthermore, the CPU status flags are modified by the ands instruction. Adding the pseudo register cc to the clobber list keeps the compiler informed about this modification as well.

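asm (
"ands r3, %1, #3   \n\t"   /* r3 = len & 3; also updates the flags */
"eor  %0, %0, r3   \n\t"   /* clear the low two bits of len */
: "=r"(len)                /* output operand list */
: "0"(len)                 /* input operand list: same register as %0 */
: "cc", "r3"               /* clobber list */
);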
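Applied to the NEON load/store from the question, the same operand and clobber rules might look like the sketch below. It is an assumption-laden illustration, not the asker's final code: the function name `copy_third_pair` is hypothetical, the 8-byte offset is applied in C before the asm statement, and the non-ARM branch is only a portable stand-in so the snippet builds elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper: copies a[2] and a[3] into out[0] and out[1]. */
void copy_third_pair(const uint32_t *a, uint32_t *out)
{
#if defined(__GNUC__) && defined(__arm__) && defined(__ARM_NEON)
    const uint32_t *src = a + 2;        /* 2 elements = 8 bytes offset */
    __asm__ volatile(
        "vld1.32 {d0}, [%[src]]  \n\t"  /* load two 32-bit values      */
        "vst1.32 {d0}, [%[dst]]  \n\t"  /* store them through out      */
        :                               /* no register outputs         */
        : [src] "r"(src), [dst] "r"(out)
        : "d0", "memory"                /* d0 is scratch; memory is written */
    );
#else
    /* Portable fallback with the same result. */
    out[0] = a[2];
    out[1] = a[3];
#endif
}
```

The "memory" clobber is there because the asm writes through a pointer the compiler cannot see as an output operand; d0 is listed because it is used without being passed as an operand.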
License: CC-BY-SA with attribution
Not affiliated with StackOverflow