Question

I have code with one function optimized in NEON assembly. I build it with GCC and run it on a Cortex-A9 (hard-float image).

When I build the non-optimized code (pure C, no assembly) with hard-float options such as -mapcs -march=armv7-a -mtune=cortex-a9 -mfloat-abi=hard -mfpu=neon, it works fine.

When I introduce my assembly code and assemble it with the following flags: -march=armv7-a -mfloat-abi=hard -mfpu=neon, it builds fine but gives a segmentation fault at run time.

Also note that if I build the assembly-optimized code with -mfloat-abi=softfp in place of hard (and link with the -static option), it runs fine.

Why does the assembly code cause a problem with hard float? I have seen other posts on the hard/soft ABI options, but none that covers my particular case (C code works, but NEON assembly gives a segmentation fault).

Edit: The board does not have gdb, and my attempt to debug remotely via gdbserver ran into connectivity issues, so I cannot debug with gdb. What I did instead: immediately on entering the assembly function, I pushed the registers, branched to the end, and popped them again. It still gave a segmentation fault. Can I infer that no particular instruction is causing the fault? Could it be a missing flag in the makefile, or some formality or syntax in the assembly that is specific to hard float?

Also, when I build the library using armcc with the option --fpu=vfpv3_d16, it works. That is hard float, right? So I infer that the combination GCC + assembly code + hard float is the problem. Please share your suggestions if you have worked with this or a similar combination.

Edit: The code runs if the assembly contains only plain register instructions such as MOV r0, r1 or ADD r1, r2, r2, but it gives a segmentation fault on any instruction involving a constant or a memory access, such as LDR r1, [r2] or MOV r0, #0. Does that help pinpoint the fault?


Solution

Okay, so here is what I found out. The missing line was one attribute:

.type   function_name, %function 

If you hit the same problem, check whether any other required attributes are missing as well. Adding this line worked for me. Thank you for contributing your answers.
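For context, here is a minimal sketch of where that directive sits relative to the label it describes. The routine name add_one_asm and the helper check_add_one are hypothetical and not from the original code. The assembly is written as a top-level asm block inside a C file only so that the example builds on any host; the same directives would go unchanged into a standalone .S file. A plausible explanation for the fix, which the thread does not confirm, is that the GNU linker relies on the symbol being typed as a function to handle ARM/Thumb interworking, and many hard-float toolchains compile C as Thumb-2 by default.

```c
#include <assert.h>

#if defined(__arm__) && !defined(__thumb__)
/* 32-bit ARM target, compiled in ARM state: define the routine in assembly.
 * add_one_asm is a hypothetical name used only for this sketch. */
__asm__(
    "    .text\n"
    "    .align  2\n"
    "    .global add_one_asm\n"
    "    .type   add_one_asm, %function\n" /* mark the symbol as a function */
    "add_one_asm:\n"
    "    add     r0, r0, #1\n"             /* return value = first argument + 1 */
    "    bx      lr\n"
    "    .size   add_one_asm, .-add_one_asm\n"
);
int add_one_asm(int x);
#else
/* Any other target or mode: plain C fallback so the file still builds. */
int add_one_asm(int x) { return x + 1; }
#endif

/* Small check that calls the routine through the normal C calling convention. */
void check_add_one(void)
{
    assert(add_one_asm(41) == 42);
}
```

Calling check_add_one() from main exercises the routine on either path. The .size line is optional for correctness here, but it is one of the "other attributes" commonly emitted alongside .type.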

OTHER TIPS

I suspect you have a problem with alignment or the calling convention (in particular, stack allocation), but I don't have enough information to say anything conclusive.

Can you use GCC inline assembly inside your function instead of writing the whole function in assembly? Then GCC should take care of alignment and the calling convention. Inline assembly should be enough for accelerating arithmetic with NEON SIMD.
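As a rough illustration of that suggestion, the sketch below wraps a few NEON instructions in GCC extended inline assembly so that the compiler generates the function prologue, epilogue and argument passing. The function name add4_f32 is hypothetical, and the choice of q0/q1 as scratch registers is an assumption of this example. The NEON path is only compiled for 32-bit ARM targets with NEON enabled; elsewhere a scalar loop stands in so the code still builds and runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: out[i] = a[i] + b[i] for i = 0..3. */
void add4_f32(const float *a, const float *b, float *out)
{
#if defined(__arm__) && defined(__ARM_NEON__)
    /* GCC handles the calling convention; we only list what we clobber. */
    __asm__ volatile(
        "vld1.32  {d0, d1}, [%[a]]\n\t"   /* q0 = a[0..3] */
        "vld1.32  {d2, d3}, [%[b]]\n\t"   /* q1 = b[0..3] */
        "vadd.f32 q0, q0, q1\n\t"         /* q0 = q0 + q1 */
        "vst1.32  {d0, d1}, [%[out]]\n\t" /* out[0..3] = q0 */
        :
        : [a] "r"(a), [b] "r"(b), [out] "r"(out)
        : "q0", "q1", "memory");
#else
    /* Scalar fallback for targets without 32-bit ARM NEON. */
    for (size_t i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

Because the assembly lives inside an ordinary C function, there is no hand-written symbol to annotate, which sidesteps the kind of missing-directive problem described in the accepted solution.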

Licensed under: CC-BY-SA with attribution